Number of Attention Heads vs. Number of Transformer-encoders in Computer Vision

Item Type Conference paper
Abstract Determining an appropriate number of attention heads on the one hand and an appropriate number of transformer-encoders on the other is an important choice for Computer Vision (CV) tasks using the Transformer architecture. Computing experiments confirmed the expectation that the total number of parameters has to satisfy the condition of overdetermination (i.e., the number of constraints significantly exceeding the number of parameters); only then can good generalization performance be expected. This sets the boundaries within which the number of heads and the number of transformer-encoders can be chosen. If the role of context in the images to be classified can be assumed to be small, it is favorable to use multiple transformer-encoders with a low number of heads (such as one or two). In classifying objects whose class may heavily depend on the context within the image (i.e., the meaning of a patch depending on other patches), the number of heads is as important as the number of transformer-encoders.
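
The overdetermination condition from the abstract can be illustrated with a short back-of-the-envelope calculation. The following sketch is not from the paper itself; it assumes a ViT-style encoder with a fixed per-head width (so adding heads widens the model), counts only the attention and feed-forward weight matrices, and ignores biases, layer norms, patch embedding, and the classification head. All function names and the default dimensions are hypothetical choices for illustration.

# A minimal sketch of the overdetermination check, under the assumptions above.

def encoder_params(num_heads: int, head_dim: int = 64, mlp_ratio: int = 4) -> int:
    """Rough parameter count of one transformer-encoder block."""
    d = num_heads * head_dim      # model (embedding) width
    attn = 4 * d * d              # Q, K, V, and output projections
    mlp = 2 * mlp_ratio * d * d   # two dense layers of the feed-forward part
    return attn + mlp

def overdetermination_ratio(num_encoders: int, num_heads: int,
                            num_samples: int, num_classes: int) -> float:
    """Constraints (training samples x output dimension) divided by the
    parameter count; values well above 1 indicate the overdetermined
    regime in which good generalization can be expected."""
    params = num_encoders * encoder_params(num_heads)
    constraints = num_samples * num_classes
    return constraints / params

# Example: a CIFAR-10-sized task with 2 encoders of 1 head each
# yields a ratio of about 5, i.e., comfortably overdetermined.
print(overdetermination_ratio(num_encoders=2, num_heads=1,
                              num_samples=50_000, num_classes=10))

Under these assumptions, each block's parameter count grows quadratically with the number of heads but the stack grows only linearly with the number of encoders, which is why the head/encoder trade-off interacts with the overdetermination bound.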
Authors Hrycej, Tomas; Bermeitinger, Bernhard & Handschuh, Siegfried
Research Team Data Science and Natural Language Processing
Journal or Publication Title Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR
Language English
Subjects computer science
HSG Classification contribution to scientific community
HSG Profile Area None
Refereed Yes
Date October 2022
Publisher SciTePress
Page Range 315-321
ISSN 2184-3228
Publisher DOI https://doi.org/10.5220/0011578000003335
Official URL https://www.scitepress.org/PublicationsDetail.aspx...
Contact Email Address bernhard.bermeitinger@unisg.ch
Depositing User Bernhard Bermeitinger
Date Deposited 28 Oct 2022 19:07
Last Modified 28 Oct 2022 19:07
URI: https://www.alexandria.unisg.ch/publications/267726

Download

115780.pdf - Published Version (390 kB). Restricted to repository staff only; a copy can be requested.
2022.09.18-NumberofAttentionHeadsvs.NumberofTransformer-EncodersinComputerVisionKDIR2022Valletta.pdf - Presentation (550 kB). Available under a Creative Commons Attribution Share Alike license.
2209.07221.pdf - Accepted Version (215 kB). Available under a Creative Commons Attribution license.

Citation

Hrycej, Tomas; Bermeitinger, Bernhard & Handschuh, Siegfried (2022) Number of Attention Heads vs. Number of Transformer-encoders in Computer Vision. Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR, 315-321. ISSN 2184-3228
