Integrating Multiple Visual Attention Mechanisms in Deep Neural Networks
Published in COMPSAC, 2023
Recommended citation: F. Martinez and Y. Zhao, "Integrating Multiple Visual Attention Mechanisms in Deep Neural Networks," 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), Torino, Italy, 2023, pp. 1191-1196, doi: 10.1109/COMPSAC57700.2023.00180.
Inspired by the success of various visual attention techniques in computer vision, we introduce a novel method for integrating multiple attention mechanisms to boost model performance. Our approach involves augmenting a base model with a Parallel Visual Attention Encoder (PVAE) branch, which concurrently employs two different attention modules (modified large kernel attention and modified convolutional block attention) to capture essential visual features. To reduce the training cost incurred by these additional components, we apply an encoder for efficient feature extraction and dimensionality reduction before applying the attention modules. The proposed PVAE architecture can be combined with cutting-edge models (e.g., EfficientNet, ResNet, DenseNet, etc.) to create a Parallel Visual Attention Network (PVAN). We evaluate the efficacy of our approach by devising a PVAN with EfficientNet as the base model for the task of classifying dog breeds. Our experimental results demonstrate the effectiveness of the proposed hybrid visual attention architecture, which achieves superior performance compared to the base model and models with a single attention mechanism. We further present an interactive web application developed for the general public to identify dog breeds using their photographs to test our model’s performance in real-life scenarios.