Facial expression recognition (FER) has applications in many scenarios, making it a valuable research direction. However, because real-world images carry inherent uncertainty, recognition accuracy on them remains unsatisfactory. To address this problem, we propose a network built on two attention modules, which extract features along separate paths: the main classifier uses a self-attention module (SAM), and the auxiliary classifier uses a channel enhancement module. The classification results of the two classifiers are further constrained by variance regularization to suppress uncertainty. In addition, we apply knowledge distillation, using the mutual information between the teacher network and the student network to further refine the predictions. Our model achieves an accuracy of 88.75% on the RAF-DB dataset and 80.59% on the occluded FERPlus dataset, in which only the lower half of the face is visible.
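The abstract does not specify the exact architecture, so the following is only an illustrative NumPy sketch of the three ingredients it names: scaled dot-product self-attention, an SE-style channel re-weighting standing in for the channel enhancement module, and a simple reading of variance regularization as a penalty on disagreement between the two classifiers. All function names, shapes, and the SE formulation are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Scaled dot-product self-attention over a sequence of feature
    # vectors x of shape (T, D); wq/wk/wv are (D, D) projections.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v          # (T, D)

def channel_attention(feat, w1, w2):
    # SE-style channel re-weighting (an assumed stand-in for the
    # "channel enhancement module"): squeeze by global average pooling,
    # excite through a two-layer bottleneck, then rescale each channel.
    s = feat.mean(axis=(1, 2))                   # squeeze: (C,)
    z = np.maximum(w1 @ s, 0.0)                  # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))       # sigmoid gates: (C,)
    return feat * gate[:, None, None]            # rescaled (C, H, W)

def variance_penalty(p_main, p_aux):
    # One simple reading of "variance regularization": penalize
    # disagreement between the two classifiers' predicted distributions.
    return float(np.mean((p_main - p_aux) ** 2))
```

In training, `variance_penalty` would be added to the classification losses so the main and auxiliary branches are pushed toward consistent predictions on uncertain samples; the distillation term the abstract mentions would be a further loss between teacher and student outputs, omitted here.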
Cited by 4 scholarly publications.

Keywords: facial recognition systems, data modeling, virtual reality, convolution, feature extraction, mouth, visual process modeling