Emotion is highly subjective, and different regions of an image can affect the evoked emotion to different degrees. The key to image emotion recognition is therefore to effectively mine the discriminative local regions. We present a deep architecture that guides the network to extract discriminative and diverse affective semantic information. First, we train a fully convolutional network with a cross-channel max pooling (CCMP) strategy to extract discriminative feature maps. Second, to ensure that most of the discriminative sentiment regions are located accurately, we add a second module consisting of a convolution layer and the CCMP. After the first module localizes its discriminative regions, the feature elements corresponding to those regions are erased, and the erased features are fed into the second module. This adversarial erasure operation forces the network to discover complementary sentiment-discriminative regions. Third, an adaptive feature fusion mechanism is proposed to better integrate the discriminative and diverse sentiment representations. Extensive experiments on the benchmark datasets FI, EmotionROI, Instagram, and Twitter1 achieve recognition accuracies of 72.17%, 61.13%, 81.97%, and 85.44%, respectively, demonstrating that the proposed network outperforms state-of-the-art methods.
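The abstract does not spell out how CCMP is implemented; a common reading is that the channels are split into groups and an element-wise maximum is taken within each group. The following is a minimal PyTorch sketch under that assumption; the class name `CrossChannelMaxPool` and the `groups` parameter are illustrative, not from the paper.

```python
import torch
import torch.nn as nn


class CrossChannelMaxPool(nn.Module):
    """Cross-channel max pooling (CCMP): partition the C input channels into
    `groups` groups and keep the element-wise maximum within each group,
    producing C // groups output feature maps."""

    def __init__(self, groups: int):
        super().__init__()
        self.groups = groups

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        assert c % self.groups == 0, "channel count must be divisible by groups"
        # (N, C, H, W) -> (N, groups, C // groups, H, W), then max over the
        # within-group channel axis.
        x = x.view(n, self.groups, c // self.groups, h, w)
        return x.max(dim=2).values


# Usage: 64 channels pooled into 8 group-wise maps of the same spatial size.
pooled = CrossChannelMaxPool(groups=8)(torch.randn(2, 64, 14, 14))
assert pooled.shape == (2, 8, 14, 14)
```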
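The erasure step zeroes the feature elements at the positions the first module found discriminative before passing the features to the second module. The abstract does not say how those positions are selected; the sketch below assumes a single-channel response map from the first module and a hypothetical peak-relative threshold `ratio`.

```python
import torch


def erase_discriminative(features: torch.Tensor,
                         response: torch.Tensor,
                         ratio: float = 0.7) -> torch.Tensor:
    """Erase feature elements at spatial positions the first module found
    most discriminative. `response` is an (N, 1, H, W) saliency map from
    module 1; `ratio` is an assumed threshold relative to each image's
    peak response, not a value given in the paper."""
    peak = response.amax(dim=(2, 3), keepdim=True)
    keep = (response < ratio * peak).float()  # 1 where the response is low
    return features * keep                    # zero the high-response positions
```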
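The abstract only names the adaptive feature fusion mechanism; one plausible realization is a small learned gate that weights the two branches' pooled representations per sample. This sketch, including the `AdaptiveFusion` name and the linear-gate design, is an assumption for illustration.

```python
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Fuse the two modules' pooled features with learned, per-sample weights.
    The gating design here is assumed; the paper only names the mechanism."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        # f1, f2: (N, dim) pooled representations from the two modules.
        w = torch.softmax(self.gate(torch.cat([f1, f2], dim=1)), dim=1)
        return w[:, 0:1] * f1 + w[:, 1:2] * f2
```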