We aim to enhance the ConvNeXt model using high-order spatial interactions and structural reparameterization techniques. Through comparison with current mainstream convolutional neural networks and Transformer models, we designed a more efficient architecture named Rep2former. By avoiding self-attention, the proposed model retains efficient spatial-interaction capability while sidestepping the quadratic complexity that self-attention incurs. Furthermore, the design balances parameter count, floating-point operations (FLOPs), and accuracy so that the model is suitable for practical deployment. Experimental evaluations and comparisons were conducted on the ImageNet-1k and CIFAR-10 datasets. The results indicate that Rep2former outperforms popular models such as ResNet-50, Swin Transformer, and ConvNeXt in parameter count, computational complexity, and accuracy, making the proposed model both more effective and easier to deploy.
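Structural reparameterization, one of the two techniques the abstract names, can be illustrated with a minimal PyTorch sketch: during training a block runs parallel 3x3 and 1x1 convolutions, and at inference the 1x1 branch is folded into the 3x3 kernel so the block collapses into a single convolution. The `RepBlock` class and its `reparameterize` method below are illustrative assumptions in the style of RepVGG, not code from the Rep2former paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RepBlock(nn.Module):
    """Training-time block with parallel 3x3 and 1x1 branches (hypothetical sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(channels, channels, 1, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multi-branch form used during training.
        return self.conv3(x) + self.conv1(x)

    @torch.no_grad()
    def reparameterize(self) -> nn.Conv2d:
        """Fold the 1x1 branch into the 3x3 kernel for inference."""
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels,
                          3, padding=1, bias=True)
        # Zero-pad the 1x1 kernel to 3x3 so the two kernels can be summed.
        k1_padded = F.pad(self.conv1.weight, [1, 1, 1, 1])
        fused.weight.copy_(self.conv3.weight + k1_padded)
        fused.bias.copy_(self.conv3.bias + self.conv1.bias)
        return fused


# The fused convolution is numerically equivalent to the two-branch block.
x = torch.randn(1, 8, 16, 16)
block = RepBlock(8)
fused = block.reparameterize()
assert torch.allclose(block(x), fused(x), atol=1e-5)
```

This fusion keeps the richer training-time structure while leaving a single plain convolution at inference, which is what makes reparameterized models attractive for deployment.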
Keywords: Design and modelling, Transformers, Data modeling, Convolution, Performance modeling, Visual process modeling, Visualization