Encrypted traffic classification (ETC) plays an important role in network management. Most existing research classifies traffic using statistical features, or traffic transformed into images or text. However, designing statistical features is time-consuming and labor-intensive, and traffic transformed into images or text lacks spatial or semantic features. Because packet headers share a uniform structure and are independent of each other, traffic data most closely resemble tabular data. We therefore propose a data processing approach that converts packet headers into traffic tables in which each header field is treated as a column (feature). In addition, traffic data are hard to label in real network environments, and each field contributes differently to classification. We therefore use a self-supervised learning algorithm, SubTab, as the baseline network to reduce the reliance on labeled data and to assign different weights to different fields. To the best of our knowledge, this is the first work to address the ETC problem in the tabular domain. Experimental results on two real-world datasets, ISCX VPN-nonVPN and our self-collected dataset SHU-ET, show that our method outperforms state-of-the-art methods based on traffic images or text, demonstrating that traffic tables are better suited to ETC. Moreover, our method performs well with only 10% of the data labeled, reducing the need for labeled data.
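To make the header-to-table idea concrete, the sketch below turns raw IPv4/TCP packet headers into fixed-column table rows, one column per header field. The abstract does not say which fields, protocols, or preprocessing the authors use, so everything here is an assumption: only IPv4 without special handling of non-TCP traffic is covered, IP addresses and payload are dropped, and the names `header_to_row`, `build_table`, and `COLUMNS` are hypothetical. A real pipeline would likely also normalize these values before feeding them to a network such as SubTab.

```python
import struct
from typing import Dict, List

# Hypothetical column order: standard IPv4 fixed-header fields followed by TCP fields.
# IP addresses are deliberately excluded in this sketch (an assumption, not the paper's choice).
IPV4_COLUMNS = ["ip_version", "ip_ihl", "ip_tos", "ip_total_length", "ip_id",
                "ip_flags", "ip_frag_offset", "ip_ttl", "ip_protocol", "ip_checksum"]
TCP_COLUMNS = ["tcp_src_port", "tcp_dst_port", "tcp_seq", "tcp_ack",
               "tcp_data_offset", "tcp_flags", "tcp_window", "tcp_checksum", "tcp_urgent"]
COLUMNS = IPV4_COLUMNS + TCP_COLUMNS


def header_to_row(packet: bytes) -> Dict[str, int]:
    """Parse an IPv4 + TCP header into one table row (field name -> integer value)."""
    # IPv4 fixed header: 20 bytes in network byte order; the two 4s fields are the addresses.
    (ver_ihl, tos, total_len, ident, flags_frag, ttl, proto, checksum,
     _src, _dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    if proto != 6:
        raise ValueError("only TCP packets are handled in this sketch")
    ihl = ver_ihl & 0x0F  # header length in 32-bit words
    row = {
        "ip_version": ver_ihl >> 4,
        "ip_ihl": ihl,
        "ip_tos": tos,
        "ip_total_length": total_len,
        "ip_id": ident,
        "ip_flags": flags_frag >> 13,          # top 3 bits (reserved, DF, MF)
        "ip_frag_offset": flags_frag & 0x1FFF,  # remaining 13 bits
        "ip_ttl": ttl,
        "ip_protocol": proto,
        "ip_checksum": checksum,
    }
    # The TCP header starts right after the IPv4 header, including any IP options.
    start = ihl * 4
    (sport, dport, seq, ack, off_flags, window, tcp_sum, urg) = struct.unpack(
        "!HHIIHHHH", packet[start:start + 20])
    row.update({
        "tcp_src_port": sport,
        "tcp_dst_port": dport,
        "tcp_seq": seq,
        "tcp_ack": ack,
        "tcp_data_offset": off_flags >> 12,  # top 4 bits
        "tcp_flags": off_flags & 0x01FF,     # low 9 flag bits (NS ... FIN)
        "tcp_window": window,
        "tcp_checksum": tcp_sum,
        "tcp_urgent": urg,
    })
    return row


def build_table(packets: List[bytes]) -> List[List[int]]:
    """Stack parsed headers into rows that all follow the same column order."""
    rows = []
    for packet in packets:
        row = header_to_row(packet)
        rows.append([row[column] for column in COLUMNS])
    return rows


if __name__ == "__main__":
    # Synthetic packet: IPv4 (DF set, TTL 64) + TCP to port 443 with ACK|PSH.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0x4000, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    tcp = struct.pack("!HHIIHHHH", 51000, 443, 1, 0, (5 << 12) | 0x018, 65535, 0, 0)
    table = build_table([ip + tcp])
    print(dict(zip(COLUMNS, table[0])))
```

Keeping a fixed column order is what makes every packet a row of the same table, so a tabular model can learn a weight per field rather than per pixel or token.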
Machine learning
Education and training
Data modeling
Performance modeling
Classification systems
Image classification
Feature extraction