French summaries
针对网络入侵检测中特征不足和类别不平衡问题,提出一种通用特定自编码器架构,从给定特征中提取有用特征,并在CICIDS2017数据集上取得新基准结果。
The rapid growth of network-related services in the last decade has produced a huge amount of sensitive data on the internet. But networks are very much prone to intrusions where unauthorized users attempt to access sensitive information and even disrupt the system. Building a competent network intrusion detection system (IDS) is necessary to prevent such attacks. IDSs generally use machine learning algorithms for classifying the attacks. But the features used for classification are not always suitable or sufficient. Besides, the number of intrusions is much less than the number of non-intrusions. Hence naive approaches may fail to provide acceptable performance due to this class imbalance. To counter this problem, in this paper, we propose a model that extracts useful features from the given features and then uses a deep learning algorithm to classify the intrusions. It is to be noted that underlying data points cannot be thought of as sampled from the same distribution, rather from two different distributions - one generic to all network intrusions, and the other specific to the domain. Keeping this fact in mind, we propose a unique Generic-Specific autoencoder architecture where the generic one learns the features that are common across all forms of network intrusions, and the specific ones learn features that are pertaining only to that domain. The model has been evaluated on the CICIDS2017 dataset, which is the largest dataset of this type available online, and we have set new benchmark results on this dataset. Source code of this work is available at: https://github.com/SoumyadeepThakur/Intrusion-AE