On the Analysis and Evaluation of Protocol-Based Attack Traffic Samples for Network Intrusion Detection System (AEPBAT - NIDS)

  • Hasheem Danbatta

Student thesis: Doctoral Thesis


Network intrusion detection systems (NIDS) are widely deployed in a variety of real-life applications to protect networks against security breaches. Much research has been conducted to analyse and classify attack samples, but because zero-day attacks keep changing, most past models built on outdated datasets are no longer fit for purpose. To strengthen attack detection in modern settings, this thesis explores, examines and analyses the challenges of developing robust classifiers from the newer UNSW-NB15 benchmark dataset. The UNSW-NB15 dataset was created at the University of New South Wales to provide up-to-date data on which to build a modern intrusion detection system, and the benchmark claims to represent modern network traffic better than older datasets such as KDD99 and its variants. We highlight the difficulties that arise when attempting to build a classifier that can distinguish between normal data and attack data within the UNSW-NB15 dataset. Specifically, we show how the problem of imbalanced data needs to be considered and investigate a range of sampling strategies that might help address the imbalance; we examine the within-sample distributions and show that sub-classes exist which further complicate sample construction; we demonstrate some of the difficulties encountered when a mixture of numerical and categorical features is present; and, finally, we show how a local classifier approach at the protocol level leads to the development of a voting classifier that potentially solves the problems identified and provides an efficient mechanism for retraining sub-component classifiers. Our proposed model is both efficient and effective: the highest recall, 100%, is achieved with the support vector machine (SVM), logistic regression (LR) and voting classifiers, and the highest precision, 99.22%, with the gradient boosting (GB) classifier.
Our proposed model also performs well at identifying rare attacks, such as Worm and Backdoor, which are difficult to classify because they occur so rarely within our dataset samples.
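The abstract only outlines the protocol-level design, so the following is a minimal sketch of how such a scheme might look, not the thesis's implementation. It assumes scikit-learn and UNSW-NB15-style column names (`proto`, `service`, `state`, `label`); every record is routed by its `proto` value to a sub-classifier trained only on that protocol, and each sub-classifier is a hard-voting ensemble of SVM, LR and GB over scaled numeric and one-hot-encoded categorical features. Class weighting (`class_weight="balanced"`) is swapped in as a simple stand-in for the sampling strategies the thesis investigates. The names `ProtocolLevelClassifier`, `make_voter` and `fit_protocol` are hypothetical, and flagging unseen protocols as attacks is an illustrative assumption.

```python
"""Hypothetical sketch: one voting classifier per protocol (assumes UNSW-NB15-style columns)."""
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Assumed categorical feature columns; "proto" is used for routing, not as a feature.
CATEGORICAL = ["service", "state"]


def make_voter(numeric_cols):
    """Scale numeric features, one-hot encode categorical ones, then hard-vote over SVM, LR and GB."""
    prep = ColumnTransformer(
        [
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        sparse_threshold=0,  # always produce a dense matrix for the downstream models
    )
    voter = VotingClassifier(
        estimators=[
            ("svm", SVC(class_weight="balanced")),
            ("lr", LogisticRegression(class_weight="balanced", max_iter=1000)),
            ("gb", GradientBoostingClassifier()),
        ],
        voting="hard",  # majority vote on predicted labels
    )
    return make_pipeline(prep, voter)


class ProtocolLevelClassifier:
    """Routes each record to a sub-classifier trained only on records of its own protocol."""

    def __init__(self, numeric_cols):
        self.numeric_cols = list(numeric_cols)
        self.models = {}  # proto -> fitted pipeline, or a constant label if only one class was seen

    def _features(self, df):
        return df[self.numeric_cols + CATEGORICAL]

    def fit_protocol(self, df, proto):
        """(Re)train the sub-classifier for a single protocol without touching the others."""
        part = df[df["proto"] == proto]
        labels = part["label"].unique()
        if len(labels) == 1:
            # A single-class protocol cannot train LR/SVM, so store a constant prediction.
            self.models[proto] = int(labels[0])
        else:
            self.models[proto] = make_voter(self.numeric_cols).fit(self._features(part), part["label"])
        return self

    def fit(self, df):
        for proto in df["proto"].unique():
            self.fit_protocol(df, proto)
        return self

    def predict(self, df):
        # Assumption: records with a protocol never seen in training are flagged as attacks (1).
        preds = pd.Series(1, index=df.index)
        for proto, model in self.models.items():
            mask = df["proto"] == proto
            if not mask.any():
                continue
            if isinstance(model, int):
                preds.loc[mask] = model
            else:
                preds.loc[mask] = model.predict(self._features(df.loc[mask]))
        return preds.to_numpy()
```

Because each protocol has its own model, `fit_protocol` can refit a single sub-classifier (for example, after new `udp` traffic is labelled) while the others stay untouched, which is one plausible reading of the retraining mechanism described in the abstract.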
Date of Award: 5 Jan 2024
Original language: English
Supervisor: Andrew Crampton (Main Supervisor) & Simon Parkinson (Co-Supervisor)
