TY - JOUR
T1 - Comparative evaluation of data imbalance addressing techniques for CNN-based insider threat detection
AU - Al-Shehari, Taher
AU - Kadrie, Mohammed
AU - Al-Mhiqani, Mohammed Nasser
AU - Alfakih, Taha
AU - Alsalman, Hussain
AU - Uddin, Mueen
AU - Ullah, Syed Sajid
AU - Dandoush, Abdulhalim
N1 - Funding Information:
This research was supported by the Researchers Supporting Project Number (RSP2024R244), King Saud University, Riyadh, Saudi Arabia.
Publisher Copyright:
© The Author(s) 2024.
PY - 2024/12/1
Y1 - 2024/12/1
AB - Insider threats pose a significant challenge in cybersecurity, demanding advanced detection methods for effective risk mitigation. This paper presents a comparative evaluation of data imbalance addressing techniques for CNN-based insider threat detection. Specifically, we integrate Convolutional Neural Networks (CNN) with three popular data imbalance addressing techniques: Synthetic Minority Over-sampling Technique (SMOTE), Borderline-SMOTE, and Adaptive Synthetic Sampling (ADASYN). The objective is to enhance insider threat detection accuracy and robustness on the imbalanced datasets common in cybersecurity domains. Our study addresses the lack of consensus in the literature regarding which data imbalance addressing technique performs best in this field. We analyze a human behavior-based dataset (CERT) that records users' Information Technology (IT) activities and contains a substantial number of samples, in order to provide a clear conclusion on the effectiveness of these balancing techniques when coupled with CNN. Experimental results demonstrate that ADASYN, in conjunction with CNN, achieves an ROC score of 96%, surpassing SMOTE and Borderline-SMOTE in enhancing detection accuracy on imbalanced datasets. We compare the results of these three hybrid models (CNN + imbalance addressing techniques) with selected state-of-the-art studies, focusing on ROC, recall, and accuracy measures. Our findings contribute to the advancement of insider threat detection methodologies.
KW - ADASYN
KW - CNN
KW - Data imbalance addressing
KW - Deep learning
KW - Insider threat detection
KW - SMOTE
UR - http://www.scopus.com/inward/record.url?scp=85206962382&partnerID=8YFLogxK
U2 - 10.1038/s41598-024-73510-9
DO - 10.1038/s41598-024-73510-9
M3 - Article
C2 - 39433789
AN - SCOPUS:85206962382
VL - 14
JO - Scientific Reports
JF - Scientific Reports
SN - 2045-2322
IS - 1
M1 - 24715
ER -