Makine öğrenmesi algoritmalarının hibrit yaklaşımı ile ağ anomalisi tespiti
Network anomaly detection with a hybrid approach of machine learning algorithms
- Thesis No: 836936
- Advisors: ASSOC. PROF. DR. HALİT ÖZTEKİN
- Thesis Type: Master's
- Subjects: Computer Engineering and Computer Science and Control
- Keywords: Network security, supervised learning, machine learning, metaheuristic algorithms, KDD Cup 1999
- Year: 2023
- Language: Turkish
- University: Sakarya Uygulamalı Bilimler Üniversitesi
- Institute: Graduate Education Institute
- Department: Department of Electrical and Electronics Engineering
- Division: Not specified.
- Number of Pages: 64
Abstract
The Internet is vital for people to communicate, access information, do business, and carry out many everyday activities. However, its growing use has also brought an increase in cyberattacks and threats. Cyber attackers develop more sophisticated methods every day and carry out malicious actions such as stealing personal data, damaging systems, or blocking services. This has further increased the importance of detection systems in cybersecurity. In particular, systems such as network anomaly detection learn the normal behavior of network traffic and can detect unexpected or abnormal activity, which allows attacks to be detected and prevented at an early stage. These technologies protect the information of individuals and organizations and help minimize the potential damage of digital attacks. Research on cybersecurity detection systems is therefore of critical value. Hybrid models have been observed to detect cyberattacks with high success. The performance of machine learning algorithms used for network anomaly detection has generally been evaluated on the KDD Cup 1999 dataset. In this research, the Decision Tree (DT), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), and K-Nearest Neighbors (KNN) machine learning methods were tested because they generally show high accuracy and are frequently preferred in the literature. To evaluate the effectiveness of data mining and machine learning techniques in network security, two hybrid feature reduction methods, PCA + RFECV and RFECV + FS, were compared. In the PCA + RFECV method, dimensionality is reduced with principal component analysis, and the best features are then selected with Recursive Feature Elimination with Cross-Validation (RFECV).
Cross-validation and ROC curves were chosen as evaluation metrics so that the performance of the algorithms could be analyzed comprehensively and objectively. Without feature reduction, the RF classifier had the highest accuracy at 98.15%, while KNN stood out with 96.31% accuracy, 97.41% precision, 95.24% sensitivity, and a 96.31% F1 score. With PCA + RFECV (Table 4.5), the KNN metrics remained similar, but the NB classifier showed a large drop, to 61.93% accuracy. With RFECV + FS (Table 4.7), KNN stood out with 96.68% accuracy, 97.76% precision, 95.64% sensitivity, and a 96.69% F1 score, which highlights its sensitivity to the feature reduction method. The results underline the critical role of feature selection in classification performance and show that reducing the dimensionality of the dataset, selecting meaningful features, and using hybrid methods can improve classification performance.
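As a quick consistency check on the KNN figures quoted above, the F1 score can be recomputed as the harmonic mean of precision and sensitivity (recall). The sketch below is illustrative only: the helper name `f1_score_from` is hypothetical, and the inputs are the rounded percentages from the abstract, not the thesis's raw predictions.

```python
def f1_score_from(precision, recall):
    """F1 is the harmonic mean of precision and recall (both as fractions in [0, 1])."""
    return 2 * precision * recall / (precision + recall)


# KNN figures quoted in the abstract, as (precision %, sensitivity %, reported F1 %).
reported = {
    "no feature reduction": (97.41, 95.24, 96.31),
    "RFECV + FS": (97.76, 95.64, 96.69),
}

for setting, (p, r, f1_reported) in reported.items():
    f1 = 100 * f1_score_from(p / 100, r / 100)
    print(f"{setting}: recomputed F1 = {f1:.2f}%, reported = {f1_reported:.2f}%")
```

In both settings the recomputed value rounds to the reported F1 score, so the quoted precision, sensitivity, and F1 figures are mutually consistent.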
Abstract (Translation)
The Internet has become an essential medium for people to communicate, access information, conduct business, and carry out many daily activities. However, its growing use has also led to a rise in cyberattacks and threats. Cyber adversaries are devising increasingly sophisticated methods to steal personal data, harm systems, or disrupt services. This escalation underscores the critical importance of cybersecurity detection systems. In particular, network anomaly detection systems, which learn normal behavior patterns in network traffic, can identify unexpected or anomalous activities. This facilitates the early detection and mitigation of attacks. Strengthening and developing cybersecurity detection systems is of paramount importance in today's digital landscape. These systems safeguard individual users and institutions by protecting their data and information, thereby minimizing potential damage from cyberattacks. Moreover, they prevent service interruptions, helping the Internet operate without disruption. In this context, research on cybersecurity detection systems is highly significant.
Anomaly detection, which aims to identify unexpected or deviant behavior in datasets, is carried out with various techniques, most prominently machine learning, statistical methods, and data analysis. It primarily focuses on identifying values that are above or below the norm and is critically important across many domains. Anomalies are commonly grouped into three main types. A Point Anomaly is a single data point that deviates significantly from the rest, such as an unexpectedly high transaction amount in a bank account. A Contextual Anomaly is a data point that is abnormal relative to other data within a specific context, such as an unexpected change in network traffic.
A Collective Anomaly is a deviation of a combination of multiple features or attributes from the general behavior pattern; employee performance evaluations are an example of this kind of analysis. Applications of these detection methods range from optimizing business processes to identifying potential threats.
Network Anomaly Detection is used to identify unexpected behavior in computer networks. It relies on three main methods: Signature-Based, which detects predefined patterns; Behavior-Based, which distinguishes normal from abnormal behavior using statistical parameters; and Machine Learning-Based, which classifies new and unknown anomalies with trained models. These techniques are critically important for the security and performance of networks.
Network attacks are threats aimed at computer networks and connected devices. Their objective is to carry out malicious activities such as seizing network resources, stealing data, causing service disruptions, or crashing systems. For instance, DDoS attacks aim to disrupt a service by flooding the network with excessive traffic. UDP Flood attacks can exhaust the target system's resources by persistently sending large amounts of UDP traffic. Smurf attacks cause service interruptions by bombarding network devices with spoofed ICMP Echo Request messages. Teardrop attacks crash the target machine by using malformed fragment information. Botnet attacks orchestrate infected devices to create service disruptions. Clickjacking triggers malicious actions without the user's knowledge, while DRDoS attacks amplify the attack's impact using reflection techniques. Malware attacks, in turn, take over devices with malicious software, and Man-in-the-Middle attacks monitor and alter communication within the network. Ransomware attacks demand payment from users, while password cracking attacks aim to recover passwords.
Social engineering attacks target the theft of personal information, whereas SQL injection, XSS, and phishing attacks target websites. ARP, DNS, and IP spoofing attacks misdirect network traffic in order to steal or monitor information. Ping of Death and SYN Flood attacks aim to disrupt network services.
Popular test datasets used for network anomaly detection include NSL-KDD, UNSW-NB15, CICIDS2017, DARPA 1998, and KDDCUP99. NSL-KDD is an improved version of the KDD Cup 1999 dataset and describes each record with 41 features. UNSW-NB15 consists of real network traffic data and is a detailed labeled set with 49 features; CICIDS2017 has 80 million event records encompassing 15 network attack types. DARPA 1998 includes attacks conducted on a real network along with normal traffic, while KDDCUP99 is based on 5 million data samples obtained from a real computer network and has an imbalanced structure. These datasets are important tools for network security research and for training machine learning algorithms.
In this research, the Anaconda distribution was used, and the code was written in Python in the Jupyter Notebook environment. Anaconda provides package and environment management and bundles Python with tools for data manipulation and machine learning. Key libraries such as Sklearn, Numpy, and Pandas were used in the study. The machine used for this research runs a 64-bit Microsoft Windows OS with the following specifications:
- CPU: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz
- RAM: 16 GB
In this study, an overview of network attacks was presented first, followed by an extensive literature review. Prior studies were thoroughly evaluated, and the supervised machine learning algorithms and metaheuristic algorithms frequently used for network anomaly detection were identified. Studies that combined machine learning and metaheuristic algorithms in a hybrid manner were also analyzed.
The performance of machine learning algorithms was tested on the KDD Cup 1999 dataset. During data preprocessing, the dataset was balanced so that attack and normal traffic were equally represented, with attack data labeled as 1 and normal traffic labeled as 0. Missing values were filled with the calculated median value. Categorical attributes (such as protocol_type, flag, service) were converted to numeric form using one-hot encoding. To bring the data onto the same scale, the columns src_bytes and dst_bytes were normalized. Furthermore, a correlation matrix was calculated to measure the relationship between the features in the dataset, and highly correlated features were identified. PCA was then applied to reduce the correlation between features.
Classification was performed using the Decision Tree (DT), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), and K-Nearest Neighbors (KNN) machine learning algorithms. Cross-validation and ROC curves were employed as evaluation metrics.
Additionally, to evaluate the role of data mining and machine learning methods in network security, two hybrid feature reduction methods, PCA + RFECV and RFECV + FS, were compared. In the PCA + RFECV method, dimensionality was reduced using principal component analysis, and the best features were then selected with Recursive Feature Elimination with Cross-Validation (RFECV). With this method, the Random Forest (RF) classifier achieved the highest accuracy, while the K-Nearest Neighbors (KNN) classifier performed well on the precision metric. In the RFECV + FS method, important features were first identified with RFECV, and the best features were then selected using Forward Selection (FS).
With RFECV + FS, the KNN classifier stood out with the highest accuracy, precision, sensitivity, and F1 metrics. The results highlight the critical role of feature selection in classification performance and show that reducing dataset size, selecting meaningful features, and using hybrid methods can improve classification performance. This study is intended as a resource for academics and industrial organizations working on network security.
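The preprocessing and the two hybrid feature reduction schemes described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the thesis code: the data is a small synthetic stand-in for a balanced KDD Cup 1999 sample, min-max scaling is assumed for the normalization step, Random Forest is assumed as the RFECV estimator and KNN as the forward-selection estimator, and the helper names and the `n_components` and `n_final` values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
n = 300
label = np.repeat([0, 1], n // 2)  # balanced classes: 0 = normal, 1 = attack

# Synthetic stand-in data (assumption); only the column names follow the abstract.
duration = rng.exponential(2.0, size=n)
duration[::25] = np.nan  # inject a few missing values
df = pd.DataFrame({
    "protocol_type": rng.choice(["tcp", "udp", "icmp"], size=n),
    "flag": rng.choice(["SF", "S0", "REJ"], size=n),
    "service": rng.choice(["http", "smtp", "ftp"], size=n),
    "src_bytes": rng.exponential(500, size=n) + 2000 * label,
    "dst_bytes": rng.exponential(300, size=n) + 1000 * label,
    "count": rng.poisson(5, size=n) + 10 * label,
    "duration": duration,
})


def preprocess(frame):
    """Median imputation, one-hot encoding, and min-max scaling of the byte counters."""
    out = frame.copy()
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    out = pd.get_dummies(out, columns=["protocol_type", "flag", "service"], dtype=float)
    byte_cols = ["src_bytes", "dst_bytes"]
    out[byte_cols] = MinMaxScaler().fit_transform(out[byte_cols])
    return out


def make_rfecv():
    """RFECV driven by a small Random Forest (estimator choice is an assumption)."""
    forest = RandomForestClassifier(n_estimators=25, random_state=0)
    return RFECV(forest, cv=3, scoring="accuracy")


def pca_then_rfecv(X, y, n_components=8):
    """PCA + RFECV: project onto principal components, then prune components."""
    n_components = min(n_components, X.shape[1])
    Z = PCA(n_components=n_components, random_state=0).fit_transform(X)
    return make_rfecv().fit(Z, y).transform(Z)


def rfecv_then_forward(X, y, n_final=3):
    """RFECV + FS: keep RFECV's features, then run forward selection on them."""
    kept = X[:, make_rfecv().fit(X, y).support_]
    if kept.shape[1] <= n_final:  # nothing left for forward selection to prune
        return kept
    sfs = SequentialFeatureSelector(
        KNeighborsClassifier(), n_features_to_select=n_final, direction="forward", cv=3
    )
    return sfs.fit(kept, y).transform(kept)


X = preprocess(df).to_numpy(dtype=float)
y = label
X_pca_rfecv = pca_then_rfecv(X, y)
X_rfecv_fs = rfecv_then_forward(X, y)
print(X.shape, X_pca_rfecv.shape, X_rfecv_fs.shape)
```

For brevity the reducers here are fitted on the full dataset. In an actual experiment they would normally be fitted on the training folds only, so that the cross-validated scores are not inflated by information from the held-out data.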
Similar Theses
- Prediction of COVID 19 disease using chest X-ray images based on deep learning
Derin öğrenmeye dayalı göğüs röntgen görüntüleri kullanarak COVID 19 hastalığının tahmini
ISMAEL ABDULLAH MOHAMMED AL-RAWE
Master's
English
2024
Computer Engineering and Computer Science and Control
Gazi Üniversitesi
Department of Computer Engineering
PROF. DR. ADEM TEKEREK
- Classification of abnormal respiratory sounds using deep learning techniques
Solunum seslerinin derin öğrenme yöntemleri ile sınıflandırılması
AHAMADI ABDALLAH IDRISSE
Master's
English
2023
Computer Engineering and Computer Science and Control
Gazi Üniversitesi
Department of Computer Science
ASSOC. PROF. DR. OKTAY YILDIZ
- Design and deployment of deep learning based fuzzy logic systems
Derin öğrenme tabanlı bulanık sistemlerin geliştirilmesi ve uygulanması
AYKUT BEKE
Doctorate
English
2023
Computer Engineering and Computer Science and Control
İstanbul Teknik Üniversitesi
Department of Control and Automation Engineering
ASSOC. PROF. DR. TUFAN KUMBASAR
- Hibrid makine öğrenmesi teknikleri ile yol yüzey durumunun modellenmesi
Modeling the roadway surface status by hybrid machine learning techniques
BEGENCH YARMATOV
Master's
Turkish
2017
Computer Engineering and Computer Science and Control
Süleyman Demirel Üniversitesi
Department of Computer Engineering
ASSOC. PROF. DR. OKAN BİNGÖL
PROF. DR. SERDAL TERZİ
- Derin öğrenme ile içerik tabanlı siber tehdit tespiti
Content-based cyber threat detection with deep learning
EMRE KOÇYİĞİT
Master's
Turkish
2021
Computer Engineering and Computer Science and Control
Yıldız Teknik Üniversitesi
Department of Computer Engineering
PROF. DR. BANU DİRİ