Kolektif makine öğrenmesi tabanlı ağ saldırı tespiti

Collective machine learning based network intrusion detection

Thesis No: 765497
Author: ŞURA EMANET
Advisors: ASSOC. PROF. DR. ÖNDER DEMİR, ASST. PROF. DR. GÖZDE KARATAŞ BAYDOĞMUŞ
Thesis Type: Master's
Subjects: Computer Engineering and Computer Science and Control
Keywords: Not specified.
Year: 2022
Language: Turkish
University: Marmara University
Institute: Institute of Science (Fen Bilimleri Enstitüsü)
Department: Computer Engineering
Division: Computer Engineering
Number of Pages: 85

Abstract

The rapid spread of internet use, and the parallel growth in the number of users spending time online, brings cyber risks and threats with it. Malicious users can do considerable damage to the systems and applications in these environments, where many valuable things such as information, ideas and money are shared. Intrusion Detection Systems (IDSs) play a critical role in securing systems and applications on the Internet. With these systems, activity and traffic on the network are analyzed to detect possible attacks, violations and threats. Besides classical methods, many machine learning techniques can be used to train them. Recently developed IDSs show that the number of studies preferring machine learning techniques, in order to build a dynamic security mechanism, is steadily increasing. This study focuses on obtaining a high-performance IDS with high accuracy by using feature selection and ensemble learning methods. Because the quality of the dataset also directly affects IDS efficiency, CIC-CSE-IDS2018, a well-known and up-to-date IDS dataset with a wide variety of attacks, was chosen. In the first stage, to improve the attack detection process and shorten its duration, features were selected by applying Spearman's correlation analysis, Recursive Feature Elimination (RFE) and the Chi-Square test. Decision Tree, Gradient Boosting, Adaptive Boosting, Logistic Regression, Passive-Aggressive, Extra Trees and Multilayer Perceptron classifiers were used to compare the new datasets built from the selected features with the datasets at their original dimensionality. Stratified 5-Fold Cross Validation was used in the performance experiments.
Multi-core parallelism was applied to reduce the computational and time cost of this technique. A comparative analysis of the resulting performance figures was then carried out. The results showed that system performance decreased with Spearman's correlation analysis and the Chi-Square test but increased with RFE. Although Extra Trees was the most successful classifier with 98.76% accuracy, when running time is taken into account the Logistic Regression and Decision Tree classifiers also stood out, with accuracies of 95.15% and 98.65% respectively. Many studies have shown that a system using an ensemble model can classify better than one using a single classifier. For this reason, the second stage focused on building an ensemble model that is more complex but more accurate. Using the ensemble learning approach called "voting", which combines the strengths of the individual classification algorithms, a collective model was produced from classifiers selected on the basis of the first-stage results. Decision Tree, Extra Trees and Logistic Regression were selected for the collective model. The results showed that the collective model, with 98.82% accuracy, outperformed the individual single-classifier approaches.
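
The three feature-selection methods named in the abstract could be sketched with SciPy and scikit-learn roughly as follows. This is only an illustration under stated assumptions, not the thesis's code: the data is synthetic (not CIC-CSE-IDS2018), the number of kept features `K` is hypothetical, Spearman's correlation is used here as a feature-to-label filter (the abstract does not say how it was applied), and a Decision Tree is assumed as the RFE base estimator.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for network-flow features; NOT the CIC-CSE-IDS2018 data.
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           n_redundant=3, random_state=0)
K = 6  # hypothetical number of features to keep

# 1) Spearman filter (assumed usage): keep the K features whose rank
#    correlation with the label has the largest absolute value.
rho = np.empty(X.shape[1])
for j in range(X.shape[1]):
    corr, _ = spearmanr(X[:, j], y)
    rho[j] = abs(corr)
spearman_idx = np.sort(np.argsort(rho)[-K:])

# 2) Chi-square test: requires non-negative inputs, so scale to [0, 1] first.
X_pos = MinMaxScaler().fit_transform(X)
chi2_idx = SelectKBest(chi2, k=K).fit(X_pos, y).get_support(indices=True)

# 3) Recursive Feature Elimination: repeatedly refit the estimator and drop
#    the least important feature until K remain.
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=K)
rfe_idx = rfe.fit(X, y).get_support(indices=True)

print("Spearman:", spearman_idx)
print("Chi-square:", chi2_idx)
print("RFE:", rfe_idx)
```

Each method returns its own subset of column indices, so a reduced dataset per method can be built with `X[:, idx]` and compared against the full feature set.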

Abstract (Translation)

The rapid spread of internet usage and the corresponding increase in the number of users spending time online bring cyber risks and threats with them. Malicious users can cause significant damage to the systems and applications in the internet environment, where many important elements such as information, ideas and money are shared. Intrusion Detection Systems (IDSs) have a critical role in ensuring system and application security in the Internet environment. With the help of these systems, activities and traffic on the network are analyzed and possible attacks, violations and threats are detected. In addition to classical methods, many machine learning techniques can be used in their training. Recently developed IDSs show that the number of studies in which machine learning techniques are preferred, in order to create a dynamic security mechanism, is increasing day by day. This study focuses on obtaining a high-performance IDS that works with high accuracy by using feature selection and ensemble learning methods. Since the quality of the dataset used has a direct effect on IDS efficiency, CIC-CSE-IDS2018, one of the well-known up-to-date IDS datasets with a high attack variety, was preferred. In the first stage, the features were determined by applying Spearman's correlation analysis, Recursive Feature Elimination (RFE) and the Chi-Square test in order to improve the attack detection process and reduce its duration. Decision Tree, Gradient Boosting, Adaptive Boosting, Logistic Regression, Passive-Aggressive, Extra Trees and Multilayer Perceptron classifiers were used to compare the original datasets with the new datasets consisting of the selected features. The Stratified 5-Fold Cross Validation technique was used in the performance tests. Because all experiments were performed using this technique, multi-core parallelism was applied to reduce the resulting computational and time cost.
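
Stratified 5-fold cross-validation with multi-core parallelism can be sketched in scikit-learn as below. The abstract does not say which tooling was used, so the library choice, the synthetic imbalanced data, the Decision Tree estimator and `n_jobs=-1` (use all available cores) are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data; NOT the CIC-CSE-IDS2018 dataset.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

# Stratification keeps the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# n_jobs=-1 asks joblib to run the five fold fits on all available cores.
results = cross_validate(
    DecisionTreeClassifier(random_state=0),
    X, y,
    cv=cv,
    scoring="accuracy",
    n_jobs=-1,
)

mean_accuracy = results["test_score"].mean()
print(f"per-fold accuracy: {results['test_score']}")
print(f"mean accuracy: {mean_accuracy:.4f}")
print(f"total fit time (s): {results['fit_time'].sum():.3f}")
```

The `fit_time` and `score_time` arrays returned by `cross_validate` give a simple way to compare classifiers on running time as well as accuracy.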
Afterwards, a comparative analysis of the performance results was carried out. The results showed that system performance decreased with Spearman's correlation analysis and the Chi-Square test but increased with the RFE method. Although the Extra Trees classifier produced the best-performing model with an accuracy of 98.76%, the Logistic Regression and Decision Tree classifiers also came to the fore when execution time is considered, with accuracies of 95.15% and 98.65%, respectively. Many studies have shown that a system using an ensemble model can give better classification results than a system using a single classifier. For this reason, the second stage focused on creating a more complex but more accurate ensemble model. By applying the ensemble learning approach called "voting", which combines the benefits of each of the classification algorithms, a collective model was produced from classifiers selected on the basis of the first-stage performance results. Decision Tree, Extra Trees and Logistic Regression classifiers were chosen for the collective model. The results revealed that the collective model, with an accuracy of 98.82%, outperformed the individual approaches consisting of a single classifier.
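
A voting ensemble over the three selected classifier types might look like the following scikit-learn sketch. The abstract does not state whether hard (majority) or soft (probability-averaged) voting was used, nor any hyperparameters, so `voting="hard"`, the estimator settings and the synthetic data are assumptions; the accuracies it prints say nothing about the thesis's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; NOT the CIC-CSE-IDS2018 dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hard voting (assumed): each classifier casts one vote per sample and
# the majority label wins.
ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
ensemble_accuracy = ensemble.score(X_test, y_test)

# Accuracy of each fitted member, for comparison with the ensemble.
individual = {
    name: est.score(X_test, y_test)
    for name, est in ensemble.named_estimators_.items()
}

print(f"ensemble: {ensemble_accuracy:.4f}")
for name, acc in individual.items():
    print(f"{name}: {acc:.4f}")
```

With an odd number of members and two classes, hard voting never ties; switching to `voting="soft"` would instead average predicted class probabilities, which all three of these estimators can provide.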

Similar Theses

Thesis No: 804903
Title: Anomaly detection in internet of things based network traffic with advanced machine learning and deep learning methods
Original title: Nesnelerin interneti tabanlı ağ trafiğinde ileri makine öğrenimi ve derin öğrenme yöntemleri ile anomali tespiti
Author: YAĞIZ ONUR KOLCU
Thesis Type: Master's
Language: Turkish
Year: 2023
Subjects: Computer Engineering and Computer Science and Control
University: Afyon Kocatepe University
Department: Computer
Advisor: ASST. PROF. DR. AHMET HAŞİM YURTTAKAL

Thesis No: 774531
Title: Intrusion detection systems design using evolutionary based ensemble learning systems
Original title: Evrimsel tabanlı kolektif öğrenme sistemleri kullanarak saldırı tespit sistemleri tasarımı
Author: YAHYA BİLİR
Thesis Type: Master's
Language: Turkish
Year: 2022
Subjects: Computer Engineering and Computer Science and Control
University: Fatih Sultan Mehmet Vakıf University
Department: Computer Engineering
Advisor: ASST. PROF. DR. BERNA KİRAZ

Thesis No: 887328
Title: Privacy and security enhancements of federated learning
Turkish title: Federe öğrenme uygulamalarında mahremiyet ve güvenlik geliştirmeleri
Author: ŞÜKRÜ ERDAL
Thesis Type: Master's
Language: English
Year: 2024
Subjects: Computer Engineering and Computer Science and Control
University: Istanbul Technical University
Department: Informatics Applications
Advisors: PROF. DR. ENVER ÖZDEMİR, DR. FERHAT KARAKOÇ

Thesis No: 734940
Title: The evaluation of data driven management applications in education
Original title: Eğitimde veriye dayalı yönetim uygulamalarının değerlendirilmesi
Author: AYHAN DUYKULUOĞLU
Thesis Type: Doctorate
Language: Turkish
Year: 2022
Subjects: Education and Training
University: Gazi University
Department: Educational Sciences
Advisor: PROF. DR. NECATİ CEMALOĞLU

Thesis No: 943086
Title: Essays on economic sentiment: Case of Türkiye
Turkish title: Ekonomik algı üzerine çalışmalar: Türkiye üzerine
Author: DİDEM GÜNEŞ
Thesis Type: Doctorate
Language: English
Year: 2025
Subjects: Economics
University: Hacettepe University
Department: Economics (English)
Advisor: PROF. DR. İBRAHİM ÖZKAN
