Farklı mutasyon stratejileri kullanılarak XML temelli sistemlerde güvenlik zafiyetlerinin fuzz testi ile tespiti

Detection of security vulnerabilities in XML-based systems with fuzz testing using different mutation strategies


Thesis No: 956984
Author: ŞERAFETTİN ŞENTÜRK
Advisors: PROF. DR. NEJAT YUMUŞAK, DR. VAHİD GAROUSİ
Thesis Type: PhD (Doktora)
Subjects: Computer Engineering and Computer Science and Control
Keywords: Not specified.
Year: 2025
Language: Turkish
University: Sakarya University (Sakarya Üniversitesi)
Institute: Graduate School of Natural and Applied Sciences (Fen Bilimleri Enstitüsü)
Department: Computer Engineering
Division: Computer Engineering
Number of Pages: 114

Abstract (English Translation)

Fuzz testing (fuzzing) is a set of automated processes used to detect crashes and security vulnerabilities in software systems. In general, fuzzing is divided into three categories: black-box, white-box, and gray-box. In terms of how the input data used to start fuzzing is generated, fuzzing is divided into two categories: grammar-based and mutation-based. Grammar-based fuzzing generates inputs from a specification and targets programs that take complex, structured inputs, whereas mutation-based fuzzing generates inputs by randomly modifying well-formed seed files or their abstract syntax trees. Both grammar-based and mutation-based fuzzing approaches exist, but there are not enough case studies comparing their crash detection capabilities. To add to the body of empirical evidence in this area, this case study compares gray-box fuzzing with different mutation strategies and evaluates their effectiveness in three aspects: fault detection effectiveness, fault detection performance, and the types of faults detected. We also investigate the effects of different seed generation techniques on fuzzing effectiveness. We run the fuzzing executions on three Systems Under Test (SUTs), all of them XML parsers: Libxml2, Apache Xerces, and Expat. To execute fuzzing, we use the well-known fuzzers AFL (American Fuzzy Lop) and Superion, which have different mutation strategies. We investigate the effects of seed generation on fuzzing by using publicly available seeds and seeds based on a PCSG (Probabilistic Context-Sensitive Grammar). For fault detection effectiveness and performance, we measure the number of crashes and the number of corpus files generated. With respect to mutation strategies, our results show that the bit/byte-level mutation strategy detects more crashes than the tree-level mutation strategy. According to the fuzzing results, PCSG-based seeds help detect more crashes than publicly available ones.
In terms of corpus file generation, fewer corpus files are generated for PCSG-based seeds than for publicly available ones, while bit/byte-level mutations produce more corpus files than tree-level mutations. The empirical results show that the crash detection capability of fuzzing differs significantly depending on the mutation strategy it uses. Fuzzing is an effective and popular technique for finding crashes and vulnerabilities automatically. The system under test is fed a continuous stream of random test cases generated by another program, and it is monitored at the same time to expose any defects that occur while processing this input. Input generation is therefore central to the fuzzing process, since every fuzzer must automatically generate the test cases that are run on the execution engine. There are two fundamental ways to create inputs and test scenarios in fuzzing: mutation-based and grammar-based input generation. The former applies transformations to a reference input and is not aware of the input format, while the latter is aware of the format because it uses the system specification. Mutation-based fuzzers mainly lack coverage, and they are less effective because the inputs they produce often differ from what the program expects. In typical mutation-based approaches, the input is modified at the bit/byte level: these methods randomly flip, delete, or copy some bits to generate new files. However, random bit flips are unlikely to produce valid files for applications that process complex file formats. Mutation-based fuzzing is therefore not well suited to programs that handle highly structured inputs (e.g., XML, JavaScript). Such programs take and process inputs in three basic stages: syntax parsing, semantic checking, and application execution. Most malformed inputs produced by mutation-based fuzzing fail at syntax parsing and are therefore rejected at an early stage of processing.
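To make the bit/byte-level idea concrete, here is a minimal Python sketch, not AFL's actual implementation: it applies a few random grammar-blind edits (bit flip, byte deletion, byte duplication) to an XML seed and then checks whether each mutant still survives syntax parsing. The standard library's `xml.etree.ElementTree` stands in for a parser under test; the function names `mutate_bytes` and `is_well_formed`, the number of edits per mutant, and the sample seed are hypothetical choices for illustration.

```python
import random
import xml.etree.ElementTree as ET


def mutate_bytes(data: bytes, rng: random.Random, n_ops: int = 4) -> bytes:
    """Apply a few random grammar-blind edits: bit flip, byte delete, byte copy."""
    buf = bytearray(data)
    for _ in range(n_ops):
        if not buf:
            break
        op = rng.choice(("flip", "delete", "copy"))
        pos = rng.randrange(len(buf))
        if op == "flip":
            buf[pos] ^= 1 << rng.randrange(8)  # flip one bit in place
        elif op == "delete":
            del buf[pos]  # drop one byte
        else:
            buf.insert(pos, buf[rng.randrange(len(buf))])  # duplicate a random byte
    return bytes(buf)


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes still parse as XML, i.e. pass the syntax parsing stage."""
    try:
        ET.fromstring(data)
        return True
    except (ET.ParseError, ValueError):  # ValueError also covers decoding problems
        return False


if __name__ == "__main__":
    seed = b'<?xml version="1.0"?><note id="1"><to>Ada</to><body>Hi</body></note>'
    rng = random.Random(0)
    mutants = [mutate_bytes(seed, rng) for _ in range(1000)]
    valid = sum(is_well_formed(m) for m in mutants)
    print(f"{valid}/{len(mutants)} mutants are still well-formed XML")
```

With a small, markup-heavy seed like this one, you would expect most mutants to be rejected by the parser, which illustrates why grammar-blind mutations struggle to get past the syntax parsing stage.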
Grammar-based fuzzing generates inputs according to a well-defined specification, such as input models or a context-free grammar, which defines the format and integrity constraints of data chunks. In contrast to mutation-based approaches, grammar-based approaches pass the syntax parsing stage more easily. However, most of the test cases they produce do not pass semantic checking, and systematically generating semantically valid test cases is often a highly expensive and difficult problem. This is also a widely known obstacle to developing an intelligent fuzzer that is aware of both the syntactic and the semantic features of the program. As a result, only a small fraction of the inputs produced by grammar-based fuzzing reach the application execution stage, where deep bugs are hidden; most of the application code therefore cannot be reached, and most crash-inducing cases cannot be exposed. To overcome this issue, some approaches are based on a PCSG (Probabilistic Context-Sensitive Grammar). These approaches use a large number of examples to automatically extract knowledge of grammar and semantic rules, and then generate seeds that are expected to trigger crash-inducing cases. It is claimed that because these smart seeds pass the semantic checking stage easily and reach the application execution stage, they have a better chance of triggering unknown bugs and crashes hidden in the tested application. There is therefore a growing need to show whether the smart seeds generated by such a learning algorithm outperform publicly available seed files in finding crashes. In this thesis, we address this need with an exhaustive comparison of these two types of seed files, running fuzzing executions against XML parsers, which take highly structured inputs.
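The generation side can be illustrated with a small probabilistic XML generator in Python. This is a deliberately simplified stand-in, not a PCSG: the probabilities below are fixed by hand rather than learned from a corpus, and they are not conditioned on context. The tag and attribute vocabularies, the constants `P_CHILD`, `P_ATTR`, `MAX_DEPTH`, and the functions `gen_element` and `gen_seed` are all hypothetical. The generator does, however, respect two constraints that a byte-level mutator knows nothing about: each closing tag reuses the name chosen for its opening tag, and no attribute is repeated on the same element.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical vocabularies and hand-set probabilities; a PCSG-based tool
# would instead learn such probabilities (conditioned on context) from samples.
TAG_NAMES = ["note", "to", "from", "body", "item"]
ATTR_NAMES = ["id", "lang", "type"]
P_CHILD = 0.6   # probability of generating one more child element
P_ATTR = 0.3    # probability of adding one more attribute
MAX_DEPTH = 4   # bound the nesting depth so generation always terminates


def gen_element(rng: random.Random, depth: int = 0) -> str:
    """Sample one element, its attributes, and (recursively) its children."""
    name = rng.choice(TAG_NAMES)  # remembered so the closing tag matches
    attrs = ""
    used = set()
    while rng.random() < P_ATTR and len(used) < len(ATTR_NAMES):
        attr = rng.choice([a for a in ATTR_NAMES if a not in used])
        used.add(attr)  # XML forbids duplicate attributes on one element
        attrs += f' {attr}="{rng.randint(0, 99)}"'
    children = []
    while depth < MAX_DEPTH and rng.random() < P_CHILD:
        children.append(gen_element(rng, depth + 1))
    text = "" if children else rng.choice(["", "text", "42"])
    return f"<{name}{attrs}>{text}{''.join(children)}</{name}>"


def gen_seed(rng: random.Random) -> str:
    """Produce one complete XML seed document."""
    return '<?xml version="1.0"?>' + gen_element(rng)


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(3):
        print(gen_seed(rng))
```

Every generated seed is well-formed by construction, so it passes the syntax parsing stage; whether it also exercises deeper semantic checks depends on how realistic the learned probabilities are, which this toy does not attempt to model.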
Throughout this study, we use two main fuzzers, AFL and Superion, for the fuzzing executions. AFL is a state-of-the-art gray-box fuzzer based on CGF (Coverage-Guided Fuzzing) principles. Superion uses a grammar-aware trimming strategy that trims test inputs at the tree level by using the abstract syntax trees (ASTs) of the parsed inputs. Mutations in AFL are grammar-blind and work at the bit/byte level, whereas mutations in Superion are grammar-aware and operate at the tree level. It is also claimed that AFL's grammar-blind mutation strategies hinder its effectiveness and efficiency. Superion, on the other hand, is well suited to complex structured input files, meaning that it can produce valid inputs for fuzzing. It is therefore also necessary to compare bit/byte-level and tree-level mutations to find out how they perform with generated and publicly available seed files. From this angle, our case study also compares different types of mutation strategies and their effects on crash detection capability. In addition to input generation methods and mutation strategies, another metric in fuzzing is code coverage, which describes how far a fuzzing execution can reach into different parts of the tested program. Fuzzers need a helper function to evaluate the quality of a new input and to guide the generation of further inputs, and this helper function is typically code coverage. In this context, some studies guide gray-box fuzzing with mutation testing as the helper function. According to one such study, mutation testing has been shown to be an alternative to code coverage for generating effective corpus input files, meaning that a test corpus can be evaluated on its ability to detect artificially injected faults; that study also uses the mutation score as a metric for evaluating fuzzing quality. This study of gray-box fuzzing approaches is conducted according to the principles of case-study research.
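The contrast between bit/byte-level and tree-level mutation can be sketched with a subtree-replacement operator in the spirit of Superion's tree-level mutation. This is a minimal Python illustration, not Superion's implementation: it uses `xml.etree.ElementTree` as the parser that produces the tree, and the helper names `parent_child_pairs` and `tree_mutate` as well as the sample documents are hypothetical. One random subtree of a target input is replaced by a copy of a random subtree taken from another (donor) input.

```python
import copy
import random
import xml.etree.ElementTree as ET


def parent_child_pairs(root: ET.Element) -> list:
    """Return (parent, child) pairs for every non-root element of the tree."""
    return [(parent, child) for parent in root.iter() for child in parent]


def tree_mutate(target_xml: str, donor_xml: str, rng: random.Random) -> str:
    """Replace one random subtree of target with a copy of a random subtree of donor."""
    target = ET.fromstring(target_xml)
    donor = ET.fromstring(donor_xml)
    # Deep copy so the donor tree is never modified by the splice.
    replacement = copy.deepcopy(rng.choice(list(donor.iter())))
    pairs = parent_child_pairs(target)
    if not pairs:
        # Target is a single element: the donor subtree becomes the whole document.
        replacement.tail = None  # trailing text after the root would not be well-formed
        return ET.tostring(replacement, encoding="unicode")
    parent, victim = rng.choice(pairs)
    replacement.tail = victim.tail  # keep any text that followed the replaced element
    parent[list(parent).index(victim)] = replacement
    return ET.tostring(target, encoding="unicode")


if __name__ == "__main__":
    rng = random.Random(0)
    target = '<note><to>Ada</to><body>Hi</body></note>'
    donor = '<msg><from>Bob</from><item id="3">x</item></msg>'
    for _ in range(3):
        print(tree_mutate(target, donor, rng))
```

Because the operator swaps whole elements rather than raw bytes, every mutant re-parses successfully, which is the property that lets tree-level mutations get past syntax parsing. The trade-off is that it can never produce the malformed byte sequences that a grammar-blind mutator explores.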
Our case study is both “exploratory” and “improving”. It is exploratory in that our objective is to find out what happened (in the process of developing the fuzz testing approach), to seek new insights, and to generate ideas and hypotheses for follow-up improvements (to the fuzz testing approach) and research. It also has “improving” aspects, since we aim to improve the test effectiveness and efficiency of the “studied phenomenon”.
