Yüksek çözünürlüklü görüntülerde derin öğrenme tabanlı nesne tespiti için yeni bir önişleme yöntemi geliştirilmesi

Development of a new preprocessing method for deep learning based object detection in high resolution images


Thesis No: 905307
Author: MUHAMMED TELÇEKEN
Advisors: PROF. DR. DEVRİM AKGÜN, PROF. DR. SEZGİN KAÇAR
Thesis Type: Doctoral
Subjects: Computer Engineering and Computer Science and Control
Keywords: Not specified.
Year: 2024
Language: Turkish
University: Sakarya Üniversitesi
Institute: Institute of Natural Sciences
Department: Department of Computer Engineering
Division: Computer Engineering Division
Number of Pages: 93

Abstract

Recently, high-resolution satellite and unmanned aerial vehicle (UAV) images have become increasingly popular in fields such as security, traffic analysis, disaster management, and environmental monitoring. In practice, however, detection performance for small objects in such high-resolution images degrades because of problems such as low pixel density and loss of detail. New, highly accurate methods that can detect small objects more reliably are therefore needed. In this thesis, a new deep learning based algorithm is developed for object detection in high-resolution images. The main aim of the thesis is to overcome the difficulties encountered in detecting small objects in high-resolution satellite and UAV images and to improve object detection performance. The proposed method consists of a newly developed preprocessing algorithm called ISA (Image Slicing Algorithm), label optimization with the currently widely used SAM (Segment Anything), and the SROD (Super-Resolution Object Detection) architecture, which is formed by combining the resolution-enhancing SRGAN (Super-Resolution Generative Adversarial Network) with YOLO (You Only Look Once) models. The proposed method preserves pixel data that would otherwise be lost, particularly when detecting small objects, and allows objects to be detected with higher accuracy. The method works as follows: first, the ISA algorithm is used to slice the objects in high-resolution images correctly. This algorithm preserves the integrity of objects during slicing and plays an important role in preparing data for object detection. The SAM segmentation model is used to check and correct labeling errors that arise during slicing. In this way, the labeling accuracy of objects in the sliced images and the performance of the model are improved.
Next, the resolution of the images is increased with the SRGAN model, providing resolution-enhanced input data to the YOLO-based object detection models. By increasing the level of detail in the images, the SRGAN model enables better detection of small objects in particular. The xView and VisDrone datasets, which are frequently used for aerial imagery, are used in the study. These datasets contain multi-class, complex images and therefore offer a suitable problem for testing the performance of the proposed system. The experimental studies in the thesis are carried out with different YOLO versions such as YOLOv5, YOLOv7, YOLOv8, and YOLOv9. The experimental results show that the proposed method significantly improves object detection accuracy and achieves higher accuracy rates than existing methods. Another important method developed in the thesis is the label conversion algorithm. This algorithm converts mask-labeled datasets into the YOLO format. Thanks to the label conversion algorithm, incompatibilities between the labeling formats used in different datasets are resolved and the datasets can easily be used with YOLO models. In conclusion, this thesis makes an important contribution to object detection in high-resolution images and proposes a new method that minimizes the difficulties encountered particularly in detecting small objects. The method has potential for use in fields such as security, traffic analysis, disaster management, and environmental monitoring. The contribution of the developed methods to the literature is demonstrated by the performance improvements in object detection and the successful results obtained on multiple datasets.

Abstract (Translation)

Recently, high-resolution satellite and Unmanned Aerial Vehicle (UAV) images have become increasingly popular in areas such as security, traffic analysis, disaster management and environmental monitoring. Such images enable the collection of critical information by scanning large areas in great detail, thus contributing to data-driven decision-making processes. However, in practice, the performance of detecting small objects in such high-resolution images is degraded due to problems such as low pixel density and loss of details. This poses a significant challenge, especially for security and monitoring applications where critical analyses are performed. Therefore, there is a need to develop new methods with high accuracy that can detect small objects more accurately. In this thesis, the focus is on deep learning-based object detection in high-resolution images, with a particular emphasis on addressing the challenges encountered in detecting small objects in satellite and UAV imagery. A novel algorithm has been developed to optimize this detection process, offering new solutions to these challenges. The increasing use of high-resolution imagery, especially in areas such as urban planning, traffic monitoring, disaster management, and security, has heightened the importance of image-processing-based decision support systems. However, detecting small objects in such images poses significant challenges due to scale variations and large data sizes. The method proposed in this thesis presents a new approach to overcome these difficulties, making the object detection process both more efficient and accurate. One of the key components of this study is a specialized slicing algorithm called the Image Slicing Algorithm (ISA). ISA works by slicing high-resolution images into more manageable sizes while minimizing data loss during object detection. 
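The general idea of slicing a large image into manageable pieces can be illustrated with plain overlapping tiles. The sketch below is not the thesis's ISA, whose object-aware rules are not given in this abstract; it only shows fixed-size tiling with a shared margin so that an object near a tile border still appears whole in a neighbouring tile. The function names, the 640-pixel tile size, and the 64-pixel overlap are assumptions chosen for illustration.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of windows of size `tile` that cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= length:
        return [0]  # the image already fits in one tile along this axis
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # final tile sits flush with the border
    return origins


def slice_boxes(width, height, tile=640, overlap=64):
    """Return (x0, y0, x1, y1) pixel windows covering the whole image.

    Hypothetical helper: neighbouring windows share at least `overlap`
    pixels, and windows are clipped to the image when it is smaller
    than one tile.
    """
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


if __name__ == "__main__":
    for box in slice_boxes(1000, 800):
        print(box)
```

Each window can then be cropped and passed to a detector at native resolution instead of downscaling the full image, which is the resizing loss the abstract refers to.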
Traditional methods, which resize images, often result in the loss of details for small objects, adversely affecting detection accuracy. By slicing the images, the ISA algorithm addresses these issues and preserves the integrity of objects during the process. As a result, both large and small objects can be detected more effectively. High-accuracy small object detection is crucial, especially in critical fields such as security and disaster management, and ISA was developed to achieve this level of precision. The advantages of ISA extend beyond this. Another important function of the algorithm is its integration with the Segment Anything Model (SAM), which checks label accuracy during the slicing process. SAM checks the accuracy of object labels in sliced images using segmentation masks and automatically corrects any detected errors. By preserving label accuracy after slicing, SAM improves the quality of the training data, significantly reducing failure rates during the training of object detection models. Given the complexity of the images and the size of the objects, this automatic correction mechanism plays a crucial role in enhancing detection accuracy. Another significant contribution of the thesis is the integration of the Super-Resolution Generative Adversarial Network (SRGAN) model, which enhances object resolution, with the You Only Look Once (YOLO) model used for object detection. The developed Super-Resolution Object Detection (SROD) architecture combines SRGAN with YOLO to restore lost details in low-resolution images, enabling clearer detection of small objects. SRGAN enriches the data for object detection by enhancing the resolution of images through deep learning techniques, which is especially important for small object detection. YOLO, known for its speed and accuracy in object detection, when combined with SRGAN in the SROD algorithm, allows for more accurate and faster detection of objects in satellite and UAV imagery.
The datasets used in this thesis were carefully selected to evaluate the real-world applicability of the object detection algorithms. The xView dataset, composed of satellite images, covers different geographical regions and environmental conditions worldwide. This dataset stands out due to its wide variety of objects and presents complex images requiring multi-class detection. The VisDrone dataset consists of high-resolution images captured by UAVs under various weather conditions and environments. Both datasets provide challenging test environments for processing and evaluating high-resolution images. Experiments conducted on these datasets showed that the proposed SROD algorithm yielded significant performance improvements compared to existing methods. Specifically, experiments on the xView dataset using YOLOv5 achieved the highest accuracy rates, while YOLOv8 produced the best results on the VisDrone dataset. Another important component of the thesis is the label conversion algorithm developed to transform label formats used in the datasets. This algorithm allows mask-based labels to be adapted to the YOLO format, facilitating the integration of various datasets with deep learning models. By ensuring compatibility between different dataset formats, the algorithm broadens the applicability of YOLO models. This conversion algorithm has great potential for use in various fields, including medical imaging and industrial applications. Developed to address the incompatibility issues of different labeling formats in datasets, this algorithm serves as a tool that extends the use of deep learning models. The thesis not only develops new algorithms but also tests their performance through comprehensive experimental studies. The results reveal significant advancements in the field of object detection. In particular, the resolution enhancement provided by SRGAN and the slicing process offered by ISA play a major role in improving the overall accuracy of the model.
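As a rough illustration of what adapting a mask-based label to the YOLO format involves, the sketch below converts one binary object mask into a YOLO detection label line of the form `class cx cy w h`, with all four box values normalized by the image size. It is a minimal sketch under stated assumptions, not the thesis's conversion algorithm: it assumes one mask per object, represented as a list of rows of 0/1 values, and the function name `mask_to_yolo_line` is hypothetical.

```python
def mask_to_yolo_line(mask, class_id):
    """Convert a binary mask (list of rows of 0/1) to a YOLO label line.

    Returns None for an empty mask. The box is the tight axis-aligned
    bounding box around all nonzero pixels, normalized to [0, 1].
    """
    height, width = len(mask), len(mask[0])
    # Rows and columns that contain at least one foreground pixel.
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(width) if any(row[x] for row in mask)]
    if not ys:
        return None

    x0, x1 = xs[0], xs[-1] + 1  # exclusive right edge
    y0, y1 = ys[0], ys[-1] + 1  # exclusive bottom edge

    cx = (x0 + x1) / 2 / width
    cy = (y0 + y1) / 2 / height
    bw = (x1 - x0) / width
    bh = (y1 - y0) / height
    return f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"


if __name__ == "__main__":
    # 10 x 20 image with an object covering rows 2..5 and columns 4..11.
    demo = [[1 if 2 <= y < 6 and 4 <= x < 12 else 0 for x in range(20)]
            for y in range(10)]
    print(mask_to_yolo_line(demo, class_id=3))
```

Masks that contain several disconnected objects would need to be split into connected components first, which this sketch leaves out.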
According to the results, the developed SROD algorithm achieved much higher success rates compared to existing methods. This success is not limited to satellite images but also extends to other high-resolution datasets, such as UAV imagery. The performance evaluations of the algorithms developed in this thesis have been comprehensively analyzed through various experimental results. These experiments compare both existing methods and the proposed algorithms, clearly demonstrating the contributions of the developed approach to the field of object detection. The integrated use of the ISA, SAM, and SROD algorithms has provided a significant advantage, especially in detecting small objects. To evaluate the system performance, the confusion matrices were examined, which provide a comprehensive summary of the classification accuracy of the models. The confusion matrix is a very important tool for evaluating the performance of the classification models, as it provides a detailed breakdown of the predicted and true class labels, highlighting the proportion of correct and incorrect predictions for each class. In addition, the Average Precision (AP) for each class is examined, allowing the identification of classes that are more frequently misclassified or confused with others. This analysis provided valuable information about the model's strengths and areas for improvement. Experimental studies show that the ISA algorithm substantially improves object detection during the slicing process in high-resolution images. Unlike traditional methods, where data loss occurs during resizing, the ISA algorithm preserves the integrity of objects, leading to more accurate detection results in satellite and UAV imagery. For instance, in experiments conducted on the xView dataset, the detection of small objects improved by 20% due to the slicing performed with ISA. This improvement is attributed to the preservation of details that are typically lost when high-resolution images are resized.
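The per-class breakdown that a confusion matrix provides can be sketched as follows. This is a simplified illustration, not the thesis's evaluation code: it assumes each ground-truth object has already been matched to one predicted class (detection pipelines usually add a background row and column for missed and spurious boxes, which is omitted here), and the function names are hypothetical.

```python
def confusion_matrix(true_labels, pred_labels, num_classes):
    """matrix[t][p] counts objects of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1
    return matrix


def per_class_precision_recall(matrix):
    """Return a list of (precision, recall) pairs, one per class.

    Precision divides the diagonal entry by its column sum (everything
    predicted as that class); recall divides it by its row sum
    (everything that truly is that class). Empty sums give 0.0.
    """
    results = []
    for c, row in enumerate(matrix):
        true_positive = row[c]
        predicted_as_c = sum(r[c] for r in matrix)
        actually_c = sum(row)
        precision = true_positive / predicted_as_c if predicted_as_c else 0.0
        recall = true_positive / actually_c if actually_c else 0.0
        results.append((precision, recall))
    return results


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2]
    preds = [0, 1, 1, 1, 0]
    cm = confusion_matrix(truth, preds, num_classes=3)
    for class_id, (p, r) in enumerate(per_class_precision_recall(cm)):
        print(class_id, round(p, 3), round(r, 3))
```

Off-diagonal entries show which classes are confused with each other, which is the kind of per-class diagnosis the abstract describes alongside AP.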
SAM's label verification and correction processes also contributed to this performance increase, significantly reducing the rate of incorrectly labeled objects. The SROD algorithm, developed in this thesis, achieved significant performance improvements in object detection thanks to the resolution enhancement capability of the SRGAN model. By increasing the resolution of small objects, SRGAN made their details more visible, allowing the YOLO model to detect objects more accurately. Experimental results indicate that the integration of SRGAN with YOLO produces better outcomes than using YOLO alone for object detection. In studies conducted on the VisDrone dataset, the SROD algorithm showed accuracy improvements particularly in detecting small objects such as vehicles and pedestrians. Tests with YOLOv5 and YOLOv8 models demonstrated a substantial performance increase due to SRGAN's detail-enhancing effect. In particular, the YOLOv8 model integrated with SRGAN produced higher mAP (mean Average Precision) results for small object detection on the VisDrone dataset. The findings of this thesis have broad application potential, especially in fields such as security, urban planning, traffic monitoring, and disaster management. High-accuracy object detection is a factor that directly contributes to operational efficiency and decision-making processes. For instance, tasks such as monitoring traffic flow in a city or detecting accident or congestion points can be performed quickly and accurately with such advanced object detection systems. Similarly, post-disaster damage assessment can be made more efficient through rapid analysis of high-resolution satellite and UAV images. In this context, the methods developed in the thesis offer significant opportunities for real-world applications in various fields. 
In conclusion, this thesis makes significant contributions to the field of object detection in high-resolution images and expands the use of deep learning-based methods in this area. The developed algorithms not only improve object detection performance but also offer substantial enhancements in data processing and label validation processes. The findings of this study set a new standard for object detection for high-resolution images, laying the foundation for future research and applications. Also, methods to reduce computational load without sacrificing detection accuracy, such as using model compression techniques or more efficient neural network architectures, offer potential for future work. Testing combinations of these approaches can yield a robust, resource-friendly model without compromising detection performance. The widespread application of these methods will strengthen their place in the literature and enable the development of more advanced technologies in this field.

Similar Theses

Thesis No: 901576
Title: Efficient deep learning approaches for signal and image analysis applications
Title (Translation): Sinyal ve görüntü analizi uygulamaları için verimli derin öğrenme yaklaşımları
Author: ONUR CAN KOYUN
Thesis Type: Doctoral
Language: English
Year: 2024
Subjects: Computer Engineering and Computer Science and Control
University: İstanbul Teknik Üniversitesi
Department: Department of Computer Science
Advisor: PROF. DR. BEHÇET UĞUR TÖREYİN

Thesis No: 753612
Title: Single-frame and multi-frame super-resolution on remote sensing images via deep learning approaches
Title (Translation): Derin öğrenme yaklaşımlarıyla uzaktan algılama görüntülerinde tek çerçeve ve çok çerçeve süper çözünürlük
Author: PEIJUAN WANG
Thesis Type: Doctoral
Language: English
Year: 2022
Subjects: Communication Sciences
University: İstanbul Teknik Üniversitesi
Department: Department of Communication Systems
Advisor: PROF. DR. ELİF SERTEL

Thesis No: 498115
Title: Moving object tracking by regularization via sparsity in wide area aerial video
Title (Translation): Hava aracından çekilmiş geniş alan videolarında seyreklik tabanlı regülarizasyon ile hareketli nesne takibi
Author: ERDEM ONUR ÖZYURT
Thesis Type: Master's
Language: English
Year: 2017
Subjects: Electrical and Electronics Engineering
University: İstanbul Teknik Üniversitesi
Department: Department of Electronics and Communication Engineering
Advisor: PROF. DR. BİLGE GÜNSEL KALYONCU

Thesis No: 790277
Title: Metasezgisel algoritmalar ve derin öğrenme kullanılarak çok kaynaklı görüntü füzyonu
Title (Translation): Multi-source image fusion using metaheuristic algorithms and deep learning
Author: ASAN IHSAN ABAS ABAS
Thesis Type: Doctoral
Language: Turkish
Year: 2023
Subjects: Computer Engineering and Computer Science and Control
University: Selçuk Üniversitesi
Department: Department of Computer Engineering
Advisor: ASSOC. PROF. DR. NURDAN BAYKAN

Thesis No: 885675
Title: On real-world face super-resolution and face image synthesis evaluation
Title (Translation): Gerçek dünya yüz süper çözünürlüğü ve yüz görüntüsü sentezi değerlendirmesi üzerine
Author: ERDİ SARITAŞ
Thesis Type: Master's
Language: English
Year: 2024
Subjects: Computer Engineering and Computer Science and Control
University: İstanbul Teknik Üniversitesi
Department: Department of Computer Engineering
Advisor: PROF. DR. HAZIM KEMAL EKENEL
