Kanal tabanlı özellik temsili ve derin öğrenmeye dayalı uykululuk sınıflandırması

Drowsiness classification based on channel-based feature representation and deep learning

Thesis No: 947971
Author: MUSTAFA RIFAT ÇELİK
Advisor: PROF. DR. ZÜMRAY ÖLMEZ
Thesis Type: Master's
Subjects: Electrical and Electronics Engineering
Keywords: Not specified.
Year: 2025
Language: Turkish
University: İstanbul Teknik Üniversitesi
Institute: Graduate School
Department: Electronics and Communication Engineering
Program: Electronics Engineering
Number of Pages: 97

Abstract

Drowsiness detection is of critical importance, particularly for driver safety and intelligent transportation systems. Drivers dozing off at the wheel can cause serious traffic accidents and loss of life. Monitoring driver behavior in real time and detecting potential risks in advance is therefore essential for both individual safety and overall traffic safety. This study proposes a deep learning-based method that classifies drivers as drowsy or alert (non-drowsy) from their facial images. The system is designed around a transfer learning model built on the MobileNetV2 architecture, which is known for performing effectively in resource-constrained environments such as mobile devices. To improve model performance and reduce overfitting, L2 regularization and class weighting are used; class weighting also minimizes the effect of the imbalanced class distribution found in the datasets. One of the most notable aspects of the proposed method is its channel-based preprocessing approach, in which images in the RGB color space are processed with three different image processing techniques. The RGB image is first converted to grayscale to make intensity information more prominent and to simplify the image by removing color information, which yields a channel carrying new information. The grayscale image is then smoothed with a Gaussian blur filter to reduce noise while preserving basic structures, which yields a second channel. In addition, the grayscale image is processed with the Canny edge detection algorithm to obtain the third channel.
The channels obtained from these three preprocessing steps are combined into a rich input image that offers the model more structural and meaningful features. Two datasets are used to evaluate the model. The first (Dataset-1) consists of approximately 36,000 drowsy and 30,000 alert face images and is split into 80% training, 15% test, and 5% validation. It serves as the main source for training and initial testing. The second (Dataset-2) consists of 70 images collected with a smartphone camera and is used only to test the model's generalization ability. In the first experiment, the model was trained on Dataset-1 without any preprocessing and achieved 99.22% accuracy on that dataset's test data. When tested on Dataset-2, however, accuracy reached only 39.34%. This showed that the model was confined to its training data and could not generalize sufficiently to new environments. In the second experiment, the model was retrained with the preprocessing layer applied, this time achieving 99.01% accuracy on the test data of Dataset-1 and 78.02% on Dataset-2. This substantial difference shows that the preprocessing step contributes greatly to the model's overall performance. In conclusion, the study shows that the channel-based preprocessing method, when used with deep learning models suited to limited resources, provides both high accuracy and strong generalization to different environments. The approach offers an applicable and effective solution, particularly for real-time driver monitoring systems and mobile applications.
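
As a rough illustration of the dataset split and class weighting mentioned in the abstract, the sketch below derives per-split image counts and inverse-frequency class weights from the approximate class counts reported for Dataset-1. The weighting formula, total / (number of classes x class count), is a common convention and an assumption here; the abstract does not state which scheme was used. The function names are hypothetical.

```python
def split_counts(total, percentages):
    """Return image counts per split; the last split absorbs any remainder."""
    names = list(percentages)
    counts = {name: total * percentages[name] // 100 for name in names[:-1]}
    counts[names[-1]] = total - sum(counts.values())
    return counts


def balanced_class_weights(class_counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}


# Approximate class counts and split percentages taken from the abstract.
class_counts = {"drowsy": 36_000, "alert": 30_000}
percentages = {"train": 80, "test": 15, "validation": 5}

print(split_counts(sum(class_counts.values()), percentages))
print(balanced_class_weights(class_counts))
```

Under this convention the minority class (alert) receives a weight above 1 and the majority class (drowsy) a weight below 1, so errors on the less frequent class contribute more to the loss.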

Abstract (Translation)

Drowsiness detection is a vital task for driver safety and intelligent transportation systems. Falling asleep at the wheel can lead to serious accidents, property loss, and even fatalities. The ability to detect signs of fatigue automatically and intervene in a timely manner is therefore of great importance for both individual and public safety. In this study, a deep learning-based approach is proposed to classify drivers as drowsy or non-drowsy using facial images. To enhance model performance and generalization capability, a channel-based image preprocessing technique is introduced. In this method, each RGB image is converted to grayscale, and Gaussian blur and the Canny edge detection algorithm are then applied, producing three channels that carry different information. These channels are recombined to create enriched inputs for the neural network.

The problem has also attracted increasing attention due to the rise in long-distance commuting, professional driving, and autonomous vehicle development. Real-time detection methods are therefore not only useful but necessary to ensure safer road environments. Automated solutions that operate with minimal human intervention have the potential to greatly reduce accident rates caused by driver inattention and fatigue.

Chapter 1 presents the motivation and significance of the study. It discusses the increasing need for automated driver monitoring systems and the role of computer vision in achieving real-time drowsiness detection. The problem is defined, followed by a brief review of relevant literature and a summary of the thesis structure.

Chapter 2 focuses on drowsiness detection methodologies.
It provides an overview of the physiological and behavioral indicators of sleepiness and classifies existing methods into three main groups: physiological approaches (such as ECG or heart rate monitoring), behavioral approaches (such as yawning or eye-blink frequency), and visual methods that use standard camera input. Among these, visual approaches are emphasized for their non-intrusive nature, practicality, and applicability in real-time systems. The thesis also explores the advantages and trade-offs of each group, noting that while physiological methods offer high accuracy, they are less practical due to their invasiveness, whereas visual approaches offer a good balance between low intrusiveness and efficiency. This classification also helps identify the most suitable techniques for cost-effective, scalable, and user-friendly solutions in modern vehicles.

Chapter 3 explains the image preprocessing methods in detail. It begins with the rationale for preprocessing in visual recognition tasks, especially under varying lighting, camera quality, or environmental noise. The image is first converted to grayscale to emphasize intensity information and to simplify the image by removing color information. This forms the first channel. The second channel is generated by applying Gaussian blur to the grayscale image to suppress noise while preserving structure. The third channel is obtained by applying the Canny edge detection algorithm to emphasize key facial features. Finally, the three channels are recombined into a single image that serves as a rich input for the deep learning model, enabling better feature extraction and higher classification performance. This multi-channel input formulation mimics the human visual system's tendency to focus on shape, texture, and boundary cues for recognition, thereby improving the discriminative power of the model.
The combination of preprocessing filters ensures that both low-level and structural visual features are emphasized before learning, which reduces the dependency on excessive model complexity.

Chapter 4 introduces the deep learning-based classification model. The architecture is based on MobileNetV2, a lightweight convolutional neural network optimized for embedded and real-time applications on edge devices. Transfer learning is employed by reusing pretrained weights while customizing the final layers for the binary classification task. To mitigate overfitting, dropout layers and L2 regularization are incorporated. Additionally, batch normalization is applied to stabilize training. Class imbalance is addressed using class weighting, improving the model's ability to learn from underrepresented samples. Using MobileNetV2 also aligns with the goal of creating deployable models for embedded environments where computation and memory are limited, which makes the system adaptable to real-world automotive applications. In contrast to large, resource-hungry architectures, MobileNetV2 offers a strong trade-off between speed, accuracy, and model size, making it suitable for integration in vehicular platforms.

Chapter 5 covers the simulation setup and experimental results. Two datasets are used for evaluation. The first (Dataset-1) is a large-scale labeled dataset sourced from Kaggle, comprising approximately 66,000 images (36,000 drowsy, 30,000 non-drowsy). The second (Dataset-2) includes only 70 facial images collected in real-world scenarios using a smartphone camera. While the first dataset is used for training, validation, and in-distribution testing, the second serves exclusively to test generalization on unseen data. Two models are compared. Model-A, trained without preprocessing, achieved 99.22% accuracy on the test portion of Dataset-1 but performed poorly on Dataset-2, with only 39.34% accuracy.
In contrast, Model-B, trained with the proposed preprocessing method, reached 99.01% accuracy on the test portion of Dataset-1 and a significantly improved accuracy of 78.02% on Dataset-2. This demonstrates that preprocessing contributes greatly to cross-domain generalization, making the model more robust against environmental variations such as lighting and camera quality. This outcome reinforces the importance of preprocessing not merely as a supportive step but as a critical component that can reshape the learning behavior of a deep neural network. The improvement observed in cross-dataset testing indicates that preprocessing can reduce domain-specific dependencies that hinder model generalization.

The key contribution of this study lies in the introduction of a channel-based preprocessing strategy that enhances the input representation of facial images. By isolating structural information through grayscale conversion, reducing noise through Gaussian blur, and highlighting shape boundaries via Canny edge detection, the input representation helps the model learn more effectively. This approach enables compact models like MobileNetV2 to deliver high accuracy even with limited computational resources, making it feasible for real-world deployment in vehicles and mobile devices.

Furthermore, the study underscores the importance of generalization in deep learning models for safety-critical systems. Many existing works report high accuracy using randomly partitioned subsets of the same dataset but do not evaluate performance on entirely different data distributions. This thesis explicitly addresses domain shift by validating the model on independently collected images, highlighting the practical applicability of the system. In addition, the positive results obtained from cross-dataset evaluations support the role of preprocessing in enhancing generalization, especially under domain shift conditions.

From an implementation perspective, the model demonstrates strong computational efficiency.
Training was performed with a learning rate of 1e-6, a batch size of 16, and the Adam optimizer. Early stopping and learning rate scheduling were employed to ensure convergence without overfitting. Confusion matrices, loss curves, and accuracy plots support the efficacy of the proposed approach.

Despite its contributions, the study also acknowledges limitations. Dataset-2 is relatively small, and results may be affected by limited demographic diversity or environmental variation. Preprocessing parameters were manually tuned and may not adapt well to all conditions. These challenges open opportunities for future work, such as developing adaptive preprocessing layers or training with larger, more diverse datasets.

Chapter 6 summarizes the findings and outlines the study's limitations and future directions. The results highlight that although both models performed well on the test portion of Dataset-1, only the model incorporating the proposed preprocessing strategy (Model-B) retained substantially higher accuracy on the unseen Dataset-2 images, indicating better generalization. However, limitations such as a small real-world test set (Dataset-2), limited demographic diversity, and fixed preprocessing parameters are acknowledged. To overcome these issues, future work should aim to collect more diverse datasets, incorporate adaptive preprocessing techniques, and deploy the system in real-time applications such as embedded in-vehicle monitoring. These steps would support better domain adaptability, higher reliability, and a pathway toward commercial and safety-critical use. To support future scalability, it is also suggested that further testing be carried out in live vehicle environments using continuous video streams and temporal modeling to capture transitions in driver state.
This would bring the system one step closer to being deployable in modern smart transportation ecosystems.

In conclusion, the proposed system combines a carefully designed preprocessing pipeline with a lightweight deep learning architecture to offer an effective and generalizable solution for driver drowsiness detection. Its low hardware requirements and high accuracy make it particularly suitable for integration into real-time driver monitoring systems, smart vehicle platforms, and mobile applications.
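
The classifier and training settings summarized above (MobileNetV2 backbone, dropout, L2 regularization, batch normalization, Adam with a learning rate of 1e-6, early stopping, and learning rate scheduling) might be wired together in Keras roughly as below. This is a sketch under stated assumptions rather than the thesis code: the head layout, the frozen backbone, the 224x224 input size, the L2 factor, the dropout rate, and the callback patience values are all assumed, and `build_classifier` is a hypothetical name.

```python
import tensorflow as tf

layers = tf.keras.layers


def build_classifier(input_shape=(224, 224, 3), weights="imagenet",
                     l2_factor=1e-4, dropout_rate=0.3):
    """MobileNetV2 backbone with a small binary classification head."""
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    backbone.trainable = False  # assumption: pretrained backbone kept frozen

    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)  # [0, 255] -> [-1, 1]
    x = backbone(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(
        1, activation="sigmoid",
        kernel_regularizer=tf.keras.regularizers.l2(l2_factor))(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-6),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model


# Early stopping and learning rate scheduling; patience values are assumed.
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                         patience=2),
]

# Usage sketch; train_ds, val_ds, and class_weights are hypothetical objects:
# model = build_classifier()
# model.fit(train_ds, validation_data=val_ds, epochs=50,
#           class_weight=class_weights, callbacks=callbacks)
```

With the backbone frozen, only the small head is trained, which fits the abstract's emphasis on limited compute; whether the thesis later unfroze backbone layers for fine-tuning is not stated.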

Similar Theses

Thesis No: 581887
Title (English): Deep learning based three dimensional face expression recognition using geometry images from three dimensional face models
Title (Turkish): Üç boyutlu yüz modellerinden elde edilen geometri görüntüleri kullanılan derin öğrenme tabanlı üç boyutlu yüz ifadelerini tanıma
Author: NEŞE GÜNEŞ
Thesis Type: Master's
Language: English
Year: 2019
Subject: Computer Engineering and Computer Science and Control
University: İstanbul Teknik Üniversitesi
Department: Computer Engineering
Advisor: PROF. DR. ULUĞ BAYAZIT

Thesis No: 737771
Title (English): Superpixel assisted deep neural network for breast tumor segmentation in ultrasound images
Title (Turkish): Süperpiksel destekli derin sinir ağı ile meme ultrason görüntülerinde tümör segmentasyonu
Author: NEFİSE UYSAL
Thesis Type: Master's
Language: English
Year: 2022
Subject: Computer Engineering and Computer Science and Control
University: İstanbul Teknik Üniversitesi
Department: Electronics and Communication Engineering
Advisors: PROF. DR. ENDER METE EKŞİOĞLU; ÖĞR. GÖR. MURAT GEZER

Thesis No: 784550
Title (English): Application of deep learning architectures on sleep staging problems
Title (Turkish): Derin öğrenme ağ yapılarının uyku evreleme problemlerine uygulanması
Author: ENES EFE
Thesis Type: Doctorate
Language: Turkish
Year: 2022
Subject: Electrical and Electronics Engineering
University: Konya Teknik Üniversitesi
Department: Electrical and Electronics Engineering
Advisor: PROF. DR. SERAL ÖZŞEN

Thesis No: 752741
Title (English): Detection of effect of channels on user's purchasing process in online advertisements by deep learning methods
Title (Turkish): Çevrimiçi reklamlarda kullanıcının satın alma sürecinde kanalların etkisinin derin öğrenme yöntemleri ile tespiti
Author: OĞUZ KAHRAMAN
Thesis Type: Master's
Language: Turkish
Year: 2022
Subject: Computer Engineering and Computer Science and Control
University: Yıldız Teknik Üniversitesi
Department: Computer Engineering
Advisor: PROF. DR. MİNE ELİF KARSLIGİL YAVUZ

Thesis No: 965630
Title (English): Applications of artificial intelligence for the security of networks
Title (Turkish): Ağ güvenliği için yapay zeka uygulamalari
Author: SELEN GEÇGEL ÇETİN
Thesis Type: Doctorate
Language: English
Year: 2025
Subject: Electrical and Electronics Engineering
University: İstanbul Teknik Üniversitesi
Department: Electronics and Communication Engineering
Advisor: PROF. DR. GÜNEŞ ZEYNEP KARABULUT KURT
