Çok modlu derin öğrenme yöntemlerinin ses patolojilerinin tespiti için geliştirilmesi

Development of multi-modal deep learning methods for voice pathology detection

Thesis No: 801611
Author: ASLI NUR ÖMEROĞLU
Advisors: Assoc. Prof. Dr. EMİN ARGUN ORAL
Thesis Type: Doctorate
Subjects: Computer Engineering and Computer Science and Control, Electrical and Electronics Engineering
Keywords: Not specified.
Year: 2023
Language: Turkish
University: Atatürk Üniversitesi
Institute: Graduate School of Natural and Applied Sciences
Department: Department of Electrical and Electronics Engineering
Division: Not specified.
Pages: 118

Abstract

Purpose: Automatic voice pathology detection, a non-invasive technique, plays an important role in early diagnosis and medical intervention. Automatic detection systems have the potential to increase the reliability of medical diagnosis, and they can assist physicians effectively by providing objective assessment and diagnosis in the early stages of voice pathologies. Most artificial intelligence based diagnostic systems developed for automatic voice pathology detection have focused on detecting pathology without taking different speech modalities into account. Drawing on different modalities yields more complementary and more representative features of the problem, which helps improve system performance. This thesis aims to examine the complementary information among different speech modalities and features in voice pathology detection, and to identify the automatic detection system that gives the best classification performance. To that end, methods were developed that account for the effect of different speech modalities on voice pathology detection and that fuse these modalities using different strategies.

Method: Two methods were developed for the automatic detection of voice pathologies, based on classical and hybrid fusion approaches to multi-modal data. Both methods use speech and electroglottograph (EGG) signals to detect pathologies. The first method presents an approach that fuses features from multi-branch classical feature extraction with features obtained from deep networks. The second method uses a multi-layer hybrid fusion approach based on the fusion of deep features only.

Findings: In the developed architectures, combining multi-modal data with either of the proposed methods was found to improve the representation capability of the individual modalities and thereby increase pathology detection performance. The multi-layer hybrid fusion approach proposed in the second method was also found to be more effective for fusing multi-modal data than both the first method and single-layer approaches.

Conclusion: The multi-modal architecture based on multi-layer hybrid fusion increased the accuracy of voice pathology detection. The developed multi-modal hybrid fusion method can be used not only for voice pathology detection but also in various applications that can be modeled with more modalities.

Abstract (Translation)

Purpose: Automatic voice pathology detection, which is a non-invasive technique, plays an important role in early diagnosis and medical intervention. Automated detection systems have the potential to increase the reliability of medical diagnosis. They can also help physicians effectively by providing objective evaluation and diagnosis in the early stages of voice pathologies. Most AI-based diagnostic systems developed for automatic voice pathology detection have focused on pathology detection without considering different speech modalities. Utilizing different modalities helps to increase system performance by providing more complementary and highly representative features of the problem. This thesis aims to examine the complementary information between different speech modalities and features in the detection of voice pathologies, and to find the automatic detection system that provides the best classification performance. For this reason, methods have been developed that take into account the effect of different speech modalities on the detection of voice pathologies and that are based on fusing these modalities with different strategies.

Method: Two different methods have been developed for the automatic detection of voice pathologies, based on classical and hybrid fusion approaches to multi-modal data. In these methods, speech and electroglottography (EGG) signals were used for the detection of pathologies. The first, multi-branch method offers an approach based on fusing classical feature extraction with features obtained from deep networks, while the second method offers a multi-layer hybrid fusion approach based on the fusion of deep features only.

Findings: Combining multi-modal data with either method in the developed architectures was found to increase pathology detection performance by improving the representation capabilities of the different modalities. In addition, the multi-layer hybrid fusion approach proposed in the second method was found to be more effective in fusing multi-modal data than both the first method and single-layer approaches.

Conclusion: The multi-layer hybrid fusion based multi-modal architecture developed for the detection of voice pathologies has increased the accuracy of pathology detection. The developed multi-modal hybrid fusion method can be used not only for the detection of voice pathologies, but also in various applications that can be modeled with more modalities.
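
As a rough illustration of the fusion ideas named in the abstract, the sketch below contrasts feature-level fusion (concatenating per-modality feature vectors), decision-level fusion (averaging per-branch class probabilities), and one common reading of hybrid fusion (using both together). It is not the architecture from the thesis: the abstract does not specify the networks, features, or fusion layers, so the linear scorers here stand in for deep branches, and every function name, weight, and feature value is a hypothetical chosen only for the example.

```python
from math import exp


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_scores(features, weights):
    """One raw score per class: dot product of the features with each weight row.

    Assumption: a linear scorer stands in for a deep network branch.
    """
    return [sum(f * w for f, w in zip(features, row)) for row in weights]


def feature_level_fusion(speech_features, egg_features):
    """Feature-level fusion: concatenate the per-modality feature vectors."""
    return list(speech_features) + list(egg_features)


def decision_level_fusion(prob_lists):
    """Decision-level fusion: average class probabilities across branches."""
    n_branches = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_branches for c in range(n_classes)]


def hybrid_fusion(speech_features, egg_features, w_speech, w_egg, w_joint):
    """Hybrid fusion (assumed reading): a joint branch on concatenated
    features plus one branch per modality, merged at the decision level."""
    joint_features = feature_level_fusion(speech_features, egg_features)
    branches = [
        softmax(linear_scores(speech_features, w_speech)),
        softmax(linear_scores(egg_features, w_egg)),
        softmax(linear_scores(joint_features, w_joint)),
    ]
    return decision_level_fusion(branches)


# Toy inputs (hypothetical): 2 speech features, 1 EGG feature, 2 classes
# (index 0 = healthy, index 1 = pathological).
speech = [0.2, 0.7]
egg = [0.5]
w_speech = [[1.0, 0.0], [0.0, 1.0]]
w_egg = [[1.0], [-1.0]]
w_joint = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]

probs = hybrid_fusion(speech, egg, w_speech, w_egg, w_joint)
print(probs)
```

In this toy setup the joint branch can weigh speech and EGG features against each other, while the per-modality branches keep each signal's own evidence visible at the decision stage. A real system would replace the linear scorers with trained networks and learn the fusion weights.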

Similar Theses

Thesis No: 674883
Title: Derin öğrenme tabanlı çok modlu duygu analizi yöntemlerinin geliştirilmesi
Title (Translation): Development of deep learning based multimodal sentiment analysis methods
Author: MEHMET UMUT SALUR
Thesis Type: Doctorate
Language: Turkish
Year: 2021
Subjects: Computer Engineering and Computer Science and Control
University: Fırat Üniversitesi
Department: Department of Computer Engineering
Advisors: Assoc. Prof. Dr. İLHAN AYDIN

Thesis No: 859205
Title: Empowering multimodal multimedia information retrieval through semantic deep learning
Title (Translation): Semantik derin öğrenme yoluyla multimodal multimedya bilgi erişimini güçlendirme
Author: SAEID SATTARI
Thesis Type: Doctorate
Language: English
Year: 2024
Subjects: Computer Engineering and Computer Science and Control
University: Orta Doğu Teknik Üniversitesi
Department: Department of Computer Engineering
Advisors: Prof. Dr. MEHMET HALİT SEYFULLAH OĞUZTÜZÜN, Prof. Dr. ADNAN YAZICI

Thesis No: 848329
Title: Novel fractional order calculus-based audio processing methods and their applications on neural networks for classification and synthesis problems
Title (Translation): Kesirli mertebeden kalkülüs temelli yeni ses işleme yöntemleri ve bunların sinir ağları üzerinde sınıflandırma ve sentez problemlerine uygulanması
Author: BİLGİ GÖRKEM YAZGAÇ
Thesis Type: Doctorate
Language: English
Year: 2023
Subjects: Electrical and Electronics Engineering
University: İstanbul Teknik Üniversitesi
Department: Department of Electronics and Communication Engineering
Advisors: Assoc. Prof. Dr. MÜRVET KIRCI

Thesis No: 950495
Title: Büyük dil modelleri kullanan derin öğrenme tabanlı dinamik çok modlu veri özetleme yaklaşımları
Title (Translation): Deep learning based multi-modal data summarization approaches using large language models
Author: TURAN GÖKTUĞ ALTUNDOĞAN
Thesis Type: Doctorate
Language: Turkish
Year: 2025
Subjects: Computer Engineering and Computer Science and Control
University: Fırat Üniversitesi
Department: Department of Computer Engineering
Advisors: Prof. Dr. MEHMET KARAKÖSE

Thesis No: 957056
Title: Öz denetimli öğrenme yaklaşımları ile derin sahte ses ve görüntü maniplasyonunun tespiti
Title (Translation): Detection of deepfake audio and image manipulation with self-supervised learning approaches
Author: MERVE YILDIRIM
Thesis Type: Master's
Language: Turkish
Year: 2025
Subjects: Computer Engineering and Computer Science and Control
University: Fırat Üniversitesi
Department: Department of Computer Engineering
Advisors: Prof. Dr. İLHAN AYDIN
