Deep learning based estimation of glottal area from speech and vocal folds images

Turkish title: Konuşma sinyali ve ses telleri görüntülerinden derin öğrenme tabanlı glotal alan kestirimi

Thesis No: 800056
Author: YAŞAR SAİD DERDİMAN
Advisors: ASST. PROF. DR. TURGAY KOÇ
Thesis Type: Master's
Subjects: Electrical and Electronics Engineering
Keywords: Not specified.
Year: 2023
Language: Turkish
University: Süleyman Demirel University
Institute: Graduate School of Natural and Applied Sciences
Department: Department of Electrical and Electronics Engineering
Division: Not specified.
Number of Pages: 64

Abstract

This thesis proposes a U-Net based model for glottis detection and compares its performance with that of classical methods. The comparison also examines how glottal area size affects performance. U-Net was evaluated on the test data alongside three classical models: histogram, region growing, and active contour. When the glottal area is greater than zero, a case that includes very small openings, U-Net achieved the highest precision at 0.867, while the active contour model performed poorly at 0.389. For recall, the histogram model scored highest at 0.964 and region growing lowest at 0.684. For accuracy, U-Net scored highest at 0.997 and active contour lowest at 0.717. U-Net maintained its performance in the comparisons restricted to glottal areas greater than 100 and greater than 200, and the other models also improved. In particular, the precision of the active contour model increased by 59.5%. The models were also evaluated with the Dice score, a metric widely used in image processing, which is better interpreted through box plots. As with the previous metrics, the models made predictions on three data sets defined by glottal area size. The median Dice scores read from the box plots were 0.63-0.70-0.72 for histogram, 0.60-0.69-0.75 for active contour, and 0.68-0.72-0.74 for region growing, while U-Net obtained 0.69-0.80-0.83. Accordingly, the prediction results of all models can be said to vary with glottal area size.
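To make the four reported metrics concrete, the following is a minimal sketch of how precision, recall, accuracy, and the Dice score are commonly computed pixel by pixel from a predicted and a reference binary mask. It is not taken from the thesis: the function names, the tiny example masks, and the conventions for empty masks are assumptions made for illustration.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (lists of rows)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return pixel-wise precision, recall, accuracy and Dice score."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    dice_denominator = 2 * tp + fp + fn
    return {
        # Assumed convention: a ratio with an empty denominator is reported as 0.0.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
        # Assumed convention: two empty masks agree perfectly (Dice = 1.0).
        "dice": 2 * tp / dice_denominator if dice_denominator else 1.0,
    }


# Hypothetical 3x4 masks: 1 marks glottis pixels, 0 marks background.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

metrics = segmentation_metrics(pred, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because background pixels usually dominate a laryngeal image, accuracy tends to sit much closer to 1 than precision or Dice do, which is consistent with the gap between the 0.997 accuracy and 0.867 precision reported for U-Net.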
In every glottal area condition, the best performance again came from U-Net, the modern deep learning method. In addition, three deep learning models were developed to estimate glottal area from speech data. The models are autoencoders with convolutional layers. Training, validation, and testing were carried out with three models, referred to as Keras, noisy, and noiseless. To observe the effects of the convolutional layers during training, 180, 100, and 100 separate training runs were performed over different values of kernel size, filter size, and number of layers, respectively. The trained models that performed best on the validation set were selected, and their performance on the test data was compared using mean squared error as the metric. The test-set mean squared errors were 0.000196, 0.0019063, and 0.002085 for the Keras, noiseless, and noisy models, respectively.
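As a reminder of what the reported error figures measure, here is a minimal sketch of mean squared error between a predicted and a reference glottal area sequence. The function name and the sample values are hypothetical, and the abstract does not state how the area signal was scaled, so the numbers below are illustrative only.

```python
def mean_squared_error(predicted, reference):
    """Average of squared differences between two equally long sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical glottal area values for four frames (scaling is an assumption).
reference = [0.0, 0.2, 0.5, 0.2]
predicted = [0.0, 0.25, 0.45, 0.2]

mse = mean_squared_error(predicted, reference)
print(f"MSE: {mse:.6f}")
```

Since the errors are squared, the metric is in squared units of the area signal, so MSE values are only comparable between models evaluated on the same scaling.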

Abstract (Translation)

In this thesis, a U-Net based model is proposed for glottis detection. The proposed model was compared with classical models to assess its performance against traditional methods. The comparison also examined the effect of glottal area size on performance. U-Net was tested on the test data together with three classical models: histogram, region growing, and active contour. In the case where the glottal area is greater than zero, which includes very small openings, the highest precision was obtained by the U-Net model with 0.867, while the active contour model achieved a poor result and remained at 0.389. In terms of recall, the histogram model obtained the highest value with 0.964, while region growing obtained the lowest with 0.684. In terms of accuracy, U-Net has the highest performance with 0.997, and the lowest performance belongs to the active contour model with 0.717. The U-Net model continued to perform well in the comparisons made for cases where the glottal area is greater than 100 and greater than 200. The performance of the other models was also observed to increase. In particular, an increase of 59.5% in precision was observed for the active contour model. The performance of the models was also examined with the Dice score, which is widely used in image processing. This criterion is more appropriately interpreted through box plots. As with the previous criteria, the models made predictions on three different data sets depending on the glottal area size. The median Dice scores on the box plots were 0.63-0.70-0.72 for histogram, 0.60-0.69-0.75 for active contour, and 0.68-0.72-0.74 for region growing among the classical models, while the U-Net model obtained 0.69-0.80-0.83.
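The three-value Dice summaries can be read as medians over subsets of frames selected by glottal area. The sketch below shows one way such a summary might be produced. It assumes that the three data sets are the frames whose reference area exceeds 0, 100, and 200 (the abstract gives no unit; pixels are assumed here) and that a per-frame Dice score is already available. The function name and the sample frames are hypothetical.

```python
from statistics import median


def median_dice_by_min_area(frames, thresholds=(0, 100, 200)):
    """Median Dice score over frames whose reference area exceeds each threshold.

    frames: iterable of (glottal_area, dice_score) pairs.
    Returns a dict mapping each threshold to a median, or None if no frame qualifies.
    """
    frames = list(frames)
    summary = {}
    for threshold in thresholds:
        scores = [dice for area, dice in frames if area > threshold]
        summary[threshold] = median(scores) if scores else None
    return summary


# Hypothetical per-frame results: (reference glottal area, Dice score).
frames = [(12, 0.40), (80, 0.62), (150, 0.75), (240, 0.84), (310, 0.88)]

medians = median_dice_by_min_area(frames)
for threshold, value in medians.items():
    print(f"area > {threshold}: median Dice = {value}")
```

Under this reading the subsets are nested, so raising the threshold drops the small-opening frames, where a few misclassified pixels cost the most Dice. That would explain why every model's median rises from the first value to the third.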
Accordingly, it can be said that the estimation results of all models differ according to the size of the glottal area. The U-Net model, the modern deep learning method, showed the best performance in all glottal area cases. In addition, 3 different deep learning models were developed to estimate glottal area from speech data. The models are autoencoders with convolutional layers. Training, validation, and testing were carried out with 3 different models: Keras, noisy, and noiseless. To observe the different effects of the convolutional layers during the training phase, 180, 100, and 100 different training runs were conducted for different parameter values in terms of kernel size, filter size, and number of layers, respectively. The trained models that showed the best performance on the validation set were selected, and their performances on the test data were compared. Mean squared error was used as the performance measure. The mean squared errors on the test sets were 0.000196, 0.0019063, and 0.002085 for the Keras, noiseless, and noisy models, respectively.
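The selection step described above, keeping the configuration that performs best on the validation set before touching the test set, can be sketched as follows. This is an illustration only: the dictionary keys, the hyperparameter values, and the validation errors are made up and do not come from the thesis.

```python
def select_best_run(runs):
    """Return the training run with the lowest validation mean squared error."""
    return min(runs, key=lambda run: run["val_mse"])


# Hypothetical training runs; every number here is invented for illustration.
runs = [
    {"kernel_size": 3, "filters": 16, "layers": 2, "val_mse": 0.0031},
    {"kernel_size": 5, "filters": 32, "layers": 3, "val_mse": 0.0019},
    {"kernel_size": 7, "filters": 32, "layers": 4, "val_mse": 0.0024},
]

best = select_best_run(runs)
print("selected configuration:", best)
```

Choosing on validation error and reporting on a separate test set keeps the reported test error from being biased by the hyperparameter search.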

Similar Theses

Thesis No: 142597
Title: Evaluation of the acoustical performance of the New Mosque
Turkish Title: Yeni Cami'nin akustik açıdan performans değerlendirmesi
Author: EVREN YILDIRIM
Thesis Type: Master's
Language: Turkish
Year: 2003
Subjects: Architecture
University: Istanbul Technical University
Department: Department of Architecture
Advisor: PROF. DR. SEVTAP YILMAZ DEMİRKALE

Thesis No: 799942
Title: Spoof speech detection system development
Turkish Title: Sahte konuşma sinyali tespit sistemi geliştirilmesi
Author: BURAK KASAPOĞLU
Thesis Type: Master's
Language: Turkish
Year: 2023
Subjects: Electrical and Electronics Engineering
University: Süleyman Demirel University
Department: Department of Electrical and Electronics Engineering
Advisor: ASST. PROF. DR. TURGAY KOÇ

Thesis No: 196848
Title: Localization of multiple sound sources in three dimensional environments
Turkish Title: Üç boyutlu ortamlarda bulunan çok sayıdaki ses kaynağının yerlerinin tespiti
Author: MURAT ENGİN ÜNAL
Thesis Type: Master's
Language: English
Year: 2006
Subjects: Computer Engineering and Computer Science and Control
University: Boğaziçi University
Department: Department of Systems and Control Engineering
Advisor: PROF. DR. FİKRET GÜRGEN

Thesis No: 484791
Title: Searching the effects of noise algorithms on cochlear implant users
Turkish Title: Koklear implant kullanıcılarında gürültü algoritmalarının etkilerinin araştırılması
Author: GÜNNUR İSPİR
Thesis Type: Master's
Language: Turkish
Year: 2017
Subjects: Otorhinolaryngology (Ear, Nose and Throat)
University: Istanbul University
Department: Department of Audiology and Speech Disorders
Advisor: ASSOC. PROF. DR. HAYDAR MURAT YENER
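
Thesis No: 58400
Title: Multiresolution modification of the time scale and pitch of speech signals based on the wavelet transform (translated here; the record provides no English title)
Turkish Title: Konuşma sinyallerinin zaman ölçeği ve ses tonunun dalgacık dönüşümüne dayalı olarak çok çözünürlüklü değiştirilmesi
Author: OSMAN EROĞUL
Thesis Type: Doctorate
Language: Turkish
Year: 1997
Subjects: Electrical and Electronics Engineering
University: Ankara University
Department: Department of Electrical and Electronics Engineering
Advisor: PROF. DR. ÖNDER TÜZÜNALP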
