İstatiksel modelleme ile konuşmacı tanıma

Speaker identification with statistics modeling


Thesis No: 202288
Author: ÖMER ESKİDERE
Advisors: ASST. PROF. DR. FİGEN ERTAŞ
Thesis Type: Doctorate
Subjects: Electrical and Electronics Engineering
Keywords: Speaker Identification, Gaussian Mixture Models, MFCC, Feature vectors, TIMIT/NTIMIT databases
Year: 2007
Language: Turkish
University: Uludağ Üniversitesi
Institute: Graduate School of Natural and Applied Sciences
Department: Department of Electronics Engineering
Division: Not specified.
Number of Pages: 216

Abstract

Determining who people are from their speech has become an area of steadily growing interest. In recent years, voice has joined the person-specific biometric traits that identify an individual, such as fingerprints and the retina, which have been in use for many years. Identifying a person from a speech sample now has important applications, particularly in security, entry and/or access control, and telephone banking. The biggest problem in such real-time systems is noise in the environment where the speech is recorded, or the distorting effect of the channels over which the speech is transmitted (especially telephone lines). Consequently, the aim in recent years has been to minimize such effects, which degrade system performance, and/or to develop robust systems that work under these conditions. In this thesis, a speaker identification system based on the Gaussian Mixture Model (GMM) and robust to telephone line effects was built. The system has two stages, training and testing. MFCC were used as the features that best represent a person's identity from their voice, and the model parameters were estimated with the expectation-maximization algorithm. In the test stage, the features of the candidate speaker are applied to each speaker model built in the training stage, and the model that yields the maximum likelihood determines the speaker. The speaker identification system was tested with two databases, one containing clean speech (TIMIT) and one containing telephone speech (NTIMIT). For both databases, all parameters that affect the speaker identification system in the training and test stages were examined and their optimum values were determined. In addition, prosodic features of speech such as formant frequencies, pitch frequency, and energy were used on their own and together with MFCC features to measure speaker identification performance; pitch frequency was found to provide an average gain of 8.34 points in identification in the telephone environment.
For feature extraction, two methods were proposed: weighting the cepstral coefficients by clustering them, and dividing the speaker frequency band into parts and placing filters in these parts according to the F-ratio. These two methods yielded an increase of up to 10 points in the speaker identification rate. KEYWORDS: Speaker identification, Gaussian Mixture Model, MFCC, Feature vectors, TIMIT/NTIMIT databases

Abstract (Translation)

Identifying speakers from their voices is an area that has received steadily increasing attention. In recent years, voice has been added to the individual-specific biometric features that represent a person's identity, such as the commonly employed fingerprint and retina, and speaker identification from voice samples has found application particularly in security, access control, and telephone banking. The main problem in such real-time systems is the noise induced by the environments where the speech samples are recorded and the distortion induced by the media (particularly telephone lines) through which they are transmitted. In recent years, efforts have been made to minimize the impact of these factors, which severely degrade identification performance, or to develop systems that are robust to them. In this thesis, a speaker identification system based on the Gaussian Mixture Model (GMM) that is robust to telephone line distortion has been developed. It employs mel-frequency cepstral coefficients (MFCC), which are known to best represent speakers' identity, as speaker-specific features, along with the Expectation-Maximization algorithm for estimating the speaker model parameters. The system consists of two stages, training and testing. In the training stage, a model is produced for each speaker to represent their identity; in the test stage, the input speaker is identified by choosing the model that yields the highest probability. The system has been tested on both clean speech (TIMIT) and telephone speech (NTIMIT) databases. From feature extraction to model training and testing, the parameters that affect system performance have been investigated and optimized using both speech databases.
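The training and test stages described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes diagonal-covariance GMMs, random initialization from the training frames, a fixed number of EM iterations, and synthetic Gaussian vectors standing in for MFCC frames. The function names (`train_gmm`, `identify`) and the speaker labels are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of every frame under every diagonal Gaussian, shape (T, M)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def train_gmm(X, n_components=2, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM to frames X (T, D) with plain EM."""
    rng = np.random.default_rng(seed)
    T, _ = X.shape
    # Assumed initialization: random frames as means, global variance.
    means = X[rng.choice(T, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-3, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        log_p = np.log(weights) + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ X ** 2) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 1e-3)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of X under one speaker model."""
    weights, means, variances = gmm
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    return float(np.mean(logsumexp(log_p, axis=1)))

def identify(X, models):
    """Return the speaker whose model gives the highest likelihood."""
    return max(models, key=lambda name: avg_log_likelihood(X, models[name]))

# Synthetic stand-in for MFCC frames: two hypothetical speakers.
rng = np.random.default_rng(1)
train = {"spk_a": rng.normal(0.0, 1.0, (300, 4)),
         "spk_b": rng.normal(3.0, 1.0, (300, 4))}
models = {name: train_gmm(feats) for name, feats in train.items()}
test_utterance = rng.normal(3.0, 1.0, (100, 4))
print(identify(test_utterance, models))
```

Scoring with the average log-likelihood over frames, as assumed here, keeps utterances of different lengths comparable; the decision itself only depends on which model ranks highest.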
The identification performance of the system has been measured for cases where prosodic features of speech, such as formant frequencies, pitch frequency, and energy, are employed on their own and in combination with MFCC. Pitch frequency was found to provide an 8.34-point increase in identification performance on telephone speech when used in combination with MFCC. Weighted clustering of cepstral coefficients and adaptive filtering have been introduced for extracting discriminatory features. An increase of up to 10 points in identification performance has been obtained with each technique.
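The Turkish abstract states that the filters are placed according to the F-ratio. The abstract does not give the exact formulation or the placement rule, so the sketch below only shows the commonly used definition as an assumption: for each feature dimension, the variance of the speaker means divided by the average within-speaker variance. The function name `f_ratio` and the synthetic data are hypothetical.

```python
import numpy as np

def f_ratio(features_by_speaker):
    """Per-dimension F-ratio: between-speaker variance of the means
    divided by the mean within-speaker variance (assumed definition)."""
    speaker_means = np.array([f.mean(axis=0) for f in features_by_speaker])
    within = np.array([f.var(axis=0) for f in features_by_speaker]).mean(axis=0)
    between = speaker_means.var(axis=0)
    return between / within

# Synthetic data: dimension 0 separates the speakers, dimension 1 does not.
rng = np.random.default_rng(0)
speakers = []
for k in range(5):
    dim0 = rng.normal(2.0 * k, 0.5, 200)   # speaker-dependent mean
    dim1 = rng.normal(0.0, 1.0, 200)       # same distribution for everyone
    speakers.append(np.column_stack([dim0, dim1]))

ratios = f_ratio(speakers)
print(ratios)
```

A dimension (or frequency region) with a high F-ratio varies more between speakers than within a speaker, which is the usual motivation for giving it more resolution or weight.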

Similar Theses

Thesis No: 197636
Title: Sub-word language modeling for Turkish speech recognition
Turkish title: Türkçe ses tanıma için sözcük altı dil modelleme
Author: OSMAN BÜYÜK
Thesis Type: Master's
Language: English
Year: 2005
Subjects: Electrical and Electronics Engineering
University: Sabancı Üniversitesi
Department: Department of Electrical and Electronics Engineering
Advisors: ASST. PROF. HAKAN ERDOĞAN

Thesis No: 377852
Title: A stress testing framework for the Turkish banking sector: an augmented approach
Turkish title: Türk bankacılık sektörü için bir stres testi çerçevesi: Bir genişletilmiş yaklaşım
Author: BAHADIR ÇAKMAK
Thesis Type: Doctorate
Language: English
Year: 2014
Subjects: Banking
University: Orta Doğu Teknik Üniversitesi
Department: Department of Economics
Advisors: PROF. DR. NADİR ÖCAL

Thesis No: 828239
Title: A kinetic study and model analysis on supported iron based Fischer-Tropsch catalysts for light olefin production
Turkish title: Hafif olefin üretimi için destekli demir temelli Fischer Tropsch katalizörleri üzerinde bir kinetik çalışma ve model analizi
Author: KEREM BÜLBÜL
Thesis Type: Master's
Language: Turkish
Year: 2023
Subjects: Chemical Engineering
University: İstanbul Teknik Üniversitesi
Department: Department of Chemical Engineering
Advisors: ASSOC. PROF. DR. ALPER SARIOĞLAN, DR. ABDULLAH Z. TURAN

Thesis No: 301092
Title: Unsupervised morphological analysis using tries
Turkish title: Ağaç yapısı kullanarak gözetimsiz biçimbirim analizi
Author: KORAY AK
Thesis Type: Master's
Language: English
Year: 2011
Subjects: Computer Engineering and Computer Science and Control
University: Işık Üniversitesi
Department: Department of Computer Engineering
Advisors: ASST. PROF. DR. OLCAY TANER YILDIZ

Thesis No: 889884
Title: Forgery detection from audio signals using deep learning models
Turkish title: Derin öğrenme modelleri kullanarak ses işaretlerinden sahtecilik tespiti
Author: FULYA AKDENİZ
Thesis Type: Doctorate
Language: Turkish
Year: 2024
Subjects: Computer Engineering and Computer Science and Control
University: Kocaeli Üniversitesi
Department: Department of Computer Engineering
Advisors: PROF. DR. YAŞAR BECERİKLİ
