Konuşma işaretlerinin analiz ve sentezi
Analysis and synthesis of speech signals
- Thesis No: 22036
- Advisor: PROF. DR. EŞREF ADALI
- Thesis Type: Master's
- Subjects: Computer Engineering and Computer Science and Control
- Keywords: Not specified.
- Year: 1992
- Language: Turkish
- University: İstanbul Teknik Üniversitesi
- Institute: Fen Bilimleri Enstitüsü
- Department: Not specified.
- Division: Not specified.
- Number of Pages: 91
Abstract
ABSTRACT
In this study, speech signals are generated artificially using the principles of artificial speech production. The Linear Predictive Analysis technique is used for this purpose. The human voice passes through various sound-producing organs in the body before reaching the mouth and lips. The different sounds in speech have different characteristics: some sounds consist of periodic impulses, while others have a vibration resembling white noise. The region of the body through which the sound passes is assumed to act on the sound like a linear prediction filter. The main goal is to find the coefficients of such a filter for different sounds. Once these coefficients are found, speech can be produced by obtaining the sounds that correspond to periodic-impulse or white-noise input signals. After a short introduction in Section 1, Section 2 discusses Linear Predictive Analysis. Section 3 shows how the coefficients of the linear prediction filter can be obtained. Section 4 explains some methods used to determine the characteristics of the speech signal and to distinguish voiced segments from unvoiced ones. Finally, Section 5 generates speech signals artificially using the results of the preceding sections.
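The source-filter model described in this abstract can be sketched as follows. An excitation drives an all-pole filter, and that excitation is either a unit-impulse train (voiced) or white noise (unvoiced). This is a minimal illustration assuming Python with NumPy. The function names `excitation` and `all_pole_synthesis` are hypothetical. The filter follows the sign convention $y_n = \sum_k a_k y_{n-k} + G x_n$ used in the English summary. It is not the thesis's Quick Basic program.

```python
import numpy as np


def excitation(n_samples, pitch_period, rng=None):
    """Hypothetical helper: unit-impulse train (voiced) or white noise (unvoiced).

    pitch_period > 0 is the spacing of the impulses in samples;
    pitch_period == 0 marks an unvoiced frame.
    """
    if pitch_period > 0:
        x = np.zeros(n_samples)
        x[::pitch_period] = 1.0  # one unit impulse every pitch_period samples
        return x
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal(n_samples)  # zero-mean, unit-variance white noise


def all_pole_synthesis(a, gain, x):
    """y[n] = sum_{k=1}^{p} a_k y[n-k] + gain * x[n], starting from zero state.

    a holds a_1..a_p (a[0] is a_1).
    """
    p = len(a)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = gain * x[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += a[k - 1] * y[n - k]
        y[n] = acc
    return y
```

For example, with `a = [0.5]` and gain 1, each unit impulse produces the decaying response 1, 0.5, 0.25, 0.125, and so on. A voiced frame therefore repeats this response once per pitch period.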
Abstract (Translation)
SUMMARY
ANALYSIS AND SYNTHESIS OF SPEECH SIGNALS
One of the most powerful speech analysis techniques is linear predictive analysis. Linear prediction has been widely used as the basis of an approach to speech analysis and synthesis. In this study, the basic principles of linear prediction analysis are used to synthesize speech signals. The human vocal tract is assumed to act as a linear predictive filter whose steady-state system function has the form

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$

The main problem is therefore to find the predictor coefficients $\{a_k\}$ of the system. The basic idea behind linear predictive analysis is that a speech sample can be approximated as a linear combination of past speech samples. The speech samples $y_n$ are related to the excitation $x_n$ by the simple equation

$$y_n = \sum_{k=1}^{p} a_k y_{n-k} + G x_n$$

A linear predictor with prediction coefficients $\{a_k\}$ is defined as a system whose output is

$$\tilde{y}_n = \sum_{k=1}^{p} a_k y_{n-k}$$

The basic problem of linear prediction analysis is to determine a set of predictor coefficients $\{a_k\}$ directly from the speech signal so as to obtain a good estimate of the speech. Because the speech signal varies with time, the predictor coefficients must be estimated from short segments of the signal. A unique set of predictor coefficients is determined by minimizing, over a finite interval, the sum of squared differences between the actual speech samples $y_n$ and the linearly predicted ones $\tilde{y}_n$. This approach leads to a set of linear equations that can be solved efficiently to obtain the predictor parameters. The speech signal can be modeled as the output of a time-varying linear system excited by either random noise (for unvoiced speech) or a quasi-periodic sequence of impulses (for voiced speech). The parameters of this model are:
- the voiced/unvoiced classification;
- the pitch period for voiced speech;
- the gain;
- the coefficients of the digital filter.
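The finite-interval minimization described above can be set up as the covariance method, one of the two formulations the study compares. It can then be solved by Cholesky decomposition, the solution named for Section 3. This is a minimal sketch that makes three assumptions:
- It uses Python with NumPy.
- The error is summed over n = p..N-1 within the segment.
- It uses a plain Cholesky factorization. The thesis's exact variant (for example a square-root-free one) may differ.

The function name `covariance_lpc` is hypothetical.

```python
import numpy as np


def covariance_lpc(y, p):
    """Hypothetical covariance-method solver returning a_1..a_p.

    Minimizes sum_{n=p}^{N-1} (y[n] - sum_k a_k y[n-k])^2, which gives the
    normal equations Phi a = psi; these are solved via Phi = L L^T.
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    # phi[i, k] = sum_{n=p}^{N-1} y[n-i] * y[n-k], for i, k = 0..p
    phi = np.empty((p + 1, p + 1))
    for i in range(p + 1):
        for k in range(p + 1):
            phi[i, k] = np.dot(y[p - i:N - i], y[p - k:N - k])
    Phi = phi[1:, 1:]  # p x p, symmetric (positive definite for generic data)
    psi = phi[1:, 0]   # right-hand side phi(i, 0), i = 1..p
    L = np.linalg.cholesky(Phi)  # lower-triangular L with Phi = L @ L.T
    # Forward substitution: solve L z = psi
    z = np.zeros(p)
    for i in range(p):
        z[i] = (psi[i] - L[i, :i] @ z[:i]) / L[i, i]
    # Back substitution: solve L.T a = z (row i of L.T is column i of L)
    a = np.zeros(p)
    for i in reversed(range(p)):
        a[i] = (z[i] - L[i + 1:, i] @ a[i + 1:]) / L[i, i]
    return a
```

For a noise-free second-order sequence such as $y_n = 1.3\,y_{n-1} - 0.6\,y_{n-2}$, the prediction error over the interval can be driven to zero, so `covariance_lpc(y, 2)` recovers [1.3, -0.6]. Note that `np.linalg.cholesky` raises `LinAlgError` when $\Phi$ is not positive definite, as happens for a silent (all-zero) segment.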
This study is composed of five sections, several of which discuss how a variety of speech parameters can be reliably estimated using linear prediction methods. In speech processing, the term linear prediction covers several essentially equivalent formulations of the problem of modeling the speech waveform. These formulations differ in the details of the computations used to obtain the predictor coefficients. Section 2 discusses two such formulations, the autocorrelation method and the covariance method, both of which yield the prediction coefficients of the speech model. To implement a linear prediction analysis system effectively, the linear equations must be solved efficiently. Many techniques can solve a system of p linear equations in p unknowns. Here, however, the coefficient matrices have special properties that make a much more efficient solution possible than in the general case: they are symmetric and positive definite for the covariance method, and additionally Toeplitz for the autocorrelation method. Section 3 discusses two methods for obtaining the predictor coefficients: the Cholesky decomposition solution for the covariance method, and the Levinson-Durbin recursive solution for the autocorrelation method. Section 4 discusses different methods for estimating the pitch period and for voiced/unvoiced classification. The amplitude of the speech signal varies appreciably with time; in particular, the amplitude of unvoiced segments is generally much lower than that of voiced segments. The short-time energy of the speech signal provides a convenient representation of these amplitude variations. In general, the short-time energy can be defined as

$$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$

One difficulty with the short-time energy function is that it is very sensitive to large signal levels.
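The short-time energy above can be computed for every full window position with a running sum. So can the average magnitude function $M_n$, which replaces $x^2(m)$ with $|x(m)|$ over the same window. This is a minimal sketch assuming Python with NumPy and the rectangular window implied by the formulas. The function names are hypothetical.

```python
import numpy as np


def short_time_energy(x, N):
    """E_n = sum_{m=n-N+1}^{n} x[m]^2 for each n >= N-1 (full windows only)."""
    x = np.asarray(x, dtype=float)
    # Convolving with N ones sums each run of N consecutive squared samples.
    return np.convolve(x ** 2, np.ones(N), mode="valid")


def average_magnitude(x, N):
    """M_n = sum_{m=n-N+1}^{n} |x[m]|, less dominated by large samples than E_n."""
    x = np.asarray(x, dtype=float)
    return np.convolve(np.abs(x), np.ones(N), mode="valid")
```

A single sample of amplitude 10 adds 100 to $E_n$ but only 10 to $M_n$. This is the difference in sensitivity to large signal levels that motivates the average magnitude function.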
A simple way to reduce this sensitivity to large signal levels is to define an average magnitude function

$$M_n = \sum_{m=n-N+1}^{n} |x(m)|$$

in which the sum of the absolute values of the signal is computed instead of the sum of squares. Here N is the window length. The main significance of $E_n$ and $M_n$ is that they provide a basis for distinguishing voiced speech segments from unvoiced ones. The energy function can also be used to locate approximately the time at which voiced speech becomes unvoiced, and vice versa, and to distinguish speech from silence.

In Section 5, speech signals are synthesized. For this purpose a computer program was written, and a "Sound Blaster" sound card was used. The program proceeds in three steps:
1. It digitizes the speech using the card's utility programs.
2. It processes the resulting speech samples to obtain the synthetic speech.
3. It converts the synthetic speech samples back into an analog speech signal using the same utilities.

The speech is analyzed in short time intervals of 20 ms. To find the predictor coefficients, the autocorrelation method is applied and the autocorrelation values are computed. The autocorrelation equations are then solved with the Levinson-Durbin recursion, giving the predictor coefficients for each window. For pitch period estimation, the modified autocorrelation analysis algorithm is used (Mar, 1990). In this algorithm, the autocorrelation sequence of the prediction-error (residual) signal, $R_e(k)$, is expressed in terms of two sequences: the autocorrelation sequence of the actual input, $R_x(k)$, and the autocorrelation sequence of the prediction coefficients, $R_a(j)$:

$$R_e(k) = \sum_{j=-p}^{p} R_a(j)\, R_x(k-j)$$

The autocorrelation function of the coefficients is defined as

$$R_a(j) = \sum_{i=0}^{p-|j|} a_i\, a_{i+|j|}$$

where $a_0 = -1$. With this choice, $\{a_i\}$ are the taps of the inverse filter $1 - \sum_{k=1}^{p} a_k z^{-k}$ up to an overall sign, which does not affect $R_a$.

The pitch is detected by finding the peak of the normalized autocorrelation sequence $R_e(k)/R_e(0)$ over the lag interval corresponding to 15% to 73% of the selected window. If the value of this peak is at least 0.25, the window is considered voiced, with a pitch period equal to the lag $k$ at the peak divided by the sampling frequency. If the peak value is less than 0.25, the frame is considered unvoiced and the pitch is set to zero. After the filter coefficients, the pitch period, and the voiced/unvoiced classification have been estimated, speech is synthesized using the relation

$$y_n = \sum_{k=1}^{p} a_k y_{n-k} + G x_n$$

where:
- $\{a_k\}$ are the coefficients of the digital filter;
- $\{x_n\}$ is a train of unit impulses spaced at the estimated pitch period when the speech is voiced, and random white noise when the speech is unvoiced;
- $\{y_n\}$ is the output of the prediction filter;
- p is the number of coefficients;
- G is the gain parameter, obtained from

$$G^2 = R(0) - \sum_{k=1}^{p} a_k R(k)$$

and $R(k)$ is the autocorrelation sequence of the input signal.

Many synthetic speech signals were produced. After each synthetic signal was computed, its waveform was plotted and compared with the original. The two waveforms were found to be nearly identical and to sound similar. The computer program written for these computations in Microsoft Quick Basic (version 7.1) is given in Appendix A.
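The per-frame analysis described above has four parts:
- the autocorrelation method;
- the Levinson-Durbin recursion;
- the gain, taken from the final prediction error;
- the modified-autocorrelation pitch decision.

All four can be sketched as follows. This is a minimal illustration, not the program in Appendix A. It assumes Python with NumPy and reads "15% to 73% of the selected window" as the range of lags searched. The function names are hypothetical. The recursion's final error $E_p$ equals $R(0) - \sum_k a_k R(k)$, so it gives $G^2$ directly.

```python
import numpy as np


def autocorr(s, max_lag):
    """R(k) = sum_{n=0}^{N-1-k} s[n] s[n+k] for k = 0..max_lag (0 beyond the frame)."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    return np.array([np.dot(s[:N - k], s[k:]) if k < N else 0.0
                     for k in range(max_lag + 1)])


def levinson_durbin(r, p):
    """Solve sum_k a_k R(|i-k|) = R(i), i = 1..p, for a_1..a_p.

    Returns (a, err), where err = R(0) - sum_k a_k R(k), i.e. G^2.
    """
    a = np.zeros(p)
    err = float(r[0])
    for i in range(p):  # step i builds the order-(i+1) predictor
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err  # reflection coefficient
        prev = a[:i].copy()
        a[i] = k
        a[:i] = prev - k * prev[::-1]
        err *= 1.0 - k * k
    return a, err


def modified_autocorr(r, a):
    """R_e(k) = sum_{j=-p}^{p} R_a(j) R_x(k - j) for k = 0..len(r)-1."""
    # Inverse-filter taps 1, -a_1, ..., -a_p (the overall sign does not change R_a).
    alpha = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    p = len(alpha) - 1
    ra = np.correlate(alpha, alpha, mode="full")  # R_a(j), j = -p..p, at index j + p
    K = len(r)
    re = np.zeros(K)
    for k in range(K):
        for j in range(-p, p + 1):
            m = abs(k - j)  # R_x is even; lags outside the frame contribute 0
            if m < K:
                re[k] += ra[j + p] * r[m]
    return re


def pitch_decision(re, fs, lo=0.15, hi=0.73, threshold=0.25):
    """Return (voiced, period_in_samples, period_in_seconds)."""
    K = len(re)
    if re[0] <= 0:
        return False, 0, 0.0  # silent frame: nothing to normalize
    # Assumption: "15% to 73% of the window" is read as the lag search range.
    k_lo, k_hi = int(np.ceil(lo * K)), int(np.floor(hi * K))
    rn = re[k_lo:k_hi + 1] / re[0]
    k_peak = k_lo + int(np.argmax(rn))
    if rn[k_peak - k_lo] >= threshold:
        return True, k_peak, k_peak / fs
    return False, 0, 0.0
```

For a 20 ms frame `s` at sampling rate `fs`, one plausible chain is:
1. `r = autocorr(s * np.hamming(len(s)), len(s) - 1)`
2. `a, g2 = levinson_durbin(r, p)`
3. `voiced, T, _ = pitch_decision(modified_autocorr(r, a), fs)`
4. `G = np.sqrt(g2)`

The Hamming window is an assumption, since the summary does not state which window was used.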
Similar Theses
- Türkçe fonemler için en uygun ana dalgacık fonksiyonunun araştırılması
The investigation of optimum mother wavelet function for turkish phonemes
ÖZKAN ARSLAN
Master's
Turkish
2014
Electrical and Electronics Engineering, Ege Üniversitesi, Department of Electrical and Electronics Engineering
YRD. DOÇ. DR. ERKAN ZEKİ ENGİN
- Düşük bit hızlarında konuşma kodlama ve uygulamaları
Low bit rate speech coding and applications
TARIK AŞKIN
PhD
Turkish
1999
Electrical and Electronics Engineering, İstanbul Teknik Üniversitesi
PROF. DR. GÜNSEL DURUSOY
- Doğrusal öngörü ile konuşma işareti kodlayıcısı tasarımı
Design of a linear predictive speech coder
YILMAZ KIRÇİÇEK
Master's
Turkish
2007
Computer Engineering and Computer Science and Control, Yıldız Teknik Üniversitesi, Department of Communications
PROF. DR. VEDAT TAVŞANOĞLU
- Das Leseverstehen allgemein und im deutsch als Fremdsprache-unterricht
No title translation available
SERPİL BAL
- Enhancement of the coded speech using filtering
Filtreleme kullanarak kodlanmış sesin iyileştirilmesi
SALİH SİNAN TAYLAN
Master's
English
2017
Electrical and Electronics Engineering, Işık Üniversitesi, Department of Electrical and Electronics Engineering
DOÇ. DR. ÜMİT GÜZ
DOÇ. DR. HAKAN GÜRKAN