Music retrieval systems: Robust performance under the effect of uncertainty

Title translation not available.


Thesis No: 602021
Author: ERDEM ÜNAL
Advisor: DR. SHRIKANTH NARAYANAN
Thesis Type: Doctoral
Subjects: Electrical and Electronics Engineering
Keywords: Not specified.
Year: 2008
Language: English
University: University of Southern California
Institute: Institute Abroad
Department: Not specified.
Discipline: Not specified.
Page Count: 117

Abstract

No abstract available.

Abstract (Translation)

Music Information Retrieval (MIR) is gaining widespread attention and becoming increasingly important. The growing capacity of web servers parallels the explosion of information generated worldwide, and the need for efficient and natural access to these databases cannot be overstated. Digital music and its associated information are prime examples of such complex information: they can be stored in a variety of formats, such as MP3, MIDI, WAV, and scores, and they can be accessed in multiple ways. If the user knows the name of the song or the band, and the source material is annotated with metadata, retrieval can be straightforward. However, if one does not know the lyrics, the title, or the performer, alternative retrieval methods are necessary, such as singing, humming, or playing a sample of the piece as the query to the database. Enabling such natural human interactions with large databases has thus become an essential component of effective and flexible MIR systems.
In this thesis, two general domains for MIR systems are discussed: (a) retrieval in monophonic music, and (b) retrieval in polyphonic music. For both domains, the thesis investigates the different sources and effects of uncertainty present at the input level and at the system level, and presents algorithms for solving the robust retrieval problem by combining music knowledge, signal processing techniques, and statistical analysis. First, we discuss Query by Humming (QBH), a specific instance of music retrieval in the monophonic domain, where only one sound source is present at a time. Here, straightforward signal analysis via prosodic features such as pitch and energy can be used to achieve accurate transcription from the audio domain to the symbol domain. However, the variability in the way people hum is not easy to handle with such straightforward algorithms.
Since the transcription produced by the system front end is used by the query engine, robustness against user-dependent variability is important: the performance of the transcription directly affects that of the retrieval engine. Our approach to achieving robust performance under the effect of uncertainty is statistical. We first discuss our experiments for collecting real-world humming data, which are important in designing statistical systems. The goal is to obtain a collection of data that represents the general variability expected in the input of QBH systems. The data are also used to estimate important parameters of the system front end for segmentation and to test the retrieval performance of our QBH system. We analyzed the humming performance of different users against criteria such as musical background, the musical structure of the target melody, and familiarity. We also looked for performance differences in humming different kinds of intervals, such as high vs. low intervals and perfect vs. augmented intervals. The final goal is to use the acquired statistical information as guidance in our retrieval calculations.
A Hidden Markov Model (HMM) based speech recognition system is used in the front end of our QBH system. The goal is to segment the humming syllables that represent musical notes in the input audio. Accurate segmentation leads to accurate representation of the audio in the symbol domain. We use the relative change in pitch and duration between consecutive notes to ensure a key- and rhythm-independent representation. From the two-dimensional transcription of pitch contour and duration ratios, we extract fixed-length characteristic fingerprints (FPs) from the audio at rare pitch and duration movements, where the highest and lowest changes in the input occur.
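As a rough illustration of the relative representation described above, the following Python sketch turns a list of segmented notes into pitch changes and duration ratios, then cuts a fixed-length fingerprint around the largest pitch movement. The note format (MIDI pitch, duration in seconds), the function names, and the rule of centering the window on the largest absolute pitch change are assumptions made for this example; the abstract does not specify how the thesis selects or sizes its fingerprints.

```python
def relative_transcription(notes):
    """Convert absolute notes into a key- and tempo-independent sequence.

    notes: list of (midi_pitch, duration_seconds) pairs, one per segmented
    note (an assumed input format, not taken from the thesis).
    Returns a list of (pitch_change_in_semitones, duration_ratio) pairs,
    one for each pair of consecutive notes.
    """
    return [(p1 - p0, d1 / d0)
            for (p0, d0), (p1, d1) in zip(notes, notes[1:])]


def fingerprint_at_largest_jump(rel, length=3):
    """Cut a fixed-length window around the largest absolute pitch change.

    This selection rule is a hypothetical stand-in for the "rare movement"
    criterion mentioned in the text.
    """
    if len(rel) < length:
        raise ValueError("sequence is shorter than the fingerprint length")
    center = max(range(len(rel)), key=lambda i: abs(rel[i][0]))
    # Clamp the window so it stays inside the sequence.
    start = max(0, min(center - length // 2, len(rel) - length))
    return rel[start:start + length]


# The same melody hummed in C at one tempo, and a fourth higher at half speed.
melody = [(60, 0.5), (62, 0.5), (67, 1.0), (65, 0.5), (64, 0.25)]
transposed_slower = [(p + 5, d * 2) for p, d in melody]

rel_a = relative_transcription(melody)
rel_b = relative_transcription(transposed_slower)
fingerprint = fingerprint_at_largest_jump(rel_a, length=3)
```

Both versions of the melody yield the same relative sequence, which is the property that makes the representation independent of the key and speed chosen by the person humming.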
Our main assumption is that a subsequence of pitch contour and duration ratios is enough to represent a melody. These subsequences, the fingerprints, are mapped onto the database entries and compared to find similarities. Statistical measures are used to define and calculate the similarity distance between the extracted fingerprints and the database entries in order to achieve robust performance.
We also extended the MIR problem to the next level, retrieval in the polyphonic domain. Here the retrieval task is to match audio files that contain an unlimited number of sound sources and that may differ in expressive parameters and orchestration. In polyphonic music, since the number of instruments playing at a time and their identities are unknown, mapping the audio signal to a true note transcription is a hard task. Researchers have used different machine learning techniques, and they were only able to report around 55% note detection accuracy in the monotimbral domain. A mid-level representation is therefore needed whose performance is not affected by the different spectral characteristics of different instruments. We used a representation technique that maps small audio frames into a symbolic representation that tracks the general tonal movement, behaviour, and characteristics of the polyphonic audio. The selected representation is a string sequence of lexical chords, one for each major and minor chord built on the twelve distinct sounds (pitch classes) of a full octave. The representation is obtained by mapping the audio spectrum of individual frames onto the Spiral Array, a 3D space for tonality that has specific tonal marks at specific coordinates. The Spiral Array is updated with respect to the tonal labeling task for faster transcription. From the mapping, a decision is made as to which tonal cluster the audio frame belongs to, and the appropriate label is assigned.
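To make the frame-labeling idea concrete, here is a minimal Python sketch that assigns one of the 24 major and minor chord labels to each frame. It does not implement the Spiral Array. As a simpler stand-in it compares a 12-bin pitch-class energy vector (chroma) against binary triad templates using cosine similarity. The chroma input, the label spelling ("C", "Cm", ...), and all function names are assumptions made for this example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Build binary pitch-class templates for the 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        for suffix, third in (("", 4), ("m", 3)):  # major third = 4, minor = 3
            vec = [0.0] * 12
            for interval in (0, third, 7):  # root, third, perfect fifth
                vec[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = vec
    return templates


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_frame(chroma, templates):
    """Return the chord label whose template is most similar to the frame."""
    return max(templates, key=lambda name: cosine(chroma, templates[name]))


def label_frames(frames):
    """Label consecutive chroma frames, producing a chord string sequence."""
    templates = chord_templates()
    return [label_frame(frame, templates) for frame in frames]


# Idealized frames: energy only on the notes of C major, then A minor.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
a_minor = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
sequence = label_frames([c_major, c_major, a_minor])
```

The output of such a labeler is a string sequence of chord names, the kind of symbolic sequence the text describes as the mid-level representation.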
The transcription process continuously labels consecutive audio frames with the most appropriate chord label (the one closest to the tonal center).
For modeling, we used sequential statistical models, namely n-grams. An n-gram is a subsequence of n items from a given sequence. The n-gram modeling strategy can be applied to the tonal sequences transcribed from polyphonic audio to represent tonal movements statistically. This sequential representation technique is similar to those used in genetic analysis; instead of protein names, we have chord names in our sequential code. N-grams are extracted from the tonal string sequences to create a statistical model for each polyphonic melody in our melody database. After appropriate smoothing, which is needed to compensate for different audio lengths, the smoothed n-grams are accumulated in the melody database. For retrieval, a symbolic sequence is compared to each of the smoothed n-gram models in the database using perplexity-based scoring. Perplexity is derived from the cross entropy between a query sequence and the smoothed n-grams in the database. The sequential models that are close to the query sequence will be less surprised by the subsequences generated by the query, so their perplexity scores will be lower; this score is used as the similarity metric in our retrieval calculations.
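The perplexity-based ranking described above can be sketched in a few lines of Python. The example below trains one bigram (n = 2) model per database entry and ranks entries by the perplexity they assign to a query chord sequence. The choice of bigrams, the add-one (Laplace) smoothing, the toy chord sequences, and all names are assumptions made for illustration; the abstract does not state which n or which smoothing method the thesis uses.

```python
import math
from collections import Counter


def train_bigram(sequence, vocab_size):
    """Return a smoothed bigram probability function P(cur | prev).

    Add-one smoothing is used here as an assumed, simple choice so that
    unseen chord transitions still receive a non-zero probability.
    """
    bigrams = Counter(zip(sequence, sequence[1:]))
    contexts = Counter(sequence[:-1])

    def prob(prev, cur):
        return (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)

    return prob


def perplexity(prob, query):
    """Perplexity of a query under a model: 2 ** (cross entropy in bits)."""
    transitions = list(zip(query, query[1:]))
    if not transitions:
        raise ValueError("query must contain at least two symbols")
    cross_entropy = -sum(math.log2(prob(p, c)) for p, c in transitions) / len(transitions)
    return 2 ** cross_entropy


def rank(query, database):
    """Return (name, perplexity) pairs sorted from best to worst match."""
    vocab = {chord for seq in database.values() for chord in seq} | set(query)
    scores = []
    for name, seq in database.items():
        model = train_bigram(seq, len(vocab))
        scores.append((name, perplexity(model, query)))
    return sorted(scores, key=lambda item: item[1])


# Toy database of chord label sequences (hypothetical data).
database = {
    "song_a": ["C", "F", "G", "C", "F", "G", "C"],
    "song_b": ["Am", "Dm", "E", "Am", "Dm", "E", "Am"],
}
query = ["F", "G", "C", "F"]
ranking = rank(query, database)
```

In this toy case the model for song_a has seen every transition in the query and assigns it a lower perplexity than the model for song_b, so song_a is ranked first.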

Similar Theses

Thesis No: 599242
Title: Audio fingerprinting using wavelet transform
Translated title: Dalgacık dönüşümleri ile ses parmak izi kontrolü
Author: EVREN KANALICI
Thesis Type: Master's
Language: English
Year: 2019
Subject: Computer Engineering and Computer Science and Control
University: Yıldız Teknik Üniversitesi
Department: Department of Computer Engineering
Advisor: ASSOC. PROF. DR. GÖKHAN BİLGİN

Thesis No: 172181
Title: Dayanıklı ses hashleme ile içerik tanılama
Translated title: Robust audio hashing for content identification
Author: OZAN GÜRSOY
Thesis Type: Master's
Language: Turkish
Year: 2006
Subject: Computer Engineering and Computer Science and Control
University: İstanbul Teknik Üniversitesi
Department: Department of Computer Engineering
Advisor: PROF. DR. BİLGE GÜNSEL

Thesis No: 398858
Title: Müzik üst-veri tahmini için türkçe şarkı sözü madenciliği
Translated title: Turkish lyrics mining for music meta-data estimation
Author: BAŞAR KIRMACI
Thesis Type: Master's
Language: Turkish
Year: 2015
Subject: Computer Engineering and Computer Science and Control
University: Başkent Üniversitesi
Department: Department of Computer Engineering
Advisor: ASSOC. PROF. DR. HASAN OĞUL

Thesis No: 180414
Title: İçerik tabanlı sorgu ve tarama için yapısal ve anlamsal ses içerik analizi
Translated title: Structural and semantic analysis of audio content for content-based querying and browsing
Author: MUSTAFA SERT
Thesis Type: Doctoral
Language: Turkish
Year: 2006
Subject: Computer Engineering and Computer Science and Control
University: Gazi Üniversitesi
Department: Department of Electronics and Computer Education
Advisor: PROF. DR. BUYURMAN BAYKAL

Thesis No: 684648
Title: Türkçe zamansal ifadelerin etiketlenmesi ve normalleştirilmesi
Translated title: Not available (literally: Tagging and normalization of Turkish temporal expressions)
Author: AYŞENUR GENÇ
Thesis Type: Master's
Language: Turkish
Year: 2021
Subject: Computer Engineering and Computer Science and Control
University: İstanbul Teknik Üniversitesi
Department: Department of Computer Engineering
Advisor: ASSOC. PROF. DR. AHMET CÜNEYD TANTUĞ
