Audio-visual tracking of multiple moving speakers
No title translation available.
- Thesis No: 402365
- Supervisors: DR. WENWU WANG
- Thesis Type: PhD
- Subjects: Electrical and Electronics Engineering
- Keywords: Audio-visual speaker tracking, particle filter, adaptive particle filter, random finite set, PHD filter, SMC implementation, multi-speaker tracking
- Year: 2016
- Language: English
- University: University of Surrey
- Institute: Institute abroad
- Department: Not specified.
- Discipline: Not specified.
- Number of Pages: 199
Abstract
No abstract available.
Abstract (Translation)
The problem of detecting and tracking multiple moving speakers in indoor environments using audio-visual (AV) modalities has attracted increasing attention over the last decade, owing to potential applications such as automatic camera steering in video conferencing, individual speaker discrimination in multi-speaker environments, and surveillance and monitoring in security applications. AV tracking poses several challenges, including the fusion of multiple modalities, the estimation of a time-varying number of speakers and their states, and robustness to conditions such as occlusion, the limited field of view of cameras, illumination changes and room reverberation. This thesis addresses some of these challenges within the Bayesian framework, leading to three main contributions, summarised as follows.

First, a novel approach is proposed for combining the audio and video modalities within the particle filter (PF) framework. Audio information, such as the direction of arrival (DOA) angles of the audio sources, is incorporated into PF-based visual tracking to reshape the typical Gaussian noise distribution of the particles in the propagation step and to weight the observation model in the measurement step. The proposed algorithm is further improved to estimate two critical parameters of the PF adaptively: the number of particles and the noise variance. In standard implementations, these parameters are set in the initialisation step by rule of thumb and kept fixed, which makes the tracker inconsistent in practice. In our approach, the number of particles and the noise variance are estimated adaptively during tracking, based on the tracking error and the area occupied by the particles in the image.

Next, a more realistic and complex scenario is considered in which the number of speakers varies over time. Random finite set (RFS) theory is employed here because of its ability to handle a variable number of targets.
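The first contribution (audio-guided propagation and weighting of particles) can be illustrated with a one-dimensional sketch. The abstract does not say exactly how the DOA information reshapes the propagation noise or weights the observation model, so the code below assumes, hypothetically, that (a) an audio-derived position `audio_pos` (for example a DOA angle projected onto the image x-axis) pulls each particle part of the way towards it before Gaussian noise is added, and (b) an audio likelihood multiplies the visual likelihood. The parameters `alpha`, `sigma_q`, `sigma_v` and `sigma_a` are illustrative names, and the filter is a generic sampling-importance-resampling (SIR) filter, not the thesis's AV tracker.

```python
import math
import random


def gaussian(x, mu, sigma):
    """Unnormalised 1-D Gaussian likelihood."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples using a single uniform offset (systematic resampling)."""
    n = len(particles)
    step = sum(weights) / n
    u = rng.uniform(0.0, step)
    resampled, cum, i = [], weights[0], 0
    for k in range(n):
        target = u + k * step
        # Advance to the particle whose cumulative weight covers the target.
        while target > cum and i < n - 1:
            i += 1
            cum += weights[i]
        resampled.append(particles[i])
    return resampled


def av_pf_step(particles, visual_obs, audio_pos, rng,
               alpha=0.5, sigma_q=5.0, sigma_v=3.0, sigma_a=10.0):
    """One predict-weight-resample cycle of a hypothetical 1-D AV particle filter.

    audio_pos: audio-derived position (e.g. a projected DOA), or None if no audio.
    """
    # Propagation: random-walk noise, re-centred towards the audio position
    # when one is available (an assumed form of the "reshaped" noise).
    moved = []
    for p in particles:
        centre = p if audio_pos is None else (1.0 - alpha) * p + alpha * audio_pos
        moved.append(centre + rng.gauss(0.0, sigma_q))

    # Measurement: visual likelihood, multiplied by an audio likelihood if present.
    weights = []
    for p in moved:
        w = gaussian(p, visual_obs, sigma_v)
        if audio_pos is not None:
            w *= gaussian(p, audio_pos, sigma_a)
        weights.append(w + 1e-300)  # guard against all-zero weights

    # Weighted-mean state estimate, computed before resampling.
    total = sum(weights)
    estimate = sum(w * p for w, p in zip(weights, moved)) / total
    return systematic_resample(moved, weights, rng), estimate


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.uniform(0.0, 100.0) for _ in range(500)]
    for _ in range(10):
        particles, est = av_pf_step(particles, visual_obs=50.0, audio_pos=52.0, rng=rng)
    print(f"estimate after 10 steps: {est:.2f}")
```

Passing `audio_pos=None` falls back to a plain visual particle filter, which is one simple way to handle frames in which no audio cue is available.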
A particle filter algorithm is devised for AV tracking under the RFS framework. In the RFS approach, however, the computational cost grows rapidly as the number of speakers increases. To address this problem, the probability hypothesis density (PHD) filter, a first-order (moment) approximation of the full RFS filter, is used together with its sequential Monte Carlo (SMC) implementation. Unlike generic particle filtering, which uses a single type of particle, the SMC-PHD filter uses three types of particles (surviving, spawned and born) to model the states of the speakers and to jointly estimate the number of speakers and their states. In the proposed AV-SMC-PHD algorithm, audio data is used to determine when to propagate and re-allocate these particles according to their type.

Finally, the AV-SMC-PHD algorithm is further improved in estimation accuracy and computational efficiency. The mean-shift method is integrated into the AV tracking system to shift the particles to a local maximum of the distribution function, which moves the estimated position closer to the ground-truth position of the speaker. With the integration of the mean-shift method, the tracking error of the proposed AVMS-SMC-PHD algorithm is reduced. However, the computational cost increases because the mean-shift method is applied to all the particles. To address this issue, a sparse sampling technique is proposed that draws a small subset of the source particles, named sparse particles, using one-dimensional bins based on the KLD-Sampling algorithm. Applying the mean-shift method only to the sparse particles, rather than to all the particles, reduces the computational cost significantly.
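The sparse-sampling step of the third contribution can be sketched in one dimension. KLD-Sampling keeps drawing samples until their count exceeds a bound that grows with the number k of occupied histogram bins, n = (k - 1)/(2ε) · [1 - 2/(9(k - 1)) + sqrt(2/(9(k - 1))) · z_{1-δ}]^3, where z_{1-δ} is the upper (1 - δ) standard-normal quantile. The sketch assumes that sparse particles are drawn with replacement from the weighted source particles and binned along a single x coordinate, and that mean-shift climbs a Gaussian kernel density estimate built from the source particles; the thesis's actual distribution function is not specified in the abstract. The names `bin_width`, `epsilon`, `delta`, `n_min` and `bandwidth` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def kld_bound(k, epsilon, z):
    """KLD-Sampling sample-size bound for k occupied bins."""
    if k <= 1:
        return 1
    a = 2.0 / (9.0 * (k - 1))
    return math.ceil((k - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3)


def kld_sparse_sample(particles, weights, rng, bin_width=5.0,
                      epsilon=0.25, delta=0.05, n_min=10):
    """Draw 'sparse' particles until the KLD bound for the occupied 1-D bins is met."""
    z = NormalDist().inv_cdf(1.0 - delta)  # upper (1 - delta) standard-normal quantile
    occupied, sparse = set(), []
    while len(sparse) < len(particles):
        x = rng.choices(particles, weights=weights, k=1)[0]
        sparse.append(x)
        occupied.add(math.floor(x / bin_width))
        if len(sparse) >= n_min and len(sparse) >= kld_bound(len(occupied), epsilon, z):
            break
    return sparse


def mean_shift_1d(x, particles, weights, bandwidth=4.0, tol=1e-6, max_iter=100):
    """Move x uphill on a Gaussian KDE of the weighted source particles."""
    for _ in range(max_iter):
        num = den = 0.0
        for p, w in zip(particles, weights):
            kern = w * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
            num += kern * p
            den += kern
        if den == 0.0:
            break  # x is too far from every particle for the kernel to register
        x_new = num / den  # mean-shift update: kernel-weighted mean
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic particles for two well-separated speakers at x = 30 and x = 70.
    source = ([rng.gauss(30.0, 4.0) for _ in range(150)]
              + [rng.gauss(70.0, 4.0) for _ in range(150)])
    weights = [1.0] * len(source)
    sparse = kld_sparse_sample(source, weights, rng)
    shifted = [mean_shift_1d(x, source, weights) for x in sparse]
    print(len(sparse), "sparse particles out of", len(source))
```

Each mean-shift iteration costs O(N) in the number of source particles N, so shifting only the sparse subset instead of all N particles reduces the total work roughly in proportion to the subset size.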
Similar Theses
- A study on particle filter based audio-visual face tracking on the AV16.3 dataset
YUNUS EMRE YILMAZ
Master's
English
2016
Electrical and Electronics Engineering, Orta Doğu Teknik Üniversitesi (Middle East Technical University), Department of Electrical and Electronics Engineering
ASSOC. PROF. DR. AFŞAR SARANLI
- Context aware audio-visual environment awareness using convolutional neural network
GİRAY YILLIKÇI
Master's
English
2019
Computer Engineering and Computer Science and Control, İstanbul Teknik Üniversitesi (Istanbul Technical University), Department of Communication Systems
PROF. DR. İBRAHİM AKDUMAN
- Dynamic system modeling and state estimation for speech signal
İBRAHİM YÜCEL ÖZBEK
PhD
English
2010
Electrical and Electronics Engineering, Orta Doğu Teknik Üniversitesi (Middle East Technical University), Department of Electrical and Electronics Engineering
PROF. DR. MÜBECCEL DEMİREKLER
- The effects of captioning on text recall and cognitive load in audio- vs. video-based l2 listening: Offline and online evidence from a mobile-assisted language learning study
NUR BAŞAK KARATAŞ
Master's
English
2013
Education and Training, Boğaziçi Üniversitesi (Boğaziçi University), Department of English Language Teaching
PROF. DR. YASEMİN BAYYURT KERESTECİOĞLU