Automated captioning of image and audio for visually and hearing impaired

Görme ve işitme engelliler için otomatik görüntü ve ses altyazılama


Thesis No: 853226
Author: ÖZKAN ÇAYLI
Advisors: ASSOC. PROF. DR. VOLKAN KILIÇ, ASSOC. PROF. DR. AYTUĞ ONAN
Thesis Type: Master's
Subjects: Electrical and Electronics Engineering
Keywords: Not specified.
Year: 2024
Language: English
University: İzmir Katip Çelebi Üniversitesi
Institute: Fen Bilimleri Enstitüsü (Graduate School of Natural and Applied Sciences)
Department: Electrical and Electronics Engineering
Discipline: Not specified.
Number of Pages: 72

Abstract

This thesis presents advances in algorithmic approaches for generating descriptions of images and audio, which are central to processing and interpreting both kinds of data. Such descriptions can extend visually and hearing impaired individuals' access to the real world, reducing their social isolation and improving their well-being, employment opportunities, and educational experience. The focus on algorithmic innovation makes the platform not only efficient but also flexible enough to adapt to various types of visual and auditory information, and so a versatile tool for assisting the visually impaired. The thesis addresses this aim in three main contribution chapters: image captioning, video captioning, and audio-visual video captioning. The research begins with image captioning. This first phase focuses on developing sophisticated algorithms that can accurately interpret and describe still images. This foundational work lays the groundwork for the subsequent video captioning phase, in which the algorithms are adapted to handle dynamic visual content and provide contextual and temporal descriptions of video sequences. The final stage of the research is the integration of audio-visual video captioning. This last phase combines the advances from the earlier phases and incorporates audio analysis to increase the depth and accuracy of the captions. This comprehensive approach yields a robust and inclusive system that can provide detailed descriptions for a wide range of visual and auditory inputs, giving visually and hearing impaired users a better understanding of their surroundings.

Abstract (Translation)

Generating captions and text descriptions of images can give visually and hearing impaired people greater access to the real world, reducing their social isolation and improving their well-being, employability, and educational experience. This thesis presents advances in algorithmic approaches for generating captions and text descriptions, which are central to processing and interpreting both image and audio data. The focus on algorithmic innovation makes the platform not only efficient but also adaptable to various types of visual and auditory information, and so a versatile tool for aiding those with visual impairments. The thesis addresses this aim in three main contribution chapters: image captioning, video captioning, and audio-visual video captioning. The research starts with image captioning. This initial phase concentrates on developing sophisticated algorithms capable of accurately interpreting and describing still images. This foundational work sets the stage for the subsequent phase, video captioning. Here the complexity increases, as the algorithms are adapted to handle dynamic visual content and to provide contextual and temporal descriptions of video sequences. The research culminates in audio-visual video captioning. This final phase combines the advances from the previous stages and incorporates audio analysis to enhance the depth and accuracy of captions. The result is a robust and inclusive system capable of providing detailed descriptions for a wide range of visual and auditory inputs, offering users with visual and hearing impairments a more complete understanding of their environment.

Similar Theses

Thesis No: 784956
Title (English): Automated audio captioning with acoustic and semantic feature representation
Title (Turkish): Akustik ve anlamsal öznitelik temsili ile otomatik ses başlıklandırma
Author: AYŞEGÜL ÖZKAYA EREN
Thesis Type: Doctoral
Language: English
Year: 2023
Subject: Computer Engineering and Computer Science and Control
University: Başkent Üniversitesi
Department: Computer Engineering
Advisor: ASSOC. PROF. DR. MUSTAFA SERT

Thesis No: 564955
Title (English): Ship detection by optical satellite images with deep learning method
Title (Turkish): Derin öğrenme yöntemi ile optik uydu görüntülerinden gemi tespiti
Author: OSMAN DUMAN
Thesis Type: Master's
Language: Turkish
Year: 2019
Subject: Electrical and Electronics Engineering
University: İstanbul Teknik Üniversitesi
Department: Communication Systems
Advisor: PROF. DR. MESUT KARTAL

Thesis No: 602669
Title (English): Identifying image related sentences in news articles
Title (Turkish): Haber makalelerinde görüntü ile ilgili cümlelerin belirlenmesi
Author: MELİKE ESMA İLTER GÜLAÇ
Thesis Type: Master's
Language: English
Year: 2019
Subject: Computer Engineering and Computer Science and Control
University: Boğaziçi Üniversitesi
Department: Computer Engineering
Advisors: PROF. DR. LALE AKARUN ERSOY, ASSOC. PROF. DR. ARZUCAN ÖZGÜR TÜRKMEN

Thesis No: 905937
Title (English): Automatic caption generation in medical images
Title (Turkish): Tıbbi görüntülerde otomatik alt yazı üretimi
Author: SEVDENUR KÜTÜK
Thesis Type: Master's
Language: Turkish
Year: 2024
Subject: Computer Engineering and Computer Science and Control
University: Gazi Üniversitesi
Department: Computer Engineering
Advisors: ASST. PROF. DR. TUBA ÇAĞLIKANTAR, ASST. PROF. DR. DUYGU SARIKAYA

Thesis No: 780600
Title (English): A study on the determination and depiction of the positions of the objects in the image relative to each other with the help of deep learning methods
Title (Turkish): Derin öğrenme yöntemleri yardımıyla görüntüde yer alan nesnelerin birbirlerine göre konumlarının belirlenmesi ve tasvir edilmesi üzerine bir çalışma
Author: ESİN ERGUVAN ETGİN
Thesis Type: Doctoral
Language: Turkish
Year: 2023
Subject: Computer Engineering and Computer Science and Control
University: Maltepe Üniversitesi
Department: Computer Engineering
Advisor: ASST. PROF. DR. ERDAL GÜVENOĞLU
