Exploring fusion models in computer vision for medical image computing
No translated title available.
- Thesis No: 403426
- Advisors: Dr. JASON J. CORSO
- Thesis Type: Doctoral
- Subjects: Computer Engineering and Computer Science and Control
- Keywords: Not specified.
- Year: 2017
- Language: English
- University: State University of New York at Buffalo
- Institute: Foreign Institute
- Department: Not specified.
- Discipline: Not specified.
- Page Count: 177
Abstract
No abstract available.
Abstract (Translation)
Automatic understanding of medical images has been an active research area, and universally accepted, standard solutions are in high demand. Recent advances in computer vision and machine learning algorithms, coupled with digital imaging modalities that facilitate efficient storage and access, have led to improvements in the automatic understanding of medical imaging data. Scalable and efficient computer vision algorithms are used in medicine for a broad range of open research problems such as diagnostics, image-guided therapy, automation, and augmented reality for surgical planning and navigation. Intelligent designs and algorithms answer the needs of medicine by addressing time, cost, and expertise concerns, while aiming to create objective, universally accepted, standardized, and validated metrics for medical imaging. In this thesis, we propose solutions to the open problems of robot-assisted surgery (RAS) video understanding and of segmenting the tumor and anatomical structures of the brain.

Understanding RAS videos is an open problem in the computer vision and medical communities. Modeling the gestures and skill levels of surgeons presents an interesting problem. Early identification of technical competence in surgical skills is expected to help tailor training to the personalized needs of surgeons in training. The insights drawn may be applied to effective skill acquisition, objective skill assessment, real-time feedback, and human-robot collaborative surgeries. Characterization of the anatomical structures of the brain in magnetic resonance images (MRI) of the human brain has also gained high interest in recent years. There is a strong need for a computer-aided system that automatically and accurately defines the tumor and anatomical structures of the brain. A universally accepted, standardized automatic mapping and segmentation of brain volumetry would be a significant improvement for diagnosis, modeling personalized progression, and treatment monitoring of neurologic conditions.

Many recent studies have shown that exploiting the relationships across different tasks, jointly reasoning over multiple tasks, and combining shared and task-specific representations is beneficial, and many end-to-end multi-task models perform markedly better than their single-task counterparts. What stands out in this thesis is the facilitation of information fusion, whether across multiple modalities, through combinations of shared and task-specific representations and joint relationships, or through smart fusion of a priori knowledge inferred from multiple atlases. Our studies show that using an appropriate fusion approach increases the accuracy of these tasks and outperforms approaches without fusion.

We first propose an end-to-end solution to the open problem of tool detection and localization in RAS video understanding, using a strictly computer vision approach and recent advances in deep learning. We propose a novel architecture using multimodal convolutional neural networks for fast detection and localization of tools in RAS videos. Our architecture applies a Region Proposal Network (RPN) and a multimodal two-stream convolutional network for object detection to jointly predict objectness and localization on a late fusion of image and temporal motion cues.
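As a rough, non-authoritative illustration of the late-fusion idea described above, the following PyTorch sketch fuses an appearance (RGB) stream with a motion (optical-flow) stream before RPN-style objectness and box-regression heads. The layer sizes, the toy backbones, and the class name are illustrative assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

class TwoStreamLateFusion(nn.Module):
    """Illustrative late-fusion detection head over RGB and optical-flow streams."""
    def __init__(self, num_anchors=9):
        super().__init__()
        def trunk(in_channels):
            # Small stand-in for a pretrained convolutional backbone.
            return nn.Sequential(
                nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            )
        self.rgb_stream = trunk(3)    # appearance cues from a single frame
        self.flow_stream = trunk(2)   # motion cues from a 2-channel flow field
        # Late fusion: concatenate the two feature maps, then predict
        # per-anchor objectness scores and bounding-box offsets jointly.
        self.fuse = nn.Conv2d(256, 256, 3, padding=1)
        self.objectness = nn.Conv2d(256, num_anchors * 2, 1)
        self.bbox_reg = nn.Conv2d(256, num_anchors * 4, 1)

    def forward(self, rgb, flow):
        fused = torch.cat([self.rgb_stream(rgb), self.flow_stream(flow)], dim=1)
        fused = torch.relu(self.fuse(fused))
        return self.objectness(fused), self.bbox_reg(fused)

# Example: one RGB frame and its flow field at 240x320 resolution.
model = TwoStreamLateFusion()
scores, boxes = model(torch.randn(1, 3, 240, 320), torch.randn(1, 2, 240, 320))
```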
We also introduce and publicly release a new dataset, ATLAS Dione, for RAS video understanding. Our dataset provides video data (86 full subject task study videos and 910 subtask clips, totaling 5 hours) of ten surgeons from Roswell Park Cancer Institute (RPCI) (Buffalo, NY) performing six different surgical tasks on the da Vinci Surgical System (dVSS®), with annotations of robotic tools, timestamps of surgical actions, and expertise levels of the surgeons.

We then propose a novel architecture for activity recognition in RAS videos. Our end-to-end architecture is based on recurrent neural networks and jointly learns temporal dynamics and visual features extracted by convolutional network models. We argue that surgical tasks are better modeled by visual features determined by the objects in the scene, while gestures, which are small activity segments that recur across multiple surgical tasks, are better modeled with motion cues. However, we also believe that visual features are complementary to the motion cues, which are independent of object and scene features. We propose an architecture that jointly learns multiple tasks on the two modalities of the input video: visual features and the flow information carrying the motion cues. Our architecture simultaneously recognizes gestures and classifies them under tasks by making use of these joint relationships, combining shared and task-specific representations to achieve better performance.

For the segmentation of the anatomical structures of the brain, we use a priori information from multiple atlases, based on the intensity similarity of voxels as well as their spatial correspondence, to infer knowledge, and then segment brain MRI data with a multiway cut algorithm. For the segmentation of the tumor in brain MRI, we define 3D joint histograms that are representative of each subject MRI in three different modalities. We perform oversegmentation on each subject MRI and define a Markov random field (MRF) on the supervoxels, using the probability distributions over the voxels as the unary term and the edge cues along the shared boundaries of two supervoxels to define the binary (pairwise) term. Our likelihood model on the intensities is based on histogram matching and fusion of this a priori knowledge in a specific way relating the tumor and brain structures. Inference is ultimately performed with graph cuts.
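As a compact restatement of the supervoxel MRF just described, the energy minimized by graph cuts can be written in the standard unary-plus-pairwise form below; the symbols ($\mathcal{S}$, $\mathcal{N}$, $w_{st}$, $\lambda$) are our own notation for illustration, not necessarily the thesis' exact formulation:

$$
E(\mathbf{x}) = \sum_{s \in \mathcal{S}} -\log P(I_s \mid x_s) \;+\; \lambda \sum_{(s,t) \in \mathcal{N}} w_{st}\,[x_s \neq x_t],
$$

where $\mathcal{S}$ is the set of supervoxels, $x_s$ the label of supervoxel $s$, $P(I_s \mid x_s)$ the histogram-matching likelihood of its intensities over the three modalities, $\mathcal{N}$ the set of supervoxel pairs sharing a boundary, $w_{st}$ an edge-cue weight along that shared boundary, and $\lambda$ a balance parameter. Minimizing $E$ with graph cuts (a multiway cut when there are more than two labels) yields the final segmentation.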
Similar Theses
- A morphological neural network approach to generative adversarial networks for superior image generation.
  ISLAM MAHMOUD MOMTAZ MOHAMMED NAGEIB AHMED SADEK, Master's, English, 2023
  Computer Engineering and Computer Science and Control, Altınbaş Üniversitesi, Department of Information Technologies
  Advisor: Assist. Prof. Dr. ABDULLAHI ABDU IBRAHIM
- A hybrid deep learning model for image captioning
  ZAINAB KHALID TAWFEEQ, Master's, English, 2024
  Computer Engineering and Computer Science and Control, Karabük Üniversitesi, Department of Computer Engineering
  Advisor: PROF. DR. NEHAD T.A RAMAHA
- Spatiotemporal features and deep learning methods for video classification
  RUKIYE SAVRAN KIZILTEPE, Doctoral, English, 2022
  Computer Engineering and Computer Science and Control, University of Essex
  Advisor: PROF. JOHN Q GAN
- Land cover and land use classification of multi-modal high-resolution satellite images using multi-task deep learning approach
  BURAK EKİM, Master's, English, 2021
  Computer Engineering and Computer Science and Control, İstanbul Teknik Üniversitesi, Department of Communication Systems
  Advisor: PROF. DR. ELİF SERTEL
- Satellite images super resolution using generative adversarial networks
  MARYAM SERDAR, Master's, English, 2022
  Computer Engineering and Computer Science and Control, İstanbul Teknik Üniversitesi, Department of Communication Systems
  Advisor: PROF. DR. AHMET HAMDİ KAYRAN