Novel multiple instance learning models for digital histopathology

Title translation not available.


Thesis No: 759083
Author: MUSTAFA UMIT ONER
Advisors: ASST. PROF. DR. LEE HWEE KUAN, PROF. SUNG WING-KIN
Thesis Type: Doctorate
Subjects: Electrical and Electronics Engineering
Keywords: Not specified.
Year: 2021
Language: English
University: National University of Singapore (NUS)
Institute: Foreign Institute
Department: Not specified.
Division: Not specified.
Number of Pages: 312

Abstract

No abstract available.

Abstract (Translation)

Cancer is estimated to have been responsible for 9.3 million deaths globally in 2019. Histopathology is a crucial diagnostic tool for the early detection and successful treatment of cancer. Recently, slide scanners have brought histopathology into the digital domain: glass slides are digitized and stored as whole-slide images (WSIs). WSIs provide valuable data that powerful deep learning models can exploit. However, a WSI is a huge gigapixel image that traditional deep learning models cannot process directly. In addition, deep learning models require large amounts of labeled data, yet most WSIs are either unannotated or annotated only with weak labels indicating sample-level properties. WSIs are seldom annotated with regions of interest.

This thesis develops novel multiple instance learning (MIL) models to address these challenges in digital histopathology. MIL is a machine learning paradigm that learns the mapping between bags of instances and bag labels. We use the MIL paradigm to handle huge images (WSIs) and to exploit weak labels: we treat a WSI as a bag of small patches cropped from the WSI and use the WSI's weak label as the bag label. We also test our models' usefulness on real-world tasks at the intersection of digital histopathology and genomics.

Firstly, we show that digital histopathology tasks can be accomplished using only weak labels. We develop a weakly supervised clustering framework based on a novel MIL task of predicting the unique class count (ucc), which is the number of unique classes among all instances inside a bag. Note that ucc does not provide a label for each instance directly. We formally prove that a perfect ucc classifier (defined in Section 3.3.2.2 of the thesis) can be used to cluster individual instances inside the bags perfectly. Furthermore, given only weak labels indicating whether an image contains metastases, we successfully segment breast cancer metastases in lymph node sections by formulating this task as a ucc task. We show that our framework, using only weak labels, approaches the performance of a fully supervised medical image segmentation model, which requires tedious and time-consuming exhaustive annotations of metastasis regions in the images.

Secondly, we introduce a new family of MIL pooling filters, namely distribution based pooling filters. One component common to all MIL methods is the MIL pooling filter, which summarizes the extracted features of instances into a bag-level representation. Distribution based pooling filters obtain a bag-level representation by estimating marginal distributions of the extracted features. We formally prove that distribution based pooling filters are superior to their point estimate based counterparts, such as 'max' and 'mean' pooling, in terms of the amount of information captured while obtaining bag-level representations. Moreover, we empirically show that models with distribution based pooling filters perform as well as or better than those with point estimate based ones on real-world MIL tasks.

Thirdly, we show that a MIL model with a distribution pooling filter can successfully predict tumor purity from hematoxylin and eosin (H&E) stained WSIs. Tumor purity is the percentage of cancer cells within a tumor. Accurate tumor purity estimation is crucial for pathologic evaluation and for sample selection to minimize normal cell contamination in genomic analysis. Tumor purity is routinely estimated by pathologists; however, pathologists' estimates suffer from inter-observer variability. Moreover, they do not correlate well with genomic tumor purity values, which are computationally inferred from genomic data and accepted as the gold standard. We show that our MIL models successfully predict tumor purity from H&E stained WSIs in eight TCGA cohorts and a local Singapore lung cancer cohort. The predictions are consistent with genomic tumor purity values.
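The contrast between point estimate based and distribution based pooling can be illustrated with a minimal sketch. It is not the thesis's implementation: the Gaussian kernel, the bin count, the bandwidth `sigma`, the assumption that features lie in [0, 1], and the function name `distribution_pooling` are all hypothetical choices made for illustration.

```python
import numpy as np

def distribution_pooling(features, num_bins=11, sigma=0.1):
    """Summarize a bag by one estimated marginal distribution per feature.

    features: array of shape (num_instances, num_features); values are
              assumed (for this sketch) to lie in [0, 1].
    Returns an array of shape (num_features, num_bins); each row sums to 1.
    """
    features = np.asarray(features, dtype=float)
    sample_points = np.linspace(0.0, 1.0, num_bins)  # shape (num_bins,)
    # Difference between every feature value and every sample point:
    # shape (num_instances, num_features, num_bins).
    diff = sample_points[None, None, :] - features[:, :, None]
    kernels = np.exp(-0.5 * (diff / sigma) ** 2)
    # Average the kernels over instances, then normalize over the bins.
    density = kernels.mean(axis=0)
    return density / density.sum(axis=1, keepdims=True)

# Two toy bags with a single feature: identical mean and max, different spread.
bag_a = np.array([[0.0], [0.5], [1.0]])
bag_b = np.array([[0.0], [0.0], [0.5], [1.0], [1.0]])

mean_a, mean_b = bag_a.mean(axis=0), bag_b.mean(axis=0)
max_a, max_b = bag_a.max(axis=0), bag_b.max(axis=0)
dist_a, dist_b = distribution_pooling(bag_a), distribution_pooling(bag_b)

print("mean pooling equal:", np.allclose(mean_a, mean_b))
print("max pooling equal:", np.allclose(max_a, max_b))
print("distribution pooling equal:", np.allclose(dist_a, dist_b))
```

In this toy case 'mean' and 'max' pooling cannot tell the two bags apart, while the estimated marginal distributions differ, which is the kind of extra information the abstract attributes to distribution based pooling.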
We also obtain spatially resolved tumor purity maps showing the spatial variation of tumor purity within slides. Hence, our MIL models can be used for sample selection for genomic sequencing, which will help reduce pathologists' workload and decrease inter-observer variability. Moreover, spatial tumor purity maps can help us better understand the tumor microenvironment as a key determinant in tumor formation and therapeutic response.

Finally, we give a recipe for preparing machine learning datasets for digital histopathology tasks. We show that incorrect data segregation during dataset preparation leads to data leakage, which seriously affects a machine learning model's performance on new patients during real-world deployment. The model can give deceptively good results on the test set that are unlikely to hold for a new patient walking into the clinic. We conclude that patient-level data segregation is necessary to avoid data leakage in digital histopathology tasks. Moreover, it ensures that each patient in the test set is like a new patient walking into the clinic. Hence, it is the correct way of preparing machine learning datasets for real-world clinical applications.
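Patient-level data segregation can be sketched as follows. This is a minimal illustration rather than the thesis's recipe: the `(patient_id, slide_id)` record layout, the function name `split_by_patient`, and the identifiers in the toy data are hypothetical.

```python
import random

def split_by_patient(slides, test_fraction=0.25, seed=0):
    """Split slides into train and test sets without sharing any patient.

    slides: list of (patient_id, slide_id) tuples (hypothetical layout).
    Returns (train, test), two lists of the same tuples.
    """
    # Shuffle patients, not slides, so all slides of a patient stay together.
    patients = sorted({patient_id for patient_id, _ in slides})
    rng = random.Random(seed)
    rng.shuffle(patients)
    num_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:num_test])
    train = [s for s in slides if s[0] not in test_patients]
    test = [s for s in slides if s[0] in test_patients]
    return train, test

# Toy data: some patients contribute more than one slide.
slides = [
    ("P1", "P1_a"), ("P1", "P1_b"),
    ("P2", "P2_a"),
    ("P3", "P3_a"), ("P3", "P3_b"),
    ("P4", "P4_a"),
]
train, test = split_by_patient(slides)
train_patients = {patient_id for patient_id, _ in train}
test_patients = {patient_id for patient_id, _ in test}
print("patients shared between train and test:", train_patients & test_patients)
```

Splitting at the slide or patch level instead would allow slides from the same patient to appear on both sides of the split, which is the leakage the abstract warns about.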

Similar Theses

Thesis No: 945135
Title: Machine learning in longitudinal data analysis: A real-world application in gram-negative bacteremia
Author: OLCAY DİLKEN
Thesis Type: Master's
Language: English
Year: 2025
Subjects: Biostatistics
University: Yıldız Teknik Üniversitesi
Department: Statistics
Advisor: PROF. DR. SERPİL KILIÇ DEPREN

Thesis No: 929362
Title: Diagnosis of coronary artery disease using machine learning techniques
Original Title: Koroner arter hastalığının makine öğrenmesi teknikleriyle teşhisi
Author: ŞÜKRÜ ALKAN
Thesis Type: Master's
Language: Turkish
Year: 2025
Subjects: Computer Engineering and Computer Science and Control
University: Sakarya Üniversitesi
Department: Electrical and Electronics Engineering
Advisor: ASSOC. PROF. DR. MUHAMMED KÜRŞAD UÇAR

Thesis No: 798238
Title: Security enhancement of image steganography using deep convolutional neural network (DCNN)
Author: RAFAD IMAD KADHIM ABO KHUSHOOT
Thesis Type: Master's
Language: English
Year: 2022
Subjects: Electrical and Electronics Engineering
University: Altınbaş Üniversitesi
Department: Electrical and Computer Engineering
Advisor: PROF. DR. GALİP CANSEVER

Thesis No: 736573
Title: Efficient machine learning models for cancer biology
Author: AYYÜCE BEGÜM BEKTAŞ
Thesis Type: Doctorate
Language: English
Year: 2022
Subjects: Industrial and Industrial Engineering
University: Koç Üniversitesi
Department: Industrial Engineering
Advisor: ASSOC. PROF. DR. MEHMET GÖNEN

Thesis No: 401671
Title: Generation and analysis of segmentation trees for natural images
Author: EMRE AKBAŞ
Thesis Type: Doctorate
Language: English
Year: 2011
Subjects: Computer Engineering and Computer Science and Control
University: University of Illinois at Urbana-Champaign
Department: Electrical and Computer Engineering
Advisor: PROF. NARENDRA AHUJA
