Bayesçi gizli sınıf analizi ve makine öğrenimi yöntemleri ile CRISPR Cas9 gen düzenlemesinde hedef dışı skorların tahmini

Prediction of off-target scores in CRISPR Cas9 gene editing with bayesian latent class analysis and machine learning methods

Thesis No: 915991
Author: ALİ MERTCAN KÖSE
Advisor: PROF. DR. OZAN KOCADAĞLI
Thesis Type: Doctorate (PhD)
Subjects: Biostatistics, Genetics, Statistics
Keywords: Not specified.
Year: 2024
Language: Turkish
University: Mimar Sinan Güzel Sanatlar Üniversitesi (Mimar Sinan Fine Arts University)
Institute: Lisansüstü Eğitim Enstitüsü (Institute of Graduate Studies)
Department: Department of Statistics
Division: Statistics
Number of Pages: 168

Abstract

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) has recently become one of the most popular applications in biology. CRISPR originated as an adaptive immune system that protects bacteria and archaea against viruses. In CRISPR-Cas9 editing, the target sequence (DNA) is determined by base pairing between the DNA and the guide RNA (gRNA), and on-target and off-target positions are identified accordingly. Editing the targeted bases (nucleotides) can help prevent the spread of viral diseases and infections, whereas changes at off-target sites are unintended. On-target and off-target outcomes are usually evaluated with CFD or MIT scores in only two categories. This study instead focuses on identifying subclasses of off-target cases and the positions on the target DNA that lead to disruptions, using latent class analysis and Bayesian latent class analysis to determine off-target levels. The first aim is to use Latent Class Analysis (LCA) to identify subclasses within off-target positions so that gene disruptions can be predicted more accurately. LCA analyzes the relationships among categorical variables measured on nominal or ordinal scales, and its maximum likelihood (ML) estimates are obtained with the Expectation-Maximization (E-M) algorithm. The second aim is to estimate the unknown parameters of Bayesian latent class models with Markov Chain Monte Carlo (MCMC) methods as an alternative to ML, with the goal of obtaining better results. The final aim is to use machine learning algorithms to predict the positions that influence the off-target levels identified by LCA. Two CRISPR datasets were analyzed. Obtained from CRISPR data tools, they contain target sequences and the matching positions of gRNAs and have sample sizes of 5132 and 947.
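The E-M estimation mentioned above can be sketched for the binary (match/mismatch) coding as follows. This is a minimal, textbook latent class E-M under stated assumptions (0/1 indicators, local independence of indicators within a class, a single random start); the names `lca_em` and `_e_step` are hypothetical, and the abstract does not say which software the thesis used.

```python
import math
import random


def _e_step(data, weights, probs):
    """Return posterior class probabilities for each row and the log-likelihood."""
    posteriors, loglik = [], 0.0
    for row in data:
        # log P(class k) + sum_j log P(y_j | class k), for every class k
        logs = []
        for w, p_k in zip(weights, probs):
            s = math.log(w)
            for y, p in zip(row, p_k):
                s += math.log(p if y == 1 else 1.0 - p)
            logs.append(s)
        m = max(logs)  # log-sum-exp trick for numerical stability
        total = sum(math.exp(v - m) for v in logs)
        loglik += m + math.log(total)
        posteriors.append([math.exp(v - m) / total for v in logs])
    return posteriors, loglik


def lca_em(data, n_classes, max_iter=500, tol=1e-8, seed=0):
    """Fit a latent class model with binary (0/1) indicators by E-M.

    Returns (class_weights, item_probs, log_likelihood, posteriors), where
    item_probs[k][j] estimates P(y_j = 1 | class k).
    """
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting values; several seeds should be tried in practice.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    prev = -math.inf
    for _ in range(max_iter):
        post, loglik = _e_step(data, weights, probs)
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: closed-form updates given the posterior memberships.
        for k in range(n_classes):
            n_k = max(sum(r[k] for r in post), 1e-12)
            weights[k] = n_k / n
            for j in range(n_items):
                p = sum(r[k] * row[j] for r, row in zip(post, data)) / n_k
                probs[k][j] = min(max(p, 1e-6), 1.0 - 1e-6)  # keep logs finite
    # Final E-step so the returned posteriors match the returned parameters.
    post, loglik = _e_step(data, weights, probs)
    return weights, probs, loglik, post


if __name__ == "__main__":
    # Hypothetical toy data: two clearly separated mismatch patterns.
    data = [[1, 1, 1, 0, 0, 0]] * 10 + [[0, 0, 0, 1, 1, 1]] * 10
    w, p, ll, post = lca_em(data, n_classes=2)
    print([round(x, 2) for x in w], round(ll, 3))
```

Because E-M only finds a local maximum, real analyses usually rerun it from many random starts and keep the best log-likelihood.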
In DNA-gRNA matching, the combination of the two bases at the same position was treated as one variable, so the datasets were built with binary (0/1) and multi-category (4x4 = 16) coding. Latent class analysis was then performed for each coding scheme, and Bayesian latent class analysis was carried out with the numbers of classes determined by LCA. For both the binary and the multi-category datasets, the Lo-Mendell-Rubin (LMR) test was statistically significant for each latent class model (p < 0.001).
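For the Bayesian step, one common MCMC scheme is a Gibbs sampler with conjugate priors. The sketch below fixes the number of classes in advance, as the abstract describes, and assumes Dirichlet priors on the class weights and Beta priors on the item probabilities; these priors, the defaults, and the name `bayes_lca_gibbs` are illustrative assumptions rather than the thesis's specification. Label switching across draws is not handled.

```python
import math
import random


def _dirichlet(rng, alphas):
    """Draw from a Dirichlet distribution via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def bayes_lca_gibbs(data, n_classes, n_iter=2000, burn_in=500,
                    alpha=1.0, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for a latent class model with binary indicators.

    Assumed priors: class weights ~ Dirichlet(alpha, ..., alpha) and each
    P(y_j = 1 | class k) ~ Beta(a, b). Returns posterior means of the class
    weights and item probabilities over the post-burn-in draws.
    """
    if n_iter <= burn_in:
        raise ValueError("n_iter must exceed burn_in")
    rng = random.Random(seed)
    n_items = len(data[0])
    z = [rng.randrange(n_classes) for _ in data]  # initial class labels
    mean_w = [0.0] * n_classes
    mean_p = [[0.0] * n_items for _ in range(n_classes)]
    for it in range(n_iter):
        # 1) Class weights given labels (Dirichlet-multinomial conjugacy).
        counts = [z.count(k) for k in range(n_classes)]
        w = _dirichlet(rng, [alpha + c for c in counts])
        # 2) Item probabilities given labels (Beta-Bernoulli conjugacy).
        p = []
        for k in range(n_classes):
            ones = [0] * n_items
            for row, zi in zip(data, z):
                if zi == k:
                    for j, y in enumerate(row):
                        ones[j] += y
            p.append([
                min(max(rng.betavariate(a + ones[j], b + counts[k] - ones[j]), 1e-12),
                    1.0 - 1e-12)
                for j in range(n_items)
            ])
        # 3) Each observation's label given weights and item probabilities.
        for i, row in enumerate(data):
            logs = [math.log(w[k]) + sum(math.log(p[k][j] if y else 1.0 - p[k][j])
                                         for j, y in enumerate(row))
                    for k in range(n_classes)]
            m = max(logs)
            z[i] = rng.choices(range(n_classes),
                               weights=[math.exp(v - m) for v in logs])[0]
        if it >= burn_in:  # accumulate post-burn-in draws
            for k in range(n_classes):
                mean_w[k] += w[k]
                for j in range(n_items):
                    mean_p[k][j] += p[k][j]
    kept = n_iter - burn_in
    return [v / kept for v in mean_w], [[v / kept for v in row] for row in mean_p]


if __name__ == "__main__":
    data = [[1, 1, 1, 0, 0, 0]] * 10 + [[0, 0, 0, 1, 1, 1]] * 10  # hypothetical
    weights, item_probs = bayes_lca_gibbs(data, n_classes=2)
    print([round(x, 2) for x in weights])
```

Fixing the number of classes from the LCA step keeps the sampler simple; convergence diagnostics and relabeling of draws would still be needed on real data.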

Abstract (Translation)

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) has recently become one of the most popular applications in biology. CRISPR originated as an adaptive immune system that protects bacteria and archaea against viruses. In CRISPR-Cas9 editing, the target sequence (DNA) is determined by base pairing between the DNA and the guide RNA (gRNA), which also determines the on-target and off-target positions. Editing the targeted bases (nucleotides) can help prevent the spread and occurrence of viral diseases and infections, whereas changes at off-target sites are unintended. Off-target and on-target results are typically evaluated with CFD or MIT scores in only two categories. This study instead focuses on identifying subclasses of off-target cases and the positions on the target DNA that lead to disruptions. Latent class analysis (LCA) and Bayesian latent class analysis are used to identify off-target levels. The primary aim is to use LCA to identify subclasses within off-target positions, enabling more precise prediction of gene disruptions. LCA analyzes the relationships among categorical variables measured on nominal or ordinal scales. In LCA, maximum likelihood (ML) estimates are obtained with the Expectation-Maximization (E-M) algorithm. The second aim is to use Bayesian latent class models as an alternative to ML, estimating the unknown parameters with Markov Chain Monte Carlo (MCMC) methods in order to achieve better results. Finally, machine learning algorithms are used to predict the positions that influence the off-target levels identified by LCA. Two CRISPR datasets were analyzed. They were obtained through CRISPR-related data tools, contain target sequences and the matching positions of gRNAs, and have sample sizes of 5132 and 947. In the DNA-gRNA matching process, the combination of the two bases at the same position was treated as one variable.
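Treating each aligned base pair as one variable, as described above, can be illustrated with two small encoders. This sketch assumes equal-length, position-aligned target and guide sequences, writes RNA uracil (U) as T, codes 0 = match and 1 = mismatch in the binary scheme, and maps the 16 base combinations to codes 0-15; these conventions and all function names are hypothetical.

```python
BASES = "ACGT"


def _normalize(seq):
    """Upper-case a sequence and write RNA uracil (U) as thymine (T)."""
    return seq.upper().replace("U", "T")


def encode_binary(target, guide):
    """Assumed convention: 0 = match, 1 = mismatch at each aligned position."""
    target, guide = _normalize(target), _normalize(guide)
    if len(target) != len(guide):
        raise ValueError("sequences must be aligned and of equal length")
    return [0 if t == g else 1 for t, g in zip(target, guide)]


def encode_pairs(target, guide):
    """One of 4 x 4 = 16 categories per position: 4 * index(target) + index(guide)."""
    target, guide = _normalize(target), _normalize(guide)
    if len(target) != len(guide):
        raise ValueError("sequences must be aligned and of equal length")
    return [4 * BASES.index(t) + BASES.index(g) for t, g in zip(target, guide)]


def pair_label(code):
    """Turn a 0-15 category code back into a 'target:guide' label."""
    return f"{BASES[code // 4]}:{BASES[code % 4]}"


if __name__ == "__main__":
    print(encode_binary("ACGT", "ACGA"))  # [0, 0, 0, 1]
    print(encode_pairs("ACGT", "ACGA"))   # [0, 5, 10, 12]
    print(pair_label(12))                 # T:A
```

The binary scheme only records whether a position mismatches, while the 16-category scheme also keeps which bases are paired, at the cost of many more parameters per latent class.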
On this basis, the datasets were created with binary (0/1) and multi-category (4x4 = 16) coding. Latent class analysis was then applied to each coding scheme, and Bayesian latent class analysis was conducted with the numbers of classes determined by LCA. For both the binary and the multi-category datasets, the Lo-Mendell-Rubin (LMR) test was statistically significant for each latent class model (p < 0.001). According to the Bayesian Information Criterion (BIC) and the Consistent Akaike Information Criterion (CAIC), the best-fitting models had four and five classes for the binary-coded datasets and three classes for the multi-category datasets. Because the entropy values exceeded 0.60, these latent class models were judged to classify observations clearly and reliably. After the number of classes was fixed, the latent class models were estimated with the Bayesian approach for both the binary and the multi-category datasets. Machine learning models were then developed to predict off-target levels and to identify the genomic positions whose mismatches lead to these levels. Because making too many base changes in a DNA sequence can weaken the immune system or lead to new diseases, this study systematically examines the effects of mismatches between the target sequence and the gRNA and provides an approach aimed at minimizing base changes at each position. The study thereby contributes to the responsible use of genetic manipulation and to the safety of CRISPR-based therapies in future therapeutic applications.

The thesis consists of eight chapters in total. Chapter 2 introduces CRISPR-Cas9 technology and its basic concepts. Chapter 3 discusses latent class analysis, and Chapter 4 covers Bayesian latent class analysis. Chapter 5 presents machine learning topics in theory and in practice. Chapter 6 gives detailed information on the application datasets and methods.
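The model-selection statistics cited above (BIC, CAIC, entropy) follow standard formulas, sketched below. The parameter count assumes an unrestricted latent class model, and the entropy is the normalized (relative) entropy commonly reported by LCA software; whether the thesis used exactly this normalization is an assumption, and the function names are hypothetical.

```python
import math


def n_free_params(n_classes, categories_per_item):
    """Free parameters of an unrestricted latent class model.

    (K - 1) class weights plus K * sum_j (R_j - 1) item-response
    probabilities, where R_j is the number of categories of item j
    (2 for 0/1 coding, 16 for the 4 x 4 pair coding).
    """
    return (n_classes - 1) + n_classes * sum(r - 1 for r in categories_per_item)


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + p ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def caic(loglik, n_params, n_obs):
    """Consistent AIC: -2 log L + p (ln(n) + 1)."""
    return -2.0 * loglik + n_params * (math.log(n_obs) + 1.0)


def relative_entropy(posteriors):
    """Normalized entropy in [0, 1]; values near 1 mean sharp class assignments."""
    n, k = len(posteriors), len(posteriors[0])
    if k < 2:
        raise ValueError("entropy is defined for two or more classes")
    h = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - h / (n * math.log(k))


if __name__ == "__main__":
    print(n_free_params(3, [2] * 20))  # hypothetical: 3 classes, 20 binary items -> 62
    print(relative_entropy([[0.9, 0.1], [0.2, 0.8]]))
```

Lower BIC and CAIC indicate a better trade-off between fit and complexity; CAIC penalizes each parameter by one unit more than BIC, so it tends to favor fewer classes.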
In the application chapter, the analysis is carried out on the CRISPR datasets, and the estimated off-target models are explained using feature importance. In the conclusion and discussion chapters, the estimated models are interpreted and the findings are compared with the existing literature. Finally, future directions and suggestions are discussed.
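The abstract does not specify which feature-importance procedure was used. Permutation importance is one model-agnostic option, swapped in here purely for illustration: shuffle one position at a time and record the drop in accuracy when predicting the LCA-assigned off-target level. The toy data, the stand-in model, and the function names are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of rows whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature (position) is shuffled.

    predict: prediction function of any fitted classifier, applied to one row.
    X: list of rows (lists); y: true class labels (e.g. LCA-assigned levels).
    """
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical toy data: 3 binary mismatch positions; the level depends
    # only on position index 2, and the stand-in model reads that position.
    X = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 1],
         [0, 1, 1], [1, 0, 1], [0, 0, 0], [1, 1, 0]]
    y = [row[2] for row in X]
    print(permutation_importance(lambda row: row[2], X, y))
```

Positions the model ignores get an importance of exactly zero, while shuffling a position the model relies on lowers accuracy; on real data the importances would be computed on held-out rows.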

Similar Theses

Thesis No
959126
Data-driven anomaly detection for airspace security using ADS-B surveillance data
ADS-B gözetim verisi ile hava sahası güvenliği için veri tabanlı anomali tespiti
ABDULLAH ÇERKEZOĞLU
Master's
English
2025
Defense and Defense Technologies
İstanbul Teknik Üniversitesi
Department of Defense Technologies
ASSOC. PROF. BARIŞ BAŞPINAR
Thesis No
873785
Algılanan inme etkisine göre inme latent sınıflarının belirlenmesi
Determination of stroke latent classes based on perceived stroke impact
BİNNUR ÇETİN
Master's
Turkish
2024
Occupational Therapy
Hacettepe Üniversitesi
Department of Occupational Therapy
ASSOC. PROF. DR. ORKUN TAHİR ARAN
Thesis No
632553
Bayesian approaches for privacy preserving data sharing
Mahremiyeti koruyan veri paylaşımında bayesçi yöntemler
BEYZA ERMİŞ
Doctorate
English
2020
Computer Engineering and Computer Science and Control
Boğaziçi Üniversitesi
Department of Computer Engineering
PROF. DR. ALİ TAYLAN CEMGİL
Thesis No
692422
Hybridization of probabilistic graphical models and metaheuristics for handling dynamism and uncertainty
Değişimin ve belirsizliğin ele alınması için olasılıksal çizgesel biçelerin ve sezgi-üstlerinin melezleştirilmesi
GÖNÜL ULUDAĞ
Doctorate
English
2021
Computer Engineering and Computer Science and Control
İstanbul Teknik Üniversitesi
Department of Computer Engineering
PROF. DR. AYŞE ŞİMA UYAR
Thesis No
603751
Bayesian model selection for latent variable causal networks by sequential monte carlo
Gizli değişkenli nedensel ağlarda parçacık süzgeci ile Bayesci model seçimi
MEHMET BURAK KURUTMAZ
Master's
English
2019
Computer Engineering and Computer Science and Control
Boğaziçi Üniversitesi
Department of Computer Engineering
PROF. DR. ALİ TAYLAN CEMGİL