Genome-wide prediction of prokaryotic two-component system networks using a sequence-based meta-predictor

No title translation available.


Thesis No: 402754
Author: ALTAN KARA
Supervisors: DR. NARCIS FERNANDEZ-FUENTES, DR. DAVID WHITWORTH
Thesis Type: Doctorate
Subjects: Biology, Bioengineering, Biotechnology
Keywords: Not specified.
Year: 2016
Language: English
University: Aberystwyth University / Prifysgol Aberystwyth
Institute: Prifysgol Aberystwyth
Department: Foreign Institute
Discipline: Not specified.
Page Count: 628

Abstract

No abstract available.

Abstract (Translation)

Two-component systems (TCSs) are signalling complexes composed of a histidine kinase (receptor) and a response regulator (effector). They are the most abundant signalling pathways in prokaryotes and control a wide range of biological processes. The pairing of these two components is highly specific, and the interactions between them are fast and transient. This makes their prediction challenging, especially when the interaction involves an orphan protein, that is, a protein whose encoding gene lies at least 200 bp away from any other TCS protein-coding gene. Determining TCS protein-protein interactions (PPIs) therefore often requires costly and time-consuming experimental characterisation. Consequently, there is considerable interest in developing accurate computational prediction tools that lessen the burden of experimental work, cope with the ever-increasing amount of genomic information available, and map TCS PPIs accurately even when an orphan TCS protein is involved. In this work, a novel meta-predictor, MetaPred2CS, was developed specifically to predict prokaryotic TCS PPIs using a support vector machine. MetaPred2CS integrates six sequence-based prediction methods of orthogonal nature: in-silico two-hybrid, mirrortree, gene fusion, phylogenetic profiling, gene neighbourhood and gene operon. These methods were selected based on their advantages and disadvantages and on the characteristics of TCS PPIs; the selection is described in more detail in Section 3.5.1. To benchmark MetaPred2CS, a novel training dataset of experimentally validated TCS protein pairs, composed of 113 positive (P+) and 1134 negative (P-) interaction pairs, was compiled for k-fold cross-validation to act as a gold standard dataset for TCS predictions. Creating this dataset was necessary because no database currently provides experimentally verified information, especially regarding negative TCS PPI pairs.
MetaPred2CS was also compared against the current state of the art (a Bayesian Network (BN) based method and STRING). Combining individual predictors of different nature improved the overall prediction accuracy; as a result, MetaPred2CS performed better than the individual methods and outperformed the current state of the art. The comparison was based on AUC values: MetaPred2CS, STRING and the BN-based method obtained AUC values of 92.8, 88.4 and 83.5, respectively. Among the components of MetaPred2CS, the in-silico two-hybrid method contributed most to its performance (5.93%). Besides performing better than the current state of the art, MetaPred2CS is also the only available option that allows its users to perform de-novo predictions. This thesis argues that MetaPred2CS is also effective in predicting orphan TCS PPIs, which is one of the main challenges in the field. A publicly available web server was developed to interface the method and was employed in genome-wide TCS PPI predictions for E. coli K-12 MG1655, M. xanthus DK 1622, P. aeruginosa UCBPP-PA14 and E. amylovora ATCC 49946. Finally, forty novel predictions output by MetaPred2CS for these organisms are evaluated in detail at the end of Chapter 5. The biological relevance of the components of these novel pairs suggests that some of these predictions might be valuable targets for researchers interested in understanding the life cycle of these organisms. The MetaPred2CS web server is available at http://metapred2cs.ibers.aber.ac.uk along with the newly created gold standard dataset (P+/P-) of TCS interaction pairs.
The source code for MetaPred2CS can be downloaded from https://github.com/martinjvickers/MetaPred2CS and is also available as an OVA file of a virtual machine with MetaPred2CS preinstalled at http://metapred2cs.ibers.aber.ac.uk/MetaPred2CS.ova.
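
The evaluation described in the abstract (k-fold cross-validation on an imbalanced P+/P- set, compared by AUC) can be illustrated with a minimal, standard-library sketch. It is not the MetaPred2CS implementation. The function names `stratified_folds` and `auc`, the choice of k = 5 and the random placeholder scores are all assumptions made for illustration; only the class sizes (113 and 1134) come from the abstract. The sketch also skips model training: in a real cross-validation, each fold would be scored by a classifier trained on the remaining folds. AUC is returned on a 0 to 1 scale here, whereas the abstract reports it on a 0 to 100 scale.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with a similar class ratio in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank identity.

    AUC equals the fraction of (positive, negative) pairs in which the
    positive example scores higher; tied pairs count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Class sizes from the abstract: 113 positive (P+) and 1134 negative (P-) pairs.
labels = [1] * 113 + [0] * 1134

# k = 5 is an assumption; the abstract only says "k-fold".
folds = stratified_folds(labels, k=5)

# Placeholder scores drawn at random (NOT real predictor output): positives
# are shifted upwards so that the example has some signal to measure.
rng = random.Random(1)
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

fold_aucs = [auc([scores[i] for i in f], [labels[i] for i in f]) for f in folds]
mean_auc = sum(fold_aucs) / len(fold_aucs)
print("per-fold AUC:", [round(a, 3) for a in fold_aucs])
print("mean AUC:", round(mean_auc, 3))
```

Stratifying the folds matters with roughly ten negatives per positive: a plain random split could leave a fold with very few positives, which makes the per-fold AUC noisy.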

Similar Theses

Thesis No: 688350
Title: Genom-boyu ilişki çalışmalarında poligenik risk skorunun makine öğrenimi ve derin öğrenme yöntemleri ile tahmin edilmesi
Title (English): Prediction of polygenic risk score by machine learning and deep learning methods in genome-wide association studies
Author: RAGIP ONUR ÖZTORNACI
Thesis Type: Doctorate
Language: Turkish
Year: 2021
Subject: Biostatistics
University: Mersin University
Department: Biostatistics and Medical Informatics
Supervisors: PROF. DR. BAHAR TAŞDELEN, PROF. DR. CEMİL ÇOLAK

Thesis No: 498312
Title: Isı şoku protein genlerinin (HSP) bazı populus taksonlarında fonksiyonel genom analizi ve abiyotik stres koşullarında HSP genlerinin ifade seviyelerinin belirlenmesi
Title (English): Genome-wide survey of heat shock proteins (HSP) and expression analysis of HSP genes under abiotic stress conditions in some populus taxons
Author: ESRA NURTEN YER
Thesis Type: Doctorate
Language: Turkish
Year: 2017
Subject: Genetics
University: Kastamonu University
Department: Forest Engineering
Supervisors: PROF. DR. SEZGİN AYAN, ASSOC. PROF. DR. MEHMET CENGİZ BALOĞLU

Thesis No: 350359
Title: Ağırlıklı çoklu sınıflandırıcı kullanarak biyolojik verilerin tahmini
Title (English): Prediction of biological data by using weighted ensemble classifiers
Author: TAYLAN İYİDOĞAN
Thesis Type: Master's
Language: Turkish
Year: 2013
Subject: Computer Engineering and Computer Science and Control
University: TOBB University of Economics and Technology
Department: Computer Engineering
Supervisor: ASST. PROF. DR. TANSEL ÖZYER

Thesis No: 887780
Title: İnsan gen yolaklarında ikâme modelleme ve makine öğrenmesi kullanarak varyant analizi
Title (English): Variant analysis in human gene networks using surrogate modelling and machine learning
Author: FURKAN AYDIN
Thesis Type: Master's
Language: Turkish
Year: 2024
Subject: Computer Engineering and Computer Science and Control
University: Istanbul Technical University
Department: Computer Science
Supervisor: ASST. PROF. DR. SÜHA TUNA

Thesis No: 860930
Title: Machine learning methods for detecting genetic and infectious diseases
Title (Turkish): Genetik ve enfeksiyon hastalıklarının tespiti için makine öğrenmesi yöntemleri
Author: YUNUS EMRE IŞIK
Thesis Type: Doctorate
Language: English
Year: 2024
Subject: Computer Engineering and Computer Science and Control
University: Abdullah Gül University
Department: Electrical and Computer Engineering
Supervisor: ASSOC. PROF. DR. ZAFER AYDIN
