Cluster detection by lifting with application to phylogenetics
Title translation not available.
- Thesis No: 523322
- Advisors: Not specified.
- Thesis Type: Doctorate
- Subjects: Statistics
- Keywords: Not specified.
- Year: 2018
- Language: English
- University: University of Leeds
- Institute: Institute Abroad
- Department: Not specified.
- Division: Not specified.
- Number of Pages: 227
Abstract
No abstract.
Abstract (Translation)
In this thesis, we propose a new algorithm that automatically detects the number of clusters in a tree-structured data set by denoising generalized node values in the tree with the lifting "one coefficient at a time" (LOCAAT) algorithm introduced by Jansen et al. (2001). Our algorithm can be applied to any multidimensional data set using the compactness value as the node value, or to phylogenetic data sets (DNA sequences) using either the compactness value or the dissimilarity score as the node value. The compactness value is defined as the average distance from the centroid of each possible cluster in the tree, and the dissimilarity score is the average number of loci at which at least one of the sequences under the node of interest does not share the same nucleotide. For multidimensional data sets, we consider each node in the tree as a possible location of a cluster after denoising the tree by LOCAAT. Thus, for each possible cluster, we check how much departure from the centroid of the cluster we can allow when assigning the objects under the node of interest to a cluster. If a node and all its child nodes are denoised to a value less than or equal to the allowed amount of departure from the centroid of their clusters, a cluster is located at this node. We also propose another version of our algorithm based on non-decimated lifting (Knight & Nason, 2009), in which we generate a probability of being clustered for each node. If a node and all its child nodes have a probability of being clustered less than or equal to the probability of acceptance, a value in [0, 1], a cluster is located at this node. We provide a comparison study between our algorithms and some internal cluster validity indices (CVIs) available in the literature, using some artificial data sets and a real data set. In addition, we compare the performance of each method using some available external cluster validity scores.
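The compactness value and the cluster-location rule described above can be illustrated with a small sketch. It is not the thesis implementation: the LOCAAT denoising step is omitted entirely, so the node values below are raw rather than denoised, and "all its child nodes" is read here as "all descendants", which is an assumption. The names `compactness`, `leaves_under`, `find_clusters`, `threshold`, and the toy tree are hypothetical and chosen only for illustration.

```python
import numpy as np


def compactness(points):
    """Average Euclidean distance of the rows of `points` from their centroid."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    return float(np.linalg.norm(points - centroid, axis=1).mean())


def leaves_under(children, node):
    """List the leaves below `node`; a node with no children is its own leaf."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    return [leaf for k in kids for leaf in leaves_under(children, k)]


def find_clusters(children, values, root, threshold):
    """Return the highest nodes whose value, and every descendant's value,
    is <= threshold (assumed reading of the rule in the abstract)."""
    ok = {}

    def all_within(node):
        # Build the full list first so every node is visited and recorded.
        kids_ok = all([all_within(c) for c in children.get(node, [])])
        ok[node] = kids_ok and values[node] <= threshold
        return ok[node]

    all_within(root)

    clusters, stack = [], [root]
    while stack:
        node = stack.pop()
        if ok[node]:
            clusters.append(node)  # stop here: the whole subtree is one cluster
        else:
            stack.extend(children.get(node, []))
    return sorted(clusters)


# Hypothetical toy data: four points in the plane and a binary tree over them.
coords = {"a": [0, 0], "b": [0, 2], "c": [10, 0], "d": [10, 2]}
children = {"root": ["n1", "n2"], "n1": ["a", "b"], "n2": ["c", "d"]}
nodes = ["root", "n1", "n2", "a", "b", "c", "d"]

# Raw (not denoised) node values: compactness of the leaves under each node.
values = {
    n: compactness([coords[leaf] for leaf in leaves_under(children, n)])
    for n in nodes
}

print(find_clusters(children, values, "root", threshold=2.0))
```

With this toy tree the two internal nodes `n1` and `n2` each have compactness 1.0 while the root has roughly 5.1, so an allowed departure of 2.0 yields the two clusters `['n1', 'n2']`. Leaves have compactness 0, so under this reading a very small threshold reports every leaf as its own cluster.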
For phylogenetic data sets, we check the performance of our algorithms and other CVIs using both the compactness value and the dissimilarity score as the node value. To compute the compactness value for a phylogenetic tree, we need to find the position of each species in R^p using multidimensional scaling (MDS); we can then find which species share similar features using our algorithm. If we use the dissimilarity score as the node value, we cluster similar species together by finding how much difference we can allow between species. We check the performance of our algorithms using some artificial data sets and a real data set. In the final part of the thesis, we propose a visualization tool for cophylogenetic data sets. We consider only the case of two associated phylogenetic trees, and we apply our algorithm to the host and parasite trees separately to provide a summary of these data sets. We check the performance of our algorithm using two well-known cophylogenetic data sets.
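As a rough illustration of the MDS step, the sketch below places a few aligned DNA sequences in R^p. The abstract does not say which MDS variant or which sequence distance the thesis uses; classical (Torgerson) MDS and the pairwise Hamming distance are assumptions made here, and the function names and toy sequences are hypothetical.

```python
import numpy as np


def hamming_matrix(seqs):
    """Number of differing loci for every pair of equal-length sequences."""
    arr = np.array([list(s) for s in seqs])
    return (arr[:, None, :] != arr[None, :, :]).sum(axis=2).astype(float)


def classical_mds(dist, p=2):
    """Classical (Torgerson) MDS: coordinates in R^p from an n x n distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:p]  # indices of the p largest eigenvalues
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical toy alignment: two pairs of nearly identical sequences.
seqs = ["AAAAAA", "AAAAAT", "GGGGGG", "GGGGGC"]
dist = hamming_matrix(seqs)
coords = classical_mds(dist, p=2)
print(coords.shape)
```

The resulting rows of `coords` can then serve as the points from which a compactness value is computed for each node of the phylogenetic tree. Other MDS variants, such as metric or non-metric MDS fitted by stress minimisation, would fit the same step.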
Similar Theses
- Biyolojik çizge madenciliği: Alt çizge örüntülerinin bulunması ve etkileşim tahmininde kullanılması
Biological graph mining: Discovery of subgraph patterns and their utilization in interaction prediction
MEHMET EMİN TURANALP
Doctorate
Turkish
2008
Computer Engineering and Computer Science and Control / Selçuk University / Department of Electrical and Electronics Engineering
PROF. DR. SAADETDİN HERDEM
- Stratejik yönetim perspektifinden sigortacılık sektöründe makine öğrenmesi algoritmaları ile anomali tespiti
An application of machine learning to anomaly detection in insurance industry using strategic management approach
AYŞE NURBANU ŞAHAN
Master's
Turkish
2020
Industrial and Industrial Engineering / Istanbul Technical University / Department of Management Engineering
ASSOC. PROF. TOLGA KAYA
- Galaksi kümelerindeki parlak galaksilerin özelliklerinin ortamla ilişkisinin incelenmesi
Properties of bright galaxies in galaxy clusters and their dependency on the environment
EYÜP KAAN ÜLGEN
Doctorate
Turkish
2022
Astronomy and Space Sciences / Istanbul University / Department of Astronomy and Space Sciences
ASST. PROF. DR. SİNAN ALİŞ
- Fraud Detection in mobile communication networks using data mining
Veri madenciliği yardımıyla mobil telekomünikasyon şebekelerinde sahtekarlık tespiti
BÜLENT KUŞAKSIZOĞLU
Master's
English
2006
Computer Engineering and Computer Science and Control / Bahçeşehir University / Department of Computer Engineering
ASST. PROF. ADEM KARAHOCA
- Intrusion detection with pattern classification
Örüntü sınıflandırması ile saldırı tespiti
MÜGE ÇEVİK
Master's
English
2005
Computer Engineering and Computer Science and Control / Istanbul Technical University / Department of Control and Computer Engineering
PROF. DR. MEHMET BÜLENT ÖRENCİK