
Cluster detection by lifting with application to phylogenetics

Title translation not available.

  1. Thesis No: 523322
  2. Author: NEBAHAT BOZKUŞ
  3. Advisors: Not specified.
  4. Thesis Type: Doctorate
  5. Subjects: Statistics
  6. Keywords: Not specified.
  7. Year: 2018
  8. Language: English
  9. University: University of Leeds
  10. Institute: Institute Abroad
  11. Department: Not specified.
  12. Division: Not specified.
  13. Number of Pages: 227

Abstract

No abstract available.

Abstract (Translation)

In this thesis, we propose a new algorithm that automatically detects the number of clusters in a tree-structured data set by denoising generalized node values in the tree using the lifting "one coefficient at a time" (LOCAAT) algorithm introduced by Jansen et al. (2001). Our algorithm can be applied to any multidimensional data set, using the compactness value as the node value, or to phylogenetic data sets (DNA sequences), using either the compactness value or the dissimilarity score as the node value. The compactness value is defined as the average distance from the centroid of each possible cluster in the tree, and the dissimilarity score is the average number of loci at which the sequences under the node of interest do not all share the same nucleotide. For multidimensional data sets, we consider each node in the tree as a possible location of a cluster after denoising the tree by LOCAAT. For each possible cluster, we check how much departure from the centroid of the cluster we can allow while still assigning the objects under the node of interest to a single cluster. If a node and all its child nodes are denoised by no more than the allowed departure from the centroids of their clusters, a cluster is located at this node. We also propose another version of our algorithm, based on non-decimated lifting (Knight & Nason, 2009), in which we generate a probability of being clustered for each node. If a node and all its child nodes have a probability of being clustered less than or equal to the probability of acceptance, which lies in [0, 1], a cluster is located at this node. We provide a comparison study between our algorithms and some internal cluster validity indices (CVIs) available in the literature, using some artificial data sets and a real data set. In addition, we compare the performance of each method using some available external cluster validity scores.
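The compactness value described above can be illustrated with a minimal sketch. It assumes Euclidean distance and represents the objects under a node as a plain list of coordinate tuples; the function names `centroid` and `compactness` are hypothetical and are not taken from the thesis.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return [fmean(p[d] for p in points) for d in range(dims)]


def compactness(points):
    """Average distance of the points under a node from their centroid.

    Assumption: Euclidean distance. The abstract does not state which
    metric is used.
    """
    c = centroid(points)
    return fmean(dist(p, c) for p in points)


# Two points at distance 2 from each other are each 1 away from the centroid.
print(compactness([(0.0, 0.0), (2.0, 0.0)]))
```

Under this reading, a small value means the objects under the node lie close together, which is what makes the node a plausible cluster location.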
For phylogenetic data sets, we check the performance of our algorithms and other CVIs using both the compactness value and the dissimilarity score as the node value. To compute the compactness value for a phylogenetic tree, we first need to find the position of each species in R^p using multidimensional scaling (MDS); we can then find which species share similar features using our algorithm. If we use the dissimilarity score as the node value, we cluster similar species together by finding how much difference we can allow between species. We check the performance of our algorithms using some artificial data sets and a real data set. In the final part of the thesis, we propose a visualization tool for cophylogenetic data sets. We consider only the case of two associated phylogenetic trees, and we apply our algorithm to the host and parasite trees separately to provide a summary of these data sets. We check the performance of our algorithm using two well-known cophylogenetic data sets.
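The dissimilarity score can be sketched in a similar way. The abstract's wording leaves the exact averaging open, so the example below takes one possible reading: the fraction of aligned loci at which the sequences under a node do not all carry the same nucleotide. The function name `dissimilarity_score` and the assumption of pre-aligned, equal-length sequences are illustrative and not taken from the thesis.

```python
def dissimilarity_score(sequences):
    """Fraction of loci where the aligned sequences do not all agree.

    Assumptions: the sequences are already aligned and of equal length,
    and "average" is read as dividing by the number of loci.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*sequences) yields one column of nucleotides per locus.
    differing = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    return differing / length


# Loci 3 and 4 differ across the three sequences, so the score is 2 / 4.
print(dissimilarity_score(["ACGT", "ACGA", "ACTT"]))
```

A score of 0 means all sequences under the node are identical, so lower values indicate a more homogeneous group of species.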

Similar Theses

  1. Biyolojik çizge madenciliği: Alt çizge örüntülerinin bulunması ve etkileşim tahmininde kullanılması

    Biological graph mining: Discovery of subgraph patterns and their utilization in interaction prediction

    MEHMET EMİN TURANALP

    Doctorate

    Turkish

    2008

    Computer Engineering and Computer Science and Control

    Selçuk University

    Department of Electrical and Electronics Engineering

    PROF. DR. SAADETDİN HERDEM

  2. Stratejik yönetim perspektifinden sigortacılık sektöründe makine öğrenmesi algoritmaları ile anomali tespiti

    An application of machine learning to anomaly detection in insurance industry using strategic management approach

    AYŞE NURBANU ŞAHAN

    Master's

    Turkish

    2020

    Industrial and Industrial Engineering

    Istanbul Technical University

    Department of Management Engineering

    ASSOC. PROF. TOLGA KAYA

  3. Galaksi kümelerindeki parlak galaksilerin özelliklerinin ortamla ilişkisinin incelenmesi

    Properties of bright galaxies in galaxy clusters and their dependency on the environment

    EYÜP KAAN ÜLGEN

    Doctorate

    Turkish

    2022

    Astronomy and Space Sciences

    Istanbul University

    Department of Astronomy and Space Sciences

    ASST. PROF. DR. SİNAN ALİŞ

  4. Fraud Detection in mobile communication networks using data mining

    Veri madenciliği yardımıyla mobil telekomünikasyon şebekelerinde sahtekarlık tespiti

    BÜLENT KUŞAKSIZOĞLU

    Master's

    English

    2006

    Computer Engineering and Computer Science and Control

    Bahçeşehir University

    Department of Computer Engineering

    ASST. PROF. ADEM KARAHOCA

  5. Intrusion detection with pattern classification

    Örüntü sınıflandırması ile saldırı tespiti

    MÜGE ÇEVİK

    Master's

    English

    2005

    Computer Engineering and Computer Science and Control

    Istanbul Technical University

    Department of Control and Computer Engineering

    PROF. DR. MEHMET BÜLENT ÖRENCİK