Learning in extreme conditions: Online and active learning with massive, imbalanced and noisy data

No title translation available.

  1. Thesis No: 401678
  2. Author: ŞEYDA ERTEKİN
  3. Advisors: DR. C. LEE GILES
  4. Thesis Type: Doctoral
  5. Subjects: Computer Engineering and Computer Science and Control
  6. Keywords: Not specified.
  7. Year: 2009
  8. Language: English
  9. University: The Pennsylvania State University
  10. Institute: Foreign Institute
  11. Department: Department of Computer Science and Engineering
  12. Division: Not specified.
  13. Number of Pages: 182

Abstract

No abstract available.

Abstract (Translation)

This thesis addresses improving the performance of machine learning algorithms, with a particular focus on classification tasks with large, imbalanced and noisy datasets. The field of machine learning addresses the question of how best to use experimental or historical data to discover general patterns and regularities and to improve decision making. However, applying machine learning algorithms to very large scale problems remains challenging. Additionally, class imbalance and noise in the data degrade the prediction accuracy of standard machine learning algorithms. The main focus of this thesis is designing machine learning algorithms and approaches that are faster, more data efficient and less demanding of computational resources, so that they scale to large problems. This thesis addresses these problems in active and online learning frameworks. The particular focus is on the Support Vector Machine (SVM) algorithm for classification problems, but the proposed approaches to active and online learning also extend readily to other widely used machine learning algorithms. This thesis first proposes a fast online Support Vector Machine algorithm (LASVM) that is substantially faster than the classical (batch) SVM and other online SVM algorithms while preserving the classification accuracy of state-of-the-art SVM solvers. The ability to handle streaming data, the speed improvement in both training and recognition, and the lower memory demand of the online learning setting make SVMs applicable to very large datasets. The effectiveness of LASVM and active learning in real world problems is assessed on the name disambiguation problem in CiteSeer's repository of scientific publications. The thesis then presents an efficient active learning framework that targets the expensive labeling problem in producing training data.
The proposed method yields an efficient querying system and removes the high computational cost that is a barrier to applying active learning to very large scale datasets. We then point out that even when labels are readily available, active learning can still be used to reach the most informative instances in the training data. By applying active sample selection and early stopping in the online SVM, we show that the algorithm can reach and even exceed the prediction accuracy of the baseline LASVM setting with random sampling. Our experimental results also reveal that active learning can be a highly effective method for dealing with the class imbalance problem. We further propose a hybrid of oversampling and active learning that forms an adaptive technique (named VIRTUAL) to efficiently resample the minority class instances in imbalanced data classification. Finally, we propose a non-convex online SVM algorithm (LASVM-NC) based on the Ramp loss, which has a strong ability to suppress the influence of outliers in noisy datasets. Then, again in the online learning setting, we propose an outlier filtering mechanism that approximates non-convex behavior in convex optimization (LASVM-I). These two algorithms are built on an online SVM solver (LASVM-G) that leverages the duality gap to obtain more trustworthy intermediate models. Our experimental results show that the proposed approaches yield a more scalable online SVM algorithm with sparser models and shorter running time in both the training and recognition phases, without sacrificing generalization performance. In the end, we also point out the relation between non-convex behavior in SVMs and active learning.
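The abstract's claim that a Ramp loss suppresses the influence of outliers can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the commonly used definition of the Ramp loss as a difference of two hinge functions, R_s(z) = H_1(z) - H_s(z) with H_a(z) = max(0, a - z), where z = y * f(x) is the signed margin. The function names and the value s = -1 are illustrative assumptions.

```python
def hinge(z, a=1.0):
    """Shifted hinge H_a(z) = max(0, a - z), with z = y * f(x) the signed margin."""
    return max(0.0, a - z)


def ramp(z, s=-1.0):
    """Ramp loss R_s(z) = H_1(z) - H_s(z); s = -1 is an assumed, illustrative value.

    For z <= s the loss is flat at 1 - s, so a badly misclassified point
    (a likely outlier) cannot contribute more than that cap.
    """
    return hinge(z, 1.0) - hinge(z, s)


# Margins from badly misclassified (-5) to confidently correct (+2).
margins = [-5.0, -1.0, 0.0, 0.5, 2.0]
hinge_losses = [hinge(z) for z in margins]  # keeps growing as the margin gets worse
ramp_losses = [ramp(z) for z in margins]    # capped at 1 - s = 2 for z <= -1

for z, h, r in zip(margins, hinge_losses, ramp_losses):
    print(f"margin={z:+.1f}  hinge={h:.1f}  ramp={r:.1f}")
```

The two losses agree for margins between s and 1 and differ only for points far on the wrong side of the boundary, which is where the hinge loss lets outliers dominate the objective.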

Similar Theses

  1. Potentialities for and limits to inclusion by education: The case of Syrian children's education in Turkey and child labour

    Eğitim tarafından içermede potansiyeller ve limitler: Türkiye'deki Suriyeli çocukların eğitimi ve çocuk işçiliği

    YASEMİN KIZILOĞLU

    Master's

    English

    2021

    Social Work

    Orta Doğu Teknik Üniversitesi

    Department of Social Policy

    ASST. PROF. DR. MEHMET OKYAYUZ

  2. Sequential nonlinear learning

    Ardışık doğrusal olmayan öğrenme

    NURİ DENİZCAN VANLI

    Master's

    English

    2015

    Electrical and Electronics Engineering

    İhsan Doğramacı Bilkent Üniversitesi

    Department of Electrical and Electronics Engineering

    ASSOC. PROF. SÜLEYMAN SERDAR KOZAT

  3. Güç kalitesi olaylarının makine öğrenme teknikleri ile sınıflandırılması

    Classification of power quality events using machine learning methods

    FERHAT UÇAR

    Doctoral

    Turkish

    2018

    Electrical and Electronics Engineering

    Fırat Üniversitesi

    Department of Electrical and Electronics Engineering

    ASST. PROF. DR. FİKRET ATA

    PROF. DR. BEŞİR DANDIL

  4. Geliştirilmiş rastgele vektör işlevsel bağlantı ağları ile dağıtım şebekelerinde arıza türü ve yerinin tespiti

    Fault type and location detection in distribution networks with improved random vector functional link networks

    CEM HAYDAROĞLU

    Doctoral

    Turkish

    2022

    Electrical and Electronics Engineering

    Dicle Üniversitesi

    Department of Electrical and Electronics Engineering

    ASSOC. PROF. DR. BİLAL GÜMÜŞ

  5. Design and performance evaluation of demand forecasting system for online food data

    Sanal yemek verisi üzerinde talep tahmin sistemi tasarımı ve başarım değerlendirmesi

    MELTEM ARSLAN

    Master's

    English

    2024

    Computer Engineering and Computer Science and Control

    Galatasaray Üniversitesi

    Department of Computer Engineering

    ASSOC. PROF. DR. GÜLFEM ALPTEKİN