Document classification using improved word embeddings
Geliştirilmiş kelime gömmeleri kullanarak belge sınıflandırma
- Thesis No: 826408
- Advisor: Asst. Prof. Dr. AYHAN AKBAŞ
- Thesis Type: Master's
- Subjects: Computer Engineering and Computer Science and Control
- Keywords: Not specified.
- Year: 2023
- Language: English
- University: Çankırı Karatekin University
- Institute: Graduate School of Natural and Applied Sciences
- Department: Department of Electronics and Computer Engineering
- Discipline: Not specified.
- Number of Pages: 60
Abstract
One of the most important goals of this study is to build a trained deep learning model that can perform classification in order to make predictions.
Abstract (Translation)
One of the primary objectives of this study is to develop a trained deep learning model that performs classification for prediction. Neural networks have demonstrated impressive results in natural language processing, notably in document classification, and classification prediction has received extensive research attention. Convolutional networks, recurrent networks, and other embedding mechanisms are employed, in which text is extracted from documents and embedded at either the sentence or the word level. Historically, the Word2Vec model was used in natural language processing to embed words based on their context, and it was later combined with Long Short-Term Memory (LSTM) networks. Using N-gram properties, together with the surrounding text and the associations between words, has been shown to improve prediction accuracy in classification tasks.
Previous studies have based document classification primarily on visual methods or formats, perhaps concentrating on titles and abstracts. This study instead posits that classification should be based on the words a document contains. Using a dataset of 47,000 texts and topics, we employ word embeddings to determine document themes. The embedded words, drawn from a large body of text, are sorted into seven primary categories that serve as the foundational classes in our dataset. The data are then used to train and test deep learning models designed for document classification. Once trained, the model can automatically classify documents based on their embedded words and texts.
Our approach begins by extracting words from the texts in the dataset. Two models are then constructed using Word2Vec. The words are lemmatized, that is, reduced to their base form, and superfluous elements such as symbols and punctuation are removed so that the text retains only semantically significant words.
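The cleaning and lemmatization step described above can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the function name `clean_and_lemmatize`, the tiny `LEMMAS` lookup table, and the decision to drop digits along with punctuation are all assumptions made for illustration, and a real system would likely use a full lemmatizer.

```python
import re

# Hypothetical lemma lookup, for illustration only; a real pipeline would
# use a proper lemmatizer instead of a hand-written table.
LEMMAS = {
    "models": "model",
    "words": "word",
    "trained": "train",
    "documents": "document",
}

def clean_and_lemmatize(text):
    """Lowercase the text, drop symbols, punctuation and digits, then map
    each remaining token to its lemma when the lookup table knows it."""
    # Keep only runs of letters; everything else acts as a separator.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens]

sample = "Trained models classify documents: 47,000 texts & topics!"
print(clean_and_lemmatize(sample))
```

Tokens missing from the lookup table pass through unchanged, so the sketch degrades gracefully on unseen words.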
These cleaned word sequences are then used to train the two models, which aim to establish correlations between words. Both models build associations based on word sequences within the text. The first model assigns vectors to words based on context and tries to predict the context from these words. In contrast, the second model relies on the interrelationships between words and predicts specific words based on classification, yielding a relational concept termed “Neighbor word”. Finally, we employ a deep learning model based on Long Short-Term Memory (LSTM), supported by the relationships derived from the two Word2Vec models. The two are compared to determine which offers better performance in predictive classification.
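The two models described above resemble the two standard Word2Vec training schemes, skip-gram (a word predicts its context) and CBOW (the context predicts a word), although the abstract does not name them. Under that assumption, the sketch below shows how the training examples differ between the two. The function names and the `window` parameter are illustrative choices, not taken from the thesis.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: the center word is used to
    predict each neighbor inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

def cbow_pairs(tokens, window=2):
    """Return (context_list, center) pairs: the neighbors inside the
    window jointly predict the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip single-token inputs that have no neighbors
            pairs.append((context, center))
    return pairs

tokens = ["deep", "model", "classify", "document"]
print(skipgram_pairs(tokens, window=1))
print(cbow_pairs(tokens, window=1))
```

In both schemes the "neighbor" words are simply those that fall inside the window around a given position; only the direction of prediction changes.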
Similar Theses
- Hakem atama otomasyonu için bir karar destek sistemi: Doğal dil işleme ve veri-güdümlü optimizasyon ile bütünleşik bir yaklaşım
A decision support system for reviewer assignment automation: An integrated approach with natural language processing and data-driven optimization
MELTEM AKSOY
Doctorate
Turkish
2023
Computer Engineering and Computer Science and Control, Istanbul Technical University, Department of Industrial Engineering
Assoc. Prof. Dr. SEDA YANIK ÖZBAY
Prof. Dr. MEHMET FATİH AMASYALI
- Türkçe eşgönderge çözümlemesi
Turkish coreference resolution
TUĞBA PAMAY
Master's
Turkish
2018
Computer Engineering and Computer Science and Control, Istanbul Technical University, Department of Computer Engineering
Asst. Prof. Dr. GÜLŞEN ERYİĞİT
- Gizli dirichlet ayrımı ve Word2vec yöntemlerinin birleşimi ile özgün bir metin temsil modeli geliştirilmesi
Combining latent dirichlet allocation and Word2vec for a novel document representation model
HALİL İBRAHİM ÇELENLİ
Master's
Turkish
2020
Computer Engineering and Computer Science and Control, Kocaeli University, Department of Computer Engineering
Assoc. Prof. Dr. SEVİNÇ İLHAN OMURCA
Assoc. Prof. Dr. MURAT CAN GANİZ
- Text classification via word embeddings: An application for Turkish music mood detection
Kelime temsilleri yoluyla metin sınıflaması: Türkçe müziklerde duygu tespiti uygulaması
BARIŞ ÇİMEN
Master's
English
2017
Computer Engineering and Computer Science and Control, Boğaziçi University, Department of Management Information Systems
Asst. Prof. Dr. AHMET ONUR DURAHİM
- Deep learning methods with pre-trained word embeddings and pre-trained transformers for extreme multi label text classification
Çoklu etiket sınıflandırması için önceden eğitilmiş kelime vektörleri ve önceden eğitilmiş transformer modelleri ile derin öğrenme yöntemleri
NECDET EREN ERCİYES
Master's
English
2022
Computer Engineering and Computer Science and Control, Çankaya University, Department of Computer Engineering
Asst. Prof. Dr. ABDÜL KADİR GÖRÜR