Makine öğrenmesi teknikleri ile sosyal medya kullanımı üzerine bir duygu analizi çalışması

A study on sentiment analysis on social media using machine learning techiques

PDF İndir

Tez No: 620849
Yazar: MOHAMED GUMA IBRAHIM BODEA
Danışmanlar: DR. ÖĞR. ÜYESİ İSMAİL YILDIZ
Tez Türü: Yüksek Lisans
Konular: Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol, Computer Engineering and Computer Science and Control
Anahtar Kelimeler: Belirtilmemiş.
Yıl: 2020
Dil: Türkçe
Üniversite: Kastamonu Üniversitesi
Enstitü: Fen Bilimleri Enstitüsü
Ana Bilim Dalı: Malzeme Bilimi ve Mühendisliği Ana Bilim Dalı
Bilim Dalı: Belirtilmemiş.
Sayfa Sayısı: 89

Özet

Son yıllarda farklı platformlarda insanlar tarafından yazılan metinlerin yaygınlaşması ve özellikle erişimin de artması nedeniyle, söz konusu metinleri analiz etmek için makine öğrenmesi (İng. machine learning) tekniklerinin kullanılması belirgin bir ilgiye mazhar olmaktadır. Bu metinler insanlar tarafından yazıldığı için, doğru bilginin elde edilmesi, Doğal Dil İşleme (NLP) olarak bilinen yoğun bir işlem süreci gerektirir. Burada kullanılan tekniklerin karşılaşacağı başlıca zorluk, bu metinlerde bulunan çok fazla miktardaki bilgi ve kullanılan kelimeler gibi öznitelikler ve çıkarımı yapılmak istenen bilgi arasındaki karmaşık ilişkilerdir. Bu bağlamda, bilgi çıkarımı üzerinde hiç etkisi olmayan veya olumsuz etkisi olan kelimelerin ihmal edilmesi, çok boyutluluğu azaltarak ve bilgi sunumunun verimliliğini artırarak NLP tekniklerinin performansını önemli ölçüde artırabilir.Bu çalışmada, kelimelerin sınıflandırıcıların performansı üzerindeki etkisi hakkında elde edilen bilgileri temsil eden vektörleri ve aynı kelimelerin duygusal anlamını kullanan yeni bir öznitelik belirleme yöntemi önerilmektedir. Önerilen yöntemde, takviyeli öğrenim yoluyla ve veri kümesindeki her bir kelimeyi kaldırmanın etkisini izlemeye dayalı olarak eğitilen yapay bir sinir ağı kullanılmaktadır. Bu kelimeleri temsil eden vektörleri elde etmek için kelime kalıplama (İng. word embedding) kullanılır, bu sayede; bir kelime eğitim veri kümesinde yer almasa dahi, kendisi için üretilen vektörün değerlerine ve eğitim sırasında kullanılan, anlamca bu kelimeye en benzer kelimelere bağlı olarak sıralaması (İng. rank) tahmin edilebilir. Dolayısıyla, ne bütüncedeki herhangi bir kelime için, ne de bütünceye daha sonra eklenebilecek herhangi bir yeni kelime için karmaşık istatistiksel hesaplamalara gerek kalmaz. Yapılan değerlendirme sonucunda, önerilen yöntemin eğitim kümesinde yer almayan her kelimenin sıra veya derecesini % 94.61 doğrulukla hesap etme yeteneği olduğu görülmüştür. Ayrıca, bahsedilen sıra ve derecelere dayalı özellik seçiminin; Destek Vektör Makinesi (SVM), Naïve Bayes (NB) ve Rastgele Orman (RF) gibi metini temsil etmek için sayı vektörlerini kullanan ve Evrişimli Sinir Ağı (CNN), Uzun-Kısa Süreli Bellek (LSTM) ve Geçitli Tekrarlanan Birim (GRU) gibi kelime kalıplamaya dayanan farklı sınıflama türlerinin performansını arttırdığı görülmüştür. Ayrıca, GRU sınıflandırıcı, %95.54 doğrulukla, literatürde yer alan diğer sınıflandırıcılara ve en gelişmiş yöntemlere kıyasla en yüksek performansı vermiştir.

Özet (Çeviri)

In recent years, the use of machine learning techniques to analyze texts written by humans is attracting significant attention, according to the wide availability of these texts and their ease of access. As these texts are written by humans, the extraction of accurate knowledge requires intensive processing, known as Natural Language Processing (NLP). The main challenge that these techniques face is the enormous amount of information available in these texts and the complex relations among the features, i.e. words, and the knowledge required to be extracted. Accordingly, eliminating the words that has negative or no influence on the knowledge extraction can significantly improve the performance of NLP techniques, by reducing dimensionality and improving the efficiency of knowledge representation. In this study, we propose a new feature selection technique that uses vectors that represent the sentimental meaning of words and knowledge extracted about the influence of words on the performance of the classifiers. The proposed method uses an artificial neural network that is trained using reinforcement learning by monitoring the influence of removing each word in the training dataset. Word embedding is used to provide the vectors that represent these words, so that, even if a word is not included in the training, its rank can be predicted by the proposed method depending on the values of the vector generated for it and the knowledge about the most similar words that are considered during the training. Accordingly, no complex statistical computations are required for each word in the corpus, as well as any new words that can be added to the corpus in the future. The evaluation of the proposed method has shown its ability to predict the rank of each word that is not included in the training with 94.61% accuracy. Moreover, feature selection based on these ranks has been able to improve the performance of different types of classifiers, such as the Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF), which use count vectors to represent the text, as well as the Convolutional Neural Network (CNN), Long- Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) classifiers, which rely on the word embedding vectors for the classification. Moreover, the GRU classifier has been able to achieve the highest performance, with 95.54% accuracy, compared to the other classifiers and state-of-the-art methods in the literature.

Benzer Tezler

Tez No
964704
Sosyal mühendislikte komplo tabanlı içeriklerin yapay zekâ ile analizi
Analysis of conspiracy-based content in social engineering with artificial intelligence
EMEL KOÇYİĞİT
Yüksek Lisans
Türkçe
2024
Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol Sakarya Üniversitesi
Bilişim Sistemleri Mühendisliği Ana Bilim Dalı
DR. ÖĞR. ÜYESİ FATİH ÇALLI
Tez No
580093
Cryptocurrency price prediction by using social media data
Makine öğrenmesi teknikleri kullanılarak sosyal medya verileri ile kripto para fiyat tahmini
ÖZLEM GÜL PAMUK
Yüksek Lisans
İngilizce
2019
Bilim ve Teknoloji İstanbul Teknik Üniversitesi
Bilişim Uygulamaları Ana Bilim Dalı
DOÇ. DR. SEFER BADAY
Tez No
682241
Sentiment analysis of twitter texts using machine learning algorithms
Makine öğrenme algoritmaları kullanılan twitter metinlerinin duygu analizi
HAWAR SAMEEN ALI AL-BARZENJI
Yüksek Lisans
İngilizce
2021
Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol Sakarya Üniversitesi
Bilgisayar ve Bilişim Mühendisliği Ana Bilim Dalı
DR. ÖĞR. ÜYESİ VEYSEL HARUN ŞAHİN
Tez No
849805
Emotion recognition using deep learning focusing on the hand and facial expressions
El ve yüz ifadelerine odaklanan derin oğrenmeyi kullanarak duygu tanıma
HASANAIN JAWAD RADEEF
Yüksek Lisans
İngilizce
2024
Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol Ankara Üniversitesi
Bilgisayar Mühendisliği Ana Bilim Dalı
YRD. DOÇ. DR. YILMAZ AR
Tez No
389371
Türkçe metinlerde duygu analizi
Sentiment analysis in Turkish texts
CUMALİ TÜRKMENOĞLU
Yüksek Lisans
Türkçe
2015
Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol İstanbul Teknik Üniversitesi
Bilgisayar Mühendisliği Ana Bilim Dalı
YRD. DOÇ. DR. AHMET CÜNEYD TANTUĞ

Geri Dön