Veri madenciliği yöntemlerini kullanarak sosyal medya reytingleri ile geleneksel tv reytingleri arasındaki ilişkiyi bulmak

Finding the relationship between social media ratings and traditional TV ratings using data mining methods


Thesis No: 952987
Author: MEHMET ÖZKAR
Advisors: Asst. Prof. Dr. MEHMET TAHİR SANDIKKAYA
Thesis Type: Master's
Subjects: Computer Engineering and Computer Science and Control
Keywords: Not specified.
Year: 2022
Language: Turkish
University: İstanbul Teknik Üniversitesi
Institute: Lisansüstü Eğitim Enstitüsü (Graduate School)
Department: Computer Engineering
Division: Computer Engineering
Page Count: 61

Abstract

The rating value indicates how many times a program has been watched and usually determines whether the program will continue to be shown on TV. Today, beyond traditional methods, these values can also be calculated from platforms such as Twitter and the web. This thesis attempts an application that has not previously been tried in the literature. It seeks to determine whether there is a relationship between traditional TV rating measurements, Twitter ratings and web viewing ratings and, if so, how these values can be obtained. These determinations are made using data mining methods. The series from which data are collected are those for which a Twitter rating is available: Çukur, Diriliş Ertuğrul, Mucize Doktor and Hercai. Because no ready-made dataset exists, all data were collected manually from different platforms (Twitter, YouTube, Facebook, web). The attributes that make up the dataset are Twitter rating, number of unique Twitter users, number of web views, TV rating and TV share. Pairwise, three-way and four-way correlations among these attributes are examined separately. Based on the high correlation value found, a model is built in which Twitter rating, number of unique Twitter users, web views and TV share are the input variables and TV rating is the output variable. A program is designed to predict the rating from the correlation calculations and their results. Python is chosen as the programming language for the application, and multiple linear regression as the algorithm. The reason is that, among the many algorithms tried, it is more widespread and better known than the SMOreg algorithm, which also gives good results. The main output of the application is the estimated AB rating value.
When the Twitter rating value, the number of Twitter users, the number of web views and the AB share value obtained for a series are entered in the corresponding text boxes and the estimated AB rating button is clicked, the estimated AB rating value produced by the application appears on the corresponding button. In tests carried out with k-fold cross-validation on training and test data drawn from the existing dataset, the mean absolute error, one of the most important measures for evaluating the program's results, is found to be below one. Tests of the application with external data not previously used for training or testing also give successful results. A decision taken by TİAK a few years ago prohibits the publication of ratings on publicly accessible platforms. Because ratings, the most important data of the TV world, are now hard to obtain, the thesis also attempts to predict the AB rating without the AB share data. The results of these experiments, which use four different algorithms, are shown in a table. The lowest mean absolute error and the highest R-squared values are obtained with the RandomForest algorithm and the equation in which the Twitter user, Twitter rating and web viewing attributes are the inputs and the AB rating attribute is the output. Predicting the AB rating in this way is expected to be very useful for TV channels, advertisers and series producers.
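
The evaluation described in this abstract (multiple linear regression scored by k-fold cross-validation and mean absolute error) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the thesis program: the function names, the interleaved fold assignment, the two-column input and every number in `rows` are assumptions made for the example, and the target is generated without noise so the error should come out close to zero.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(rows, targets):
    """Least-squares fit via the normal equations; returns [intercept, coefs...]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Evaluate the fitted linear model on one input row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def k_fold_mae(rows, targets, k):
    """Mean absolute error over the held-out samples of k interleaved folds."""
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, len(rows), k))  # every k-th sample is held out
        train = [i for i in range(len(rows)) if i not in test_idx]
        coefs = fit_linear([rows[i] for i in train], [targets[i] for i in train])
        errors += [abs(targets[i] - predict(coefs, rows[i])) for i in sorted(test_idx)]
    return sum(errors) / len(errors)


# Hypothetical inputs (two columns for brevity), not the thesis dataset.
rows = [(1.0, 4.0), (2.0, 1.0), (3.0, 5.0), (4.0, 2.0), (5.0, 6.0),
        (6.0, 3.0), (7.0, 7.0), (8.0, 1.5), (9.0, 4.5)]
# Noise-free synthetic target, so the cross-validated error should be near zero.
targets = [1.0 + 0.5 * a + 0.25 * b for a, b in rows]

mae = k_fold_mae(rows, targets, k=3)
print(f"cross-validated MAE: {mae:.6f}")
```

With real, noisy data the same loop would give a non-zero error; in practice a library such as scikit-learn would normally replace the hand-written solver.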

Abstract (Translation)

Data mining is the process of extracting information from large amounts of data. Its application areas are very wide: it has been applied successfully in almost every field, from medicine to finance, from bioinformatics to business intelligence, and from education to telecommunications. This thesis uses regression analysis, one of the data mining methods, to compare traditional rating systems with rating systems based on social media and to investigate whether there is a relationship between them. First, the attributes to be used are determined. Then the objects that will form the data are selected. After the dataset is created, it is preprocessed. Input and output attributes are determined according to the correlations computed after the preprocessing step. After the algorithm to be used in the program is selected, the results of the program are tested. In the past, television was only a device for watching. Today it is possible not only to watch television but also to interact with it, and a television has many other functions, such as connecting to the internet and playing games. Social media is an internet-based form of communication. With its widespread use, the concepts of first screen and second screen have emerged. The first screen is one that we only watch and listen to, such as television and radio; the second screen is one on which we not only watch but also actively participate as users. The rating value indicates how many views a program has and often determines whether the program continues to be shown on TV. Ratings are traditionally measured by peoplemeter devices given to viewers. Apart from traditional methods, these values can be calculated from various platforms such as Twitter and the web. Various companies measure ratings from Twitter, and web ratings can be obtained from YouTube or from the websites of TV channels.
The thesis first uses the Twitter rating. Both Twitter rating values and Twitter unique user counts are included as attributes. Another attribute is the web rating, which is obtained from various websites. The last attributes are the AB rating and share, which are published daily on various websites. The TV programs used in the study are series for which Twitter rating values, web viewing figures and traditional rating values are all easily accessible. Because of their high ratings, the series Çukur, Diriliş Ertuğrul, Mucize Doktor and Hercai are used. Some of the data need preprocessing before they can be used. Normalization, which belongs to the data transformation step of preprocessing, is applied to these data. Twitter unique users are normalized as value/1000. The number of web views is normalized as value/1000000 for Diriliş Ertuğrul, Mucize Doktor and Hercai. For Çukur, whose episodes are not published on YouTube and whose view counts are taken from the web page of the relevant TV channel, the value is normalized as value/200000. The purpose of these decimal normalization operations is to bring attributes with different ranges onto a scale comparable with the traditional AB rating and share values. After the data are obtained and preprocessed, the correlations between the features are examined. According to the pairwise correlation results, there is a very high positive relationship between AB rating and AB share, a high positive relationship between Twitter rating and the number of Twitter users, a medium-to-high negative relationship between AB rating and the number of web views, and a medium-to-high negative relationship between AB share and the number of web views. Only very weak positive or negative relationships are found among the other features.
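
The normalization and pairwise correlation steps described above can be sketched as follows. The divisors are the ones stated in the abstract; the function names, the dictionary layout and all sample numbers are hypothetical and serve only as an illustration, not as the thesis code or data.

```python
from math import sqrt

# Divisors stated in the abstract; the data layout here is an assumption.
TWITTER_USER_DIVISOR = 1_000
WEB_VIEW_DIVISORS = {
    "Çukur": 200_000,  # views taken from the TV channel's web page
    "Diriliş Ertuğrul": 1_000_000,
    "Mucize Doktor": 1_000_000,
    "Hercai": 1_000_000,
}


def normalize_episode(series, twitter_users, web_views):
    """Scale raw counts into a range comparable with rating and share values."""
    return {
        "twitter_users": twitter_users / TWITTER_USER_DIVISOR,
        "web_views": web_views / WEB_VIEW_DIVISORS[series],
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical raw counts, for illustration only (not the thesis data).
row = normalize_episode("Çukur", twitter_users=12_500, web_views=400_000)
print(row)  # {'twitter_users': 12.5, 'web_views': 2.0}

# Hypothetical columns: a perfectly linear pair gives a coefficient of 1.
ratings = [6.0, 7.0, 8.0, 9.0]
shares = [12.0, 14.0, 16.0, 18.0]
print(round(pearson(ratings, shares), 6))  # 1.0
```

Applying `pearson` to every pair of attribute columns would produce the kind of pairwise correlation table the abstract summarizes.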
After pairwise, three-way and four-way correlation analysis, and because of the high correlation value (0.89), a model is chosen in which AB share, Twitter users, Twitter rating and web views are the input variables and AB rating is the output variable. The application uses curve fitting (regression), one of the predictive models among data mining methods. High correlation and low mean absolute error values are achieved with both the multiple linear regression and SMOreg algorithms. The OLS regression results table produced with the multiple linear regression algorithm has two important components. The first is the p-value. The p-values for Twitter rating, web views and AB share are 0.013, 0.012 and 0.000 respectively. All are below the 0.05 threshold, so the relationship between each of these input attributes and the output variable, AB rating, is statistically significant. The second is the adjusted R-squared value of 0.805, which also shows that the model fits considerably well. Python is chosen as the programming language for the application, and multiple linear regression as the algorithm, because it is a more common and better-known algorithm. The basic output of the application is the estimated AB rating value. The Twitter rating value, the number of Twitter users, the number of web views and the AB share value obtained for a series are entered in the relevant boxes and the estimated AB rating button is clicked; the estimated AB rating produced by the application then appears in the relevant box. The application is tested with external values that are not part of the training and test sets. In the examination with these external test data, the actual rating of the 78th episode of Çukur was 8.36, while the program's result was 8.84.
This is a difference of 0.48 between the actual and predicted values, which is within the mean absolute error value of 0.9528. When the other test values are examined in the same way, their differences are also within the mean absolute error. The closest prediction is for an episode of Mucize Doktor, with a difference of 0.16. The furthest is for an episode of the TV series Çocuk, with a difference of 0.95. The mean absolute error on the test set is 0.629. In the past, ratings were published on every platform, but a decision by TİAK a few years ago banned this; TİAK now shares rating data only with its subscribers. Because ratings, the most important data of the TV world, are difficult to obtain, the thesis also tries to estimate the AB rating without the AB share data. The results of these experiments with four different algorithms are shown in a table. The lowest mean absolute error and the highest R² values are reached with the RandomForest algorithm and the equation in which the Twitter user, Twitter rating and web viewing attributes are the inputs and the AB rating attribute is the output. Predicting the AB rating in this way is thought to be very beneficial for TV channels, advertisers and TV series producers. In the application, the dataset was divided into training and test sets using the k-fold cross-validation method, and the results were then tested with external test data. Future studies would benefit from evaluation on a larger test dataset. The dataset used here can be expanded in similar future studies, and the new results reinterpreted.
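
The external-test check quoted in this abstract is simple arithmetic and can be reproduced as a small sketch. The two rating values and the 0.9528 figure come from the abstract; the helper name `mean_absolute_error` and the variable names are assumptions for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


REPORTED_MAE = 0.9528  # mean absolute error value quoted in the abstract

actual_rating = 8.36     # Çukur, episode 78 (from the abstract)
predicted_rating = 8.84  # program output (from the abstract)

error = abs(actual_rating - predicted_rating)
print(round(error, 2))       # 0.48
print(error < REPORTED_MAE)  # True
```

Passing the full lists of actual and predicted external-test ratings to `mean_absolute_error` would give the test-set figure; those lists are not reproduced in the abstract, so they are not filled in here.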

