Detecting Flemish innovative companies using web scraping
Title translation not available.
- Thesis No: 770319
- Advisors: PROF. JAN DE SPIEGELEER
- Thesis Type: Master's
- Subjects: Statistics
- Keywords: Not specified.
- Year: 2021
- Language: English
- University: Katholieke Universiteit Leuven (Catholic University of Leuven)
- Institute: Foreign Institute
- Department: Not specified.
- Division: Not specified.
- Number of Pages: 57
Abstract
Advances in computational technology enable the application of modern methodologies to big and organic data. Producers of official statistics are experimenting with these new tools in order to construct high-quality and timely statistics. However, the dynamics of new information sources bring new challenges to the process. We therefore investigate a particular application of modern techniques in which we extract information from business websites. The traditional survey method for estimating innovation within a given region is the Community Innovation Survey (CIS). In contrast to this traditional approach, we use web scraping, text mining, machine learning and deep learning algorithms. The main focus of the study is to investigate the reproducibility of applications performed in several other EU member states. In addition, we explore state-of-the-art techniques in order to improve on the published results. Lastly, although the application concentrates on CIS 2019, we identify various aspects of scaling it to all Flemish businesses. The empirical results indicate that business websites hold valuable information that can be used to classify businesses as innovative or not. Compared to the baseline study by Statistics Netherlands, we obtained similar prediction results [9] [10]. The traditional pipeline, from text processing to binary classifiers, resulted in 0.90 accuracy and a 0.80 F1 score. The transformer models, a deep learning technique, achieved 0.91 accuracy and a 0.90 F1 score. The substantial rise in the F1 score stemmed from improved recall for non-innovative companies. Despite the gain in evaluation metrics, the deep learning techniques offer considerably less interpretability. In the context of official statistics, the accountability and transparency of the estimates are essential. Hence, a sub-optimal logistic regression model may arguably be preferred due to its white-box nature.
In the end, this thesis aims to present a complementary methodology for estimating official innovation statistics in Flanders. In contrast to the traditional survey approach (CIS), the modern big data application offers timely dissemination of results, cost reduction and no response burden on businesses. On the other hand, the supervised learning models that we evaluated can only be built from the labeled data of CIS 2019. Moreover, a census-like application needs further theoretical work to address model bias and model degradation. Consequently, this thesis intends to lead the way in developing high-quality and robust statistics for Statistics Flanders by setting the baseline for a scalable approach.
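The abstract reports that accuracy barely moved (0.90 to 0.91) while the F1 score rose from 0.80 to 0.90, and attributes the rise to better recall on non-innovative companies. The sketch below illustrates how that pattern can arise. It assumes the reported F1 is a macro average over the two classes, which the abstract does not state, and the confusion-matrix counts are invented purely for illustration; they are not the thesis results. The function names are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 score for one class from its true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy_and_macro_f1(tp, fp, fn, tn):
    """Counts are given with 'innovative' as the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1_innovative = f1(tp, fp, fn)
    # For the 'non-innovative' class the roles swap: its hits are tn,
    # its false positives are fn, and its false negatives are fp.
    f1_non_innovative = f1(tn, fn, fp)
    return accuracy, (f1_innovative + f1_non_innovative) / 2


# Invented counts: 80 innovative and 20 non-innovative companies (NOT thesis data).
# Model A finds only 10 of the 20 non-innovative companies (recall 0.5).
acc_a, macro_f1_a = accuracy_and_macro_f1(tp=78, fp=10, fn=2, tn=10)
# Model B finds 16 of the 20 non-innovative companies (recall 0.8).
acc_b, macro_f1_b = accuracy_and_macro_f1(tp=76, fp=4, fn=4, tn=16)

print(f"A: accuracy={acc_a:.2f}, macro F1={macro_f1_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, macro F1={macro_f1_b:.2f}")
```

Because the minority class contributes half of the macro average, a recall gain on a small class shifts F1 much more than it shifts accuracy, which is dominated by the majority class.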
Abstract (Translation)
Abstract translation not available.
Similar Theses
- Bursa Ovasının tarımsal amaç dışı kullanım durumu
The Out of agricultural purpose usage position of Bursa Plain
SEVDA KARAPİRİM
- Diyabetik retinopati hastalığının derin öğrenme ile tespit edilmesi
Detecting diabetic retinopathy by deep learning
ABDÜSSAMED ERCİYAS
Doctorate
Turkish
2022
Computer Engineering Sciences-Computer and Control
Gazi University
Department of Computer Engineering
PROF. DR. NECAATTİN BARIŞÇI
- Detecting and classifying fabric defects with computer-vision algorithms
Bilgisayar-görme algoritmaları ile kumaş hatalarının tespiti ve sınıflandırılması
FATMA GÜNSELİ ÇIKLAÇANDIR
Doctorate
English
2022
Computer Engineering Sciences-Computer and Control
Dokuz Eylül University
Department of Computer Engineering
ASSOC. PROF. DR. SEMİH UTKU
- Erken Neolitik Dönem'de bireysel ve toplumsal kimliklerin belirlenmesi: Aşıklı Höyük'te toplumsal cinsiyet, yaş ve kesişen kimlikler
Detecting individual and collective identities during the Early Neolithic Period: Gender, age, and the intersectionality of identities at Aşıklı Höyük
SERA YELÖZER KILIÇ
Doctorate
Turkish
2022
Archaeology
Istanbul University
Department of Prehistoric Archaeology
PROF. DR. MİHRİBAN ÖZBAŞARAN