Detecting Flemish innovative companies using web scraping
No title translation available.
- Thesis No: 770319
- Advisors: PROF. JAN DE SPIEGELEER
- Thesis Type: Master's
- Subjects: Statistics
- Keywords: Not specified.
- Year: 2021
- Language: English
- University: Katholieke Universiteit Leuven (Catholic University of Leuven)
- Institute: Foreign Institute
- Department: Not specified.
- Division: Not specified.
- Number of Pages: 57
Abstract
Advances in computational technology enable the application of modern methodologies to big and organic data. Producers of official statistics are experimenting with these new tools in order to construct high-quality and timely statistics. Nevertheless, the dynamics of new information sources bring new challenges to the process. We therefore investigate a particular application of modern techniques in which we extract information from business websites. The traditional survey method for estimating innovation within a given region is the Community Innovation Survey (CIS). In contrast to this traditional approach, we use web scraping, text mining, machine learning and deep learning algorithms. The main focus of the study is to investigate the reproducibility of applications performed in several other EU states. In addition, we explore state-of-the-art techniques in order to improve on the published results. Lastly, although the application concentrates on CIS 2019, we identify various aspects of scaling it to all Flemish businesses. The empirical results indicate that business websites hold valuable information that can be used to classify businesses as innovative or not. We obtained prediction results similar to those of the baseline study by Statistics Netherlands [9] [10]. The traditional pipeline, from text processing to binary classifiers, achieved 0.90 accuracy and a 0.80 F1 score. The transformer-based deep learning models, on the other hand, achieved 0.91 accuracy and a 0.90 F1 score. The significant rise in the F1 score stemmed from improved recall for non-innovative companies. Despite the improvement in evaluation metrics, the deep learning techniques offer considerably less interpretability. In the context of official statistics, the accountability and transparency of the estimates are essential. Hence, a sub-optimal logistic regression model may arguably be preferred due to its white-box nature.
Overall, this thesis aims to present a complementary methodology for estimating official innovation statistics in Flanders. In contrast to the traditional survey approach (CIS), the modern big data application offers timely dissemination of results, reduced costs and no response burden on businesses. On the other hand, the supervised learning models that we evaluated could only be built from the labeled data of CIS 2019. Moreover, a census-like application needs further theoretical work to address model bias and model degradation. Consequently, this thesis intends to lead the way in developing high-quality and robust statistics for Statistics Flanders by setting the baseline for a scalable approach.
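The abstract's remark that the F1 score rose mainly through better recall on non-innovative companies, while accuracy barely moved, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the confusion counts are invented and are not results from the thesis, the names `Confusion`, `summarize`, `baseline` and `improved` are hypothetical, and the F1 score is taken to be macro-averaged over the two classes, which the abstract does not state.

```python
# Hypothetical illustration: all counts are invented, not taken from the thesis.
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts with 'innovative' as the positive class."""
    tp: int  # innovative, predicted innovative
    fn: int  # innovative, predicted non-innovative
    fp: int  # non-innovative, predicted innovative
    tn: int  # non-innovative, predicted non-innovative


def f1(hits: int, misses: int, false_alarms: int) -> float:
    """F1 score of one class from its hits, misses and false alarms."""
    denom = 2 * hits + misses + false_alarms
    return 2 * hits / denom if denom else 0.0


def summarize(c: Confusion) -> dict:
    """Accuracy, recall of the non-innovative class, and macro-averaged F1."""
    total = c.tp + c.fn + c.fp + c.tn
    f1_innovative = f1(c.tp, c.fn, c.fp)
    # For the non-innovative class the roles of the error cells swap.
    f1_non_innovative = f1(c.tn, c.fp, c.fn)
    negatives = c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "recall_non_innovative": c.tn / negatives if negatives else 0.0,
        "macro_f1": (f1_innovative + f1_non_innovative) / 2,
    }


# Assumed sample: 800 innovative and 200 non-innovative companies.
baseline = summarize(Confusion(tp=780, fn=20, fp=90, tn=110))
improved = summarize(Confusion(tp=760, fn=40, fp=50, tn=150))

for name, scores in (("baseline", baseline), ("improved", improved)):
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

In this made-up sample the two models differ by only two percentage points of accuracy, yet the macro-averaged F1 differs far more, because the second model recovers more of the minority (non-innovative) class.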
Abstract (Translation)
No abstract translation available.
Similar Theses
- E-80 uluslararası karayolunun Gürbulak Hudut Kapısı – Erzurum bölümünde toprakta trafik kaynaklı ağır metal kirliliğinin araştırılması
Investigation of traffic-related heavy metal pollution in soil along the Gürbulak Border Gate – Erzurum section of the E-80 international highway
ERGÜN TATAR
Master's
Turkish
2014
Physics and Physics Engineering
Ağrı İbrahim Çeçen University
Department of Physics
ASSOC. PROF. DR. İBRAHİM HAN
- Staphylococcus aureus suşlarında metisilin direncinin klasik yöntemler ve moleküler yöntemlerle araştırılması
Detection of methicillin resistance in Staphylococcus aureus isolates by classical and molecular methods
HAYRUNİSA HANCI
Doctorate
Turkish
2015
Microbiology
Atatürk University
Department of Medical Microbiology
PROF. DR. AHMET AYYILDIZ
- Design of mixed-mode building blocks for an electronic stethoscope IC
ELEKTRONİK STETESKOP İÇİN KARMA MODLU YAPI TAŞLARININ TASARIMI
ALAATTİN ALPER KÜRK
Master's
English
2024
Electrical and Electronics Engineering
Boğaziçi University
Department of Electronics Engineering
PROF. DR. ZEYNEP YASEMİN KAHYA
- Yapay görme ile sürücü yorgunluk durumunun tespit edilmesi
Detecting driver fatigue with artificial vision
ALİ AKİN
Master's
Turkish
2024
Computer Engineering and Computer Science and Control
Gebze Technical University
Department of Computer Engineering
ASSOC. PROF. DR. HABİL KALKAN