Detecting Flemish innovative companies using web scraping

Title translation not available.

  1. Thesis No: 770319
  2. Author: NUSRET İPEK
  3. Advisors: PROF. JAN DE SPIEGELEER
  4. Thesis Type: Master's
  5. Subjects: Statistics
  6. Keywords: Not specified.
  7. Year: 2021
  8. Language: English
  9. University: Katholieke Universiteit Leuven (Catholic University of Leuven)
  10. Institute: Foreign Institute
  11. Department: Not specified.
  12. Division: Not specified.
  13. Number of Pages: 57

Abstract

Advances in computational technology enable the application of modern methodologies to big and organic data. Producers of official statistics are experimenting with these new tools in order to construct high-quality, timely statistics. Nevertheless, the dynamics of new information sources bring new challenges to the process. We therefore investigate a particular application of modern techniques in which we extract information from business websites. The traditional survey method for estimating innovation within a given region is the Community Innovation Survey (CIS). In contrast to this traditional approach, we use web scraping, text mining, machine learning and deep learning algorithms. The main focus of the study is to investigate the reproducibility of applications performed in several other EU member states. In addition, we explore state-of-the-art techniques in order to improve on the published results. Lastly, although the application concentrates on CIS 2019, we identify various aspects of scaling it to all Flemish businesses. The empirical results indicate that business websites hold valuable information that can be used to classify businesses as innovative or not. Compared to the baseline study by Statistics Netherlands, we obtained similar prediction results [9] [10]. The traditional pipeline from text processing to binary classifiers achieved 0.90 accuracy and a 0.80 F1 score, whereas the transformer models, a deep learning technique, achieved 0.91 accuracy and a 0.90 F1 score. The substantial rise in the F1 score stemmed from improved recall for non-innovative companies. Despite the gain in evaluation metrics, the deep learning techniques offer considerably less interpretability. In the context of official statistics, the accountability and transparency of the estimates are essential. Hence, a sub-optimal logistic regression model may arguably be preferred because of its white-box nature.
In the end, this thesis aims to present a complementary methodology for estimating official innovation statistics in Flanders. In contrast to the traditional survey approach (CIS), the modern big data application offers timely dissemination of results, cost reduction and no response burden on businesses. On the other hand, the supervised learning models that we evaluated depend on labeled data from CIS 2019 for their training. Moreover, a census-like application needs further theoretical work to address model bias and model degradation. Consequently, this thesis intends to lead the way in developing high-quality, robust statistics for Statistics Flanders by setting the baseline for a scalable approach.
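
The abstract reports nearly equal accuracy for the two model families but a clearly higher F1 score once recall for non-innovative companies improves. The following is a minimal sketch of how that can happen, not the thesis's evaluation code. It assumes a macro-averaged F1 (the abstract does not say which averaging was used) and uses invented toy labels in which 1 means innovative and 0 means non-innovative; the names `y_true`, `pred_a` and `pred_b` are hypothetical.

```python
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) counts, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """F1 = 2*tp / (2*tp + fp + fn), defined as 0.0 when the denominator is zero."""
    tp, fp, fn = confusion(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Invented toy data: 16 innovative (1) and 4 non-innovative (0) companies.
y_true = [1] * 16 + [0] * 4
# Hypothetical model A: finds every innovative company, but only 2 of 4 non-innovative ones.
pred_a = [1] * 16 + [0, 0, 1, 1]
# Hypothetical model B: misses 2 innovative companies, but finds all 4 non-innovative ones.
pred_b = [1] * 14 + [0, 0] + [0] * 4

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(f"model {name}: accuracy={accuracy(y_true, pred):.2f}, macro F1={macro_f1(y_true, pred):.3f}")
```

In this toy setup both models score 0.90 accuracy, while the macro F1 rises from about 0.80 to about 0.87 purely because recall on the smaller class goes from 2 of 4 to 4 of 4. The figures are illustrative and are not the thesis's results.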

Abstract (Translation)

Abstract translation not available.

Similar Theses

  1. Resim sanatında iç mekânda natürmort

    In painting, in the interior: Still life

    SERPİL AKBAŞ

    Master's

    Turkish

    2023

    Fine Arts, Şırnak University

    Department of Painting

    PROF. DR. İSMAİL ATEŞ

  2. Bursa Ovasının tarımsal amaç dışı kullanım durumu

    The Out of agricultural purpose usage position of Bursa Plain

    SEVDA KARAPİRİM

    Master's

    Turkish

    1998

    Agriculture, Atatürk University

    Department of Soil Science

    PROF. DR. KORAY SÖNMEZ

  3. Diyabetik retinopati hastalığının derin öğrenme ile tespit edilmesi

    Detecting diabetic retinopathy by deep learning

    ABDÜSSAMED ERCİYAS

    Doctorate

    Turkish

    2022

    Computer Engineering and Computer Science and Control, Gazi University

    Department of Computer Engineering

    PROF. DR. NECAATTİN BARIŞÇI

  4. Detecting and classifying fabric defects with computer-vision algorithms

    Bilgisayar-görme algoritmaları ile kumaş hatalarının tespiti ve sınıflandırılması

    FATMA GÜNSELİ ÇIKLAÇANDIR

    Doctorate

    English

    2022

    Computer Engineering and Computer Science and Control, Dokuz Eylül University

    Department of Computer Engineering

    ASSOC. PROF. DR. SEMİH UTKU

  5. Erken Neolitik Dönem'de bireysel ve toplumsal kimliklerin belirlenmesi: Aşıklı Höyük'te toplumsal cinsiyet, yaş ve kesişen kimlikler

    Detecting individual and collective identities during the Early Neolithic Period: Gender, age, and the intersectionality of identities at Aşıklı Höyük

    SERA YELÖZER KILIÇ

    Doctorate

    Turkish

    2022

    Archaeology, Istanbul University

    Department of Prehistoric Archaeology

    PROF. DR. MİHRİBAN ÖZBAŞARAN