Detecting Flemish innovative companies using web scraping

Title translation not available.

  1. Thesis No: 770319
  2. Author: NUSRET İPEK
  3. Advisors: PROF. JAN DE SPİEGELEER
  4. Thesis Type: Master's
  5. Subjects: Statistics
  6. Keywords: Not specified.
  7. Year: 2021
  8. Language: English
  9. University: Katholieke Universiteit Leuven (Catholic University of Leuven)
  10. Institute: Foreign Institute
  11. Department: Not specified.
  12. Discipline: Not specified.
  13. Number of Pages: 57

Abstract

Advances in computational technology enable the application of modern methodologies to big and organic data. Producers of official statistics are experimenting with these new tools in order to construct high-quality, timely statistics. Nevertheless, the dynamics of new information sources bring new challenges to the process. We therefore investigate a particular application of modern techniques in which we extract information from business websites. The traditional survey method for estimating innovation within a given region is the Community Innovation Survey (CIS). In contrast to this traditional approach, we use web scraping, text mining, machine learning and deep learning algorithms. The main focus of the study is to investigate the reproducibility of applications performed in several other EU states. In addition, we explore state-of-the-art techniques in order to improve on the published results. Lastly, the application concentrates on CIS 2019, while we identify various aspects of scaling it to all Flemish businesses. The empirical results indicate that business websites hold valuable information that can be used to classify businesses as innovative or not. Compared to the baseline study by Statistics Netherlands, we obtained similar prediction results [9] [10]. The traditional pipeline, from text processing to binary classifiers, achieved 0.90 accuracy and a 0.80 F1 score. Transformers, a deep learning technique, achieved 0.91 accuracy and a 0.90 F1 score. The substantial rise in the F1 score stemmed from improved recall for non-innovative companies. Despite the gain in evaluation metrics, the deep learning techniques offer considerably less interpretability. In the context of official statistics, the accountability and transparency of the estimates are essential. Hence, a sub-optimal logistic regression model may arguably be preferred due to its white-box nature.
In the end, this thesis aims to present a complementary methodology for estimating official innovation statistics in Flanders. In contrast to the traditional survey approach (CIS), the modern big data application offers timely dissemination of results, cost reduction and no response burden on businesses. On the other hand, the supervised learning models that we evaluated depend on labeled data from CIS 2019 for their training. Moreover, a census-like application needs further theoretical work to address model bias and model degradation. Consequently, this thesis intends to lead the way in developing high-quality, robust statistics for Statistics Flanders by setting the baseline for a scalable approach.
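
The abstract reports that accuracy barely moved (0.90 to 0.91) while the F1 score rose sharply, and attributes this to better recall for non-innovative companies. The following minimal Python sketch illustrates how that can happen. It rests on assumptions that are not in the abstract: the F1 score is taken to be macro-averaged over the two classes, and the confusion-matrix counts are hypothetical numbers chosen for illustration, not results from the thesis. The function names `f1_score` and `evaluate` are likewise illustrative.

```python
def f1_score(tp, fp, fn):
    """F1 for one class from its true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(tp, fn, fp, tn):
    """Summarise a binary confusion matrix with 'innovative' as the positive class.

    tp: innovative predicted innovative
    fn: innovative predicted non-innovative
    fp: non-innovative predicted innovative
    tn: non-innovative predicted non-innovative
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1_innovative = f1_score(tp, fp, fn)
    # Seen from the non-innovative class, the roles swap:
    # tn are its hits, fn are its false alarms, fp are its misses.
    f1_non_innovative = f1_score(tn, fn, fp)
    recall_non_innovative = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": accuracy,
        "recall_non_innovative": recall_non_innovative,
        "macro_f1": (f1_innovative + f1_non_innovative) / 2,
    }


# Hypothetical counts (NOT from the thesis): 800 innovative, 200 non-innovative firms.
baseline = evaluate(tp=780, fn=20, fp=80, tn=120)
improved = evaluate(tp=750, fn=50, fp=40, tn=160)

for name, result in (("baseline", baseline), ("improved", improved)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these made-up counts, accuracy changes only from 0.90 to 0.91, while recall for the non-innovative class rises from 0.60 to 0.80 and the macro-averaged F1 score rises with it. Accuracy is dominated by the larger class, whereas a macro-averaged F1 gives the smaller class equal weight.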

Abstract (Translation)

Abstract translation not available.

Similar Theses

  1. Resim sanatında iç mekânda natürmort

    In painting, in the interior: Still life

    SERPİL AKBAŞ

    Master's

    Turkish

    2023

    Fine Arts

    Şırnak Üniversitesi

    Department of Painting

    PROF. DR. İSMAİL ATEŞ

  2. E - 80 uluslararası karayolunun Gürbulak Hudut Kapısı – Erzurum bölümünde toprakta trafik kaynaklı ağır metal kirliliğinin araştırılması

    Detecting heavy metal pollution that emitted from traffic in the soils of the E-80 international highway line between Gürbulak Boarder Entrance and Erzurum

    ERGÜN TATAR

    Master's

    Turkish

    2014

    Physics and Physics Engineering

    Ağrı İbrahim Çeçen Üniversitesi

    Department of Physics

    ASSOC. PROF. DR. İBRAHİM HAN

  3. Staphylococcus aureus suşlarında metisilin direncinin klasik yöntemler ve moleküler yöntemlerle araştırılması

    Detecting of methicillin resistance of staphylococcus aureus isolates by classical and molecular methods

    HAYRUNİSA HANCI

    Doctorate

    Turkish

    2015

    Microbiology

    Atatürk Üniversitesi

    Department of Medical Microbiology

    PROF. DR. AHMET AYYILDIZ

  4. Design of mixed-mode building blocks for an electronic stethoscope IC

    ELEKTRONİK STETESKOP İÇİN KARMA MODLU YAPI TAŞLARININ TASARIMI

    ALAATTİN ALPER KÜRK

    Master's

    English

    2024

    Electrical and Electronics Engineering

    Boğaziçi Üniversitesi

    Department of Electronics Engineering

    PROF. DR. ZEYNEP YASEMİN KAHYA

  5. Yapay görme ile sürücü yorgunluk durumunun tespit edilmesi

    Detecting driver fatigue with artificial vision

    ALİ AKİN

    Master's

    Turkish

    2024

    Computer Engineering and Computer Science and Control

    Gebze Teknik Üniversitesi

    Department of Computer Engineering

    ASSOC. PROF. DR. HABİL KALKAN