Active learning based human in the loop deep object detection for scalable data annotation

Ölçeklenebilir veri etiketlenmesi için aktif öğrenme tabanlı insan katılımlı derin nesne tespiti sistemi


Thesis No: 678322
Author: ATABERK ARMAN KAYHAN
Advisor: ASSOC. PROF. DR. NAZIM KEMAL ÜRE
Thesis Type: Master's
Subjects: Computer Engineering and Computer Science and Control
Keywords: Not specified.
Year: 2021
Language: English
University: İstanbul Teknik Üniversitesi
Institute: Lisansüstü Eğitim Enstitüsü (Graduate School)
Department: Aeronautical and Astronautical Engineering
Division: Aeronautical and Astronautical Engineering
Page Count: 104

Abstract

Over roughly the last decade, deep learning models focused on visual data sets have made great progress. This success, particularly for supervised object detection models based on deep neural networks, can largely be attributed to large quantities of annotated data. Annotated, or labelled, data can be defined as data in which every object in a visual data set has been detected, localized in the image, and matched with the appropriate object name. Deep object detection systems can keep improving only when up-to-date annotated data sets take part in model training. However, the annotation process requires a high degree of manual human involvement. As a result, data annotation is a demanding, costly, and slow process. This thesis proposes a system that allows human annotation to be used together with active learning based deep object detection models. Work of this type has frequently been used in the past, particularly for object classification tasks. However, applying such a system to object detection and localization is more complex. What makes the methodology complex is the need to compute uncertainty values for the class and box predictions of every bounding box proposed as containing an object in an image, using various aggregation methods. Before these methods are applied, the knowledge matrix used in machine learning applications should be introduced. This matrix expresses the training status of each data point. The known knowns represent the current state of the machine learning model; the data points about whose predictions the model is confident are located here.
The known unknowns represent the state in which the machine learning model is not confident in its predictions, and they are handled with the uncertainty metrics that form the main focus of this thesis. The unknown knowns represent knowledge transferred to the machine learning application from another machine learning domain, which is another main focus of this thesis. Finally, the unknown unknowns represent the gaps in the machine learning model and concern diversifying the training data and sampling it accordingly. The thesis investigates combining uncertainty sampling, transfer learning between models, and diversity sampling in a single active learning scheme. The proposed methodology includes incremental learning, which targets model training on data sets with a continuous data stream. Current object detection systems run on live, video-based systems in which data is collected continuously. Labelling unlabelled video frames with an active learning based approach therefore keeps these object detection systems up to date. The incremental learning methodology consists of three main parts: a pre-trained object detection model, an uncertainty scoring mechanism, and object detection model training on small batches of data. YOLO was chosen as the pre-trained model because it provides adequate detection accuracy and runs fast. The YOLO model runs on the Darknet architecture and can be used with production-ready platforms such as TensorFlow and PyTorch. Unlabelled raw data is sent to the active learning based object detection model for initial detection, and an uncertainty score is computed for each image during detection. Then, according to this score, the data most useful to the object detection model, that is, the data with the highest uncertainty scores, is distributed to human annotators for labelling.
Finally, the human-labelled data joins the active learning loop as training data. The incremental learning approach raises overall efficiency by letting deep object detection systems select the data points most needed for training performance and filter out those of little value for training. The uncertainty score is computed from bounding box regression together with the class distribution predicted for that box. Several methods for obtaining such a score over the whole image are considered in this work. For classification models, the least confidence, ratio of confidence, entropy, and margin of confidence metrics are examined. Whichever of these metrics is used, the uncertainty scores of all detections are summed to produce a single score for the image. Images are then sampled for distribution according to this score. Another important issue is the use of the uncertainty score on data sets, such as video files, that consist of consecutive frames with similar content. Because images with similar content have similar uncertainty scores, data points with the same characteristics are marked for use in model training. This reduces the generalization ability of the object detection model and leads to poor training performance. Diversity-based sampling methods can be used to prevent this. The relevant chapters of this thesis examine in detail k-means++, which initializes the centroids; core-set, which uses subsets of the data set that have broader representation; and sparse modelling, which combines the uncertainty score with diversity. Because the object detection model is trained in an incremental scheme, choosing the optimal batch size is a critical decision for model performance.
This is because larger batch sizes improve model performance but require more computing power. Since this thesis focuses on real-world applications, computing power and model training time were set as the primary criteria. Different batch sizes were tried during model training, and the most suitable results are detailed in the results chapter of the thesis. The active learning model was tested on Google Cloud. PyTorch was chosen as the software framework for training the deep neural networks. The VOC2012 data set was used throughout the tests. YOLOv3, used as the initial object detection system, was trained on VOC2012 with transfer learning. The batches defined in the active learning scheme are passed to the prediction model so that uncertainty scores can be computed. The images with the highest uncertainty scores are marked and sent to model training, and this loop continues until all batches are consumed. The hypotheses put forward in the thesis were tested for validation in a real model training environment on the VOC2012 benchmark data set. In addition, a standard transfer learning based object detection model was tried before the active learning system, and its results were recorded. The results of the proposed active learning based system were compared with the standard method using the success metrics. Two success criteria were defined for this work: mean average precision, to measure model performance, and the annotation time of the data used in model training. The tests show that the proposed method falls slightly behind the standard model on the mean average precision metric while reducing the time required for data annotation to 1/4 of that of the standard model. This result is consistent with the initially stated aim of reducing the need for manual human involvement.
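The core-set style of diversity sampling named in this abstract is often approximated with a greedy k-center selection. The sketch below is an illustration under stated assumptions, not the thesis implementation: it assumes each image has already been reduced to a numeric feature vector (for example an embedding), and the function name `k_center_greedy` and its `first` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def k_center_greedy(points: Sequence[Point], k: int, first: int = 0) -> List[int]:
    """Pick up to k indices so that every point is close to a picked one.

    Assumption: `points` are feature vectors of equal length, one per image.
    """
    if not points or k <= 0:
        return []
    selected = [first]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < min(k, len(points)):
        # The next pick is the point farthest from the current selection.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[nxt] == 0.0:
            break  # every remaining point coincides with a selected one
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected
```

With near-duplicate video frames, which map to nearly identical feature vectors, this selection tends to pick one frame per cluster rather than several similar frames.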

Abstract (Translation)

Deep learning models applied to image data have achieved great success over the last decade. The success of supervised deep object detection models in particular depends largely on the volume of annotated data. Annotating data means labelling each target object in the data, which in the deep object detection case consists of images. Continued improvement of a model can be ensured by supplying it with recently annotated data for training. However, annotating data requires an immense amount of manual human intervention, which makes the process both challenging and costly. This thesis proposes a novel approach that unifies human annotation with an active learning based deep object detection model. This approach has been used in various studies on object classification, but it is more complex to apply to object detection. The complexity arises because every region proposed for every class in an image must be used when calculating the uncertainty metrics for that image. The knowledge quadrant used in machine learning applications expresses the states of the data. The known knowns represent the current model state, where the model's confident predictions lie. The known unknowns are the non-confident model predictions, where uncertainty sampling, one focus of this thesis, is performed. The unknown knowns represent information transferred from another domain, which corresponds to transfer learning, the other focus of this thesis. The fourth quadrant, the unknown unknowns, represents the gaps in the model, which are addressed with diversity sampling techniques. The thesis combines uncertainty sampling, transfer learning, and diversity sampling within one active learning scheme.
The proposed methodology includes incremental learning, which aims to train a deep object detection model on a stream of data. Recent object detection products run on live systems that continuously gather data from installed video cameras. Exploiting the gathered unlabelled video frames and labelling them in a live active learning based system therefore keeps deep object detection models up to date. The incremental learning methodology consists of three main parts: an initial pre-trained object detection model, an uncertainty scoring mechanism, and model training. YOLO is selected as the pre-trained model used for transfer learning because it provides both adequate object detection accuracy and high FPS (frames per second). YOLO performs well because bounding box prediction and class probability calculation are carried out in a single ANN, which makes it an end-to-end trainable network. YOLO runs on Darknet and can also be executed on TensorFlow, which makes the network more product ready. The unlabelled raw image data is fed into the active learning based deep object detection model; the pre-trained model performs initial detection, and an uncertainty score is calculated for each image. Based on these scores, the images most valuable for model training are selected and distributed for human annotation with higher priority. Finally, the human-annotated data is fed back into the active learning scheme as training data, followed by batch training. The incremental learning scheme enables deep object detection systems to select the data the model most needs to learn from and to set aside data that is less valuable, thus increasing overall efficiency.
The uncertainty score is calculated from bounding box regression uncertainties combined with the distribution of class scores. Several methods are evaluated to find the uncertainty metric that best measures the uncertainty of the whole image. For the classification case, the least confidence, ratio of confidence, entropy, and margin of confidence metrics are studied. Each metric is aggregated to obtain a single uncertainty score per image. Another aspect taken into consideration is the consecutive images that occur in video frames. Because similar images can be ranked together, a diversity-based model is employed to prevent class imbalance during model training. The selection strategies studied are k-means++ (KMPP), which provides initialization for the centroids used in the k-means algorithm; core-set (CS), a subset of data points that covers the distribution of a broader set of points; and sparse modelling, which aims to integrate both uncertainty and diversity. The sampling strategies are given in detail in the subsequent sections. Model training is performed in an incremental scheme, so selecting an optimized batch size is critical, as it strongly affects model accuracy. The trade-off is that a larger batch size increases model accuracy but also increases training duration. Training time matters because it affects the real-life applicability of the model. Different batch sizes are used in the benchmark, and the corresponding results are discussed in detail. PyTorch is used as the framework for model training, as it is more suitable for customized development. The YOLO model is imported as the backbone, and the final layers of the architecture are trained on a split of the VOC2012 training data set to obtain the initial network for prior object detection.
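The four classification metrics named above can be sketched as follows. This is a minimal illustration that uses commonly cited normalised definitions, where each score lies in [0, 1] and 1 means most uncertain. The exact formulas, any box-regression term, and the function names here are assumptions and may differ from those in the thesis. Each detection is assumed to carry a class probability vector with at least two entries that sums to 1.

```python
import math
from typing import Callable, Sequence

Probs = Sequence[float]


def least_confidence(probs: Probs) -> float:
    """Gap between the top probability and full confidence, rescaled to [0, 1]."""
    n = len(probs)
    return (1.0 - max(probs)) * n / (n - 1)


def margin_of_confidence(probs: Probs) -> float:
    """One minus the difference between the two most likely classes."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def ratio_of_confidence(probs: Probs) -> float:
    """Second most likely class divided by the most likely class."""
    top, second = sorted(probs, reverse=True)[:2]
    return second / top


def entropy(probs: Probs) -> float:
    """Shannon entropy divided by its maximum, log(number of classes)."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def image_uncertainty(
    detections: Sequence[Probs],
    metric: Callable[[Probs], float] = entropy,
) -> float:
    """Sum the per-detection scores into one score for the image."""
    return sum(metric(p) for p in detections)
```

Under these definitions a detection with class probabilities `[0.5, 0.5]` scores 1.0 on all four metrics, while `[1.0, 0.0]` scores 0. Summation favours images with many uncertain detections; other aggregations such as the mean or maximum are possible alternatives.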
Following the incremental scheme, each allocated batch of the active learning test data set is fed into the prediction model for uncertainty calculation. The annotations of the top-scoring images are fed into model training, and training continues until no unlabelled data is left. The hypotheses are validated in an actual model training environment. A standard transfer learning based object detection model, YOLOv3, is also trained for comparison with the proposed model. The success criteria for the comparison are mean average precision on the test data set and the total time required to annotate the data used for model training. The results show that the proposed model has lower accuracy than the standard model but reduces the total annotation time to 1/4 of the time required for the standard model. Since the primary goal of this study was to reduce manual human annotation and speed up the artificial intelligence process, the results support the hypothesis.
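The incremental loop described above (score a batch, send the top-scoring images to annotators, retrain, repeat until the pool is exhausted) can be sketched as below. This is an illustrative skeleton, not the thesis code. The callables `score_image`, `annotate`, and `train` are hypothetical stand-ins for the detector's uncertainty scoring, the human annotation step, and model training. The `top_fraction` parameter is also an assumption, since the abstract does not state how many images are selected per batch.

```python
import math
from typing import Callable, Dict, Sequence


def run_active_learning(
    pool: Sequence[str],
    score_image: Callable[[str], float],
    annotate: Callable[[str], str],
    train: Callable[[Dict[str, str]], None],
    batch_size: int,
    top_fraction: float,
) -> Dict[str, str]:
    """Process the unlabelled pool batch by batch until none is left."""
    labelled: Dict[str, str] = {}
    for start in range(0, len(pool), batch_size):
        batch = list(pool[start:start + batch_size])
        # Rank the batch from most to least uncertain.
        ranked = sorted(batch, key=score_image, reverse=True)
        n_select = max(1, math.ceil(len(ranked) * top_fraction))
        # Only the top-scoring images are sent for human annotation.
        for image_id in ranked[:n_select]:
            labelled[image_id] = annotate(image_id)
        # Retrain on everything labelled so far.
        train(labelled)
    return labelled
```

In a real system `score_image` would depend on the current model and therefore change after every call to `train`; here it is a plain function for brevity.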

Similar Theses

Thesis No
478669
Design, control and evaluation of educational devices with series elastic actuation
Uygulamalı eğitim amaçlı seri elastik eyleyici tahrikli eğitim cihazlarının tasarımı ve denetimi
ATA OTARAN
Master's
English
2017
Computer Engineering and Computer Science and Control, Sabancı Üniversitesi
Mechatronics Engineering Department
ASSOC. PROF. DR. VOLKAN PATOĞLU
Thesis No
387830
Design, implementation and BCI-based control of a series elastic mobile robot for home-based physical rehabilitation
Evde kullanılabilen seri elastik mobil rehabilitasyon robotunun tasarımı, uygulaması ve beyin-bilgisayar arayüzü tabanlı kontrolü
MİNE SARAÇ
Master's
English
2013
Mechatronics Engineering, Sabancı Üniversitesi
Mechatronics Engineering Department
ASSOC. PROF. DR. VOLKAN PATOĞLU
ASSOC. PROF. DR. MÜJDAT ÇETİN
Thesis No
826416
Aktif öğrenme ile alan uyarlaması
Domain adaptation with active learning
EREN DURGUNLU
Master's
Turkish
2023
Computer Engineering and Computer Science and Control, Kocaeli Üniversitesi
Electronics and Communication Engineering Department
ASST. PROF. DR. AYHAN KÜÇÜKMANİSA
Thesis No
800580
Açık kaynaklı bütünleşik çoklu eklenti yöneticisi tasarımı ve uygulaması
Design and implementation of an open-source integrated multiple plugin manager
MUSTAFA UÇAR
Master's
Turkish
2023
Geodesy and Photogrammetry, İstanbul Teknik Üniversitesi
Geomatics Engineering Department
ASSOC. PROF. DR. AHMET ÖZGÜR DOĞRU
Thesis No
215732
Gemi makineleri işletme mühendisliğinde aktif eğitim uygulamaları
Active learning applications in marine engineering
MUSTAFA NURAN
Master's
Turkish
2008
Maritime, Dokuz Eylül Üniversitesi
Maritime Business Administration Department
ASST. PROF. DR. ENDER ASYALI
