Generative adversarial networks in computer vision applications

Turkish title: Bilgisayarlı görü uygulamalarında çekişmeli üretici ağlar

Thesis No: 658650
Author: SEMİH ÖRNEK
Advisor: PROF. DR. ENDER METE EKŞİOĞLU
Thesis Type: Master's
Subjects: Computer Engineering and Computer Science and Control, Electrical and Electronics Engineering
Keywords: Not specified.
Year: 2021
Language: English
University: İstanbul Teknik Üniversitesi
Institute: Graduate School of Natural and Applied Sciences (Fen Bilimleri Enstitüsü)
Department: Department of Electronics and Communication Engineering
Program: Telecommunication Engineering
Number of Pages: 97

Abstract

Generative adversarial networks (GANs) are a generative modelling approach built on deep learning methods. They are regarded as the best-performing way to train generative models. A GAN consists of two networks: a generator and a discriminator. The generator's task is to create fake data that the discriminator cannot tell apart from real data. The discriminator's task is to separate real data from the fake data produced by the generator. A GAN therefore has two different neural network architectures to train, and it must train the discriminator and the generator together. Because the training of each network receives feedback from the other, the two are trained alternately within a single iteration. Generative models are a branch of unsupervised machine learning; however, training a GAN architecture, which is based on generative modelling, is regarded as a supervised machine learning problem. This thesis examines a number of GAN architectures for solving computer vision problems. These problems are the generation of fake images, increasing image resolution, and removing noise from images. Three different computer vision applications that use GANs are studied in the thesis. The results show that GANs can be used very effectively for these particular problems. Other architectures based on generative modelling were proposed before GANs. With the development of GANs, however, those architectures gradually lost popularity for computer vision problems and proved less effective than GANs.
In addition, some of the image processing techniques used to solve computer vision problems have been pushed out of use. Until the last few years there were few strategies for increasing image resolution with GANs. In parallel with advances in deep learning, research on this problem began to grow. New super-resolution models were built and different learning methods were applied to them. However, these early methods failed on images taken directly from a device's camera. The most common way to train deep learning models for super-resolution starts by downscaling the images in the dataset with methods such as nearest-neighbor, bicubic, and bilinear resampling. This yields a dataset of paired high-resolution and low-resolution training images. The low-resolution images obtained this way contain much less noise; in other words, the images are cleaned at the same time. The main purpose of super-resolution deep learning models is to increase image resolution. For this problem, the thesis implements ESRGAN, one of the GAN architectures. ESRGAN is an improved version of the SRGAN architecture. Its main purpose is to increase image resolution so as to produce images of higher resolution than the inputs. Generating fake images with GANs was introduced by Ian Goodfellow in 2014. By training a GAN on training datasets, new images that do not exist in those datasets were produced. Although these generated images look as if they were drawn from the training dataset, they are in fact unique and generated entirely from scratch. Two different GAN architectures are implemented in the thesis for this problem.
One of these architectures, DCGAN, is an improved version of the basic GAN structure. The second GAN-based method implemented in the thesis for fake image generation uses the BigGAN generator network. The BigGAN network architecture is based on the ResNet structure. BigGAN makes it possible to train comparatively larger GANs. With BigGAN, networks could be trained with up to four times as many parameters and up to eight times the batch size of earlier architectures. As a result, the generated images look almost indistinguishable from the real input images in the dataset used to train the GAN. Denoising is one of the most popular image processing problems, and many different methods have been proposed to solve it. The literature contains a large number of traditional image processing methods for image denoising, among them linear filters, non-linear filters, adaptive filters, and non-local methods. A recent study proposed using a GAN to learn the noise distribution; after this step, a classical convolutional neural network was used to denoise the images. There is no GAN architecture in the literature adapted specifically to the image denoising problem. The literature has proposed using an SRGAN-like architecture for this problem. The reported simulation results show that the SRGAN architecture gives good results not only for improving image resolution but also for removing noise from images. This thesis, in turn, examines the use of the DCGAN architecture for image denoising.
The DCGAN architecture has been used in applications such as fake image generation, increasing image resolution, and image deblurring. The work carried out in the thesis shows that this structure can also be used for image denoising. Today, deep learning methods have become more popular than traditional image processing techniques. Deep networks make it possible to train on very large datasets and to extract features that cannot be reached by analytical methods. The thesis examines the use of GAN deep learning architectures for three different computer vision problems. GAN architectures recently introduced to the literature are relatively easy to train. For GANs, the training, validation, and test stages are very similar; the only differences between them are the loss functions and the optimization methods used. The implementations carried out in the thesis show that GANs can be used successfully for image super-resolution, fake image generation, and image denoising.

Abstract (Translation)

Generative adversarial networks (GANs) are an approach to generative modelling based on deep learning (DL) methods. GANs are considered the best way to train a generative model. A GAN consists of two parts: the generator and the discriminator. The generator's task is to create fake data that the discriminator cannot distinguish from real data. The discriminator's task is to distinguish real data from the fake data produced by the generator. A GAN therefore has two different neural network architectures to train, and discriminator training and generator training must run together. Because the two training processes rely heavily on each other, the networks are trained alternately within a single iteration. Generative models are a branch of unsupervised machine learning, but training the GAN architecture, which relies on generative modelling, is treated as supervised machine learning. This thesis demonstrates that GAN architectures can effectively tackle important computer vision problems: generating fake images, super-resolving images, and denoising noisy images. We studied three computer vision applications that use GANs, and the results indicate that GANs can be used very effectively for these problems. Other architectures based on generative modelling existed before GANs, but with the introduction of GANs they came to be seen as inadequate for computer vision problems. Some image processing techniques that had been used for these problems also fell out of use. Until the last few years there were few strategies for super-resolving images with GANs.
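
The alternating scheme described above, one discriminator update followed by one generator update in each iteration, can be sketched on a toy one-dimensional problem. This is a minimal illustration and not the thesis implementation: the linear generator `G(z) = a*z + b`, the logistic discriminator `D(x) = sigmoid(w*x + c)`, the non-saturating generator loss, the hyperparameters, and the name `train_toy_gan` are all assumptions chosen to keep the example dependency-free.

```python
import math
import random


def sigmoid(t):
    # Clamp the logit so math.exp cannot overflow.
    t = max(-50.0, min(50.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps=200, batch=64, lr=0.05, real_mean=4.0, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Real data ~ N(real_mean, 1). All modelling choices are illustrative.
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * zi + b for zi in z]

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        d_real = [sigmoid(w * x + c) for x in real]
        d_fake = [sigmoid(w * x + c) for x in fake]
        grad_w = (sum((d - 1.0) * x for d, x in zip(d_real, real))
                  + sum(d * x for d, x in zip(d_fake, fake))) / batch
        grad_c = (sum(d - 1.0 for d in d_real) + sum(d_fake)) / batch
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimise -log D(G(z)) using the updated discriminator.
        d_fake = [sigmoid(w * (a * zi + b) + c) for zi in z]
        grad_a = sum((d - 1.0) * w * zi for d, zi in zip(d_fake, z)) / batch
        grad_b = sum((d - 1.0) * w for d in d_fake) / batch
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, w, c


if __name__ == "__main__":
    print(train_toy_gan(steps=5))
```

After the first few iterations both `w` and `b` should be positive: the discriminator has learned that real samples are larger than fake ones, and the generator has started shifting its output toward them. Long-run convergence is not guaranteed for a model this simple.
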
Deep learning then proved useful for this task, and research on the problem started to grow. Super-resolution models were created and different learning methods were applied to them, but most of them failed on real-world images taken by devices such as smartphones. The most widespread way to train super-resolution deep learning models starts by downscaling the images in the dataset with methods such as nearest-neighbor, bicubic, and bilinear resampling. This produces a dataset of paired high-resolution and low-resolution training images. The low-resolution images created this way have almost no noise; in other words, they are clean. The main purpose of these super-resolution models is to increase the resolution of images. The Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) was used as the GAN architecture for this problem. ESRGAN is an improved version of the Super-Resolution Generative Adversarial Network (SRGAN) architecture. As the name SRGAN suggests, it uses a deep neural architecture with an adversarial network. Its main purpose is to super-resolve images, producing outputs of higher resolution than the inputs. Fake image generation with GANs was introduced in 2014 by Ian Goodfellow and his colleagues, who founded generative adversarial network theory. Images are generated by a GAN trained on a dataset. These generated images look as if they come from the training dataset, but they are unique: no such images exist in the training dataset. Two different GAN architectures were used for this problem. Deep Convolutional Generative Adversarial Networks (DCGAN) were developed in 2015 as an improved version of the simple GAN.
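
The dataset-preparation step mentioned in this abstract, downscaling high-resolution images to obtain (low-resolution, high-resolution) training pairs, can be sketched as follows. This is a hedged illustration on grayscale images stored as nested lists; the function names, the half-pixel-centre sampling convention, and the restriction to integer factors are assumptions, and a real pipeline would normally use an image library's resampling routines instead.

```python
def downscale_nearest(img, factor):
    """Nearest-neighbor downscale: keep the sample closest to each block centre."""
    h, w = len(img), len(img[0])
    rows = [min(int((i + 0.5) * factor), h - 1) for i in range(h // factor)]
    cols = [min(int((j + 0.5) * factor), w - 1) for j in range(w // factor)]
    return [[img[r][c] for c in cols] for r in rows]


def downscale_bilinear(img, factor):
    """Bilinear downscale with half-pixel centres and no anti-aliasing."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h // factor):
        # Source coordinate of the output pixel centre, clamped to the image.
        y = min(max((i + 0.5) * factor - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(w // factor):
            x = min(max((j + 0.5) * factor - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def make_pairs(hr_images, factor, downscale=downscale_bilinear):
    """Return a list of (low_res, high_res) training pairs."""
    return [(downscale(hr, factor), hr) for hr in hr_images]


if __name__ == "__main__":
    hr = [[r * 4 + c for c in range(4)] for r in range(4)]
    lr, _ = make_pairs([hr], 2)[0]
    print(lr)
```

With a factor of 2, the bilinear variant reduces to averaging each 2x2 block, which is one reason such synthetic low-resolution images come out smoother and less noisy than real camera images.
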
Another important finding that came with DCGAN is that traversing the latent space and changing the values along its dimensions can change the generated image drastically. For example, applying vector arithmetic to the latent vectors of generated images yields a new generated image. The BigGAN model was proposed in 2018. BigGAN uses a ResNet-based GAN architecture. It benefited from scaling, allowing larger generative adversarial networks and larger batch sizes: networks were trained with two to four times as many parameters and eight times the batch size of previous implementations. As a result, the generated images looked indistinguishable from the real images in the dataset used to train the GAN. Deep learning methods have been used to tackle image processing problems for quite some time. Denoising is one of the best-known image processing problems, and there are different ways to attack it. The most traditional are image processing methods, for example denoising with linear filters, non-linear filters, or adaptive filters. CNN architectures have also been proposed for this problem. In one recent method, a GAN was trained to learn the noise distribution and a CNN was then used to denoise the images. There is no GAN architecture in the literature adapted solely to the image denoising problem. An SRGAN-like architecture has been proposed for it, and that architecture was found to work not only for improving image resolution but also for denoising. It has also been shown that the DCGAN architecture can be used for more than one image processing problem.
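
As a point of reference for the traditional denoising filters mentioned above, the sketch below contrasts a linear filter (3x3 mean) with a non-linear one (3x3 median) on a grayscale image stored as nested lists. It is an illustrative baseline only; the function names and the edge-replication border handling are assumptions, and neither filter is claimed to be the one evaluated in the thesis.

```python
from statistics import median


def _window(img, i, j, radius=1):
    """Collect the neighbourhood of pixel (i, j), replicating edge pixels."""
    h, w = len(img), len(img[0])
    return [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
            for di in range(-radius, radius + 1)
            for dj in range(-radius, radius + 1)]


def mean_filter(img, radius=1):
    """Linear filter: replace each pixel with the average of its window."""
    n = (2 * radius + 1) ** 2
    return [[sum(_window(img, i, j, radius)) / n
             for j in range(len(img[0]))]
            for i in range(len(img))]


def median_filter(img, radius=1):
    """Non-linear filter: replace each pixel with the median of its window."""
    return [[median(_window(img, i, j, radius))
             for j in range(len(img[0]))]
            for i in range(len(img))]


if __name__ == "__main__":
    # A flat 5x5 image with one impulse-noise pixel in the centre.
    noisy = [[10] * 5 for _ in range(5)]
    noisy[2][2] = 255
    print(mean_filter(noisy)[2][2])
    print(median_filter(noisy)[2][2])
```

On a single impulse in a flat region, the median filter restores the original value exactly, while the mean filter only spreads the outlier over its neighbours. Learned denoisers aim to outperform such fixed filters on images with real detail.
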
This architecture can be used not only for fake image generation but also for problems such as image denoising, super-resolution, and deblurring. Deep learning methods are now much more popular than traditional image processing techniques because they learn from datasets, create their own features, and, thanks to this learning, preserve image details better. An updated form of the DCGAN architecture was proposed and used as the GAN architecture for the denoising problem. Changes were made to the generator so that it generates larger images and performs better. The image size is kept the same from layer to layer in the generator, while the number of channels changes. An extra hidden layer was added to the generator to make the network denser and to keep the image size the same. The discriminator was modified to match the changes in the generator. From a practical standpoint, recent GAN architectures with improved optimization techniques are easy to train, and their results are becoming more accurate. The training, validation, and testing stages are almost the same; the only differences are the loss functions and the optimizers. The procedure for preparing the dataset for training differs for each problem, with different pre-processing and normalization techniques applied. The GAN itself always consists of two distinct networks, the generator and the discriminator. Although these two networks do the same job every time, with the generator trying to produce fake images and the discriminator trying to distinguish them from real images, their architectures are changed from one problem to another to address the particular characteristics of each problem.
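
The remark that the generator keeps the image size fixed while changing the number of channels can be checked with the standard convolution output-size formula, out = floor((in + 2*padding - kernel) / stride) + 1. The sketch below traces tensor shapes through a hypothetical stack of layers; the layer list is an assumption made for illustration, since the abstract does not give the actual configuration.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size, in_channels, layers):
    """Return (channels, height, width) after each layer.

    layers: iterable of (out_channels, kernel, stride, padding) tuples.
    """
    shapes = [(in_channels, size, size)]
    for out_channels, kernel, stride, padding in layers:
        size = conv2d_output_size(size, kernel, stride, padding)
        shapes.append((out_channels, size, size))
    return shapes


if __name__ == "__main__":
    # Hypothetical denoising generator: 3x3 kernels, stride 1, padding 1.
    generator_layers = [(64, 3, 1, 1), (128, 3, 1, 1), (64, 3, 1, 1), (3, 3, 1, 1)]
    for shape in trace_shapes(64, 3, generator_layers):
        print(shape)
```

With a 3x3 kernel, stride 1, and padding 1, every layer preserves the 64x64 spatial size while the channel count is free to vary. A stride-2 layer with a 4x4 kernel and padding 1, by contrast, halves the size, which is the usual choice when a discriminator needs to shrink its input.
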

Similar Theses

Thesis No: 865196
Title: Üretken çekişmeli ağ tabanlı tek görüntü üretim modellerinin tasarımı
Title (English): Design of single image generation models based on generative adversarial networks
Author: EYYÜP YILDIZ
Thesis Type: Doctorate
Language: Turkish
Year: 2024
Subject: Computer Engineering and Computer Science and Control
University: İstanbul Üniversitesi-Cerrahpaşa
Department: Department of Computer Engineering
Advisors: ASSOC. PROF. DR. SELÇUK SEVGEN, ASSOC. PROF. DR. MEHMET ERKAN YÜKSEL

Thesis No: 717024
Title: Satellite images super resolution using generative adversarial networks
Title (Turkish): Uydu görüntülerinde çekişmeli üretici ağ kullanarak süper çözünürlük
Author: MARYAM SERDAR
Thesis Type: Master's
Language: English
Year: 2022
Subject: Computer Engineering and Computer Science and Control
University: İstanbul Teknik Üniversitesi
Department: Department of Communication Systems
Advisor: PROF. DR. AHMET HAMDİ KAYRAN

Thesis No: 542575
Title: Human activity recognition using deep learning
Title (Turkish): Derin öğrenme ile insan aktivitesi tanıma
Author: MURAT YALÇIN
Thesis Type: Master's
Language: English
Year: 2018
Subject: Electrical and Electronics Engineering
University: İstanbul Teknik Üniversitesi
Department: Department of Electronics and Communication Engineering
Advisor: ASST. PROF. DR. HÜLYA YALÇIN

Thesis No: 633721
Title: Words as art materials: Generating paintings with sequential generative adversarial networks
Title (Turkish): Sanat materyali olarak kelimeler: Seri üretici çekişmeli ağlar ile sanatsal resim üretimi
Author: AZMİ CAN ÖZGEN
Thesis Type: Master's
Language: English
Year: 2020
Subject: Computer Engineering and Computer Science and Control
University: İstanbul Teknik Üniversitesi
Department: Department of Computer Engineering
Advisor: PROF. DR. HAZIM KEMAL EKENEL

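Thesis No: 720406
Title: Synthesization and reconstruction of 3d faces by deep neural networks
Title (Turkish): No translation available
Author: BARİS GECER
Thesis Type: Doctorate
Language: English
Year: 2020
Subject: Biotechnology
University: University of London
Advisor: DR. STEFANOS ZAFEİRİOU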
