Cisim tanıma problemine yapay sinir ağlarının uygulanması

Application of artificial neural networks to object recognition

Thesis No: 83130
Author: ATİLLA ÜSTÜN
Advisors: PROF. DR. A. TALHA DİNİBÜTÜN
Thesis Type: Master's
Subjects: Mechanical Engineering
Keywords: Not specified.
Year: 1999
Language: Turkish
University: İstanbul Teknik Üniversitesi (Istanbul Technical University)
Institute: Fen Bilimleri Enstitüsü (Institute of Science and Technology)
Department: Makine Mühendisliği Ana Bilim Dalı (Department of Mechanical Engineering)
Division: Not specified.
Page Count: 72

Abstract

APPLICATION OF ARTIFICIAL NEURAL NETWORKS TO THE OBJECT RECOGNITION PROBLEM

ABSTRACT

Visual object recognition is an important part of machine or computer vision. The basic problem is to determine whether an object already known to the computer is present in images taken from the real world and, if so, at what position and orientation. The primary aim of this study is to recognise three-dimensional objects that are completely visible in the image. The next aim is to recognise partially deformed objects. Finally, the problem of recognising objects occluded by other objects is addressed. The problem is set up entirely inside the computer: the camera and the objects are simulated with a solid modelling and animation program. The objects created in this program are rotated about themselves in fixed angular steps, and an image is recorded at each angle. These images are processed by a purpose-built image processing program and converted into simple numerical data, which represent the objects and are used to classify them with an artificial neural network. The image processing program implements basic image processing algorithms, including edge detection, thresholding and contour extraction, which bring out the basic features of the image. After the object contours are extracted, seven numbers called moment invariants, which represent the object independently of translation and rotation, are calculated. These numbers differ for images of the object taken from different angles. The same steps are then repeated for images of somewhat deformed objects. In this way a set of moment invariants is formed for every model to be recognised, and a back-propagation neural network (Back Propagation Neural Net) is trained with them. The network is brought to its best configuration by testing different configurations on test data made up of images not included in the training data.

The network is then tested on deformed objects and, to improve its performance, retrained with different training data. Finally, the recognition of partially visible objects is considered. For this purpose, a back-propagation network is trained using parts of the objects as training data and tested with images of occluded objects. The artificial neural network gives successful results for objects deformed up to a certain degree. For partially visible objects, the network's success depends on the location and extent of the occluded part: slightly occluded objects give a high recognition rate, but this rate drops quickly as occlusion increases. Nevertheless, the networks built in this study generally give positive results.
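The thresholding and contour steps named in the abstract can be illustrated with a minimal Python sketch. It assumes a grey-level image stored as a list of rows, an object brighter than its background, a hypothetical threshold `level`, and a 4-neighbourhood definition of a boundary pixel. The helper names `threshold` and `boundary_pixels` are ours; the thesis's own program, which also performs edge detection, is not reproduced here.

```python
from typing import List, Tuple

Image = List[List[int]]


def threshold(image: Image, level: int) -> Image:
    """Binarise a grey-level image: 1 where grey value >= level, else 0.

    Assumption: the object is brighter than the background.
    """
    return [[1 if g >= level else 0 for g in row] for row in image]


def boundary_pixels(binary: Image) -> List[Tuple[int, int]]:
    """Return (u, v) coordinates of object pixels that have at least one
    background 4-neighbour; pixels outside the image count as background."""
    rows, cols = len(binary), len(binary[0])
    contour = []
    for v in range(rows):
        for u in range(cols):
            if not binary[v][u]:
                continue
            for x, y in ((u - 1, v), (u + 1, v), (u, v - 1), (u, v + 1)):
                if not (0 <= x < cols and 0 <= y < rows) or not binary[y][x]:
                    contour.append((u, v))
                    break  # one background neighbour is enough
    return contour


# A 5x5 image with grey levels 0..127: a bright 3x3 square on a dark background.
grey = [[10] * 5] + [[10, 100, 100, 100, 10] for _ in range(3)] + [[10] * 5]
contour = boundary_pixels(threshold(grey, level=64))
print(len(contour), (2, 2) in contour)  # 8 False
```

Only the ring of object pixels touching the background is kept, which matches the summary's statement that moment invariants are computed from boundary pixels alone.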

Abstract (Translation)

APPLICATION OF ARTIFICIAL NEURAL NETWORKS TO OBJECT RECOGNITION

SUMMARY

1. INTRODUCTION

For any living creature, being able to perceive its environment visually and react to it is vitally important. A machine that can imitate this ability has great advantages over a machine without it: it can accomplish its tasks more efficiently and may even succeed in tasks it could never have accomplished before. The area of research that studies the emulation of the visual abilities of living creatures is called computer vision. Computer vision has attracted great interest, and a great deal of research has been done on it in the last few decades. It can be applied to a wide range of areas. In robotics, a robot arm may be expected to locate and recognise industrial parts, or a mobile robot may have to interpret its environment in order to follow a suitable path. In military applications, the problem may be automatic target recognition. In yet another application, the task may be the recognition of handwritten characters. This study is mainly concerned with the visual recognition of three-dimensional solid objects.

Whatever the application, the first step is to describe and represent the objects in the images. The features used to represent objects can be divided into two groups, and the type of feature to use depends on the application. If isolated, completely visible objects are considered, global features such as area, size, shape descriptors, moments or the Hough transform may be used. These features are robust to noise but are not effective when objects are partially occluded. Local features such as corners, holes, and curve and line segments are used in the case of overlapping objects.

2. TRANSFORMATION OF IMAGES INTO DATA STRUCTURES

The success of the classification process depends mostly on how well the images are represented. The images need to be transformed into structures that contain enough information for the classifier to recognise them. These structures are called feature vectors and are expected to have the following qualities:
1) They contain a large amount of information while being small in dimension.
2) They are independent of scale, rotation about the optical axis, and translation.

For our application, a set of moment invariant functions is considered suitable for representing images. The central moments of an image represented by a matrix of 1's and 0's can be given as

$$\mu_{pq} = \frac{1}{N} \sum_{i=1}^{N} (u_i - u')^p \, (v_i - v')^q \qquad (1.1)$$

where u' and v' are the mean values of the coordinates u and v of the image, respectively, and N is the number of pixels in the sum. A set of moment invariants can be derived using only the second- and third-order moments, which gives a suitable way of representing images. Only the pixels on the boundary of the object are used to calculate the moment invariants, and the feature vector contains seven moment invariant functions. These functions are invariant to rotation and translation.

The objects considered are three-dimensional solid structures and are expected to be recognised from different viewing angles. The objects can be rotated about three independent axes, defined as follows. Let XYZ be a coordinate system in which XY is the image plane, X being the horizontal axis and Y the vertical axis; the Z axis is then parallel to the optical axis. Let xyz be another coordinate system attached to the solid object. The two coordinate systems are initially aligned. When the object is rotated about any of its own axes, a different image is formed. Rotation about the optical axis produces no change in the size and shape of the image, so the moment functions are invariant to rotation about this axis.
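Equation (1.1) and the seven invariants can be sketched in Python. The sketch assumes that the seven functions are Hu's classical moment invariants computed directly from the central moments of the boundary pixel coordinates; the summary says only that seven invariants are derived from second- and third-order moments. `central_moment` and `hu_invariants` are illustrative names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def central_moment(points: List[Point], p: int, q: int) -> float:
    """Eq. (1.1): mu_pq = (1/N) * sum_i (u_i - u')^p * (v_i - v')^q."""
    n = len(points)
    u_mean = sum(u for u, _ in points) / n
    v_mean = sum(v for _, v in points) / n
    return sum((u - u_mean) ** p * (v - v_mean) ** q for u, v in points) / n


def hu_invariants(points: List[Point]) -> List[float]:
    """Hu's seven invariants built from 2nd- and 3rd-order central moments."""
    m20, m02, m11 = (central_moment(points, p, q)
                     for p, q in ((2, 0), (0, 2), (1, 1)))
    m30, m03, m21, m12 = (central_moment(points, p, q)
                          for p, q in ((3, 0), (0, 3), (2, 1), (1, 2)))
    a, b = m30 + m12, m21 + m03            # sums that recur below
    c, d = m30 - 3 * m12, 3 * m21 - m03    # differences that recur below
    return [
        m20 + m02,
        (m20 - m02) ** 2 + 4 * m11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (m20 - m02) * (a ** 2 - b ** 2) + 4 * m11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


# Boundary pixels of a small L-shaped contour, then the same contour
# rotated by 90 degrees about the optical axis and translated.
shape = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 2), (1, 2)]
moved = [(-v + 5, u + 3) for u, v in shape]
features = hu_invariants(shape)
same = all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-9)
           for x, y in zip(features, hu_invariants(moved)))
print(same)  # True
```

Because only central moments enter the formulas, translating the contour leaves every feature unchanged, and the combinations are chosen so that rotation about the optical axis cancels out. Uniform scaling still changes them, which is what the distance normalisation discussed next addresses.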
The distance along the optical axis scales the image by a factor determined by the distance between the image plane and the object. As mentioned in [19], the radius of gyration of the image of a planar figure is directly proportional to the size of the figure and inversely proportional to its distance along the optical axis, so for a given object the product of the radius of gyration and this distance is constant. Using this relation, the moment functions forming the feature vector can be normalised to make them invariant to the distance along the optical axis. To speed up the construction of the training set, the objects are assumed not to be rotated about the x axis. The distance along the optical axis is also assumed to be constant or known in advance; otherwise, the first moment in the feature vector should not be used in the recognition process.

3. CONSTRUCTION OF THE TRAINING AND TEST DATA

For recognition, every object needs a training set with which to train the classifier. The training set is constructed from the images formed by rotating the object about the axes defined in section 2. The only rotation taken into consideration is the rotation about the y axis of the coordinate system attached to the object. The object coordinate system is initially aligned with the XYZ coordinate system, and the object is rotated in 6° increments until it has been rotated by 180°. At every step, the 128-grey-level image is transformed into a contour image of 1's and 0's, and the seven moment functions that make up the feature vector are calculated.

The problem dealt with in this work can be divided into three parts. The first step is the classification of completely visible, undeformed objects. The second is the classification of completely visible, partially deformed objects. Finally, we deal with the recognition of partially occluded solid objects.
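The distance normalisation described at the end of section 2 can be read as follows, assuming an ideal pinhole camera in which moving an object to k times its distance shrinks its image coordinates by 1/k. The sketch divides each central moment by r^(p+q), where r = sqrt(mu_20 + mu_02) is the radius of gyration of the contour. This particular normalisation, and the names `radius_of_gyration` and `normalised_moment`, are our assumptions rather than the thesis's stated formula.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def central_moment(points: List[Point], p: int, q: int) -> float:
    """Eq. (1.1): mu_pq = (1/N) * sum_i (u_i - u')^p * (v_i - v')^q."""
    n = len(points)
    u_mean = sum(u for u, _ in points) / n
    v_mean = sum(v for _, v in points) / n
    return sum((u - u_mean) ** p * (v - v_mean) ** q for u, v in points) / n


def radius_of_gyration(points: List[Point]) -> float:
    """r = sqrt(mu_20 + mu_02): RMS distance of contour pixels from their centroid."""
    return math.sqrt(central_moment(points, 2, 0) + central_moment(points, 0, 2))


def normalised_moment(points: List[Point], p: int, q: int) -> float:
    """mu_pq / r**(p + q), which does not change when the image is uniformly scaled."""
    return central_moment(points, p, q) / radius_of_gyration(points) ** (p + q)


near = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 2), (1, 2)]  # object at distance d
far = [(u / 2, v / 2) for u, v in near]                            # same object at 2d
print(radius_of_gyration(near) * 1.0, radius_of_gyration(far) * 2.0)  # equal up to rounding
print(normalised_moment(near, 3, 0), normalised_moment(far, 3, 0))    # equal up to rounding
```

After this normalisation mu_20 + mu_02 equals 1 for every image, so the first invariant carries no information. That is consistent with the remark that the first moment should not be used when the distance is not known.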
It is therefore likely that three different training sets will be needed for the three recognition tasks. Classifying completely visible 3-D solid objects seems to be the easiest problem, and two object models are used to form its training set: a teapot and a statue of a bird. As explained above, the objects are rotated about the x axis, and after the contour of each image is extracted, the seven moment functions are calculated and included in the training set, which is called training set A. The second task, recognising deformed as well as undeformed objects, is harder, and the training set used for the first problem may not be sufficient, so data obtained from partially deformed objects may need to be incorporated. Images of partially deformed teapot and bird statue models are obtained by rotating the objects about the x axis, preprocessed, and added to training set A to form training set B. The third task also needs a specialised recogniser trained with an extended training set. Every object consists of a body and certain parts. This training set is formed from images showing only one part of the object together with the body. These images are processed to calculate the moment functions and added to training set A to form training set C. The construction of the training data for each task is explained in detail in the next section, where the classifier is constructed.

The recognisers formed for each task are tested with test data to obtain a performance measure. The test data for the first task consist of 20 images of the undeformed teapot and bird statue objects, obtained by rotating them about the x axis by angles different from those used for the training data. Example images from the training and test data are given in Appendix A.
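The bookkeeping behind training sets A, B and C can be sketched as below. The 6° step up to 180° comes from section 3; the 3° offset used for the test views and the helper names `view_angles` and `compose_training_sets` are hypothetical, chosen only to show test angles that differ from the training angles.

```python
from typing import List, Tuple

Sample = Tuple[List[float], int]   # (seven moment invariants, class label)


def view_angles(step_deg: int = 6, stop_deg: int = 180, offset_deg: int = 0) -> List[int]:
    """Rotation angles, in degrees, at which an object's views are rendered."""
    return list(range(offset_deg, stop_deg + 1, step_deg))


def compose_training_sets(undeformed: List[Sample], deformed: List[Sample],
                          parts: List[Sample]) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Set A holds undeformed views; B adds partially deformed views to A;
    C adds body-plus-one-part views to A."""
    set_a = list(undeformed)
    set_b = set_a + list(deformed)
    set_c = set_a + list(parts)
    return set_a, set_b, set_c


train_angles = view_angles()              # 0, 6, ..., 180: 31 views per object
test_angles = view_angles(offset_deg=3)   # hypothetical test views between training views
print(len(train_angles), set(train_angles).isdisjoint(test_angles))  # 31 True
```

Building B and C as extensions of A, rather than as separate sets, keeps the undeformed views in every training run, so a net trained for the harder tasks is still exposed to the original views.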
The test data for the second task are formed from objects deformed by a certain amount, ensuring that they are deformed slightly more than the objects used for training. The sampling angle is the same as in the first task, namely 6°. Examples of training and test data for the second task can be found in Appendix A. The test data for the last task, the recognition of partially visible objects, are formed by partially occluding the object in the scene with another object, such as a box or a cylinder. In every test image, a different part of the object is covered by a different amount; such images can also be found in Appendix A. The test sets for the first, second and third tasks will be referred to as test sets A, B and C, respectively.

4. CONSTRUCTING THE RECOGNISER

A neural net classifier is applied to each of the three problems stated in section 3. The goal is to find the simplest neural network model with the highest performance. First, linear neural models are applied to see whether the problem is linearly solvable. More complex neural structures and learning algorithms may then be applied to obtain higher performance.

4.1 Classification of Undeformed Objects

First, a single-layer neural net with 7 inputs and one output is constructed; it has 7 inputs because the objects are represented by 7 moment functions. This structure is trained with the perceptron and delta learning rules, using training set A formed as explained in section 3. Both training rules fail to classify these objects, so the search for a linear solution is stopped here and nonlinear models are applied to the problem. For a nonlinear solution, one of the most popular neural structures, the back-propagation network, is applied: a two-layer feedforward net with 7 inputs, a suitable number of hidden neurons and 2 outputs.
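The two linear training rules named in section 4.1 can be sketched for a single neuron. The sketch uses the standard Rosenblatt perceptron rule and the Widrow-Hoff (LMS) delta rule, with the bias stored as the last weight. The learning rates, epoch counts and the two-input AND/XOR toy data are assumptions for illustration, not the thesis's moment data.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]   # (input vector, target 0 or 1)


def net_input(w: List[float], x: Sequence[float]) -> float:
    """Weighted sum of the inputs; the last weight acts as the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def perceptron_rule(samples: List[Sample], n_inputs: int, lr=0.1, epochs=100) -> List[float]:
    """Rosenblatt's rule: update only when the hard-limited output is wrong."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, t in samples:
            y = 1 if net_input(w, x) > 0 else 0
            if y != t:
                mistakes += 1
                for i, xi in enumerate(x):
                    w[i] += lr * (t - y) * xi
                w[-1] += lr * (t - y)
        if mistakes == 0:   # a full error-free pass: training has converged
            break
    return w


def delta_rule(samples: List[Sample], n_inputs: int, lr=0.05, epochs=200) -> List[float]:
    """Widrow-Hoff (LMS) rule: gradient step on the squared error of the linear output."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for x, t in samples:
            err = t - net_input(w, x)
            for i, xi in enumerate(x):
                w[i] += lr * err * xi
            w[-1] += lr * err
    return w


def accuracy(w: List[float], samples: List[Sample], cut: float = 0.0) -> float:
    """Fraction of samples whose thresholded output matches the target."""
    return sum((net_input(w, x) > cut) == bool(t) for x, t in samples) / len(samples)


and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # linearly separable
xor_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # not linearly separable
print(accuracy(perceptron_rule(and_gate, 2), and_gate))   # 1.0
print(accuracy(perceptron_rule(xor_gate, 2), xor_gate))   # below 1.0
print(accuracy(delta_rule(and_gate, 2), and_gate, cut=0.5))
```

A single linear neuron reaches perfect accuracy only when a separating hyperplane exists, as for AND; on XOR no weight vector works. Failure of both rules on training set A therefore suggests that the two objects' moment vectors are not linearly separable.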
The net has 7 inputs because the feature vector consists of 7 elements, and the number of hidden neurons is varied to obtain the highest performance. After every training run, the net is tested with test set A and its performance is recorded. The number of hidden neurons and the types of activation function in the hidden and output layers are then changed and the net is trained again. This process is continued until the performance of the net stops increasing. At the end of this train-and-test procedure, the net's configuration is as follows: 5 hidden neurons with tangent sigmoid activation functions, and logarithmic sigmoid activation functions in the output layer. The error goal is set to 0.02. The net converges in 848 epochs when trained with fast back propagation and in 132 epochs when trained with back propagation (BP) using Levenberg-Marquardt optimisation. The net trained with Levenberg-Marquardt optimisation correctly classified 19 of the 20 test inputs, the highest score obtained among all configurations.

4.2 Classification of Deformed Objects

The back-propagation net used in the previous section to classify undeformed objects is next used to classify partially deformed objects. The test set is test set B defined in section 3, which consists of stretched or bent objects. The net misclassified 8 of the 20 inputs. To increase the rate of correct classifications, the training set is extended to include deformed object images, forming training set B as defined in section 3. The net trained with training set B using BP with Levenberg-Marquardt optimisation correctly classified 18 of the 20 inputs from test set B.

4.3 Classification of Partially Visible Objects

The classifiers used in the first and second parts of the problem were tested on partially visible objects and failed to classify them. New training data is therefore needed to accomplish the third and final task.
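The 7-5-2 configuration reported in section 4.1 (tangent sigmoid hidden layer, logarithmic sigmoid outputs, error goal 0.02) can be sketched in plain Python. Note the swap: this trains with ordinary online gradient-descent back-propagation, not the Levenberg-Marquardt or fast back-propagation variants used in the thesis, so its epoch counts are not comparable. The learning rate, weight initialisation, the class `TwoLayerNet` and the synthetic two-cluster data are all hypothetical.

```python
import math
import random
from typing import Sequence


def logsig(z: float) -> float:
    """Logarithmic sigmoid 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TwoLayerNet:
    """n_in inputs -> n_hidden tanh (tangent sigmoid) neurons -> n_out log-sigmoid outputs."""

    def __init__(self, n_in=7, n_hidden=5, n_out=2, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x: Sequence[float]):
        xb = list(x) + [1.0]
        hb = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [logsig(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train(self, samples, lr=0.5, goal=0.02, max_epochs=5000) -> int:
        """Online back-propagation; stops once the epoch's mean squared error
        (accumulated during the pass) is at or below `goal`. Returns epochs used."""
        n_hidden = len(self.w1)
        for epoch in range(1, max_epochs + 1):
            sse = 0.0
            for x, target in samples:
                xb, hb, y = self.forward(x)
                sse += sum((t - yk) ** 2 for t, yk in zip(target, y))
                # Output deltas: error times the log-sigmoid derivative y(1 - y).
                d_out = [(t - yk) * yk * (1 - yk) for t, yk in zip(target, y)]
                # Hidden deltas: back-propagated error times the tanh derivative 1 - h^2.
                d_hid = [(1 - hb[j] ** 2) * sum(d * row[j] for d, row in zip(d_out, self.w2))
                         for j in range(n_hidden)]
                for d, row in zip(d_out, self.w2):
                    for j, h in enumerate(hb):
                        row[j] += lr * d * h
                for d, row in zip(d_hid, self.w1):
                    for i, v in enumerate(xb):
                        row[i] += lr * d * v
            if sse / (len(samples) * len(samples[0][1])) <= goal:
                return epoch
        return max_epochs

    def predict(self, x: Sequence[float]) -> int:
        """Index of the largest output, i.e. the predicted object class."""
        y = self.forward(x)[2]
        return max(range(len(y)), key=y.__getitem__)


# Hypothetical, well-separated stand-ins for two objects' 7-element feature vectors.
rng = random.Random(1)
data = []
for _ in range(20):
    data.append(([1.0 + rng.uniform(-0.2, 0.2) for _ in range(7)], [1.0, 0.0]))
    data.append(([-1.0 + rng.uniform(-0.2, 0.2) for _ in range(7)], [0.0, 1.0]))
net = TwoLayerNet(n_in=7, n_hidden=5, n_out=2, seed=0)
epochs = net.train(data)
correct = sum(net.predict(x) == t.index(1.0) for x, t in data)
print(epochs, correct)
```

With one output per object, the class is read off as the larger output, which is how a 2-output net can be scored against the 20 test inputs.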
In constructing this training set, the fact that every object consists of certain parts is taken into consideration. For example, the teapot model classified in the first and second problems has a body, a spout, a lid and a handle. If the classifier can recognise each of these parts separately, then the teapot will also be detected whenever any of these parts is detected in the scene. Initially we concentrate only on the teapot model. Three teapot submodels are formed: body and spout, body and lid, and body and handle. If any of the parts are occluded, the teapot may still be detected with the help of the visible parts. Different feedforward net configurations and training algorithms are used to classify these submodels, and the configurations with the best results are given below. A feedforward net trained with back propagation using Levenberg-Marquardt optimisation, with 7 inputs, 1 hidden layer of 6 neurons, 3 outputs (each output should signal 1 if the related submodel is detected in the scene and 0 otherwise) and an error goal of 0.02, gave the following results for different activation functions:
1) Tangent sigmoid in the hidden layer and logarithmic sigmoid in the output layer: converged in 43 epochs.
2) Tangent sigmoid in the hidden layer and linear in the output layer: converged in 57 epochs.
3) Logarithmic sigmoid in the hidden layer and linear in the output layer: converged in 80 epochs.
Tested with test set C, these configurations produced 14, 11 and 13 misclassifications, respectively. This approach clearly does not provide a suitable solution to the problem. The main objective, however, is to detect the teapot itself, not its parts individually.
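The difference between scoring the three submodel outputs individually and asking only whether the teapot is present can be sketched as follows. The summary does not say how misclassifications were counted or which decision threshold was used, so the 0.5 cut, the "any submodel fires" rule, and the names `part_errors` and `teapot_present` are assumptions; the rule itself follows the sentence above that the teapot is detected whenever any of its parts is.

```python
from typing import List, Sequence

SUBMODELS = ("body+spout", "body+lid", "body+handle")   # one net output per submodel


def thresholded(outputs: Sequence[float], cut: float = 0.5) -> List[int]:
    """Turn net outputs into 0/1 decisions, one per submodel."""
    return [1 if o >= cut else 0 for o in outputs]


def part_errors(output_rows, target_rows, cut: float = 0.5) -> int:
    """Per-part scoring: an image counts as misclassified if any submodel decision is wrong."""
    return sum(thresholded(o, cut) != list(t) for o, t in zip(output_rows, target_rows))


def teapot_present(outputs: Sequence[float], cut: float = 0.5) -> bool:
    """Object-level decision: the teapot is present if any of its submodels fires."""
    return any(thresholded(outputs, cut))


# Hypothetical outputs for one occluded-teapot image: the lid submodel fires but the
# spout submodel (target 1) does not, so per-part scoring records an error even though
# the object-level decision still finds the teapot.
outputs, targets = [0.2, 0.9, 0.1], [1, 1, 0]
print(part_errors([outputs], [targets]), teapot_present(outputs))   # 1 True
```

The object-level rule is more forgiving: a missed part only matters if every part is missed.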
This makes it possible to use each part to reinforce the net output associated with the original solid object. To test this method, a bird statue model is added alongside the teapot, and parts of the bird such as the head or tail are used to form submodels. The images of the original models and the submodels are processed to form training set C as defined in section 3. A feedforward net is then constructed with 7 inputs, to which the moment functions describing the objects are fed, and one hidden layer whose number of neurons and activation function types are determined by the performance of the net. BP with Levenberg-Marquardt optimisation is used to train the net. The configuration with the best result is as follows: 8 hidden neurons with tangent sigmoid activation functions and 1 output neuron with a logarithmic sigmoid activation function, which signals 1 for the teapot and 0 for the bird statue; with an error goal of 0.02, it converged in 820 epochs. The network is tested with test set C, some object images from which are given in Appendix A, and gives 12 correct classifications out of 20. To increase the correct classification rate, the error goal that determines when to stop training is varied between 0.01 and 0.1. The highest score, 17 correct out of 20 inputs, is obtained with an error goal of 0.07.
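The single-output decision and the error-goal sweep can be sketched as below. The 0.5 cut on the output, the tie-breaking rule and the names `label` and `sweep_error_goals` are assumptions; `train_and_score` stands for any routine that retrains the net to a given goal and counts correct test answers, and the scores in the example are made up for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple


def label(output: float, cut: float = 0.5) -> str:
    """Single log-sigmoid output: 1 signals the teapot, 0 the bird statue."""
    return "teapot" if output >= cut else "bird statue"


def sweep_error_goals(train_and_score: Callable[[float], int],
                      goals: Iterable[float]) -> Tuple[float, Dict[float, int]]:
    """Retrain once per error goal and keep the goal with the most correct answers.
    Ties go to the smaller (stricter) goal."""
    scores = {g: train_and_score(g) for g in goals}
    best = max(scores, key=lambda g: (scores[g], -g))
    return best, scores


goals = [round(0.01 * k, 2) for k in range(1, 11)]   # 0.01, 0.02, ..., 0.10

# Hypothetical scores standing in for real train-and-test runs.
fake_scores = {0.01: 11, 0.05: 15, 0.1: 13}
best, scores = sweep_error_goals(fake_scores.__getitem__, fake_scores)
print(best)   # 0.05
```

Because the goal is chosen by its score on the same test set that is then reported, the reported score tends to be optimistic; holding out a separate validation set for the sweep avoids this.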

Similar Theses

Thesis No
243847
Çoklu silindirik hedeflerin sınıflandırılması
No title translation
MUSTAFA MELİH TAŞANER
Master's
Turkish
2009
Communication Sciences, Yıldız Teknik Üniversitesi
Department of Electronics and Communications Engineering
ASSOC. PROF. DR. AHMET KIZILAY
Thesis No
139845
Yapay sinir ağlarıyla 3-D bilgisayar görmesi
3-D computer vision using artificial neural networks
MUHARREM MERCİMEK
Master's
Turkish
2003
Electrical and Electronics Engineering, Yıldız Teknik Üniversitesi
Department of Electronics and Communications Engineering
ASSOC. PROF. DR. TÜLAY YILDIRIM
Thesis No
520165
Reduced dimensional features for object recognition
Nesne tanıma için boyutu indirgenmiş öznitelik vektörleri
REYHAN KEVSER KESER
Master's
English
2018
Computer Engineering and Computer Science and Control, İstanbul Teknik Üniversitesi
Department of Informatics Applications
ASSOC. PROF. DR. BEHÇET UĞUR TÖREYİN
Thesis No
56754
Object shape and hollowness identification by tactile sensing with a 5-fingered robot hand
Cisim şekil ve boşluklarının 5-parmaklı bir robot el tarafından dokunma ile tanınması
TEOMAN PASİNLİOĞLU
Master's
English
1996
Electrical and Electronics Engineering, Orta Doğu Teknik Üniversitesi
Department of Electrical and Electronics Engineering
ASSOC. PROF. DR. M. AYDAN ERKMEN
Thesis No
21906
Numerical simulation of 2-D laminar flow heat generation and forced convection from rectangular blocks in a narrow channel
Dar bir kanal içinde dikdörtgen bloklar etrafında laminer akış, ısı üretimi ve zorlanmış taşımanın 2 boyutlu benzeşimi
İBRAHİM ÖZKOL
Doctorate
English
1992
Aircraft Engineering, İstanbul Teknik Üniversitesi
ASSOC. PROF. DR. C. RUHİ KAYKAYOĞLU
