Makine öğrenme algoritmalarıyla hatalı ürün tahmini

Prediction of defective product with machine learning algorithms

Thesis No: 511285
Author: ENES ŞANLITÜRK
Advisor: PROF. DR. FERHAN ÇEBİ
Thesis Type: Master's
Subjects: Science and Technology, Industrial and Industrial Engineering, Business Administration
Keywords: Not specified.
Year: 2018
Language: Turkish
University: İstanbul Teknik Üniversitesi
Institute: Fen Bilimleri Enstitüsü
Department: Management Engineering
Division: Management Engineering
Page Count: 77

Abstract

With advances in technology, collecting and storing data has become quite easy. In both personal and institutional life, large amounts of data are recorded and stored every day, whether intentionally or not. Although collecting and recording every piece of data may seem advantageous, it has become increasingly difficult to extract meaningful and useful information from such large volumes of data. This has brought the concept of big data into discussion. Data carry meaning and information for people only when they can be analyzed and interpreted. For this reason, special methods such as artificial intelligence have been needed to process and analyze data. Artificial intelligence, which takes human intelligence as its inspiration and aims to make the same system and design applicable to robots and machines, has given rise to machine learning for purposes such as prediction, classification, and clustering. The central aim of machine learning is for machines and computers to learn as humans do and to make the best possible decision. This requires that computers, like people, be trained and taught until they reach a certain level. With machine learning, predictions and plans based on past data can yield more accurate and valid results. This study examines the use of machine learning algorithms to predict defective products in advance and evaluates the performance of the Random Forest, Naive Bayes, Support Vector Machines, and k-Nearest Neighbor algorithms through an application. The study first covers the concept of machine learning and its methods and processes. It then gives a general review of machine learning algorithms, covering how they work, how they perform, and their advantages.
In the application phase of the thesis, input and output measurements were obtained from the washing machine powder coating unit of one of Turkey's leading companies in the white goods sector. These data were first used to determine whether washing machine parts fell within the desired paint thickness range after leaving the powder coating unit, and a target variable was created accordingly for use in the algorithms. The number of observations was later increased with additional data, and a new variable expected to be more effective was added to the data set. A data set of 327 rows was used to evaluate the algorithms. Machine learning algorithms were assessed for their suitability to the input variables, the company's objectives, and the data. On this basis, the Random Forest, Naive Bayes, Support Vector Machines, and k-Nearest Neighbor algorithms were applied to the data set. 70% of the data set was used as the training set and 30% as the test set. The RapidMiner software was used to run the algorithms. Each algorithm was run on the data in RapidMiner, producing one classification result per algorithm. The classification performance of the four algorithms was measured and compared. The comparison used the accuracy values of the algorithms to measure each algorithm's classification performance in predicting defective products. Feature engineering was then carried out on the data through normalization and scaling, and the same algorithms were run again on the processed data and their performance compared. Finally, the general validity of the results of the best-performing algorithm was tested and interpreted.
Among the Random Forest, Naive Bayes, Support Vector Machines, and k-Nearest Neighbor algorithms, Random Forest performed best in predicting defective washing machine products. Naive Bayes was observed to be another successful classifier of defective products. The analysis also identified the most important input variables, which the company should focus on and which can support future improvement efforts and studies. Random Forest was thus identified as an algorithm with high classification performance in predicting defective products, which can help companies reduce the costs caused by defects. Based on the findings and results, suggestions and recommendations are made for future studies.
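The abstract compares the algorithms by their accuracy values but does not state the formula. Assuming the conventional definition for a two-class problem, with TP, TN, FP, and FN denoting the true positives, true negatives, false positives, and false negatives counted on the test set (which class is treated as "positive" is an assumption here and does not change the value):

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
```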

Abstract (Translation)

Recent developments in technology have made collecting and storing data easy. Large amounts of data are recorded and stored every day, voluntarily or involuntarily, in both personal and institutional life. Although collecting and recording every piece of data seems to be an advantage, it is becoming increasingly difficult to extract meaningful and useful information from such large data. This situation has brought the concept of big data into discussion. Data become meaningful and informative for people only when they can be analyzed and understood. For this reason, special methods such as artificial intelligence are needed to process and analyze the data. The concept of artificial intelligence, which is inspired by human intelligence and aims to make the same system and design applicable to robots and machines, has led to the concept of machine learning for purposes such as prediction, classification, and clustering. The main aim of machine learning is for machines and computers to learn like human beings and to make the best decision. This requires that computers, like human beings, be trained and taught until they reach a certain level. With the help of machine learning, more accurate and valid results can be obtained through prediction and planning based on past data. In this study, the use of machine learning algorithms in predicting defective products is examined, and the performance of the Random Forest, Naive Bayes, Support Vector Machines, and k-Nearest Neighbor algorithms is evaluated through an application. The study first covers the concept of machine learning and its methods and processes. Machine learning algorithms are then examined in general, covering their working methods, performance, and advantages. There are different methods to use in machine learning.
With the development of information and communication technologies, these methods are developing and increasing day by day. Four of them are prominent: supervised, unsupervised, semi-supervised, and reinforcement learning. Supervised learning is a method that learns how the inputs interact with and influence the outputs under the guidance of a supervisor. Its main element is a training set made up of previously collected observations, which a supervisor introduces and teaches to the system. Unsupervised learning is a method in which there is no supervisor, only observations with input variables. It groups observations with similar features and variables into the same cluster, which lets the system find the relations between the input variables by itself. Semi-supervised learning is applied to a data set that consists of a very large number of observations with only input variables and a small number of observations with both input and output variables. It is used when observations with a target outcome are difficult and costly to obtain. Reinforcement learning is the method that learns, through trial and error, how to reach a target given to the system. With a reward or penalty given at each step, the system chooses its next step using the inferences from the previous one and so tries to reach the goal. Problems to be solved with machine learning should follow a set of steps, in this order: definition of the problem, data analysis, preparation of the data, establishment of the model, evaluation of the model, and use of the model.
The first step in a problem to be solved with machine learning is to define the problem as well as possible and to clarify the purpose of the solution. In this step, the success criterion for that purpose and the present situation need to be well defined. In the data analysis step, the main aim is to obtain appropriate data. It should also be checked whether the data set contains the main features that the model will use. After the necessary data are obtained, they are put into a form that machine learning algorithms can use. This process is called data preparation. Establishing the model is the step in which the most appropriate model for capturing the relation between the variables in the data set is chosen and the most suitable algorithms for that model are determined. At this stage, more than one suitable model is determined and applied to the data set in order to get the best result. Different models and algorithms are compared during the evaluation of the model. The evaluation step determines whether the results of the model will be useful in the future. In the step of using the model, the main aim is to apply the resulting model appropriately to situations and studies similar to the problem.
In the implementation phase of the thesis, input and output measurements of a washing machine powder coating unit were provided by one of the leading companies in the durable goods sector in Turkey. First, it was determined whether the washing machine parts were within the desired paint thickness range after leaving the powder coating unit, and the target variable was set accordingly for use in the algorithms. Later, the number of observations was increased, and a new variable thought to be more effective was added to the data set. A data set of 327 rows was used to evaluate the algorithms.
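The construction of the target variable described above can be sketched in a few lines of Python. This is only an illustration, not the thesis's RapidMiner workflow: the abstract does not give the actual thickness range, so the limits of 60 and 90 micrometres, the sample measurements, and the names `label_part`, `LOWER_LIMIT_UM`, and `UPPER_LIMIT_UM` are all hypothetical.

```python
# Hypothetical specification limits in micrometres.
# The abstract does not state the real paint thickness range.
LOWER_LIMIT_UM = 60.0
UPPER_LIMIT_UM = 90.0


def label_part(thickness_um, lower=LOWER_LIMIT_UM, upper=UPPER_LIMIT_UM):
    """Return 'ok' if the thickness lies within [lower, upper], else 'defective'."""
    return "ok" if lower <= thickness_um <= upper else "defective"


# Made-up output measurements from the coating unit, for illustration only.
measurements = [72.5, 58.1, 90.0, 95.3]

# The target variable: one class label per measured part.
labels = [label_part(t) for t in measurements]
print(labels)  # ['ok', 'defective', 'ok', 'defective']
```

Treating the limits as inclusive is a design choice made here; a real specification would say whether a part exactly on the boundary passes.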
Machine learning algorithms were evaluated with respect to the input variables, the business objectives, and the data. On this basis, the Random Forest, Naive Bayes, Support Vector Machines, and k-Nearest Neighbor algorithms were applied to the data set. 70% of the data set was used as the training set and 30% as the test set. The RapidMiner software was used to run these algorithms. The selected algorithms were run on the data in RapidMiner to obtain a classification result for each algorithm. The classification performance of the four algorithms was measured and compared. The comparison used the accuracy values of the algorithms to measure each algorithm's classification performance in predicting defective products. Feature engineering was then performed on the data through normalization and scaling. The same machine learning algorithms were applied to the processed data, and their performance was compared. Finally, the general validity of the results of the best-performing algorithm was tested and interpreted.
Among the Random Forest, Naive Bayes, Support Vector Machines, and k-Nearest Neighbor algorithms, Random Forest showed the best performance in predicting defective washing machine products. Naive Bayes was another successful classifier of defective products. The analysis also revealed the most important input variables, which will help the company focus its next improvements and work. The Random Forest algorithm was identified as a high-performance classification algorithm for defective products, which will help companies reduce costs. Based on the findings and results, suggestions and recommendations are made for subsequent studies.
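The evaluation procedure described above (a 70/30 split, a run on the raw data, normalization, and a second run scored by accuracy) can be sketched in plain Python. This is a stand-in under stated assumptions, not the thesis's method: the thesis used RapidMiner, only k-Nearest Neighbor is shown out of the four algorithms, and the 327 rows are synthetic. The two features (oven temperature and line speed), the labelling rule, the seeds, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(rows):
    """Learn the per-feature minimum and maximum from the training rows only."""
    columns = list(zip(*(features for features, _ in rows)))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Rescale each feature with the training minimum and maximum (min-max scaling)."""
    scaled = []
    for features, label in rows:
        new = [(x - lo) / (hi - lo) if hi > lo else 0.0
               for x, lo, hi in zip(features, mins, maxs)]
        scaled.append((new, label))
    return scaled


def knn_predict(train, features, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Share of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, f, k) == y for f, y in test)
    return hits / len(test)


# Synthetic stand-in for the 327-row data set; features and rule are hypothetical.
rng = random.Random(0)
rows = []
for _ in range(327):
    temperature = rng.uniform(170.0, 210.0)  # hypothetical oven temperature, degrees C
    speed = rng.uniform(1.0, 3.0)            # hypothetical line speed, m/min
    label = "defective" if temperature < 180.0 or speed > 2.6 else "ok"
    rows.append(([temperature, speed], label))

train, test = train_test_split(rows)             # 70% training, 30% test
mins, maxs = fit_min_max(train)                  # scaling parameters from training data
train_scaled = apply_min_max(train, mins, maxs)
test_scaled = apply_min_max(test, mins, maxs)

print("accuracy on raw data:   ", accuracy(train, test))
print("accuracy on scaled data:", accuracy(train_scaled, test_scaled))
```

Fitting the minimum and maximum on the training rows alone keeps information from the test set out of the preprocessing step. Scaling matters most for distance-based methods such as k-Nearest Neighbor, where a feature with a wide numeric range would otherwise dominate the distance.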

Similar Theses

Thesis No: 713289
Title: Optimization the training algorithms of machine learning using GAN networks
Turkish Title: Çekişmeli üretici ağlar için makine öğrenmesi eğitim algoritmalarında optimizasyon
Author: SEDAT AKEL
Thesis Type: Master's
Language: English
Year: 2022
Subject: Computer Engineering and Computer Science and Control
University: Çankaya Üniversitesi
Department: Computer Engineering
Advisor: ASST. PROF. DR. ROYA CHOUPANI

Thesis No: 741159
Title: Hard and soft tissue characterization with microwave dielectric spectroscopy
Turkish Title: Mikrodalga dielektrik spektroskopi ile sert ve yumuşak doku karakterizasyonu
Author: SEDA KESKİN
Thesis Type: Master's
Language: English
Year: 2022
Subject: Bioengineering
University: İstanbul Teknik Üniversitesi
Department: Electronics and Communication Engineering
Advisor: PROF. DR. TAYFUN AKGÜL

Thesis No: 962834
Title: Classification of walnut varieties with deep learning algorithms
Turkish Title: Ceviz çeşitlerinin derin öğrenme algoritmalarıyla sınıflandırılması
Author: HALİL KILIF
Thesis Type: Master's
Language: Turkish
Year: 2025
Subject: Computer Engineering and Computer Science and Control
University: Selçuk Üniversitesi
Department: Computer Engineering
Advisors: ASST. PROF. DR. İLKAY ÇINAR, PROF. DR. NURETTİN DOĞAN

Thesis No: 847299
Title: Analysis of financial data using natural language processing and deep learning methods
Turkish Title: Doğal dil işleme ve derin öğrenme yöntemleri kullanılarak finansal verilerin analizi
Author: MUSTAFA SAMİ KAÇAR
Thesis Type: Doctorate
Language: Turkish
Year: 2024
Subject: Computer Engineering and Computer Science and Control
University: Konya Teknik Üniversitesi
Department: Computer Engineering
Advisors: PROF. DR. HALİFE KODAZ, ASST. PROF. DR. SEMİH YUMUŞAK

Thesis No: 840112
Title: Using machine learning approaches in drug development problem in bioinformatics
Turkish Title: Makine öğrenme yaklaşımlarının biyoinformatikte ilaç geliştirme probleminde kullanılması
Author: TUĞÇE SEMERCİ
Thesis Type: Master's
Language: Turkish
Year: 2023
Subject: Statistics
University: Hacettepe Üniversitesi
Department: Statistics
Advisor: PROF. DR. ÇAĞDAŞ HAKAN ALADAĞ