Reinforcement learning and evolutionary algorithms for container loading problem
- Thesis No: 382780
- Advisor: ASST. PROF. DR. ARMAĞAN ÖZKAYA
- Thesis Type: Master's
- Subjects: Computer Engineering and Computer Science and Control
- Keywords: Not specified.
- Year: 2014
- Language: English
- University: Mevlana Üniversitesi
- Institute: Graduate School of Natural and Applied Sciences
- Department: Department of Computer Engineering
- Discipline: Not specified.
- Page Count: 95
Abstract
The Container Loading Problem (CLP) is a space utilization problem subject to various constraints. An example is the placement of containers in a storage yard so as to minimize wasted space. Other constraints that may be imposed include a certain loading order and an even weight distribution. Although evolutionary algorithms have been studied extensively for this problem, Reinforcement Learning (RL), a means of learning optimal behavior by interacting with the environment, has not received enough attention in this respect. This work explores the use of RL as an alternative for tackling CLP so as to minimize wasted space while maximizing the number of containers. We have applied five RL algorithms (Q-learning, TD(λ), Monte-Carlo, TDQ-learning and SARSA) and two evolutionary algorithms (Genetic Algorithm and Ant Colony Optimization) to this problem. Simulations have been carried out in MATLAB to compare these algorithms on space utilization, number of containers, simulation time and speed of convergence. The simulation parameters are set so that the algorithms are allowed to fill storage yards to 100% capacity with containers of different sizes. Results show that, in general, RL may not guarantee the best results but can reduce computational difficulty, providing a simple way to solve this problem. The Genetic Algorithm (GA), on the other hand, gives the best speed of convergence for small storage yards, while four RL algorithms, namely Q-learning, TD(λ), Monte-Carlo and SARSA, converge faster than GA for the larger storage yards used in our simulations, unlike other approaches that may require complex computations. Ant Colony Optimization (ACO), while generally worse than the others in average convergence time, performs better than ordinary Q-learning for the largest storage yard area used in our simulations and gives a result similar to that of GA. The growth of ACO's average convergence time appears slower than that of the others, indicating that it has the potential for better convergence times as storage yard areas increase. In terms of the number of containers that can be packed into the storage yard when the number of containers is not restricted, all RL algorithms give approximately the same result. As the storage yard grows larger, however, Q-learning starts to outperform all the other RL algorithms in this respect, and Monte-Carlo gradually worsens even though it is the best of all for filling small storage yards. In terms of simulation time (for the RL algorithms only), TDQ performs best for all storage yard area combinations except the smallest, where it comes second after TD(λ). This is because learning the optimal value in TDQ can take place under any policy and learning rate. TD(λ) comes second for all remaining combinations, while Monte-Carlo and SARSA come third and fourth, respectively.
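The thesis itself shows no code in this record; as an illustration of the tabular Q-learning approach the abstract describes, below is a minimal sketch for a one-dimensional strip-packing simplification of CLP. The yard length, container sizes, reward definition, and hyperparameters are illustrative assumptions, not values from the thesis (whose simulations were written in MATLAB).

```python
import random
from collections import defaultdict

# Minimal Q-learning sketch for a toy 1-D container loading problem.
# Assumption: the "yard" is a line of YARD_LEN unit cells, each action
# places one container of a given length at the leftmost free position,
# and the reward is the number of cells the container occupies.
# All names and parameters here are illustrative, not from the thesis.

YARD_LEN = 12                 # total yard length in unit cells
CONTAINER_SIZES = [2, 3, 4]   # available container lengths (actions)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = defaultdict(float)        # Q[(state, action)] -> estimated value

def legal_actions(filled):
    """Container sizes that still fit in the remaining space."""
    return [s for s in CONTAINER_SIZES if filled + s <= YARD_LEN]

def choose_action(filled):
    """Epsilon-greedy selection over the legal container sizes."""
    actions = legal_actions(filled)
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(filled, a)])

def train(episodes=5000):
    for _ in range(episodes):
        filled = 0                     # state: cells already occupied
        while legal_actions(filled):
            a = choose_action(filled)
            nxt = filled + a
            reward = a                 # reward = space just utilized
            future = max((Q[(nxt, b)] for b in legal_actions(nxt)),
                         default=0.0)  # zero at terminal states
            # Standard Q-learning update rule
            Q[(filled, a)] += ALPHA * (reward + GAMMA * future - Q[(filled, a)])
            filled = nxt

def greedy_packing():
    """Follow the learned policy greedily; report utilization."""
    filled, placed = 0, []
    while legal_actions(filled):
        a = max(legal_actions(filled), key=lambda b: Q[(filled, b)])
        placed.append(a)
        filled += a
    return placed, filled / YARD_LEN

if __name__ == "__main__":
    random.seed(0)
    train()
    containers, utilization = greedy_packing()
    print(f"placed containers: {containers}, utilization: {utilization:.0%}")
```

In this toy formulation the state is just the number of occupied cells, so the placement policy reduces to choosing which container size to place next; the loading-order and weight-distribution constraints the abstract mentions would require a richer state and action encoding.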
Similar Theses
- İmalat sistemlerinin tasarlanması ve öncelik kurallarının belirlenmesinde yapay sinir ağlarının kullanılması
Using artificial neural networks in the design of manufacturing systems and the determination of priority rules
TARIK ÇAKAR
PhD
Turkish
1997
Engineering Sciences, İstanbul Teknik Üniversitesi, Department of Management Engineering
PROF. DR. AYHAN TORAMAN
- Derin pekiştirmeli öğrenme yöntemi ile görüntü hash kodlarını oluşturma
Generating image hash codes with deep reinforcement learning method
ELİF AKKAYA
Master's
Turkish
2024
Electrical and Electronics Engineering, Sakarya Üniversitesi, Department of Electrical and Electronics Engineering
ASST. PROF. DR. BURHAN BARAKLI
- Entegre fotonik cihazların tasarımına yönelik hesaplama tabanlı yaklaşımlar
Integrated photonic device designs based on computational approaches
EMRE BOR
PhD
Turkish
2020
Electrical and Electronics Engineering, TOBB Ekonomi ve Teknoloji Üniversitesi, Department of Electrical and Electronics Engineering
PROF. DR. HAMZA KURT
ASSOC. PROF. DR. MIRBEK TURDUEV
- Hyper-heuristics in dynamic environments
Dinamik ortamlarda üst-sezgiseller
BERNA KİRAZ
PhD
English
2014
Computer Engineering and Computer Science and Control, İstanbul Teknik Üniversitesi, Department of Computer Engineering
ASSOC. PROF. DR. AYŞE ŞİMA ETANER UYAR
- Monte Carlo tree search with temporal-difference learning for general video game playing
Zamansal-fark öğrenmeli Monte Carlo ağaç araması ile genel video oyunu oynama
ERCÜMENT İLHAN
Master's
English
2017
Computer Engineering and Computer Science and Control, İstanbul Teknik Üniversitesi, Department of Computer Engineering
ASSOC. PROF. DR. AYŞE ŞİMA UYAR