Distributed algorithms based on fictitious play for near optimal sequential decision making

Başlık çevirisi mevcut değil.

PDF İndir

Tez No: 401214
Yazar: ESRA ŞİŞİKOĞLU
Danışmanlar: DOÇ. DR. MARINA A. EPELMAN, PROF. ROBERT L. SMITH
Tez Türü: Doktora
Konular: Endüstri ve Endüstri Mühendisliği, Industrial and Industrial Engineering
Anahtar Kelimeler: Belirtilmemiş.
Yıl: 2009
Dil: İngilizce
Üniversite: University of Michigan
Enstitü: Yurtdışı Enstitü
Ana Bilim Dalı: Belirtilmemiş.
Bilim Dalı: Belirtilmemiş.
Sayfa Sayısı: 96

Özet

Özet yok.

Özet (Çeviri)

In this dissertation, we considered two classes of sequential decision making problems: (a) discrete, deterministic and nite horizon DP problems and (b) stochastic, discounted, in nite horizon MDPs. Chapters II and IV are dedicated to the analysis of Sampled Fictitious Play (SFP) based algorithms designed for the two classes of problems listed above. In these chapters, the details of the developed algorithms are given rst, which are followed by the convergence results and the numerical experiments on their performance. Chapter III presents numerical results obtained by applying the algorithms developed in Chapter II to the Traveling Salesman Problem and discusses some modi cations that might potentially improve their performance. The algorithms studied in Chapter II, namely RSFP and SFPLS, are stochastic search algorithms for discrete, deterministic and nite horizon DP problems. As noted earlier, stochastic algorithms cannot outperform the label correcting algorithms such as Dijsktra's Algorithm in theory if the requirement is to nd an optimal solution. However, on large-scale problems, Dijsktra's algorithm loses its e ectiveness, and it usually is unable to nd the optimal solution within a reasonable time limit as required by most real-world applications. For such problems, we propose using SFP methods to nd close to optimal solutions quickly. As presented in Chapter II, SFP based search algorithms not only converge to the optimal solution but also nd close to optimal solutions quickly. The potential of these algorithms is also supported by numerical experiments presented in Chapter III highlighting the performance improvement after small modi cations. In Chapter IV, we propose using SFP concepts to design an online learning algorithm for in nite horizon, discounted MDPs, where the probability distribution of the exogenous random disturbance is unknown. The learning algorithm, called Sampled Fictitious Play based Learning Algorithm (SFPL), uses the information (i.e., frequency of the random events) obtained by observing the system to nd the best possible decision in each iteration. This algorithm is proven to converge to the optimal solution, and the numerical results based on the dynamic location and windy gridworld problems show the algorithm converges quickly. The numerical experiments also suggest that learning the system dynamics and simultaneously optimizing the system variables is more ecient than running these two routines separately (i.e., rst observing the system and collecting data on unknown parameters and then calculating the optimal solution with the estimated parameters). We also present comparisons of SFPL with two algorithms from the Reinforcement Learning literature, Q-Learning and SARSA (slightly modi ed version of Q-Learning). These comparisons show that the performance of Q-Learning strongly depends on the step size parameter while SFPL does not use a step size parameter and that Q-Learning algorithm is more susceptible to discount rate of the problem than SFPL. The advantages of using SFP based algorithms for sequential decision making problems are as follows: The SFP approach does not require a closed form representation of an optimization model. The algorithms presented can even be applied to complex black-box optimization problems. The SFP based algorithms developed in this dissertation do not rely on structures of costs, transition functions and objective function. Therefore, these algorithms are applicable to a wide range of problems. The SFP based algorithms do not need the knowledge of the entire state space (e.g., network topology). They discover the state space of a given problem gradually. It is a well known fact that the entire feasible region has to be searched in order to con rm that an optimal solution has been found. However, with our approach, it is possible to nd good solutions by only partially discovering the state space. To model a DP problem as a game, we de ne states to be the players and the objective function to be the payo function for each player. Therefore, the best reply calculations of players naturally divides the large-scale problem into manageable subproblems which can be solved in parallel. The parallel processing possibility, in some cases, is one of the considerations for applying these algorithms to large-scale problems, since a parallel implementation can increase the performance of the algorithms substantially.

Benzer Tezler

Tez No
2162
Transformatör sargılarında oluşan hızlı değişimli geçici olayların incelenmesi ve enerji iletim sistemlerinin modellenmesinde yeni bir yaklaşım
Investigation of high speed transients on transformer windings and a new approach in modelling of energy transmission systems
A.OĞUZ SOYSAL
Doktora
Türkçe
1985
Elektrik ve Elektronik Mühendisliği İstanbul Teknik Üniversitesi
PROF. DR. M. KEMAL SARIOĞLU
Tez No
39300
Generatör ve transfarmatör bloğundan oluşan sistemin yüksek frekans davranışı
Başlık çevirisi yok
YAŞAR CİNAKLI
Yüksek Lisans
Türkçe
1994
Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol İstanbul Teknik Üniversitesi
PROF.DR. M. KEMAL SARIOĞLU
Tez No
46532
Sincap kafesli asenkron makinenin rotor alan yönlendirmeli kontrolü
Rotor field-orientation control of a squirrel cage induction machine
SAFFET ALTAY
Yüksek Lisans
Türkçe
1995
Elektrik ve Elektronik Mühendisliği İstanbul Teknik Üniversitesi
PROF.DR. M. EMİN TACER
Tez No
807303
Gümrük beyannamesinin akıllı sözleşme aracılığı ile düzenlenmesi üzerine bir uygulama
An application on the regulation of customs declaration through smart contract
MEHMET KORAY KELEŞ
Yüksek Lisans
Türkçe
2023
İşletme Burdur Mehmet Akif Ersoy Üniversitesi
Gümrük İşletme Ana Bilim Dalı
DR. ÖĞR. ÜYESİ SAMET GÜRSOY
Tez No
302590
Yönlü algılayıcı ağlarda dönme ve hareket yetenekleri ile kapsama alanının iyileştirilmesi
Coverage enhancement exploiting both motility and mobility in directional sensor networks
M. AMAÇ GÜVENSAN
Doktora
Türkçe
2011
Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol Yıldız Teknik Üniversitesi
Bilgisayar Mühendisliği Ana Bilim Dalı
YRD. DOÇ. DR. A. GÖKHAN YAVUZ

Geri Dön