On the reinforcement learning analysis and learning the control of humanoid robot leg
Title translation not available.
- Thesis No: 400914
- Advisors: DR. MARTIN BROWN
- Thesis Type: Doctorate
- Subjects: Electrical and Electronics Engineering
- Keywords: Not specified.
- Year: 2013
- Language: English
- University: The University of Manchester
- Institute: Institute Abroad
- Department: Not specified.
- Division: Not specified.
- Number of Pages: 171
Abstract
No abstract.
Abstract (Translation)
Reinforcement learning is a method for learning sequential control actions or decisions using an instantaneous reward signal, which implicitly defines a long-term value function. It has been proposed as a way to solve complex learning control problems without requiring explicit knowledge of the system's dynamics. It has also been used as a model of cognitive learning in humans and applied to systems, such as humanoid robots, to study embodied cognition. However, there are relatively few results which describe the actual performance of such learning algorithms, even on relatively simple problems. In this thesis, simple test problems are used to investigate issues associated with the value function's representation and parametric convergence. In particular, the terminal convergence problem is analyzed with a known optimal (bang-bang) control policy, where the aim is to learn the value function accurately. For certain initial conditions, the closed-form solution for the value function is calculated and shown to have a polynomial form. It is parameterized by terms which are functions of the unknown plant's parameters and the value function's discount factor, and the convergence properties of these terms are analyzed. It is shown that the temporal difference error introduces a null space associated with the finite horizon basis function during the experiment. This is only non-singular when the experiment is terminated correctly, and a number of (equivalent) solutions are described. It is also demonstrated that, in general, the test problem's dynamics are chaotic for random initial states and that this causes a digital offset in the value function. Methods for estimating the offset are described, and a dead-zone is proposed to switch off learning in the chaotic region. Another value function estimation test problem is then proposed which uses a saturated piecewise linear control signal. This is a more realistic control scenario, and it is also shown to address the chaotic dynamics problem.
It is shown that the conditioning of the learning problem depends on both the saturation threshold and the value function's discount factor, and that a badly conditioned learning problem may result. Moreover, it is proved that the temporal difference error introduces a trajectory null space associated with the differenced higher order bases up to the saturation threshold of the saturated piecewise linear control signal. These results are then used to explain the behaviour of reinforcement learning algorithms when higher order systems are used, and the impact of function approximation algorithms and exploration noise is discussed. Finally, a central pattern generator based reinforcement learning algorithm is applied to a single leg of a robot, where the target is to generate appropriate control signals for each joint.
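As a rough illustration of the kind of test problem the abstract describes, the sketch below estimates a polynomial value function from temporal difference errors under a saturated piecewise linear control signal. Everything concrete in it is an assumption made for illustration and is not taken from the thesis: the first-order plant `x' = x + B*u`, the reward `-x^2`, the numerical values of `B`, `GAIN`, `U_MAX` and `GAMMA`, and the use of a batch least-squares TD solve in place of whatever learning rule the thesis analyzes. Inside the unsaturated region the assumed closed loop is `x' = c*x` with `c = 1 - B*GAIN`, so the discounted value is exactly quadratic, `V(x) = -x^2 / (1 - GAMMA*c^2)`, which gives a closed form to compare against.

```python
import numpy as np

# All numerical values are illustrative assumptions, not taken from the thesis.
B = 0.1        # input gain of the hypothetical plant x' = x + B*u
GAIN = 2.0     # slope of the linear part of the controller
U_MAX = 1.0    # saturation level of the control signal
GAMMA = 0.9    # discount factor of the value function


def control(x):
    """Saturated piecewise linear control signal."""
    return float(np.clip(-GAIN * x, -U_MAX, U_MAX))


def step(x):
    """One step of the assumed plant; returns (next state, reward)."""
    return x + B * control(x), -x ** 2


def features(x):
    """Polynomial basis [1, x, x^2]."""
    return np.array([1.0, x, x ** 2])


def lstd_weights(states):
    """Least-squares TD: pick w so the TD error is orthogonal to the
    features over the samples, i.e. sum phi (phi - gamma*phi')^T w = sum phi r."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for x in states:
        x_next, r = step(x)
        phi, phi_next = features(x), features(x_next)
        A += np.outer(phi, phi - GAMMA * phi_next)  # differenced basis
        b += phi * r
    return np.linalg.solve(A, b), np.linalg.cond(A)


def rollout_value(x, n_steps=500):
    """Discounted return obtained by simulating the closed loop."""
    total = 0.0
    for k in range(n_steps):
        x_next, r = step(x)
        total += GAMMA ** k * r
        x = x_next
    return total


# Sample only inside the unsaturated region |x| < U_MAX / GAIN = 0.5.
states = np.linspace(-0.4, 0.4, 41)
w, cond_A = lstd_weights(states)
c = 1.0 - B * GAIN

print("estimated weights:", w)
print("closed-form x^2 weight:", -1.0 / (1.0 - GAMMA * c ** 2))
print("condition number of A:", cond_A)
print("rollout at x=0.3:", rollout_value(0.3), "estimate:", w @ features(0.3))
```

Changing `GAMMA` or the ratio `U_MAX / GAIN` and re-running shows how the condition number of the matrix built from the differenced basis moves in this toy setting. It says nothing quantitative about the systems studied in the thesis.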
Similar Theses
- Design and application of half-bridge LLC resonant converter using reinforcement learning control
Pekiştirmeli öğrenme kontrollü yarım köprü LLC rezonans dönüştürücü tasarımı ve uygulaması
MUHAMMET KILIÇTAŞ
Master's
English
2024
Electrical and Electronics Engineering, İstanbul Teknik Üniversitesi, Department of Electrical Engineering
ASSOC. PROF. DR. SALİH BARIŞ ÖZTÜRK
- Derin pekiştirmeli öğrenme yöntemi ile görüntü hash kodlarını oluşturma
Generating image hash codes with deep reinforcement learning method
ELİF AKKAYA
Master's
Turkish
2024
Electrical and Electronics Engineering, Sakarya Üniversitesi, Department of Electrical and Electronics Engineering
ASST. PROF. DR. BURHAN BARAKLI
- A comparative study of nonlinear model predictive control and reinforcement learning for path tracking
Yol izleme için doğrusal olmayan model öngörülü kontrol ve pekiştirmeli öğrenmenin karşılaştırmalı çalışması
GAMZE TÜRKMEN
Master's
English
2022
Computer Engineering and Computer Science and Control, İstanbul Teknik Üniversitesi, Department of Control and Automation Engineering
PROF. DR. OVSANNA SETA ESTRADA
- Derin pekiştirmeli öğrenme ile robot kol tork kontrolü
Robotic arm torque control via deep reinforcement learning
MUHAMMED RAŞİT EVDÜZEN
Master's
Turkish
2021
Electrical and Electronics Engineering, Pamukkale Üniversitesi, Department of Electrical and Electronics Engineering
PROF. DR. SERDAR İPLİKÇİ
- Data-driven prediction and emergency control of transient stability in power systems towards a risk-based optimal power flow operation
Güç sistemlerinde risk tabanlı optimal güç akışı işletimine yönelik geçici hal kararlılığın veri güdümlü tahmini ve acil durum kontrolü
SEVDA JAFARZADEH
Doctorate
English
2022
Electrical and Electronics Engineering, İstanbul Teknik Üniversitesi, Department of Electrical Engineering
PROF. VEYSEL MURAT İSTEMİHAN GENÇ