On the reinforcement learning analysis and learning the control of humanoid robot leg

No title translation available.

  1. Thesis No: 400914
  2. Author: ÖNDER TUTSOY
  3. Advisor: DR. MARTIN BROWN
  4. Thesis Type: Doctorate
  5. Subjects: Electrical and Electronics Engineering
  6. Keywords: Not specified.
  7. Year: 2013
  8. Language: English
  9. University: The University of Manchester
  10. Institute: Foreign Institute
  11. Department: Not specified.
  12. Discipline: Not specified.
  13. Page Count: 171

Abstract

No abstract available.

Abstract (Translation)

Reinforcement learning is a method for learning sequential control actions or decisions from an instantaneous reward signal which implicitly defines a long-term value function. It has been proposed as a way to solve complex learning control problems without requiring explicit knowledge of the system's dynamics. It has also been used as a model of cognitive learning in humans and applied to systems, such as humanoid robots, to study embodied cognition. However, there are relatively few results which describe the actual performance of such learning algorithms, even on relatively simple problems.

In this thesis, simple test problems are used to investigate issues associated with the value function's representation and parametric convergence. In particular, the terminal convergence problem is analyzed with a known optimal (bang-bang) control policy, where the aim is to accurately learn the value function. For certain initial conditions, the closed-form solution for the value function is calculated and shown to have a polynomial form. It is parameterized by terms which are functions of the unknown plant's parameters and the value function's discount factor, and their convergence properties are analyzed. It is shown that the temporal difference error introduces a null space associated with the finite-horizon basis functions during the experiment; this is only non-singular when the experiment is terminated correctly, and a number of (equivalent) solutions are described. It is also demonstrated that, in general, the test problem's dynamics are chaotic for random initial states, and that this causes a digital offset in the value function. Methods for estimating the offset are described, and a dead-zone is proposed to switch off learning in the chaotic region.

Another value function estimation test problem is then proposed which uses a saturated piecewise linear control signal. This is a more realistic control scenario, and it is also shown to address the chaotic dynamics problem. It is shown that the conditioning of the learning problem depends on both the saturation threshold and the value function's discount factor, and that a badly conditioned learning problem may result. Moreover, it is proved that the temporal difference error introduces a trajectory null space associated with the differenced higher-order bases up to the saturation threshold of the control signal. These results are then used to explain the behaviour of reinforcement learning algorithms on higher-order systems, and the impact of function approximation algorithms and exploration noise is discussed. Finally, a central pattern generator based reinforcement learning algorithm is applied to a single leg of a humanoid robot, where the target is to generate appropriate control signals for each joint.
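The value-function estimation setting described in the abstract can be sketched in a few lines. The following is an illustrative Python sketch, not code from the thesis: the first-order plant, its parameters `a` and `b`, the quadratic reward, the basis order, and the learning rate are all invented for illustration. It learns a value function under a known bang-bang policy with a polynomial basis and a TD(0) semi-gradient update.

```python
import numpy as np

# Hypothetical first-order plant x_{k+1} = a*x_k + b*u_k (parameters invented).
a, b = 0.9, 0.1
gamma = 0.95             # value function discount factor

def policy(x):
    # known optimal bang-bang policy: control saturated at +/-1
    return -np.sign(x)

def reward(x, u):
    # instantaneous quadratic cost expressed as a (negative) reward
    return -(x ** 2)

def phi(x, order=4):
    # polynomial basis for the value function approximation
    return np.array([x ** i for i in range(order + 1)])

w = np.zeros(5)          # value function weights
alpha = 0.05             # learning rate
rng = np.random.default_rng(0)

for episode in range(200):
    x = rng.uniform(-1, 1)
    for k in range(50):
        u = policy(x)
        x_next = a * x + b * u
        # temporal difference error: r + gamma*V(x') - V(x)
        delta = reward(x, u) + gamma * phi(x_next) @ w - phi(x) @ w
        w += alpha * delta * phi(x)   # semi-gradient TD(0) update
        x = x_next
```

Since the rewards are never positive, the learned value estimate at any visited non-zero state (e.g. `phi(0.5) @ w`) comes out negative; near the origin the bang-bang sign switching produces the irregular terminal dynamics that the thesis analyzes.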
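The saturated piecewise linear control signal can be pictured as linear state feedback clipped at a saturation threshold. A minimal sketch under assumed values (the plant, gain `K`, and threshold `u_max` are illustrative, not taken from the thesis): far from the origin the control saturates and behaves like the bang-bang policy, while inside the saturation band the closed loop is linear, so the state decays smoothly instead of chattering around the origin.

```python
import numpy as np

def saturated_control(x, K=2.0, u_max=1.0):
    """Piecewise linear feedback -K*x, saturated at +/- u_max."""
    return float(np.clip(-K * x, -u_max, u_max))

# Illustrative first-order plant x_{k+1} = a*x_k + b*u_k (parameters invented).
a, b = 0.9, 0.1
x = 0.8
trajectory = [x]
for _ in range(100):
    x = a * x + b * saturated_control(x)
    trajectory.append(x)
# Inside the band |x| <= u_max/K the closed loop is x_{k+1} = (a - b*K)*x_k,
# a stable linear recursion, so the trajectory converges to the origin.
```

The band edge `u_max / K` is where the two regimes meet; the abstract notes that the conditioning of the learning problem depends on this saturation threshold together with the discount factor.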

Similar Theses

  1. Design and application of half-bridge LLC resonant converter using reinforcement learning control

    MUHAMMET KILIÇTAŞ

    Master's

    English

    2024

    Electrical and Electronics Engineering, İstanbul Teknik Üniversitesi

    Department of Electrical Engineering

    ASSOC. PROF. DR. SALİH BARIŞ ÖZTÜRK

  2. Generating image hash codes with deep reinforcement learning method

    ELİF AKKAYA

    Master's

    Turkish

    2024

    Electrical and Electronics Engineering, Sakarya Üniversitesi

    Department of Electrical and Electronics Engineering

    ASST. PROF. DR. BURHAN BARAKLI

  3. A comparative study of nonlinear model predictive control and reinforcement learning for path tracking

    GAMZE TÜRKMEN

    Master's

    English

    2022

    Computer Engineering Sciences - Computer and Control, İstanbul Teknik Üniversitesi

    Department of Control and Automation Engineering

    PROF. DR. OVSANNA SETA ESTRADA

  4. Robotic arm torque control via deep reinforcement learning

    MUHAMMED RAŞİT EVDÜZEN

    Master's

    Turkish

    2021

    Electrical and Electronics Engineering, Pamukkale Üniversitesi

    Department of Electrical and Electronics Engineering

    PROF. DR. SERDAR İPLİKÇİ

  5. Data-driven prediction and emergency control of transient stability in power systems towards a risk-based optimal power flow operation

    SEVDA JAFARZADEH

    Doctorate

    English

    2022

    Electrical and Electronics Engineering, İstanbul Teknik Üniversitesi

    Department of Electrical Engineering

    PROF. VEYSEL MURAT İSTEMİHAN GENÇ