Deep reinforcement learning approach in control of Stewart platform - simulation and control

Stewart platformunun kontrolünde derin pekiştirmeli öğrenme yaklaşımı - simülasyon ve kontrol


Thesis No: 803654
Author: HADI YADAVARI
Advisors: Assoc. Prof. Dr. SERHAT İKİZOĞLU, Asst. Prof. Dr. VAHİT BARIŞ TAVAKOL
Thesis Type: Doctorate
Subjects: Mechatronics Engineering
Keywords: Not specified.
Year: 2023
Language: English
University: İstanbul Teknik Üniversitesi
Institute: Graduate School (Lisansüstü Eğitim Enstitüsü)
Department: Department of Mechatronics Engineering
Discipline: Mechatronics Engineering
Page Count: 131

Abstract

This work, as its title suggests, approaches the control task of the Stewart platform with reinforcement learning methods while presenting a new simulation environment. The Stewart platform is a fully parallel robot with a wide range of applications, from flight and driving simulators to structural test platforms. Precise control is essential to achieving the desired performance in these applications, and it is difficult to achieve. One of the fundamental aims of artificial intelligence is to solve complex problems using high-dimensional measurement information. Reinforcement learning is a specific area of Machine Learning (ML) in which an agent interacts with its surrounding environment according to some policy in order to maximize the sum of future rewards as an objective function. The agent's learning process is based on a reward-penalty scheme according to the quality of the action selected from the policy space. This work focuses on learning to control the complex model of the Stewart platform using state-of-the-art deep reinforcement learning (DRL) algorithms. One may ask why a simulation environment is needed. Reinforcement learning requires a large number of interactions with an environment to learn an optimal policy. Experiments with real robots are expensive, time-consuming, hard to replicate, and even dangerous. To use reinforcement learning (RL) algorithms safely in real-time applications, a reliable simulation environment that accounts for all the nonlinearities and uncertainties of the agent's environment is indispensable. An agent can thus be trained in simulation through a sufficient number of trials without concern for real hardware issues. The controller parameters learned in this environment can then be transferred to a physical system in real time.
In this context, this work presents a carefully designed simulation environment in order to increase the reliability of learning performance and to provide a test bed that can fully imitate the system's behavior. The Gazebo simulator, an open-source simulator based on Open Dynamics Engine (ODE) or Bullet physics, was chosen for the simulation environment. Integrating Gazebo with the Robot Operating System (ROS) can pave the way for efficient complex robotic applications, owing to its ability to simulate different environments involving multi-agent robots. Although some Computer-Aided Design (CAD)-based simulations of the Stewart platform exist, ROS and Gazebo were chosen to take advantage of current reinforcement learning algorithms with high efficiency and performance that are compatible with recently developed RL implementations. However, although ROS includes many robotic simulations, it lacks parallel applications and closed-linkage structures like those used in the Stewart platform. Therefore, a parametric representation of the Stewart platform's kinematics was first designed in Gazebo and ROS, and the design was integrated with a Python class to generate the structures appropriately. After the simulation environment was designed, three DRL algorithms, namely Asynchronous Advantage Actor-Critic (A3C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO), were applied as the first control strategy to continuously learn and tune the gain parameters of the PID controller. The three algorithms proposed for our experiments are applicable to continuous state and action spaces; that is, they can be used when the state and action variables are not discrete but can take a range of values. Such cases arise in control systems such as the Stewart platform, where the state variables may include the positions, velocities, and accelerations of the platform components.
In addition, the action variables may include forces or torques applied to the platform. In general, algorithms that apply to continuous state and action spaces, such as those used for the problem addressed in this work, are better suited to control problems than algorithms that apply only to discrete spaces (e.g., Q-learning). This is because many control problems involve continuously varying state variables and actions, and it can be difficult to model these problems accurately using discrete spaces. With algorithms designed for continuous spaces, it is often possible to obtain more accurate and reliable control solutions. For the PID gain tuning problem, an RL agent loop was added to one of the standard control methodologies of the Stewart platform, and the output of the neural network function of the control policy was modified for all three DRL algorithms in order to find the required range of PID gains. Simulation results showed that the DRL algorithms can successfully learn the PID controller gains, and satisfactory control performance was obtained. In the dynamic modeling part, the classical feedforward inverse dynamics block was replaced with an RL agent to apply the required actions, which can be defined as the forces of the platform legs. Here, a reinforcement learning control topology is presented to take advantage of existing classical control topologies, inverse kinematic modeling, and the forward dynamics of the system. RL is used here in a hybrid mode that helps improve the controller's performance. As reinforcement learning algorithms suited to the presented topology, three model-free DRL algorithms were tried first. With these algorithms, force actions were applied directly to the motors driving the six legs of the platform, in addition to the force outputs of the PID controller.
Next, two model-based RL algorithms, PILCO (probabilistic inference for learning control) and MBPO (model-based policy optimization), were tried in order to first learn the dynamic model of the system and then use it to control the Stewart platform in the manner of feedforward control. Both algorithms, like the model-free algorithms mentioned earlier, apply to the continuous state-action spaces of the Stewart platform. As a result of adding an RL-learned feedforward loop to the controller, a substantial increase in controller performance and system stability was observed.

Abstract (Translation)

As its title indicates, this work approaches the control of the Stewart platform with reinforcement learning methods and presents a new simulation environment. The Stewart platform is a fully parallel robot with a broad range of applications, from flight and driving simulators to structural test platforms. Precise control of the platform is essential for delivering the desired performance in these applications, and it is challenging. A fundamental aim of artificial intelligence is to solve complex problems using high-dimensional sensory information. Reinforcement learning (RL) is an area of machine learning (ML) in which an agent interacts with its environment according to a policy in order to maximize the sum of future rewards as an objective function. The agent's learning process is based on a reward-penalty scheme that reflects the quality of the action selected from the policy space. In this way, RL can be applied to many problems and tasks. The primary focus of this work is learning to control a sophisticated model of the Stewart platform using state-of-the-art deep reinforcement learning (DRL) algorithms and model-based reinforcement learning algorithms. Why is a simulation environment needed? Learning an optimal policy with reinforcement learning requires a large number of interactions with the environment. Experiments with real robots are expensive, time-consuming, hard to replicate, and even dangerous. To apply RL algorithms safely in real-time applications, a reliable simulation environment that accounts for all the nonlinearities and uncertainties of the agent's environment is indispensable. An agent can then be trained in simulation through sufficient trials without concern for hardware issues.
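The objective described above, the sum of future rewards, can be illustrated with a minimal Python sketch. The discount factor `gamma` and the example reward values are assumptions added for illustration; the abstract does not state whether or how rewards are discounted.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum over k of gamma**k * rewards[k].

    gamma is an assumed discount factor; gamma=1.0 gives the plain
    sum of future rewards mentioned in the abstract.
    """
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequence from one short episode.
episode_rewards = [1.0, 0.0, -0.5, 2.0]
undiscounted = discounted_return(episode_rewards, gamma=1.0)
discounted = discounted_return(episode_rewards, gamma=0.5)
print(undiscounted, discounted)
```

An RL agent searches for the policy whose expected value of this quantity is largest; a smaller `gamma` makes the agent weigh near-term rewards more heavily.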
Once the controller parameters have been learned accurately in simulation, they can be transferred to a physical real-time system. To improve the reliability of learning performance and to provide a comprehensive test bed that replicates the system's behavior, we introduce a carefully designed simulation environment. For this environment we chose the Gazebo simulator, an open-source simulator that uses either the Open Dynamics Engine (ODE) or Bullet physics. Integrating Gazebo with the Robot Operating System (ROS) can pave the way for efficient, complex robotic applications because it can simulate different environments involving multi-agent robots. Although some Computer-Aided Design (CAD)-based simulations of the Stewart platform exist, we chose ROS and Gazebo to benefit from the latest high-performance reinforcement learning algorithms and to remain compatible with the most recently developed RL frameworks. However, despite the many robotic simulations available in ROS, it lacks parallel applications and closed-linkage structures such as the Stewart platform. Our first step is therefore to create a parametric representation of the Stewart platform's kinematics in Gazebo and ROS. This representation is then integrated with a Python class to facilitate the generation of the structures. With the simulation environment in place, we turn to the control experiments. In the first control strategy, we employ three DRL algorithms: asynchronous advantage actor-critic (A3C), Proximal Policy Optimization (PPO), and Deep Deterministic Policy Gradient (DDPG). These algorithms are used to iteratively learn and adjust the gain parameters of the PID controller.
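To make the idea of a parametric kinematic representation wrapped in a Python class concrete, here is a minimal sketch. It is not the thesis's class: the name `StewartGeometry`, its parameters, the joint layout (three joint pairs spaced 120 degrees apart), and the roll-pitch-yaw convention are all assumptions chosen for illustration. It computes the standard inverse kinematics, where each leg length is the distance between a base joint and the corresponding platform joint after the platform pose is applied.

```python
import math


def rotation_matrix(roll, pitch, yaw):
    """Rotation R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


class StewartGeometry:
    """Hypothetical parametric description of a six-leg Stewart platform."""

    def __init__(self, base_radius, platform_radius,
                 base_offset_deg, platform_offset_deg):
        self.base_joints = self._joint_ring(base_radius, base_offset_deg)
        self.platform_joints = self._joint_ring(platform_radius,
                                                platform_offset_deg)

    @staticmethod
    def _joint_ring(radius, offset_deg):
        # Assumed layout: three joint pairs 120 degrees apart, each pair
        # split symmetrically by +/- offset_deg around its center angle.
        points = []
        for k in range(3):
            for sign in (-1.0, 1.0):
                angle = math.radians(120.0 * k + sign * offset_deg)
                points.append((radius * math.cos(angle),
                               radius * math.sin(angle),
                               0.0))
        return points

    def leg_lengths(self, position, rpy):
        """Inverse kinematics: length of leg i is |t + R * p_i - b_i|."""
        rot = rotation_matrix(*rpy)
        lengths = []
        for base, plat in zip(self.base_joints, self.platform_joints):
            # Platform joint expressed in the base frame.
            world = [position[i] + sum(rot[i][j] * plat[j] for j in range(3))
                     for i in range(3)]
            lengths.append(math.dist(world, base))
        return lengths


# Example geometry with made-up dimensions (meters, degrees).
geometry = StewartGeometry(base_radius=0.5, platform_radius=0.3,
                           base_offset_deg=10.0, platform_offset_deg=10.0)
neutral = geometry.leg_lengths(position=(0.0, 0.0, 0.4), rpy=(0.0, 0.0, 0.0))
print(neutral)
```

Because the geometry is parameterized, the same class could in principle emit link and joint descriptions for a simulator model; that generation step is outside this sketch.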
The three algorithms are suitable for continuous state and action spaces; that is, they can handle problems in which the state and action variables are not limited to discrete values but range over continuous values. Such problems arise in control systems like the Stewart platform, where the state variables may include the platform's positions, velocities, and accelerations, and the action variables may include forces or torques applied to the platform. In general, algorithms that apply to continuous state and action spaces, such as those we employed, are better suited to control problems than algorithms that apply only to discrete spaces (e.g., Q-learning). This is because many control problems involve continuously varying state variables and actions, which are difficult to model accurately with discrete spaces. Algorithms designed for continuous spaces often yield more accurate and reliable control solutions. In the PID tuning problem, we add an RL agent loop to one of the standard control methodologies of the Stewart platform and modify the output of the policy's neural network for all three DRL algorithms so that it explores the required range of PID gains. The simulation results show that the DRL algorithms learn the PID controller gains effectively and achieve satisfactory control performance. In the dynamic modeling section, we replace the classical feedforward inverse dynamics block with an RL agent that applies the required actions, here the leg forces, for different trajectory states. We present a reinforcement learning control topology that benefits from existing classical control topologies, inverse kinematic modeling, and the forward dynamics of the system. We use RL in a hybrid mode that helps increase the controller's performance.
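One common way to make a policy network's output cover a required range of PID gains is to squash the raw output and rescale it to per-gain bounds. The sketch below shows that idea under stated assumptions: the tanh squashing, the gain bounds in `GAIN_RANGES`, and the simple discrete PID law are illustrative choices, not the scaling or controller reported in the thesis.

```python
import math

# Assumed (hypothetical) bounds for each PID gain.
GAIN_RANGES = {"kp": (0.0, 100.0), "ki": (0.0, 10.0), "kd": (0.0, 5.0)}


def scale_to_range(raw, low, high):
    """Map an unbounded policy output into [low, high] using tanh."""
    return low + 0.5 * (math.tanh(raw) + 1.0) * (high - low)


def gains_from_policy(raw_outputs):
    """Turn three raw policy outputs into bounded kp, ki, kd gains."""
    return {name: scale_to_range(raw, *GAIN_RANGES[name])
            for name, raw in zip(GAIN_RANGES, raw_outputs)}


class PID:
    """Minimal discrete PID controller whose gains are supplied each step."""

    def __init__(self, dt):
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, kp, ki, kd):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative


# One control step: the agent proposes gains, the PID loop applies them.
gains = gains_from_policy([0.0, 0.0, 0.0])  # stand-in for network outputs
controller = PID(dt=0.01)
command = controller.step(error=1.0, **gains)
print(gains, command)
```

Keeping the gains bounded means even an untrained policy produces a controller inside the chosen range, which is one reason to reshape the network output rather than use it directly.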
To fit this topology, we first experiment with three model-free DRL algorithms that send force actions directly to the motors of the six legs, in addition to the force output of the PID controller. We then try two model-based RL algorithms, PILCO (probabilistic inference for learning control) and MBPO (model-based policy optimization), which first learn the dynamic model of the system and then use it to control the Stewart platform in the manner of feedforward control. Like the model-free algorithms used earlier, both apply to the continuous state-action spaces of the Stewart platform. Adding an RL-learned feedforward loop to the controller improves the controller's performance and makes the system more stable.
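The hybrid arrangement described here, in which RL force actions are applied alongside the PID force output, can be sketched as a per-leg sum. The function name `hybrid_leg_forces`, the saturation limit `max_force`, and the numeric values are assumptions for illustration; the abstract does not say whether or how the combined forces are limited.

```python
def hybrid_leg_forces(pid_forces, rl_forces, max_force):
    """Add feedback (PID) and learned feedforward (RL) forces per leg.

    The result is clipped to +/- max_force, an assumed actuator limit.
    """
    if len(pid_forces) != len(rl_forces):
        raise ValueError("expected one PID and one RL force per leg")
    return [max(-max_force, min(max_force, feedback + feedforward))
            for feedback, feedforward in zip(pid_forces, rl_forces)]


# Hypothetical forces for the six legs, in newtons.
pid_output = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rl_output = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(hybrid_leg_forces(pid_output, rl_output, max_force=6.0))
```

In this arrangement the PID term corrects tracking error while the learned term supplies the force a classical inverse dynamics block would otherwise compute.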

Similar Theses

Thesis No: 692874
Title (Turkish): Derin pekiştirmeli öğrenme ile robot kol tork kontrolü
Title (English): Robotic arm torque control via deep reinforcement learning
Author: MUHAMMED RAŞİT EVDÜZEN
Thesis Type: Master's
Language: Turkish
Year: 2021
Subject: Electrical and Electronics Engineering
University: Pamukkale Üniversitesi
Department: Department of Electrical and Electronics Engineering
Advisor: PROF. DR. SERDAR İPLİKÇİ

Thesis No: 781696
Title (Turkish): Bir insansız hava aracının modellenmesi ve derin pekiştirmeli öğrenme tabanlı otonom kontrolü
Title (English): Modeling of an unmanned aerial vehicle and autonomous control based on deep reinforcement learning
Author: BURAK TAŞ
Thesis Type: Master's
Language: Turkish
Year: 2023
Subject: Defense and Defense Technologies
University: Fırat Üniversitesi
Department: Department of Defense Technologies
Advisor: PROF. DR. AYŞEGÜL UÇAR

Thesis No: 582600
Title (English): A comparative study of learning based control policies and conventional controllers on 2D bi-rotor platform with tail assistance
Title (Turkish): Öğrenme temelli kontrolcüler ile geleneksel kontrolcülerin iki boyutta kuyrukla desteklenmiş iki rotorlu uçan robotik platform üzerinde karşılaştırmalı çalışması
Author: HALİL İBRAHİM UĞURLU
Thesis Type: Master's
Language: English
Year: 2019
Subject: Electrical and Electronics Engineering
University: Orta Doğu Teknik Üniversitesi
Department: Department of Electrical and Electronics Engineering
Advisors: Assoc. Prof. Dr. AFŞAR SARANLI, Assoc. Prof. Dr. SİNAN KALKAN

Thesis No: 894523
Title (English): Improving sample efficiency in reinforcement learning control using autoencoders
Title (Turkish): Pekiştirmeli öğrenme kontrolde otokodlayıcılar ile örnekleme verimliliğini arttırma
Author: BURAK ER
Thesis Type: Master's
Language: English
Year: 2023
Subject: Electrical and Electronics Engineering
University: İstanbul Teknik Üniversitesi
Department: Department of Control and Automation Engineering
Advisor: PROF. DR. MUSTAFA DOĞAN

Thesis No: 845261
Title (Turkish): Dijital elektrohidrolik sistemlerin popet tipi hidrolik valflerle pekiştirmeli öğrenme kullanılarak denetlenmesi
Title (English): Control of digital electrohydraulic systems driven by poppet-type hydraulic valves using reinforcement learning
Author: MUSTAFA YAVUZ COŞKUN
Thesis Type: Doctorate
Language: Turkish
Year: 2023
Subject: Mechanical Engineering
University: Karadeniz Teknik Üniversitesi
Department: Department of Mechanical Engineering
Advisor: PROF. DR. MEHMET İTİK
