Q öğrenme algoritması ile kontrolör tasarımı

Controller design via Q learning algorithm


Thesis No: 714608
Author: ERVA HATUN TEKEOĞLU
Advisor: PROF. DR. MÜJDE GÜZELKAYA
Thesis Type: Master's
Subjects: Computer Engineering and Computer Science and Control, Engineering Sciences
Keywords: Not specified.
Year: 2022
Language: Turkish
University: İstanbul Teknik Üniversitesi
Institute: Institute of Science (Fen Bilimleri Enstitüsü)
Department: Control and Automation Engineering
Division: Control and Automation Engineering
Page Count: 93

Abstract

In traditional control theory, a controller is designed on a mathematical model of the system, and that model is assumed to capture all of the system's properties at all times. However, when the system is not fully known and the operating conditions vary, classical techniques are not entirely suitable for controller design. To overcome these problems, many adaptive approaches have emerged from the field of artificial intelligence. This thesis proposes an offline and an online reinforcement learning algorithm for adaptive PID control. Reinforcement learning is a branch of machine learning. It is the process by which an agent in an environment learns to take actions on its own, usually by receiving rewards or penalties. An agent in an environment with finite states and finite actions decides to take a random action from all the actions possible in its current state. The probability that the agent reaches a particular state is affected by the action it took in the previous state. The next state therefore depends on the previous state and on the agent's action. Various reinforcement learning algorithms exist in the literature. The aim of reinforcement learning is to train an agent to complete a task in an uncertain environment. The agent receives observations and a reward from the environment and sends actions to the environment. The reward measures how successful an action is with respect to completing the task objective. Owing to its simplicity and good convergence, Q-learning is the most widely used reinforcement learning algorithm in practice. In Q-learning, a "Q-value" is assigned to each state-action pair. The Q-value reflects the expected long-term reward. In reinforcement learning, no supervisor or expert is present during the learning phase to evaluate the quality of the selected control action. The procedure is therefore evaluated only after a long period of operation.
In this thesis, the Q-learning algorithm is used to improve the performance of a PID controller. A method is proposed that tunes the PID controller parameters with Q-learning so as to optimize the closed-loop system's performance with respect to a given performance criterion. A penalty and reward assignment is made for this purpose. To improve the performance criterion, the desired changes in the system output are made through the control signal. The quantities to be optimized are the error signal and the control signal. A Q-value is defined for the state and reward values associated with each PID parameter. These Q-values are updated step by step by the Q-learning algorithm. In this way the learning procedure determines the best PID parameters. Initially the state, the Q table and the reward table are set to zero; the Q table then reaches its final form by being updated as the controller is run under various conditions. The controller design stage is long, and its length depends on the type of controller designed, the allotted time, and whether the offline or the online design method is used. However, the algorithm can be exited once the parameters stop changing. Using the performance criterion, the effectiveness of the designed controller is quantified by a single value at the end of each step response obtained with different parameter values. In this study, the integral of squared error (ISE) is used as the performance criterion in the offline reinforcement learning method, and the state, that is, the system output value, is used as the performance criterion in the online method.

Abstract (Translation)

In traditional control theory, the controller is designed on a mathematical model of the system, and this model is assumed to capture all of the system's properties at all times. However, classical techniques are not fully suitable for controller design when the system is not fully known and the operating conditions are variable or unknown in advance. If the plant has nonlinear properties that change over time, the task becomes significantly harder. To overcome these disadvantages, many adaptive approaches have emerged, mainly from the field of artificial intelligence. Using reinforcement learning in control system development shifts the main effort from exploring the plant's properties to developing a universal control system that can adapt to those properties and guarantee the required control quality. Reinforcement learning is a branch of machine learning. A reinforcement learning control system can provide a control signal close to the optimal value through trial-and-error exploration. Reinforcement learning is the process by which an agent in an environment learns to take actions on its own, usually by receiving a reward or a penalty. An agent in an environment with finite states and finite actions decides to take a random action out of all the actions possible in its current state. The probability of the agent reaching a particular state is affected by the action it took in the previous state. The next state therefore depends on the previous state and on the agent's action. The purpose of reinforcement learning is to maximize the total reward. To do that, it trains an agent to complete a task in an uncertain environment. The agent receives observations and a reward from the environment and sends actions to the environment. The reward is a measure of how successful an action is in completing the task objective. Reinforcement learning is about making sequential decisions, using experience, to attain a goal over many steps.
The reinforcement learning process involves these simple steps: observing the environment, deciding how to act using some strategy, acting accordingly, receiving a reward or penalty, learning from the experience and refining the strategy, and iterating until an optimal strategy is found. While other types of artificial intelligence perform what might be called perceptive tasks, such as recognizing the content of an image, reinforcement learning performs tactical and strategic tasks. Games are a good proxy for the problems that reinforcement learning can solve, but it is also being applied to real-world processes in the private and public sectors, such as robotics, industrial operations, supply chain and logistics, and traffic control. The Q-learning algorithm is the most common reinforcement learning algorithm in practice because of its simplicity and good convergence. Given the current state, it finds the next best action. While exploring it may choose this action at random, and it aims to maximize the reward. Q-learning is a model-free, off-policy and value-based reinforcement learning algorithm that finds the best action given the current state of the agent. Depending on where the agent is in the environment, it decides the next action to take. The algorithm is model-free because it estimates its optimal policy without needing any transition or reward functions from the environment. It is value-based because it learns by updating its value function through an update equation instead of representing a policy directly; the Bellman equation is used in this thesis. It is off-policy because it learns from its own actions and does not depend on the current policy. The objective is to find the best course of action given the current state. To do this, the agent may come up with rules of its own or act outside the policy it was given to follow, so learning is not tied to the policy being followed, hence the name off-policy.
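For reference, the standard textbook form of the Q-learning update that follows from the Bellman equation can be written as below. The symbols are the conventional ones and are assumed here: s_t and a_t are the current state and action, r_{t+1} is the reward received, s_{t+1} is the next state, \alpha is the learning rate and \gamma is the discount factor. The abstract does not state the exact form or the parameter values used in the thesis.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right],
\qquad 0 < \alpha \le 1, \quad 0 \le \gamma < 1
```

The max over next actions, taken regardless of which action the agent actually executes next, is what makes the update off-policy.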
Model-free means that the agent does not use predictions of the environment's expected response to move forward; it learns from the rewards it collects by trial and error. The most important terms used in Q-learning can be summarized as follows. The agent takes actions. The environment is the world through which the agent moves and which responds to the agent. The state represents the current situation of the agent in the environment. The actions are the set of all possible moves the agent can make. The reward is the feedback by which the success or failure of the agent's actions in a given state is measured. The Q-value is used to determine how good an action taken in a particular state is. In the Q-learning algorithm, a "Q-value" is assigned to each state-action pair. The Q-value reflects the expected long-term reward. In the reinforcement learning method, no supervisor or expert is present during the learning stage to evaluate the quality of the selected control action. The procedure is therefore evaluated only after a long phase of operation. In this study, a method is proposed that tunes a closed-loop system with the Q-learning algorithm in order to maximize or minimize a certain performance criterion. For tuning the PID controller parameters, the quantities to be optimized are the error signal (the controller input) and the control signal (the controller output). A Q-value is defined for each PID parameter based on state and reward values. These Q-values are updated step by step by the Q-learning algorithm. Thus, the learning procedure determines the best PID parameters. Initially the state, the Q table and the reward table are set to zero, and the Q table reaches its final form as it is updated while the controller operates under various conditions.
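A minimal sketch of a tabular Q-learning step may help make the terms above concrete. The table sizes, the learning rate `alpha`, the discount factor `gamma` and the epsilon-greedy action choice are illustrative assumptions; the abstract states only that a Q-value is kept for each state-action pair and that the Q table starts at zero.

```python
import random

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max Q(s', .)."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return row.index(max(row))

# Hypothetical sizes; every Q-value starts at zero.
n_states, n_actions = 4, 3
q_table = [[0.0] * n_actions for _ in range(n_states)]

# One observed transition: in state 0, action 1 earned reward 1.0 and led to state 2.
q_update(q_table, state=0, action=1, reward=1.0, next_state=2)
```

With `epsilon` set to zero the agent always exploits the table; a larger value makes it explore more often.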
We propose an off-line and an on-line reinforcement learning algorithm for PID controller design. In off-line reinforcement learning, the aim is to obtain the minimum ISE: when the algorithm finds the minimum ISE, the reward is maximal. The PID parameters are initialized with small values such as 0.5. The possible PID parameter values are then enumerated as twenty-seven options. The current state, the possible next states, and the possible next states of those next states are calculated. From the parameters obtained, the minimum ISE value is found and the maximum reward is given for that condition. This process repeats until the PID controller parameters that give the minimum ISE are obtained. In on-line reinforcement learning, the PID parameters obtained in off-line reinforcement learning are used as the initial parameters. The goal of on-line reinforcement learning is to minimize the error and drive the state, that is, the system output, to one. The controller and the plant interact every 0.1 seconds. At every time step, the controller receives information about the current plant state and about the possible next plant states that follow from it. After the next states are obtained, their own next states are obtained in order to calculate the Q-values and rewards. From this information, the possible control signals are also computed. The algorithm was developed step by step to find an appropriate method: first P, then PI, and finally PID parameters were calculated. Training is carried out by assigning a penalty or a reward. With this learning method, the desired changes in the system output are made through the control signal in order to improve a performance criterion. The controller design process changes depending on the type of controller designed. In off-line reinforcement learning, the algorithm exits once the agent starts to receive the maximum reward. In the on-line design method, the agent keeps calculating for the given process time even after it starts to receive the maximum reward.
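The off-line search can be illustrated with a rough sketch. It assumes that the twenty-seven options are the 3 x 3 x 3 combinations of lowering, keeping, or raising each of the three gains, which the abstract does not confirm. The first-order plant, the 0.1 gain step and the plain greedy selection (used here in place of the thesis's Q-table and reward bookkeeping) are also hypothetical choices made only to keep the example small.

```python
from itertools import product

def simulate_ise(kp, ki, kd, dt=0.1, steps=100, tau=1.0, setpoint=1.0):
    """Unit-step response of a hypothetical first-order plant under PID; returns the ISE."""
    y, integral, prev_error, ise = 0.0, 0.0, setpoint, 0.0
    for _ in range(steps):
        error = setpoint - y
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        y += dt * (u - y) / tau      # forward-Euler step of tau * dy/dt = u - y
        ise += error * error * dt    # rectangle-rule approximation of the integral of e(t)^2
        prev_error = error
    return ise

def neighbours(gains, step=0.1):
    """All 3**3 = 27 combinations of lowering, keeping, or raising each gain."""
    return [tuple(max(0.0, g + d * step) for g, d in zip(gains, deltas))
            for deltas in product((-1, 0, 1), repeat=3)]

def tune(gains=(0.5, 0.5, 0.5), max_iters=50):
    """Greedy descent on ISE: stop once no neighbour improves on the current gains."""
    best = simulate_ise(*gains)
    for _ in range(max_iters):
        candidate = min(neighbours(gains), key=lambda g: simulate_ise(*g))
        candidate_ise = simulate_ise(*candidate)
        if candidate_ise >= best:
            break
        gains, best = candidate, candidate_ise
    return gains, best

tuned_gains, tuned_ise = tune()
```

Because the unchanged gains are always among the twenty-seven candidates, the ISE never increases from one iteration to the next, and the loop stops when the parameters no longer change.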
The off-line and on-line PID tuning algorithms proposed in the thesis are compared with each other and with a PID controller whose parameters are found using a genetic algorithm. The ISE value, percent overshoot, settling time and disturbance rejection performance are selected as the comparison criteria.
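Two of these criteria can be computed directly from a sampled step response, as in the sketch below. The 2% settling band, the unit setpoint and the sample values are assumptions made for illustration; the abstract does not say which definitions or data the thesis uses.

```python
def percent_overshoot(response, setpoint=1.0):
    """Peak excursion above the setpoint, as a percentage of the setpoint."""
    return max(0.0, (max(response) - setpoint) / setpoint * 100.0)

def settling_time(response, dt, setpoint=1.0, band=0.02):
    """Earliest time after which the response stays within +/- band of the setpoint."""
    tolerance = band * abs(setpoint)
    last_outside = -1
    for k, y in enumerate(response):
        if abs(y - setpoint) > tolerance:
            last_outside = k
    if last_outside == len(response) - 1:
        return None  # the response never settles inside the recorded window
    return (last_outside + 1) * dt

# Hypothetical samples taken every 0.1 s, matching the interaction period in the abstract.
samples = [0.0, 0.6, 1.2, 1.05, 1.01, 1.0, 1.0]
overshoot = percent_overshoot(samples)
t_settle = settling_time(samples, dt=0.1)
```

Here the response peaks at 1.2, giving an overshoot of about 20%, and the last sample outside the 2% band is the fourth one, so the settling time is about 0.4 s.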

Similar Theses

Thesis No: 349567
Title: Fuzzified Q-learning algorithm in the design of fuzzy PID controller
Translated Title: Bulanık mantık kontrolörün tasarımında kullanılan bulanik Q-öğrenme algoritması
Author: VAHİD TAVAKOL AGHAEI
Thesis Type: Master's
Language: English
Year: 2013
Subject: Electrical and Electronics Engineering
University: İstanbul Teknik Üniversitesi
Division: Control and Automation Engineering
Advisor: PROF. DR. İBRAHİM EKSİN

Thesis No: 739434
Title: Pekiştirmeli öğrenme yöntemi ile optimal dc motor hız kontrolcüsünün tasarlanması
Translated Title: Optimal DC motor speed controller design with reinforcement learning algorithm
Author: BEKİR MURAT AYDIN
Thesis Type: Master's
Language: Turkish
Year: 2022
Subject: Electrical and Electronics Engineering
University: Sakarya Üniversitesi
Department: Electrical and Electronics Engineering
Advisor: ASST. PROF. DR. BURHAN BARAKLI

Thesis No: 779380
Title: Trajectory tracking control of a quadrotor with reinforcement learning
Translated Title: Pekiştirmeli öğrenme ile bir quadrotor'un yörünge takip kontrolü
Author: EREN ÇAKMAK
Thesis Type: Master's
Language: English
Year: 2023
Subject: Electrical and Electronics Engineering
University: İstanbul Teknik Üniversitesi
Department: Control and Automation Engineering
Advisor: PROF. DR. MUSTAFA DOĞAN

Thesis No: 907316
Title: Gerçek zamanlı endüstriyel kontrol sistemleri için makine öğrenmesi temelli yaklaşımlar
Translated Title: Machine learning approaches for real-time industrial control systems
Author: SÜLEYMAN MANTAR
Thesis Type: Doctorate
Language: Turkish
Year: 2024
Subject: Electrical and Electronics Engineering
University: Bursa Uludağ Üniversitesi
Department: Electronics Engineering
Advisor: ASSOC. PROF. DR. ERSEN YILMAZ

Thesis No: 468300
Title: Bilinen ortamlarda, otonom hareketlerin ve yol planlamasının olduğu robotik sistem tasarımı
Translated Title: Design of robotic system comprising autonomous movements and path planning in the known environments
Author: HALİL ÇETİN
Thesis Type: Master's
Language: Turkish
Year: 2017
Subject: Computer Engineering and Computer Science and Control
University: Selçuk Üniversitesi
Department: Electrical-Electronics Engineering
Advisor: ASST. PROF. DR. AKİF DURDU
