Reinforcement learning for stock markets

Hisse senetleri için pekiştirmeli öğrenme

Thesis No: 682735
Author: UĞUR HAZIR
Advisors: ASST. PROF. DR. TANER DANIŞMAN
Thesis Type: Master's
Subjects: Computer Engineering and Computer Science and Control
Keywords: Not specified.
Year: 2021
Language: English
University: Akdeniz Üniversitesi
Institute: Institute of Natural and Applied Sciences
Department: Department of Computer Engineering
Discipline: Not specified.
Page Count: 78

Abstract

“There are many models in the finance sector for deciding which stock to buy or sell and when to act” (Hazır and Danışman 2020). This thesis tests the performance of a Deep Q-Network (DQN) agent on Borsa Istanbul, focusing on the stocks of the BIST30 index. The raw data was collected from the Mynet website using JavaScript functions in the Google Chrome browser and was preprocessed before use. First, the Stochastic oscillator and MACD values are calculated for each stock. These calculations consume the earliest rows of each series, so a small amount of initial data is lost. Second, the resulting data is split into training data and test data. The training period starts on 08.03.2000, which is the first date of the processed data, and ends on 31.12.2014. The test period starts on 01.01.2015 and ends on 21.12.2019. Not every stock trades on every day; a stock may be closed to all trading on a given day. As a result, the number of training and test days differs from stock to stock. The Deep Q-Network method was chosen to build the reinforcement learning agent. A separate DQN is created for each stock to calculate its Q-values. The program is written in Python using the Keras library with TensorFlow as the backend. Each DQN receives the daily parameters together with the Stochastic and MACD parameters as input: Close, Low, High, Volume, Year, EMA12, EMA26, MACD, MACDsignal9, StochasticMax14, StochasticMin14, StochasticK14, StochasticD14, StochasticMax5, StochasticMin5, StochasticK5 and StochasticD5. These 17 input parameters form the input layer of the DQN. The first hidden layer has 540 neurons.
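
The indicator inputs named above can be illustrated with a minimal plain-Python sketch. It is not the thesis code: the EMA seeding, the 3-day smoothing used for %D, and the handling of a flat window are assumptions, since the abstract does not state them. Entries that are `None` correspond to the initial rows that are lost while the indicators warm up.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    Assumption: the series is seeded with its first value; other
    conventions (e.g. seeding with a simple average) are also common.
    """
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(close, fast=12, slow=26, signal=9):
    """Return (EMA12, EMA26, MACD, MACDsignal9) as equal-length lists."""
    ema_fast = ema(close, fast)
    ema_slow = ema(close, slow)
    macd_line = [f - s for f, s in zip(ema_fast, ema_slow)]
    return ema_fast, ema_slow, macd_line, ema(macd_line, signal)


def stochastic(high, low, close, window=14, smooth=3):
    """Return (max, min, %K, %D); entries are None until enough history exists."""
    n = len(close)
    hi, lo, k, d = [None] * n, [None] * n, [None] * n, [None] * n
    for i in range(window - 1, n):
        hi[i] = max(high[i - window + 1 : i + 1])
        lo[i] = min(low[i - window + 1 : i + 1])
        spread = hi[i] - lo[i]
        # Assumption: a flat window (zero spread) maps to the midpoint, 50.
        k[i] = 100.0 * (close[i] - lo[i]) / spread if spread else 50.0
        # %D is a simple average of the last `smooth` %K values (assumed).
        if i >= window - 1 + smooth - 1:
            d[i] = sum(k[i - smooth + 1 : i + 1]) / smooth
    return hi, lo, k, d
```

Calling `stochastic(high, low, close, window=14)` and `stochastic(high, low, close, window=5)` would give the two families of Stochastic inputs (Max, Min, K, D) listed in the abstract.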
The second hidden layer has 180 neurons, the third 64, the fourth 32, and the fifth 8. All hidden layers use the ReLU activation function. The action space of a DQN consists of the wait, buy, and sell actions, so the output layer has 3 neurons. The output layer uses a linear activation function, which passes the value of the neuron through directly without any mathematical operation. Replay memory is used to predict the next action from past experience. For the training period, the training data is given as input and the program is run for 5000 episodes for each DQN. The agent starts with 1000 TL of capital and, to make the environment harsher, pays a 1 TL fee on every buy or sell transaction. The agent buys when it has enough money and receives a buy signal. It sells when it holds the stock and either receives a sell signal or is at a five percent profit. Otherwise, on a wait signal, it keeps holding the stock or, if it holds none, waits to buy. We use the epsilon-greedy strategy to estimate the Q-values. At the beginning the DQN acts more randomly in order to explore. Toward the end it takes fewer random actions because it exploits what it has learned. After 5000 episodes we save the current state and values with the Keras model-saving function. Finally, the test data is given to the program as input to measure the success of the algorithm. In this phase all DQNs are run simultaneously, and the Combined DQN method, which I devised to force the agent to make more moves, is computed at the same time. The Combined DQN method is not a true DQN method. It follows the moves of the DQN that performs best under the given conditions.
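
The layer sizes given above (17 inputs, hidden layers of 540, 180, 64, 32 and 8 neurons, 3 outputs) can be made concrete with a small forward pass. The thesis builds its networks with Keras; the sketch below deliberately uses plain Python only to show the shapes, and it assumes fully connected layers with bias terms and an arbitrary small random initialisation, none of which the abstract specifies.

```python
import random

LAYER_SIZES = [17, 540, 180, 64, 32, 8, 3]  # input, five hidden layers, output


def init_network(sizes, seed=0):
    """Build one (weights, biases) pair per layer with small random weights (assumed init)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Return one Q-value per action: ReLU on hidden layers, linear output."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU; the last layer is left linear
    return x


def count_parameters(sizes):
    """Number of weights and biases in a fully connected network of these sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
```

Under these assumptions the network has 121,055 trainable parameters per stock, and `forward` returns three values that play the role of the Q-values for wait, buy and sell.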
If the Combined DQN agent holds no stock, it attaches itself to a DQN that gives a buy signal (that is, one whose selected action is buy). The Combined DQN agent detaches from the attached DQN agent if that agent gives a sell signal (its selected action is sell) or if the held stock reaches the specified rate of profit or loss. The Combined DQN agent then waits for the next buy signal and attaches itself to the best-performing DQN for the given day. It repeats this cycle of attaching and detaching. The epsilon-greedy strategy is not applied in the test phase, because the agent has completed learning and the Q-values have already been calculated. Overwriting these calculated Q-values is undesirable. In conclusion, it is clear that as the agent learns, the number of bankruptcies falls and the final balance rises. The Combined DQN method is not the best method, but it increases the number of actions and, with it, the risk.
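
The attach and detach rule of the Combined DQN agent can be sketched as a single daily decision function. This is an illustration under stated assumptions, not the author's implementation: the `performance` score used to rank DQNs, the symmetric 5 percent `limit`, and the ticker names in the usage note are all hypothetical, because the abstract only speaks of the "best performing" DQN and a "specified rate" of profit or loss.

```python
BUY, SELL, WAIT = "buy", "sell", "wait"


def combined_dqn_step(attached, signals, performance, position_return, limit=0.05):
    """Return the stock the combined agent is attached to after one day.

    attached        -- ticker currently followed, or None when holding no stock
    signals         -- dict ticker -> action chosen by that stock's DQN today
    performance     -- dict ticker -> score used to rank DQNs (hypothetical metric)
    position_return -- fractional gain/loss of the open position (ignored if detached)
    limit           -- profit/loss rate that forces a detach (assumed symmetric)
    """
    if attached is None:
        # Holding nothing: attach to the best-ranked DQN that signals buy today.
        buyers = [t for t, a in signals.items() if a == BUY]
        if not buyers:
            return None
        return max(buyers, key=lambda t: performance[t])
    # Holding a stock: detach on a sell signal or when the limit is reached.
    if signals.get(attached) == SELL or abs(position_return) >= limit:
        return None
    return attached
```

For example, with `signals = {"A": "buy", "B": "buy", "C": "wait"}` and `performance = {"A": 1.0, "B": 2.0, "C": 3.0}`, an unattached agent attaches to `"B"`: `"C"` ranks higher but gives no buy signal.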

Abstract (Translation)

“There are many models at Finance sector to decide which stock to buy or sell and when to act” (Hazır and Danışman 2020, 1). This thesis tests the performance of a Deep Q-Network (DQN) agent on Borsa Istanbul (Istanbul Stock Exchange). We focus on the stocks of the BIST30 index, the top 30 index of Borsa Istanbul. Raw data was gathered from the Mynet website with JavaScript functions run in a Google Chrome browser and was preprocessed before use. First, the Stochastic oscillator and MACD values are calculated for each stock. A small amount of initial data is consumed by these calculations and lost. Second, the data for each stock is split into training data and test data. The training period starts on 08.03.2000, which is the beginning of the processed data, and ends on 31.12.2014. The test period starts on 01.01.2015 and ends on 21.12.2019. Not every stock is open for trading on every day; a stock may be suspended from all trading on a given day. As a result, the number of training and test days is not the same for every stock. The Deep Q-Network methodology was chosen to create a reinforcement learning agent. For each stock a DQN is created and run to calculate the Q-values. The program is written in Python using the Keras library, which uses TensorFlow as its backend. We give the daily parameters and the Stochastic and MACD values as input parameters to our DQNs: Close, Low, High, Volume, Year, EMA12, EMA26, MACD, MACDsignal9, StochasticMax14, StochasticMin14, StochasticK14, StochasticD14, StochasticMax5, StochasticMin5, StochasticK5 and StochasticD5. These 17 input parameters form the input layer of each DQN. The first hidden layer has 540 neurons, the second 180, the third 64, the fourth 32, and the fifth 8. All hidden layers use the ReLU activation function.
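
The date-based split described above can be sketched in a few lines of standard-library Python. The sketch assumes the dates in the abstract are written as day.month.year and that each stock's data is a list of `(day, features)` rows, which is a hypothetical layout chosen for illustration. Days on which a stock did not trade are simply absent from its rows, which is why the two sets differ in length from stock to stock.

```python
from datetime import date, datetime

TRAIN_START = date(2000, 3, 8)
TRAIN_END = date(2014, 12, 31)
TEST_START = date(2015, 1, 1)
TEST_END = date(2019, 12, 21)


def parse_day(text):
    """Parse a dd.mm.yyyy string, the date format used in the abstract."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def split_rows(rows):
    """Split (day, features) rows of one stock into training and test lists."""
    train = [r for r in rows if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in rows if TEST_START <= r[0] <= TEST_END]
    return train, test
```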
The action space of a DQN is wait, buy, and sell, so the output layer has 3 neurons. The output layer uses a linear activation function, which means the value of the neuron is used directly without any mathematical operation. Replay memory is used to predict the next action from previous experience. For the training phase we give the training data as input and run the program for 5000 episodes for each DQN. The agent starts with 1000 TL of initial money, and every transaction costs 1 TL to create a harsh environment. The agent buys if it has enough money and receives a buy signal. While it holds stock, it sells if it has a 5 percent profit or receives a sell signal. Otherwise it waits, or keeps holding, when wait is selected as the next action. We apply an epsilon-greedy strategy to estimate the Q-values. At the beginning the DQN acts more randomly in order to explore. Toward the end it takes fewer random actions because it exploits what it has learned. After 5000 episodes we save the states and values of the DQN by saving the Keras model. Finally, to measure the success of the algorithm, we load the last saved state of the DQNs and give the test data as input. In this phase we run all DQNs simultaneously, and at the same time the Combined DQN method (Hazır and Danışman 2020, 5) is calculated, a process that I invented to force the agent to make more actions. It is not a real DQN; it follows the actions of the best-performing DQN during execution under given conditions. If the Combined DQN agent holds no stock, it attaches itself to the best-performing DQN that has a buy signal (that is, produces a buy action). The Combined DQN agent detaches itself if the attached DQN gives a sell signal (produces a sell action) or the stock reaches the specified rate of loss or profit. The Combined DQN agent then waits for the next buy action, attaches itself to the best-performing DQN for the given day, and repeats this process of attaching and detaching.
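
The single-stock trading rules stated above (1 TL fee per transaction, buy only with enough money, sell on a sell signal or at 5 percent profit) can be sketched as one state-update function. Several details are assumptions made for illustration: the agent buys as many whole shares as it can afford, sells its whole position at once, and measures the 5 percent profit on price alone, without the fee.

```python
FEE = 1.0            # TL charged on every buy or sell
TAKE_PROFIT = 0.05   # sell once the position is 5 percent in profit


def apply_action(cash, shares, entry_price, price, action):
    """Apply one day's action and return the new (cash, shares, entry_price).

    Assumptions: the agent buys as many whole shares as it can afford after
    the fee, and sells its entire position at once.
    """
    if shares == 0:
        affordable = int((cash - FEE) // price) if cash > FEE else 0
        if action == "buy" and affordable > 0:
            return cash - affordable * price - FEE, affordable, price
        return cash, shares, entry_price  # wait, or not enough money
    in_profit = price >= entry_price * (1 + TAKE_PROFIT)
    if action == "sell" or in_profit:
        return cash + shares * price - FEE, 0, None
    return cash, shares, entry_price  # keep holding
```

Starting from 1000 TL with a price of 10 TL, a buy action leaves 9 TL in cash and 99 shares; if the price later reaches 11 TL, the position is closed even on a wait action because the profit threshold has been passed.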
The epsilon-greedy strategy is not applied in the testing phase because the agent has already learned and calculated the Q-values, and overwriting those values is undesirable. In conclusion, it can be clearly seen that as the agent learns, the number of bankruptcies drops and the end balance increases. The Combined DQN method is not the best method, but it increases the number of operations and also the risk.
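
The difference between action selection during training and during testing can be sketched as follows. The linear decay schedule, its start and end values, and the index order of the actions are assumptions; the abstract only states that the agent acts more randomly at the beginning and less randomly toward the end, and that no exploration is used in testing.

```python
import random

ACTIONS = ("wait", "buy", "sell")  # index order is an assumption


def epsilon_for_episode(episode, total=5000, start=1.0, end=0.01):
    """Linearly decay epsilon from start to end over the training episodes (assumed schedule)."""
    fraction = min(episode / max(total - 1, 1), 1.0)
    return start + (end - start) * fraction


def select_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

During training one would call `select_action(q, epsilon_for_episode(episode))`; in the test phase `select_action(q, 0.0)` always returns the action with the highest Q-value, so the saved model is only read and never explored against.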

Similar Theses

Thesis No: 721325
Title: Deep reinforcement learning approach for trading automation in the stock market
Original Title: Hisse senetlerinde işlem otomasyonu için derin güçlendirme öğrenme yaklaşımı
Author: TAYLAN KABBANİ
Thesis Type: Master's
Language: English
Year: 2021
Subjects: Computer Engineering and Computer Science and Control
University: Özyeğin Üniversitesi
Department: Department of Data Science
Advisors: PROF. DR. EKREM DUMAN

Thesis No: 895426
Title: Predicting stock prices in bist: A reinforcement learning and sentimental analysis approach
Original Title: Pekiştirmeli derin öğrenme ve duyarlılık analizi yaklaşımı ile bıstteki hisselerin fiyatlarının tahmin edilmesi
Author: ŞEYMA EĞE
Thesis Type: Master's
Language: English
Year: 2024
Subjects: Industrial and Industrial Engineering
University: İstanbul Teknik Üniversitesi
Department: Department of Big Data and Data Analytics
Advisors: ASST. PROF. DR. MEHMET ALİ ERGÜN

Thesis No: 721309
Title: Equity portfolio optimization using reinforcement learning: An emerging market case
Original Title: Pekiştirmeli öğrenme ile hisse senedi portföyü optimizasyonu: Gelişmekte olan piyasa örneği
Author: MERT CANDAR
Thesis Type: Master's
Language: English
Year: 2022
Subjects: Industrial and Industrial Engineering
University: İstanbul Teknik Üniversitesi
Department: Department of Industrial Engineering
Advisors: PROF. DR. ALP ÜSTÜNDAĞ

Thesis No: 255594
Title: A mathematical contribution of statistical learning and continuous optimization using infinite and semi-infinite programming to computational statistics
Original Title: İstatistiksel öğrenme ve sürekli optimizasyon yöntemlerinin sonsuz ve yarı sonsuz programlama kullanılarak hesaplamalı istatistiğe uygulanması
Author: SÜREYYA ÖZÖĞÜR AKYÜZ
Thesis Type: Doctorate
Language: English
Year: 2009
Subjects: Computer Engineering and Computer Science and Control
University: Orta Doğu Teknik Üniversitesi
Department: Department of Scientific Computing
Advisors: PROF. DR. GERHARD WİLHELM WEBER, PROF. DR. JOHN SHAWE TAYLOR

Thesis No: 856607
Title: Optimizing deep reinforcement learning models in stock trading through hyperparameter tuning
Original Title: Hiperparametre ayarlama ile hisse senedi ticaretinde derin pekiştirmeli öğrenme modellerini optimize etme
Author: ÖMER FIRAT
Thesis Type: Master's
Language: English
Year: 2024
Subjects: Computer Engineering and Computer Science and Control
University: Bahçeşehir Üniversitesi
Department: Department of Computer Engineering
Advisors: ASST. PROF. DR. TARKAN AYDIN
