Yeni Zelanda GPS zaman serileri verisinin Bayesci istatistik ile incelenmesi
Investigation of the New Zealand GPS time series data with Bayesian statistics
- Thesis No: 800569
- Advisors: PROF. DR. GÜRSEL SUNAL, PROF. DR. MEHMET SİNAN ÖZEREN
- Thesis Type: Master's
- Subjects: Geological Engineering
- Keywords: Not specified.
- Year: 2023
- Language: Turkish
- University: Istanbul Technical University
- Institute: Graduate School
- Department: Department of Geological Engineering
- Division: Division of Geological Engineering
- Number of Pages: 73
Abstract
Over the years, the field of statistics has produced many methods for processing data and understanding stochastic processes. These methods were difficult to apply to large data sets, but approximate computational techniques made possible by advances in technology now allow them to be used on large data as well. In this study, a GPS (Global Positioning System) time series data set from New Zealand was examined with computer-based Bayesian statistical methods. The study uses a GPS ground motion data set obtained from New Zealand's Institute of Geological and Nuclear Sciences (GNS Science), covering 2011-2021, with daily measurements at the micrometer (micron; 10^-6 m) level from 146 GPS stations. Missing data points in this data set were filled in with the Expectation Maximization algorithm. Because the trend component present in the time series produces high correlation and complicates the analysis, the time series were de-trended. The aim was to perform linear regression. However, because the size of the data set gives 2^80 possible regression models, the problem cannot be solved with classical regression analysis. To identify the most plausible regression model, Stochastic Search Variable Selection was performed, using a computer-based Gibbs sampling algorithm built on Markov chains. The Markov chain was designed to run for 30,000 iterations. A distribution known as the "Spike-and-Slab Prior", a concept from Bayesian statistics, was adopted as the prior distribution for the regression coefficients of the candidate predictor variables. All posterior distributions in the generated Markov chain are derived under this prior. The "BoomSpikeSlab" package, written in the R programming language, implements all of these posterior distributions and was used in this study. The ultimate goal of the study is to draw conclusions about New Zealand's tectonic units using GPS time series.
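The de-trending step mentioned above can be illustrated with a minimal Python sketch. It assumes evenly spaced daily samples and a straight-line trend removed by ordinary least squares; the abstract does not state which de-trending procedure the thesis used, and the function name `detrend_linear` and the sample series are hypothetical.

```python
def detrend_linear(values):
    """Remove the least-squares straight line from an evenly sampled series.

    Assumption: samples are one day apart, so the time axis is 0, 1, 2, ...
    Returns the fitted slope and the de-trended residuals.
    """
    n = len(values)
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in enumerate(values)]
    return slope, residuals


# Hypothetical east-component positions: a steady drift plus a small wiggle.
series = [0.5 * day + (1.0 if day % 2 == 0 else -1.0) for day in range(10)]
slope, residuals = detrend_linear(series)
print(f"fitted slope: {slope:.3f}")
```

After this step the residuals sum to zero and no longer share the common drift, which is what otherwise makes all station series look strongly correlated.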
For this purpose, a cluster analysis was carried out, and an original method is proposed for it. For each response variable, the method takes the rate at which each predictor variable is included in the 30,000 regression models visited by the Markov chain and uses these inclusion rates as input to the k-means clustering algorithm. Here k denotes the number of clusters; the Elbow Method for k-means clustering was used to choose it, and the ideal number of clusters was judged to be 3. The cluster analysis showed that the techniques applied in this study allow meaningful interpretations of New Zealand's tectonic units and of the slow earthquake phenomenon.
Abstract (Translation)
Over the years, many methods for processing data and understanding stochastic processes have been developed in statistics. Linear regression analysis, one of these methods, was chosen for this study. In the past, regression methods were impractical to apply to large data sets without computers. Advances in technology have introduced approximate techniques that make such methods, regression among them, usable on big data. Regression models are widely used today in artificial intelligence and machine learning. At its core, regression fits curves to data points of dependent and independent variables. One problem is deciding which of the candidate curves best represents the data set. Another is that selecting among candidate regression models becomes difficult when the data set is large. In this study, the size of the data set gives 2^80 candidate regression models. This number is roughly 1.5 times the diameter of the observable universe expressed in kilometers. The problem of choosing the most realistic model among so many candidates can be addressed with algorithms built on Markov chains, which rest on conditional probability distributions and were introduced by the Russian mathematician Markov. Markov showed that the probability recovered from Bernoulli's independent experiments can also be recovered from dependent experiments. Bernoulli had shown that the probability of an event is approximated by the frequency observed in a series of independent experiments.
In other words, for a box containing 1000 red and white marbles, Bernoulli's result says that the probability of drawing a red marble is approximated by the fraction of red draws in a series of independent experiments, and that this fraction converges to the true value as the number of experiments grows. The key assumption in these experiments is that each draw must be independent of the previous one. Markov then put forward his revolutionary idea that such experiments could yield the same result without being independent of each other, and he designed clever experiments to support it. He argued that the desired probability can be obtained from a series of interdependent probability experiments (chains of probability experiments). Even if we cannot always observe it, all events in the universe are more or less dependent on each other. Events that we would never anticipate affect us or other events, although these effects are often small enough to be ignored. An example is the "Butterfly Effect", popularized by the science writer James Gleick. In geology, for example, an earthquake can be triggered by a very small effect just before it starts. With the methods developed from it, this remarkable idea from probability theory makes it possible to establish connections between events. This study aims to reveal the connections between the ground movements recorded at the stations in a GPS (Global Positioning System) time series data set from New Zealand. The study uses a GPS ground motion data set with 146 GPS stations, obtained from New Zealand's Institute of Geological and Nuclear Sciences (GNS Science), covering 2011-2021, with daily measurements at the micrometer (10^-6 m) level. Missing data points in this data set were filled in with the Expectation Maximization algorithm.
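The contrast drawn above between Bernoulli's independent draws and Markov's dependent draws can be sketched with a small Python simulation. This is only an illustration: the two-state chain, its transition probabilities `a` and `b`, and the number of draws are arbitrary assumptions chosen so that the long-run probability of red is 0.6 in both settings.

```python
import random


def independent_red_fraction(p_red, n_draws, rng):
    """Bernoulli setting: every draw is independent of the previous ones."""
    reds = sum(1 for _ in range(n_draws) if rng.random() < p_red)
    return reds / n_draws


def markov_red_fraction(p_red_after_red, p_red_after_white, n_draws, rng):
    """Markov setting: the chance of red depends on the previous draw."""
    is_red = True  # arbitrary starting state
    reds = 0
    for _ in range(n_draws):
        p = p_red_after_red if is_red else p_red_after_white
        is_red = rng.random() < p
        reds += is_red
    return reds / n_draws


rng = random.Random(1)
a, b = 0.8, 0.3                   # P(red | previous red), P(red | previous white)
stationary_red = b / (1 - a + b)  # long-run share of red for this chain, about 0.6
freq_independent = independent_red_fraction(stationary_red, 100_000, rng)
freq_markov = markov_red_fraction(a, b, 100_000, rng)
print(f"independent draws: {freq_independent:.3f}, dependent draws: {freq_markov:.3f}")
```

Both observed fractions settle near the same value even though successive draws in the second experiment are correlated, which is the property that Markov chain sampling methods rely on.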
In statistics, the Expectation Maximization algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of the parameters of statistical models that depend on unobserved latent variables. With this method, the missing data are filled in as plausibly as possible so that the analysis can continue. After the missing data points were filled in, the time series of all stations were plotted for the 3 measurement directions. The plots showed that the east-component time series contain more distinct ground movements, so only the east-component data were used in the rest of the analysis. Because the trend component present in the time series causes high correlation and complicates the analysis, the time series were de-trended. The aim was to perform linear regression. However, the problem cannot be solved by classical regression analysis, because the size of the data set gives 2^80 possible regression models. To identify the most plausible regression model, Stochastic Search Variable Selection was performed, using a computer-based Gibbs sampling algorithm built on Markov chains. The Markov chain was designed to run for 30,000 iterations. A distribution known as the "Spike-and-Slab Prior", a concept from Bayesian statistics, was adopted as the prior distribution for the regression coefficients of the candidate predictor variables. All posterior distributions in the generated Markov chain are derived under this prior. The "BoomSpikeSlab" package, written in the R programming language, implements all of these posterior distributions and was used in this study. The ultimate goal of the study is to draw conclusions about New Zealand's tectonic units using GPS time series. For this purpose, a cluster analysis was carried out, and an original method is proposed for it.
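The idea of stochastic search variable selection with a spike-and-slab prior can be sketched in a few lines of Python. This is a deliberately simplified, from-scratch illustration and not the BoomSpikeSlab implementation used in the thesis: it assumes a point-mass spike at zero, a normal slab with variance `tau2`, a known noise variance `sigma2`, and synthetic data. All names and parameter values are hypothetical.

```python
import math
import random


def ssvs_gibbs(X, y, n_iter=1000, burn_in=200, sigma2=1.0, tau2=100.0,
               prior_incl=0.5, seed=0):
    """Single-site Gibbs sampler; X is a list of predictor columns.

    Returns, for each predictor, the fraction of post-burn-in iterations
    in which it was included in the regression model.
    """
    rng = random.Random(seed)
    p = len(X)
    beta = [0.0] * p
    resid = list(y)  # y minus the current fit (all coefficients start at 0)
    xtx = [sum(v * v for v in col) for col in X]
    counts = [0] * p
    for it in range(n_iter):
        for j in range(p):
            col = X[j]
            if beta[j] != 0.0:  # remove predictor j from the current fit
                resid = [r + x * beta[j] for r, x in zip(resid, col)]
            xtr = sum(x * r for x, r in zip(col, resid))
            v = 1.0 / (xtx[j] / sigma2 + 1.0 / tau2)  # slab posterior variance
            m = v * xtr / sigma2                      # slab posterior mean
            # Log posterior odds of "slab" (included) versus "spike" (excluded).
            log_odds = (math.log(prior_incl / (1.0 - prior_incl))
                        + 0.5 * math.log(v / tau2) + 0.5 * m * m / v)
            if log_odds > 0:
                prob = 1.0 / (1.0 + math.exp(-log_odds))
            else:
                prob = math.exp(log_odds) / (1.0 + math.exp(log_odds))
            if rng.random() < prob:
                beta[j] = rng.gauss(m, math.sqrt(v))
                resid = [r - x * beta[j] for r, x in zip(resid, col)]
                if it >= burn_in:
                    counts[j] += 1
            else:
                beta[j] = 0.0
    return [c / (n_iter - burn_in) for c in counts]


# Synthetic example: only predictors 0 and 2 truly drive the response.
data_rng = random.Random(42)
n = 150
X = [[data_rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]
y = [2.0 * X[0][i] - 1.5 * X[2][i] + data_rng.gauss(0, 1) for i in range(n)]
incl = ssvs_gibbs(X, y)
print([round(r, 2) for r in incl])
```

The returned inclusion rates play the role of the per-predictor inclusion frequencies described in the abstract: predictors that matter are visited in almost every sampled model, while irrelevant ones are rarely included.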
For each response variable, the method takes the rate at which each predictor variable is included in the 30,000 regression models visited by the Markov chain and uses these inclusion rates as input to the k-means clustering algorithm. The inclusion rate indicates how important a predictor variable is for the response variable, that is, whether the predictor needs to be present in the regression. The k-means algorithm partitions the elements into the specified number of clusters using the coordinates supplied for each element. It first assigns random coordinates to k cluster centers and assigns each element to a cluster according to its distance from these centers. In the next step, the cluster centers are recalculated and every element is reassigned to the nearest center. The process ends when the cluster centers no longer change, and this final state is the result of the cluster analysis. Here k denotes the number of clusters; the Elbow Method for k-means clustering was used to choose it, and the ideal number of clusters was judged to be 3. The elbow plot did not point clearly to 3 clusters, but k=3 was still the most reasonable choice. The number of clusters does not emerge clearly because the effects acting on the ground movements are diverse and the transitions between clusters are not sharp. The cluster analysis showed that the techniques applied in this study allow meaningful interpretations of New Zealand's tectonic units and of the slow earthquake phenomenon. The resulting clusters overlapped with the slow earthquake zones defined in the literature.
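The clustering procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the inclusion-rate vectors are made up, and the initial centers are chosen by farthest-point seeding rather than at random so that the example is reproducible. The within-cluster sum of squares (WCSS) computed for several values of k is the quantity that an elbow plot displays.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns cluster labels and the WCSS."""
    # Assumption: deterministic farthest-point seeding instead of random centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assign every element to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so centers are stable too
            break
        labels = new_labels
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wcss


# Hypothetical inclusion rates: one row per station, one column per predictor.
inclusion_rates = [
    [0.90, 0.10, 0.10], [0.85, 0.15, 0.05], [0.95, 0.05, 0.10],
    [0.10, 0.90, 0.10], [0.05, 0.80, 0.15], [0.15, 0.95, 0.10],
    [0.10, 0.10, 0.90], [0.10, 0.05, 0.85], [0.05, 0.15, 0.95],
]
wcss_by_k = {k: kmeans(inclusion_rates, k)[1] for k in range(1, 6)}
labels3, _ = kmeans(inclusion_rates, 3)
for k, w in wcss_by_k.items():
    print(k, round(w, 4))
```

In this toy data the WCSS drops sharply up to k=3 and only slightly afterwards, giving a clear elbow; with real station data the bend can be much less distinct, as the abstract notes.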
When the k-means analysis was repeated with larger values of k, the newly formed clusters continued to correlate with slow earthquakes. A specifically selected slow earthquake was also analyzed over a shorter period, using a 150-day data window shifted in 5-day steps; the clusters that existed before the slow earthquake became less well defined once the earthquake started and while it continued. This shows that slow earthquakes influence the surface movements.
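The shifted-window analysis can be pictured with a short Python sketch. It assumes that "shifting 5 days" with a "150-day data length" means a 150-day window moved forward in 5-day steps, with the variable selection and clustering repeated inside each window; the function name and the 365-day series length are hypothetical.

```python
def sliding_windows(n_days, window=150, step=5):
    """Return (start, end) day indices, end exclusive, for each full window."""
    return [(start, start + window)
            for start in range(0, n_days - window + 1, step)]


# Hypothetical one-year record of daily positions.
windows = sliding_windows(365)
print(len(windows), windows[0], windows[-1])
```

Comparing the clusters obtained in consecutive windows is one way to see when a grouping that existed before a slow earthquake starts to break down.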
Similar Theses
- Probenesid'in tavşanlarda Sevofluran ile Aminoglikozid'in nefrotoksik etkilerini önlemesindeki yeri
No title translation available
MEHMET NİHAT OKUDUCU
Medical Specialty
Turkish
2002
Anesthesia and Reanimation, Fırat University, Department of Anesthesiology and Reanimation
PROF. DR. ÖMER L. ERHAN
- Genç futbol oyuncularının maç aktivite profillerinin fiziksel performansla ilişkisi
The relationship between match activity profiles and physical performance in young soccer players
EMRE RECEP ALİ DEMİRCİ
Master's
Turkish
2018
Sports, Dokuz Eylül University, Division of Movement and Training
ASST. PROF. MEHMET İSMET TOK
- Yeni Zelanda Tavşanı'nın (Oryctolagus cuniculus L.) baş, boyun, önbacak ve göğüs boşluğunda yer alan lenf düğümleri ve büyük lenf kanallarının makroanatomik ve subgros incelenmesi
Macro-anatomic and subgross investigation of the lymph nodes and large lymph ducts of the head, neck, forelimb and cavum thoracis in New Zealand rabbits
İSMAİL ÖNDER ORHAN
Doctorate
Turkish
1997
Veterinary Medicine, Ankara University, Department of Anatomy
PROF. DR. R. MERİH HAZIROĞLU
- Yeni Zelanda tavşanında (Oryctolagus cuniculus L.) karaciğer, karaciğerin damarları (V. portae, A. hepatica) ve safra kanallarının makro-anatomik ve subgros incelenmesi
Macro-anatomic and subgross investigation of the liver, its vessels (V. portae, A. hepatica) and biliary ducts in the New Zealand rabbit
ÖĞÜT İLKNUR
Doctorate
Turkish
1998
Veterinary Medicine, Ankara University, Department of Anatomy
PROF. DR. SÜLEYMAN TECİRLİOĞLU