Yüzey kurma probleminin transputer tabanlı sistemlerde paralel çözümü
Title translation: Parallel solution of the surface reconstruction problem on transputer-based systems.
- Thesis No: 75161
- Advisor: Asst. Prof. Dr. COŞKUN SÖNMEZ
- Thesis Type: Master's
- Subjects: Computer Engineering and Computer Science and Control
- Keywords: Not specified.
- Year: 1998
- Language: Turkish
- University: İstanbul Teknik Üniversitesi
- Institute: Fen Bilimleri Enstitüsü
- Department: Bilgisayar Mühendisliği Ana Bilim Dalı (Computer Engineering)
- Discipline: Not specified.
- Pages: 80
Abstract
As an alternative to developing ever more capable processors, the idea of solving problems in parallel environments with more than one processor has gained popularity as a way of completing, quickly enough, applications that involve real-time or computationally heavy workloads. With the emergence of low-cost processors whose hardware infrastructure is based on communication, scalable parallel systems can be built by choosing suitable parallel methods. In this work, carried out with one such processor, the INMOS T805, a parallel application was developed for the surface reconstruction problem, an image processing problem.

As in many image processing problems, the surface reconstruction algorithm used here requires a great deal of computation. At the root of the problem lies the long convergence time of the iterative equations that minimize a hybrid energy functional. These iterative equations are applied to every element of the image matrix, and their termination condition is that the total difference between two successive iterations be smaller than a predetermined value. These long-running operations can be parallelized so that they finish sooner on more than one processor. Because the algorithm is applied to each matrix element individually, the matrix can be split into pieces and each piece processed simultaneously on a different processor.

In keeping with this structure, the processor farm approach was applied. Under this approach the application consists of two tasks, called the master task and the worker task. The master task splits the image matrix into pieces, sends the pieces to the other processors, collects the results, places the results back into the image matrix, and checks the termination condition. The worker task applies the algorithm to the packets it receives from the master task and sends the results back. The master task is loaded onto only one transputer, while a copy of the worker task resides on every transputer in the network. One advantage of the processor farm approach is that adding processors to, or removing them from, the system requires no change to the program code and no recompilation; for changes in the network configuration it is enough to make the physical connections.

A multicomputer system was built using 4 INMOS T805 transputers. This processor has a 20 MHz clock, 1 Mbyte of memory, 4 Kbytes of cache memory and 4 fast serial communication links. Each link consists of a pair of unidirectional wires, over which data can be transferred from one transputer to another at 0.8 Mbyte per second. In this work, experiments were carried out with different configurations, communication packets of different sizes, and different numbers of transputers. Performance metrics were used to determine the effectiveness of the parallelization. On the system of 4 transputers, the speedup, the most important of the performance metrics, was raised to 2.45.
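The termination rule described above, the summed difference between two successive iterations dropping below a preset value, can be written as a short check. This is a minimal serial sketch in C, not the thesis code; the names converged and EPSILON, and the threshold value itself, are illustrative assumptions.

```c
#include <math.h>

#define N 128           /* the study uses a 128x128 image matrix            */
#define EPSILON 1e-3    /* illustrative threshold; the thesis value is not  */
                        /* stated in the abstract                           */

/* Returns 1 once the total change between two successive iterations
 * falls below the preset threshold, as the abstract describes. */
static int converged(const double f_new[N][N], const double f_old[N][N])
{
    double total = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += fabs(f_new[i][j] - f_old[i][j]);
    return total < EPSILON;
}
```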
Summary (Translation)
Many of today's advanced research problems need greater computing power at high speeds. Examples include artificial intelligence, image processing, robotics, signal processing, molecular physics, weather forecasting and space sciences. One approach to gaining speed in computing is parallelism: many computers work together, all simultaneously executing some portion of a procedure used to solve a problem. Two basic assumptions [1] have to be made in order to obtain an advantage from parallelism:
- The availability of many low-cost, high-speed computers that can be put together to work in unison.
- The existence of a strategy to partition a problem into smaller problems, such that most of these can be solved simultaneously, and from which the solution to the entire problem can easily be constructed.

Based on Flynn's taxonomy of parallel systems, there are the SISD, SIMD, MISD and MIMD classes of parallel computers [2]. In MIMD (multiple instruction, multiple data) architectures, different instructions are executed simultaneously on different processors acting on different data, with the added requirement that they all contribute to solving a common problem. Transputer-based machines are MIMD machines and also belong to the message-passing architectures. Message passing is the technique processors must use to exchange information with each other without using remote memory access.

Transputer

Transputers are high-performance microprocessors that support parallel processing through on-chip hardware. They can be connected together by their serial links in application-specific ways and can be used as the building blocks of complex parallel processing systems. Multi-transputer systems can be built very simply: the four high-speed links allow transputers to be connected to each other in arrays, trees and many other configurations. The circuitry that drives the links is all on the transputer chip, and only two wires are needed to connect two transputers together. Each individual transputer also supports communication between parallel processes through a system of internal links implemented as words in memory. Processes waiting for input or output, or waiting on a timer, consume no CPU resources, and process context-switching time can be as little as one microsecond. The communication links between processors operate concurrently with the processing unit and can transfer data simultaneously on all links without the intervention of the CPU. A schematic overview of the transputer is shown in Figure 1.

There are two general approaches to parallelisation: algorithm parallelism and data parallelism. In algorithmic parallelisation the algorithm is split into several sections which can run in parallel, and the data is passed between these sections running on different processors. This method can attain typical efficiencies of 50% of the ideal [3]. In data parallelism, instead of splitting the algorithm into several sections, the data to be processed is split between several processors running the same algorithm; efficiencies of 80% have been reported for this method. One drawback of splitting the data into the same number of chunks as there are processors is that if some segments of the data set take longer to process than others, the entire program has to wait for the most complex data segment to be processed.
The solution to this problem is to split the problem data set into a large number of packets and to distribute these between the processors, so that when a processor finishes one packet, the next is sent to it. This is known as task farming.

Figure 1: Schematic overview of the transputer (four serial links, each with an input and an output).

For many parallel computations it is useful to be able to create applications which automatically configure themselves to run on any network of transputers. Such applications automatically run faster when more transputers are added to a network, without recompilation or reconfiguration. In the processor farm technique, an application is coded as one master task which breaks the job down into small, independent pieces called work packets, which are processed by any number of anonymous worker tasks. Work packets are automatically distributed across an arbitrary network of transputers by routing software. All of the worker tasks must run the same code. Each worker simply accepts work packets, processes them, and sends back result packets via the same routing software. The master task has to be written for the application; it splits the job into independent work packets which the worker tasks can handle without communicating with other tasks. Many computationally intensive applications can in fact be implemented by processor farms, particularly graphics applications such as surface reconstruction of previously compressed images, where different sections of the screen can be worked on independently.

The purpose of this thesis is to develop a parallel approach to the surface reconstruction problem, which is a step in the decompression of a previously compressed image according to the centipede model [4]. At the compression stage, the following steps are applied to the real image under the centipede model:
- An edge map is obtained by a generalized edge detector.
- Edges are traced to obtain distinct contours.
- Contours are ordered according to a weighted sum of their length, mean contrast and mean curvature. Segments with high priority are selected; the others are eliminated.
- The edge map, together with the image intensities, is used to extract the model parameters.
- Polynomial coefficients are quantized and sent with the edge map, coded by differential chain coding followed by Huffman coding.

After these steps are completed, there are two images for reconstruction:
- the binary edge map,
- the intensity image obtained from the model parameters.

Since both images are sparse, a hybrid energy functional is used to span a surface through these points. For this purpose, the surface reconstruction problem is set as finding a function f(x,y) which minimizes

E(f;\lambda,\tau) = \iint \big[ P(x,y)\,(f(x,y)-d(x,y))^2 + \lambda(1-\tau)\,(f_x^2(x,y)+f_y^2(x,y)) + \lambda\tau\,(f_{xx}^2(x,y)+2f_{xy}^2(x,y)+f_{yy}^2(x,y)) \big]\,dx\,dy    (1)

where \lambda controls the smoothness of the surface and \tau controls the continuity of the surface. An iterative solution of (1) is obtained by SOR (Successive Over-Relaxation) iterations [5] as

f_{ij}^{(n+1)} = f_{ij}^{(n)} - \omega\,\Delta_{ij}^{(n)} / \big(P_{ij} + 4\lambda(1+4\tau)\big)    (2.a)

\Delta_{ij}^{(n)} = \big(P_{ij} + 4\lambda(1+4\tau)\big) f_{ij}^{(n)} - \lambda(1+7\tau)\big(f_{i+1,j} + f_{i-1,j} + f_{i,j+1} + f_{i,j-1}\big) + 2\lambda\tau\big(f_{i+1,j+1} + f_{i+1,j-1} + f_{i-1,j+1} + f_{i-1,j-1}\big) + \lambda\tau\big(f_{i+2,j} + f_{i-2,j} + f_{i,j+2} + f_{i,j-2}\big) - P_{ij}\,d_{ij}    (2.b)

where neighbour values are the most recently computed ones within a sweep. The iterations terminate when the condition \|f^{(n+1)} - f^{(n)}\| < \varepsilon is satisfied for a specified \varepsilon.

In this study, surface reconstruction is implemented on a network of 4 transputers using the processor farm approach. According to this approach, two tasks were generated.
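Before describing the two tasks, here is a minimal serial C sketch of the worker's core computation: one SOR sweep applying update (2) to interior pixels, under the stencil given above. The relaxation factor omega, the boundary treatment and the in-place Gauss-Seidel ordering are simplifications for illustration, not the thesis implementation.

```c
#define N 128   /* image matrix size used in the study */

/* One in-place SOR sweep of update (2) over interior pixels.
 * P marks where the sparse data d is present, lambda weights smoothness,
 * tau weights continuity, omega is the over-relaxation factor. */
static void sor_sweep(double f[N][N], const double d[N][N],
                      const double P[N][N],
                      double lambda, double tau, double omega)
{
    for (int i = 2; i < N - 2; i++) {
        for (int j = 2; j < N - 2; j++) {
            double diag = P[i][j] + 4.0 * lambda * (1.0 + 4.0 * tau);
            double delta =
                  diag * f[i][j]
                - lambda * (1.0 + 7.0 * tau) *
                      (f[i+1][j] + f[i-1][j] + f[i][j+1] + f[i][j-1])
                + 2.0 * lambda * tau *
                      (f[i+1][j+1] + f[i+1][j-1] + f[i-1][j+1] + f[i-1][j-1])
                + lambda * tau *
                      (f[i+2][j] + f[i-2][j] + f[i][j+2] + f[i][j-2])
                - P[i][j] * d[i][j];
            f[i][j] -= omega * delta / diag;   /* relaxation step (2.a) */
        }
    }
}
```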
The first is the master task, which splits the matrix into smaller packets, sends and receives them, and puts the result packets into the right location of the main matrix. The master task also has the responsibility of deciding whether to stop or to continue iterating. The master task runs on only one of the transputers, the root transputer. The worker task is where all the computation is done: it receives packets prepared by the master task, applies the equations given in (2), and sends the result packets back to the master task via its links. It stays idle until another packet is received. The worker task runs on all the nodes of the transputer network.

Performance is of paramount importance in parallel programming. Measuring the performance of a parallel program is a way to assess how efficient the effort has been at dividing the application into modules cooperating with each other in parallel. The first and most important measure of performance is the speedup, which relates to execution speed and measures how much faster the program runs on the parallel machine than on a serial machine:

Speedup(N) = T(1) / T(N)    (3)

where T(1) is the execution time on a single processor and T(N) the execution time on N processors. Speedup is measured using different types of configuration, different packet sizes and different types of packet data. Speedups for different configuration types are shown in Table 1.

Table 1: Speedups calculated for different configurations.

In order to gain better performance, the matrix is split by the master task into pieces of different sizes. The main image, represented by a 128x128 matrix, is divided into smaller pieces of 1x1, 2x2, 4x4, 8x8 and 16x16, each corresponding to a packet size. The speedup is calculated for each packet size; the results are shown in Figure 2.

Figure 2: Speedup versus number of transputers for different packet sizes.

As seen in Figure 2, the best performance is achieved by using packets consisting of 16x16 matrix pieces. The larger the packet size, the fewer packets have to be constructed by the master task. For each packet, the router task, which runs at high priority, starts on every node through which the packet travels, including the root transputer. The master task on the root transputer and the worker tasks on the other transputers have to be suspended to give way to the router task, independently of the size of the packet. There is also a time penalty from the context switching that takes place on each node the packet traverses. For these reasons, having fewer packets per iteration gives better performance, provided the packets are small enough not to lock up the network traffic. In this study the largest packet that does not lock up the network was found to be a 16x16 work packet, which occupies less than 1 Kbyte.
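A rough C sketch of the master side of the processor farm described above follows. It is an outline under stated assumptions rather than the thesis code: send_packet and recv_result are hypothetical stand-ins for the transputer link and router layer, the element type and packet layout are illustrative, and the halo of neighbouring pixels that the stencil in (2) needs around each tile is omitted for brevity.

```c
#define N    128    /* image size used in the study                 */
#define TILE  16    /* the best-performing work packet size (16x16) */

typedef struct {
    int row, col;               /* top-left corner of the tile in f  */
    float data[TILE][TILE];     /* tile of the image matrix          */
} packet_t;

/* Hypothetical stand-ins for the link/router communication layer. */
void send_packet(const packet_t *p);
void recv_result(packet_t *p);

/* Master task, one iteration: split the matrix into work packets,
 * farm them out, then copy the returned result packets back. */
static void master_iteration(float f[N][N])
{
    packet_t p;
    int outstanding = 0;

    for (int r = 0; r < N; r += TILE)
        for (int c = 0; c < N; c += TILE) {
            p.row = r;
            p.col = c;
            for (int i = 0; i < TILE; i++)
                for (int j = 0; j < TILE; j++)
                    p.data[i][j] = f[r + i][c + j];
            send_packet(&p);    /* any idle worker may pick this up  */
            outstanding++;
        }

    while (outstanding--) {     /* results arrive in any order       */
        recv_result(&p);
        for (int i = 0; i < TILE; i++)
            for (int j = 0; j < TILE; j++)
                f[p.row + i][p.col + j] = p.data[i][j];
    }
}
```

The corresponding worker loop would simply receive a packet, apply the sweep sketched earlier to it, and return the result; after all results are back, the master applies the termination check described in the abstract.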
The next measure of performance is processor utilization. This measure applies to all processors active in solving a given application and reports how much work each one contributes. Assuming that N processors are used by a parallel program, the utilization of processor P_i is defined as

Utilization(P_i) = ComputeTime(P_i) / (IdleTime(P_i) + ComputeTime(P_i))    (4)

The calculated utilisation of the 4 transputers is shown in Figure 3. The first transputer has slightly better utilisation than the other three: it is the root transputer, on which the master and worker tasks run concurrently, so it spends its time either preparing and receiving packets or performing the worker task's calculations. The other three transputers, however, may stay idle for a while when they have finished the job of one packet and are waiting for another one to arrive.

Figure 3: Utilisation of the 4 transputers of the network.

Another measure of performance is the efficiency, which is defined as the speedup divided by the number of processors:

Efficiency(N) = Speedup(N) / N    (5)

Efficiency values calculated using different packet sizes with 1 to 4 transputers are shown in Table 2.

Table 2: Efficiencies calculated for the 4-transputer system for different packet sizes.
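As a worked check of (3)-(5) against the numbers reported above: with 4 transputers and a measured speedup of 2.45, formula (5) gives an efficiency of 2.45 / 4 ≈ 0.61. A small C helper illustrating the three definitions (variable names are illustrative) might look like this:

```c
#include <stdio.h>

/* Speedup (3): serial execution time over parallel execution time. */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Utilization (4): fraction of a processor's time spent computing. */
double utilization(double compute_time, double idle_time)
{
    return compute_time / (compute_time + idle_time);
}

/* Efficiency (5): speedup divided by the number of processors. */
double efficiency(double s, int n_processors)
{
    return s / n_processors;
}

int main(void)
{
    double s = 2.45;   /* best speedup reported for the 4-transputer system */
    printf("efficiency = %.2f\n", efficiency(s, 4));   /* prints 0.61 */
    return 0;
}
```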
Similar Theses
- Bayes tümleştirme teknikleri kullanılarak yüzey kurma ve ayrıt sezme (Visual surface reconstruction and boundary detection using Bayesian integration). BİLGE GÜNSEL, PhD, Turkish, 1993. Electrical and Electronics Engineering, İstanbul Teknik Üniversitesi. Advisor: PROF. DR. ERDAL PANAYIRCI.
- Performance evaluation of current density based magnetic resonance electrical impedance tomography reconstruction algorithms (Akım yoğunluğu tabanlı manyetik rezonans elektriksel empedans tomografisi geriçatım algoritmalarının performans değerlendirmesi). RASİM BOYACIOĞLU, Master's, English, 2009. Bioengineering, Orta Doğu Teknik Üniversitesi, Department of Electrical and Electronics Engineering. Advisor: PROF. DR. MURAT EYÜBOĞLU.
- Baraj göllerinde deprem sırasında oluşan hidrodinamik basınçların analizi (Analysis of earthquake induced hydrodynamic pressures in dam-reservoirs). ENDER DEMİREL, PhD, Turkish, 2008. Earthquake Engineering, Eskişehir Osmangazi Üniversitesi, Department of Civil Engineering. Advisor: Assoc. Prof. Dr. İSMAİL AYDIN.
- Dynamic frictional contact problems involving functionally graded materials (Fonksiyonel derecelendirilmiş malzemeler içeren dinamik sürtünmeli temas problemleri). MEHMET NURULLAH BALCI, PhD, English, 2018. Mechanical Engineering, Orta Doğu Teknik Üniversitesi, Department of Mechanical Engineering. Advisor: PROF. DR. SERKAN DAĞ.
- Sızdırmazlık elemanlarına çevre koşullarının etkisinin deneysel incelenmesi (Experimental investigation of the impact of environmental conditions on sealing elements). ÖNDER TÜRKÖLMEZ, Master's, Turkish, 2020. Mechanical Engineering, İstanbul Teknik Üniversitesi, Department of Mechanical Engineering. Advisor: Asst. Prof. Dr. ZEYNEP PARLAR.