Micro-architectural support for improving synchronization and efficiency of SIMD execution on GPUs
Title translation: Not available.
- Thesis No: 401272
- Advisors: PROF. DAVID KAELI
- Thesis Type: Doctorate
- Subjects: Computer Engineering and Computer Science and Control
- Keywords: Not specified.
- Year: 2013
- Language: English
- University: Northeastern University
- Institute: Foreign institute
- Main Department: Not specified.
- Department: Not specified.
- Number of Pages: 181
Abstract
Not available.
Abstract (Translation)
GPUs dedicate the majority of their transistor budget to compute units rather than control logic. As a result, they can deliver excellent power/performance on data-parallel workloads. Given the continual demand for performance and power efficiency, GPUs have become the compute accelerators of choice in many application domains. The general-purpose computing community has been developing strategies to move a broader class of applications onto these powerful devices, and the underlying GPU architecture has been adapted to run a limited class of general-purpose computations found across a range of applications. Many applications have already been ported to GPU platforms to take advantage of the data-parallel performance that GPUs afford. However, barriers remain to migrating a broader class of applications onto GPUs. Because they were originally designed to render 3-D graphics, GPUs are highly optimized for graphics workloads, which exhibit a high degree of uniformity in their execution. GPU architectures are therefore optimized for efficient uniform execution. GPUs achieve high performance on data-parallel applications that have regular control flow (i.e., predictable loops) and data access patterns that can effectively exploit high off-chip memory bandwidth. Many real-world general-purpose applications, however, differ from graphics workloads: they come with large input sets that exhibit irregular access and synchronization patterns, and they have varying computational granularity and irregular control flow. The current requirements for uniformity and predictability are a barrier to moving this broader range of applications to GPUs. We believe that if GPUs are to become mainstream computing devices, some of these constraints must be relaxed; only then can a wider variety of applications exploit the computational power of GPUs. One critical barrier in non-uniform data-parallel applications is the need to synchronize between threads.
Fine-grained synchronization is needed to support shared data access, especially in the presence of irregular access and communication patterns. This dissertation presents a new approach to improving the efficiency and scalability of GPU synchronization, enabling applications that work on shared data to communicate effectively at finer levels of granularity. To achieve this ambitious goal, we propose Hierarchical Queuing Locks (HQL), a novel hardware-based synchronization mechanism that uses resources efficiently through execution blocking and hierarchical queuing. To provide a queue-based locking mechanism, HQL extends the current GPU L1 and L2 cache management protocols with a synchronization protocol. Integrating this synchronization protocol simplifies synchronization but adds complexity to the cache management protocol; given this added complexity, this dissertation also provides a formal verification of the proposed HQL synchronization protocol. To evaluate the benefits of HQL, we first study a set of micro-benchmarks that represent highly irregular applications requiring frequent synchronization, and then evaluate macro-benchmarks that use synchronization. We report both the performance benefits and the savings in the number of instructions executed.

Building on the efficient fine-grained synchronization support provided by HQL, we explore the ScalarWaving (SW) and Simultaneous Scalar and SIMD groupWaving (SSSW) architectures to further improve the efficiency of SIMD execution on GPUs. These two mechanisms aim to reduce the redundant computations performed by the threads in a SIMD group, and they improve SIMD efficiency for both irregular and regular applications. We motivate this work by reporting the percentage of redundant computations present in a range of workloads.
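HQL itself is a hardware protocol implemented in the L1 and L2 caches, and its details are not reproduced in this abstract. As a rough illustration of what "execution blocking and hierarchical queuing" can mean, the sequential C++ model below (an assumption, not the dissertation's design) keeps a FIFO of blocked warps per SM, standing in for an L1, and a FIFO of waiting SMs at a shared level, standing in for the L2. The names `HierarchicalLockModel`, `Requester`, `acquire`, and `release` are hypothetical. Waiters are queued instead of spinning, and on release the lock goes to the next warp on the same SM before ownership moves to the next SM queued at the shared level.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <unordered_map>

// A waiting or holding requester: one warp on one SM (hypothetical naming).
struct Requester {
  int sm;
  int warp;
};

// Sequential model of a two-level queuing lock. Each SM keeps an L1-level FIFO
// of blocked warps; a shared L2-level FIFO records SMs waiting for the lock.
// Illustrative sketch only, not the HQL protocol from the dissertation.
class HierarchicalLockModel {
 public:
  // Returns true if the warp gets the lock immediately; otherwise the warp is
  // blocked in a queue instead of spinning on the lock variable.
  bool acquire(std::uint64_t addr, int sm, int warp) {
    LockState& s = locks_[addr];
    if (!s.holder) {  // Lock is free: grant it and record the owning SM.
      s.owner_sm = sm;
      s.holder = Requester{sm, warp};
      return true;
    }
    std::deque<int>& local = s.local[sm];
    // The first waiter from a non-owning SM enqueues that SM at the L2 level;
    // later waiters from the same SM only join that SM's L1-level queue.
    if (sm != *s.owner_sm && local.empty()) s.global.push_back(sm);
    local.push_back(warp);
    return false;
  }

  // Called by the current holder. Returns the next holder, or nullopt if the
  // lock becomes free.
  std::optional<Requester> release(std::uint64_t addr) {
    LockState& s = locks_[addr];
    assert(s.holder && "release() called on a lock that is not held");
    std::deque<int>& local = s.local[s.holder->sm];
    if (!local.empty()) {  // Hand off within the same SM: no L2-level traffic.
      s.holder = Requester{s.holder->sm, local.front()};
      local.pop_front();
      ++local_handoffs_;
      return s.holder;
    }
    if (!s.global.empty()) {  // Move ownership to the next SM queued at L2.
      int next_sm = s.global.front();
      s.global.pop_front();
      std::deque<int>& next_local = s.local[next_sm];
      s.owner_sm = next_sm;
      s.holder = Requester{next_sm, next_local.front()};
      next_local.pop_front();
      ++global_handoffs_;
      return s.holder;
    }
    s.holder.reset();  // No waiters anywhere: the lock becomes free.
    s.owner_sm.reset();
    return std::nullopt;
  }

  int local_handoffs() const { return local_handoffs_; }
  int global_handoffs() const { return global_handoffs_; }

 private:
  struct LockState {
    std::optional<int> owner_sm;                     // SM owning the lock at L2
    std::optional<Requester> holder;                 // warp in the critical section
    std::unordered_map<int, std::deque<int>> local;  // per-SM FIFO of blocked warps
    std::deque<int> global;                          // FIFO of SMs waiting at L2
  };
  std::unordered_map<std::uint64_t, LockState> locks_;
  int local_handoffs_ = 0;
  int global_handoffs_ = 0;
};
```

Preferring local handoffs keeps lock traffic off the shared level, but it can starve other SMs; a practical design would presumably bound the number of consecutive local handoffs, which this sketch does not do.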
We then quantitatively evaluate the benefits of the SW and SSSW architectures using programs drawn from four different benchmark suites. Overall, this dissertation designs architectural features that can make the benefits of GPU computing available to a much wider range of applications. Such enhancements can further accelerate the adoption of GPUs as first-class computing devices.
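The redundancy that SW and SSSW target can be made concrete with a simple metric. The sketch below is an assumption, not the dissertation's measurement methodology: it treats a dynamic instruction as scalarizable when every active lane in the SIMD group reads identical source operand values, and counts all but one of those lane computations as redundant. It also assumes results depend only on source operand values (no lane-ID reads or per-lane side effects). `SimdInstr`, `is_scalarizable`, `RedundancyStats`, and `measure_redundancy` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One dynamic SIMD instruction: per-lane source operands plus an active mask.
struct SimdInstr {
  std::vector<std::vector<std::int64_t>> operands;  // operands[src][lane]
  std::vector<bool> active;                         // active[lane]
};

// True if every active lane reads identical values for every source operand,
// so one scalar execution would produce the same result for all lanes.
bool is_scalarizable(const SimdInstr& in) {
  for (const auto& src : in.operands) {
    bool seen = false;
    std::int64_t first = 0;
    for (std::size_t lane = 0; lane < in.active.size(); ++lane) {
      if (!in.active[lane]) continue;  // Inactive lanes do not matter.
      if (!seen) {
        first = src[lane];
        seen = true;
      } else if (src[lane] != first) {
        return false;
      }
    }
  }
  return true;
}

struct RedundancyStats {
  std::size_t lane_ops = 0;       // active-lane computations executed
  std::size_t redundant_ops = 0;  // computations a scalar unit could have avoided
  double percent() const {
    return lane_ops ? 100.0 * static_cast<double>(redundant_ops) / lane_ops : 0.0;
  }
};

// Walks an instruction trace and accumulates the redundancy metric.
RedundancyStats measure_redundancy(const std::vector<SimdInstr>& trace) {
  RedundancyStats st;
  for (const auto& in : trace) {
    std::size_t n_active = 0;
    for (bool a : in.active) n_active += a ? 1 : 0;
    st.lane_ops += n_active;
    // A uniform instruction needs one computation; the other n_active - 1 are redundant.
    if (n_active > 1 && is_scalarizable(in)) st.redundant_ops += n_active - 1;
  }
  return st;
}
```

For a 4-lane group executing an instruction whose four active lanes all read the same operands, three of the four lane computations are counted as redundant; a scalar unit could execute it once and share the result.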