Micro-architectural support for improving synchronization and efficiency of SIMD execution on GPUs
Title translation not available.
- Thesis No: 401272
- Advisor: PROF. DAVID KAELI
- Thesis Type: Doctorate
- Subjects: Computer Engineering and Computer Science and Control
- Keywords: Not specified.
- Year: 2013
- Language: English
- University: Northeastern University
- Institute: Institute Abroad
- Department: Not specified.
- Discipline: Not specified.
- Page Count: 181
Abstract
No abstract available.
Abstract (Translation)
GPUs dedicate a majority of their transistor budgets to compute units rather than control logic. As a result, they can achieve excellent data-parallel power/performance. Given the continual demands for performance and power efficiency, GPUs have become today's compute accelerators for many application domains. The general-purpose community has been focusing on developing strategies to move a broader class of applications to these powerful devices. The underlying GPU architecture has been adapted to run a limited class of general-purpose computations present across a range of applications. Many applications have already been ported to GPU platforms to take advantage of the potential data-parallel performance that GPUs afford. But there still remain barriers to migrating a broader class of applications onto GPUs. Having originally been designed to run 3-D graphics, GPUs are highly optimized for graphics workloads. Graphics workloads possess a high degree of uniformity in their execution. Therefore, GPU architectures are optimized for efficient uniform execution. GPUs achieve high performance with data-parallel applications possessing regular control flow (i.e., predictable loops) and data access patterns that can effectively exploit high off-chip memory bandwidth. However, many general-purpose real-world applications differ from graphics workloads: they come with large input sets exhibiting irregular access and synchronization patterns, and they possess varying computational granularity and irregular control flow. The current requirements for uniformity and predictability present barriers to moving a broader range of applications to GPUs. We believe that if GPUs are going to become a mainstream computing device, it is necessary to relax some of these constraints. Only then can a wider variety of applications exploit the computational power of GPUs. One critical barrier present in non-uniform data-parallel applications is the need to synchronize between threads.
Fine-grained synchronization is needed to support shared data access, especially when faced with irregular access and communication patterns. This dissertation presents a new approach to enhance the efficiency and scalability of GPU synchronization. The proposed scheme can enable applications that work on shared data to effectively communicate at finer levels of granularity. To achieve this ambitious goal, we propose a new synchronization approach called Hierarchical Queuing Locks (HQL). HQL is a novel hardware-based synchronization mechanism which provides efficient use of resources through execution blocking and hierarchical queuing. To provide a queue-based locking mechanism, HQL extends current GPU L1 and L2 cache management protocols by adding a synchronization protocol. Integration of HQL's synchronization protocol simplifies synchronization, but adds a level of complexity to the cache management protocol. Given this added complexity in the cache management scheme, as part of this dissertation we provide a formal verification of the proposed HQL synchronization protocol. To evaluate the benefits of HQL, we start by studying a set of micro-benchmarks that represent highly irregular applications that require frequent synchronization. We additionally evaluate macro-benchmarks that utilize synchronization. We report on both the performance benefits and the savings in terms of instructions executed. Building upon the efficient fine-grained synchronization support provided by HQL, we explore ScalarWaving (SW) and Simultaneous Scalar and SIMD groupWaving (SSSW) architectures to further improve the efficiency of SIMD execution on GPUs. These two mechanisms attempt to reduce the amount of redundant computation performed by the threads in a SIMD group. SW and SSSW improve SIMD efficiency for both irregular and regular applications. We motivate this work by reporting on the percentage of redundant computations present in a range of workloads.
We then quantitatively evaluate the benefits of the SW and SSSW architectures using programs taken from four different benchmark suites. The impact of this dissertation is to design architectural features that can make the benefits of GPU computing available to a much wider range of applications. These kinds of enhancements can only further accelerate the adoption of GPUs as first-class computing devices.
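As an illustration of the kind of redundancy that SW and SSSW target, the following minimal Python sketch estimates the percentage of redundant lane computations in a toy instruction trace. It is not taken from the dissertation: the trace format and the names `is_uniform` and `redundancy_percent` are hypothetical, and it assumes an instruction counts as redundant across a SIMD group only when every lane holds identical operands.

```python
from typing import Sequence, Tuple

Lane = Tuple[int, ...]          # operand values seen by one lane (assumed format)
Instruction = Sequence[Lane]    # one entry per lane in the SIMD group


def is_uniform(instr: Instruction) -> bool:
    """Return True when every lane of the instruction holds the same operands."""
    return len(instr) > 0 and all(lane == instr[0] for lane in instr)


def redundancy_percent(trace: Sequence[Instruction]) -> float:
    """Percentage of lane computations that repeat work done by another lane."""
    total = sum(len(instr) for instr in trace)
    if total == 0:
        return 0.0
    # A uniform instruction needs one computation; the other lanes are redundant.
    redundant = sum(len(instr) - 1 for instr in trace if is_uniform(instr))
    return 100.0 * redundant / total


if __name__ == "__main__":
    # Hypothetical two-instruction trace for a 4-lane SIMD group.
    trace = [
        [(2, 3)] * 4,                      # same operands in all 4 lanes
        [(0, 1), (1, 1), (2, 1), (3, 1)],  # per-lane operands differ
    ]
    print(redundancy_percent(trace))
```

In this toy trace, three of the eight lane computations repeat work already done by another lane. A real measurement would have to work from actual per-lane register values, which this sketch does not model.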
Similar Theses
- Geleneksel yapılarda su hasarları ve müdahale yöntemleri
Water related damages in traditional buildings and methods of intervention
DENİZ EZGİ ÜNLÜ
Master's
Turkish
2025
Architecture, Mimar Sinan Fine Arts University, Department of Architecture
PROF. DR. OĞUZ CEYLAN
- Yüklenici firmaların yenileşim yaklaşımlarının değerlendirilmesi
Evaluation of contracting firms' innovation approaches
AKIN TOLGA İLTER
- Java virtual machine implementation on micro-C/OS-II real-time operating system
Micro-C/OS-II gerçek zamanlı işletim dizgesi üzerinde java sanal makinesi gerçekleştirimi
ALP BÜLENT BURÇ SÜRMELİ
Master's
English
2005
Computer Engineering and Computer Science and Control, Çankaya University, Department of Computer Engineering
PROF. DR. TURHAN ALPER
- An uninterrupted urban walk: 3d analysis methods for supporting the design of walkable streets
Kentte kesintisiz bir yürüyüş: Yürünebilir sokakların tasarım desteği için 3b analiz yöntemleri
ELİF ENSARİ SUCUOĞLU
Doctorate
English
2020
Computer Engineering and Computer Science and Control, Istanbul Technical University, Department of Informatics
PROF. DR. MİNE ÖZKAR KABAKÇIOĞLU
- Döngüsel ekonomi bağlamında yağmur suyu hasadı pratiklerinin kentsel açık alanlardaki performansının ölçülmesi
Measuring the performance of rainwater harvesting practices in urban open spaces within the context of circular economy
ŞAZİYE LOFCALI
Master's
Turkish
2025
Landscape Architecture, Istanbul Technical University, Department of Landscape Architecture
PROF. DR. HAYRİYE EŞBAH TUNÇAY