Improving SIMD Utilization with Thread-Lane Shuffled Compaction in GPGPU

LI Bingchao; WEI Jizeng; GUO Wei; SUN Jizhou

doi:10.1049/cje.2015.10.004

LI Bingchao, WEI Jizeng, GUO Wei, SUN Jizhou. Improving SIMD Utilization with Thread-Lane Shuffled Compaction in GPGPU[J]. Chinese Journal of Electronics, 2015, 24(4): 684-688. DOI: 10.1049/cje.2015.10.004

Abstract

GPGPUs adopt the SIMT execution model, in which each logical thread in a warp corresponds to a SIMD lane yet can still follow an independent control flow. When branch divergence occurs and threads within a warp take different execution paths, GPGPUs must execute each path serially through SIMD lane masking, which can decrease SIMD utilization and performance. We propose an efficient thread compaction mechanism that handles branch divergence using a novel register file structure. We also develop a new thread scheduling policy that cooperates with our compaction mechanism. Simulation results show that our approach improves SIMD utilization up to 74.4% and achieves a maximum performance speedup of 11.1% with small hardware overhead.
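
The utilization loss described above can be made concrete with a small counting model. The sketch below is not the mechanism proposed in the paper; it is a minimal illustration that counts issue slots for a single divergent branch under three simplified policies. All function names are hypothetical, and the lane-preserving policy rests on an assumption not stated in the abstract: that a thread can only be regrouped into the lane it already occupies.

```python
from math import ceil
from typing import List

# One warp: one entry per SIMD lane, True if that thread takes the branch.
Warp = List[bool]


def issue_slots_masked(warps: List[Warp]) -> int:
    """Slots used when every warp runs each of its non-empty paths serially."""
    slots = 0
    for w in warps:
        # One slot for the taken path (if any thread takes it),
        # one slot for the not-taken path (if any thread skips it).
        slots += int(any(w)) + int(not all(w))
    return slots


def issue_slots_ideal_compaction(warps: List[Warp]) -> int:
    """Slots used if threads on the same path could be packed into any lane.

    Idealized upper bound on compaction; ignores all lane constraints.
    """
    width = len(warps[0])
    taken = sum(sum(w) for w in warps)
    not_taken = len(warps) * width - taken
    return ceil(taken / width) + ceil(not_taken / width)


def issue_slots_lane_preserving(warps: List[Warp]) -> int:
    """Slots used if compaction must keep every thread in its home lane.

    Assumption for illustration: the busiest lane on each path dictates
    how many compacted warps that path needs.
    """
    width = len(warps[0])
    taken_per_lane = [sum(w[lane] for w in warps) for lane in range(width)]
    not_taken_per_lane = [len(warps) - t for t in taken_per_lane]
    return max(taken_per_lane) + max(not_taken_per_lane)


def utilization(warps: List[Warp], slots: int) -> float:
    """Fraction of lane slots that carry an active thread."""
    width = len(warps[0])
    return (len(warps) * width) / (slots * width)
```

For two 4-lane warps that both diverge as `[True, True, False, False]`, this model gives 4 issue slots (50% utilization) for both masking and lane-preserving compaction, but 2 slots (100%) for ideal compaction, because threads on the same path collide in the same lanes. If the second warp instead diverges as `[False, False, True, True]`, lane-preserving compaction already reaches 2 slots. The gap between the two cases is the kind of lane conflict that changing the thread-to-lane mapping could remove.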