Leveraging AI Ecosystem for Portable and Sustainable GPU Kernels in HPC
This program is tentative and subject to change.
High-Performance Computing (HPC) applications increasingly depend on GPUs, yet developing optimized kernels across evolving GPU architectures remains a major productivity bottleneck. With a tile-based programming model, Triton, a Python-based domain-specific language from the AI ecosystem, presents a compelling opportunity to simplify high-performance GPU kernel development for HPC. However, its tight coupling with Python creates significant integration barriers. In this paper, we investigate the feasibility of leveraging Triton for traditional HPC development. We present a compilation framework that transforms Triton kernels into standalone shared objects with C-compatible interfaces, eliminating Python dependencies and enabling seamless integration into HPC codebases while preserving optimization and portability benefits. We validate the approach by replacing kernels in representative HPC workloads with simpler Triton implementations that deploy across NVIDIA and AMD GPUs without modification. Triton achieves near-parity performance with native implementations on tile-friendly workloads, while irregular kernels reveal current limitations of its tile-based programming model. These results suggest that bridging the AI and HPC ecosystems via Triton offers a practical path toward more productive, portable, and sustainable GPU kernel development for HPC.
Tue 16 Jun. Displayed time zone: Mountain Time (US & Canada)
Session: 15:50 - 17:30
Start | Duration | Type | Title | Track | Authors
15:50 | 20m | Talk | Vectorizing Sparse Coiteration for Two-finger Loop Structure (Extended Abstract) | ARRAY |
16:10 | 25m | Talk | Leveraging AI Ecosystem for Portable and Sustainable GPU Kernels in HPC | ARRAY | Yanbo Zhao (North Carolina State University), Zhaonan Meng (North Carolina State University), Sai Krishna Teja Varma Manthena (NCSU), Xu Liu (North Carolina State University), Ajay Panyala (Pacific Northwest National Laboratory), Jiajia Li (North Carolina State University)
16:35 | 20m | Talk | Lazy Arithmetic using Systolic Arrays for Closing the Verification Gap on Embedded Systems | ARRAY |
16:55 | 20m | Talk | Towards a Linear-Algebraic Hypervisor (Pre-print) | ARRAY |
17:15 | 5m | Research preview | Semantics as a Tool of Thought: Provenance-Aware Dimensional Checking in a Reactive Array IR | ARRAY | Christopher Buck
17:20 | 10m | Live Q&A | Mini Panel | ARRAY |