Modular GPU Programming with Typed PerspectivesDistinguished Paper
This program is tentative and subject to change.
To achieve peak performance on modern GPUs, one must balance two frames of mind: issuing instructions to individual threads to control their behavior, while simultaneously tracking the convergence of many threads acting in concert to perform collective operations like Tensor Core instructions. The tension between these two mindsets makes modular programming error prone. Functions that encapsulate collective operations, despite being called per-thread, must be executed cooperatively by groups of threads.
In this work, we introduce Prism, a new GPU language that restores modularity while still giving programmers the low-level control over collective operations necessary for high performance. Our core idea is typed perspectives, which materialize, at the type level, the granularity at which the programmer is controlling the behavior of threads. We describe the design of Prism, implement a compiler for it, and lay its theoretical foundations in a core calculus called Bundl. We implement state-of-the-art GPU kernels in Prism and find that it offers programmers the safety guarantees needed to confidently write modular code without sacrificing performance.
This program is tentative and subject to change.
Wed 17 JunDisplayed time zone: Mountain Time (US & Canada) change
16:10 - 17:50 | |||
16:10 20mTalk | Kuiper: Correct and Efficient GPU Programming with Dependent Types and Separation Logic PLDI Research Papers Guido Martínez Microsoft Research, Bastian Köpcke TU Berlin, Jonáš Fiala ETH Zurich, Gabriel Ebner Microsoft Research, Tahina Ramananandro Microsoft Research, Michel Steuwer TU Berlin, Tyler Sorensen Microsoft Research, Nikhil Swamy Microsoft Research DOI Pre-print | ||
16:30 20mTalk | Modular GPU Programming with Typed PerspectivesDistinguished Paper PLDI Research Papers Manya Bansal Massachusetts Institute of Technology, Daniel Sainati University of Pennsylvania, Joseph W. Cutler University of Pennsylvania, Saman Amarasinghe Massachusetts Institute of Technology, Jonathan Ragan-Kelley Massachusetts Institute of Technology DOI | ||
16:50 20mTalk | [TOPLAS] StreamAlloc: A Framework for Analyzing and Transforming CUDA Code to Enable Asynchronous Execution PLDI Research Papers | ||
17:10 20mTalk | SIMT-Step Execution: A Flexible Operational Semantics For GPU Subgroup Behavior PLDI Research Papers Zheyuan Chen University of California at Santa Cruz, Naomi Rehman University of California at Santa Barbara, Guido Martínez Microsoft Research, Tyler Sorensen Microsoft Research; University of California at Santa Cruz DOI Pre-print | ||
17:30 20mTalk | Uniformity Analysis in the WebGPU Shading Language PLDI Research Papers James Lee-Jones Imperial College London, John Wickerson Imperial College London, Alastair F. Donaldson Imperial College London DOI | ||