Kuiper: Correct and Efficient GPU Programming with Dependent Types and Separation Logic
We introduce Kuiper, a language for safe and verified efficient CPU/GPU programming embedded as an
extensible library within the F* dependently typed language. We rely on F*’s support for dependent types and its
associated Pulse concurrent separation logic to develop a program logic in which to prove CPU/GPU programs
safe, data-race free, and functionally correct. Our model of the GPU includes several intricacies, including the
memory hierarchy, kernel launches, and synchronization within a single comprehensive framework. To do so,
we extend the Pulse program logic with a novel notion of located resources and a new connective to structure
reasoning about massively parallel programs, and present new proof rules to lift the per-thread view of GPU
kernels to an end-to-end correctness specification.
We have used Kuiper to program and prove correct a variety of GPU kernels, including full functional
correctness proofs of an optimized matrix multiplication using two levels of block tiling and tensor cores. In
doing so, we have developed a range of libraries to enable programs and proofs at a high level of abstraction
but without imposing any runtime overhead. These allow Kuiper programs to be polymorphic (over types,
operations, memory layout, and more) and compile to efficient, specialized CUDA code, while enabling a
novel form of verified auto-tuning. Our experimental evaluation confirms that Kuiper programs match the
performance of their handwritten CUDA counterparts and are competitive with closed source, state-of-the-art
kernels in cuBLAS.
Wed 17 JunDisplayed time zone: Mountain Time (US & Canada) change
16:10 - 17:50 | GPU ProgrammingPLDI Research Papers at Flatirons 3 Chair(s): Gilbert Louis Bernstein University of Washington | ||
16:10 20mTalk | Kuiper: Correct and Efficient GPU Programming with Dependent Types and Separation Logic PLDI Research Papers Guido Martínez Microsoft Research, Bastian Köpcke TU Berlin, Jonáš Fiala ETH Zurich, Gabriel Ebner Microsoft Research, Tahina Ramananandro Microsoft Research, Michel Steuwer TU Berlin, Tyler Sorensen Microsoft Research, Nikhil Swamy Microsoft Research DOI Pre-print | ||
16:30 20mTalk | Modular GPU Programming with Typed PerspectivesDistinguished Paper PLDI Research Papers Manya Bansal Massachusetts Institute of Technology, Daniel Sainati University of Pennsylvania, Joseph W. Cutler University of Pennsylvania, Saman Amarasinghe Massachusetts Institute of Technology, Jonathan Ragan-Kelley Massachusetts Institute of Technology DOI | ||
16:50 20mTalk | [TOPLAS] StreamAlloc: A Framework for Analyzing and Transforming CUDA Code to Enable Asynchronous Execution PLDI Research Papers | ||
17:10 20mTalk | SIMT-Step Execution: A Flexible Operational Semantics For GPU Subgroup Behavior PLDI Research Papers Zheyuan Chen University of California at Santa Cruz, Naomi Rehman University of California at Santa Barbara, Guido Martínez Microsoft Research, Tyler Sorensen Microsoft Research; University of California at Santa Cruz DOI Pre-print | ||
17:30 20mTalk | Uniformity Analysis in the WebGPU Shading Language PLDI Research Papers James Lee-Jones Imperial College London, John Wickerson Imperial College London, Alastair F. Donaldson Imperial College London DOI | ||