SIMT-Step Execution: A Flexible Operational Semantics For GPU Subgroup Behavior (PLDI 2026 - PLDI Research Papers)

Mon 15 - Fri 19 June 2026 Boulder, Colorado, United States

Who

Zheyuan Chen, Naomi Rehman, Guido Martínez, Tyler Sorensen

Track

PLDI 2026 PLDI Research Papers

This program is tentative and subject to change.

Time Zone

The program is currently displayed in (GMT-06:00) Mountain Time (US & Canada).

Use conference time zone: (GMT-06:00) Mountain Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Wed 17 Jun 2026 16:50 - 17:10 at Flatirons 3 - GPU Programming

Abstract

GPU hardware implements a SIMT execution model, where small groups of threads, called subgroups (or warps in CUDA), execute synchronously. Languages expose this through high-performance subgroup-level APIs. However, providing precise subgroup semantics in languages is challenging, as compilers may transform the program, potentially disrupting source-level synchronous behavior even if the hardware is synchronous. As a result, no GPU programming language provides rigorous semantics for subgroup execution.

In this work, we present SIMT-Step, a formal and flexible operational semantics for subgroup execution. At its core is a new semantic object, dynamic basic blocks, which enables precise specification of converged subgroup execution. SIMT-Step then provides flexibility for the execution of instructions, which can be collective, synchronous, or independent. We propose several candidate instantiations of SIMT-Step and design a suite of idiomatic tests to distinguish them, highlighting counter-intuitive behavior that arises under relaxed variants. We implement SIMT-Step in TLA+ and validate the behavior of the idiomatic tests, all of which can be verified in under one second. Finally, to investigate how closely SIMT-Step models real-world GPU behavior, we conduct a large fuzzing campaign, spanning nine GPUs and seven vendors. Our results show that, despite hesitancy to provide guarantees in official specifications, most GPUs exhibit strongly synchronous behavior. Combined, these contributions provide both a theoretical foundation and practical tools for reasoning about subgroup semantics in GPU programming languages.

DOI

https://doi.org/10.1145/3808297

Zheyuan Chen

University of California at Santa Cruz

United States

Naomi Rehman

University of California at Santa Barbara

United States

Guido Martínez

Microsoft Research

United States

Tyler Sorensen

Microsoft Research; University of California at Santa Cruz

United States

This program is tentative and subject to change.

Time Zone

The program is currently displayed in (GMT-06:00) Mountain Time (US & Canada).

Use conference time zone: (GMT-06:00) Mountain Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Wed 17 Jun
Displayed time zone: Mountain Time (US & Canada) change

15:50 - 17:30	GPU ProgrammingPLDI Research Papers at Flatirons 3

15:50 20m Talk		Kuiper: Correct and Efficient GPU Programming with Dependent Types and Separation Logic PLDI Research Papers Guido Martínez Microsoft Research, Bastian Köpcke TU Berlin, Jonáš Fiala ETH Zurich, Gabriel Ebner Microsoft Research, Tahina Ramananandro Microsoft Research, Michel Steuwer TU Berlin, Tyler Sorensen Microsoft Research, Nikhil Swamy Microsoft Research DOI
16:10 20m Talk		Modular GPU Programming with Typed Perspectives PLDI Research Papers Manya Bansal Massachusetts Institute of Technology, Daniel Sainati University of Pennsylvania, Joseph W. Cutler University of Pennsylvania, Saman Amarasinghe Massachusetts Institute of Technology, Jonathan Ragan-Kelley Massachusetts Institute of Technology DOI
16:30 20m Talk		[TOPLAS] StreamAlloc: A Framework for Analyzing and Transforming CUDA Code to Enable Asynchronous Execution PLDI Research Papers Soumik Kumar Basu IIT Hyderabad, Jyothi Vedurada IIT Hyderabad
16:50 20m Talk		SIMT-Step Execution: A Flexible Operational Semantics For GPU Subgroup Behavior PLDI Research Papers Zheyuan Chen University of California at Santa Cruz, Naomi Rehman University of California at Santa Barbara, Guido Martínez Microsoft Research, Tyler Sorensen Microsoft Research; University of California at Santa Cruz DOI
17:10 20m Talk		Uniformity Analysis in the WebGPU Shading Language PLDI Research Papers James Lee-Jones Imperial College London, John Wickerson Imperial College London, Alastair F. Donaldson Imperial College London DOI