Tue 16 Jun 2026 13:40 - 14:30 at Meadows CD - KEYNOTE-2 and Tensorial Themes Chair(s): Sangeeta Chowdhary

ABSTRACT: High-performance GPU kernels are increasingly written in high-level languages like Python (via tools like Triton or Torch Inductor). Python programmers can now explore high-level algorithmic choices without mastering low-level hardware complexity. A classic tension remains: maintaining abstraction and hardware compatibility without sacrificing performance. Each GPU architecture exposes new capabilities at granularities greater than a single thread: tensor cores that consume matrices, asynchronous DMA copy engines that move multi-dimensional volumes of data through the memory hierarchy, cluster-level coordination over blocks of tiles, and so on.

In this talk I will focus on the evolution of the NVIDIA GPU programming model, and in particular on two components of NVIDIA’s tile-based stack: cuTile, a Python DSL for authoring portable CUDA kernels in the idiom of NumPy and PyTorch; and Tile IR, an array-based sibling abstraction to PTX, realized as an MLIR dialect, that enables forward compatibility and performance portability across hardware generations while utilizing architecture-specific features such as tensor cores.

We will contextualize this work as part of the broader evolution of NVIDIA’s compiler stack. Tile IR represents an important piece of our strategy to use shared intermediate representations (at various levels of abstraction) as core infrastructure across products, teams, open source, and more. We hope that this evolution inspires or informs future research efforts in array programming and programming systems as core infrastructure.

Jared Roesch is a distinguished engineer at NVIDIA, currently working on Tile IR. He earned his PhD in Computer Science and Engineering from the University of Washington in 2020, focusing on the compilation of dynamic neural networks.

Jared has previously been a contributor and PMC member of Apache TVM and a core committer to the Lean and Rust programming languages. He joined NVIDIA in 2024 through the acquisition of OctoAI, where he was a co-founder and CTO, and now applies his experience in building ML/AI services, systems, compilers, and frameworks to problems at NVIDIA.

Tue 16 Jun

Displayed time zone: Mountain Time (US & Canada)

13:40 - 15:20
KEYNOTE-2 and Tensorial Themes, ARRAY, at Meadows CD
Chair(s): Sangeeta Chowdhary (AMD Research)
13:40
50m
Keynote
The Shape of Things to Come
ARRAY
14:30
10m
Live Q&A
Q&A for Keynote-2
ARRAY

14:40
20m
Talk
Tensor Algebra Equivalence Checker
ARRAY
Jubi Taneja (Gimlet Labs), Tom St. John (Gimlet Labs), Natalie Serrino (Gimlet Labs)
15:00
20m
Talk
Rhyme: A Multi-Paradigm Declarative Query Language
ARRAY
Ran Guo (Purdue University), Tiark Rompf (Purdue University)

BIO: Jared Roesch is now a Distinguished Engineer at NVIDIA, developing AI systems and compilers for NVIDIA GPUs. Previously, he was co-founder and CTO at OctoAI (formerly OctoML), where he led the development of large language model optimization & serving technology. He’s an open source advocate and contributor, having spent time working on many OSS projects — most notably the Rust and Lean programming languages and Apache TVM. Jared received his Ph.D. from the Paul G. Allen School of Computer Science and Engineering at the University of Washington. His Ph.D. work adapted ideas from programming languages and compilation to diverse problems in computer architecture, formal methods, high performance computing, and machine learning.