ABSTRACT: High-performance GPU kernels are increasingly written in high-level languages like Python (via tools like Triton or Torch Inductor). Python programmers can now explore high-level algorithmic choices without mastering low-level hardware complexity. A classic tension remains: maintaining abstraction and hardware compatibility without sacrificing performance. Each GPU architecture exposes new capabilities at granularities greater than a single thread: tensor cores that consume matrices, asynchronous DMA copy engines that move multi-dimensional volumes of data through the memory hierarchy, cluster-level coordination over blocks of tiles, and so on.
In this talk I will focus on the evolution of the NVIDIA GPU programming model, and in particular on two components of NVIDIA’s tile-based stack: cuTile, a Python DSL for authoring portable CUDA kernels in the idiom of NumPy and PyTorch; and Tile IR, an array-based sibling abstraction to PTX, realized as an MLIR dialect, that enables forward compatibility and performance portability across hardware generations while utilizing architecture-specific features such as tensor cores.
We will contextualize this work as part of the broader evolution of NVIDIA’s compiler stack. Tile IR represents an important piece of our strategy to use shared intermediate representations (at various levels of abstraction) as core infrastructure across products, teams, open source, and more. We hope that this evolution inspires or informs future research efforts in array programming and programming systems as core infrastructure.
Jared Roesch is a distinguished engineer at NVIDIA, currently working on Tile IR. He earned his PhD in Computer Science and Engineering from the University of Washington in 2020, focusing on the compilation of dynamic neural networks.
Jared has previously been a contributor and PMC member of Apache TVM and a core committer to the Lean and Rust programming languages. He joined NVIDIA in 2024 through the acquisition of OctoAI, where he was a co-founder and CTO, and now applies his experience in building ML/AI services, systems, compilers, and frameworks to problems at NVIDIA.
Tue 16 Jun (displayed time zone: Mountain Time, US & Canada)
13:40 - 15:20
13:40 | 50m | Keynote | The Shape of Things to Come | ARRAY | Jared Roesch, NVIDIA
14:30 | 10m | Live Q&A | Q&A for Keynote-2 | ARRAY
14:40 | 20m | Talk | Tensor Algebra Equivalence Checker | ARRAY
15:00 | 20m | Talk | Rhyme: A Multi-Paradigm Declarative Query Language | ARRAY
BIO: Jared Roesch is now a Distinguished Engineer at NVIDIA, developing AI systems and compilers for NVIDIA GPUs. Previously, he was co-founder and CTO at OctoAI (formerly OctoML), where he led the development of large language model optimization and serving technology. He’s an open source advocate and contributor, having spent time working on many OSS projects, most notably the Rust and Lean programming languages and Apache TVM. Jared received his Ph.D. from the Paul G. Allen School of Computer Science and Engineering at the University of Washington. His Ph.D. work adapted ideas from programming languages and compilation to diverse problems in computer architecture, formal methods, high performance computing, and machine learning.
