SparseZETA: Intelligent Auto-tuner for Designing High-Performance SpMV Programs
This program is tentative and subject to change.
Sparse matrix-vector multiplication (SpMV) is a crucial operation in scientific computing, graph analytics, and machine/deep learning. Its performance is highly sensitive to matrix sparsity patterns, necessitating tailored program designs. This paper introduces \textsc{SparseZETA}, an intelligent auto-tuner that generates high-performance, machine-designed SpMV programs by directly mimicking and composing human-expert actions. To efficiently navigate the vast design space, \textsc{SparseZETA} reformulates auto-tuning as a behavior-cloning problem: rather than relying on costly exploration, it directly synthesizes programs by sequentially predicting actions in a one-pass decision-making process, guided by the real-time state of the evolving, partially constructed program design.
A novel self-training mechanism further accelerates the collection of training data for the prediction models.
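The one-pass, behavior-cloning formulation described above can be sketched as follows. This is an illustrative toy, not SparseZETA's actual API: the names (`State`, `policy`, `ACTIONS`) and the fixed expert trajectory are assumptions standing in for the learned prediction models.

```python
# Illustrative sketch of one-pass, behavior-cloning-style program synthesis.
# All names and actions here are hypothetical, not SparseZETA's real interface.
from dataclasses import dataclass, field

# Hypothetical expert design actions for building an SpMV program
# (format choice, tiling, thread mapping, ...).
ACTIONS = ["choose_format:CSR", "tile_rows:32", "map:warp_per_row", "done"]

@dataclass
class State:
    """Real-time state of the partially constructed program design."""
    actions: list = field(default_factory=list)

def policy(state: State) -> str:
    """Stand-in for the learned prediction model: instead of running a
    neural network on the current state, it replays a fixed expert
    trajectory, one action per step."""
    return ACTIONS[len(state.actions)]

def synthesize() -> list:
    """One-pass decision making: predict actions sequentially until 'done',
    with no search loop or performance-measuring exploration."""
    state = State()
    while True:
        action = policy(state)
        state.actions.append(action)
        if action == "done":
            return state.actions

print(synthesize())
```

The key contrast with search-based auto-tuners is that `synthesize` never compiles or benchmarks candidate programs; each design decision is made once, conditioned on the partial design built so far.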
On NVIDIA A100 (and RTX 2080 Ti) GPUs, \textsc{SparseZETA} achieves average speedups of $1.27\times$–$15.66\times$ ($1.44\times$–$19.07\times$) over existing auto-tuners, human-designed programs, and a sparse compiler.
\textsc{SparseZETA} substantially reduces the human effort required to design SpMV programs, including sparse format creation and kernel implementation, cutting the design time from days or even months to an average of \SI{82.52}{ms} per matrix via lightweight inference on a single CPU.
Fri 19 Jun (displayed time zone: Mountain Time, US & Canada)
10:30 - 12:10 | PLDI Research Papers

10:30 (20m, Talk): Compiling Strassen-like Matrix Multiplication Algorithms to Fast CUDA Kernels. Abhinav Jangda (Microsoft Research). DOI

10:50 (20m, Talk): Parameterized Algorithms and Complexity for Function Merging with Branch Reordering. Amir K. Goharshady (University of Oxford); Kerim Kochekov, Tian Shu, and Ahmed Khaled Zaher (Hong Kong University of Science and Technology). DOI

11:10 (20m, Talk): NEURA: A Unified and Retargetable Compilation Framework for Coarse-Grained Reconfigurable Architectures. Shangkun Li, Jinming Ge (Hong Kong University of Science and Technology); Diyuan Tao (Independent Researcher); Zeyu Li, Jiawei Liang, Linfeng Du, Jiang Xu, and Wei Zhang (Hong Kong University of Science and Technology); Cheng Tan (Google; Arizona State University). DOI

11:30 (20m, Talk): Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs. Yifan Zhao and Egan Johnson (University of Illinois Urbana-Champaign); Prasanth Chatarasi (IBM Research); Vikram S. Adve and Sasa Misailovic (University of Illinois Urbana-Champaign). DOI

11:50 (20m, Talk): SparseZETA: Intelligent Auto-tuner for Designing High-Performance SpMV Programs. Zhen Du (Institute of Computing Technology at Chinese Academy of Sciences); Ying Liu (Institute of Computing Technology at Chinese Academy of Sciences; University of Chinese Academy of Sciences); Xionghui Chen (Nanjing University); Yanbo Zhao (North Carolina State University); Xiaobing Feng and Huimin Cui (Institute of Computing Technology at Chinese Academy of Sciences; University of Chinese Academy of Sciences); Jiajia Li (North Carolina State University). DOI