LLM-based kernel generators automate optimized kernel synthesis for heterogeneous accelerators, but correctness is validated only by numerical testing on random inputs, a method that misses structural bugs below floating-point tolerance and cannot formally reason about semantic equivalence. The critical gap lies at the PyTorch and Triton IR level, where LLM introduced divergences first appear, yet no existing tool targets equivalence checking at this abstraction.
We present early work on a semantic equivalence checker that formally verifies whether an optimized kernel (PyTorch or Triton) computes the same function as its baseline for \emph{all} inputs. The equivalence checker lowers both programs through a five-phase pipeline: FX graph extraction, TTIR expansion, decomposition, scalarization, and Z3 verification, reducing them to scalar SMT formulas over symbolic inputs.
On KernelBench Level1, it formally proved 16 optimized kernels equivalent and confirmed 2 bugs in kernels that had passed numerical testing, including a float32 constant-truncation error two orders of magnitude below a typical tolerance.
We demonstrate that formal equivalence checking is practical for the straight-line, loop-free kernels that LLM generators commonly produce.
Submission Category: Extended Abstract
Tue 16 JunDisplayed time zone: Mountain Time (US & Canada) change
13:40 - 15:20 | |||
13:40 50mKeynote | The Shape of Things to Come ARRAY Jared Roesch NVIDIA | ||
14:30 10mLive Q&A | Q&A for Keynote-2 ARRAY | ||
14:40 20mTalk | Tensor Algebra Equivalence Checker ARRAY | ||
15:00 20mTalk | Rhyme: A Multi-Paradigm Declarative Query Language ARRAY | ||