This program is tentative and subject to change.

Mon 15 Jun 2026 14:40 - 15:00 at Meadows CD - PAgE Session 2

As Language Model Systems (LMS) are deployed across a wide range of applications, aligning them with human ethics has be- come crucial. We introduce a domain-specific language, Lumos, for specifying and statistically certifying LMS behaviors. Lumos is the first imperative probabilistic programming language over graphs, with constructs to sample independent and identically distributed prompts for statistical certification of LMS. It abstracts large prompt distributions as graphs, forming random prompts from subgraphs.

Lumos gives a systematic way to reason about, specify, and certify novel properties. Using Lumos, we develop the first safety specifications for vision-language models (VLMs) in autonomous driving scenarios and the first specifications for privacy preser- vation in LMS responses. We show that the state-of-the-art VLM Qwen-VL exhibits critical safety failures, producing incorrect and unsafe responses with at least 90% probability in right-turn scenar- ios under rainy driving conditions, revealing substantial safety risks. We further integrate a prompt-level deterministic verifier to obtain guarantees over the privacy of the LLM generation distribution over a prompt distribution. Our results reveal that Qwen-3 (4B) has at least 24% chance of revealing private email information under adversarial prompt distributions. Lumos is simple to program in, requiring only a few constructs, as evidenced by state-of-the-art large language models generating correct Lumos specifications in zero-shot settings. Lumos is the first principled, language-based framework for specifying and certifying LMS behaviors, paving the way towards widespread adoption of LMS certification.

This program is tentative and subject to change.

Mon 15 Jun

Displayed time zone: Mountain Time (US & Canada) change

13:40 - 15:20
PAgE Session 2PAgE at Meadows CD
13:40
20m
Talk
Combining Agentic AI and Lightweight Formal Methods To Find Bugs in a Production Hypervisor
PAgE
Hiroyuki Katsura University of Cambridge, Kayvan Memarian University of Cambridge, Peter Sewell University of Cambridge
14:00
20m
Talk
From Workarounds to Root Causes: Experience Using Agentic Workflows to Debug Complex Browser GPU Compiler Stacks
PAgE
Abhijit Ramesh UC Santa Cruz, Reese Levine University of California at Santa Cruz, Tyler Sorensen University of California at Santa Cruz
14:20
20m
Talk
Event-based Design Abstractions for Agent Harnesses
PAgE
McCoy Becker , Matin Ghavami Massachusetts Institute of Technology, Fabian Zaiser Massachusetts Institute of Technology, Timothy O'Donnell , Martin C. Rinard Massachusetts Institute of Technology, Joshua B. Tenenbaum Massachusetts Institute of Technology, Vikash Mansinghka Massachusetts Institute of Technology
14:40
20m
Talk
Lumos: Let there be Language Model System Certification
PAgE
Isha Chaudhary , Vedaant Jain , Prineet Parhar , Kavya Sachdeva , Avaljot Singh University of Illinois Urbana-Champaign, Sayan Ranu , Gagandeep Singh University of Illinois Urbana-Champaign
Pre-print
15:00
20m
Talk
Lazy Validation and Self-Healing for Agentic Programs
PAgE
Theodoros Tsampouris , Eleftherios Ioannidis Microsoft Research, Andreas Symeonidis Aristotle University of Thessaloniki