Mon 15 Jun 2026 11:00 - 11:20 at Meadows B - PAgE Session 1 Chair(s): Shraddha Barke

While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. As a result of this lack of trustworthiness, the agent’s answers need to be manually verified before they can be trusted, which requires substantial effort and can result in lower developer productivity. We describe a method to automatically validate the answers provided by a code reasoning agent by extracting a formal representation of the agent’s response, and using formal verification and program analysis tools to verify the reasoning steps. We applied this approach to a set of 20 program equivalence queries and found that the formal verification step successfully caught 6/8 incorrect agent judgments. This work lays the foundation for what may become a new class of high-precision, verifiable code agents, paving the way for their reliable use in critical software engineering workflows.

Mon 15 Jun

Displayed time zone: Mountain Time (US & Canada) change

10:40 - 12:20
PAgE Session 1PAgE at Meadows B
Chair(s): Shraddha Barke Microsoft Research, Redmond
10:40
20m
Talk
Agentic Code Reasoning
PAgE
Shubham Ugare Meta, Satish Chandra Meta Platforms, Inc.
Pre-print
11:00
20m
Talk
Towards Verified Code Reasoning by LLMs
PAgE
Meghana Aparna Sistla Google DeepMind, Gogul Balakrishnan Google, Patrick Rondon Google, José Pablo Cambronero Google, USA, Michele Tufano Google, Satish Chandra Meta Platforms, Inc.
11:20
20m
Talk
Testing, Credible Compilation, and Verification in the Axon Verified Compiler in Lean and Claude CodeRemote
PAgE
Martin C. Rinard Massachusetts Institute of Technology
11:40
20m
Talk
The Next Frontier for AI-Generated Kernels: Correctness
PAgE
Guido Martínez Microsoft Research, Tyler Sorensen University of California at Santa Cruz
DOI Pre-print
12:00
20m
Talk
Testing LLM-Generated Distributed Protocol Code
PAgE
Ankush Das Boston University, Brendan Coyne Boston University