While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. As a result of this lack of trustworthiness, the agent’s answers need to be manually verified before they can be trusted, which requires substantial effort and can result in lower developer productivity. We describe a method to automatically validate the answers provided by a code reasoning agent by extracting a formal representation of the agent’s response, and using formal verification and program analysis tools to verify the reasoning steps. We applied this approach to a set of 20 program equivalence queries and found that the formal verification step successfully caught 6/8 incorrect agent judgments. This work lays the foundation for what may become a new class of high-precision, verifiable code agents, paving the way for their reliable use in critical software engineering workflows.
Mon 15 JunDisplayed time zone: Mountain Time (US & Canada) change
10:40 - 12:20 | |||
10:40 20mTalk | Agentic Code Reasoning PAgE Pre-print | ||
11:00 20mTalk | Towards Verified Code Reasoning by LLMs PAgE Meghana Aparna Sistla Google DeepMind, Gogul Balakrishnan Google, Patrick Rondon Google, José Pablo Cambronero Google, USA, Michele Tufano Google, Satish Chandra Meta Platforms, Inc. | ||
11:20 20mTalk | Testing, Credible Compilation, and Verification in the Axon Verified Compiler in Lean and Claude CodeRemote PAgE Martin C. Rinard Massachusetts Institute of Technology | ||
11:40 20mTalk | The Next Frontier for AI-Generated Kernels: Correctness PAgE DOI Pre-print | ||
12:00 20mTalk | Testing LLM-Generated Distributed Protocol Code PAgE | ||