Towards Formal Verification of LLM-Generated Code from Natural Language Prompts

Summary

Councilman et al. propose a pipeline where natural language prompts are translated into formal specifications, LLM-generated code is produced from those prompts, and then the code is formally verified against the specification. The approach uses Lean 4 for verification and evaluates on algorithmic problems. Key finding: LLMs can generate verifiable code for well-specified problems, but struggle when the specification requires complex invariants.

What it means for our work

This is direct prior art for Z Spec’s /z prove command, which generates Lean 4 proof obligations from Z specifications. The pipeline they describe — natural language to formal spec to verified code — is the workflow Z Spec enables. Their finding about invariant complexity aligns with our experience: simple state invariants verify cleanly, while complex protocol properties need human guidance on the proof strategy.