Grounding & Formal Methods

Five tools covering the stages most teams skip under pressure: discover what to build, require what it must do, specify its states and invariants, restructure what exists safely, and review the reasoning that produced the code. AI removes the time penalty from each stage. These tools put that hypothesis into practice — and are built to help engineers learn the underlying disciplines, not just apply them.

PR/FAQ

v1.6.0

Amazon's Working Backwards process, inside the terminal.

Generate, review, and stress-test Amazon Working Backwards documents with specialized agents, simulated review meetings, and compiled PDF output. See an example — the PR/FAQ for this tool, written by the tool itself.

Uses Quarry
  • Complete PR/FAQ document generation — press release, FAQs, four-risks assessment, feature appendix
  • Simulated review meeting with four agentic personas who debate weak spots
  • Researcher agent integrates with Quarry for semantic search
  • Cascading feedback engine traces effects across all sections
  • Compiles to print-ready PDF via LaTeX
curl -fsSL https://raw.githubusercontent.com/punt-labs/prfaq/ffc7dfd/install.sh | sh
3 demos → GitHub → Code LevelL4
Coming Soon

Use Cases

v0.0.0 alpha

Actors. Goals. Scenarios. Extensions. The spec that comes before the code.

Use Cases will seek to bring Jacobson and Cockburn's structured requirements methodology into Claude Code. The goal: identify the actors in your system, define what each is trying to accomplish, describe the step-by-step scenario that achieves their goal, and systematically work through what can go wrong at each step. The output is a specification document — not code.

  • Guided walkthrough: actors, goals, scenarios, extensions
  • Cockburn's goal-level hierarchy — summary, user, subfunction
  • Systematic extension handling at every scenario step
  • Specification document output, not generated code
Links available at launch

Z Spec

v0.2.0 beta

Formal Z specifications for stateful systems.

Z is a specification language grounded in set theory and first-order predicate logic, developed at Oxford and standardized as ISO 13568. A Z spec describes a system as states, invariants that constrain them, and operations that transition between them — catching entire classes of bugs mathematically, not just the inputs you happened to test. Z Spec makes two powerful tools approachable inside Claude Code: Spivey's fuzz type-checker and the ProB animator/model-checker from HHU Düsseldorf. Extract specs from existing code, generate code from specs, type-check, and model-check — without learning the toolchain.

  • Bidirectional: code2model extracts specs, model2code generates implementations
  • Type-checks with fuzz — Spivey's Z type-checker from Oxford
  • Animates and model-checks with ProB — state-space exploration from HHU Düsseldorf
  • Derives test cases from specs using TTF testing tactics
  • Lean 4 proof obligations, runtime contracts, and property-based oracle testing
  • B-Method support: create, type-check, animate, and refine B machines
curl -fsSL https://raw.githubusercontent.com/punt-labs/z-spec/02d6125/install.sh | sh
5 demos → GitHub → Code LevelL4
Coming Soon

Refactory

v0.0.0 alpha

refactoring Behavior-preserving program transformation, not just "clean up code."

Refactory will seek to bring Opdyke's original definition of refactoring — behavior-preserving program transformations with formally defined preconditions — into an AI-assisted workflow. The AI decides what needs restructuring; Refactory would execute on a parsed program model where each transformation is proven safe before it runs.

  • Built on Opdyke's formal definition — each refactoring has preconditions that must hold
  • Operates on a parsed program model, not text substitution
  • AI selects the transformation; Refactory guarantees safety
  • Composable — chain small, proven-safe transformations into large restructurings
GitHub →
Coming Soon

ReasonTrace

v0.0.0 alpha

provenance Session recordings as first-class engineering assets.

ReasonTrace will seek to turn every AI coding session into a searchable, reviewable engineering asset. The idea: capture the full terminal recording and a structured event log, extract reasoning decisions, and run automated review agents that check the process against project standards and formal specifications. Existing tools capture what changed (Git) or what was decided (Entire.io). ReasonTrace would capture how the agent actually reasoned — and whether that reasoning was sound.

Reviews against Z Spec
  • Dual-stream capture — terminal recording for humans, structured event log for agents
  • Reasoning extraction — tool calls, decision points, error-recovery cycles, phase transitions
  • Automated review agents — check reasoning against standards, policies, and formal specs
  • Semantic search across sessions — find decisions by concept, not just keyword