About
Some say AI is accelerating development at the expense of quality. We think it’s an “and” not an “or”: there has never been a better or more important time to build systems fast and well.
We believe that AI removes the time penalty from rigour. Formal specs, structured requirements, and deterministic refactoring used to cost hours of skilled effort — now they can cost seconds. What remains is the payoff: code you can verify, not just test.
Punt Labs builds tools that ground agentic software engineering in the best of computer science research — not just empowering engineers to apply rigour, but helping them learn and better apply those skills. The methods that have always produced trustworthy software, now accessible at the speed of AI. That’s how you earn trust and go fast.
Our work draws on two academic traditions: Ralph Johnson's refactoring research at the University of Illinois (Johnson is a co-author of the Gang of Four's Design Patterns) and Andrew Simpson's formal methods programme at the University of Oxford, where Z specification, state-based modelling, and software verification are applied to real systems.
Tenets
Nobody knows where software engineering lands as AI reshapes it. These are hypotheses refined through building — not conclusions. We expect to revise them as we learn more.
-
Earn trust to go fast
Speed is a goal. Sustainable speed requires code that is robust, reliable, transparent, and auditable. AI-generated code is non-deterministic. So is human-written code. The answer to both is verifiability — and verifiable code is code you can confidently ship quickly.
We think this has to be true for AI adoption to succeed broadly. Without earned trust, adoption stalls — teams slow down to double-check everything, or stop adopting altogether. We're trying to build tools that help earn that trust — and to find and adopt the best tools from others — so teams can move faster, not slower. Martin Fowler frames it well: engineering discipline doesn't disappear when AI writes the code — it moves to new places.
-
Automate aggressively, always transparently
We want agents doing as much as possible. Every automated decision should be auditable and every automated action verifiable. Transparent automation is durable automation.
Audit trails add overhead. We think that overhead pays for itself — and we're encouraged that teams like Entire.io are building provenance infrastructure that captures the reasoning behind AI-generated code, and SageOx is working on persistent context for agentic sessions.
-
What if applying rigour did not require a trade-off?
Structured product discovery, formal specifications, and comprehensive tests have always improved outcomes — but they required time and expertise that most teams had to trade against shipping. We believe that AI dissolves that trade-off: the time cost collapses, and the learning curve flattens with it.
Applying rigour still takes attention, even when the time cost drops. Martin Kleppmann predicts formal verification will go mainstream for exactly this reason. Kent Beck calls TDD a superpower with agents. We're testing the same hypothesis with Z specifications.
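For a flavour of what a Z specification looks like, here is a minimal illustrative schema of our own (a toy example, not taken from any Punt Labs product): a bank account whose withdrawal operation can never leave a negative balance.

```latex
% Z notation (fuzz/zed LaTeX style). A toy account state and one operation.

\begin{schema}{Account}
  balance : \nat            % the state invariant: balance is a natural number,
\end{schema}                %  so it can never go below zero

\begin{schema}{Withdraw}
  \Delta Account \\         % this operation may change the Account state
  amount? : \nat            % input: the amount to withdraw
\where
  amount? \leq balance \\   % precondition: cannot withdraw more than you have
  balance' = balance - amount?  % postcondition: new balance after withdrawal
\end{schema}
```

The point of a schema like this is that the invariant and precondition are checkable before any code exists — exactly the kind of rigour whose time cost AI collapses.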
-
Invest in the whole flow
Faster code generation makes every other stage more valuable. Product vision, specification, review, auditing — each deserves the same attention as writing code. A frictionless pipeline means all of it, end to end.
Broad investment is slower than optimising the one stage you can measure. We invest across the flow because bottlenecks migrate downstream. DORA's 2025 research confirms it: AI amplifies high-performing teams and exposes dysfunction in struggling ones. Gergely Orosz tracks the same pattern — PR review time increased 91% even as code generation accelerated.
-
The terminal is the interface
We build for the command line first. CLI tools and Claude Code are already operating at the frontier of agentic coding — agent teams, multi-agent orchestration, parallel sessions — while IDEs are still adding copilot sidebars. If a new IDE emerges, it is more likely to grow out of the terminal than to be retrofitted from an editor with decades of legacy.
Agent-enhanced IDEs will reach more people. We accept that trade-off. The CLI is where the most powerful workflows live today, and power users pull the ecosystem forward. Thomas Dohmke (ex-GitHub CEO) sees the same shift: “The terminal has become the new center of gravity on our computers again.”
-
Prefer the ecosystem over the vendor
We choose composable, independent tools over integrated platform features — even from vendors we respect. Ecosystems evolve. Optionality compounds.
Integrated platforms are more convenient, and independent tools require more maintenance. We accept that cost for the flexibility. The MCP ecosystem is a good example — an open protocol that lets tools compose across vendors rather than locking into one.
-
Build forward
To get to the future, we build new tools designed for AI-native workflows rather than retrofit tools designed before agents existed. We built Biff instead of integrating with Slack. We built PR/FAQ instead of wrapping Confluence. The right foundation for hybrid human-agent teams is one that assumes agents from day one.
Existing tools have existing users, existing integrations, and proven reliability. Building new is expensive and risky. But Slack, Jira, and Confluence were designed for humans coordinating with humans — and retrofitting them for agents means inheriting assumptions that may work against you. Entire.io made the same bet: a new developer platform rather than a GitHub plugin.
-
Fun is a feature
Tools people enjoy using are tools people actually use. Biff uses BSD Unix vocabulary because it's charming. Dungeon is a text adventure. Joy drives retention.
Playfulness is a deliberate choice. The Charm.sh team showed that terminal tools can be beautiful and fun while remaining serious infrastructure. We're inspired by that example.
Influences
These tenets didn't emerge in a vacuum. We're informed by practitioners who are building and writing in the same space.
- Ralph Johnson — Gang of Four co-author (Design Patterns). “Any software property that has not been verified, does not exist.”
- William Opdyke — his 1992 PhD thesis under Ralph Johnson defined refactoring as behavior-preserving program transformation. Refactory builds on this lineage.
- J.M. Spivey — The Z Notation: A Reference Manual. The formal specification notation that Z Spec puts into practice.
- Martin Fowler on non-deterministic computing and the shift in engineering discipline.
- Kent Beck on augmented coding vs. vibe coding, and TDD as a superpower with agents.
- DORA — “AI does not create elite organizations; it anoints them.”
- Gergely Orosz on bottleneck migration and the real cost of faster code generation.
- Martin Kleppmann on AI making formal verification mainstream.
- Simon Willison on LLMs as over-confident pair programmers and the economics of AI-generated code.
- Steve Yegge on agent orchestration and the economics of AI coding.
- Andrew Simpson and Oxford's Department of Computer Science, where formal specification is taught as practical engineering.
- Entire.io on provenance infrastructure and capturing the reasoning behind AI-generated code.
- SageOx on persistent context infrastructure for agentic sessions.
- Model Context Protocol — the open protocol that lets tools compose across vendors.
- Charm.sh on making terminal tools beautiful, fun, and serious at the same time.
Founder
“The more the world is automated, the more people need to understand how to build automation well.”
Building software since 1995, including roles at Prime Video, Alexa, and Zalando. University of Oxford (software engineering under Andrew Simpson) and University of Illinois (computer science under Ralph Johnson).
Why “Punt Labs”?
In British English, “taking a punt” means giving something a go — a calculated bet on an idea worth exploring. Nobody knows where software engineering lands as AI reshapes it. Our tenets are hypotheses refined through building, not conclusions. We ship, observe what holds, and revise what doesn’t.