
Spec-driven AI development: why we write specs before code

Why every Twill engagement starts with a spec, and how we choose between Spec Kit, OpenSpec, Superpowers, and Plan mode for the work in front of us.

Twill Studio
3 min read
Tags: spec-driven, methodology, ai-agents, spec-kit, openspec

AI agents amplify intent. Hand a model a vague brief and you get a vague codebase, faster. The whole point of spec-driven development is to make the intent legible, to humans and models both, before the code exists. This post explains what we mean by spec-driven, why every Twill engagement starts there, and how we pick the right amount of ceremony for a given piece of work.

Why spec-first

A spec is the only artifact a human and a model can both read, critique, and modify in the same way. Code is for machines. Prompts are for the moment. A spec sits in the middle: precise enough to compile against, prose enough to argue with. That dual nature is the entire reason it earns its place in the repo, alongside the code generated from it.

The other reason is durability. Specs are what we leave behind. The team owns the spec, edits it, regenerates against it. Six months from now, when someone needs to change how the permissions system works, they don't have to reverse-engineer intent from a diff — they read the spec, change the spec, then change the code. The spec is the source of truth; the code is one valid implementation of it. That ordering is load-bearing. Skip it and your repo becomes a stack of frozen prompt outputs nobody can confidently modify.

The four methods

We pick the lightest method that fits the work.

01

Spec Kit

Greenfield, multi-feature projects.

When we reach for it

When you're starting from a blank repo with several interconnected features and systems thinking matters more than speed. Spec Kit's per-feature specs let us reason about how features interact before any of them ship.

What you get

Per-feature spec files, a structured /specify → /plan → /tasks flow, and cross-feature analysis baked into the methodology.

02

OpenSpec

Brownfield, incremental modernization.

When we reach for it

When the repo already exists and the work is a series of focused changes. OpenSpec's single living spec is updated through Propose → Apply → Archive cycles, which keeps it aligned with what actually shipped.

What you get

A unified living specification, delta-based change proposals, and a fast iteration loop tuned for AI agents.
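
To make the idea of a delta-based change proposal concrete, here is a minimal sketch of a Propose → Apply → Archive cycle. It is a toy model only: the `LivingSpec` and `ChangeProposal` classes, the requirement IDs, and the dictionary layout are all hypothetical names chosen for illustration, and none of this reflects OpenSpec's actual file format or commands.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChangeProposal:
    """A delta against the living spec (hypothetical shape, for illustration)."""
    name: str
    added: dict[str, str] = field(default_factory=dict)
    modified: dict[str, str] = field(default_factory=dict)
    removed: list[str] = field(default_factory=list)


@dataclass
class LivingSpec:
    """One spec for the whole repo, keyed by requirement ID."""
    requirements: dict[str, str] = field(default_factory=dict)
    archive: list[str] = field(default_factory=list)

    def apply(self, change: ChangeProposal) -> None:
        """Merge a reviewed delta into the spec, then archive the proposal."""
        # Validate first so a bad proposal leaves the spec untouched.
        for req_id in change.added:
            if req_id in self.requirements:
                raise ValueError(f"{req_id} already exists")
        for req_id in list(change.modified) + change.removed:
            if req_id not in self.requirements:
                raise ValueError(f"{req_id} is not in the spec")

        self.requirements.update(change.added)
        self.requirements.update(change.modified)
        for req_id in change.removed:
            del self.requirements[req_id]
        # Archiving records which proposals produced the current spec.
        self.archive.append(change.name)


# Propose: describe only what changes, not the whole spec.
spec = LivingSpec({"AUTH-1": "Users sign in with email and password."})
proposal = ChangeProposal(
    name="add-sso",
    added={"AUTH-2": "Users can sign in with SSO."},
    modified={"AUTH-1": "Users sign in with email and password, or SSO."},
)

# Apply and archive: the living spec now matches what will ship.
spec.apply(proposal)
```

The point of the sketch is the shape of the loop: a proposal carries only the delta, and the spec is updated in the same step that archives the proposal, so the two cannot silently diverge.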

03

Superpowers

Smaller, well-scoped tasks.

When we reach for it

When the work is contained — a feature, a bug, a refactor — and a skill-driven workflow gets us to a tested change without the overhead of a full spec lifecycle.

What you get

A reusable skill invocation, TDD by default, and a clean commit. The skill itself becomes part of the toolchain you keep.

04

Plan mode

Quick, contained changes.

When we reach for it

When the change is small enough that a written spec is overkill, but large enough that vibe-coding it would be reckless. Plan mode forces a reviewed plan before any edit happens.

What you get

A reviewed plan, then a focused execution. No artifact to maintain — the value is in the discipline.


How to read the four methods

We start at Plan mode and only move up when the work demands it. Plan mode is for one-shot edits where vibe-coding would be reckless but a written spec is overkill — the value is the discipline of a reviewed plan, not a maintained artifact. Superpowers is the next step up: when the task is contained but deserves a tested, repeatable workflow. The skill itself becomes part of your toolchain.

OpenSpec comes in when the repo already exists and the work is a series of focused changes that need to stay aligned with reality. Its single living spec drifts less than scattered tickets ever could. Spec Kit is the heaviest tool, reserved for greenfield projects where features interconnect and systems thinking matters more than speed. The progression runs from no maintained artifact, to a reusable skill, to a living spec, to per-feature specs.
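
The ladder described above can be sketched as a small decision function. This is an illustration of the reasoning rather than a tool we ship: the function name `pick_method` and its four boolean inputs are assumptions introduced here, and real scoping involves more judgment than four flags.

```python
from enum import Enum


class Method(Enum):
    PLAN_MODE = "Plan mode"
    SUPERPOWERS = "Superpowers"
    OPENSPEC = "OpenSpec"
    SPEC_KIT = "Spec Kit"


def pick_method(
    *,
    greenfield: bool,
    interconnected_features: bool,
    series_of_changes: bool,
    needs_tested_workflow: bool,
) -> Method:
    """Return the lightest method that fits the work (hypothetical criteria)."""
    # Checks run heaviest-first, so a heavier method is chosen only when
    # the work explicitly demands it.
    if greenfield and interconnected_features:
        return Method.SPEC_KIT
    if not greenfield and series_of_changes:
        return Method.OPENSPEC
    if needs_tested_workflow:
        return Method.SUPERPOWERS
    # Falling through every check lands on the lightest option.
    return Method.PLAN_MODE


# A contained bug fix in an existing repo that deserves a tested workflow.
choice = pick_method(
    greenfield=False,
    interconnected_features=False,
    series_of_changes=False,
    needs_tested_workflow=True,
)
print(choice.value)
```

Plan mode is the fall-through case by design: nothing has to justify choosing it, while every heavier method has to be earned by a property of the work.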

Decision lens

How we choose, in one chart.

Two axes. Pick the lightest method that lands in the right quadrant for the work in front of us. The further from the origin, the more spec ceremony pays for itself.

Bigger scope earns more ceremony; smaller scope earns less. The spec is the constant, and only its weight changes. We err toward the bottom-left.

What a spec actually replaces

The honest read: a spec gives you four things a prompt doesn't.

  • Stable identity. It doesn't change shape between sessions. The model isn't reasoning from scratch every time it touches the code.
  • Version control. Specs live in the repo. Diffs review like code. You can see exactly when the contract changed and why.
  • Reviewability. A pull request can say "we changed the spec, then changed the code, and here's why." A prompt iteration can't be reviewed; it can only be re-run.
  • Model durability. When Opus 4.7 becomes Opus 5 becomes whatever's next, the spec is the constant. The model is the variable. Workflows built on prompts age badly. Workflows built on specs don't.
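
As a small illustration of the version-control point in the list above, the sketch below uses Python's standard `difflib` to show how a one-line change to a spec appears as a reviewable diff. The spec text, the `specs/permissions.md` path, and the `spec_diff` helper are hypothetical examples, not taken from a real engagement.

```python
import difflib

# Two versions of a hypothetical permissions spec.
old_spec = """\
## Permissions
- Admins can delete any project.
- Members can delete projects they own.
"""

new_spec = """\
## Permissions
- Admins can delete any project.
- Members can archive projects they own.
"""


def spec_diff(old: str, new: str, path: str = "specs/permissions.md") -> str:
    """Return a unified diff of two spec versions, as a reviewer would see it."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


diff_text = spec_diff(old_spec, new_spec)
print(diff_text)
```

The diff isolates the one requirement that changed, from "delete" to "archive", which is exactly the contract change a reviewer needs to approve before the code follows.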

The handoff

Three weeks in, your team owns the spec. They edit it the way they edit any other file in the repo — through PRs, with review, with commit messages explaining the change. They regenerate against it when a feature needs an overhaul. They use it to onboard the next engineer, who reads the spec before reading the code, the way it should be.

The workflow keeps working because the spec keeps existing. Nothing about it depends on us still being in the room, or on any particular model being the latest one. That is the whole bet.

If you want the full breakdown of how we choose between these four, we've laid it out on our methodology page.