How to Build Specs Today That Won't Become Tomorrow's Refactor: AI Strategies

The worst advice on specs right now is also the most common. “Write everything up front.” “Lock the plan.” “Hand it to the agent.” That's how you get exactly what critics are warning about: waterfall with markdown, stale docs, and a lot of token spend for a document nobody wants to reopen.

That backlash is real. Simon Willison has pushed back on spec-heavy workflows. Birgitta Böckeler has been useful here because her spec-first, spec-anchored, and spec-as-source taxonomy makes an obvious point that a lot of people skip: not every team should work the same way. Addy Osmani's six-section format is helpful too, but only if you treat it as a starting shape, not a sacred ritual. GitHub Spec Kit, Kiro, OpenSpec, Traycer, Vibe Kanban, BMAD, and the rest all sit somewhere on that spectrum. None of them will save you if your spec is brittle.

For solo builders on brownfield codebases, the problem is simpler. You aren't writing a perfect plan. You're writing instructions that need to survive changing code, changing assumptions, and an AI agent that reads everything literally. That matters more now. Stack Overflow's 2025 Developer Survey reported that 84% of developers were using or planning to use AI tools, so specs increasingly have to hold up under automated interpretation, not just human reading. This 2025 talk on AI-assisted coding and resilient specs makes the same point.

A spec that won't become tomorrow's refactor usually comes down to three disciplines. Write at the right abstraction level. Tag decisions so future-you knows what changed. Put refresh triggers inside the spec so it flags when it might be wrong.

Don't Let Your Specs Become Waterfall with Markdown
- The backlash is pointing at a real failure mode
- Frozen specs die fast in brownfield code
The Three Disciplines of Durable Specs
Writing Specs for AI Agents Not Just for Humans
- Start with a human-readable frame
- Then make it literal enough for a machine
Living Spec Patterns That Prevent Drift and Rot
Common Anti-Patterns That Guarantee a Refactor
- Specs that pretend the codebase is clean
- Refactors mixed with feature work
Choosing Your Tool and Integrating Your Workflow
- Most solo builders should stay spec-anchored
- Pick the lightest tool that you will keep using

Don't Let Your Specs Become Waterfall with Markdown

The backlash is pointing at a real failure mode

The critics aren't wrong when they call bad spec work “waterfall with markdown.” They're describing a document that tries to freeze reality before implementation starts. In practice, reality moves first.

You see it fast on a brownfield app. You open the repo, think a feature is isolated, then discover auth state leaks into billing, a shared helper is used in three places you didn't expect, and the “simple cleanup” touches one package with old assumptions baked into it. If the spec was written like a fixed contract, it's already stale.

Practical rule: If your spec assumes the codebase will stay still while the work happens, the spec is fiction.

That's why the anti-spec backlash lands. People wrote ceremonial docs, then watched the code move underneath them. The problem wasn't writing things down. The problem was pretending the document wouldn't need to change.

A lot of the public argument gets stuck there. If you want the broader critique, this piece on whether spec-driven development is just waterfall with markdown frames the tension clearly.

Frozen specs die fast in brownfield code

What works is a spec written to survive contact with implementation. That means fewer claims about internals, more clarity on boundaries, and explicit notes about what gets revisited when conditions change.

This matters even more with AI coding agents. A human developer can often infer missing intent from tribal knowledge. An agent can't. It will fill gaps with guesses. On a clean greenfield toy app, you might get away with that. On a codebase that already has history, shortcuts, and hidden coupling, you won't.

A frozen spec also ignores how refactoring really works. In mature engineering practice, refactoring is a continuous maintenance activity tied to keeping design and requirements aligned as software evolves, not a one-time cleanup phase, as argued in Ben Nadel's discussion of refactoring in intent programming.

That changes how you write the spec. You're not trying to stop change. You're trying to make future change cheaper.

The Three Disciplines of Durable Specs

Write behavior and leave implementation room

The first discipline is simple. Specify behaviors, not implementations.

Bad spec line:

Fragile instruction: “Add handleGoogleAuth in auth.ts, update the session serializer, and write the callback result to users.provider_id.”

Better spec line:

Stable contract: “A user can sign in with Google. If the account already exists, attach Google as an additional sign-in method. Existing email-password login must keep working.”

The first version hard-codes a path through the codebase. The second version defines the user-visible contract. That contract can survive refactoring.
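Here is a minimal Python sketch of a check written against the stable contract instead of the fragile instruction. `InMemoryAuth`, its methods, and the sample email are hypothetical stand-ins for whatever your app actually exposes. The assertions only touch public behavior, so they don't depend on which file or helper implements it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Account:
    email: str
    password: Optional[str] = None
    # Sign-in methods attached to this account, e.g. {"password", "google"}.
    methods: Set[str] = field(default_factory=set)


class InMemoryAuth:
    """Hypothetical auth service. Only its public behavior is exercised below."""

    def __init__(self) -> None:
        self._accounts: Dict[str, Account] = {}

    def register(self, email: str, password: str) -> None:
        self._accounts[email] = Account(email, password, {"password"})

    def login_with_password(self, email: str, password: str) -> bool:
        acct = self._accounts.get(email)
        return acct is not None and acct.password == password

    def login_with_google(self, email: str) -> Account:
        # Existing account: attach Google as an additional sign-in method.
        # Unknown email: create an account whose only method is Google.
        acct = self._accounts.setdefault(email, Account(email))
        acct.methods.add("google")
        return acct


def check_google_signin_contract(auth: InMemoryAuth) -> None:
    """Each assertion restates a sentence from the spec, not an internal detail."""
    auth.register("ana@example.com", "s3cret")
    acct = auth.login_with_google("ana@example.com")
    # "If the account already exists, attach Google as an additional sign-in method."
    assert acct.methods == {"password", "google"}
    # "Existing email-password login must keep working."
    assert auth.login_with_password("ana@example.com", "s3cret")
```

Renaming `handleGoogleAuth` or replacing the session serializer would leave a check like this untouched. That is the property you want from the spec line it encodes.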

This is not abstract purity. It's practical, because a defect found late costs more to fix. Figures attributed to the IBM Systems Sciences Institute put the cost of fixing a defect at about 1x during design, 6x during coding, 15x during testing, and 100x or more after release. That spread is often used to argue for encoding constraints and validation early in the spec rather than leaving them implicit until later change work, as summarized in this discussion of defect cost over time.

Tag decisions so history survives

The second discipline is tracking decision state. Tag every decision Proposed, Accepted, or Superseded.

That sounds boring until you revisit a feature months later and can't remember why you rejected the first path. A spec without decision history forces you to rediscover old mistakes.

A simple pattern is enough:

Decision	State	Note
Use hosted search for v1	Accepted	Faster to ship than building ranking logic now
Add typo tolerance in app layer	Superseded	Replaced by provider-native query settings
Rebuild auth before adding SSO	Proposed	Blocked until current session bugs are isolated

On r/ClaudeCode, a thread about architecture decision records (ADRs) pointed out the value of recording what was superseded as well as what was decided. That history keeps teams from circling back to failed paths with no memory of why they failed in the first place, as discussed in this Reddit thread on ADRs and supersession history.

Record dead ends. Future-you is one of the main readers of the spec.
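If you want the log to be checkable as well as readable, a minimal Python sketch can encode the three states and refuse backwards moves. The transition rules are an assumption layered on top of the table above, not part of any ADR standard: Superseded is terminal, and superseding a decision requires naming what replaced it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    SUPERSEDED = "Superseded"


# Assumed rules: a proposal may be accepted or dropped straight to Superseded,
# and nothing moves backwards, so recorded history is never rewritten.
TRANSITIONS = {
    State.PROPOSED: {State.ACCEPTED, State.SUPERSEDED},
    State.ACCEPTED: {State.SUPERSEDED},
    State.SUPERSEDED: set(),
}


@dataclass
class Decision:
    title: str
    state: State
    note: str
    superseded_by: Optional[str] = None

    def move_to(self, new_state: State, note: str,
                superseded_by: Optional[str] = None) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.title}: {self.state.value} -> {new_state.value} not allowed")
        if new_state is State.SUPERSEDED and not superseded_by:
            raise ValueError(f"{self.title}: record what replaced it")
        self.state, self.note, self.superseded_by = new_state, note, superseded_by


# The table above, as data.
log = [
    Decision("Use hosted search for v1", State.ACCEPTED,
             "Faster to ship than building ranking logic now"),
    Decision("Add typo tolerance in app layer", State.SUPERSEDED,
             "Replaced by provider-native query settings",
             superseded_by="provider-native query settings"),
    Decision("Rebuild auth before adding SSO", State.PROPOSED,
             "Blocked until current session bugs are isolated"),
]
```

Whether Superseded entries stay in the same file or move to an archive is a matter of taste. What matters is that each replacement is recorded next to the decision it replaced.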

Add refresh triggers before you need them

The third discipline is often skipped. Put refresh triggers inside the spec.

Examples:

Dependency trigger: Review this section when the Stripe API version changes.
Traffic shape trigger: Review rate-limit assumptions if background jobs begin batching uploads.
Architecture trigger: Review file ownership if auth moves from middleware to service boundaries.
Product trigger: Review onboarding rules when invited team accounts ship.

These lines look small. They do a lot of work. They admit that the spec will become wrong, and they tell you when to reopen it.

Without triggers, specs go stale silently. With triggers, they become living docs tied to real events.
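Triggers tied to files or dependency manifests can be checked mechanically. Below is a minimal Python sketch that assumes you keep a small table mapping spec sections to file patterns and feed it the changed paths, for example from `git diff --name-only`. The section names and paths are hypothetical. Mapping a Stripe API version change to `package.json` is a rough proxy, and product triggers such as "invited team accounts ship" still need a human to notice them.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical trigger table: spec section -> file patterns whose change
# should reopen that section. Names and paths are illustrative only.
TRIGGERS: Dict[str, List[str]] = {
    "Billing / Stripe integration": ["package.json", "src/billing/stripe*"],
    "Upload rate limits": ["src/jobs/upload_*"],
    "File ownership": ["src/middleware/auth*", "src/services/auth/*"],
}


def sections_to_review(changed_files: List[str]) -> List[str]:
    """Return the spec sections whose trigger patterns match any changed file."""
    hits = []
    for section, patterns in TRIGGERS.items():
        if any(fnmatch(path, pat) for path in changed_files for pat in patterns):
            hits.append(section)
    return hits
```

Note that `fnmatch`'s `*` also matches `/`. That is convenient for directory patterns like these, but it is broader than shell globbing.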

Writing Specs for AI Agents Not Just for Humans

A human can interpolate. An AI agent usually can't. That changes the level of precision you need.

Start with a human-readable frame

Addy Osmani's six-section spec format is a good base because it forces you to separate context from action: Background, Goals, Non-Goals, Implementation, Open Questions, Success Metrics. It's readable. It also makes missing thought visible. If you can't write non-goals, you probably haven't bounded the task.

For solo builders, non-goals are often the highest-value section. Much of the available refactoring advice assumes you can keep inspecting and polishing by hand, but for small teams the constraint is bandwidth, not discipline. That's why explicit non-goals and testable acceptance scenarios matter when validation time is scarce, as argued in this piece on refactoring, bandwidth, and reducing rework.

A decent starting shape looks like this:

Background: What problem exists in the product and codebase
Goals: What user-visible result must happen
Non-Goals: What this work must not change
Implementation notes: Constraints, touched systems, and known hazards
Open questions: Things you still need answered
Success criteria: Observable outcomes you can verify

If you want a deeper framing for why this works with agentic coding, Samuel Woods' AI insights are useful because they push beyond one-off prompting and toward context design.

Then make it literal enough for a machine

The second layer is where most specs fail. They read well to a human and still leave too much room for an agent to improvise.

Add these fields when AI will implement the work:

Touched files and directories
- Include full paths when you know them.
- If path discovery is part of the task, say that explicitly.
Protected zones
- Name what must not be changed.
- Example: “Do not modify session storage behavior.”
Interface contracts
- List endpoints, events, job names, schema expectations, and error conditions.
- If the agent can't infer a boundary safely, spell it out.
Acceptance criteria written like tests
- Example: “WHEN the user uploads a non-image file, THEN the API returns 400 and the UI shows Invalid file type.”
- This is much better than “validate uploads properly.”
Excluded behavior
- Say what you are leaving alone even if it's ugly.
- Brownfield work needs fences more than inspiration.

The more literal the execution environment, the less you can rely on implied intent.
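The touched-files and protected-zone fields are also the easiest to check after the agent finishes. Below is a minimal Python sketch that assumes you copy those fields into glob patterns and pass in the changed paths, for example from `git diff --name-only`. The patterns are hypothetical. The check only catches file-level drift. It can't tell whether session behavior changed through some other route, which is what the acceptance criteria are for.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical fences copied from the spec's "Touched files" and
# "Protected zones" fields.
IN_SCOPE = ["src/uploads/*", "tests/uploads/*"]
PROTECTED = ["src/session/*", "migrations/*"]


def fence_violations(changed_files: List[str]) -> Dict[str, List[str]]:
    """Split changed files into protected-zone hits and out-of-scope edits."""
    protected = [f for f in changed_files
                 if any(fnmatch(f, p) for p in PROTECTED)]
    out_of_scope = [f for f in changed_files
                    if f not in protected
                    and not any(fnmatch(f, p) for p in IN_SCOPE)]
    return {"protected": protected, "out_of_scope": out_of_scope}
```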

If you need a practical writing template for this style, this guide to writing technical specifications is worth keeping nearby.

Living Spec Patterns That Prevent Drift and Rot

A spec starts aging the day you write it. The trick is slowing that decay enough that it stays useful.

Risk tags expose the parts most likely to break

Untagged assumptions are where future bugs hide. I've had more trouble from “obvious” assumptions than from known hard parts.

Write assumptions like this:

[ASSUMPTION LOW] Users already have a verified email by the time they reach billing.
[ASSUMPTION HIGH] The webhook provider delivers events in a predictable order.
[ASSUMPTION HIGH] Existing image processing workers can handle the new file size range.

The tag matters because it tells you where to spend review time. In solo work, that matters more than perfect completeness. You usually don't have enough time to validate every sentence equally.

A living spec also needs a place to age gracefully. That's the difference between a useful document and a dead artifact. If you want the framing around that, this piece on living specs versus frozen specs gets the distinction right.

Acceptance criteria must be observable

A lot of specs fail because they describe wishes instead of checks.

Bad examples:

Vague: “The dashboard loads quickly.”
Vague: “The sync process should be reliable.”
Vague: “The upload flow should feel smooth.”

Better examples:

Observable: “The dashboard shows either loaded data or a visible loading state. It never renders an empty shell with no status.”
Observable: “If provider sync fails, the user sees the last successful sync time and a retry action.”
Observable: “If upload validation fails, the user keeps the form state and sees the error next to the file field.”

You're defining what someone can see from outside the system. That's the safest layer to preserve through refactors.

This is also why durable specs map well to a red/green/refactor contract. Define the smallest externally visible behavior first, then keep scope tight enough that the implementation can change without rewriting the contract. That lowers regression risk because you can validate each change incrementally, as explained in this refactoring guide from TechTarget.

Here's a quick pattern I use:

Weak criterion	Strong criterion
“Handle invalid inputs”	“Reject unsupported file types with a visible error and no partial record creation”
“Improve search results”	“Searching by full email returns the exact matching account in results”
“Make onboarding clear”	“New users see the next required action before they can continue”
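A strong criterion translates almost line for line into a check. Here is a minimal Python sketch of the first row that uses a hypothetical in-memory `UploadService`. The accepted content types, the 400 status, and the "Invalid file type" message are assumptions borrowed from the agent-spec example earlier, not a real API.

```python
from dataclasses import dataclass, field
from typing import List

ALLOWED_TYPES = {"image/png", "image/jpeg"}  # assumption: v1 accepts only these


@dataclass
class Response:
    status: int
    error: str = ""


@dataclass
class UploadService:
    """Hypothetical upload handler; `records` stands in for the database."""
    records: List[str] = field(default_factory=list)

    def upload(self, filename: str, content_type: str) -> Response:
        # Validate before any write, so a rejected file leaves nothing behind.
        if content_type not in ALLOWED_TYPES:
            return Response(400, "Invalid file type")
        self.records.append(filename)
        return Response(201)


def check_rejects_unsupported_type() -> None:
    svc = UploadService()
    resp = svc.upload("notes.pdf", "application/pdf")
    assert resp.status == 400 and resp.error == "Invalid file type"  # visible error
    assert svc.records == []  # no partial record creation
```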

Validation scenarios make refactors safer

Validation scenarios are where your spec stops being commentary and starts being executable thinking.

Use Given / When / Then for the risky paths:

Given a user has an expired invite
When they open the invite link
Then they see an expired state and a request-new-invite action
Given a payment retry succeeds after a previous failure
When the webhook arrives twice
Then only one successful charge state is persisted
Given a user loses network mid-save
When they retry from the UI
Then the system avoids duplicate record creation

These scenarios force clarity. They also help separate valid refactoring from accidental behavior changes.
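The Given / When / Then shape maps directly onto a test body. Here is a minimal Python sketch of the duplicate-webhook scenario. It assumes the payment provider sends a stable event ID you can deduplicate on, and `ChargeStore` and its methods are hypothetical.

```python
from typing import Dict, Set


class ChargeStore:
    """Hypothetical payment state store that deduplicates webhook deliveries."""

    def __init__(self) -> None:
        self.seen_event_ids: Set[str] = set()
        self.charge_state: Dict[str, str] = {}  # charge_id -> "failed" | "succeeded"
        self.success_writes = 0

    def handle_event(self, event_id: str, charge_id: str, outcome: str) -> bool:
        """Apply a webhook event once; return False for a redelivered event."""
        if event_id in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event_id)
        self.charge_state[charge_id] = outcome
        if outcome == "succeeded":
            self.success_writes += 1
        return True


def scenario_retry_succeeds_webhook_arrives_twice() -> None:
    store = ChargeStore()
    # Given a payment retry succeeds after a previous failure
    store.handle_event("evt_1", "ch_1", "failed")
    # When the webhook arrives twice
    store.handle_event("evt_2", "ch_1", "succeeded")
    store.handle_event("evt_2", "ch_1", "succeeded")
    # Then only one successful charge state is persisted
    assert store.charge_state == {"ch_1": "succeeded"}
    assert store.success_writes == 1
```

In a real system, the seen-event check and the charge write would need to share a transaction or a unique constraint. The in-memory set only illustrates the contract the scenario pins down.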

Common Anti-Patterns That Guarantee a Refactor

Specs that pretend the codebase is clean

The “magic spec” is the document written as if the current app is tidy, coherent, and obvious. It usually isn't.

If you don't inspect the existing code before writing the spec, you'll miss the ugly constraints that shape the implementation. That means your agent will discover them for you, late, badly, and with side effects.

The fastest way to create tomorrow's refactor is to write a spec that ignores:

Hidden dependencies: Shared utilities, middleware side effects, old callbacks, or scheduler jobs
State ownership: Which layer is responsible for truth
Existing failure modes: Timeouts, retries, stale caches, duplicate events
Protected legacy behavior: Weird behavior users already rely on

A spec for a brownfield feature that doesn't mention the old system isn't a spec. It's wishful thinking.
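Inspecting the code first doesn't require heavy tooling. Here is a rough Python sketch that runs a plain substring scan to list every file and line that mentions a shared helper before you write the spec. The extension list and skipped directories are assumptions to adjust for your stack. The scan will miss dynamic calls and re-exports, and `grep -rn` does much the same job, so treat the result as a starting inventory, not proof that a feature is isolated.

```python
import os
from typing import Dict, List

SOURCE_EXTS = (".py", ".ts", ".js")  # assumption: adjust to your stack
SKIP_DIRS = {".git", "node_modules"}


def find_references(root: str, symbol: str) -> Dict[str, List[int]]:
    """Map each source file under root to the line numbers that mention symbol."""
    hits: Dict[str, List[int]] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune vendored and VCS directories in place; they add noise, not constraints.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if not name.endswith(SOURCE_EXTS):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                lines = [i for i, line in enumerate(fh, start=1) if symbol in line]
            if lines:
                hits[os.path.relpath(path, root)] = lines
    return hits
```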

Refactors mixed with feature work

The second anti-pattern is mixing cleanup with behavior change because “you're already in there.” That almost always expands the blast radius.

Refactoring guidance keeps repeating the same point for a reason. Create or verify tests before changing code. Don't mix refactors with new features. Scope creep turns a safe refactor into a dangerous behavior change, as covered in this refactoring guide from Cheesecake Labs.

The shortcuts that cause pain later are predictable:

Implementation-first specs: They hard-code functions, classes, and schema choices before you know what the smallest stable behavior is.
No non-goals: The agent wanders into nearby systems because nothing fenced it off.
Dependency omissions: The spec lists the new endpoint but not the cron, worker, policy, analytics event, or cache that also gets touched.
Cleanup bundling: “While we're here” rewrites structure and adds features in one pass.

If you catch yourself saying “we can tidy this and add the new flow at the same time,” stop and split the work.

Choosing Your Tool and Integrating Your Workflow

Most solo builders should stay spec-anchored

Birgitta Böckeler's framing is useful because it stops people from jumping straight to the most extreme workflow. There's a difference between spec-first, spec-anchored, and spec-as-source. For most solo founders with an existing codebase, spec-anchored is the sweet spot.

That means the spec is central, but not sacred. You write it before implementation. You update it during implementation. You keep it tied to tests, boundaries, and decisions. But you don't pretend the whole system should be generated from it.

That middle ground fits the nature of brownfield work. You need structure, but you also need room to adapt when the repo reveals something annoying on day two.

Long-term context is the issue. On forums like r/VibeCodeDevs, solo developers using AI keep running into the same problem: chat history is a terrible substitute for a living document that tracks architecture and evolving requirements over time, as discussed in this r/VibeCodeDevs thread on long-term context management.

Pick the lightest tool that you will keep using

You don't need a massive framework if you won't maintain it. You need a workflow that survives actual use.

Here's the practical trade-off view:

Option	What it's good at	Where it breaks
Plain Markdown in repo	Cheap, flexible, easy to start	Depends heavily on your discipline
GitHub Spec Kit	Strong structure and repeatable templates	Can feel rigid for messy existing systems
OpenSpec	Brownfield-friendly delta specs and lighter footprint	Still needs a habit for review and refresh
Tekk.coach	Generates structured specs from a GitHub repo, keeps a living document, and adds an async CTO loop with one proposal or question per workspace tick	Another tool in your stack, and currently GitHub-only

That's the critical decision. Not which brand wins. Which loop can you sustain?

A simple setup I'd recommend for most solo builders looks like this:

Keep specs in version control
- The spec should live near the code, not in disappearing chat threads.
Use one canonical template
- Don't invent a new format every week.
- Add goals, non-goals, touched systems, assumptions, decisions, refresh triggers, and validation scenarios.
Hand the spec to the agent manually
- Then review output against the acceptance criteria, not against vibes.
Update the spec after material discoveries
- Especially when a decision changes from proposed to accepted, or accepted to superseded.
Separate debt notes from current scope
- If you see adjacent cleanup, log it. Don't absorb it.

For a wider framing on how teams are thinking about cleanup pressure created by AI-generated work, insights on managing AI technical debt are worth reading.

If you're working in brownfield code, tool choice also depends on how much ceremony you can tolerate. OpenSpec is worth a look for delta-style specs and tends to fit solo work on existing repositories better than heavier process stacks. GitHub Spec Kit is useful if you want more structure. For a broader critique of where these workflows help and where they turn into ceremony, page 47 on honest critique is a better read than most tool landing pages. For the repo reality check, page 39 on brownfield spec-driven development (SDD) is also worth your time, and the main spec-driven development hub ties the models together.

If you want help keeping specs alive instead of turning them into tomorrow's refactor, Tekk.coach is one option. Connect your GitHub repo. Describe the problem. Get a structured spec. Ship.

Part of the Spec-Driven Development pillar — a 52-page honest playbook on shipping with AI coding agents.
