Prompt-Chaining for Complex Builds: A Technical Playbook
September 15, 2025
An implementation-level guide to shipping complex systems by chaining small, verifiable stages.
TL;DR
Traditional “mega prompts” collapse under ambiguity and hidden coupling. The fix is to ship in small, testable Stages that each return a complete artifact, preserve explicit contracts (IDs, APIs, schemas, filenames, tone), and include a pass/fail acceptance checklist. You run each Stage in a fresh chat with a Universal Runner Prompt, paste the prior artifact(s) as context, and iterate that Stage until it passes. This yields reproducibility, easier debugging, and fewer regressions.
Core concepts
- Stage — a short prompt defining one concrete outcome (e.g., an endpoint, a UI section, a data transform).
- Artifact — the sole output requested by a Stage (file(s) to save as-is).
- Contract — non-negotiable IDs, APIs, schemas, filenames, copy, style guides that must not drift.
- Acceptance — explicit pass/fail checks (visible effects, exact strings, shapes). Your gate to move on.
- Regression budget — a reminder: previous features must still work. Shims > silent breaking changes.
- SELFTEST — a tiny page/script/command that sanity-checks critical behaviors after each Stage.
Always prefer one artifact per Stage unless it’s truly necessary to emit multiple files. Deterministic filenames make CI and reviews simple.
Universal Stage Runner Prompt (General)
Use this at the top of every Stage chat.
ROLE
You are a meticulous, senior builder. You will be given:
1) A single “Stage” prompt (plain text) describing the next feature or slice of a larger project.
2) (Optional) The current artifact(s) produced by earlier stages.
OBJECTIVE
Return a COMPLETE, USABLE artifact that integrates the Stage requirements while preserving all previously built behavior/contracts. Prefer a SINGLE artifact output unless the Stage explicitly calls for multiple files.
HARD RULES
- Output exactly what the Stage requests, and nothing else (no commentary/markdown unless the Stage asks for it).
- Preserve previously established IDs, APIs, filenames, schemas, and UX contracts unless explicitly superseded in this Stage.
- Avoid regressions. If a change risks breaking earlier functionality, adjust your implementation to keep earlier guarantees intact.
- Follow the Stage’s acceptance checklist; treat it as a self-test gate before output.
- If ambiguity arises, make the smallest, lowest-risk decision that satisfies acceptance and preserves prior behavior.
- Do not print or echo the prompt text or your reasoning; only return the requested artifact(s).
INPUT FORMAT
STAGE:
<paste the full Stage text here>
CURRENT_ARTIFACTS:
<for Stage 01+ only — paste/link prior outputs here as the starting point>
OUTPUT FORMAT
Return ONLY the requested artifact(s) in the format(s) specified by the Stage (e.g., single-file HTML, code file, Markdown doc, etc.). No extra prose.
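The INPUT FORMAT above can also be assembled in code, which keeps what you paste into each fresh chat consistent. This is a minimal sketch, not part of the Runner Prompt: the function name `build_stage_input` and the `--- filename ---` separator are hypothetical, and artifacts are assumed to be held as a filename-to-text mapping.

```python
from typing import Dict, Optional


def build_stage_input(stage_text: str,
                      artifacts: Optional[Dict[str, str]] = None) -> str:
    """Assemble the STAGE / CURRENT_ARTIFACTS block for a fresh chat."""
    parts = ["STAGE:", stage_text.strip()]
    if artifacts:  # Stage 00 has no prior artifacts, so the section is omitted
        parts.append("CURRENT_ARTIFACTS:")
        # Sort by filename so identical inputs always yield an identical message.
        for name in sorted(artifacts):
            parts.append(f"--- {name} ---")  # hypothetical separator
            parts.append(artifacts[name].rstrip("\n"))
    return "\n".join(parts) + "\n"
```

Sorting the filenames is a small determinism choice: two runs with the same artifacts produce byte-identical input.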
Stage Skeleton (copy, then fill it in)
This skeleton keeps Stages small, testable, and self-contained.
Stage NN — Title
Context:
- 1–3 lines that set the project's world/domain and key non-negotiables so this stage can run in isolation.
Goal of this stage:
- One concise outcome sentence that defines “done”.
What to build now (requirements):
- Bulleted, verifiable behaviors (APIs, UI elements, endpoints, data contracts, error strings).
- Interfaces to add or extend (IDs, function signatures, filenames).
- Constraints (performance, security, accessibility, portability).
- Nonfunctional requirements (style, tone, formatting).
Preserve & do not regress:
- Name fragile parts of prior work you must not break (IDs, schemas, UX contracts, acceptance tests).
Acceptance checklist:
- Concrete pass/fail criteria; exact string/shape matches; visible effect checks.
- Include at least one regression check (“previous feature X still functions”).
Output rule:
- e.g., “Return exactly one single-file HTML”, or “Return a Python script and a README.md; no other files”.
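Before running a Stage, it can help to confirm that the prompt actually contains every section of the skeleton. The sketch below assumes each section header sits on its own line exactly as written above; `missing_sections` is a hypothetical helper name.

```python
from typing import List

# Section headers copied from the Stage skeleton, in order.
REQUIRED_SECTIONS = [
    "Context:",
    "Goal of this stage:",
    "What to build now (requirements):",
    "Preserve & do not regress:",
    "Acceptance checklist:",
    "Output rule:",
]


def missing_sections(stage_text: str) -> List[str]:
    """Return the skeleton headers that do not appear as a line of their own."""
    lines = {line.strip() for line in stage_text.splitlines()}
    return [header for header in REQUIRED_SECTIONS if header not in lines]
```

An empty list means the Stage is structurally complete. It says nothing about whether the acceptance items themselves are any good.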
Why prompt-chaining outperforms “mega prompts”
Failure modes of monolith prompts
- Constraint loss: long lists of requirements compete; weak ones vanish.
- Hidden drift: IDs/filenames/schemas change silently; downstream steps break.
- No ground truth: “Looks good?” is not a test.
- Non-determinism: updates regenerate everything and introduce new bugs.
Fixes via chaining
- One clean outcome per Stage → lower variance, faster iteration.
- Contracts restated → controlled evolution and shims.
- Acceptance gates → observable definition of “done.”
- Artifact history → reproducible, scriptable CI gates.
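The loop implied by these fixes (generate, gate on acceptance, retry the same Stage) can be sketched as follows. `generate` is a hypothetical stand-in for whatever sends the Runner Prompt plus the Stage to a model in a fresh chat. Each check is assumed to be a function from artifact text to pass/fail, and the attempt limit is an illustrative choice.

```python
from typing import Callable, List, Optional

Check = Callable[[str], bool]
Generate = Callable[[str, Optional[str]], str]  # (stage_text, prior_artifact) -> artifact


def run_stage(generate: Generate,
              stage_text: str,
              prior_artifact: Optional[str],
              checks: List[Check],
              max_attempts: int = 3) -> str:
    """Regenerate one Stage until every acceptance check passes."""
    for _ in range(max_attempts):
        artifact = generate(stage_text, prior_artifact)
        if all(check(artifact) for check in checks):
            return artifact  # gate passed: this becomes the next Stage's input
    raise RuntimeError(f"stage failed acceptance after {max_attempts} attempts")
```

A failing Stage never advances. The previous artifact stays the last known-good state, which is what makes the chain easy to debug.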
Contract design and enforcement
Contracts are the heartbeat of repeatable builds. Capture and re-assert them in every Stage.
What to freeze
- IDs/selectors: #app, data-test="submit", main-nav
- APIs: endpoint paths, method names, param names, success/error shapes
- Schemas: JSON types, required/optional fields, enums, error codes
- Filenames/paths: stage-03-dashboard.html, schema/user.v1.json
- Copy & tone: visible strings, exact error messages, headings
JSON Schema example (versioned)
{
  "$id": "schema/user.v1.json",
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id": {"type": "string", "pattern": "^[a-z0-9_-]{8,}$"},
    "email": {"type": "string", "format": "email"},
    "role": {"type": "string", "enum": ["user", "admin"]}
  },
  "additionalProperties": false
}
Error taxonomy (stable strings)
{
"E_BAD_INPUT": "Invalid request. See 'errors' for details.",
"E_NOT_FOUND": "Resource not found.",
"E_RATE_LIMIT": "Too many requests. Please retry later."
}
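One way to keep these strings stable is to build every error response from the taxonomy rather than typing messages inline. In this sketch the response shape (`code`, `error`, `errors` keys) and the helper name `error_body` are assumptions made for illustration; only the codes and messages come from the taxonomy above.

```python
from typing import Any, Dict, List, Optional

ERRORS = {
    "E_BAD_INPUT": "Invalid request. See 'errors' for details.",
    "E_NOT_FOUND": "Resource not found.",
    "E_RATE_LIMIT": "Too many requests. Please retry later.",
}


def error_body(code: str, details: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build an error payload whose message always comes from ERRORS."""
    # An unknown code raises KeyError on purpose: a typo should fail loudly
    # instead of shipping a new, unreviewed error string.
    body: Dict[str, Any] = {"code": code, "error": ERRORS[code]}
    if details:
        body["errors"] = details
    return body
```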
Shim policy
- If you must rename a symbol or change a shape, ship a shim that preserves old behavior until a later deprecation Stage.
- Log deprecations once per minute, not per call.
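The shim policy can be sketched as a wrapper that forwards to the new function and logs a deprecation at most once per interval. The names `deprecated_alias`, `fetch_user`, and `get_user` are hypothetical, and the injectable `clock` parameter is an assumption added so the throttle can be tested without waiting.

```python
import logging
import time

log = logging.getLogger("deprecations")


def deprecated_alias(new_func, old_name, interval=60.0, clock=time.monotonic):
    """Return a shim that calls new_func and warns at most once per interval."""
    last_logged = None  # time of the most recent warning, if any

    def shim(*args, **kwargs):
        nonlocal last_logged
        now = clock()
        if last_logged is None or now - last_logged >= interval:
            log.warning("%s is deprecated; use %s", old_name, new_func.__name__)
            last_logged = now
        return new_func(*args, **kwargs)  # old behavior preserved via forwarding

    shim.__name__ = old_name
    return shim


def fetch_user(user_id):  # hypothetical new name
    return {"id": user_id}


get_user = deprecated_alias(fetch_user, "get_user")  # old name keeps working
```

Callers of `get_user` see identical results. Removing the shim is then a single, explicit change in a later deprecation Stage.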
Acceptance engineering
Treat acceptance like unit tests: short, deterministic, and cheap to run.
Design rules
- Target visible, objective checks (text, DOM nodes, API responses).
- Use exact strings for headings/errors.
- Favor shape checks for JSON and schema validation.
- Add one regression item per Stage.
Examples
Front-end page
[ ] index.html contains <div id="app"> with a child <nav>.
[ ] <title> is exactly "Acme Dashboard".
[ ] Button with data-test="save" exists and is enabled by default.
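The three front-end items above can be checked structurally rather than with string matching. This is a sketch using only the standard library's `html.parser`; the `PageFacts` class and `check_page` function are hypothetical names, and "enabled" is assumed to mean the button has no `disabled` attribute.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PageFacts(HTMLParser):
    """Collect only the facts the checklist asks about."""

    def __init__(self):
        super().__init__()
        self.stack = []            # open elements as (tag, id) pairs
        self.title = ""
        self.app_has_child_nav = False
        self.save_enabled = None   # stays None until the button is seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.stack[-1] if self.stack else None
        if tag == "nav" and parent == ("div", "app"):
            self.app_has_child_nav = True
        if tag == "button" and attrs.get("data-test") == "save":
            self.save_enabled = "disabled" not in attrs
        if tag not in VOID:
            self.stack.append((tag, attrs.get("id")))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and self.stack[-1][0] == "title":
            self.title += data


def check_page(html: str) -> dict:
    """Return one pass/fail flag per checklist item."""
    facts = PageFacts()
    facts.feed(html)
    facts.close()
    return {
        "app_has_child_nav": facts.app_has_child_nav,
        "title_exact": facts.title == "Acme Dashboard",
        "save_enabled": facts.save_enabled is True,
    }
```

Returning one flag per item keeps the failure report aligned with the checklist, so a red run points at a specific line of the Stage.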
API endpoint
[ ] GET /v1/users/{id} returns 200 with body matching schema/user.v1.json
[ ] Nonexistent id returns 404 with error "Resource not found."
[ ] Rate-limited client receives 429 and the response includes a Retry-After header.
Data transform
[ ] normalize([]) → []
[ ] normalize(["A","a","Á"]) deduplicates case & accent (length=1)
[ ] normalize(null) throws TypeError("normalize: input must be array")
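For the data-transform items, here is one possible `normalize` that satisfies all three checks. The checklist does not say how folding is done or what the output values look like, so this sketch assumes accent removal via NFD decomposition, case-insensitive comparison via `casefold`, and output in folded form with first-seen order preserved.

```python
import unicodedata


def _fold(text: str) -> str:
    """Strip combining marks (accents), then case-fold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def normalize(items):
    """Deduplicate strings, ignoring case and accents."""
    if not isinstance(items, list):
        raise TypeError("normalize: input must be array")
    seen = set()
    result = []
    for item in items:
        key = _fold(item)
        if key not in seen:  # keep the first occurrence only
            seen.add(key)
            result.append(key)
    return result
```

Note that the exact error string is part of the contract: the third checklist item compares it verbatim.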
Reproducibility and determinism
- Seed randomness for fixtures and synthetic data.
- Pin dependencies; record tool versions and OS/arch.
- Freeze dataset slices (e.g., commit a minimal golden set).
- Surface TRACE logs (non-sensitive) for critical paths.
- Name artifacts deterministically and keep a CHANGELOG.md.
TRACE example
TRACE normalize: { in_len: 3, out_len: 1, method: "casefold+NFD+strip-marks" }
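Seeded fixtures and TRACE lines can be sketched together. The helper names `make_fixture_users` and `trace` are hypothetical, the `example.test` addresses are placeholders, and the TRACE payload is assumed to be JSON with sorted keys so that lines diff cleanly between runs.

```python
import json
import random


def make_fixture_users(seed: int, count: int = 3):
    """Generate synthetic users that are identical for a given seed."""
    rng = random.Random(seed)  # local RNG, independent of global random state
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    users = []
    for _ in range(count):
        uid = "".join(rng.choice(alphabet) for _ in range(8))
        users.append({
            "id": uid,
            "email": f"{uid}@example.test",
            "role": rng.choice(["user", "admin"]),
        })
    return users


def trace(name: str, **fields) -> str:
    """Format a TRACE line with a stable key order."""
    return f"TRACE {name}: " + json.dumps(fields, sort_keys=True)
```

Using a local `random.Random(seed)` instead of seeding the global generator means fixtures stay stable even if other code draws random numbers in between.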
CI wiring (example)
Run acceptance checks on every artifact change. These are intentionally small and fast.
GitHub Actions
name: stage-acceptance
on:
  push:
    paths:
      - "stage-*/**"
      - "schema/**"
      - "scripts/**"
jobs:
  selftest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install deps
        run: |
          python -m pip install --upgrade pip
          pip install -r scripts/requirements.txt
      - name: Run acceptance
        run: bash scripts/selftest.sh
scripts/selftest.sh
#!/usr/bin/env bash
set -euo pipefail
echo "== HTML checks =="
grep -q '<div id="app">' stage-03-dashboard.html
grep -q '<title>Acme Dashboard</title>' stage-03-dashboard.html
echo "== API schema checks =="
python scripts/check_schema.py response.json schema/user.v1.json
echo "== Done =="
scripts/check_schema.py
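import json
import sys
from jsonschema import validate
# Usage: python scripts/check_schema.py <instance.json> <schema.json>
with open(sys.argv[1], encoding="utf-8") as f:
    data = json.load(f)
with open(sys.argv[2], encoding="utf-8") as f:
    schema = json.load(f)
# validate() raises ValidationError on mismatch, so the script exits non-zero.
validate(instance=data, schema=schema)
print("schema ok")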
Runner I/O examples
Stage 00 (no CURRENT_ARTIFACTS)
STAGE:
Stage 00 — Project foundation
Context:
- New web app "Acme Dashboard". Output must be a single-file HTML scaffold.
Goal of this stage:
- A skeleton page that renders and passes minimal checks.
What to build now (requirements):
- <div id="app">
- A top nav region (empty ok), a main content region, and a footer.
- Title "Acme Dashboard".
Preserve & do not regress:
- N/A
Acceptance checklist:
- index.html contains <div id="app">
- <title> is exactly "Acme Dashboard"
- File loads without errors when opened locally
Output rule:
- Return exactly one single-file HTML named index.html
Stage 01+ (with CURRENT_ARTIFACTS)
STAGE:
Stage 01 — Primary navigation shell
Context:
- Build on Stage 00 skeleton. Preserve IDs and title.
Goal of this stage:
- Add a responsive nav bar with 3 placeholders (Home, Reports, Settings).
What to build now (requirements):
- Nav uses <nav id="main-nav"> with three placeholder links: Home, Reports, Settings.
- Main area contains <button data-test="save">Save</button>.
- Keep <div id="app"> root unchanged.
Preserve & do not regress:
- <div id="app"> exists; <title> stays "Acme Dashboard".
Acceptance checklist:
- main-nav exists; “Home” text is visible.
- save button exists and is enabled.
- Stage 00 checks still pass.
Output rule:
- Return exactly one updated index.html
CURRENT_ARTIFACTS:
<paste the index.html produced by Stage 00 here>
Domain recipes (expanded)
Front-end app
- 00 Foundation layout & tokens
- 01 Component shells & routing
- 02 Data layer & mock API (fixtures)
- 03 A11y & keyboard flows (aria, focus order, landmarks)
- 04 Validation & error taxonomy (exact strings)
- 05 Theming & reduced motion (prefers-reduced-motion)
- 06 Persistence & deep links (URL state)
- 07 SELFTEST page (ids + state echo)
- 08 Perf budget (LCP targets, code split)
- 09 Docs site (usage, contracts, examples)
Back-end/API
- 00 Service skeleton & health checks (/healthz, /readyz)
- 01 Endpoint A + tests + error taxonomy
- 02 Schema & migrations (idempotent)
- 03 AuthN/Z & audit logging (PII safe)
- 04 Rate limits & idempotency keys
- 05 Observability (metrics, tracing, logs)
- 06 Load tests & SLOs (+ burn-in)
- 07 Blue/green deploy & rollback plan
- 08 Runbook (operational procedures)
- 09 Migration/deprecation Stage (remove shims)
Data/ML
- 00 Data contract & ingestion (schema + examples)
- 01 Cleaning/normalization (unit tests)
- 02 Feature set v1 + baseline metrics
- 03 Eval harness + golden sets (frozen)
- 04 Bias/fairness checks + documentation
- 05 Monitoring & drift detection (stat tests)
- 06 Reproducibility (seed RNG, lockfile)
- 07 Model card + thresholds + sign-off
- 08 Batch/real-time glue + backfill plan
- 09 Rollback protocol and shadow deploy
Docs/Writing
- 00 Outline & voice/tone guide
- 01 Section skeletons with targets
- 02 Draft sections (acceptance: headings/word count)
- 03 Cross-refs & citations (stable anchors)
- 04 Visuals & captions (alt text)
- 05 Line edit & style audit (terminology, glossary)
- 06 Accessibility (structure, link text)
- 07 Executive summary (1-page)
- 08 Fact check & references (links validated)
- 09 Publish (PDF/HTML)
Prompt engineering patterns that help
- First line clarity: “Return exactly one file named …”
- Contracts up front: restate non-negotiables at the top of “What to build now”.
- Concrete acceptance: no fuzzy language; use exact strings and shapes.
- Ambiguity resolution: instruct the model to choose the smallest, lowest-risk option.
- No chatter, just artifacts: explicit “no commentary” rule in Runner Prompt.
Example: minimal end-to-end slice
Stage 00 output (index.html) (excerpt)
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Acme Dashboard</title>
</head>
<body>
<div id="app">
<nav id="main-nav"></nav>
<main>
<button data-test="save">Save</button>
</main>
<footer></footer>
</div>
</body>
</html>
SELFTEST (bash)
grep -q '<div id="app">' index.html
grep -q '<title>Acme Dashboard</title>' index.html
grep -q 'data-test="save"' index.html
echo "selftest ok"
Debugging & recovery
- Failing acceptance? Tighten the item or split into two smaller ones.
- Drifted contract? Reintroduce the missing ID/API and add a shim + deprecation note.
- Flaky outputs? Seed randomness, pin versions, reduce non-determinism.
- Scope creep? Move extras to the next Stage; preserve green acceptance today.
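Contract drift is easier to catch mechanically than by eye. The sketch below compares the `id` and `data-test` attributes of the previous and current artifact and reports any that disappeared. It assumes double-quoted attribute values, and `contract_tokens` and `drifted` are hypothetical helper names.

```python
import re
from typing import List, Set

# Matches id="..." and data-test="..." but not e.g. data-id="...".
CONTRACT_ATTR = re.compile(r'(?<![\w-])(id|data-test)="([^"]+)"')


def contract_tokens(html: str) -> Set[str]:
    """Collect the frozen selectors present in an artifact."""
    return {f'{name}="{value}"' for name, value in CONTRACT_ATTR.findall(html)}


def drifted(previous: str, current: str) -> List[str]:
    """Return selectors that existed before but are missing now."""
    return sorted(contract_tokens(previous) - contract_tokens(current))
```

A non-empty result is a candidate for the shim-plus-deprecation-note treatment described above, not necessarily a bug: the Stage may have superseded the contract explicitly.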
Versioning & traceability
- One directory per Stage or a prefix in filenames: stage-07-selftest.html.
- Add a short CHANGELOG.md per Stage with: what changed, why, acceptance diff, contracts touched.
- Tag repo on milestones (v0.1-foundation, v0.2-nav).
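If you use the filename-prefix convention, deriving names from the Stage number and title avoids hand-typed variations. This is a small sketch: `stage_filename` is a hypothetical helper, and the slug rule (lowercase, with runs of non-alphanumerics collapsed to a hyphen) is an assumption chosen to reproduce the names used in this guide.

```python
import re


def stage_filename(number: int, title: str, ext: str) -> str:
    """Build a deterministic artifact name such as stage-07-selftest.html."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"stage-{number:02d}-{slug}.{ext.lstrip('.')}"
```

Zero-padding the number keeps directory listings and CI globs in Stage order.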
Suggested repo layout
.
├─ stages/
│ ├─ 00-foundation/
│ │ ├─ index.html
│ │ └─ CHANGELOG.md
│ ├─ 01-nav/
│ │ ├─ index.html
│ │ └─ CHANGELOG.md
│ └─ ...
├─ schema/
├─ scripts/
│ ├─ selftest.sh
│ └─ check_schema.py
└─ README.md
FAQ
Why one chat per Stage?
To avoid hidden, stale context. Each Stage restates contracts and supplies the actual artifact; results become reproducible.
How do I handle large artifacts?
Use a link to a hosted file or attach a trimmed excerpt plus a consistent file path the model should preserve.
What about reasoning visibility?
Ask the model to verify acceptance internally and return only the artifact, with no reasoning in the output. Keep prompts and artifacts in the repo for review.