Prompt-Chaining for Complex Builds: A Technical Playbook

September 15, 2025

An implementation-level guide to shipping complex systems by chaining small, verifiable stages.

TL;DR

Traditional “mega prompts” collapse under ambiguity and hidden coupling. The fix is to ship in small, testable Stages that each return a complete artifact, preserve explicit contracts (IDs, APIs, schemas, filenames, tone), and include a pass/fail acceptance checklist. Run each Stage in a fresh chat with the Universal Stage Runner Prompt, paste the prior artifact(s) as context, and iterate on that Stage until it passes. This yields reproducibility, easier debugging, and fewer regressions.

Core concepts

Stage — a short prompt defining one concrete outcome (e.g., an endpoint, a UI section, a data transform).
Artifact — the sole output requested by a Stage (file(s) to save as-is).
Contract — non-negotiable IDs, APIs, schemas, filenames, copy, style guides that must not drift.
Acceptance — explicit pass/fail checks (visible effects, exact strings, shapes). Your gate to move on.
Regression budget — a reminder: previous features must still work. Shims > silent breaking changes.
SELFTEST — a tiny page/script/command that sanity-checks critical behaviors after each Stage.

Prefer one artifact per Stage; emit multiple files only when the Stage truly requires them. Deterministic filenames keep CI and reviews simple.

Universal Stage Runner Prompt (General)

Use this at the top of every Stage chat.

ROLE
You are a meticulous, senior builder. You will be given:
  1) A single “Stage” prompt (plain text) describing the next feature or slice of a larger project.
  2) (Optional) The current artifact(s) produced by earlier stages.

OBJECTIVE
Return a COMPLETE, USABLE artifact that integrates the Stage requirements while preserving all previously built behavior/contracts. Prefer a SINGLE artifact output unless the Stage explicitly calls for multiple files.

HARD RULES
  - Output exactly what the Stage requests, and nothing else (no commentary/markdown unless the Stage asks for it).
  - Preserve previously established IDs, APIs, filenames, schemas, and UX contracts unless explicitly superseded in this Stage.
  - Avoid regressions. If a change risks breaking earlier functionality, adjust your implementation to keep earlier guarantees intact.
  - Follow the Stage’s acceptance checklist; treat it as a self-test gate before output.
  - If ambiguity arises, make the smallest, lowest-risk decision that satisfies acceptance and preserves prior behavior.
  - Do not print or echo the prompt text or your reasoning; only return the requested artifact(s).

INPUT FORMAT
STAGE:
<paste the full Stage text here>

CURRENT_ARTIFACTS:
<for Stage 01+ only — paste/link prior outputs here as the starting point>

OUTPUT FORMAT
Return ONLY the requested artifact(s) in the format(s) specified by the Stage (e.g., single-file HTML, code file, Markdown doc, etc.). No extra prose.
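
The INPUT FORMAT above is easy to assemble by script so that every Stage chat receives the same layout. The following is a minimal sketch, not part of the original workflow; the function name build_runner_input is hypothetical, and it assumes the Runner Prompt itself is pasted separately above this block.

```python
from typing import Optional, Sequence


def build_runner_input(stage_text: str,
                       artifacts: Optional[Sequence[str]] = None) -> str:
    """Assemble the STAGE / CURRENT_ARTIFACTS block that follows the Runner Prompt."""
    parts = ["STAGE:", stage_text.strip(), ""]
    if artifacts:
        # Stage 01+ only: prior outputs are pasted verbatim as the starting point.
        parts.append("CURRENT_ARTIFACTS:")
        parts.extend(a.strip() for a in artifacts)
    # Drop trailing blank lines and end with exactly one newline.
    return "\n".join(parts).rstrip() + "\n"
```

For Stage 00, call it with the Stage text only; for later Stages, pass the saved artifact contents as the second argument.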

Stage Skeleton (copy, then fill it in)

Keeps stages small, testable, and self-contained.

Stage NN — Title

Context:
  - 1–3 lines that set the project's world/domain and key non-negotiables so this stage can run in isolation.

Goal of this stage:
  - One concise outcome sentence that defines “done”.

What to build now (requirements):
  - Bulleted, verifiable behaviors (APIs, UI elements, endpoints, data contracts, error strings).
  - Interfaces to add or extend (IDs, function signatures, filenames).
  - Constraints (performance, security, accessibility, portability).
  - Nonfunctional requirements (style, tone, formatting).

Preserve & do not regress:
  - Name fragile parts of prior work you must not break (IDs, schemas, UX contracts, acceptance tests).

Acceptance checklist:
  - Concrete pass/fail criteria; exact string/shape matches; visible effect checks.
  - Include at least one regression check (“previous feature X still functions”).

Output rule:
  - e.g., “Return exactly one single-file HTML”, or “Return a Python script and a README.md; no other files”.
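
A filled-in Stage can be linted before it is pasted into a chat. The sketch below only checks that the six section headers from the skeleton are present, each on its own line; the function name lint_stage is hypothetical, and judging the quality of the content is left to the author.

```python
from typing import List

# Section headers copied from the Stage Skeleton above.
REQUIRED_SECTIONS = (
    "Context:",
    "Goal of this stage:",
    "What to build now (requirements):",
    "Preserve & do not regress:",
    "Acceptance checklist:",
    "Output rule:",
)


def lint_stage(stage_text: str) -> List[str]:
    """Return a list of problems; an empty list means every section is present."""
    lines = {line.strip() for line in stage_text.splitlines()}
    return [f"missing section: {name}"
            for name in REQUIRED_SECTIONS
            if name not in lines]
```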

Why prompt-chaining outperforms “mega prompts”

Failure modes of monolith prompts

Constraint loss: long lists of requirements compete; weak ones vanish.
Hidden drift: IDs/filenames/schemas change silently; downstream steps break.
No ground truth: “Looks good?” is not a test.
Non-determinism: each update regenerates the whole output, so untouched parts change and new bugs appear.

Fixes via chaining

One clean outcome per Stage → lower variance, faster iteration.
Contracts restated → controlled evolution and shims.
Acceptance gates → observable definition of “done.”
Artifact history → reproducible, scriptable CI gates.

Contract design and enforcement

Contracts are the heartbeat of repeatable builds. Capture and re-assert them in every Stage.

What to freeze

IDs/selectors: #app, data-test="submit", main-nav
APIs: endpoint paths, method names, param names, success/error shapes
Schemas: JSON types, required/optional fields, enums, error codes
Filenames/paths: stage-03-dashboard.html, schema/user.v1.json
Copy & tone: visible strings, exact error messages, headings

JSON Schema example (versioned)

{
  "$id": "schema/user.v1.json",
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id": {"type": "string", "pattern": "^[a-z0-9_-]{8,}$"},
    "email": {"type": "string", "format": "email"},
    "role": {"type": "string", "enum": ["user", "admin"]}
  },
  "additionalProperties": false
}
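
To make concrete what this schema accepts and rejects, here is a standard-library-only sketch that mirrors its rules by hand. It is an illustration, not a replacement for a real JSON Schema validator; check_user and its messages are hypothetical. It deliberately does not check the email format, since many validators treat "format" as an annotation unless format checking is switched on.

```python
import re
from typing import List

# Same character class and length as the schema's "pattern"; fullmatch()
# supplies the ^...$ anchoring.
ID_PATTERN = re.compile(r"[a-z0-9_-]{8,}")
ALLOWED_FIELDS = {"id", "email", "role"}
ROLES = {"user", "admin"}


def check_user(obj) -> List[str]:
    """Return readable violations; an empty list means the object conforms."""
    if not isinstance(obj, dict):
        return ["value is not an object"]
    errors = []
    for field in ("id", "email"):                      # "required"
        if field not in obj:
            errors.append(f"missing required field: {field}")
    for field in sorted(set(obj) - ALLOWED_FIELDS):    # "additionalProperties": false
        errors.append(f"unexpected field: {field}")
    for field in ("id", "email", "role"):              # "type": "string"
        if field in obj and not isinstance(obj[field], str):
            errors.append(f"{field} must be a string")
    if isinstance(obj.get("id"), str) and not ID_PATTERN.fullmatch(obj["id"]):
        errors.append("id does not match pattern")
    if isinstance(obj.get("role"), str) and obj["role"] not in ROLES:
        errors.append("role must be one of: admin, user")
    return errors
```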

Error taxonomy (stable strings)

{
  "E_BAD_INPUT": "Invalid request. See 'errors' for details.",
  "E_NOT_FOUND": "Resource not found.",
  "E_RATE_LIMIT": "Too many requests. Please retry later."
}
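
One way to keep these strings from drifting is to build every error body from a single table. The sketch below assumes a response shape of {"code", "error", "errors"} and a 400 status for E_BAD_INPUT; both are illustrative assumptions. Only the message strings and the 404/429 pairings come from this guide.

```python
from typing import Iterable, Optional, Tuple

# Stable strings, copied verbatim from the taxonomy above.
ERRORS = {
    "E_BAD_INPUT": "Invalid request. See 'errors' for details.",
    "E_NOT_FOUND": "Resource not found.",
    "E_RATE_LIMIT": "Too many requests. Please retry later.",
}

# Assumed status mapping; 400 for E_BAD_INPUT is a guess for illustration.
STATUS = {"E_BAD_INPUT": 400, "E_NOT_FOUND": 404, "E_RATE_LIMIT": 429}


def error_response(code: str,
                   details: Optional[Iterable[str]] = None) -> Tuple[int, dict]:
    """Return (HTTP status, JSON-serializable body) for a known error code."""
    if code not in ERRORS:
        # Unknown codes are a contract violation, so fail loudly.
        raise KeyError(f"unknown error code: {code}")
    body = {"code": code, "error": ERRORS[code]}
    if details is not None:
        body["errors"] = list(details)
    return STATUS[code], body
```

Because callers never type the message text themselves, an acceptance item such as "returns 404 with error 'Resource not found.'" can only fail if the table itself changes.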

Shim policy

If you must rename a symbol or change a shape, ship a shim that preserves old behavior until a later deprecation Stage.
Log deprecations once per minute, not per call.
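
The shim policy can be sketched as a wrapper that forwards to the new function and logs at most once per interval. The names deprecated_alias, fetch_user, and get_user are hypothetical; the clock parameter exists only so the throttle can be tested without sleeping.

```python
import logging
import time
from typing import Callable, Dict

log = logging.getLogger("deprecations")
_last_logged: Dict[str, float] = {}  # old name -> time of the last warning


def deprecated_alias(old_name: str, new_func: Callable,
                     interval: float = 60.0,
                     clock: Callable[[], float] = time.monotonic) -> Callable:
    """Return a shim that behaves like new_func and warns at most once per interval."""
    def shim(*args, **kwargs):
        now = clock()
        last = _last_logged.get(old_name)
        if last is None or now - last >= interval:
            _last_logged[old_name] = now
            log.warning("%s is deprecated; use %s", old_name, new_func.__name__)
        return new_func(*args, **kwargs)  # old behavior preserved
    shim.__name__ = old_name
    return shim


def fetch_user(user_id: str) -> dict:  # hypothetical new name
    return {"id": user_id}


# Hypothetical old name kept alive until a later deprecation Stage.
get_user = deprecated_alias("get_user", fetch_user)
```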

Acceptance engineering

Treat acceptance like unit tests: short, deterministic, and cheap to run.

Design rules

Target visible, objective checks (text, DOM nodes, API responses).
Use exact strings for headings/errors.
Favor shape checks for JSON and schema validation.
Add one regression item per Stage.

Examples

Front-end page

[ ] index.html contains <div id="app"> with a child <nav>.
[ ] <title> is exactly "Acme Dashboard".
[ ] Button with data-test="save" exists and is enabled by default.

API endpoint

[ ] GET /v1/users/{id} returns 200 with body matching schema/user.v1.json
[ ] Nonexistent id returns 404 with error "Resource not found."
[ ] Rate-limited client receives 429 and the response includes a Retry-After header.

Data transform

[ ] normalize([]) → []
[ ] normalize(["A","a","Á"]) deduplicates case & accent (length=1)
[ ] normalize(null) throws TypeError("normalize: input must be array")
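
As an illustration, here is one possible normalize that satisfies the three data-transform items above. The acceptance list does not say which representative of a duplicate group to keep, so returning the folded key is an assumption. Accent folding is done by decomposing to NFD and dropping combining marks.

```python
import unicodedata
from typing import List


def _fold(text: str) -> str:
    """Case- and accent-insensitive key: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def normalize(items) -> List[str]:
    if not isinstance(items, list):
        # Exact message required by the acceptance checklist.
        raise TypeError("normalize: input must be array")
    seen = set()
    out = []
    for item in items:
        key = _fold(item)
        if key not in seen:  # keep first occurrence order
            seen.add(key)
            out.append(key)
    return out


# The three acceptance items, expressed as assertions.
assert normalize([]) == []
assert len(normalize(["A", "a", "Á"])) == 1
try:
    normalize(None)
except TypeError as exc:
    assert str(exc) == "normalize: input must be array"
```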

Reproducibility and determinism

Seed randomness for fixtures and synthetic data.
Pin dependencies; record tool versions and OS/arch.
Freeze dataset slices (e.g., commit a minimal golden set).
Surface TRACE logs (non-sensitive) for critical paths.
Name artifacts deterministically and keep a CHANGELOG.md.

TRACE example

TRACE normalize: { in_len: 3, out_len: 1, method: "casefold+NFD-strip" }

CI wiring (example)

Run acceptance checks on every artifact change. These are intentionally small and fast.

GitHub Actions

name: stage-acceptance
on:
  push:
    paths:
      - "stage-*"
      - "stage-*/**"
      - "stages/**"
      - "schema/**"
      - "scripts/**"
jobs:
  selftest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install deps
        run: |
          python -m pip install --upgrade pip
          pip install -r scripts/requirements.txt
      - name: Run acceptance
        run: bash scripts/selftest.sh

scripts/selftest.sh

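#!/usr/bin/env bash
# Fail on any error, unset variable, or failed pipeline stage.
set -euo pipefail

echo "== HTML checks =="
# grep -q prints nothing and exits non-zero when the pattern is absent,
# which aborts the script under `set -e`.
grep -q '<div id="app">' stage-03-dashboard.html
grep -q '<title>Acme Dashboard</title>' stage-03-dashboard.html

echo "== API schema checks =="
python scripts/check_schema.py response.json schema/user.v1.json

echo "== Done =="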

scripts/check_schema.py

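import json
import sys
from jsonschema import validate

# Usage: python scripts/check_schema.py <instance.json> <schema.json>
with open(sys.argv[1], encoding="utf-8") as f:
    data = json.load(f)
with open(sys.argv[2], encoding="utf-8") as f:
    schema = json.load(f)
# validate() raises ValidationError on mismatch, so the script exits non-zero.
validate(instance=data, schema=schema)
print("schema ok")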

Runner I/O examples

Stage 00 (no CURRENT_ARTIFACTS)

STAGE:
Stage 00 — Project foundation

Context:
  - New web app "Acme Dashboard". Output must be a single-file HTML scaffold.

Goal of this stage:
  - A skeleton page that renders and passes minimal checks.

What to build now (requirements):
  - <div id="app">
  - A top nav region (empty ok), a main content region, and a footer.
  - Title "Acme Dashboard".

Preserve & do not regress:
  - N/A

Acceptance checklist:
  - index.html contains <div id="app">
  - <title> is exactly "Acme Dashboard"
  - File loads without errors when opened locally

Output rule:
  - Return exactly one single-file HTML named index.html

Stage 01+ (with CURRENT_ARTIFACTS)

STAGE:
Stage 01 — Primary navigation shell

Context:
  - Build on Stage 00 skeleton. Preserve IDs and title.

Goal of this stage:
  - Add a responsive nav bar with 3 placeholders (Home, Reports, Settings).

What to build now (requirements):
  - Nav uses <nav id="main-nav"> and <button data-test="save">Save</button> in main area.
  - Keep <div id="app"> root unchanged.

Preserve & do not regress:
  - <div id="app"> exists; <title> stays "Acme Dashboard".

Acceptance checklist:
  - main-nav exists; “Home” text is visible.
  - save button exists and is enabled.
  - Stage 00 checks still pass.

Output rule:
  - Return exactly one updated index.html

CURRENT_ARTIFACTS:
<paste the index.html produced by Stage 00 here>

Domain recipes (expanded)

Front-end app

00 Foundation layout & tokens
01 Component shells & routing
02 Data layer & mock API (fixtures)
03 A11y & keyboard flows (aria, focus order, landmarks)
04 Validation & error taxonomy (exact strings)
05 Theming & reduced motion (prefers-reduced-motion)
06 Persistence & deep links (URL state)
07 SELFTEST page (ids + state echo)
08 Perf budget (LCP targets, code split)
09 Docs site (usage, contracts, examples)

Back-end/API

00 Service skeleton & health checks (/healthz, /readyz)
01 Endpoint A + tests + error taxonomy
02 Schema & migrations (idempotent)
03 AuthN/Z & audit logging (PII safe)
04 Rate limits & idempotency keys
05 Observability (metrics, tracing, logs)
06 Load tests & SLOs (+ burn-in)
07 Blue/green deploy & rollback plan
08 Runbook (operational procedures)
09 Migration/deprecation Stage (remove shims)

Data/ML

00 Data contract & ingestion (schema + examples)
01 Cleaning/normalization (unit tests)
02 Feature set v1 + baseline metrics
03 Eval harness + golden sets (frozen)
04 Bias/fairness checks + documentation
05 Monitoring & drift detection (stat tests)
06 Reproducibility (seed RNG, lockfile)
07 Model card + thresholds + sign-off
08 Batch/real-time glue + backfill plan
09 Rollback protocol and shadow deploy

Docs/Writing

00 Outline & voice/tone guide
01 Section skeletons with targets
02 Draft sections (acceptance: headings/word count)
03 Cross-refs & citations (stable anchors)
04 Visuals & captions (alt text)
05 Line edit & style audit (terminology, glossary)
06 Accessibility (structure, link text)
07 Executive summary (1-page)
08 Fact check & references (links validated)
09 Publish (PDF/HTML)

Prompt engineering patterns that help

First line clarity: “Return exactly one file named …”
Contracts up front: restate non-negotiables at the top of “What to build now”.
Concrete acceptance: no fuzzy language; use exact strings and shapes.
Ambiguity resolution: instruct the model to choose the smallest, lowest-risk option.
No chatter, just artifacts: explicit “no commentary” rule in Runner Prompt.

Example: minimal end-to-end slice

Stage 01 output (index.html) (excerpt)

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Acme Dashboard</title>
</head>
<body>
  <div id="app">
    <nav id="main-nav"></nav>
    <main>
      <button data-test="save">Save</button>
    </main>
    <footer></footer>
  </div>
</body>
</html>

SELFTEST (bash)

grep -q '<div id="app">' index.html
grep -q '<title>Acme Dashboard</title>' index.html
grep -q 'data-test="save"' index.html
echo "selftest ok"
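
The grep checks are fast but match text, not structure, so they cannot tell whether the save button is enabled. A parser-based variant, sketched here with Python's standard html.parser, covers that acceptance item as well. The function name selftest and the failure messages are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _Page(HTMLParser):
    """Collect the few facts the acceptance checklist cares about."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.title = ""
        self.save_enabled: Optional[bool] = None  # None means button not found
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.ids.add(attr["id"])
        if attr.get("data-test") == "save":
            # A bare `disabled` attribute parses as ("disabled", None).
            self.save_enabled = "disabled" not in attr
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def selftest(html: str) -> List[str]:
    """Return failed checks; an empty list means the page passes."""
    page = _Page()
    page.feed(html)
    page.close()
    failures = []
    if "app" not in page.ids:
        failures.append('missing element with id="app"')
    if page.title != "Acme Dashboard":
        failures.append("title is not exactly 'Acme Dashboard'")
    if page.save_enabled is None:
        failures.append("save button is missing")
    elif not page.save_enabled:
        failures.append("save button is disabled")
    return failures
```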

Debugging & recovery

Failing acceptance? Tighten the item or split into two smaller ones.
Drifted contract? Reintroduce the missing ID/API and add a shim + deprecation note.
Flaky outputs? Seed randomness, pin versions, reduce non-determinism.
Scope creep? Move extras to the next Stage; preserve green acceptance today.
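
Contract drift is easiest to catch by diffing the hooks in the previous artifact against the new one. The sketch below treats id and data-test attributes as the frozen hooks, which is an assumption for illustration; the function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Set, Tuple

Hook = Tuple[str, str]  # (attribute name, attribute value)


class _Hooks(HTMLParser):
    """Collect id and data-test attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.hooks: Set[Hook] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("id", "data-test") and value:
                self.hooks.add((name, value))


def extract_hooks(html: str) -> Set[Hook]:
    parser = _Hooks()
    parser.feed(html)
    parser.close()
    return parser.hooks


def drifted_hooks(previous_html: str, current_html: str) -> List[Hook]:
    """Hooks present in the previous artifact but missing from the current one."""
    return sorted(extract_hooks(previous_html) - extract_hooks(current_html))
```

A non-empty result is the cue to reintroduce the missing hook or to add a shim before the Stage is accepted.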

Versioning & traceability

One directory per Stage or a prefix in filenames: stage-07-selftest.html.
Add a short CHANGELOG.md per Stage with: what changed, why, acceptance diff, contracts touched.
Tag repo on milestones (v0.1-foundation, v0.2-nav).
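
Deterministic names are easier to guarantee when a helper derives them from the Stage number and title. This is a small sketch; the slug rule (lowercase, with runs of non-alphanumerics collapsed to a hyphen) is an assumption chosen to reproduce the filenames used in this guide.

```python
import re


def stage_filename(number: int, title: str, ext: str = "html") -> str:
    """Build a name like stage-07-selftest.html from a Stage number and title."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"stage-{number:02d}-{slug}.{ext}"


print(stage_filename(7, "SELFTEST"))   # stage-07-selftest.html
print(stage_filename(3, "Dashboard"))  # stage-03-dashboard.html
```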

Suggested repo layout

.
├─ stages/
│  ├─ 00-foundation/
│  │  ├─ index.html
│  │  └─ CHANGELOG.md
│  ├─ 01-nav/
│  │  ├─ index.html
│  │  └─ CHANGELOG.md
│  └─ ...
├─ schema/
├─ scripts/
│  ├─ selftest.sh
│  └─ check_schema.py
└─ README.md

FAQ

Why one chat per Stage?
To avoid hidden, stale context. Each Stage restates contracts and supplies the actual artifact; results become reproducible.

How do I handle large artifacts?
Use a link to a hosted file or attach a trimmed excerpt plus a consistent file path the model should preserve.

What about reasoning visibility?
Ask the model to mentally verify acceptance; you only want the artifact. Keep prompts and artifacts in the repo for review.

Thanks

Thanks to NL and MP for reading drafts of this.