Flowtely · Process Walkthrough

AI-Led Agile Software Delivery

A best-in-class, maximum-autonomy software-delivery process in which an AI delivery team designs, builds, tests, secures, ships, and operates a software product end-to-end. The human's role is INPUT, never approval: the sponsor provides a brief (what they want + any available data) up front and answers questions continuously, but there are ZERO human sign-off gates. Instead, every major artifact — the spec/PRD, the architecture & plan, each sprint plan, each sprint increment, release readiness, post-deploy go-live, hypercare exit, and final handover — is validated by INDEPENDENT AI reviewer panels (a standing independent evaluator, Verdict, who never reviews what it authored, plus non-author specialists and an adversarial red-team) that iterate a shared GapLog to a binary, receipt-backed convergence verdict ('solid' = zero open material gaps AND all non-author reviewers agree). Honesty is enforced by RE-DERIVATION: reviewers reproduce the receipts (test counts, scan results, health checks, git hashes) instead of trusting author summaries, and Verdict's own verdicts are spot-checked by a second independent actor. Where the sponsor is silent the team proceeds on an explicit, flagged assumption (tracked in a canonical Assumption Register with a refutation-cascade) — it never blocks; the only human decision points are rare, non-blocking flagged escalations (e.g. risk-accepting a critical vulnerability) with a conservative silent default (no-go). Methodology spine: dual-track agile, Scrum DoR/DoD, Shape Up, C4 + ADRs + STRIDE threat modeling + privacy-by-design, the test pyramid, shift-left secure-SDLC (OWASP ASVS/SAMM) + software supply-chain controls (SBOM, dependency pinning, signing, SLSA provenance), trunk-based development + CI/CD quality gates, DORA metrics, SRE hypercare + LLMOps eval-drift monitoring. AI does all the work and runs the cadence; independent AI review — not human approval — is what makes each step trustworthy. 
(Refined by an independent multi-agent critique → repair → coherence loop until the independent reviewer's verdict was 'solid'.) v3 (Sprint 117): the run now DELIVERS a real, deploy-verified, deployable-anywhere PACKAGE. infra_readiness_packaging emits a self-deploying package (deploy.md standard); clean_room_deploy_verify has a fresh, zero-context Claude instance deploy and test the package from deploy.md alone (receipt-backed, fail-closed); a context-bearing adequacy gate confirms the tests are meaningful; and handover delivers the package plus proof.

30 bricks · 0 human approval gates · Independent-review gated
Purpose

Turn a sponsor's brief into a fully specified, architected, built, tested, secured, deployed, and stabilized software product — produced autonomously by an AI delivery team — where quality is guaranteed not by human sign-off but by independent AI reviewer panels that iterate every major artifact to a documented, receipt-backed convergence.

01

Sponsor Intake: Brief, Four Canonical Registers & Continuous-Input Channel

AI+Human Agent: Vela

Stand up the engagement by capturing the sponsor's brief — the product idea in one paragraph, the target business outcome bound to a single fully-specified success metric, the stakeholder map, typed hard constraints, and an inventory of any provided data/information — and crystallizing it into the ONE run-spanning source of truth: FOUR canonical, versioned, schema-fixed, stable-ID registers that every later brick scores against and extends IN PLACE. The four are (1) the Assumption Register, (2) the Sponsor Question Log, (3) the PRODUCT Objectives & Success-Metric Rubric — the SPONSOR's product-acceptance criteria, explicitly NOT the meta process-design objectives — and (4) the NFR set. The PRODUCT rubric becomes the law every downstream independent-review loop mechanically scores product artifacts against by entry ID, converting later reviews from free-text opinion into scored, evidence-backed pass/fail and killing rubber-stamping at the source. Carry-forward is LAW: requirements_discovery, prd_and_backlog, and all later bricks EXTEND-IN-PLACE these same four registers by ID — they may add or refine rows but MUST NOT spin up a parallel register with a different schema; any new-schema register is a defect caught by lint. Each Assumption Register row names the downstream artifacts that depend on it (downstream-artifact-refs) so a later refuting sponsor answer can cascade-flag exactly those artifacts for re-review — the cascade is EXECUTED by program_health_rollup, named here as the owner of that mechanism; this brick only establishes the wiring (the refs). The continuous-input model holds: the sponsor PROVIDES information and answers questions throughout but NEVER sits in an approval gate; there is no human sign-off. Intake COMPLETES (never waits): it finishes when every rubric/NFR slot is either sponsor-sourced or assumption-filled-and-flagged, emitting an intake-coverage score and honesty receipts (artifact hashes, register IDs and versions, row counts). 
This brick is not a gate; it is the spine the whole run hangs on. ENTERPRISE-SCOPE-AND-SIZE (Sprint 117): Vela ASKS clarifying questions and captures a 1-10 SIZING score, then scopes the product like an enterprise-application architect — users, access/auth, security, controls, and functional depth/breadth sized to that score. NEVER build something useless for most users: the bar is a product that genuinely does its core job for its intended users (a toy is acceptable ONLY at sizing 1-2).
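The completion rule above (every rubric/NFR slot is sponsor-sourced or assumption-filled-and-flagged, with coverage = sponsor-sourced slots / total slots) can be sketched as below. This is a minimal illustration, not the brick's actual implementation: the `Slot` type and both function names are hypothetical, and treating a `process-default` row as filled is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Slot:
    """One PRODUCT-rubric or NFR row, reduced to what the coverage rule needs."""
    id: str                              # e.g. "PRB-001" or "NFR-002"
    source: Optional[str]                # "sponsor" | "assumption" | "process-default" | None
    assumption_id: Optional[str] = None  # matching Assumption Register row, if any


def unfilled_slots(slots: List[Slot]) -> List[str]:
    """IDs of slots that still prevent intake from completing."""
    blocked = []
    for slot in slots:
        if slot.source is None:
            blocked.append(slot.id)      # nothing recorded at all
        elif slot.source == "assumption" and not slot.assumption_id:
            blocked.append(slot.id)      # assumed, but not flagged in the register
    return blocked


def intake_coverage(slots: List[Slot]) -> dict:
    """Coverage receipt: sponsor-sourced share plus the assumption-filled count."""
    total = len(slots)
    sponsor = sum(1 for s in slots if s.source == "sponsor")
    assumed = sum(1 for s in slots if s.source == "assumption")
    return {
        "coverage_score": sponsor / total if total else 0.0,
        "assumption_filled_slots": assumed,
        "total_slots": total,
        "complete": total > 0 and not unfilled_slots(slots),
    }


slots = [
    Slot("PRB-001", "sponsor"),
    Slot("PRB-002", "assumption", "ASM-001"),
    Slot("NFR-001", "sponsor"),
    Slot("NFR-002", "assumption", "ASM-002"),
]
receipt = intake_coverage(slots)
print(receipt)
```

Note that a low coverage score does not block completion in this sketch; only an unfilled or unflagged slot does, which matches the "completes, never waits" rule.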

Deliverables
[Human] Raw sponsor brief: a one-paragraph product/idea description plus any provided information, data references, links, or documents, captured verbatim into the Sponsor Brief intake as the source-of-truth input.
Acceptance: The Sponsor Brief artifact contains a non-empty verbatim sponsor description field and an ingestion timestamp; the raw input is preserved unedited alongside the structured version (both raw and structured fields present).
[AI · Vela] Sponsor Brief (structured): one-paragraph idea, target business outcome, stakeholder map (name/role/decision-or-input), typed hard constraints, scope Non-Goals, and a provided-data inventory — emitted as a single hashed artifact.
Acceptance: Artifact exists and is content-hashed (hash in receipts). It contains all six sections; the constraints section tags every constraint with a type from {regulatory, security, data-residency, privacy, budget, timeline, technology, integration}; Non-Goals lists >=1 explicit out-of-scope item; the data inventory lists each provided asset with {name, format, sensitivity/classification, access-path} or is explicitly marked EMPTY with rationale.
[AI · Vela] Single Success Metric specification bound to the target outcome (the primary entry that the PRODUCT rubric centers on).
Acceptance: Exactly one primary success metric is defined with all five fields populated: {name, baseline (current value or 'unknown-flagged-assumption'), target, measurement_method/instrument, timeframe}. No additional metric is marked primary. If baseline is unknown it is recorded as a flagged row in the Assumption Register (not left blank).
[AI · Vela] REGISTER 1 of 4 — Assumption Register (canonical, versioned, run-spanning) with the cascade-wiring fields.
Acceptance: Register exists with a stable register ID and a version tag, and conforms to the FIXED schema for every row: {id, statement, owner, question_asked_of_sponsor, default_on_silence, blast_radius(HIGH|MED|LOW), falsification_test, status(open|confirmed|refuted|accepted-as-risk), downstream_artifact_refs[]}. Every row's downstream_artifact_refs[] lists the artifacts/brick-outputs that depend on it (>=0, empty only when nothing yet depends on it, and updated extend-in-place by later bricks). Every PRODUCT-rubric/NFR slot tagged source=assumption has a matching Assumption Register row by ID. A documented rule is recorded: a sponsor answer that refutes a row sets status=refuted, and program_health_rollup (the named owner of the cascade mechanism) reads downstream_artifact_refs[] to flag exactly those artifacts for re-review.
[AI · Vela] REGISTER 2 of 4 — Sponsor Question Log (canonical, versioned, run-spanning).
Acceptance: Log exists with a stable register ID and version tag and conforms to the FIXED schema per row: {id, question, raised_at, raised_to_sponsor, status(open|answered), answer, answered_at, affected_rubric_ids[], affected_assumption_ids[]}. The log is non-empty OR explicitly recorded as empty-with-rationale; schema is enforced (every row has all fields); each open question that drove an assumption cross-links the assumption row by ID.
[AI · Vela] REGISTER 3 of 4 — PRODUCT Objectives & Success-Metric Rubric (canonical, versioned, run-spanning): the SPONSOR's product-acceptance criteria, explicitly NOT the meta process-design objectives.
Acceptance: Rubric is a hashed artifact with a stable register ID and version tag. A header note explicitly states 'PRODUCT acceptance criteria — NOT the meta process-design objectives'. Every entry conforms to the FIXED schema {id, objective_text, measurable_criterion, measurement_method, weight/criticality, pass_predicate, source(sponsor|assumption|process-default)}. There is >=1 entry per sponsor-stated PRODUCT objective; every entry has a non-empty pass_predicate that evaluates to true/false; entry IDs are unique and stable. A coverage check confirms zero stated product objectives lack at least one measurable, pass/fail-predicated entry. No process-design (OBJ-01..OBJ-07 meta) objective is mixed into this register.
[AI · Vela] REGISTER 4 of 4 — NFR set (canonical, versioned, run-spanning): the non-functional requirements baseline (performance, availability, scalability, security posture, compliance, accessibility, operability) bound to measurable thresholds.
Acceptance: NFR set is a hashed artifact with a stable register ID and version tag and conforms to the FIXED schema per row: {id, category, requirement_statement, measurable_threshold, measurement_method, criticality, source(sponsor|assumption|process-default)}. Every row has a measurable_threshold (no unquantified 'fast'/'secure'); every row sourced=assumption has a matching Assumption Register row by ID; categories present cover at minimum {performance, availability, security, compliance} or each missing category is marked N/A-with-rationale.
[AI · Vela] Source-of-Truth Manifest + Carry-Forward Law statement binding the four registers as the ONE source of truth.
Acceptance: A manifest artifact lists exactly the four canonical registers by {name, stable_id, version, schema_fingerprint, stable_id_convention} and asserts the carry-forward law verbatim: 'requirements_discovery, prd_and_backlog, and all later bricks EXTEND-IN-PLACE these registers by ID; they may add/refine rows but MUST NOT create a parallel register with a different schema; any new register is a defect caught by lint.' The manifest names program_health_rollup as the owner of the refuting-answer cascade. The stable_id_convention is documented per register (e.g., ASM-###, SQL-###, PRB-### for product-rubric, NFR-###) and is collision-free within the run.
[AI · Vela] Continuous-Input Channel spec + intake-coverage receipt.
Acceptance: Channel spec documents (a) how mid-run sponsor input is ingested, (b) how it is diffed against the current registers, and (c) the re-review trigger when a late answer refutes an assumption (delegated to program_health_rollup via downstream_artifact_refs[]). Intake-coverage score is emitted as coverage = sponsor-sourced_slots / total_slots across the PRODUCT rubric and NFR set, with the count of assumption-filled slots; the receipts block records {brief_hash, manifest_hash, assumption_register_id+version+rows, question_log_id+version+rows, product_rubric_id+version+rows, nfr_set_id+version+rows, coverage_score}. There is NO approval/sign-off field anywhere in the brick's artifacts (gate-free assertion checkable by absence of any human-approval field).
[AI+Human · Vela] Scope & Sizing decision + enterprise scoping: the sponsor's 1-10 sizing score with its concrete interpretation — number/type of users, access/permission & auth model, security & controls, data/audit needs, and the functional depth & breadth (the IN/OUT feature list) — plus a short note of what BEST-IN-CLASS looks like for this category and what is deferred at this level. This drives the PRD scope.
Acceptance: A sizing score (1-10) is recorded with a concrete enterprise scope (users/access/security/controls/depth-breadth + IN/OUT list) reached via clarifying questions; the PRODUCT rubric encodes this level; the product is NOT a single-user toy unless sizing is explicitly 1-2.
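As an illustration of what "conforms to the FIXED schema for every row" could mean in practice, here is a hedged sketch of a row validator for the Assumption Register. The field names and allowed values come from the acceptance text above; the function name, the error strings, and the three-digit reading of the ASM-### convention are assumptions, and the sample row is invented for the example.

```python
import re
from typing import List

ASSUMPTION_FIELDS = (
    "id", "statement", "owner", "question_asked_of_sponsor", "default_on_silence",
    "blast_radius", "falsification_test", "status", "downstream_artifact_refs",
)
BLAST_RADIUS = {"HIGH", "MED", "LOW"}
STATUS = {"open", "confirmed", "refuted", "accepted-as-risk"}
ID_PATTERN = re.compile(r"ASM-\d{3}")  # assumed reading of the ASM-### convention
TEXT_FIELDS = ("statement", "owner", "question_asked_of_sponsor",
               "default_on_silence", "falsification_test")


def validate_assumption_row(row: dict) -> List[str]:
    """Return schema violations for one row; an empty list means the row conforms."""
    errors = []
    missing = [f for f in ASSUMPTION_FIELDS if f not in row]
    extra = [f for f in row if f not in ASSUMPTION_FIELDS]
    if missing:
        errors.append(f"missing fields: {missing}")
    if extra:
        errors.append(f"fields outside the fixed schema: {extra}")
    if missing:
        return errors  # the value checks below need every field present
    if not ID_PATTERN.fullmatch(str(row["id"])):
        errors.append("id does not follow ASM-###")
    if row["blast_radius"] not in BLAST_RADIUS:
        errors.append("blast_radius must be HIGH, MED, or LOW")
    if row["status"] not in STATUS:
        errors.append("status is not an allowed value")
    for field in TEXT_FIELDS:
        if not str(row[field]).strip():
            errors.append(f"{field} is empty")
    if not isinstance(row["downstream_artifact_refs"], list):
        errors.append("downstream_artifact_refs must be a list (it may be empty)")
    return errors


# Invented sample row, for illustration only.
row = {
    "id": "ASM-001",
    "statement": "Baseline for the primary success metric is unknown.",
    "owner": "Vela",
    "question_asked_of_sponsor": "What is the metric's current value?",
    "default_on_silence": "Treat the baseline as unknown and measure it in sprint 1.",
    "blast_radius": "MED",
    "falsification_test": "Sponsor supplies a measured baseline.",
    "status": "open",
    "downstream_artifact_refs": [],
}
print(validate_assumption_row(row))
```

The same pattern (a fixed field tuple plus per-field checks) would apply to the other three registers with their own schemas.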
Questions the agent asks (15)
  • In one paragraph, what is the product/idea you want built?
  • What is the single business outcome that would make this a success, and what one metric proves it? What is that metric's current baseline, target value, how is it measured, and by when?
  • Who are the stakeholders (names/roles), and who specifically should the team ask when it has questions during the run?
  • What are the hard constraints? Specifically: any regulatory, security, privacy, or data-residency requirements; and any budget, timeline, technology, or integration limits?
  • What non-functional requirements matter — performance/latency, availability/uptime, scale, security posture, compliance regimes, accessibility — and what measurable threshold makes each acceptable?
  • What is explicitly OUT of scope / a non-goal for this build?
  • What data or information can you provide now (documents, datasets, links, access)? For each, what is its format and sensitivity, and how do we access it?
  • Where you can't answer yet, is it acceptable for the team to proceed on an explicit, flagged assumption (with a default-on-silence) and revisit it when you can answer — rather than waiting on you?
  • On a scale of 1-10, how ambitious should this be? (1 = bare MVP / proof-of-concept; 5 = a solid, genuinely useful product for a team; 10 = best-in-class, enterprise-grade). I will scope functionality, the user/access model, security, controls, and depth/breadth to that level — and I'll tell you what each level includes before we proceed.
  • Who are the users and roughly how many — single user, a team, a whole org, or multi-tenant (many orgs)?
  • What access & permission model is needed — none, user accounts, roles/RBAC, SSO, per-tenant isolation?
  • What security/compliance constraints apply — authentication, data sensitivity, audit trails, regulations?
  • What is the minimum set of capabilities that makes this genuinely useful to its intended users (not a toy)?
  • For this product category, what does BEST-IN-CLASS look like — and which of those capabilities are in scope at your sizing score?
  • Expected scale, key integrations, and any non-functional bars (performance, availability, data volume)?
Do (14)
  • Capture the sponsor's raw words verbatim first, then structure — preserve the raw brief alongside the structured version so provenance is auditable.
  • Stand up EXACTLY four canonical registers — Assumption Register, Sponsor Question Log, PRODUCT Objectives & Success-Metric Rubric, NFR set — each with a stable register ID, a version tag, a fixed schema, and a documented stable-ID convention; record them all in the Source-of-Truth Manifest.
  • Keep the PRODUCT rubric strictly the SPONSOR's product-acceptance criteria; label it as such and never fold the meta process-design objectives into it.
  • Force the outcome into exactly ONE primary success metric with name, baseline, target, measurement method, and timeframe; if baseline is unknown, record it as a flagged Assumption Register row, never blank.
  • Make every rubric and NFR row genuinely machine-referenceable: stable ID, measurable criterion/threshold, a true/false pass_predicate, criticality/weight, and a source tag, so downstream reviews cite 'fails PRB-014' with evidence.
  • On every Assumption Register row populate downstream_artifact_refs[] (even if empty at intake) so a later refuting answer can cascade-flag exactly the dependent artifacts — and name program_health_rollup as the executor of that cascade.
  • State the carry-forward law explicitly in the manifest: later bricks EXTEND-IN-PLACE these four registers by ID; adding/refining rows is allowed, a parallel new-schema register is a lint defect.
  • When the sponsor is silent, proceed on an explicit assumption (statement, owner, question_asked_of_sponsor, default_on_silence, blast_radius, falsification_test) and surface the open question in the Sponsor Question Log — never block.
  • Type every hard constraint and quantify every NFR so Cipher, Keystone, and Vector consume the right, measurable inputs downstream.
  • Complete intake by coverage rule: every PRODUCT-rubric and NFR slot is sponsor-sourced OR assumption-filled-and-flagged; emit the coverage score and honesty receipts at exit.
  • Hand the four registers immediately to the downstream independent-validation review_loop (Verdict + a domain specialist) so the foundation is independently checked, not self-asserted.
  • ASK clarifying questions and iterate until the product/process is properly scoped — never proceed on a vague brief; if the sponsor is unsure, propose options and a recommended default.
  • Capture a single SIZING score (1-10) and translate it into concrete scope: who/how-many users, the access/permission model, authentication, security controls, auditability, and the depth/breadth of features — then state plainly what is IN and OUT at that level.
  • THINK LIKE THE ARCHITECT OF AN ENTERPRISE-CLASS APPLICATION, not a single-user utility: explicitly reason about number of users, access/permission model, authentication, security controls, auditability, data integrity, and operations — sized to the sponsor's 1-10 ambition score.
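The carry-forward law in the list above says a parallel or divergent-schema register is a defect that lint must catch. One way such a lint could work is sketched below, under assumptions: the manifest is reduced to a mapping from stable register ID to schema fingerprint, the fingerprint is a SHA-256 over the sorted field names, and the register IDs (`REG-SQL`, `REG-RISKS`) are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def schema_fingerprint(fields: Iterable[str]) -> str:
    """Order-insensitive SHA-256 fingerprint of a register's field names."""
    return hashlib.sha256("|".join(sorted(fields)).encode("utf-8")).hexdigest()


def lint_registers(manifest: Dict[str, str],
                   found: List[Tuple[str, List[str]]]) -> List[str]:
    """Compare registers seen in a brick's output against the manifest.

    manifest: stable register ID -> schema fingerprint recorded at intake.
    found:    (stable register ID, field names) pairs observed downstream.
    """
    defects = []
    for stable_id, fields in found:
        if stable_id not in manifest:
            defects.append(f"{stable_id}: not one of the canonical registers")
        elif schema_fingerprint(fields) != manifest[stable_id]:
            defects.append(f"{stable_id}: schema differs from the manifest fingerprint")
    return defects


# Sponsor Question Log schema as listed in the deliverables; the ID is hypothetical.
question_log_fields = [
    "id", "question", "raised_at", "raised_to_sponsor", "status", "answer",
    "answered_at", "affected_rubric_ids", "affected_assumption_ids",
]
manifest = {"REG-SQL": schema_fingerprint(question_log_fields)}

print(lint_registers(manifest, [("REG-SQL", question_log_fields)]))
print(lint_registers(manifest, [("REG-RISKS", ["id", "risk"])]))
```

Adding or refining rows never changes the fingerprint, so extend-in-place work passes the lint while a new register or an altered schema does not.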
Don't (10)
  • Do NOT insert any human approval, sign-off, or gate field anywhere — the sponsor provides input, never approves.
  • Do NOT block or wait when the sponsor under-provides; turning 'capture the brief' into a de facto gate violates the model. Always complete via flagged assumptions and a coverage score.
  • Do NOT create more than (or fewer than) the four canonical registers, and do NOT let any later brick spin up a parallel register with a different schema — that is a defect lint must catch.
  • Do NOT mix the meta process-design objectives into the PRODUCT rubric; keep the sponsor's product-acceptance criteria separate and labeled.
  • Do NOT emit a free-text or unmeasurable rubric/NFR; an entry without a pass/fail predicate or a measurable threshold is invalid and lets downstream reviews drift into rubber-stamping.
  • Do NOT allow a metric soup — exactly one primary success metric; secondary signals may exist but must not be marked primary.
  • Do NOT leave an Assumption Register row without downstream_artifact_refs[] wiring or without a falsification_test and default_on_silence — the cascade and the non-blocking proceed both depend on those fields.
  • Do NOT let an assumption silently harden into a 'fact' — every assumption keeps a status and a falsification_test until confirmed, refuted, or accepted-as-risk.
  • Do NOT claim 'intake complete' without the receipts (brief/manifest hashes, the four register IDs+versions+row counts, coverage score); an unbacked completion claim violates the honesty architecture.
  • Do NOT self-certify the registers as correct; independent validation is a separate downstream brick, not this author's call.
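The receipts named in the list above (brief and manifest hashes, the four register IDs with versions and row counts, and the coverage score) lend themselves to re-derivation by a reviewer. The sketch below shows one possible shape. The source does not name a hash algorithm, so SHA-256 is an assumption, as are the function names, the dictionary layout, and the sample register ID.

```python
import hashlib
from typing import List


def content_hash(data: bytes) -> str:
    """Content hash of an artifact (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def build_receipts(brief: bytes, manifest: bytes, registers: dict,
                   coverage_score: float) -> dict:
    """Assemble the receipts block from the artifacts themselves."""
    receipts = {
        "brief_hash": content_hash(brief),
        "manifest_hash": content_hash(manifest),
        "coverage_score": coverage_score,
    }
    for name, register in registers.items():
        receipts[name] = {
            "id": register["id"],
            "version": register["version"],
            "rows": len(register["rows"]),
        }
    return receipts


def rederive_mismatches(claimed: dict, brief: bytes, manifest: bytes,
                        registers: dict, coverage_score: float) -> List[str]:
    """Rebuild the receipts independently and list every key that disagrees."""
    fresh = build_receipts(brief, manifest, registers, coverage_score)
    keys = set(claimed) | set(fresh)
    return sorted(k for k in keys if claimed.get(k) != fresh.get(k))


brief = b"verbatim sponsor brief"
manifest = b"source-of-truth manifest"
registers = {
    "assumption_register": {"id": "REG-ASM", "version": "v1", "rows": [{}, {}]},
}
claimed = build_receipts(brief, manifest, registers, 0.5)
print(rederive_mismatches(claimed, brief, manifest, registers, 0.5))
```

A reviewer who gets a non-empty mismatch list has a concrete, citable gap rather than a disagreement with the author's summary.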
Guardrails (11)
  • GATE-FREE INVARIANT: this brick contains no approval/sign-off mechanism; completion is gated only by coverage (sponsor-sourced OR assumption-flagged), never by a human decision, and the brick completes — it never waits.
  • FOUR-REGISTER SOURCE-OF-TRUTH INVARIANT: exactly four canonical registers (Assumption Register, Sponsor Question Log, PRODUCT Objectives & Success-Metric Rubric, NFR set) are the ONE source of truth; each has a stable ID, a version, a fixed schema, and a stable-ID convention, all recorded in the Source-of-Truth Manifest.
  • CARRY-FORWARD / EXTEND-IN-PLACE LAW: requirements_discovery, prd_and_backlog, and all later bricks extend these same four registers by ID — adding/refining rows only; any parallel register with a divergent schema is a defect that lint must catch.
  • PRODUCT-vs-META SEPARATION INVARIANT: the PRODUCT rubric holds only the sponsor's product-acceptance criteria and is labeled as such; the meta process-design objectives are never merged into it.
  • CASCADE-WIRING INVARIANT: every Assumption Register row carries downstream_artifact_refs[]; a sponsor answer that refutes a row sets status=refuted and program_health_rollup (the named owner of the cascade) flags exactly the referenced artifacts for re-review — this brick supplies the wiring, not the execution.
  • RUBRIC/NFR-IS-LAW INVARIANT: every PRODUCT-rubric and NFR row has a stable ID and a binary pass_predicate or measurable threshold; rows are immutable-referenceable so downstream reviews cite them by ID — any edit creates a new version, preserving the audit trail.
  • HONESTY-RECEIPT INVARIANT: 'intake complete' may be claimed only when receipts exist — brief and manifest hashes, the four register IDs with versions and row counts, and the coverage score; receipt-less claims are prohibited and reviewers re-derive these receipts rather than accept the summary.
  • INDEPENDENCE INVARIANT: Vela authors the registers; Vela does NOT certify them. They become binding only after the downstream independent-validation loop (Verdict + specialist, non-author, adversarial pass) records a ConvergenceVerdict of solid with zero open material gaps.
  • DATA-PROVENANCE INVARIANT: every register row is tagged source=sponsor|assumption|process-default; nothing is presented as sponsor-stated unless it actually came from the sponsor.
  • NO-SECRETS INVARIANT: provided data is inventoried by reference, classification, and access-path only; no credentials or secrets are written into any tracked intake artifact.
  • ENTERPRISE-GRADE INVARIANT: design and build to the sponsor's sizing score as an enterprise-application architect would — consider users/scale, access/permission/auth, security controls, auditability, data integrity, depth & breadth of functionality, and what best-in-class looks like. NEVER ship something useless for most real users; a single-user toy is acceptable ONLY when sizing is explicitly 1-2 and a proof-of-concept was requested.
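The CASCADE-WIRING guardrail above can be pictured with a small sketch of what program_health_rollup might do with the wiring this brick supplies. Everything here is illustrative: the function name, the artifact reference strings, and the choice to mark a non-refuting answer as `confirmed` are assumptions, and register versioning is omitted for brevity.

```python
from typing import List


def apply_sponsor_answer(register: List[dict], assumption_id: str,
                         refutes: bool) -> List[str]:
    """Update one assumption's status and return the artifacts to flag for re-review."""
    for row in register:
        if row["id"] == assumption_id:
            if refutes:
                row["status"] = "refuted"
                # Flag exactly the artifacts wired to this row, nothing else.
                return sorted(set(row["downstream_artifact_refs"]))
            row["status"] = "confirmed"
            return []
    raise KeyError(f"no Assumption Register row with id {assumption_id}")


# Hypothetical rows and artifact references.
register = [
    {"id": "ASM-001", "status": "open",
     "downstream_artifact_refs": ["prd_and_backlog/PRD", "NFR-002"]},
    {"id": "ASM-002", "status": "open",
     "downstream_artifact_refs": ["architecture/ADR-004"]},
]

flagged = apply_sponsor_answer(register, "ASM-001", refutes=True)
print(flagged)
```

Because the refs live on the row, the cascade touches only dependents of the refuted assumption and leaves artifacts tied to other rows alone.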
02

Feasibility & Viability Triage (Kill-or-Proceed Checkpoint)

AI Agent: Keystone

Before any team or architecture is stood up, run a lightweight, time-boxed viability check on the sponsor brief so effort is never spent on an infeasible or out-of-scope idea. Keystone (architecture/feasibility), with Vela on value and market fit and Cipher flagging any showstopper security/compliance constraint, produces a Feasibility & Viability Memo against the PRODUCT Objectives Rubric and the constraints captured in onboard: technical feasibility, make-vs-buy / build-vs-adopt analysis, a rough order-of-magnitude effort + cost envelope, the top kill-risks, and a single explicit binary recommendation — PROCEED | RESHAPE | KILL. This is a checkpoint, not a study, and NOT a human approval gate: the recommendation is itself routed for a light independent sanity pass once a Verdict reviewer exists. Output feeds discovery only on PROCEED or RESHAPE; a KILL ends the run cleanly with a receipt so no further effort is spent.

Deliverables
[AI · Keystone] Feasibility & Viability Memo
Acceptance: Memo exists and contains all five required sections, each non-empty and each citing the specific onboard artifact/constraint or PRODUCT Objectives Rubric line it derives from: (1) technical feasibility verdict per major capability, (2) make-vs-buy / build-vs-adopt analysis naming at least one candidate buy/adopt option per major capability or stating why none exists, (3) a rough order-of-magnitude effort (T-shirt or sprint-count band) and cost envelope with stated assumptions, (4) a ranked list of the top kill-risks (>=1) each with a likelihood/impact tag, and (5) a single explicit recommendation token equal to exactly one of PROCEED, RESHAPE, or KILL. A RESHAPE recommendation MUST state the specific scope/objective change that would make it viable.
[AI · Vela] Value & market-fit feasibility note
Acceptance: Note states whether the brief's target outcome/success metric (from onboard) is plausibly achievable and worth pursuing, names the primary value driver and at least one named failure-of-value risk, and renders a one-word value verdict (VIABLE | MARGINAL | NOT-VIABLE) that is reflected in the Memo's kill-risk list.
[AI · Cipher] Showstopper constraint scan
Acceptance: Scan explicitly lists any hard security/compliance/regulatory/data-residency constraint from the brief that could block or fundamentally reshape the build, OR states 'no showstopper identified at this altitude' with the classes checked (e.g., regulated data, residency, auth, third-party data rights). Any showstopper found appears as a top kill-risk in the Memo.
[AI · Keystone] Independent sanity-pass record
Acceptance: An adversarial sanity pass on the recommendation is recorded with a binary outcome (CONFIRMED | CHALLENGED). If a Verdict reviewer exists at run time, the pass is performed by Verdict plus one specialist who did not author the Memo, re-deriving the Memo's receipts rather than accepting its summary; if Verdict does not yet exist (this brick precedes assemble_team), Keystone runs an explicit adversarial red-team sub-persona self-critique here AND the record carries a forward-reference flag 'reconfirm-at-spec-review' so the assemble_team-defined independence FALLBACK is applied at the spec review loop. The record names which path was used and lists every challenge raised and how each was resolved or accepted.
[AI · Keystone] Assumption Register (feasibility entries)
Acceptance: Every sponsor question left unanswered at decision time is logged as a flagged assumption with the assumed value the recommendation was made on; the register is non-blocking (the brick does not wait on sponsor silence) and each entry is tagged 'confirm-in-discovery'.
[AI · Keystone] Triage routing decision + receipt
Acceptance: A single routing record states the final recommendation token and the resulting action: PROCEED or RESHAPE routes output to discovery (RESHAPE carries the required scope change forward); KILL ends the run and writes a kill receipt stating the deciding kill-risk(s) and the rubric/constraint that failed. The receipt is re-derivable from the Memo and sanity-pass record (no claim without a cited source).
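The routing rule in the last deliverable is small enough to sketch directly. The version below is a minimal illustration under assumptions: the function name, the record keys, and the action labels (`route_to_discovery`, `end_run`) are hypothetical, while the three tokens and the RESHAPE and KILL requirements come from the acceptance criteria above.

```python
from typing import Optional, Sequence

TOKENS = {"PROCEED", "RESHAPE", "KILL"}


def route_triage(recommendation: str,
                 scope_change: Optional[str] = None,
                 kill_risks: Sequence[str] = (),
                 failed_refs: Sequence[str] = ()) -> dict:
    """Turn the Memo's recommendation token into a single routing record."""
    if recommendation not in TOKENS:
        raise ValueError("recommendation must be exactly one of PROCEED, RESHAPE, KILL")
    if recommendation == "KILL":
        # No silent termination: a KILL must cite what decided it.
        if not kill_risks or not failed_refs:
            raise ValueError("a KILL needs the deciding kill-risk(s) and the failed rubric/constraint")
        return {
            "recommendation": "KILL",
            "action": "end_run",
            "kill_receipt": {"kill_risks": list(kill_risks),
                             "failed_refs": list(failed_refs)},
        }
    record = {"recommendation": recommendation, "action": "route_to_discovery"}
    if recommendation == "RESHAPE":
        if not scope_change:
            raise ValueError("a RESHAPE must state the scope/objective change that makes it viable")
        record["scope_change"] = scope_change  # carried forward into discovery
    return record


print(route_triage("PROCEED"))
print(route_triage("KILL", kill_risks=["KR-1"], failed_refs=["PRB-003"]))
```

Raising on a malformed call, rather than defaulting, keeps a soft or incomplete recommendation from being routed at all.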
Questions the agent asks (4)
  • Are there candidate off-the-shelf or adopt-and-extend solutions the sponsor already prefers or has ruled out for any major capability?
  • Is there a hard ceiling on effort/cost, or a date beyond which the outcome is no longer worth pursuing (the threshold that would justify a KILL)?
  • Are there any non-negotiable regulatory, data-residency, or security constraints we must treat as showstoppers?
  • If the idea as briefed is not viable as-is, is a reduced or reshaped scope acceptable, or is it all-or-nothing?
Do (5)
  • Keep it a lightweight, time-boxed checkpoint — enough to make a defensible PROCEED/RESHAPE/KILL call, not a full study.
  • Tie every feasibility claim and the final recommendation back to a specific onboard constraint or PRODUCT Objectives Rubric line.
  • Treat make-vs-buy as a first-class question: prefer adopt/extend over build where it meets the outcome.
  • Be willing to recommend KILL or RESHAPE — a clean early stop is a success, not a failure.
  • Run a genuine adversarial pass on your own recommendation and record what it challenged.
Don't (5)
  • Do not block on sponsor silence — proceed on a flagged Assumption Register entry instead.
  • Do not turn this into the full architecture or discovery — that work belongs to later bricks and only happens on PROCEED/RESHAPE.
  • Do not introduce a human approval gate; the sanity pass is an independent AI review, not sign-off.
  • Do not let the author's own optimism stand unchallenged — never accept the Memo's summary in place of re-derived receipts.
  • Do not output a soft or multi-valued recommendation; it must resolve to exactly one of PROCEED | RESHAPE | KILL.
Guardrails (4)
  • Honesty: every line of the Memo and the routing receipt must be re-derivable from a cited source (onboard artifact, rubric line, or the sanity-pass record); claim only what a receipt proves.
  • Independence: the recommendation is never confirmed by its own author alone — once Verdict exists, Verdict plus a non-author specialist re-derive the receipts; before assemble_team, an adversarial self-critique runs here and the decision is re-confirmed at spec review via the assemble_team independence FALLBACK (forward reference).
  • A KILL must end the run with a written, re-derivable receipt stating the deciding kill-risk and the failed rubric/constraint — no silent termination.
  • Scope discipline: this brick operates only on the brief and onboard outputs and within this engagement's data; it neither invents requirements the sponsor has not stated nor begins building.
03

Assemble the AI Delivery Team & Operating System (the convergence machinery, seated BEFORE the first review loop)

AI Agent: Cadence

Cadence stands up the AI delivery roster as a running organism and commits the Operating System every later brick executes against: not prose policy but a single versioned, receipt-backed Operating-System Charter object whose every claim is checkable. This brick now runs before requirements_discovery (after intake and triage), so the convergence machinery, Verdict's non-authoring configuration, and the independence-enforcement hook already exist before the first independent review at review_spec_iterate. Concretely it: (1) onboards and PROBES the full roster of Vela (PO), Cadence (SM/Delivery Lead), Keystone (Architect), Mason (Engineer), Lens (Code Reviewer), Proof (QA/Test), Cipher (AppSec), Vector (DevOps/SRE), and Iris (UX), plus Verdict (standing Independent Reviewer) configured as a non-authoring evaluator; (2) publishes a RACI where exactly one AI agent is Accountable per artifact-type and ZERO humans appear in any Accountable or Approver cell (the sponsor is Consulted/Informed only, via the continuous-input channel, never a gate); (3) commits machine-referenceable DoR/DoD bound to house law (all automated tests green + a cited receipt before "done"); (4) commits a numeric capacity/WIP model so "fits capacity" is a binary function call; (5) commits the canonical Independent-Review Protocol every downstream review_loop copies verbatim. 
This version closes the three gaps downstream loops silently assume: it ships a CONCRETE reviewer-independence FALLBACK for when the only holder of a needed lens is the artifact's author (tiered: adversarial red-team sub-persona, escalating to an external evaluator on security-critical artifacts; "same agent, different hat" is forbidden as sole independence on security-critical work); it commits ONE canonical gap-severity taxonomy (Blocker/Major/Minor, material := severity ≥ Major) every loop copies verbatim, retiring the divergent material/minor, critical/major/minor, and material/immaterial vocabularies; and it adds a meta-check so Verdict's own convergence verdicts are spot-checkable by a second independent actor re-deriving ledger rows, so Verdict is not the sole unenforced honesty authority. PROCESS-integrity objectives (autonomy, independence, honesty) are scored against PROCESS bricks separately from the PRODUCT Objectives & Success-Metric Rubric (owned by onboard/Vela) used to score the spec and product. This brick introduces NO human sign-off and asserts nothing it cannot prove with a receipt.
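The canonical taxonomy (Blocker/Major/Minor, material := severity ≥ Major) and the convergence rule stated in the overview ('solid' = zero open material gaps AND all non-author reviewers agree) can be expressed compactly. The sketch below is an assumption-laden illustration: the GapLog row shape, the function names, and the `not-solid` label for the other side of the binary verdict are all hypothetical.

```python
from enum import IntEnum
from typing import Dict, List


class Severity(IntEnum):
    """Canonical gap severities, ordered so comparisons work directly."""
    MINOR = 1
    MAJOR = 2
    BLOCKER = 3


def is_material(severity: Severity) -> bool:
    """material := severity >= Major."""
    return severity >= Severity.MAJOR


def convergence_verdict(gap_log: List[dict], reviews: Dict[str, bool],
                        author: str) -> str:
    """'solid' only if no open material gap remains and every non-author reviewer agrees."""
    open_material = [g for g in gap_log
                     if g["status"] == "open" and is_material(g["severity"])]
    # The author's own opinion never counts toward convergence.
    non_author = {name: agrees for name, agrees in reviews.items() if name != author}
    all_agree = bool(non_author) and all(non_author.values())
    return "solid" if not open_material and all_agree else "not-solid"


gap_log = [
    {"id": "G1", "severity": Severity.MINOR, "status": "open"},
    {"id": "G2", "severity": Severity.MAJOR, "status": "closed"},
]
reviews = {"Verdict": True, "Cipher": True, "Keystone": True}
print(convergence_verdict(gap_log, reviews, author="Keystone"))
```

An open Minor gap does not block convergence here, while a panel made up only of the author can never reach 'solid', which is the point of the independence fallback.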

Deliverables
[AI · Cadence] Team & Operating-System Charter — a single committed, versioned object (charter.vN, git-hashed) bundling all sub-artifacts below; it is sequenced to commit BEFORE requirements_discovery and is referenced by stable id from every downstream brick.
Acceptance: Charter file exists, is committed (git hash recorded), carries a version tag, and its index lists+links all seven sub-artifacts (roster registration, RACI, DoR, DoD, capacity/WIP model, Independent-Review Protocol, severity taxonomy). The process-run order places this brick's commit timestamp BEFORE the first requirements_discovery artifact (checkable: charter git-commit time < first requirements_discovery artifact time). A later brick resolves the charter by stable id.
[AI · Cadence] Roster registration + health/capability probes — the 9 delivery agents plus Verdict registered with capability matrices, system prompt/domain knowledge, and tool access.
Acceptance: All 10 agents (9 + Verdict) return a passing capability/health probe (probe output captured as a receipt with timestamps); each capability matrix resolves with no unresolved tool/skill reference; the probe log shows 10/10 green. No agent is marked onboarded without a captured passing probe. The roster contains exactly these 10 slugs and no others (negative check: 'Atlas' is NOT a registered delivery-roster agent).
[AI · Verdict] Verdict independence configuration + non-capture verification — Verdict seated as standing evaluator that critiques only and cannot be assigned authoring/build tasks.
Acceptance: A receipt confirms Verdict's config rejects any authoring/build task assignment (negative test passes: assigning Verdict a build task returns refused); Verdict is registered as a required seat on every major-artifact review panel; every Verdict output is written as a ledger receipt with an attributable signer id. This config is committed before requirements_discovery so the first review loop can seat Verdict.
[AI · Cadence] RACI matrix (per artifact-type) — Accountable/Responsible/Consulted/Informed across PRD/spec, architecture, sprint plan, sprint increment, code, tests, security, deploy, UX, release, hypercare, handover.
Acceptance: Every artifact-type row has EXACTLY ONE Accountable AI agent; ZERO cells in the Accountable or Approver columns contain 'Human' (binary check passes across all rows); the sponsor appears ONLY as Consulted and/or Informed; each REVIEWABLE artifact row names its independent review panel (≥2 reviewers incl. Verdict) by agent slug; no row names an unregistered actor (re-uses the roster registration as the allowed-slug set).
[AI · Cadence] Definition-of-Ready + Definition-of-Done checklists — machine-referenceable, addressable by stable ids from later bricks.
Acceptance: DoR and DoD exist as discrete checklist objects with stable ids; DoD explicitly binds to the house hard rules ('all automated tests green before merge' + 'done = a cited receipt, never asserted'); a later brick can cite a specific DoD line item by id.
[AI · Cadence] Capacity/WIP model — numeric per-agent concurrent-WIP limits, a per-sprint capacity number, and a fits_capacity(plan) → pass/fail function.
Acceptance: Per-agent WIP limit is a number for each of the 9 delivery agents; sprint capacity is a number; fits_capacity() returns boolean and is demonstrated on one example plan (a deliberately over-scoped plan returns FAIL); the function id is referenceable by sprint-planning bricks.
[AI · Cadence] Canonical Gap-Severity Taxonomy (severity_taxonomy.vN.json) — ONE committed enum every downstream review_loop copies verbatim: Blocker > Major > Minor, with the binary definition 'material := severity ≥ Major'. Includes a legacy-mapping note (prior critical→Blocker; prior material/immaterial and material/minor pairs collapse into this enum) so divergent vocabularies are retired.
Acceptance: File exists with version+hash and defines exactly three ordered tiers (Blocker, Major, Minor) plus the predicate material := severity ≥ Major; the Independent-Review Protocol and every downstream review_loop brick reference THIS taxonomy by hash (no review_loop defines its own minor/major/critical or material/immaterial set); a grep/lint over the process set finds zero competing severity vocabularies surviving (binary: divergent-vocab count == 0).
[AI · Verdict] Independent-Review Protocol (review_protocol.vN.json) — the single convergence contract every downstream kind:review_loop brick references rather than reinventing.
Acceptance: Protocol defines, as checkable rules: (a) reviewer-eligibility = excludes author + pair-partners AND requires the needed lens AND seats Verdict; (b) the reviewer-independence FALLBACK (see next deliverable) is invoked by id when no eligible non-author holds the needed lens; (c) panel size ≥2 for major artifacts; (d) gap severity is the committed canonical taxonomy by hash (Blocker/Major/Minor, material := ≥ Major); (e) loop continues while any MATERIAL (≥ Major) gap is open; (f) convergence verdict format = 'open material gaps == 0 + each PROCESS objective 1–7 cited with re-derived evidence + signer ≠ author + all non-author panel reviewers SOLID + bound to one final artifact version+hash'; (g) max-iteration N with escalation-to-human-as-INPUT (a question, never a gate); (h) mandatory adversarial/red-team pass; (i) the Verdict meta-check (see meta-check deliverable). A sample verdict instance validates against the format. PROCESS objectives 1–7 here are the autonomy/independence/honesty integrity objectives, explicitly distinct from the PRODUCT Objectives & Success-Metric Rubric used to score the spec/product.
[AI · Verdict] Reviewer-Independence Fallback (independence_fallback.vN.json) — a CONCRETE tiered procedure, referenced by the eligibility function, for when the only agent holding the needed lens is the artifact's author (e.g. Cipher authored the threat model AND is the sole AppSec lens).
Acceptance: File defines a tiered escalation, each tier checkable: TIER 1 — an adversarial RED-TEAM SUB-PERSONA of the author agent, invoked under an explicit break-it mandate, whose findings are logged to the GapLog as a SEPARATE attributable signer (sub-persona id ≠ author authoring id), never merged into the author's own narrative; TIER 2 (mandatory when the lens is SECURITY-CRITICAL, i.e. go-live / pen-test severity) — escalation to an EXTERNAL-EVALUATOR: a fresh non-author evaluator instance OR a human-input expert, consumed strictly as INPUT to the panel, NEVER as an approval gate. A binary rule is recorded and enforced: 'same agent, different hat' (sub-persona alone) is FORBIDDEN as the sole independence on any security-critical artifact (negative test passes: a security-critical convergence whose only independence is the author's sub-persona is REJECTED). The fallback is referenced by id from the eligibility function and from review_security/release/go-live loops.
[AI · Cadence] Independence enforcement hook — wiring that makes the author-never-certifies rule mechanically true rather than asserted, including the security-critical sub-persona-alone block.
Acceptance: Negative tests pass: (1) a convergence verdict whose signer set intersects the artifact's author set is REJECTED (rejection receipt captured); (2) a security-critical convergence whose sole non-author signer is the author's red-team sub-persona is REJECTED until an external-evaluator input is attached. Reviewer findings and verdicts are written as ledger receipts (attributable, auditable) per the Truth-Gate model; the hook id is referenced by the Independent-Review Protocol and the fallback.
[AI · Verdict] Verdict Meta-Check rule + verifier (verdict_metacheck.vN.json) — so Verdict's own convergence verdicts are not the sole unenforced honesty authority.
Acceptance: Rule states, and a verifier enforces, that for any ConvergenceVerdict signed by Verdict, a SECOND independent actor (a non-author specialist, or the external-evaluator fallback) must RE-DERIVE at least 3 ledger rows the verdict relied on (e.g. a test count, a traceability-matrix row, a scan/health receipt) and record agreement/disagreement as its own receipt. A convergence is invalid if Verdict is the only honesty authority (negative test passes: a Verdict-only verdict with no second-actor re-derivation of ≥3 rows is REJECTED); the re-derivation receipts cite the same artifact version+hash; on disagreement the gap re-opens.
[AI · Cadence] Anti-bluff acceptance rules — embedded in the protocol so independent review cannot become theater.
Acceptance: Protocol requires every reviewer finding to cite evidence (file/line, test id, scan/health output); a clean verdict on a non-trivial artifact requires positive evidence reviewed (silence/zero-findings is itself flagged); every 'tests pass' anywhere in the OS resolves to a cited run/count, never a bare claim; rule 'reviewer RE-DERIVES the receipt, never accepts the author's narrative' is stated and bound to the repo's text-path≠action-path lesson.
[AI · Cadence] PROCESS-Integrity Objectives Rubric (process_objectives.vN.json) — the autonomy/independence/honesty integrity objectives (1–7) scored against PROCESS bricks, explicitly separated from the PRODUCT Objectives & Success-Metric Rubric.
Acceptance: Rubric lists the 7 PROCESS-integrity objectives as binary checks (e.g. zero-human-gate, author-never-certifies, severity-taxonomy-single-source, receipt-re-derivation, capacity-throttle-enforced, fallback-honored, meta-check-honored); each review_loop verdict cites these by id when scoring a PROCESS brick; a stated, checkable separation note confirms these are NOT the PRODUCT Objectives & Success-Metric Rubric (owned by the onboard brick / Vela) and that product-scoring loops cite the PRODUCT rubric while process-integrity scoring cites THIS one (binary: no review_loop conflates the two rubric ids).
[AI · Cadence] Sprint cadence, ceremonies & branching/PR conventions — the shared delivery contract.
Acceptance: Charter states sprint length, the ceremony set (planning/standup/review/retro) with the driving agent for each, and branching/PR conventions (feature/* off develop, PR-per-increment, merge blocked on red tests) consistent with house Git Strategy; later bricks reference the cadence + branch convention by id.
[Human] Continuous-input channel seed + flagged-assumption register — sponsor brief/answers captured and an open RAID-style register for when the human is silent.
Acceptance: Charter contains a standing question/assumption channel; where the human has not answered, each open item is recorded as an explicitly FLAGGED assumption (the team proceeds on it, never blocks); the register lists at least the initial assumptions and is the documented mechanism by which stuck review loops surface to the human as a QUESTION (input), never an approval gate.
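The reviewer-eligibility rule and the tiered independence fallback from the deliverables above can be sketched as one selection function. This is a hedged illustration only: the `Agent` type, the lowercase slugs, the tier labels, and the `select_panel` signature are assumptions, not the contents of review_protocol.vN.json or independence_fallback.vN.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    slug: str
    lenses: frozenset  # review lenses this agent can apply, e.g. {"appsec"}


def select_panel(roster, authors, pair_partners, needed_lens, security_critical):
    """Return (panel, fallback_tiers) for one artifact review."""
    excluded = set(authors) | set(pair_partners)
    lens_holders = [
        a.slug for a in roster
        if a.slug not in excluded and needed_lens in a.lenses
    ]
    # Verdict is seated on the panel unless it authored the artifact (assumed rule).
    panel = [] if "verdict" in excluded else ["verdict"]
    panel += [slug for slug in lens_holders if slug != "verdict"]

    fallback_tiers = []
    if not lens_holders:
        # Tier 1: the author's red-team sub-persona, logged under its own signer id.
        fallback_tiers.append("tier1_red_team_subpersona")
        if security_critical:
            # Tier 2 is mandatory here: sub-persona alone is forbidden.
            fallback_tiers.append("tier2_external_evaluator")
    return panel, fallback_tiers
```

For example, when Cipher authored the threat model and is the only AppSec lens, the function returns an empty lens-holder set and both fallback tiers, so the hook can refuse a convergence that lacks the external-evaluator input.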
Questions the agent asks (6)
  • What is the one-paragraph description of what you want to accomplish, and what existing information/data/systems can you share now (so the team starts from your reality, not assumptions)?
  • Are there constraints we must treat as fixed inputs — deadline, budget envelope, must-use tech/cloud, compliance/regulatory regime, data-residency — to record in the charter?
  • For security-critical artifacts (threat model, pen-test scope, go-live security sign-off), do you want a specific EXTERNAL evaluator named for the independence fallback — a human security expert as INPUT, a second fresh evaluator instance, or both — and how should we reach them?
  • Which channel and cadence do you prefer for the continuous-input loop, and what response latency should we assume so we can size the max-iteration escalation window?
  • Which artifacts do you want visibility into as Informed (e.g., architecture, security assessment, release readiness) even though you are never an approver?
  • Is there any domain context, prior attempt, or known landmine we should load before the team starts, so we don't rediscover it the hard way?
Do (12)
  • Sequence this brick to COMMIT BEFORE requirements_discovery: the convergence machinery, Verdict's non-authoring config, the severity taxonomy, and the independence-enforcement hook must exist before the first review loop at review_spec_iterate runs.
  • Emit ONE committed, versioned charter object with a git hash as the brick's receipt — 'done' only when the charter and all sub-artifacts resolve and probe green.
  • Onboard each agent by PROBING it: capture a passing capability/health probe receipt per agent before marking it onboarded (10/10 green, timestamps recorded).
  • Commit ONE canonical gap-severity taxonomy (Blocker/Major/Minor, material := severity ≥ Major) and make every downstream review_loop copy it VERBATIM by hash — retire the divergent material/minor, critical/major/minor, and material/immaterial vocabularies.
  • Ship a CONCRETE reviewer-independence fallback: when the only lens-holder is the author, escalate (Tier 1) to an adversarial red-team sub-persona with a separate signer id, then (Tier 2, mandatory on security-critical lenses) to an external evaluator consumed as INPUT only.
  • Add the Verdict meta-check: require a second independent actor (or the external-evaluator fallback) to RE-DERIVE ≥3 ledger rows behind any Verdict-signed convergence, so Verdict is never the sole unenforced honesty authority.
  • Define independence OPERATIONALLY and ENFORCE it: eligibility function (excludes author + pair-partners, requires lens, seats Verdict, invokes the fallback by id), panel ≥2 for major artifacts, and a hook that REJECTS any verdict whose signer overlaps the author set.
  • Keep PROCESS-integrity scoring separate from PRODUCT scoring: score autonomy/independence/honesty against the PROCESS-Integrity Objectives Rubric on PROCESS bricks; score the spec/product against the PRODUCT Objectives & Success-Metric Rubric.
  • Write every reviewer finding and verdict as a ledger receipt (attributable, auditable), inheriting the repo's Truth-Gate/honesty model rather than a weaker reinvention.
  • Make capacity numeric: per-agent WIP limits + a sprint capacity number + a fits_capacity() function later bricks call, demonstrated failing on a deliberately over-scoped plan.
  • Bind DoD to house law: all automated tests green before merge, and 'done' = a cited receipt (test counts, scan output, health code, reviewer verdict) — never an assertion.
  • Record the human's brief as up-front input AND keep the continuous-input channel open; where the human is silent, proceed on an explicitly FLAGGED assumption logged in the register, surfacing stuck loops as a QUESTION.
Don't (14)
  • Do NOT let requirements_discovery or review_spec_iterate run before this brick's charter, severity taxonomy, Verdict config, and independence hook are committed — that is the ordering defect this version fixes.
  • Do NOT allow 'same agent, different hat' (an author's own red-team sub-persona alone) to stand as the SOLE independence on any security-critical artifact — an external-evaluator input is mandatory there.
  • Do NOT let any review_loop define its own gap-severity vocabulary — they copy the ONE canonical taxonomy by hash; a surviving divergent vocabulary fails the brick.
  • Do NOT let Verdict be the sole honesty authority — a Verdict-signed convergence with no second-actor re-derivation of ≥3 ledger rows is rejected.
  • Do NOT place any Human in an Accountable or Approver cell of the RACI — the human is Consulted/Informed only; a single such cell fails the brick.
  • Do NOT let an author certify, sign, or converge their own artifact; do NOT accept a convergence verdict signed by anyone in the author set.
  • Do NOT conflate the PROCESS-Integrity Objectives Rubric with the PRODUCT Objectives & Success-Metric Rubric — process-integrity loops cite the former, product-scoring loops the latter.
  • Do NOT mark any roster member 'onboarded' without a captured passing health/capability probe; do NOT register 'Atlas' as a delivery-roster agent.
  • Do NOT accept a convergence verdict with zero cited evidence; a clean verdict on a non-trivial artifact without positive evidence reviewed is itself a flag, not a pass.
  • Do NOT let 'tests pass' (or any done/passed/live claim) appear anywhere in the OS as a bare assertion — it must resolve to a cited run/count/scan/health receipt.
  • Do NOT assign Verdict any authoring or build task — verify with a passing negative test.
  • Do NOT exceed WIP limits to appear faster; capacity is a deliberate throttle against the AI's no-fatigue overcommit failure mode.
  • Do NOT block on the human at any point — no ceremony, artifact, or review loop may halt awaiting human approval; proceed on a flagged assumption instead.
  • Do NOT restate house Sprint/Git rules in conflict with CLAUDE.md — reference them; the charter inherits, it does not fork, house law.
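The RACI prohibitions above lend themselves to a binary lint. The sketch below assumes a RACI stored as a dict of artifact-type rows with `accountable`, `approver`, `responsible`, `consulted`, and `informed` columns; that shape, the `HUMAN_SLUGS` labels, and the function name `check_raci` are illustrative assumptions.

```python
HUMAN_SLUGS = {"human", "sponsor"}  # assumed labels for the human party


def check_raci(raci: dict, roster: set) -> list[str]:
    """Return a list of violations; an empty list means the RACI passes."""
    violations = []
    for artifact, row in raci.items():
        if len(row.get("accountable", [])) != 1:
            violations.append(f"{artifact}: needs exactly one Accountable")
        for column, slugs in row.items():
            # The human may appear only as Consulted and/or Informed.
            if column not in ("consulted", "informed") and HUMAN_SLUGS & set(slugs):
                violations.append(f"{artifact}: human in {column}")
            for slug in slugs:
                # Every AI actor must come from the registered roster.
                if slug not in roster and slug not in HUMAN_SLUGS:
                    violations.append(f"{artifact}: unregistered actor {slug}")
    return violations
```

Returning the full violation list, instead of stopping at the first failure, gives the brick a receipt that names every offending cell.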
Guardrails (12)
  • TEAM-BEFORE-USE ORDERING: this brick commits BEFORE requirements_discovery; the binary check is charter git-commit time < first requirements_discovery artifact time, so no review loop runs before the machinery that makes it honest exists.
  • BINARY-RECEIPT GUARDRAIL: complete only when the charter is committed (git hash), all 10 agents probe green, RACI passes zero-Human-in-A/Approver, DoR/DoD/WIP/Protocol/severity-taxonomy/fallback/meta-check resolve by id, and the independence-enforcement negative tests pass. No assertion substitutes for these receipts.
  • SINGLE SEVERITY VOCABULARY: exactly one committed taxonomy (Blocker/Major/Minor, material := ≥ Major) referenced by hash from every review_loop; a lint finding any divergent material/minor, critical/major/minor, or material/immaterial vocabulary fails the brick.
  • INDEPENDENCE FALLBACK IS CONCRETE AND TIERED: when the only lens-holder is the author, Tier-1 is an adversarial red-team sub-persona with a separate signer id; Tier-2 (external evaluator as INPUT) is MANDATORY on security-critical lenses; sub-persona-alone on a security-critical artifact is mechanically REJECTED.
  • VERDICT IS NOT THE SOLE HONESTY AUTHORITY: every Verdict-signed convergence requires a second independent actor (or the external-evaluator fallback) to re-derive ≥3 ledger rows; a Verdict-only verdict is rejected, and disagreement re-opens the gap.
  • INDEPENDENCE IS ENFORCED, NOT ASSERTED: an author may never certify their own artifact; the eligibility function, fallback, and rejection hook make this mechanically true, and verdicts are ledger receipts so independence is auditable.
  • NO SMUGGLED HUMAN GATE: zero Human in any Accountable/Approver RACI cell is a hard binary check; the human is input-only (Consulted/Informed) and the continuous-input channel never becomes an approval gate.
  • PROCESS vs PRODUCT SCORING ARE SEPARATE: autonomy/independence/honesty are scored on PROCESS bricks via the PROCESS-Integrity Objectives Rubric; the spec/product are scored via the PRODUCT Objectives & Success-Metric Rubric; no loop conflates the two rubric ids.
  • CONVERGENCE IS DEFINED ONCE: 'solid' = open material gaps == 0 + each PROCESS objective (1–7) cited with re-derived evidence + non-author signer + all non-author panel reviewers SOLID + bound to one final version+hash + within max-iteration (else escalate to human as INPUT); every downstream review_loop is an instance of this one protocol.
  • ANTI-BLUFF: every finding cites evidence (file/line, test id, scan/health output); reviewers RE-DERIVE the receipt, never accept the author's narrative (the text-path≠action-path lesson); a mandatory adversarial/red-team pass guards the highest-stakes artifacts.
  • CAPACITY IS A THROTTLE: WIP limits and sprint capacity are numbers, fits_capacity() is the binary checker downstream calls, and limits are not exceeded to look faster.
  • HOUSE-LAW INHERITANCE: DoD binds to 'all automated tests green + cited receipt'; reviewer verdicts and findings are written into the ledger/Truth-Gate model — this process inherits the platform's honesty enforcement, never a weaker reinvention.
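The convergence definition and the rejection rules in the guardrails above can be condensed into a single evaluator. This is a sketch under assumptions: the `ConvergenceVerdict` fields, the outcome strings, and the ordering of the checks are hypothetical, and the real contract lives in the committed protocol files.

```python
from dataclasses import dataclass

PROCESS_OBJECTIVE_IDS = frozenset(range(1, 8))  # PROCESS objectives 1-7


@dataclass
class ConvergenceVerdict:
    artifact_hash: str
    authors: set
    signers: set
    open_material_gaps: int
    objectives_cited: set   # objective ids cited with re-derived evidence
    panel_votes: dict       # non-author reviewer slug -> "SOLID" or "GAPS"
    iteration: int
    rederived_rows: int = 0  # ledger rows re-derived by a second actor


def evaluate(v: ConvergenceVerdict, max_iterations: int) -> str:
    """Classify a claimed convergence as REJECTED, ESCALATE_AS_INPUT, SOLID or CONTINUE."""
    # Author-never-certifies: any signer/author overlap is rejected outright.
    if not v.signers or v.signers & v.authors:
        return "REJECTED"
    # Meta-check: a second actor must have re-derived at least 3 ledger rows.
    if v.rederived_rows < 3:
        return "REJECTED"
    # Past the iteration cap the loop surfaces a question to the human as input.
    if v.iteration > max_iterations:
        return "ESCALATE_AS_INPUT"
    solid = (
        bool(v.artifact_hash)
        and v.open_material_gaps == 0
        and PROCESS_OBJECTIVE_IDS <= v.objectives_cited
        and bool(v.panel_votes)
        and all(vote == "SOLID" for vote in v.panel_votes.values())
    )
    return "SOLID" if solid else "CONTINUE"
```

Checking the rejection rules before the solid predicate means a self-signed or Verdict-only verdict can never be reported as SOLID, whatever its gap count.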
04

Program-Level Receipt/Honesty Rollup & Assumption-Cascade Executor

AI Agent: Verdict

Stand up ONE live, receipt-derived dashboard that rolls every brick's ledger receipts into a single program-wide view of honesty and exposure: every OPEN material gap across all review loops, every flagged assumption still unconfirmed (with its blast radius), and every refuted-assumption auto-flag — and EXECUTE the refute→re-review cascade the run promised. This is a cross-cutting rollup + cascade executor surfaced right after assemble_team so the machinery exists early; it is conceptually owned from that point onward and spans the whole lifecycle. It asserts nothing it cannot re-derive from underlying receipts. Verdict owns it (independently) so the rollup stays honest, and it is the authoritative source review_release_readiness_iterate's pre-go-live assumption gate reads. It is deliberately tight: a rollup and a cascade router, NOT a new review loop.

Deliverables
[AI · Verdict] Program Health & Honesty Dashboard — a live, re-derived rollup view that, for each brick in the run, reads the brick's ledger receipts and surfaces three sections at a glance: (A) every OPEN material gap across all review loops with its owning brick + GapLog id, (B) every flagged assumption still UNCONFIRMED with its blast-radius score + downstream-refs count, (C) every refuted-assumption auto-flag with the cascade status it triggered.
Acceptance: Dashboard is queryable and returns, for a known run, counts that EXACTLY equal an independent re-tally of the source receipts (open-gap count, unconfirmed-assumption count, refute-flag count each match the ledger to the row); every displayed item links to a real receipt id that resolves; zero items appear that lack a backing receipt.
[AI · Verdict] Assumption-cascade executor — on any sponsor/human-INPUT answer that REFUTES a canonical Assumption Register row, it reads that row's downstream-refs list, auto-flags every referenced downstream artifact for re-review, and routes each flagged artifact back to its owning brick's review loop with the refuting answer attached.
Acceptance: Given a test Assumption Register row marked refuted with N downstream-refs, the executor produces exactly N re-review flags (one per downstream-ref), each routed to the correct owning brick (asserted against the artifact→brick map), each carrying the refuting answer id; a re-run is idempotent (no duplicate flags for an already-flagged-and-unchanged ref).
[AI · Verdict] HIGH-blast-radius assumption rollup — the authoritative boolean 'all HIGH-blast-radius assumptions resolved-or-accepted' signal (with the backing list of each HIGH assumption and its state: resolved | explicitly-accepted | OPEN) that review_release_readiness_iterate's pre-go-live assumption gate reads.
Acceptance: The rollup boolean is TRUE iff zero HIGH-blast-radius assumption rows are in OPEN state (resolved or explicitly-accepted both pass); a single OPEN HIGH row flips it FALSE; the backing list re-derived from the Assumption Register matches the gate's read row-for-row; review_release_readiness_iterate can read it by a stable contract id.
[AI · Verdict] Convergence-verdict spot-check surface — a second-independent-actor view over Verdict's own ConvergenceVerdicts that re-derives at least 3 underlying receipt rows per verdict and records agreement/discrepancy, so a meta-auditor can confirm Verdict did not self-certify on summaries.
Acceptance: For each ConvergenceVerdict in scope, the surface shows ≥3 re-derived receipt rows with each re-derivation's result vs the recorded value; any discrepancy is raised as an OPEN material gap on the dashboard (not silently absorbed); the spot-check actor is recorded and is NOT the author of the artifact under the verdict.
[Human] Sponsor INPUT channel — continuous answers to flagged assumptions and clarifying questions, supplied throughout the run (this is INPUT, not an approval gate; the human never signs off, only informs).
Acceptance: Each sponsor answer lands as a timestamped, attributable record the executor can read; an answer that contradicts an Assumption Register row is detectable as a refute trigger (carries the row id it bears on).
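The assumption-cascade executor described in the deliverables above could be sketched as follows. The row fields (`status`, `downstream_refs`, `refuting_answer_id`), the in-memory `open_flags` store, and the function name `run_cascade` are assumptions for illustration; a real executor would presumably persist flags as ledger receipts.

```python
def run_cascade(row: dict, artifact_to_brick: dict, open_flags: dict) -> list[dict]:
    """Flag each downstream ref of a refuted row; skip refs already flagged.

    open_flags maps (row_id, artifact_id) -> flag and stands in for the
    persisted flag store (an assumption, not the real ledger).
    """
    if row["status"] != "refuted":
        return []
    new_flags = []
    for artifact_id in row["downstream_refs"]:
        key = (row["id"], artifact_id)
        if key in open_flags:
            continue  # idempotent: never duplicate an existing open flag
        flag = {
            "artifact_id": artifact_id,
            # Route back to the owning brick's loop; this brick does no re-review.
            "owning_brick": artifact_to_brick[artifact_id],
            "refuting_answer_id": row["refuting_answer_id"],
        }
        open_flags[key] = flag
        new_flags.append(flag)
    return new_flags
```

Running it twice on the same refuted row yields N flags on the first pass and none on the second, which matches the idempotency requirement in the acceptance criterion.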
Questions the agent asks (4)
  • Which Assumption Register rows are HIGH blast-radius — i.e., wrongness would force rework of architecture, the release decision, or a security/compliance posture (so the pre-go-live gate keys off the right set)?
  • For each flagged assumption, who or what system holds the authoritative answer, and is that answer expected before go-live, or is it acceptable to explicitly accept the assumption with risk?
  • What is the canonical artifact→owning-brick map the cascade should route re-reviews against, and who maintains it?
  • What is the minimum cadence at which the human-INPUT channel will supply answers, so the executor knows when an unconfirmed HIGH assumption has become a schedule risk rather than a pending one?
Do (7)
  • Re-derive every displayed number from underlying brick receipts at read time; treat author/brick summaries as unverified until the receipt is re-tallied.
  • Surface OPEN material gaps, unconfirmed flagged assumptions (with blast radius), and refute auto-flags as three first-class, always-visible sections so nothing material can hide.
  • On a refute, auto-flag exactly the artifacts on that row's downstream-refs list and route each back to its OWNING brick's loop — let that loop, not this brick, do the re-review.
  • Make the HIGH-blast-radius rollup a single stable, machine-readable contract that review_release_readiness_iterate's gate reads directly.
  • Spot-check Verdict's convergence verdicts by re-deriving ≥3 receipt rows each, with a recorded actor who did not author the artifact.
  • Keep the cascade idempotent: re-flag only refs whose state changed; never duplicate an existing open flag.
  • Make every dashboard item link to a resolvable receipt id; if a receipt cannot be resolved, show the item as a gap, not as green.
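A minimal sketch of the re-derive-at-read-time rule in the list above: every dashboard item is resolved against the ledger, and anything that does not resolve is shown as a gap. The ledger shape (receipt id mapped to a record with a `kind` field), the section names, and both function names are hypothetical.

```python
from collections import Counter

SECTIONS = ("open_material_gap", "unconfirmed_assumption", "refute_flag")


def build_dashboard(ledger: dict, item_refs: list[str]) -> dict:
    """Place each referenced receipt in its section; unresolvable refs become gaps."""
    dashboard = {name: [] for name in SECTIONS}
    dashboard["unresolved"] = []
    for receipt_id in item_refs:
        receipt = ledger.get(receipt_id)
        if receipt is None or receipt["kind"] not in SECTIONS:
            # No backing receipt: surface it as a gap, never as green.
            dashboard["unresolved"].append(receipt_id)
        else:
            dashboard[receipt["kind"]].append(receipt_id)
    return dashboard


def matches_retally(dashboard: dict, ledger: dict) -> bool:
    """Compare displayed counts against an independent tally of the ledger."""
    tally = Counter(record["kind"] for record in ledger.values())
    return all(len(dashboard[name]) == tally[name] for name in SECTIONS)
```

The separate `matches_retally` check mirrors the acceptance test that displayed counts must equal an independent re-tally of the source receipts.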
Don't (7)
  • Do NOT add a human approval/sign-off gate — the human supplies INPUT (answers) only; this brick blocks on no human decision.
  • Do NOT run a new review loop, re-adjudicate gaps, or author/fix downstream artifacts — route them back to the owning brick; this is a rollup + cascade router only.
  • Do NOT display any status, count, or 'resolved' that cannot be shown from an underlying receipt; never green-wash an item lacking a backing receipt.
  • Do NOT let the HIGH-blast-radius rollup read TRUE while any HIGH assumption is OPEN, and do not silently downgrade a HIGH assumption to clear the gate.
  • Do NOT let Verdict spot-check (or convergence-verify) an artifact Verdict itself authored — preserve independence.
  • Do NOT absorb a spot-check discrepancy silently — every discrepancy becomes an OPEN material gap on the dashboard.
  • Do NOT widen scope into a parallel ledger or assumption store — read the canonical Assumption Register and brick receipts as the single source of truth.
Guardrails (7)
  • Honesty law: the rollup asserts only what it can re-derive from receipts; any unverifiable claim is shown as a gap, never as done — Atlas-style 'show the receipt, never "it works."'
  • Single source of truth: the canonical Assumption Register and per-brick ledger receipts are authoritative; this brick maintains no competing copy.
  • Independence / non-capture: Verdict owns this rollup but is barred from spot-checking or convergence-verifying any artifact it authored; the spot-check actor is recorded and must differ from the artifact author.
  • No-human-gate invariant: zero owner=Human blocking steps; the human INPUT channel informs and never approves.
  • Cascade completeness: a refuted HIGH row, or any refuted row with downstream-refs, MUST produce a routed re-review flag for every ref before the pre-go-live assumption gate can read TRUE.
  • Gate contract stability: the 'all HIGH resolved-or-accepted' boolean and its backing list are exposed under a stable id; changing its semantics requires updating review_release_readiness_iterate's gate in lockstep.
  • Idempotency & auditability: every flag, route, and rollup read is logged with receipt ids, timestamps, and actor, and re-runs produce no duplicate flags.
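The HIGH-blast-radius gate contract referred to in the guardrails above reduces to a small pure function. In this sketch the register row fields `blast_radius` and `state`, the state labels, and the function name are assumptions chosen to match the wording of the rollup deliverable.

```python
PASSING_STATES = {"resolved", "explicitly-accepted"}


def high_blast_radius_rollup(register: list[dict]) -> tuple[bool, list[tuple[str, str]]]:
    """Return (all_clear, backing_list) for the pre-go-live assumption gate.

    all_clear is True iff no HIGH-blast-radius row is in a non-passing state.
    """
    backing = [
        (row["id"], row["state"])
        for row in register
        if row["blast_radius"] == "HIGH"
    ]
    all_clear = all(state in PASSING_STATES for _, state in backing)
    return all_clear, backing
```

Returning the backing list alongside the boolean lets the release-readiness gate re-derive the signal row-for-row instead of trusting the flag.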
05

Hypothesis-Driven Requirements Discovery (requirements.json + PRD.md)

AI+Human Agent: Vela

Now that the Team & Operating-System Charter exists (assemble_team), Vela runs hypothesis-driven discovery against the ALREADY-COMMITTED machinery: the canonical Objectives & Success-Metric Rubric, the run-spanning Assumption Register, the Sponsor Question Log, and the Independent-Review Protocol. Discovery is framed as falsifiable hypotheses about vision, personas, jobs-to-be-done, prioritized journeys, scope, NFRs, data, integrations, constraints, risks, and measurable success; each hypothesis is resolved by a sponsor receipt or by an explicitly flagged Assumption Register row — never by an unsourced assertion. Crucially, Vela does NOT mint a new register/scope/NFR/conflict schema: it EXTENDS-IN-PLACE the canonical onboard artifacts by their stable IDs (adding/refining rows on the same Assumption Register and Sponsor Question Log) and maps every success metric back to a canonical Objectives-Rubric entry ID. The sponsor is a CONTINUOUS, NON-BLOCKING input channel: they seed discovery and answer throughout, but discovery never waits on approval; on silence past the cadence window after N timestamped surfacing attempts, Vela proceeds on a logged, default-valued, FLAGGED assumption and surfaces it at the next review. Every FR carries a stable ID, priority, provenance source, and a binary acceptance criterion; every NFR carries a numeric threshold (number+unit+condition) or an explicit "N/A because…". This is a kind=work authoring brick: its sole output is the requirements record shaped to feed the downstream review_spec_iterate loop — it does not gate, approve, or sign off anything. "Done" is exactly when the binary completeness checklist passes, and that checklist result IS the receipt.
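The per-requirement rules stated above (every FR has an id, priority, provenance source, and binary acceptance criterion; every NFR has a numeric threshold or an explicit N/A reason) could be linted roughly like this. The field names `threshold`, `na_reason`, and the two function names are assumptions, not the actual requirements.json schema.

```python
def lint_fr(fr: dict) -> list[str]:
    """Report required FR fields that are missing or empty."""
    problems = []
    for field in ("id", "priority", "source", "acceptance_criteria"):
        if not fr.get(field):
            problems.append(f"{fr.get('id', '?')}: missing {field}")
    return problems


def lint_nfr(nfr: dict) -> list[str]:
    """Require a {number, unit, condition} threshold or an explicit N/A reason."""
    threshold = nfr.get("threshold")
    if threshold is not None:
        complete = (
            isinstance(threshold.get("number"), (int, float))
            and bool(threshold.get("unit"))
            and bool(threshold.get("condition"))
        )
        if complete:
            return []
        return [f"{nfr['category']}: threshold needs number, unit and condition"]
    if nfr.get("na_reason", "").strip():
        return []
    # A bare adjective such as 'fast' ends up here: no threshold, no N/A reason.
    return [f"{nfr['category']}: neither a threshold nor an N/A reason"]
```

An entry like `{"category": "performance", "statement": "fast"}` fails the lint, while a threshold of 300 ms at p95 under a stated load, or an explicit N/A reason, passes.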

Deliverables
[AI · Vela] requirements.json — versioned, schema-conformant requirements record with REQUIRED top-level sections: vision, personas, jobs_to_be_done, journeys, functional_requirements, nfrs, data_and_classification, integrations, constraints, risks, scope, success_metrics, open_questions, provenance_index, version. NOTE: assumption_register and conflict/question logs are NOT re-emitted here — they are referenced by canonical ID into the run-spanning artifacts opened in onboard_brief_and_input_channel.
Acceptance: File exists and the schema-validation lint exits 0 confirming every required top-level section is present and non-empty (or explicitly N/A-with-reason), AND a no-shadow lint exits 0 confirming requirements.json declares NO local assumption_register/conflict_log/sponsor_question_log of its own (those must be canonical-ID references); both validation exit codes attached as the receipt.
[AI · Vela] PRD.md — human-readable render generated FROM requirements.json (single source of truth), with every functional requirement, NFR, success metric, and referenced assumption showing its stable ID and provenance label, and each success metric showing the canonical Objectives-Rubric entry ID it maps to.
Acceptance: PRD.md is regenerated deterministically from requirements.json (re-render produces no diff except the regenerated timestamp); every FR/NFR/success-metric ID in the JSON appears exactly once in the render; and every referenced assumption-id and rubric-entry-id in the render resolves to an existing row in the canonical Assumption Register / Objectives Rubric. Diff-check and ID-resolution exit codes attached.
[AI · Vela] Functional Requirements set — each an object {id (stable, e.g. FR-012), statement, priority (P0–P3 or MoSCoW), source (sponsor-receipt-id | assumption-id | inferred), acceptance_criteria (binary/testable), status}.
Acceptance: A lint confirms ZERO functional requirements lack a non-empty source AND a binary acceptance_criteria, AND every source of type assumption-id resolves to an existing canonical Assumption Register row (no dangling references). Lint output (count checked, 0 violations, 0 dangling refs) attached as receipt.
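A minimal sketch of this lint, assuming each requirement stores its source as an object `{"type": ..., "ref": ...}` (that shape, the function name, and the output keys are hypothetical). Whether a criterion is truly binary cannot be decided mechanically here, so the sketch only checks that criteria are present.

```python
def lint_functional_requirements(frs: list, register_ids: set) -> dict:
    """Check every FR for a source, acceptance criteria, and resolvable assumption refs."""
    violations, dangling = [], []
    for fr in frs:
        source = fr.get("source") or {}
        if not source.get("type"):
            violations.append(f"{fr['id']}: empty source")
        if not fr.get("acceptance_criteria"):
            violations.append(f"{fr['id']}: no acceptance criteria")
        # An assumption-sourced FR must point at an existing canonical register row.
        if source.get("type") == "assumption" and source.get("ref") not in register_ids:
            dangling.append(f"{fr['id']}: assumption {source.get('ref')} not in register")
    return {
        "checked": len(frs),
        "violations": violations,
        "dangling_refs": dangling,
        "exit_code": 0 if not (violations or dangling) else 1,
    }
```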
[AI · Vela] NFR set — one entry per category (performance, scale, availability, security, privacy, accessibility, compliance), each with either a concrete threshold {number, unit, condition} or an explicit "N/A because …". Each NFR is tagged to the canonical Objectives-Rubric entry it supports (or marked rubric-orthogonal-with-reason).
Acceptance: A vague-adjective lint passes: every NFR category is present and resolves to either a numeric threshold with unit+condition OR an explicit N/A-with-reason; no category is missing and no NFR uses an unquantified adjective alone ('fast','scalable'); and every NFR's rubric tag resolves to a real Objectives-Rubric entry id or carries an explicit rubric-orthogonal reason. Lint result attached.
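One way to picture the vague-adjective lint: a bare statement such as "fast" fails because it carries no numeric threshold and no N/A reason. The category list is taken from the deliverable; the field names `threshold`, `na_reason`, `rubric_entry_id`, and `rubric_orthogonal_reason` are assumptions made for this sketch.

```python
NFR_CATEGORIES = ["performance", "scale", "availability", "security",
                  "privacy", "accessibility", "compliance"]


def _has_numeric_threshold(entry: dict) -> bool:
    """True when the entry carries number + unit + condition."""
    threshold = entry.get("threshold") or {}
    number = threshold.get("number")
    is_number = isinstance(number, (int, float)) and not isinstance(number, bool)
    return is_number and bool(threshold.get("unit")) and bool(threshold.get("condition"))


def lint_nfrs(nfrs: dict, rubric_ids: set) -> list:
    """Return one problem string per category that is missing, vague, or untagged."""
    problems = []
    for category in NFR_CATEGORIES:
        entry = nfrs.get(category)
        if entry is None:
            problems.append(f"{category}: missing")
            continue
        if not entry.get("na_reason") and not _has_numeric_threshold(entry):
            problems.append(f"{category}: needs number+unit+condition or an N/A reason")
        if entry.get("rubric_entry_id") not in rubric_ids and not entry.get("rubric_orthogonal_reason"):
            problems.append(f"{category}: rubric tag unresolved")
    return problems
```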
[AI · Vela] Assumption Register rows (EXTEND-IN-PLACE on the canonical run-spanning register from onboard_brief_and_input_channel) — each row/refinement carries the canonical schema {id, statement, default_value_used, why_assumed, source_gap, blast_radius (HIGH/MED/LOW), risk_if_wrong, owner, raised_at, validate_by, surfaced_to_sponsor[timestamps], status (open/confirmed/refuted/superseded)}; Vela appends new rows and refines existing rows by ID, never forking a parallel register.
Acceptance: Lint confirms (a) Vela added/edited rows ONLY on the canonical register (same artifact id as onboard's register; no new register object created), (b) every touched row has all required fields populated, and (c) every HIGH-blast-radius row is visibly flagged and has >=1 surfaced_to_sponsor timestamp; the canonical register's HIGH count (before vs after) is reported. Lint receipt attached.
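Parts (b) and (c) of this lint, plus the HIGH-count report, might look like the sketch below. The required field list mirrors the row schema in the deliverable; the function names are hypothetical, and the sketch assumes an empty `surfaced_to_sponsor` list is acceptable on rows that are not HIGH.

```python
REQUIRED_FIELDS = [
    "id", "statement", "default_value_used", "why_assumed", "source_gap",
    "blast_radius", "risk_if_wrong", "owner", "raised_at", "validate_by",
    "surfaced_to_sponsor", "status",
]


def lint_register_rows(touched_rows: list) -> list:
    """Check touched rows for missing fields and unsurfaced HIGH assumptions."""
    violations = []
    for row in touched_rows:
        row_id = row.get("id", "<no id>")
        missing = [f for f in REQUIRED_FIELDS if f not in row or row[f] in (None, "")]
        if missing:
            violations.append(f"{row_id}: missing {', '.join(missing)}")
        if row.get("blast_radius") == "HIGH" and not row.get("surfaced_to_sponsor"):
            violations.append(f"{row_id}: HIGH blast radius but never surfaced to sponsor")
    return violations


def high_count(register: list) -> int:
    """Number of HIGH-blast-radius rows, reported before and after the brick runs."""
    return sum(1 for row in register if row.get("blast_radius") == "HIGH")
```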
[Human] Sponsor answers — ongoing answers to surfaced discovery questions (building on the brief already captured at onboard), each stored as a referenceable transcript/message/document id and folded into the provenance_index; questions flow through the canonical Sponsor Question Log.
Acceptance: Every sponsor answer received during discovery is recorded with a provenance id present in provenance_index, and every discovery question raised is appended to the canonical Sponsor Question Log (same artifact id as onboard's log) rather than a new log; provenance_index is non-empty and question-log delta is reported.
[AI · Vela] Provenance / honesty-receipt index — maps every sponsor-sourced item to the transcript/message/document id it came from; every non-sponsor item is labeled 'assumption' (resolving to a canonical Assumption Register id) or 'inferred'.
Acceptance: Honesty lint passes with ZERO unsourced factual claims about sponsor intent: every FR/persona/journey/NFR/constraint/success-metric with source=sponsor resolves to a real provenance id in provenance_index, and every source=assumption resolves to a real canonical Assumption Register id. Lint output attached as receipt.
[AI · Vela] Scope statement — in_scope, out_of_scope (each with rationale), deferred/Phase-2 bucket, and an explicit non_goals list, authored as scope.* sections of requirements.json that reconcile against (and do not contradict) the Non-Goals already captured in the onboard Sponsor Brief.
Acceptance: requirements.json contains scope.in_scope, scope.out_of_scope (every entry has non-empty rationale), scope.deferred, and non_goals; a lint confirms no out_of_scope entry has an empty rationale AND every onboard Sponsor-Brief Non-Goal appears in scope.out_of_scope/non_goals or has a logged conflict_log entry explaining the change. Lint pass/fail attached.
[AI · Vela] Data & Classification + Integrations record — data sources with sensitivity classification (PII/PHI/none), residency, and regulatory regime (GDPR/HIPAA/SOC2 scope), plus each external integration with direction and dependency, captured as first-class structured fields so Cipher and Keystone have a basis for AppSec/architecture review.
Acceptance: If any data item is classified PII or PHI, a data_classification block with residency and regulatory_regime is present and non-empty; lint enforces this conditional and reports pass/fail.
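The conditional in this acceptance criterion only fires when sensitive data is present. A minimal sketch, assuming each data item has `name` and `sensitivity` fields and that the classification block is passed in separately (the function name and both shapes are hypothetical):

```python
SENSITIVE = {"PII", "PHI"}


def lint_data_classification(data_items: list, classification_block) -> list:
    """Require residency and regulatory_regime whenever any item is PII or PHI."""
    sensitive = [d["name"] for d in data_items if d.get("sensitivity") in SENSITIVE]
    if not sensitive:
        return []  # the conditional does not trigger without PII/PHI
    block = classification_block or {}
    return [
        f"{field} missing while sensitive data present: {', '.join(sensitive)}"
        for field in ("residency", "regulatory_regime")
        if not block.get(field)
    ]
```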
[AI · Vela] Conflict log rows (EXTEND-IN-PLACE on the canonical run-spanning conflict/question log) — every instance where new sponsor input contradicts an existing record item, recorded as {old, new, resolution or 'unresolved->open_question/assumption-id'} on the canonical log; never a silent overwrite, never a forked log.
Acceptance: Lint confirms conflict rows were appended to the canonical log object (same artifact id as onboard's, not a new one), and every Vela-touched entry has old, new, and either a resolution or a link to an existing open_question/Assumption Register id; zero entries with a missing resolution field. Receipt attached.
[AI · Vela] Red-team self-check note — short internal devil's-advocate pass (Vela paired with Iris on personas/journeys) listing challenged unstated assumptions, happy-path bias, missing failure modes, and missing personas, each marked addressed or logged. This is a cheap pre-pass to cut churn; it is NOT the independent review — that is the downstream review_spec_iterate loop run under the canonical Independent-Review Protocol.
Acceptance: challenges-addressed note exists and references concrete record IDs (canonical assumption-id / FR-id / journey-id) for each challenge raised; every listed challenge has a disposition (addressed | assumption-logged | open-question). Note explicitly states it is authoring-side self-check, not a ConvergenceVerdict.
[AI · Vela] Brick completeness/exit checklist — binary gate that is itself the brick's done-receipt and the input contract for review_spec_iterate.
Acceptance: All true: every in-scope journey has >=1 happy + >=1 failure path; every NFR category answered-or-N/A; every persona maps to >=1 job; every open_question is answered-or-has-a-logged-canonical-assumption; data_classification present if any PII/PHI; every success metric is measurable AND maps to a canonical Objectives-Rubric entry id; assumption/conflict/question rows live on the canonical registers (no-shadow lint clean); honesty/source/NFR/assumption/no-shadow lints all exit 0. Checklist output (all checks PASS) attached as the receipt; brick is not marked done otherwise.
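To illustrate how the checklist can double as the done-receipt, the sketch below evaluates a subset of the conditions and derives `done` purely from them. The field names (`in_scope`, `happy_paths`, `failure_paths`, `persona_ids`, `answer`, `assumption_id`, `rubric_entry_id`) and the function name are assumptions; the remaining checklist items would be added the same way.

```python
def completeness_checklist(record: dict, lint_exit_codes: dict) -> dict:
    """Evaluate a subset of the binary exit checks; done only if all pass."""
    checks = {
        "journeys_have_happy_and_failure": all(
            bool(j.get("happy_paths")) and bool(j.get("failure_paths"))
            for j in record["journeys"] if j.get("in_scope", True)
        ),
        "personas_map_to_job": all(
            any(p["id"] in job.get("persona_ids", []) for job in record["jobs_to_be_done"])
            for p in record["personas"]
        ),
        "open_questions_resolved": all(
            bool(q.get("answer")) or bool(q.get("assumption_id"))
            for q in record["open_questions"]
        ),
        "metrics_rubric_mapped": all(
            bool(m.get("rubric_entry_id")) for m in record["success_metrics"]
        ),
        "lints_exit_zero": all(code == 0 for code in lint_exit_codes.values()),
    }
    return {"checks": checks, "done": all(checks.values())}
```

Because `done` is computed from the individual results rather than set by the author, the serialized output can be attached as the receipt without a separate assertion of completeness.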
Questions the agent asks (10)
  • What outcome are you trying to accomplish, and how will you know it worked — what does success look like in measurable terms (a number, a rate, a time) that we can bind to your existing Objectives & Success-Metric Rubric entries?
  • Who are the primary user personas and what specific job is each trying to get done?
  • Walk us through the most important user journeys: what is the ideal happy path, and what should happen when things go wrong (errors, edge cases, empty/abuse states)?
  • What is explicitly OUT of scope or a non-goal for this build, and what is acceptable to defer to a later phase — and does that match the Non-Goals you gave us at intake, or has anything changed?
  • What data will the system touch — does any of it include PII or PHI, where must it reside, and what regulatory regimes apply (GDPR, HIPAA, SOC2, other)?
  • What external systems must we integrate with, in which direction, and are there existing contracts/credentials/rate limits we must respect?
  • What are your hard non-functional expectations — target latency, concurrent users/throughput, availability/uptime, accessibility level, and security posture, each as a concrete number?
  • What hard constraints exist — budget, deadline, technology mandates, existing infrastructure, or organizational policies — beyond the typed constraints already in the brief?
  • What are the biggest risks or past failures you want us to avoid, and which requirements are non-negotiable P0 versus nice-to-have?
  • Where you cannot answer right now: are you comfortable with us proceeding on a flagged default assumption (logged on the running Assumption Register) that we surface for you to confirm or correct at the next review?
Do (14)
  • Run discovery AGAINST the already-committed machinery: bind every success metric to a canonical Objectives & Success-Metric Rubric entry id, append/refine the canonical Assumption Register, route questions through the canonical Sponsor Question Log, and shape output for the canonical Independent-Review Protocol — all of which already exist because this brick runs after assemble_team.
  • EXTEND-IN-PLACE: add or refine rows on the canonical registers by their stable IDs; never fork a parallel Assumption Register, conflict log, or question log with a different schema.
  • Frame discovery as falsifiable hypotheses and record, per hypothesis, whether it was resolved by a sponsor receipt or by a flagged canonical Assumption Register entry.
  • Assign every functional requirement a stable ID, a priority, a provenance source, and its own binary/testable acceptance criterion so Proof can later test it and Keystone can size it.
  • Attach a concrete numeric threshold (number + unit + condition) to every NFR, or an explicit 'N/A because …'; never accept a bare adjective like 'fast' or 'scalable'.
  • Cite a provenance receipt (transcript/message/document id) for every sponsor-sourced item; label every non-sponsor item as 'assumption' (resolving to a canonical register id) or 'inferred'.
  • Treat the sponsor as a continuous, non-blocking input channel: surface questions any time, fold in new info, update the record — but never pause discovery waiting for approval.
  • On sponsor silence past the cadence window after N timestamped surfacing attempts, convert the open question into a logged, default-valued canonical assumption with blast_radius and surface it at the next review.
  • Capture happy path AND at least one failure/edge path for every in-scope journey.
  • Capture data sensitivity (PII/PHI), residency, and regulatory regime as first-class structured fields so Cipher has a basis for AppSec/privacy review.
  • Log every contradiction on the canonical conflict log with old vs new and a resolution; turn unresolved conflicts into open questions or flagged canonical assumptions.
  • Run the cheap internal red-team/devil's-advocate self-check (with Iris on personas/journeys) BEFORE handing to review_spec_iterate, to cut downstream churn — and label it self-check, not a ConvergenceVerdict.
  • Run the schema/source/NFR/assumption/honesty/no-shadow lints and the completeness checklist, and treat their exit codes as the brick's done-receipt.
  • Shape the output explicitly for the downstream review_spec_iterate loop so Verdict's panel has binary, receipt-backed checks to re-derive against.
Don't (13)
  • Do NOT re-emit a new Assumption Register, scope set, NFR set, conflict log, or success-metric schema with a different shape — the canonical artifacts from onboard already exist; extend them by ID.
  • Do NOT invent your own success-metric definition; every success metric must map to an existing canonical Objectives & Success-Metric Rubric entry id (or propose a new rubric row via the canonical mechanism, never a private one).
  • Do NOT introduce any human approval, sign-off, or gate — this brick authors; it does not seek sponsor sign-off.
  • Do NOT block on sponsor latency; silence becomes a flagged canonical assumption, not a stall.
  • Do NOT assert sponsor intent without a provenance receipt — an unsourced factual claim about what the sponsor wants is invalid and must fail the honesty lint.
  • Do NOT let a HIGH-blast-radius assumption silently become load-bearing for architecture; it must be flagged on the canonical register and surfaced at the next review.
  • Do NOT accept vague NFRs (no bare 'fast/scalable/secure') or functional requirements lacking a binary acceptance criterion.
  • Do NOT silently overwrite an earlier record item when new input contradicts it — log it on the canonical conflict log.
  • Do NOT ship prose-only themes; the deliverable must be the schema'd, ID'd, receipt-cited requirements.json with a deterministic PRD.md render.
  • Do NOT treat the internal red-team self-check as the independent review; the independent panel runs in the downstream review_spec_iterate brick under the canonical Independent-Review Protocol.
  • Do NOT mark the brick done unless the completeness checklist and all lints (including no-shadow) pass (exit 0).
  • Do NOT bundle privacy/compliance/data-classification away as a vague NFR footnote; capture it as first-class structured fields.
  • Do NOT invent personas, journeys, or metrics and present them as sponsor-stated; label inferred items as 'inferred'.
Guardrails (10)
  • Canonical-extension gate (no-shadow): requirements.json must NOT declare its own assumption_register, conflict_log, or sponsor_question_log; assumptions/conflicts/questions live ONLY as rows on the canonical run-spanning artifacts opened in onboard_brief_and_input_channel, and every reference must resolve to a real canonical id — enforced by the no-shadow lint before handoff.
  • Rubric-binding gate: every success metric resolves to an existing canonical Objectives & Success-Metric Rubric entry id; an unmapped success metric fails the completeness checklist.
  • Honesty gate: the artifact must contain ZERO unsourced factual claims about sponsor intent; every sponsor-sourced item resolves to a real provenance id (and every assumption-sourced item to a real canonical register id) or the brick fails.
  • Non-blocking cadence is auditable: 'proceeded on assumption' is only valid when backed by N timestamped surfacing attempts on the canonical Sponsor Question Log plus a canonical Assumption Register entry with a default_value — those timestamps are the receipt.
  • HIGH-blast-radius assumptions must be visibly flagged on the canonical register, carry a validate_by date, and be explicitly surfaced at the next review; architecture must not silently depend on an unconfirmed HIGH assumption.
  • Every functional requirement must have a source receipt AND a binary acceptance criterion; every NFR category must have a numeric threshold or explicit N/A-with-reason — enforced by lint before handoff.
  • Completeness checklist is a hard gate: the brick cannot be marked done until every in-scope journey has happy+failure paths, every NFR is answered/N/A, every persona maps to a job, every open question is answered-or-canonical-assumption-logged, data classification is present where PII/PHI exists, success metrics are measurable and rubric-mapped, and the no-shadow/honesty/source/NFR/assumption lints exit 0 — and that checklist output is the done-receipt.
  • Conflicts are never silently resolved: every contradiction is logged on the canonical conflict log with old/new and a resolution or an escalation to open-question/canonical-assumption.
  • Ownership is AI+Human, with the Human as INPUT only: the Human provides information and answers and never approves. There is no sign-off in this brick.
  • Output contract is fixed: requirements.json conforms to the required schema and PRD.md is deterministically rendered from it; the brick emits into the downstream review_spec_iterate review_loop brick (Verdict-led panel under the canonical Independent-Review Protocol) and does not itself review or gate.
06

Product Spec (PRD), Backlog & Roadmap

AI Agent: Vela

Synthesize the confirmed discovery requirements into a decision-ready Product Requirements Document plus a prioritized backlog of INVEST stories, each carrying machine-checkable acceptance criteria, a MoSCoW priority with a one-line value/effort (cost-of-delay) justification, a rough size, and an explicit MVP cut with a stated hypothesis and the single metric that will validate it post-launch. This brick is AUTHORING (kind=work): its job is to produce an artifact engineered to be FALSIFIED by the downstream independent review_loop, not admired. It consumes the upstream discovery record (requirements.json: stably-ID'd functional requirements and NFRs tagged confirmed-vs-inferred) and never invents unsourced requirements. CRITICAL CHANGE FROM PRIOR VERSION: this brick does NOT fork its own parallel assumption or NFR registers. It EXTENDS-IN-PLACE the single canonical, run-spanning Assumption Register opened in onboard (and enriched in requirements_discovery) BY ID, and the single canonical NFR/cross-cutting set in requirements.json#/nfrs BY ID. Any assumption this brick raises is added as a new row in that one canonical register using the SAME assumptions-as-liabilities schema declared in onboard {id, statement, criticality, blocks_bricks[], confidence, status, raised_to_sponsor, revalidation_trigger} PLUS the shared liability fields {owner, question_asked_to_human (= the Sponsor Question Log entry), default_on_silence, blast_radius, falsification_test}; there is exactly ONE assumption schema across the run, not a prd-local variant. Likewise any new NFR/cross-cutting requirement is appended to requirements.json#/nfrs under a stable NFR-ID with {numeric target+unit+condition, named verification method, owning brick/agent}, never re-declared in a prd-local NFR register. 
The human (sponsor) is a CONTINUOUS input channel surfaced through the canonical Sponsor Question Log, but is NEVER a sign-off gate: where the sponsor is silent the team proceeds on an explicit, flagged canonical-register assumption. The brick is "done" only when a machine-generated prd_lint receipt is green over the CANONICAL artifacts (zero OPEN requirements, every story has >=1 testable AC, every NFR has a numeric target + named verification method + owning brick/agent, zero orphan requirements, zero owner/falsifier-less assumptions, AND zero schema-fork violations: zero assumptions outside the canonical register, zero NFRs outside requirements.json#/nfrs) and the review-ready package has been emitted to the following review_spec_iterate brick. Every claim of self-bar passage is backed by the lint JSON receipt, never asserted.

Deliverables
[AI · Vela] Product Requirements Document (PRD): problem statement, goals bound to the canonical Single Success Metric + Objectives Rubric (referenced by rubric ID, not re-stated), personas, in-scope vs out-of-scope, MVP cut with stated hypothesis + the single post-launch validation metric, dependencies, and a non-functional/cross-cutting section that REFERENCES canonical NFR-IDs rather than redefining them.
Acceptance: PRD exists at the sprint path; the prd_lint receipt records requirements_total>0 and requirements_open=0; the MVP section names >=1 hypothesis and exactly 1 validation metric resolving to the canonical primary success metric ID; every MoSCoW tier present in scope carries a one-line value/effort or cost-of-delay justification (lint flag moscow_without_justification=0); the PRD's NFR section cites canonical NFR-IDs and declares zero inline NFR definitions (lint flag nfrs_defined_outside_canonical=0).
[AI · Vela] Prioritized INVEST backlog: each story with a stable story_id, MoSCoW priority + justification, rough size estimate, and >=1 acceptance criterion in structured Given/When/Then (or decision-table) form with a stable AC-ID bindable by Proof to a future test ID.
Acceptance: prd_lint records stories_total>0, stories_without_testable_AC=0, and acs_unobservable=0 (no AC phrased as an unobservable adjective like fast/intuitive/secure without a number or observable event); every story has size!=null and a MoSCoW tier; every AC-ID is unique and references exactly one story_id.
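The counts named here could be produced by a heuristic like the one below. It is only a sketch: the AC shape `{ac_id, given, when, then}`, the vague-word list, and the rule "a vague word with no digit in the Then clause is unobservable" are assumptions, and a real lint would also need to recognise observable events that carry no number.

```python
import re

VAGUE_WORDS = {"fast", "intuitive", "secure", "robust"}


def is_unobservable(then_clause: str) -> bool:
    """Heuristic: a vague adjective with no number anywhere in the clause."""
    words = set(re.findall(r"[a-z]+", then_clause.lower()))
    return bool(words & VAGUE_WORDS) and not re.search(r"\d", then_clause)


def lint_acceptance_criteria(stories: list) -> dict:
    """Produce the AC-related counts that feed the prd_lint receipt."""
    counts = {"stories_total": len(stories),
              "stories_without_testable_AC": 0,
              "acs_unobservable": 0}
    for story in stories:
        acs = story.get("acceptance_criteria", [])
        # Testable here means structurally complete and not flagged as unobservable.
        testable = [
            ac for ac in acs
            if all(ac.get(k) for k in ("ac_id", "given", "when", "then"))
            and not is_unobservable(ac["then"])
        ]
        if not testable:
            counts["stories_without_testable_AC"] += 1
        counts["acs_unobservable"] += sum(
            1 for ac in acs if is_unobservable(ac.get("then", "")))
    return counts
```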
[AI · Vela] Bidirectional trace matrix (schema: requirement_id, source, confirmed|inferred, story_ids[]) linking every story to a discovery requirement_id (from requirements.json) or a canonical assumption_id (from the run-spanning Assumption Register), AND every confirmed requirement to >=1 story.
Acceptance: prd_lint records requirements_with_no_story (orphans)=0 and stories_with_no_requirement_or_assumption=0; every requirement_id resolves to an ID present in the consumed requirements.json, and every assumption_id resolves to a row in the canonical Assumption Register (lint flag dangling_ids=0 and assumption_ids_outside_canonical_register=0 — no prd-local assumption IDs).
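A sketch of how the bidirectional trace counts could be derived mechanically. The count names come from the acceptance text; the function name and the story and requirement fields (`requirement_ids`, `assumption_ids`, `confirmation`) are hypothetical, and orphans are counted over confirmed requirements only, following the deliverable's wording.

```python
def trace_counts(requirements: list, stories: list, register_ids: set) -> dict:
    """Count orphan requirements, unfounded stories, and unresolved references."""
    requirement_ids = {r["id"] for r in requirements}
    covered = set()
    dangling = outside = unfounded = 0
    for story in stories:
        reqs = story.get("requirement_ids", [])
        assumptions = story.get("assumption_ids", [])
        if not reqs and not assumptions:
            unfounded += 1  # story traces to nothing
        for rid in reqs:
            if rid in requirement_ids:
                covered.add(rid)
            else:
                dangling += 1  # points at an ID absent from requirements.json
        outside += sum(1 for a in assumptions if a not in register_ids)
    orphans = [r["id"] for r in requirements
               if r.get("confirmation") == "confirmed" and r["id"] not in covered]
    return {
        "requirements_with_no_story": len(orphans),
        "stories_with_no_requirement_or_assumption": unfounded,
        "dangling_ids": dangling,
        "assumption_ids_outside_canonical_register": outside,
    }
```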
[AI · Vela] Assumption Register DELTA — new/updated rows APPENDED IN PLACE to the single canonical run-spanning Assumption Register (opened in onboard, enriched in requirements_discovery), each conforming to the ONE canonical schema {id, statement, criticality, blocks_bricks[], confidence, status, raised_to_sponsor, revalidation_trigger} PLUS the shared liability fields {owner, question_asked_to_human, default_on_silence, blast_radius, falsification_test}. No parallel/prd-local register is emitted.
Acceptance: prd_lint records assumptions_register_count==1 (exactly one register file/section exists for the run; no second register), assumptions_with_foreign_schema=0 (every row carries the full canonical field set + shared liability fields), assumptions_without_owner_or_falsifier=0, and assumptions_without_default_on_silence=0; every inferred requirement this brick relies on without human confirmation appears as a canonical register row with non-empty falsification_test and default_on_silence, and every such row's question_asked_to_human resolves to a canonical Sponsor Question Log entry ID.
[AI · Vela] NFR / cross-cutting completeness DELTA — runs the standing checklist (authn/authz, per-client data isolation, retention/privacy, audit/ledger, observability/SLOs, accessibility, i18n, error/empty/loading states, rate limits, abuse cases) and for each item EXTENDS requirements.json#/nfrs IN PLACE by stable NFR-ID with {numeric target+unit+condition, named verification method, owning brick/agent} OR marks an explicit justified not-applicable; never re-declares NFRs in a prd-local register.
Acceptance: prd_lint records nfrs_total>0, nfrs_missing_numeric_target_or_method=0, nfrs_missing_owner_brick_or_agent=0, crosscutting_items_unaddressed=0 (every checklist item is requirement-or-justified-NA, none silent), AND nfr_registers_count==1 with nfrs_defined_outside_canonical=0 (every NFR lives under requirements.json#/nfrs by ID — no parallel set); each security/privacy/isolation NFR names Cipher as owner and each SLO/availability NFR names Vector; new NFR-IDs do not collide with existing requirements.json#/nfrs IDs (lint flag nfr_id_collision=0).
[AI · Cadence] prd_lint validator + machine-generated receipt (JSON) emitting all self-bar counts: requirements_total, requirements_open, stories_total, stories_without_testable_AC, acs_unobservable, nfrs_total, nfrs_missing_numeric_target_or_method, nfrs_missing_owner_brick_or_agent, requirements_with_no_story, stories_with_no_requirement_or_assumption, assumptions_without_owner_or_falsifier, crosscutting_items_unaddressed, PLUS schema-fork guards assumptions_register_count, nfr_registers_count, assumptions_with_foreign_schema, assumption_ids_outside_canonical_register, nfrs_defined_outside_canonical, nfr_id_collision, dangling_ids — plus a commit hash.
Acceptance: prd_lint runs and writes a receipt JSON; the receipt is GREEN (requirements_open=0, stories_without_testable_AC=0, acs_unobservable=0, nfrs_missing_numeric_target_or_method=0, nfrs_missing_owner_brick_or_agent=0, requirements_with_no_story=0, stories_with_no_requirement_or_assumption=0, assumptions_without_owner_or_falsifier=0, crosscutting_items_unaddressed=0, AND assumptions_register_count==1, nfr_registers_count==1, assumptions_with_foreign_schema=0, assumption_ids_outside_canonical_register=0, nfrs_defined_outside_canonical=0, nfr_id_collision=0, dangling_ids=0); the receipt carries the commit hash of the package it validated.
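The GREEN condition above is a plain conjunction over the receipt's counts, which can be sketched as follows. The count names are taken from the acceptance text; the function name and the `commit_hash` key are assumptions, and a missing count is treated as a failure rather than as zero.

```python
MUST_BE_ZERO = [
    "requirements_open", "stories_without_testable_AC", "acs_unobservable",
    "nfrs_missing_numeric_target_or_method", "nfrs_missing_owner_brick_or_agent",
    "requirements_with_no_story", "stories_with_no_requirement_or_assumption",
    "assumptions_without_owner_or_falsifier", "crosscutting_items_unaddressed",
    "assumptions_with_foreign_schema", "assumption_ids_outside_canonical_register",
    "nfrs_defined_outside_canonical", "nfr_id_collision", "dangling_ids",
]
MUST_BE_ONE = ["assumptions_register_count", "nfr_registers_count"]


def receipt_failures(receipt: dict) -> list:
    """Return every reason the receipt is not green; an empty list means GREEN."""
    failures = [f"{key}={receipt.get(key)!r} (expected 0)"
                for key in MUST_BE_ZERO if receipt.get(key) != 0]
    failures += [f"{key}={receipt.get(key)!r} (expected 1)"
                 for key in MUST_BE_ONE if receipt.get(key) != 1]
    if not receipt.get("commit_hash"):
        failures.append("commit_hash missing")
    return failures
```

Returning the list of failing counts, rather than a bare boolean, gives a reviewer something concrete to re-derive against.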
[AI · Vela] Review-ready package handed to the following review_spec_iterate brick: PRD + backlog + bidirectional trace matrix + the canonical Assumption Register (with this brick's appended delta rows) + the canonical requirements.json#/nfrs (with this brick's appended NFR-IDs) + green prd_lint receipt, plus the named panel and lens assignments.
Acceptance: Package manifest lists all artifacts with their file paths and the green lint receipt's commit hash; manifest references the SINGLE canonical Assumption Register and SINGLE canonical NFR set (not prd-local copies); manifest names the panel (Verdict chair, plus Keystone=feasibility/hidden-architecture, Cipher=security/privacy/isolation completeness, Proof=AC-testability, Iris=UX/error-empty-state/accessibility, plus a red-team pass); brick is not marked done until the manifest references a green receipt.
Questions the agent asks (5)
  • What are the top 1-3 outcomes you want this product/increment to achieve, and how would you measure success (the metric that tells us it worked)?
  • Which capabilities are must-have for a first usable release versus nice-to-have later — is there a hard date, event, or commitment driving the MVP scope?
  • Are there known constraints we must design within: per-client data isolation requirements, regulated/sensitive data, retention or privacy rules, target user volume, or specific availability/latency expectations?
  • Which discovery requirements are confirmed by you versus inferred by the team — for any inferred ones, is our default-on-silence assumption acceptable, or do you want to correct it?
  • Who are the primary user types/personas, and are there accessibility, language/locale, or device constraints we must support from day one?
Do (9)
  • Consume requirements.json's stable requirement IDs and NFR-IDs as the source of truth; tag every requirement confirmed or inferred and trace each one forward to a story.
  • EXTEND-IN-PLACE the single canonical run-spanning Assumption Register by ID: append any new assumption as a row in THAT register using the one canonical schema {id, statement, criticality, blocks_bricks[], confidence, status, raised_to_sponsor, revalidation_trigger} plus the shared liability fields {owner, question_asked_to_human, default_on_silence, blast_radius, falsification_test} — never start a prd-local register with a different schema.
  • EXTEND-IN-PLACE requirements.json#/nfrs by stable NFR-ID for every new or refined cross-cutting requirement, with a numeric target+unit+condition, a named verification method, and an owning brick/agent (Cipher for security/privacy/isolation, Vector for SLO/availability, Proof for testability) — never re-declare NFRs in a separate prd register.
  • Write every acceptance criterion in structured Given/When/Then or decision-table form with a stable AC-ID, so Proof can later bind a test to it and prove coverage.
  • Maintain the trace matrix BIDIRECTIONALLY and run prd_lint to mechanically detect orphan requirements (requirement with no story), unfounded stories (story with no requirement or canonical assumption), and schema-fork violations (any assumption outside the canonical register, any NFR outside requirements.json#/nfrs).
  • Tie every question raised to the canonical Sponsor Question Log entry ID so each assumption's question_asked_to_human resolves to a real run-spanning log row, not a prd-local note.
  • Tie every MoSCoW assignment to a one-line value/effort or cost-of-delay justification so the ordering is defensible to an independent reviewer.
  • Generate the prd_lint receipt and treat the brick as done only when it is green (including the schema-fork guards assumptions_register_count==1 and nfr_registers_count==1); carry the receipt's commit hash into the review-ready package manifest.
  • When the sponsor is silent, proceed on an explicit flagged assumption logged in the CANONICAL register with a default_on_silence — keep moving, never block.
Don't (10)
  • Do not fork a parallel assumptions register or a parallel NFR register — assumptions live only in the single canonical run-spanning Assumption Register, and NFRs live only under requirements.json#/nfrs; a second register is a hard lint failure (assumptions_register_count!=1 or nfr_registers_count!=1).
  • Do not invent a prd-local assumption schema — every assumption row must carry the SAME canonical field set declared in onboard plus the shared liability fields; a row missing any of {owner, question_asked_to_human, default_on_silence, blast_radius, falsification_test} or any canonical field fails the lint.
  • Do not invent unsourced requirements — every requirement must trace to requirements.json or be logged as a canonical-register assumption with a falsification test.
  • Do not phrase any acceptance criterion as an unobservable adjective (fast, intuitive, secure, robust) without a number or an observable event.
  • Do not write an NFR target that no downstream brick/agent is on the hook to verify — a number with no owner is theater; and do not collide a new NFR-ID with an existing requirements.json#/nfrs ID.
  • Do not let cross-cutting concerns be OPEN-by-omission; the checklist must force each item to be addressed (as a canonical NFR-ID) or justified-NA.
  • Do not insert any human sign-off, approval, or acceptance gate: the human provides input and answers through the canonical Sponsor Question Log and never approves (this brick must NOT inherit the legacy 'Human acceptance' pattern in docs/processes/08-product-development.md).
  • Do not claim the self-bar passed without pointing to the green prd_lint JSON receipt.
  • Do not block waiting for a sponsor answer; log the question in the canonical Sponsor Question Log, take the default_on_silence, and proceed.
  • Do not let this brick attempt convergence/sign-off — its job ends at green receipt + package emitted; the following review_spec_iterate brick owns convergence.
Guardrails (10)
  • Honesty architecture: every 'done/green/complete' claim is backed by the machine-generated prd_lint receipt (counts + commit hash); no assertion without the receipt. Reviewers RE-DERIVE these counts, never accept the author summary.
  • SINGLE-REGISTER INVARIANT: there is exactly ONE Assumption Register and ONE NFR set for the whole run; this brick EXTENDS them in place by ID. prd_lint enforces assumptions_register_count==1, nfr_registers_count==1, assumptions_with_foreign_schema=0, assumption_ids_outside_canonical_register=0, nfrs_defined_outside_canonical=0, nfr_id_collision=0.
  • ONE ASSUMPTION SCHEMA: every assumption row uses the canonical schema declared in onboard {id, statement, criticality, blocks_bricks[], confidence, status, raised_to_sponsor, revalidation_trigger} PLUS shared liability fields {owner, question_asked_to_human, default_on_silence, blast_radius, falsification_test}; no prd-local variant schema is permitted.
  • Binary self-bar (machine-checked, not human-judged): requirements_open=0; stories_without_testable_AC=0; acs_unobservable=0; nfrs_missing_numeric_target_or_method=0; nfrs_missing_owner_brick_or_agent=0; requirements_with_no_story=0; stories_with_no_requirement_or_assumption=0; assumptions_without_owner_or_falsifier=0; crosscutting_items_unaddressed=0; plus all schema-fork guards green.
  • Input contract is mandatory: this brick requires requirements.json with stable confirmed-vs-inferred-tagged requirement IDs + NFR-IDs and the open canonical Assumption Register; if absent, surface the gap upstream rather than fabricating requirements or forking a new register.
  • AC format is fixed: structured Given/When/Then or decision-table, each with a unique stable AC-ID referencing exactly one story.
  • Standing cross-cutting/NFR checklist is non-optional and resolves only into canonical NFR-IDs: authn/authz, per-client data isolation, retention/privacy, audit/ledger, observability/SLOs, accessibility, i18n, error/empty/loading states, rate limits, abuse cases — each an explicit canonical NFR-ID or justified-NA, never silent.
  • No human gate anywhere; sponsor is a continuous input channel only, surfaced via the canonical Sponsor Question Log. Silence resolves to an explicit flagged canonical-register assumption with a default_on_silence, never to a block.
  • Terminal output is the review-ready package (PRD + backlog + bidirectional trace matrix + the canonical Assumption Register including this brick's delta + the canonical requirements.json#/nfrs including this brick's NFR-IDs + green lint receipt) emitted to the review_spec_iterate brick, with the named panel (Verdict chair + Keystone, Cipher, Proof, Iris + red-team) — convergence is owned by that following loop, not here.
  • No build artifacts outside an open sprint (CLAUDE.md Sprint Workflow).
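A minimal sketch of how the single-register guards listed above could be computed, assuming the run's registers are already loaded as plain Python lists of row dicts. The function name `lint_registers` and the input layout are hypothetical (the real prd_lint is not shown in this document); only the counter names and the canonical field names come from the text. The nfrs_defined_outside_canonical counter is left out because it would need the PRD text as input.

```python
# Hypothetical sketch: the input layout is an assumption, not the real prd_lint.
CANONICAL_FIELDS = {
    "id", "statement", "criticality", "blocks_bricks", "confidence", "status",
    "raised_to_sponsor", "revalidation_trigger",
    # shared liability fields
    "owner", "question_asked_to_human", "default_on_silence",
    "blast_radius", "falsification_test",
}


def lint_registers(assumption_registers, nfr_registers, referenced_assumption_ids):
    """Return the schema-fork counters plus an overall pass flag.

    assumption_registers / nfr_registers: lists of registers, each a list of row dicts.
    referenced_assumption_ids: assumption IDs cited anywhere in the PRD/backlog.
    """
    # Assumption: the first register found is treated as the canonical one.
    canonical_rows = assumption_registers[0] if assumption_registers else []
    canonical_ids = {row.get("id") for row in canonical_rows}
    all_rows = [row for reg in assumption_registers for row in reg]
    nfr_ids = [row.get("id") for reg in nfr_registers for row in reg]

    counters = {
        "assumptions_register_count": len(assumption_registers),
        "nfr_registers_count": len(nfr_registers),
        "assumptions_with_foreign_schema": sum(
            1 for row in all_rows if set(row) != CANONICAL_FIELDS
        ),
        "assumption_ids_outside_canonical_register": len(
            set(referenced_assumption_ids) - canonical_ids
        ),
        "nfr_id_collision": len(nfr_ids) - len(set(nfr_ids)),
    }
    # The two register counts must equal 1; every other counter must equal 0.
    expected = {"assumptions_register_count": 1, "nfr_registers_count": 1}
    passed = all(value == expected.get(name, 0) for name, value in counters.items())
    return counters, passed
```

Returning the raw counters alongside the pass flag keeps the lint receipt re-derivable: a reviewer can recompute each number instead of trusting a bare "green".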
07

Independent Review & Iterate: Spec/PRD

AI Agent: Verdict

Replace the legacy requirements human-sign-off gate with an AI-owned independent-review-and-iterate loop that converges only on evidence. A panel of NON-author agents, namely Verdict (objectives coverage + bidirectional traceability + adjudication), Keystone (feasibility), Cipher (security/privacy/abuse-case completeness), Proof (testability of every acceptance criterion), and Iris (UX completeness), plus an adversarial red-team pass, critiques the PRD/backlog line-by-line against a single FROZEN, VERSIONED review rubric derived from the 7 process objectives. Every gap (including every red-team finding) becomes a first-class GapLog entry with reviewer-owned severity; Vela fixes; the loop then re-reviews (a delta review of changed sections PLUS a regression check that prior SOLID verdicts still hold against the new artifact hash). The loop converges ONLY when open material gaps (severity >= major) == 0 AND every panel reviewer and the red-teamer record a receipt-backed SOLID verdict bound to one final artifact version+hash, with both traceability matrices complete. The human sponsor is NEVER an approval gate: when a gap needs human-only information the team logs an open question to the sponsor AND proceeds on an explicit flagged assumption recorded in the canonical Assumption Register, so the loop never stalls into a de-facto human gate. Vela may not review her own PRD; no reviewer may have authored or co-authored the PRD or a parent artifact it derives from. This brick is the canonical template every downstream review_loop brick (architecture, sprint-plan, sprint-accomplishments) copies, so independence, receipts, and mechanical convergence are set here.
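The "frozen, versioned" rubric and the "version+hash" binding described above could be implemented with a content hash over a canonical serialization. The following is a minimal sketch, assuming SHA-256 and JSON with sorted keys; the source does not name a hash algorithm or a serialization, and `content_hash` and `freeze_rubric` are hypothetical names.

```python
import hashlib
import json


def content_hash(obj) -> str:
    """SHA-256 over a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def freeze_rubric(rubric: dict, convergence_record: dict) -> str:
    """Record the rubric hash once, before any verdict exists.

    Raises if a verdict has already been cast, or if the rubric content
    no longer matches the hash that was frozen earlier.
    """
    if convergence_record.get("verdicts"):
        raise RuntimeError("rubric must be frozen before any verdict is cast")
    digest = content_hash(rubric)
    frozen = convergence_record.setdefault("rubric_hash", digest)
    if frozen != digest:
        raise RuntimeError("rubric changed after freeze")
    return frozen
```

The same `content_hash` helper could be applied to each PRD version, so that every verdict cites one value a reviewer can recompute.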

Deliverables
[AI · Verdict] Frozen, versioned Review Rubric (review_rubric.vN.json) — binary checks per dimension (objectives-coverage, feasibility, security/abuse-completeness, AC-testability, UX-completeness, brief→PRD traceability, requirement→AC traceability), each check phrased as a yes/no with an explicit pass condition, derived line-by-line from the 7 process objectives.
Acceptance: File exists, carries a version tag and content hash, contains >=1 binary check for EACH of the 7 named dimensions, and is referenced by hash in every reviewer verdict; every rubric line is a yes/no check with a stated pass condition (zero prose-only criteria). Frozen = hash recorded in the convergence record before any verdict is cast.
[AI · Proof] AC-Quality Standard + per-AC testability ledger (ac_quality_ledger.json): each acceptance criterion in the PRD assessed against the standard — observable, has a measurable threshold, has a named test oracle, no subjective terms ('fast','intuitive','robust').
Acceptance: Ledger lists every PRD acceptance-criterion id with pass/fail against each of the 4 standard checks and the cited test oracle; every AC that fails ANY check is auto-logged in the GapLog at severity >= major; Proof's SOLID verdict is invalid unless the ledger shows zero remaining failing ACs.
[AI · Verdict] Bidirectional Traceability Matrices: brief→PRD coverage (every sponsor-stated need maps to >=1 requirement, zero orphaned needs) AND requirement→AC coverage (every requirement maps to >=1 testable AC).
Acceptance: Both matrices exist as files; brief→PRD shows zero sponsor needs with no mapped requirement; requirement→AC shows zero requirements with no mapped AC; any orphan on either side is a GapLog entry at severity >= major; convergence cannot be declared while either matrix has an uncovered row.
[AI · Verdict] GapLog (gaplog.json) as the loop's auditable state machine: each entry has id, source (panel-reviewer | red-team | AC-standard | traceability), raising-agent, severity (minor|major|critical), status (open|fixed|verified|wont-fix), severity-change history, and (for wont-fix) a reviewer-recorded rationale.
Acceptance: Every gap — including EVERY red-team finding — appears as an entry with all fields populated; severity is set by the raising reviewer; any severity downgrade records who changed it (only the raising reviewer or Verdict-as-adjudicator) with timestamp; no entry sits in a status not in the allowed set; at convergence, zero entries are status=open with severity>=major.
[AI · Cadence] Assumption Register + Open-Questions-to-Sponsor channel (assumptions.json, sponsor_questions.json): each gap needing human-only info produces BOTH an open question to the sponsor AND an explicit flagged assumption the team proceeds on.
Acceptance: For every GapLog entry tagged needs-human-info there is exactly one matching open question to the sponsor AND one flagged assumption in the register; each such assumption is also visibly flagged inline in the PRD; the loop is permitted to converge with open assumptions present; any sponsor answer received re-opens the corresponding assumption for re-resolution (recorded).
[AI · Cipher] Adversarial red-team report (redteam_report.md) — dedicated pass for missing requirements, internal contradictions, untestable criteria, and unstated assumptions, run by an agent distinct from the convergence adjudicator.
Acceptance: Report exists and is authored by an agent who is neither the PRD author nor Verdict (the adjudicator); EVERY finding in the report is mirrored as a first-class GapLog entry with a severity set by the red-teamer; closing any red-team finding requires either a fix (status=verified) or a reviewer-recorded rationale-for-no-action (status=wont-fix) — zero silent dismissals.
[AI · Vela] Revised PRD/backlog (prd.vN.md) iterated to close gaps, each revision producing a new version+hash; inline-flagged assumptions carried for any open sponsor questions.
Acceptance: Latest PRD version closes (status=fixed) every GapLog entry assigned to the author; each revision bumps version and content hash; the PRD inline-flags every open assumption; Vela's authorship is recorded so the independence/COI check can confirm she did not also sit on the review panel.
[AI · panel] Receipt-backed per-reviewer verdicts (verdicts.json) — one SOLID/NOT-SOLID record per panel reviewer (Verdict, Keystone, Cipher, Proof, Iris) and the red-teamer, each bound to the FINAL artifact version+hash.
Acceptance: A verdict is counted as SOLID only if it cites (a) the exact PRD version+hash reviewed, (b) the rubric version+hash and the specific rubric line ids checked with per-line pass, (c) the specific PRD section/AC ids inspected, and (d) 'gaps I raised still open: none'; bare 'SOLID' with no citations is rejected by the convergence check; every verdict's hash matches the single final PRD hash (no stale-hash verdicts).
[AI · Verdict] Convergence Record (convergence_record.json) — the signed exit receipt with the independence/COI attestation and the binary exit checklist.
Acceptance: Record is signed by Verdict and asserts, each as a checkable true: frozen rubric exists (hash); GapLog open sev>=major == 0; every panel reviewer + red-teamer has a receipt-backed verdict bound to the FINAL hash; both traceability matrices complete; assumption register + open sponsor questions logged; AC ledger has zero failing ACs; independence attestation confirms no reviewer authored/co-authored the PRD or a parent artifact and that the red-teamer != adjudicator. Brick is DONE only when every box is true; otherwise loop continues.
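The binary exit checklist in the Convergence Record can be read as a pure function of recorded state. Below is a minimal sketch of such a check, assuming the GapLog and verdicts are loaded as lists of dicts. The field names (`prd_hash`, `rubric_lines_passed`, `own_gaps_open`, and so on) and the "red-team" reviewer label are hypothetical; the source defines what must be cited, not the JSON keys.

```python
MATERIAL = {"major", "critical"}
# Assumption: the red-teamer is recorded under the role label "red-team".
REQUIRED_REVIEWERS = {"Verdict", "Keystone", "Cipher", "Proof", "Iris", "red-team"}


def is_converged(gaplog, verdicts, final_hash, rubric_hash, uncovered_rows, failing_acs):
    """Return (converged, reasons). Every input is plain data; no judgment calls."""
    reasons = []
    open_material = [g["id"] for g in gaplog
                     if g["status"] == "open" and g["severity"] in MATERIAL]
    if open_material:
        reasons.append(f"open material gaps: {open_material}")

    solid = set()
    for verdict in verdicts:
        receipt_ok = (
            verdict.get("verdict") == "SOLID"
            and verdict.get("prd_hash") == final_hash        # no stale-hash verdicts
            and verdict.get("rubric_hash") == rubric_hash
            and verdict.get("rubric_lines_passed")           # non-empty citation
            and verdict.get("sections_inspected")            # non-empty citation
            and verdict.get("own_gaps_open") == []
        )
        if receipt_ok:
            solid.add(verdict["reviewer"])
    missing = REQUIRED_REVIEWERS - solid
    if missing:
        reasons.append(f"no receipt-backed SOLID at final hash from: {sorted(missing)}")
    if uncovered_rows:
        reasons.append(f"traceability rows uncovered: {uncovered_rows}")
    if failing_acs:
        reasons.append(f"ACs failing the quality standard: {failing_acs}")
    return (not reasons, reasons)
```

A bare "SOLID" with no citations fails `receipt_ok` and therefore counts as missing, which is the behaviour the acceptance text asks for. The independence/COI attestation is not modelled here.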
Questions the agent asks (5)
  • For each gap tagged needs-human-info: what is the sponsor's answer? (logged as an open question; team proceeds on a flagged assumption until answered — never blocks)
  • Where the brief is ambiguous about a stated need, which interpretation is correct? (assumption recorded inline in the PRD if unanswered)
  • Are there constraints, data sources, or compliance obligations the sponsor knows about that are not yet captured as requirements?
  • Are any flagged assumptions in the assumption register unacceptable to the sponsor and need correction?
  • Has the sponsor provided any NEW information since the last iteration that re-opens a closed assumption or adds a need to trace?
Do (9)
  • Freeze and hash the review rubric BEFORE any verdict is cast; every reviewer attests line-by-line against that single rubric hash.
  • Require every SOLID verdict to be receipt-backed: it must cite the PRD version+hash, rubric line ids passed, and PRD section/AC ids inspected — reject bare verdicts.
  • Make convergence a mechanical function of GapLog state + verdicts against ONE final hash, never a judgment call.
  • Let the reviewer who RAISED a gap own its severity; if reviewers split, Verdict adjudicates and records the ruling with reasoning so the loop never deadlocks.
  • On each new PRD version+hash, run delta re-review of changed sections AND a regression check that prior SOLID verdicts still hold against the new hash; invalidate downstream verdicts touching changed sections.
  • Promote EVERY red-team finding and every failing AC into the GapLog as a first-class entry with a severity.
  • When a gap needs human-only info, log an open question to the sponsor AND proceed on an explicit flagged assumption — converge on the resolvable remainder.
  • Enforce non-authorship/COI: confirm in the convergence record that no reviewer authored or co-authored the PRD or a parent artifact it derives from, and that the red-teamer is not the adjudicator.
  • Require both traceability matrices (brief→PRD and requirement→AC) to be complete with zero orphans before declaring convergence.
Don't (9)
  • Do not insert any human approval/sign-off step — the human provides input and answers, never gates.
  • Do not let the PRD author (Vela) sit on the review panel or adjudicate her own artifact.
  • Do not accept a bare 'SOLID' or any verdict not bound to the final artifact hash.
  • Do not let the author unilaterally downgrade a gap's severity to force convergence — only the raising reviewer or Verdict may change severity, with recorded history.
  • Do not allow red-team findings or failing ACs to be acknowledged-and-ignored; closing requires a fix or a recorded rationale-for-no-action.
  • Do not blanket-re-attest the whole PRD on every revision (rubber-stamp fatigue), and do not run a delta-only re-review without a regression check (fixes can introduce new gaps elsewhere); do both a delta re-review and a regression check.
  • Do not stall the loop waiting on the sponsor; convert any human-blocking gap into a flagged assumption + open question and continue.
  • Do not let the loop spin unbounded; if it fails to converge after the iteration cap, escalate per loop-control rather than softening verdicts to terminate.
  • Do not declare 'done' until the convergence record's binary checklist is fully true.
Guardrails (7)
  • Independence is load-bearing: no reviewer may have authored/co-authored the PRD or a parent artifact it directly derives from; the red-teamer must be distinct from the convergence adjudicator (Verdict); this is attested in the convergence record.
  • Honesty architecture: 'converged'/'SOLID' is invalid unless backed by a receipt — rubric hash, GapLog state, per-reviewer verdicts with cited rubric lines and section/AC ids, all bound to one final PRD hash. No claim without a receipt.
  • Convergence is binary and version-pinned: open sev>=major gaps == 0 AND every panel reviewer + red-teamer SOLID against the SAME final version+hash AND both traceability matrices complete AND AC ledger has zero failures.
  • Severity governance: reviewer-owned severity, recorded change history, Verdict as sole adjudicator on splits; convergence-by-downgrade is prohibited.
  • Loop-control: a max-iteration cap with an explicit escalation path; a gap blocked on missing sponsor info is converted to a flagged assumption + open question (never a stall, never a silent guess).
  • Every flagged assumption must be visible inline in the PRD and is automatically re-opened for re-resolution if the sponsor later answers the corresponding open question.
  • This brick is the canonical review_loop template; downstream review_loop bricks (architecture, sprint-plan, sprint-accomplishments) must inherit the frozen-rubric, receipt-backed-verdict, reviewer-owned-severity, delta+regression re-review, and adjudicator rules unchanged.
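The delta+regression re-review rule that downstream bricks inherit can be sketched as a planning step run on each new PRD version. This is one possible reading, assuming each prior verdict records the section/AC ids it inspected; `plan_rereview` and the dict keys are hypothetical names.

```python
def plan_rereview(prior_verdicts, changed_sections, new_hash):
    """List the re-review work each stale verdict requires for a new PRD version.

    prior_verdicts: list of dicts with 'reviewer', 'prd_hash', 'sections_inspected'.
    changed_sections: section/AC ids that differ between the old and new version.
    """
    changed = set(changed_sections)
    plan = []
    for verdict in prior_verdicts:
        if verdict["prd_hash"] == new_hash:
            continue  # already bound to the new version, nothing to redo
        touched = sorted(changed & set(verdict["sections_inspected"]))
        # Every stale verdict needs a regression check against the new hash;
        # a verdict that inspected a changed section also needs a delta re-review.
        actions = ["regression_check"]
        if touched:
            actions.insert(0, "delta_rereview")
        plan.append({
            "reviewer": verdict["reviewer"],
            "stale_hash": verdict["prd_hash"],
            "actions": actions,
            "sections": touched,
        })
    return plan
```

No stale verdict is carried forward silently: each one appears in the plan until it is re-issued against the new hash.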
08

Solution Architecture, Threat Model & Privacy-by-Design

AI Agent: Keystone

Keystone designs the solution that provably satisfies the PRD and every NFR: C4 views (context/container/component), a trust-boundary-annotated Data Flow Diagram, a justified tech stack, the data model, integrations, multi-tenant isolation model, feature-flag/progressive-delivery as a first-class decoupling capability, design-for-observability, and ADRs for every one-way-door decision. The backbone is falsifiable: every NFR is bound to a measurable target, a verification method, and a named test/probe id, with dominant NFRs and their trade-offs recorded as ADRs. Cipher produces a design-time STRIDE threat model anchored to the DFD trust boundaries — a Threat Coverage Matrix with zero blank cells (every container/flow x every STRIDE+agentic category is either a mitigation-with-id or an explicit accepted-risk-with-owner), covering prompt-injection, output-safety, data-exfiltration, tenant-isolation, and honesty-gate-coverage — plus a privacy-by-design artifact (data classification, PII inventory, retention/deletion, encryption at rest/in transit, DPIA where in scope) and a path-by-path Honesty-Gate Coverage Inventory that closes every bypass with a capability+receipt design decision (never a regex hook). Iris contributes the experience architecture and key flows. This brick AUTHORS the artifacts and proves them internally consistent and complete; it does NOT confer "solid" status — that is conferred only by the downstream Verdict-led independent review-and-iterate loop (a separate brick) staffed by agents who did NOT author this work. Where the sponsor is silent, the team proceeds on explicit flagged assumptions surfaced to the continuous human-input channel, never blocking on approval.

Deliverables
[AI · Keystone] C4 architecture model (context, container, component views) for the solution, version-controlled with each diagram element labeled and an accompanying narrative.
Acceptance: All three C4 levels exist as committed files (git hashes recorded); every container and external system in the context/container views is named and described; zero unlabeled elements (mechanically checked diagram-element-vs-legend list shows 0 unlabeled).
[AI · Keystone] Data Flow Diagram with explicit trust boundaries (tenant boundary, model/LLM boundary, connector/MCP boundary, identity boundary) annotated, with every data flow that crosses a boundary marked and given a flow id.
Acceptance: DFD committed (git hash recorded); the four named trust boundaries are all present and drawn; every boundary-crossing flow has a unique flow id; the platform invariants 'data-never-to-model', 'permission pre-filter at the data layer', and 'LLM is never a trust boundary' each appear as an explicit annotation on the relevant boundary.
[AI · Keystone] NFR Register: table with columns NFR | priority | measurable target | verification method | test/probe id | satisfying component/ADR, including latency, throughput, availability/SLO, scalability/capacity, cost/FinOps (cost-per-flow target + token/$ budget + circuit-breaker thresholds), security, privacy, and operability.
Acceptance: Register committed; zero rows with an empty target, verification method, test/probe id, or satisfying-component/ADR trace; a cost-per-flow target and a load/capacity assumption are present as named rows; dominant NFRs are flagged and each NFR conflict has a referenced ADR resolving it.
[AI · Keystone] ADR set for one-way-door decisions, including (at minimum) tenant-isolation model and the dominant-NFR trade-off decisions, each in standard ADR format (context, decision, alternatives, consequences).
Acceptance: Each one-way-door decision identified in the design has a corresponding committed ADR with a unique id; every ADR contains non-empty context, decision, considered-alternatives, and consequences sections; every NFR-conflict referenced in the NFR Register resolves to an existing ADR id.
[AI · Keystone] Tenant Isolation & Blast-Radius Decision (ADR): silo/pool/bridge choice, cross-tenant read-prevention design, parent/client isolation invariant (per the prior P0 where client instances could read parent data), and noisy-neighbor handling.
Acceptance: ADR committed with an explicit silo/pool/bridge selection and rationale; a blast-radius analysis is present; cross-tenant read prevention is described with the enforcing component named; the tenant boundary from this ADR appears as a first-class boundary in both the DFD and the Threat Coverage Matrix.
[AI · Keystone] Feature-flag / progressive-delivery capability design: flag taxonomy, deploy-vs-release decoupling, rollout/rollback mechanism, and kill-switch surface, specified as a first-class architectural capability.
Acceptance: Design committed; describes how a change can be deployed without being released (flag-gated); names the rollback/kill-switch mechanism and its trigger; the NFR Register references this capability as the satisfying component for relevant availability/operability rows.
[AI · Keystone] Design-for-observability surface: the OTel-GenAI instrumentation points, SLO/health signals, and the specific receipts (metrics/traces/health codes) the design commits to emitting so downstream 'live/passed' claims are verifiable.
Acceptance: Committed; each SLO in the NFR Register maps to a named emitted signal; every health/SLO signal lists what it emits and where it is read; at least the agentic-surface action paths have named trace/span coverage so honesty/coverage claims have a receipt to read.
[AI · Cipher] STRIDE Threat Coverage Matrix anchored to the DFD: rows = containers and boundary-crossing data flows; columns = Spoofing/Tampering/Repudiation/Information-disclosure/DoS/Elevation PLUS agentic columns prompt-injection, output-safety, data-exfiltration, tenant-isolation, honesty-gate-coverage; plus an abuse/misuse case list.
Acceptance: Matrix committed; ZERO blank cells — every cell is either a mitigation-with-id or an explicit accepted-risk-with-named-owner; every threat entry references a specific DFD flow id crossing a specific boundary; an abuse/misuse case list with >=1 entry per agentic column is present.
[AI · Cipher] Honesty-Gate Coverage Inventory: enumeration of EVERY agent action path with each marked covered-by-capability-receipt vs bypass, and for each bypass a design decision to close it via capability+receipt.
Acceptance: Inventory committed; zero action paths marked 'unknown'; every path is classified covered or bypass; every bypass has a named closure decision that uses a capability+receipt (zero closures specified as a regex/one-off hook); in-app message and internal-message paths are explicitly listed and classified.
[AI · Cipher] Privacy-by-Design artifact: data classification, PII inventory, retention/deletion policy, encryption-at-rest and encryption-in-transit design, and a DPIA where in scope.
Acceptance: Committed; every data element in the data model has a classification; PII inventory lists each PII element with its retention period and deletion mechanism; encryption-at-rest and in-transit are specified per data store/channel; a DPIA exists or an explicit, justified in-scope=false determination is recorded.
[AI · Iris] Experience architecture and key user flows: the primary journeys, screen/flow inventory, and the agentic interaction surface mapped to the C4 containers.
Acceptance: Committed; every primary PRD user journey has a corresponding flow diagram; each flow references the C4 container(s) it traverses; the agentic interaction touchpoints in the flows are cross-referenced to the action paths in the Honesty-Gate Coverage Inventory.
[AI · Keystone] PRD/NFR → Architecture Traceability Map: every PRD requirement and every NFR mapped to the component(s)/ADR that satisfy it, serving as the substrate the independent reviewers validate against.
Acceptance: Committed; 100% of PRD requirements and 100% of NFR Register rows are traced to at least one component or ADR; zero untraced rows (mechanically counted); reverse check shows no orphan components claiming to satisfy a non-existent requirement.
[AI · Keystone] Flagged Assumptions & Sponsor Questions register: each silent/undecided input (e.g. data residency, expected load, retention obligations, identity provider) with a chosen default and a 'revisit-if' trigger, wired to the continuous human-input channel.
Acceptance: Committed; each consequential silent decision is listed with a chosen default and an explicit revisit trigger; the open questions are surfaced to the sponsor via the human-input channel (surfacing receipt recorded); zero buried/in-prose-only assumptions for residency, load, retention, and identity.
[AI · Keystone] Internal-consistency receipt for this brick: a self-check confirming the artifacts cross-reference correctly and the completeness gates pass, explicitly stamped 'authored, not yet reviewed — solid status pending Verdict convergence'.
Acceptance: Receipt committed; records the mechanical checks (Threat Coverage Matrix blank-cell count = 0, NFR Register missing-field count = 0, Traceability completeness = 100%, Honesty-Gate Inventory unknown-count = 0) with their actual numbers; explicitly states the artifacts are NOT yet certified solid and names the downstream Verdict review-loop brick as the conferring authority.
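The mechanical checks named in the internal-consistency receipt could be counted as in the sketch below. It assumes the Threat Coverage Matrix is a dict of rows keyed by container or flow id, each mapping a column name to a cell dict; the key names (`kind`, `probe_id`, `satisfied_by`, `status`) are hypothetical, while the column list follows the matrix description above. Traceability completeness is not computed here.

```python
STRIDE = ["spoofing", "tampering", "repudiation", "information-disclosure", "dos", "elevation"]
AGENTIC = ["prompt-injection", "output-safety", "data-exfiltration",
           "tenant-isolation", "honesty-gate-coverage"]
NFR_FIELDS = ["target", "verification_method", "probe_id", "satisfied_by"]


def cell_is_filled(cell) -> bool:
    """A cell counts only as a mitigation with an id or an accepted risk with a named owner."""
    if not cell:
        return False
    if cell.get("kind") == "mitigation":
        return bool(cell.get("id"))
    if cell.get("kind") == "accepted-risk":
        return bool(cell.get("owner"))
    return False


def consistency_receipt(threat_matrix, nfr_register, action_paths):
    """Count the gaps; the self-check passes only if every count is zero."""
    blank_cells = sum(
        1 for row in threat_matrix.values() for col in STRIDE + AGENTIC
        if not cell_is_filled(row.get(col))
    )
    missing_fields = sum(
        1 for nfr in nfr_register for name in NFR_FIELDS if not nfr.get(name)
    )
    unknown_paths = sum(
        1 for path in action_paths if path.get("status") not in {"covered", "bypass"}
    )
    return {
        "threat_matrix_blank_cells": blank_cells,
        "nfr_missing_fields": missing_fields,
        "honesty_gate_unknown_paths": unknown_paths,
        # The receipt records counts only; it never certifies the artifacts as solid.
        "status": "authored, not yet reviewed",
    }
```

Recording the actual numbers, rather than a pass/fail flag, lets the downstream panel re-derive each count from the committed files.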
Questions the agent asks (8)
  • What is the expected concurrency/load profile (peak concurrent flows, transactions/day) and the cost-per-flow budget we should design to?
  • What data-residency and data-sovereignty obligations (e.g. permitted regions) apply to this client/workload?
  • What retention and deletion obligations apply (regulatory or contractual) to each data class, and is right-to-erasure in scope?
  • Which identity provider / SSO and authorization model must we integrate with, and are there existing IAM constraints?
  • Is a formal DPIA required for this workload, and is any special-category/regulated PII (health, financial, biometric) in scope?
  • What is the required tenant-isolation posture for this client (dedicated/silo vs pooled), and are there contractual isolation guarantees?
  • What are the hard availability/SLO and latency commitments the architecture must meet, and which is dominant if they conflict with cost?
  • Which external systems/connectors (MCP, SaaS APIs, data sources) must the solution integrate with, and what are their trust/security constraints?
Do (9)
  • Anchor every STRIDE entry to a specific DFD flow crossing a specific trust boundary — STRIDE without the DFD + boundaries is not a threat model.
  • Bind every NFR to a measurable target, a verification method, AND a named test/probe id so it is falsifiable by downstream QA and reviewers.
  • Make the Threat Coverage Matrix and NFR Register mechanically complete: zero blank cells, zero missing fields — completeness is a binary gate, not a judgment call.
  • Record dominant NFRs and resolve every NFR conflict as an explicit ADR (latency vs cost, isolation vs shared-cache, residency vs multi-model routing).
  • Enumerate EVERY agent action path in the Honesty-Gate Coverage Inventory and close each bypass with a capability+receipt design decision.
  • Treat the tenant-isolation model as a first-class architectural decision and a first-class threat-matrix boundary, given the prior parent/client read P0.
  • Maintain the PRD/NFR → Architecture traceability map as the reviewers' checklist substrate so convergence is evidence-based, not opinion.
  • Surface consequential silent decisions to the sponsor as flagged assumptions with chosen defaults and revisit triggers; proceed without blocking.
  • Back every 'mitigation' and 'meets target' claim with a concrete artifact reference (ADR id, diagram element, or planned test/probe id).
Don't (9)
  • Do NOT self-certify the architecture as 'solid', 'reviewed', or 'approved' — this brick only authors; solid status is conferred solely by the downstream Verdict-led independent review loop.
  • Do NOT collapse authoring and independent review into this brick or let Keystone/Cipher/Iris grade their own work.
  • Do NOT treat a clean C4 container diagram as 'the threat model' — without the DFD and trust boundaries the STRIDE matrix is meaningless.
  • Do NOT ship a Threat Coverage Matrix with blank cells or an NFR Register with empty targets/verification/trace fields.
  • Do NOT close any honesty-gate bypass with a regex/one-off hook; use a registered capability that writes a ledger receipt.
  • Do NOT claim honesty-gate coverage broadly while leaving in-app/internal message paths uninventoried — list and classify every path.
  • Do NOT bury consequential assumptions (residency, load, retention, identity) in prose instead of the flagged-assumptions register.
  • Do NOT design to all NFR targets on paper while ignoring their real conflicts — a fantasy design that meets every target is a failure.
  • Do NOT block on a human approval gate; the human is an input channel, not an approver.
Guardrails (7)
  • Honesty: every completeness/coverage claim in this brick's outputs must cite a real receipt (blank-cell count, missing-field count, traceability %, git hashes) — never a bare assertion.
  • Independence is mandatory downstream: the artifacts produced here must be validated by a separate Verdict-led panel (Verdict + an independent architect reviewer + Cipher-as-red-team + Proof + Vela) explicitly excluding the authors of this brick; because Cipher authors the threat-model and privacy artifacts here, Cipher may red-team only the artifacts it did not author. This brick produces inputs to that loop, not a verdict.
  • Convergence (conferred downstream, not here) is binary: 'solid' = zero open material gaps AND Threat Coverage Matrix zero blank cells AND NFR Register zero missing fields AND PRD/NFR traceability 100%.
  • Platform security invariants are non-negotiable in the design: data-never-to-model, permission pre-filter at the data layer, LLM is never a trust boundary, and per-client/parent isolation.
  • No secrets or credentials in any committed architecture, diagram, or ADR artifact.
  • All artifacts version-controlled with recorded git hashes so reviewers and downstream bricks read receipts, not assertions.
  • The brick produces files only within the open sprint; if none is open, the next sprint is opened first.
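The "traceability 100%" condition and its reverse orphan check can be counted mechanically. Here is a minimal sketch, assuming the PRD/NFR → Architecture Traceability Map is stored as a dict from component or ADR id to the requirement/NFR ids it claims to satisfy; `traceability_report` and that layout are hypothetical.

```python
def traceability_report(requirement_ids, nfr_ids, trace_map):
    """Forward check (untraced requirements/NFRs) and reverse check (orphan components).

    trace_map: component or ADR id -> list of requirement/NFR ids it claims to satisfy.
    """
    required = set(requirement_ids) | set(nfr_ids)
    claimed = {rid for ids in trace_map.values() for rid in ids}
    untraced = sorted(required - claimed)
    # Orphan: a component that satisfies nothing, or cites an id that does not exist.
    orphans = sorted(
        comp for comp, ids in trace_map.items()
        if not ids or any(rid not in required for rid in ids)
    )
    covered = len(required) - len(untraced)
    # An empty requirement set reports 0% rather than a vacuous 100%.
    coverage_pct = 100.0 * covered / len(required) if required else 0.0
    return {"untraced": untraced, "orphan_components": orphans, "coverage_pct": coverage_pct}
```

Treating an empty requirement set as 0% is a deliberately conservative choice in this sketch, so a missing input cannot read as full coverage.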
09

Infrastructure, CI/CD, Supply-Chain & Security Baseline

AI Agent: Vector

Vector, Cadence, and Cipher author the foundational design for how the product is built, shipped, and run — and prove the security-critical controls are ENFORCED, not merely documented. Vector owns environment strategy, IaC topology, observability (logs/metrics/traces/alerts), backup/DR, deploy strategy + rollback, and the FinOps envelope; Cadence consolidates the end-to-end engineering process and the CI/CD pipeline with BLOCKING merge gates (tests pass + SCA/SAST/secret-scan pass + independent review approved before any merge); Cipher sets the security baseline, the secure-SDLC toolchain (SAST/DAST/SCA), the ASVS target level, and the supply-chain provenance controls (SBOM, dependency hash-locking, base-image/container scanning, signing + SLSA/in-toto attestation). The brick is explicit about the gap between DESIGNED (topology/policy documented) and ENFORCED (a negative-control receipt proves the gate actually blocks): no control is "solid" until a deliberately-failing case is shown to be rejected. This is a Phase-3 foundation every later sprint depends on, so the pipeline and gates exist before any sprint attempts a merge. The human sponsor is a continuous INPUT channel — cloud/region, cost ceiling, compliance regime, data classification, and on-call expectations are pulled as questions; where the sponsor is silent the team proceeds on an explicit flagged assumption recorded in the brick, never on a human approval gate. Every "pass/done/enforced" claim cites a receipt (run URL, scan report, SBOM hash, drill log, deploy-gate rejection); an un-receipted claim is treated as not done. The brick converges only when all binary criteria are green, the security-critical gates are in the ENFORCED state with negative-control receipts, sponsor-input questions are answered or flagged-assumption-recorded, and the paired independent-review loop (Verdict + adversarial Cipher/red-team + Keystone) logs zero open material gaps in a signed convergence verdict.
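The DESIGNED versus ENFORCED distinction above can be modelled as a small state rule: a gate is ENFORCED only when it is blocking and a negative-control receipt shows a deliberately failing case was rejected. The sketch below is illustrative only; the `Gate` class, the `outcome` and `run_url` receipt keys, and the state labels other than DESIGNED and ENFORCED are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    name: str
    blocking: bool  # False means nightly/advisory/non-blocking
    negative_control_receipts: list = field(default_factory=list)

    def state(self) -> str:
        """ENFORCED needs a blocking gate AND proof that a failing case was rejected."""
        if not self.blocking:
            return "OPEN_MATERIAL_GAP"
        if any(r.get("outcome") == "blocked" and r.get("run_url")
               for r in self.negative_control_receipts):
            return "ENFORCED"
        return "DESIGNED"


def unenforced(gates):
    """Names of gates that may not yet be claimed as solid."""
    return [g.name for g in gates if g.state() != "ENFORCED"]
```

A receipt without a run URL does not count, which mirrors the rule that an un-receipted claim is treated as not done.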

Deliverables
[AI · Vector] Environment strategy & topology document: prod/non-prod isolation, lower-env data handling, ephemeral PR/preview environments
Acceptance: Document defines >=3 isolated environments (prod, staging, dev) with separate credentials/accounts and network boundaries; states a no-prod-PII-in-lower-envs rule with the enforcement/scrubbing mechanism named; ephemeral PR/preview env spin-up is demonstrated by a run URL showing an environment created on a PR and torn down on merge/close (receipt: PR run URL + teardown log). If preview envs are out of scope, a flagged assumption records why.
[AI · Vector] IaC topology + policy-as-code gating + drift detection: infra-as-source-of-truth with plan-on-PR, tfsec/Conftest/OPA gate, and no out-of-band console changes
Acceptance: All infra is declared in version-controlled IaC; a PR shows a `plan` posted automatically (receipt: plan output URL); an IaC PR that violates a policy-as-code rule is BLOCKED (negative-control receipt: blocked merge + policy violation report); a drift-detection run reports parity or flags drift (receipt: drift report). Acceptance requires the policy gate be ENFORCED (blocking), not advisory.
[AI · Cadence] CI/CD pipeline spec with BLOCKING merge gates and the negative-control proof that each gate blocks
Acceptance: Pipeline doc lists the merge gates: tests-pass, SCA, SAST, secret-scan, independent-review-approval. For EACH security-critical gate a negative-control receipt exists proving it blocks a merge: (a) PR with a deliberately failing test cannot merge (receipt: blocked-merge log + run URL); (b) PR introducing a known-CVE dependency is blocked by SCA (receipt: scan report + blocked merge); (c) PR introducing a deliberately vulnerable code pattern is blocked by SAST (receipt: scan report + blocked merge); (d) PR with a planted plaintext secret is blocked by secret-scanning (receipt); (e) PR lacking independent-review approval cannot merge (receipt: rejected merge). Each gate is marked ENFORCED only with its negative-control receipt; any nightly/advisory/non-blocking gate is recorded as an open material gap, not a pass.
[AI · Cipher] Secrets-management baseline: secrets-in-manager, rotation policy, OIDC short-lived CI credentials, plaintext-secret scan, break-glass procedure
Acceptance: No long-lived cloud keys exist in CI runners — CI authenticates via OIDC-federated short-lived credentials (receipt: pipeline config + an issued short-lived token's TTL); a rotation policy with max secret age is documented; a planted plaintext secret in repo or container image is caught pre-merge (negative-control receipt: blocked scan); a break-glass procedure with audit logging is documented. Acceptance requires the no-plaintext-secrets scan be ENFORCED.
[AI · Cipher] Supply-chain provenance pack: SBOM per build, dependency pinning/hash-locking, base-image/container scanning, signing + SLSA target level + deploy-time verification
Acceptance: An SBOM is generated per build and attached to the release (receipt: SBOM artifact + hash); dependencies are hash-locked (receipt: lockfile with hashes); a base-image/container scan runs and a planted vulnerable base image is flagged (receipt: scan report); an SLSA target level is declared; build artifacts are signed with verifiable attestations (receipt: signature + attestation); the deploy path REFUSES an unsigned/unattested artifact (negative-control receipt: deploy-gate rejection of an unsigned artifact). A 'plan' document alone does not satisfy this deliverable.
[AI · Cipher] Secure-SDLC toolchain + ASVS target + compliance-scope traceability derived from sponsor regime and data classification
Acceptance: SAST, DAST, and SCA tools are named and wired into CI (receipt: pipeline config + a sample run of each); an ASVS target level is declared; a traceability table maps compliance scope = f(sponsor regime + data classification) to concrete controls (encryption at rest/in transit, audit logging, residency, retention) each with a named enforcement receipt or a flagged TODO. Compliance scope must be derived, not asserted as a checkbox.
[AI · Vector] Observability stack: logs/metrics/traces + alerting proven by a synthetic probe that pages
Acceptance: Logs, metrics, and distributed traces are emitted from a running service (receipt: sample log line + metric + trace ID); a synthetic probe triggers a test alert that pages the on-call channel within the stated threshold of N seconds (receipt: alert-fired timestamp + page-delivered timestamp showing latency <= N seconds). 'Has observability' without a fired-alert receipt is not accepted.
[AI · Vector] Backup/DR plan with stated RPO/RTO and a PROVEN restore drill
Acceptance: RPO and RTO targets are stated; a restore drill is actually executed and meets the stated RTO/RPO (receipt: restore drill log with start/finish timestamps and integrity check). A backup that has never been restored is recorded as an open material gap, not a pass.
[AI · Vector] Deploy strategy + rollback with a PROVEN rollback drill and an automatic-rollback trigger
Acceptance: Deploy strategy (e.g., blue-green/canary) is documented; a rollback drill is executed — deploy then roll back — with evidence (receipt: deploy + rollback run URLs); an automatic-rollback trigger (e.g., SLO/error-budget burn) is defined and demonstrated firing on injected SLO burn (receipt: auto-rollback event log). 'We have a rollback button' without a drill is not accepted.
[AI · Vector] FinOps/cost envelope tied to the sponsor cost ceiling, with budget alerts
Acceptance: A monthly cost ceiling is recorded (from sponsor input or flagged assumption); cost monitoring is configured with a budget alert that fires before the ceiling (receipt: budget-alert config + a test-threshold alert receipt). The envelope cites the sponsor-provided figure or the flagged assumption used.
[AI · Cadence] DORA metrics instrumented and emitted from the real pipeline/deploys with named data sources and initial targets
Acceptance: All four DORA metrics (deploy frequency, lead time for changes, change-failure rate, MTTR) are emitted from the actual pipeline/deploy events with a named data source for each (receipt: a real value produced for each metric from pipeline data); initial targets are set. Metrics quoted from estimates rather than instrumentation are not accepted.
[Human] Sponsor input set: cloud/region & data-residency, cost ceiling/FinOps envelope, compliance regime, data classification, on-call/paging expectations
Acceptance: Each of the five input items is either answered by the sponsor (receipt: recorded sponsor response) OR proceeds on an EXPLICIT FLAGGED ASSUMPTION recorded verbatim in the brick (e.g., 'Assumption: single-region us-east, SOC2-lite scope, $X/mo ceiling — flagged for sponsor'). No item blocks progress; this is input, never an approval gate.
[AI · Vector] Convergence verdict + receipt index for the baseline ('SOLID' exit condition)
Acceptance: A signed convergence note states SOLID only when (a) all binary criteria are green with cited receipts, (b) every security-critical gate (tests, SCA, SAST, secret-scan, review-approval, signing-verify) is in the ENFORCED state with a negative-control receipt, (c) all five sponsor-input items are answered or flagged-assumption-recorded, and (d) the paired independent-review loop logs zero open material gaps. A central receipt index links each claim to its artifact (run URL / SBOM hash / drill log / deploy-gate rejection).
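
The DESIGNED/ENFORCED distinction and the SOLID exit condition above can be sketched as a small check. This is a minimal illustration under stated assumptions, not the team's actual tooling: the names Gate, gate_state, and baseline_is_solid, and the receipt strings, are hypothetical; only the gate names and the four conditions come from the acceptance text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Gate names are taken from the convergence criteria above; everything else is illustrative.
SECURITY_CRITICAL = ("tests", "SCA", "SAST", "secret-scan", "review-approval", "signing-verify")


@dataclass(frozen=True)
class Gate:
    name: str
    blocking: bool                            # False for nightly/advisory/non-blocking gates
    negative_control_receipt: Optional[str]   # e.g. run URL of a blocked merge, or None


def gate_state(gate: Gate) -> str:
    """A gate counts as ENFORCED only if it blocks AND a negative control proves it."""
    if gate.blocking and gate.negative_control_receipt:
        return "ENFORCED"
    return "OPEN_MATERIAL_GAP"


def baseline_is_solid(criteria_green: bool, gates: List[Gate],
                      sponsor_items_resolved: int, open_material_gaps: int) -> bool:
    """Evaluate exit conditions (a) to (d) for the baseline."""
    by_name = {g.name: g for g in gates}
    all_enforced = all(
        name in by_name and gate_state(by_name[name]) == "ENFORCED"
        for name in SECURITY_CRITICAL
    )
    return (criteria_green and all_enforced
            and sponsor_items_resolved == 5 and open_material_gaps == 0)
```

A gate that exists but runs nightly, or that blocks but has no negative-control receipt, falls into OPEN_MATERIAL_GAP in this sketch, so it can never contribute to SOLID.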
Questions the agent asks (8)
  • Which cloud provider and region(s) are required, and are there data-residency constraints (e.g., must data stay in a specific jurisdiction)?
  • What is the monthly cost ceiling / FinOps envelope we should design and alert against?
  • What compliance regime applies (SOC2, HIPAA, PCI-DSS, GDPR, ISO 27001, none yet), and is there a target certification date?
  • How sensitive is the data the system will handle (public, internal, confidential, regulated/PII/PHI/cardholder)? This derives the compliance scope and controls.
  • What are your on-call and paging expectations (who gets paged, target response time, business-hours vs 24x7)?
  • What RPO/RTO can the business tolerate for the product (max acceptable data loss and downtime)?
  • Are there existing infrastructure, accounts, tooling, or vendor contracts (cloud, observability, secrets manager) we must reuse rather than provision new?
  • Are there approved/blocked technology, license, or vendor constraints we must respect in the supply chain?
Do (9)
  • Separate every control into DESIGNED (documented) and ENFORCED (negative-control receipt proves it blocks) and only count ENFORCED toward SOLID.
  • Produce a negative-control for each security-critical gate: a failing test, a known-CVE dependency, a planted plaintext secret, and an unsigned artifact — and prove each is rejected.
  • Make infra the source of truth: plan-on-PR, policy-as-code gating, drift detection, and no out-of-band console changes.
  • Pull cloud/region, cost, compliance, data-classification, and on-call as continuous sponsor INPUT; record an explicit flagged assumption wherever the sponsor is silent and proceed.
  • Use OIDC-federated short-lived CI credentials and keep all secrets in a manager with a rotation policy and a break-glass procedure.
  • Actually run the DR restore drill and the rollback drill and capture timestamped receipts — do not infer from configuration.
  • Instrument DORA metrics from real pipeline/deploy events with a named data source per metric.
  • Cite a receipt (run URL, scan report, SBOM hash, drill log, deploy-gate rejection) for every 'pass/done/enforced' claim, indexed in the convergence note.
  • Pair this brick with the independent-review-and-iterate loop (Verdict + adversarial Cipher/red-team + Keystone) and iterate until zero open material gaps.
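
The DORA item in the list above asks for metrics computed from real pipeline/deploy events rather than estimates. A minimal sketch of that computation follows, assuming a hypothetical Deploy event record; the field names and the dora_metrics function are illustrative, and the lead-time figure uses the upper median for even-sized samples.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Deploy:
    commit_time: datetime                      # when the change was committed
    deploy_time: datetime                      # when it reached production
    failed: bool = False                       # the deploy caused a production failure
    restored_time: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deploys: List[Deploy], window_days: int) -> Dict[str, object]:
    """Compute the four metrics from deploy events observed in a window of window_days."""
    if not deploys:
        raise ValueError("no deploy events: metrics must come from instrumentation")
    lead_times = sorted(d.deploy_time - d.commit_time for d in deploys)
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_time - d.deploy_time for d in failures if d.restored_time]
    return {
        "deploy_frequency_per_day": len(deploys) / window_days,
        "median_lead_time": lead_times[len(lead_times) // 2],   # upper median
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": (sum(restores, timedelta()) / len(restores)) if restores else None,
    }
```

Raising on an empty event list is a deliberate choice in this sketch: a metric with no underlying events would be an estimate, which the acceptance criteria reject.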
Don't (8)
  • Do not declare a gate done when it is advisory, nightly, or non-blocking — that is an open material gap, not a pass (echoing this org's Truth-Gate lesson).
  • Do not accept 'a plan exists' or 'we have a rollback button / observability' as satisfying a deliverable; require the proving receipt.
  • Do not insert any human approval/sign-off gate; the sponsor provides input only.
  • Do not leave long-lived cloud keys or any plaintext secret in the repo, runner, or container image.
  • Do not allow prod PII to flow into staging/dev or preview environments.
  • Do not deploy unsigned or unattested artifacts; the deploy path must refuse them.
  • Do not let compliance scope be a free-floating checkbox — derive it from sponsor regime + data classification and trace each control to it.
  • Do not stall waiting on the sponsor; where silent, record a flagged assumption and continue.
Guardrails (7)
  • SOLID exit = all binary criteria green with receipts + every security-critical gate ENFORCED with a negative-control receipt + sponsor inputs answered-or-flagged + Verdict/red-team panel logs zero open material gaps in a signed convergence note.
  • Honesty architecture: an un-receipted 'enforced/done/live' claim is automatically an open material gap; the independent reviewer re-inspects receipt artifacts rather than trusting author summaries.
  • The human is never a gate — residency, cost, compliance, data-classification, and on-call are input channels; the team proceeds on explicit flagged assumptions when the sponsor is silent.
  • No secret, credential, or token is committed to any tracked file or baked into any image; CI uses short-lived OIDC credentials only.
  • This is a Phase-3 foundation: the pipeline and blocking gates must exist and be proven before any sprint attempts a merge.
  • All work occurs within an open sprint; no files are generated outside one.
  • Independent reviewers must be agents who did NOT author the artifact under review (Verdict lead; Cipher adversarial on security/supply-chain; Keystone on architecture fit; red-team attempts to bypass the gates).
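
The "human is never a gate" guardrail above (each sponsor input is either answered or proceeds on a flagged assumption) can be sketched as follows. This is an illustrative sketch only: the item keys, InputRecord, and resolve_sponsor_inputs are hypothetical names, and the assumption texts are supplied by the caller rather than invented here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The five sponsor-input items named in the deliverable; the key spellings are illustrative.
SPONSOR_INPUT_ITEMS = ("cloud_region_residency", "cost_ceiling", "compliance_regime",
                       "data_classification", "on_call_expectations")


@dataclass(frozen=True)
class InputRecord:
    item: str
    status: str   # "ANSWERED" or "FLAGGED_ASSUMPTION"
    text: str     # recorded sponsor response, or the assumption recorded verbatim


def resolve_sponsor_inputs(responses: Dict[str, Optional[str]],
                           assumptions: Dict[str, str]) -> List[InputRecord]:
    """Never blocks: a silent sponsor yields a flagged assumption, not a wait."""
    records = []
    for item in SPONSOR_INPUT_ITEMS:
        answer = responses.get(item)
        if answer:
            records.append(InputRecord(item, "ANSWERED", answer))
        else:
            # A missing assumption raises KeyError: proceeding silently is not allowed.
            text = "Assumption: " + assumptions[item] + " (flagged for sponsor)"
            records.append(InputRecord(item, "FLAGGED_ASSUMPTION", text))
    return records
```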
10

Independent Review & Iterate: Architecture, Infra, Process & Security Plan

AI Agent: Verdict

Replace the former human architecture sign-off gate with an AI-owned independent-review-and-iterate loop that converges ONLY on re-checkable evidence, never on a reviewer's say-so. A panel of NON-authors — Verdict (standing independent evaluator) plus cross-discipline specialists reviewing OTHERS' artifacts (Mason on buildability, Vector and Cipher cross-reviewing each other's infra/security lenses, Proof on testability, Iris on UX-architecture fit) — plus an adversarial red-team STRIDE "break-the-design" pass, validates the whole Architecture/Infra/Process/Security plan against BOTH the Objectives rubric AND the upstream PRD/NFRs. The loop is structurally and procedurally independent: reviewers receive artifact+rubric only (never the author's self-grade), file findings blind-first before cross-reading, and disagreement is preserved not averaged. Findings are classified with the canonical assemble_team Independent-Review-Protocol severity taxonomy verbatim — Blocker / Major / Minor — and convergence is a binary state over an append-only gap-ledger: zero open Blockers, zero open Majors, every Minor accepted-with-written-rationale, plus one clean stabilization pass where the latest fixes introduced zero new gaps. Cap behavior is HARD-STOP-CONSERVATIVE for Blockers, matching the security/go-live bricks: Major and Minor gaps unresolved at the iteration cap may proceed-on-a-flagged-assumption, but an unresolved BLOCKER (a one-way-door decision OR an unmitigated security/privacy control) at the cap does NOT proceed on an AI assumption — the plan does NOT advance past that specific Blocker until it is resolved in a later evidence-backed iteration OR a human explicitly, non-AI-grantably risk-accepts it; silence defaults to does-not-advance, never auto-accept. 
Where the only available reviewers for an artifact are its own authors (e.g., Cipher/Vector reviewing a security/infra artifact they co-authored), the assemble_team independence fallback is invoked: a red-team sub-persona or external-evaluator escalation supplies the non-author lens so no artifact is certified by its author. Every NFR target, STRIDE mitigation, and one-way-door ADR acceptance must cite a concrete basis (benchmark, vendor SLA, prior-art, named control + location in the container model) or be tagged ASSUMPTION; the red-team must attempt to DEFEAT each claimed mitigation, so coverage counts cannot be checkbox-gamed. Keystone (and the other authors) sit on the FIX side and iterate until SOLID. The convergence verdict is itself a receipt — enumerating every rubric line, NFR, ADR, and STRIDE cell with pass marker + evidence pointer + raising-reviewer identity + iteration number — so a meta-auditor or Aseem can spot-check any three rows and catch a fabricated "all pass."
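
The per-tier cap behavior described above can be sketched as a small decision function. It is a minimal sketch under stated assumptions: cap_disposition and plan_may_advance are hypothetical names, and the disposition strings follow the ones this brick uses (RESOLVED-LATER, HUMAN-RISK-ACCEPTED-<ref>, DOES-NOT-ADVANCE).

```python
from typing import List, Optional

TIERS = ("Blocker", "Major", "Minor")


def cap_disposition(severity: str, resolved_later: bool = False,
                    human_risk_acceptance_ref: Optional[str] = None) -> str:
    """Disposition of a gap that is still unresolved when the iteration cap is hit."""
    if severity not in TIERS:
        raise ValueError("unknown severity tier: " + severity)
    if severity != "Blocker":
        return "PROCEED-ON-FLAGGED-ASSUMPTION"   # allowed for Major and Minor only
    if resolved_later:
        return "RESOLVED-LATER"                  # closed by a later evidence-backed iteration
    if human_risk_acceptance_ref:
        return "HUMAN-RISK-ACCEPTED-" + human_risk_acceptance_ref
    return "DOES-NOT-ADVANCE"                    # silence default; never auto-accept


def plan_may_advance(dispositions: List[str]) -> bool:
    """The plan holds while any capped Blocker is still at DOES-NOT-ADVANCE."""
    return "DOES-NOT-ADVANCE" not in dispositions
```

Note that the function has no path by which an AI-authored assumption moves a Blocker forward: the only inputs that change a Blocker's outcome are later evidence or a human-owned reference.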

Deliverables
[AI · Verdict] Entry-criteria gate record — preconditions check that the loop does not start until Keystone has submitted all required inputs.
Acceptance: Record exists and shows all FIVE inputs present and non-empty: (1) ADR log with >=1 entry per one-way-door decision, (2) C4-or-equivalent container model, (3) NFR table with at least draft numeric targets for every NFR, (4) threat-model skeleton enumerating every container, (5) PRD-requirement -> architecture-decision traceability matrix. Any missing/empty input => status=BLOCKED-NOT-STARTED and the gap is returned to Keystone before iteration 1.
[AI · Verdict] Severity rubric binding (brick-internal) — the canonical assemble_team Blocker/Major/Minor taxonomy applied verbatim, with cap behavior bound per tier.
Acceptance: Rubric references the assemble_team Independent-Review-Protocol severity taxonomy by id and reuses its three tiers VERBATIM (Blocker / Major / Minor, with no renamed or re-scoped tiers). Binary tests: a gap is BLOCKER if it touches a one-way-door (irreversible) decision, OR a security/privacy control with no mitigation, OR an NFR with no numeric target or no verification method, OR a PRD requirement with no architectural home (orphan); Major and Minor are defined the same way, by reference to their taxonomy ids. Rubric states the cap behavior PER TIER: Major/Minor MAY proceed-on-flagged-assumption at the cap; BLOCKER MAY NOT proceed-on-assumption (hard-stop-conservative). Convergence is defined exactly as: 0 open Blockers AND 0 open Majors AND every Minor carries a written accepted-with-rationale note. No undefined notion of 'material' remains in any reviewer finding.
[AI · Verdict] Independence protocol attestation — proof the panel were non-authors and reviewed blind-first, including the author-only-lens fallback.
Acceptance: Attestation lists each reviewer with the artifact they did NOT author (Verdict authored none; Mason did not author the architecture; Cipher/Vector did not co-author each other's lens), confirms reviewers received artifact+rubric only (not Keystone's self-assessment), and shows each reviewer's findings were filed with a timestamp PRIOR to that reviewer reading any other reviewer's findings. At least 2 distinct NON-AUTHOR reviewer identities recorded per reviewable artifact. For any artifact whose only qualified lens-holders are its own authors (e.g., Cipher/Vector on a security/infra artifact they co-authored), the record shows the assemble_team independence fallback was invoked — a red-team sub-persona OR an external-evaluator escalation supplied the non-author lens — and names that non-author identity; count of artifacts certified with <2 distinct non-author identities (after fallback) == 0. Disagreements are listed verbatim, not reconciled away.
[AI · Verdict] Gap-ledger — append-only structured log that IS the loop's primary receipt.
Acceptance: Every row has: id, finding text, severity (Blocker/Major/Minor per the bound rubric), raising-reviewer identity, author-fix description, fix-evidence-pointer (git hash / ADR id / file path / red-team log id), status (open/closed), iteration-raised, iteration-closed, and (for Blockers unresolved at cap) a cap-disposition field (RESOLVED-LATER / HUMAN-RISK-ACCEPTED-<ref> / DOES-NOT-ADVANCE). At convergence: count(status=open AND severity in {Blocker,Major}) == 0, and every Minor row has a non-empty accepted-with-rationale field. Ledger is append-only (no deletions; supersession via new rows).
[AI · Keystone] Challenged one-way-door ADR set — every irreversible decision adversarially challenged and accepted-by-evidence.
Acceptance: Each ADR flagged as one-way-door (irreversible) has: an explicit 'challenge' section authored by a non-author (red-team or Verdict), and an 'accepted-because' resolution citing a concrete basis (benchmark/vendor-SLA/prior-art/measurement) OR tagged ASSUMPTION with the assumption logged. Tenant/per-client isolation is recorded as a one-way-door ADR and is among those challenged. Count of one-way-door ADRs with no challenge section == 0. Any one-way-door ADR left unresolved on evidence is a BLOCKER (not a Major) and is subject to the hard-stop cap behavior.
[AI · Keystone] NFR table with verifiable targets — every non-functional requirement quantified and traceable to a basis.
Acceptance: For every NFR: a numeric target exists, a verification method exists, AND the target cites a basis (benchmark / vendor SLA / prior-art measurement) or is tagged ASSUMPTION. Count of NFRs missing any of {target, verification method, basis-or-ASSUMPTION-tag} == 0. Each NFR maps to >=1 PRD/NFR source row (no orphan NFR). An NFR with no numeric target or no verification method is a BLOCKER per the bound rubric.
[AI · Cipher + Red-team] STRIDE threat model with defeated-mitigation evidence — coverage floor plus quality bar.
Acceptance: For every container x every STRIDE category (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege): >=1 mitigation, AND each mitigation names (control + protected asset/data + residual-risk acceptance note). Red-team logged a defeat-attempt against each claimed mitigation with outcome (bypassed/held). Count of (container,STRIDE) cells with no mitigation == 0; count of mitigations with no defeat-attempt logged == 0; any 'bypassed' outcome OR any (container,STRIDE) cell with no mitigation opens a BLOCKER (unmitigated security control) that is subject to the hard-stop cap behavior until re-mitigated or human-risk-accepted.
[AI · Cipher] Supply-chain + privacy + isolation acceptance bundle — first-class binary outputs.
Acceptance: Bundle contains: (1) SBOM / dependency-license posture consistent with the open-source-license-policy gate (count of unresolved disallowed licenses == 0); (2) PII / data-flow inventory listing each data class with a lawful-basis note AND a retention note (count of data classes missing either == 0); (3) data-residency and tenant-isolation decisions recorded as challenged one-way-door ADRs. All three sections present and non-empty. Because this artifact is security-authored, its non-author review is supplied via the independence fallback (red-team sub-persona or external-evaluator) where Cipher is the only security lens.
[AI · Vector] Infra plan acceptance lines — topology, IaC, rollback, observability, cost.
Acceptance: Plan specifies, each present and non-empty: environment topology (envs + boundaries), IaC approach (named tooling, infra-as-code not click-ops), deploy + rollback strategy (with rollback trigger criteria), observability with at least one numeric SLO per critical service, and a cost envelope (numeric monthly target/ceiling). Cross-reviewed by Cipher (not the infra author); where Cipher co-authored the security lens of this plan, the independence fallback supplies the additional non-author reviewer.
[AI · Cadence] Process plan acceptance lines — cadence, DoD, ENFORCED CI merge-gate.
Acceptance: Plan specifies: sprint cadence, a written Definition-of-Done, and a CI merge-gate design that is BLOCKING/enforced (explicitly NOT nightly/non-blocking — directly closing the honesty-doc gap). The merge-gate design names the checks it blocks on (tests, security scan, honesty-eval). Reviewed by Proof and Verdict (non-authors).
[AI · Verdict] Traceability acceptance matrix — zero orphans both directions.
Acceptance: Bidirectional matrix: every PRD functional requirement AND every NFR maps to >=1 architecture decision (zero orphan requirements), and every architecture decision maps to >=1 requirement (zero gold-plated/orphan decisions). Both orphan counts == 0 at convergence. An orphan PRD requirement with no architectural home is a BLOCKER per the rubric; a gold-plated/orphan decision is at least a Major.
[AI · Verdict] Loop-termination + stabilization record — anti-stall and HARD-STOP-CONSERVATIVE cap safeguard.
Acceptance: Record shows iteration count <= cap (default 4); the FINAL iteration is a stabilization pass that introduced ZERO new gaps (no-new-gap check passes); convergence cannot be declared on the same iteration that closed the last fix. Cap behavior is enforced PER TIER and recorded: every Major/Minor still open at the cap is dispositioned proceed-on-flagged-assumption (assumption text + sponsor-INPUT-or-silence note logged); every BLOCKER still open at the cap is HARD-STOPPED — it is surfaced to the human as an INPUT question (text logged) and its ledger cap-disposition is DOES-NOT-ADVANCE unless and until either a later evidence-backed iteration resolves it OR a human, non-AI-grantable risk-acceptance record exists for that specific Blocker. The record explicitly asserts: count of open Blockers proceeding on an AI-authored assumption == 0; the plan is NOT marked advanced past any Blocker lacking a resolution or a human risk-acceptance; silence defaults to DOES-NOT-ADVANCE, never auto-accept.
[Human] Blocker risk-acceptance record (the ONLY way a capped, unresolved Blocker advances) — human-authored, human-owned, non-AI-grantable, non-blocking.
Acceptance: If — and only if — present, the record names the SPECIFIC unresolved Blocker (one-way-door ADR id or unmitigated-control id), is human-authored and human-owned (the AI may PROPOSE/surface but cannot self-grant, assume, or back-fill it), and is logged as a ledger row referenced by the Blocker's cap-disposition field. Absent such a human-owned record within the input window, the system defaults that Blocker to DOES-NOT-ADVANCE (the plan does not advance past it) with the assumption flagged in the gap-ledger — never silently applied and never AI-granted. Acceptance = for every capped open Blocker, either a valid human-owned logged risk-acceptance exists OR the conservative DOES-NOT-ADVANCE default is correctly applied; count of Blockers advanced on an AI-granted/back-filled acceptance == 0.
[AI · Verdict] Convergence verdict — itself a spot-auditable receipt.
Acceptance: Verdict enumerates EVERY Objectives-rubric line, every NFR, every one-way-door ADR, and every (container,STRIDE) cell, each with a pass/fail marker + evidence pointer + raising/confirming reviewer identity + iteration number. Verdict states final status=SOLID only if gap-ledger shows 0 open Blockers/Majors, stabilization pass clean, all enumerated lines pass, and no capped Blocker is sitting on an AI-authored assumption (each is resolved, human-risk-accepted, or held at DOES-NOT-ADVANCE). A bare 'converged, N iterations' without the enumeration is rejected as not-a-receipt. Aseem/meta-auditor can re-check any 3 rows against their evidence pointers.
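
The gap-ledger and convergence rules in the deliverables above can be sketched as an append-only structure. This is a minimal sketch, assuming a reduced row schema; GapRow, GapLedger, and the sample ids are hypothetical, and a real ledger would carry the full field set listed in the gap-ledger acceptance.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GapRow:
    id: str
    severity: str                   # "Blocker" | "Major" | "Minor"
    status: str                     # "open" | "closed"
    raised_by: str
    iteration: int
    fix_evidence_pointer: str = ""  # git hash / ADR id / file path / red-team log id
    accepted_rationale: str = ""    # required for every Minor at convergence


class GapLedger:
    """Append-only: rows are never edited or deleted; a new row supersedes an old one."""

    def __init__(self) -> None:
        self._rows: List[GapRow] = []

    def append(self, row: GapRow) -> None:
        if row.status == "closed" and not row.fix_evidence_pointer:
            raise ValueError("closing a finding requires a fix-evidence pointer")
        self._rows.append(row)

    def history(self) -> tuple:
        return tuple(self._rows)

    def latest(self) -> Dict[str, GapRow]:
        current: Dict[str, GapRow] = {}
        for row in self._rows:      # later rows supersede earlier rows with the same id
            current[row.id] = row
        return current

    def converged(self, new_gaps_in_stabilization_pass: int) -> bool:
        rows = list(self.latest().values())
        open_hard = [r for r in rows
                     if r.status == "open" and r.severity in ("Blocker", "Major")]
        minors_ok = all(r.accepted_rationale for r in rows if r.severity == "Minor")
        return not open_hard and minors_ok and new_gaps_in_stabilization_pass == 0
```

Because closure is a new row rather than a status flip, the full history of each finding stays inspectable, which is what lets a reviewer re-derive the convergence count instead of trusting it.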
Questions the agent asks (5)
  • For any one-way-door decision the panel cannot resolve on evidence (e.g., cloud region for data residency, single-tenant vs pooled-with-isolation), which option do you want? Note: unlike a Major gap, an unresolved BLOCKER one-way-door at the cap will NOT proceed on an AI assumption — the plan will not advance past it until you answer or explicitly risk-accept it, so your input here is load-bearing.
  • Are there hard NFR targets you already require (latency, availability/SLO, RPO/RTO, max cost envelope) that the architecture must be held to, beyond what the PRD states?
  • Are there compliance, data-residency, or per-client-isolation constraints (regulatory regime, contractual BPO commitments) that must be treated as non-negotiable one-way-door ADRs?
  • Is there an approved dependency-license allowlist/denylist we must conform the SBOM to, or do we apply the existing open-source-license-policy default?
  • Do you want the iteration cap left at the default (4) before unresolved gaps hit cap behavior, or a different ceiling? And for any Blocker we surface at the cap, do you want to risk-accept it (your explicit, owned call) or hold the plan at that point until it is resolved?
Do (11)
  • Enforce the entry-criteria gate BEFORE iteration 1; refuse to start the loop on a half-baked plan so 'open gaps == 0' cannot be reached trivially.
  • Bind the severity rubric to the canonical assemble_team Independent-Review-Protocol taxonomy and reuse Blocker/Major/Minor VERBATIM — do not invent or rename tiers for this loop.
  • Keep reviewers non-authors: Verdict reviews everything, specialists review OTHERS' artifacts, Cipher and Vector cross-review each other's lens; record each reviewer identity in the ledger.
  • When the only qualified lens-holders for an artifact are its own authors, invoke the assemble_team independence fallback (red-team sub-persona or external-evaluator escalation) so the non-author lens exists, and record that identity.
  • Give reviewers the artifact + rubric only; collect blind-first independent findings before any cross-reading; preserve disagreements verbatim.
  • Apply cap behavior PER TIER: at the cap, let Major/Minor gaps proceed-on-a-flagged-assumption, but HARD-STOP every unresolved Blocker (one-way-door or unmitigated security control) so the plan does NOT advance past it.
  • For each capped Blocker, surface it to the human as an INPUT question and either get a human-authored, human-owned risk-acceptance for that specific Blocker or hold the plan at DOES-NOT-ADVANCE; default silence to does-not-advance.
  • Require every NFR target / mitigation / one-way-door acceptance to cite a concrete basis or be explicitly tagged ASSUMPTION.
  • Make the red-team DEFEAT each claimed STRIDE mitigation, not merely confirm its presence; log the bypass attempt and outcome.
  • Run the no-new-gap stabilization pass as the last iteration and only then declare SOLID.
  • Make the convergence verdict enumerate every rubric/NFR/ADR/STRIDE cell with evidence pointers so it is itself spot-auditable.
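
The red-team item above (defeat each claimed STRIDE mitigation, not merely confirm its presence) implies a coverage check over every (container, STRIDE) cell plus a quality check per mitigation. A minimal sketch follows; Mitigation, stride_gaps, and the gap labels QUALITY and UNTESTED are hypothetical, while the BLOCKER cases mirror the threat-model acceptance criteria.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

STRIDE = ("Spoofing", "Tampering", "Repudiation",
          "Information disclosure", "Denial of service", "Elevation of privilege")


@dataclass(frozen=True)
class Mitigation:
    container: str
    category: str
    control: str
    protected_asset: str
    residual_risk_note: str
    defeat_outcome: Optional[str] = None   # "held", "bypassed", or None if never attempted


def stride_gaps(containers: List[str], mitigations: List[Mitigation]) -> List[Tuple[str, str]]:
    """Return (kind, message) gaps; an empty list means coverage and quality both hold."""
    cells: Dict[Tuple[str, str], List[Mitigation]] = {}
    for m in mitigations:
        cells.setdefault((m.container, m.category), []).append(m)
    gaps: List[Tuple[str, str]] = []
    for container in containers:
        for category in STRIDE:
            cell = cells.get((container, category), [])
            where = container + "/" + category
            if not cell:
                gaps.append(("BLOCKER", where + ": no mitigation"))
            for m in cell:
                if not (m.control and m.protected_asset and m.residual_risk_note):
                    gaps.append(("QUALITY", where + ": mitigation is missing a required field"))
                if m.defeat_outcome is None:
                    gaps.append(("UNTESTED", where + ": no defeat-attempt logged"))
                elif m.defeat_outcome == "bypassed":
                    gaps.append(("BLOCKER", where + ": mitigation bypassed by red-team"))
    return gaps
```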
Don't (11)
  • Do NOT let any author grade their own artifact, and do NOT let one reviewer's 'LGTM' cascade into the others' findings.
  • Do NOT let a capped, unresolved BLOCKER proceed on an AI-authored / flagged assumption — that path is reserved for Major/Minor; a Blocker advances only via human risk-acceptance or a later evidence-backed resolution.
  • Do NOT let the AI self-grant, assume, or back-fill a Blocker risk-acceptance; that record is human-authored and human-owned only.
  • Do NOT treat human silence on a capped Blocker as acceptance; silence defaults to DOES-NOT-ADVANCE (the plan holds at that Blocker), never auto-proceed.
  • Do NOT rename, re-scope, or weaken the Blocker/Major/Minor tiers mid-loop; the canonical assemble_team taxonomy is fixed for the loop.
  • Do NOT accept a bare 'converged / solid / N iterations' without the enumerated, evidence-pointered verdict.
  • Do NOT count coverage proxies (>=1 mitigation per cell) as sufficient without the quality bar (named control + protected asset + residual-risk + defeat-attempt).
  • Do NOT declare convergence on the same iteration that closed the last fix; a clean stabilization pass is required.
  • Do NOT certify any artifact with fewer than 2 distinct NON-AUTHOR reviewer identities; if authors are the only lens, the independence fallback must supply one.
  • Do NOT insert a human approval/sign-off step anywhere; the human provides input (including the optional Blocker risk-acceptance) and is never a gate: every part of the plan not held by a capped Blocker proceeds without waiting on the human.
  • Do NOT leave the CI merge-gate designed as nightly/non-blocking; the plan must make it blocking/enforced.
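
The independence rules in the lists above (no author grades their own artifact, findings are filed blind-first, at least 2 distinct non-author identities per artifact) can be sketched as an attestation check. This is an illustrative sketch: Review, independence_violations, and the timestamp fields are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass(frozen=True)
class Review:
    reviewer: str
    filed_at: datetime                         # when this reviewer filed their findings
    first_read_others_at: Optional[datetime]   # None if they never read other findings


def independence_violations(artifact_authors: Set[str], reviews: List[Review]) -> List[str]:
    """Return the reasons an attestation is invalid; an empty list means it holds."""
    problems = []
    for r in reviews:
        if r.reviewer in artifact_authors:
            problems.append(r.reviewer + " authored the artifact under review")
        # Blind-first: findings must be filed strictly before any cross-reading.
        if r.first_read_others_at is not None and r.first_read_others_at <= r.filed_at:
            problems.append(r.reviewer + " read other findings before filing")
    non_authors = {r.reviewer for r in reviews if r.reviewer not in artifact_authors}
    if len(non_authors) < 2:
        problems.append("fewer than 2 distinct non-author reviewers; invoke the independence fallback")
    return problems
```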
Guardrails (10)
  • Honesty architecture: every 'pass/solid/converged' claim must carry a real receipt (reviewer identity, gap-ledger row, numeric NFR target, threat-model coverage + defeat-attempt log, git hash / ADR id); unbacked claims are rejected.
  • HARD-STOP-CONSERVATIVE ON BLOCKERS: at the iteration cap, an unresolved Blocker (one-way-door decision OR unmitigated security/privacy control) does NOT proceed on an AI assumption — it surfaces to the human as INPUT and the plan does NOT advance past that specific Blocker until it is resolved by evidence in a later iteration OR a human, non-AI-grantable risk-acceptance exists; Major/Minor may proceed-on-flagged-assumption, Blocker may not. This matches the security_assessment / review_release_readiness / hypercare-exit cap behavior (a real critical stays held regardless of human-input state; silence = not-accepted).
  • BLOCKER RISK-ACCEPTANCE IS HUMAN-OWNED ONLY: the only override of a capped Blocker's DOES-NOT-ADVANCE default is a human-authored, human-owned, logged risk-acceptance naming that specific Blocker; the AI may propose/surface but never self-grant, assume, or back-fill it; silence defaults to does-not-advance.
  • CANONICAL SEVERITY TAXONOMY: the loop uses the assemble_team Independent-Review-Protocol Blocker/Major/Minor tiers verbatim; redefining 'material' or renaming tiers mid-loop is forbidden.
  • AI-owned, no human sign-off: ownership is AI; the human touchpoint is INPUT (answers/assumptions, plus the optional Blocker risk-acceptance), recorded as such; the non-Blocker plan never blocks on the human.
  • Independence is mandatory and auditable: a converged verdict with fewer than 2 distinct non-author reviewer identities (after the author-only-lens fallback), or with no blind-first attestation, is invalid.
  • Tenant/per-client isolation MUST be treated as a first-class challenged one-way-door ADR (BPO isolation pillar) and therefore a Blocker if unresolved; architecture that does not establish isolation cannot converge.
  • Iteration cap (default 4) is a tripwire, not a license to lower the bar: at the cap, Majors/Minors may proceed-on-assumption but Blockers escalate to human INPUT and are never auto-accepted or AI-assumed.
  • Gap-ledger is append-only; closing a finding requires a fix-evidence-pointer, not a status flip; a capped Blocker requires a cap-disposition of RESOLVED-LATER, HUMAN-RISK-ACCEPTED-<ref>, or DOES-NOT-ADVANCE.
  • The convergence verdict must be re-checkable by a skeptic (Aseem/meta-auditor) on any spot-checked row; a verdict that cannot be spot-audited fails this guardrail.
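
The spot-audit guardrail above can be sketched as a sampling check over the enumerated verdict rows. This is a minimal sketch under stated assumptions: the row keys, is_receipt, spot_check, and the resolves callback (which stands in for actually opening an evidence pointer) are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional

REQUIRED_FIELDS = ("line", "marker", "evidence", "reviewer", "iteration")


def is_receipt(rows: List[Dict[str, str]]) -> bool:
    """A bare 'converged' with no enumerated, fully populated rows is not a receipt."""
    return bool(rows) and all(all(row.get(k) for k in REQUIRED_FIELDS) for row in rows)


def spot_check(rows: List[Dict[str, str]], resolves: Callable[[str], bool],
               k: int = 3, seed: Optional[int] = None) -> List[str]:
    """Sample k rows and return the lines whose evidence pointer does not resolve."""
    sample = random.Random(seed).sample(rows, min(k, len(rows)))
    return [row["line"] for row in sample if not resolves(row["evidence"])]
```

Sampling is only a deterrent: a clean sample of three rows does not prove the rest, but any fabricated row risks being drawn, which is the property the guardrail relies on.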
11

Release Plan

AI Agent: Cadence

Cadence and Vela carve the reviewed MVP spec into a value-sequenced set of sprints — quick wins first to prove the system and de-risk, foundational and strategic bets sequenced behind their prerequisites — where every in-scope spec/PRD requirement maps to exactly one sprint (full coverage, zero orphans, zero double-maps), each sprint has a single demoable goal, a named story set, an explicit capacity/velocity assumption, and a defined entry/exit, and every cross-sprint and external/human-input dependency forms an acyclic, time-consistent graph (each producing sprint precedes its consumer). The requirement-to-sprint coverage matrix is not a throwaway table: it is declared as the Release-Scope Register — the single, NAMED, VERSIONED, single-location, AUTHORITATIVE-and-MUTABLE record of release scope from which sprint_retro's Scope-Burndown and Done-Done receipts are DERIVED (never re-typed), and which is RE-SEQUENCED-IN-PLACE (new version) after each sprint's accomplishments-vs-goals review. This register is the explicit owner of release scope, which is precisely what makes the outer sprint-loop exit deterministic: the loop's PROCEED/LOOP decision reads the register's current version, so there is exactly one place that says what release scope is and whether it is exhausted — closing the unspecified-owner gap that otherwise makes the outer-loop exit non-deterministic. Risks are owned and placed — each carries likelihood/impact, a named owning agent, and either a mitigation that lands in a specific sprint or an explicit accept-decision — and are written into the program RAID log rather than a throwaway list. This is a reviewable PLANNING artifact, distinct from the engineering bootstrap: it answers "what we build, in what order, and why that order" — never "how we stand up the repo/CI." 
Because the plan's completeness claims (100% coverage, acyclic graph, every risk owned-and-placed, register is the single authoritative source) are checkable receipts, the brick ends in a LIGHT but hard-to-fake independent review-iterate pass: an independent panel that did NOT author the plan (Verdict lead, plus Keystone for architecture-vs-dependency consistency and Proof for per-sprint demoability/testability) runs a binary checklist and logs material gaps, Cadence/Vela fix them, and the loop iterates until zero open material gaps remain and a convergence verdict is recorded citing the specific receipts. The sponsor is a continuous input source for priority and data/access availability, surfaced as questions and never as an approval gate; where the human is silent the team proceeds on an explicit flagged assumption. The plan is a living instrument re-sequenced after each sprint's accomplishments-vs-goals review, closing the lifecycle loop through the register.
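
The completeness claims above (100% coverage, zero orphans, zero double-maps, time-consistent dependencies) are mechanical enough to sketch as checks. This is an illustrative sketch, not the register's actual format: register_violations, dependency_violations, and the requirement ids are hypothetical, and dependencies are simplified to (producing sprint, consuming sprint) pairs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def register_violations(in_scope: Set[str],
                        mapping: List[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Check the requirement-to-sprint mapping: every requirement in exactly one sprint."""
    sprints_by_req: Dict[str, Set[int]] = defaultdict(set)
    for requirement, sprint in mapping:
        sprints_by_req[requirement].add(sprint)
    return {
        "orphans": sorted(in_scope - set(sprints_by_req)),
        "double_maps": sorted(r for r, s in sprints_by_req.items() if len(s) > 1),
        "not_in_scope": sorted(set(sprints_by_req) - in_scope),
    }


def dependency_violations(edges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Each edge is (producing sprint, consuming sprint).

    Requiring producer < consumer on every cross-sprint edge makes the graph
    time-consistent, and a graph whose edges all point forward cannot contain a cycle.
    """
    return [(producer, consumer) for producer, consumer in edges if producer >= consumer]
```

A register passes in this sketch only when all three violation lists are empty and dependency_violations returns an empty list, which corresponds to the coverage count equality the review panel re-derives.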

Deliverables
[AI · Cadence] Value-sequenced Release Plan (sprint list, each with a single demoable goal, named story set, capacity assumption, entry/exit)
Acceptance: The plan lists N ordered sprints; each sprint has exactly one demoable goal, a named set of stories, an explicit capacity/velocity assumption, and a defined entry condition and exit/Definition-of-Done; a documented value-sequencing rationale states why each sprint is placed where it is (quick-wins-first to prove and de-risk, foundational/strategic bets behind their prerequisites). Checkable: every sprint entry contains all five fields (goal / stories / capacity / entry / exit) and the rationale section is non-empty and references value, not build-team convenience — verifiable by inspecting the plan, with zero sprints missing any field.
[AI · Vela] Release-Scope Register — the named, versioned, single-location AUTHORITATIVE requirement-to-sprint coverage matrix (100% coverage, zero orphans, zero double-maps) declared as the canonical mutable source of release scope
Acceptance: BINARY: a single artifact with a unique name and a version identifier exists, located in exactly one canonical place, and is explicitly labelled the authoritative Release-Scope Register (no second competing scope list exists). Every in-scope spec/PRD requirement appears in it mapped to exactly one sprint: count(requirements mapped) == count(in-scope spec requirements) (coverage = 100%, zero orphan requirements); no requirement maps to two or more sprints (zero double-maps); each mapped requirement traces to at least one named story in its sprint and carries a scope-state field (in-scope / out-of-scope-deferred / accepted-cut). The register declares, in-artifact, that sprint_retro's Scope-Burndown and Done-Done receipts are DERIVED FROM it (not re-typed) and that it is re-versioned after each sprint. Receipt: the register file with version tag, the coverage count equality, and the explicit downstream-consumer + re-versioning declaration. Reject if scope lives in more than one place, if any requirement is unmapped/double-mapped, or if the authoritative/mutable/single-source declaration is absent.
[AI · Vela] Register-governance contract: the read/write/version rules binding the Release-Scope Register to the outer sprint-loop exit
Acceptance: BINARY: a written contract specifies (1) the register is the ONLY source of release-scope truth: Scope-Burndown (remaining) and Done-Done (terminal) receipts in sprint_retro are computed from the register's current version and MUST reconcile to it; (2) the register is MUTABLE only via a versioned re-sequence after a sprint's accomplishments-vs-goals review (each change increments the version and records what moved and why); (3) a named owning agent (Vela) is accountable for the register's integrity at each version. Checkable: the contract names the single source, names the two derived receipts, names the version-bump trigger, and names the owning agent; the outer-loop PROCEED/LOOP rule is stated as a read of the register's current version, with PROCEED only when scope-remaining == EMPTY. Reject if the exit's scope source is left implicit, if any second scope list is permitted, or if the owning agent for the register is unnamed.
[AI · Cadence] Story/sprint dependency graph (acyclic, time-consistent) with external/human-input dependencies flagged as assumptions
Acceptance: A directed dependency graph at story/sprint level is produced and passes an acyclicity check (zero cycles) AND a time-consistency check (for every cross-sprint dependency, the producing sprint is ordered strictly before its consuming sprint). Every external dependency (sponsor-supplied data, third-party/IdP/connector access) is flagged as an explicit assumption carrying a needed-by date and a date-or-proceed-on-assumption rule. Receipt: the graph-check output (cycles=0, time-violations=0) and the flagged-assumptions list.
[AI · Cipher] Risk entries written to the program RAID log (each with likelihood/impact, owning agent, and mitigation-sprint-or-accept)
Acceptance: Every identified delivery/security/dependency risk is an entry in the RAID log, not a standalone list. Each entry carries: likelihood, impact, a single named owning agent, and EITHER a mitigation action assigned to a specific sprint OR an explicit documented accept-decision. Receipt-checkable: count(risks with all four fields populated) == count(risks) (zero risks unowned, unplaced, or unscored). Cipher confirms security-class risks are represented.
[AI · Verdict] Independent Convergence Verdict (binary checklist, every PASS citing a specific receipt, written gaps log, iteration count)
Acceptance: An independent panel — Verdict (lead), Keystone (architecture-vs-dependency consistency), Proof (each sprint goal is demoable/testable) — none of whom co-authored the plan, runs a binary pass/fail checklist covering: (1) coverage = 100% with zero orphans/double-maps [cites the Release-Scope Register], (2) the register is the single named-and-versioned authoritative scope source with the documented downstream-consumer/re-versioning governance contract and no competing scope list [cites the register + governance contract], (3) dependency graph acyclic AND time-consistent [cites the graph-check output], (4) every sprint has goal+stories+capacity+entry/exit [cites plan rows], (5) every risk owned, scored, and placed in RAID [cites RAID entries], (6) value-sequencing rationale present and value-justified, (7) each sprint goal binary-demoable [Proof attests]. Each PASS MUST cite the specific receipt; a 'looks fine' with no cited evidence is a REJECTED verdict — and for the register, the reviewer RE-DERIVES the coverage count rather than accepting the author's stated total. A written gaps log records every material gap; Cadence/Vela fix; the panel re-reviews; the loop iterates until zero open material gaps. The recorded verdict states 'converged', cites the checklist + receipts, and notes the iteration count. Checkable: verdict status == converged AND open-material-gaps == 0 AND every checklist PASS has a non-empty cited receipt.
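The two graph checks named in the deliverables above (acyclicity and time-consistency) can be sketched as follows. This is an illustration under stated assumptions: stories are keyed by ID, each story carries a sprint number, and dependencies are (producer, consumer) pairs. The function name and data shapes are hypothetical. Acyclicity is tested with Kahn's topological sort; same-sprint dependencies are allowed, while a producer scheduled in a later sprint than its consumer counts as a time violation.

```python
from collections import defaultdict, deque


def check_dependency_graph(story_sprint, edges):
    """story_sprint: {story_id: sprint_number}; edges: [(producer, consumer)].

    Every story named in an edge must also appear in story_sprint.
    """
    indegree = {story: 0 for story in story_sprint}
    consumers = defaultdict(list)
    for producer, consumer in edges:
        consumers[producer].append(consumer)
        indegree[consumer] += 1

    # Kahn's algorithm: repeatedly remove stories with no unmet prerequisites.
    queue = deque(s for s, degree in indegree.items() if degree == 0)
    ordered = 0
    while queue:
        node = queue.popleft()
        ordered += 1
        for nxt in consumers[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Stories that never reach indegree 0 sit on, or downstream of, a cycle.
    unorderable = len(story_sprint) - ordered
    time_violations = [
        (p, c) for p, c in edges if story_sprint[p] > story_sprint[c]
    ]
    return {
        "acyclic": unorderable == 0,
        "unorderable_stories": unorderable,
        "time_violations": time_violations,
    }
```

The returned dictionary is the sort of output that could be stored as the graph-check receipt; external dependencies would be modelled as extra nodes flagged as assumptions.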
Questions the agent asks (5)
  • Which outcomes are most urgent for you — what are the quick wins where visible value in the first one to two sprints matters most, so we sequence those first?
  • When will the data, third-party/IdP, or connector access that some sprints depend on become available? (If you are silent, we will proceed on a flagged assumption and order those sprints accordingly.)
  • Are there any hard external deadlines (a demo, a board date, a compliance window) that should constrain the sequence?
  • Are there requirements you consider out-of-scope for the MVP that we should exclude from coverage, or stretch goals to park in a later sprint? (We record each as an explicit scope-state in the Release-Scope Register so nothing silently disappears.)
  • Is there a team-capacity or availability constraint we should assume for velocity (team size, parallelism, blackout periods)?
Do (11)
  • Sequence by VALUE: quick wins first to prove the system and de-risk early; foundational/strategic bets sequenced behind their prerequisites; document the sequencing rationale explicitly.
  • Map every in-scope spec/PRD requirement to exactly one sprint, and trace each to at least one named story (full coverage, no orphans, no double-maps) — captured in the Release-Scope Register.
  • Declare the coverage matrix as the Release-Scope Register: a single, named, versioned, single-location, authoritative-and-mutable record of release scope — and state in-artifact that sprint_retro's Scope-Burndown and Done-Done receipts are DERIVED from it, never re-typed.
  • Write the register-governance contract: name the single scope source, the two derived receipts, the version-bump trigger (re-sequence after each accomplishments-vs-goals review), the owning agent (Vela), and the outer-loop exit rule reading the register's current scope-remaining.
  • Re-version the register IN PLACE after each sprint — record what moved and why — so the living plan and the exit decision always read the same current scope.
  • Give every sprint a single demoable goal, a named story set, an explicit capacity/velocity assumption, and a defined entry/exit.
  • Build the story/sprint dependency graph and verify it is acyclic and time-consistent (every producer precedes its consumer).
  • Flag external and human-input dependencies (sponsor data, third-party/IdP/connector access) as explicit assumptions with a needed-by date and a date-or-proceed-on-assumption rule; record them as external dependencies in the graph.
  • Write every risk into the program RAID log with likelihood, impact, a named owning agent, and a mitigation-sprint-or-explicit-accept.
  • Surface priority/urgency and data-availability questions to the sponsor continuously; where the human is silent, proceed on an explicit flagged assumption recorded in the plan and graph — never block.
  • Have the independent panel (Verdict + Keystone + Proof, none of whom authored the plan) run the binary checklist, RE-DERIVE the coverage count from the register, cite a receipt per PASS, log gaps, and iterate to a cited-receipt convergence verdict.
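The register rules in the list above (versioned re-sequence with a recorded reason, one named owner, and an exit decision that reads only the current version) can be sketched as a small data structure. This is a hypothetical model, not the process's actual storage format: the class names, the `done` flag, and the row shape are assumptions, and PROCEED is taken to mean "release scope exhausted".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegisterVersion:
    version: int
    scope: dict          # requirement_id -> {"sprint": int, "state": str, "done": bool}
    change_note: str     # what moved and why
    owner: str = "Vela"  # named owning agent


@dataclass
class ReleaseScopeRegister:
    history: list = field(default_factory=list)

    def current(self):
        return self.history[-1]

    def resequence(self, scope, change_note):
        """Append a new version; silent edits (no reason given) are refused."""
        if not change_note.strip():
            raise ValueError("every re-sequence must record what moved and why")
        version = RegisterVersion(len(self.history) + 1, dict(scope), change_note)
        self.history.append(version)
        return version

    def scope_remaining(self):
        """Scope-Burndown derived from the current version, never re-typed."""
        rows = self.current().scope
        return sorted(
            req for req, row in rows.items()
            if row["state"] == "in-scope" and not row["done"]
        )

    def loop_decision(self):
        return "PROCEED" if not self.scope_remaining() else "LOOP"
```

Because `loop_decision` reads nothing but the latest version, there is only one place that can answer whether release scope is exhausted.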
Don't (13)
  • Do NOT sequence by technical or build-team convenience — sequence by value delivery and strategic leverage.
  • Do NOT produce a flat backlog with no single demoable per-sprint goal.
  • Do NOT leave any requirement unmapped or mapped to more than one sprint.
  • Do NOT let release scope live in more than one place — there is ONE named, versioned, authoritative Release-Scope Register; any second scope list is a defect, because two sources make the outer-loop exit non-deterministic.
  • Do NOT let downstream receipts (Scope-Burndown, Done-Done) re-type or diverge from the register — they are derived from it and must reconcile to its current version.
  • Do NOT mutate scope informally — every scope change is a versioned re-sequence with a recorded reason and a named owner (Vela), not a silent edit.
  • Do NOT let a plan author (Cadence or Vela) review or sign off their own plan — the review panel must be independent, and the coverage count must be re-derived by the reviewer, not accepted from the author.
  • Do NOT mark the plan 'reviewed' or 'solid' without a written gaps log and a convergence verdict whose every PASS cites a specific receipt — an assertion with no cited evidence is a rejected verdict.
  • Do NOT ship a dependency graph with a cycle or a producer ordered after its consumer.
  • Do NOT leave a risk unowned, unscored, or unplaced, or keep risks in a standalone list instead of the RAID log.
  • Do NOT block on sponsor approval of the plan — the sponsor provides input, never a gate; proceed on flagged assumptions where the human is silent.
  • Do NOT conflate this with the engineering bootstrap (repo/CI/infra) — this brick is sequencing and planning only.
  • Do NOT treat the plan as frozen — it is re-sequenced (re-versioned) as sprints close.
Guardrails (9)
  • GATE: the plan does not enter sprint execution until the independent pass logs ZERO open material gaps with a cited-receipt convergence verdict from a panel (Verdict + Keystone + Proof) where no reviewer co-authored the plan.
  • Single-source-of-scope is law: there is exactly ONE named, versioned, authoritative Release-Scope Register; the outer sprint-loop exit (sprint_retro PROCEED/LOOP) reads ITS current version, and Scope-Burndown + Done-Done receipts are derived from it — making the loop exit deterministic. A second scope list or an implicit exit source is invalid.
  • Coverage is a hard receipt re-derived at review: 100% of in-scope spec requirements mapped to exactly one sprint (zero orphans, zero double-maps), the count re-computed by the independent reviewer from the register, not accepted from the author.
  • Register mutability is governed: scope changes only via a versioned re-sequence after a sprint's accomplishments-vs-goals review, each version recording what moved, why, and under Vela's named ownership — never a silent edit.
  • The dependency graph must pass the acyclicity AND time-consistency checks (cycles=0, time-violations=0) before convergence; external/human-input dependencies are flagged assumptions, never silent gaps.
  • Every risk is owned, scored, and placed in the RAID log with a mitigation-sprint-or-accept; no rubber-stamped or floating risks.
  • Honesty architecture: every completeness claim (coverage, acyclicity, risk ownership, single-source register) is a checkable receipt; the review verdict is INVALID unless each PASS cites its specific receipt and the coverage count is re-derived.
  • Human-as-input-not-gate: the sponsor is solicited for priority and data/access availability but never blocks; silence is handled by an explicit flagged assumption, not a stall.
  • Living-artifact: the register and plan are re-versioned after each sprint's accomplishments-vs-goals independent review, closing the lifecycle loop through the single authoritative scope source.
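The convergence rule used throughout this brick (status is converged, zero open material gaps, and every PASS cites a receipt) reduces to a short predicate. The sketch below assumes a hypothetical verdict record shaped as a dictionary with `status`, `checklist`, and `gaps` keys; those field names are illustrative and not defined by the source.

```python
def evaluate_verdict(verdict):
    """Return whether a recorded verdict satisfies the convergence rule."""
    checklist = verdict["checklist"]
    # Any non-PASS item means the panel has not converged.
    failed = [item["id"] for item in checklist if item["result"] != "PASS"]
    # A PASS with no cited receipt is treated as a rejected assertion.
    uncited = [
        item["id"] for item in checklist
        if item["result"] == "PASS" and not (item.get("receipt") or "").strip()
    ]
    open_gaps = [
        gap["id"] for gap in verdict["gaps"]
        if gap["material"] and gap["status"] == "open"
    ]
    valid = (
        verdict["status"] == "converged"
        and not failed
        and not uncited
        and not open_gaps
    )
    return {"valid": valid, "failed": failed, "uncited": uncited, "open_gaps": open_gaps}
```

A second independent actor spot-checking Verdict could run the same predicate over the stored verdict and then re-open any cited receipt to confirm it resolves.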
12

Sprint 0: Walking Skeleton & Pipeline Bootstrap

AI Agent: Vector

Before a single feature is built, prove the production delivery road exists and is honest-by-receipt: a trivial change must traverse build -> tests -> SAST/SCA/secret-scan -> SBOM(attested) -> deploy(dev/test) -> post-deploy smoke, with EVERY gate wired as a REQUIRED, BLOCKING status check on a protected branch, evidenced by an immutable CI run ID. The done-condition is artifact-anchored, never asserted: the brick is only complete when BOTH a green-path receipt (the good trivial change passes all stages) AND five red-path receipts (each of five planted-defect changes is independently BLOCKED with a failing run ID) are attached. This brick also stands up the honesty-architecture substrate the entire downstream process depends on — a receipt/ledger store and receipt schema (run ID, git SHA, gate verdicts, scan summaries, SBOM digest, reviewer verdict) so every later "done/passed/live" claim has an honest place to anchor. Vector owns the pipeline + environments + deploy/rollback; Mason owns the repo skeleton, coding standards, and machine-readable test harness; Cipher owns SAST/SCA/secret-scanning + attested SBOM + CI identity/secrets hardening; Iris establishes a versioned design-system baseline wired into the same blocking CI. Critically, Vector does NOT get to declare the pipeline "enforced": the brick feeds a paired Independent Review & Iterate loop (Verdict + non-author specialists + an adversarial red-team merge attempt) that converges, with evidence, on a verdict before Sprint 0 closes. Where the human's brief leaves choices open (cloud target, language, severity thresholds), the team proceeds on an explicit, flagged assumption logged as a receipt and surfaces it to the human as input — never blocking on approval.
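The receipt schema described above lists six slots: run ID, git SHA, gate verdicts, scan summaries, SBOM digest, and reviewer verdict. One possible shape is sketched below as a frozen dataclass. The class name, the field types, the "pass" verdict string, and the SHA-256 fingerprint are all assumptions added for illustration; the source does not specify how receipts are serialized or made tamper-evident.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Receipt:
    run_id: str
    git_sha: str
    gate_verdicts: dict                      # stage name -> "pass" / "fail"
    scan_summaries: dict = field(default_factory=dict)
    sbom_digest: str = ""
    reviewer_verdict: str = ""               # filled in by the independent review

    def fingerprint(self):
        """Stable SHA-256 over the canonical JSON form (hypothetical integrity aid)."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def all_gates_green(self):
        """True only if at least one gate is recorded and every gate passed."""
        return bool(self.gate_verdicts) and all(
            verdict == "pass" for verdict in self.gate_verdicts.values()
        )
```

Storing the fingerprint next to each ledger row would let a reviewer detect a receipt edited after the fact, which is one way to support the "never asserted" done-condition.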

Deliverables
[AI · Vector] Bootstrapped CI/CD pipeline executing build -> test -> SAST -> SCA -> secret-scan -> SBOM -> deploy -> post-deploy smoke as ordered, named stages, plus the green-path receipt
Acceptance: A single trivial change merged to the protected branch produces ONE immutable CI run whose run ID is recorded in the ledger; the run shows every named stage green; the run ID, git SHA, and per-stage verdicts are written as a receipt. Re-running the SAME git SHA reproduces an identical pass/fail verdict (two run IDs, identical result) — determinism receipt attached.
[AI · Vector] Branch-protection / merge-gate enforcement state captured as a checkable receipt
Acceptance: The branch-protection API GET output (or an equivalent platform config export) is stored in the ledger and shows: all pipeline gates set as REQUIRED status checks, admin/bypass disabled, force-push disabled, signed commits required, and required-review setting present. A planted merge that skips a required check is rejected by the platform (rejection evidence stored).
[AI · Vector] Real dev/test environment with a non-mocked deploy of the trivial change, a post-deploy smoke gate, and proven rollback
Acceptance: The trivial change is actually deployed to a real dev/test environment; a post-deploy health/smoke check runs as a BLOCKING gate and returns healthy (HTTP/health receipt stored). A deliberate broken deploy triggers rollback to the prior good revision, evidenced by deploy + rollback run IDs and a post-rollback healthy smoke receipt.
[AI · Mason] Repo skeleton + coding standards with lint/format/type-check wired as BLOCKING CI gates
Acceptance: Repository initializes, builds reproducibly, and pinned toolchain + lockfiles are committed. Lint, format, and type-check run as required CI checks; a change violating each is blocked (failing run ID per violation). Pinned-version + lockfile presence verifiable in repo.
[AI · Mason] Test harness producing machine-readable test counts consumed by the pipeline
Acceptance: `make test` (or equivalent) runs the suite and emits a machine-readable report (e.g. JUnit XML/JSON) with pass/fail counts; the CI test gate parses it and BLOCKS on any failure. At least one passing test exists; the planted failing-test change blocks the pipeline with a failing run ID and the parsed count appears in the receipt.
[AI · Cipher] SAST + SCA + secret-scanning configured as REQUIRED blocking checks with a documented severity/vuln-gate threshold policy
Acceptance: Each scanner runs in CI as a required status check with a written threshold policy (e.g. block on High/Critical). Planted defects — a high-severity SAST finding, a known-CVE dependency, and a committed test secret — EACH independently block merge/deploy, evidenced by three distinct failing run IDs. Secret-scanner baseline/allowlist policy is documented and committed.
[AI · Cipher] Per-build SBOM (CycloneDX or SPDX) attached as an immutable run artifact, attested, and CONSUMED by the SCA vuln gate
Acceptance: Every build emits an SBOM in CycloneDX or SPDX, attached to the run with a recorded digest in the receipt; the SBOM is signed/attested (e.g. cosign / SLSA / in-toto provenance) and the attestation verifies. The SCA vuln gate is keyed to the SBOM contents. The planted removed-SBOM-step change BLOCKS the pipeline with a failing run ID.
[AI · Cipher] CI identity & secrets hardening: keyless/OIDC for deploy & signing, least-privilege runner identity, no long-lived admin secrets
Acceptance: Pipeline auth config export shows deploy/signing use keyless OIDC (or a documented, flagged exception) and the runner identity is scoped least-privilege; a grep/audit receipt confirms no long-lived admin secret is stored in repo or CI config. Config receipt stored in ledger.
[AI · Iris] Versioned design-system baseline (tokens + a built component library) published as a versioned artifact with a visual/build check wired into the same blocking CI
Acceptance: Design tokens + a minimal component library build into a versioned artifact (semver tag/package). A build (and visual-regression where available) check for the design system runs as a required CI status check; a change that breaks the component build BLOCKS the pipeline with a failing run ID. Artifact version recorded in the ledger.
[AI · Vector] Honesty-architecture substrate: a receipt/ledger store + published receipt schema that downstream bricks write to
Acceptance: A ledger/receipt store exists and is reachable; the documented receipt schema includes run ID, git SHA, gate verdicts, scan summaries, SBOM digest, and reviewer verdict slots. All Sprint-0 deliverables above have at least one real receipt row written to it. Schema doc is committed and version-tagged.
[AI · Proof] Negative-control suite: five planted-defect changes each independently BLOCKED, with both green-path and red-path receipts collated
Acceptance: Five changes — (1) failing unit test, (2) high-severity SAST finding, (3) committed test secret, (4) known-CVE dependency, (5) removed SBOM step — are each pushed and each BLOCKED from merge/deploy, evidenced by five distinct failing CI run IDs; the matching trivial-good change passes with its own run ID. All six run IDs collated into one negative-control evidence receipt.
[AI · Verdict] Pipeline Independent Review & Iterate convergence verdict (non-author panel + adversarial red-team), citing the green + five red run IDs and the branch-protection receipt
Acceptance: Verdict (independent_reviewer), non-author specialists (Cipher reviews Vector's deploy gating, Vector/Lens review Cipher's security gates, Proof reviews the test-harness honesty), and an adversarial red-team pass that TRIES to merge/deploy a bad change together critique the pipeline against the objectives and log gaps; the author fixes them; and the loop ITERATES until no material gap remains open. The final verdict document records zero open material gaps; cites the green run ID, the five failing run IDs, the branch-protection API output, the SBOM attestation verification, and the deploy+rollback receipts; and confirms the receipt/ledger substrate exists. The brick is NOT done until this converged verdict is logged.
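The test-harness deliverable above asks for a machine-readable report that the CI gate parses and blocks on. A minimal sketch of such a gate follows, assuming the commonly used JUnit-style XML layout in which each `testsuite` element carries `tests`, `failures`, `errors`, and `skipped` attributes. That layout is a convention rather than a formal standard, nested suites are not handled, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_junit_counts(xml_text):
    """Sum the count attributes across top-level test suites."""
    root = ET.fromstring(xml_text)
    # The root is either a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    totals["passed"] = (
        totals["tests"] - totals["failures"] - totals["errors"] - totals["skipped"]
    )
    return totals


def gate_verdict(totals):
    """Block on any failure or error, and on an empty suite (nothing was proven)."""
    if totals["tests"] == 0:
        return "block"
    if totals["failures"] or totals["errors"]:
        return "block"
    return "pass"
```

The parsed totals are what would be written into the receipt, so the reviewer can re-derive the counts from the stored report instead of trusting a summary line.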
Questions the agent asks (6)
  • What is the target cloud/runtime for the dev/test (and eventual production) environment — or should the team proceed on a flagged default and you correct later?
  • What primary language/stack should the skeleton use, or do you want the team to choose based on the product brief and log it as an assumption?
  • What severity threshold should block the pipeline (e.g. block on High and Critical), and are there any known dependencies/findings you want pre-allowlisted?
  • Are there compliance or data-residency constraints (e.g. region, no third-party SaaS scanners) that constrain where CI runs and where receipts/artifacts are stored?
  • Is there an existing repo, CI platform, artifact registry, or secrets manager you want reused, or should the team stand up new ones?
  • What artifact/log retention window do you require, so that CI run-ID receipts stay immutable and retrievable for audit throughout that window?
Do (9)
  • Prove gates BLOCK with real red-path receipts (failing run IDs) — a green run only proves the happy path.
  • Wire every gate as a REQUIRED status check on a protected branch with no admin bypass and force-push disabled; capture the branch-protection API output as a receipt.
  • Generate the SBOM per build, attach it as an immutable artifact, attest it (cosign/SLSA/in-toto), and key the SCA vuln gate to it.
  • Make the deploy real to a dev/test environment with a blocking post-deploy smoke gate and a proven rollback.
  • Pin the toolchain and lockfiles and prove determinism by re-running the same SHA for an identical verdict.
  • Use keyless/OIDC for deploy and signing; keep the runner least-privilege; store no long-lived admin secrets in repo or CI.
  • Stand up the receipt/ledger substrate first so every Sprint-0 deliverable (and every downstream brick) anchors its claims to a receipt.
  • When the human's brief leaves a choice open, proceed on an explicit, flagged assumption logged as a receipt and surface the question as input.
  • Close the brick only on Verdict's converged, evidence-citing verdict — never on Vector's self-attestation.
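The first item in the list above (prove gates BLOCK with red-path receipts) implies a collation step: one green run plus five blocked runs, each with its own run ID. The sketch below shows one way to assemble and check that evidence. The defect keys, the "pass"/"blocked" verdict strings, and the record shape are hypothetical labels chosen for this illustration.

```python
PLANTED_DEFECTS = (
    "failing_unit_test",
    "high_severity_sast",
    "committed_test_secret",
    "known_cve_dependency",
    "removed_sbom_step",
)


def collate_negative_control(green_run, red_runs):
    """green_run: {"run_id", "verdict"}; red_runs: {defect_key: {"run_id", "verdict"}}."""
    problems = []
    if green_run.get("verdict") != "pass":
        problems.append("green path did not pass")

    run_ids = [green_run.get("run_id")]
    for defect in PLANTED_DEFECTS:
        run = red_runs.get(defect)
        if run is None:
            problems.append(f"{defect}: no run recorded")
            continue
        run_ids.append(run.get("run_id"))
        if run.get("verdict") != "blocked":
            problems.append(f"{defect}: change was not blocked")

    # Each receipt must point at its own run; a shared or missing ID proves nothing.
    if None in run_ids or len(set(run_ids)) != len(run_ids):
        problems.append("run IDs missing or not distinct")

    return {"run_ids": run_ids, "problems": problems, "complete": not problems}
```

Only a result with `complete` set to true and six run IDs would qualify as the collated negative-control receipt.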
Don't (9)
  • Don't let any gate be advisory/non-blocking or run nightly — gates must block at merge/deploy from the first commit.
  • Don't declare 'gates enforced' without the five red-path failing run IDs proving the block.
  • Don't let Vector (pipeline author) or Cipher (security-gate author) review their own work — independence means a non-author reviews each part.
  • Don't accept a no-op or unattested SBOM, or an SBOM the SCA gate doesn't actually consume.
  • Don't simulate the deploy with a script that echoes success; the deploy and rollback must be real and health-checked.
  • Don't inject long-lived admin secrets or over-privileged credentials into CI.
  • Don't let design-system checks be decorative — either wire them into the blocking CI as a real check or split them out.
  • Don't block on a human approval; capture human input continuously but proceed on flagged assumptions where silent.
  • Don't claim done while any material gap from the independent review remains open.
Guardrails (9)
  • Done-condition is receipt-anchored: a real green CI run ID PLUS five red-path failing run IDs must be attached before the brick can close.
  • Branch protection must show required checks, no bypass, force-push disabled, and signed commits — verified by stored API output, not assertion.
  • SBOM must be CycloneDX/SPDX, attached per build, attested (verifiable), and consumed by the SCA gate; the removed-SBOM planted change must block.
  • Determinism guard: a same-SHA re-run must reproduce the identical verdict, or future 'passed' receipts are not trusted.
  • No long-lived admin secrets in CI; deploy/signing use keyless OIDC or a documented, flagged exception.
  • Independence is mandatory: the reviewer of the security gates is not Cipher; the reviewer of deploy gating is not Vector; an adversarial red-team must actively try to merge/deploy a bad change and must fail.
  • Convergence = the panel agrees, with cited run IDs and the branch-protection receipt, that all six objectives (good-path green, five defects blocked, gates required on protected branch, SBOM attested+consumed, real deploy+rollback+smoke, receipt/ledger substrate exists) are met with no open material gap.
  • Receipt/ledger immutability: CI run logs and artifacts must be retained for the human-specified window so run-ID receipts do not rot.
  • All human-open choices proceed only on an explicit flagged assumption logged as a receipt — never on a silent default.
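The branch-protection guardrail above is verified from stored configuration output rather than by assertion. As a sketch, the check below runs against a hypothetical, already-normalized export. The key names (`required_status_checks`, `admin_bypass_allowed`, and so on) and the gate names are assumptions for illustration and do not reproduce any specific platform's API response; a real implementation would first map the platform's output into this shape.

```python
REQUIRED_GATES = frozenset(
    {"build", "test", "sast", "sca", "secret-scan", "sbom", "deploy", "smoke"}
)


def check_branch_protection(config, required_gates=REQUIRED_GATES):
    """Return a list of findings; an empty list means the guardrail holds."""
    findings = []
    configured = set(config.get("required_status_checks", []))
    missing = sorted(set(required_gates) - configured)
    if missing:
        findings.append("gates not required: " + ", ".join(missing))
    # Absent keys are treated as the unsafe value, so silence never passes.
    if config.get("admin_bypass_allowed", True):
        findings.append("admin bypass is allowed")
    if config.get("force_push_allowed", True):
        findings.append("force-push is allowed")
    if not config.get("signed_commits_required", False):
        findings.append("signed commits are not required")
    if "required_reviews" not in config:
        findings.append("required-review setting is absent")
    return findings
```

Defaulting missing keys to the unsafe value mirrors the conservative silent default used elsewhere in the process.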
13

Sprint Planning [LOOP]

AI Agent: Cadence

Open the iterative delivery loop for one sprint by authoring a tight, evidence-backed sprint plan that a top engineering org would actually commit to, then hand it to the independent sprint-plan review loop before any build begins (no human sign-off; the panel's convergence verdict is what authorizes build). Cadence (Delivery Lead) and Vela (Product Owner) pull the highest-priority Ready stories that pass the Definition of Ready, agree exactly ONE measurable sprint goal, decompose stories to tasks, size the commitment against the team's defined AI-team capacity/WIP (concurrent workstreams + review-loop budget + merge/integration throughput, NOT human story-point velocity), and confirm at least one testable acceptance criterion plus a named test approach per story. Every committed story traces back to a PRD/spec item and forward to a test approach, so no orphan or creep work enters the sprint. Sprint dependencies and risks are captured in a register with an owner and need-by date (or flagged blocked) per item. This brick is the loop-entry point: it repeats on each outer iteration through the sprint-increment review, re-pulling Ready stories and re-planning against updated AI-team capacity and prior-sprint learnings, and the outer loop exits only when release scope is delivered. Planning surfaces open scope/priority questions to the sponsor as a continuous input channel and, where the human is silent, proceeds on an explicit flagged assumption recorded in the plan; it never blocks on approval.

Deliverables
[AI · Cadence] Committed sprint backlog: a single sprint-goal sentence plus the list of committed stories (each with story ID, points/size, and decomposed tasks), written to the sprint record.
Acceptance: File/record exists containing EXACTLY ONE sprint goal expressed as a single sentence with a measurable outcome; >=1 committed story; every committed story has a unique story ID, a size, and >=1 task. Binary: one-goal check passes (count of goals == 1) and zero stories have empty task lists.
[AI · Vela] DoR-pass matrix: per-story Definition-of-Ready check for every committed story.
Acceptance: Matrix lists each committed story ID with a PASS/FAIL result against each DoR criterion. Binary: 100% of committed stories show DoR=PASS; any story with DoR=FAIL is NOT in the committed backlog (count of committed-with-FAIL == 0).
[AI · Proof] Per-story acceptance-criteria + test-approach sheet.
Acceptance: For every committed story there is >=1 testable acceptance criterion and a named test approach (unit/integration/e2e/manual + tool). Binary: count of committed stories missing an AC OR a named test approach == 0.
[AI · Cadence] Capacity/WIP calculation showing the math: committed size vs the team's defined AI-team capacity, citing the upstream team-charter/working-agreement capacity definition.
Acceptance: Calculation present and cites the capacity-definition source. Binary: committed size <= defined capacity (committed_points <= rolling-3-sprint AI-team capacity ceiling) AND the cited capacity source ID resolves.
[AI · Keystone] Dependency & risk register for the sprint.
Acceptance: Register lists every cross-team/external dependency and material risk; each row has an owner AND either a need-by date or an explicit blocked flag, and each risk additionally has a mitigation or a flag. Binary: count of register rows missing an owner == 0 AND count of rows missing both a need-by date and a blocked flag == 0.
[AI · Vela] Traceability map: each committed story -> source PRD/spec item ID -> test approach.
Acceptance: Every committed story ID maps to a non-empty PRD/spec item ID and to a test approach. Binary: count of committed stories with empty upstream-spec-ID or empty downstream-test-approach == 0 (zero orphans).
[AI+Human · Vela] Flagged-assumptions & open-questions log: planning questions surfaced to the sponsor, answers incorporated where given, and an explicit flagged assumption (assumption ID + still-open question) wherever the human is silent.
Acceptance: Log exists. Binary: every planning decision that depended on missing sponsor input has either a recorded sponsor answer OR a flagged assumption with a unique assumption ID and the open question text; count of unresolved-but-unflagged dependencies == 0. No item is marked 'blocked on human approval'.
[AI · Cadence] Sprint-plan honesty receipt + handoff to the independent sprint-plan review loop.
Acceptance: Receipt bundle references the committed-backlog record ID, capacity calc, DoR matrix, AC/test sheet, dependency/risk register, traceability map, and assumptions log; and the plan is submitted to the independent review_loop (Verdict + Keystone/Proof/Cipher lenses). Binary: review-loop intake record created with a status of 'submitted-for-review' (not 'planned/approved'); build does NOT start until that loop returns a convergence verdict ID.
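Most of the binary checks in the deliverables above (one goal, non-empty task lists, DoR=PASS, an acceptance criterion and test approach per story, a non-empty spec trace) can be evaluated in a single pass over the committed backlog. The sketch below assumes a hypothetical plan record with `goals` and `stories` keys; all field names are illustrative only.

```python
def check_sprint_plan(plan):
    """Return a list of failed binary checks; an empty list means all pass."""
    failures = []
    stories = plan.get("stories", [])

    if len(plan.get("goals", [])) != 1:
        failures.append("sprint goal count != 1")
    if not stories:
        failures.append("no committed stories")

    ids = [story["id"] for story in stories]
    if len(set(ids)) != len(ids):
        failures.append("story IDs are not unique")

    for story in stories:
        sid = story["id"]
        if not story.get("size"):
            failures.append(f"{sid}: missing size")
        if not story.get("tasks"):
            failures.append(f"{sid}: empty task list")
        if story.get("dor") != "PASS":
            failures.append(f"{sid}: DoR is not PASS")
        if not story.get("acceptance_criteria"):
            failures.append(f"{sid}: no testable acceptance criterion")
        if not story.get("test_approach"):
            failures.append(f"{sid}: no named test approach")
        if not story.get("spec_id"):
            failures.append(f"{sid}: no upstream PRD/spec item")
    return failures
```

The review panel could run the same function over the submitted record to re-derive these counts independently of the authors.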
Questions the agent asks (7)
  • Sponsor: for this sprint, is the highest-value outcome correctly captured by the proposed one-sentence sprint goal, or should priority shift?
  • Sponsor: are there any new constraints, deadlines, data, or scope changes since the last sprint we should fold into this plan?
  • Sponsor: for any story where required information is missing, can you confirm the assumption we have flagged, or provide the answer? (If silent, we proceed on the flagged assumption.)
  • Internal (Vela): which Ready stories are highest priority and trace cleanly to a PRD/spec item this sprint?
  • Internal (Cadence): what is the current AI-team capacity/WIP ceiling from the team charter, and does the proposed commitment fit within it?
  • Internal (Keystone): which committed stories carry cross-team or external dependencies that need an owner and need-by date now?
  • Internal (Cipher): do any committed stories touch auth, data handling, or external surfaces such that threat/security work must be scoped into this sprint?
Do (9)
  • Pull only Ready stories that pass the Definition of Ready; reject anything failing DoR back to refinement rather than committing it.
  • State exactly ONE sprint goal as a single measurable sentence; if the backlog implies two goals, split or defer until one remains.
  • Size the commitment against the defined AI-team capacity/WIP (concurrent workstreams, review-loop budget, merge/integration throughput) and show the math.
  • Confirm >=1 testable acceptance criterion and a named test approach for every committed story before committing it.
  • Trace every committed story back to a PRD/spec item and forward to a test approach.
  • Record every dependency and risk with an owner and need-by date (or explicit blocked flag) and a mitigation.
  • Surface open scope/priority questions to the sponsor and incorporate answers; where the sponsor is silent, proceed on an explicit flagged assumption with an assumption ID.
  • Hand the authored plan to the independent sprint-plan review loop and treat that panel's convergence verdict — not Cadence/Vela's own judgment — as the build authorization.
  • On each outer-loop iteration, re-pull Ready stories and re-plan against updated velocity and prior-sprint learnings.
Don't (8)
  • Do NOT insert any human approval/sign-off gate; the sponsor provides input, never blessing.
  • Do NOT treat 'Cadence/Vela authored it' as sufficient — the plan is not authorized to build until the independent review loop converges.
  • Do NOT commit more than one sprint goal or let the sprint become a grab-bag of unrelated work.
  • Do NOT commit stories that fail DoR, lack a testable AC, lack a named test approach, or have no upstream spec trace.
  • Do NOT over-commit beyond the defined AI-team capacity/WIP ceiling.
  • Do NOT copy human-velocity ceremony (raw story-point velocity) as the capacity model for an AI team.
  • Do NOT block planning on a silent sponsor — flag an explicit assumption and proceed.
  • Do NOT claim the sprint is 'planned' without the full receipt bundle (committed backlog, capacity calc, DoR matrix, AC/test sheet, dependency/risk register, traceability map, assumptions log) existing.
Guardrails (8)
  • Single-goal guardrail: the brick fails if the committed plan contains anything other than exactly one measurable sprint goal.
  • Capacity guardrail: committed size must be <= the defined AI-team capacity/WIP ceiling, with the calculation and its source citation present; otherwise descope before submitting.
  • DoR guardrail: zero committed stories may have DoR=FAIL; testability guardrail: zero committed stories may lack a testable AC or named test approach.
  • Traceability guardrail: zero orphan stories — every committed story must map to a PRD/spec item ID and a downstream test approach.
  • Dependency guardrail: every dependency/risk row must carry an owner and a need-by date or explicit blocked flag; unowned items are not allowed to ship in the plan.
  • Human-input guardrail: the human may only provide information/answers; any item marked 'blocked on human approval' is a violation — convert to a flagged assumption and proceed.
  • Honesty/receipt guardrail: 'planned' is claimable only when the full receipt bundle exists; the plan is 'authorized to build' only when the independent review loop returns a convergence verdict ID with zero open material gaps — never asserted.
  • Hand-off guardrail: this brick must NOT mark itself complete-and-building; it submits to the independent review_loop and waits for that panel's verdict (reviewers must be non-authors of the plan).
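Because each planning guardrail above is a binary check, the set can be sketched as one small validator. This is a minimal illustration under stated assumptions: the `Story` and `Dependency` field names, the `check_plan` function, and the numeric capacity units are hypothetical and do not come from the process definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Story:
    story_id: str
    size: float                       # hypothetical capacity units
    dor_pass: bool
    acceptance_criteria: List[str]
    test_approach: Optional[str]
    spec_item_id: Optional[str]


@dataclass
class Dependency:
    name: str
    owner: Optional[str]
    need_by: Optional[str] = None     # date string, or None
    blocked: bool = False             # explicit blocked flag


def check_plan(goals: List[str], stories: List[Story],
               deps: List[Dependency], capacity_ceiling: float) -> List[str]:
    """Return guardrail violations; an empty list means the plan may be submitted."""
    violations = []
    if len(goals) != 1:
        violations.append(f"single-goal: expected 1 goal, found {len(goals)}")
    committed = sum(s.size for s in stories)
    if committed > capacity_ceiling:
        violations.append(f"capacity: committed {committed} > ceiling {capacity_ceiling}")
    for s in stories:
        if not s.dor_pass:
            violations.append(f"dor: {s.story_id} has DoR=FAIL")
        if not s.acceptance_criteria or not s.test_approach:
            violations.append(f"testability: {s.story_id} lacks a testable AC or test approach")
        if not s.spec_item_id:
            violations.append(f"traceability: {s.story_id} is an orphan")
    for d in deps:
        if not d.owner or not (d.need_by or d.blocked):
            violations.append(f"dependency: {d.name} lacks an owner or need-by/blocked flag")
    return violations
```

A non-empty result would mean descoping or returning stories to refinement before the plan is submitted to the review loop.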
14

Independent Review & Iterate: Sprint Plan [LOOP]

AI Agent: Verdict

Stop a whole sprint from being burned on a bad plan by validating the committed sprint backlog and goal BEFORE execution starts — through a bounded, evidence-producing independent-review loop, not a human sign-off. A NON-AUTHOR panel — Verdict (independent evaluator, runs an adversarial pre-mortem), Proof (testability of every acceptance criterion), and Keystone (dependency/feasibility/staffability) — scores each committed story line-by-line against the pinned, versioned Definition-of-Ready rubric (owned by Cadence, who commits the canonical DoR in assemble_team), the release plan, and the seven Objectives, returning structured per-story findings classified on the canonical Blocker/Major/Minor severity taxonomy. Cadence (Delivery Lead) and Vela (Product Owner) authored the plan and own the fixes; they revise, the panel re-reviews, and the loop iterates to a binary ConvergenceVerdict: 0 open Blockers AND 0 open Majors AND every Minor accepted-with-written-rationale, with one clean stabilization pass that introduced zero new gaps. The loop is bounded at 3 rounds; on non-convergence the specific unresolved disagreement is surfaced to the sponsor as a NON-BLOCKING input question and the team proceeds on an explicit flagged assumption carried as a sprint risk — never a human gate. Convergence is a durable, re-derivable receipt (panel roster with non-author attestation, per-story pass/fail ledger, capacity arithmetic, append-only gap log with severity, pre-mortem outcomes, flagged-assumptions/risk list), and those assumptions+risks are emitted as the explicit checklist the end-of-sprint accomplishments-vs-goals review must later validate. No claim of "meets DoR / testable / fits capacity / converged" is accepted without its re-derived receipt.
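The convergence condition and the three-round bound described above can be sketched as two small functions. The `Gap` record, its field names, and the outcome strings are hypothetical. The sketch also assumes a Minor needs a rationale note only while it remains open, which is one reading of the rule.

```python
from dataclasses import dataclass
from typing import List

MAX_ROUNDS = 3  # loop bound stated in the text


@dataclass
class Gap:
    gap_id: str
    severity: str            # "Blocker" | "Major" | "Minor"
    is_open: bool = True
    rationale: str = ""      # written accepted-with-rationale note (Minors)


def is_converged(gaps: List[Gap], new_gaps_in_stabilization_pass: int) -> bool:
    """Binary verdict: 0 open Blockers, 0 open Majors, Minors justified, clean pass."""
    open_blockers = sum(1 for g in gaps if g.is_open and g.severity == "Blocker")
    open_majors = sum(1 for g in gaps if g.is_open and g.severity == "Major")
    minors_ok = all(
        (not g.is_open) or bool(g.rationale.strip())
        for g in gaps if g.severity == "Minor"
    )
    return (open_blockers == 0 and open_majors == 0
            and minors_ok and new_gaps_in_stabilization_pass == 0)


def loop_outcome(round_no: int, converged: bool) -> str:
    """Decide what happens after a review round; never returns a human gate."""
    if converged:
        return "converged"
    if round_no >= MAX_ROUNDS:
        return "escalate-as-input-and-proceed-on-flagged-assumption"
    return "revise-and-re-review"
```

Note that the non-convergence exit is still a proceed state: the disagreement becomes a sponsor question and a logged sprint risk rather than a stop.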

Deliverables
[AI · Cadence/Vela] Sprint-Plan-Review-Packet — the author-submitted artifact under review: sprint goal, the committed story list, and for EACH story (a) acceptance criteria, (b) a line-by-line self-score against the pinned DoR rubric, (c) assigned agent/staffing, (d) inbound/outbound dependencies, plus a sprint-level capacity block (committed points vs rolling velocity with the arithmetic shown) and a questions-to-human / open-assumptions list.
Acceptance: Packet exists at the sprint path and is non-empty for every field; story count in the packet == committed story count in sprint.md; each story has ≥1 acceptance criterion and a DoR self-score for every DoR rubric line; the capacity block shows committed_points, velocity_source, rolling_velocity_number, and committed-minus-velocity delta. Any missing field == packet rejected (binary pass/fail).
[AI · Cadence] Pinned DoR rubric reference (sprint_dor_checklist, versioned) — the review is bound to the canonical, versioned Definition-of-Ready checklist that Cadence commits in assemble_team (e.g. clear problem, testable acceptance criteria, sized/fits-appetite, dependencies identified-and-ready, staffable to a named agent, no open blocker question). Cadence owns the rubric as delivery owner; Verdict is the independent scorer that grades each story against it line-by-line and never authors the rubric it grades against.
Acceptance: The convergence receipt references the sprint_dor_checklist by version id and that version resolves to the canonical rubric Cadence committed in assemble_team (not a brick-local copy); the rubric enumerates ≥6 numbered rules; every story in the packet has an explicit pass/fail recorded against each numbered rule (no rule left unscored). Grading against memory instead of the pinned, versioned file == fails the per-story-coverage check.
[AI · Verdict+Proof+Keystone] Per-story independent findings ledger — for each story, each reviewer returns a structured row (story-id → rule/lens checked → pass|fail → evidence/receipt) from their assigned lens: Proof names the specific test or observation that would prove each acceptance criterion; Keystone marks each dependency ready|not-ready and each story staffable|unstaffable; Verdict assigns canonical severity (Blocker|Major|Minor) and runs the adversarial pre-mortem.
Acceptance: Every committed story has a finding row from EACH of the three panel members (coverage = 3 × story_count rows, none blank); every fail row carries an evidence/receipt field (no evidence-free fail); every Proof testability pass cites a specific test/observation, not the word 'testable'; every Keystone capacity/dependency claim cites the number or the artifact; every fail row carries a Blocker|Major|Minor classification. A verdict lacking full per-story coverage is REJECTED (anti-rubber-stamp gate).
[AI · Verdict] Adversarial pre-mortem record — 'assume this sprint ended with its goal unmet; what in THIS plan most likely caused it?' — enumerating the top failure modes (hidden dependency, under-scoped/over-scoped story, untestable AC, unstaffable step, capacity overrun) with each mode answered or converted to a logged risk.
Acceptance: ≥1 pre-mortem failure mode is recorded; each listed failure mode has either a documented mitigation in the revised plan OR an entry in the open-risk list — zero failure modes left unanswered (binary).
[AI · Verdict] Sprint-Plan Convergence Verdict receipt — the durable, re-derivable record: round number (≤3), panel roster with each member's non-author attestation, the per-story pass/fail ledger, the append-only resolved-vs-open gap log with canonical severity, the capacity arithmetic, the pre-mortem outcomes, the final flagged-assumptions/risk list, the clean-stabilization-pass marker, and the explicit statement 'open Blockers == 0 AND open Majors == 0 AND every Minor accepted-with-rationale'.
Acceptance: Receipt is stored at the sprint path and contains all listed sections; it records count(open Blockers)==0 AND count(open Majors)==0 AND every Minor row carries a non-empty accepted-with-rationale note AND a stabilization pass that introduced zero new gaps; round ≤ 3 (or, if round 3 ended unconverged, it records the human-input escalation + the flagged assumption per the bounded-loop rule); each panel member is attested as a non-author of the plan; Verdict re-derived (not accepted from the author) every cited number. Sprint execution does not start until this receipt exists meeting the convergence condition OR a recorded bounded-loop escalation — checkable by file presence + field values.
[AI · Cadence] Resolved-gap revision log — Cadence/Vela's record of each Blocker- or Major-severity gap raised by the panel and the specific plan edit (story re-scoped, AC made testable, dependency sequenced, story re-staffed, points trimmed to velocity) that closed it, re-submitted for re-review each round.
Acceptance: Every Blocker- and Major-severity finding in the findings ledger maps to either a revision-log entry (closed, with a fix-evidence pointer) or an open-risk entry with a recorded human-input escalation; at convergence the count of Blocker/Major gaps with neither == 0 (binary).
[Human] Sponsor answers to surfaced plan questions (continuous input, non-blocking) — responses to any priority/missing-data/ambiguous-acceptance question the team logs in the packet's questions-to-human; where the sponsor is silent, the team proceeds on the flagged assumption.
Acceptance: Every questions-to-human item has either a recorded sponsor answer OR a flagged assumption carried into the convergence receipt's risk list — open questions with neither == 0. The loop is never recorded as blocked-waiting-on-human (no human-approval state exists in the receipt).
[AI · Verdict] Downstream-accountability handoff — the convergence receipt's flagged-assumptions + predicted failure-mode risks emitted as the explicit input checklist for the end-of-sprint accomplishments-vs-goals review brick.
Acceptance: The downstream sprint-review brick's input references this receipt by id, and every flagged assumption + predicted risk appears as a line item to be validated at the end-of-sprint review; count of carried-forward items in the handoff == count in the receipt's risk list (binary match).
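The anti-rubber-stamp coverage rule for the findings ledger (one row per story per panel member, no evidence-free or unclassified fail) could be checked mechanically. The sketch below is illustrative only; the row keys (`story_id`, `reviewer`, `result`, `evidence`, `severity`) and the `coverage_problems` name are assumptions, not a defined schema.

```python
from typing import Dict, List

PANEL = ("Verdict", "Proof", "Keystone")
SEVERITIES = {"Blocker", "Major", "Minor"}


def coverage_problems(story_ids: List[str], rows: List[Dict[str, str]]) -> List[str]:
    """Return ledger problems; an empty list means full per-story coverage."""
    problems = []
    seen = {(r["story_id"], r["reviewer"]) for r in rows}
    # Coverage: 3 x story_count rows, one per (story, panel member).
    for sid in story_ids:
        for reviewer in PANEL:
            if (sid, reviewer) not in seen:
                problems.append(f"missing row: {sid}/{reviewer}")
    # Every fail row needs evidence and a canonical severity.
    for r in rows:
        if r["result"] == "fail":
            if not r.get("evidence"):
                problems.append(f"evidence-free fail: {r['story_id']}/{r['reviewer']}")
            if r.get("severity") not in SEVERITIES:
                problems.append(f"unclassified fail: {r['story_id']}/{r['reviewer']}")
    return problems
```

A verdict whose ledger yields any problem here would be rejected before convergence is even evaluated.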
Questions the agent asks (4)
  • Which committed stories are highest-priority such that, if capacity is tight, they must survive scope trimming?
  • Is there missing data, an external dependency, or an ambiguous acceptance definition on any story that only the sponsor can resolve (logged as a non-blocking question; otherwise we proceed on a flagged assumption)?
  • Are there any fixed external dates or release-plan commitments this sprint must hit that constrain what can be committed?
  • For any story whose acceptance is currently subjective ('it should feel fast'), what is the concrete, observable threshold we should hold it to?
Do (9)
  • Bind the review to the canonical versioned DoR rubric Cadence committed in assemble_team and have Verdict score every story line-by-line against every numbered rule — reproducible, re-derived, not from memory.
  • Require each panel member to produce structured per-story findings with re-derived evidence; reject any verdict lacking full per-story coverage (anti-rubber-stamp).
  • Show the capacity arithmetic: committed points vs rolling velocity with the source and the delta — 'fits capacity' is a number, never an assertion.
  • Have Proof name the specific test or observation that proves each acceptance criterion, and Keystone cite the artifact/number behind each dependency-ready and staffable claim.
  • Classify every finding on the canonical Blocker/Major/Minor taxonomy and drive convergence to 0 open Blockers, 0 open Majors, Minors accepted-with-written-rationale, plus one clean stabilization pass.
  • Run Verdict's adversarial pre-mortem every round and answer or log every failure mode before converging.
  • Guarantee panel independence: every reviewer (Verdict, Proof, Keystone) attests they did not author the plan or the DoR rubric they grade against; if a needed specialist co-authored the relevant design, swap in an alternate.
  • Bound the loop at 3 rounds; on non-convergence, surface the specific disagreement to the sponsor as INPUT and proceed on an explicit flagged assumption recorded as a sprint risk.
  • Store the convergence verdict as a durable receipt and emit its assumptions+risks as the checklist the end-of-sprint review must validate.
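The capacity arithmetic in the list above can be shown as a short calculation. This is a sketch under assumptions: the three-sprint rolling window and the `capacity_block` function name are hypothetical, while the output keys mirror the packet fields named in the Deliverables.

```python
from statistics import mean
from typing import Dict, List


def capacity_block(committed_points: List[int], past_sprint_points: List[int],
                   velocity_source: str, window: int = 3) -> Dict[str, object]:
    """Compute committed points vs rolling velocity and expose the delta."""
    recent = past_sprint_points[-window:]
    if not recent:
        raise ValueError("no velocity history; capacity cannot be derived")
    rolling = mean(recent)           # rolling velocity over the assumed window
    committed = sum(committed_points)
    return {
        "committed_points": committed,
        "velocity_source": velocity_source,
        "rolling_velocity_number": rolling,
        "delta": committed - rolling,            # committed-minus-velocity
        "fits_capacity": committed <= rolling,   # derived from the numbers, never asserted
    }
```

For example, stories of 5, 8, and 8 points against recent sprints of 20, 24, and 22 give 21 committed points, a rolling velocity of 22, and a delta of -1.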
Don't (9)
  • Don't let a reviewer emit a blanket 'looks good' / approval with no per-story evidence — that verdict is rejected.
  • Don't insert any human approval or sign-off; the sponsor only provides input and answers, never gates.
  • Don't converge by reclassifying a Blocker- or Major-severity gap (DoR violation, untestable AC, over-capacity, unready dependency, unstaffable story) down a tier — those severities are fixed by the canonical rubric.
  • Don't let a panel member review a plan they authored, or grade stories against a DoR rubric they themselves wrote.
  • Don't name Atlas, or any non-roster actor, as the owner or author of any deliverable in this brick — the DoR rubric is owned by Cadence and graded by Verdict.
  • Don't run the loop unbounded — an infinite plan-review loop burns the very sprint capacity it protects.
  • Don't block or stall waiting on a human answer; if unanswered, convert the question to a flagged assumption and continue.
  • Don't accept 'fits capacity', 'testable', or 'meets DoR' without the velocity number / the named test / the re-derived per-story score backing it.
  • Don't start sprint execution before the convergence receipt records 0 open Blockers AND 0 open Majors AND every Minor accepted-with-rationale (or a recorded bounded-loop escalation).
Guardrails (8)
  • Severity taxonomy is the canonical assemble_team rubric, used verbatim: every gap is classified Blocker, Major, or Minor; convergence == 0 open Blockers AND 0 open Majors AND every Minor carries a written accepted-with-rationale note, plus a stabilization pass that introduced zero new gaps. No local 'material/minor' or 'blocker/minor' split is used.
  • Independence is mandatory: Verdict, Proof, and Keystone must each be a non-author of both the plan and the DoR rubric under review; conflicted seats are swapped for an independent alternate before the round begins.
  • Roster integrity: no deliverable in this brick may name Atlas or any actor outside the delivery roster (Vela, Cadence, Keystone, Mason, Lens, Proof, Cipher, Vector, Iris, Verdict). The DoR rubric is OWNED by Cadence (committed in assemble_team) and independently GRADED by Verdict.
  • Loop is bounded at 3 rounds with a defined exit: convergence, or escalate-the-disagreement-as-human-input-and-proceed-on-flagged-assumption — never a human gate, never an open-ended loop.
  • Honesty: every 'meets DoR / testable / fits capacity / converged' claim carries its re-derived receipt (pinned-rubric per-story scores, named tests, velocity arithmetic, stored convergence verdict); reviewers re-derive numbers rather than accept the author's summary; evidence-free claims are rejected.
  • Author/reviewer separation is structural: Cadence/Vela author and fix the plan and Cadence owns the DoR rubric; the panel only reviews — this brick stays separate from the planning brick it reviews.
  • Continuous-input channel never converts to a gate: human silence yields a flagged assumption carried as a risk, not a stop.
  • Accountability carries forward: the verdict's assumptions and predicted risks are the input checklist for the end-of-sprint accomplishments-vs-goals review, so reviewer judgment is later audited against reality.
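The independence and roster-integrity guardrails above could be enforced before a round begins with a check like the following. The `panel_violations` function and its arguments are hypothetical; the roster names are taken from the guardrail text.

```python
from typing import Iterable, List, Set

# Delivery roster as listed in the roster-integrity guardrail.
ROSTER = {"Vela", "Cadence", "Keystone", "Mason", "Lens", "Proof",
          "Cipher", "Vector", "Iris", "Verdict"}


def panel_violations(panel: Iterable[str], plan_authors: Set[str],
                     rubric_authors: Set[str]) -> List[str]:
    """Return reasons a seat is conflicted; empty means the panel may start the round."""
    out = []
    for member in panel:
        if member not in ROSTER:
            out.append(f"{member}: not on the delivery roster")
        if member in plan_authors:
            out.append(f"{member}: authored the plan under review")
        if member in rubric_authors:
            out.append(f"{member}: authored the DoR rubric")
    return out
```

Any seat that produces a violation would be swapped for an independent alternate before review starts.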
15

Sprint Execution: Build · Review · Test · Secure [LOOP]

AI Agent: Mason

The autonomous build core, run once per story and iterated until each story is "solid." Mason implements the story (test-driven where apt) on a short-lived branch; Iris delivers the UX/UI; Proof writes/extends automated tests across the pyramid (unit/integration/e2e) and owns the acceptance tests so results are not self-marked by the implementer; Cipher runs SAST/SCA/secret-scan per change under a defined severity+waiver policy; Lens performs evidence-backed code review where the reviewer is provably NOT the author of the change. The brick exists to emit re-derivable, outcome-keyed RECEIPTS that the downstream sprint-review loops re-derive from — so it must not repeat the platform's own confessed sins: the merge gate here is BLOCKING (a required, non-overridable status check that returns non-zero and stops the merge on any failing condition — never the nightly exit-0 pattern flagged P0 in docs/honesty-architecture.md), and every receipt is keyed on the CI run_conclusion=='pass' so a RED run can never pose as proof (mirroring the email.sent outcome-filter fix). A story is "solid/converged" only when a binary merge predicate holds: all Definition-of-Done items green, coverage AND new-diff-coverage thresholds met, mutation/assertion guard satisfied, zero open High/Critical scan findings, and an evidence-citing Lens approval from a non-author. Independence and honesty are enforced by provenance metadata on the receipt, not by honor; Verdict (independent_reviewer) periodically re-derives a sample of merged-story receipts to confirm the loop was not rubber-stamped. When a story is materially ambiguous mid-build, the team surfaces a question to the sponsor and proceeds on an explicit FLAGGED ASSUMPTION recorded on the story — it never blocks on human approval.

Deliverables
[AI · Mason] Implemented story on a short-lived branch (branched from trunk, story_id in branch name), test-driven where apt, with a commit trail attributing author_id; PR opened against trunk.
Acceptance: PR exists; the branch is short-lived (opened and merged within the sprint, <= agreed max age); commit author_id is recorded; the diff implements the story's stated scope; the branch is NOT trunk, and no commit for this story was pushed directly to a protected branch (git log shows no direct-to-trunk commit for this story).
[AI · Iris] UX/UI implementation for the story (components, states, responsive + a11y) matching the design system, delivered on the same branch.
Acceptance: For any story with a user-facing surface, the implemented UI renders the specified states (empty/loading/error/success) and passes the automated accessibility check (zero critical a11y violations in the CI a11y job); for non-UI stories this deliverable is explicitly marked N/A on the receipt.
[AI · Proof] Automated tests across the pyramid for the story (unit/integration/e2e as apt) PLUS the story's acceptance tests, authored or independently re-run by Proof (not self-marked by Mason).
Acceptance: Test suite executes in CI with run_conclusion=='pass'; line coverage >= threshold AND new-diff branch coverage >= threshold; mutation_score >= threshold (or assertion-density guard passes where mutation is N/A); acceptance-test authorship attributable to Proof (or independently re-run) per receipt provenance — never attributed solely to the change author.
[AI · Cipher] Per-change security scan bundle: SAST + SCA (dependency CVEs) + secret-scan results with severity classification and any waivers.
Acceptance: scan.high_critical_open == 0; any hardcoded-secret finding hard-fails and is non-waiverable; Medium/Low findings each carry a tracked, attributable, time-boxed waiver (who/why/expiry/waiver_id) referenced in the receipt; finding IDs are recorded so downstream review can re-derive.
[AI · Lens] Evidence-backed code review verdict object (not prose 'LGTM') enumerating each Definition-of-Done item with pass/fail + an evidence pointer (file:line or test name), and citing the failing-then-passing acceptance test for the change.
Acceptance: lens_verdict == 'approved_with_evidence'; reviewer_id != commit author_id and != any Co-Authored-By on the change; every DoD item has a pass/fail with an evidence pointer; a bare approval with zero cited evidence on a diff above the agreed size triggers a mandatory second-reviewer or red-team pass before approval can stand.
[AI · Vector] BLOCKING merge gate enforcing the binary merge predicate as a required, non-overridable status check on branch protection.
Acceptance: Merge occurs IFF (run_conclusion=='pass' AND coverage_line>=threshold AND new_diff_branch_coverage>=threshold AND mutation/assertion guard passes AND scan.high_critical_open==0 AND lens_verdict=='approved_with_evidence' AND reviewer_id!=author_id); a failing predicate returns non-zero and blocks the merge (verified by a negative test that a RED run / self-approval / open-critical-finding cannot merge); no --no-verify, admin override, or direct-to-trunk path can bypass it, and any break-glass is itself a logged receipt with justification.
[AI · Mason] Per-story RECEIPT object emitted on merge, schema: story_id, commit_sha, author_id, ci_run_id, run_conclusion, tests_passed, tests_failed, tests_skipped/quarantined, coverage_line, coverage_branch (mutation_score where apt), scan_tool+finding_ids+severity+waiver_ref, reviewer_id, lens_verdict, dod_checklist[item->pass/fail], assumptions_flagged[], and a link to the story's acceptance criteria.
Acceptance: Every merged story has exactly one receipt with all required fields populated; run_conclusion is present and the gate is keyed on it (no receipt with run_conclusion!='pass' is attached to a merged story); the receipt links to the story acceptance criteria so the downstream Sprint-Accomplishments-vs-Goals review can re-derive 'goal met' from raw CI artifacts, not a summary.
[AI · Verdict] Independent receipt audit per sprint: re-derives counts/scan results for a sample of merged-story receipts from raw CI artifacts (not the summarized receipt), confirms Lens approvals carried real evidence and the independence (reviewer_id!=author_id) check held, and logs a convergence verdict.
Acceptance: Verdict's audit covers at least the agreed sample of the sprint's merged stories; for each, re-derived counts/scan-status match the receipt (mismatch = logged discrepancy + reopened story); audit records zero rubber-stamp violations (no approval lacking cited evidence) or lists each as an open gap; convergence verdict is written and feeds the sprint-review brick.
[AI · Cadence] Flagged-assumptions log on each story for mid-build ambiguity surfaced to the sponsor.
Acceptance: Any story merged with an unanswered sponsor question carries an explicit FLAGGED ASSUMPTION entry (assumptions_flagged[] non-empty on the receipt) stating the assumption made; the work did not block waiting for human approval; unanswered assumptions are marked for the sprint-review reviewer.
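The per-story receipt described in the Deliverables above lends itself to a simple completeness check. The sketch below assumes a flat dictionary and simplifies some key names (`tests_skipped`, `scan_findings`, `acceptance_criteria_link`); those names and the `receipt_problems` function are illustrative, not the actual schema.

```python
from typing import Any, Dict, List

# Simplified, hypothetical key names based on the receipt schema in the text.
REQUIRED_FIELDS = (
    "story_id", "commit_sha", "author_id", "ci_run_id", "run_conclusion",
    "tests_passed", "tests_failed", "tests_skipped", "coverage_line",
    "coverage_branch", "scan_findings", "reviewer_id", "lens_verdict",
    "dod_checklist", "assumptions_flagged", "acceptance_criteria_link",
)


def receipt_problems(receipt: Dict[str, Any]) -> List[str]:
    """Return reasons the receipt cannot back a merged story; empty means valid."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in receipt]
    # Outcome-keyed: only a passing CI run can pose as proof.
    if receipt.get("run_conclusion") != "pass":
        problems.append("run_conclusion is not 'pass'")
    # Provenance: independence must be machine-verifiable.
    author = receipt.get("author_id")
    if author is not None and author == receipt.get("reviewer_id"):
        problems.append("reviewer_id equals author_id")
    return problems
```

An empty `assumptions_flagged` list still counts as present here; only a missing key is treated as a schema failure.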
Questions the agent asks (8)
  • What are the coverage thresholds (line %, new-diff branch %) and is a mutation-score threshold or assertion-density guard required for this codebase?
  • What is the security pass policy: confirm zero open High/Critical, and what is the waiver authority, expiry window, and waiver record location for Medium/Low findings?
  • Where is the versioned Definition-of-Done and the coding standards the review enumerates against, and who owns updates to them?
  • What is the diff-size threshold above which a bare Lens approval (zero cited evidence) triggers a mandatory second-reviewer or red-team pass?
  • What is the flaky-test policy — confirm no blanket auto-retry-to-green, and that quarantine requires a tracked ticket and does not count as pass?
  • For this story, are there ambiguities you'd like to answer now, or should the team proceed on a flagged assumption and surface the question on the story?
  • What is the maximum allowed branch age for 'short-lived', and is trunk-based merging (squash) the agreed integration model?
  • Are there any compliance/data-handling constraints that should add story-specific gate conditions (e.g., PII handling tests)?
Do (10)
  • Make the merge gate a required, BLOCKING status check on branch protection — a failing predicate returns non-zero and stops the merge; add a negative test proving a RED run / self-approval / open-critical cannot merge.
  • Key every receipt and the gate on run_conclusion=='pass'; record outcome explicitly so a failed run can never be read as proof of success.
  • Enforce reviewer independence by provenance: verify reviewer_id (and reviewing model instance) != commit author_id and != any Co-Authored-By; recuse to Verdict or a fresh instance if the same agent authored both code and review.
  • Have Proof own or independently re-run the acceptance tests so results are not self-marked by the implementer.
  • Require Lens approvals to enumerate each DoD item pass/fail with an evidence pointer (file:line or test name) and cite the failing-then-passing test — make approval expensive in evidence.
  • Apply Cipher's severity+waiver policy: zero open High/Critical, hardcoded secrets non-waiverable and hard-fail, Medium/Low only via tracked time-boxed attributable waivers referenced by ID.
  • Quarantine flaky tests with a tracked ticket; quarantined tests do not count toward pass and the quarantined-count is a visible receipt field and a tripwire.
  • Surface mid-build ambiguity to the sponsor and proceed on an explicit FLAGGED ASSUMPTION recorded on the story; never block waiting for human approval.
  • Link each story receipt to the story's acceptance criteria so the downstream sprint-review loop can re-derive goal-met from raw CI artifacts.
  • Use short-lived branches off trunk, one per story, and merge via the gate; keep diffs reviewable.
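Reviewer independence by provenance, as required in the list above, could be checked by comparing the reviewer against the commit author and any Co-Authored-By trailers in the commit message. This is a minimal sketch that assumes agent IDs match the name portion of each trailer (compared case-insensitively); the function names are hypothetical.

```python
import re
from typing import Set

# Matches git-style trailers such as "Co-authored-by: Name <email>".
_TRAILER = re.compile(
    r"^co-authored-by:\s*(.+?)\s*<[^>]*>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def co_authors(commit_message: str) -> Set[str]:
    """Extract lower-cased co-author names from commit-message trailers."""
    return {m.group(1).strip().lower() for m in _TRAILER.finditer(commit_message)}


def reviewer_is_independent(reviewer_id: str, author_id: str,
                            commit_message: str) -> bool:
    """True only if the reviewer is neither the author nor a co-author."""
    reviewer = reviewer_id.strip().lower()
    return (reviewer != author_id.strip().lower()
            and reviewer not in co_authors(commit_message))
```

A `False` result would trigger recusal to Verdict or a fresh instance rather than letting the approval stand.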
Don't (9)
  • Never merge with the gate disabled, skipped, or overridden — no --no-verify, no admin bypass, no direct-to-trunk push; any break-glass is itself a logged receipt with justification.
  • Never report a green that the CI run_conclusion does not show; 'it works' is not a receipt and a test that didn't pass is a fail.
  • Never let the change author approve their own change, or mark their own acceptance tests as passing.
  • Never retry a failing test to green without a quarantine ticket; no blanket auto-retries.
  • Never emit a success-shaped receipt (CI run ID + counts) for a RED or skipped run.
  • Never waive a hardcoded-secret finding or let a High/Critical finding pass without remediation.
  • Never accept a bare 'LGTM' on a non-trivial diff — zero cited evidence triggers a second-reviewer or red-team pass.
  • Never let Verdict's audit trust the summarized receipt; re-derive from raw CI artifacts.
  • Never run the merge-gate control path itself behind a feature flag that can disable it.
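The severity and waiver rules referenced above (zero open High/Critical, secrets non-waiverable, Medium/Low only with a tracked unexpired waiver) can be sketched as a single policy function. The `Finding` record and its fields are hypothetical, and the input list is assumed to contain only open findings.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Finding:
    finding_id: str
    severity: str                      # "Critical" | "High" | "Medium" | "Low"
    is_secret: bool = False            # hardcoded-secret finding
    waiver_id: Optional[str] = None
    waiver_expiry: Optional[date] = None


def security_blockers(findings: List[Finding], today: date) -> List[str]:
    """Return the open findings that block the merge under the stated policy."""
    blockers = []
    for f in findings:
        if f.is_secret:
            # Secrets hard-fail even when a waiver is attached.
            blockers.append(f"{f.finding_id}: hardcoded secret (non-waiverable)")
        elif f.severity in ("High", "Critical"):
            blockers.append(f"{f.finding_id}: open {f.severity} finding")
        elif not f.waiver_id or f.waiver_expiry is None or f.waiver_expiry < today:
            blockers.append(f"{f.finding_id}: {f.severity} finding without a valid waiver")
    return blockers
```

Checking the secret flag first ensures that a waiver on a secret finding can never downgrade it to a pass.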
Guardrails (8)
  • The merge gate is a required, blocking status check with NO flag on its own run path; removing or weakening any gate condition must itself fail CI.
  • Binary merge predicate: MERGE IFF run_conclusion=='pass' AND coverage_line>=threshold AND new_diff_branch_coverage>=threshold AND mutation/assertion guard passes AND scan.high_critical_open==0 AND lens_verdict=='approved_with_evidence' AND reviewer_id!=author_id.
  • Receipts are append-only, outcome-keyed (run_conclusion), and contain provenance (author_id, reviewer_id) that makes independence machine-verifiable; downstream review re-derives from them.
  • Independence is enforced by identity/provenance, not honor; same-author code+review forces recusal to Verdict or a fresh instance.
  • Security pass = zero open High/Critical; secrets non-waiverable; Medium/Low only via tracked, attributable, time-boxed waivers referenced by ID in the receipt.
  • Flaky tests are quarantined with a tracked ticket and do not count as pass; quarantined-count is a visible tripwire field.
  • Verdict independently re-derives a sample of merged-story receipts each sprint; any mismatch reopens the story and is logged as an open gap.
  • No human approval gate exists in this brick; the sponsor provides input/answers continuously and unanswered ambiguities proceed as flagged assumptions logged on the story.
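The binary merge predicate stated in the guardrails above maps directly onto a boolean function plus a non-zero exit code. The sketch below is illustrative: the `GateInput` container and function names are hypothetical, and the thresholds are passed in by the caller because the text leaves their values to be agreed.

```python
from dataclasses import dataclass


@dataclass
class GateInput:
    run_conclusion: str
    coverage_line: float
    new_diff_branch_coverage: float
    mutation_or_assertion_guard_ok: bool
    high_critical_open: int
    lens_verdict: str
    reviewer_id: str
    author_id: str


def may_merge(g: GateInput, line_threshold: float, diff_threshold: float) -> bool:
    """MERGE IFF every condition of the binary predicate holds."""
    return (
        g.run_conclusion == "pass"
        and g.coverage_line >= line_threshold
        and g.new_diff_branch_coverage >= diff_threshold
        and g.mutation_or_assertion_guard_ok
        and g.high_critical_open == 0
        and g.lens_verdict == "approved_with_evidence"
        and g.reviewer_id != g.author_id
    )


def gate_exit_code(g: GateInput, line_threshold: float, diff_threshold: float) -> int:
    """0 allows the merge; any failing condition yields a non-zero, blocking code."""
    return 0 if may_merge(g, line_threshold, diff_threshold) else 1
```

A CI step could pass the result to `sys.exit(...)` so the required status check fails on a RED run, a self-approval, or an open critical finding, which is also what the negative test in the Do list would exercise.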
16

Sprint Review, Demo & Accomplishments-vs-Goals Summary [LOOP]

AI Agent: Cadence

Close out each sprint with an honest, receipt-backed closeout that claims ONLY what a re-resolvable receipt proves: the working increment demoed against the IMMUTABLE sprint goal, every story marked accepted or rejected with the binary acceptance check and the test/scan/demo receipt that settles it, and the quantitative state of the sprint (test pass + coverage delta, security findings, DORA, cost/FinOps) each linked to its source run. The goal-vs-actual delta is computed against the sprint-plan brick's committed goal+story set referenced BY ID (never restated or silently descoped), and every mid-sprint scope change is itemized with its receipt. Cadence authors the closeout but does NOT certify it: the draft is handed to an independent review-and-iterate loop (Verdict + Proof/Cipher/Vector/Vela lenses + one adversarial red-team pass) that re-fetches every receipt and adversarially tests the gap between each sentence and what its receipt actually proves; the closeout is final only on a logged panel verdict of zero open material gaps with the gap-log attached as the convergence receipt. The brick absorbs the legacy human-gated Review stage and the separate Learn/retro stage into one AI-owned artifact — there is NO human sign-off; the sponsor is a continuous input channel (open questions surfaced, answers folded in, silence handled by explicit flagged assumptions). It carries an honest missed-goal path (partial increment + blocking receipts, no optimistic narrative) and a next-sprint proposal that traceably carries forward every rejected/carried story, new risk, and human input.
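The goal-vs-actual delta described above keeps the committed set as a fixed denominator, so scope changes are itemized rather than absorbed into the percentage. A minimal sketch, assuming stories are identified by string IDs and using a hypothetical `goal_vs_actual` function:

```python
from typing import Dict, Set


def goal_vs_actual(committed_ids: Set[str], accepted_ids: Set[str],
                   added_mid_sprint: Set[str],
                   descoped_mid_sprint: Set[str]) -> Dict[str, object]:
    """Compute completion against the immutable committed set, itemizing scope changes."""
    if not committed_ids:
        raise ValueError("committed set is empty; no denominator for the delta")
    # Only stories from the original committed set count toward completion.
    accepted_committed = accepted_ids & committed_ids
    return {
        "committed": len(committed_ids),
        "accepted_of_committed": len(accepted_committed),
        "completion_ratio": len(accepted_committed) / len(committed_ids),
        "not_accepted": sorted(committed_ids - accepted_ids),
        "scope_added": sorted(added_mid_sprint),        # itemized, not folded in
        "scope_descoped": sorted(descoped_mid_sprint),  # still counted in the denominator
    }
```

With four committed stories, two of them accepted, one story added mid-sprint and one descoped, the ratio stays at 0.5: the added story does not raise it and the descoped story does not shrink the denominator.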

Deliverables
[AI · Cadence] Sprint closeout summary (sprint-NNN/closeout.md) — goal recap (referenced by sprint-plan ID), demoed increment, accepted/rejected stories, metrics block, goal-vs-actual delta, risks, next-sprint proposal
Acceptance: File exists at sprints/sprint-NNN/closeout.md; the sprint goal and committed story set are referenced by the sprint-plan brick's receipt ID (not restated/edited); every section present; NO completed/passed/live/accepted claim appears without an inline receipt ID
[AI · Cadence] Receipt-resolution table — every quantitative or status claim in the closeout mapped to a receipt ID (CI run URL, ledger/DB row, scan report ID, git SHA, health-check code, or demo-artifact hash) with its resolver type
Acceptance: A validator re-fetches each listed receipt ID and confirms (i) it exists, (ii) it is for THIS sprint's commit range, and (iii) its value supports the claimed value; result = ZERO dangling or contradicted claims; any claim whose receipt fails to resolve is logged as CUT or rewritten, never left asserted
[AI · Proof] Test & coverage receipt — CI run ID(s) with pass/fail counts and coverage delta vs. the sprint-plan baseline, provided by Proof (not self-sourced by the author)
Acceptance: Closeout's test-pass and coverage numbers equal the values in the linked Proof-owned CI run; run is at or after the merged increment's SHA; red or skipped-without-waiver tests are listed, not hidden
[AI · Cipher] Security findings receipt — Cipher/SAST + dependency-scan run with severity counts, plus any logged waiver (owner + expiry)
Acceptance: Binary gate satisfied: no unwaived High/Critical finding in the linked Cipher scan ID; each waiver carries an owner and an expiry date; the scan ran against this sprint's merged SHA
[AI · Vector] DORA + cost/FinOps receipt — deployment frequency, lead time, change-failure rate, MTTR from the source query/run, and spend-vs-sprint-budget from the FinOps source
Acceptance: Each DORA value and the spend figure link to a re-fetchable source (query/run/dashboard export ID) for this sprint window; binary gate: spend within sprint budget OR variance explicitly explained with the FinOps receipt linked
[AI · Iris] Demo receipt — recording/screens or a live env URL annotated with the exact commit SHA the demo ran against, mapped to the sprint goal
Acceptance: Demo artifact exists and its annotated SHA MATCHES the merged/deployed increment SHA (served-vs-disk / demo-vs-merge hash check passes); if no demo artifact resolves, the word 'demoed' does not appear in the closeout
[AI · Cadence] Accepted/rejected story ledger — per story: status, the binary acceptance criterion, the receipt that settles it (passing test ID for accepted; failing-check receipt for rejected), and disposition (carry / re-shape / cut) for every non-accepted story
Acceptance: Every story in the referenced committed set has exactly one status; each accepted story links a passing test/check receipt; each rejected/incomplete story names the failed binary criterion + the receipt proving failure + a disposition; zero stories silently dropped
[AI · Cadence] Goal-vs-actual delta + scope-change log — delta computed against the immutable planned set, with every mid-sprint add/descope itemized and receipt-linked
Acceptance: Delta denominator = the sprint-plan committed set by ID; each scope change is a line item with its receipt (commit/decision row); no descoped item is absorbed into a higher completion percentage
[AI · Cadence] Honest missed/partial-goal section (present whenever the goal was not fully met) — the partial increment that exists, what blocked the rest, and the blocking receipts
Acceptance: If goal not fully met: section states the shipped partial increment, names each blocker with its receipt (failing test ID, blocked-deploy health code, scope-cut decision row), and routes the remainder into the next-sprint proposal; no success narrative unsupported by the increment's receipts
[Human] + [AI · Vela] Human-input section — open questions surfaced to the sponsor this sprint, answers received, and every place the team proceeded on a flagged assumption due to sponsor silence (input, NOT a gate)
Acceptance: Section lists each question with its status (answered → answer folded into the relevant story/decision with a link; unanswered → explicit flagged assumption recorded); closeout does NOT block on any human approval and contains no human sign-off field
[AI · Vela] Next-sprint proposal — proposed goal + candidate stories that carry forward this sprint's evidence
Acceptance: Proposal references by ID every rejected/carried story, every new risk logged this sprint, and every human input/flagged assumption; the independent reviewer confirms each linkage; no carried item disappears between sprints
[AI · Verdict] Independent review gap-log + convergence verdict (the closeout-finalization receipt) — produced by an independent panel (Verdict lead + Proof/Cipher/Vector/Vela lenses + one adversarial red-team pass); Cadence (author) is barred from the panel
Acceptance: Gap-log records each gap (reviewer, claim, missing/contradicting receipt, severity material|minor) and its resolution by re-resolving the receipt (not by trusting prose); removed claims are logged as removed (not silently edited); the red-team pass files the specific receipt-vs-claim mismatches it probed (test-on-stub, coverage-on-dead-code, demo-on-seed-data, green-on-wrong-commit); the loop exits ONLY on a logged binary verdict of zero open MATERIAL gaps; if the max-iteration cap is hit, the loop escalates to the sponsor as input (not as an approval gate) with the open gaps stated
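
The receipt-resolution acceptance check above can be pictured as a small validator. This is a minimal sketch under stated assumptions, not the Flowtely tooling: the `Receipt` and `Claim` shapes, the `resolve` callback, and the use of a sprint label as a stand-in for the commit-range check are all hypothetical, and "supports the claimed value" is simplified to equality.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Receipt:
    # Hypothetical receipt shape: what the source of truth returns for an ID.
    receipt_id: str
    sprint: str      # stand-in for "belongs to THIS sprint's commit range"
    value: object    # the value the source actually reports


@dataclass(frozen=True)
class Claim:
    # Hypothetical claim shape: one quantitative or status claim in the closeout.
    text: str
    receipt_id: str
    claimed_value: object


def validate_claims(
    claims: Iterable[Claim],
    resolve: Callable[[str], Optional[Receipt]],
    sprint: str,
) -> Dict[str, str]:
    """Re-fetch every receipt and classify each claim.

    Verdicts: 'ok', 'dangling' (receipt does not resolve),
    'wrong-sprint', or 'contradicted' (value does not match).
    """
    results: Dict[str, str] = {}
    for claim in claims:
        receipt = resolve(claim.receipt_id)
        if receipt is None:
            results[claim.text] = "dangling"
        elif receipt.sprint != sprint:
            results[claim.text] = "wrong-sprint"
        elif receipt.value != claim.claimed_value:
            results[claim.text] = "contradicted"
        else:
            results[claim.text] = "ok"
    return results


# Illustrative data only; IDs and values are made up for the example.
store = {"ci-101": Receipt("ci-101", "sprint-007", 412)}
claims = [
    Claim("412 tests passed", "ci-101", 412),
    Claim("no criticals", "scan-9", 0),  # receipt does not exist
]
report = validate_claims(claims, store.get, "sprint-007")
must_cut_or_rewrite = [text for text, verdict in report.items() if verdict != "ok"]
```

Any claim left in `must_cut_or_rewrite` would be logged as CUT or rewritten rather than left asserted, which is the behaviour the acceptance line asks for.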
Questions the agent asks (6)
  • What is the canonical immutable reference (ID) for this sprint's committed goal and story set, so the delta denominator cannot drift?
  • Which CI run, scan report, DORA query, FinOps export, and demo artifact are the authoritative receipts for this sprint, and are they all re-fetchable by the validator?
  • Sponsor: of the open questions surfaced during the sprint, which would you like to answer now, and where should we record a flagged assumption if you are silent?
  • Does the sponsor have any new information, priority shift, or data that should reshape the next-sprint proposal before it is finalized?
  • Are there any security findings the team intends to waive, and if so who is the named owner and what is the waiver expiry?
  • Was the increment fully demoable end-to-end on the real path, or did the demo run on seed/stub data that the red-team pass must flag?
Do (9)
  • Reference the sprint-plan brick's committed goal+story set by receipt ID and compute the goal-vs-actual delta against THAT immutable set.
  • Attach a receipt ID to every quantitative or status claim and have the validator re-fetch each one before the closeout is considered drafted.
  • Source test/coverage from Proof, security from Cipher, DORA+cost from Vector, and the demo from Iris — never let the author self-source the numbers being claimed.
  • Hand the draft to the independent panel (Verdict + specialist lenses + a red-team pass) and treat it as final only on a logged zero-material-gaps verdict with the gap-log attached.
  • List every rejected/incomplete story with its failed binary criterion, the receipt proving failure, and an explicit disposition (carry/re-shape/cut).
  • Itemize every mid-sprint scope change with its receipt instead of absorbing it into the completion number.
  • Carry the sponsor as a continuous input channel: surface open questions, fold in answers with links, and record flagged assumptions where the sponsor is silent.
  • When the goal was missed or partial, state the partial increment + blocking receipts honestly and route the remainder into the next sprint.
  • Trace every item in the next-sprint proposal back to a rejected/carried story ID, a new risk, or a human input from this sprint.
Don't (9)
  • Do NOT let the author (Cadence) certify the closeout — independence requires a panel that did not author it.
  • Do NOT state any 'demoed / passed / live / accepted / no criticals' claim without a re-resolvable receipt ID supporting it.
  • Do NOT restate, rewrite, or silently descope the sprint goal to inflate the delta — pin it to the immutable planned set.
  • Do NOT 'fix' a flagged gap by quietly deleting the claim; removed claims must be logged as removed.
  • Do NOT add a human sign-off / approval field; the human provides input, never a gate.
  • Do NOT claim green tests at a commit that differs from the merged/deployed SHA (no served-vs-disk or demo-vs-merge hash mismatch).
  • Do NOT count coverage gains on dead code or test passes on stubbed dependencies as real progress — the red-team pass must catch these.
  • Do NOT drop rejected or carried-over stories; carry-over hiding defeats the flow metrics.
  • Do NOT narrate success the increment's receipts do not support; an honest missed-goal closeout beats an optimistic false one.
Guardrails (7)
  • HONESTY INVARIANT: the closeout may claim only what a re-resolvable receipt proves (test counts, scan IDs, ledger/DB rows, git SHAs, health-check codes, reviewer verdicts, demo-artifact hashes); any unresolvable claim is cut or rewritten — never asserted.
  • INDEPENDENCE INVARIANT: the review panel must exclude the author; approval of any claim requires the reviewer to independently re-fetch its receipt, and at least one adversarial red-team reviewer must file the receipt-vs-claim mismatches it probed.
  • CONVERGENCE INVARIANT: 'solid/final' = a logged binary panel verdict of zero open MATERIAL gaps with the gap-log attached as the convergence receipt; a max-iteration cap escalates remaining gaps to the sponsor as input, never as an approval gate.
  • IMMUTABLE-DELTA INVARIANT: goal-vs-actual is computed against the sprint-plan committed set referenced by ID; scope changes are itemized with receipts, not absorbed.
  • SECURITY/COST GATES: closeout passes only with no unwaived High/Critical finding (Cipher scan ID linked; waivers carry owner+expiry) and spend within budget or variance explained (FinOps receipt linked).
  • NO HUMAN GATE: there is no human sign-off; the sponsor is a continuous input channel and their silence is handled by explicit flagged assumptions, not by blocking.
  • SPRINT DISCIPLINE: runs only inside an OPEN sprint and writes only under sprints/sprint-NNN/; no secrets or credentials in the closeout or its receipts.
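
The immutable-delta rule above can be illustrated with a short calculation. The sketch below is an assumption-laden example: `ScopeChange`, `goal_vs_actual`, and the story IDs are hypothetical names, and completion is reported as a simple ratio over the committed set.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ScopeChange:
    # Hypothetical shape for one mid-sprint scope change line item.
    story_id: str
    kind: str        # "add" or "descope"
    receipt_id: str  # commit or decision row that records the change


def goal_vs_actual(
    committed_ids: Iterable[str],
    accepted_ids: Iterable[str],
    scope_changes: Iterable[ScopeChange],
) -> Dict[str, object]:
    """Compute the delta against the committed set, never a re-stated one."""
    committed = frozenset(committed_ids)
    if not committed:
        raise ValueError("the committed set must be referenced and non-empty")
    changes = list(scope_changes)
    for change in changes:
        if not change.receipt_id:
            raise ValueError(f"scope change for {change.story_id} has no receipt")
    accepted_of_committed = committed & set(accepted_ids)
    return {
        # Denominator stays the committed set even when items were descoped.
        "completion": len(accepted_of_committed) / len(committed),
        "not_accepted": sorted(committed - accepted_of_committed),
        "descoped": sorted(c.story_id for c in changes if c.kind == "descope"),
        "added": sorted(c.story_id for c in changes if c.kind == "add"),
    }


# Illustrative sprint: four committed stories, one descoped, one added mid-sprint.
delta = goal_vs_actual(
    committed_ids=["S1", "S2", "S3", "S4"],
    accepted_ids=["S1", "S2", "S5"],
    scope_changes=[
        ScopeChange("S4", "descope", "decision-row-12"),
        ScopeChange("S5", "add", "decision-row-13"),
    ],
)
```

In this example the descoped story S4 still counts in the denominator and the added story S5 does not raise the ratio, so completion is 2 of 4 rather than an inflated 2 of 3.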
17

Independent Review & Iterate: Sprint Increment (Anti-Bluff Honesty Audit) [LOOP]

AI Agent: Verdict

This is the sprint-level Truth Gate for the delivery process itself: it replaces the recurring human sprint-review sign-off AND kills the deepest anti-pattern in AI-run delivery — the author grading its own homework. Verdict (the independent evaluator), joined by specialist lenses who did NOT build the increment, independently RE-DERIVES every "done" claim from source-of-truth: it pins the exact git SHA, checks out clean from source control (never the author's working tree), re-runs the suite under its OWN run IDs, re-reads scan output, and replays each story's demo against its acceptance criteria — never accepting Cadence's or Vela's summary. For every claimed-done story it proves the increment is real, not asserted: new tests fail on the pre-change SHA and pass on the post-change SHA, no new skips/xfail, coverage delta meets threshold, suspect tests survive an N-times flaky re-run, security/non-functional DoD slices (scan, migration/rollback, perf/a11y where relevant) are re-checked, receipt rows show outcome==success, and at least one negative/failure path per story is probed. It red-teams for hidden, skipped, and concealed misses; verifies accomplishments against BOTH the sprint goal AND the Objectives rubric; and converges only when an independent panel agrees there are zero misrepresented-status findings and zero open material gaps. The loop has explicit convergence control (material-vs-immaterial gap definition, iteration budget, deadlock rule) so it neither rubber-stamps nor loops forever; non-convergence routes to the continuous human-INPUT channel as a question plus a flagged assumption or descope — never an open human approval hold. Any caught misrepresentation seeds a permanent honesty-regression case so the loop becomes a hardening immune system, and the whole verdict is written to an append-only, re-auditable ledger so the convergence claim is itself receipt-backed. 
Re-prioritization for the next sprint is an AI decision informed by the human-input channel — no human gate.
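
The fails-before/passes-after and flaky re-run checks described above can be sketched as one predicate per story. This is a minimal illustration, assuming a hypothetical `run_test(sha, test_id)` callable that runs one test in a clean checkout at the given SHA and returns True on pass; how that runner is implemented is outside the sketch.

```python
from typing import Callable, Iterable


def new_tests_prove_change(
    test_ids: Iterable[str],
    run_test: Callable[[str, str], bool],  # hypothetical runner: (sha, test_id) -> passed?
    pre_sha: str,
    post_sha: str,
    reruns: int = 3,
) -> bool:
    """True only if every new test fails before the change and passes
    on each of `reruns` runs after it."""
    ids = list(test_ids)
    if not ids:
        # A story that shipped zero new tests proves nothing.
        return False
    for test_id in ids:
        if run_test(pre_sha, test_id):
            # The test already passed before the change, so it does not
            # demonstrate the new behaviour.
            return False
        if not all(run_test(post_sha, test_id) for _ in range(reruns)):
            # Any failing re-run is treated as non-determinism: not converged.
            return False
    return True
```

A story whose tests pass on the pre-change SHA, or pass only intermittently on the post-change SHA, would be recorded as not converged under this predicate.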

Deliverables
[AI · Verdict] Per-story RE-DERIVATION checklist — one completed record per claimed-done story, built from a clean source-control checkout (not the author's working tree)
Acceptance: For EVERY story marked done: record contains (a) the exact reviewed git SHA and confirmation of a clean checkout from source control; (b) Verdict's OWN suite re-run with Verdict-generated run ID/log hash, distinct from Cadence's; (c) proof that the story maps to NEW tests that FAIL on the pre-change SHA and PASS on the post-change SHA; (d) no-new-skips/xfail check = pass; (e) coverage delta >= configured threshold; (f) flaky re-run of changed/suspect tests N>=3 times, all stable; (g) Cipher-independent re-read of scan output = pass; (h) demo replayed against each AC = pass AND >=1 negative/failure path probed; (i) receipt/ledger rows for the story show outcome==success. A story missing ANY item is NOT counted as converged. Stored as a checklist file with one row per item, each item a boolean true/false.
[AI · Verdict] Independent findings log — gaps and misrepresented-status findings formed BEFORE reading the author self-assessment (anti-anchoring), with severity and adjudicated disposition
Acceptance: Log records each finding with severity (material | immaterial) and disposition; timestamp evidence shows Verdict's findings were recorded before Cadence/Vela self-assessment was opened. Every MATERIAL finding routes back to the author (Mason/Proof/Keystone) — Verdict performs NO edits. Count of unresolved material findings at convergence == 0; count of misrepresented-status findings at convergence == 0. Each misrepresented/fabricated-done finding is flagged as a process-honesty incident (see incident-memory deliverable).
[AI · Verdict] Goal-vs-actual + Objectives-rubric scorecard — accomplishments validated against BOTH the sprint goal AND the process Objectives rubric
Acceptance: Document states the sprint goal, lists each planned story with its actual status, and gives a delta narrative for any miss. It separately scores the increment against the Objectives rubric (max AI autonomy, no human gates, honesty/receipts, independent-review integrity, lifecycle fit) with an evidence pointer per line. A sprint cannot be marked converged if it is green against the local goal but drifting from the Objectives rubric (any rubric line failing == material gap).
[AI · Cipher / Proof-independent / red-team] Specialist + adversarial review pass — independent lenses re-deriving the slices Verdict's checklist routes to them
Acceptance: Cipher (independent of the build) re-checks the security + migration/rollback-safety DoD slices and signs each with reproduced scan/run IDs; an independent QA lens re-checks test adequacy (assertions actually exercise the AC, not vacuous). A red-team pass is recorded that actively hunts hidden/skipped/quarantined/flaky tests and Truth-Gate-bypass paths (e.g., in-app/internal message paths, 429/rate-limit silent no-ops); red-team output attached with at least the probed paths enumerated. All specialist sign-offs carry reproduced receipt IDs, not assertions.
[AI · Cadence] Convergence-control record — material-gap definition applied, iteration budget tracked, deadlock/oscillation resolved without a human hold
Acceptance: Record defines material (blocks convergence) vs immaterial (logged, deferred to backlog, non-blocking) for THIS sprint; logs iteration count against a configured max-iteration budget; if oscillation (fix A breaks B) or budget exhaustion occurs, records the non-convergence action = surface a question on the continuous human-INPUT channel PLUS proceed on a flagged logged assumption OR descope the unconverged story to next sprint backlog (AI decision). Record contains ZERO instances of an open-ended human approval hold. Re-prioritization for next sprint is recorded as an AI decision citing the human-input channel.
[AI · Verdict] Append-only sprint-review verdict ledger entry — the binary convergence verdict made re-auditable
Acceptance: An append-only (immutable) ledger entry exists containing: reviewed SHA(s), all reproduced receipt/run IDs, the gaps log with severity, iteration count, and the explicit binary verdict (CONVERGED / NOT-CONVERGED). Verdict == CONVERGED only if: every claimed-done story has an independently reproduced passing receipt; misrepresented-status findings == 0; open material gaps == 0; goal-vs-actual documented; reproduced receipt IDs recorded. The entry is re-checkable by a later auditor or the next sprint's Verdict from the IDs/SHAs alone — no reliance on chat history.
[AI · Verdict] Honesty-incident regression seed — each misrepresented/fabricated-done finding turned into a permanent future check
Acceptance: For every misrepresented-status or fabricated-done finding this sprint, a process-honesty incident record is created AND a regression check is added that will run in future sprint reviews; the new check demonstrably FAILS on the offending pre-fix SHA and PASSES after correction (golden-set 'fails-before/passes-after' rule). If zero such findings occurred, the deliverable records 'none' explicitly. Future Verdict runs load and execute these checks.
[Human] Answers to surfaced sprint-review questions on the continuous input channel (when raised) — input only, never approval
Acceptance: Where Verdict/Cadence surfaced a question to the human-input channel, the record links the question and either (a) the human's answer incorporated, or (b) an explicit flagged assumption the team proceeded on after a bounded wait. No item in this sprint review is recorded as blocked-on-human-approval; the human contributes information, never a sign-off.
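
The convergence-control record above (iteration budget plus an oscillation escape) can be sketched as a bounded loop. Everything here is illustrative: `review` and `fix` are hypothetical callables standing in for the panel's review pass and the author's fix pass, and a repeated gap set is used as a simple proxy for oscillation or lack of progress.

```python
from typing import Callable, Dict, Iterable, List


def run_review_loop(
    review: Callable[[], Iterable[str]],       # returns IDs of open MATERIAL gaps
    fix: Callable[[frozenset], None],          # author attempts to close the gaps
    max_iterations: int,
) -> Dict[str, object]:
    """Iterate review/fix until zero material gaps, a repeated gap set,
    or the iteration budget is spent. Never waits on a human."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    seen: List[frozenset] = []
    gaps: frozenset = frozenset()
    for iteration in range(1, max_iterations + 1):
        gaps = frozenset(review())
        if not gaps:
            return {"verdict": "CONVERGED", "iterations": iteration,
                    "open_gaps": [], "reason": None}
        if gaps in seen:
            # Same gap set came back (fix A broke B, then fix B broke A).
            return {"verdict": "NOT-CONVERGED", "iterations": iteration,
                    "open_gaps": sorted(gaps),
                    "reason": "oscillation-or-no-progress"}
        seen.append(gaps)
        if iteration < max_iterations:
            fix(gaps)
    return {"verdict": "NOT-CONVERGED", "iterations": max_iterations,
            "open_gaps": sorted(gaps), "reason": "budget-exhausted"}
```

On either NOT-CONVERGED outcome the caller would surface a question on the human-input channel and proceed on a flagged assumption or a descope, as the record requires; the loop itself only reports the open gaps and the reason.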
Questions the agent asks (5)
  • What is the sprint goal for this increment, in one sentence, and what were the planned stories with their acceptance criteria and Definition of Done?
  • Are there domain-specific failure modes or negative paths you specifically want probed for these stories (data we should not corrupt, side effects, rate limits, rollback expectations)?
  • Are the coverage-delta threshold and flaky re-run count (N) we have configured for this project correct, or do you want different bars for this sprint?
  • For any story we descope on non-convergence, is there a priority you want us to weigh when re-prioritizing the next sprint — or should we decide from the input channel and current backlog?
  • Are there non-functional DoD slices (perf budgets, accessibility level, migration/rollback drills) that apply to specific stories this sprint that we must re-derive?
Do (11)
  • Pin the exact reviewed git SHA and re-run everything from a clean checkout out of source control — never the author's working tree.
  • Generate and record Verdict's OWN run IDs / log hashes, distinct from Cadence's or Vela's.
  • Prove each done story's new tests FAIL on the pre-change SHA and PASS on the post-change SHA; verify no new skips/xfail and coverage delta >= threshold.
  • Re-run changed/suspect tests N>=3 times; treat any non-determinism as NOT converged.
  • Form your own findings BEFORE opening the author self-assessment (anti-anchoring).
  • Replay each demo against every AC AND probe at least one negative/failure path per story; verify receipt rows show outcome==success.
  • Re-derive the security and non-functional DoD slices (scan, migration/rollback, perf/a11y where relevant), not just the happy-path demo.
  • Score against BOTH the sprint goal and the Objectives rubric; treat any failing rubric line as a material gap.
  • Write the convergence verdict, reproduced receipt IDs, gaps log, and iteration count to the append-only ledger.
  • Seed a permanent fails-before/passes-after regression check for every misrepresented-status finding.
  • On non-convergence, surface a question to the human-input channel plus a flagged assumption or descope — and proceed.
Don't (10)
  • Don't accept any summary, screenshot, dashboard, or 'it works' as a receipt.
  • Don't re-run the suite against the author's working tree, branch, or seed — only a clean checkout at the pinned SHA.
  • Don't count a green suite as proof when the story shipped 0 new tests, skipped/xfail tests, or vacuous assertions.
  • Don't let a quarantined, retried, or flaky test stand in as a passing receipt for its story.
  • Don't review an artifact you authored — Verdict must not be the author of the code, tests, or sprint plan under review.
  • Don't edit or fix the artifact under review — route material gaps back to Mason/Proof/Keystone.
  • Don't read Cadence's or Vela's self-assessment before forming your own findings.
  • Don't let subjective taste block convergence — route it to the human-input channel as a question, not a hold.
  • Don't loop forever — honor the iteration budget; on deadlock convert to a logged assumption or descope.
  • Don't ever convert a review finding into an open-ended human approval gate.
Guardrails (9)
  • INDEPENDENCE: Verdict and any specialist reviewer must provably NOT be the author of the artifact under review; reviewers are review-only and cannot edit the increment.
  • ANTI-ANCHORING: reviewer findings are timestamped/recorded before the author self-assessment is opened.
  • TAMPER-EVIDENCE: every 'reproduced' claim cites a pinned SHA, a clean-checkout, and Verdict's own run ID/log hash; a verdict without reproduced receipt IDs is treated as unbacked and cannot be CONVERGED.
  • BINARY CONVERGENCE: CONVERGED requires that every claimed-done story has an independently reproduced passing receipt, plus 0 misrepresented-status findings, 0 open material gaps, goal-vs-actual documented, and IDs recorded; anything else is NOT-CONVERGED.
  • NO HUMAN GATE: the human provides information/answers only; no brick outcome may be 'awaiting human approval'; non-convergence routes to a human-INPUT question plus flagged assumption or descope, and re-prioritization is an AI decision.
  • CONVERGENCE CONTROL: a defined material/immaterial gap rule, an iteration budget, and an oscillation/deadlock escape must be applied so the loop neither rubber-stamps nor runs forever.
  • IMMUTABLE AUDIT: the convergence verdict and its receipts are written to an append-only ledger, re-auditable from IDs/SHAs alone.
  • IMMUNE MEMORY: any caught misrepresentation seeds a permanent regression check (fails-before/passes-after) for future sprint reviews.
  • Verdict gates on EVIDENCE and time-boxes only — it must never become a standing bottleneck that re-creates the human gate with a robot in the chair.
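
The BINARY CONVERGENCE guardrail above reduces to a single predicate. The sketch below is illustrative; the `SprintReview` record and its field names are assumptions chosen for the example, not a defined Flowtely schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SprintReview:
    # Hypothetical summary of one sprint review.
    # Maps each claimed-done story ID to the reviewer's OWN reproduced
    # passing receipt ID, or None if it could not be reproduced.
    reproduced_receipts: Dict[str, Optional[str]]
    misrepresented_findings: int
    open_material_gaps: int
    goal_vs_actual_documented: bool


def convergence_verdict(review: SprintReview) -> str:
    """Return 'CONVERGED' only when every condition holds; otherwise 'NOT-CONVERGED'."""
    # Note: with zero claimed-done stories this check is vacuously true.
    every_story_reproduced = all(review.reproduced_receipts.values())
    converged = (
        every_story_reproduced
        and review.misrepresented_findings == 0
        and review.open_material_gaps == 0
        and review.goal_vs_actual_documented
    )
    return "CONVERGED" if converged else "NOT-CONVERGED"
```

Because a missing receipt ID is falsy, a verdict without reproduced receipt IDs cannot come out as CONVERGED, which mirrors the TAMPER-EVIDENCE guardrail.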
18

Sprint Retrospective & Backlog Refinement [LOOP → next sprint]

AI Agent: Cadence

This is the loop-closing brick of the Sprints phase and does four jobs at once: (1) Cadence runs a disciplined retrospective that first verifies whether the PRIOR sprint's improvement actions were actually done (with evidence) and only then logs 1-2 new improvement actions, each with an owner and a next-sprint verification signal; (2) Cadence refreshes the capacity/DORA and cost/FinOps trends, where every number cites a provenance receipt (CI/CD run IDs, deploy logs, incident records, token/cost-meter exports) and no metric is hand-typed; (3) Vela refines and re-prioritizes the backlog against the sponsor's latest input (a pure INPUT channel), running a bounded review_loop until the candidate next-sprint set is "solid"; and (4) the brick produces the BINARY phase-exit decision (loop back to Sprint Planning vs. proceed to Hardening) as a deterministic function of two register-derived receipts plus a quality-gate precondition, NOT a judgment call. Crucially, the Scope-Burndown receipt (what release-scope work remains, sized) and the Done-Done receipt (every release-scope item in terminal state) are both DERIVED FROM the release_plan release-scope register, declared here as the single authoritative MUTABLE register of record (the release_plan's value-sequenced sprint list + requirement-to-sprint traceability matrix, re-sequenced after each sprint). Because both exit receipts read from that one owned source rather than from re-typed ad-hoc lists, the exit predicate is deterministic: an item is in scope, done, or remaining only as the register says, so no "ghost" scope and no quietly-forgotten item can swing the decision. Neither Cadence (who authored the retro/metrics) nor Vela (who owns the backlog) adjudicates convergence or the exit: Verdict (independent_reviewer, who authored neither) independently validates both that the refined backlog is solid and that the exit verdict is correct given the evidence, and is free to disagree; the loop closes only on Verdict's documented convergence verdict.
The exit reads the Scope-Burndown receipt (register-derived remaining items, sized) and the Done-Done receipt (each register item in terminal state with merge SHA + Lens verdict + Proof test IDs + Cipher scan + demo link), gated by a quality precondition (zero open P0/P1 defects unless explicitly accepted into hardening, no quarantined tests hiding failures, change-failure-rate not regressing): scope-remaining empty AND done-done clean AND quality-gate green → PROCEED; anything else, including any non-empty scope-remaining, → LOOP. The human is never an approval gate; their input feeds re-prioritization and they may be asked questions. The anti-deadlock rule is bounded and honest: on sponsor silence, wait a bounded window, log a flagged assumption to the RAID/assumption register with owner + revisit trigger, and proceed. The one EXCEPTION: a re-prioritization that drops or deprioritizes a sponsor-requested scope item may not be silently assumed; it holds at prior priority and is surfaced as a question until answered, so "never block" never becomes "silently steer away from the sponsor's brief."
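
The exit rule just described can be written down as a pure function of the three inputs. This is a sketch under assumptions: the `QualityGate` fields and the `phase_exit` signature are hypothetical names chosen to mirror the prose, not an existing interface.

```python
from dataclasses import dataclass
from typing import Collection


@dataclass(frozen=True)
class QualityGate:
    # Hypothetical summary of the quality precondition.
    unaccepted_p0_p1: int                   # open P0/P1 defects NOT accepted into hardening
    quarantined_tests_hiding_failures: int
    change_failure_rate_regressing: bool

    @property
    def green(self) -> bool:
        return (
            self.unaccepted_p0_p1 == 0
            and self.quarantined_tests_hiding_failures == 0
            and not self.change_failure_rate_regressing
        )


def phase_exit(
    scope_remaining: Collection[str],      # register-derived Scope-Burndown item IDs
    non_terminal_items: Collection[str],   # register-derived Done-Done non-terminal item IDs
    gate: QualityGate,
) -> str:
    """PROCEED only if nothing remains, nothing is non-terminal, and the gate is green."""
    if not scope_remaining and not non_terminal_items and gate.green:
        return "PROCEED"
    return "LOOP"
```

Since the function has no inputs other than the receipts, two reviewers evaluating the same receipts cannot reach different exits, which is the sense in which the decision is deterministic rather than a judgment call.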

Deliverables
[AI · Cadence] Retrospective record with closed-loop action tracking: prior-sprint improvement actions carried forward and each marked DONE or NOT-DONE with linked evidence, followed by 1-2 NEW improvement actions, each with an owner and a defined next-sprint verification signal.
Acceptance: BINARY: every prior-sprint action appears with status ∈ {done, not-done} and an evidence link (commit/PR/config/process-doc); count of new actions is between 1 and 2 inclusive; each new action has a non-empty owner field and a non-empty verification-signal field. Reject if any prior action is unresolved/silent or if >2 new actions are logged.
[AI · Cadence] Metrics-trend update for capacity, DORA (deploy frequency, lead time, change-failure-rate, MTTR) and cost/FinOps (token/$ per Flow and per story-point), each value annotated with its provenance source receipt.
Acceptance: BINARY: every reported metric value has a non-empty source reference resolving to a real receipt (CI/CD run ID, deploy log entry, incident-log ID, or cost-meter export ID); zero metrics without a source link. Reject the trend update if any single value is unsourced.
[AI · Cadence] FinOps action-hook check: if cost-per-unit-of-work (token/$ per Flow or per story-point) regressed beyond the configured threshold vs. trend baseline, an auto-created backlog item is filed into the release_plan register (and, if runaway, the cost circuit-breaker is tripped/flagged).
Acceptance: BINARY: regression check ran and is recorded with the threshold and measured delta; if delta > threshold a backlog item ID exists in the release_plan register and is linked; if delta exceeds the runaway bound the circuit-breaker flag is set with a link. Reject if a threshold breach occurred but no backlog item/circuit-breaker action is recorded.
[AI · Vela] Refined & re-prioritized next-sprint candidate backlog (a mutation of the release_plan release-scope register, re-sequenced) reflecting the sponsor's latest input, with sponsor-requested scope changes either applied or — where dropping/deprioritizing a sponsor-requested item — held at prior priority and surfaced as a question.
Acceptance: BINARY: every candidate next-sprint item has all of {acceptance criteria, owner ∈ {AI, AI+Human}, size estimate, value/Decision-Tree link} and a register item ID; zero duplicate items; zero orphan items (no parent epic/goal link); any drop/deprioritization of a sponsor-requested item is recorded in the register with prior-priority-held flag + open question ID. Reject if any candidate item is missing a required field, is not reflected in the release_plan register, or a sponsor-scope drop was silently applied.
[AI · Cadence] Scope-Burndown receipt DERIVED FROM the release_plan release-scope register (the authoritative mutable register): enumerated list of release-scope items NOT yet done (not-started + in-flight), each with item ID and size estimate.
Acceptance: BINARY: receipt enumerates remaining release-scope items with register item IDs and size estimates and states whether the set is EMPTY or NON-EMPTY; the set is read from the release_plan release-scope register, not re-typed or hand-curated. Reject if any release-scope register item's state is unknown/unaccounted or if the receipt's item set diverges from the register's item set.
[AI · Cadence] Done-Done receipt DERIVED FROM the same release_plan release-scope register: every register item declared complete enumerated with its terminal-state evidence join (merge SHA, Lens review verdict, Proof test IDs/pass counts, Cipher security-scan result, demo link).
Acceptance: BINARY: for each register item marked done, all five evidence fields {merge SHA, Lens verdict, Proof test IDs+pass count, Cipher scan result, demo link} are present and non-empty; the receipt lists count of non-terminal register items remaining and reconciles done + non-terminal = full register item set (no item missing). 'Release scope complete' is permitted ONLY if the register's non-terminal set is empty. Reject any 'complete' claim while a non-terminal register item exists or if the receipt's item set does not reconcile against the register.
[AI · Cipher+Proof] Quality-gate precondition check feeding the PROCEED branch: open P0/P1 defect count, quarantined/skipped-test inventory, and change-failure-rate/defect trend direction.
Acceptance: BINARY: report states the open-P0/P1 count (which must be ZERO, or every open P0/P1 must be explicitly accepted into hardening with a named owner + hardening-entry condition); the quarantined/skipped-test count is listed (and any test hiding a failure is named); the change-failure-rate/defect trend is stated as regressing or not. PROCEED is blocked unless P0/P1 are zero-or-accepted, no quarantined test hides a failure, and the trend is not regressing. Reject if any of the three is unreported.
[AI · Verdict] Independent convergence verdict on the refined backlog (review_loop): Verdict (who authored neither the retro nor the backlog) critiques the candidate set against the 'solid' criteria, logs gaps, Vela fixes, and the loop iterates to convergence or to a bounded-iteration escape.
Acceptance: BINARY: verdict records SOLID or NOT-SOLID with the specific 'solid' criteria checked (acceptance criteria + owner + size + value link per item; no duplicates/orphans; every item reflected in the release_plan register; reflects sponsor input or a flagged assumption); if NOT-SOLID after N bounded iterations, the disagreement is surfaced to the human as INPUT and the team proceeds on a logged flagged assumption — never blocks. Reject closure of the refinement loop without a SOLID verdict OR a logged bounded-escape assumption.
[AI · Verdict] Independent adjudication of the BINARY phase-exit verdict (LOOP vs. PROCEED) as a deterministic function of the register-derived Scope-Burndown + Done-Done receipts plus the Quality-gate receipt, with Verdict re-deriving both receipts against the release_plan register and free to disagree with Cadence/Vela.
Acceptance: BINARY: verdict states LOOP or PROCEED and shows the decision rule evaluated against the three receipts: PROCEED iff (register-derived scope-remaining EMPTY) AND (register-derived done-done non-terminal set EMPTY) AND (quality-gate green); otherwise LOOP. Verdict confirms both exit receipts reconcile against the single release_plan register (no item appears in a receipt but is absent from the register, and no register item is absent from both receipts). Verdict is signed by independent_reviewer, not by Cadence or Vela. Reject any exit declared by the artifact authors, any receipt that does not reconcile against the register, or any PROCEED whose three receipts do not all satisfy the rule.
[AI · Cadence] Plain-language sponsor status note: what improved this sprint, what release-scope remains, where cost is trending, and whether the team is continuing sprints or moving to hardening — jargon-free, no over-claim.
Acceptance: BINARY: note states the exit decision (continuing sprints / moving to hardening) in plain language, lists remaining scope at a glance, and states the cost trend WITHOUT presenting it as a guaranteed outcome; contains no raw infra jargon (DORA/FinOps internal terms expanded or omitted). Reject if it implies sponsor approval was required or presents a trend as a guarantee.
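
The reconciliation that Verdict performs on the two exit receipts can be sketched as a set comparison against the register. The register layout below (a dict of item ID to fields, with a `state` of "done" as the terminal state) and the evidence field names are assumptions made for the example.

```python
from typing import Dict, Iterable

# Hypothetical field names for the five terminal-state evidence items.
EVIDENCE_FIELDS = ("merge_sha", "lens_verdict", "proof_tests", "cipher_scan", "demo_link")


def reconcile(
    register: Dict[str, dict],
    burndown_ids: Iterable[str],
    done_ids: Iterable[str],
) -> Dict[str, object]:
    """Compare submitted Scope-Burndown and Done-Done receipts with the register."""
    burndown, done = set(burndown_ids), set(done_ids)
    register_done = {i for i, item in register.items() if item.get("state") == "done"}
    register_open = set(register) - register_done
    report: Dict[str, object] = {
        # Items in a receipt that the register does not know about.
        "ghost_items": sorted((burndown | done) - set(register)),
        # Register items that appear in neither receipt.
        "unaccounted": sorted(set(register) - burndown - done),
        # Items whose receipt placement contradicts the register state.
        "misfiled": sorted((done & register_open) | (burndown & register_done)),
        # Done items with any empty evidence field.
        "missing_evidence": {
            i: [f for f in EVIDENCE_FIELDS if not register[i].get(f)]
            for i in sorted(done & register_done)
            if any(not register[i].get(f) for f in EVIDENCE_FIELDS)
        },
    }
    report["reconciled"] = not any(report.values())
    return report
```

Under this sketch, a "release scope complete" claim would only be admissible when `reconciled` is true and the burndown set is empty; any ghost, unaccounted, or misfiled item is grounds to reject the receipt.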
Questions the agent asks (5)
  • Given the current backlog state, what are your top priorities for the next sprint — has anything changed since you last gave input?
  • Is there any scope you previously asked for that you now want dropped, deferred, or re-ordered? (We will hold prior priority on anything you've requested until you confirm.)
  • Are there new business constraints, deadlines, or data we should fold into re-prioritization?
  • For any release-scope item we are about to defer to a future sprint, are you comfortable with that, or should it stay in the current release?
  • If we move to hardening, are there known issues you'd want explicitly carried as hardening-entry conditions versus fixed before exit?
Do (8)
  • Verify and close LAST sprint's improvement actions with evidence before logging any new ones; cap new actions at 1-2, each with an owner and a next-sprint verification signal.
  • Cite a provenance receipt for every capacity/DORA/FinOps number — CI/CD run IDs, deploy logs, incident records, cost-meter exports.
  • Derive BOTH the Scope-Burndown receipt (what remains) and the Done-Done receipt (what is terminal) from the release_plan release-scope register — the single authoritative mutable register — and make the exit a deterministic function of them plus the quality gate.
  • Keep the release_plan register the one source of truth: every backlog refinement, auto-filed FinOps item, drop, or re-sequence mutates the register, and both exit receipts read from it so they cannot diverge.
  • Have Verdict (independent_reviewer) — not Cadence or Vela — re-derive the receipts against the register and adjudicate both backlog 'solid' and the binary exit verdict; let Verdict disagree and iterate until convergence.
  • Treat the sponsor as a continuous INPUT channel: fold their latest input into re-prioritization, ask the listed questions, and on silence proceed on a logged flagged assumption with a revisit trigger.
  • Run the FinOps action-hook: a cost-per-unit-of-work regression beyond threshold auto-files a backlog item into the register and, if runaway, trips the cost circuit-breaker.
  • Report to the sponsor in plain language: what improved, what remains, where cost is trending, and the loop/hardening decision.
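The FinOps action-hook from the Do list can be sketched as follows. This is a minimal illustration only: the `Register` class, the `finops_hook` function, and both threshold values are hypothetical, since the process text does not state the actual thresholds or the register schema.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    """Hypothetical stand-in for the release_plan release-scope register."""
    items: list = field(default_factory=list)

    def file_item(self, title: str, receipt: str) -> None:
        self.items.append({"title": title, "receipt": receipt, "state": "open"})


def finops_hook(baseline_cost: float, current_cost: float, receipt: str,
                register: Register,
                regression_threshold: float = 0.10,   # assumed value
                runaway_threshold: float = 0.50) -> str:  # assumed value
    """Return 'ok', 'filed', or 'tripped' for one cost-per-unit-of-work reading."""
    if baseline_cost <= 0:
        raise ValueError("baseline cost must be positive")
    if not receipt:
        # An unsourced metric is rejected rather than acted on.
        raise ValueError("cost figure has no provenance receipt")
    regression = (current_cost - baseline_cost) / baseline_cost
    if regression <= regression_threshold:
        return "ok"
    # A regression beyond threshold auto-files a backlog item into the register.
    register.file_item(f"Cost-per-unit regression of {regression:.0%}", receipt)
    # A runaway regression additionally trips the cost circuit-breaker.
    return "tripped" if regression > runaway_threshold else "filed"
```

Because the filed item lands in the same register that the exit receipts read from, a cost regression shows up as remaining scope without anyone re-typing it.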
Don't (9)
  • Don't let Cadence (retro/metrics author) or Vela (backlog owner) declare convergence or the phase-exit — that is grading their own homework; only Verdict adjudicates.
  • Don't compute the exit from re-typed, ad-hoc, or memory-held scope lists — both exit receipts MUST reconcile against the release_plan register, or the exit is invalid.
  • Don't assert 'release scope complete' without the Done-Done receipt showing the register's non-terminal set empty; a vibe is not a receipt.
  • Don't hand-type or estimate any DORA/FinOps number — an unsourced metric is theater and must be rejected.
  • Don't open new retro actions while prior actions sit silently unresolved.
  • Don't insert any human-approval/sign-off step; the human provides input and answers questions, never gates.
  • Don't silently assume away a sponsor-requested scope item: a proposal to drop or deprioritize one leaves the item at its prior priority in the register and is surfaced as a question until answered.
  • Don't PROCEED to hardening while open P0/P1 defects, failure-hiding quarantined tests, or a regressing change-failure-rate are unaddressed and not explicitly accepted.
  • Don't dump infra jargon on the sponsor or present a cost/velocity trend as a guaranteed outcome.
Guardrails (9)
  • Single source of truth: the release_plan release-scope register (value-sequenced sprint list + requirement-to-sprint traceability matrix) is the authoritative MUTABLE register of record; Scope-Burndown and Done-Done are derived views of it, so the exit predicate is deterministic against one owned source — no shadow scope lists.
  • Independence is mandatory: the exit verdict and the backlog 'solid' verdict are signed by independent_reviewer (Verdict), who authored neither artifact and who re-derives both receipts against the register; an author-signed or non-reconciling exit is invalid.
  • Honesty/receipts bite hardest here: every 'done', 'passed', 'complete', metric value, and 'action done' must resolve to a real receipt (register item IDs in terminal state, CI/deploy/incident/cost-meter sources, prior-action evidence) — never asserted.
  • The exit is a deterministic rule, not a judgment: PROCEED iff register-derived scope-remaining EMPTY AND register-derived done-done non-terminal set EMPTY AND quality-gate green; else LOOP.
  • Quality-gate precondition on PROCEED: zero open P0/P1 (or each explicitly accepted into hardening with owner + entry-condition), no quarantined test hiding a failure, change-failure-rate/defect trend not regressing.
  • Anti-deadlock is bounded and honest: on sponsor silence, wait a bounded window, log a flagged assumption to the RAID/assumption register with owner + revisit trigger, then proceed — but a re-prioritization that drops a sponsor-requested item may NOT be silently assumed; it holds at prior priority in the register and is surfaced until answered. Never block on a human approval.
  • Backlog-refinement review_loop has a bounded iteration count; if not converged in N passes, surface the disagreement to the human as INPUT and proceed on the flagged assumption — never deadlock.
  • No work is generated outside an open sprint; this brick runs within the active sprint and, on LOOP, hands the refined register-backed backlog to the next Sprint Planning brick.
  • Sponsor-facing reporting is plain-language and over-claim-free; the receipts are the gate, the sponsor's input is an input, not an approval.
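The deterministic exit rule stated in the guardrails can be sketched as below. This is an illustration under stated assumptions: the register is modeled as a plain mapping of item ID to state, and the state names (`planned`, `in_progress`, `built`, `done_done`, `dropped`) are hypothetical, because the process text does not define the register's state vocabulary.

```python
# Assumed state vocabulary; the process text does not name register states.
REMAINING_STATES = {"planned", "in_progress"}   # not yet delivered
UNVERIFIED_STATES = {"built"}                   # delivered, not yet done-done
# Any other state (for example "done_done" or "dropped") is treated as terminal.


def phase_exit(register: dict, burndown_receipt: set,
               done_done_receipt: set, quality_gate_green: bool) -> str:
    """Re-derive both receipts from the register, then apply the exit rule."""
    # Reconciliation: no receipt item may be absent from the register.
    unknown = (burndown_receipt | done_done_receipt) - set(register)
    if unknown:
        raise ValueError(f"receipt items absent from register: {sorted(unknown)}")

    # Re-derivation: compute both views from the register itself.
    scope_remaining = {i for i, s in register.items() if s in REMAINING_STATES}
    non_terminal = {i for i, s in register.items() if s in UNVERIFIED_STATES}
    if burndown_receipt != scope_remaining or done_done_receipt != non_terminal:
        raise ValueError("receipts do not match the re-derived register views")

    # PROCEED iff both derived sets are empty AND the quality gate is green.
    if not scope_remaining and not non_terminal and quality_gate_green:
        return "PROCEED"
    return "LOOP"
```

A receipt that names an item the register does not hold raises an error instead of returning LOOP, which mirrors the rule that a non-reconciling exit is invalid rather than merely unfinished.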
19

Release Hardening: Full Regression, Performance, Accessibility & UAT

AI Agent: Proof

On a frozen release candidate, Proof runs the full quality pass that goes beyond per-sprint testing: end-to-end regression across the full suite plus every fix's targeted regression and the escaped-defect set; performance/load measured against the NFR numeric targets inherited (read-only) from the phase-1 PRD and phase-3 architecture NFR register; accessibility conformance to a named bar (WCAG 2.2 AA) via automated scan AND assistive-technology pass; the defined cross-browser/device/platform matrix with a per-cell verdict; a confirmation that the security gate is GREEN for THIS exact build; and UAT run as a continuous human INPUT channel (the sponsor exercises real scenarios and reports observations that become triaged defect IDs — never an approval gate). All results are captured as an evidence bundle of receipt-backed numbers (CI run IDs, p50/p95/p99 + RPS + error rate, scan output files, matrix grid, defect tickets) bound to a single commit hash; the verdict is COMPUTED from those receipts, never narrated or assumed. Defects are triaged against a binary severity rubric; release-blockers are fixed AND re-tested. The pass condition is binary: zero open blocker-severity defects, all evidence-bundle receipts green, NFR targets met at p95/p99 under a named production-scale load profile, accessibility clean on both tracks, security green, and the candidate frozen at a named commit. Because this verdict is itself a major build artifact and gates deploy, it does NOT stand alone: the immediately-following companion review_loop brick routes Proof's verdict through an independent panel (Verdict + adversarial Proof red-team + Cipher) so Proof never grades its own homework.

Deliverables
[AI · Proof] Frozen release-candidate manifest: the exact commit hash / build-artifact ID the entire quality pass binds to, recorded at freeze time
Acceptance: A single commit hash (or immutable build-artifact ID) is recorded; every downstream evidence-bundle field references this same hash; if any code lands after freeze the manifest is re-issued with a new hash and affected dimensions are re-run (no mixed-hash evidence permitted)
[AI · Proof] Inherited NFR contract: a read-only register copy of the performance/availability/scale targets pulled from the phase-1 PRD and phase-3 architecture, used as the test pass bar
Acceptance: Each performance/scale target traces by ID to the approved PRD/architecture NFR register; no target is created, raised, or relaxed during hardening; a diff vs. the baseline register shows zero target changes
[AI · Proof] End-to-end regression result with receipts: full automated suite + each fix's targeted regression + the prior-sprint escaped-defect set, with flake handling
Acceptance: Receipt shows CI run ID(s) for the frozen hash; reports passed/total for the full suite, explicit pass results for every escaped-defect case, and per-fix targeted regression results; flake rate is reported and any flaky test is quarantined/re-run to a stable verdict — a flaky green does NOT count as pass
[AI · Proof] Performance/load evidence: raw load-test report run at a named production-scale profile, evaluated against the inherited NFR targets
Acceptance: Report states the load profile (concurrent users, ramp, soak duration, production-scale data volume) and gives p50/p95/p99 latency, RPS, and error rate per critical path; pass is judged at p95/p99 (not mean) against the inherited target; the raw report artifact is linked, not summarized
[AI · Proof] Accessibility conformance evidence to WCAG 2.2 AA on two tracks: automated scan + assistive-technology manual pass
Acceptance: Automated scan (e.g. axe/Lighthouse) output file shows zero critical/serious violations for the frozen hash AND a manual track records keyboard-only + screen-reader results per key flow with pass/fail; automated-only is explicitly insufficient; both artifacts linked
[AI · Proof] Cross-browser/device/platform matrix grid with a defined scope and per-cell verdict
Acceptance: The matrix names the in-scope browsers/OS/devices and marks each cell blocking vs. best-effort up front; every blocking cell has a recorded pass/fail with evidence; no blocking cell is empty or silently dropped; the grid artifact is linked
[Human] Sponsor UAT observations (human INPUT channel): the sponsor exercises real-world acceptance scenarios and reports observations/issues
Acceptance: Sponsor-reported observations are captured and each is converted to a triaged defect ID; if the sponsor is silent, the brick proceeds on an explicitly flagged assumption and the AI-executed acceptance-scenario run stands in as binding evidence — there is NO sponsor approval/sign-off pass condition
[AI · Proof] AI-executed acceptance-scenario run covering the PRD acceptance scenarios end-to-end
Acceptance: Every PRD acceptance scenario has a recorded automated/scripted execution result (pass/fail) against the frozen hash with a run receipt; this run is the binding UAT evidence independent of sponsor response
[AI · Proof] Defect triage log against a binary severity rubric, with blocker fixes re-tested
Acceptance: Each defect carries a severity assigned via the documented binary blocker rubric with logged rationale (so reclassification is auditable); every blocker-severity defect is fixed AND has a passing re-test receipt; open blocker count = 0
[AI · Cipher] Security-green confirmation for the frozen candidate
Acceptance: Cipher confirms no new critical/high findings since the last security assessment for THIS commit hash and that the dependency scan is clean; the scan output artifact is linked; if not green, the candidate cannot pass
[AI · Proof] Structured quality-verdict evidence bundle binding all dimensions to the frozen hash, with the verdict COMPUTED from receipts
Acceptance: Bundle has binary fields — regression {suite pass/total, escaped-defect result, flake rate}; performance {profile, p50/p95/p99, RPS, error rate vs each NFR target}; accessibility {standard, automated criticals=0, manual AT pass}; matrix {per-cell grid}; defects {open count by severity, blocker count=0}; security {green for build, deps clean} — each field links its raw receipt and the commit hash; the PASS verdict is the boolean AND of these fields, not a narrative; this bundle is then handed to the companion review_loop brick for independent validation
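The "verdict COMPUTED from receipts" idea in the bundle acceptance criteria can be sketched as below. This is a minimal sketch, assuming a hypothetical `BundleField` record and receipt-ID format; only the six dimension names and the frozen-hash binding come from the text above.

```python
from dataclasses import dataclass

# The six dimensions named in the bundle acceptance criteria.
REQUIRED = {"regression", "performance", "accessibility",
            "matrix", "defects", "security"}


@dataclass(frozen=True)
class BundleField:
    name: str       # one of REQUIRED
    green: bool     # binary result for this dimension
    receipt: str    # link or ID of the raw artifact (hypothetical format)
    commit: str     # commit hash the receipt was produced against


def compute_verdict(frozen_commit, fields):
    """Return ("PASS" | "FAIL", reasons); PASS is the AND of every check."""
    reasons = []
    missing = REQUIRED - {f.name for f in fields}
    reasons += [f"{name}: dimension missing" for name in sorted(missing)]
    for f in fields:
        if not f.receipt:
            reasons.append(f"{f.name}: no raw receipt linked")
        if f.commit != frozen_commit:
            reasons.append(f"{f.name}: mixed-hash evidence ({f.commit})")
        if not f.green:
            reasons.append(f"{f.name}: not green")
    return ("PASS" if not reasons else "FAIL"), reasons
```

Returning the list of reasons alongside the verdict gives the companion review_loop something concrete to re-derive, instead of a bare PASS or FAIL.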
Questions the agent asks (5)
  • Which production scenarios should the sponsor exercise during UAT, and is there representative production-scale data we may load for the performance profile (or do we synthesize it and flag the assumption)?
  • Are there real-world peak/concurrency numbers we should target in the load profile beyond the inherited NFR baseline (e.g. expected go-live traffic, seasonal peaks)?
  • Which browsers / OS / devices are genuinely in-scope for your users so we can mark the matrix cells blocking vs. best-effort accurately?
  • Are there accessibility commitments or specific assistive technologies (screen readers, regulatory bars) your users rely on that we must include in the manual pass?
  • Is there a target release window or freeze date so we can schedule the candidate freeze and avoid post-freeze drift?
Do (8)
  • Bind the entire quality pass to ONE frozen commit hash and re-run affected dimensions if anything lands after the freeze
  • Inherit NFR targets read-only from the approved PRD/architecture register and pass/fail against those exact numbers
  • Judge performance at p95/p99 under a named production-scale load profile, not mean latency on a toy dataset
  • Test accessibility on BOTH tracks — automated zero-criticals AND keyboard/screen-reader manual pass — to WCAG 2.2 AA
  • Treat UAT as a continuous human INPUT channel: convert sponsor observations into triaged defect IDs and proceed on flagged assumptions when the sponsor is silent
  • Report flake rate, quarantine/re-run flaky tests, and only count a stable green as a pass
  • Triage every defect with the binary blocker rubric, log the rationale, and fix-AND-re-test every blocker
  • Link every number to its raw receipt artifact (CI run ID, load report, scan output, matrix grid, defect ticket) and hand the bundle to the companion review_loop for independent validation
Don't (8)
  • Don't mark any dimension 'pass' from a narrated number with no underlying receipt artifact
  • Don't set, raise, or relax NFR targets during hardening — they are inherited contracts
  • Don't report mean/average latency as the performance verdict or run load against a near-empty database
  • Don't accept an automated accessibility scan alone as 'conformance'
  • Don't treat sponsor sign-off as a pass condition or block on a human approval gate
  • Don't count a flaky green run as a pass, and don't reclassify a blocker to a lower severity to clear the zero-blocker gate
  • Don't let the cross-browser/device matrix silently shrink — undefined or empty blocking cells are not a pass
  • Don't treat Proof's verdict as final on its own — it must clear the independent review_loop before it gates deploy
Guardrails (7)
  • HONESTY: no claim without a receipt — every pass carries its raw artifact (test-run ID, load report, scan file, matrix grid, defect ticket) and binds to the frozen commit hash; the verdict is computed, never asserted
  • CANDIDATE FREEZE: the pass verdict binds to a specific commit hash/build; any code change after freeze invalidates the verdict and forces re-run of affected dimensions to prevent drift between hardening and deploy
  • NO HUMAN GATE: UAT is human input only — sponsor observations become defects; sponsor silence proceeds on an explicit flagged assumption with the AI acceptance-scenario run as binding evidence; there is no human sign-off brick
  • NFR INTEGRITY: targets trace to the approved spec/architecture register and are never created or relaxed at test time
  • BINARY PASS: pass = zero open blocker-severity defects AND all evidence-bundle receipts green (regression incl. escaped-defects, p95/p99 vs NFR, accessibility both tracks, full blocking matrix, security green for this build) — stated as one timestamped, commit-hash-frozen gate
  • NO SELF-GRADING: Proof authors the verdict but does not finalize it; the companion 'Release-Verdict Independent Review & Iterate' loop (Verdict + adversarial Proof red-team + Cipher) must find no material gaps and log a convergence verdict before 'pass' is real
  • SECURITY DRIFT: the security gate must be confirmed green for THIS exact candidate (no new criticals since last scan, dependencies clean), not inherited from an earlier build
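The rule that performance is judged at p95/p99 rather than at the mean can be illustrated with a short sketch. It uses the nearest-rank percentile definition, which is one common choice and an assumption here; the function names and the target numbers in the usage note are hypothetical.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100   # integer ceiling of p * n / 100
    return ordered[max(rank, 1) - 1]


def judge_latency(samples_ms, targets_ms):
    """targets_ms maps a percentile (e.g. 95, 99) to its maximum latency in ms.

    Returns (passed, measured) where measured holds the observed percentiles.
    """
    measured = {p: percentile(samples_ms, p) for p in targets_ms}
    passed = all(measured[p] <= targets_ms[p] for p in targets_ms)
    return passed, measured
```

For example, 96 requests at 100 ms plus 4 requests at 900 ms average 132 ms, which looks healthy, yet `judge_latency(samples, {95: 200, 99: 500})` fails because the p99 value is 900 ms. That tail is exactly what a mean-based verdict would hide.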
20

Data Migration, Seed & Backfill Readiness

AI Agent: Vector

Prepare and prove — not merely assert — that any data migration, seed load, reference-data load, search-index/cache rebuild, or backfill required at go-live will execute safely and correctly against real production data. Vector authors and runs the migration artifacts, but under the no-human-sign-off reframe the ONLY thing standing between a self-graded "counts match" report and an irreversible data-loss event is an independent receipt-checker; therefore this work brick hands its evidence to an independent verification loop (Verdict + Proof + Cipher) who reproduce the reconciliation, reversibility, and security evidence themselves and iterate with Vector until no material gap remains. Readiness must cover the four pillars done honestly: (1) idempotency PROVEN by re-running the migration twice and after a simulated mid-run kill with identical end-state hashes; (2) a dry-run against a certified production-REPRESENTATIVE dataset (real volume, skew, encodings, nulls, orphans, known-dirty legacy rows) provisioned through a tenant-isolated, PII-controlled path; (3) multi-signal reconciliation (row counts + key-column checksums/hash-totals + referential-integrity + sampled record-level diffs + business invariants), not COUNT(*) alone; and (4) an honest data-layer recovery story — an explicit reversibility classification, a verified pre-cutover backup/snapshot with a PROVEN restore drill, and either a rehearsed rollback OR a tested restore-plus-forward-fix runbook. The brick also measures runtime/lock-impact versus the cutover window, proves resumability, records a big-bang-vs-expand/contract decision, surfaces migration-semantics questions to the sponsor as continuous input, and defines a post-cutover hypercare reconciliation canary. The skip path (product genuinely has no migration/seed/reference/index/cache backfill) is permitted only when an independent reviewer confirms the emptiness with a logged verdict — it is not an unchecked self-note.

Deliverables
[AI · Vector] Migration scope & reversibility classification — enumerates every required data change (schema migration, data transform, seed/reference data, search-index/cache rebuild, backfill) and classifies each as REVERSIBLE / FORWARD-FIX-ONLY / RESTORE-ONLY, with the cutover strategy decision (big-bang maintenance window vs. expand/contract dual-write+backfill) and rationale.
Acceptance: Document committed to version control; every enumerated change has exactly one reversibility class and a stated reason; cutover-strategy decision recorded with rationale; at least one independent reviewer (Verdict) confirms the enumeration is complete (no missing seed/reference/index/cache backfill). Receipt: committed file path + git hash + Verdict completeness verdict line.
[AI · Vector] Migration & rollback/restore scripts in version control — idempotent, parameterized, environment-agnostic migration scripts plus the matching rollback script (where reversible) or restore-plus-forward-fix runbook (where not), with checkpoint/resume support for mid-run abort.
Acceptance: All scripts present under version control with git hashes recorded; no hard-coded secrets/credentials (Cipher scan clean — see security receipt); each non-reversible change links to its restore/forward-fix runbook section; resume-from-checkpoint code path exists and is exercised by the idempotency test. Receipt: file paths + git hashes + Cipher secret-scan result code.
[AI · Vector + Cipher] Certified production-representative dataset — a dry-run dataset (or live-clone) matching production in volume/scale, value distribution & skew, encodings, null patterns, orphaned FKs, duplicate keys, and known-dirty legacy rows, provisioned through a tenant-isolated, PII-masked-or-access-controlled pipeline.
Acceptance: Dataset provenance documented (source, scale vs. prod, dirty/edge cases included); Cipher signs off that provisioning is tenant-isolated with PII masked or access-controlled and no cross-tenant bleed; dry-run on synthetic/seed-only data is explicitly DISALLOWED and the receipt states the dataset is production-representative. Receipt: dataset manifest + scale comparison figures + Cipher provisioning-security verdict line.
[AI · Vector] Pre-cutover backup/snapshot + PROVEN restore drill — a point-in-time backup/snapshot taken on the production-representative environment and an actually-executed restore drill returning the data layer to the known-good pre-migration state.
Acceptance: Backup/snapshot created with timestamp + identifier logged; restore drill executed end-to-end (not just 'backup exists'); post-restore state hash matches pre-backup state hash. Receipt: backup ID + restore-drill log + matching state-hash pair.
[AI · Vector] Idempotency proof — the migration run twice end-to-end AND run once more after a simulated mid-run kill, producing an identical final-state hash each time.
Acceptance: Three recorded runs (run-1, run-2 re-apply, run-3 resume-after-kill); end-state content hash identical across all three; no duplicate rows / double-applied transforms detected. Receipt: three run logs + three end-state hashes shown equal.
[AI · Vector] Multi-signal reconciliation report — dry-run reconciliation against the source-of-truth covering row counts, key-column checksums/hash-totals, referential-integrity checks, N sampled record-level diffs, and aggregate business invariants (e.g., sum-of-balances, distinct counts).
Acceptance: Report attached with all five signal classes present and PASSING: counts match expected per table; key-column hash-totals match; zero referential-integrity violations; sampled record diffs show zero unexpected discrepancies; business invariants reconcile within stated tolerance (or exactly). Every figure is sourced to a re-runnable query/script. Receipt: reconciliation report file + the scripts/queries that produced each signal.
[AI · Vector] Timing, lock-impact & cutover-window fit measurement — measured runtime of the full migration on production-scale data, lock/contention behavior, and resume-after-abort timing, compared against the agreed cutover/maintenance window.
Acceptance: Measured wall-clock runtime recorded; lock/contention impact characterized; total time (including a resume) fits within the cutover window OR the expand/contract strategy is selected with rationale; if neither holds, a flagged blocker is logged. Receipt: timing log + window comparison + lock-impact notes.
[AI · Vector + Cipher] Migration data-security review — Cipher review of the migration scripts and dataset path for secrets-in-scripts, PII exposure in lower environments, audit-logging of the migration, and cross-tenant isolation.
Acceptance: Cipher scan of scripts is clean (no secrets); PII handling across environments reviewed and either masked or access-controlled; migration actions are audit-logged; no cross-tenant data path identified. Receipt: Cipher scan result code + written security verdict line referencing the scripts' git hashes.
[AI · Vela] Migration-semantics question set to the sponsor (continuous human input, non-blocking) — targeted questions on source-of-truth, acceptable data loss, garbage/legacy fields, and retention/regulatory constraints, with flagged assumptions recorded where the sponsor is silent.
Acceptance: Question set surfaced to the sponsor and recorded; each unanswered item has an explicit flagged assumption the team proceeded on; no brick step blocked waiting on the human. Receipt: question/answer log + list of flagged assumptions.
[AI · Vector] Hypercare data-quality canary — a post-cutover reconciliation re-run plus a data-quality monitor/canary defined to run against the first window of live production traffic.
Acceptance: Canary/monitor defined and wired to execute post-cutover; the post-cutover reconciliation re-run is scripted and reuses the dry-run reconciliation signals; alerting threshold defined for drift/integrity failure. Receipt: canary definition + scheduled/triggerable re-run script + alert config.
[AI · Verdict + Proof + Cipher] Independent verification & iterate loop — an independent panel (Verdict lead, Proof re-runs the reconciliation against source-of-truth without trusting the attached report, Cipher re-checks security/isolation) critiques all evidence against the objectives, logs gaps, Vector fixes, and the loop iterates until the panel REPRODUCES a clean reconciliation/reversibility/security result with no material gap.
Acceptance: Proof independently re-runs reconciliation and reproduces matching counts + checksums + integrity result (does not merely accept Vector's report); Verdict confirms reversibility/backup-restore evidence is real and re-runnable; Cipher confirms no security/isolation gap; convergence verdict logged stating zero open material gaps with iteration count. Receipt: Proof's independent reconciliation output + Verdict convergence verdict + Cipher sign-off, all naming the git hashes/run IDs they checked.
[AI · Verdict] Independently-confirmed SKIP verdict (conditional) — when the product genuinely has no migration/seed/reference-data/index/cache backfill, a Verdict-logged confirmation of true emptiness rather than an unchecked self-note.
Acceptance: Applies ONLY when no migration is required; Verdict independently confirms there is no seed/reference/index/cache/backfill obligation and logs an explicit skip verdict; if Verdict finds any hidden data-load obligation the skip is rejected and the full brick runs. Receipt: Verdict skip-confirmation verdict line (or its absence, triggering full execution).
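The multi-signal reconciliation deliverable above can be sketched with the standard-library `sqlite3` module. This is an illustration only: the two-table schema, the `make_db` helper, and the specific queries are hypothetical stand-ins for the real source and target systems.

```python
import hashlib
import sqlite3

# Hypothetical schema: table name -> columns covered by the hash total.
TABLES = {"customers": "id, name",
          "accounts": "id, customer_id, balance_cents"}


def make_db(customers, accounts):
    """Build an in-memory database for the demonstration."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
               "customer_id INTEGER, balance_cents INTEGER)")
    db.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    db.executemany("INSERT INTO accounts VALUES (?, ?, ?)", accounts)
    return db


def table_hash(db, table, cols):
    """Hash every row in primary-key order so equal content gives equal hashes."""
    digest = hashlib.sha256()
    for row in db.execute(f"SELECT {cols} FROM {table} ORDER BY id"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def reconcile(source, target, sample_ids):
    """Evaluate the five signal classes and AND them into report['pass']."""
    report = {}
    report["counts"] = all(
        source.execute(f"SELECT COUNT(*) FROM {t}").fetchone()
        == target.execute(f"SELECT COUNT(*) FROM {t}").fetchone()
        for t in TABLES)
    report["hash_totals"] = all(
        table_hash(source, t, c) == table_hash(target, t, c)
        for t, c in TABLES.items())
    orphans = target.execute(
        "SELECT COUNT(*) FROM accounts a LEFT JOIN customers c "
        "ON a.customer_id = c.id WHERE c.id IS NULL").fetchone()[0]
    report["referential_integrity"] = orphans == 0
    row = "SELECT id, customer_id, balance_cents FROM accounts WHERE id = ?"
    report["sampled_diffs"] = all(
        source.execute(row, (i,)).fetchone() == target.execute(row, (i,)).fetchone()
        for i in sample_ids)
    total = "SELECT COALESCE(SUM(balance_cents), 0) FROM accounts"
    report["invariants"] = (source.execute(total).fetchone()
                            == target.execute(total).fetchone())
    report["pass"] = all(report.values())
    return report
```

In this sketch, a target in which two account balances were swapped still matches on row counts and on the sum-of-balances invariant, and fails only on the hash totals and the sampled record diffs. That is the failure mode the COUNT(*)-alone prohibition guards against.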
Questions the agent asks (7)
  • Which system is the authoritative source of truth for each migrated entity, and where do we resolve conflicts when two sources disagree?
  • What is the acceptable data-loss tolerance at go-live (zero, or are certain legacy/garbage fields safe to drop)?
  • Which legacy fields/tables are known to be unreliable, deprecated, or 'garbage' and should NOT be trusted or carried forward?
  • Are there retention, residency, or regulatory constraints (e.g., PII handling, region pinning, audit requirements) that bound how we migrate or where we test?
  • What is the agreed maintenance/cutover window length, and is any downtime acceptable — or must this be zero-downtime (expand/contract)?
  • Can we obtain a production-representative dataset (masked clone or sample) for the dry-run, and through what approved, tenant-isolated path?
  • Are there business invariants (e.g., total balances, record counts per tenant, distinct-key sets) that must be exactly preserved and can serve as reconciliation anchors?
Do (9)
  • Classify every data change by reversibility BEFORE running anything, and pick big-bang vs. expand/contract with explicit rationale.
  • Take a verified pre-cutover backup/snapshot and actually execute a restore drill — prove you can return to a known-good state, do not just confirm a backup exists.
  • Prove idempotency by re-running the migration twice and after a simulated mid-run kill, comparing end-state hashes — never assert idempotency.
  • Run the dry-run on a certified production-representative dataset with real volume, skew, encodings, nulls, orphans, and known-dirty rows.
  • Reconcile with multiple signals — counts AND checksums AND referential integrity AND sampled record diffs AND business invariants — all from re-runnable scripts.
  • Measure runtime, lock/contention, and resume timing against the cutover window, and flag a blocker if it does not fit.
  • Hand the evidence to an independent panel (Verdict + Proof + Cipher) who REPRODUCE the reconciliation and reversibility/security results themselves and iterate until no material gap.
  • Surface migration-semantics questions to the sponsor as continuous input and proceed on explicit flagged assumptions when they are silent.
  • Define a post-cutover hypercare reconciliation canary so correctness is observed under real live traffic, not just declared at cutover.
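The three-run idempotency proof from the Do list (run once, re-apply, resume after a simulated mid-run kill, then compare end-state hashes) can be sketched as below. The email-normalizing migration, the `checkpoint` table, and every function name are hypothetical; a real migration would substitute its own transform and resume mechanism.

```python
import hashlib
import sqlite3


class SimulatedKill(Exception):
    """Stands in for a process kill partway through the run."""


def fresh_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    db.execute("CREATE TABLE checkpoint (last_id INTEGER NOT NULL)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [(1, " A@X.com "), (2, "b@y.COM"), (3, "C@Z.com")])
    db.commit()
    return db


def migrate(db, kill_after=None):
    """Normalize emails row by row, resuming after the last checkpoint."""
    last_id = db.execute("SELECT MAX(last_id) FROM checkpoint").fetchone()[0] or 0
    pending = db.execute(
        "SELECT id, email FROM users WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    for done, (user_id, email) in enumerate(pending, start=1):
        if kill_after is not None and done > kill_after:
            raise SimulatedKill
        db.execute("UPDATE users SET email = ? WHERE id = ?",
                   (email.strip().lower(), user_id))
        db.execute("INSERT INTO checkpoint VALUES (?)", (user_id,))
        db.commit()  # the row change and its checkpoint commit together


def state_hash(db):
    """Content hash of the migrated table (checkpoint bookkeeping excluded)."""
    rows = db.execute("SELECT id, email FROM users ORDER BY id").fetchall()
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


def prove_idempotency():
    """Return the three end-state hashes: run-1, run-2 re-apply, run-3 resume."""
    db = fresh_db()
    migrate(db)
    h1 = state_hash(db)
    migrate(db)                      # run-2: re-apply on already-migrated data
    h2 = state_hash(db)
    killed = fresh_db()
    try:
        migrate(killed, kill_after=1)
    except SimulatedKill:
        pass
    migrate(killed)                  # run-3: resume from the checkpoint
    return h1, h2, state_hash(killed)
```

The receipt is the three hashes shown equal, for example via `h1 == h2 == h3` on the tuple returned by `prove_idempotency()`, plus the three run logs.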
Don't (9)
  • Do not let Vector grade its own migration: no 'done' without independent reproduction of the reconciliation by an agent that did not run it.
  • Do not pass the brick on synthetic or seed-only data that fails to reproduce production volume, skew, and dirty/legacy edge cases.
  • Do not reduce reconciliation to COUNT(*) parity — row counts can match while data is silently corrupted.
  • Do not claim 'rollback validated' for a destructive/one-way migration; classify it honestly and prove restore-from-backup plus a forward-fix runbook instead.
  • Do not skip the pre-migration backup-and-restore drill; 'rollback exists' is not 'we restored to known-good in a drill.'
  • Do not copy production PII into lower environments unmasked or through a non-isolated path; Cipher must clear the dataset pipeline.
  • Do not block any step waiting for human approval — humans provide input, they do not gate; record a flagged assumption and proceed.
  • Do not allow a self-asserted SKIP — an independent reviewer must confirm there is genuinely no migration/seed/reference/index/cache backfill.
  • Do not assert any 'passed/done/live' claim without an attached, re-runnable receipt (hashes, logs, scan codes, reviewer verdicts).
Guardrails (8)
  • HONESTY: every done-criterion maps to a re-runnable receipt — backup/restore-drill log, double-run + post-kill hash equality, checksum/integrity reconciliation output, timing measurement, Cipher scan code, Verdict/Proof reproduction verdict — never an assertion.
  • INDEPENDENT VERIFICATION IS MANDATORY: the migration is not 'ready' until Proof independently reproduces the reconciliation against source-of-truth and Verdict + Cipher confirm reversibility and security; the runner never certifies its own irreversible operation.
  • REVERSIBILITY HONESTY: for any change not cleanly reversible, the rollback requirement converts to a tested restore-from-pre-migration-backup plus a documented, rehearsed forward-fix runbook — pretending a clean rollback exists is forbidden.
  • DATA-LOSS GUARD: no destructive transform runs in any environment without a verified, restore-proven backup of that environment first.
  • SECURITY/ISOLATION: the production-representative dataset and migration scripts must be tenant-isolated, PII-masked-or-access-controlled, secret-free, and audit-logged, with a Cipher verdict on record — consistent with the company's per-client-isolation/data-security pillar.
  • NO HUMAN GATE: the sponsor is a continuous input channel (migration semantics, acceptable loss, regulatory constraints); the team never blocks on human approval and records flagged assumptions when the human is silent.
  • SKIP INTEGRITY: a skip is valid only with an independent Verdict confirming true emptiness; a hidden seed/reference/index/cache/backfill obligation rejects the skip and forces full execution.
  • CUTOVER FIT: a migration that is correct but exceeds the cutover window or locks tables is NOT go-live ready — timing/lock evidence and a resumable/expand-contract path are required before 'ready.'
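The reversibility-classification and cutover-fit checks can be combined into one gap list, sketched below. The `Change` record, the `readiness_gaps` function, and the use of seconds as the time unit are assumptions made for illustration; only the three reversibility classes and the window rule come from the text above.

```python
from dataclasses import dataclass

CLASSES = {"REVERSIBLE", "FORWARD-FIX-ONLY", "RESTORE-ONLY"}


@dataclass(frozen=True)
class Change:
    name: str
    reversibility: str   # exactly one of CLASSES
    reason: str          # stated reason for the classification
    runbook: str = ""    # restore/forward-fix runbook section, if not reversible


def readiness_gaps(changes, runtime_s, resume_s, window_s, strategy):
    """Return a list of gaps; an empty list means these checks found none."""
    gaps = []
    for c in changes:
        if c.reversibility not in CLASSES:
            gaps.append(f"{c.name}: unknown reversibility class")
        if not c.reason:
            gaps.append(f"{c.name}: no stated reason")
        if c.reversibility != "REVERSIBLE" and not c.runbook:
            gaps.append(f"{c.name}: no restore/forward-fix runbook linked")
    # Total time includes one resume; expand/contract is the permitted alternative.
    if runtime_s + resume_s > window_s and strategy != "expand/contract":
        gaps.append("BLOCKER: migration plus one resume exceeds the cutover window")
    return gaps
```

An empty result covers only these two checks; lock-impact evidence and the independent reproduction still apply.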
21

Security Assessment & Penetration Test

AI Agent: Cipher

Produce the authoritative, evidence-backed security go/no-go for the release candidate (RC) — the decision that REPLACES a human security sign-off — such that a real CISO would accept it because the receipts, not the assertions, carry it. Cipher leads a four-stage flow on the EXACT RC artifact (pinned commit + built-image digest): (1) refresh the threat model (STRIDE per trust boundary, attack-trees) and author a scope-of-record that an independent reviewer approves BEFORE any testing, so scope cannot be drawn to dodge the dangerous surface; (2) execute the full battery — SAST, DAST, SCA, secret-scanning, SBOM/provenance, ASVS-level control verification, compliance/privacy + per-tenant isolation tests, a concrete agentic/LLM attack suite (OWASP LLM Top 10 + Flowtely's own known sinks: prompt injection direct/indirect, tool/capability/confused-deputy abuse, cross-tenant/cross-flow exfiltration, responder-injection, Truth-Gate/Faithfulness-gate bypass, secret egress via model output), and a regression golden-set seeded from real prior incidents — every run emitting a receipt (tool+version, ruleset hash, target digest, finding IDs); (3) remediate-and-RE-TEST in an iterate-until-clean loop where Mason fixes and Cipher re-runs the specific check against the patched build, closing a finding only on a find→fix-commit→retest receipt; (4) submit the Security Verdict of Record to an INDEPENDENT review-and-iterate loop where Verdict plus an adversarial red-team (agents who did NOT run the assessment) challenge scope, severity ratings, and every "fixed/closed" claim, attempt to re-exploit a sample of declared-closed findings and scope-excluded surfaces, log material gaps, and iterate until they converge that the objectives are met. The verdict is binary and binding: GO requires 0 open high/critical, every blocking finding retest-verified by receipt, the assessed digest unchanged, and panel convergence; a no-go routes back to the sprint loop with the findings register as input. 
The human (sponsor) is a continuous INPUT channel for residual-risk grey zones and business context (e.g., "this endpoint is internal-only"), never an approval gate — and silence never converts a true critical into a pass.
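
The receipt fields named in the flow (tool+version, ruleset hash, target digest, finding IDs) could be carried in a record like the one below. This is a minimal sketch, not Flowtely's actual ledger schema: `ScanReceipt`, `content_hash`, and `bound_to_rc` are hypothetical names, and hashing canonical JSON with SHA-256 is an assumption about how a receipt might be made tamper-evident.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScanReceipt:
    """Hypothetical receipt shape; field names are illustrative only."""
    tool: str
    tool_version: str
    ruleset_hash: str
    target_digest: str
    run_id: str
    finding_ids: tuple = ()

    def content_hash(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that the same
        # receipt content always produces the same hash.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def bound_to_rc(receipt: ScanReceipt, rc_digest: str) -> bool:
    """A receipt only counts if it was produced against the exact RC digest."""
    return receipt.target_digest == rc_digest
```

A reviewer who re-runs the same check against the same digest can compare the recorded fields and hash instead of trusting the author's summary.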

Deliverables
[AI · Cipher] Scope-of-Record & Refreshed Threat Model bound to the RC digest — STRIDE-per-trust-boundary + attack-trees enumerating every reachable surface (incl. agentic/LLM and multi-tenant data paths), each threat mapped to at least one planned test or an explicit, justified accepted-risk; pinned to the RC commit SHA + built-image digest.
Acceptance: File exists and references the exact RC commit SHA and image digest; every enumerated surface is marked tested-or-risk-accepted-with-reason (0 surfaces left 'unaddressed'); every STRIDE/attack-tree threat row links to >=1 test ID or a justified accepted-risk ID; no test execution receipt predates the independent scope-approval receipt.
[AI · Verdict] Independent Scope Approval — Verdict + red-team review the scope-of-record/threat model BEFORE testing for scope-gaming, omitted surfaces, stale/divergent build, and env non-parity; gaps logged and re-iterated until they sign that scope is complete and bound to the correct digest.
Acceptance: Approval record exists with reviewer = agents who did NOT author the scope; lists every gap raised with status RESOLVED; final verdict field = APPROVED; references the same RC digest as the scope-of-record; timestamp precedes the first scan/pentest receipt.
[AI · Cipher] Automated Scan Battery Results with receipts — SAST + DAST + SCA + secret-scanning + SBOM/provenance, each run against the prod-representative environment built from the pinned digest, with a coverage table (surface × tool × result) and per-run receipt.
Acceptance: Each tool has >=1 run receipt recording tool name+version, ruleset/config hash, target digest, run ID, and finding count; coverage table lists every in-scope surface with a non-empty tool×result cell or a linked risk-acceptance; DAST/pentest env-of-record digest/config matches the prod-parity manifest; SBOM generated and provenance attestation present for the assessed image.
[AI · Cipher] Agentic/LLM Attack Suite Results — concrete, named test cases for OWASP LLM Top 10 + Flowtely sinks: direct & indirect prompt injection (via documents/email/uploaded files), tool/capability/confused-deputy abuse, cross-tenant/cross-flow data exfiltration, responder-injection, Truth-Gate/Faithfulness-gate bypass (in-app message hooks + internal messages), secret egress via model output, honesty-control jailbreak.
Acceptance: Each named attack class has >=1 executed test case with a pass/fail result and a run receipt (input, target digest, observed output, verdict); any fail becomes a finding in the register with a CVSS score; the known P0 gate-bypass sinks each have an explicit executed case (no class left 'not run').
[AI · Cipher] Compliance/Privacy & Multi-Tenant Isolation Test Pack — explicit pass/fail cases for per-tenant data isolation (incl. the recorded cross-tenant/parent-data-read incident), authz/access-control, PII handling & data-retention, audit-log completeness, mapped to the target ASVS level controls.
Acceptance: Each control area has >=1 executed test case with binary pass/fail and a receipt; the recorded cross-tenant-read incident appears as an explicit case and result; ASVS-level coverage table shows every required control as verified/failed/risk-accepted with no blank cells; 0 failed controls remain open at verdict (else they are blocking findings).
[AI · Cipher] Regression Golden-Set Result — re-run of prior real incidents (responder-injection, Truth-Gate bypass, cross-tenant read, secret egress) to prove previously-fixed criticals have not regressed.
Acceptance: Golden-set file lists each seeded incident with an executed test and PASS/FAIL + receipt; 0 golden-set cases FAIL at verdict time; any FAIL is filed as a blocking critical finding.
[AI · Cipher+Mason] Findings Register with Remediate-and-Retest receipts — every finding carries id, CVSS vector+score, status, and for every blocking (high/critical) finding a find→fix-commit→retest chain proving the re-run of that specific check passes against the PATCHED build.
Acceptance: Every high/critical finding row has a fix-commit SHA AND a retest receipt whose target digest == the patched build AND whose result = PASS; 0 high/critical findings remain in OPEN state; no finding is marked CLOSED without a retest receipt (assertion-only closure is rejected).
[Human] Residual-Risk & Business-Context Inputs (input, not approval) — sponsor-supplied context for grey-zone findings (e.g., 'endpoint is internal-only') and any residual-risk acceptances; provided through the continuous input channel.
Acceptance: Any risk-acceptance/suppression in the register is itemized with justification, expiry date, and either a sponsor-input reference or an explicit flagged-assumption note used because the sponsor was silent; NO true high/critical is risk-accepted into GO on silence (a real critical stays no-go regardless of human input state).
[AI · Cipher] Security Verdict of Record — signed, digest-bound artifact aggregating scope-of-record, coverage table, full findings register, itemized suppressions/risk-acceptances, agentic/LLM + compliance + golden-set results, and the binary GO/NO-GO with its evidence index.
Acceptance: Verdict states GO or NO-GO; GO is asserted only if all of: 0 open high/critical, every blocking finding retest-verified by receipt, golden-set 0 FAIL, assessed image digest == current RC digest, and independent panel convergence record = CONVERGED; every 'scanned/found/fixed/retested/blocked' claim links to a ledger receipt (no free-typed claims); a NO-GO lists the open blocking findings handed back to the sprint loop.
[AI · Verdict] Independent Verdict Review & Convergence Record — Verdict + adversarial red-team (who did NOT run the assessment) critique the verdict against the objectives, attempt to re-exploit a sample of declared-closed findings and scope-excluded surfaces, log material gaps, the author fixes, and the loop iterates until the panel agrees no material gap remains.
Acceptance: Record lists every gap/objection raised with status RESOLVED; red-team re-exploit attempts on a sampled set of closed findings all = STILL-CLOSED and surface no new material high/critical; final convergence field = CONVERGED with reviewer identities distinct from the assessment author; bound to the same RC digest as the verdict; if digest changed, record = INVALIDATED and loop re-opened.
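
The find→fix-commit→retest closure rule in the Findings Register acceptance can be sketched as a small check. The `Finding` and `Retest` shapes and the function names are hypothetical, chosen only to illustrate the rule; the real register format is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional

BLOCKING_SEVERITIES = {"high", "critical"}


@dataclass
class Retest:
    target_digest: str  # digest of the build the re-run was executed against
    result: str         # "PASS" or "FAIL"


@dataclass
class Finding:
    finding_id: str
    severity: str
    fix_commit: Optional[str] = None
    retest: Optional[Retest] = None


def can_close(finding: Finding, patched_digest: str) -> bool:
    """Closure needs a fix commit AND a passing retest on the patched build."""
    if finding.fix_commit is None or finding.retest is None:
        return False  # assertion-only closure is rejected
    return (finding.retest.target_digest == patched_digest
            and finding.retest.result == "PASS")


def open_blocking(findings: List[Finding], patched_digest: str) -> List[str]:
    """IDs of high/critical findings that still block a GO."""
    return [f.finding_id for f in findings
            if f.severity in BLOCKING_SEVERITIES
            and not can_close(f, patched_digest)]
```

A retest that passed against an older build does not close the finding, because its target digest differs from the patched build.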
Questions the agent asks (5)
  • Are there endpoints, surfaces, or data flows you consider internal-only or out-of-scope for external attack — and what is the business justification (so we scope correctly rather than guess)?
  • For any borderline residual-risk finding, what is your risk tolerance / business context, and is there a compliance regime (e.g., SOC 2, GDPR, client contractual security terms) whose obligations must be treated as blocking?
  • Are there known sensitive tenants or client-data boundaries that must get the strictest isolation testing, and any prior security incidents you want explicitly re-tested in the golden set?
  • Is there a target ASVS level (e.g., L2) and an expected go-live date that should drive the depth/timebox of the assessment?
  • Do you have or want a third-party pentest in addition to the AI-run assessment, and if so does it gate or merely inform the verdict?
Do (8)
  • Bind the entire assessment to the EXACT release candidate — pinned commit SHA + built-image digest — and run DAST/pentest against a prod-parity environment of record.
  • Approve scope independently BEFORE any testing; derive the test plan from the refreshed threat model so coverage is provably driven by threats, not by whatever the tools happen to catch.
  • Operationalize the agentic/LLM suite concretely against OWASP LLM Top 10 + the platform's own known gate-bypass sinks; re-run the golden set of past real incidents every time.
  • Close a finding only on a find→fix-commit→retest receipt against the patched build; iterate the remediate-and-retest loop until 0 open high/critical.
  • Make every security claim receipt-backed (scan run ID, tool+version, ruleset hash, target digest, finding IDs, retest diffs) so the verdict is server-reconcilable, not free-typed.
  • Subject the verdict itself to the independent review-and-iterate loop (Verdict + adversarial red-team who did not run the assessment) and converge explicitly before GO.
  • Itemize every suppression/risk-acceptance with justification, expiry date, and independent approval; count a risk-accepted critical against the gate.
  • Surface residual-risk grey zones and business-context questions to the sponsor as INPUT; where silent, proceed on an explicit flagged assumption — except never for a true critical.
Don't (8)
  • Don't let the assessment author self-certify the go/no-go — the verdict is a gated artifact that an independent panel must review and converge on.
  • Don't mark anything 'scanned/found/fixed/retested/blocked' without a real receipt; no green dashboard, suppressed finding, or untested fix may stand in for evidence.
  • Don't draw scope to exclude the scary surface, scan a stale/divergent build, or test in an env that differs materially from prod.
  • Don't close a finding by assertion or by 'the dev says it's fixed' — only a passing re-test against the patched build closes it.
  • Don't ship a GO if any high/critical is open, any golden-set case fails, the assessed digest no longer matches the RC, or the panel hasn't converged.
  • Don't reduce 'agentic/LLM' or 'compliance/privacy' to a checkbox — every named attack class and control area needs an executed pass/fail case.
  • Don't let the human become an approval gate, and don't let human silence turn a real critical into a pass.
  • Don't treat the verdict as durable after a post-assessment code change — any digest change invalidates it and re-opens the retest loop.
Guardrails (7)
  • Digest immutability: the verdict is bound to the assessed image/commit digest; any change to that digest after the verdict automatically INVALIDATES it and re-opens the remediate-and-retest loop — no GO survives a post-assessment 'tiny fix'.
  • Binary gate: GO requires ALL of {0 open high/critical, every blocking finding retest-verified by receipt, golden-set 0 FAIL, all required ASVS/compliance controls verified-or-independently-risk-accepted, assessed digest == RC digest, panel CONVERGED}; otherwise NO-GO.
  • Independent review is mandatory and the reviewers (Verdict + red-team) must not be the assessment author; convergence is evidence-based, may require multiple iterations, and is documented.
  • Honesty architecture: every security claim references a ledger receipt; the verdict text is reconcilable against receipts and cannot be free-typed — unbacked claims are rejected by construction.
  • Scope-of-record must be independently approved before testing begins; suppressions/risk-acceptances are itemized, justified, expiry-dated, independently approved, and counted against the gate.
  • Human-as-input only: the sponsor supplies residual-risk/business context but never approves; a true high/critical remains NO-GO regardless of human silence or input.
  • No-go is a closed feedback loop, not a dead end: open blocking findings route back to the sprint loop with the findings register as input; this brick feeds deploy/go-live ONLY on a converged GO verdict.
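
The binary gate and the digest-immutability guardrail above can be expressed as one pure function over values recomputed from receipts. This is a sketch under stated assumptions: `GateInputs` and its field names are hypothetical, and each value is presumed to be derived from ledger receipts rather than typed in by the assessment author.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GateInputs:
    # Hypothetical field names; every value should be recomputed from receipts.
    open_high_critical: int
    unverified_blocking_findings: int
    golden_set_failures: int
    unresolved_required_controls: int
    assessed_digest: str
    rc_digest: str
    panel_converged: bool


def security_verdict(g: GateInputs) -> Tuple[str, List[str]]:
    """Return ("GO" or "NO-GO", reasons). Any failed condition forces NO-GO."""
    reasons = []
    if g.open_high_critical != 0:
        reasons.append("open high/critical findings")
    if g.unverified_blocking_findings != 0:
        reasons.append("blocking findings without a retest receipt")
    if g.golden_set_failures != 0:
        reasons.append("golden-set failures")
    if g.unresolved_required_controls != 0:
        reasons.append("required controls not verified or risk-accepted")
    if g.assessed_digest != g.rc_digest:
        reasons.append("assessed digest differs from RC digest")
    if not g.panel_converged:
        reasons.append("panel has not converged")
    return ("NO-GO", reasons) if reasons else ("GO", [])
```

Because the digest comparison is part of the gate itself, a post-assessment change to the RC turns a previous GO into NO-GO without any separate bookkeeping.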
22

Infrastructure Readiness, Signed Packaging & Release Prep

AI Agent: Vector

Vector makes production real: provision and finalize prod infrastructure entirely via IaC, configure scaling, runtime-resolved secrets, backups/DR, and observability (dashboards, alerts, on-call), then produce a versioned, reproducible, signed and provenance-attested (SLSA/in-toto) release artifact with a published SBOM and release notes, plus a deployment runbook and a rollback procedure exercised in staging. Crucially, this brick must never self-certify: every "done" claim (reproducible, signed, attested, rollback-safe, DR-restorable, alerts-fire, secret-free, parity-matched) is converted from prose into a machine-checkable receipt, and an INDEPENDENT panel (Verdict as independent evaluator + Cipher for AppSec/supply-chain, neither of whom authored the release) RE-EXECUTES the proofs — independent rebuild-and-hash-compare, re-running provenance/signature verification, re-scanning the SBOM and secrets, and replaying the rollback/DR/alert drills — rather than reading Vector's summary. The brick terminates in a single Release-Readiness Record linking every receipt with the panel's evidence-based convergence verdict ("solid: no open material gaps") that is independently reproducible from the receipts; a criterion with no machine-checkable receipt is automatically a FAIL, and the loop iterates (Vector fixes, panel re-verifies) until no material gap remains. Where real prod inputs are absent (cloud account, prod DNS, DR region, compliance scope), the team proceeds on an EXPLICIT, FLAGGED assumption surfaced to the human as a non-blocking question rather than blocking on approval, and Verdict confirms no fake-prod assumption is silently load-bearing. DELIVERY-PACKAGE ADDENDUM (Sprint 117): the packaging output is a SELF-DEPLOYING PACKAGE per the deploy.md standard — a fresh Claude must be able to deploy it from deploy.md alone.

Deliverables
[AI · Vector] Production infrastructure provisioned/finalized via IaC, with staging provisioned from the SAME IaC modules/version (environment-parity receipt)
Acceptance: `terraform apply` (or equivalent) completes with exit 0 and a stored state; a parity diff (`terraform plan` of staging vs prod-target modules, or equivalent) is attached showing ONLY intended deltas (instance size, secret refs, replica count) and ZERO topology/resource-type drift; receipt file lists module versions pinned by digest for both environments.
[AI · Vector] Versioned, signed, provenance-attested release artifact with build receipt
Acceptance: Artifact is tagged with a semantic version + immutable digest; `cosign verify` succeeds on the signature; an in-toto/SLSA provenance attestation exists and `slsa-verifier`/`cosign verify-attestation` passes against the stated policy (builder identity + source repo + commit pinned); receipt records artifact digest, signer identity, provenance predicate, and the exact source commit hash.
[AI · Verdict] Independent rebuild-and-compare receipt (reproducibility proven, not asserted)
Acceptance: Verdict (or a clean builder that did NOT produce the original) rebuilds the artifact from the pinned source commit in a fresh environment; rebuilt digest == Vector's published digest (bit/hash identical) — recorded with both digests side by side; if non-identical, the diff is documented and the criterion is FAIL until resolved.
[AI · Vector] Published SBOM with vuln-policy scan receipt
Acceptance: SBOM (CycloneDX or SPDX) is published as a named, versioned artifact tied to the release digest; a vuln scan of the SBOM (e.g. grype/trivy) runs and produces ZERO unwaived critical/high findings per the policy Cipher set earlier, with every exception traced to a ledger waiver ID; scan output attached.
[AI · Vector] Rollback drill in staging with transaction-level receipt (forward+backward migration safety, measured RTO)
Acceptance: Drill deploys vN, then vN+1 INCLUDING a forward DB migration, then rolls back to vN; receipt proves (a) measured rollback time <= stated RTO target, (b) post-rollback health checks all green, (c) a smoke/canary transaction returns correct data, and (d) data written under vN+1 either survives or is handled exactly per the documented backward-compat/migration policy; timestamps and outputs attached.
[AI · Vector] DR backup + restore drill with measured RPO/RTO receipt
Acceptance: A backup is taken, a clean target is provisioned, the backup is restored, and a smoke transaction against the restored system returns correct data; receipt records measured RPO and RTO and confirms both meet stated targets (backup-never-restored does not count); restore logs attached.
[AI · Vector] Observability live + synthetic-failure alert-fire drill receipt
Acceptance: Dashboards exist for golden signals and on-call rotation is configured; a fault is injected (kill a process / saturate a dependency) and the receipt shows the matching alert fired (alert-fired timestamp) AND was delivered to the on-call channel (delivery receipt) — 'alerts configured' alone is FAIL.
[AI · Cipher] Independent secrets scan across artifact, SBOM, provenance attestation, IaC state, CI logs, and runbook + runtime-secret-source verification
Acceptance: Cipher independently re-runs secret scanning (e.g. gitleaks/trufflehog) across ALL named surfaces (artifact, SBOM, in-toto attestation, IaC state files, CI logs, runbook) with ZERO unwaived findings; plus a check confirming runtime secrets resolve from the secret manager at deploy time (not baked into the artifact); combined scan report attached.
[AI · Vector] Deployment runbook + release notes
Acceptance: Runbook documents step-by-step deploy, rollback, DR-restore, and on-call escalation, and a dry-runner (Proof or Verdict) executes the deploy section against staging end-to-end with exit-0 / health-green outcome recorded; release notes enumerate changes, the artifact digest, and known limitations.
[AI · Vector] Staging deploy dry-run success receipt
Acceptance: A full deploy to staging (provisioned from the same IaC as prod-target per the parity receipt) completes with exit 0, all post-deploy health checks green, and a smoke transaction passing; receipt links to the parity diff so the green dry-run is provably representative of prod.
[AI+Human · Vector] Flagged-assumptions ledger for missing real-prod inputs
Acceptance: Every place a real prod input is unavailable (prod cloud account, prod DNS, real DR region, compliance scope) is recorded as an explicit, dated assumption with the proxy used (e.g. 'staging account used as prod-proxy'), surfaced to the human as a NON-BLOCKING question; the brick proceeds; Verdict confirms no listed assumption is silently load-bearing on a release-readiness criterion without being flagged.
[AI · Verdict] Release-Readiness Record with independent convergence verdict (anti-rubber-stamp gate)
Acceptance: A single manifest links every receipt above (build digest, provenance-verify output, SBOM+vuln scan, secret scan, rollback drill, DR drill, alert drill, parity diff, dry-run) with its digest/timestamp; Verdict + Cipher record an evidence-based verdict that EACH criterion has an attached, independently-RE-EXECUTED receipt (they rebuilt/re-scanned/replayed, not read Vector's summary) and that no material gap remains; any criterion lacking a machine-checkable receipt is logged FAIL and the loop iterates until the verdict reads 'solid: no open material gaps'; the record is independently reproducible from its linked digests.
[AI · Vector] Self-deploying PACKAGE per the deploy.md standard (docs/standards/deploy-md-standard.md): the functional source + the test suite + a PINNED dependency manifest + packaging (Dockerfile and/or scripts) + an agent-executable deploy.md + README — everything a fresh machine needs, with NO secret baked in.
Acceptance: The package contains runnable source, a real test suite, pinned deps, packaging, and a deploy.md that passes the standard's COMPLETE checklist (all 9 sections; env-only secrets; binary test gate; readiness poll not sleep; exact expected health + a real feature smoke test; teardown; redeploy note). A package content hash is recorded.
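
The independent rebuild-and-compare receipt could be produced along the following lines. This is a sketch that assumes the artifact is a single file hashed with SHA-256; for a container image the digest would normally come from the registry or build tool. The function names and the receipt shape are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Iterator


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks; chunk boundaries do not affect the result."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return "sha256:" + h.hexdigest()


def read_chunks(path: Path, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file in fixed-size chunks so large artifacts fit in memory."""
    with path.open("rb") as fh:
        while chunk := fh.read(size):
            yield chunk


def digest_file(path: Path) -> str:
    return sha256_of_chunks(read_chunks(path))


def compare_rebuild(published_digest: str, rebuilt_digest: str) -> Dict[str, str]:
    """Record both digests side by side; anything but equality is FAIL."""
    result = "PASS" if published_digest == rebuilt_digest else "FAIL"
    return {"published": published_digest, "rebuilt": rebuilt_digest, "result": result}
```

The rebuilt digest should come from a builder that did not produce the original, for example `compare_rebuild(published, digest_file(Path("rebuilt-artifact.tar")))`, where the file name is only a placeholder.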
Questions the agent asks (7)
  • What are the production cloud account, region(s), and DR region we should target — and if none is provided yet, do you accept us proceeding on a staging-account proxy recorded as a flagged assumption?
  • What are the production DNS/domains and TLS/cert ownership for go-live?
  • What are the binding RTO and RPO targets the rollback and DR drills must meet?
  • Are there compliance/data-residency constraints (e.g. SOC2, HIPAA, GDPR, region pinning) that change infra topology or backup handling?
  • Which secret manager is the system of record for runtime secrets (e.g. cloud KMS/Secrets Manager, Vault), and who controls access?
  • What is the on-call rotation and escalation channel (e.g. PagerDuty/Opsgenie/Slack) the alert-fire drill must page?
  • What is the approved policy for handling data written under vN+1 if a rollback to vN occurs (preserve, migrate-down, quarantine)?
Do (8)
  • Convert every 'done' into a machine-checkable receipt with a digest/timestamp before claiming it; a criterion with no receipt is a FAIL, not a pass-on-trust.
  • Have Verdict and Cipher RE-EXECUTE proofs themselves — independent rebuild, re-run cosign/slsa-verifier, re-scan SBOM and secrets, replay rollback/DR/alert drills — never accept Vector's prose summary.
  • Provision staging from the identical IaC modules/version as prod-target and prove parity with a plan-diff before trusting any dry-run as representative.
  • Pin all sources, base images, and build inputs by digest so the independent rebuild can produce a hash-identical artifact.
  • Exercise (not just configure) backups, DR restore, rollback, and alerting with real fault injection and measured RPO/RTO.
  • Source runtime secrets from the secret manager and verify at deploy time that nothing sensitive is baked into the artifact, attestation, IaC state, logs, or runbook.
  • Record every missing real-prod input as an explicit flagged assumption surfaced to the human as a non-blocking question, and proceed.
  • Iterate the Vector-fix / panel-re-verify loop until the Release-Readiness Record reads 'solid: no open material gaps.'
Don't (9)
  • Don't let Vector be its own auditor — the release-readiness verdict belongs to Verdict + Cipher, who did not author the release.
  • Don't call the artifact 'reproducible' without a second independent rebuild yielding a hash-identical digest.
  • Don't call rollback 'validated' without a transaction-level receipt covering forward/backward migration, measured RTO, post-rollback health, and a correct smoke transaction.
  • Don't call backups/DR 'configured' as proof — only a successful restore drill with a smoke transaction counts.
  • Don't call alerts 'live' without a synthetic-failure drill proving the alert fired AND reached on-call.
  • Don't narrow secrets checking to the artifact only — scan SBOM, attestation, IaC state, CI logs, and runbook too.
  • Don't block on a human approval gate; surface questions and proceed on flagged assumptions where the human is silent.
  • Don't silently use staging-as-prod or any proxy without recording it in the flagged-assumptions ledger and having Verdict confirm it isn't load-bearing unflagged.
  • Don't ship an SBOM with unwaived critical/high CVEs and call it published-and-done.
Guardrails (7)
  • No human sign-off: this brick is AI-owned; the human only provides inputs/answers and the team proceeds on flagged assumptions where silent.
  • Independent verification is mandatory and adversarial: the red-team pass must actively try to (1) rebuild and get a DIFFERENT hash, (2) find a secret in the attestation/IaC state/logs, (3) break rollback with a non-backward-compatible migration, (4) prove an alert silently doesn't fire, and (5) catch staging masquerading as prod.
  • Honesty architecture: no 'done/passed/live/releasable' claim ships without an attached, independently-reproducible receipt; unreceiptable claims auto-FAIL.
  • No real secrets in any artifact, SBOM, provenance attestation, IaC state file, CI log, or runbook; runtime secrets must resolve from the secret manager.
  • Any 'N/A because no real prod yet' must appear in the flagged-assumptions ledger and be explicitly accepted there — never silently waved through.
  • The release is not 'live' until an agent who did not build it has independently reproduced the proof and the convergence verdict reads 'solid: no open material gaps.'
  • Convergence must be evidence-based and reproducible from the linked receipt digests, not a narrative attestation.
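
The rule that a criterion with no machine-checkable receipt is automatically a FAIL can be sketched as a check over the Release-Readiness Record. The criterion keys mirror the receipts listed in the record's acceptance; the receipt fields (`digest`, `author`, `re_executed_by`, `outcome`) are hypothetical names used only for illustration.

```python
from typing import Dict, Optional

REQUIRED_CRITERIA = (
    "build_digest", "provenance_verify", "sbom_vuln_scan", "secret_scan",
    "rollback_drill", "dr_drill", "alert_drill", "parity_diff", "dry_run",
)


def evaluate_criterion(receipt: Optional[dict]) -> str:
    """PASS only for a receipt that an independent agent re-executed."""
    if not receipt or not receipt.get("digest"):
        return "FAIL: no machine-checkable receipt"
    if not receipt.get("re_executed_by"):
        return "FAIL: not independently re-executed"
    if receipt["re_executed_by"] == receipt.get("author"):
        return "FAIL: re-executed by the author"
    if receipt.get("outcome") != "pass":
        return "FAIL: re-execution did not pass"
    return "PASS"


def readiness_verdict(receipts: Dict[str, dict]) -> dict:
    """Evaluate every required criterion; a missing one fails by default."""
    results = {c: evaluate_criterion(receipts.get(c)) for c in REQUIRED_CRITERIA}
    solid = all(r == "PASS" for r in results.values())
    verdict = "solid: no open material gaps" if solid else "open material gaps"
    return {"results": results, "verdict": verdict}
```

Iterating the Vector-fix / panel-re-verify loop then amounts to re-running `readiness_verdict` until every criterion reads PASS.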
23

Clean-room Deployment Verification (fresh Claude deploys the package from deploy.md)

AI Agent: Proxy

Prove the delivered package is REAL: a fresh, ZERO-CONTEXT Claude Code session, in an isolated clean workspace containing ONLY the package, executes the package's own deploy.md — installs, runs the test gate, starts the service, health-checks, runs a feature smoke test, and verifies persistence — escalating to the human sponsor ONLY for declared inputs (secrets) it cannot supply. The receipt-backed verdict (DEPLOYED-AND-VERIFIED or FAILED at step N) bound to the package hash is what gates release; a FAIL routes the defect back into the build loop. This is the operational proof that replaces self-graded test claims (run 6a4bd141 failed exactly here). Canonical contract: docs/standards/clean-room-verification-brick.md.

Deliverables
[AI · Proxy] Clean-room provisioned: a fresh isolated workspace containing ONLY the package (copy of the build output), with no build-agent context and no parent-repo access.
Acceptance: Workspace exists, holds the full package incl. deploy.md, is isolated to its own dir, and the verifier session has zero other project context. (Isolation hardening per the brick contract; until OS-enforced, the directory-scoping limitation is recorded honestly.)
[AI · verifier] Deployment transcript + receipts (deploy_receipt.json): per deploy.md step — command + real exit code + observed output; tool versions; tests N/M; exact /health response; smoke-test result; restart-persistence check; secret-escalation events (declared-input names redacted).
Acceptance: Every deploy.md step has a recorded command+exit+output; the test gate result is present; health + smoke match deploy.md's stated expected outputs; no step skipped; no secret value written in cleartext to the receipt.
[AI · verifier] Binary verdict (verification_verdict.json): DEPLOYED-AND-VERIFIED or FAILED at step <n>: <reason>, bound to the package content hash.
Acceptance: Verdict is one of the two literals, cites the package hash, and is consistent with the receipts (DEPLOYED-AND-VERIFIED requires install ok + tests pass + health ok + smoke ok).
[AI+Human · Proxy] Secret-escalation log: each declared input the verifier could not supply, the prompt sent to the sponsor, and that the run proceeded only after the sponsor supplied it.
Acceptance: Every required declared input appears with its escalation; no secret persisted to a tracked artifact; minimal-human-input path honored (only unavoidable inputs asked).
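
Deriving the binary verdict from the per-step receipts could look like the sketch below. The step fields (`kind`, `exit_code`, `expected`, `observed`) are hypothetical and are not taken from the actual deploy_receipt.json schema; reporting a missing receipt against the step number after the last recorded step is likewise an assumption.

```python
from typing import Dict, List

REQUIRED_KINDS = ("install", "tests", "health", "smoke")


def derive_verdict(steps: List[Dict], package_hash: str) -> Dict[str, str]:
    """Fail closed: the first bad step decides the verdict."""

    def failed(n: int, reason: str) -> Dict[str, str]:
        return {"verdict": f"FAILED at step {n}: {reason}",
                "package_hash": package_hash}

    for n, step in enumerate(steps, start=1):
        if step.get("exit_code") != 0:
            return failed(n, f"{step.get('kind')} exited with {step.get('exit_code')}")
        expected = step.get("expected")
        if expected is not None and step.get("observed") != expected:
            return failed(n, f"{step.get('kind')} output did not match deploy.md")

    # DEPLOYED-AND-VERIFIED needs install + tests + health + smoke receipts.
    seen = {step.get("kind") for step in steps}
    missing = [kind for kind in REQUIRED_KINDS if kind not in seen]
    if missing:
        return failed(len(steps) + 1, "missing receipt for " + ", ".join(missing))
    return {"verdict": "DEPLOYED-AND-VERIFIED", "package_hash": package_hash}
```

Keeping the verdict a pure function of the receipts means a reviewer can re-derive it from deploy_receipt.json without trusting the verifier's own summary.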
Questions the agent asks (1)
  • (verifier, if blocked) deploy.md declares input <X> which I cannot supply — sponsor, what is its value?
Do (4)
  • Hand the verifier ONLY the package; it must have zero build context (independence).
  • Execute deploy.md exactly; capture per-step command+exit+output as receipts.
  • Escalate to the sponsor only for declared inputs the verifier cannot supply; otherwise run autonomously.
  • Bind the verdict to the package content hash; FAIL routes the defect back to the build loop.
Don't (4)
  • Don't let the verifier read the parent repo, other tenants, or build-agent context.
  • Don't write any secret value into the receipt/transcript in cleartext.
  • Don't mark verified without install-ok + tests-pass + health-ok + smoke-ok receipts.
  • Don't soften a FAILED verdict so that the run can terminate cleanly; fail closed.
Guardrails (6)
  • GATE-FREE INVARIANT: the human is consulted ONLY for declared inputs the verifier cannot supply; never an approval gate.
  • INDEPENDENCE: the verifier is a NON-AUTHOR, zero-context session with only the package.
  • HONESTY-RECEIPT: DEPLOYED-AND-VERIFIED is valid only with receipts (per-step exits, real test output, health/smoke matching deploy.md) bound to the package hash; reviewers re-derive, never accept a summary.
  • NO-SECRETS-AT-REST: env-only secret injection; declared-input names redacted in receipts; transcript scrubbed of secret values.
  • FAIL-CLOSED: a failed step yields FAILED at step N; package NOT released; defect re-enters the build loop.
  • ISOLATION HONESTY: on a shared host, directory-scoping is NOT OS-enforced isolation; record the limitation until the verifier runs as a non-parent UID / container (P2a).
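
The NO-SECRETS-AT-REST guardrail implies scrubbing the transcript before it is written. A minimal sketch follows, assuming the verifier holds the sponsor-supplied values in memory as a name-to-value mapping; `scrub` and `leaks` are hypothetical helpers. Plain substring replacement will not catch encoded or transformed copies of a secret, so this is a floor, not a complete control.

```python
from typing import List, Mapping

REDACTED = "[REDACTED]"


def scrub(text: str, secrets: Mapping[str, str]) -> str:
    """Replace every known secret value in the text with a fixed marker."""
    # Longest values first, so a secret that contains another secret is not
    # left partially visible after the shorter one is replaced.
    for value in sorted(set(secrets.values()), key=len, reverse=True):
        if value:
            text = text.replace(value, REDACTED)
    return text


def leaks(text: str, secrets: Mapping[str, str]) -> List[str]:
    """Names of declared inputs whose values still appear in the text."""
    return sorted(name for name, value in secrets.items()
                  if value and value in text)
```

Running `leaks` on the scrubbed transcript before persisting it gives a simple receipt that no known secret value reached the artifact in cleartext.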
24

Independent Review & Iterate: Production Release Readiness (Go = Evidence) [GO/NO-GO]

AI Agent: Verdict

This terminal brick replaces the retired human go-live sign-off with an independent, adversarial, receipt-anchored review loop that is the last defense before irreversible production impact. Verdict (lead) plus a red-team that authored none of the upstream artifacts independently RE-VERIFY the four upstream independent-review verdicts — Proof's quality verdict, Cipher's security verdict, Vector's infra/rollback/data-migration readiness, and the independently-reviewed accessibility/privacy NFRs — by REPRODUCING each one's underlying receipt (re-run the suite by run ID, recompute open high/critical from raw scanner output, replay the staging rollback dry-run, re-run the migration reconciliation), never by reading an upstream GREEN summary. The loop consumes and does not redo upstream judgments, but it trusts nothing it has not personally reproduced and bound to the exact shipped commit SHA. The red-team's explicit job is to find a reason NOT to ship: re-adjudicate every downgraded defect/finding, hunt stale or mismatched or failure-outcome receipts, and attempt to break the conservative NO-GO default. This brick also closes the program-long assumption-drift gap: no HIGH-blast-radius flagged assumption may ride 'open' into production — GO requires every HIGH-blast-radius entry in the canonical Assumption Register (sourced from the program_health_rollup) to be in a terminal state (CONFIRMED, REFUTED-and-handled, or EXPLICITLY-ACCEPTED-AS-RISK by a logged human owner, never AI-self-granted). Convergence is gap-gated and evidence-based: GO is impossible while any logged gap is open OR any HIGH-blast-radius assumption is open, requires at least two concurring independent reviewers plus a red-team that found no ship-blocker, names the exact tenant instances it covers, and emits a closed-loop post-deploy smoke + monitoring-window spec to hypercare; the gate stays open until live verification (deploy_go_live_verify) closes it. 
The only legitimate human touch is a narrow, logged, human-owned ESCALATION to risk-accept a critical finding or a HIGH-blast-radius assumption or make an irreversible/legal call — non-blocking, never AI-self-grantable, and on silence within the window the team defaults conservatively to NO-GO / not-risk-accepted with the assumption flagged.
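
The convergence rule described above (no open gap, no open HIGH-blast-radius assumption, at least two concurring independent reviewers, no red-team ship-blocker, conservative NO-GO default) can be sketched as follows. The `Assumption` shape, the status strings as Python constants, and the function names are hypothetical; the real Assumption Register schema is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

TERMINAL_STATES = {"CONFIRMED", "REFUTED-and-handled", "EXPLICITLY-ACCEPTED-AS-RISK"}


@dataclass
class Assumption:
    assumption_id: str
    blast_radius: str                  # e.g. "HIGH" or "LOW"
    status: str                        # "open" or one of TERMINAL_STATES
    human_owner: Optional[str] = None  # required for an accepted risk


def blocks_go(a: Assumption) -> bool:
    if a.blast_radius != "HIGH":
        return False
    if a.status not in TERMINAL_STATES:
        return True  # still open, or defaulted on silence
    # Risk acceptance must be human-owned, never AI-self-granted.
    return a.status == "EXPLICITLY-ACCEPTED-AS-RISK" and not a.human_owner


def release_decision(open_gaps: List[str], assumptions: List[Assumption],
                     concurring_reviewers: Set[str], authors: Set[str],
                     red_team_blockers: List[str]) -> Dict[str, object]:
    blockers = [f"open gap: {g}" for g in open_gaps]
    blockers += [f"unresolved HIGH assumption: {a.assumption_id}"
                 for a in assumptions if blocks_go(a)]
    if concurring_reviewers & authors:
        blockers.append("reviewer set overlaps author set")
    if len(concurring_reviewers) < 2:
        blockers.append("fewer than two concurring reviewers")
    blockers += [f"red-team ship-blocker: {b}" for b in red_team_blockers]
    return {"decision": "NO-GO" if blockers else "GO", "blockers": blockers}
```

GO is reachable only when the blocker list is empty, so every missing input or unresolved item falls through to the conservative NO-GO default.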

Deliverables
[AI · Verdict] Independence attestation receipt (acceptance criterion ZERO)
Acceptance: Ledger receipt logged listing every upstream-artifact author and every reviewer/red-team member; assertion is PASS only if (a) the reviewer set is disjoint from the author set of every artifact judged, AND (b) no red-team member authored any of the four upstream verdicts. If either set overlaps, attestation = FAIL and the entire GO is invalid (cannot proceed).
[AI · Verdict] Receipt-integrity gate report (runs BEFORE any criterion)
Acceptance: For each of the N consumed receipts, a row records: server-authored origin = true, content-hash present, timestamp within the candidate-build window, outcome = success (failures are filtered out: a failed run that still wrote a row is rejected), and binding to the exact shipped commit SHA. Report passes only if 100% of required receipts pass all five checks; any stale, SHA-mismatched, or failure-outcome receipt auto-rejects and blocks GO.
[AI · Verdict] Re-execution matrix — one row per upstream claim, reproduced not re-read
Acceptance: Matrix has one row per claim with columns {receipt artifact, reproduction action, binary pass threshold, reproduced Y/N, NEW reproduced-receipt hash}. Mandatory rows reproduced: (1) re-run Proof's test suite by run ID → 0 open release-blocker defects; (2) recompute open high/critical from Cipher's raw scanner/CVSS output → 0 open high/critical; (3) replay Vector's staging rollback dry-run → rollback succeeds; (4) re-run migration reconciliation → within defined threshold. PASS only if every row's reproduced = Y with a fresh hash distinct from the upstream hash; no row may pass on an upstream summary alone.
[AI · Cipher] Security re-verification: recomputed findings + ASVS checklist
Acceptance: Open high/critical count recomputed by Verdict/Cipher from raw scanner output (not the upstream tally) = 0; ASVS checklist for the target instances is complete with each control mapped to a passing receipt; every previously downgraded security finding is re-triaged by the red-team and none re-upgrades to high/critical. Any re-upgrade re-opens NO-GO.
[AI · Proof] NFR performance re-verification on production-representative env+load
Acceptance: Each NFR perf target (latency p50/p95/p99, throughput, error budget) was independently ratified and measured on a production-representative environment and load, with numbers logged per target and target instance; PASS only if every target met with a cited measured number and reproduced run ID — no target asserted without a figure.
[AI · Verdict] HIGH-blast-radius assumption-resolution gate (binary pre-go-live criterion)
Acceptance: Sourced from the program_health_rollup, the canonical Assumption Register is reconciled and every entry with blast_radius = HIGH is enumerated with its terminal-state status. PASS only if EVERY HIGH-blast-radius assumption is in a terminal state — CONFIRMED (with a reproduced verifying receipt), REFUTED-and-handled (refutation logged + the dependent bricks/blocks_bricks[] re-reviewed and the fix receipted), or EXPLICITLY-ACCEPTED-AS-RISK (a human-owned, logged risk-acceptance ledger row — never AI-self-granted). Any HIGH-blast-radius assumption still in status 'open' (or default-on-silence un-resolved) is a NO-GO blocker by default and is recorded as an open gap; no HIGH-blast-radius assumption may ride 'open' into production. The verdict does not invent terminal states: each CONFIRMED/REFUTED claim cites its reproduced receipt, each ACCEPTED cites its human-owned record.
[AI · Vector] Reversibility posture: flag/canary/staged rollout + tested kill-switch + auto-rollback triggers
Acceptance: Release ships behind a flag, canary, or staged rollout (named); a kill-switch exists and was exercised; auto-rollback triggers are defined on SLO / error-rate / latency breach; evidenced by a receipt of a synthetic trigger that FIRED and rolled back in staging. PASS requires the fired-trigger rollback receipt — a one-shot dry-run alone fails.
[AI · Vector] Data-migration safety: forward+backward on prod-sized snapshot + backup/restore + reconciliation
Acceptance: For every schema/data migration: forward AND backward migration executed on a production-sized snapshot of copied prod data; backup-taken-and-restore-tested receipt present; reverse-path data-loss check passed; reconciliation report meets a pre-defined numeric pass threshold. No migration ships without all four receipts; missing any = NO-GO.
[AI · Vector] Monitoring + on-call proof-of-life receipt
Acceptance: A synthetic alert was fired, routed, and PAGED a reachable on-call (ack logged), and dashboards display the candidate build's signals; PASS requires the paged-and-acked receipt — declaring monitoring 'live' without a fired-and-acked synthetic alert fails.
[AI · red-team] Adversarial 'reason-not-to-ship' report + severity re-adjudication
Acceptance: Red-team logs its strongest attempts to block, each with the receipt it attacked and the outcome; every defect/finding previously triaged-down or marked 'known issue', AND every HIGH-blast-radius assumption claimed terminal, is independently re-adjudicated (a CONFIRMED that does not reproduce, or an ACCEPTED-AS-RISK lacking a valid human-owned record, re-opens as a blocker). PASS = red-team found no open ship-blocker AND no re-adjudicated item re-upgraded to release-blocker/high/critical AND no claimed-terminal assumption fails its receipt re-derivation; any such item re-opens NO-GO. Report must show real attempts, not an empty 'nothing found'.
[AI · Verdict] Per-tenant scope declaration
Acceptance: The GO names the exact target instance(s)/tenant(s) it covers; every isolation, migration, rollback, NFR, and HIGH-blast-radius assumption receipt above is evaluated against those named instances; the verdict explicitly does NOT auto-generalize to other tenants (any uncovered tenant is listed as out-of-scope).
[AI · Verdict] Gap register + convergence verdict (GO/NO-GO)
Acceptance: Every raised gap has a status {open, fixed-and-re-checked}; each fixed gap shows the author's fix + the reviewer's re-check of the SPECIFIC contested receipt (iterate on the receipt, not an average). Convergence verdict = GO is recorded ONLY when: 0 open gaps, 0 HIGH-blast-radius assumptions in non-terminal/open state, ≥2 independent reviewers concur with signed positions + their receipt sets, red-team found no ship-blocker, all binary criteria above PASS, and per-tenant scope is declared. A max-iteration/time-box is set; exhausting it forces NO-GO + escalation (never a fatigue GO). GO is impossible while any gap is open or any HIGH-blast-radius assumption rides 'open'.
[Human] Optional non-blocking ESCALATION risk-acceptance record (the only NO-GO override)
Acceptance: If — and only if — present, the record names the specific critical finding / HIGH-blast-radius assumption / irreversible / legal call, is human-authored and human-owned (the AI may PROPOSE but cannot self-grant, assume, or back-fill it), and is logged as a ledger row that sets the matching Assumption Register entry to EXPLICITLY-ACCEPTED-AS-RISK. Absent a human-owned record within the escalation window, the system defaults to NO-GO and not-risk-accepted with the assumption flagged in the gap register — never silently applied. Acceptance = the override is either a valid human-owned logged record or correctly absent with the conservative default applied.
[AI · Verdict] Closed-loop hypercare handoff spec
Acceptance: On GO, a receipt-backed post-deploy smoke + monitoring-window spec is emitted to the deploy_go_live_verify / hypercare brick, defining the live-verification checks and the auto-rollback rule: a failed post-deploy smoke auto-rolls-back and RE-OPENS this gate. PASS = the handoff spec exists with named smoke checks and the auto-rollback-on-failure rule encoded; the gate is recorded as 'open until live verification closes it', and the open GO created by this brick is closed ONLY by downstream deploy_go_live_verify live verification.
[AI · Verdict] Package ADEQUACY attestation (context-bearing): a reviewer WHO HAS the spec certifies each acceptance/smoke test in the package traces to a real acceptance criterion and is non-tautological (no hardcoded health, no echo-back of in-memory state), and that the package meets the PRODUCT rubric — NOT just that it deploys.
Acceptance: Every smoke/acceptance test is mapped to a PRODUCT-rubric criterion with a non-tautology note; any vacuous test is a GapLog entry; release readiness requires BOTH this adequacy attestation AND the clean_room_deploy_verify DEPLOYED-AND-VERIFIED verdict. The clean-room verdict alone proves 'deploys + runs', not 'meets the requirement'.
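The receipt-integrity gate in the deliverables above (server-authored origin, content hash, build-window timestamp, success outcome, shipped-SHA binding) could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the Receipt fields, the function names, and the sample values are all hypothetical, since the walkthrough names the checks but not a receipt schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Receipt:
    # Hypothetical schema: the walkthrough names the checks, not the fields.
    receipt_id: str
    server_authored: bool
    content_hash: Optional[str]
    timestamp: datetime
    outcome: str
    commit_sha: str


def integrity_failures(r, shipped_sha, window_start, window_end):
    """Return the names of the integrity checks that receipt r fails."""
    checks = {
        "server_authored": r.server_authored,
        "content_hash_present": bool(r.content_hash),
        "timestamp_in_window": window_start <= r.timestamp <= window_end,
        "outcome_success": r.outcome == "success",
        "sha_bound": r.commit_sha == shipped_sha,
    }
    return [name for name, ok in checks.items() if not ok]


def integrity_gate(receipts, shipped_sha, window_start, window_end):
    """Pass only if at least one receipt exists and none fails any check."""
    report = {
        r.receipt_id: integrity_failures(r, shipped_sha, window_start, window_end)
        for r in receipts
    }
    passed = bool(report) and all(not fails for fails in report.values())
    return passed, report


# Illustrative data only.
window_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
window_end = datetime(2024, 1, 2, tzinfo=timezone.utc)
good = Receipt("r1", True, "abc123",
               datetime(2024, 1, 1, 12, tzinfo=timezone.utc), "success", "deadbeef")
bad = Receipt("r2", True, "def456",
              datetime(2023, 12, 31, tzinfo=timezone.utc), "failure", "deadbeef")

passed, report = integrity_gate([good, bad], "deadbeef", window_start, window_end)
print(passed, report)
```

In this sketch a single stale or failure-outcome receipt fails the whole gate, and an empty receipt set also fails, so a missing receipt cannot pass vacuously.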
Questions the agent asks (6)
  • For any critical/high finding the team recommends risk-accepting (not fixing) before ship: do you accept the named risk and own it in writing, or should it block? (Silence within the escalation window = NO-GO / not-risk-accepted.)
  • For each HIGH-blast-radius assumption that could not be CONFIRMED or REFUTED before ship: do you EXPLICITLY accept it as a logged, human-owned risk, or must it block go-live? (Silence = NO-GO; an open HIGH-blast-radius assumption cannot ride into production.)
  • Are there irreversible or legally-significant aspects of this release (data deletion/migration, contractual go-live date, regulated data movement) that you want to call yourself rather than have the team default conservatively on?
  • Exactly which tenant/client instance(s) is this release authorized to cover? Any instance not named will be treated as out-of-scope and will NOT auto-generalize.
  • What is the acceptable escalation-response window before the team proceeds on the conservative NO-GO default?
  • Is there any external constraint (compliance deadline, customer commitment, maintenance-window) that changes the auto-rollback triggers or the post-deploy monitoring window we should encode?
Do (9)
  • Reproduce every upstream receipt — re-run by run ID, recompute counts from raw scanner output, replay the rollback dry-run, re-run reconciliation — and log a fresh reproduced-receipt hash for each.
  • Run the independence attestation and receipt-integrity gate FIRST; if either fails, stop — the GO is invalid before any other criterion is considered.
  • Reconcile the canonical Assumption Register from the program_health_rollup and drive every HIGH-blast-radius entry to a terminal state (CONFIRMED with reproduced receipt, REFUTED-and-handled with dependent bricks re-reviewed, or human-owned EXPLICITLY-ACCEPTED-AS-RISK) before GO.
  • Bind every receipt to the exact shipped commit SHA, a success outcome, a content hash, and a timestamp inside the build window; reject anything stale or mismatched.
  • Keep the red-team adversarial and disjoint from upstream authors; require it to log real attempts to block, re-adjudicate every downgraded finding, and re-derive every claimed-terminal HIGH-blast-radius assumption.
  • Gate convergence on zero open gaps AND zero open HIGH-blast-radius assumptions with ≥2 concurring independent reviewers + a clean red-team pass; iterate on the specific contested receipt, never average disagreeing verdicts.
  • Require fired-and-proven receipts for reversibility (synthetic rollback trigger), monitoring (paged + acked synthetic alert), and migration (forward+backward on prod-sized snapshot + restore test).
  • Name the exact tenant instances the GO covers and emit the closed-loop post-deploy smoke/monitoring spec so a failed live smoke auto-rolls-back and re-opens this gate; the open GO closes only on downstream deploy_go_live_verify.
  • Default conservatively: on human silence, NO-GO and not-risk-accepted, with the assumption explicitly flagged.
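The independence attestation that the list above says to run FIRST reduces to two set-disjointness checks. A small sketch, assuming authors and reviewers are tracked as plain name sets; the function name, the argument layout, and the agent names in the example are hypothetical.

```python
def independence_attestation(artifact_authors, reviewers, red_team, verdict_authors):
    """
    artifact_authors: dict of artifact name -> set of author names
    reviewers:        set of reviewer names on the panel
    red_team:         set of red-team member names
    verdict_authors:  set of authors of the upstream verdicts
    """
    # (a) no reviewer may have authored any artifact being judged
    reviewer_overlap = {
        artifact: sorted(authors & reviewers)
        for artifact, authors in artifact_authors.items()
        if authors & reviewers
    }
    # (b) no red-team member may have authored an upstream verdict
    red_team_overlap = sorted(red_team & verdict_authors)
    ok = not reviewer_overlap and not red_team_overlap
    return {
        "result": "PASS" if ok else "FAIL",
        "reviewer_overlap": reviewer_overlap,
        "red_team_overlap": red_team_overlap,
    }


# Illustrative names only.
authors = {
    "quality_verdict": {"agent-a"},
    "security_verdict": {"agent-b"},
    "infra_readiness": {"agent-c"},
    "nfr_review": {"agent-d"},
}
verdict_authors = set().union(*authors.values())

clean = independence_attestation(authors, {"reviewer-x", "reviewer-y"}, {"red-1"}, verdict_authors)
tainted = independence_attestation(authors, {"reviewer-x", "agent-b"}, {"red-1"}, verdict_authors)
print(clean["result"], tainted["result"])
```

Returning the overlapping names rather than a bare boolean keeps the FAIL receipt self-explanatory: it shows which reviewer collided with which artifact.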
Don't (10)
  • Don't conclude GO by aggregating four upstream GREEN verdicts — a dashboard glance is not re-verification.
  • Don't pass any re-execution-matrix row on an upstream summary; no row passes without independent reproduction and a fresh hash.
  • Don't let any HIGH-blast-radius assumption ride 'open' into production; an unresolved one is a NO-GO blocker, not a footnote.
  • Don't let the AI self-grant, assume, or back-fill an EXPLICITLY-ACCEPTED-AS-RISK terminal state for an assumption — only a logged, human-owned record may set it.
  • Don't let Verdict or the red-team judge an artifact they (co-)authored or co-signed — that collapses independence into self-review.
  • Don't accept 'zero open blockers' without re-adjudicating every defect/finding triaged-down to a 'known issue' and every assumption claimed terminal.
  • Don't ship on a one-shot staging dry-run with no flag/canary, no tested kill-switch, or no defined auto-rollback triggers.
  • Don't generalize a GO across tenants or migrate without forward+backward + backup/restore receipts.
  • Don't issue a fatigue-driven GO when the time-box/max-iteration is hit — force NO-GO + escalation instead.
  • Don't call the gate closed on GO; it stays open until live post-deploy verification (deploy_go_live_verify) closes it.
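The HIGH-blast-radius assumption rule that recurs in the lists above can be expressed as a filter over the Assumption Register. The sketch below assumes each register entry is a dictionary; the key names (blast_radius, status, receipt, accepted_by_kind, ledger_row) are illustrative guesses, not the register's actual schema.

```python
TERMINAL_STATES = {"CONFIRMED", "REFUTED-and-handled", "EXPLICITLY-ACCEPTED-AS-RISK"}


def assumption_blockers(register):
    """Return (id, reason) for every HIGH-blast-radius entry that blocks GO."""
    blockers = []
    for entry in register:
        if entry["blast_radius"] != "HIGH":
            continue  # only HIGH-blast-radius entries gate go-live
        status = entry["status"]
        if status not in TERMINAL_STATES:
            blockers.append((entry["id"], "status is not terminal"))
        elif status == "EXPLICITLY-ACCEPTED-AS-RISK":
            # Only a logged, human-owned record may set this state.
            if entry.get("accepted_by_kind") != "human" or not entry.get("ledger_row"):
                blockers.append((entry["id"], "accepted-as-risk lacks a human-owned ledger row"))
        elif not entry.get("receipt"):
            # CONFIRMED / REFUTED claims must cite a reproduced receipt.
            blockers.append((entry["id"], "terminal claim cites no reproduced receipt"))
    return blockers


# Illustrative register only.
register = [
    {"id": "A-1", "blast_radius": "HIGH", "status": "CONFIRMED", "receipt": "rcpt-101"},
    {"id": "A-2", "blast_radius": "HIGH", "status": "open"},
    {"id": "A-3", "blast_radius": "HIGH", "status": "EXPLICITLY-ACCEPTED-AS-RISK",
     "accepted_by_kind": "ai", "ledger_row": "row-7"},
    {"id": "A-4", "blast_radius": "LOW", "status": "open"},
]

blockers = assumption_blockers(register)
print(blockers)
```

Here the AI-granted acceptance on A-3 is treated as a blocker even though its status string looks terminal, mirroring the rule that the state cannot be self-granted.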
Guardrails (8)
  • Acceptance criterion ZERO (independence) and the receipt-integrity gate are hard preconditions: failure of either makes the GO invalid regardless of all other criteria.
  • GO is binary and gap-gated: impossible while any gap is open OR any HIGH-blast-radius assumption is non-terminal, requiring 0 release-blocker defects, 0 open high/critical security findings (recomputed from raw output), validated staged rollback, all NFR targets met with cited numbers, every HIGH-blast-radius assumption terminal, and live monitoring + on-call proven by a paged-and-acked synthetic alert.
  • Assumption-resolution gate: every HIGH-blast-radius entry in the canonical Assumption Register (from the program_health_rollup) must be CONFIRMED, REFUTED-and-handled, or human-owned EXPLICITLY-ACCEPTED-AS-RISK; 'open' = NO-GO by default and AI may not self-grant the accepted-as-risk state.
  • Every receipt must be a server-authored ledger row with content hash, build-window timestamp, success outcome, and shipped-SHA binding; failure-outcome rows are filtered and rejected.
  • The human escalation is the ONLY override of NO-GO, must be a human-authored/human-owned logged record, is non-blocking, and cannot be AI-self-granted; silence = NO-GO / not-risk-accepted, assumption flagged.
  • Honesty architecture: a GO is not 'it is ready' but 'here are the receipts I personally reproduced, the independence attestation, the resolved HIGH-blast-radius assumptions, the red-team's best failed attempt to block, and the post-deploy smoke that will confirm or auto-rollback' — claims live only as far as their receipts.
  • Per-tenant scoping is mandatory: a GO names its instances and never auto-generalizes across the multi-tenant fleet.
  • Closed loop: a failed post-deploy smoke auto-rolls-back and re-opens this gate; the open GO is closed only by downstream deploy_go_live_verify live verification — the deploy is not done until verified live.
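Taken together, the guardrails above describe a conjunction: GO requires every condition to hold, and a spent time-box turns a non-converged state into NO-GO plus escalation. A hedged sketch of that decision follows; the GateState fields and the reason strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GateState:
    # Hypothetical summary of the gate's inputs.
    open_gaps: int
    open_high_assumptions: int
    concurring_reviewers: int
    red_team_ship_blockers: int
    binary_criteria: dict      # criterion name -> bool
    tenants_in_scope: tuple    # named instances the GO covers
    iterations_used: int
    max_iterations: int


def convergence_verdict(s):
    """Return (verdict, reasons, escalate)."""
    reasons = []
    if s.open_gaps:
        reasons.append(f"{s.open_gaps} open gap(s)")
    if s.open_high_assumptions:
        reasons.append(f"{s.open_high_assumptions} open HIGH-blast-radius assumption(s)")
    if s.concurring_reviewers < 2:
        reasons.append("fewer than 2 concurring independent reviewers")
    if s.red_team_ship_blockers:
        reasons.append("red-team found a ship-blocker")
    failed = sorted(name for name, ok in s.binary_criteria.items() if not ok)
    if failed:
        reasons.append("failed criteria: " + ", ".join(failed))
    if not s.tenants_in_scope:
        reasons.append("no tenant scope declared")
    verdict = "NO-GO" if reasons else "GO"
    # Exhausting the time-box never yields a fatigue GO: it escalates instead.
    escalate = verdict == "NO-GO" and s.iterations_used >= s.max_iterations
    return verdict, reasons, escalate


ready = GateState(0, 0, 2, 0, {"independence": True, "receipt_integrity": True},
                  ("tenant-a",), 3, 5)
stuck = replace(ready, open_gaps=1, iterations_used=5)

print(convergence_verdict(ready))
print(convergence_verdict(stuck))
```

The function only ever adds reasons to block, so there is no code path in which disagreeing inputs are averaged into a GO.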
25

Production Deployment & Go-Live

AI Agent: Vector

Vector executes the approved production deployment of the EXACT hardened, scan-clean artifact handed off from Phase-5 packaging — same git hash, same SBOM, same gate-passing build — using the safe-deploy strategy (canary/blue-green) with feature flags decoupling deploy ("artifact running, dark") from release ("flag flipped to users"), so a "deployed" receipt and a "released" receipt are distinct. Before a single byte ships, a binary pre-flight must be green: artifact-hash equals the upstream-approved hash, the honesty/security/brand gates in safe-deploy.sh re-run and PASS with captured receipts, a rollback DRILL has been actually exercised in staging this release (proven, not asserted), and DB migrations are confirmed expand/contract backward-compatible (or explicitly flagged not-rollback-safe with handling). Vector then promotes to a named canary cohort and observes a bake window against named auto-abort thresholds (error rate, p95 latency, 5xx, guardrail-incident count, business success-metric); a breach inside the window triggers AUTOMATIC rollback within the rollback SLO plus a receipt — no human in the loop. Every sub-claim a go-live rests on ("artifact is the approved one", "rollback works", "smoke passed", "health green", "metrics in budget") is written as an immutable go-live ledger receipt with outcome=success verified (guarding against success-receipts-written-on-failure), and the human is an INPUT channel (timing/blackout/comms): questions are surfaced, silence is handled by an explicit flagged assumption logged in the receipt, never a block and never a silent override of a stated constraint. 
Critically, this brick NEVER self-attests "live and healthy". It executes the deploy, runs the independently-authored smoke/health checks, and emits the go-live receipt + hypercare handoff package; the "go-live succeeded" verdict is emitted ONLY by the downstream independent review_loop (deploy_go_live_verify: Verdict + Proof + Cipher, adversarial red-team), which reproduces the evidence from a clean vantage, so a false "go-live succeeded" is structurally unsayable.
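The binary pre-flight described above can be pictured as a function that collects failing check IDs and refuses to start the deploy if any exist. This is a sketch under stated assumptions: the gate exit codes are taken as already-captured inputs (how safe-deploy.sh is invoked is not shown in the source), and every parameter name and check ID here is hypothetical.

```python
REQUIRED_GATES = ("honesty", "security", "brand")


def preflight(deployed_hash, approved_hash, gate_exit_codes,
              critical_findings, drill_receipt_id, migrations):
    """Return ("PASS" | "FAIL", failing_check_ids)."""
    failures = []
    if deployed_hash != approved_hash:
        failures.append("artifact_hash_mismatch")
    for gate in REQUIRED_GATES:
        # A missing exit code counts as a failure, not a pass.
        if gate_exit_codes.get(gate) != 0:
            failures.append(f"gate_{gate}")
    if critical_findings != 0:
        failures.append("scan_criticals")
    if not drill_receipt_id:
        failures.append("rollback_drill_missing")
    for m in migrations:
        # Not-rollback-safe changes need documented forward-only handling.
        if not m["rollback_safe"] and not m.get("handling"):
            failures.append(f"migration_{m['id']}_unhandled")
    return ("PASS" if not failures else "FAIL", failures)


# Illustrative inputs only.
ok = preflight(
    "sha-abc", "sha-abc",
    {"honesty": 0, "security": 0, "brand": 0},
    0, "drill-42",
    [{"id": "m1", "rollback_safe": True},
     {"id": "m2", "rollback_safe": False, "handling": "forward-only, gated by flag"}],
)
blocked = preflight(
    "sha-abc", "sha-xyz",
    {"honesty": 0, "security": 1},
    0, None,
    [{"id": "m3", "rollback_safe": False}],
)
print(ok)
print(blocked)
```

Listing every failing check ID, rather than stopping at the first, matches the requirement that the report show which check failed.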

Deliverables
[AI · Vector] Pre-flight gate report (binary PASS/FAIL) — artifact-hash equality check + re-run of safe-deploy.sh honesty/security/brand gates + SBOM/scan-clean confirmation, with each check's receipt ID captured
Acceptance: PASS only if deployed-artifact git hash == the upstream Phase-5 hardened/packaged approved hash (string-equal, logged), AND safe-deploy.sh honesty+security+brand gates each return exit 0 with a captured receipt ID, AND SBOM + security-scan show zero criticals. Any single check FAIL => deploy does not start and report shows the failing check ID. File exists and shows hash_match=true and all gate receipt IDs.
[AI · Vector] Rollback drill receipt — a rehearsed rollback EXERCISED in staging for this exact release, restoring the prior version and passing health checks
Acceptance: A staging rollback was actually executed this release (not asserted): receipt shows prior-version restored, post-rollback health endpoints returned 200/ready, and elapsed time <= the named rollback SLO. Receipt ID is present and referenced by the go-live receipt; absence of a fresh drill receipt for this release => deploy does not start.
[AI · Vector] DB-migration / irreversible-change handling statement
Acceptance: For every schema/data change in this release, statement marks it expand/contract backward-compatible (rollback-safe) OR explicitly NOT rollback-safe with its forward-only handling + gating documented. Migrations are sequenced before code rollout. If any change is not-rollback-safe and lacks documented handling => deploy does not start.
[AI · Vector] Canary deployment execution log with named cohort + bake window + auto-abort thresholds
Acceptance: Log names the canary cohort, the bake-time window (duration), and the numeric auto-abort thresholds (error rate, p95 latency, 5xx rate, guardrail-incident count, business success-metric). During the window any threshold breach triggers AUTOMATIC rollback (receipt emitted) with zero human approval; log records either 'bake window completed, all metrics within budget' with the captured metric series, or 'breach on <metric> at <value> >= <threshold> => auto-rollback receipt <id>'.
[AI · Vector] Multi-layer production health + smoke evidence (checks authored by Proof/Verdict, executed by Vector)
Acceptance: Liveness + readiness + dependency checks (DB, queue, downstream APIs) + a real end-to-end functional smoke transaction + success-metric/SLO dashboard reads, each with named thresholds, all captured with live codes (e.g. 200/ready) and run IDs. The smoke/threshold suite ORIGINATES from Proof/Verdict (acceptance criteria), not Vector. Every action receipt is checked outcome=success (a FAILED action that wrote a success receipt fails this deliverable). Missing any layer => deliverable FAIL.
[AI · Vector] Deploy/release decoupling receipts — separate 'deployed (dark)' and 'released (flag at target %)' ledger rows with the flag matrix
Acceptance: Two distinct receipts exist: one for artifact-deployed-behind-flags (release flags OFF/at-0%), one for each release flag flip to its target % — with the full feature-flag matrix (flag name -> state) captured. 'Released' is never written without a prior 'deployed' receipt for the same artifact hash.
[AI · Vector] Immutable go-live ledger receipt (the single source the Reporter is fed from)
Acceptance: One immutable ledger row captures {artifact hash, deploy strategy, flag matrix, smoke-test run IDs+results, multi-layer health codes, canary metric series over bake window, rollback-drill receipt ID, pre-flight gate receipt IDs, flagged assumptions, hypercare-handoff ID}. The 'live and healthy' Reporter string is server-authored FROM this row (per honesty-architecture.md §2), never free-typed. Row is present, immutable, and every referenced sub-receipt ID resolves.
[AI · Vector] Human-as-input (non-gating) timing/blackout/comms record
Acceptance: Record shows any timing/maintenance-window/blackout/comms questions surfaced to the human AND the response or, where the human is silent, an explicit FLAGGED ASSUMPTION logged in the go-live receipt (e.g. 'no maintenance window provided => deploying now'). A stated human constraint (e.g. a change-saturation blackout) present in the record was honored, not overridden. No human approval gate exists in the flow.
[AI · Vector] Hypercare handoff package (input to the H1 hypercare brick)
Acceptance: Package contains: live version/artifact hash, flag matrix (on/off + %), the green success-metric/SLO dashboards + named owners, the one-click rollback handle, and all flagged assumptions + known issues. Referenced by the go-live receipt; H1 can start from this package with no verbal/tribal knowledge. Missing any of {version, flag matrix, dashboards+owners, rollback handle} => FAIL.
[AI · Vector] Independent-verification handoff trigger — go-live receipt submitted to the downstream deploy_go_live_verify review_loop (Verdict + Proof + Cipher) BEFORE any 'go-live succeeded' is emitted
Acceptance: This brick emits NO 'live and healthy'/'go-live succeeded' claim. It records that the go-live receipt + evidence were handed to the independent review_loop and that the success verdict is pending that loop's converged result. Acceptance fails if any deliverable text asserts go-live success prior to the independent loop's verdict.
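The canary deliverable above (named thresholds, a bake window, automatic rollback on breach) might look like the loop below. It is an illustration only: the metric names, the "ceiling"/"floor" distinction for metrics that breach upward versus downward, and the rollback callback that returns a receipt ID are assumptions not spelled out in the source.

```python
def find_breach(sample, thresholds):
    """Return (metric, value, op, limit) for the first breached threshold, else None."""
    for metric, (kind, limit) in thresholds.items():
        value = sample[metric]
        if kind == "ceiling" and value >= limit:
            return metric, value, ">=", limit
        if kind == "floor" and value <= limit:
            return metric, value, "<=", limit
    return None


def run_bake_window(samples, thresholds, rollback):
    """Walk samples in time order; on the first breach call rollback() and stop."""
    for sample in samples:
        breach = find_breach(sample, thresholds)
        if breach:
            metric, value, op, limit = breach
            receipt_id = rollback()  # no human approval in this path
            return (f"breach on {metric} at {value} {op} {limit} "
                    f"=> auto-rollback receipt {receipt_id}")
    return "bake window completed, all metrics within budget"


# Illustrative thresholds and samples only.
thresholds = {
    "error_rate": ("ceiling", 0.02),
    "p95_latency_ms": ("ceiling", 800),
    "checkout_success": ("floor", 0.95),
}
healthy = [{"error_rate": 0.01, "p95_latency_ms": 400, "checkout_success": 0.99}]
degrading = healthy + [{"error_rate": 0.03, "p95_latency_ms": 450, "checkout_success": 0.98}]

healthy_log = run_bake_window(healthy, thresholds, rollback=lambda: "rb-001")
breach_log = run_bake_window(degrading, thresholds, rollback=lambda: "rb-001")
print(healthy_log)
print(breach_log)
```

A sample that lacks a named metric raises KeyError in this sketch; a real evaluator would more likely treat missing telemetry as a breach, which is the conservative reading of the guardrails.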
Questions the agent asks (5)
  • Is there a required maintenance/deployment window or a blackout period (e.g. client peak hours, change-freeze) we must deploy within or avoid? If you don't specify, we will proceed now and log that as a flagged assumption.
  • Who/what is the intended initial canary cohort and acceptable user-impact blast radius (internal users only, % of traffic, a specific tenant)?
  • What are the business success-metrics and their acceptable thresholds for THIS release that should auto-abort the canary if breached?
  • Is any external communication (status page, customer notice, internal go-signal) required before flipping release flags to 100%?
  • Are there release elements (data backfills, irreversible migrations, third-party cutovers) you know are NOT rollback-safe that we should sequence or gate specially?
Do (10)
  • Deploy ONLY the exact upstream-approved artifact: assert deployed git hash == hardened/packaged hash before starting, and capture the equality check as a receipt.
  • Run the full safe-deploy.sh honesty/security/brand gate and capture every gate receipt; abort on any non-zero exit (mirror the existing blocking-gate behavior).
  • Exercise a real rollback DRILL in staging this release and capture its receipt before promotion — prove rollback works, never merely 'confirm available'.
  • Decouple deploy from release with feature flags; write distinct 'deployed (dark)' and 'released' receipts and the full flag matrix.
  • Promote via canary/blue-green with a named bake window and numeric auto-abort thresholds; auto-rollback on breach within the rollback SLO, no human approval.
  • Define 'healthy' as multi-layer: liveness + readiness + dependency + a real end-to-end functional smoke + success-metric dashboards, with named thresholds and live codes.
  • Verify outcome=success on every action receipt (guard against success-receipts written on failed actions, e.g. the email.sent-on-failure class).
  • Write the immutable go-live ledger receipt and feed the Reporter ONLY from it; hand the receipt + evidence to the independent review_loop and let IT emit the success verdict.
  • Surface timing/blackout/comms questions to the human; on silence, proceed on an explicit flagged assumption logged in the receipt.
  • Emit the hypercare handoff package (version, flags, dashboards+owners, rollback handle, known issues/assumptions) as the input to H1.
Don't (10)
  • Don't assert 'live and healthy' or 'go-live succeeded' from this brick — that verdict belongs to the downstream independent review_loop; this brick produces receipts, not the verdict.
  • Don't deploy anything other than the exact gate-passing, hash-matching, scan-clean artifact from the upstream Phase-5 brick.
  • Don't treat a single 200 from /health as proof of healthy — shallow health can be green while the system is materially broken.
  • Don't claim rollback is available without a fresh, exercised rollback-drill receipt for THIS release.
  • Don't run a 'canary' with no bake window or no auto-abort thresholds — point-in-time, happy-path-only checks are canary-in-name-only.
  • Don't let Vector author AND run AND grade its own smoke tests; the smoke/threshold suite must originate from Proof/Verdict.
  • Don't roll code back over an applied irreversible migration; don't ship a not-rollback-safe change without documented handling/gating.
  • Don't write a 'success' receipt for an action whose outcome failed.
  • Don't block on a human approval gate; equally, don't silently override a stated human constraint (e.g. a blackout window).
  • Don't free-type the Reporter's go-live string — it must be server-authored from the immutable go-live receipt.
Guardrails (8)
  • Pre-flight is binary and blocking: the deploy does not start unless the artifact hash matches and the gate receipts, a fresh rollback-drill receipt, and the migration-safety statement are all present.
  • The 'go-live succeeded' / 'live and healthy' claim is structurally non-emittable by this brick; it is emitted only after the independent deploy_go_live_verify review_loop returns a converged, receipt-green verdict.
  • Auto-rollback is mandatory and human-free inside the canary bake window on any threshold breach, and must complete within the named rollback SLO with a receipt.
  • Honesty law applies recursively: every sub-claim (artifact-approved, rollback-works, smoke-passed, health-green, metrics-in-budget) must carry an independently resolvable receipt ID, and every action receipt must be outcome=success-verified.
  • Deploy and release are separate receipts; 'released' is never recorded without a prior matching-hash 'deployed' receipt.
  • Human is input, not a gate: questions surfaced, silence => explicit flagged assumption in the receipt, stated constraints honored — never an approval checkpoint.
  • No secrets/credentials in any receipt, log, or handoff package.
  • The go-live event itself is recorded as an immutable ledger row; the Reporter is fed only from that row, never from free text.
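The deploy/release separation in the guardrails above is an ordering invariant on ledger rows: a 'released' row is only valid after a 'deployed' row for the same artifact hash. A minimal in-memory sketch follows; the GoLiveLedger class, its method names, and the percentage-based flag matrix are hypothetical stand-ins for whatever ledger the team actually uses.

```python
class GoLiveLedger:
    """Append-only in-memory ledger, standing in for the real ledger service."""

    def __init__(self):
        self._rows = []

    def record_deployed(self, artifact_hash, flag_matrix):
        # "Dark" means every release flag is at 0% when the artifact lands.
        if any(pct != 0 for pct in flag_matrix.values()):
            raise ValueError("dark deploy requires every release flag at 0%")
        self._rows.append({"kind": "deployed", "artifact_hash": artifact_hash,
                           "flags": dict(flag_matrix)})

    def record_released(self, artifact_hash, flag, target_pct):
        deployed = any(r["kind"] == "deployed" and r["artifact_hash"] == artifact_hash
                       for r in self._rows)
        if not deployed:
            raise ValueError("no prior 'deployed' receipt for this artifact hash")
        self._rows.append({"kind": "released", "artifact_hash": artifact_hash,
                           "flag": flag, "target_pct": target_pct})

    @property
    def rows(self):
        # Hand out copies so callers cannot mutate recorded rows.
        return tuple(dict(r) for r in self._rows)


ledger = GoLiveLedger()
try:
    ledger.record_released("sha256:abc", "new-checkout", 10)
    premature_release_blocked = False
except ValueError:
    premature_release_blocked = True

ledger.record_deployed("sha256:abc", {"new-checkout": 0})
ledger.record_released("sha256:abc", "new-checkout", 10)
print(premature_release_blocked, [r["kind"] for r in ledger.rows])
```

Enforcing the invariant at write time, rather than auditing it afterwards, is one way to make an out-of-order 'released' row impossible to record.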
26

Independent Go-Live Verification (Closes the GO Gate)

AI Agent: Verdict

This brick independently RE-DERIVES the live production evidence from a clean vantage against the exact shipped SHA and issues the single binary "go-live succeeded | rolled-back" verdict, closing the GO gate that review_release_readiness_iterate deliberately left open until live verification. A panel that did NOT run the deploy, made up of Verdict (lead, independent evaluator), Proof (QA), Cipher (AppSec), plus an adversarial red-team pass, re-runs independently-authored post-deploy smoke/health checks, recomputes live error-rate/p95/5xx/guardrail-incidents against the NAMED auto-abort thresholds, confirms the artifact hash in production equals the approved hash, and performs the LIVE Honesty-Gate Coverage check that no design brick closed: every action path in the architecture's Honesty-Gate Coverage Inventory must be verified ACTUALLY receipt-covered in production (the confessed P0: the gate was wired at one endpoint while in-app/internal paths bypassed it). The verdict is emitted as a ledger receipt; a false "go-live succeeded" is structurally unsayable because the issuing brick re-derives evidence rather than reading the deployer's summary. Human input is non-blocking and limited to business-context grey zones.

Deliverables
[AI · Verdict] Clean-vantage re-derived live evidence pack pinned to the exact shipped SHA (independent re-run of post-deploy smoke/health, NOT the deployer's run)
Acceptance: From a vantage that did not run the deploy, the panel re-runs the independently-authored post-deploy smoke + health checks against production and records: (a) the production-served artifact/build SHA, (b) every smoke/health result with its raw response/exit code, (c) the git hash of the check suite itself. BINARY: every re-run smoke/health check returns its expected PASS result AND the SHA the panel observed serving in prod == the approved shipped SHA from release readiness — else this deliverable is FAIL and the brick proceeds to rollback. No author summary is accepted as evidence; only the panel's own re-run output counts.
[AI · Proof] Recomputed live operational metrics vs. the NAMED auto-abort thresholds (error-rate, p95 latency, 5xx, guardrail-incident count) over the observation window
Acceptance: Proof independently queries production telemetry over the defined post-deploy observation window and recomputes live error-rate, p95 latency, 5xx count, and guardrail-incident count from raw signals (not from a dashboard the deployer curated). BINARY: every recomputed metric is at-or-under its NAMED auto-abort threshold recorded in the release-readiness verdict. Any single metric over threshold => FAIL => the ConvergenceVerdict is 'rolled-back'. The threshold values, the window bounds, and each recomputed number are written into the evidence pack.
[AI · Cipher] Production artifact-hash-equals-approved-hash attestation + LIVE Honesty-Gate Coverage check against the architecture Honesty-Gate Coverage Inventory
Acceptance: Cipher (a) hashes the artifact actually running in production and asserts it BYTE-EQUALS the approved hash recorded at release readiness, and (b) for EVERY action path enumerated in the architecture's Honesty-Gate Coverage Inventory, drives that path live and confirms a ledger receipt was written by the production Honesty-Gate — explicitly probing the confessed P0 bypass surfaces (in-app message hooks, internal/agent-to-agent message paths) as well as the API endpoint. BINARY: prod-hash == approved-hash AND coverage_count(receipt-covered paths in prod) == coverage_count(Inventory paths) with ZERO uncovered/bypassing paths. Any inventory path that fires WITHOUT a production receipt => FAIL => 'rolled-back'. The per-path receipt IDs (or the missing-receipt evidence) are recorded.
[AI · Verdict] Adversarial red-team pass on the live verdict (attempt to forge a green result, replay a stale/wrong-SHA receipt, or slip an uncovered action path past the gate)
Acceptance: A red-team pass (run by panel members who did NOT author the checks under attack) attempts to make the brick emit a false 'go-live succeeded': substitute a stale SHA, replay a prior-window metric snapshot, point checks at staging instead of prod, and exercise a bypass action path expecting it to pass unnoticed. BINARY: every attack is DETECTED and FORCES a FAIL/rollback rather than a false-green — recorded as detected-attack entries in the GapLog. If any forgery attempt would have produced a green verdict, that is itself a material gap and the brick cannot converge until the re-derivation closes it.
[AI · Verdict] Binary ConvergenceVerdict ('go-live succeeded' | 'rolled-back') emitted as a ledger receipt — CLOSES the review_release_readiness_iterate GO gate
Acceptance: Verdict emits exactly one ConvergenceVerdict as an immutable ledger receipt carrying: the observed prod SHA, the recomputed metric values + their thresholds, the per-path Honesty-Gate coverage result, the red-team outcomes, and the contributing reviewers' SOLID/NOT-SOLID states. BINARY 'go-live succeeded' is emittable IFF: re-derived smoke/health all PASS AND prod SHA == approved SHA AND every recomputed metric <= its named threshold AND prod artifact-hash == approved hash AND Honesty-Gate live coverage == full Inventory with zero bypassing paths AND all red-team attacks detected AND open material gaps == 0 AND every non-author panel reviewer is SOLID. Otherwise the verdict is 'rolled-back' and the rollback path is invoked. The receipt explicitly marks the review_release_readiness_iterate GO gate as CLOSED. Because the verdict is computed from the panel's own re-derived evidence, a false 'go-live succeeded' is unsayable.
[Human] Non-blocking grey-zone input log (business-context judgment calls surfaced by the panel; advisory only, never an approval gate)
Acceptance: Where re-derivation surfaces a business-context grey zone (e.g., an at-threshold metric whose acceptability depends on a client commitment), the panel records the question and any human input received. BINARY: the brick proceeds to its binary verdict WITHOUT waiting on human sign-off — the log shows zero instances where human approval blocked emission of the verdict; human input is recorded as advisory context only, never as a gate.
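The live Honesty-Gate Coverage criterion in the deliverables above reduces to a set comparison between the Inventory and the receipts actually observed in production. A minimal sketch, assuming the panel has already driven each path live and collected receipt IDs; the names check_live_coverage, CoverageResult, and the path labels are hypothetical, and treating an empty Inventory as a FAIL is an assumption rather than something the brick states:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageResult:
    covered: dict    # action path -> production receipt id
    uncovered: list  # paths that fired without a production receipt
    passed: bool


def check_live_coverage(inventory_paths, receipts_by_path):
    """Compare Inventory paths against receipts observed in production.

    inventory_paths: action-path names from the Coverage Inventory.
    receipts_by_path: path -> receipt id found in the production ledger
                      after driving that path live (absent or None if none).
    """
    paths = list(dict.fromkeys(inventory_paths))  # de-duplicate, keep order
    covered, uncovered = {}, []
    for path in paths:
        receipt_id = receipts_by_path.get(path)
        if receipt_id:
            covered[path] = receipt_id
        else:
            uncovered.append(path)
    # Assumption: an empty Inventory cannot pass (nothing was proven).
    passed = bool(paths) and not uncovered
    return CoverageResult(covered, uncovered, passed)
```

Under this sketch, a single path without a receipt (for example an internal agent-to-agent path) yields passed == False, and the uncovered list doubles as the missing-receipt evidence to record.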
Questions the agent asks (6)
  • What are the NAMED auto-abort thresholds carried forward from release readiness (error-rate, p95, 5xx, guardrail-incident count), and what is the post-deploy observation-window length over which they must hold?
  • What is the exact approved shipped SHA and the approved artifact hash recorded at the release-readiness GO gate, so the panel can assert prod == approved?
  • Where is the architecture's Honesty-Gate Coverage Inventory of action paths, and which paths are flagged as the known P0 bypass surfaces (in-app message hooks, internal/agent-to-agent paths) that must be live-probed for production receipt coverage?
  • Which production telemetry and ledger sources can the panel query directly (raw, not deployer-curated) from a clean vantage, and is read access provisioned for the non-author reviewers?
  • Is the rollback path the panel will invoke on a FAIL verdict the same tested one-click rollback validated at release readiness, and who/what executes it on a 'rolled-back' verdict?
  • Are there any business-context grey zones (client commitments, contractual SLAs) that should be surfaced as non-blocking human input rather than auto-failed at threshold?
Do (10)
  • Run this brick as a review_loop placed immediately AFTER deploy_go_live and BEFORE hypercare — it is the live-verification step that closes the GO gate review_release_readiness_iterate left open.
  • Staff the panel exclusively with reviewers who did NOT run the deploy: Verdict leads as the standing independent evaluator, with Proof, Cipher, and an adversarial red-team pass — independence is the source of the honesty guarantee.
  • RE-DERIVE every claim from a clean vantage against the exact shipped SHA: re-run the independently-authored smoke/health, recompute metrics from raw telemetry, re-hash the production artifact — never accept the deployer's summary, dashboard, or 'it works'.
  • Verify prod-served SHA == approved SHA and prod artifact-hash == approved hash as explicit binary criteria before anything else green can count.
  • Recompute live error-rate, p95, 5xx, and guardrail-incident count over the named window and compare each against its NAMED auto-abort threshold — any breach forces 'rolled-back'.
  • Perform the LIVE Honesty-Gate Coverage check: drive EVERY action path in the architecture Coverage Inventory in production and confirm each wrote a ledger receipt, explicitly probing the confessed P0 bypass surfaces (in-app hooks, internal/agent-to-agent paths), treating any uncovered path as a binary fail.
  • Run the adversarial red-team pass to try to forge a green verdict (stale SHA, replayed metrics, staging-not-prod, bypass path) and require every attack to be detected and to force a fail.
  • Iterate the GapLog to a binary, receipt-backed ConvergenceVerdict: 'go-live succeeded' only when open material gaps == 0 AND every non-author panel reviewer is SOLID; otherwise 'rolled-back' and invoke the tested rollback.
  • Emit the verdict as an immutable ledger receipt that carries the re-derived evidence and explicitly marks the review_release_readiness_iterate GO gate CLOSED.
  • Surface business-context grey zones as non-blocking human input and proceed to the binary verdict regardless — never wait on human sign-off.
Don't (8)
  • Don't let anyone who ran the deploy author or sign the verification — a deployer verifying their own deploy is not independent verification.
  • Don't read the deployer's post-deploy report, dashboard, or summary as evidence — if the panel didn't re-derive it, it doesn't count.
  • Don't emit 'go-live succeeded' while ANY named threshold is breached, the prod SHA/hash differs from approved, or any Honesty-Gate Inventory path fires without a production receipt — a single binary failure forces 'rolled-back'.
  • Don't treat the API-endpoint gate coverage as sufficient — the in-app and internal/agent-to-agent paths are the confessed P0 bypass and MUST be live-probed; skipping them is the failure mode this brick exists to catch.
  • Don't verify against staging, a cached build, or a prior observation window — the check is against live production and the exact shipped SHA only.
  • Don't insert a human approval gate or block the verdict awaiting sign-off — human input here is advisory grey-zone context only.
  • Don't soften, average, or narrate a marginal result into a pass — the verdict is strictly binary and receipt-backed.
  • Don't declare the GO gate closed by assertion — it is closed only by the emitted ledger receipt carrying the re-derived evidence.
Guardrails (7)
  • Independence is structural: the issuing panel never includes anyone who ran the deploy; Verdict leads as the standing evaluator that never reviews what it authored.
  • Honesty-by-re-derivation: the verdict is computed from the panel's own re-run smoke/health, recomputed raw-telemetry metrics, and re-hashed prod artifact — a false 'go-live succeeded' is structurally unsayable because the brick re-derives rather than reads.
  • The verdict is strictly binary and receipt-backed: 'go-live succeeded' is emittable IFF re-derived smoke/health PASS AND prod SHA == approved SHA AND every recomputed metric <= its named auto-abort threshold AND prod artifact-hash == approved hash AND live Honesty-Gate coverage == full Inventory with zero bypassing paths AND all red-team attacks detected AND open material gaps == 0 AND every non-author panel reviewer is SOLID; otherwise 'rolled-back'.
  • The LIVE Honesty-Gate Coverage check is mandatory and treats the confessed P0 (gate wired at one endpoint; in-app/internal paths bypass) as a binary live criterion — every Inventory action path must be proven receipt-covered in production, not asserted from the design.
  • This brick CLOSES the review_release_readiness_iterate GO gate and only via the emitted ledger receipt; no calendar, no assertion, no deployer summary closes it.
  • No human approval gate: this is a Verdict-led review_loop that iterates independently to a solid binary verdict; human input is non-blocking and confined to business-context grey zones.
  • On a 'rolled-back' verdict the brick invokes the tested rollback path rather than redeploying or patching in place; data-never-to-model and least-privilege hold on every telemetry/ledger read the panel performs from its clean vantage.
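The IFF conjunction that governs this brick's verdict can be written as one pure function over the panel's re-derived evidence. This is an illustrative sketch only: GoLiveEvidence, convergence_verdict, and every field name are hypothetical, and requiring non-empty thresholds, red-team results, and reviewer states is an added assumption so that an empty input cannot pass vacuously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoLiveEvidence:
    smoke_health_pass: bool
    prod_sha: str
    approved_sha: str
    prod_artifact_hash: str
    approved_artifact_hash: str
    metrics: dict             # metric name -> value recomputed from raw telemetry
    thresholds: dict          # metric name -> named auto-abort threshold
    inventory_paths: int      # paths listed in the Coverage Inventory
    receipt_covered_paths: int
    bypassing_paths: int
    red_team_detected: dict   # attack name -> True if the attack was detected
    open_material_gaps: int
    reviewer_states: dict     # non-author reviewer -> "SOLID" | "NOT-SOLID"


def convergence_verdict(e):
    """Return (verdict, failed_checks); any failed check forces 'rolled-back'."""
    checks = {
        "smoke_health": e.smoke_health_pass,
        "prod_sha": e.prod_sha == e.approved_sha,
        "artifact_hash": e.prod_artifact_hash == e.approved_artifact_hash,
        # A threshold with no recomputed metric counts as a failure.
        "metrics": bool(e.thresholds) and all(
            name in e.metrics and e.metrics[name] <= limit
            for name, limit in e.thresholds.items()
        ),
        "honesty_gate_coverage": (
            e.inventory_paths > 0
            and e.receipt_covered_paths == e.inventory_paths
            and e.bypassing_paths == 0
        ),
        "red_team": bool(e.red_team_detected)
        and all(e.red_team_detected.values()),
        "material_gaps": e.open_material_gaps == 0,
        "reviewers": bool(e.reviewer_states)
        and all(s == "SOLID" for s in e.reviewer_states.values()),
    }
    failed = [name for name, ok in checks.items() if not ok]
    verdict = "go-live succeeded" if not failed else "rolled-back"
    return verdict, failed
```

Returning the list of failed checks alongside the verdict keeps the receipt explanatory: a 'rolled-back' receipt can name exactly which binary criterion forced it.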
27

Hypercare & Stabilization

AI Agent: Vector

For a defined, bounded post-launch window, the AI delivery team (Vector lead, on-call) runs heightened, receipt-backed operations to prove the live system is genuinely stable before handover — not merely asserted stable. Vector confirms a binary window-START readiness gate (monitoring timer active, alert routes test-fired to on-call, dashboards/logs queryable, synthetic/health checks green, AND — because the delivered product is itself agentic — the operate-phase eval/LLMOps harness wired: golden eval-set version pinned, a baseline eval run green, drift/output-quality monitors live — each with a receipt) before declaring the window open, then operates against a PRE-FROZEN SLO/SLI + error-budget contract (from the architecture/infra brick), a PRE-FROZEN success-metric definition (Vela's query + data source), AND a PRE-FROZEN agentic quality/honesty golden eval-set + pass thresholds — none of which may be redefined mid-window to manufacture a pass. Every incident is classified on a Sev-1..Sev-4 taxonomy with MTTA/MTTR targets, escalation, and a mandatory blameless postmortem (action items tracked to closure) for every Sev-1/Sev-2; an eval regression or honesty-golden-set failure or material output-quality/model drift is a first-class Sev-classified incident, not a footnote. Because the system being operated is agentic, this brick owns ongoing LLMOps as a first-class operate-phase concern: continuous evaluation of the live model/prompt configuration against a maintained eval set, regression detection on the agentic quality/honesty golden-set, drift and output-quality monitoring of model outputs, and eval-set maintenance — with EVERY such claim receipt-backed (eval run id, eval-set version/hash, pass count, regression count) and any regression raised through the SAME gated hotfix pipeline. 
This goes BEYOND the per-sprint honesty-regression seeds (which are point-in-time, build-gate checks) and beyond FinOps cost-trend tracking: it is sustained operate-phase model-quality assurance on production traffic. Hotfixes flow ONLY through the SAME gated pipeline (Mason build / Lens review / Proof test / Cipher AppSec / Vector deploy) and each ships a complete RECEIPT BUNDLE (reproduced failing test/eval → green, Lens approval id, Cipher scan, post-deploy health 200, post-fix eval re-run where AI behavior changed, ledger row, git hash); a rehearsed rollback/kill-switch path with a drill receipt governs roll-back-vs-forward-fix. Vela tracks adoption and the success metric vs target with a reproducible query receipt (query text + row count + timestamp). The brick produces period health reports where EVERY green/met claim carries a linked receipt and every miss carries a receipt + remediation owner + ETA, with honest commentary that explicitly names known platform gaps (e.g. Truth Gate wired at one endpoint, CI honesty evals non-blocking) where they bound what the reports can truthfully claim. This is a kind=work brick: it PRODUCES the receipt-linked evidence that the immediately-following independent Verdict-led "Hypercare Exit" review_loop and the final Handover brick consume — Vector does not self-certify exit.
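The window-START readiness gate described above is a strict all-or-nothing check: an item counts only if it passed AND carries a receipt. A minimal sketch under that reading; ReadinessItem, window_may_open, and the example receipt paths are hypothetical illustrations, not names from the platform:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadinessItem:
    name: str
    passed: bool
    receipt: Optional[str]  # pointer to captured evidence, e.g. a log path or id


def window_may_open(items):
    """Return (open_allowed, blockers).

    An item blocks the open if it failed OR if it has no receipt;
    an empty checklist never opens the window.
    """
    blockers = [i.name for i in items if not (i.passed and i.receipt)]
    return (len(items) > 0 and not blockers), blockers
```

For example, a green baseline eval run that has no linked receipt would still appear in blockers, which matches the rule that an unproven claim blocks declaring the window open.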

Deliverables
[AI · Vector] Window-START readiness gate report — binary checklist proving heightened monitoring AND operate-phase eval coverage are real before the window opens
Acceptance: File exists listing each readiness item with PASS + a linked receipt: (a) flowtely-health-monitor.timer active (systemctl status output); (b) each alert route test-fired with a confirmation the page reached on-call (timestamped alert id); (c) dashboards reachable (URL + 200); (d) log aggregation returns rows for a probe query; (e) synthetic/health checks green (/health 200 logs); (f) operate-phase eval harness wired: the agentic quality/honesty golden eval-set version+hash is pinned, a BASELINE eval run executed in-window returns a receipt (eval run id, eval-set version, pass count, regression count == 0 vs frozen baseline), and the drift/output-quality monitor is live (probe query returns rows). Window is recorded as OPEN only after ALL items show PASS; any item without a receipt blocks the open.
[AI · Vector] Frozen SLO/SLI + error-budget contract reference — consumed, not authored, in this window
Acceptance: Report cites the upstream architecture/infra artifact path + git hash where SLO/SLI targets and the error budget were defined BEFORE the window opened, and asserts (with diff/hash evidence) that Vector did not redefine them after seeing data. Mid-window changes are either absent or each carries the independent panel's logged sign-off.
[AI · Vela] Success-metric definition + reproducible adoption/metric receipt
Acceptance: Artifact cites the pre-frozen success-metric query + data source (artifact path + git hash defined before the window) AND each reported value is reproducible: the exact query text, returned row count, and run timestamp are attached. No adoption/success number appears without its query receipt.
[AI · Proof] Operate-phase LLMOps / eval-drift program for the delivered product's own AI (frozen golden-set, ongoing eval runs, regression detection, drift/quality monitoring, eval-set maintenance) — receipt-backed
Acceptance: A first-class operate-phase artifact exists covering FOUR things, each receipt-backed: (1) FROZEN INPUTS — cites the agentic quality/honesty golden eval-set artifact path + version/hash and the pass/regression thresholds, all defined BEFORE the window (not chosen after seeing results); (2) ONGOING EVAL RUNS — a register of scheduled eval runs over the live model/prompt config, each row carrying eval run id, eval-set version/hash, run timestamp, pass count, fail/regression count, and the production config (model id + prompt hash) under test; (3) REGRESSION DETECTION — every run is compared to the frozen baseline and any regression on the quality/honesty golden-set is recorded with a delta receipt and a linked Sev-classified incident id (no silent regressions); (4) DRIFT/QUALITY MONITORING + EVAL-SET MAINTENANCE — output-drift/quality monitors on production traffic show probe receipts, and any eval-set change is a versioned, hash-stamped commit with rationale (additions for newly-discovered edge cases) that NEVER deletes a case to mask a failure. Every quantitative claim (pass rate, regression count, drift signal) carries its eval run id / monitor query receipt; a claim without a receipt fails acceptance. Artifact explicitly states how this exceeds the per-sprint honesty-regression seeds (point-in-time build checks) and FinOps cost-trend tracking.
[AI · Vector] Incident management sub-spec + incident register (eval regressions + model drift are first-class incidents)
Acceptance: Sub-spec defines (i) the Sev-1..Sev-4 taxonomy, with explicit guidance that a quality/honesty golden-set REGRESSION or material model/output DRIFT is severity-classified like any other incident; (ii) MTTA/MTTR targets; and (iii) the on-call rotation + escalation policy. Register lists every incident in the window with: severity, detection timestamp (alert OR eval-run receipt), acknowledgement time, resolution time, and link to fix/postmortem. For eval/drift-origin incidents the triggering eval run id or monitor receipt is linked. MTTA/MTTR per incident is computed from the timestamped receipts, not asserted.
[AI · Vector] Blameless postmortems with action items tracked to closure
Acceptance: Every Sev-1 and Sev-2 incident (including any eval-regression or drift Sev-1/Sev-2) has a postmortem document (timeline, root cause, contributing factors, action items each with an owner + status). Each action item is either CLOSED with a receipt (commit/test/eval-run/config link) or OPEN with owner + ETA. Count of Sev-1/Sev-2 incidents == count of postmortems (no postmortem gaps).
[AI · Mason+Lens+Proof+Cipher+Vector] Per-hotfix receipt bundles (gated pipeline, no bypass; eval re-run when AI behavior changes)
Acceptance: For EACH hotfix shipped in the window there is a bundle containing all of: reproduced failing test/eval → now-green id, Lens approval id, Cipher scan result, post-deploy /health 200 log, ledger receipt row, and git hash. ADDITIONALLY, any hotfix that changes model/prompt/agent behavior carries a post-fix eval re-run receipt (eval run id + eval-set version + regression count == 0 vs frozen baseline) proving the fix did not regress the agentic quality/honesty golden-set. A hotfix missing any required bundle element is NOT counted as 'shipped through the gate' in the period report. Zero hotfixes shipped outside this pipeline (deploy log cross-checked against bundles).
[AI · Vector] Rehearsed rollback / kill-switch procedure + drill receipt + decision rule
Acceptance: Document contains the rollback procedure, the explicit roll-back-vs-forward-fix decision rule, and a receipt from at least one executed rollback drill (timestamped log showing revert to prior git hash + post-rollback /health 200). For AI-behavior rollbacks the decision rule references re-running the golden eval-set post-rollback. Confirms the platform kill-switch / trading hard-block posture is intact during the window (status receipt).
[AI · Vector+Vela+Proof] Period health reports (receipt-linked, honest-commentary)
Acceptance: Each period report has five sections (uptime/SLOs vs frozen budget; incidents + resolution with MTTA/MTTR; gated hotfixes with bundle links; success-metric trend vs target with query receipts; AND agentic eval/drift health — pass rate vs frozen baseline, regression count, drift signals, eval-set version in force — each with an eval run id / monitor receipt). EVERY 'met/green/done' claim carries a linked receipt; EVERY miss carries a receipt + remediation owner + ETA. Commentary explicitly names any known platform honesty gaps (e.g. Truth Gate wired at one endpoint, CI honesty evals non-blocking) that bound the claims. A report containing any unlinked claim fails its own acceptance.
[AI · Vector] Binary hypercare exit-gate dossier (the input the downstream Verdict review_loop validates)
Acceptance: Dossier states each exit gate with a PASS/FAIL + linked receipt: (a) ≥ N consecutive days with zero open Sev-1/Sev-2; (b) error budget consumed within the frozen SLO budget; (c) P0/P1 hotfix backlog == 0; (d) success metric within target band for ≥ Z consecutive days OR an explicit sponsor-flagged variance note exists; (e) agentic quality/honesty golden eval-set at-or-above frozen pass threshold with zero open eval-regression incidents and no unresolved material drift; (f) every period report has a linked receipt for each claim. N/Z, the target band, the eval pass threshold and golden-set version are the pre-frozen values, not chosen after the fact. Dossier explicitly states 'NOT self-certified — pending independent Hypercare Exit review.'
[Human] Sponsor-surfaced hypercare inputs log (continuous input channel, never a gate)
Acceptance: A log captures sponsor-surfaced issues (adoption friction, success-metric corrections, AI-output quality complaints, missed edge cases) with timestamps and how each was actioned (including whether an item seeded a new eval-set case). Where the sponsor is silent on a stabilization question beyond the defined silence threshold T, an explicit flagged assumption is recorded in the period report and the team proceeds. No entry in this log is an approval/sign-off.
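The incident-register requirement that MTTA/MTTR be computed from timestamped receipts, not asserted, can be illustrated with a few lines of date arithmetic. A sketch only: the field names detected_at, acknowledged_at, and resolved_at are hypothetical, timestamps are assumed to be ISO-8601 strings with an explicit UTC offset, and measuring time-to-resolve from detection (rather than from acknowledgement) is an assumption the sub-spec would need to fix.

```python
from datetime import datetime
from statistics import mean


def minutes_between(start_iso, end_iso):
    """Elapsed minutes between two ISO-8601 timestamps."""
    delta = datetime.fromisoformat(end_iso) - datetime.fromisoformat(start_iso)
    return delta.total_seconds() / 60.0


def incident_timings(incident):
    """Per-incident time-to-acknowledge and time-to-resolve, in minutes."""
    return {
        "id": incident["id"],
        "tta_min": minutes_between(incident["detected_at"], incident["acknowledged_at"]),
        "ttr_min": minutes_between(incident["detected_at"], incident["resolved_at"]),
    }


def window_means(incidents):
    """MTTA/MTTR across the window, derived only from the register rows."""
    timings = [incident_timings(i) for i in incidents]
    if not timings:
        return {"incident_count": 0, "mtta_min": None, "mttr_min": None}
    return {
        "incident_count": len(timings),
        "mtta_min": mean(t["tta_min"] for t in timings),
        "mttr_min": mean(t["ttr_min"] for t in timings),
    }
```

Because the means are recomputed from the raw rows, a downstream reviewer can re-run the same function against the incident tracker export and compare results instead of trusting a reported figure.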
Questions the agent asks (8)
  • Sponsor: what is the single success metric for this system and its target value/band, and over how many consecutive days (Z) must it hold to count as stabilized?
  • Sponsor: how long is the hypercare window expected to run, and what business events (e.g. a usage peak, a billing cycle) must fall inside it before we exit?
  • Sponsor: what is the maximum acceptable user-facing downtime / Sev-1 blast radius during hypercare, and who on your side should be paged for a business-impacting Sev-1?
  • Sponsor: are there adoption-friction points, AI-output quality concerns, or edge cases you are already worried about that we should instrument and watch (and seed into the eval-set) from day one?
  • Sponsor: is the success-metric definition we froze in the spec still correct now that real users are on the system, or does it need correction (captured as an input, not an approval)?
  • Internal (Proof/Keystone): is the agentic quality/honesty golden eval-set frozen with a version/hash, and are the pass/regression thresholds defined, BEFORE the window opens — or is that a gap to close before we declare the window open?
  • Internal (Keystone/Vector): are the SLO/SLI targets and error budget actually frozen with a git hash in the architecture/infra artifact, or do we have a gap to close before the window opens?
  • Internal (Proof/Cipher): what production model/prompt config (model id + prompt hash) is under eval in this window, and is the drift/output-quality monitor sampling real production traffic rather than a synthetic-only stream?
Do (13)
  • Confirm and receipt the window-START readiness gate (monitoring active, alerts test-fired to on-call, dashboards/logs/health green, AND the operate-phase eval harness wired with a green baseline eval run + live drift monitor) BEFORE recording the window as open.
  • Consume the PRE-FROZEN SLO/error-budget contract, PRE-FROZEN success-metric query, AND PRE-FROZEN agentic quality/honesty golden eval-set + pass thresholds by path + version/hash; report against them as given.
  • Run ongoing operate-phase evals of the live model/prompt config against the maintained eval-set; record each run with an eval run id, eval-set version/hash, pass count, and regression count.
  • Detect regressions by comparing every eval run to the frozen baseline; raise any quality/honesty golden-set regression or material model/output drift as a Sev-classified incident through the SAME gated hotfix pipeline.
  • Maintain the eval-set additively: version and hash-stamp every change with rationale; add cases for newly-discovered edge cases and sponsor-surfaced quality complaints; never delete a case to mask a failure.
  • Route every hotfix through the full Mason/Lens/Proof/Cipher/Vector gate; attach the complete per-fix receipt bundle, and where the fix changes AI behavior attach a post-fix eval re-run receipt proving zero golden-set regression.
  • Classify every incident on the Sev taxonomy; write a blameless postmortem with closure-tracked action items for every Sev-1/Sev-2, eval-regression and drift incidents included.
  • Make every adoption/success-metric number reproducible (query text + row count + timestamp) and every eval/drift number reproducible (eval run id + eval-set version + counts).
  • Structure each period report so every green/met claim links to a receipt and every miss carries a receipt + owner + ETA, including the agentic eval/drift health section.
  • State honestly where known platform gaps (e.g. Truth Gate single-endpoint, CI honesty evals non-blocking) limit what a report can truthfully claim.
  • Capture sponsor-surfaced issues as continuous inputs; where the sponsor is silent past T, proceed on an explicit, flagged assumption recorded in the report.
  • Rehearse rollback at least once and keep the kill-switch/trading-block posture verifiably intact during the window.
  • Hand the exit-gate dossier to the independent Verdict-led Hypercare Exit review_loop and let it validate — do not declare exit here.
Don't (12)
  • Don't self-certify hypercare exit or declare 'stable, proceed to handover' — that decision belongs to the independent Verdict-led review_loop.
  • Don't ship any fix outside the gated pipeline, even for an emergency or an eval regression; an emergency is not a license to bypass review, test, AppSec, or the post-fix eval re-run.
  • Don't redefine SLO targets, severity thresholds, the success metric, the golden eval-set, or its pass thresholds mid-window to manufacture a pass.
  • Don't delete, weaken, or silently re-version eval-set cases to make a failing run pass; eval-set changes are additive, versioned, hash-stamped, and rationale-logged.
  • Don't treat eval regressions or model/output drift as cosmetic; they are first-class Sev-classified incidents with postmortems like any other.
  • Don't conflate operate-phase LLMOps with the per-sprint honesty-regression seeds or FinOps cost-trend tracking — this is sustained model-quality assurance on production traffic, beyond those point-in-time checks.
  • Don't let new features or scope enter the emergency pipeline; stabilization and eval-regression fixes only.
  • Don't report any 'uptime met', 'SLO green', 'adoption at X', or 'eval pass rate Y' claim without its linked receipt (health log, monitor record, query + row count + timestamp, or eval run id + eval-set version).
  • Don't write spin: do not soften or omit misses, under-report incident severity, hide an open Sev-1, or bury an eval regression.
  • Don't treat the sponsor input log as an approval gate or block work waiting on a sponsor reply.
  • Don't run hotfixes without a rehearsed rollback path and a clear roll-back-vs-forward-fix rule.
  • Don't exit on an arbitrary date; exit only when the binary gates (including the agentic eval/drift gate) are met and independently verified.
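The prohibition above on deleting, weakening, or silently re-versioning eval-set cases can be enforced mechanically by diffing two versions of the set and hash-stamping the result. A minimal sketch, assuming the eval-set is a JSON-serialisable mapping of case id to case definition; eval_set_hash and check_additive are hypothetical names, and treating ANY modification of an existing case as a violation is a strict reading adopted here as an assumption.

```python
import hashlib
import json


def eval_set_hash(cases):
    """Stable SHA-256 over a canonical JSON encoding of the eval-set."""
    canonical = json.dumps(cases, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_additive(old_cases, new_cases):
    """Classify an eval-set change; only pure additions count as additive."""
    removed = sorted(set(old_cases) - set(new_cases))
    added = sorted(set(new_cases) - set(old_cases))
    modified = sorted(
        case for case in old_cases
        if case in new_cases and old_cases[case] != new_cases[case]
    )
    return {
        "added": added,
        "removed": removed,
        "modified": modified,
        "additive": not removed and not modified,
        "old_hash": eval_set_hash(old_cases),
        "new_hash": eval_set_hash(new_cases),
    }
```

A change record with additive == False would be rejected before the new version is pinned; the old and new hashes give the versioned commit its stamp.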
Guardrails (12)
  • Honesty architecture: any 'met/green/done/live' claim lacking a linked receipt is unsayable and auto-rejected by the downstream review — same logic as an agent claiming an unperformed action; eval/drift claims must carry an eval run id + eval-set version; commentary must surface known coverage gaps rather than overstate.
  • Agentic-system operate-phase rule: because the delivered product is itself agentic, ongoing model/prompt evaluation against a maintained golden-set, regression detection, and drift/output-quality monitoring are FIRST-CLASS operate-phase duties — not optional and not satisfied by build-time honesty seeds alone.
  • Frozen-eval-set rule: the agentic quality/honesty golden eval-set and its pass/regression thresholds must pre-exist (cited by version/hash) before the window opens; they may not be authored or weakened after seeing production results, and eval-set edits are additive-only and hash-stamped.
  • Separation of duties: Vector operates and produces evidence but does NOT certify exit; the operator is not the reviewer. Exit is validated by an independent panel (Verdict + Cipher + Proof + Vela) that did not run on-call.
  • Same-gate rule: 100% of hotfixes — including eval-regression fixes — go through Mason/Lens/Proof/Cipher/Vector with a full receipt bundle (plus a post-fix eval re-run when AI behavior changes); the deploy log is cross-checked against bundles to prove zero bypass.
  • No mid-window goalpost moves: no SLO, severity, success-metric, eval-set, or eval-pass-threshold target may be changed during the window to engineer a pass; any change requires the independent panel's logged sign-off.
  • Eval-regression-as-incident: a quality/honesty golden-set regression or material model/output drift must be raised as a Sev-classified incident and resolved through the gated pipeline, with a postmortem if Sev-1/Sev-2 — it is never silently tolerated.
  • Stabilization-only: new features/scope return to normal sprint planning or a new process run — the emergency pipeline is for stabilization and eval-regression fixes only.
  • Frozen-contract rule: SLO/error-budget and success-metric definitions must pre-exist (cited by git hash) before the window opens; Vector/Vela may not author the targets after seeing the data.
  • Readiness-before-open: 'heightened monitoring' AND the operate-phase eval harness (pinned golden-set, green baseline run, live drift monitor) must be proven by the window-START readiness receipts; an unproven monitoring or eval claim blocks declaring the window open.
  • Human-as-input-not-gate: the sponsor provides information and corrections continuously and is never placed in an approval gate; silence past T yields an explicit flagged assumption, not a stall.
  • Kill-switch/trading hard-block posture must remain intact and verifiable for the entire window.
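The same-gate rule's cross-check of the deploy log against receipt bundles can be sketched as a join on git hash. Illustrative only: audit_hotfixes, the bundle field names, and the changes_ai_behavior flag are hypothetical stand-ins for whatever the real bundle schema uses.

```python
# Hypothetical field names for the required bundle elements.
REQUIRED_FIELDS = (
    "failing_test_now_green_id",
    "lens_approval_id",
    "cipher_scan_result",
    "post_deploy_health_log",
    "ledger_receipt_row",
    "git_hash",
)


def audit_hotfixes(deployed_hashes, bundles):
    """Cross-check deployed git hashes against per-hotfix receipt bundles.

    deployed_hashes: git hashes taken from the deploy log.
    bundles: dict of git hash -> bundle dict.
    """
    bypassed = [h for h in deployed_hashes if h not in bundles]
    incomplete = {}
    for h in deployed_hashes:
        bundle = bundles.get(h)
        if bundle is None:
            continue
        required = list(REQUIRED_FIELDS)
        # AI-behavior changes additionally need a post-fix eval re-run receipt.
        if bundle.get("changes_ai_behavior"):
            required.append("post_fix_eval_rerun_receipt")
        missing = [field for field in required if not bundle.get(field)]
        if missing:
            incomplete[h] = missing
    return {
        "bypassed": bypassed,
        "incomplete": incomplete,
        "all_through_gate": not bypassed and not incomplete,
    }
```

A deploy with no bundle at all shows up under bypassed, while a bundle missing an element shows up under incomplete; either one means the hotfix is not counted as shipped through the gate.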
28

Independent Review & Iterate: Hypercare Exit & BAU-Handover Readiness

AI Agent: Verdict

This is the lifecycle's final brick, and it retires the two human gates that traditionally end a program: the self-declared "hypercare exit" and the hidden "handover acceptance" sign-off. It replaces both with a single AI-owned independent-review-and-iterate loop in which an independent panel — led by Verdict (independent evaluator) and joined by relevant specialists who did NOT author the stability story or the handover docs (Cipher for security evidence, Proof for test/quality evidence, Vector/Proof for runbook operability, Keystone for architecture completeness) plus an adversarial red-team pass — converts "exit/handover" from a ceremony into an evidence-converged verdict. Verdict does not read the hypercare team's dashboard or summaries; it independently RE-DERIVES every exit metric from the source systems (observability/SLO platform, incident tracker, value-realization/usage data, cost telemetry, security-scan and CI receipts) and FAILS any claim it cannot reproduce from source. Convergence is a strict four-part conjunction: (A) zero open material gaps under a binary materiality rubric, (B) every hypercare exit criterion source-reproduced green and sustained over a defined window N (not a single quiet day), (C) every BAU-handover artifact proven OPERABLE by an actual dry-run / game-day / restore drill (not asserted), and (D) reviewer independence attested. The human (sponsor) is strictly an input channel — proactively surfaced for the things only they can know (is real business value being realized vs. the captured baseline, are there real-world consequences the metrics missed, is the receiving BAU team real and named) — non-blocking, with an explicit flagged assumption recorded on silence; the human never approves. 
The loop iterates author→panel critique→gaps logged→team fixes→re-review and emits a single ledger-receipt verdict; it never auto-flips to "converged" while a material gap is open, and a persistent unfixable material gap is surfaced to the human as flagged non-blocking input AND recorded as an explicit accepted-risk in the verdict rather than silently reclassified.
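Part (B) of the convergence conjunction requires a criterion to hold across the whole window N rather than on a single quiet day. One way to express that, as a sketch under stated assumptions: sustained_over_window is a hypothetical helper, values are one reading per day in chronological order, and "below the threshold" is taken as a strict less-than.

```python
def sustained_over_window(daily_values, threshold, window_n):
    """True only if each of the LAST window_n daily values is below threshold.

    Fewer than window_n days of data can never satisfy the criterion.
    """
    if window_n <= 0 or len(daily_values) < window_n:
        return False
    return all(value < threshold for value in daily_values[-window_n:])
```

With daily inflow [9, 2, 1, 3, 2] and a threshold of 5, the check holds for N = 4 but not for N = 5, because the day-one spike still falls inside the longer window.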

Deliverables
[AI · Verdict] Receipt-Reproduction Pass — every exit metric re-derived from source systems
Acceptance: For EACH hypercare exit criterion, the verdict records: the claimed value, the SOURCE system it was independently pulled from (SLO/observability platform, incident tracker, value-realization/usage store, cost telemetry, CI/security-scan store), the value Verdict reproduced from that source, and a binary REPRODUCED=true/false. Every success-metric receipt names the pre-go-live BASELINE it improved on; any success claim that cannot point to a captured baseline is marked REPRODUCED=false. Any criterion where Verdict's reproduced value does not match the claimed value, or cannot be reproduced from source, is REPRODUCED=false and auto-opens a material gap. This pass runs BEFORE any gap-materiality classification.
[AI · Verdict] Independence Attestation (reviewer != author)
Acceptance: The verdict records, per reviewer, a binary attestation that the reviewer did NOT author the hypercare ops/stability story (this excludes the war-room operators, e.g., Vector/Cadence, from certifying exit) and did NOT author the handover docs being certified. If the only available reviewer for a given artifact authored it, that is recorded as a material gap (not waived). Verdict itself attests it did not run hypercare operations. All independence fields must read true for convergence.
[AI · Verdict] Checklist A — Hypercare Exit Gate (binary, sustained over window N)
Acceptance: Binary green/red per criterion, all reproduced from source per the Receipt-Reproduction Pass: (1) zero open Sev-1 and zero open Sev-2 incidents; (2) issue/defect inflow below the pre-agreed threshold SUSTAINED over the full window N (not a single day); (3) SLO burn-rate within error budget across window N; (4) on-call coverage and escalation path proven (named, reachable, tested); (5) adoption/usage stable or rising vs. baseline across window N; (6) run-cost within the agreed envelope; (7) rollback proven (see Checklist B) and a hypercare RE-ENTRY trigger defined. Window N is recorded explicitly. Gate is green only if ALL criteria are green; any red is a material gap by the rubric.
[AI · Vector + Proof, certified by Verdict] Checklist B — BAU-Handover Operability (proven by execution, not asserted)
Acceptance: Each handover artifact carries a binary OPERABLE result proven by an executed test, not by existence: (a) architecture + design docs present and current vs. shipped system (Keystone confirms); (b) EVERY runbook passed a dry-run / game-day execution by an operator who did not write it; (c) a restore drill from backup executed and succeeded within the stated RTO/RPO; (d) a rollback executed and succeeded in a non-prod or controlled window; (e) ops docs (monitoring, alerts, on-call, SLOs) executable; (f) test evidence (Proof) and security evidence (Cipher: latest scan results, CI security gate receipts, no open critical/high) attached and reproduced; (g) known-limitations register and next-release backlog present; (h) named receiving BAU team identified and reachable. Any artifact that exists but was never executed reads OPERABLE=false (a material gap).
[AI · Verdict red-team] Premature-Exit Adversarial Pass
Acceptance: A named adversarial sub-pass records binary findings on: (1) incident-suppression/misclassification check — were incidents downgraded or closed to keep counts low (cross-checked against raw incident-tracker events)? (2) quiet-period-is-real check — was the low-inflow window a deployment/change FREEZE rather than genuine stability? (3) subtly-wrong-output check — are incorrect-but-non-erroring agent/system outputs being counted against error budget per the house LLMOps model, or invisibly passing? Each finding is PASS or opens a material gap. Exit cannot converge with an open red-team finding.
[Human] Sponsor input on value-realized, real-world misses, and named receiving BAU team (non-blocking)
Acceptance: The team PROACTIVELY surfaces to the human three input questions only they can answer (is real business value being realized vs. the captured baseline; are there real-world consequences the metrics missed; is the receiving BAU team real, named, and ready) via the continuous input channel. If the human responds, the input is incorporated and cited in the verdict. If the human is silent within the surfacing window, the team proceeds on an EXPLICIT flagged assumption recorded verbatim in the verdict. The human does NOT approve, sign off, or gate; this deliverable is input-only.
[AI · Verdict] Convergence Verdict (ledger receipt) with iteration log and accepted-risk register
Acceptance: A single ledger-receipt verdict that converges ONLY when (open material gaps == 0) AND (every Checklist A criterion source-reproduced green over window N) AND (every Checklist B artifact OPERABLE=true via executed test) AND (independence attested) AND (no open red-team finding). The receipt enumerates: every exit criterion + source + reproduced value + pass/fail; every handover artifact + operability-test result; every gap opened/fixed/accepted with timestamps; the iteration count; the independence attestation; the human-input citations or the flagged assumption on silence; and any explicit accepted-risk entries. Any gap moved to known-limitations/backlog is listed with its surfacing on the human input channel and is confirmed to be a non-exit-blocker. If a material gap is persistent and unfixable, the verdict reads NOT CONVERGED → escalated-as-accepted-risk (surfaced to human, non-blocking) and never auto-flips to converged.
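The conjunction in the acceptance criterion above can be sketched in a few lines of Python. This is a minimal illustration only: the `ExitEvidence` class, its field names, and the `verdict` function are hypothetical and not part of the process template, and the booleans are assumed to have been produced by the reproduction, operability, independence, and red-team passes described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitEvidence:
    """Hypothetical roll-up of what the verdict receipt enumerates."""
    open_material_gaps: int        # gaps still open in the GapLog
    checklist_a_green: bool        # every exit criterion reproduced green over window N
    checklist_b_operable: bool     # every handover artifact OPERABLE=true via executed test
    independence_attested: bool    # every reviewer attested non-authorship
    open_red_team_findings: int    # unresolved adversarial findings
    persistent_unfixable_gap: bool = False  # only meaningful while a gap is open


def verdict(e: ExitEvidence) -> str:
    """Return the verdict label; converged only if ALL conditions hold."""
    converged = (
        e.open_material_gaps == 0
        and e.checklist_a_green
        and e.checklist_b_operable
        and e.independence_attested
        and e.open_red_team_findings == 0
    )
    if converged:
        return "CONVERGED"
    # A persistent, unfixable gap is escalated as accepted risk; it never flips to converged.
    if e.persistent_unfixable_gap:
        return "NOT CONVERGED -> escalated-as-accepted-risk"
    return "NOT CONVERGED"


print(verdict(ExitEvidence(0, True, True, True, 0)))
print(verdict(ExitEvidence(1, True, True, True, 0, persistent_unfixable_gap=True)))
```

The escalation label is only reachable when convergence has already failed, so an accepted-risk entry can never produce a converged verdict.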
Questions the agent asks (5)
  • What is the sustained-stability window N (e.g., 7/14/30 days) over which exit criteria must hold, and what is the pre-agreed issue-inflow threshold below which inflow must stay across that window?
  • Where are the authoritative SOURCE systems Verdict must re-derive each metric from — which SLO/observability platform, which incident tracker, which value-realization/usage store, which cost telemetry, and which CI/security-scan receipt store?
  • Was a pre-go-live BASELINE captured for each success metric, and where is it stored — so every 'success metric met' claim can be anchored to the baseline it improved on?
  • Is real business value being realized against that baseline, are there real-world consequences the dashboards may have missed, and is the receiving BAU team real, named, and ready to operate? (surfaced to the sponsor as input, non-blocking)
  • What are the agreed RTO/RPO for the restore drill and the cost envelope for the run-cost criterion, and what event reopens the war room (the hypercare RE-ENTRY trigger) if BAU destabilizes post-handover?
Do (7)
  • Re-derive EVERY exit metric independently from the source system (observability, incident tracker, value/usage data, cost telemetry, CI/security receipts) and FAIL any claim Verdict cannot reproduce from source — run this reproduction pass BEFORE classifying any gap's materiality
  • Hold exit criteria over the full sustained window N, not a single quiet day, and treat zero open Sev-1/Sev-2, SLO burn within error budget, stable adoption, and in-envelope cost as binary gates
  • Prove handover OPERABILITY by execution — run each runbook as a dry-run/game-day by an operator who did not write it, execute a restore drill within RTO/RPO, and execute a rollback — never accept a never-run document as a capability
  • Run the adversarial premature-exit pass every iteration: check for incident suppression/misclassification, freeze-masquerading-as-stability, and subtly-wrong (non-erroring) outputs consuming error budget
  • Attest reviewer independence explicitly (reviewer != author of the hypercare story and != author of the handover docs); exclude the war-room operators from certifying their own exit
  • Proactively surface to the human the things only they can know (real value vs. baseline, real-world misses, the named receiving BAU team) as non-blocking input, and record verbatim either their answer or the flagged assumption taken on silence
  • Make the verdict itself a ledger receipt: every criterion, source, reproduced value, operability result, gap lifecycle with timestamps, iteration count, independence attestation, and accepted-risk register — claim only what the receipt proves
Don't (7)
  • Do NOT accept any receipt, SLO number, incident count, or success metric that Verdict cannot independently reproduce from the source system — a non-reproducible receipt is a bluff and a material gap, never a pass
  • Do NOT reintroduce a human approval/sign-off anywhere: the human is input only, and 'exit' and 'handover acceptance' are evidence-convergence verdicts, not a matter of a reviewer or a human feeling confident
  • Do NOT silently reclassify an open material gap into 'known limitations' or 'next-release backlog' to force convergence — any such move must be a non-exit-blocker, surfaced on the human input channel, and recorded in the verdict
  • Do NOT mark a runbook, restore, or rollback OPERABLE because it exists or 'should work' — only an executed, passing dry-run/game-day/restore/rollback counts
  • Do NOT exit on a lucky quiet day, a change/deployment freeze, or by downgrading/misclassifying incidents to keep counts low — the sustained window and the red-team pass exist to catch exactly this
  • Do NOT let the war-room operators (who authored the stability story) certify their own exit, and do NOT let Verdict claim independence it cannot attest
  • Do NOT auto-flip the loop to 'converged' while any material gap, unreproduced metric, failed operability test, open red-team finding, or unattested independence remains
Guardrails (9)
  • Convergence is a strict five-part conjunction enforced by the verdict receipt: open material gaps == 0 AND every exit criterion source-reproduced green over window N AND every handover artifact OPERABLE by executed test AND independence attested AND no open red-team finding; anything less stays OPEN or escalates as accepted-risk
  • Binary materiality rubric (non-negotiable): any open Sev-1/Sev-2, any unmet/unreproduced SLO, any runbook/restore/rollback that fails or was never executed, any missing or unreproduced security/test evidence, and any unanchored (no-baseline) success claim are MATERIAL by definition and cannot be argued down
  • Receipt provenance: the reviewer recomputes from raw source systems; a number read only from the team's dashboard/summary does not satisfy the criterion and is a material gap
  • Independence is a precondition, not a formality: reviewer must be a non-author of the artifact being certified; if the only available reviewer is the author, that is itself a material gap
  • No-silent-reclassification: a gap may move to known-limitations/backlog ONLY if it is not an exit blocker AND is explicitly surfaced to the human input channel AND recorded in the verdict — never silently
  • Human stays strictly input/non-blocking with a flagged-assumption fallback on silence; the human never approves, signs off, or gates this brick
  • Non-convergence semantics: a persistent unfixable material gap is surfaced to the human as flagged non-blocking input and recorded as an explicit accepted-risk in the verdict; the loop never auto-converges on an open material gap
  • Handover scope must include a proven rollback and a defined hypercare RE-ENTRY trigger so BAU can both run and safely revert the system — exit is not treated as one-directional
  • Honesty architecture: the verdict may claim only what a reproducible receipt proves; every 'met/operable/exited' assertion is backed by a source-reproduced receipt, an executed-test result, or a reviewer verdict — never asserted
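Checklist A requires issue inflow to stay below the pre-agreed threshold across the whole window N, not on a single quiet day. A minimal Python sketch of that check follows. The function name, its parameters, and the use of per-day counts are assumptions for illustration; the counts are assumed to have been re-derived from the raw incident tracker rather than read from a dashboard.

```python
from typing import Sequence


def inflow_sustained_below(daily_inflow: Sequence[int], threshold: int, window_n: int) -> bool:
    """True only if EACH of the last `window_n` days had inflow strictly below `threshold`.

    `daily_inflow` is ordered oldest to newest, one count per day.
    """
    if window_n <= 0 or len(daily_inflow) < window_n:
        # Fewer observed days than the window: stability cannot be proven, so the gate stays red.
        return False
    return all(count < threshold for count in daily_inflow[-window_n:])


# One noisy day inside the window turns the criterion red, however quiet the other days were.
print(inflow_sustained_below([9, 2, 1, 0, 2, 1, 1, 0], threshold=3, window_n=7))
print(inflow_sustained_below([1, 1, 1, 5, 1, 1, 1], threshold=3, window_n=7))
```

Returning False when the window is not yet fully observed mirrors the conservative default used elsewhere in the process: an unproven criterion is red, not green.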
29

Handover, Documentation & Continuous Improvement

AI Agent: Cadence

Close the engagement without anyone "declaring victory": execute the closure as a work brick — assemble the complete handover package (architecture, executable runbooks, ops docs, test + security evidence, a structured known-limitations register, a prioritized next-release backlog, and a HANDOVER RECEIPT MANIFEST that maps every closure claim to a real receipt id), run a full end-to-end delivery retrospective that emits concrete template diffs (not a dead lessons-learned doc), transition the system from hypercare to a genuinely-owned BAU state, and re-open the continuous-discovery loop for the next release. Every "done / passed / secure / live" claim must be backed by an inspectable receipt (test run id + counts, security scan id + result, health-check timestamps + codes, git hash, runbook-execution transcript, prior independent-review verdict ids); an unbacked claim blocks closure exactly like the Truth Gate blocks an unbacked agent action. The human (sponsor) is an INFORMED RECIPIENT and a CONTINUOUS INPUT CHANNEL — they receive a walkthrough of the LIVE system (per the Discovery Step-23 precedent: walk the running platform, never a deck), and their questions / new asks flow into the next-release backlog as input; they do NOT sit in an approval gate and there is NO sign-off. Where the human is silent the team closes on an explicit FLAGGED assumption logged in the manifest rather than waiting. This work brick is immediately followed by the independent review_loop brick `handover_acceptance_review` (Verdict + Proof + Cipher + Vector, none of whom authored the package), so the final and most consequential claim — "the engagement is done" — is itself independently reviewed against the 7 objectives and the receipt manifest, never self-asserted.

Deliverables
[AI · Cadence] Handover package manifest — enumerates every required artifact (architecture & design docs, runbooks, ops docs, test evidence, security evidence, known-limitations register, next-release backlog, retrospective, receipt manifest, BAU-readiness checklist) with each item's location, version/git-hash, and reviewer-verified flag.
Acceptance: Manifest lists 100% of required artifacts; every row has a resolvable location + git hash; zero rows missing or with reviewer_verified=false at closure.
[AI · Cadence] HANDOVER RECEIPT MANIFEST — a table mapping every closure claim ('deployed', 'tests pass', 'security clean', 'runbook works', 'reviewed solid') to a concrete receipt id (test run id + pass/fail counts, security scan id + result, health-check timestamps + HTTP codes, git hash, runbook-execution transcript id, prior independent-review verdict id).
Acceptance: Count of closure claims with no linked receipt id = 0; every receipt id resolves to an inspectable artifact; manifest is committed to the repo.
[AI · Keystone] Architecture & design handover doc — current as-built architecture, data model, integration map, and key decision log, reconciled against the running system.
Acceptance: Doc git hash matches the deployed release tag; every component in the architecture diagram maps to a deployed service confirmed by a health-check receipt id; no component listed that is not running and no running service omitted.
[AI · Vector] Verified-executable runbook set — deploy, rollback, restart, backup/restore, scale, incident-response, and routine-ops runbooks, each executed end-to-end in a clean context against the live/staging environment.
Acceptance: Each runbook has an execution transcript with captured commands + exit codes; 100% of runbooks ran to completion with exit code 0 (or a documented expected non-zero code); transcript ids are listed in the receipt manifest; each runbook is tagged with a 'Last verified against the live environment' date.
[AI · Vector] BAU transition pack + BAU-readiness checklist — per-service SRE ownership map, SLOs + alerting wired, on-call/escalation path, rehearsed rollback, live monitoring dashboards, cost/quota ownership, and the binary hypercare-exit criteria.
Acceptance: Checklist is all-green: every running service has a named owner; at least one test alert fired and was received (alert receipt id); rollback rehearsal has a transcript receipt; dashboards return 200 (health-check receipt); cost/quota owner assigned for every paid resource; hypercare-exit criteria (e.g. N days stable, no open Sev-1/Sev-2, error budget intact, on-call validated) are stated and each is met with an evidence receipt id.
[AI · Proof] Test & security evidence bundle — final test-suite run, coverage summary, and current security scan results (SAST/DAST/dependency/SARIF) collected as receipts, with Cipher confirming currency.
Acceptance: Test bundle has a run id with explicit pass/fail/skip counts and the run timestamp is within the closure window on the release git hash; security scan has a scan id + result and timestamp within the closure window; both ids are linked in the receipt manifest; zero claimed-passed items without a counted receipt.
[AI · Cipher] Known-limitations register — structured: each item = description + severity + workaround + owner + target release, cross-linked to the next-release backlog, audited so no real defect is relabeled as a 'limitation'.
Acceptance: Every register row has all five fields populated and a backlog link; Cipher red-team audit recorded a verdict that no entry is a silently-failing defect mislabeled as a limitation; any item found to be a real failure is moved to a backlog defect with a fail-receipt.
[AI · Vela] Prioritized next-release backlog — each item with priority, estimate, and rationale, traceable to a limitation, a retro lesson, or a human-input question.
Acceptance: 100% of backlog items have priority + estimate + rationale; every limitations-register item with a future target release has a matching backlog item; human questions captured during handover appear as backlog items with source='human-input'.
[AI · Cadence] End-to-end delivery retrospective with template diffs — facilitated retro across the delivery team that converts each material lesson into a concrete committed diff to THIS process template (tightened acceptance criterion, new guardrail, new reviewer lens, or a new honesty-eval/golden case seeded from any incident in this run).
Acceptance: Retro doc exists and lists >=1 committed template diff (a real patch / PR id) per material lesson; any incident from this run is seeded as a new honesty-eval/golden case that fails-before / passes-after (test ids recorded); count of material lessons with no resulting template diff = 0.
[Human] Informed handover walkthrough record + continuous-input intake — a live walkthrough of the running system + limitations register + next-release backlog delivered to the sponsor (never a deck); the human provides questions/new requirements as input, captured into the backlog.
Acceptance: Walkthrough was delivered against the LIVE system (session/recording receipt id logged, deck-as-deliverable count = 0); every human question/new ask is logged as a backlog item with source='human-input'; if the human is silent, an explicit flagged closure assumption is recorded in the receipt manifest. No 'sign-off' field exists or is required anywhere in the package.
[AI · Vela] Re-opened continuous-discovery loop + documented closure verdict — the next-release discovery loop is opened with seed inputs (backlog + limitations + human input), and a single documented closure verdict records that the done-definition is met.
Acceptance: Next-release discovery loop is open with a link to its seed backlog; closure verdict explicitly confirms all of: package manifest complete, receipt manifest fully backed, handover_acceptance_review verdict = closure-solid, BAU checklist green, retro template-diffs committed, discovery loop opened — with the receipt/verdict ids it relied on cited.
[AI · Cadence] DELIVERED package + proof: the self-deploying package handed to the sponsor (download/repo link) together with the clean-room deploy_receipt + DEPLOYED-AND-VERIFIED verdict and the adequacy attestation — i.e. a real, tested, locally-deployed, deployable-anywhere product.
Acceptance: The sponsor receives the package and can deploy it from its deploy.md; the clean-room verdict (DEPLOYED-AND-VERIFIED, package hash) + adequacy attestation are linked as the go-live evidence. No 'done' claim without these receipts.
[AI · Cadence] Rich HTML PROCESS WALKTHROUGH (process_walkthrough.html): branded, self-contained HTML that walks the sponsor BRICK-BY-BRICK through the delivery process that ran — phase-grouped, each brick with objective, owner/agent, deliverables + acceptance, and key guardrails — surfaced as a tab/link in the handover.
Acceptance: Single self-contained HTML renders every brick in order, grouped by phase; linked from handover; opens standalone in a browser.
[AI · Iris+Vela] THREE rich HTML PRODUCT documents about the delivered software, each branded + self-contained, scoped to the product's sizing level: (a) PRODUCT WALK-THRU — what it does, the key user journeys, screens/flows; (b) TRAINING MANUAL — task-based how-to for end users AND admins; (c) USER & REFERENCE MANUAL — complete feature / API / configuration reference.
Acceptance: All three self-contained HTML docs are delivered + linked in the handover; each covers the ACTUAL delivered functionality at the chosen sizing; Walk-Thru is narrative, Training is task-based step-by-step, Reference is exhaustive (every feature/endpoint/config).
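The HANDOVER RECEIPT MANIFEST acceptance rule (zero closure claims without a linked, resolvable receipt id) lends itself to a mechanical check. The sketch below is illustrative only: the manifest is modelled as a plain mapping from claim to receipt id, and `unbacked_claims`, the claim names, and the ids such as `hc-001` are all hypothetical.

```python
from __future__ import annotations

from typing import Mapping, Optional


def unbacked_claims(manifest: Mapping[str, Optional[str]], resolvable_ids: set[str]) -> list[str]:
    """Return closure claims whose receipt id is missing or does not resolve to an artifact."""
    return sorted(
        claim
        for claim, receipt_id in manifest.items()
        if not receipt_id or receipt_id not in resolvable_ids
    )


manifest = {
    "deployed": "hc-001",        # health-check receipt (hypothetical id)
    "tests pass": "run-042",     # test run receipt (hypothetical id)
    "security clean": None,      # claim with no receipt at all
    "runbook works": "tx-999",   # id that does not resolve
}
resolvable = {"hc-001", "run-042"}

blocked = unbacked_claims(manifest, resolvable)
print(blocked)  # closure is blocked while this list is non-empty
```

A claim with an id that fails to resolve is treated exactly like a claim with no id, which matches the requirement that every receipt id resolve to an inspectable artifact.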
Questions the agent asks (5)
  • Who are the named post-engagement owners (on-call, escalation, cost/quota) for each running service, and is there any handover-of-ownership constraint we should encode in the BAU ownership map?
  • Are there operational constraints (maintenance windows, change-freeze periods, compliance/audit retention) the runbooks and BAU transition must respect?
  • What outcomes or open questions from this release matter most to you for the next release — so we seed the next-release backlog with your priorities, not just ours?
  • Are there any limitations or risks you consider unacceptable to ship as 'known limitations' versus must-fix before BAU? (Input only — used to re-triage the register, not as a gate.)
  • Is there a preferred format/cadence for ongoing visibility into the running system (dashboards, periodic summary) you want wired before we close hypercare?
Do (8)
  • Back every closure claim with an inspectable receipt id and put it in the receipt manifest before declaring anything done; treat 'it works' as a non-receipt that blocks closure.
  • Execute every runbook end-to-end in a clean context against the live/staging environment and capture commands + exit codes as the receipt — observed receipt beats reported claim.
  • Hand over by walking the human through the LIVE running system (per Discovery Step-23), with the limitations register and next-release backlog open, and capture their questions as backlog input.
  • Convert every material retro lesson into a committed template diff, and seed any incident from this run as a permanent honesty-eval/golden case (fails-before / passes-after).
  • Structure the known-limitations register with severity + workaround + owner + target release per item, cross-linked to the backlog, and audit it for relabeled defects.
  • State and evidence the binary hypercare-exit criteria; only flip to BAU when the BAU-readiness checklist is all-green with receipts.
  • Where the human is silent, proceed and close on an explicit FLAGGED assumption logged in the manifest rather than waiting.
  • Hand the assembled package straight to the independent review_loop (handover_acceptance_review) — do not treat this work brick's own assertion as closure.
Don't (8)
  • Do NOT insert any human sign-off, approval gate, or 'client approves handover' step — the human is informed input and recipient, never a converger or blocker.
  • Do NOT relabel a real defect or silent failure as a 'known limitation' to close out; mislabeled failures must move to backlog defects with a fail-receipt.
  • Do NOT ship a runbook nobody executed, or claim a runbook 'works' without an execution transcript receipt.
  • Do NOT declare 'done / passed / secure / live' without a linked, resolvable receipt id; no unbacked claim survives closure.
  • Do NOT deliver the handover as a PowerPoint deck; any deck or DOCX/PDF is at most a by-product of the live-system walkthrough, never the deliverable itself.
  • Do NOT let the retrospective end as a dead lessons-learned doc with no committed template change.
  • Do NOT close the engagement on the delivery team's self-assertion; closure is only valid after the independent acceptance review returns closure-solid.
  • Do NOT block or fake closure on the absence of a human reply: convergence rests on the independent panel's evidence, and human silence becomes a flagged assumption.
Guardrails (9)
  • Truth-Gate parity: any closure claim without a linked receipt id is blocked from the manifest exactly as an unbacked agent action is blocked at the platform Truth Gate.
  • Closure done-definition (single documented verdict): package manifest complete AND receipt manifest fully backed AND handover_acceptance_review = closure-solid AND BAU checklist all-green AND retro template-diffs committed AND next-release discovery loop opened — all six, with cited receipt/verdict ids.
  • No-self-closure: this work brick is always immediately followed by the independent review_loop brick handover_acceptance_review (Verdict lead; panel Proof + Cipher + Vector; none authored the package); the engagement cannot be marked closed until that loop converges.
  • Anti-rubber-stamp: a reviewer may mark an item 'verified' only by citing the specific receipt id inspected; a verdict with no cited receipt is itself unbacked and rejected.
  • Re-execute high-risk receipts: the independent panel re-runs the test suite, re-triggers health checks, and re-executes at least one runbook rather than trusting transcripts — observed beats reported.
  • Red-team pass is mandatory: actively hunt for one closure claim with no real receipt and one 'known limitation' that is actually a relabeled silent failure; any finding reopens the work brick.
  • Human-as-input invariant: human questions become next-release backlog items; human silence becomes a flagged closure assumption; the human is never the converger and never a gate.
  • Hypercare boundary is binary: BAU begins only when stated hypercare-exit criteria are each met with an evidence receipt; no transition on narrative alone.
  • Permanent-lesson invariant: every incident from this run must become a committed golden/honesty-eval case (fails-before / passes-after) so the same defect cannot silently recur in the next run of this template.
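The closure done-definition in the guardrails combines two rules: all six conditions must hold, and each must cite the receipt or verdict id it relied on. One possible Python sketch of that check is shown below. The `Condition` class, the condition keys, and `closure_blockers` are hypothetical names chosen for illustration, not identifiers from the platform.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# The six conditions of the closure done-definition (keys are illustrative).
CLOSURE_CONDITIONS = (
    "package_manifest_complete",
    "receipt_manifest_fully_backed",
    "handover_acceptance_review_closure_solid",
    "bau_checklist_all_green",
    "retro_template_diffs_committed",
    "next_release_discovery_loop_opened",
)


@dataclass(frozen=True)
class Condition:
    met: bool
    cited_id: Optional[str]  # receipt or verdict id the claim relies on


def closure_blockers(conditions: dict[str, Condition]) -> list[str]:
    """List every reason closure cannot be recorded; an empty list means all six hold with ids."""
    blockers = []
    for name in CLOSURE_CONDITIONS:
        cond = conditions.get(name)
        if cond is None:
            blockers.append(f"{name}: missing")
        elif not cond.met:
            blockers.append(f"{name}: not met")
        elif not cond.cited_id:
            # A condition asserted without a cited id is itself an unbacked claim.
            blockers.append(f"{name}: no cited id")
    return blockers


all_met = {name: Condition(True, f"id-{i}") for i, name in enumerate(CLOSURE_CONDITIONS)}
print(closure_blockers(all_met))
```

Treating a met-but-uncited condition as a blocker is how the sketch applies the anti-rubber-stamp rule to the closure verdict itself.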
30

Final INDEPENDENT handover-acceptance review AFTER handover: the lifecycle's most consequential claim — "the engagement is done" — is independently reviewed against the 7 objectives and the receipt manifest, never self-asserted by Cadence.

AI Agent: Verdict

This is the lifecycle's terminal gate: the single most consequential claim — "the engagement is done" — must be independently proven, never self-asserted by the team that did the work. A non-author panel led by Verdict (who authored none of the handover package) re-derives every row of the HANDOVER RECEIPT MANIFEST from primary source: test-run ids and pass/fail counts re-read from the runner, security-scan ids and results re-pulled from the scanner, health-check codes and timestamps re-hit live, git hashes re-resolved, runbook-execution transcripts re-read, and the ids of every prior independent-review verdict (spec, architecture, each sprint, release-readiness, hypercare-exit) confirmed SOLID and unsuperseded. Each BAU runbook is proven OPERABLE by an actual dry-run/restore drill — executed, not asserted. The known-limitations register is checked for honesty and completeness against the GapLog and the live system. The receiving BAU team is confirmed real and named (human-input, non-blocking; silence becomes a flagged assumption, never a block). The brick converges to a binary, ledger-backed closure-solid | not-solid ConvergenceVerdict — solid iff open material gaps == 0 AND every non-author reviewer is SOLID. This is the brick whose SOLID verdict handover_continuous_improvement gates its own closure on. There is no human sign-off.

Deliverables
[AI · Verdict] Re-derived HANDOVER RECEIPT MANIFEST — every row independently regenerated from primary source, not from Cadence's summary
Acceptance: Verdict reproduces the manifest row-by-row from source and the re-derived value matches the claimed value for EVERY row: test-run ids re-read from the test runner with matching pass/fail/skip counts; security-scan ids re-pulled from the scanner with matching findings/severities; health-check codes + timestamps obtained by re-hitting the live endpoints; every git hash re-resolved to an existing commit on the released ref; every runbook-execution transcript re-read; and every prior independent-review verdict id (spec, architecture, each sprint increment, release-readiness, hypercare-exit) confirmed to exist, read SOLID, and be unsuperseded. Any row that cannot be re-derived, or whose re-derived value differs from the claimed value, is logged as a material gap. Output is a ledger receipt listing each row with {claimed, re-derived, source-id, match=true|false}; acceptance = zero rows with match=false.
[AI · Vector] BAU runbook operability proof — each runbook proven OPERABLE by an actual executed dry-run / restore drill
Acceptance: For EVERY BAU runbook handed over (deploy, rollback, restore-from-backup, incident-escalation, cost circuit-breaker, on-call), Vector executes a real dry-run or restore drill in a safe environment and captures a timestamped execution transcript with the resulting exit/health codes. Acceptance is binary per runbook: the drill ran to its documented success state (e.g., restore drill produced a verified-good restored instance; rollback drill flipped to the prior immutable artifact and passed health) — operability is DEMONSTRATED, never asserted. Any runbook not drilled, or whose drill did not reach its documented success state, is a material gap. Output is a ledger receipt: per runbook {drill-run-id, transcript-id, reached-success-state=true|false}; acceptance = every runbook reached-success-state=true.
[AI · Proof] Known-limitations register honesty + completeness audit against the GapLog and the live system
Acceptance: Proof cross-checks the handed-over known-limitations register against (a) the full lifecycle GapLog (every gap ever raised), (b) the deferred/accepted items from every prior ConvergenceVerdict, and (c) the live system's actual behavior on a limitations probe set. Acceptance is binary: ZERO known limitation, deferred item, or accepted-risk exists in the GapLog/prior-verdicts/live-system that is absent from or misrepresented in the register, AND zero register entry is stale (already fixed but still listed as open). Each discrepancy is a material gap with its source id. Output is a ledger receipt enumerating register entries vs. source-of-truth with {present, faithful} flags; acceptance = no missing, no misrepresented, no stale entries.
[AI · Cipher] Adversarial red-team pass on the closure claim — attempt to break "done"
Acceptance: Cipher runs an adversarial pass that actively tries to falsify the closure claim: probes for receipts that look valid but point to superseded/stale artifacts (wrong git ref, expired scan, pre-fix health code), runbooks that pass a happy-path drill but fail a realistic failure-injection, manifest rows re-derived from a source the author also controlled (circular evidence), and any prior verdict that was SOLID but has since been invalidated by later changes. Acceptance is binary: every red-team finding is either refuted with a re-derived receipt or logged as a material gap; acceptance = the red-team pass completed and produced zero unrefuted findings. Output is a ledger receipt of attempted falsifications with {finding, refuted=true|false, evidence-id}.
[Human] Receiving BAU team confirmation — the team that will own run-state is real and named
Acceptance: Confirmation that the receiving BAU/run-state team is real and named — owner(s), escalation contacts, and on-call rota — captured as human input. NON-BLOCKING: this deliverable never blocks the ConvergenceVerdict. On human silence past the review window, the panel records a flagged assumption ('BAU recipient unconfirmed — assumed [named party] per handover package') in the closure ledger and proceeds; the assumption is surfaced in the verdict, not used to withhold it. Acceptance = either a human-confirmed named team OR an explicitly flagged-assumption record exists in the ledger.
[AI · Verdict] Binary, ledger-backed closure ConvergenceVerdict (closure-solid | not-solid) with the iterated GapLog
Acceptance: A single ledger-backed ConvergenceVerdict is emitted reading exactly closure-solid OR not-solid. closure-solid is permitted IFF (1) open material gaps == 0 across the manifest re-derivation, runbook drills, limitations audit, and red-team pass, AND (2) every non-author panel reviewer (Verdict, Proof, Cipher, Vector) records SOLID, AND (3) every prior independent-review verdict referenced in the manifest is confirmed SOLID and unsuperseded. While any condition is unmet the verdict is not-solid and the panel iterates the shared GapLog (file → owner re-derives/re-drills → re-verify) to closure; the BAU-confirmation deliverable is excluded from the blocking set (flagged-assumption on silence). The verdict carries the receipt ids backing every condition; there is no human sign-off gate. Acceptance = the verdict is binary, every SOLID condition is backed by a re-derived receipt id in the ledger, and the GapLog shows zero open material gaps at closure-solid.
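The manifest re-derivation deliverable above describes a ledger of rows shaped {claimed, re-derived, source-id, match}. A minimal Python sketch of how such a ledger could be built follows. Everything named here is an assumption for illustration: `LedgerRow`, `re_derive_manifest`, `material_gaps`, the ids, and the `fetch` callable that stands in for re-reading a value from the primary source (test runner, scanner, live endpoint, or git).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class LedgerRow:
    claim: str
    claimed: Any
    re_derived: Any
    source_id: str
    match: bool


def re_derive_manifest(
    claims: Mapping[str, tuple[str, Any]],
    fetch: Callable[[str], Any],
) -> list[LedgerRow]:
    """`claims` maps claim -> (source_id, claimed_value); `fetch` re-reads from primary source."""
    rows = []
    for claim, (source_id, claimed) in claims.items():
        try:
            observed = fetch(source_id)
            match = observed == claimed
        except LookupError:
            # The row cannot be re-derived at all: logged as a mismatch, never waived.
            observed, match = None, False
        rows.append(LedgerRow(claim, claimed, observed, source_id, match))
    return rows


def material_gaps(rows: list[LedgerRow]) -> list[LedgerRow]:
    """Rows with match=false; acceptance requires this list to be empty."""
    return [row for row in rows if not row.match]


# Stand-in for primary sources; a real panel would query the runner, scanner, and endpoints.
source = {"run-042": {"pass": 120, "fail": 0, "skip": 3}, "scan-7": "clean"}
claims = {
    "tests pass": ("run-042", {"pass": 120, "fail": 0, "skip": 3}),
    "security clean": ("scan-7", "clean"),
    "deployed": ("hc-001", 200),  # id absent from the source, so it cannot be re-derived
}
rows = re_derive_manifest(claims, source.__getitem__)
print([row.claim for row in material_gaps(rows)])
```

The comparison is made against the value fetched at review time, not against anything the author supplied, which is the point of re-derivation.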
Questions the agent asks (7)
  • Does EVERY row of the handover receipt manifest re-derive from primary source to the same value Cadence claimed — and which row, if any, fails to reproduce?
  • Has every BAU runbook (deploy, rollback, restore, escalation, cost-breaker) been proven by an ACTUAL executed dry-run/restore drill that reached its documented success state — or is any merely asserted operable?
  • Is the known-limitations register honest and complete against the full GapLog, the prior verdicts' deferred items, and the live system — any missing, misrepresented, or stale entry?
  • Is every prior independent-review verdict (spec, architecture, each sprint, release-readiness, hypercare-exit) confirmed to exist, read SOLID, and remain unsuperseded by later changes?
  • Did the adversarial red-team pass produce any unrefuted way to break the 'done' claim — circular evidence, stale receipts, happy-path-only drills, or invalidated prior verdicts?
  • Is the receiving BAU team real and named — and if the human is silent, is a flagged assumption recorded so closure proceeds non-blocking?
  • Are open material gaps exactly zero AND is every non-author reviewer SOLID — the two conditions the binary closure verdict turns on?
Do (8)
  • Place this brick LAST in the lifecycle, immediately AFTER handover_continuous_improvement, and make its SOLID closure verdict the thing handover_continuous_improvement gates its own closure on
  • Staff the panel exclusively with reviewers who authored NONE of the handover package — Verdict leads as standing independent evaluator, with Proof, Cipher, and Vector as non-author specialists plus an adversarial red-team pass
  • RE-DERIVE every manifest row from primary source (test runner, scanner, live health endpoints, git, runbook transcripts, prior-verdict ledger) — never accept Cadence's summary or any author-supplied roll-up as evidence
  • Prove each BAU runbook OPERABLE by executing a real dry-run/restore drill and capturing a timestamped transcript with exit/health codes — demonstrate, do not assert
  • Confirm every prior independent-review verdict (spec, architecture, each sprint, release-readiness, hypercare-exit) exists, reads SOLID, and is unsuperseded before counting it toward closure
  • Audit the known-limitations register against the full GapLog, prior deferred/accepted items, and live-system behavior for honesty AND completeness — catch both missing and stale entries
  • Treat the receiving-BAU-team confirmation as human-input, NON-BLOCKING — on silence, record an explicit flagged assumption in the ledger and proceed
  • Iterate the shared GapLog to convergence (file → re-derive/re-drill → re-verify) and emit a single binary, ledger-backed ConvergenceVerdict carrying the receipt id behind every condition
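The re-derivation step in the list above can be sketched as follows. The manifest row shape and the `sources` mapping of source id to a zero-argument callable are assumptions made for illustration; in practice each callable would query the test runner, scanner, health endpoint, or git rather than return a canned value.

```python
def rederive_manifest(manifest, sources):
    """Regenerate every manifest row from its primary source.

    manifest: list of {"row": str, "claimed": value, "source_id": str}  (assumed shape)
    sources:  {source_id: zero-argument callable returning the fresh value}
    """
    results = []
    for row in manifest:
        fetch = sources.get(row["source_id"])
        # A row whose primary source is unavailable cannot be reproduced.
        rederived = fetch() if fetch is not None else None
        results.append({
            "row": row["row"],
            "claimed": row["claimed"],
            "re_derived": rederived,
            "source_id": row["source_id"],
            "match": fetch is not None and rederived == row["claimed"],
        })
    return results


def rows_to_file_as_gaps(results):
    """Rows that failed to reproduce; each would be filed in the GapLog."""
    return [r["row"] for r in results if not r["match"]]
```

The author's claimed value is only ever used as the comparison target, never as evidence; a missing source counts as a mismatch rather than a pass.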
Don't (7)
  • Do NOT let Cadence (or anyone who authored the handover package) self-assert closure or sit on the review panel — the most consequential claim cannot be marked done by its own author
  • Do NOT accept any manifest row, test count, scan result, health code, or git hash on the author's word — an un-re-derived receipt is not evidence
  • Do NOT count a BAU runbook as operable because it is documented or asserted — without an executed drill that reached its success state, it is a material gap
  • Do NOT emit closure-solid while any material gap is open or any non-author reviewer is below SOLID — there is no partial or date-based closure
  • Do NOT count a prior verdict toward closure if it is missing, not SOLID, or has been superseded by later changes — stale upstream SOLID is not SOLID
  • Do NOT block the ConvergenceVerdict on the human BAU-team confirmation — silence becomes a flagged assumption, never a stop
  • Do NOT insert any human sign-off / approval gate — closure is an independent, receipt-backed AI verdict, not a human signature
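The rule that a missing, non-SOLID, or superseded prior verdict does not count can be sketched like this. How supersession is detected is not specified in the source; this sketch assumes each verdict records the revision it reviewed (a hypothetical `reviewed_revision` field) and compares it with the artifact's current revision.

```python
def countable_prior_verdicts(required, verdict_ledger, current_revisions):
    """Split required prior verdicts into (countable, gaps).

    required:          iterable of verdict names, e.g. "spec", "architecture"
    verdict_ledger:    {name: {"status": str, "reviewed_revision": str}}  (assumed shape)
    current_revisions: {name: revision of the artifact as it stands now}
    """
    countable, gaps = [], []
    for name in required:
        verdict = verdict_ledger.get(name)
        if verdict is None:
            gaps.append((name, "missing"))
        elif verdict["status"] != "SOLID":
            gaps.append((name, "not SOLID"))
        elif verdict["reviewed_revision"] != current_revisions.get(name):
            # The artifact changed after review, so the old SOLID is stale.
            gaps.append((name, "superseded"))
        else:
            countable.append(name)
    return countable, gaps
```

Any entry in `gaps` would be an open material gap, which by itself keeps the verdict at not-solid.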
Guardrails (7)
  • GATE: closure-solid IFF open material gaps == 0 AND every non-author panel reviewer (Verdict, Proof, Cipher, Vector) is SOLID AND every referenced prior independent-review verdict is SOLID-and-unsuperseded — otherwise not-solid and iterate
  • INDEPENDENCE: the panel authored none of the handover package; Verdict never reviews what it wrote; closure cannot be self-asserted by Cadence or any author
  • RE-DERIVATION IS LAW: every manifest row is regenerated from primary source with {claimed, re-derived, source-id, match}; the verdict cites the receipt id behind each condition — no author summary is ever evidence
  • OPERABILITY IS DRILLED, NOT DECLARED: each BAU runbook carries a real dry-run/restore-drill transcript that reached its documented success state, or it is an open gap
  • NON-BLOCKING HUMAN INPUT: the receiving-BAU-team confirmation is human-input and never blocks the verdict; on silence a flagged assumption is recorded and closure proceeds
  • NO HUMAN SIGN-OFF: there are zero blocking approval gates with owner=Human; closure is a binary, ledger-backed, independently reviewed AI ConvergenceVerdict
  • TERMINAL POSITION: this brick is LAST, runs AFTER handover_continuous_improvement, and its closure-solid verdict is the precondition on which handover_continuous_improvement gates its own closure
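The "drilled, not declared" guardrail could be checked mechanically along these lines. The transcript fields (`executed_at`, `exit_code`, `reached_success_state`) are hypothetical names chosen for this sketch; the runbook list is the one named in the questions above.

```python
from datetime import datetime

# BAU runbooks named in the text.
RUNBOOKS = ("deploy", "rollback", "restore", "escalation", "cost-breaker")


def operability_gaps(transcripts):
    """Return (runbook, reason) for every runbook not proven by an executed drill.

    transcripts: {runbook: {"executed_at": ISO-8601 str,
                            "exit_code": int,
                            "reached_success_state": bool}}  (assumed shape)
    """
    gaps = []
    for name in RUNBOOKS:
        transcript = transcripts.get(name)
        if transcript is None:
            # Documented or asserted is not enough: no drill means a gap.
            gaps.append((name, "no executed drill"))
            continue
        try:
            datetime.fromisoformat(transcript["executed_at"])
        except (KeyError, TypeError, ValueError):
            gaps.append((name, "no valid timestamp"))
            continue
        if transcript.get("exit_code") != 0 or not transcript.get("reached_success_state"):
            gaps.append((name, "drill did not reach success state"))
    return gaps
```

An empty return value means every runbook has a timestamped transcript that reached its success state; anything else stays in the GapLog as an open material gap.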

Generated from the live Flowtely process library · self-contained · AI-Led Agile Software Delivery