Turn a sponsor's brief into a fully specified, architected, built, tested, secured, deployed, and stabilized software product — produced autonomously by an AI delivery team — where quality is guaranteed not by human sign-off but by independent AI reviewer panels that iterate every major artifact to a documented, receipt-backed convergence.
Sponsor Intake: Brief, Four Canonical Registers & Continuous-Input Channel
Stand up the engagement by capturing the sponsor's brief and crystallizing it into the ONE run-spanning source of truth. The brief covers the product idea in one paragraph, the target business outcome bound to a single fully-specified success metric, the stakeholder map, typed hard constraints, and an inventory of any provided data/information. The source of truth is FOUR canonical, versioned, schema-fixed, stable-ID registers that every later brick scores against and extends IN PLACE: (1) the Assumption Register, (2) the Sponsor Question Log, (3) the PRODUCT Objectives & Success-Metric Rubric (the SPONSOR's product-acceptance criteria, explicitly NOT the meta process-design objectives), and (4) the NFR set. The PRODUCT rubric becomes the law every downstream independent-review loop mechanically scores product artifacts against by entry ID, converting later reviews from free-text opinion into scored, evidence-backed pass/fail and killing rubber-stamping at the source.
Carry-forward is LAW: requirements_discovery, prd_and_backlog, and all later bricks EXTEND-IN-PLACE these same four registers by ID. They may add or refine rows but MUST NOT spin up a parallel register with a different schema; any new-schema register is a defect caught by lint. Each Assumption Register row names the downstream artifacts that depend on it (downstream_artifact_refs[]) so a later refuting sponsor answer can cascade-flag exactly those artifacts for re-review. The cascade is EXECUTED by program_health_rollup, named here as the owner of that mechanism; this brick only establishes the wiring (the refs).
The continuous-input model holds: the sponsor PROVIDES information and answers questions throughout but NEVER sits in an approval gate; there is no human sign-off. Intake COMPLETES (never waits): it finishes when every rubric/NFR slot is either sponsor-sourced or assumption-filled-and-flagged, emitting an intake-coverage score and honesty receipts (artifact hashes, register IDs and versions, row counts).
This brick is not a gate; it is the spine the whole run hangs on. ENTERPRISE-SCOPE-AND-SIZE (Sprint 117): Vela ASKS clarifying questions and captures a 1-10 SIZING score, then scopes the product like an enterprise-application architect — users, access/auth, security, controls, and functional depth/breadth sized to that score. NEVER build something useless for most users: the bar is a product that genuinely does its core job for its intended users (a toy is acceptable ONLY at sizing 1-2).
Questions the agent asks (15)
- In one paragraph, what is the product/idea you want built?
- What is the single business outcome that would make this a success, and what one metric proves it? For that metric, what are the current baseline and the target value, how is it measured, and by when must the target be met?
- Who are the stakeholders (names/roles), and who specifically should the team ask when it has questions during the run?
- What are the hard constraints? Specifically: any regulatory, security, privacy, or data-residency requirements; and any budget, timeline, technology, or integration limits?
- What non-functional requirements matter — performance/latency, availability/uptime, scale, security posture, compliance regimes, accessibility — and what measurable threshold makes each acceptable?
- What is explicitly OUT of scope / a non-goal for this build?
- What data or information can you provide now (documents, datasets, links, access)? For each, what is its format and sensitivity, and how do we access it?
- Where you can't answer yet, is it acceptable for the team to proceed on an explicit, flagged assumption (with a default-on-silence) and revisit it when you can answer — rather than waiting on you?
- On a scale of 1-10, how ambitious should this be? (1 = bare MVP / proof-of-concept; 5 = a solid, genuinely useful product for a team; 10 = best-in-class, enterprise-grade). I will scope functionality, the user/access model, security, controls, and depth/breadth to that level — and I'll tell you what each level includes before we proceed.
- Who are the users and roughly how many — single user, a team, a whole org, or multi-tenant (many orgs)?
- What access & permission model is needed — none, user accounts, roles/RBAC, SSO, per-tenant isolation?
- What security/compliance constraints apply — authentication, data sensitivity, audit trails, regulations?
- What is the minimum set of capabilities that makes this genuinely useful to its intended users (not a toy)?
- For this product category, what does BEST-IN-CLASS look like — and which of those capabilities are in scope at your sizing score?
- What are the expected scale, the key integrations, and any non-functional bars (performance, availability, data volume)?
Do (14)
- Capture the sponsor's raw words verbatim first, then structure — preserve the raw brief alongside the structured version so provenance is auditable.
- Stand up EXACTLY four canonical registers — Assumption Register, Sponsor Question Log, PRODUCT Objectives & Success-Metric Rubric, NFR set — each with a stable register ID, a version tag, a fixed schema, and a documented stable-ID convention; record them all in the Source-of-Truth Manifest.
- Keep the PRODUCT rubric strictly the SPONSOR's product-acceptance criteria; label it as such and never fold the meta process-design objectives into it.
- Force the outcome into exactly ONE primary success metric with name, baseline, target, measurement method, and timeframe; if baseline is unknown, record it as a flagged Assumption Register row, never blank.
- Make every rubric and NFR row genuinely machine-referenceable: stable ID, measurable criterion/threshold, a true/false pass_predicate, criticality/weight, and a source tag, so downstream reviews cite 'fails PRB-014' with evidence.
- On every Assumption Register row populate downstream_artifact_refs[] (even if empty at intake) so a later refuting answer can cascade-flag exactly the dependent artifacts — and name program_health_rollup as the executor of that cascade.
- State the carry-forward law explicitly in the manifest: later bricks EXTEND-IN-PLACE these four registers by ID; adding/refining rows is allowed, a parallel new-schema register is a lint defect.
- When the sponsor is silent, proceed on an explicit assumption (statement, owner, question_asked_of_sponsor, default_on_silence, blast_radius, falsification_test) and surface the open question in the Sponsor Question Log — never block.
- Type every hard constraint and quantify every NFR so Cipher, Keystone, and Vector consume the right, measurable inputs downstream.
- Complete intake by coverage rule: every PRODUCT-rubric and NFR slot is sponsor-sourced OR assumption-filled-and-flagged; emit the coverage score and honesty receipts at exit.
- Hand the four registers immediately to the downstream independent-validation review_loop (Verdict + a domain specialist) so the foundation is independently checked, not self-asserted.
- ASK clarifying questions and iterate until the product/process is properly scoped; never proceed on a vague brief. If the sponsor is unsure or silent, propose options and a recommended default, and record the choice as a flagged assumption rather than waiting.
- Capture a single SIZING score (1-10) and translate it into concrete scope: who/how-many users, the access/permission model, authentication, security controls, auditability, and the depth/breadth of features — then state plainly what is IN and OUT at that level.
- THINK LIKE THE ARCHITECT OF AN ENTERPRISE-CLASS APPLICATION, not a single-user utility: explicitly reason about number of users, access/permission model, authentication, security controls, auditability, data integrity, and operations — sized to the sponsor's 1-10 ambition score.
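The machine-referenceable rubric/NFR row described in the Do list above (stable ID, measurable threshold, true/false pass_predicate, weight, source tag) can be sketched as follows. This is a minimal illustration, not the run's actual schema: the field names metric and op, the evidence keys, the row PRB-015, and the rule that missing evidence fails are all assumptions of this sketch; only the ID PRB-014 and the 'fails PRB-014' citation style come from the text.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Mapping

# Closed set of comparators so every predicate is strictly true/false.
_OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}


@dataclass(frozen=True)
class RubricRow:
    row_id: str       # stable ID, e.g. "PRB-014"
    metric: str       # key looked up in the evidence mapping (hypothetical name)
    op: str           # one of the keys of _OPS
    threshold: float  # measurable threshold
    weight: float     # criticality/weight
    source: str       # "sponsor" | "assumption" | "process-default"

    def pass_predicate(self, evidence: Mapping[str, float]) -> bool:
        # Assumption of this sketch: missing evidence fails; it never passes silently.
        if self.metric not in evidence:
            return False
        return bool(_OPS[self.op](evidence[self.metric], self.threshold))


def score(rows: List[RubricRow], evidence: Mapping[str, float]) -> Dict[str, bool]:
    """Score every row by ID against the supplied evidence."""
    return {row.row_id: row.pass_predicate(evidence) for row in rows}


def citations(rows: List[RubricRow], evidence: Mapping[str, float]) -> List[str]:
    """Render failures the way a downstream review would cite them."""
    return [f"fails {row_id}" for row_id, ok in score(rows, evidence).items() if not ok]


rows = [
    RubricRow("PRB-014", "p95_latency_ms", "<=", 300.0, 1.0, "sponsor"),
    RubricRow("PRB-015", "weekly_active_users", ">=", 500.0, 0.5, "assumption"),
]
evidence = {"p95_latency_ms": 412.0, "weekly_active_users": 650.0}
print(citations(rows, evidence))  # prints ['fails PRB-014']
```

Keeping the comparator as data rather than free text is one way to make a row without a measurable threshold impossible to construct, which is the property the Don't list asks lint to enforce.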
Don't (10)
- Do NOT insert any human approval, sign-off, or gate field anywhere — the sponsor provides input, never approves.
- Do NOT block or wait when the sponsor under-provides; turning 'capture the brief' into a de facto gate violates the model. Always complete via flagged assumptions and a coverage score.
- Do NOT create more than (or fewer than) the four canonical registers, and do NOT let any later brick spin up a parallel register with a different schema — that is a defect lint must catch.
- Do NOT mix the meta process-design objectives into the PRODUCT rubric; keep the sponsor's product-acceptance criteria separate and labeled.
- Do NOT emit a free-text or unmeasurable rubric/NFR; an entry without a pass/fail predicate or a measurable threshold is invalid and lets downstream reviews drift into rubber-stamping.
- Do NOT allow a metric soup — exactly one primary success metric; secondary signals may exist but must not be marked primary.
- Do NOT leave an Assumption Register row without downstream_artifact_refs[] wiring or without a falsification_test and default_on_silence — the cascade and the non-blocking proceed both depend on those fields.
- Do NOT let an assumption silently harden into a 'fact' — every assumption keeps a status and a falsification_test until confirmed, refuted, or accepted-as-risk.
- Do NOT claim 'intake complete' without the receipts (brief/manifest hashes, the four register IDs+versions+row counts, coverage score); an unbacked completion claim violates the honesty architecture.
- Do NOT self-certify the registers as correct; independent validation is a separate downstream brick, not this author's call.
Guardrails (11)
- GATE-FREE INVARIANT: this brick contains no approval/sign-off mechanism; completion is gated only by coverage (sponsor-sourced OR assumption-flagged), never by a human decision, and the brick completes — it never waits.
- FOUR-REGISTER SOURCE-OF-TRUTH INVARIANT: exactly four canonical registers (Assumption Register, Sponsor Question Log, PRODUCT Objectives & Success-Metric Rubric, NFR set) are the ONE source of truth; each has a stable ID, a version, a fixed schema, and a stable-ID convention, all recorded in the Source-of-Truth Manifest.
- CARRY-FORWARD / EXTEND-IN-PLACE LAW: requirements_discovery, prd_and_backlog, and all later bricks extend these same four registers by ID — adding/refining rows only; any parallel register with a divergent schema is a defect that lint must catch.
- PRODUCT-vs-META SEPARATION INVARIANT: the PRODUCT rubric holds only the sponsor's product-acceptance criteria and is labeled as such; the meta process-design objectives are never merged into it.
- CASCADE-WIRING INVARIANT: every Assumption Register row carries downstream_artifact_refs[]; a sponsor answer that refutes a row sets status=refuted and program_health_rollup (the named owner of the cascade) flags exactly the referenced artifacts for re-review — this brick supplies the wiring, not the execution.
- RUBRIC/NFR-IS-LAW INVARIANT: every PRODUCT-rubric and NFR row has a stable ID and a binary pass_predicate or measurable threshold; rows are immutable once published and referenceable by ID, so downstream reviews cite them by ID, and any edit creates a new version, preserving the audit trail.
- HONESTY-RECEIPT INVARIANT: 'intake complete' may be claimed only when receipts exist — brief and manifest hashes, the four register IDs with versions and row counts, and the coverage score; receipt-less claims are prohibited and reviewers re-derive these receipts rather than accept the summary.
- INDEPENDENCE INVARIANT: Vela authors the registers; Vela does NOT certify them. They become binding only after the downstream independent-validation loop (Verdict + specialist, non-author, adversarial pass) records a ConvergenceVerdict of solid with zero open material gaps.
- DATA-PROVENANCE INVARIANT: every register row is tagged source=sponsor|assumption|process-default; nothing is presented as sponsor-stated unless it actually came from the sponsor.
- NO-SECRETS INVARIANT: provided data is inventoried by reference, classification, and access-path only; no credentials or secrets are written into any tracked intake artifact.
- ENTERPRISE-GRADE INVARIANT: design and build to the sponsor's sizing score as an enterprise-application architect would — consider users/scale, access/permission/auth, security controls, auditability, data integrity, depth & breadth of functionality, and what best-in-class looks like. NEVER ship something useless for most real users; a single-user toy is acceptable ONLY when sizing is explicitly 1-2 and a proof-of-concept was requested.
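The coverage rule and honesty receipts in the guardrails above can be illustrated with a short sketch. It assumes a hypothetical slot shape (id, source, flagged) and register shape (id, version, rows); the real schemas are fixed by the Source-of-Truth Manifest, not by this example. Treating an empty slot list as 0.0 coverage is also a choice of this sketch.

```python
import hashlib
from typing import Dict, List


def is_covered(slot: Dict) -> bool:
    """A slot is covered if sponsor-sourced OR assumption-filled-and-flagged."""
    if slot.get("source") == "sponsor":
        return True
    return slot.get("source") == "assumption" and slot.get("flagged") is True


def coverage_score(slots: List[Dict]) -> float:
    """Fraction of rubric/NFR slots that are covered (0.0 for no slots)."""
    if not slots:
        return 0.0
    return sum(1 for s in slots if is_covered(s)) / len(slots)


def fill_gaps(slots: List[Dict]) -> List[Dict]:
    """Never wait: turn every uncovered slot into a flagged assumption."""
    return [
        s if is_covered(s) else {**s, "source": "assumption", "flagged": True}
        for s in slots
    ]


def intake_receipt(raw_brief: str, registers: List[Dict], slots: List[Dict]) -> Dict:
    """Receipts a reviewer can re-derive: hash, register IDs/versions/row counts, coverage."""
    return {
        "brief_sha256": hashlib.sha256(raw_brief.encode("utf-8")).hexdigest(),
        "registers": [
            {"id": r["id"], "version": r["version"], "row_count": len(r["rows"])}
            for r in registers
        ],
        "coverage_score": coverage_score(slots),
    }


slots = [
    {"id": "PRB-001", "source": "sponsor"},
    {"id": "NFR-001", "source": None},  # sponsor was silent on this slot
]
before = coverage_score(slots)
slots = fill_gaps(slots)
receipt = intake_receipt(
    "raw sponsor brief text",
    [{"id": "REG-ASSUMPTION", "version": "v1", "rows": [{"id": "A-001"}]}],
    slots,
)
```

Here before is 0.5 and the receipt's coverage_score is 1.0, which mirrors the text: intake completes by filling and flagging, not by waiting for the sponsor.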
Feasibility & Viability Triage (Kill-or-Proceed Checkpoint)
Before any team or architecture is stood up, run a lightweight, time-boxed viability check on the sponsor brief so effort is never spent on an infeasible or out-of-scope idea. Keystone (architecture/feasibility), with Vela on value and market fit and Cipher flagging any showstopper security/compliance constraint, produces a Feasibility & Viability Memo against the PRODUCT Objectives Rubric and the constraints captured in onboard. The Memo covers technical feasibility, make-vs-buy / build-vs-adopt analysis, a rough order-of-magnitude effort + cost envelope, the top kill-risks, and a single explicit, mutually exclusive recommendation: PROCEED | RESHAPE | KILL. This is a checkpoint, not a study, and NOT a human approval gate: the recommendation is itself routed for a light independent sanity pass once a Verdict reviewer exists. Output feeds discovery only on PROCEED or RESHAPE; a KILL ends the run cleanly with a receipt so no further effort is spent.
Questions the agent asks (4)
- Are there candidate off-the-shelf or adopt-and-extend solutions the sponsor already prefers or has ruled out for any major capability?
- Is there a hard ceiling on effort/cost, or a date, beyond which the outcome is no longer worth pursuing (the threshold that would justify a KILL)?
- Are there any non-negotiable regulatory, data-residency, or security constraints we must treat as showstoppers?
- If the idea as briefed is not viable as-is, is a reduced or reshaped scope acceptable, or is it all-or-nothing?
Do (5)
- Keep it a lightweight, time-boxed checkpoint — enough to make a defensible PROCEED/RESHAPE/KILL call, not a full study.
- Tie every feasibility claim and the final recommendation back to a specific onboard constraint or PRODUCT Objectives Rubric line.
- Treat make-vs-buy as a first-class question: prefer adopt/extend over build where it meets the outcome.
- Be willing to recommend KILL or RESHAPE — a clean early stop is a success, not a failure.
- Run a genuine adversarial pass on your own recommendation and record what it challenged.
Don't (5)
- Do not block on sponsor silence — proceed on a flagged Assumption Register entry instead.
- Do not turn this into the full architecture or discovery — that work belongs to later bricks and only happens on PROCEED/RESHAPE.
- Do not introduce a human approval gate; the sanity pass is an independent AI review, not sign-off.
- Do not let the author's own optimism stand unchallenged — never accept the Memo's summary in place of re-derived receipts.
- Do not output a soft or multi-valued recommendation; it must resolve to exactly one of PROCEED | RESHAPE | KILL.
Guardrails (4)
- Honesty: every line of the Memo and the routing receipt must be re-derivable from a cited source (onboard artifact, rubric line, or the sanity-pass record); claim only what a receipt proves.
- Independence: the recommendation is never confirmed by its own author alone. Because this brick runs before assemble_team, an adversarial self-critique runs here first; once Verdict exists, Verdict plus a non-author specialist re-derive the receipts, and the decision is re-confirmed at spec review via the assemble_team independence FALLBACK (forward reference).
- A KILL must end the run with a written, re-derivable receipt stating the deciding kill-risk and the failed rubric/constraint — no silent termination.
- Scope discipline: this brick operates only on the brief and onboard outputs and within this engagement's data; it neither invents requirements the sponsor has not stated nor begins building.
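The single-valued recommendation and the KILL receipt described above can be sketched as a small routing function. The Memo fields (cited_refs, deciding_kill_risk, failed_ref) and the returned dictionary shape are hypothetical names chosen for illustration; the three recommendation values and the receipt requirements come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Recommendation(Enum):
    PROCEED = "PROCEED"
    RESHAPE = "RESHAPE"
    KILL = "KILL"


@dataclass(frozen=True)
class Memo:
    recommendation: Recommendation
    cited_refs: Tuple[str, ...]            # onboard constraint or rubric IDs backing the call
    deciding_kill_risk: Optional[str] = None
    failed_ref: Optional[str] = None       # the failed rubric line or constraint, for KILL


def route(memo: Memo) -> Dict:
    """Route a Memo; refuse soft, uncited, or receipt-less recommendations."""
    if not isinstance(memo.recommendation, Recommendation):
        raise ValueError("recommendation must be exactly one of PROCEED | RESHAPE | KILL")
    if not memo.cited_refs:
        raise ValueError("recommendation cites no onboard constraint or rubric line")
    if memo.recommendation is Recommendation.KILL:
        if not (memo.deciding_kill_risk and memo.failed_ref):
            raise ValueError("KILL needs the deciding kill-risk and the failed rubric/constraint")
        return {
            "feeds_discovery": False,
            "receipt": {
                "decision": "KILL",
                "deciding_kill_risk": memo.deciding_kill_risk,
                "failed_ref": memo.failed_ref,
                "cited_refs": list(memo.cited_refs),
            },
        }
    return {
        "feeds_discovery": True,
        "receipt": {
            "decision": memo.recommendation.value,
            "cited_refs": list(memo.cited_refs),
        },
    }


proceed = route(Memo(Recommendation.PROCEED, ("PRB-001",)))
kill = route(Memo(Recommendation.KILL, ("PRB-001",), "no viable data source", "PRB-001"))
```

Parsing free text through Recommendation("...") raises ValueError for anything outside the three values, so a hedged answer such as "PROCEED, probably" cannot slip through as a recommendation.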
Assemble the AI Delivery Team & Operating System (the convergence machinery, seated BEFORE the first review loop)
Cadence stands up the AI delivery roster as a running organism and commits the Operating System every later brick executes against: not prose policy but a single versioned, receipt-backed Operating-System Charter object whose every claim is checkable. This brick runs BEFORE requirements_discovery, so the convergence machinery, Verdict's non-authoring configuration, and the independence-enforcement hook already exist before the first independent review at review_spec_iterate. Concretely it: (1) onboards and PROBES the full roster of Vela (PO), Cadence (SM/Delivery Lead), Keystone (Architect), Mason (Engineer), Lens (Code Reviewer), Proof (QA/Test), Cipher (AppSec), Vector (DevOps/SRE), and Iris (UX), plus Verdict (standing Independent Reviewer) configured as a non-authoring evaluator, for 10 agents in total; (2) publishes a RACI where exactly one AI agent is Accountable per artifact-type and ZERO humans appear in any Accountable or Approver cell (the sponsor is Consulted/Informed only: the continuous-input channel, never a gate); (3) commits machine-referenceable DoR/DoD bound to house law (all automated tests green + a cited receipt before "done"); (4) commits a numeric capacity/WIP model so "fits capacity" is a binary function call; (5) commits the canonical Independent-Review Protocol every downstream review_loop copies verbatim.
This version closes the three gaps downstream loops silently assume: it ships a CONCRETE reviewer-independence FALLBACK for when the only holder of a needed lens is the artifact's author (tiered: adversarial red-team sub-persona, escalating to an external evaluator on security-critical artifacts; "same agent, different hat" is forbidden as sole independence on security-critical work); it commits ONE canonical gap-severity taxonomy (Blocker/Major/Minor, material := severity ≥ Major) every loop copies verbatim, retiring the divergent material/minor, critical/major/minor, and material/immaterial vocabularies; and it adds a meta-check so Verdict's own convergence verdicts are spot-checkable by a second independent actor re-deriving ledger rows, so Verdict is not the sole unenforced honesty authority. PROCESS-integrity objectives (autonomy, independence, honesty) are scored against PROCESS bricks separately from the PRODUCT Objectives & Success-Metric Rubric (owned by onboard/Vela) used to score the spec and product. This brick introduces NO human sign-off and asserts nothing it cannot prove with a receipt.
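The single gap-severity taxonomy (Blocker/Major/Minor, material := severity ≥ Major) and the idea of referencing it by hash can be sketched as below. The dictionary layout, the severity_taxonomy_hash key, and the use of SHA-256 over sorted JSON are assumptions of this example; the charter's real hashing scheme is not specified in the text.

```python
import hashlib
import json
from enum import IntEnum
from typing import Dict, List


class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    BLOCKER = 3


def is_material(severity: Severity) -> bool:
    """material := severity >= Major."""
    return severity >= Severity.MAJOR


# Canonical taxonomy object; sorted-key JSON keeps the hash stable across runs.
TAXONOMY = {"levels": ["Blocker", "Major", "Minor"], "material": "severity >= Major"}


def taxonomy_hash(taxonomy: Dict = TAXONOMY) -> str:
    payload = json.dumps(taxonomy, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def open_material_gaps(gaps: List[Dict]) -> List[Dict]:
    """Gaps that block a 'solid' verdict: open AND material."""
    return [g for g in gaps if g["status"] == "open" and is_material(g["severity"])]


def loop_uses_canonical_taxonomy(loop_config: Dict) -> bool:
    """Lint check: a review loop must reference the one taxonomy by hash."""
    return loop_config.get("severity_taxonomy_hash") == taxonomy_hash()


gaps = [
    {"id": "G-1", "severity": Severity.MINOR, "status": "open"},
    {"id": "G-2", "severity": Severity.MAJOR, "status": "open"},
    {"id": "G-3", "severity": Severity.BLOCKER, "status": "closed"},
]
blocking = [g["id"] for g in open_material_gaps(gaps)]  # ['G-2']
```

A loop that ships its own vocabulary (for example critical/major/minor) would carry a different hash or none at all, so loop_uses_canonical_taxonomy returns False for it and the lint can fail the brick.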
Questions the agent asks (6)
- What is the one-paragraph description of what you want to accomplish, and what existing information/data/systems can you share now (so the team starts from your reality, not assumptions)?
- Are there constraints we must treat as fixed inputs — deadline, budget envelope, must-use tech/cloud, compliance/regulatory regime, data-residency — to record in the charter?
- For security-critical artifacts (threat model, pen-test scope, go-live security sign-off), do you want a specific EXTERNAL evaluator named for the independence fallback — a human security expert as INPUT, a second fresh evaluator instance, or both — and how should we reach them?
- Which channel and cadence do you prefer for the continuous-input loop, and what response latency should we assume so we can size the max-iteration escalation window?
- Which artifacts do you want visibility into as Informed (e.g., architecture, security assessment, release readiness) even though you are never an approver?
- Is there any domain context, prior attempt, or known landmine we should load before the team starts, so we don't rediscover it the hard way?
Do (12)
- Sequence this brick to COMMIT BEFORE requirements_discovery: the convergence machinery, Verdict's non-authoring config, the severity taxonomy, and the independence-enforcement hook must exist before the first review loop at review_spec_iterate runs.
- Emit ONE committed, versioned charter object with a git hash as the brick's receipt — 'done' only when the charter and all sub-artifacts resolve and probe green.
- Onboard each agent by PROBING it: capture a passing capability/health probe receipt per agent before marking it onboarded (10/10 green, timestamps recorded).
- Commit ONE canonical gap-severity taxonomy (Blocker/Major/Minor, material := severity ≥ Major) and make every downstream review_loop copy it VERBATIM by hash — retire the divergent material/minor, critical/major/minor, and material/immaterial vocabularies.
- Ship a CONCRETE reviewer-independence fallback: when the only lens-holder is the author, escalate (Tier 1) to an adversarial red-team sub-persona with a separate signer id, then (Tier 2, mandatory on security-critical lenses) to an external evaluator consumed as INPUT only.
- Add the Verdict meta-check: require a second independent actor (or the external-evaluator fallback) to RE-DERIVE ≥3 ledger rows behind any Verdict-signed convergence, so Verdict is never the sole unenforced honesty authority.
- Define independence OPERATIONALLY and ENFORCE it: eligibility function (excludes author + pair-partners, requires lens, seats Verdict, invokes the fallback by id), panel ≥2 for major artifacts, and a hook that REJECTS any verdict whose signer overlaps the author set.
- Keep PROCESS-integrity scoring separate from PRODUCT scoring: score autonomy/independence/honesty against the PROCESS-Integrity Objectives Rubric on PROCESS bricks; score the spec/product against the PRODUCT Objectives & Success-Metric Rubric.
- Write every reviewer finding and verdict as a ledger receipt (attributable, auditable), inheriting the repo's Truth-Gate/honesty model rather than a weaker reinvention.
- Make capacity numeric: per-agent WIP limits + a sprint capacity number + a fits_capacity() function later bricks call, demonstrated failing on a deliberately over-scoped plan.
- Bind DoD to house law: all automated tests green before merge, and 'done' = a cited receipt (test counts, scan output, health code, reviewer verdict) — never an assertion.
- Record the human's brief as up-front input AND keep the continuous-input channel open; where the human is silent, proceed on an explicitly FLAGGED assumption logged in the register, surfacing stuck loops as a QUESTION.
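The numeric capacity model in the Do list above (per-agent WIP limits, a sprint capacity number, and a fits_capacity() checker) might look like the following sketch. The plan shape of (agent, points) pairs, the unit "points", and the specific limits are assumptions made for illustration; only the function name fits_capacity and the agent names come from the text.

```python
from collections import Counter
from typing import Dict, List, Tuple


def fits_capacity(
    plan: List[Tuple[str, int]],
    wip_limits: Dict[str, int],
    sprint_capacity: int,
) -> bool:
    """Binary check: every agent within its WIP limit AND total within sprint capacity."""
    items_per_agent = Counter(agent for agent, _ in plan)
    for agent, count in items_per_agent.items():
        # An agent with no committed limit cannot be planned against.
        if agent not in wip_limits or count > wip_limits[agent]:
            return False
    return sum(points for _, points in plan) <= sprint_capacity


WIP_LIMITS = {"Mason": 2, "Proof": 2}   # hypothetical per-agent limits
SPRINT_CAPACITY = 10                     # hypothetical sprint capacity in points

sane_plan = [("Mason", 5), ("Proof", 3)]
over_scoped_plan = [("Mason", 5), ("Mason", 5), ("Mason", 5)]

print(fits_capacity(sane_plan, WIP_LIMITS, SPRINT_CAPACITY))         # prints True
print(fits_capacity(over_scoped_plan, WIP_LIMITS, SPRINT_CAPACITY))  # prints False
```

The second call is the "demonstrated failing on a deliberately over-scoped plan" case: it fails on the WIP limit before the points total is even considered.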
Don't (14)
- Do NOT let requirements_discovery or review_spec_iterate run before this brick's charter, severity taxonomy, Verdict config, and independence hook are committed — that is the ordering defect this version fixes.
- Do NOT allow 'same agent, different hat' (an author's own red-team sub-persona alone) to stand as the SOLE independence on any security-critical artifact — an external-evaluator input is mandatory there.
- Do NOT let any review_loop define its own gap-severity vocabulary — they copy the ONE canonical taxonomy by hash; a surviving divergent vocabulary fails the brick.
- Do NOT let Verdict be the sole honesty authority — a Verdict-signed convergence with no second-actor re-derivation of ≥3 ledger rows is rejected.
- Do NOT place any Human in an Accountable or Approver cell of the RACI — the human is Consulted/Informed only; a single such cell fails the brick.
- Do NOT let an author certify, sign, or converge their own artifact; do NOT accept a convergence verdict signed by anyone in the author set.
- Do NOT conflate the PROCESS-Integrity Objectives Rubric with the PRODUCT Objectives & Success-Metric Rubric — process-integrity loops cite the former, product-scoring loops the latter.
- Do NOT mark any roster member 'onboarded' without a captured passing health/capability probe; do NOT register 'Atlas' as a delivery-roster agent.
- Do NOT accept a convergence verdict with zero cited evidence; a clean verdict on a non-trivial artifact without positive evidence reviewed is itself a flag, not a pass.
- Do NOT let 'tests pass' (or any done/passed/live claim) appear anywhere in the OS as a bare assertion — it must resolve to a cited run/count/scan/health receipt.
- Do NOT assign Verdict any authoring or build task — verify with a passing negative test.
- Do NOT exceed WIP limits to appear faster; capacity is a deliberate throttle against the AI's no-fatigue overcommit failure mode.
- Do NOT block on the human at any point — no ceremony, artifact, or review loop may halt awaiting human approval; proceed on a flagged assumption instead.
- Do NOT restate house Sprint/Git rules in conflict with CLAUDE.md — reference them; the charter inherits, it does not fork, house law.
Guardrails (12)
- TEAM-BEFORE-USE ORDERING: this brick commits BEFORE requirements_discovery; the binary check is charter git-commit time < first requirements_discovery artifact time, so no review loop runs before the machinery that makes it honest exists.
- BINARY-RECEIPT GUARDRAIL: complete only when the charter is committed (git hash), all 10 agents probe green, RACI passes zero-Human-in-A/Approver, DoR/DoD/WIP/Protocol/severity-taxonomy/fallback/meta-check resolve by id, and the independence-enforcement negative tests pass. No assertion substitutes for these receipts.
- SINGLE SEVERITY VOCABULARY: exactly one committed taxonomy (Blocker/Major/Minor, material := ≥ Major) referenced by hash from every review_loop; a lint finding any divergent material/minor, critical/major/minor, or material/immaterial vocabulary fails the brick.
- INDEPENDENCE FALLBACK IS CONCRETE AND TIERED: when the only lens-holder is the author, Tier-1 is an adversarial red-team sub-persona with a separate signer id; Tier-2 (external evaluator as INPUT) is MANDATORY on security-critical lenses; sub-persona-alone on a security-critical artifact is mechanically REJECTED.
- VERDICT IS NOT THE SOLE HONESTY AUTHORITY: every Verdict-signed convergence requires a second independent actor (or the external-evaluator fallback) to re-derive ≥3 ledger rows; a Verdict-only verdict is rejected, and disagreement re-opens the gap.
- INDEPENDENCE IS ENFORCED, NOT ASSERTED: an author may never certify their own artifact; the eligibility function, fallback, and rejection hook make this mechanically true, and verdicts are ledger receipts so independence is auditable.
- NO SMUGGLED HUMAN GATE: zero Human in any Accountable/Approver RACI cell is a hard binary check; the human is input-only (Consulted/Informed) and the continuous-input channel never becomes an approval gate.
- PROCESS vs PRODUCT SCORING ARE SEPARATE: autonomy/independence/honesty are scored on PROCESS bricks via the PROCESS-Integrity Objectives Rubric; the spec/product are scored via the PRODUCT Objectives & Success-Metric Rubric; no loop conflates the two rubric ids.
- CONVERGENCE IS DEFINED ONCE: 'solid' = open material gaps == 0 + each PROCESS objective (1–7) cited with re-derived evidence + non-author signer + all non-author panel reviewers SOLID + bound to one final version+hash + within max-iteration (else escalate to human as INPUT); every downstream review_loop is an instance of this one protocol.
- ANTI-BLUFF: every finding cites evidence (file/line, test id, scan/health output); reviewers RE-DERIVE the receipt, never accept the author's narrative (the text-path≠action-path lesson); a mandatory adversarial/red-team pass guards the highest-stakes artifacts.
- CAPACITY IS A THROTTLE: WIP limits and sprint capacity are numbers, fits_capacity() is the binary checker downstream calls, and limits are not exceeded to look faster.
- HOUSE-LAW INHERITANCE: DoD binds to 'all automated tests green + cited receipt'; reviewer verdicts and findings are written into the ledger/Truth-Gate model — this process inherits the platform's honesty enforcement, never a weaker reinvention.
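The independence rules in the guardrails above (eligibility that excludes the author and pair-partners, a hook that rejects author-signed verdicts, the zero-evidence flag, and the security-critical fallback rule) can be sketched as two small functions. The verdict dictionary keys (signers, evidence, security_critical, fallback_tier) and the lens names are hypothetical; this simplified sketch also does not model panel size or the seating of Verdict.

```python
from typing import Dict, Iterable, List, Set, Tuple


def eligible_reviewers(
    roster: Dict[str, Set[str]],
    author_set: Iterable[str],
    pair_partners: Iterable[str],
    lens: str,
) -> List[str]:
    """Reviewers who hold the lens and are neither authors nor pair-partners."""
    excluded = set(author_set) | set(pair_partners)
    return sorted(
        name for name, lenses in roster.items()
        if name not in excluded and lens in lenses
    )


def check_verdict(verdict: Dict, author_set: Iterable[str]) -> Tuple[bool, str]:
    """Rejection hook: returns (accepted, reason)."""
    if set(verdict["signers"]) & set(author_set):
        return False, "signer overlaps author set"
    if not verdict.get("evidence"):
        return False, "zero cited evidence"
    # Tier 1 = red-team sub-persona only; not sufficient on security-critical work.
    if verdict.get("security_critical") and verdict.get("fallback_tier") == 1:
        return False, "sub-persona alone on security-critical artifact"
    return True, "accepted"


roster = {
    "Cipher": {"security"},
    "Keystone": {"architecture", "security"},
    "Verdict": {"architecture", "security"},
    "Mason": {"code"},
}
panel = eligible_reviewers(roster, author_set={"Cipher"}, pair_partners=set(), lens="security")

self_signed = check_verdict(
    {"signers": ["Cipher", "Verdict"], "evidence": ["scan-17"]}, author_set={"Cipher"}
)
clean = check_verdict(
    {"signers": ["Keystone", "Verdict"], "evidence": ["scan-17"]}, author_set={"Cipher"}
)
```

With this roster, panel is ['Keystone', 'Verdict'], the self-signed verdict is rejected, and the clean one is accepted. Returning a reason string alongside the boolean is one way to make each rejection writable as a ledger receipt.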
Program-Level Receipt/Honesty Rollup & Assumption-Cascade Executor
Stand up ONE live, receipt-derived dashboard that rolls every brick's ledger receipts into a single program-wide view of honesty and exposure: every OPEN material gap across all review loops, every flagged assumption still unconfirmed (with its blast radius), and every refuted-assumption auto-flag — and EXECUTE the refute→re-review cascade the run promised. This is a cross-cutting rollup + cascade executor surfaced right after assemble_team so the machinery exists early; it is conceptually owned from that point onward and spans the whole lifecycle. It asserts nothing it cannot re-derive from underlying receipts. Verdict owns it (independently) so the rollup stays honest, and it is the authoritative source review_release_readiness_iterate's pre-go-live assumption gate reads. It is deliberately tight: a rollup and a cascade router, NOT a new review loop.
Questions the agent asks (4)
- Which Assumption Register rows are HIGH blast-radius — i.e., wrongness would force rework of architecture, the release decision, or a security/compliance posture (so the pre-go-live gate keys off the right set)?
- For each flagged assumption, who or what system holds the authoritative answer, and is that answer expected before go-live or acceptable to explicitly accept-with-risk?
- What is the canonical artifact→owning-brick map the cascade should route re-reviews against, and who maintains it?
- What is the minimum cadence at which the human-INPUT channel will supply answers, so the executor knows when an unconfirmed HIGH assumption has become a schedule risk rather than a pending one?
Do (7)
- Re-derive every displayed number from underlying brick receipts at read time; treat author/brick summaries as unverified until the receipt is re-tallied.
- Surface OPEN material gaps, unconfirmed flagged assumptions (with blast radius), and refute auto-flags as three first-class, always-visible sections so nothing material can hide.
- On a refute, auto-flag exactly the artifacts in that row's downstream_artifact_refs[] and route each back to its OWNING brick's loop; let that loop, not this brick, do the re-review.
- Make the HIGH-blast-radius rollup a single stable, machine-readable contract that review_release_readiness_iterate's gate reads directly.
- Spot-check Verdict's convergence verdicts by re-deriving ≥3 receipt rows each, with a recorded actor who did not author the artifact.
- Keep the cascade idempotent: re-flag only refs whose state changed; never duplicate an existing open flag.
- Make every dashboard item link to a resolvable receipt id; if a receipt cannot be resolved, show the item as a gap, not as green.
Don't (7)
- Do NOT add a human approval/sign-off gate — the human supplies INPUT (answers) only; this brick blocks on no human decision.
- Do NOT run a new review loop, re-adjudicate gaps, or author/fix downstream artifacts — route them back to the owning brick; this is a rollup + cascade router only.
- Do NOT display any status, count, or 'resolved' that cannot be shown from an underlying receipt; never green-wash an item lacking a backing receipt.
- Do NOT let the HIGH-blast-radius rollup read TRUE while any HIGH assumption is OPEN, and do not silently downgrade a HIGH assumption to clear the gate.
- Do NOT let Verdict spot-check (or convergence-verify) an artifact Verdict itself authored — preserve independence.
- Do NOT absorb a spot-check discrepancy silently — every discrepancy becomes an OPEN material gap on the dashboard.
- Do NOT widen scope into a parallel ledger or assumption store — read the canonical Assumption Register and brick receipts as the single source of truth.
Guardrails (7)
- Honesty law: the rollup asserts only what it can re-derive from receipts; any unverifiable claim is shown as a gap, never as done — Atlas-style 'show the receipt, never "it works."'
- Single source of truth: the canonical Assumption Register and per-brick ledger receipts are authoritative; this brick maintains no competing copy.
- Independence / non-capture: Verdict owns this rollup but is barred from spot-checking or convergence-verifying any artifact it authored; the spot-check actor is recorded and must differ from the artifact author.
- No-human-gate invariant: zero owner=Human blocking steps; the human INPUT channel informs and never approves.
- Cascade completeness: a refuted HIGH row, or any refuted row with downstream-refs, MUST produce a routed re-review flag for every ref before the pre-go-live assumption gate can read TRUE.
- Gate contract stability: the 'all HIGH resolved-or-accepted' boolean and its backing list are exposed under a stable id; changing its semantics requires updating review_release_readiness_iterate's gate in lockstep.
- Idempotency & auditability: every flag, route, and rollup read is logged with receipt ids, timestamps, and actor, and re-runs produce no duplicate flags.
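One way to picture the gate contract from the guardrails above is a function that returns the boolean together with its backing list. This is a sketch under stated assumptions: the status vocabulary (`RESOLVED`, `ACCEPTED`), the `receipt_id` field, and the `gate_id` string are hypothetical and are not defined by this document.

```python
def high_blast_radius_rollup(register_rows):
    """Compute the 'all HIGH resolved-or-accepted' boolean plus its backing list.

    A HIGH row only counts as cleared when it also carries a receipt id;
    a HIGH row without a receipt is reported as a gap, never as green.
    """
    cleared_states = {"RESOLVED", "ACCEPTED"}  # assumed status vocabulary
    backing, gaps = [], []
    for row in register_rows:
        if row["blast_radius"] != "HIGH":
            continue
        receipt_id = row.get("receipt_id")
        backing.append({"id": row["id"], "status": row["status"], "receipt_id": receipt_id})
        if row["status"] not in cleared_states or not receipt_id:
            gaps.append(row["id"])
    return {
        "gate_id": "assumption_gate.high_resolved_or_accepted",  # hypothetical stable id
        "value": not gaps,
        "backing": backing,
        "gaps": gaps,
    }


register = [
    {"id": "A-001", "blast_radius": "HIGH", "status": "RESOLVED", "receipt_id": "R-88"},
    {"id": "A-002", "blast_radius": "HIGH", "status": "OPEN"},
    {"id": "A-003", "blast_radius": "LOW", "status": "OPEN"},
]
rollup = high_blast_radius_rollup(register)
```

Returning the backing list alongside the boolean lets a downstream gate re-derive the value instead of trusting it.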
Hypothesis-Driven Requirements Discovery (requirements.json + PRD.md)
Now that the Team & Operating-System Charter exists (assemble_team), Vela runs hypothesis-driven discovery against the ALREADY-COMMITTED machinery: the canonical Objectives & Success-Metric Rubric, the run-spanning Assumption Register, the Sponsor Question Log, and the Independent-Review Protocol. Discovery is framed as falsifiable hypotheses about vision, personas, jobs-to-be-done, prioritized journeys, scope, NFRs, data, integrations, constraints, risks, and measurable success; each hypothesis is resolved by a sponsor receipt or by an explicitly flagged Assumption Register row — never by an unsourced assertion. Crucially, Vela does NOT mint a new register/scope/NFR/conflict schema: it EXTENDS-IN-PLACE the canonical onboard artifacts by their stable IDs (adding/refining rows on the same Assumption Register and Sponsor Question Log) and maps every success metric back to a canonical Objectives-Rubric entry ID. The sponsor is a CONTINUOUS, NON-BLOCKING input channel: they seed discovery and answer throughout, but discovery never waits on approval; on silence past the cadence window after N timestamped surfacing attempts, Vela proceeds on a logged, default-valued, FLAGGED assumption and surfaces it at the next review. Every FR carries a stable ID, priority, provenance source, and a binary acceptance criterion; every NFR carries a numeric threshold (number+unit+condition) or an explicit "N/A because…". This is a kind=work authoring brick: its sole output is the requirements record shaped to feed the downstream review_spec_iterate loop — it does not gate, approve, or sign off anything. "Done" is exactly when the binary completeness checklist passes, and that checklist result IS the receipt.
Questions the agent asks (10)
- What outcome are you trying to accomplish, and how will you know it worked — what does success look like in measurable terms (a number, a rate, a time) that we can bind to your existing Objectives & Success-Metric Rubric entries?
- Who are the primary user personas and what specific job is each trying to get done?
- Walk us through the most important user journeys: what is the ideal happy path, and what should happen when things go wrong (errors, edge cases, empty/abuse states)?
- What is explicitly OUT of scope or a non-goal for this build, and what is acceptable to defer to a later phase — and does that match the Non-Goals you gave us at intake, or has anything changed?
- What data will the system touch — does any of it include PII or PHI, where must it reside, and what regulatory regimes apply (GDPR, HIPAA, SOC2, other)?
- What external systems must we integrate with, in which direction, and are there existing contracts/credentials/rate limits we must respect?
- What are your hard non-functional expectations — target latency, concurrent users/throughput, availability/uptime, accessibility level, and security posture, each as a concrete number?
- What hard constraints exist — budget, deadline, technology mandates, existing infrastructure, or organizational policies — beyond the typed constraints already in the brief?
- What are the biggest risks or past failures you want us to avoid, and which requirements are non-negotiable P0 versus nice-to-have?
- Where you cannot answer right now: are you comfortable with us proceeding on a flagged default assumption (logged on the running Assumption Register) that we surface for you to confirm or correct at the next review?
Do (14)
- Run discovery AGAINST the already-committed machinery: bind every success metric to a canonical Objectives & Success-Metric Rubric entry id, append/refine the canonical Assumption Register, route questions through the canonical Sponsor Question Log, and shape output for the canonical Independent-Review Protocol — all of which already exist because this brick runs after assemble_team.
- EXTEND-IN-PLACE: add or refine rows on the canonical registers by their stable IDs; never fork a parallel Assumption Register, conflict log, or question log with a different schema.
- Frame discovery as falsifiable hypotheses and record, per hypothesis, whether it was resolved by a sponsor receipt or by a flagged canonical Assumption Register entry.
- Assign every functional requirement a stable ID, a priority, a provenance source, and its own binary/testable acceptance criterion so Proof can later test it and Keystone can size it.
- Attach a concrete numeric threshold (number + unit + condition) to every NFR, or an explicit 'N/A because …'; never accept a bare adjective like 'fast' or 'scalable'.
- Cite a provenance receipt (transcript/message/document id) for every sponsor-sourced item; label every non-sponsor item as 'assumption' (resolving to a canonical register id) or 'inferred'.
- Treat the sponsor as a continuous, non-blocking input channel: surface questions any time, fold in new info, update the record — but never pause discovery waiting for approval.
- On sponsor silence past the cadence window after N timestamped surfacing attempts, convert the open question into a logged, default-valued canonical assumption with blast_radius and surface it at the next review.
- Capture happy path AND at least one failure/edge path for every in-scope journey.
- Capture data sensitivity (PII/PHI), residency, and regulatory regime as first-class structured fields so Cipher has a basis for AppSec/privacy review.
- Log every contradiction on the canonical conflict log with old vs new and a resolution; turn unresolved conflicts into open questions or flagged canonical assumptions.
- Run the cheap internal red-team/devil's-advocate self-check (with Iris on personas/journeys) BEFORE handing to review_spec_iterate, to cut downstream churn — and label it self-check, not a ConvergenceVerdict.
- Run the schema/source/NFR/assumption/honesty/no-shadow lints and the completeness checklist, and treat their exit codes as the brick's done-receipt.
- Shape the output explicitly for the downstream review_spec_iterate loop so Verdict's panel has binary, receipt-backed checks to re-derive against.
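The silence rule in the list above (N timestamped surfacing attempts, then a flagged default) can be sketched like this. The function name, the question fields, and the choice to measure the cadence window from the first surfacing attempt are all assumptions for illustration; the document does not fix them.

```python
from datetime import datetime, timedelta, timezone


def resolve_on_silence(question, now, cadence_window, min_attempts):
    """Return a flagged assumption row if the question may proceed on silence, else None."""
    if question.get("answer") is not None:
        return None  # the sponsor answered, so no assumption is needed
    attempts = sorted(question["surfacing_attempts"])
    if len(attempts) < min_attempts:
        return None  # not enough timestamped attempts to justify proceeding
    if now - attempts[0] < cadence_window:
        return None  # still inside the window (assumed to start at the first attempt)
    return {
        "question_id": question["id"],
        "statement": f"Proceeding on default for: {question['text']}",
        "default_value": question["default_value"],
        "blast_radius": question["blast_radius"],
        "status": "FLAGGED",
        "surfacing_attempts": [t.isoformat() for t in attempts],  # the receipt
    }


t0 = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)
question = {
    "id": "Q-012",
    "text": "Which regions must store customer data?",
    "default_value": "EU only",
    "blast_radius": "HIGH",
    "answer": None,
    "surfacing_attempts": [t0, t0 + timedelta(days=1), t0 + timedelta(days=2)],
}
row = resolve_on_silence(question, now=t0 + timedelta(days=3),
                         cadence_window=timedelta(days=2), min_attempts=3)
```

The timestamps travel with the assumption row, so "proceeded on assumption" stays auditable.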
Don't (13)
- Do NOT re-emit a new Assumption Register, scope set, NFR set, conflict log, or success-metric schema with a different shape — the canonical artifacts from onboard already exist; extend them by ID.
- Do NOT invent your own success-metric definition; every success metric must map to an existing canonical Objectives & Success-Metric Rubric entry id (or propose a new rubric row via the canonical mechanism, never a private one).
- Do NOT introduce any human approval, sign-off, or gate — this brick authors; it does not seek sponsor sign-off.
- Do NOT block on sponsor latency; silence becomes a flagged canonical assumption, not a stall.
- Do NOT assert sponsor intent without a provenance receipt — an unsourced factual claim about what the sponsor wants is invalid and must fail the honesty lint.
- Do NOT let a HIGH-blast-radius assumption silently become load-bearing for architecture; it must be flagged on the canonical register and surfaced at the next review.
- Do NOT accept vague NFRs (no bare 'fast/scalable/secure') or functional requirements lacking a binary acceptance criterion.
- Do NOT silently overwrite an earlier record item when new input contradicts it — log it on the canonical conflict log.
- Do NOT ship prose-only themes; the deliverable must be the schema'd, ID'd, receipt-cited requirements.json with a deterministic PRD.md render.
- Do NOT treat the internal red-team self-check as the independent review; the independent panel runs in the downstream review_spec_iterate brick under the canonical Independent-Review Protocol.
- Do NOT mark the brick done unless the completeness checklist and all lints (including no-shadow) pass (exit 0).
- Do NOT bundle privacy/compliance/data-classification away as a vague NFR footnote; capture it as first-class structured fields.
- Do NOT invent personas, journeys, or metrics and present them as sponsor-stated; label inferred items as 'inferred'.
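The "no vague NFRs" rule above lends itself to a mechanical check. The following is a minimal sketch that assumes each NFR row carries either a structured `threshold` (`value`, `unit`, `condition`) or an `na_reason`; those field names are hypothetical, not the schema of requirements.json.

```python
def lint_nfr(nfr):
    """Return a list of problems for one NFR row; an empty list means it passes."""
    problems = []
    na_reason = (nfr.get("na_reason") or "").strip()
    threshold = nfr.get("threshold")
    if threshold is None:
        if not na_reason:
            problems.append("missing threshold and no N/A reason")
        return problems
    if not isinstance(threshold, dict):
        # A bare adjective such as "fast" lands here.
        problems.append("threshold must be structured, not free text")
        return problems
    value = threshold.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append("value must be numeric")
    for field in ("unit", "condition"):
        if not str(threshold.get(field) or "").strip():
            problems.append(f"missing {field}")
    return problems


good = {"id": "NFR-1", "threshold": {"value": 300, "unit": "ms",
                                     "condition": "p95 at 200 concurrent users"}}
vague = {"id": "NFR-2", "threshold": "fast"}
not_applicable = {"id": "NFR-3", "na_reason": "no public endpoint in scope"}
```

A lint wrapper could exit non-zero whenever any row returns a non-empty problem list.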
Guardrails (10)
- Canonical-extension gate (no-shadow): requirements.json must NOT declare its own assumption_register, conflict_log, or sponsor_question_log; assumptions/conflicts/questions live ONLY as rows on the canonical run-spanning artifacts opened in onboard_brief_and_input_channel, and every reference must resolve to a real canonical id — enforced by the no-shadow lint before handoff.
- Rubric-binding gate: every success metric resolves to an existing canonical Objectives & Success-Metric Rubric entry id; an unmapped success metric fails the completeness checklist.
- Honesty gate: the artifact must contain ZERO unsourced factual claims about sponsor intent; every sponsor-sourced item resolves to a real provenance id (and every assumption-sourced item to a real canonical register id) or the brick fails.
- Non-blocking cadence is auditable: 'proceeded on assumption' is only valid when backed by N timestamped surfacing attempts on the canonical Sponsor Question Log plus a canonical Assumption Register entry with a default_value — those timestamps are the receipt.
- HIGH-blast-radius assumptions must be visibly flagged on the canonical register, carry a validate_by date, and be explicitly surfaced at the next review; architecture must not silently depend on an unconfirmed HIGH assumption.
- Every functional requirement must have a source receipt AND a binary acceptance criterion; every NFR category must have a numeric threshold or explicit N/A-with-reason — enforced by lint before handoff.
- Completeness checklist is a hard gate: the brick cannot be marked done until every in-scope journey has happy+failure paths, every NFR is answered/N/A, every persona maps to a job, every open question is answered-or-canonical-assumption-logged, data classification is present where PII/PHI exists, success metrics are measurable and rubric-mapped, and the no-shadow/honesty/source/NFR/assumption lints exit 0 — and that checklist output is the done-receipt.
- Conflicts are never silently resolved: every contradiction is logged on the canonical conflict log with old/new and a resolution or an escalation to open-question/canonical-assumption.
- Ownership is AI+Human, with the Human as INPUT only: the Human provides information and answers; the Human never approves. There is no sign-off in this brick.
- Output contract is fixed: requirements.json conforms to the required schema and PRD.md is deterministically rendered from it; the brick emits into the downstream review_spec_iterate review_loop brick (Verdict-led panel under the canonical Independent-Review Protocol) and does not itself review or gate.
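The no-shadow guardrail above can be approximated with a short lint. This sketch assumes a hypothetical requirements.json layout (a top-level `functional_requirements` list whose items carry `canonical_refs`) and checks only top-level keys; the three forbidden key names come from the guardrail text.

```python
FORBIDDEN_KEYS = {"assumption_register", "conflict_log", "sponsor_question_log"}


def no_shadow_lint(requirements, canonical_ids):
    """Return violations; an empty list corresponds to exit code 0."""
    violations = [
        f"shadow register declared: {key}"
        for key in sorted(FORBIDDEN_KEYS.intersection(requirements))
    ]
    # 'functional_requirements' and 'canonical_refs' are assumed field names.
    for item in requirements.get("functional_requirements", []):
        for ref in item.get("canonical_refs", []):
            if ref not in canonical_ids:
                violations.append(f"{item['id']}: unresolved canonical ref {ref}")
    return violations


canonical_ids = {"A-001", "Q-012", "OBJ-1"}
clean = {"functional_requirements": [{"id": "FR-1", "canonical_refs": ["A-001", "OBJ-1"]}]}
forked = {
    "assumption_register": [],
    "functional_requirements": [{"id": "FR-2", "canonical_refs": ["A-999"]}],
}
```

Here `clean` passes, while `forked` fails twice: once for declaring its own register and once for a reference that does not resolve to a canonical id.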
Product Spec (PRD), Backlog & Roadmap
Synthesize the confirmed discovery requirements into a decision-ready Product Requirements Document plus a prioritized backlog of INVEST stories, each carrying machine-checkable acceptance criteria, a MoSCoW priority with a one-line value/effort (cost-of-delay) justification, a rough size, and an explicit MVP cut with a stated hypothesis and the single metric that will validate it post-launch. This brick is AUTHORING (kind=work): its job is to produce an artifact engineered to be FALSIFIED by the downstream independent review_loop, not admired. It consumes the upstream discovery record (requirements.json: stably-ID'd functional requirements and NFRs tagged confirmed-vs-inferred) and never invents unsourced requirements. CRITICAL CHANGE FROM PRIOR VERSION: this brick does NOT fork its own parallel assumption or NFR registers. It EXTENDS-IN-PLACE the single canonical, run-spanning Assumption Register opened in onboard (and enriched in requirements_discovery) BY ID, and the single canonical NFR/cross-cutting set in requirements.json#/nfrs BY ID. Any assumption this brick raises is added as a new row in that one canonical register using the SAME assumptions-as-liabilities schema declared in onboard {id, statement, criticality, blocks_bricks[], confidence, status, raised_to_sponsor, revalidation_trigger} PLUS the shared liability fields {owner, question_asked_to_human (= the Sponsor Question Log entry), default_on_silence, blast_radius, falsification_test}; there is exactly ONE assumption schema across the run, not a prd-local variant. Likewise any new NFR/cross-cutting requirement is appended to requirements.json#/nfrs under a stable NFR-ID with {numeric target+unit+condition, named verification method, owning brick/agent}, never re-declared in a prd-local NFR register. 
The human (sponsor) is a CONTINUOUS input channel surfaced through the canonical Sponsor Question Log, but is NEVER a sign-off gate: where the sponsor is silent the team proceeds on an explicit, flagged canonical-register assumption. The brick is "done" only when a machine-generated prd_lint receipt is green over the CANONICAL artifacts (zero OPEN requirements, every story has >=1 testable AC, every NFR has a numeric target + named verification method + owning brick/agent, zero orphan requirements, zero owner/falsifier-less assumptions, AND zero schema-fork violations: zero assumptions outside the canonical register, zero NFRs outside requirements.json#/nfrs) and the review-ready package has been emitted to the following review_spec_iterate brick. Every claim of self-bar passage is backed by the lint JSON receipt, never asserted.
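To make the single assumption schema described above concrete, the sketch below checks one row against the two field sets named in the text. The field names are taken from the description; the value types and the example values are assumptions, and `schema_violations` is a hypothetical helper, not part of prd_lint.

```python
CANONICAL_FIELDS = (
    "id", "statement", "criticality", "blocks_bricks", "confidence",
    "status", "raised_to_sponsor", "revalidation_trigger",
)
LIABILITY_FIELDS = (
    "owner", "question_asked_to_human", "default_on_silence",
    "blast_radius", "falsification_test",
)


def schema_violations(row):
    """Report fields missing from the one schema and fields foreign to it."""
    required = set(CANONICAL_FIELDS) | set(LIABILITY_FIELDS)
    present = set(row)
    return {"missing": sorted(required - present), "foreign": sorted(present - required)}


row = {
    "id": "A-014",
    "statement": "Peak load stays under 200 concurrent users in year one.",
    "criticality": "HIGH",
    "blocks_bricks": ["solution_architecture"],
    "confidence": 0.6,
    "status": "OPEN",
    "raised_to_sponsor": True,
    "revalidation_trigger": "sponsor answers Q-031",
    "owner": "Vela",
    "question_asked_to_human": "Q-031",  # a Sponsor Question Log entry id
    "default_on_silence": "design for 200 concurrent users",
    "blast_radius": "HIGH",
    "falsification_test": "load forecast from sponsor exceeds 200",
}
result = schema_violations(row)
```

A row that drops a liability field or adds a prd-local field would show up in `missing` or `foreign` respectively.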
Questions the agent asks (5)
- What are the top 1-3 outcomes you want this product/increment to achieve, and how would you measure success (the metric that tells us it worked)?
- Which capabilities are must-have for a first usable release versus nice-to-have later — is there a hard date, event, or commitment driving the MVP scope?
- Are there known constraints we must design within: per-client data isolation requirements, regulated/sensitive data, retention or privacy rules, target user volume, or specific availability/latency expectations?
- Which discovery requirements are confirmed by you versus inferred by the team — for any inferred ones, is our default-on-silence assumption acceptable, or do you want to correct it?
- Who are the primary user types/personas, and are there accessibility, language/locale, or device constraints we must support from day one?
Do (9)
- Consume requirements.json's stable requirement IDs and NFR-IDs as the source of truth; tag every requirement confirmed or inferred and trace each one forward to a story.
- EXTEND-IN-PLACE the single canonical run-spanning Assumption Register by ID: append any new assumption as a row in THAT register using the one canonical schema {id, statement, criticality, blocks_bricks[], confidence, status, raised_to_sponsor, revalidation_trigger} plus the shared liability fields {owner, question_asked_to_human, default_on_silence, blast_radius, falsification_test} — never start a prd-local register with a different schema.
- EXTEND-IN-PLACE requirements.json#/nfrs by stable NFR-ID for every new or refined cross-cutting requirement, with a numeric target+unit+condition, a named verification method, and an owning brick/agent (Cipher for security/privacy/isolation, Vector for SLO/availability, Proof for testability) — never re-declare NFRs in a separate prd register.
- Write every acceptance criterion in structured Given/When/Then or decision-table form with a stable AC-ID, so Proof can later bind a test to it and prove coverage.
- Maintain the trace matrix BIDIRECTIONALLY and run prd_lint to mechanically detect orphan requirements (requirement with no story), unfounded stories (story with no requirement or canonical assumption), and schema-fork violations (any assumption outside the canonical register, any NFR outside requirements.json#/nfrs).
- Tie every question raised to the canonical Sponsor Question Log entry ID so each assumption's question_asked_to_human resolves to a real run-spanning log row, not a prd-local note.
- Tie every MoSCoW assignment to a one-line value/effort or cost-of-delay justification so the ordering is defensible to an independent reviewer.
- Generate the prd_lint receipt and treat the brick as done only when it is green (including the schema-fork guards assumptions_register_count==1 and nfr_registers_count==1); carry the receipt's commit hash into the review-ready package manifest.
- When the sponsor is silent, proceed on an explicit flagged assumption logged in the CANONICAL register with a default_on_silence — keep moving, never block.
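The bidirectional trace check in the list above can be sketched as two set comparisons. The story fields (`requirement_ids`, `assumption_ids`) are assumed for illustration; the two counter names mirror the ones this brick's self-bar uses.

```python
def trace_lint(requirement_ids, stories, assumption_ids):
    """Detect orphan requirements and unfounded stories in one pass each."""
    known_reqs = set(requirement_ids)
    known_assumptions = set(assumption_ids)
    covered = {rid for s in stories for rid in s.get("requirement_ids", [])}
    orphans = sorted(known_reqs - covered)  # requirement with no story
    unfounded = sorted(
        s["id"] for s in stories
        if not (set(s.get("requirement_ids", [])) & known_reqs)
        and not (set(s.get("assumption_ids", [])) & known_assumptions)
    )  # story with no requirement and no canonical assumption
    return {
        "requirements_with_no_story": len(orphans),
        "stories_with_no_requirement_or_assumption": len(unfounded),
        "orphans": orphans,
        "unfounded": unfounded,
    }


stories = [
    {"id": "S-1", "requirement_ids": ["FR-1"]},
    {"id": "S-2", "assumption_ids": ["A-014"]},
    {"id": "S-3", "requirement_ids": ["FR-404"]},  # points at nothing real
]
report = trace_lint(["FR-1", "FR-2"], stories, ["A-014"])
```

In this example FR-2 is an orphan requirement and S-3 is an unfounded story, so both counters are 1 and the receipt would not be green.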
Don't (10)
- Do not fork a parallel assumptions register or a parallel NFR register — assumptions live only in the single canonical run-spanning Assumption Register, and NFRs live only under requirements.json#/nfrs; a second register is a hard lint failure (assumptions_register_count!=1 or nfr_registers_count!=1).
- Do not invent a prd-local assumption schema — every assumption row must carry the SAME canonical field set declared in onboard plus the shared liability fields; a row missing any of {owner, question_asked_to_human, default_on_silence, blast_radius, falsification_test} or any canonical field fails the lint.
- Do not invent unsourced requirements — every requirement must trace to requirements.json or be logged as a canonical-register assumption with a falsification test.
- Do not phrase any acceptance criterion as an unobservable adjective (fast, intuitive, secure, robust) without a number or an observable event.
- Do not write an NFR target that no downstream brick/agent is on the hook to verify — a number with no owner is theater; and do not collide a new NFR-ID with an existing requirements.json#/nfrs ID.
- Do not let cross-cutting concerns be OPEN-by-omission; the checklist must force each item to be addressed (as a canonical NFR-ID) or justified-NA.
- Do not insert any human sign-off, approval, or acceptance gate: the human provides input and answers through the canonical Sponsor Question Log and never approves (this brick must NOT inherit the legacy 'Human acceptance' pattern in docs/processes/08-product-development.md).
- Do not claim the self-bar passed without pointing to the green prd_lint JSON receipt.
- Do not block waiting for a sponsor answer; log the question in the canonical Sponsor Question Log, take the default_on_silence, and proceed.
- Do not let this brick attempt convergence/sign-off — its job ends at green receipt + package emitted; the following review_spec_iterate brick owns convergence.
Guardrails (10)
- Honesty architecture: every 'done/green/complete' claim is backed by the machine-generated prd_lint receipt (counts + commit hash); no assertion without the receipt. Reviewers RE-DERIVE these counts, never accept the author summary.
- SINGLE-REGISTER INVARIANT (the fix): there is exactly ONE Assumption Register and ONE NFR set for the whole run; this brick EXTENDS them in place by ID. prd_lint enforces assumptions_register_count==1, nfr_registers_count==1, assumptions_with_foreign_schema=0, assumption_ids_outside_canonical_register=0, nfrs_defined_outside_canonical=0, nfr_id_collision=0.
- ONE ASSUMPTION SCHEMA: every assumption row uses the canonical schema declared in onboard {id, statement, criticality, blocks_bricks[], confidence, status, raised_to_sponsor, revalidation_trigger} PLUS shared liability fields {owner, question_asked_to_human, default_on_silence, blast_radius, falsification_test}; no prd-local variant schema is permitted.
- Binary self-bar (machine-checked, not human-judged): requirements_open=0; stories_without_testable_AC=0; acs_unobservable=0; nfrs_missing_numeric_target_or_method=0; nfrs_missing_owner_brick_or_agent=0; requirements_with_no_story=0; stories_with_no_requirement_or_assumption=0; assumptions_without_owner_or_falsifier=0; crosscutting_items_unaddressed=0; plus all schema-fork guards green.
- Input contract is mandatory: this brick requires requirements.json with stable confirmed-vs-inferred-tagged requirement IDs + NFR-IDs and the open canonical Assumption Register; if absent, surface the gap upstream rather than fabricating requirements or forking a new register.
- AC format is fixed: structured Given/When/Then or decision-table, each with a unique stable AC-ID referencing exactly one story.
- Standing cross-cutting/NFR checklist is non-optional and resolves only into canonical NFR-IDs: authn/authz, per-client data isolation, retention/privacy, audit/ledger, observability/SLOs, accessibility, i18n, error/empty/loading states, rate limits, abuse cases — each an explicit canonical NFR-ID or justified-NA, never silent.
- No human gate anywhere; sponsor is a continuous input channel only, surfaced via the canonical Sponsor Question Log. Silence resolves to an explicit flagged canonical-register assumption with a default_on_silence, never to a block.
- Terminal output is the review-ready package (PRD + backlog + bidirectional trace matrix + the canonical Assumption Register including this brick's delta + the canonical requirements.json#/nfrs including this brick's NFR-IDs + green lint receipt) emitted to the review_spec_iterate brick, with the named panel (Verdict chair + Keystone, Cipher, Proof, Iris + red-team) — convergence is owned by that following loop, not here.
- No build artifacts outside an open sprint (CLAUDE.md Sprint Workflow).
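The binary self-bar and schema-fork guards listed above reduce to a check over the receipt's counters. The counter names below are the ones the guardrails name; the receipt layout (`counts` plus `commit_hash`) and the function name are assumptions for this sketch.

```python
MUST_BE_ZERO = [
    "requirements_open", "stories_without_testable_AC", "acs_unobservable",
    "nfrs_missing_numeric_target_or_method", "nfrs_missing_owner_brick_or_agent",
    "requirements_with_no_story", "stories_with_no_requirement_or_assumption",
    "assumptions_without_owner_or_falsifier", "crosscutting_items_unaddressed",
    "assumptions_with_foreign_schema", "assumption_ids_outside_canonical_register",
    "nfrs_defined_outside_canonical", "nfr_id_collision",
]
MUST_BE_ONE = ["assumptions_register_count", "nfr_registers_count"]


def receipt_is_green(receipt):
    """Return (green, failures). A missing counter fails: absence is never green."""
    counts = receipt.get("counts", {})
    failures = [name for name in MUST_BE_ZERO if counts.get(name) != 0]
    failures += [name for name in MUST_BE_ONE if counts.get(name) != 1]
    if not receipt.get("commit_hash"):
        failures.append("commit_hash")
    return (not failures, failures)


counts = {name: 0 for name in MUST_BE_ZERO}
counts.update({name: 1 for name in MUST_BE_ONE})
green_receipt = {"counts": counts, "commit_hash": "0f3c9a1"}  # illustrative hash
forked_receipt = {"counts": dict(counts, assumptions_register_count=2),
                  "commit_hash": "0f3c9a1"}
```

Treating a missing counter as a failure keeps the check aligned with the rule that nothing is asserted without a receipt.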
Independent Review & Iterate: Spec/PRD
Replace the legacy requirements human-sign-off gate with an AI-owned independent-review-and-iterate loop that converges only on evidence. A panel of NON-author agents — Verdict (objectives coverage + bidirectional traceability + adjudication), Keystone (feasibility), Cipher (security/privacy/abuse-case completeness), Proof (testability of every acceptance criterion), Iris (UX completeness) — plus an adversarial red-team pass critiques the PRD/backlog line-by-line against a single FROZEN, VERSIONED review rubric derived from the 7 process objectives. Every gap (including every red-team finding) becomes a first-class GapLog entry with reviewer-owned severity; Vela fixes; the loop re-reviews (delta on changed sections PLUS regression that prior SOLID verdicts still hold against the new artifact hash). The loop converges ONLY when open material gaps (severity >= major) == 0 AND every panel reviewer and the red-teamer record a receipt-backed SOLID verdict bound to one final artifact version+hash, with both traceability matrices complete. The human sponsor is NEVER an approval gate: when a gap needs human-only information the team logs an open question to the sponsor AND proceeds on an explicit flagged assumption recorded in an assumption register, so the loop never stalls into a de-facto human gate. Vela may not review her own PRD; no reviewer may have authored or co-authored the PRD or a parent artifact it derives from. This brick is the canonical template every downstream review_loop brick (architecture, sprint-plan, sprint-accomplishments) copies, so independence, receipts, and mechanical convergence are set here.
Questions the agent asks (5)
- For each gap tagged needs-human-info: what is the sponsor's answer? (logged as an open question; team proceeds on a flagged assumption until answered — never blocks)
- Where the brief is ambiguous about a stated need, which interpretation is correct? (assumption recorded inline in the PRD if unanswered)
- Are there constraints, data sources, or compliance obligations the sponsor knows about that are not yet captured as requirements?
- Are any flagged assumptions in the canonical Assumption Register unacceptable to the sponsor and in need of correction?
- Has the sponsor provided any NEW information since the last iteration that re-opens a closed assumption or adds a need to trace?
Do (9)
- Freeze and hash the review rubric BEFORE any verdict is cast; every reviewer attests line-by-line against that single rubric hash.
- Require every SOLID verdict to be receipt-backed: it must cite the PRD version+hash, rubric line ids passed, and PRD section/AC ids inspected — reject bare verdicts.
- Make convergence a mechanical function of GapLog state + verdicts against ONE final hash, never a judgment call.
- Let the reviewer who RAISED a gap own its severity; if reviewers split, Verdict adjudicates and records the ruling with reasoning so the loop never deadlocks.
- On each new PRD version+hash, run delta re-review of changed sections AND a regression check that prior SOLID verdicts still hold against the new hash; invalidate downstream verdicts touching changed sections.
- Promote EVERY red-team finding and every failing AC into the GapLog as a first-class entry with a severity.
- When a gap needs human-only info, log an open question to the sponsor AND proceed on an explicit flagged assumption — converge on the resolvable remainder.
- Enforce non-authorship/COI: confirm in the convergence record that no reviewer authored or co-authored the PRD or a parent artifact it derives from, and that the red-teamer is not the adjudicator.
- Require both traceability matrices (brief→PRD and requirement→AC) to be complete with zero orphans before declaring convergence.
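Freezing the rubric and rejecting bare verdicts, as the list above requires, might look like the sketch below. SHA-256 over a canonical JSON serialization is an assumption (the document does not name a hash), and the verdict field names are hypothetical.

```python
import hashlib
import json


def freeze_rubric(rubric_lines):
    """Hash a canonical JSON serialization so every reviewer attests to the same bytes."""
    payload = json.dumps(rubric_lines, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verdict_problems(verdict, rubric_hash, prd_hash):
    """Return the reasons a SOLID verdict is not receipt-backed; empty means acceptable."""
    problems = []
    if verdict.get("rubric_hash") != rubric_hash:
        problems.append("rubric hash mismatch")
    if verdict.get("prd_hash") != prd_hash:
        problems.append("not bound to the current PRD hash")
    if not verdict.get("rubric_line_ids"):
        problems.append("no rubric line ids cited")
    if not verdict.get("inspected_ids"):
        problems.append("no PRD section/AC ids cited")
    return problems


rubric = [{"id": "R-1", "text": "Every requirement traces to a story."}]
rubric_hash = freeze_rubric(rubric)
prd_hash = "prd-v3-hash"  # placeholder for the real artifact hash
backed = {"reviewer": "Proof", "verdict": "SOLID", "rubric_hash": rubric_hash,
          "prd_hash": prd_hash, "rubric_line_ids": ["R-1"], "inspected_ids": ["AC-12"]}
bare = {"reviewer": "Iris", "verdict": "SOLID"}
```

Sorting keys before hashing means two reviewers who load the same rubric with different key order still attest to the same hash.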
Don't (9)
- Do not insert any human approval/sign-off step — the human provides input and answers, never gates.
- Do not let the PRD author (Vela) sit on the review panel or adjudicate her own artifact.
- Do not accept a bare 'SOLID' or any verdict not bound to the final artifact hash.
- Do not let the author unilaterally downgrade a gap's severity to force convergence — only the raising reviewer or Verdict may change severity, with recorded history.
- Do not allow red-team findings or failing ACs to be acknowledged-and-ignored; closing requires a fix or a recorded rationale-for-no-action.
- Do not blanket-re-attest the whole PRD on every revision (rubber-stamp fatigue), and do not run delta-only re-review without a regression check (which lets fixes introduce new gaps elsewhere); run the delta re-review and the regression check together.
- Do not stall the loop waiting on the sponsor; convert any human-blocking gap into a flagged assumption + open question and continue.
- Do not let the loop spin unbounded; if it fails to converge after the iteration cap, escalate per loop-control rather than softening verdicts to terminate.
- Do not declare 'done' until the convergence record's binary checklist is fully true.
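The severity rule in the list above (only the raising reviewer or Verdict may change severity, with recorded history) can be sketched as a guarded update. The GapLog entry shape and the `change_severity` name are assumptions for illustration.

```python
def change_severity(gap, new_severity, actor, reason):
    """Apply a severity change only for the raising reviewer or Verdict, and log it."""
    allowed = {gap["raised_by"], "Verdict"}
    if actor not in allowed:
        raise PermissionError(f"{actor} may not change severity of {gap['id']}")
    gap["severity_history"].append(
        {"from": gap["severity"], "to": new_severity, "actor": actor, "reason": reason}
    )
    gap["severity"] = new_severity
    return gap


gap = {"id": "G-3", "raised_by": "Cipher", "severity": "major", "severity_history": []}
change_severity(gap, "critical", actor="Verdict", reason="adjudicated reviewer split")
```

An attempt by the author (for example `actor="Vela"`) raises `PermissionError` and leaves the entry and its history untouched.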
Guardrails (7)
- Independence is load-bearing: no reviewer may have authored/co-authored the PRD or a parent artifact it directly derives from; the red-teamer must be distinct from the convergence adjudicator (Verdict); this is attested in the convergence record.
- Honesty architecture: 'converged'/'SOLID' is invalid unless backed by a receipt — rubric hash, GapLog state, per-reviewer verdicts with cited rubric lines and section/AC ids, all bound to one final PRD hash. No claim without a receipt.
- Convergence is binary and version-pinned: open sev>=major gaps == 0 AND every panel reviewer + red-teamer SOLID against the SAME final version+hash AND both traceability matrices complete AND AC ledger has zero failures.
- Severity governance: reviewer-owned severity, recorded change history, Verdict as sole adjudicator on splits; convergence-by-downgrade is prohibited.
- Loop-control: a max-iteration cap with an explicit escalation path; a gap blocked on missing sponsor info is converted to a flagged assumption + open question (never a stall, never a silent guess).
- Every flagged assumption must be visible inline in the PRD and is automatically re-opened for re-resolution if the sponsor later answers the corresponding open question.
- This brick is the canonical review_loop template; downstream review_loop bricks (architecture, sprint-plan, sprint-accomplishments) must inherit the frozen-rubric, receipt-backed-verdict, reviewer-owned-severity, delta+regression re-review, and adjudicator rules unchanged.
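The binary, version-pinned convergence rule in the guardrails above can be written as a pure function of GapLog state and verdicts. This is a sketch: the severity scale (`major`, `critical`), the record shapes, and the function name are assumptions; the four conditions are the ones the guardrail lists.

```python
MATERIAL = {"major", "critical"}  # assumed scale; the text only fixes "severity >= major"


def converged(gap_log, verdicts, panel, final_hash, matrices_complete, ac_failures):
    """Return (is_converged, checklist); every checklist item must be True."""
    open_material = [g["id"] for g in gap_log
                     if g["status"] == "OPEN" and g["severity"] in MATERIAL]
    solid = {v["reviewer"] for v in verdicts
             if v["verdict"] == "SOLID" and v["artifact_hash"] == final_hash}
    missing = sorted(set(panel) - solid)
    checklist = {
        "no_open_material_gaps": not open_material,
        "all_solid_on_final_hash": not missing,
        "traceability_matrices_complete": bool(matrices_complete),
        "ac_ledger_zero_failures": ac_failures == 0,
    }
    return all(checklist.values()), checklist


panel = ["Verdict", "Keystone", "Cipher", "Proof", "Iris", "red-team"]
verdicts = [{"reviewer": r, "verdict": "SOLID", "artifact_hash": "h3"} for r in panel]
verdicts[2]["artifact_hash"] = "h2"  # Cipher's verdict is stale (previous version)
gap_log = [{"id": "G-1", "status": "CLOSED", "severity": "major"},
           {"id": "G-2", "status": "OPEN", "severity": "minor"}]
done, checklist = converged(gap_log, verdicts, panel, "h3", True, 0)
```

Here the loop has not converged: a SOLID verdict bound to an older hash does not count toward the final version.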
Solution Architecture, Threat Model & Privacy-by-Design
Keystone designs the solution that provably satisfies the PRD and every NFR: C4 views (context/container/component), a trust-boundary-annotated Data Flow Diagram, a justified tech stack, the data model, integrations, multi-tenant isolation model, feature-flag/progressive-delivery as a first-class decoupling capability, design-for-observability, and ADRs for every one-way-door decision. The backbone is falsifiable: every NFR is bound to a measurable target, a verification method, and a named test/probe id, with dominant NFRs and their trade-offs recorded as ADRs. Cipher produces a design-time STRIDE threat model anchored to the DFD trust boundaries — a Threat Coverage Matrix with zero blank cells (every container/flow x every STRIDE+agentic category is either a mitigation-with-id or an explicit accepted-risk-with-owner), covering prompt-injection, output-safety, data-exfiltration, tenant-isolation, and honesty-gate-coverage — plus a privacy-by-design artifact (data classification, PII inventory, retention/deletion, encryption at rest/in transit, DPIA where in scope) and a path-by-path Honesty-Gate Coverage Inventory that closes every bypass with a capability+receipt design decision (never a regex hook). Iris contributes the experience architecture and key flows. This brick AUTHORS the artifacts and proves them internally consistent and complete; it does NOT confer "solid" status — that is conferred only by the downstream Verdict-led independent review-and-iterate loop (a separate brick) staffed by agents who did NOT author this work. Where the sponsor is silent, the team proceeds on explicit flagged assumptions surfaced to the continuous human-input channel, never blocking on approval.
Questions the agent asks (8)
- What is the expected concurrency/load profile (peak concurrent flows, transactions/day) and the cost-per-flow budget we should design to?
- What are the data-residency and data-sovereignty obligations (regions, sovereignty requirements) for this client/workload?
- What retention and deletion obligations apply (regulatory or contractual) to each data class, and is right-to-erasure in scope?
- Which identity provider / SSO and authorization model must we integrate with, and are there existing IAM constraints?
- Is a formal DPIA required for this workload, and is any special-category/regulated PII (health, financial, biometric) in scope?
- What is the required tenant-isolation posture for this client (dedicated/silo vs pooled), and are there contractual isolation guarantees?
- What are the hard availability/SLO and latency commitments the architecture must meet, and which is dominant if they conflict with cost?
- Which external systems/connectors (MCP, SaaS APIs, data sources) must the solution integrate with, and what are their trust/security constraints?
Do (9)
- Anchor every STRIDE entry to a specific DFD flow crossing a specific trust boundary — STRIDE without the DFD + boundaries is not a threat model.
- Bind every NFR to a measurable target, a verification method, AND a named test/probe id so it is falsifiable by downstream QA and reviewers.
- Make the Threat Coverage Matrix and NFR Register mechanically complete: zero blank cells, zero missing fields — completeness is a binary gate, not a judgment call.
- Record dominant NFRs and resolve every NFR conflict as an explicit ADR (latency vs cost, isolation vs shared-cache, residency vs multi-model routing).
- Enumerate EVERY agent action path in the Honesty-Gate Coverage Inventory and close each bypass with a capability+receipt design decision.
- Treat the tenant-isolation model as a first-class architectural decision and a first-class threat-matrix boundary, given the prior parent/client read P0.
- Maintain the PRD/NFR → Architecture traceability map as the reviewers' checklist substrate so convergence is evidence-based, not opinion.
- Surface consequential silent decisions to the sponsor as flagged assumptions with chosen defaults and revisit triggers; proceed without blocking.
- Back every 'mitigation' and 'meets target' claim with a concrete artifact reference (ADR id, diagram element, or planned test/probe id).
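The anchoring rule in the Do list (every STRIDE entry tied to a DFD flow that crosses a trust boundary, and every mitigation backed by an artifact reference) can be checked mechanically. The sketch below is illustrative only: the `Flow` and `StrideEntry` shapes, the zone names, and the reference ids such as `ADR-hypothetical-01` are assumptions made for the example, not a schema defined by this brick.

```python
from dataclasses import dataclass

STRIDE_CATEGORIES = {"S", "T", "R", "I", "D", "E"}


@dataclass(frozen=True)
class Flow:
    flow_id: str
    source_zone: str  # trust zone of the sending DFD element (hypothetical field)
    dest_zone: str    # trust zone of the receiving DFD element (hypothetical field)


@dataclass(frozen=True)
class StrideEntry:
    entry_id: str
    category: str        # one of S, T, R, I, D, E
    flow_id: str         # the DFD flow this threat is anchored to
    mitigation_ref: str  # ADR id, diagram element, or planned test/probe id


def unanchored_entries(entries, flows):
    """Return (entry_id, reason) for each entry that breaks the anchoring rule."""
    by_id = {f.flow_id: f for f in flows}
    problems = []
    for e in entries:
        flow = by_id.get(e.flow_id)
        if e.category not in STRIDE_CATEGORIES:
            problems.append((e.entry_id, "unknown STRIDE category"))
        elif flow is None:
            problems.append((e.entry_id, "no such DFD flow"))
        elif flow.source_zone == flow.dest_zone:
            problems.append((e.entry_id, "flow crosses no trust boundary"))
        elif not e.mitigation_ref.strip():
            problems.append((e.entry_id, "mitigation has no artifact reference"))
    return problems


# Hypothetical example data.
flows = [Flow("F1", "internet", "app"), Flow("F2", "app", "app")]
entries = [
    StrideEntry("T-1", "S", "F1", "ADR-hypothetical-01"),
    StrideEntry("T-2", "T", "F2", "ADR-hypothetical-02"),
    StrideEntry("T-3", "I", "F9", "probe-hypothetical-03"),
    StrideEntry("T-4", "D", "F1", ""),
]
for entry_id, reason in unanchored_entries(entries, flows):
    print(entry_id, reason)
```

In this example only `T-1` passes; the other three entries each fail one of the anchoring checks and would be reported back to the authors as defects.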
Don't (9)
- Do NOT self-certify the architecture as 'solid', 'reviewed', or 'approved' — this brick only authors; solid status is conferred solely by the downstream Verdict-led independent review loop.
- Do NOT collapse authoring and independent review into this brick or let Keystone/Cipher/Iris grade their own work.
- Do NOT treat a clean C4 container diagram as 'the threat model' — without the DFD and trust boundaries the STRIDE matrix is meaningless.
- Do NOT ship a Threat Coverage Matrix with blank cells or an NFR Register with empty targets/verification/trace fields.
- Do NOT close any honesty-gate bypass with a regex/one-off hook; use a registered capability that writes a ledger receipt.
- Do NOT claim honesty-gate coverage broadly while leaving in-app/internal message paths uninventoried — list and classify every path.
- Do NOT bury consequential assumptions (residency, load, retention, identity) in prose instead of the flagged-assumptions register.
- Do NOT design to all NFR targets on paper while ignoring their real conflicts — a fantasy design that meets every target is a failure.
- Do NOT block on a human approval gate; the human is an input channel, not an approver.
Guardrails (7)
- Honesty: every completeness/coverage claim in this brick's outputs must cite a real receipt (blank-cell count, missing-field count, traceability %, git hashes) — never a bare assertion.
- Independence is mandatory downstream: the artifacts produced here must be validated by a separate Verdict-led panel (Verdict + an independent architect reviewer + Cipher-as-red-team + Proof + Vela) explicitly excluding the authors of this brick; this brick produces inputs to that loop, not a verdict.
- Convergence (conferred downstream, not here) is binary: 'solid' = zero open material gaps AND Threat Coverage Matrix zero blank cells AND NFR Register zero missing fields AND PRD/NFR traceability 100%.
- Platform security invariants are non-negotiable in the design: data-never-to-model, permission pre-filter at the data layer, LLM is never a trust boundary, and per-client/parent isolation.
- No secrets or credentials in any committed architecture, diagram, or ADR artifact.
- All artifacts version-controlled with recorded git hashes so reviewers and downstream bricks read receipts, not assertions.
- The brick produces files only within the open sprint; if none is open, the next sprint is opened first.
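The binary criteria named in the guardrails above (zero blank matrix cells, zero missing NFR fields, 100% PRD/NFR traceability, zero open material gaps) lend themselves to a small receipt generator. This is a minimal sketch under assumed data shapes: the matrix as a dict keyed by (flow, STRIDE category), NFR rows as dicts, and the field names in `REQUIRED_NFR_FIELDS` are all hypothetical. It only computes the numbers; the "solid" verdict itself is still conferred by the downstream review loop.

```python
REQUIRED_NFR_FIELDS = ("target", "verification", "probe_id")  # assumed field names


def _blank(value):
    """A cell or field counts as blank if it is None or whitespace-only."""
    return value is None or not str(value).strip()


def completeness_receipt(threat_matrix, nfr_register, requirement_ids,
                         trace_map, open_material_gaps):
    blank_cells = sum(1 for cell in threat_matrix.values() if _blank(cell))
    missing_fields = sum(
        1 for row in nfr_register for f in REQUIRED_NFR_FIELDS if _blank(row.get(f))
    )
    total = len(requirement_ids)
    traced = sum(1 for r in requirement_ids if trace_map.get(r))
    return {
        "blank_cells": blank_cells,
        "missing_fields": missing_fields,
        "traceability_pct": 100.0 * traced / total if total else 0.0,
        "open_material_gaps": open_material_gaps,
        # All four conditions must hold; there is no partial credit.
        "meets_binary_criteria": (
            blank_cells == 0 and missing_fields == 0
            and total > 0 and traced == total
            and open_material_gaps == 0
        ),
    }


# Hypothetical example data with deliberate holes.
matrix = {("F1", "S"): "ADR-hypothetical-01", ("F1", "T"): ""}
nfrs = [
    {"id": "NFR-1", "target": "p95 < 300 ms", "verification": "load test",
     "probe_id": "probe-hypothetical-01"},
    {"id": "NFR-2", "target": "99.9%", "verification": "", "probe_id": None},
]
receipt = completeness_receipt(matrix, nfrs, ["R1", "R2"],
                               {"R1": ["container-api"]}, open_material_gaps=0)
print(receipt)
```

Emitting the counts alongside the boolean keeps the claim re-checkable: a reviewer can recompute the same numbers from the artifacts instead of trusting the summary.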
Infrastructure, CI/CD, Supply-Chain & Security Baseline
Vector, Cadence, and Cipher author the foundational design for how the product is built, shipped, and run — and prove the security-critical controls are ENFORCED, not merely documented. Vector owns environment strategy, IaC topology, observability (logs/metrics/traces/alerts), backup/DR, deploy strategy + rollback, and the FinOps envelope; Cadence consolidates the end-to-end engineering process and the CI/CD pipeline with BLOCKING merge gates (tests pass + SCA/SAST/secret-scan pass + independent review approved before any merge); Cipher sets the security baseline, the secure-SDLC toolchain (SAST/DAST/SCA), the ASVS target level, and the supply-chain provenance controls (SBOM, dependency hash-locking, base-image/container scanning, signing + SLSA/in-toto attestation). The brick is explicit about the gap between DESIGNED (topology/policy documented) and ENFORCED (a negative-control receipt proves the gate actually blocks): no control is "solid" until a deliberately-failing case is shown to be rejected. This is a Phase-3 foundation every later sprint depends on, so the pipeline and gates exist before any sprint attempts a merge. The human sponsor is a continuous INPUT channel — cloud/region, cost ceiling, compliance regime, data classification, and on-call expectations are pulled as questions; where the sponsor is silent the team proceeds on an explicit flagged assumption recorded in the brick, never on a human approval gate. Every "pass/done/enforced" claim cites a receipt (run URL, scan report, SBOM hash, drill log, deploy-gate rejection); an un-receipted claim is treated as not done. The brick converges only when all binary criteria are green, the security-critical gates are in the ENFORCED state with negative-control receipts, sponsor-input questions are answered or flagged-assumption-recorded, and the paired independent-review loop (Verdict + adversarial Cipher/red-team + Keystone) logs zero open material gaps in a signed convergence verdict.
Questions the agent asks (8)
- Which cloud provider and region(s) are required, and are there data-residency constraints (e.g., must data stay in a specific jurisdiction)?
- What is the monthly cost ceiling / FinOps envelope we should design and alert against?
- What compliance regime applies (SOC2, HIPAA, PCI-DSS, GDPR, ISO 27001, none yet), and is there a target certification date?
- How sensitive is the data the system will handle (public, internal, confidential, regulated/PII/PHI/cardholder)? The compliance scope and controls are derived from this answer.
- What are your on-call and paging expectations (who gets paged, target response time, business-hours vs 24x7)?
- What RPO/RTO can the business tolerate for the product (max acceptable data loss and downtime)?
- Are there existing infrastructure, accounts, tooling, or vendor contracts (cloud, observability, secrets manager) we must reuse rather than provision new?
- Are there approved/blocked technology, license, or vendor constraints we must respect in the supply chain?
Do (9)
- Separate every control into DESIGNED (documented) and ENFORCED (negative-control receipt proves it blocks) and only count ENFORCED toward SOLID.
- Produce a negative-control for each security-critical gate: a failing test, a known-CVE dependency, a planted plaintext secret, and an unsigned artifact — and prove each is rejected.
- Make infra the source of truth: plan-on-PR, policy-as-code gating, drift detection, and no out-of-band console changes.
- Pull cloud/region, cost, compliance, data-classification, and on-call as continuous sponsor INPUT; record an explicit flagged assumption wherever the sponsor is silent and proceed.
- Use OIDC-federated short-lived CI credentials and keep all secrets in a manager with a rotation policy and a break-glass procedure.
- Actually run the DR restore drill and the rollback drill and capture timestamped receipts — do not infer from configuration.
- Instrument DORA metrics from real pipeline/deploy events with a named data source per metric.
- Cite a receipt (run URL, scan report, SBOM hash, drill log, deploy-gate rejection) for every 'pass/done/enforced' claim, indexed in the convergence note.
- Pair this brick with the independent-review-and-iterate loop (Verdict + adversarial Cipher/red-team + Keystone) and iterate until zero open material gaps.
Don't (8)
- Do not declare a gate done when it is advisory, nightly, or non-blocking — that is an open material gap, not a pass (echoing this org's Truth-Gate lesson).
- Do not accept 'a plan exists' or 'we have a rollback button / observability' as satisfying a deliverable; require the proving receipt.
- Do not insert any human approval/sign-off gate; the sponsor provides input only.
- Do not leave long-lived cloud keys or any plaintext secret in the repo, runner, or container image.
- Do not allow prod PII to flow into staging/dev or preview environments.
- Do not deploy unsigned or unattested artifacts; the deploy path must refuse them.
- Do not let compliance scope be a free-floating checkbox — derive it from sponsor regime + data classification and trace each control to it.
- Do not stall waiting on the sponsor; where silent, record a flagged assumption and continue.
Guardrails (7)
- SOLID exit = all binary criteria green with receipts + every security-critical gate ENFORCED with a negative-control receipt + sponsor inputs answered-or-flagged + Verdict/red-team panel logs zero open material gaps in a signed convergence note.
- Honesty architecture: an un-receipted 'enforced/done/live' claim is automatically an open material gap; the independent reviewer re-inspects receipt artifacts rather than trusting author summaries.
- The human is never a gate — residency, cost, compliance, data-classification, and on-call are input channels; the team proceeds on explicit flagged assumptions when the sponsor is silent.
- No secret, credential, or token is committed to any tracked file or baked into any image; CI uses short-lived OIDC credentials only.
- This is a Phase-3 foundation: the pipeline and blocking gates must exist and be proven before any sprint attempts a merge.
- All work occurs within an open sprint; no files are generated outside one.
- Independent reviewers must be agents who did NOT author the artifact under review (Verdict lead; Cipher adversarial on security/supply-chain; Keystone on architecture fit; red-team attempts to bypass the gates).
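The DESIGNED versus ENFORCED distinction can be expressed as a small classifier: a control counts as ENFORCED only when the gate is blocking and a negative-control receipt shows a deliberately failing case being rejected. The following is a sketch under assumptions; the class names, field names, and receipt strings such as `run-hypothetical-17` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NegativeControl:
    planted_defect: str  # e.g. "planted plaintext secret"
    rejected: bool       # did the gate actually refuse the bad case?
    receipt: str         # run URL, scan report, or deploy-gate rejection id


@dataclass(frozen=True)
class Control:
    name: str
    blocking: bool  # False for advisory, nightly, or non-blocking gates
    negative_control: Optional[NegativeControl] = None


def control_state(control):
    """Return 'ENFORCED' only with a blocking gate plus a proving receipt."""
    nc = control.negative_control
    if not control.blocking or nc is None:
        return "DESIGNED"
    if not nc.rejected or not nc.receipt.strip():
        return "DESIGNED"
    return "ENFORCED"


# Hypothetical example data.
controls = [
    Control("secret-scan", True,
            NegativeControl("planted plaintext secret", True, "run-hypothetical-17")),
    Control("sca-gate", False,
            NegativeControl("known-CVE dependency", True, "run-hypothetical-18")),
    Control("artifact-signing", True),
]
states = {c.name: control_state(c) for c in controls}
print(states)
```

Here only `secret-scan` reaches ENFORCED. The nightly-style `sca-gate` and the un-receipted `artifact-signing` stay DESIGNED, which under the guardrails above makes each an open material gap.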
Independent Review & Iterate: Architecture, Infra, Process & Security Plan
Replace the former human architecture sign-off gate with an AI-owned independent-review-and-iterate loop that converges ONLY on re-checkable evidence, never on a reviewer's say-so. A panel of NON-authors — Verdict (standing independent evaluator) plus cross-discipline specialists reviewing OTHERS' artifacts (Mason on buildability, Vector and Cipher cross-reviewing each other's infra/security lenses, Proof on testability, Iris on UX-architecture fit) — plus an adversarial red-team STRIDE "break-the-design" pass, validates the whole Architecture/Infra/Process/Security plan against BOTH the Objectives rubric AND the upstream PRD/NFRs. The loop is structurally and procedurally independent: reviewers receive artifact+rubric only (never the author's self-grade), file findings blind-first before cross-reading, and disagreement is preserved not averaged. Findings are classified with the canonical assemble_team Independent-Review-Protocol severity taxonomy verbatim — Blocker / Major / Minor — and convergence is a binary state over an append-only gap-ledger: zero open Blockers, zero open Majors, every Minor accepted-with-written-rationale, plus one clean stabilization pass where the latest fixes introduced zero new gaps. Cap behavior is HARD-STOP-CONSERVATIVE for Blockers, matching the security/go-live bricks: Major and Minor gaps unresolved at the iteration cap may proceed-on-a-flagged-assumption, but an unresolved BLOCKER (a one-way-door decision OR an unmitigated security/privacy control) at the cap does NOT proceed on an AI assumption — the plan does NOT advance past that specific Blocker until it is resolved in a later evidence-backed iteration OR a human explicitly, non-AI-grantably risk-accepts it; silence defaults to does-not-advance, never auto-accept. 
Where the only available reviewers for an artifact are its own authors (e.g., Cipher/Vector reviewing a security/infra artifact they co-authored), the assemble_team independence fallback is invoked: a red-team sub-persona or external-evaluator escalation supplies the non-author lens so no artifact is certified by its author. Every NFR target, STRIDE mitigation, and one-way-door ADR acceptance must cite a concrete basis (benchmark, vendor SLA, prior-art, named control + location in the container model) or be tagged ASSUMPTION; the red-team must attempt to DEFEAT each claimed mitigation, so coverage counts cannot be checkbox-gamed. Keystone (and the other authors) sit on the FIX side and iterate until SOLID. The convergence verdict is itself a receipt — enumerating every rubric line, NFR, ADR, and STRIDE cell with pass marker + evidence pointer + raising-reviewer identity + iteration number — so a meta-auditor or Aseem can spot-check any three rows and catch a fabricated "all pass."
Questions the agent asks (5)
- For any one-way-door decision the panel cannot resolve on evidence (e.g., cloud region for data residency, single-tenant vs pooled-with-isolation), which option do you want? Note: unlike a Major gap, an unresolved BLOCKER one-way-door at the cap will NOT proceed on an AI assumption — the plan will not advance past it until you answer or explicitly risk-accept it, so your input here is load-bearing.
- Are there hard NFR targets you already require (latency, availability/SLO, RPO/RTO, max cost envelope) that the architecture must be held to, beyond what the PRD states?
- Are there compliance, data-residency, or per-client-isolation constraints (regulatory regime, contractual BPO commitments) that must be treated as non-negotiable one-way-door ADRs?
- Is there an approved dependency-license allowlist/denylist we must conform the SBOM to, or do we apply the existing open-source-license-policy default?
- Do you want the iteration cap left at the default (4) before unresolved gaps hit cap behavior, or a different ceiling? And for any Blocker we surface at the cap, do you want to risk-accept it (your explicit, owned call) or hold the plan at that point until it is resolved?
Do (11)
- Enforce the entry-criteria gate BEFORE iteration 1; refuse to start the loop on a half-baked plan so 'open gaps == 0' cannot be reached trivially.
- Bind the severity rubric to the canonical assemble_team Independent-Review-Protocol taxonomy and reuse Blocker/Major/Minor VERBATIM — do not invent or rename tiers for this loop.
- Keep reviewers non-authors: Verdict reviews everything, specialists review OTHERS' artifacts, Cipher and Vector cross-review each other's lens; record each reviewer identity in the ledger.
- When the only qualified lens-holders for an artifact are its own authors, invoke the assemble_team independence fallback (red-team sub-persona or external-evaluator escalation) so the non-author lens exists, and record that identity.
- Give reviewers the artifact + rubric only; collect blind-first independent findings before any cross-reading; preserve disagreements verbatim.
- Apply cap behavior PER TIER: at the cap, let Major/Minor gaps proceed-on-a-flagged-assumption, but HARD-STOP every unresolved Blocker (one-way-door or unmitigated security control) so the plan does NOT advance past it.
- For each capped Blocker, surface it to the human as an INPUT question and either get a human-authored, human-owned risk-acceptance for that specific Blocker or hold the plan at DOES-NOT-ADVANCE; default silence to does-not-advance.
- Require every NFR target / mitigation / one-way-door acceptance to cite a concrete basis or be explicitly tagged ASSUMPTION.
- Make the red-team DEFEAT each claimed STRIDE mitigation, not merely confirm its presence; log the bypass attempt and outcome.
- Run the no-new-gap stabilization pass as the last iteration and only then declare SOLID.
- Make the convergence verdict enumerate every rubric/NFR/ADR/STRIDE cell with evidence pointers so it is itself spot-auditable.
Don't (11)
- Do NOT let any author grade their own artifact, and do NOT let one reviewer's 'LGTM' cascade into the others' findings.
- Do NOT let a capped, unresolved BLOCKER proceed on an AI-authored / flagged assumption — that path is reserved for Major/Minor; a Blocker advances only via human risk-acceptance or a later evidence-backed resolution.
- Do NOT let the AI self-grant, assume, or back-fill a Blocker risk-acceptance; that record is human-authored and human-owned only.
- Do NOT treat human silence on a capped Blocker as acceptance; silence defaults to DOES-NOT-ADVANCE (the plan holds at that Blocker), never auto-proceed.
- Do NOT rename, re-scope, or weaken the Blocker/Major/Minor tiers mid-loop; the canonical assemble_team taxonomy is fixed for the loop.
- Do NOT accept a bare 'converged / solid / N iterations' without the enumerated, evidence-pointered verdict.
- Do NOT count coverage proxies (>=1 mitigation per cell) as sufficient without the quality bar (named control + protected asset + residual-risk + defeat-attempt).
- Do NOT declare convergence on the same iteration that closed the last fix; a clean stabilization pass is required.
- Do NOT certify any artifact with fewer than 2 distinct NON-AUTHOR reviewer identities; if authors are the only lens, the independence fallback must supply one.
- Do NOT insert a human approval/sign-off step anywhere; the human provides input (including the optional Blocker risk-acceptance), and the non-Blocker plan never waits on the human.
- Do NOT leave the CI merge-gate designed as nightly/non-blocking; the plan must make it blocking/enforced.
Guardrails (10)
- Honesty architecture: every 'pass/solid/converged' claim must carry a real receipt (reviewer identity, gap-ledger row, numeric NFR target, threat-model coverage + defeat-attempt log, git hash / ADR id); unbacked claims are rejected.
- HARD-STOP-CONSERVATIVE ON BLOCKERS: at the iteration cap, an unresolved Blocker (one-way-door decision OR unmitigated security/privacy control) does NOT proceed on an AI assumption — it surfaces to the human as INPUT and the plan does NOT advance past that specific Blocker until it is resolved by evidence in a later iteration OR a human, non-AI-grantable risk-acceptance exists; Major/Minor may proceed-on-flagged-assumption, Blocker may not. This matches the security_assessment / review_release_readiness / hypercare-exit cap behavior (a real critical stays held regardless of human-input state; silence = not-accepted).
- BLOCKER RISK-ACCEPTANCE IS HUMAN-OWNED ONLY: the only override of a capped Blocker's DOES-NOT-ADVANCE default is a human-authored, human-owned, logged risk-acceptance naming that specific Blocker; the AI may propose/surface but never self-grant, assume, or back-fill it; silence defaults to does-not-advance.
- CANONICAL SEVERITY TAXONOMY: the loop uses the assemble_team Independent-Review-Protocol Blocker/Major/Minor tiers verbatim; redefining 'material' or renaming tiers mid-loop is forbidden.
- AI-owned, no human sign-off: ownership is AI; the human touchpoint is INPUT (answers/assumptions, plus the optional Blocker risk-acceptance), recorded as such; the non-Blocker plan never blocks on the human.
- Independence is mandatory and auditable: a converged verdict with fewer than 2 distinct non-author reviewer identities (after the author-only-lens fallback), or with no blind-first attestation, is invalid.
- Tenant/per-client isolation MUST be treated as a first-class challenged one-way-door ADR (BPO isolation pillar) and therefore a Blocker if unresolved; architecture that does not establish isolation cannot converge.
- Iteration cap (default 4) is a tripwire, not a license to lower the bar: at the cap, Majors/Minors may proceed-on-assumption but Blockers escalate to human INPUT and are never auto-accepted or AI-assumed.
- Gap-ledger is append-only; closing a finding requires a fix-evidence-pointer, not a status flip; a capped Blocker requires a cap-disposition of RESOLVED-LATER, HUMAN-RISK-ACCEPTED-<ref>, or DOES-NOT-ADVANCE.
- The convergence verdict must be re-checkable by a skeptic (Aseem/meta-auditor) on any spot-checked row; a verdict that cannot be spot-audited fails this guardrail.
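The per-tier cap behavior and the binary convergence rule described in these guardrails can be sketched as two small functions. This is an illustration under assumptions: the `Gap` record and its field names are hypothetical, the `CLOSED` and `PROCEED-ON-FLAGGED-ASSUMPTION` labels are invented for the example (only the Blocker dispositions are named in the text), and the append-only ledger is simplified to a plain list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gap:
    gap_id: str
    severity: str                                # "Blocker" | "Major" | "Minor"
    fix_evidence: Optional[str] = None           # pointer proving the fix
    minor_rationale: Optional[str] = None        # written acceptance for a Minor
    human_risk_acceptance: Optional[str] = None  # human-authored record reference


def is_open(gap):
    if gap.fix_evidence:
        return False
    # Only a Minor may be closed by a written rationale instead of a fix.
    return not (gap.severity == "Minor" and gap.minor_rationale)


def cap_disposition(gap):
    """What happens to a gap when the iteration cap is reached."""
    if not is_open(gap):
        return "CLOSED"
    if gap.severity == "Blocker":
        if gap.human_risk_acceptance:
            return f"HUMAN-RISK-ACCEPTED-{gap.human_risk_acceptance}"
        return "DOES-NOT-ADVANCE"  # silence never auto-accepts a Blocker
    return "PROCEED-ON-FLAGGED-ASSUMPTION"


def converged(gaps, new_gaps_in_stabilization_pass):
    """Binary: nothing open, and the last pass introduced nothing new."""
    return all(not is_open(g) for g in gaps) and new_gaps_in_stabilization_pass == 0


# Hypothetical ledger contents at the cap.
ledger = [
    Gap("G-1", "Blocker"),
    Gap("G-2", "Major"),
    Gap("G-3", "Minor", minor_rationale="cosmetic; accepted in writing"),
    Gap("G-4", "Blocker", human_risk_acceptance="RA-hypothetical-1"),
]
for g in ledger:
    print(g.gap_id, cap_disposition(g))
```

Note that a risk-accepted Blocker gets a cap disposition but is still open in the ledger, so `converged` stays false for this example; whether risk-acceptance should also count toward convergence is a policy choice this sketch does not settle.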
Release Plan
Cadence and Vela carve the reviewed MVP spec into a value-sequenced set of sprints — quick wins first to prove the system and de-risk, foundational and strategic bets sequenced behind their prerequisites — where every in-scope spec/PRD requirement maps to exactly one sprint (full coverage, zero orphans, zero double-maps), each sprint has a single demoable goal, a named story set, an explicit capacity/velocity assumption, and a defined entry/exit, and every cross-sprint and external/human-input dependency forms an acyclic, time-consistent graph (each producing sprint precedes its consumer). The requirement-to-sprint coverage matrix is not a throwaway table: it is declared as the Release-Scope Register — the single, NAMED, VERSIONED, single-location, AUTHORITATIVE-and-MUTABLE record of release scope from which sprint_retro's Scope-Burndown and Done-Done receipts are DERIVED (never re-typed), and which is RE-SEQUENCED-IN-PLACE (new version) after each sprint's accomplishments-vs-goals review. This register is the explicit owner of release scope, which is precisely what makes the outer sprint-loop exit deterministic: the loop's PROCEED/LOOP decision reads the register's current version, so there is exactly one place that says what release scope is and whether it is exhausted — closing the unspecified-owner gap that otherwise makes the outer-loop exit non-deterministic. Risks are owned and placed — each carries likelihood/impact, a named owning agent, and either a mitigation that lands in a specific sprint or an explicit accept-decision — and are written into the program RAID log rather than a throwaway list. This is a reviewable PLANNING artifact, distinct from the engineering bootstrap: it answers "what we build, in what order, and why that order" — never "how we stand up the repo/CI." 
Because the plan's completeness claims (100% coverage, acyclic graph, every risk owned-and-placed, register is the single authoritative source) are checkable receipts, the brick ends in a LIGHT but receipt-bound independent review-iterate pass: an independent panel that did NOT author the plan (Verdict lead, plus Keystone for architecture-vs-dependency consistency and Proof for per-sprint demoability/testability) runs a binary checklist and logs material gaps, Cadence/Vela fix them, and the loop iterates until zero open material gaps remain and a convergence verdict is recorded citing the specific receipts. The sponsor is a continuous input source for priority and data/access availability, surfaced as questions and never as an approval gate; where the human is silent the team proceeds on an explicit flagged assumption. The plan is a living instrument re-sequenced after each sprint's accomplishments-vs-goals review, closing the lifecycle loop through the register.
Questions the agent asks (5)
- Which outcomes are most urgent for you — what are the quick wins where visible value in the first one to two sprints matters most, so we sequence those first?
- When will the data, third-party/IdP, or connector access that some sprints depend on become available? (If you are silent, we will proceed on a flagged assumption and order those sprints accordingly.)
- Are there any hard external deadlines (a demo, a board date, a compliance window) that should constrain the sequence?
- Are there requirements you consider out-of-scope for the MVP that we should exclude from coverage, or stretch goals to park in a later sprint? (We record each as an explicit scope-state in the Release-Scope Register so nothing silently disappears.)
- Is there a team-capacity or availability constraint we should assume for velocity (team size, parallelism, blackout periods)?
Do (11)
- Sequence by VALUE: quick wins first to prove the system and de-risk early; foundational/strategic bets sequenced behind their prerequisites; document the sequencing rationale explicitly.
- Map every in-scope spec/PRD requirement to exactly one sprint, and trace each to at least one named story (full coverage, no orphans, no double-maps) — captured in the Release-Scope Register.
- Declare the coverage matrix as the Release-Scope Register: a single, named, versioned, single-location, authoritative-and-mutable record of release scope — and state in-artifact that sprint_retro's Scope-Burndown and Done-Done receipts are DERIVED from it, never re-typed.
- Write the register-governance contract: name the single scope source, the two derived receipts, the version-bump trigger (re-sequence after each accomplishments-vs-goals review), the owning agent (Vela), and the outer-loop exit rule reading the register's current scope-remaining.
- Re-version the register IN PLACE after each sprint — record what moved and why — so the living plan and the exit decision always read the same current scope.
- Give every sprint a single demoable goal, a named story set, an explicit capacity/velocity assumption, and a defined entry/exit.
- Build the story/sprint dependency graph and verify it is acyclic and time-consistent (every producer precedes its consumer).
- Flag external and human-input dependencies (sponsor data, third-party/IdP/connector access) as explicit assumptions with a needed-by date and a date-or-proceed-on-assumption rule; record them as external dependencies in the graph.
- Write every risk into the program RAID log with likelihood, impact, a named owning agent, and a mitigation-sprint-or-explicit-accept.
- Surface priority/urgency and data-availability questions to the sponsor continuously; where the human is silent, proceed on an explicit flagged assumption recorded in the plan and graph — never block.
- Have the independent panel (Verdict + Keystone + Proof, none of whom authored the plan) run the binary checklist, RE-DERIVE the coverage count from the register, cite a receipt per PASS, log gaps, and iterate to a cited-receipt convergence verdict.
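The reviewer's re-derivation of coverage from the Release-Scope Register can be sketched as a pure function over the register rows. The row shape, a `(requirement_id, sprint_id)` pair, and the example ids are assumptions for illustration; the real register schema is not specified in this section.

```python
def rederive_coverage(in_scope_requirements, register_rows):
    """Recompute coverage from register rows of (requirement_id, sprint_id)."""
    sprints_by_req = {}
    for req, sprint in register_rows:
        sprints_by_req.setdefault(req, set()).add(sprint)

    orphans = sorted(r for r in in_scope_requirements if r not in sprints_by_req)
    double_maps = sorted(
        r for r in in_scope_requirements if len(sprints_by_req.get(r, ())) > 1
    )
    total = len(in_scope_requirements)
    mapped_once = total - len(orphans) - len(double_maps)
    return {
        "total": total,
        "mapped_exactly_once": mapped_once,
        "orphans": orphans,
        "double_maps": double_maps,
        "full_coverage": total > 0 and mapped_once == total,
    }


# Hypothetical register contents: R3 is unmapped, R2 is mapped twice.
requirements = ["R1", "R2", "R3"]
rows = [("R1", "S1"), ("R2", "S1"), ("R2", "S2")]
print(rederive_coverage(requirements, rows))
```

Because the function takes the register rows as input rather than the author's stated count, the reviewer's number and the author's number can be compared, and any mismatch becomes a logged gap.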
Don't (13)
- Do NOT sequence by technical or build-team convenience — sequence by value delivery and strategic leverage.
- Do NOT produce a flat backlog with no single demoable per-sprint goal.
- Do NOT leave any requirement unmapped or mapped to more than one sprint.
- Do NOT let release scope live in more than one place — there is ONE named, versioned, authoritative Release-Scope Register; any second scope list is a defect, because two sources make the outer-loop exit non-deterministic.
- Do NOT let downstream receipts (Scope-Burndown, Done-Done) re-type or diverge from the register — they are derived from it and must reconcile to its current version.
- Do NOT mutate scope informally — every scope change is a versioned re-sequence with a recorded reason and a named owner (Vela), not a silent edit.
- Do NOT let a plan author (Cadence or Vela) review or sign off their own plan — the review panel must be independent, and the coverage count must be re-derived by the reviewer, not accepted from the author.
- Do NOT mark the plan 'reviewed' or 'solid' without a written gaps log and a convergence verdict whose every PASS cites a specific receipt — an assertion with no cited evidence is a rejected verdict.
- Do NOT ship a dependency graph with a cycle or a producer ordered after its consumer.
- Do NOT leave a risk unowned, unscored, or unplaced, or keep risks in a standalone list instead of the RAID log.
- Do NOT block on sponsor approval of the plan — the sponsor provides input, never a gate; proceed on flagged assumptions where the human is silent.
- Do NOT conflate this with the engineering bootstrap (repo/CI/infra) — this brick is sequencing and planning only.
- Do NOT treat the plan as frozen — it is re-sequenced (re-versioned) as sprints close.
Guardrails (9)
- GATE: the plan does not enter sprint execution until the independent pass logs ZERO open material gaps with a cited-receipt convergence verdict from a panel (Verdict + Keystone + Proof) where no reviewer co-authored the plan.
- Single-source-of-scope is law: there is exactly ONE named, versioned, authoritative Release-Scope Register; the outer sprint-loop exit (sprint_retro PROCEED/LOOP) reads ITS current version, and Scope-Burndown + Done-Done receipts are derived from it — making the loop exit deterministic. A second scope list or an implicit exit source is invalid.
- Coverage is a hard receipt re-derived at review: 100% of in-scope spec requirements mapped to exactly one sprint (zero orphans, zero double-maps), the count re-computed by the independent reviewer from the register, not accepted from the author.
- Register mutability is governed: scope changes only via a versioned re-sequence after a sprint's accomplishments-vs-goals review, each version recording what moved, why, and under Vela's named ownership — never a silent edit.
- The dependency graph must pass the acyclicity AND time-consistency checks (cycles=0, time-violations=0) before convergence; external/human-input dependencies are flagged assumptions, never silent gaps.
- Every risk is owned, scored, and placed in the RAID log with a mitigation-sprint-or-accept; no rubber-stamped or floating risks.
- Honesty architecture: every completeness claim (coverage, acyclicity, risk ownership, single-source register) is a checkable receipt; the review verdict is INVALID unless each PASS cites its specific receipt and the coverage count is re-derived.
- Human-as-input-not-gate: the sponsor is solicited for priority and data/access availability but never blocks; silence is handled by an explicit flagged assumption, not a stall.
- Living-artifact: the register and plan are re-versioned after each sprint's accomplishments-vs-goals independent review, closing the lifecycle loop through the single authoritative scope source.
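The acyclicity and time-consistency checks on the dependency graph can be sketched with the standard-library `graphlib` module (Python 3.9 or later). The edge representation, the story names, and the choice to allow producer and consumer in the same sprint are assumptions of this example rather than rules stated by the brick.

```python
from graphlib import CycleError, TopologicalSorter


def check_dependency_graph(edges, sprint_of):
    """edges: (producer, consumer) pairs; sprint_of: story -> sprint number."""
    sorter = TopologicalSorter()
    for producer, consumer in edges:
        sorter.add(consumer, producer)  # consumer depends on producer
    try:
        sorter.prepare()                # raises CycleError if any cycle exists
        has_cycle = False
    except CycleError:
        has_cycle = True

    # Assumption: a producer may share a sprint with its consumer,
    # but must never be scheduled in a later sprint.
    time_violations = [
        (p, c) for p, c in edges if sprint_of[p] > sprint_of[c]
    ]
    return {"has_cycle": has_cycle, "time_violations": time_violations}


# Hypothetical stories and sprint placements.
sprint_of = {"auth": 1, "profile": 2, "billing": 2}
good_edges = [("auth", "profile"), ("profile", "billing")]
bad_edges = good_edges + [("billing", "auth")]

print(check_dependency_graph(good_edges, sprint_of))
print(check_dependency_graph(bad_edges, sprint_of))
```

The two checks are reported separately because they fail for different reasons: a cycle is a structural defect in the graph, while a time violation is a sequencing defect in the sprint placement.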
Sprint 0: Walking Skeleton & Pipeline Bootstrap
Before a single feature is built, prove the production delivery road exists and is honest-by-receipt: a trivial change must traverse build -> tests -> SAST/SCA/secret-scan -> SBOM(attested) -> deploy(dev/test) -> post-deploy smoke, with EVERY gate wired as a REQUIRED, BLOCKING status check on a protected branch, evidenced by an immutable CI run ID. The done-condition is artifact-anchored, never asserted: the brick is only complete when BOTH a green-path receipt (the good trivial change passes all stages) AND five red-path receipts (each of five planted-defect changes is independently BLOCKED with a failing run ID) are attached. This brick also stands up the honesty-architecture substrate the entire downstream process depends on — a receipt/ledger store and receipt schema (run ID, git SHA, gate verdicts, scan summaries, SBOM digest, reviewer verdict) so every later "done/passed/live" claim has an honest place to anchor. Vector owns the pipeline + environments + deploy/rollback; Mason owns the repo skeleton, coding standards, and machine-readable test harness; Cipher owns SAST/SCA/secret-scanning + attested SBOM + CI identity/secrets hardening; Iris establishes a versioned design-system baseline wired into the same blocking CI. Critically, Vector does NOT get to declare the pipeline "enforced": the brick feeds a paired Independent Review & Iterate loop (Verdict + non-author specialists + an adversarial red-team merge attempt) that converges, with evidence, on a verdict before Sprint 0 closes. Where the human's brief leaves choices open (cloud target, language, severity thresholds), the team proceeds on an explicit, flagged assumption logged as a receipt and surfaces it to the human as input — never blocking on approval.
Questions the agent asks (6)
- What is the target cloud/runtime for the dev/test (and eventual production) environment — or should the team proceed on a flagged default and you correct later?
- What primary language/stack should the skeleton use, or do you want the team to choose based on the product brief and log it as an assumption?
- What severity threshold should block the pipeline (e.g. block on High and Critical), and are there any known dependencies/findings you want pre-allowlisted?
- Are there compliance or data-residency constraints (e.g. region, no third-party SaaS scanners) that constrain where CI runs and where receipts/artifacts are stored?
- Is there an existing repo, CI platform, artifact registry, or secrets manager you want reused, or should the team stand up new ones?
- What artifact/log retention window do you require so CI run-ID receipts remain immutable and non-expiring for audit?
Do (9)
- Prove gates BLOCK with real red-path receipts (failing run IDs) — a green run only proves the happy path.
- Wire every gate as a REQUIRED status check on a protected branch with no admin bypass and force-push disabled; capture the branch-protection API output as a receipt.
- Generate the SBOM per build, attach it as an immutable artifact, attest it (cosign/SLSA/in-toto), and key the SCA vuln gate to it.
- Make the deploy real to a dev/test environment with a blocking post-deploy smoke gate and a proven rollback.
- Pin the toolchain and lockfiles and prove determinism by re-running the same SHA for an identical verdict.
- Use keyless/OIDC for deploy and signing; keep the runner least-privilege; store no long-lived admin secrets in repo or CI.
- Stand up the receipt/ledger substrate first so every Sprint-0 deliverable (and every downstream brick) anchors its claims to a receipt.
- When the human's brief leaves a choice open, proceed on an explicit, flagged assumption logged as a receipt and surface the question as input.
- Close the brick only on Verdict's converged, evidence-citing verdict — never on Vector's self-attestation.
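The determinism step in the list above (re-run the same SHA for an identical verdict) could be checked roughly as follows. This is a sketch under stated assumptions: the `(git_sha, run_id, gate_verdicts)` tuple shape and the function name are hypothetical.

```python
def determinism_proven(runs, git_sha):
    """True only if the SHA has at least two distinct runs with identical verdicts.

    `runs` is an iterable of (git_sha, run_id, gate_verdicts) tuples, where
    gate_verdicts maps a gate name to "pass" or "fail" (assumed shape).
    """
    by_run = {
        run_id: tuple(sorted(verdicts.items()))  # order-independent fingerprint
        for sha, run_id, verdicts in runs
        if sha == git_sha
    }
    return len(by_run) >= 2 and len(set(by_run.values())) == 1
```

A single run proves nothing about reproducibility, so the check requires two distinct run IDs rather than treating one green run as deterministic.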
Don't (9)
- Don't let any gate be advisory/non-blocking or run nightly — gates must block at merge/deploy from the first commit.
- Don't declare 'gates enforced' without the five red-path failing run IDs proving the block.
- Don't let Vector (pipeline author) or Cipher (security-gate author) review their own work — independence means a non-author reviews each part.
- Don't accept a no-op or unattested SBOM, or an SBOM the SCA gate doesn't actually consume.
- Don't simulate the deploy with a script that echoes success; the deploy and rollback must be real and health-checked.
- Don't inject long-lived admin secrets or over-privileged credentials into CI.
- Don't let design-system checks be decorative — either wire them into the blocking CI as a real check or split them out.
- Don't block on a human approval; capture human input continuously but proceed on flagged assumptions where silent.
- Don't claim done while any material gap from the independent review remains open.
Guardrails (9)
- Done-condition is receipt-anchored: a real green CI run ID PLUS five red-path failing run IDs must be attached before the brick can close.
- Branch protection must show required checks, no bypass, force-push disabled, and signed commits — verified by stored API output, not assertion.
- SBOM must be CycloneDX/SPDX, attached per build, attested (verifiable), and consumed by the SCA gate; the removed-SBOM planted change must block.
- Determinism guard: a same-SHA re-run must reproduce the identical verdict, or future 'passed' receipts are not trusted.
- No long-lived admin secrets in CI; deploy/signing use keyless OIDC or a documented, flagged exception.
- Independence is mandatory: the reviewer of the security gates is not Cipher; the reviewer of deploy gating is not Vector; an adversarial red-team must actively try to merge/deploy a bad change and must fail.
- Convergence = the panel agrees, with cited run IDs and the branch-protection receipt, that all six objectives (good-path green, five defects blocked, gates required on protected branch, SBOM attested+consumed, real deploy+rollback+smoke, receipt/ledger substrate exists) are met with no open material gap.
- Receipt/ledger immutability: CI run logs and artifacts must be retained for the human-specified window so run-ID receipts do not rot.
- All human-open choices proceed only on an explicit flagged assumption logged as a receipt — never on a silent default.
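The receipt-anchored done-condition in the guardrails above reduces to a small predicate. The sketch below assumes receipts arrive as plain dicts; the key names (`conclusion`, `merge_blocked`, `defect`) are illustrative, not part of the source.

```python
REQUIRED_RED_PATHS = 5  # the text requires five planted-defect receipts


def brick_can_close(green, reds):
    """Return (ok, reasons) for the done-condition.

    `green`: dict with "run_id" and "conclusion";
    `reds`: list of dicts with "defect", "run_id", "conclusion", "merge_blocked".
    """
    reasons = []
    if not green or not green.get("run_id") or green.get("conclusion") != "pass":
        reasons.append("missing or non-green good-path receipt")
    # Count each planted defect once, and only when its run failed AND blocked.
    blocked = {
        r["defect"]
        for r in reds
        if r.get("run_id") and r.get("conclusion") == "fail" and r.get("merge_blocked")
    }
    if len(blocked) < REQUIRED_RED_PATHS:
        reasons.append(
            f"only {len(blocked)} of {REQUIRED_RED_PATHS} planted defects proven blocked"
        )
    return (not reasons, reasons)
```

Deduplicating on the defect name keeps five receipts for the same planted defect from satisfying the condition.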
Sprint Planning [LOOP]
Open the iterative delivery loop for one sprint by authoring a tight, evidence-backed sprint plan that a top engineering org would actually commit to, then hand it to the independent sprint-plan review loop before any build begins (no human sign-off; the panel's convergence verdict is what authorizes build). Cadence (Delivery Lead) and Vela (Product Owner) pull the highest-priority Ready stories that pass the Definition of Ready, agree on exactly ONE measurable sprint goal, decompose stories into tasks, size the commitment against the team's defined AI-team capacity/WIP (concurrent workstreams + review-loop budget + merge/integration throughput, NOT human story-point velocity), and confirm at least one testable acceptance criterion plus a named test approach per story. Every committed story traces back to a PRD/spec item and forward to a test approach, so no orphan or creep work enters the sprint. Sprint dependencies and risks are captured in a register with an owner and need-by date (or flagged blocked) per item. This brick is the loop-entry point: it repeats on each outer iteration through the sprint-increment review, re-pulling Ready stories and re-planning against updated AI-team capacity/throughput data and prior-sprint learnings, and the outer loop exits only when release scope is delivered. Planning surfaces open scope/priority questions to the sponsor as a continuous input channel and, where the human is silent, proceeds on an explicit flagged assumption recorded in the plan; it never blocks on approval.
Questions the agent asks (7)
- Sponsor: for this sprint, is the highest-value outcome correctly captured by the proposed one-sentence sprint goal, or should priority shift?
- Sponsor: are there any new constraints, deadlines, data, or scope changes since the last sprint we should fold into this plan?
- Sponsor: for any story where required information is missing, can you confirm the assumption we have flagged, or provide the answer? (If silent, we proceed on the flagged assumption.)
- Internal (Vela): which Ready stories are highest priority and trace cleanly to a PRD/spec item this sprint?
- Internal (Cadence): what is the current AI-team capacity/WIP ceiling from the team charter, and does the proposed commitment fit within it?
- Internal (Keystone): which committed stories carry cross-team or external dependencies that need an owner and need-by date now?
- Internal (Cipher): do any committed stories touch auth, data handling, or external surfaces such that threat/security work must be scoped into this sprint?
Do (9)
- Pull only Ready stories that pass the Definition of Ready; reject anything failing DoR back to refinement rather than committing it.
- State exactly ONE sprint goal as a single measurable sentence; if the backlog implies two goals, split or defer until one remains.
- Size the commitment against the defined AI-team capacity/WIP (concurrent workstreams, review-loop budget, merge/integration throughput) and show the math.
- Confirm >=1 testable acceptance criterion and a named test approach for every committed story before committing it.
- Trace every committed story back to a PRD/spec item and forward to a test approach.
- Record every dependency and risk with an owner and need-by date (or explicit blocked flag) and a mitigation.
- Surface open scope/priority questions to the sponsor and incorporate answers; where the sponsor is silent, proceed on an explicit flagged assumption with an assumption ID.
- Hand the authored plan to the independent sprint-plan review loop and treat that panel's convergence verdict — not Cadence/Vela's own judgment — as the build authorization.
- On each outer-loop iteration, re-pull Ready stories and re-plan against updated AI-team capacity/throughput data and prior-sprint learnings.
Don't (8)
- Do NOT insert any human approval/sign-off gate; the sponsor provides input, never blessing.
- Do NOT treat 'Cadence/Vela authored it' as sufficient — the plan is not authorized to build until the independent review loop converges.
- Do NOT commit more than one sprint goal or let the sprint become a grab-bag of unrelated work.
- Do NOT commit stories that fail DoR, lack a testable AC, lack a named test approach, or have no upstream spec trace.
- Do NOT over-commit beyond the defined AI-team capacity/WIP ceiling.
- Do NOT copy human-velocity ceremony (raw story-point velocity) as the capacity model for an AI team.
- Do NOT block planning on a silent sponsor — flag an explicit assumption and proceed.
- Do NOT claim the sprint is 'planned' without the full receipt bundle (committed backlog, capacity calc, DoR matrix, AC/test sheet, dependency/risk register, traceability map, assumptions log) existing.
Guardrails (8)
- Single-goal guardrail: the brick fails if the committed plan contains anything other than exactly one measurable sprint goal.
- Capacity guardrail: committed size must be <= the defined AI-team capacity/WIP ceiling, with the calculation and its source citation present; otherwise descope before submitting.
- DoR guardrail: zero committed stories may have DoR=FAIL; testability guardrail: zero committed stories may lack a testable AC or named test approach.
- Traceability guardrail: zero orphan stories — every committed story must map to a PRD/spec item ID and a downstream test approach.
- Dependency guardrail: every dependency/risk row must carry an owner and a need-by date or explicit blocked flag; unowned items are not allowed to ship in the plan.
- Human-input guardrail: the human may only provide information/answers; any item marked 'blocked on human approval' is a violation — convert to a flagged assumption and proceed.
- Honesty/receipt guardrail: 'planned' is claimable only when the full receipt bundle exists; the plan is 'authorized to build' only when the independent review loop returns a convergence verdict ID with zero open material gaps — never asserted.
- Hand-off guardrail: this brick must NOT mark itself complete-and-building; it submits to the independent review_loop and waits for that panel's verdict (reviewers must be non-authors of the plan).
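The guardrails above are mechanical enough to lint before the plan is submitted. The following is a sketch only: the plan is assumed to be a dict, every key name is hypothetical, and `size` and `capacity_ceiling` are in whatever unit the team charter defines.

```python
def lint_sprint_plan(plan):
    """Return a list of guardrail violations; an empty list means submit."""
    violations = []

    if len(plan["goals"]) != 1:
        violations.append("single-goal: expected exactly one sprint goal")

    committed = sum(s["size"] for s in plan["stories"])
    if committed > plan["capacity_ceiling"]:
        violations.append(
            f"capacity: committed {committed} > ceiling {plan['capacity_ceiling']}"
        )

    for s in plan["stories"]:
        if s["dor"] != "PASS":
            violations.append(f"dor: {s['id']} has DoR={s['dor']}")
        if not s["acceptance_criteria"] or not s["test_approach"]:
            violations.append(f"testability: {s['id']} lacks a testable AC or test approach")
        if not s["spec_item_id"]:
            violations.append(f"traceability: {s['id']} has no PRD/spec item ID")

    for d in plan["dependencies"]:
        if not d["owner"] or not (d["need_by"] or d["blocked"]):
            violations.append(f"dependency: {d['id']} lacks an owner or need-by/blocked flag")
        if d.get("blocked_on_human_approval"):
            violations.append(f"human-input: {d['id']} must become a flagged assumption")

    return violations
```

The lint only decides whether the plan may be handed to the review loop; it does not authorize build, which remains the panel's convergence verdict.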
Independent Review & Iterate: Sprint Plan [LOOP]
Stop a whole sprint from being burned on a bad plan by validating the committed sprint backlog and goal BEFORE execution starts — through a bounded, evidence-producing independent-review loop, not a human sign-off. A NON-AUTHOR panel — Verdict (independent evaluator, runs an adversarial pre-mortem), Proof (testability of every acceptance criterion), and Keystone (dependency/feasibility/staffability) — scores each committed story line-by-line against the pinned, versioned Definition-of-Ready rubric (owned by Cadence, who commits the canonical DoR in assemble_team), the release plan, and the seven Objectives, returning structured per-story findings classified on the canonical Blocker/Major/Minor severity taxonomy. Cadence (Delivery Lead) and Vela (Product Owner) authored the plan and own the fixes; they revise, the panel re-reviews, and the loop iterates to a binary ConvergenceVerdict: 0 open Blockers AND 0 open Majors AND every Minor accepted-with-written-rationale, with one clean stabilization pass that introduced zero new gaps. The loop is bounded at 3 rounds; on non-convergence the specific unresolved disagreement is surfaced to the sponsor as a NON-BLOCKING input question and the team proceeds on an explicit flagged assumption carried as a sprint risk — never a human gate. Convergence is a durable, re-derivable receipt (panel roster with non-author attestation, per-story pass/fail ledger, capacity arithmetic, append-only gap log with severity, pre-mortem outcomes, flagged-assumptions/risk list), and those assumptions+risks are emitted as the explicit checklist the end-of-sprint accomplishments-vs-goals review must later validate. No claim of "meets DoR / testable / fits capacity / converged" is accepted without its re-derived receipt.
Questions the agent asks (4)
- Which committed stories are highest-priority such that, if capacity is tight, they must survive scope trimming?
- Is there missing data, an external dependency, or an ambiguous acceptance definition on any story that only the sponsor can resolve (logged as a non-blocking question; otherwise we proceed on a flagged assumption)?
- Are there any fixed external dates or release-plan commitments this sprint must hit that constrain what can be committed?
- For any story whose acceptance is currently subjective ('it should feel fast'), what is the concrete, observable threshold we should hold it to?
Do (9)
- Bind the review to the canonical versioned DoR rubric Cadence committed in assemble_team and have Verdict score every story line-by-line against every numbered rule — reproducible, re-derived, not from memory.
- Require each panel member to produce structured per-story findings with re-derived evidence; reject any verdict lacking full per-story coverage (anti-rubber-stamp).
- Show the capacity arithmetic: committed size vs the defined AI-team capacity/WIP ceiling, with the source and the delta; 'fits capacity' is a number, never an assertion.
- Have Proof name the specific test or observation that proves each acceptance criterion, and Keystone cite the artifact/number behind each dependency-ready and staffable claim.
- Classify every finding on the canonical Blocker/Major/Minor taxonomy and drive convergence to 0 open Blockers, 0 open Majors, Minors accepted-with-written-rationale, plus one clean stabilization pass.
- Run Verdict's adversarial pre-mortem every round and answer or log every failure mode before converging.
- Guarantee panel independence: every reviewer (Verdict, Proof, Keystone) attests they did not author the plan or the DoR rubric they grade against; if a needed specialist co-authored the relevant design, swap in an alternate.
- Bound the loop at 3 rounds; on non-convergence, surface the specific disagreement to the sponsor as INPUT and proceed on an explicit flagged assumption recorded as a sprint risk.
- Store the convergence verdict as a durable receipt and emit its assumptions+risks as the checklist the end-of-sprint review must validate.
Don't (9)
- Don't let a reviewer emit a blanket 'looks good' / approval with no per-story evidence — that verdict is rejected.
- Don't insert any human approval or sign-off; the sponsor only provides input and answers, never gates.
- Don't converge by reclassifying a Blocker- or Major-severity gap (DoR violation, untestable AC, over-capacity, unready dependency, unstaffable story) down a tier — those severities are fixed by the canonical rubric.
- Don't let a panel member review a plan they authored, or grade stories against a DoR rubric they themselves wrote.
- Don't name Atlas, or any non-roster actor, as the owner or author of any deliverable in this brick — the DoR rubric is owned by Cadence and graded by Verdict.
- Don't run the loop unbounded — an infinite plan-review loop burns the very sprint capacity it protects.
- Don't block or stall waiting on a human answer; if unanswered, convert the question to a flagged assumption and continue.
- Don't accept 'fits capacity', 'testable', or 'meets DoR' without, respectively, the capacity arithmetic, the named test, and the re-derived per-story score backing the claim.
- Don't start sprint execution before the convergence receipt records 0 open Blockers AND 0 open Majors AND every Minor accepted-with-rationale (or a recorded bounded-loop escalation).
Guardrails (8)
- Severity taxonomy is the canonical assemble_team rubric, used verbatim: every gap is classified Blocker, Major, or Minor; convergence == 0 open Blockers AND 0 open Majors AND every Minor carries a written accepted-with-rationale note, plus a stabilization pass that introduced zero new gaps. No local 'material/minor' or 'blocker/minor' split is used.
- Independence is mandatory: Verdict, Proof, and Keystone must each be a non-author of both the plan and the DoR rubric under review; conflicted seats are swapped for an independent alternate before the round begins.
- Roster integrity: no deliverable in this brick may name Atlas or any actor outside the delivery roster (Vela, Cadence, Keystone, Mason, Lens, Proof, Cipher, Vector, Iris, Verdict). The DoR rubric is OWNED by Cadence (committed in assemble_team) and independently GRADED by Verdict.
- Loop is bounded at 3 rounds with a defined exit: convergence, or escalate-the-disagreement-as-human-input-and-proceed-on-flagged-assumption — never a human gate, never an open-ended loop.
- Honesty: every 'meets DoR / testable / fits capacity / converged' claim carries its re-derived receipt (pinned-rubric per-story scores, named tests, capacity arithmetic, stored convergence verdict); reviewers re-derive numbers rather than accept the author's summary; evidence-free claims are rejected.
- Author/reviewer separation is structural: Cadence/Vela author and fix the plan and Cadence owns the DoR rubric; the panel only reviews — this brick stays separate from the planning brick it reviews.
- Continuous-input channel never converts to a gate: human silence yields a flagged assumption carried as a risk, not a stop.
- Accountability carries forward: the verdict's assumptions and predicted risks are the input checklist for the end-of-sprint accomplishments-vs-goals review, so reviewer judgment is later audited against reality.
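The convergence rule and the 3-round bound stated in the guardrails above can be expressed as a single decision function. This is a sketch under assumptions: the gap-log row shape, the status values (`open`, `closed`, `accepted`), and the outcome labels are hypothetical.

```python
MAX_ROUNDS = 3  # loop bound stated in the text

CONVERGED = "converged"
REVISE = "revise_and_re_review"
ESCALATE = "escalate_as_input_and_proceed_on_flagged_assumption"


def is_resolved(gap):
    """Blockers and Majors must be closed; a Minor may instead be accepted
    with a written rationale."""
    if gap["status"] == "closed":
        return True
    return (
        gap["severity"] == "Minor"
        and gap["status"] == "accepted"
        and bool((gap.get("rationale") or "").strip())
    )


def round_outcome(round_no, gaps, stabilization_new_gaps):
    """Decide the next step after a review round.

    `stabilization_new_gaps` is None until a stabilization pass has run,
    otherwise the number of new gaps that pass introduced.
    """
    if all(is_resolved(g) for g in gaps) and stabilization_new_gaps == 0:
        return CONVERGED
    if round_no >= MAX_ROUNDS:
        return ESCALATE
    return REVISE
```

Because only a Minor can be resolved by acceptance, marking a Blocker or Major as "accepted" never produces convergence, which mirrors the rule against reclassifying severities down a tier.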
Sprint Execution: Build · Review · Test · Secure [LOOP]
The autonomous build core, run once per story and iterated until each story is "solid." Mason implements the story (test-driven where apt) on a short-lived branch; Iris delivers the UX/UI; Proof writes/extends automated tests across the pyramid (unit/integration/e2e) and owns the acceptance tests so results are not self-marked by the implementer; Cipher runs SAST/SCA/secret-scan per change under a defined severity+waiver policy; Lens performs evidence-backed code review where the reviewer is provably NOT the author of the change. The brick exists to emit re-derivable, outcome-keyed RECEIPTS that the downstream sprint-review loops re-derive from — so it must not repeat the platform's own confessed sins: the merge gate here is BLOCKING (a required, non-overridable status check that returns non-zero and stops the merge on any failing condition — never the nightly exit-0 pattern flagged P0 in docs/honesty-architecture.md), and every receipt is keyed on the CI run_conclusion=='pass' so a RED run can never pose as proof (mirroring the email.sent outcome-filter fix). A story is "solid/converged" only when a binary merge predicate holds: all Definition-of-Done items green, coverage AND new-diff-coverage thresholds met, mutation/assertion guard satisfied, zero open High/Critical scan findings, and an evidence-citing Lens approval from a non-author. Independence and honesty are enforced by provenance metadata on the receipt, not by honor; Verdict (independent_reviewer) periodically re-derives a sample of merged-story receipts to confirm the loop was not rubber-stamped. When a story is materially ambiguous mid-build, the team surfaces a question to the sponsor and proceeds on an explicit FLAGGED ASSUMPTION recorded on the story — it never blocks on human approval.
Questions the agent asks (8)
- What are the coverage thresholds (line %, new-diff branch %) and is a mutation-score threshold or assertion-density guard required for this codebase?
- What is the security pass policy: confirm zero open High/Critical, and what is the waiver authority, expiry window, and waiver record location for Medium/Low findings?
- Where is the versioned Definition-of-Done and the coding standards the review enumerates against, and who owns updates to them?
- What is the diff-size threshold above which a bare Lens approval (zero cited evidence) triggers a mandatory second-reviewer or red-team pass?
- What is the flaky-test policy — confirm no blanket auto-retry-to-green, and that quarantine requires a tracked ticket and does not count as pass?
- For this story, are there ambiguities you'd like to answer now, or should the team proceed on a flagged assumption and surface the question on the story?
- What is the maximum allowed branch age for 'short-lived', and is trunk-based merging (squash) the agreed integration model?
- Are there any compliance/data-handling constraints that should add story-specific gate conditions (e.g., PII handling tests)?
Do (10)
- Make the merge gate a required, BLOCKING status check on branch protection — a failing predicate returns non-zero and stops the merge; add a negative test proving a RED run / self-approval / open-critical cannot merge.
- Key every receipt and the gate on run_conclusion=='pass'; record outcome explicitly so a failed run can never be read as proof of success.
- Enforce reviewer independence by provenance: verify reviewer_id (and reviewing model instance) != commit author_id and != any Co-Authored-By; recuse to Verdict or a fresh instance if the same agent authored both code and review.
- Have Proof own or independently re-run the acceptance tests so results are not self-marked by the implementer.
- Require Lens approvals to enumerate each DoD item pass/fail with an evidence pointer (file:line or test name) and cite the failing-then-passing test — make approval expensive in evidence.
- Apply Cipher's severity+waiver policy: zero open High/Critical, hardcoded secrets non-waiverable and hard-fail, Medium/Low only via tracked time-boxed attributable waivers referenced by ID.
- Quarantine flaky tests with a tracked ticket; quarantined tests do not count toward pass and the quarantined-count is a visible receipt field and a tripwire.
- Surface mid-build ambiguity to the sponsor and proceed on an explicit FLAGGED ASSUMPTION recorded on the story; never block waiting for human approval.
- Link each story receipt to the story's acceptance criteria so the downstream sprint-review loop can re-derive goal-met from raw CI artifacts.
- Use short-lived branches off trunk, one per story, and merge via the gate; keep diffs reviewable.
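The provenance check in the list above (reviewer must differ from the commit author and from any Co-Authored-By identity) might look like the sketch below. It assumes identities are compared as email addresses taken from `Co-authored-by:` commit trailers; comparing reviewing model instances, which the text also asks for, is left out.

```python
import re

# Matches trailers of the form "Co-authored-by: Name <email>".
_CO_AUTHOR = re.compile(
    r"^Co-authored-by:\s*(.+?)\s*<([^>]+)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def authors_of(commit_author_id, commit_message):
    """Collect the commit author plus every co-author trailer identity."""
    ids = {commit_author_id.lower()}
    ids.update(email.lower() for _name, email in _CO_AUTHOR.findall(commit_message))
    return ids


def reviewer_is_independent(reviewer_id, commit_author_id, commit_message):
    """False means the review must be recused to Verdict or a fresh instance."""
    return reviewer_id.lower() not in authors_of(commit_author_id, commit_message)
```

Lower-casing both sides avoids a trivial bypass where the same identity appears with different capitalization.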
Don't (9)
- Never merge with the gate disabled, skipped, or overridden — no --no-verify, no admin bypass, no direct-to-trunk push; any break-glass is itself a logged receipt with justification.
- Never report a green that the CI run_conclusion does not show; 'it works' is not a receipt and a test that didn't pass is a fail.
- Never let the change author approve their own change, or mark their own acceptance tests as passing.
- Never retry a failing test to green without a quarantine ticket; no blanket auto-retries.
- Never emit a success-shaped receipt (CI run ID + counts) for a RED or skipped run.
- Never waive a hardcoded-secret finding or let a High/Critical finding pass without remediation.
- Never accept a bare 'LGTM' on a non-trivial diff — zero cited evidence triggers a second-reviewer or red-team pass.
- Never let Verdict's audit trust the summarized receipt; re-derive from raw CI artifacts.
- Never run the merge-gate control path itself behind a feature flag that can disable it.
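The severity and waiver rules this brick applies (secrets never waivable, High/Critical always remediated, Medium/Low only with a tracked, attributable, time-boxed waiver) can be sketched as follows. The finding and waiver dict shapes, and the `hardcoded-secret` kind label, are assumptions for illustration.

```python
from datetime import date


def security_gate(findings, waivers, today):
    """Return blocking reasons for open findings; empty means the slice passes.

    `findings`: dicts with "id", "severity", "kind", and optional "waiver_id";
    `waivers`: {waiver_id: {"owner": str, "expires": date}}.
    """
    blocking = []
    for f in findings:
        if f["kind"] == "hardcoded-secret":
            # Checked first so no severity label or waiver can rescue it.
            blocking.append(f"{f['id']}: hardcoded secrets are non-waiverable")
        elif f["severity"] in ("High", "Critical"):
            blocking.append(f"{f['id']}: open {f['severity']} must be remediated")
        else:
            w = waivers.get(f.get("waiver_id"))
            if not w or not w.get("owner") or w["expires"] < today:
                blocking.append(f"{f['id']}: no attributable, unexpired waiver")
    return blocking
```

Passing `today` in explicitly keeps the result reproducible when a reviewer re-derives the verdict later.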
Guardrails (8)
- The merge gate is a required, blocking status check with NO flag on its own run path; removing or weakening any gate condition must itself fail CI.
- Binary merge predicate: MERGE IFF run_conclusion=='pass' AND coverage_line>=threshold AND new_diff_branch_coverage>=threshold AND mutation/assertion guard passes AND scan.high_critical_open==0 AND lens_verdict=='approved_with_evidence' AND reviewer_id!=author_id.
- Receipts are append-only, outcome-keyed (run_conclusion), and contain provenance (author_id, reviewer_id) that makes independence machine-verifiable; downstream review re-derives from them.
- Independence is enforced by identity/provenance, not honor; same-author code+review forces recusal to Verdict or a fresh instance.
- Security pass = zero open High/Critical; secrets non-waiverable; Medium/Low only via tracked, attributable, time-boxed waivers referenced by ID in the receipt.
- Flaky tests are quarantined with a tracked ticket and do not count as pass; quarantined-count is a visible tripwire field.
- Verdict independently re-derives a sample of merged-story receipts each sprint; any mismatch reopens the story and is logged as an open gap.
- No human approval gate exists in this brick; the sponsor provides input/answers continuously and unanswered ambiguities proceed as flagged assumptions logged on the story.
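The binary merge predicate in the guardrails above translates almost directly into code. In this sketch the field names follow the predicate as written; the `StoryReceipt` class, the boolean `mutation_guard_passed` field, and the default thresholds are placeholders, since the text leaves the actual values to the sponsor's answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryReceipt:
    run_conclusion: str
    coverage_line: float
    new_diff_branch_coverage: float
    mutation_guard_passed: bool
    high_critical_open: int
    lens_verdict: str
    reviewer_id: str
    author_id: str


def may_merge(r, line_threshold, diff_threshold):
    """MERGE IFF every condition of the predicate holds."""
    return (
        r.run_conclusion == "pass"
        and r.coverage_line >= line_threshold
        and r.new_diff_branch_coverage >= diff_threshold
        and r.mutation_guard_passed
        and r.high_critical_open == 0
        and r.lens_verdict == "approved_with_evidence"
        and r.reviewer_id != r.author_id
    )


def gate_exit_code(r, line_threshold=80.0, diff_threshold=90.0):
    """0 lets the merge proceed; 1 blocks it. Thresholds are placeholders."""
    return 0 if may_merge(r, line_threshold, diff_threshold) else 1
```

Returning a non-zero exit code on any failing condition is what lets the CI platform treat the check as blocking; the function has no flag or override parameter by design.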
Sprint Review, Demo & Accomplishments-vs-Goals Summary [LOOP]
Close out each sprint with an honest, receipt-backed closeout that claims ONLY what a re-resolvable receipt proves: the working increment demoed against the IMMUTABLE sprint goal, every story marked accepted or rejected with the binary acceptance check and the test/scan/demo receipt that settles it, and the quantitative state of the sprint (test pass + coverage delta, security findings, DORA, cost/FinOps) each linked to its source run. The goal-vs-actual delta is computed against the sprint-plan brick's committed goal+story set referenced BY ID (never restated or silently descoped), and every mid-sprint scope change is itemized with its receipt. Cadence authors the closeout but does NOT certify it: the draft is handed to an independent review-and-iterate loop (Verdict + Proof/Cipher/Vector/Vela lenses + one adversarial red-team pass) that re-fetches every receipt and adversarially tests the gap between each sentence and what its receipt actually proves; the closeout is final only on a logged panel verdict of zero open material gaps with the gap-log attached as the convergence receipt. The brick absorbs the legacy human-gated Review stage and the separate Learn/retro stage into one AI-owned artifact — there is NO human sign-off; the sponsor is a continuous input channel (open questions surfaced, answers folded in, silence handled by explicit flagged assumptions). It carries an honest missed-goal path (partial increment + blocking receipts, no optimistic narrative) and a next-sprint proposal that traceably carries forward every rejected/carried story, new risk, and human input.
Questions the agent asks (6)
- What is the canonical immutable reference (ID) for this sprint's committed goal and story set, so the delta denominator cannot drift?
- Which CI run, scan report, DORA query, FinOps export, and demo artifact are the authoritative receipts for this sprint, and are they all re-fetchable by the validator?
- Sponsor: of the open questions surfaced during the sprint, which would you like to answer now, and where should we record a flagged assumption if you are silent?
- Does the sponsor have any new information, priority shift, or data that should reshape the next-sprint proposal before it is finalized?
- Are there any security findings the team intends to waive, and if so who is the named owner and what is the waiver expiry?
- Was the increment fully demoable end-to-end on the real path, or did the demo run on seed/stub data that the red-team pass must flag?
Do (9)
- Reference the sprint-plan brick's committed goal+story set by receipt ID and compute the goal-vs-actual delta against THAT immutable set.
- Attach a receipt ID to every quantitative or status claim and have the validator re-fetch each one before the closeout is considered drafted.
- Source test/coverage from Proof, security from Cipher, DORA+cost from Vector, and the demo from Iris — never let the author self-source the numbers being claimed.
- Hand the draft to the independent panel (Verdict + specialist lenses + a red-team pass) and treat it as final only on a logged zero-material-gaps verdict with the gap-log attached.
- List every rejected/incomplete story with its failed binary criterion, the receipt proving failure, and an explicit disposition (carry/re-shape/cut).
- Itemize every mid-sprint scope change with its receipt instead of absorbing it into the completion number.
- Carry the sponsor as a continuous input channel: surface open questions, fold in answers with links, and record flagged assumptions where the sponsor is silent.
- When the goal was missed or partial, state the partial increment + blocking receipts honestly and route the remainder into the next sprint.
- Trace every item in the next-sprint proposal back to a rejected/carried story ID, a new risk, or a human input from this sprint.
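Computing the goal-vs-actual delta against the immutable committed set, as the steps above require, could be sketched like this. The argument shapes and result keys are hypothetical; the point is that the denominator comes only from the sprint-plan receipt.

```python
def goal_vs_actual(committed_ids, outcomes, scope_changes):
    """Compute the delta against the committed story set.

    `committed_ids`: story IDs pinned by the sprint-plan receipt;
    `outcomes`: {story_id: {"accepted": bool, "receipt_id": str}};
    `scope_changes`: rows with "story_id", "change", "receipt_id".
    """
    committed = set(committed_ids)
    # A story counts as accepted only if a receipt settles it.
    accepted = {
        sid
        for sid in committed
        if outcomes.get(sid, {}).get("accepted") and outcomes[sid].get("receipt_id")
    }
    return {
        "committed": len(committed),
        "accepted": len(accepted),
        "not_accepted": sorted(committed - accepted),
        # Work outside the committed set is itemized, never added to the numerator.
        "unplanned": sorted(set(outcomes) - committed),
        "scope_changes_missing_receipt": [
            c["story_id"] for c in scope_changes if not c.get("receipt_id")
        ],
    }
```

Keeping unplanned work in its own list prevents mid-sprint additions from inflating the completion number.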
Don't (9)
- Do NOT let the author (Cadence) certify the closeout — independence requires a panel that did not author it.
- Do NOT state any 'demoed / passed / live / accepted / no criticals' claim without a re-resolvable receipt ID supporting it.
- Do NOT restate, rewrite, or silently descope the sprint goal to inflate the delta — pin it to the immutable planned set.
- Do NOT 'fix' a flagged gap by quietly deleting the claim; removed claims must be logged as removed.
- Do NOT add a human sign-off / approval field; the human provides input, never a gate.
- Do NOT claim green tests at a commit that differs from the merged/deployed SHA (no served-vs-disk or demo-vs-merge hash mismatch).
- Do NOT count coverage gains on dead code or test passes on stubbed dependencies as real progress — the red-team pass must catch these.
- Do NOT drop rejected or carried-over stories; hiding carry-over corrupts the flow metrics.
- Do NOT narrate success the increment's receipts do not support; an honest missed-goal closeout beats an optimistic false one.
Guardrails (7)
- HONESTY INVARIANT: the closeout may claim only what a re-resolvable receipt proves (test counts, scan IDs, ledger/DB rows, git SHAs, health-check codes, reviewer verdicts, demo-artifact hashes); any unresolvable claim is cut or rewritten — never asserted.
- INDEPENDENCE INVARIANT: the review panel must exclude the author; approval of any claim requires the reviewer to independently re-fetch its receipt, and at least one adversarial red-team reviewer must file the receipt-vs-claim mismatches it probed.
- CONVERGENCE INVARIANT: 'solid/final' = a logged binary panel verdict of zero open MATERIAL gaps with the gap-log attached as the convergence receipt; a max-iteration cap escalates remaining gaps to the sponsor as input, never as an approval gate.
- IMMUTABLE-DELTA INVARIANT: goal-vs-actual is computed against the sprint-plan committed set referenced by ID; scope changes are itemized with receipts, not absorbed.
- SECURITY/COST GATES: closeout passes only with no unwaived High/Critical finding (Cipher scan ID linked; waivers carry owner+expiry) and spend within budget or variance explained (FinOps receipt linked).
- NO HUMAN GATE: there is no human sign-off; the sponsor is a continuous input channel and their silence is handled by explicit flagged assumptions, not by blocking.
- SPRINT DISCIPLINE: runs only inside an OPEN sprint and writes only under sprints/sprint-NNN/; no secrets or credentials in the closeout or its receipts.
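The honesty and independence invariants above imply that the validator fetches each receipt itself and compares it with what the claim asserts. A minimal sketch, assuming claims carry an `expected` map of receipt fields and that `resolve` is the validator's own lookup (both hypothetical):

```python
def audit_closeout(claims, resolve):
    """Split claims into those a receipt proves and those it does not.

    `claims`: dicts with "text", "receipt_id", and "expected" (field -> value);
    `resolve`: callable mapping a receipt ID to a dict, or None if unresolvable.
    """
    proven, unproven = [], []
    for c in claims:
        receipt = resolve(c["receipt_id"]) if c.get("receipt_id") else None
        ok = receipt is not None and all(
            receipt.get(key) == value for key, value in c["expected"].items()
        )
        (proven if ok else unproven).append(c["text"])
    return proven, unproven
```

Claims that land in the unproven list are cut or rewritten, and per the rules above their removal is itself logged rather than silently dropped.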
Independent Review & Iterate: Sprint Increment (Anti-Bluff Honesty Audit) [LOOP]
This is the sprint-level Truth Gate for the delivery process itself: it replaces the recurring human sprint-review sign-off AND kills the deepest anti-pattern in AI-run delivery — the author grading its own homework. Verdict (the independent evaluator), joined by specialist lenses who did NOT build the increment, independently RE-DERIVES every "done" claim from source-of-truth: it pins the exact git SHA, checks out clean from source control (never the author's working tree), re-runs the suite under its OWN run IDs, re-reads scan output, and replays each story's demo against its acceptance criteria — never accepting Cadence's or Vela's summary. For every claimed-done story it proves the increment is real, not asserted: new tests fail on the pre-change SHA and pass on the post-change SHA, no new skips/xfail, coverage delta meets threshold, suspect tests survive an N-times flaky re-run, security/non-functional DoD slices (scan, migration/rollback, perf/a11y where relevant) are re-checked, receipt rows show outcome==success, and at least one negative/failure path per story is probed. It red-teams for hidden, skipped, and concealed misses; verifies accomplishments against BOTH the sprint goal AND the Objectives rubric; and converges only when an independent panel agrees there are zero misrepresented-status findings and zero open material gaps. The loop has explicit convergence control (material-vs-immaterial gap definition, iteration budget, deadlock rule) so it neither rubber-stamps nor loops forever; non-convergence routes to the continuous human-INPUT channel as a question plus a flagged assumption or descope — never an open human approval hold. Any caught misrepresentation seeds a permanent honesty-regression case so the loop becomes a hardening immune system, and the whole verdict is written to an append-only, re-auditable ledger so the convergence claim is itself receipt-backed. 
Re-prioritization for the next sprint is an AI decision informed by the human-input channel — no human gate.
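The fails-before/passes-after proof and the N-times flaky re-run described above could be evaluated roughly as follows once the reviewer has collected its own run results. This is a minimal sketch, not the platform's implementation: `RunRecord`, `proves_increment`, the outcome strings, and the default of three re-runs are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One test outcome from one of the reviewer's own runs (hypothetical shape)."""
    test_id: str
    sha: str       # commit the test was executed against
    run_id: str    # the reviewer's own run ID, never the author's
    outcome: str   # "pass", "fail", "skip", or "xfail"


def proves_increment(runs, new_tests, pre_sha, post_sha, min_reruns=3):
    """Return (ok, reasons) for one claimed-done story.

    ok is True only if every new test fails on pre_sha and passes on every
    one of at least min_reruns runs on post_sha.
    """
    reasons = []
    if not new_tests:
        reasons.append("story shipped 0 new tests")
    for test_id in sorted(new_tests):
        pre = [r.outcome for r in runs if r.test_id == test_id and r.sha == pre_sha]
        post = [r.outcome for r in runs if r.test_id == test_id and r.sha == post_sha]
        if not pre or any(outcome != "fail" for outcome in pre):
            reasons.append(f"{test_id}: not proven to fail on the pre-change SHA")
        if len(post) < min_reruns:
            reasons.append(f"{test_id}: fewer than {min_reruns} post-change runs")
        elif any(outcome != "pass" for outcome in post):
            reasons.append(f"{test_id}: skip/xfail/fail or non-deterministic on the post-change SHA")
    return (not reasons, reasons)
```

Returning the list of reasons alongside the boolean keeps the result usable as a gaps-log entry: a story that fails the check names exactly which test broke which rule.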
Questions the agent asks (5)
- What is the sprint goal for this increment, in one sentence, and what were the planned stories with their acceptance criteria and Definition of Done?
- Are there domain-specific failure modes or negative paths you specifically want probed for these stories (data we should not corrupt, side effects, rate limits, rollback expectations)?
- Are the coverage-delta threshold and flaky re-run count (N) we have configured for this project correct, or do you want different bars for this sprint?
- For any story we descope on non-convergence, is there a priority you want us to weigh when re-prioritizing the next sprint — or should we decide from the input channel and current backlog?
- Are there non-functional DoD slices (perf budgets, accessibility level, migration/rollback drills) that apply to specific stories this sprint that we must re-derive?
Do (11)
- Pin the exact reviewed git SHA and re-run everything from a clean checkout out of source control — never the author's working tree.
- Generate and record Verdict's OWN run IDs / log hashes, distinct from Cadence's or Vela's.
- Prove each done story's new tests FAIL on the pre-change SHA and PASS on the post-change SHA; verify no new skips/xfail and coverage delta >= threshold.
- Re-run changed/suspect tests N>=3 times; treat any non-determinism as NOT converged.
- Form your own findings BEFORE opening the author self-assessment (anti-anchoring).
- Replay each demo against every AC AND probe at least one negative/failure path per story; verify receipt rows show outcome==success.
- Re-derive the security and non-functional DoD slices (scan, migration/rollback, perf/a11y where relevant), not just the happy-path demo.
- Score against BOTH the sprint goal and the Objectives rubric; treat any failing rubric line as a material gap.
- Write the convergence verdict, reproduced receipt IDs, gaps log, and iteration count to the append-only ledger.
- Seed a permanent fails-before/passes-after regression check for every misrepresented-status finding.
- On non-convergence, surface a question to the human-input channel plus a flagged assumption or descope — and proceed.
Don't (10)
- Don't accept any summary, screenshot, dashboard, or 'it works' as a receipt.
- Don't re-run the suite against the author's working tree, branch, or seed — only a clean checkout at the pinned SHA.
- Don't count a green suite as proof when the story shipped 0 new tests, skipped/xfail tests, or vacuous assertions.
- Don't let a quarantined, retried, or flaky test stand in as a passing receipt for its story.
- Don't review an artifact you authored — Verdict must not be the author of the code, tests, or sprint plan under review.
- Don't edit or fix the artifact under review — route material gaps back to Mason/Proof/Keystone.
- Don't read Cadence's or Vela's self-assessment before forming your own findings.
- Don't let subjective taste block convergence — route it to the human-input channel as a question, not a hold.
- Don't loop forever — honor the iteration budget; on deadlock convert to a logged assumption or descope.
- Don't ever convert a review finding into an open-ended human approval gate.
Guardrails (9)
- INDEPENDENCE: Verdict and any specialist reviewer must provably NOT be the author of the artifact under review; reviewers are review-only and cannot edit the increment.
- ANTI-ANCHORING: reviewer findings are timestamped/recorded before the author self-assessment is opened.
- TAMPER-EVIDENCE: every 'reproduced' claim cites a pinned SHA, a clean checkout, and Verdict's own run ID/log hash; a verdict without reproduced receipt IDs is treated as unbacked and cannot be CONVERGED.
- BINARY CONVERGENCE: CONVERGED requires that every claimed-done story has an independently reproduced passing receipt, with 0 misrepresented-status findings, 0 open material gaps, goal-vs-actual documented, and IDs recorded; all else is NOT-CONVERGED.
- NO HUMAN GATE: the human provides information/answers only; no brick outcome may be 'awaiting human approval'; non-convergence routes to a human-INPUT question plus flagged assumption or descope, and re-prioritization is an AI decision.
- CONVERGENCE CONTROL: a defined material/immaterial gap rule, an iteration budget, and an oscillation/deadlock escape must be applied so the loop neither rubber-stamps nor runs forever.
- IMMUTABLE AUDIT: the convergence verdict and its receipts are written to an append-only ledger, re-auditable from IDs/SHAs alone.
- IMMUNE MEMORY: any caught misrepresentation seeds a permanent regression check (fails-before/passes-after) for future sprint reviews.
- Verdict gates on EVIDENCE and time-boxes only — it must never become a standing bottleneck that re-creates the human gate with a robot in the chair.
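The BINARY CONVERGENCE and IMMUTABLE AUDIT guardrails above can be illustrated with a small sketch. The record shapes and the function names (`convergence_verdict`, `append_entry`, `ledger_is_intact`) are hypothetical, and the SHA-256 hash chain is an assumed technique for making an append-only ledger re-auditable; the source does not say how the ledger is implemented.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first ledger entry


def convergence_verdict(stories, misrepresented_findings, open_material_gaps,
                        goal_vs_actual_documented):
    """CONVERGED only if every binary condition holds; anything else is NOT-CONVERGED."""
    unbacked = [s["id"] for s in stories
                if s["claimed_done"] and not s.get("reproduced_receipt_ids")]
    converged = (not unbacked
                 and misrepresented_findings == 0
                 and open_material_gaps == 0
                 and goal_vs_actual_documented)
    return {"verdict": "CONVERGED" if converged else "NOT-CONVERGED",
            "unbacked_stories": unbacked}


def _entry_hash(prev_hash, payload):
    body = json.dumps({"prev_hash": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(ledger, payload):
    """Append a payload, chaining it to the hash of the previous entry."""
    prev_hash = ledger[-1]["entry_hash"] if ledger else GENESIS
    entry = {"prev_hash": prev_hash, "payload": payload,
             "entry_hash": _entry_hash(prev_hash, payload)}
    ledger.append(entry)
    return entry


def ledger_is_intact(ledger):
    """Re-derive every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Because each entry's hash covers the previous entry's hash, a later auditor can detect a rewritten verdict from the ledger alone, which is the property the IMMUTABLE AUDIT guardrail asks for.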
Sprint Retrospective & Backlog Refinement [LOOP → next sprint]
This is the loop-closing brick of the Sprints phase and does four jobs at once: (1) Cadence runs a disciplined retrospective that first verifies whether the PRIOR sprint's improvement actions were actually done (with evidence) and only then logs 1-2 new improvement actions, each with an owner and a next-sprint verification signal; (2) Cadence refreshes the capacity/DORA and cost/FinOps trends, where every number cites a provenance receipt (CI/CD run IDs, deploy logs, incident records, token/cost-meter exports) and no metric is hand-typed; (3) Vela refines and re-prioritizes the backlog against the sponsor's latest input (a pure INPUT channel), running a bounded review_loop until the candidate next-sprint set is "solid"; and (4) the brick produces the BINARY phase-exit decision (loop back to Sprint Planning vs. proceed to Hardening) as a deterministic function of two receipts and a quality precondition, NOT a judgment call. Crucially, the Scope-Burndown receipt (what release-scope work remains, sized) and the Done-Done receipt (every release-scope item in terminal state) are both DERIVED FROM the release_plan release-scope register, declared here as the single authoritative MUTABLE register of record (the release_plan's value-sequenced sprint list + requirement-to-sprint traceability matrix, re-sequenced after each sprint). Because both exit receipts read from that one register rather than from re-typed ad-hoc lists, the exit predicate is deterministic: an item is in scope, done, or remaining only as the register says, so no "ghost" scope and no quietly-forgotten item can swing the decision. Neither Cadence (who authored the retro/metrics) nor Vela (who owns the backlog) adjudicates convergence or the exit: Verdict (independent_reviewer, who authored neither) independently validates both that the refined backlog is solid and that the exit verdict is correct given the evidence, and is free to disagree with either; the loop closes only on Verdict's documented convergence verdict.
The exit reads the Scope-Burndown receipt (register-derived remaining items, sized) and the Done-Done receipt (each register item in terminal state with merge SHA + Lens verdict + Proof test IDs + Cipher scan + demo link), gated by a quality precondition (zero open P0/P1 defects unless explicitly accepted into hardening, no quarantined tests hiding failures, change-failure-rate not regressing): scope-remaining non-empty → LOOP; scope-remaining empty AND done-done clean AND quality-gate green → PROCEED. The human is never an approval gate; their input feeds re-prioritization and they may be asked questions. The anti-deadlock rule is bounded and honest: on sponsor silence, wait a bounded window, log a flagged assumption to the RAID/assumption register with owner + revisit trigger, and proceed — EXCEPT any re-prioritization that drops or deprioritizes a sponsor-requested scope item may not be silently assumed; it holds at prior priority and is surfaced as a question until answered, so "never block" never becomes "silently steer away from the sponsor's brief."
Questions the agent asks (5)
- Given the current backlog state, what are your top priorities for the next sprint — has anything changed since you last gave input?
- Is there any scope you previously asked for that you now want dropped, deferred, or re-ordered? (We will hold prior priority on anything you've requested until you confirm.)
- Are there new business constraints, deadlines, or data we should fold into re-prioritization?
- For any release-scope item we are about to defer to a future sprint, are you comfortable with that, or should it stay in the current release?
- If we move to hardening, are there known issues you'd want explicitly carried as hardening-entry conditions versus fixed before exit?
Do (8)
- Verify and close LAST sprint's improvement actions with evidence before logging any new ones; cap new actions at 1-2, each with an owner and a next-sprint verification signal.
- Cite a provenance receipt for every capacity/DORA/FinOps number — CI/CD run IDs, deploy logs, incident records, cost-meter exports.
- Derive BOTH the Scope-Burndown receipt (what remains) and the Done-Done receipt (what is terminal) from the release_plan release-scope register — the single authoritative mutable register — and make the exit a deterministic function of them plus the quality gate.
- Keep the release_plan register the one source of truth: every backlog refinement, auto-filed FinOps item, drop, or re-sequence mutates the register, and both exit receipts read from it so they cannot diverge.
- Have Verdict (independent_reviewer) — not Cadence or Vela — re-derive the receipts against the register and adjudicate both backlog 'solid' and the binary exit verdict; let Verdict disagree and iterate until convergence.
- Treat the sponsor as a continuous INPUT channel: fold their latest input into re-prioritization, ask the listed questions, and on silence proceed on a logged flagged assumption with a revisit trigger.
- Run the FinOps action-hook: a cost-per-unit-of-work regression beyond threshold auto-files a backlog item into the register and, if runaway, trips the cost circuit-breaker.
- Report to the sponsor in plain language: what improved, what remains, where cost is trending, and the loop/hardening decision.
Don't (9)
- Don't let Cadence (retro/metrics author) or Vela (backlog owner) declare convergence or the phase-exit — that is grading their own homework; only Verdict adjudicates.
- Don't compute the exit from re-typed, ad-hoc, or memory-held scope lists — both exit receipts MUST reconcile against the release_plan register, or the exit is invalid.
- Don't assert 'release scope complete' without the Done-Done receipt showing the register's non-terminal set empty; a vibe is not a receipt.
- Don't hand-type or estimate any DORA/FinOps number — an unsourced metric is theater and must be rejected.
- Don't open new retro actions while prior actions sit silently unresolved.
- Don't insert any human-approval/sign-off step; the human provides input and answers questions, never gates.
- Don't silently assume away a sponsor-requested scope item: any proposal to drop or deprioritize one leaves the item at its prior priority and is surfaced as a question until answered.
- Don't PROCEED to hardening while open P0/P1 defects, failure-hiding quarantined tests, or a regressing change-failure-rate are unaddressed and not explicitly accepted.
- Don't dump infra jargon on the sponsor or present a cost/velocity trend as a guaranteed outcome.
Guardrails (9)
- Single source of truth: the release_plan release-scope register (value-sequenced sprint list + requirement-to-sprint traceability matrix) is the authoritative MUTABLE register of record; Scope-Burndown and Done-Done are derived views of it, so the exit predicate is deterministic against one owned source — no shadow scope lists.
- Independence is mandatory: the exit verdict and the backlog 'solid' verdict are signed by independent_reviewer (Verdict), who authored neither artifact and who re-derives both receipts against the register; an author-signed or non-reconciling exit is invalid.
- Honesty/receipts bite hardest here: every 'done', 'passed', 'complete', metric value, and 'action done' must resolve to a real receipt (register item IDs in terminal state, CI/deploy/incident/cost-meter sources, prior-action evidence) — never asserted.
- The exit is a deterministic rule, not a judgment: PROCEED iff register-derived scope-remaining EMPTY AND register-derived done-done non-terminal set EMPTY AND quality-gate green; else LOOP.
- Quality-gate precondition on PROCEED: zero open P0/P1 (or each explicitly accepted into hardening with owner + entry-condition), no quarantined test hiding a failure, change-failure-rate/defect trend not regressing.
- Anti-deadlock is bounded and honest: on sponsor silence, wait a bounded window, log a flagged assumption to the RAID/assumption register with owner + revisit trigger, then proceed — but a re-prioritization that drops a sponsor-requested item may NOT be silently assumed; it holds at prior priority in the register and is surfaced until answered. Never block on a human approval.
- Backlog-refinement review_loop has a bounded iteration count; if not converged in N passes, surface the disagreement to the human as INPUT and proceed on the flagged assumption — never deadlock.
- No work is generated outside an open sprint; this brick runs within the active sprint and, on LOOP, hands the refined register-backed backlog to the next Sprint Planning brick.
- Sponsor-facing reporting is plain-language and over-claim-free; the receipts are the gate, the sponsor's input is an input, not an approval.
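The deterministic exit rule stated in the guardrails above can be sketched as a pure function of the register plus the quality gate. The register row shape, the `"terminal"` state label, and the function names are hypothetical. The sketch also assumes that an item marked terminal but missing any of its Done-Done evidence fields is treated as non-terminal, which is one reading of the Done-Done receipt rather than something the source states outright.

```python
# Evidence every terminal item must carry on the Done-Done receipt.
REQUIRED_EVIDENCE = ("merge_sha", "lens_verdict", "proof_test_ids",
                     "cipher_scan", "demo_link")


def scope_burndown(register):
    """Scope-Burndown view: sized register items not yet in a terminal state."""
    return {item_id: row["size"] for item_id, row in register.items()
            if row["state"] != "terminal"}


def done_done_nonterminal(register):
    """Done-Done view: items that are not terminal with full evidence attached."""
    return sorted(
        item_id for item_id, row in register.items()
        if row["state"] != "terminal"
        or not all(row.get(field) for field in REQUIRED_EVIDENCE)
    )


def phase_exit(register, quality_gate_green):
    """PROCEED iff both register-derived sets are empty and the gate is green; else LOOP."""
    remaining = scope_burndown(register)
    not_done = done_done_nonterminal(register)
    proceed = not remaining and not not_done and bool(quality_gate_green)
    return {"decision": "PROCEED" if proceed else "LOOP",
            "scope_remaining": remaining,
            "done_done_nonterminal": not_done}
```

Both views are computed from the same `register` argument, so they cannot diverge the way two separately maintained lists could.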
Release Hardening: Full Regression, Performance, Accessibility & UAT
On a frozen release candidate, Proof runs the full quality pass that goes beyond per-sprint testing: end-to-end regression across the full suite plus every fix's targeted regression and the escaped-defect set; performance/load measured against the NFR numeric targets inherited (read-only) from the phase-1 PRD and phase-3 architecture NFR register; accessibility conformance to a named bar (WCAG 2.2 AA) via automated scan AND assistive-technology pass; the defined cross-browser/device/platform matrix with a per-cell verdict; a confirmation that the security gate is GREEN for THIS exact build; and UAT run as a continuous human INPUT channel (the sponsor exercises real scenarios and reports observations that become triaged defect IDs — never an approval gate). All results are captured as an evidence bundle of receipt-backed numbers (CI run IDs, p50/p95/p99 + RPS + error rate, scan output files, matrix grid, defect tickets) bound to a single commit hash; the verdict is COMPUTED from those receipts, never narrated or assumed. Defects are triaged against a binary severity rubric; release-blockers are fixed AND re-tested. The pass condition is binary: zero open blocker-severity defects, all evidence-bundle receipts green, NFR targets met at p95/p99 under a named production-scale load profile, accessibility clean on both tracks, security green, and the candidate frozen at a named commit. Because this verdict is itself a major build artifact and gates deploy, it does NOT stand alone: the immediately-following companion review_loop brick routes Proof's verdict through an independent panel (Verdict + adversarial Proof red-team + Cipher) so Proof never grades its own homework.
Questions the agent asks (5)
- Which production scenarios should the sponsor exercise during UAT, and is there representative production-scale data we may load for the performance profile (or do we synthesize it and flag the assumption)?
- Are there real-world peak/concurrency numbers we should target in the load profile beyond the inherited NFR baseline (e.g. expected go-live traffic, seasonal peaks)?
- Which browsers / OS / devices are genuinely in-scope for your users so we can mark the matrix cells blocking vs. best-effort accurately?
- Are there accessibility commitments or specific assistive technologies (screen readers, regulatory bars) your users rely on that we must include in the manual pass?
- Is there a target release window or freeze date so we can schedule the candidate freeze and avoid post-freeze drift?
Do (8)
- Bind the entire quality pass to ONE frozen commit hash and re-run affected dimensions if anything lands after the freeze.
- Inherit NFR targets read-only from the approved PRD/architecture register and pass/fail against those exact numbers.
- Judge performance at p95/p99 under a named production-scale load profile, not mean latency on a toy dataset.
- Test accessibility on BOTH tracks (automated zero-criticals AND keyboard/screen-reader manual pass) to WCAG 2.2 AA.
- Treat UAT as a continuous human INPUT channel: convert sponsor observations into triaged defect IDs and proceed on flagged assumptions when the sponsor is silent.
- Report flake rate, quarantine/re-run flaky tests, and only count a stable green as a pass.
- Triage every defect with the binary blocker rubric, log the rationale, and fix-AND-re-test every blocker.
- Link every number to its raw receipt artifact (CI run ID, load report, scan output, matrix grid, defect ticket) and hand the bundle to the companion review_loop for independent validation.
Don't (8)
- Don't mark any dimension 'pass' from a narrated number with no underlying receipt artifact.
- Don't set, raise, or relax NFR targets during hardening; they are inherited contracts.
- Don't report mean/average latency as the performance verdict or run load against a near-empty database.
- Don't accept an automated accessibility scan alone as 'conformance'.
- Don't treat sponsor sign-off as a pass condition or block on a human approval gate.
- Don't count a flaky green run as a pass, and don't reclassify a blocker to a lower severity to clear the zero-blocker gate.
- Don't let the cross-browser/device matrix silently shrink; undefined or empty blocking cells are not a pass.
- Don't treat Proof's verdict as final on its own; it must clear the independent review_loop before it gates deploy.
Guardrails (7)
- HONESTY: no claim without a receipt; every pass carries its raw artifact (test-run ID, load report, scan file, matrix grid, defect ticket) and binds to the frozen commit hash; the verdict is computed, never asserted.
- CANDIDATE FREEZE: the pass verdict binds to a specific commit hash/build; any code change after freeze invalidates the verdict and forces re-run of affected dimensions to prevent drift between hardening and deploy.
- NO HUMAN GATE: UAT is human input only; sponsor observations become defects; on sponsor silence the team proceeds on an explicit flagged assumption, with the AI acceptance-scenario run as binding evidence; there is no human sign-off brick.
- NFR INTEGRITY: targets trace to the approved spec/architecture register and are never created or relaxed at test time.
- BINARY PASS: pass = zero open blocker-severity defects AND all evidence-bundle receipts green (regression incl. escaped-defects, p95/p99 vs NFR, accessibility both tracks, full blocking matrix, security green for this build), stated as one timestamped, commit-hash-frozen gate.
- NO SELF-GRADING: Proof authors the verdict but does not finalize it; the companion 'Release-Verdict Independent Review & Iterate' loop (Verdict + adversarial Proof red-team + Cipher) must find no material gaps and log a convergence verdict before 'pass' is real.
- SECURITY DRIFT: the security gate must be confirmed green for THIS exact candidate (no new criticals since last scan, dependencies clean), not inherited from an earlier build.
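A computed (rather than narrated) performance verdict and the commit-bound BINARY PASS might look like the sketch below. The function names, the receipt dictionary shape, and the choice of the nearest-rank percentile method are assumptions for illustration; the source names p95/p99 and the frozen commit hash but does not prescribe a percentile method or data format.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no latency samples: cannot compute a percentile")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def performance_receipt(latencies_ms, nfr_targets_ms):
    """Compare measured p95/p99 against the inherited, read-only NFR targets."""
    measured = {"p95": nearest_rank_percentile(latencies_ms, 95),
                "p99": nearest_rank_percentile(latencies_ms, 99)}
    passed = all(measured[name] <= nfr_targets_ms[name] for name in measured)
    return {"measured": measured, "passed": passed}


def binary_pass(open_blockers, receipts, frozen_sha):
    """receipts maps dimension -> {"green": bool, "commit": sha}.

    A receipt produced against any commit other than the frozen one is stale
    and cannot count toward the pass.
    """
    stale = sorted(d for d, r in receipts.items() if r["commit"] != frozen_sha)
    red = sorted(d for d, r in receipts.items() if not r["green"])
    return {"pass": open_blockers == 0 and not stale and not red,
            "stale": stale, "red": red}
```

Keeping the NFR targets as an input argument, never a constant inside the check, mirrors the rule that targets are inherited contracts and are not set or relaxed at test time.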
Data Migration, Seed & Backfill Readiness
Prepare and prove, not merely assert, that any data migration, seed load, reference-data load, search-index/cache rebuild, or backfill required at go-live will execute safely and correctly against real production data. Vector authors and runs the migration artifacts, but under the no-human-sign-off reframe the ONLY thing standing between a self-graded "counts match" report and an irreversible data-loss event is an independent receipt-checker; therefore this work brick hands its evidence to an independent verification panel (Verdict + Proof + Cipher) whose members reproduce the reconciliation, reversibility, and security evidence themselves and iterate with Vector until no material gap remains. Readiness must cover four pillars, each done honestly: (1) idempotency PROVEN by re-running the migration twice and after a simulated mid-run kill with identical end-state hashes; (2) a dry-run against a certified production-REPRESENTATIVE dataset (real volume, skew, encodings, nulls, orphans, known-dirty legacy rows) provisioned through a tenant-isolated, PII-controlled path; (3) multi-signal reconciliation (row counts + key-column checksums/hash-totals + referential-integrity + sampled record-level diffs + business invariants), not COUNT(*) alone; and (4) an honest data-layer recovery story: an explicit reversibility classification, a verified pre-cutover backup/snapshot with a PROVEN restore drill, and either a rehearsed rollback OR a tested restore-plus-forward-fix runbook. The brick also measures runtime/lock-impact versus the cutover window, proves resumability, records a big-bang-vs-expand/contract decision, surfaces migration-semantics questions to the sponsor as continuous input, and defines a post-cutover hypercare reconciliation canary. The skip path (product genuinely has no migration/seed/reference/index/cache backfill) is permitted only when an independent reviewer confirms the emptiness with a logged verdict; it is not an unchecked self-note.
Questions the agent asks (7)
- Which system is the authoritative source of truth for each migrated entity, and where do we resolve conflicts when two sources disagree?
- What is the acceptable data-loss tolerance at go-live (zero, or are certain legacy/garbage fields safe to drop)?
- Which legacy fields/tables are known to be unreliable, deprecated, or 'garbage' and should NOT be trusted or carried forward?
- Are there retention, residency, or regulatory constraints (e.g., PII handling, region pinning, audit requirements) that bound how we migrate or where we test?
- What is the agreed maintenance/cutover window length, and is any downtime acceptable — or must this be zero-downtime (expand/contract)?
- Can we obtain a production-representative dataset (masked clone or sample) for the dry-run, and through what approved, tenant-isolated path?
- Are there business invariants (e.g., total balances, record counts per tenant, distinct-key sets) that must be exactly preserved and can serve as reconciliation anchors?
Do (9)
- Classify every data change by reversibility BEFORE running anything, and pick big-bang vs. expand/contract with explicit rationale.
- Take a verified pre-cutover backup/snapshot and actually execute a restore drill — prove you can return to a known-good state, do not just confirm a backup exists.
- Prove idempotency by re-running the migration twice and after a simulated mid-run kill, comparing end-state hashes — never assert idempotency.
- Run the dry-run on a certified production-representative dataset with real volume, skew, encodings, nulls, orphans, and known-dirty rows.
- Reconcile with multiple signals — counts AND checksums AND referential integrity AND sampled record diffs AND business invariants — all from re-runnable scripts.
- Measure runtime, lock/contention, and resume timing against the cutover window, and flag a blocker if it does not fit.
- Hand the evidence to an independent panel (Verdict + Proof + Cipher) who REPRODUCE the reconciliation and reversibility/security results themselves and iterate until no material gap remains.
- Surface migration-semantics questions to the sponsor as continuous input and proceed on explicit flagged assumptions when they are silent.
- Define a post-cutover hypercare reconciliation canary so correctness is observed under real live traffic, not just declared at cutover.
Don't (9)
- Do not let Vector grade his own migration — no 'done' without independent reproduction of the reconciliation by an agent who did not run it.
- Do not pass the brick on synthetic or seed-only data that fails to reproduce production volume, skew, and dirty/legacy edge cases.
- Do not reduce reconciliation to COUNT(*) parity — row counts can match while data is silently corrupted.
- Do not claim 'rollback validated' for a destructive/one-way migration; classify it honestly and prove restore-from-backup plus a forward-fix runbook instead.
- Do not skip the pre-migration backup-and-restore drill; 'rollback exists' is not 'we restored to known-good in a drill.'
- Do not copy production PII into lower environments unmasked or through a non-isolated path; Cipher must clear the dataset pipeline.
- Do not block any step waiting for human approval — humans provide input, they do not gate; record a flagged assumption and proceed.
- Do not allow a self-asserted SKIP — an independent reviewer must confirm there is genuinely no migration/seed/reference/index/cache backfill.
- Do not assert any 'passed/done/live' claim without an attached, re-runnable receipt (hashes, logs, scan codes, reviewer verdicts).
Guardrails (8)
- HONESTY: every done-criterion maps to a re-runnable receipt — backup/restore-drill log, double-run + post-kill hash equality, checksum/integrity reconciliation output, timing measurement, Cipher scan code, Verdict/Proof reproduction verdict — never an assertion.
- INDEPENDENT VERIFICATION IS MANDATORY: the migration is not 'ready' until Proof independently reproduces the reconciliation against source-of-truth and Verdict + Cipher confirm reversibility and security; the runner never certifies his own irreversible operation.
- REVERSIBILITY HONESTY: for any change not cleanly reversible, the rollback requirement converts to a tested restore-from-pre-migration-backup plus a documented, rehearsed forward-fix runbook — pretending a clean rollback exists is forbidden.
- DATA-LOSS GUARD: no destructive transform runs in any environment without a verified, restore-proven backup of that environment first.
- SECURITY/ISOLATION: the production-representative dataset and migration scripts must be tenant-isolated, PII-masked-or-access-controlled, secret-free, and audit-logged, with a Cipher verdict on record — consistent with the company's per-client-isolation/data-security pillar.
- NO HUMAN GATE: the sponsor is a continuous input channel (migration semantics, acceptable loss, regulatory constraints); the team never blocks on human approval and records flagged assumptions when the human is silent.
- SKIP INTEGRITY: a skip is valid only with an independent Verdict confirming true emptiness; a hidden seed/reference/index/cache/backfill obligation rejects the skip and forces full execution.
- CUTOVER FIT: a migration that is correct but exceeds the cutover window or locks tables is NOT go-live ready — timing/lock evidence and a resumable/expand-contract path are required before 'ready.'
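To show why COUNT(*) parity is not enough, here is a minimal in-memory sketch of multi-signal reconciliation and of an idempotency check by end-state hash. Rows are plain dictionaries and every name (`column_checksum`, `reconcile`, `is_idempotent`, the `invariant` callback) is hypothetical. A real migration would run equivalent checks against the databases themselves, and the sampled record-level diff and the mid-run-kill re-run are omitted here for brevity.

```python
import hashlib


def column_checksum(rows, key, columns):
    """Order-independent SHA-256 over the key and the chosen columns of every row."""
    digest = hashlib.sha256()
    for row in sorted(rows, key=lambda r: r[key]):
        line = "|".join(repr(row[col]) for col in (key, *columns))
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


def reconcile(source, target, key, columns, parent_keys, fk, invariant):
    """Return (signals, ok); ok is True only when every signal agrees."""
    signals = {
        "row_count": len(source) == len(target),
        "checksum": column_checksum(source, key, columns)
                    == column_checksum(target, key, columns),
        "referential_integrity": all(row[fk] in parent_keys for row in target),
        "business_invariant": invariant(source) == invariant(target),
    }
    return signals, all(signals.values())


def is_idempotent(migrate, rows, key, columns):
    """Run the migration once and twice; the end-state hashes must be identical."""
    once = migrate(rows)
    twice = migrate(once)
    return column_checksum(once, key, columns) == column_checksum(twice, key, columns)
```

In the failing case exercised by the test below, the row counts match while a balance has been silently altered; only the checksum and the business invariant expose it.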
Security Assessment & Penetration Test
Produce the authoritative, evidence-backed security go/no-go for the release candidate (RC) — the decision that REPLACES a human security sign-off — such that a real CISO would accept it because the receipts, not the assertions, carry it. Cipher leads a four-stage flow on the EXACT RC artifact (pinned commit + built-image digest): (1) refresh the threat model (STRIDE per trust boundary, attack-trees) and author a scope-of-record that an independent reviewer approves BEFORE any testing, so scope cannot be drawn to dodge the dangerous surface; (2) execute the full battery — SAST, DAST, SCA, secret-scanning, SBOM/provenance, ASVS-level control verification, compliance/privacy + per-tenant isolation tests, a concrete agentic/LLM attack suite (OWASP LLM Top 10 + Flowtely's own known sinks: prompt injection direct/indirect, tool/capability/confused-deputy abuse, cross-tenant/cross-flow exfiltration, responder-injection, Truth-Gate/Faithfulness-gate bypass, secret egress via model output), and a regression golden-set seeded from real prior incidents — every run emitting a receipt (tool+version, ruleset hash, target digest, finding IDs); (3) remediate-and-RE-TEST in an iterate-until-clean loop where Mason fixes and Cipher re-runs the specific check against the patched build, closing a finding only on a find→fix-commit→retest receipt; (4) submit the Security Verdict of Record to an INDEPENDENT review-and-iterate loop where Verdict plus an adversarial red-team (agents who did NOT run the assessment) challenge scope, severity ratings, and every "fixed/closed" claim, attempt to re-exploit a sample of declared-closed findings and scope-excluded surfaces, log material gaps, and iterate until they converge that the objectives are met. The verdict is binary and binding: GO requires 0 open high/critical, every blocking finding retest-verified by receipt, the assessed digest unchanged, and panel convergence; a no-go routes back to the sprint loop with the findings register as input. 
The human (sponsor) is a continuous INPUT channel for residual-risk grey zones and business context (e.g., "this endpoint is internal-only"), never an approval gate — and silence never converts a true critical into a pass.
Questions the agent asks (5)
- Are there endpoints, surfaces, or data flows you consider internal-only or out-of-scope for external attack — and what is the business justification (so we scope correctly rather than guess)?
- For any borderline residual-risk finding, what is your risk tolerance / business context, and is there a compliance regime (e.g., SOC 2, GDPR, client contractual security terms) whose obligations must be treated as blocking?
- Are there known sensitive tenants or client-data boundaries that must get the strictest isolation testing, and any prior security incidents you want explicitly re-tested in the golden set?
- Is there a target ASVS level (e.g., L2) and an expected go-live date that should drive the depth/timebox of the assessment?
- Do you have or want a third-party pentest in addition to the AI-run assessment, and if so does it gate or merely inform the verdict?
Do (8)
- Bind the entire assessment to the EXACT release candidate — pinned commit SHA + built-image digest — and run DAST/pentest against a prod-parity environment of record.
- Have scope independently approved BEFORE any testing; derive the test plan from the refreshed threat model so coverage is provably driven by threats, not by whatever the tools happen to catch.
- Operationalize the agentic/LLM suite concretely against OWASP LLM Top 10 + the platform's own known gate-bypass sinks; re-run the golden set of past real incidents every time.
- Close a finding only on a find→fix-commit→retest receipt against the patched build; iterate the remediate-and-retest loop until 0 open high/critical.
- Make every security claim receipt-backed (scan run ID, tool+version, ruleset hash, target digest, finding IDs, retest diffs) so the verdict is server-reconcilable, not free-typed.
- Subject the verdict itself to the independent review-and-iterate loop (Verdict + adversarial red-team who did not run the assessment) and converge explicitly before GO.
- Itemize every suppression/risk-acceptance with justification, expiry date, and independent approval; count a risk-accepted critical against the gate.
- Surface residual-risk grey zones and business-context questions to the sponsor as INPUT; where silent, proceed on an explicit flagged assumption — except never for a true critical.
Don't (8)
- Don't let the assessment author self-certify the go/no-go — the verdict is a gated artifact that an independent panel must review and converge on.
- Don't mark anything 'scanned/found/fixed/retested/blocked' without a real receipt; no green dashboard, suppressed finding, or untested fix may stand in for evidence.
- Don't draw scope to exclude the scary surface, scan a stale/divergent build, or test in an env that differs materially from prod.
- Don't close a finding by assertion or by 'the dev says it's fixed' — only a passing re-test against the patched build closes it.
- Don't ship a GO if any high/critical is open, any golden-set case fails, the assessed digest no longer matches the RC, or the panel hasn't converged.
- Don't reduce 'agentic/LLM' or 'compliance/privacy' to a checkbox — every named attack class and control area needs an executed pass/fail case.
- Don't let the human become an approval gate, and don't let human silence turn a real critical into a pass.
- Don't treat the verdict as durable after a post-assessment code change — any digest change invalidates it and re-opens the retest loop.
Guardrails (7)
- Digest immutability: the verdict is bound to the assessed image/commit digest; any change to that digest after the verdict automatically INVALIDATES it and re-opens the remediate-and-retest loop — no GO survives a post-assessment 'tiny fix'.
- Binary gate: GO requires ALL of {0 open high/critical, every blocking finding retest-verified by receipt, golden-set 0 FAIL, all required ASVS/compliance controls verified-or-independently-risk-accepted, assessed digest == RC digest, panel CONVERGED}; otherwise NO-GO.
- Independent review is mandatory and the reviewers (Verdict + red-team) must not be the assessment author; convergence is evidence-based, may require multiple iterations, and is documented.
- Honesty architecture: every security claim references a ledger receipt; the verdict text is reconcilable against receipts and cannot be free-typed — unbacked claims are rejected by construction.
- Scope-of-record must be independently approved before testing begins; suppressions/risk-acceptances are itemized, justified, expiry-dated, independently approved, and counted against the gate.
- Human-as-input only: the sponsor supplies residual-risk/business context but never approves; a true high/critical remains NO-GO regardless of human silence or input.
- No-go is a closed feedback loop, not a dead end: open blocking findings route back to the sprint loop with the findings register as input; this brick feeds deploy/go-live ONLY on a converged GO verdict.
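The binary gate and the digest-immutability rule above can be sketched as a single pure function. This is a minimal illustration, not the platform's implementation: `SecurityEvidence`, its field names, and `security_verdict` are hypothetical, and the sketch assumes the counts have already been recomputed from ledger receipts. Treating a risk-accepted critical as still counting against the gate is one reading of the suppression guardrail.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SecurityEvidence:
    # All names here are hypothetical; values are assumed to come from receipts.
    open_high_critical: int        # open high/critical findings
    risk_accepted_criticals: int   # risk-accepted criticals still count against the gate
    blocking_without_retest: int   # blocking findings with no retest receipt
    golden_set_failures: int       # failing golden-set cases
    controls_unresolved: int       # required controls neither verified nor independently risk-accepted
    assessed_digest: str           # digest the assessment actually ran against
    rc_digest: str                 # digest of the current release candidate
    panel_converged: bool


def security_verdict(ev: SecurityEvidence) -> Tuple[str, List[str]]:
    """Return ("GO" | "NO-GO", reasons); GO only when every condition holds."""
    reasons = []
    if ev.open_high_critical + ev.risk_accepted_criticals > 0:
        reasons.append("open or risk-accepted high/critical findings")
    if ev.blocking_without_retest > 0:
        reasons.append("blocking finding lacks a retest receipt")
    if ev.golden_set_failures > 0:
        reasons.append("golden-set case failed")
    if ev.controls_unresolved > 0:
        reasons.append("required control not verified or risk-accepted")
    if ev.assessed_digest != ev.rc_digest:
        # Digest immutability: any post-assessment change invalidates the verdict.
        reasons.append("assessed digest no longer matches the RC")
    if not ev.panel_converged:
        reasons.append("panel has not converged")
    return ("GO" if not reasons else "NO-GO", reasons)
```

Returning the list of reasons alongside the verdict keeps a NO-GO actionable: each reason maps to an item that can be routed back to the sprint loop.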
Infrastructure Readiness, Signed Packaging & Release Prep
Vector makes production real: provision and finalize prod infrastructure entirely via IaC, configure scaling, runtime-resolved secrets, backups/DR, and observability (dashboards, alerts, on-call), then produce a versioned, reproducible, signed and provenance-attested (SLSA/in-toto) release artifact with a published SBOM and release notes, plus a deployment runbook and a rollback procedure exercised in staging. Crucially, this brick must never self-certify: every "done" claim (reproducible, signed, attested, rollback-safe, DR-restorable, alerts-fire, secret-free, parity-matched) is converted from prose into a machine-checkable receipt, and an INDEPENDENT panel (Verdict as independent evaluator + Cipher for AppSec/supply-chain, neither of whom authored the release) RE-EXECUTES the proofs — independent rebuild-and-hash-compare, re-running provenance/signature verification, re-scanning the SBOM and secrets, and replaying the rollback/DR/alert drills — rather than reading Vector's summary. The brick terminates in a single Release-Readiness Record linking every receipt with the panel's evidence-based convergence verdict ("solid: no open material gaps") that is independently reproducible from the receipts; a criterion with no machine-checkable receipt is automatically a FAIL, and the loop iterates (Vector fixes, panel re-verifies) until no material gap remains. Where real prod inputs are absent (cloud account, prod DNS, DR region, compliance scope), the team proceeds on an EXPLICIT, FLAGGED assumption surfaced to the human as a non-blocking question rather than blocking on approval, and Verdict confirms no fake-prod assumption is silently load-bearing. DELIVERY-PACKAGE ADDENDUM (Sprint 117): the packaging output is a SELF-DEPLOYING PACKAGE per the deploy.md standard — a fresh Claude must be able to deploy it from deploy.md alone.
Questions the agent asks (7)
- What are the production cloud account, region(s), and DR region we should target — and if none is provided yet, do you accept us proceeding on a staging-account proxy recorded as a flagged assumption?
- What are the production DNS/domains and TLS/cert ownership for go-live?
- What are the binding RTO and RPO targets the rollback and DR drills must meet?
- Are there compliance/data-residency constraints (e.g. SOC2, HIPAA, GDPR, region pinning) that change infra topology or backup handling?
- Which secret manager is the system of record for runtime secrets (e.g. cloud KMS/Secrets Manager, Vault), and who controls access?
- What is the on-call rotation and escalation channel (e.g. PagerDuty/Opsgenie/Slack) the alert-fire drill must page?
- What is the approved policy for handling data written under vN+1 if a rollback to vN occurs (preserve, migrate-down, quarantine)?
Do (8)
- Convert every 'done' into a machine-checkable receipt with a digest/timestamp before claiming it; a criterion with no receipt is a FAIL, not a pass-on-trust.
- Have Verdict and Cipher RE-EXECUTE proofs themselves — independent rebuild, re-run cosign/slsa-verifier, re-scan SBOM and secrets, replay rollback/DR/alert drills — never accept Vector's prose summary.
- Provision staging from the identical IaC modules/version as prod-target and prove parity with a plan-diff before trusting any dry-run as representative.
- Pin all sources, base images, and build inputs by digest so the independent rebuild can produce a hash-identical artifact.
- Exercise (not just configure) backups, DR restore, rollback, and alerting with real fault injection and measured RPO/RTO.
- Source runtime secrets from the secret manager and verify at deploy time that nothing sensitive is baked into the artifact, attestation, IaC state, logs, or runbook.
- Record every missing real-prod input as an explicit flagged assumption surfaced to the human as a non-blocking question, and proceed.
- Iterate the Vector-fix / panel-re-verify loop until the Release-Readiness Record reads 'solid: no open material gaps.'
Don't (9)
- Don't let Vector be its own auditor — the release-readiness verdict belongs to Verdict + Cipher, who did not author the release.
- Don't call the artifact 'reproducible' without a second independent rebuild yielding a hash-identical digest.
- Don't call rollback 'validated' without a transaction-level receipt covering forward/backward migration, measured RTO, post-rollback health, and a correct smoke transaction.
- Don't call backups/DR 'configured' as proof — only a successful restore drill with a smoke transaction counts.
- Don't call alerts 'live' without a synthetic-failure drill proving the alert fired AND reached on-call.
- Don't narrow secrets checking to the artifact only — scan SBOM, attestation, IaC state, CI logs, and runbook too.
- Don't block on a human approval gate; surface questions and proceed on flagged assumptions where the human is silent.
- Don't silently use staging-as-prod or any proxy without recording it in the flagged-assumptions ledger and having Verdict confirm that no unflagged proxy is load-bearing.
- Don't ship an SBOM with unwaived critical/high CVEs and call it published-and-done.
Guardrails (7)
- No human sign-off: this brick is AI-owned; the human only provides inputs/answers and the team proceeds on flagged assumptions where silent.
- Independent verification is mandatory and adversarial: the red-team pass must actively try to (1) rebuild and get a DIFFERENT hash, (2) find a secret in the attestation/IaC state/logs, (3) break rollback with a non-backward-compatible migration, (4) prove an alert silently doesn't fire, and (5) catch staging masquerading as prod.
- Honesty architecture: no 'done/passed/live/releasable' claim ships without an attached, independently-reproducible receipt; unreceiptable claims auto-FAIL.
- No real secrets in any artifact, SBOM, provenance attestation, IaC state file, CI log, or runbook; runtime secrets must resolve from the secret manager.
- Any 'N/A because no real prod yet' must appear in the flagged-assumptions ledger and be explicitly accepted there — never silently waved through.
- The release is not 'live' until an agent who did not build it has independently reproduced the proof and the convergence verdict reads 'solid: no open material gaps.'
- Convergence must be evidence-based and reproducible from the linked receipt digests, not a narrative attestation.
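As a rough sketch of the "no receipt means FAIL" rule and the independent rebuild-and-hash-compare, the following assumes a Release-Readiness Record can be reduced to a mapping from criterion name to receipt. The criterion names mirror the list in the brick description; the receipt fields (`digest`, `reproduced_by_panel`) and all function names are hypothetical.

```python
import hashlib
from typing import Dict

# Criteria named in the brick description; the identifiers are illustrative.
CRITERIA = (
    "reproducible", "signed", "attested", "rollback-safe",
    "dr-restorable", "alerts-fire", "secret-free", "parity-matched",
)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def rebuild_matches(author_artifact: bytes, panel_rebuild: bytes) -> bool:
    """'Reproducible' holds only if the independent rebuild is hash-identical."""
    return sha256_hex(author_artifact) == sha256_hex(panel_rebuild)


def score(receipts: Dict[str, dict]) -> Dict[str, str]:
    """Score every criterion; a missing or unreproduced receipt is a FAIL."""
    results = {}
    for criterion in CRITERIA:
        receipt = receipts.get(criterion)
        ok = bool(
            receipt
            and receipt.get("digest")
            and receipt.get("reproduced_by_panel")
        )
        results[criterion] = "PASS" if ok else "FAIL"
    return results


def record_is_solid(receipts: Dict[str, dict]) -> bool:
    """True only when no criterion is left without a panel-reproduced receipt."""
    return all(result == "PASS" for result in score(receipts).values())
```

Because the scoring iterates over the fixed criteria list rather than over whatever receipts were submitted, omitting a receipt cannot shrink the set of things that must pass.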
Clean-room Deployment Verification (fresh Claude deploys the package from deploy.md)
Prove the delivered package is REAL: a fresh, ZERO-CONTEXT Claude Code session, in an isolated clean workspace containing ONLY the package, executes the package's own deploy.md — installs, runs the test gate, starts the service, health-checks, runs a feature smoke test, and verifies persistence — escalating to the human sponsor ONLY for declared inputs (secrets) it cannot supply. The receipt-backed verdict (DEPLOYED-AND-VERIFIED or FAILED at step N) bound to the package hash is what gates release; a FAIL routes the defect back into the build loop. This is the operational proof that replaces self-graded test claims (run 6a4bd141 failed exactly here). Canonical contract: docs/standards/clean-room-verification-brick.md.
Questions the agent asks (1)
- (verifier, if blocked) deploy.md declares input <X> which I cannot supply — sponsor, what is its value?
Do (4)
- Hand the verifier ONLY the package; it must have zero build context (independence).
- Execute deploy.md exactly; capture per-step command+exit+output as receipts.
- Escalate to the sponsor only for declared inputs the verifier cannot supply; otherwise run autonomously.
- Bind the verdict to the package content hash; FAIL routes the defect back to the build loop.
Don't (4)
- Don't let the verifier read the parent repo, other tenants, or build-agent context.
- Don't write any secret value into the receipt/transcript in cleartext.
- Don't mark verified without install-ok + tests-pass + health-ok + smoke-ok receipts.
- Don't soften a FAILED verdict so the run can terminate; fail closed.
Guardrails (6)
- GATE-FREE INVARIANT: the human is consulted ONLY for declared inputs the verifier cannot supply; never an approval gate.
- INDEPENDENCE: the verifier is a NON-AUTHOR, zero-context session with only the package.
- HONESTY-RECEIPT: DEPLOYED-AND-VERIFIED is valid only with receipts (per-step exits, real test output, health/smoke matching deploy.md) bound to the package hash; reviewers re-derive, never accept a summary.
- NO-SECRETS-AT-REST: env-only secret injection; declared-input names redacted in receipts; transcript scrubbed of secret values.
- FAIL-CLOSED: a failed step yields FAILED at step N; package NOT released; defect re-enters the build loop.
- ISOLATION HONESTY: on a shared host, directory-scoping is NOT OS-enforced isolation; record the limitation until the verifier runs as a non-parent UID / container (P2a).
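A minimal sketch of the per-step receipt capture, fail-closed behavior, and secret scrubbing described above. It assumes deploy.md has already been parsed into an ordered list of named commands; that parsing, the function names, and the receipt field names are all hypothetical, and the canonical contract in docs/standards/clean-room-verification-brick.md may differ.

```python
import subprocess
import sys
from typing import Dict, List, Sequence, Tuple


def redact(text: str, secrets: Sequence[str]) -> str:
    """Scrub known secret values so they never land in a receipt."""
    for value in secrets:
        if value:
            text = text.replace(value, "[REDACTED]")
    return text


def run_deploy_steps(
    steps: List[Tuple[str, List[str]]],
    package_hash: str,
    secrets: Sequence[str] = (),
) -> Dict:
    """Run steps in order, record command+exit+output, stop at the first failure."""
    receipts = []
    for number, (name, argv) in enumerate(steps, start=1):
        proc = subprocess.run(argv, capture_output=True, text=True)
        receipts.append({
            "step": number,
            "name": name,
            "command": redact(" ".join(argv), secrets),
            "exit": proc.returncode,
            "output": redact(proc.stdout + proc.stderr, secrets),
        })
        if proc.returncode != 0:
            # Fail closed: later steps do not run and the package is not released.
            return {
                "verdict": f"FAILED at step {number}",
                "package_hash": package_hash,
                "receipts": receipts,
            }
    return {
        "verdict": "DEPLOYED-AND-VERIFIED",
        "package_hash": package_hash,
        "receipts": receipts,
    }
```

For example, `run_deploy_steps([("install", [sys.executable, "-c", "print('ok')"])], "sha256:pkg")` yields a single receipt with exit 0. Note that this sketch provides no isolation of its own; it inherits whatever workspace and UID it is launched with.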
Independent Review & Iterate: Production Release Readiness (Go = Evidence) [GO/NO-GO]
This terminal brick replaces the retired human go-live sign-off with an independent, adversarial, receipt-anchored review loop that is the last defense before irreversible production impact. Verdict (lead) plus a red-team that authored none of the upstream artifacts independently RE-VERIFY the four upstream independent-review verdicts — Proof's quality verdict, Cipher's security verdict, Vector's infra/rollback/data-migration readiness, and the independently-reviewed accessibility/privacy NFRs — by REPRODUCING each one's underlying receipt (re-run the suite by run ID, recompute open high/critical from raw scanner output, replay the staging rollback dry-run, re-run the migration reconciliation), never by reading an upstream GREEN summary. The loop consumes and does not redo upstream judgments, but it trusts nothing it has not personally reproduced and bound to the exact shipped commit SHA. The red-team's explicit job is to find a reason NOT to ship: re-adjudicate every downgraded defect/finding, hunt stale or mismatched or failure-outcome receipts, and attempt to break the conservative NO-GO default. This brick also closes the program-long assumption-drift gap: no HIGH-blast-radius flagged assumption may ride 'open' into production — GO requires every HIGH-blast-radius entry in the canonical Assumption Register (sourced from the program_health_rollup) to be in a terminal state (CONFIRMED, REFUTED-and-handled, or EXPLICITLY-ACCEPTED-AS-RISK by a logged human owner, never AI-self-granted). Convergence is gap-gated and evidence-based: GO is impossible while any logged gap is open OR any HIGH-blast-radius assumption is open, requires at least two concurring independent reviewers plus a red-team that found no ship-blocker, names the exact tenant instances it covers, and emits a closed-loop post-deploy smoke + monitoring-window spec to hypercare; the gate stays open until live verification (deploy_go_live_verify) closes it. 
The only legitimate human touch is a narrow, logged, human-owned ESCALATION to risk-accept a critical finding or a HIGH-blast-radius assumption or make an irreversible/legal call — non-blocking, never AI-self-grantable, and on silence within the window the team defaults conservatively to NO-GO / not-risk-accepted with the assumption flagged.
Questions the agent asks (6)
- For any critical/high finding the team recommends risk-accepting (not fixing) before ship: do you accept the named risk and own it in writing, or should it block? (Silence within the escalation window = NO-GO / not-risk-accepted.)
- For each HIGH-blast-radius assumption that could not be CONFIRMED or REFUTED before ship: do you EXPLICITLY accept it as a logged, human-owned risk, or must it block go-live? (Silence = NO-GO; an open HIGH-blast-radius assumption cannot ride into production.)
- Are there irreversible or legally-significant aspects of this release (data deletion/migration, contractual go-live date, regulated data movement) that you want to call yourself rather than have the team default conservatively on?
- Exactly which tenant/client instance(s) is this release authorized to cover? Any instance not named will be treated as out-of-scope and will NOT auto-generalize.
- What is the acceptable escalation-response window before the team proceeds on the conservative NO-GO default?
- Is there any external constraint (compliance deadline, customer commitment, maintenance-window) that changes the auto-rollback triggers or the post-deploy monitoring window we should encode?
Do (9)
- Reproduce every upstream receipt — re-run by run ID, recompute counts from raw scanner output, replay the rollback dry-run, re-run reconciliation — and log a fresh reproduced-receipt hash for each.
- Run the independence attestation and receipt-integrity gate FIRST; if either fails, stop — the GO is invalid before any other criterion is considered.
- Reconcile the canonical Assumption Register from the program_health_rollup and drive every HIGH-blast-radius entry to a terminal state (CONFIRMED with reproduced receipt, REFUTED-and-handled with dependent bricks re-reviewed, or human-owned EXPLICITLY-ACCEPTED-AS-RISK) before GO.
- Bind every receipt to the exact shipped commit SHA, a success outcome, a content hash, and a timestamp inside the build window; reject anything stale or mismatched.
- Keep the red-team adversarial and disjoint from upstream authors; require it to log real attempts to block, re-adjudicate every downgraded finding, and re-derive every claimed-terminal HIGH-blast-radius assumption.
- Gate convergence on zero open gaps AND zero open HIGH-blast-radius assumptions with ≥2 concurring independent reviewers + a clean red-team pass; iterate on the specific contested receipt, never average disagreeing verdicts.
- Require fired-and-proven receipts for reversibility (synthetic rollback trigger), monitoring (paged + acked synthetic alert), and migration (forward+backward on prod-sized snapshot + restore test).
- Name the exact tenant instances the GO covers and emit the closed-loop post-deploy smoke/monitoring spec so a failed live smoke auto-rolls-back and re-opens this gate; the open GO closes only on downstream deploy_go_live_verify.
- Default conservatively: on human silence, NO-GO and not-risk-accepted, with the assumption explicitly flagged.
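The receipt-binding rule in the list above (exact shipped SHA, success outcome, content hash, timestamp inside the build window) might be checked along these lines. The `Receipt` shape and the problem labels are assumptions made for illustration; the real ledger row schema is not specified in this section.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Receipt:
    # Hypothetical ledger-row shape.
    receipt_id: str
    commit_sha: str
    outcome: str          # "success" or "failure"
    content_hash: str
    timestamp: datetime


def receipt_problems(
    receipt: Receipt,
    shipped_sha: str,
    window_start: datetime,
    window_end: datetime,
) -> List[str]:
    """Return every reason this receipt must be rejected; empty means usable."""
    problems = []
    if receipt.commit_sha != shipped_sha:
        problems.append("sha-mismatch")
    if receipt.outcome != "success":
        problems.append("non-success-outcome")
    if not receipt.content_hash:
        problems.append("missing-content-hash")
    if not (window_start <= receipt.timestamp <= window_end):
        problems.append("outside-build-window")
    return problems
```

Collecting all problems instead of stopping at the first gives the red-team a complete picture of why a receipt is stale or mismatched.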
Don't (10)
- Don't conclude GO by aggregating four upstream GREEN verdicts — a dashboard glance is not re-verification.
- Don't pass any re-execution-matrix row on an upstream summary; no row passes without independent reproduction and a fresh hash.
- Don't let any HIGH-blast-radius assumption ride 'open' into production; an unresolved one is a NO-GO blocker, not a footnote.
- Don't let the AI self-grant, assume, or back-fill an EXPLICITLY-ACCEPTED-AS-RISK terminal state for an assumption — only a logged, human-owned record may set it.
- Don't let Verdict or the red-team judge an artifact they (co-)authored or co-signed — that collapses independence into self-review.
- Don't accept 'zero open blockers' without re-adjudicating every defect/finding triaged-down to a 'known issue' and every assumption claimed terminal.
- Don't ship on a one-shot staging dry-run with no flag/canary, no tested kill-switch, or no defined auto-rollback triggers.
- Don't generalize a GO across tenants or migrate without forward+backward + backup/restore receipts.
- Don't issue a fatigue-driven GO when the time-box/max-iteration is hit — force NO-GO + escalation instead.
- Don't call the gate closed on GO; it stays open until live post-deploy verification (deploy_go_live_verify) closes it.
Guardrails (8)
- Acceptance criterion ZERO (independence) and the receipt-integrity gate are hard preconditions: failure of either makes the GO invalid regardless of all other criteria.
- GO is binary and gap-gated: it is impossible while any gap is open OR any HIGH-blast-radius assumption is non-terminal, and it requires 0 release-blocker defects, 0 open high/critical security findings (recomputed from raw output), validated staged rollback, all NFR targets met with cited numbers, every HIGH-blast-radius assumption terminal, and live monitoring + on-call proven by a paged-and-acked synthetic alert.
- Assumption-resolution gate: every HIGH-blast-radius entry in the canonical Assumption Register (from the program_health_rollup) must be CONFIRMED, REFUTED-and-handled, or human-owned EXPLICITLY-ACCEPTED-AS-RISK; 'open' = NO-GO by default and AI may not self-grant the accepted-as-risk state.
- Every receipt must be a server-authored ledger row with content hash, build-window timestamp, success outcome, and shipped-SHA binding; failure-outcome rows are filtered and rejected.
- The human escalation is the ONLY override of NO-GO, must be a human-authored/human-owned logged record, is non-blocking, and cannot be AI-self-granted; silence = NO-GO / not-risk-accepted, assumption flagged.
- Honesty architecture: a GO is not 'it is ready' but 'here are the receipts I personally reproduced, the independence attestation, the resolved HIGH-blast-radius assumptions, the red-team's best failed attempt to block, and the post-deploy smoke that will confirm or auto-rollback' — claims live only as far as their receipts.
- Per-tenant scoping is mandatory: a GO names its instances and never auto-generalizes across the multi-tenant fleet.
- Closed loop: a failed post-deploy smoke auto-rolls-back and re-opens this gate; the open GO is closed only by downstream deploy_go_live_verify live verification — the deploy is not done until verified live.
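The ordering of the guardrails above (hard preconditions first, then gap and assumption gating, then reviewer concurrence) can be sketched as follows. The register row fields (`blast_radius`, `state`, `accepted_by_kind`) and both function names are hypothetical; the terminal-state labels are taken from the text.

```python
from typing import Dict, List, Tuple

TERMINAL_STATES = {"CONFIRMED", "REFUTED-and-handled", "EXPLICITLY-ACCEPTED-AS-RISK"}


def assumption_blockers(register: List[Dict]) -> List[Tuple[str, str]]:
    """List HIGH-blast-radius rows that block GO, with the reason for each."""
    blockers = []
    for row in register:
        if row["blast_radius"] != "HIGH":
            continue
        state = row["state"]
        if state not in TERMINAL_STATES:
            blockers.append((row["id"], "non-terminal"))
        elif (
            state == "EXPLICITLY-ACCEPTED-AS-RISK"
            and row.get("accepted_by_kind") != "human"
        ):
            # The AI may not self-grant the accepted-as-risk state.
            blockers.append((row["id"], "risk-acceptance-not-human-owned"))
    return blockers


def release_verdict(
    *,
    independence_ok: bool,
    receipts_ok: bool,
    open_gaps: int,
    register: List[Dict],
    concurring_reviewers: int,
    red_team_blockers: int,
) -> str:
    """Conservative default: anything short of full evidence is NO-GO."""
    if not (independence_ok and receipts_ok):
        return "NO-GO"  # hard preconditions fail before anything else counts
    if open_gaps > 0 or assumption_blockers(register):
        return "NO-GO"
    if concurring_reviewers < 2 or red_team_blockers > 0:
        return "NO-GO"
    return "GO"
```

A "GO" from this sketch is still the open GO described in the guardrails: it would remain open until deploy_go_live_verify closes it.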
Production Deployment & Go-Live
Vector executes the approved production deployment of the EXACT hardened, scan-clean artifact handed off from Phase-5 packaging — same git hash, same SBOM, same gate-passing build — using the safe-deploy strategy (canary/blue-green) with feature flags decoupling deploy ("artifact running, dark") from release ("flag flipped to users"), so a "deployed" receipt and a "released" receipt are distinct. Before a single byte ships, a binary pre-flight must be green: artifact-hash equals the upstream-approved hash, the honesty/security/brand gates in safe-deploy.sh re-run and PASS with captured receipts, a rollback DRILL has been actually exercised in staging this release (proven, not asserted), and DB migrations are confirmed expand/contract backward-compatible (or explicitly flagged not-rollback-safe with handling). Vector then promotes to a named canary cohort and observes a bake window against named auto-abort thresholds (error rate, p95 latency, 5xx, guardrail-incident count, business success-metric); a breach inside the window triggers AUTOMATIC rollback within the rollback SLO plus a receipt — no human in the loop. Every sub-claim a go-live rests on ("artifact is the approved one", "rollback works", "smoke passed", "health green", "metrics in budget") is written as an immutable go-live ledger receipt with outcome=success verified (guarding against success-receipts-written-on-failure), and the human is an INPUT channel (timing/blackout/comms): questions are surfaced, silence is handled by an explicit flagged assumption logged in the receipt, never a block and never a silent override of a stated constraint. 
Critically, this brick NEVER self-attests "live and healthy" — it executes the deploy, runs the independently-authored smoke/health checks, and emits the go-live receipt + hypercare handoff package; the "go-live succeeded" verdict is emitted ONLY by the downstream independent review_loop (deploy_go_live_verify: Verdict + Proof + Cipher, adversarial red-team) re-producing the evidence from a clean vantage, so a false "go-live succeeded" is structurally unsayable.
Questions the agent asks (5)
- Is there a required maintenance/deployment window or a blackout period (e.g. client peak hours, change-freeze) we must deploy within or avoid? If you don't specify, we will proceed now and log that as a flagged assumption.
- Who/what is the intended initial canary cohort and acceptable user-impact blast radius (internal users only, % of traffic, a specific tenant)?
- What are the business success-metrics and their acceptable thresholds for THIS release that should auto-abort the canary if breached?
- Is any external communication (status page, customer notice, internal go-signal) required before flipping release flags to 100%?
- Are there release elements (data backfills, irreversible migrations, third-party cutovers) you know are NOT rollback-safe that we should sequence or gate specially?
Do (10)
- Deploy ONLY the exact upstream-approved artifact: assert deployed git hash == hardened/packaged hash before starting, and capture the equality check as a receipt.
- Run the full safe-deploy.sh honesty/security/brand gate and capture every gate receipt; abort on any non-zero exit (mirror the existing blocking-gate behavior).
- Exercise a real rollback DRILL in staging this release and capture its receipt before promotion — prove rollback works, never merely 'confirm available'.
- Decouple deploy from release with feature flags; write distinct 'deployed (dark)' and 'released' receipts and the full flag matrix.
- Promote via canary/blue-green with a named bake window and numeric auto-abort thresholds; auto-rollback on breach within the rollback SLO, no human approval.
- Define 'healthy' as multi-layer: liveness + readiness + dependency + a real end-to-end functional smoke + success-metric dashboards, with named thresholds and live codes.
- Verify outcome=success on every action receipt (guard against success-receipts written on failed actions, e.g. the email.sent-on-failure class).
- Write the immutable go-live ledger receipt and feed the Reporter ONLY from it; hand the receipt + evidence to the independent review_loop and let IT emit the success verdict.
- Surface timing/blackout/comms questions to the human; on silence, proceed on an explicit flagged assumption logged in the receipt.
- Emit the hypercare handoff package (version, flags, dashboards+owners, rollback handle, known issues/assumptions) as the input to H1.
Don't (10)
- Don't assert 'live and healthy' or 'go-live succeeded' from this brick — that verdict belongs to the downstream independent review_loop; this brick produces receipts, not the verdict.
- Don't deploy anything other than the exact gate-passing, hash-matching, scan-clean artifact from the upstream Phase-5 brick.
- Don't treat a single 200 from /health as proof of healthy — shallow health can be green while the system is materially broken.
- Don't claim rollback is available without a fresh, exercised rollback-drill receipt for THIS release.
- Don't run a 'canary' with no bake window or no auto-abort thresholds — point-in-time, happy-path-only checks are canary-in-name-only.
- Don't let Vector author AND run AND grade its own smoke tests; the smoke/threshold suite must originate from Proof/Verdict.
- Don't roll code back over an applied irreversible migration; don't ship a not-rollback-safe change without documented handling/gating.
- Don't write a 'success' receipt for an action whose outcome failed.
- Don't block on a human approval gate; equally, don't silently override a stated human constraint (e.g. a blackout window).
- Don't free-type the Reporter's go-live string — it must be server-authored from the immutable go-live receipt.
Guardrails (8)
- Pre-flight is binary and blocking: unless ALL of {artifact-hash match, passing gate receipts, fresh rollback-drill receipt, migration-safety statement} are present, the deploy does not start.
- The 'go-live succeeded' / 'live and healthy' claim is structurally non-emittable by this brick; it is emitted only after the independent deploy_go_live_verify review_loop returns a converged, receipt-green verdict.
- Auto-rollback is mandatory and human-free inside the canary bake window on any threshold breach, and must complete within the named rollback SLO with a receipt.
- Honesty law applies recursively: every sub-claim (artifact-approved, rollback-works, smoke-passed, health-green, metrics-in-budget) must carry an independently resolvable receipt ID, and every action receipt must be outcome=success-verified.
- Deploy and release are separate receipts; 'released' is never recorded without a prior matching-hash 'deployed' receipt.
- Human is input, not a gate: questions surfaced, silence => explicit flagged assumption in the receipt, stated constraints honored — never an approval checkpoint.
- No secrets/credentials in any receipt, log, or handoff package.
- The go-live event itself is recorded as an immutable ledger row; the Reporter is fed only from that row, never from free text.
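The canary bake window with numeric auto-abort thresholds could look roughly like the sketch below. It assumes every metric is expressed as an upper bound (a floor-style business metric would need to be inverted first) and that a metric missing from a sample counts as a breach; both are assumptions of this illustration, as are the function and key names.

```python
from typing import Dict, List


def breached_metrics(sample: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Names of metrics over their limit; a missing metric counts as a breach."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if sample.get(name, float("inf")) > limit
    )


def bake(samples: List[Dict[str, float]], thresholds: Dict[str, float]) -> Dict:
    """Walk the bake-window samples in order; the first breach triggers rollback."""
    for index, sample in enumerate(samples):
        breached = breached_metrics(sample, thresholds)
        if breached:
            # No human in the loop: the breach itself is the trigger.
            return {"action": "auto-rollback", "sample_index": index, "breached": breached}
    return {"action": "bake-passed", "breached": []}
```

The passing result is deliberately named "bake-passed" rather than anything like "healthy", since this brick produces receipts and leaves the success verdict to the downstream review loop.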
Independent Go-Live Verification (Closes the GO Gate)
Independently RE-DERIVE the live production evidence from a clean vantage against the exact shipped SHA and issue the single binary "go-live succeeded | rolled-back" verdict, closing the GO gate that review_release_readiness_iterate deliberately left open until live verification. A panel that did NOT run the deploy — Verdict (lead, independent evaluator), Proof (QA), Cipher (AppSec), plus an adversarial red-team pass — re-runs independently-authored post-deploy smoke/health checks, recomputes live error-rate/p95/5xx/guardrail-incidents against the NAMED auto-abort thresholds, confirms the artifact hash in production equals the approved hash, and performs the LIVE Honesty-Gate Coverage check that no design brick closed: every action path in the architecture's Honesty-Gate Coverage Inventory must be verified ACTUALLY receipt-covered in production (the confessed P0: gate wired at one endpoint, in-app/internal paths bypass). The verdict is emitted as a ledger receipt; a false "go-live succeeded" is structurally unsayable because the issuing brick re-derives evidence rather than reading the deployer's summary. Human input is non-blocking and limited to business-context grey zones.
Questions the agent asks (6)
- What are the NAMED auto-abort thresholds carried forward from release readiness (error-rate, p95, 5xx, guardrail-incident count), and what is the post-deploy observation-window length over which they must hold?
- What is the exact approved shipped SHA and the approved artifact hash recorded at the release-readiness GO gate, so the panel can assert prod == approved?
- Where is the architecture's Honesty-Gate Coverage Inventory of action paths, and which paths are flagged as the known P0 bypass surfaces (in-app message hooks, internal/agent-to-agent paths) that must be live-probed for production receipt coverage?
- Which production telemetry and ledger sources can the panel query directly (raw, not deployer-curated) from a clean vantage, and is read access provisioned for the non-author reviewers?
- Is the rollback path the panel will invoke on a FAIL verdict the same tested one-click rollback validated at release readiness, and who/what executes it on a 'rolled-back' verdict?
- Are there any business-context grey zones (client commitments, contractual SLAs) that should be surfaced as non-blocking human input rather than auto-failed at threshold?
Do (10)
- Run this brick as a review_loop placed immediately AFTER deploy_go_live and BEFORE hypercare — it is the live-verification step that closes the GO gate review_release_readiness_iterate left open.
- Staff the panel exclusively with reviewers who did NOT run the deploy: Verdict leads as the standing independent evaluator, with Proof, Cipher, and an adversarial red-team pass — independence is the source of the honesty guarantee.
- RE-DERIVE every claim from a clean vantage against the exact shipped SHA: re-run the independently-authored smoke/health, recompute metrics from raw telemetry, re-hash the production artifact — never accept the deployer's summary, dashboard, or 'it works'.
- Verify prod-served SHA == approved SHA and prod artifact-hash == approved hash as explicit binary criteria before anything else green can count.
- Recompute live error-rate, p95, 5xx, and guardrail-incident count over the named window and compare each against its NAMED auto-abort threshold — any breach forces 'rolled-back'.
- Perform the LIVE Honesty-Gate Coverage check: drive EVERY action path in the architecture Coverage Inventory in production and confirm each wrote a ledger receipt, explicitly probing the confessed P0 bypass surfaces (in-app hooks, internal/agent-to-agent paths), treating any uncovered path as a binary fail.
- Run the adversarial red-team pass to try to forge a green verdict (stale SHA, replayed metrics, staging-not-prod, bypass path) and require every attack to be detected and to force a fail.
- Iterate the GapLog to a binary, receipt-backed ConvergenceVerdict: 'go-live succeeded' only when open material gaps == 0 AND every non-author panel reviewer is SOLID; otherwise 'rolled-back' and invoke the tested rollback.
- Emit the verdict as an immutable ledger receipt that carries the re-derived evidence and explicitly marks the review_release_readiness_iterate GO gate CLOSED.
- Surface business-context grey zones as non-blocking human input and proceed to the binary verdict regardless — never wait on human sign-off.
Don't (8)
- Don't let anyone who ran the deploy author or sign the verification — a deployer verifying their own deploy is not independent verification.
- Don't read the deployer's post-deploy report, dashboard, or summary as evidence — if the panel didn't re-derive it, it doesn't count.
- Don't emit 'go-live succeeded' while ANY named threshold is breached, the prod SHA/hash differs from approved, or any Honesty-Gate Inventory path fires without a production receipt — a single binary failure forces 'rolled-back'.
- Don't treat the API-endpoint gate coverage as sufficient — the in-app and internal/agent-to-agent paths are the confessed P0 bypass and MUST be live-probed; skipping them is the failure mode this brick exists to catch.
- Don't verify against staging, a cached build, or a prior observation window — the check is against live production and the exact shipped SHA only.
- Don't insert a human approval gate or block the verdict awaiting sign-off — human input here is advisory grey-zone context only.
- Don't soften, average, or narrate a marginal result into a pass — the verdict is strictly binary and receipt-backed.
- Don't declare the GO gate closed by assertion — it is closed only by the emitted ledger receipt carrying the re-derived evidence.
Guardrails (7)
- Independence is structural: the issuing panel never includes anyone who ran the deploy; Verdict leads as the standing evaluator that never reviews what it authored.
- Honesty-by-re-derivation: the verdict is computed from the panel's own re-run smoke/health, recomputed raw-telemetry metrics, and re-hashed prod artifact — a false 'go-live succeeded' is structurally unsayable because the brick re-derives rather than reads.
- The verdict is strictly binary and receipt-backed: 'go-live succeeded' is emittable IFF re-derived smoke/health PASS AND prod SHA == approved SHA AND every recomputed metric <= its named auto-abort threshold AND prod artifact-hash == approved hash AND live Honesty-Gate coverage == full Inventory with zero bypassing paths AND all red-team attacks detected AND open material gaps == 0 AND every non-author panel reviewer is SOLID; otherwise 'rolled-back'.
- The LIVE Honesty-Gate Coverage check is mandatory and treats the confessed P0 (gate wired at one endpoint; in-app/internal paths bypass) as a binary live criterion — every Inventory action path must be proven receipt-covered in production, not asserted from the design.
- This brick CLOSES the review_release_readiness_iterate GO gate and only via the emitted ledger receipt; no calendar, no assertion, no deployer summary closes it.
- No human approval gate: this is a Verdict-led review_loop that iterates independently to a solid binary verdict; human input is non-blocking and confined to business-context grey zones.
- On a 'rolled-back' verdict the brick invokes the tested rollback path rather than redeploying or patching in place; data-never-to-model and least-privilege hold on every telemetry/ledger read the panel performs from its clean vantage.
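The binary verdict rule in the guardrails above can be written as one conjunction over re-derived evidence. The sketch below is illustrative only: the `GoLiveEvidence` field names and the sample metric values are assumptions, not part of this specification.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GoLiveEvidence:
    """Values the panel re-derived itself (hypothetical field names)."""
    smoke_health_pass: bool
    prod_sha: str
    approved_sha: str
    prod_artifact_hash: str
    approved_artifact_hash: str
    metrics: dict            # metric name -> value recomputed from raw telemetry
    thresholds: dict         # metric name -> named auto-abort threshold
    uncovered_paths: tuple   # Inventory paths with no production receipt
    undetected_attacks: int  # red-team attacks that were NOT detected
    open_material_gaps: int
    reviewers_solid: tuple   # one bool per non-author panel reviewer


def verdict(e: GoLiveEvidence) -> str:
    checks = (
        e.smoke_health_pass,
        e.prod_sha == e.approved_sha,
        e.prod_artifact_hash == e.approved_artifact_hash,
        # A named threshold with no recomputed metric counts as a failure.
        all(k in e.metrics and e.metrics[k] <= t for k, t in e.thresholds.items()),
        len(e.uncovered_paths) == 0,
        e.undetected_attacks == 0,
        e.open_material_gaps == 0,
        len(e.reviewers_solid) > 0 and all(e.reviewers_solid),
    )
    return "go-live succeeded" if all(checks) else "rolled-back"


good = GoLiveEvidence(
    True, "abc123", "abc123", "h1", "h1",
    {"error_rate": 0.002, "p95_ms": 180.0},
    {"error_rate": 0.01, "p95_ms": 200.0},
    (), 0, 0, (True, True),
)
print(verdict(good))                              # go-live succeeded
print(verdict(replace(good, prod_sha="def456")))  # rolled-back
```

Because every input is a re-derived value rather than a reported one, there is no argument through which a deployer summary could enter the decision.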
Hypercare & Stabilization
For a defined, bounded post-launch window, the AI delivery team (Vector lead, on-call) runs heightened, receipt-backed operations to prove the live system is genuinely stable before handover — not merely asserted stable. Vector confirms a binary window-START readiness gate (monitoring timer active, alert routes test-fired to on-call, dashboards/logs queryable, synthetic/health checks green, AND — because the delivered product is itself agentic — the operate-phase eval/LLMOps harness wired: golden eval-set version pinned, a baseline eval run green, drift/output-quality monitors live — each with a receipt) before declaring the window open, then operates against a PRE-FROZEN SLO/SLI + error-budget contract (from the architecture/infra brick), a PRE-FROZEN success-metric definition (Vela's query + data source), AND a PRE-FROZEN agentic quality/honesty golden eval-set + pass thresholds — none of which may be redefined mid-window to manufacture a pass. Every incident is classified on a Sev-1..Sev-4 taxonomy with MTTA/MTTR targets, escalation, and a mandatory blameless postmortem (action items tracked to closure) for every Sev-1/Sev-2; an eval regression or honesty-golden-set failure or material output-quality/model drift is a first-class Sev-classified incident, not a footnote. Because the system being operated is agentic, this brick owns ongoing LLMOps as a first-class operate-phase concern: continuous evaluation of the live model/prompt configuration against a maintained eval set, regression detection on the agentic quality/honesty golden-set, drift and output-quality monitoring of model outputs, and eval-set maintenance — with EVERY such claim receipt-backed (eval run id, eval-set version/hash, pass count, regression count) and any regression raised through the SAME gated hotfix pipeline. 
This goes BEYOND the per-sprint honesty-regression seeds (which are point-in-time, build-gate checks) and beyond FinOps cost-trend tracking: it is sustained operate-phase model-quality assurance on production traffic. Hotfixes flow ONLY through the SAME gated pipeline (Mason build / Lens review / Proof test / Cipher AppSec / Vector deploy) and each ships a complete RECEIPT BUNDLE (reproduced failing test/eval → green, Lens approval id, Cipher scan, post-deploy health 200, post-fix eval re-run where AI behavior changed, ledger row, git hash); a rehearsed rollback/kill-switch path with a drill receipt governs roll-back-vs-forward-fix. Vela tracks adoption and the success metric vs target with a reproducible query receipt (query text + row count + timestamp). The brick produces period health reports where EVERY green/met claim carries a linked receipt and every miss carries a receipt + remediation owner + ETA, with honest commentary that explicitly names known platform gaps (e.g. Truth Gate wired at one endpoint, CI honesty evals non-blocking) where they bound what the reports can truthfully claim. This is a kind=work brick: it PRODUCES the receipt-linked evidence that the immediately-following independent Verdict-led "Hypercare Exit" review_loop and the final Handover brick consume — Vector does not self-certify exit.
Questions the agent asks (8)
- Sponsor: what is the single success metric for this system and its target value/band, and over how many consecutive days (Z) must it hold to count as stabilized?
- Sponsor: how long is the hypercare window expected to run, and what business events (e.g. a usage peak, a billing cycle) must fall inside it before we exit?
- Sponsor: what is the maximum acceptable user-facing downtime / Sev-1 blast radius during hypercare, and who on your side should be paged for a business-impacting Sev-1?
- Sponsor: are there adoption-friction points, AI-output quality concerns, or edge cases you are already worried about that we should instrument and watch (and seed into the eval-set) from day one?
- Sponsor: is the success-metric definition we froze in the spec still correct now that real users are on the system, or does it need correction (captured as an input, not an approval)?
- Internal (Proof/Keystone): is the agentic quality/honesty golden eval-set frozen with a version/hash, and are the pass/regression thresholds defined, BEFORE the window opens — or is that a gap to close before we declare the window open?
- Internal (Keystone/Vector): are the SLO/SLI targets and error budget actually frozen with a git hash in the architecture/infra artifact, or do we have a gap to close before the window opens?
- Internal (Proof/Cipher): what production model/prompt config (model id + prompt hash) is under eval in this window, and is the drift/output-quality monitor sampling real production traffic rather than a synthetic-only stream?
Do (13)
- Confirm and receipt the window-START readiness gate (monitoring active, alerts test-fired to on-call, dashboards/logs/health green, AND the operate-phase eval harness wired with a green baseline eval run + live drift monitor) BEFORE recording the window as open.
- Consume the PRE-FROZEN SLO/error-budget contract, PRE-FROZEN success-metric query, AND PRE-FROZEN agentic quality/honesty golden eval-set + pass thresholds by path + version/hash; report against them as given.
- Run ongoing operate-phase evals of the live model/prompt config against the maintained eval-set; record each run with an eval run id, eval-set version/hash, pass count, and regression count.
- Detect regressions by comparing every eval run to the frozen baseline; raise any quality/honesty golden-set regression or material model/output drift as a Sev-classified incident through the SAME gated hotfix pipeline.
- Maintain the eval-set additively: version and hash-stamp every change with rationale; add cases for newly-discovered edge cases and sponsor-surfaced quality complaints; never delete a case to mask a failure.
- Route every hotfix through the full Mason/Lens/Proof/Cipher/Vector gate; attach the complete per-fix receipt bundle, and where the fix changes AI behavior attach a post-fix eval re-run receipt proving zero golden-set regression.
- Classify every incident on the Sev taxonomy; write a blameless postmortem with closure-tracked action items for every Sev-1/Sev-2, eval-regression and drift incidents included.
- Make every adoption/success-metric number reproducible (query text + row count + timestamp) and every eval/drift number reproducible (eval run id + eval-set version + counts).
- Structure each period report so every green/met claim links to a receipt and every miss carries a receipt + owner + ETA, including the agentic eval/drift health section.
- State honestly where known platform gaps (e.g. Truth Gate single-endpoint, CI honesty evals non-blocking) limit what a report can truthfully claim.
- Capture sponsor-surfaced issues as continuous inputs; where the sponsor is silent past T, proceed on an explicit, flagged assumption recorded in the report.
- Rehearse rollback at least once and keep the kill-switch/trading-block posture verifiably intact during the window.
- Hand the exit-gate dossier to the independent Verdict-led Hypercare Exit review_loop and let it validate — do not declare exit here.
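The eval duties above (hash-stamped eval-set, comparison against the frozen baseline, a receipt per run) can be sketched as follows. The case format, the receipt keys, and the choice of SHA-256 over canonical JSON are assumptions made for illustration; the document does not prescribe them.

```python
import hashlib
import json


def eval_set_hash(cases: list[dict]) -> str:
    """Stable SHA-256 over the eval-set content, independent of case order."""
    canonical = json.dumps(sorted(cases, key=lambda c: c["id"]), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def eval_receipt(run_id: str, cases: list[dict], baseline: dict, results: dict) -> dict:
    """Build a run receipt; baseline and results map case id -> passed (bool)."""
    # A case that passed in the frozen baseline and is now failing OR missing
    # counts as a regression, so deleting a case cannot mask a failure.
    regressions = sorted(
        cid for cid, ok in baseline.items() if ok and not results.get(cid, False)
    )
    return {
        "eval_run_id": run_id,
        "eval_set_hash": eval_set_hash(cases),
        "pass_count": sum(1 for ok in results.values() if ok),
        "regression_count": len(regressions),
        "regressed_cases": regressions,
    }
```

Any receipt with a non-zero `regression_count` would then be raised as a Sev-classified incident through the gated pipeline rather than fixed in place.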
Don't (12)
- Don't self-certify hypercare exit or declare 'stable, proceed to handover' — that decision belongs to the independent Verdict-led review_loop.
- Don't ship any fix outside the gated pipeline, even for an emergency or an eval regression; an emergency is not a license to bypass review, test, AppSec, or the post-fix eval re-run.
- Don't redefine SLO targets, severity thresholds, the success metric, the golden eval-set, or its pass thresholds mid-window to manufacture a pass.
- Don't delete, weaken, or silently re-version eval-set cases to make a failing run pass; eval-set changes are additive, versioned, hash-stamped, and rationale-logged.
- Don't treat eval regressions or model/output drift as cosmetic; they are first-class Sev-classified incidents with postmortems like any other.
- Don't conflate operate-phase LLMOps with the per-sprint honesty-regression seeds or FinOps cost-trend tracking — this is sustained model-quality assurance on production traffic, beyond those point-in-time checks.
- Don't let new features or scope enter the emergency pipeline; stabilization and eval-regression fixes only.
- Don't report any 'uptime met', 'SLO green', 'adoption at X', or 'eval pass rate Y' claim without its linked receipt (health log, monitor record, query + row count + timestamp, or eval run id + eval-set version).
- Don't write spin: do not soften or omit misses, under-report incident severity, hide an open Sev-1, or bury an eval regression.
- Don't treat the sponsor input log as an approval gate or block work waiting on a sponsor reply.
- Don't run hotfixes without a rehearsed rollback path and a clear roll-back-vs-forward-fix rule.
- Don't exit on an arbitrary date; exit only when the binary gates (including the agentic eval/drift gate) are met and independently verified.
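The reporting rules above (no green claim without a receipt, no miss without receipt, owner, and ETA) lend themselves to a mechanical lint over each period report. This is a sketch under an assumed claim format; the keys `claim`, `status`, `receipt`, `owner`, and `eta` are hypothetical.

```python
def lint_report(claims: list[dict]) -> list[str]:
    """Return one finding per claim the report may not truthfully make."""
    findings = []
    for c in claims:
        if not c.get("receipt"):
            findings.append(f"{c['claim']}: no linked receipt")
        if c["status"] == "miss":
            # A miss additionally needs a remediation owner and an ETA.
            for required in ("owner", "eta"):
                if not c.get(required):
                    findings.append(f"{c['claim']}: miss without {required}")
    return findings
```

An empty result is the precondition for publishing the report; any finding marks a claim that must be backed or withdrawn.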
Guardrails (12)
- Honesty architecture: any 'met/green/done/live' claim lacking a linked receipt is unsayable and auto-rejected by the downstream review — same logic as an agent claiming an unperformed action; eval/drift claims must carry an eval run id + eval-set version; commentary must surface known coverage gaps rather than overstate.
- Agentic-system operate-phase rule: because the delivered product is itself agentic, ongoing model/prompt evaluation against a maintained golden-set, regression detection, and drift/output-quality monitoring are FIRST-CLASS operate-phase duties — not optional and not satisfied by build-time honesty seeds alone.
- Frozen-eval-set rule: the agentic quality/honesty golden eval-set and its pass/regression thresholds must pre-exist (cited by version/hash) before the window opens; they may not be authored or weakened after seeing production results, and eval-set edits are additive-only and hash-stamped.
- Separation of duties: Vector operates and produces evidence but does NOT certify exit; the operator is not the reviewer. Exit is validated by an independent panel (Verdict + Cipher + Proof + Vela) that did not run on-call.
- Same-gate rule: 100% of hotfixes — including eval-regression fixes — go through Mason/Lens/Proof/Cipher/Vector with a full receipt bundle (plus a post-fix eval re-run when AI behavior changes); the deploy log is cross-checked against bundles to prove zero bypass.
- No mid-window goalpost moves: no SLO, severity, success-metric, eval-set, or eval-pass-threshold target may be changed during the window to engineer a pass; any change made for another reason requires the independent panel's logged sign-off.
- Eval-regression-as-incident: a quality/honesty golden-set regression or material model/output drift must be raised as a Sev-classified incident and resolved through the gated pipeline, with a postmortem if Sev-1/Sev-2 — it is never silently tolerated.
- Stabilization-only: new features/scope return to normal sprint planning or a new process run — the emergency pipeline is for stabilization and eval-regression fixes only.
- Frozen-contract rule: SLO/error-budget and success-metric definitions must pre-exist (cited by git hash) before the window opens; Vector/Vela may not author the targets after seeing the data.
- Readiness-before-open: 'heightened monitoring' AND the operate-phase eval harness (pinned golden-set, green baseline run, live drift monitor) must be proven by the window-START readiness receipts; an unproven monitoring or eval claim blocks declaring the window open.
- Human-as-input-not-gate: the sponsor provides information and corrections continuously and is never placed in an approval gate; silence past T yields an explicit flagged assumption, not a stall.
- Kill-switch/trading hard-block posture must remain intact and verifiable for the entire window.
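The same-gate rule above calls for cross-checking the deploy log against receipt bundles to prove zero bypass. A minimal sketch, assuming bundles are keyed by git hash and that a `changes_ai_behavior` flag marks fixes needing a post-fix eval re-run (both are illustrative conventions, not defined here):

```python
REQUIRED_RECEIPTS = (
    "failing_test_reproduced", "lens_approval_id", "cipher_scan",
    "post_deploy_health", "ledger_row", "git_hash",
)


def bypassed_deploys(deploy_log: list[str], bundles: dict) -> list[str]:
    """Git hashes deployed in the window that lack a complete receipt bundle."""
    bad = []
    for git_hash in deploy_log:
        bundle = bundles.get(git_hash, {})  # no bundle at all means every receipt is missing
        required = list(REQUIRED_RECEIPTS)
        if bundle.get("changes_ai_behavior"):
            required.append("post_fix_eval_rerun")
        if any(not bundle.get(key) for key in required):
            bad.append(git_hash)
    return bad
```

The check starts from the deploy log, not from the bundles, so a deploy that never produced a bundle is still caught.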
Independent Review & Iterate: Hypercare Exit & BAU-Handover Readiness
This brick closes the hypercare phase, and it retires the two human gates that traditionally end a program: the self-declared "hypercare exit" and the hidden "handover acceptance" sign-off. It replaces both with a single AI-owned independent-review-and-iterate loop that converts "exit/handover" from a ceremony into an evidence-converged verdict. The loop is run by an independent panel, led by Verdict (independent evaluator) and joined by relevant specialists who did NOT author the stability story or the handover docs (Cipher for security evidence, Proof for test/quality evidence, Vector/Proof for runbook operability, Keystone for architecture completeness), plus an adversarial red-team pass. Verdict does not read the hypercare team's dashboard or summaries; it independently RE-DERIVES every exit metric from the source systems (observability/SLO platform, incident tracker, value-realization/usage data, cost telemetry, security-scan and CI receipts) and FAILS any claim it cannot reproduce from source. Convergence is a strict four-part conjunction: (A) zero open material gaps under a binary materiality rubric, (B) every hypercare exit criterion source-reproduced green and sustained over a defined window N (not a single quiet day), (C) every BAU-handover artifact proven OPERABLE by an actual dry-run / game-day / restore drill (not asserted), and (D) reviewer independence attested. The human (sponsor) is strictly an input channel, proactively surfaced for the things only they can know (is real business value being realized vs. the captured baseline, are there real-world consequences the metrics missed, is the receiving BAU team real and named). That channel is non-blocking, with an explicit flagged assumption recorded on silence; the human never approves.
The loop iterates author→panel critique→gaps logged→team fixes→re-review and emits a single ledger-receipt verdict; it never auto-flips to "converged" while a material gap is open, and a persistent unfixable material gap is surfaced to the human as flagged non-blocking input AND recorded as an explicit accepted-risk in the verdict rather than silently reclassified.
Questions the agent asks (5)
- What is the sustained-stability window N (e.g., 7/14/30 days) over which exit criteria must hold, and what is the pre-agreed issue-inflow threshold below which inflow must stay across that window?
- Where are the authoritative SOURCE systems Verdict must re-derive each metric from — which SLO/observability platform, which incident tracker, which value-realization/usage store, which cost telemetry, and which CI/security-scan receipt store?
- Was a pre-go-live BASELINE captured for each success metric, and where is it stored — so every 'success metric met' claim can be anchored to the baseline it improved on?
- Is real business value being realized against that baseline, are there real-world consequences the dashboards may have missed, and is the receiving BAU team real, named, and ready to operate? (surfaced to the sponsor as input, non-blocking)
- What are the agreed RTO/RPO for the restore drill and the cost envelope for the run-cost criterion, and what event reopens the war room (the hypercare RE-ENTRY trigger) if BAU destabilizes post-handover?
Do (7)
- Re-derive EVERY exit metric independently from the source system (observability, incident tracker, value/usage data, cost telemetry, CI/security receipts) and FAIL any claim Verdict cannot reproduce from source — run this reproduction pass BEFORE classifying any gap's materiality
- Hold exit criteria over the full sustained window N, not a single quiet day, and treat zero open Sev-1/Sev-2, SLO burn within error budget, stable adoption, and in-envelope cost as binary gates
- Prove handover OPERABILITY by execution — run each runbook as a dry-run/game-day by an operator who did not write it, execute a restore drill within RTO/RPO, and execute a rollback — never accept a never-run document as a capability
- Run the adversarial premature-exit pass every iteration: check for incident suppression/misclassification, freeze-masquerading-as-stability, and subtly-wrong (non-erroring) outputs consuming error budget
- Attest reviewer independence explicitly (reviewer != author of the hypercare story and != author of the handover docs); exclude the war-room operators from certifying their own exit
- Proactively surface to the human the things only they can know (real value vs. baseline, real-world misses, the named receiving BAU team) as non-blocking input, and record verbatim either their answer or the flagged assumption taken on silence
- Make the verdict itself a ledger receipt: every criterion, source, reproduced value, operability result, gap lifecycle with timestamps, iteration count, independence attestation, and accepted-risk register — claim only what the receipt proves
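The sustained-window requirement above differs from a point-in-time check in that every day of window N must be green, and a day with no reproduced value must count against exit. A sketch, assuming a hypothetical per-day map of source-reproduced gate results:

```python
from datetime import date, timedelta


def sustained_green(daily: dict, end: date, n_days: int) -> bool:
    """True only if each of the last n_days (ending at `end`) is green.

    `daily` maps a date to whether all exit gates were source-reproduced
    green that day; a day with no entry counts as not green.
    """
    window = [end - timedelta(days=i) for i in range(n_days)]
    return n_days > 0 and all(daily.get(day, False) for day in window)
```

With this shape, a single quiet day cannot satisfy a 7-day window, and a gap in the reproduced data fails the check instead of being skipped.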
Don't (7)
- Do NOT accept any receipt, SLO number, incident count, or success metric that Verdict cannot independently reproduce from the source system — a non-reproducible receipt is a bluff and a material gap, never a pass
- Do NOT reintroduce a human approval/sign-off anywhere: the human is input only, and 'exit' and 'handover acceptance' are evidence-convergence verdicts, not a reviewer's or a human's feeling of confidence
- Do NOT silently reclassify an open material gap into 'known limitations' or 'next-release backlog' to force convergence; a gap may move there only if it does not block exit, is surfaced on the human input channel, and is recorded in the verdict
- Do NOT mark a runbook, restore, or rollback OPERABLE because it exists or 'should work' — only an executed, passing dry-run/game-day/restore/rollback counts
- Do NOT exit on a lucky quiet day, a change/deployment freeze, or by downgrading/misclassifying incidents to keep counts low — the sustained window and the red-team pass exist to catch exactly this
- Do NOT let the war-room operators (who authored the stability story) certify their own exit, and do NOT let Verdict claim independence it cannot attest
- Do NOT auto-flip the loop to 'converged' while any material gap, unreproduced metric, failed operability test, open red-team finding, or unattested independence remains
Guardrails (9)
- Convergence is enforced by the verdict receipt as the strict four-part conjunction plus a clean red-team pass: open material gaps == 0 AND every exit criterion source-reproduced green over window N AND every handover artifact OPERABLE by executed test AND independence attested AND no open red-team finding; anything less stays OPEN or escalates as accepted-risk
- Binary materiality rubric (non-negotiable): any open Sev-1/Sev-2, any unmet/unreproduced SLO, any runbook/restore/rollback that fails or was never executed, any missing or unreproduced security/test evidence, and any unanchored (no-baseline) success claim are MATERIAL by definition and cannot be argued down
- Receipt provenance: the reviewer recomputes from raw source systems; a number read only from the team's dashboard/summary does not satisfy the criterion and is a material gap
- Independence is a precondition, not a formality: reviewer must be a non-author of the artifact being certified; if the only available reviewer is the author, that is itself a material gap
- No-silent-reclassification: a gap may move to known-limitations/backlog ONLY if it is not an exit blocker AND is explicitly surfaced to the human input channel AND recorded in the verdict — never silently
- Human stays strictly input/non-blocking with a flagged-assumption fallback on silence; the human never approves, signs off, or gates this brick
- Non-convergence semantics: a persistent unfixable material gap is surfaced to the human as flagged non-blocking input and recorded as an explicit accepted-risk in the verdict; the loop never auto-converges on an open material gap
- Handover scope must include a proven rollback and a defined hypercare RE-ENTRY trigger so BAU can both run and safely revert the system — exit is not treated as one-directional
- Honesty architecture: the verdict may claim only what a reproducible receipt proves; every 'met/operable/exited' assertion is backed by a source-reproduced receipt, an executed-test result, or a reviewer verdict — never asserted
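The binary materiality rubric and the convergence rule in the guardrails above can be combined in a few lines. In this sketch the gap `kind` labels and the `argued_severity` field are hypothetical; the point illustrated is that materiality is decided by kind alone and cannot be argued down.

```python
MATERIAL_KINDS = frozenset({
    "open_sev1_or_sev2",
    "slo_unmet_or_unreproduced",
    "operability_failed_or_never_executed",
    "security_or_test_evidence_missing",
    "success_claim_without_baseline",
})


def is_material(gap: dict) -> bool:
    # Materiality follows from the gap's kind alone; a reviewer note such as
    # gap["argued_severity"] = "minor" is deliberately ignored.
    return gap["kind"] in MATERIAL_KINDS


def exit_verdict(gaps, criteria_green_over_n, artifacts_operable,
                 independence_attested, open_red_team_findings):
    open_material = [g["id"] for g in gaps if g["open"] and is_material(g)]
    converged = (
        not open_material
        and criteria_green_over_n
        and artifacts_operable
        and independence_attested
        and open_red_team_findings == 0
    )
    return {"status": "converged" if converged else "open",
            "open_material_gaps": open_material}
```

Recording an accepted-risk for a persistent unfixable gap is left out of the sketch; it would add to the verdict record without changing the `converged` test.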
Handover, Documentation & Continuous Improvement
Close the engagement without anyone "declaring victory": execute the closure as a work brick — assemble the complete handover package (architecture, executable runbooks, ops docs, test + security evidence, a structured known-limitations register, a prioritized next-release backlog, and a HANDOVER RECEIPT MANIFEST that maps every closure claim to a real receipt id), run a full end-to-end delivery retrospective that emits concrete template diffs (not a dead lessons-learned doc), transition the system from hypercare to a genuinely-owned BAU state, and re-open the continuous-discovery loop for the next release. Every "done / passed / secure / live" claim must be backed by an inspectable receipt (test run id + counts, security scan id + result, health-check timestamps + codes, git hash, runbook-execution transcript, prior independent-review verdict ids); an unbacked claim blocks closure exactly like the Truth Gate blocks an unbacked agent action. The human (sponsor) is an INFORMED RECIPIENT and a CONTINUOUS INPUT CHANNEL — they receive a walkthrough of the LIVE system (per the Discovery Step-23 precedent: walk the running platform, never a deck), and their questions / new asks flow into the next-release backlog as input; they do NOT sit in an approval gate and there is NO sign-off. Where the human is silent the team closes on an explicit FLAGGED assumption logged in the manifest rather than waiting. This work brick is immediately followed by the independent review_loop brick `handover_acceptance_review` (Verdict + Proof + Cipher + Vector, none of whom authored the package), so the final and most consequential claim — "the engagement is done" — is itself independently reviewed against the 7 objectives and the receipt manifest, never self-asserted.
Questions the agent asks (5)
- Who are the named post-engagement owners (on-call, escalation, cost/quota) for each running service, and is there any handover-of-ownership constraint we should encode in the BAU ownership map?
- Are there operational constraints (maintenance windows, change-freeze periods, compliance/audit retention) the runbooks and BAU transition must respect?
- What outcomes or open questions from this release matter most to you for the next release — so we seed the next-release backlog with your priorities, not just ours?
- Are there any limitations or risks you consider unacceptable to ship as 'known limitations' versus must-fix before BAU? (Input only — used to re-triage the register, not as a gate.)
- Is there a preferred format/cadence for ongoing visibility into the running system (dashboards, periodic summary) you want wired before we close hypercare?
Do (8)
- Back every closure claim with an inspectable receipt id and put it in the receipt manifest before declaring anything done; treat 'it works' as a non-receipt that blocks closure.
- Execute every runbook end-to-end in a clean context against the live/staging environment and capture commands + exit codes as the receipt — observed receipt beats reported claim.
- Hand over by walking the human through the LIVE running system (per Discovery Step-23), with the limitations register and next-release backlog open, and capture their questions as backlog input.
- Convert every material retro lesson into a committed template diff, and seed any incident from this run as a permanent honesty-eval/golden case (fails-before / passes-after).
- Structure the known-limitations register with severity + workaround + owner + target release per item, cross-linked to the backlog, and audit it for relabeled defects.
- State and evidence the binary hypercare-exit criteria; only flip to BAU when the BAU-readiness checklist is all-green with receipts.
- Where the human is silent, proceed and close on an explicit FLAGGED assumption logged in the manifest rather than waiting.
- Hand the assembled package straight to the independent review_loop (handover_acceptance_review) — do not treat this work brick's own assertion as closure.
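The runbook rule above asks for commands and exit codes captured as the receipt. One way to sketch that, assuming each runbook step is an argv list (a hypothetical format) and that stopping at the first failing step is acceptable:

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_runbook(steps: list[list[str]]) -> dict:
    """Execute each step and return a transcript usable as a receipt."""
    transcript = []
    for argv in steps:
        proc = subprocess.run(argv, capture_output=True, text=True)
        transcript.append({
            "argv": argv,
            "exit_code": proc.returncode,
            "stdout": proc.stdout.strip(),
        })
        if proc.returncode != 0:
            break  # stop at the first failing step; later steps stay unexecuted
    return {
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "steps": transcript,
        # Operable only if the runbook is non-empty and every step ran and exited 0.
        "operable": bool(steps)
        and len(transcript) == len(steps)
        and all(s["exit_code"] == 0 for s in transcript),
    }


# Example with a trivial stand-in step (the current Python interpreter).
receipt = run_runbook([[sys.executable, "-c", "print('health ok')"]])
```

The transcript records what was observed when the steps ran, which is the distinction the list above draws between an observed receipt and a reported claim.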
Don't (8)
- Do NOT insert any human sign-off, approval gate, or 'client approves handover' step — the human is informed input and recipient, never a converger or blocker.
- Do NOT relabel a real defect or silent failure as a 'known limitation' to close out; mislabeled failures must move to backlog defects with a fail-receipt.
- Do NOT ship a runbook nobody executed, or claim a runbook 'works' without an execution transcript receipt.
- Do NOT declare 'done / passed / secure / live' without a linked, resolvable receipt id; no unbacked claim survives closure.
- Do NOT deliver the handover as a PowerPoint/deck-as-deliverable; the deck or DOCX/PDF is at most a by-product of the live system.
- Do NOT let the retrospective end as a dead lessons-learned doc with no committed template change.
- Do NOT close the engagement on the delivery team's self-assertion; closure is only valid after the independent acceptance review returns closure-solid.
- Do NOT block or fake closure on the absence of a human reply: convergence rests on the independent panel's evidence, and human silence becomes a flagged assumption.
Guardrails (9)
- Truth-Gate parity: any closure claim without a linked receipt id is blocked from the manifest exactly as an unbacked agent action is blocked at the platform Truth Gate.
- Closure done-definition (single documented verdict): package manifest complete AND receipt manifest fully backed AND handover_acceptance_review = closure-solid AND BAU checklist all-green AND retro template-diffs committed AND next-release discovery loop opened — all six, with cited receipt/verdict ids.
- No-self-closure: this work brick is always immediately followed by the independent review_loop brick handover_acceptance_review (Verdict lead; panel Proof + Cipher + Vector; none authored the package); the engagement cannot be marked closed until that loop converges.
- Anti-rubber-stamp: a reviewer may mark an item 'verified' only by citing the specific receipt id inspected; a verdict with no cited receipt is itself unbacked and rejected.
- Re-execute high-risk receipts: the independent panel re-runs the test suite, re-triggers health checks, and re-executes at least one runbook rather than trusting transcripts — observed beats reported.
- Red-team pass is mandatory: actively hunt for one closure claim with no real receipt and one 'known limitation' that is actually a relabeled silent failure; any finding reopens the work brick.
- Human-as-input invariant: human questions become next-release backlog items; human silence becomes a flagged closure assumption; the human is never the converger and never a gate.
- Hypercare boundary is binary: BAU begins only when stated hypercare-exit criteria are each met with an evidence receipt; no transition on narrative alone.
- Permanent-lesson invariant: every incident from this run must become a committed golden/honesty-eval case (fails-before / passes-after) so the same defect cannot silently recur in the next run of this template.
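The six-part closure done-definition above can be checked mechanically, with each condition required to be both met and cited. The condition keys and the evidence format below are assumptions chosen for illustration.

```python
DONE_CONDITIONS = (
    "package_manifest_complete",
    "receipt_manifest_fully_backed",
    "handover_acceptance_review_closure_solid",
    "bau_checklist_all_green",
    "retro_template_diffs_committed",
    "next_release_discovery_loop_opened",
)


def closure_blockers(evidence: dict) -> list[str]:
    """evidence maps a condition name to {"met": bool, "receipt_id": str}."""
    blockers = []
    for name in DONE_CONDITIONS:
        item = evidence.get(name, {})
        if not item.get("met"):
            blockers.append(f"{name}: not met")
        elif not item.get("receipt_id"):
            # Mirrors Truth-Gate parity: a met claim with no cited id is unbacked.
            blockers.append(f"{name}: met but no cited receipt/verdict id")
    return blockers
```

Closure would be recordable only when the returned list is empty.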
Final INDEPENDENT handover-acceptance review AFTER handover: the lifecycle's most consequential claim — "the engagement is done" — is independently reviewed against the 7 objectives and the receipt manifest, never self-asserted by Cadence.
This is the lifecycle's terminal gate: the single most consequential claim — "the engagement is done" — must be independently proven, never self-asserted by the team that did the work. A non-author panel led by Verdict (who authored none of the handover package) re-derives every row of the HANDOVER RECEIPT MANIFEST from primary source: test-run ids and pass/fail counts re-read from the runner, security-scan ids and results re-pulled from the scanner, health-check codes and timestamps re-hit live, git hashes re-resolved, runbook-execution transcripts re-read, and the ids of every prior independent-review verdict (spec, architecture, each sprint, release-readiness, hypercare-exit) confirmed SOLID and unsuperseded. Each BAU runbook is proven OPERABLE by an actual dry-run/restore drill — executed, not asserted. The known-limitations register is checked for honesty and completeness against the GapLog and the live system. The receiving BAU team is confirmed real and named (human-input, non-blocking; silence becomes a flagged assumption, never a block). The brick converges to a binary, ledger-backed closure-solid | not-solid ConvergenceVerdict — solid iff open material gaps == 0 AND every non-author reviewer is SOLID. This is the brick whose SOLID verdict handover_continuous_improvement gates its own closure on. There is no human sign-off.
Questions the agent asks (7)
- Does EVERY row of the handover receipt manifest re-derive from primary source to the same value Cadence claimed — and which row, if any, fails to reproduce?
- Has every BAU runbook (deploy, rollback, restore, escalation, cost-breaker) been proven by an ACTUAL executed dry-run/restore drill that reached its documented success state — or is any merely asserted operable?
- Is the known-limitations register honest and complete against the full GapLog, the prior verdicts' deferred items, and the live system — any missing, misrepresented, or stale entry?
- Is every prior independent-review verdict (spec, architecture, each sprint, release-readiness, hypercare-exit) confirmed to exist, read SOLID, and remain unsuperseded by later changes?
- Did the adversarial red-team pass produce any unrefuted way to break the 'done' claim — circular evidence, stale receipts, happy-path-only drills, or invalidated prior verdicts?
- Is the receiving BAU team real and named — and if the human is silent, is a flagged assumption recorded so closure proceeds non-blocking?
- Are open material gaps exactly zero AND is every non-author reviewer SOLID — the two conditions the binary closure verdict turns on?
Do (8)
- Place this brick LAST in the lifecycle, immediately AFTER handover_continuous_improvement, and make its SOLID closure verdict the precondition on which handover_continuous_improvement gates its own closure
- Staff the panel exclusively with reviewers who authored NONE of the handover package — Verdict leads as standing independent evaluator, with Proof, Cipher, and Vector as non-author specialists plus an adversarial red-team pass
- RE-DERIVE every manifest row from primary source (test runner, scanner, live health endpoints, git, runbook transcripts, prior-verdict ledger) — never accept Cadence's summary or any author-supplied roll-up as evidence
- Prove each BAU runbook OPERABLE by executing a real dry-run/restore drill and capturing a timestamped transcript with exit/health codes — demonstrate, do not assert
- Confirm every prior independent-review verdict (spec, architecture, each sprint, release-readiness, hypercare-exit) exists, reads SOLID, and is unsuperseded before counting it toward closure
- Audit the known-limitations register against the full GapLog, prior deferred/accepted items, and live-system behavior for honesty AND completeness — catch both missing and stale entries
- Treat the receiving-BAU-team confirmation as human-input, NON-BLOCKING — on silence, record an explicit flagged assumption in the ledger and proceed
- Iterate the shared GapLog to convergence (file → re-derive/re-drill → re-verify) and emit a single binary, ledger-backed ConvergenceVerdict carrying the receipt id behind every condition
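The re-derivation step in the list above could be sketched as follows. This is an illustration under stated assumptions: the manifest is modelled as a plain mapping of row id to claimed value, each primary source is a hypothetical callable returning `(value, source_id)`, and the names `RederivedRow`, `rederive`, and `failing_rows` are invented for this example. A real panel would wire the callables to the actual test runner, scanner, health endpoints, and git.

```python
from dataclasses import dataclass
from typing import Callable

# A primary-source fetcher returns (re-derived value, id of the source receipt).
SourceFetcher = Callable[[], tuple[str, str]]


@dataclass(frozen=True)
class RederivedRow:
    row_id: str
    claimed: str
    re_derived: str
    source_id: str
    match: bool


def rederive(manifest: dict[str, str], sources: dict[str, SourceFetcher]) -> list[RederivedRow]:
    """Regenerate every manifest row from its primary source.

    A row with no primary source cannot be re-derived, so it is
    recorded as a mismatch rather than accepted on the author's word.
    """
    rows = []
    for row_id, claimed in manifest.items():
        fetch = sources.get(row_id)
        if fetch is None:
            rows.append(RederivedRow(row_id, claimed, "", "", False))
            continue
        value, source_id = fetch()
        rows.append(RederivedRow(row_id, claimed, value, source_id, value == claimed))
    return rows


def failing_rows(rows: list[RederivedRow]) -> list[str]:
    """Row ids that failed to reproduce; each would become a GapLog entry."""
    return [r.row_id for r in rows if not r.match]
```

The design choice worth noting is the default: an un-re-derivable row fails closed, which mirrors the rule that an author-supplied roll-up is never evidence.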
Don't (7)
- Do NOT let Cadence (or anyone who authored the handover package) self-assert closure or sit on the review panel — the most consequential claim cannot be marked done by its own author
- Do NOT accept any manifest row, test count, scan result, health code, or git hash on the author's word — an un-re-derived receipt is not evidence
- Do NOT count a BAU runbook as operable because it is documented or asserted — without an executed drill that reached its success state, it is a material gap
- Do NOT emit closure-solid while any material gap is open or any non-author reviewer is below SOLID — there is no partial or date-based closure
- Do NOT count a prior verdict toward closure if it is missing, not SOLID, or has been superseded by later changes — stale upstream SOLID is not SOLID
- Do NOT block the ConvergenceVerdict on the human BAU-team confirmation — silence becomes a flagged assumption, never a stop
- Do NOT insert any human sign-off / approval gate — closure is an independent, receipt-backed AI verdict, not a human signature
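The non-blocking treatment of the BAU-team confirmation might look like the sketch below. Everything named here is hypothetical (`BauConfirmation`, `resolve_bau_confirmation`, and the wording of the assumption text); the only behaviour taken from the rules above is that silence produces a flagged assumption and never a block.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BauConfirmation:
    team_name: Optional[str]           # set when the human named a receiving team
    flagged_assumption: Optional[str]  # set when the human was silent
    blocks_verdict: bool = False       # never True: this input is non-blocking


def resolve_bau_confirmation(reply: Optional[str]) -> BauConfirmation:
    """Turn the human's reply (or silence) into a ledger-ready record."""
    if reply is not None and reply.strip():
        return BauConfirmation(team_name=reply.strip(), flagged_assumption=None)
    # Silence or an empty reply: record an explicit assumption and proceed.
    return BauConfirmation(
        team_name=None,
        flagged_assumption="ASSUMPTION: receiving BAU team unconfirmed at closure",
    )
```

Either branch returns a record with `blocks_verdict` left at `False`, so the verdict computation can always continue.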
Guardrails (7)
- GATE: closure-solid IFF open material gaps == 0 AND every non-author panel reviewer (Verdict, Proof, Cipher, Vector) is SOLID AND every referenced prior independent-review verdict is SOLID-and-unsuperseded — otherwise not-solid and iterate
- INDEPENDENCE: the panel authored none of the handover package; Verdict never reviews what it wrote; closure cannot be self-asserted by Cadence or any author
- RE-DERIVATION IS LAW: every manifest row is regenerated from primary source with {claimed, re-derived, source-id, match}; the verdict cites the receipt id behind each condition — no author summary is ever evidence
- OPERABILITY IS DRILLED, NOT DECLARED: each BAU runbook carries a real dry-run/restore-drill transcript that reached its documented success state, or it is an open gap
- NON-BLOCKING HUMAN INPUT: the receiving-BAU-team confirmation is human-input and never blocks the verdict; on silence a flagged assumption is recorded and closure proceeds
- NO HUMAN SIGN-OFF: there is no blocking approval gate with owner=Human; closure is a binary, ledger-backed, independently-reviewed AI ConvergenceVerdict
- TERMINAL POSITION: this brick is LAST, runs AFTER handover_continuous_improvement, and its SOLID verdict is the precondition on which handover_continuous_improvement gates its own closure
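The GATE guardrail above reduces to a three-part conjunction, sketched here in Python. The panel names and the `closure-solid` / `not-solid` outcomes come from the guardrails; the record shapes (`PriorVerdict`, `ConvergenceVerdict`), the `reasons` field, and the function name `closure_gate` are assumptions made for illustration. The sketch also omits the receipt ids that the real verdict cites behind each condition.

```python
from dataclasses import dataclass

PANEL = ("Verdict", "Proof", "Cipher", "Vector")  # non-author reviewers


@dataclass(frozen=True)
class PriorVerdict:
    verdict_id: str
    status: str       # "SOLID" or anything else
    superseded: bool  # True if later changes invalidated it


@dataclass(frozen=True)
class ConvergenceVerdict:
    outcome: str              # "closure-solid" or "not-solid"
    reasons: tuple[str, ...]  # empty when closure-solid


def closure_gate(
    open_material_gaps: int,
    reviewer_status: dict[str, str],
    prior_verdicts: list[PriorVerdict],
) -> ConvergenceVerdict:
    """Binary gate: every condition must hold, otherwise not-solid."""
    reasons = []
    if open_material_gaps != 0:
        reasons.append(f"{open_material_gaps} open material gap(s)")
    for name in PANEL:
        # A reviewer with no recorded status counts as not SOLID.
        if reviewer_status.get(name) != "SOLID":
            reasons.append(f"reviewer {name} is not SOLID")
    for v in prior_verdicts:
        if v.status != "SOLID" or v.superseded:
            reasons.append(f"prior verdict {v.verdict_id} is not SOLID-and-unsuperseded")
    outcome = "closure-solid" if not reasons else "not-solid"
    return ConvergenceVerdict(outcome, tuple(reasons))
```

A `not-solid` result carries the reasons that failed, which is what the panel would iterate on before re-running the gate.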
Generated from the live Flowtely process library · self-contained · AI-Led Agile Software Delivery