feat(audit): self-improvement loop in the skill + provenance convention 🟩

Add Phase 6 (self-improve) to audit-ara-portfolio: when a run confirms a
novel systematic problem, codify it as a check — gated on systematic (>=5
props, root-caused), not-already-covered, and /grill-me-pressure-tested.
Each check records provenance (motivating cause + example properties) so the
registry stays sharp and compounds every run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-26 19:57:57 +00:00
parent e3c9107313
commit 2d6b078bd8
2 changed files with 41 additions and 4 deletions


@@ -76,10 +76,40 @@ A triage report. Per check group:
- existing-PR/ADR coverage,
- recommended action: **fix** / **tune threshold** / **accept (expected)** / **new ticket**.
Then propose any **new deterministic checks** worth adding to
`scripts/audit/anomalies.py` (one decorated function each) so the next run catches
the pattern automatically — the check registry is the durable output of every
audit.
Then run Phase 6 to make the audit permanently better.
## Phase 6 — Self-improve (the compounding loop)
When the review confirms a **novel, systematic** problem, codify it so every
future run catches it automatically. This is what makes the audit get better each
time it runs. Apply the gates below — they keep the registry sharp, not noisy.
**Gates (all must hold before adding a check):**
1. **Systematic** — reproduced on **≥ 5** properties and root-caused, not a
one-off. (A single weird property is a ticket, not a check.)
2. **Not already covered** — no existing check fires on it, and no open/merged PR
or ADR already addresses the cause (you checked in Phase 5).
3. **Pressure-tested** — for any non-trivial check (a threshold, a heuristic),
run `/grill-me` on the proposed check first: what's the false-positive rate on
this portfolio? is the threshold defensible against the real distribution? does
it overlap an existing check? Tune from the answers before committing.
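The three gates can be read as a single predicate over a proposed check. The sketch below is one possible encoding, not something this skill ships: `CheckProposal`, its fields, and `failed_gates` are hypothetical names introduced for illustration, and only the "≥ 5 properties" figure comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_AFFECTED_PROPERTIES = 5  # the ">= 5 properties" figure from gate 1


@dataclass(frozen=True)
class CheckProposal:
    """Hypothetical record of a proposed check (not part of the repo)."""
    affected_property_ids: Tuple[str, ...]
    root_cause: str        # empty string means "not root-caused yet"
    already_covered: bool  # an existing check, PR or ADR already covers it
    non_trivial: bool      # the check relies on a threshold or heuristic
    grilled: bool          # /grill-me has been run on the proposal


def failed_gates(proposal: CheckProposal) -> List[str]:
    """Return the names of the gates the proposal fails (empty list = add it)."""
    failures: List[str] = []
    distinct = set(proposal.affected_property_ids)
    # Gate 1: enough distinct properties AND a stated root cause.
    if len(distinct) < MIN_AFFECTED_PROPERTIES or not proposal.root_cause.strip():
        failures.append("systematic")
    # Gate 2: nothing existing already addresses the cause.
    if proposal.already_covered:
        failures.append("not already covered")
    # Gate 3: only non-trivial checks need the /grill-me pass.
    if proposal.non_trivial and not proposal.grilled:
        failures.append("pressure-tested")
    return failures
```

Returning the failed gate names, rather than a bare boolean, keeps the reason a proposal was rejected visible in the triage report.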
**What to change, smallest first:**
- **A check** — add one decorated `(PropertyAudit) -> Optional[str]` function to
`scripts/audit/anomalies.py`. Its docstring MUST record **provenance**: the
motivating property ids and the one-line root cause, so the check is traceable
and re-verifiable later. If it needs a field not on `PropertyAudit`, extend the
bundle + query.
- **The skill** — if the review revealed a new *expectation* (a pattern that is
expected-not-a-bug, or a new deep-dive technique), add it to this file's Notes
/ phases so the next reviewer starts ahead.
- **Docs** — if the cause is a load-bearing modelling decision, an ADR may be
warranted (rare; only when hard-to-reverse + surprising + a real trade-off).
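A codified check with provenance in its docstring might look like the sketch below. Everything named here is an assumption for illustration: the real decorator in `scripts/audit/anomalies.py` is not shown in this change, so `check` and `REGISTRY` are stand-ins, and the `PropertyAudit` fields are invented. The provenance entries are left as placeholders to be filled from the motivating run.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class PropertyAudit:
    """Stand-in for the real bundle; both fields are illustrative assumptions."""
    property_id: str
    floor_area_m2: Optional[float]


Check = Callable[[PropertyAudit], Optional[str]]
REGISTRY: List[Check] = []  # hypothetical registry the runner would iterate


def check(fn: Check) -> Check:
    """Hypothetical decorator: register the function and return it unchanged."""
    REGISTRY.append(fn)
    return fn


@check
def non_positive_floor_area(audit: PropertyAudit) -> Optional[str]:
    """Flag a recorded floor area that is zero or negative.

    Provenance:
        Root cause: <one-line root cause from the motivating run>
        Motivating properties: <property ids from the motivating run>
    """
    # A missing value is a different problem; this check stays single-purpose.
    if audit.floor_area_m2 is not None and audit.floor_area_m2 <= 0:
        return f"floor_area_m2={audit.floor_area_m2} is not positive"
    return None
```

Returning `None` for a clean property and a reason string otherwise matches the `(PropertyAudit) -> Optional[str]` shape described above.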
Commit each codified check on its own with the motivating run referenced, then
**re-run Phase 1** to confirm the new check fires on the cases that motivated it
and nothing else surprising. The check registry — with provenance — is the
durable, compounding output of every audit.
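The re-run step amounts to comparing the set of properties the new check fires on with the set that motivated it. A minimal sketch, assuming each audit object exposes a `property_id`; the helper name `verify_new_check` and the `value` field are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class PropertyAudit:
    """Stand-in for the real bundle; `value` is an illustrative field."""
    property_id: str
    value: Optional[float]


def verify_new_check(
    check: Callable[[PropertyAudit], Optional[str]],
    audits: Iterable[PropertyAudit],
    motivating_ids: Iterable[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (missed, unexpected) property ids for a newly added check.

    missed:     motivating properties the check did NOT fire on.
    unexpected: properties the check fired on that did not motivate it.
    """
    fired = {a.property_id for a in audits if check(a) is not None}
    expected = set(motivating_ids)
    return expected - fired, fired - expected
```

An empty `missed` set confirms the check catches its motivating cases. A non-empty `unexpected` set is not automatically wrong, since it may be the same problem on more properties, but each entry deserves a look before the check is committed.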
## Notes


@@ -16,6 +16,13 @@ runner discovers it, runs it over every Property, and reports the reasons. Keep
each check small and single-purpose; lean on the shared `PropertyAudit` bundle
rather than re-querying.
This registry is meant to **compound**: each audit that confirms a new
systematic problem should leave behind a check (see the `audit-ara-portfolio`
skill's self-improve phase). So every check's docstring records its
**provenance** (the motivating cause and example properties) so a future reader
can re-verify it and judge whether it still earns its place. A threshold should
be justified against the real distribution, not guessed.
Read-only: this script never writes to the DB.
"""
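One way to justify a threshold against the real distribution, rather than guess it, is to derive it from a percentile of the observed values and then measure how often the check would fire. The sketch below is an illustration, not part of this module: `percentile_threshold` and `fire_rate` are hypothetical helpers, and the choice of the 99th percentile is an arbitrary default.

```python
import statistics
from typing import Sequence


def percentile_threshold(values: Sequence[float], percentile: int = 99) -> float:
    """Return the value at the given percentile (1..99) of the observed data."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = statistics.quantiles(values, n=100)
    return cuts[percentile - 1]


def fire_rate(values: Sequence[float], threshold: float) -> float:
    """Fraction of observed values that a `value > threshold` check would flag."""
    return sum(v > threshold for v in values) / len(values)
```

Recording the percentile and the resulting fire rate in the check's provenance gives a later reviewer something concrete to re-verify when the portfolio changes.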