feat(audit): self-improvement loop in the skill + provenance convention

Add Phase 6 (self-improve) to audit-ara-portfolio: when a run confirms a novel systematic problem, codify it as a check — gated on systematic (>=5 props, root-caused), not-already-covered, and /grill-me-pressure-tested. Each check records provenance (motivating cause + example properties) so the registry stays sharp and compounds every run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 13:10:47 +00:00 · 2026-06-26 19:57:57 +00:00 · 2026-06-26 19:57:57 +00:00 · 2d6b078bd8
commit 2d6b078bd8
parent e3c9107313
2 changed files with 41 additions and 4 deletions
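
For orientation, here is a minimal sketch of the check shape the patch below describes: one decorated `(PropertyAudit) -> Optional[str]` function whose docstring carries provenance. It is not taken from `scripts/audit/anomalies.py`. The `check` decorator, `REGISTRY`, the `PropertyAudit` fields (`unit_count`, `occupied_units`), `run_checks`, `missing_provenance`, and `meets_systematic_gate` are all hypothetical names chosen for illustration, and the provenance values are placeholders rather than real findings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class PropertyAudit:
    """Hypothetical stand-in for the shared bundle; the real fields may differ."""
    property_id: str
    unit_count: int
    occupied_units: int


Check = Callable[[PropertyAudit], Optional[str]]
REGISTRY: List[Check] = []  # assumed registry; the real runner may discover checks differently
MIN_AFFECTED_PROPERTIES = 5  # the ">= 5 properties" gate from the skill text


def check(fn: Check) -> Check:
    """Hypothetical registration decorator: record the check and return it unchanged."""
    REGISTRY.append(fn)
    return fn


@check
def occupied_exceeds_units(p: PropertyAudit) -> Optional[str]:
    """Flag a property that reports more occupied units than total units.

    Provenance (placeholders, not real audit data):
      cause: <one-line root cause from the motivating run>
      examples: <property ids that motivated the check>
    """
    if p.occupied_units > p.unit_count:
        return f"occupied_units={p.occupied_units} > unit_count={p.unit_count}"
    return None


def run_checks(props: Iterable[PropertyAudit]) -> Dict[str, List[str]]:
    """Run every registered check over every property and collect the reasons."""
    findings: Dict[str, List[str]] = {}
    for p in props:
        reasons = [r for r in (fn(p) for fn in REGISTRY) if r is not None]
        if reasons:
            findings[p.property_id] = reasons
    return findings


def missing_provenance() -> List[str]:
    """Names of registered checks whose docstring lacks a provenance section."""
    return [fn.__name__ for fn in REGISTRY if "Provenance" not in (fn.__doc__ or "")]


def meets_systematic_gate(affected_ids: Iterable[str], root_caused: bool) -> bool:
    """Gate 1 only: root-caused and seen on at least five distinct properties."""
    return root_caused and len(set(affected_ids)) >= MIN_AFFECTED_PROPERTIES
```

A helper like `missing_provenance` is one possible way to keep the docstring convention enforceable in a test rather than by review alone. The other two gates (not already covered, pressure-tested) are judgment calls and are deliberately left out of the sketch.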
--- a/.claude/skills/audit-ara-portfolio/SKILL.md
+++ b/.claude/skills/audit-ara-portfolio/SKILL.md
@@ -76,10 +76,40 @@ A triage report. Per check group:
 - existing-PR/ADR coverage,
 - recommended action: **fix** / **tune threshold** / **accept (expected)** / **new ticket**.

-Then propose any **new deterministic checks** worth adding to
-`scripts/audit/anomalies.py` (one decorated function each) so the next run catches
-the pattern automatically — the check registry is the durable output of every
-audit.
+Then run Phase 6 to make the audit permanently better.
+
+## Phase 6 — Self-improve (the compounding loop)
+
+When the review confirms a **novel, systematic** problem, codify it so every
+future run catches it automatically. This is what makes the audit get better each
+time it runs. Apply the gates below — they keep the registry sharp, not noisy.
+
+**Gates (all must hold before adding a check):**
+1. **Systematic** — reproduced on **≥ 5** properties and root-caused, not a
+   one-off. (A single weird property is a ticket, not a check.)
+2. **Not already covered** — no existing check fires on it, and no open/merged PR
+   or ADR already addresses the cause (you checked in Phase 5).
+3. **Pressure-tested** — for any non-trivial check (a threshold, a heuristic),
+   run `/grill-me` on the proposed check first: what's the false-positive rate on
+   this portfolio? is the threshold defensible against the real distribution? does
+   it overlap an existing check? Tune from the answers before committing.
+
+**What to change, smallest first:**
+- **A check** — add one decorated `(PropertyAudit) -> Optional[str]` function to
+  `scripts/audit/anomalies.py`. Its docstring MUST record **provenance**: the
+  motivating property ids and the one-line root cause, so the check is traceable
+  and re-verifiable later. If it needs a field not on `PropertyAudit`, extend the
+  bundle + query.
+- **The skill** — if the review revealed a new *expectation* (a pattern that is
+  expected-not-a-bug, or a new deep-dive technique), add it to this file's Notes
+  / phases so the next reviewer starts ahead.
+- **Docs** — if the cause is a load-bearing modelling decision, an ADR may be
+  warranted (rare; only when hard-to-reverse + surprising + a real trade-off).
+
+Commit each codified check on its own with the motivating run referenced, then
+**re-run Phase 1** to confirm the new check fires on the cases that motivated it
+and nothing else surprising. The check registry — with provenance — is the
+durable, compounding output of every audit.

 ## Notes

--- a/scripts/audit/anomalies.py
+++ b/scripts/audit/anomalies.py
@@ -16,6 +16,13 @@ runner discovers it, runs it over every Property, and reports the reasons. Keep
 each check small and single-purpose; lean on the shared `PropertyAudit` bundle
 rather than re-querying.

+This registry is meant to **compound**: each audit that confirms a new
+systematic problem should leave behind a check (see the `audit-ara-portfolio`
+skill's self-improve phase). So every check's docstring records its
+**provenance** — the motivating cause and example properties — so a future reader
+can re-verify it and judge whether it still earns its place. A threshold should
+be justified against the real distribution, not guessed.
+
 Read-only: this script never writes to the DB.
 """