From 2d6b078bd81efec58db03d98b3e581b93d1e7c9e Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar <kconnkowlessar@gmail.com>
Date: Fri, 26 Jun 2026 19:57:57 +0000
Subject: [PATCH] =?UTF-8?q?feat(audit):=20self-improvement=20loop=20in=20t?=
 =?UTF-8?q?he=20skill=20+=20provenance=20convention=20=F0=9F=9F=A9?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add Phase 6 (self-improve) to audit-ara-portfolio: when a run confirms a
novel systematic problem, codify it as a check. Three gates apply: systematic
(>=5 properties, root-caused), not already covered, and pressure-tested with
/grill-me. Each check records provenance (motivating cause + example
properties) so the registry stays sharp and compounds every run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .claude/skills/audit-ara-portfolio/SKILL.md | 38 ++++++++++++++++++---
 scripts/audit/anomalies.py                  |  7 ++++
 2 files changed, 41 insertions(+), 4 deletions(-)
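
Reviewer note (not part of the commit): a minimal sketch of what a codified
check with provenance might look like under the Phase 6 rules in this patch.
Only the `(PropertyAudit) -> Optional[str]` shape and the "one decorated
function" convention come from the patch; the `check` decorator name, the
`PropertyAudit` fields, `run_checks`, and `missing_provenance` are hypothetical
stand-ins, and the real code in scripts/audit/anomalies.py may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


# Hypothetical stand-in for the shared bundle; the real fields are not shown
# in this patch.
@dataclass(frozen=True)
class PropertyAudit:
    property_id: str
    unit_count: int      # assumed field, for illustration only
    occupied_units: int  # assumed field, for illustration only


Check = Callable[[PropertyAudit], Optional[str]]
REGISTRY: List[Check] = []


def check(fn: Check) -> Check:
    """Register a check so the runner can discover it (assumed decorator name)."""
    REGISTRY.append(fn)
    return fn


@check
def occupied_exceeds_units(audit: PropertyAudit) -> Optional[str]:
    """Flag properties that report more occupied units than units.

    Provenance:
        cause: <one-line root cause confirmed in the motivating run>
        examples: <motivating property ids, at least 5>
    """
    if audit.occupied_units > audit.unit_count:
        return (
            f"occupied_units {audit.occupied_units} "
            f"> unit_count {audit.unit_count}"
        )
    return None


def run_checks(audits: List[PropertyAudit]) -> Dict[str, List[str]]:
    """Run every registered check over every property and collect reasons."""
    findings: Dict[str, List[str]] = {}
    for audit in audits:
        for fn in REGISTRY:
            reason = fn(audit)
            if reason is not None:
                findings.setdefault(audit.property_id, []).append(
                    f"{fn.__name__}: {reason}"
                )
    return findings


def missing_provenance() -> List[str]:
    """Name the registered checks whose docstring lacks a Provenance section."""
    return [fn.__name__ for fn in REGISTRY if "Provenance:" not in (fn.__doc__ or "")]
```

A guard like `missing_provenance()` could run in the "re-run Phase 1" step so
that a check without provenance is caught before it is committed; whether the
project wants that enforced in code rather than by review is a design choice
this patch leaves open.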

diff --git a/.claude/skills/audit-ara-portfolio/SKILL.md b/.claude/skills/audit-ara-portfolio/SKILL.md
index dd6c1824..6621a9f3 100644
--- a/.claude/skills/audit-ara-portfolio/SKILL.md
+++ b/.claude/skills/audit-ara-portfolio/SKILL.md
@@ -76,10 +76,40 @@ A triage report. Per check group:
 - existing-PR/ADR coverage,
 - recommended action: **fix** / **tune threshold** / **accept (expected)** / **new ticket**.
 
-Then propose any **new deterministic checks** worth adding to
-`scripts/audit/anomalies.py` (one decorated function each) so the next run catches
-the pattern automatically — the check registry is the durable output of every
-audit.
+Then run Phase 6 to make the audit permanently better.
+
+## Phase 6 — Self-improve (the compounding loop)
+
+When the review confirms a **novel, systematic** problem, codify it so every
+future run catches it automatically. This is what makes the audit get better each
+time it runs. Apply the gates below — they keep the registry sharp, not noisy.
+
+**Gates (all must hold before adding a check):**
+1. **Systematic** — reproduced on **≥ 5** properties and root-caused, not a
+   one-off. (A single weird property is a ticket, not a check.)
+2. **Not already covered** — no existing check fires on it, and no open/merged PR
+   or ADR already addresses the cause (you checked in Phase 5).
+3. **Pressure-tested** — for any non-trivial check (a threshold, a heuristic),
+   run `/grill-me` on the proposed check first: What is the false-positive rate on
+   this portfolio? Is the threshold defensible against the real distribution? Does
+   it overlap an existing check? Tune from the answers before committing.
+
+**What to change, smallest first:**
+- **A check** — add one decorated `(PropertyAudit) -> Optional[str]` function to
+  `scripts/audit/anomalies.py`. Its docstring MUST record **provenance**: the
+  motivating property ids and the one-line root cause, so the check is traceable
+  and re-verifiable later. If it needs a field not on `PropertyAudit`, extend the
+  bundle and the query that populates it.
+- **The skill** — if the review revealed a new *expectation* (a pattern that is
+  expected-not-a-bug, or a new deep-dive technique), add it to this file's Notes
+  / phases so the next reviewer starts ahead.
+- **Docs** — if the cause is a load-bearing modelling decision, an ADR may be
+  warranted (rare; only when it is hard to reverse, surprising, and a real trade-off).
+
+Commit each codified check on its own, referencing the motivating run, then
+**re-run Phase 1** to confirm the new check fires on the cases that motivated it
+and on nothing unexpected. The check registry — with provenance — is the
+durable, compounding output of every audit.
 
 ## Notes
 
diff --git a/scripts/audit/anomalies.py b/scripts/audit/anomalies.py
index 48f16875..7b1ac62e 100644
--- a/scripts/audit/anomalies.py
+++ b/scripts/audit/anomalies.py
@@ -16,6 +16,13 @@ runner discovers it, runs it over every Property, and reports the reasons. Keep
 each check small and single-purpose; lean on the shared `PropertyAudit` bundle
 rather than re-querying.
 
+This registry is meant to **compound**: each audit that confirms a new
+systematic problem should leave behind a check (see the `audit-ara-portfolio`
+skill's self-improve phase). Every check's docstring therefore records its
+**provenance** (the motivating cause and example properties) so that a future
+reader can re-verify it and judge whether it still earns its place. A threshold
+should be justified against the real distribution, not guessed.
+
 Read-only: this script never writes to the DB.
 """