From 2d6b078bd81efec58db03d98b3e581b93d1e7c9e Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Fri, 26 Jun 2026 19:57:57 +0000
Subject: [PATCH] =?UTF-8?q?feat(audit):=20self-improvement=20loop=20in=20t?=
 =?UTF-8?q?he=20skill=20+=20provenance=20convention=20=F0=9F=9F=A9?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add Phase 6 (self-improve) to audit-ara-portfolio: when a run confirms a
novel systematic problem, codify it as a check — gated on systematic
(>=5 props, root-caused), not-already-covered, and
/grill-me-pressure-tested. Each check records provenance (motivating
cause + example properties) so the registry stays sharp and compounds
every run.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .claude/skills/audit-ara-portfolio/SKILL.md | 38 ++++++++++++++++++---
 scripts/audit/anomalies.py                  |  7 ++++
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/.claude/skills/audit-ara-portfolio/SKILL.md b/.claude/skills/audit-ara-portfolio/SKILL.md
index dd6c1824..6621a9f3 100644
--- a/.claude/skills/audit-ara-portfolio/SKILL.md
+++ b/.claude/skills/audit-ara-portfolio/SKILL.md
@@ -76,10 +76,40 @@ A triage report. Per check group:
 - existing-PR/ADR coverage,
 - recommended action: **fix** / **tune threshold** / **accept (expected)** / **new ticket**.
 
-Then propose any **new deterministic checks** worth adding to
-`scripts/audit/anomalies.py` (one decorated function each) so the next run catches
-the pattern automatically — the check registry is the durable output of every
-audit.
+Then run Phase 6 to make the audit permanently better.
+
+## Phase 6 — Self-improve (the compounding loop)
+
+When the review confirms a **novel, systematic** problem, codify it so every
+future run catches it automatically. This is what makes the audit get better each
+time it runs. Apply the gates below — they keep the registry sharp, not noisy.
+
+**Gates (all must hold before adding a check):**
+1. **Systematic** — reproduced on **≥ 5** properties and root-caused, not a
+   one-off. (A single weird property is a ticket, not a check.)
+2. **Not already covered** — no existing check fires on it, and no open/merged PR
+   or ADR already addresses the cause (you checked in Phase 5).
+3. **Pressure-tested** — for any non-trivial check (a threshold, a heuristic),
+   run `/grill-me` on the proposed check first: what's the false-positive rate on
+   this portfolio? is the threshold defensible against the real distribution? does
+   it overlap an existing check? Tune from the answers before committing.
+
+**What to change, smallest first:**
+- **A check** — add one decorated `(PropertyAudit) -> Optional[str]` function to
+  `scripts/audit/anomalies.py`. Its docstring MUST record **provenance**: the
+  motivating property ids and the one-line root cause, so the check is traceable
+  and re-verifiable later. If it needs a field not on `PropertyAudit`, extend the
+  bundle + query.
+- **The skill** — if the review revealed a new *expectation* (a pattern that is
+  expected-not-a-bug, or a new deep-dive technique), add it to this file's Notes
+  / phases so the next reviewer starts ahead.
+- **Docs** — if the cause is a load-bearing modelling decision, an ADR may be
+  warranted (rare; only when hard-to-reverse + surprising + a real trade-off).
+
+Commit each codified check on its own with the motivating run referenced, then
+**re-run Phase 1** to confirm the new check fires on the cases that motivated it
+and nothing else surprising. The check registry — with provenance — is the
+durable, compounding output of every audit.
 
 ## Notes
 
diff --git a/scripts/audit/anomalies.py b/scripts/audit/anomalies.py
index 48f16875..7b1ac62e 100644
--- a/scripts/audit/anomalies.py
+++ b/scripts/audit/anomalies.py
@@ -16,6 +16,13 @@ runner discovers it, runs it over every Property, and reports the reasons.
 Keep each check small and single-purpose; lean on the shared `PropertyAudit`
 bundle rather than re-querying.
 
+This registry is meant to **compound**: each audit that confirms a new
+systematic problem should leave behind a check (see the `audit-ara-portfolio`
+skill's self-improve phase). So every check's docstring records its
+**provenance** — the motivating cause and example properties — so a future reader
+can re-verify it and judge whether it still earns its place. A threshold should
+be justified against the real distribution, not guessed.
+
 Read-only: this script never writes to the DB.
 """
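
Reviewer note: the patch describes the provenance convention but does not show a check that follows it. Below is a minimal sketch of what one might look like. Everything in it is an assumption made for illustration: the `check` decorator, the `REGISTRY` list, the fields on `PropertyAudit`, the check itself, and the property ids are hypothetical stand-ins, not the contents of `scripts/audit/anomalies.py`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


# Hypothetical stand-in for the shared bundle; the real PropertyAudit
# fields are not shown in this patch.
@dataclass
class PropertyAudit:
    property_id: str
    unit_count: int
    monthly_rent_total: float


Check = Callable[[PropertyAudit], Optional[str]]

# Hypothetical registry; the real runner's discovery mechanism may differ.
REGISTRY: List[Check] = []


def check(fn: Check) -> Check:
    """Register a check function and return it unchanged."""
    REGISTRY.append(fn)
    return fn


@check
def units_without_rent(audit: PropertyAudit) -> Optional[str]:
    """Flag properties that list units but carry no rent at all.

    Provenance (illustrative values, not from a real run):
      cause: rent rows dropped during import while unit counts survived
      examples: P-001, P-002, P-003, P-004, P-005
    """
    if audit.unit_count > 0 and audit.monthly_rent_total == 0:
        return f"{audit.unit_count} units but zero rent"
    return None


def run_checks(audits: Iterable[PropertyAudit]) -> List[Tuple[str, str, str]]:
    """Run every registered check over every property; collect the reasons."""
    findings = []
    for audit in audits:
        for fn in REGISTRY:
            reason = fn(audit)
            if reason is not None:
                findings.append((audit.property_id, fn.__name__, reason))
    return findings


def missing_provenance(registry: Iterable[Check]) -> List[str]:
    """Names of checks whose docstring lacks a Provenance section."""
    return [fn.__name__ for fn in registry
            if "Provenance" not in (fn.__doc__ or "")]
```

A helper like `missing_provenance` could back the "MUST record provenance" rule with a cheap lint step, so the convention is enforced rather than left to review. Whether that is worth adding is a judgment call for the author.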