Mirror of https://github.com/Hestia-Homes/Model.git, synced 2026-06-30 13:10:47 +00:00
feat(audit): self-improvement loop in the skill + provenance convention 🟩
Add Phase 6 (self-improve) to audit-ara-portfolio. When a run confirms a novel systematic problem, codify it as a check, but only if it is systematic (≥ 5 properties, root-caused), not already covered, and pressure-tested with /grill-me. Each check records provenance (motivating cause and example properties), so the registry stays sharp and compounds with every run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parent: e3c9107313
Commit: 2d6b078bd8
2 changed files with 41 additions and 4 deletions
@@ -76,10 +76,40 @@ A triage report. Per check group:
 - existing-PR/ADR coverage,
 - recommended action: **fix** / **tune threshold** / **accept (expected)** / **new ticket**.

-Then propose any **new deterministic checks** worth adding to
-`scripts/audit/anomalies.py` (one decorated function each) so the next run catches
-the pattern automatically — the check registry is the durable output of every
-audit.
+Then run Phase 6 to make the audit permanently better.
+
+## Phase 6 — Self-improve (the compounding loop)
+
+When the review confirms a **novel, systematic** problem, codify it so every
+future run catches it automatically. This is what makes the audit get better each
+time it runs. Apply the gates below — they keep the registry sharp, not noisy.
+
+**Gates (all must hold before adding a check):**
+1. **Systematic** — reproduced on **≥ 5** properties and root-caused, not a
+one-off. (A single weird property is a ticket, not a check.)
+2. **Not already covered** — no existing check fires on it, and no open/merged PR
+or ADR already addresses the cause (you checked in Phase 5).
+3. **Pressure-tested** — for any non-trivial check (a threshold, a heuristic),
+run `/grill-me` on the proposed check first: what's the false-positive rate on
+this portfolio? Is the threshold defensible against the real distribution? Does
+it overlap an existing check? Tune from the answers before committing.
+
+**What to change, smallest first:**
+- **A check** — add one decorated `(PropertyAudit) -> Optional[str]` function to
+`scripts/audit/anomalies.py`. Its docstring MUST record **provenance**: the
+motivating property ids and the one-line root cause, so the check is traceable
+and re-verifiable later. If it needs a field not on `PropertyAudit`, extend the
+bundle and its query.
+- **The skill** — if the review revealed a new *expectation* (a pattern that is
+expected-not-a-bug, or a new deep-dive technique), add it to this file's Notes
+/ phases so the next reviewer starts ahead.
+- **Docs** — if the cause is a load-bearing modelling decision, an ADR may be
+warranted (rare; only when hard-to-reverse + surprising + a real trade-off).
+
+Commit each codified check on its own with the motivating run referenced, then
+**re-run Phase 1** to confirm the new check fires on the cases that motivated it
+and nothing else surprising. The check registry — with provenance — is the
+durable, compounding output of every audit.

 ## Notes

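The three Phase 6 gates are mechanical enough to express as a predicate. This is a minimal sketch, assuming a hypothetical `Candidate` record that the reviewer fills in by hand after the Phase 5 review; neither `Candidate` nor `gate_failures` exists in the repository. Only the ≥ 5 threshold and the gate definitions come from the skill text.

```python
from dataclasses import dataclass, field

MIN_PROPERTIES = 5  # gate 1: reproduced on >= 5 properties


@dataclass
class Candidate:
    """Hypothetical record of a proposed check, filled in after the Phase 5 review."""
    affected_property_ids: set[str]
    root_cause: str  # one-line root cause; empty if not yet root-caused
    fired_by_existing_checks: set[str] = field(default_factory=set)
    covering_prs_or_adrs: list[str] = field(default_factory=list)
    non_trivial: bool = True  # has a threshold or heuristic
    grilled: bool = False  # /grill-me has been run and its answers applied


def gate_failures(c: Candidate) -> list[str]:
    """Return the names of the gates that block codifying `c` (empty means: add it)."""
    failures = []
    # Gate 1: systematic, i.e. enough properties AND a known root cause.
    if len(c.affected_property_ids) < MIN_PROPERTIES or not c.root_cause.strip():
        failures.append("systematic")
    # Gate 2: no existing check fires on it and no PR/ADR addresses the cause.
    if c.fired_by_existing_checks or c.covering_prs_or_adrs:
        failures.append("not already covered")
    # Gate 3: only non-trivial checks need the /grill-me pass.
    if c.non_trivial and not c.grilled:
        failures.append("pressure-tested")
    return failures
```

For a single odd property, `gate_failures(Candidate({"p1"}, "bad import"))` returns `["systematic", "pressure-tested"]`: it belongs in a ticket, not the registry. Returning the failing gate names rather than a bool keeps the reason visible when the reviewer decides what to do next.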
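The "A check" step and the Phase 1 re-run fit together: if provenance lives in the docstring in a parseable form, the re-run can compare where the check fires against the ids that motivated it. A minimal sketch, assuming a hypothetical `check` decorator, a `REGISTRY` list, and a two-field `PropertyAudit`; the real decorator name and bundle fields in `scripts/audit/anomalies.py` are not shown in this commit, and the `demo-*` ids and root cause are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PropertyAudit:
    # Hypothetical stand-in: the real bundle's fields are not part of this commit.
    property_id: str
    floor_area_m2: Optional[float]


Check = Callable[[PropertyAudit], Optional[str]]
REGISTRY: list[Check] = []


def check(fn: Check) -> Check:
    """Hypothetical registration decorator; a runner would iterate REGISTRY."""
    REGISTRY.append(fn)
    return fn


@check
def missing_floor_area(p: PropertyAudit) -> Optional[str]:
    """Floor area is missing or non-positive.

    Provenance:
      Motivating properties: demo-1, demo-2, demo-3, demo-4, demo-5
      Root cause: (illustrative) floor area not carried through the import.
    """
    if p.floor_area_m2 is None or p.floor_area_m2 <= 0:
        return f"floor area is {p.floor_area_m2!r}"
    return None  # None means the check does not fire


def motivating_ids(fn: Check) -> set[str]:
    """Read the 'Motivating properties:' line back out of a check's docstring."""
    m = re.search(r"Motivating properties:\s*(.+)", fn.__doc__ or "")
    return {s.strip() for s in m.group(1).split(",")} if m else set()


def verify(fn: Check, audits: list[PropertyAudit]) -> tuple[set[str], set[str]]:
    """Re-run one check; return (motivating ids it missed, ids it fired on unexpectedly)."""
    fired = {a.property_id for a in audits if fn(a) is not None}
    expected = motivating_ids(fn)
    return expected - fired, fired - expected
```

With the five motivating properties plus `PropertyAudit("demo-6", 82.0)` and `PropertyAudit("demo-7", 0.0)`, `verify(missing_floor_area, audits)` returns `(set(), {"demo-7"})`: nothing missed, one surprise for the reviewer to explain or to tighten the check against.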
@@ -16,6 +16,13 @@ runner discovers it, runs it over every Property, and reports the reasons. Keep
 each check small and single-purpose; lean on the shared `PropertyAudit` bundle
 rather than re-querying.

+This registry is meant to **compound**: each audit that confirms a new
+systematic problem should leave behind a check (see the `audit-ara-portfolio`
+skill's self-improve phase). Every check's docstring therefore records its
+**provenance** (the motivating cause and example properties), so a future reader
+can re-verify it and judge whether it still earns its place. A threshold should
+be justified against the real distribution, not guessed.
+
 Read-only: this script never writes to the DB.
 """

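The docstring asks that a threshold be justified against the real distribution rather than guessed. One way to do that, sketched here under the assumption that the check compares a single numeric field against a cutoff, is to derive the cutoff from a percentile of the observed values and report how many properties it would flag. `percentile_threshold` and `flag_rate` are hypothetical helpers, not part of `anomalies.py`; they use only the standard-library `statistics.quantiles`.

```python
import statistics


def percentile_threshold(values: list[float], pct: int = 99) -> float:
    """pct-th percentile (1..99) of the observed values, using inclusive interpolation."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[pct - 1]


def flag_rate(values: list[float], threshold: float) -> float:
    """Fraction of properties that a `value > threshold` check would flag."""
    return sum(v > threshold for v in values) / len(values)
```

On the synthetic values 1.0 to 100.0, the 99th-percentile cutoff is 99.01 and `flag_rate` returns 0.01. A percentile cutoff fixes the flag rate at roughly `1 - pct/100` by construction, so it answers "how many will this flag?" but not "are those real problems?"; the `/grill-me` false-positive question still has to be answered by inspecting the flagged tail.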