chore(epc-prediction): grow validation corpus to 150 postcodes

Bumps N_POSTCODES from 40 to 150 in the fetch script. The larger corpus
(150 postcodes / 3719 certs) reduces leave-one-out variance and unblocks
the recency-template work (#1223), which regressed on the noisier
36-target gate fixture. The corpus itself stays out of git (gitignored
/tmp, plus a persistent backup at
/workspaces/home/epc_prediction_corpus_backup).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-15 06:42:19 +00:00
parent 718455e971
commit 6e9f831296

@@ -62,7 +62,7 @@ CACHE.mkdir(parents=True, exist_ok=True)
WINDOW = {"date_start": "2026-01-01", "date_end": "2026-05-31"}
TOTAL_PAGES = 7402
SEED_PAGES = 20 # random search pages → postcode seeds
-N_POSTCODES = 40 # distinct postcodes to pull full cohorts for
+N_POSTCODES = 150 # distinct postcodes to pull full cohorts for
random.seed(2026) # reproducible draw
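
For illustration only, a minimal sketch of how a seeded draw of N_POSTCODES distinct postcodes could look. It is not the fetch script's code: the helper name draw_postcodes and the synthetic pool are hypothetical, and it uses a local random.Random(2026) instead of the module-level random.seed(2026) shown in the diff, so that the draw does not depend on other callers of the global RNG.

```python
import random

N_POSTCODES = 150  # value taken from the diff above


def draw_postcodes(seed_postcodes, n=N_POSTCODES, seed=2026):
    """Hypothetical helper: pick up to n distinct postcodes reproducibly."""
    rng = random.Random(seed)  # local RNG; global random state is untouched
    # Deduplicate, then sort so the sample does not depend on input order.
    distinct = sorted(set(seed_postcodes))
    return rng.sample(distinct, min(n, len(distinct)))


# Synthetic placeholder pool (assumption); real seeds would come from the
# random search pages the script fetches.
pool = [f"AB{i // 10} {i % 10}XY" for i in range(400)]

first = draw_postcodes(pool)
second = draw_postcodes(pool)
print(len(first), first == second)
```

Because the seed is fixed, repeated runs over the same pool select the same postcodes, which keeps the validation corpus comparable across runs even though the corpus itself is not in git.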