Commit graph

11 commits

Author SHA1 Message Date
Khalim Conn-Kowlessar
a7b08a4e8f refactor: move docs/sap-spec/ contents into domain/sap10_calculator/
Locality of reference — SAP-specific docs, specs, and runtime data
now live alongside the calculator that consumes them, mirroring the
prior packages→domain layout moves.

Move targets:

- Narrative MDs → domain/sap10_calculator/docs/
    NEXT_AGENT_PROMPT.md, HANDOVER_NEXT.md, SAP_CALCULATOR.md
- Spec PDFs → domain/sap10_calculator/docs/specs/
    RdSAP 10 Specification 10-06-2025.pdf
    PCDF_Spec_Rev-06b_12_May_2021.pdf
    sap-10-2-full-specification-2025-03-14.pdf
    sap-10-3-full-specification-2026-01-13.pdf
- PCDB runtime data → domain/sap10_calculator/tables/pcdb/data/
    pcdb10.dat (8.3MB) + 7× pcdb_table_*.jsonl (18MB total)

Path code rewrites (load-bearing):

- tables/pcdb/__init__.py: replaced parents[4]/'docs'/'sap-spec' with
  Path(__file__).resolve().parent/'data' for Table 105 JSONL loading.
- tables/pcdb/postcode_weather.py: same rebase for the pcdb10.dat path
  read by _postcode_climate_table().
- tables/pcdb/etl.py __main__: same rebase for the manual ETL invocation
  (source + output_dir both now point inside the package).
- tests/test_pcdb_etl.py: _PCDB_DAT_PATH now derives from
  parents[1]/'tables'/'pcdb'/'data' (was parents[3]/'docs'/'sap-spec').

Citation rewrites:

- 12 .py docstrings and 4 .md docs (ADRs + READMEs + narrative docs)
  had `docs/sap-spec/<file>` strings rewritten to their new locations.
- Two cases where the catch-all sed misfired (an ADR-0009 line about a
  PCDB extract; the pcdb __init__.py docstring about ETL output) were
  hand-corrected to point at tables/pcdb/data/ rather than docs/specs/.

docs/sap-spec/ is now empty (will be removed in a follow-up sweep or
left as a vestigial empty dir for future repurposing). ADRs 0009 and
0010 remain at docs/adr/ — they're part of the chronological
cross-cutting decision log, not calculator-specific narrative.

Verified:

- Calculator's 1e-4 production gate
  (test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly) GREEN.
- Wider sweep (domain/sap10_calculator/ + domain/sap10_ml/): 1654
  passed / 20 failed — exact pre-move baseline. All 20 failures
  pre-existing (10 hand-built skeleton + 4 cohort chain + 6 cohort
  diff).
- Pyright net-zero on the 4 touched runtime/test files (0 errors)
  and unchanged on heat_transmission.py (13) / cert_to_inputs.py (35) /
  mapper.py (33).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 13:17:18 +00:00
Khalim Conn-Kowlessar
68401c517a refactor: lift-and-shift packages/domain/src/domain/ml → domain/sap10_ml
Sibling migration to the sap10_calculator move — `domain.ml` now lives
at the root-level layout (`domain/sap10_ml/`) matching the pattern
already used by `domain.addresses`, `domain.tasks`, `domain.postcode`,
and `domain.sap10_calculator`.

Changes:

- `git mv packages/domain/src/domain/ml → domain/sap10_ml` (19 files;
  history preserved).
- Subpackage rename: `domain.ml` → `domain.sap10_ml`. 32 references
  rewritten across .py and .md files: 11 internal + 21 external
  (datatypes/epc/domain/mapper.py, 14 files in domain/sap10_calculator,
  2 backend tests, 2 ADRs, 1 README, 1 design doc).
- Path-string updates: `pytest.ini` testpath
  `packages/domain/src/domain/ml/tests` → `domain/sap10_ml/tests` so
  ML tests stay in the default auto-discovered sweep. `CONTEXT.md`
  also updated.

`packages/domain/src/domain/` is now empty — the workspace `domain/`
tree has been fully migrated. Together with the `domain/__init__.py`
deletions from the sap10_calculator commit (29ac35cc), `domain` is
now a single root-level namespace package with subpackages
{addresses, sap10_calculator, sap10_ml, tasks} + the standalone
`postcode.py` module.

Verified:

- Focused sweep (backend mapper-chain + sap10_calculator worksheet
  e2e + golden fixtures): 99 passed / 19 failed — identical baseline.
- Wider sweep (all sap10_calculator + sap10_ml): 1654 passed / 20
  failed (same pre-existing failures).
- domain/sap10_ml/tests: 210/210 PASSED at new path.
- Pyright net-zero: heat_transmission.py 13, cert_to_inputs.py 35,
  mapper.py 33, rdsap_uvalues.py 1 (all unchanged from baseline).

Note: `packages/domain/pyproject.toml` still declares
`packages = ["src/domain"]` for the hatchling wheel — that target
directory is now empty and the wheel build is effectively a no-op.
Retiring the workspace package or repointing the wheel is a follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 13:01:35 +00:00
Khalim Conn-Kowlessar
29ac35ccbe refactor: lift-and-shift packages/domain/src/domain/sap → domain/sap10_calculator
Migration of the SAP 10.2 calculator package from the uv-workspace
src-layout (`packages/domain/src/domain/sap`) to the root-level layout
(`domain/sap10_calculator`), matching the pattern already used by
`domain.addresses` / `domain.tasks` / `domain.postcode`.

Changes:

- `git mv packages/domain/src/domain/sap → domain/sap10_calculator`
  (92 files; git auto-detected all as renames so blame/history is
  preserved).
- Subpackage rename: `domain.sap` → `domain.sap10_calculator`. 48
  Python files rewritten (`from domain.sap.X` → `from domain.sap10_
  calculator.X`); zero remaining `domain.sap` refs after the sed pass.
- Path-string updates: 3 .py files (test fixtures + xlsx loader) +
  6 markdown docs (CONTEXT.md, 2 ADRs, 3 sap-spec docs, sap10_
  calculator/README.md) had hard-coded `packages/domain/src/domain/
  sap/...` paths rewritten to `domain/sap10_calculator/...`.
- `Path(__file__).parents[N]` rebasing: the old tree was 3 levels
  deeper than the new one (`packages/domain/src/`), so 4× `parents[7]`
  became `parents[4]` and 1× `parents[6]` became `parents[3]` across
  `tables/pcdb/{__init__.py, postcode_weather.py, etl.py}`,
  `worksheet/tests/_xlsx_loader.py`, and `tests/test_pcdb_etl.py`.
- PEP 420 namespace package: deleted both `domain/__init__.py`
  (root + workspace, both load-bearing only as empty/docstring) so
  Python combines `domain.sap10_calculator` (root) and `domain.ml`
  (workspace) into one namespace package. Confirmed via
  `domain.__path__ == ['/workspaces/model/domain',
  '/workspaces/model/packages/domain/src/domain']`. Without this,
  the root `domain/__init__.py` shadowed the workspace one and
  `domain.ml` was unreachable.

Verified:

- Full sweep (`backend/documents_parser/tests/test_summary_pdf_
  mapper_chain.py + domain/sap10_calculator/worksheet/tests/test_
  e2e_elmhurst_sap_score.py + domain/sap10_calculator/rdsap/tests/
  test_golden_fixtures.py`): 99 passed / 19 failed — exact same
  counts as pre-refactor. All 19 failures pre-existing (9 hand-built
  001479 + 6 cohort diff + 4 cohort chain non-spec).
- Wider sweep (all sap10_calculator + domain.ml): 1654 passed /
  20 failed (the +1 vs the focused sweep is the pre-existing
  `test_roof_insulated_assumed_with_ni_thickness_uses_50mm_per_
  section_5_11_4` which was already failing on the previous baseline).
- Pyright net-zero on the three load-bearing baselines:
  `heat_transmission.py` 13, `cert_to_inputs.py` 35, `mapper.py` 33.

Lift-and-shift only — no semantic renames (`Sap10Calculator` stays
`Sap10Calculator`), no testpaths edits in pytest.ini (sap tests
continue to be invoked by explicit pytest paths).

Note: `domain.ml` still lives at `packages/domain/src/domain/ml/`.
Migrating it would close out the dual-`domain/` layout but is
out of scope for this commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 12:22:37 +00:00
Khalim Conn-Kowlessar
6c966ffe2b docs: handover for Table 3c two-profile combi loss → close 4 Elmhurst fixtures
Rewrites HANDOVER_NEXT.md for the next agent. Two-ticket sequence:

1. Table 3c (immediate): implement SAP10.2 Appendix J §J3 two-profile
   combi-loss formula + route PCDB records with separate_dhw_tests=2
   through it. Closes 000477/000480/000487/000516 from SAP delta
   +1/+12/+11/+12 to delta=0. Currently those fall through to Table 3a
   keep-hot 600 kWh/yr default = ~25× overshoot.
2. RdSAP API integration test (end-state): real RdSAP10 API response
   → EpcPropertyDataMapper → cert_to_inputs → SAP integer == lodged.
   User generating exotic fixtures to pressure-test first.

SPEC_COVERAGE §4 row updated to call out the Table 3c gap. ADR-0010
gains a "Cohort residual hunt + SAP 10.2 rating constants" amendment
documenting the 5 component closures (secondary heating, ventilation
cert lodgement, Table 4f pumps_fans, SAP 10.2 rating constants,
000477 partial) and naming the deferred Table 3c work.

Carries a PCDF parser concern: raw row at index 52 has 13.729 which
looks like F2-annual-kWh but parser reads F2 from fields[55] = 0.0.
Verify field positions per BRE PCDF Spec §7.11 before assuming F2=0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 12:14:00 +00:00
Khalim Conn-Kowlessar
fd9df9e502 Appendix L slice 3: docs — SPEC_COVERAGE rows + ADR-0010 amendment + heuristic deprecation note
SPEC_COVERAGE:
- §5 row: note new `annual_lighting_kwh` public leaf + InternalGainsResult
  field + per-fixture U985 (232) abs=1e-4 pin across all 6 Elmhurst fixtures.
- Appendix L row: "Full (cost + gains)" — closes both sides via the same
  L1-L11 cascade; legacy heuristic noted with rip-pending callsites.

ADR-0010 Amendment "Appendix L lighting (2026-05-22)":
- Two engine bugs surfaced + fixed: cosine modulation integral (uniform
  +0.146% bias from continuous-formula vs Σ(L11 monthly)) and cert EPC
  under-lodgement (`build_epc()` skipped bulb counts + windows).
- 000474 hits SAP integer delta=0 (first Elmhurst fixture across the gate).
- 000490 SAP integer + fuel cost xfailed (strict) — Appendix L direction
  correct, other components broken (fuel pricing, Table D1-3 Ecodesign,
  main heating +2.5%). Tracked as next ticket.
- Golden cohort PE tolerance widened 30→35 with rationale.
- Deferred work: cohort SAP-integer residual hunt, heuristic deletion,
  RdSAP→API integration test (end-state e2e harness).

`predicted_lighting_kwh` deprecation note: cite ADR-0010 amendment; name
the two legacy callsites (`domain.ml.ecf`, `domain.ml.transform`) that
block deletion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 09:34:09 +00:00
Khalim Conn-Kowlessar
ae8c946179 docs: §10a slice 3 — ADR-0010 amendment + SPEC_COVERAGE row
- ADR-0010 amendment: narrow the SAP10.2 spec target — §10a/§10b
  cost prices source from RdSAP10 Table 32 (per RdSAP10 §19.1),
  not SAP10.2 Table 12. CO2 + PEF stay on Table 12 (RdSAP10 §19.2
  says they're identical). Closes out the 000490 "spec-version
  drift" framing as wrong-table + missing-standing-charges, not
  corpus drift. Names §4 HW + Appendix L as the next-ticket
  upstream debt that pre-§10a wrong-prices had been masking.
- SPEC_COVERAGE: new §10a row (32-field FuelCostResult, three new
  tables/* + worksheet/* modules, per-line-ref status, Remaining
  §10a work list). Updates §12 to "folded into §10a". Updates
  header attribution.

No code changes in this commit — docs only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 20:11:37 +00:00
Khalim Conn-Kowlessar
bb9c5ac017 docs: ADR-0010 retargets calculator to SAP 10.2; rewrite handover
Adds ADR-0010 superseding ADR-0009's spec-version target, PCDB
sequencing, and cert-calibration layer. Captures the conclusions
of a grill-with-docs session:

  1. Active spec target is SAP 10.2 (14-03-2025), not SAP 10.3 — no
     SAP-10.3-lodged certs exist in the corpus to validate against.
  2. table_12_cert_calibration is deleted (not "re-derived at the
     end"). It was pre-March-2025 spec prices fit against a mixture
     distribution of two spec-version regimes, with downstream-
     component bugs absorbed into the fit — not Elmhurst deviation.
  3. Validation Cohort: filter the corpus to inspection_date ≥
     2025-07-01 so every cert in the probe was lodged on SAP 10.2
     (14-03-2025) prices. One spec, one signal.
  4. PCDB integration is promoted from "Session C deferred" to
     prerequisite P4 — dominates residual variance on heat pumps and
     the 78% of gas-boiler certs lodging main_heating_data_source=1.
  5. Trace mode (SapResult.intermediate) and BRE worked-example
     fixtures replace the 7 cert-based golden fixtures, which
     contained compensating errors.
  6. Strict-type EpcPropertyData via codes.csv-derived canonical
     enums (P6) — the in-source motivation lives at
     dimensions.py:74-82 (Khalim's comment, included in this commit).
  7. Worksheet-faithful structure is a sweep-time principle: each
     worksheet module mirrors SAP 10.2 worksheet line numbering.

CONTEXT.md additions:
  - Refined "Calculated SAP10 Performance" and "SAP10 Calculation"
    to reference SAP 10.2 + ADR-0010.
  - New term "SAP Spec Version" — domain-meaningful because the
    same EpcPropertyData yields different sap_score under different
    spec revisions.
  - New term "Validation Cohort" — the version-locked sub-corpus.

HANDOVER_SYSTEMATIC_REVIEW.md is rewritten section-by-section to
reflect ADR-0010: §1 framing, §2 status pointer, new §2.5 with the
six prerequisites P1–P6 in dependency order, §3 diagnosis (cert-cal
was stale prices, not Elmhurst deviation), §4 scope (PCDB IN,
SAP 10.3 stays OUT), §5 approach (worksheet-faithful principle as
§5.5), §7 tension dissolved, §7b findings re-framed, §8 dead-ends
re-classified as conditional, §9 cohort filter, §10 fixture
strategy, §11 trace mode as prerequisite, §12 prereqs-first,
§13 Phase 0/Phase 1 workflow, §14 ADR-0010 reference, §15 final
note.

P2.1 (commit ac1aa56a) already lands the first ADR-0010 slice
(probe swap to spec prices).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 09:54:24 +00:00
Khalim Conn-Kowlessar
8dbe873daf ADR-0009: pivot to deterministic SAP 10.3 calculator (Accepted)
Promotes ADR-0009 from Proposed to Accepted after the grill-with-docs
session resolved all seven open questions. Bundles the SAP 10.3 and
RdSAP 10 specifications under docs/sap-spec/ plus a calculator design
sketch (module layout, monthly-loop pseudo-code, status table).

CONTEXT.md adds three new domain terms parallel to existing performance
language:
  - Calculated SAP10 Performance (parallel to Effective / Lodged)
  - SAP10 Calculation (process; implemented by Sap10Calculator)
  - Measure Application (process; implemented by MeasureApplicator)

ML pipeline is NOT retired — it stays as the residual head once the
calculator reaches parity in Session B. ADR-0009 §"Grill outcomes" carries
the seven binding scope decisions plus three Session-A-scope changes
discovered during the grill (RdSAP §19 EER formula, SAP 10.2 Appendix A
cross-reference, RdSAP Table 29 cascade defaults).
2026-05-17 21:27:21 +00:00
Khalim Conn-Kowlessar
f61d74a327 docs: ADR-0008 physics-as-feature + v16.0.0 schema bump
Captures the slice-16 plan decisions before code lands:
- Mid-physics: predicted_ecf + predicted_log10_ecf, NOT predicted_sap_score
- Cost scope: heating + DHW + lighting (no PV/pumps/secondary)
- Crude annual heat-demand calc (HLC * HDH / efficiency)
- Cascade-defaulting U-value imputation
- envelope_heat_loss_w_per_k sums all parts; extension_1 only as discrete features (88% null drops extension_2)
- v16.0.0 MAJOR bump (rename secondary_dwelling_* -> extension_1_*); coordinated cutover with AutoGluon repo + scoring lambda
- LightGBM objective="mape" for sap_score+peui_ucl in 16g; sample weights deferred
2026-05-17 11:20:40 +00:00
Khalim Conn-Kowlessar
611ff24eb6 scaffolding for ml pipeline 2026-05-16 14:15:56 +00:00
Khalim Conn-Kowlessar
d9c1696085 added architechtural decisions, added to prd 2026-05-13 21:26:18 +00:00