Commit graph

10 commits

Author SHA1 Message Date
Khalim Conn-Kowlessar
460f17352a chore: stage cert 0330 fixtures (boiler pilot)
Adds the (API JSON + Summary PDF) fixtures for cert
0330-2249-8150-2326-4121 — the boiler pilot identified in the
handover. Property: 17 Summerfield Road, MANCHESTER M22 1AE
(mid-terrace house, mains gas boiler PCDB idx 10241, age D).

Source: API JSON fetched via EpcClientService from
https://api.get-energy-performance-data.communities.gov.uk
(OPEN_EPC_API_TOKEN). Summary PDF copied from
`sap worksheets/Additional data with api/0330-2249-8150-2326-4121/
Summary_000897.pdf` (where the user provided the triple).

Worksheet target: SAP 61.5993 (continuous), from `dr87-0001-000897
.pdf` in the same source directory.

Current state on these fixtures (uncommitted before this slice):
  - Summary mapper cascade SAP: 62.0660 (Δ +0.4667 vs worksheet)
  - API mapper cascade SAP:     63.7446 (Δ +2.1453 vs worksheet)

Both paths RED at 1e-4. Two specific cascade-component gaps
identified in the handover for follow-up slices:

  1. Windows HLC +6.71 W/K (API vs Summary) — likely glazing_type=14
     not in Slice 93's `_API_GLAZING_TYPE_TO_TRANSMISSION` (only
     codes 3 and 13 mapped).
  2. HW kWh +1060 (API 3172.65 vs Summary 2112.00) — §4 subsystem
     gap; needs occupancy/shower/cylinder probe.

This commit stages the fixtures only — no tests added yet. The
follow-up slice should add a RED Layer 2 test (Summary path 1e-4
vs 61.5993) and proceed slice-by-slice.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 17:37:14 +00:00
Khalim Conn-Kowlessar
c783a15ff1 docs: handover for per-cert mapper validation workflow
Rewrites the cert 001479 closure handover into a forward-looking
brief for the new workstream: validating the API EpcPropertyDataMapper
against 9 newly-staged (Summary + worksheet + API) cert triples.

Key contents:

- User's stated workflow (verbatim): Summary path proves itself
  against the worksheet → becomes canonical reference for API parity.
- Folder-structure changes since the prior handover were written
  (packages/domain/ removed; sap10_calculator + sap10_ml now at the
  repo root under a PEP 420 namespace; docs/sap-spec/ moved into
  domain/sap10_calculator/docs/; PCDB data into tables/pcdb/data/).
- New test data layout: `sap worksheets/Additional data with api/
  <cert-ref>/{Summary_NNNNNN.pdf, dr87-0001-NNNNNN.pdf}`.
- Cert reference table with heating type, PCDB index, worksheet SAP,
  TFA, bp count, dwelling type for all 9 triples.
- Major scope discovery: 7 of 9 are Air Source Heat Pumps (PCDB
  104568 / 102421). The mapper has never been validated against HPs;
  cert 0380 pilot showed catastrophic deltas (Summary -70 / API -18
  SAP vs worksheet). Recommended deferring HP certs until boiler
  workflow is proven.
- Cert 0330 (mid-terrace gas boiler) pilot status: fixtures staged
  uncommitted; Summary path +0.47 SAP, API path +2.15 SAP vs
  worksheet 61.5993. Cascade-component diff localises 2 specific
  gaps (windows HLC +6.71 W/K likely from glazing_type=14 missing
  from Slice 93's transmission map; HW kWh +1060 needs §4
  subsystem probe).
- Tooling shortcut: use OPEN_EPC_API_TOKEN (not EPC_AUTH_TOKEN) in
  backend/.env with EpcClientService._fetch_certificate(cert_ref)
  to fetch raw JSON.
- First actions for next agent: confirm baseline, commit cert 0330
  fixtures, add RED Layer 2 test, iterate.

Lesson preserved: cohort hand-builts encode non-spec quirks
(e.g. has_suspended_timber_floor=False to override §(12) spec
inference and match the non-spec worksheet). Cross-check against
spec-inferred mapper output before trusting hand-built fields.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 17:36:56 +00:00
Khalim Conn-Kowlessar
a7b08a4e8f refactor: move docs/sap-spec/ contents into domain/sap10_calculator/
Locality of reference — SAP-specific docs, specs, and runtime data
now live alongside the calculator that consumes them, mirroring the
prior packages→domain layout moves.

Move targets:

- Narrative MDs → domain/sap10_calculator/docs/
    NEXT_AGENT_PROMPT.md, HANDOVER_NEXT.md, SAP_CALCULATOR.md
- Spec PDFs → domain/sap10_calculator/docs/specs/
    RdSAP 10 Specification 10-06-2025.pdf
    PCDF_Spec_Rev-06b_12_May_2021.pdf
    sap-10-2-full-specification-2025-03-14.pdf
    sap-10-3-full-specification-2026-01-13.pdf
- PCDB runtime data → domain/sap10_calculator/tables/pcdb/data/
    pcdb10.dat (8.3MB) + 7× pcdb_table_*.jsonl (18MB total)

Path code rewrites (load-bearing):

- tables/pcdb/__init__.py: replaced parents[4]/'docs'/'sap-spec' with
  Path(__file__).resolve().parent/'data' for Table 105 JSONL loading.
- tables/pcdb/postcode_weather.py: same rebase for the pcdb10.dat path
  read by _postcode_climate_table().
- tables/pcdb/etl.py __main__: same rebase for the manual ETL invocation
  (source + output_dir both now point inside the package).
- tests/test_pcdb_etl.py: _PCDB_DAT_PATH now derives from
  parents[1]/'tables'/'pcdb'/'data' (was parents[3]/'docs'/'sap-spec').

Citation rewrites:

- 12 .py docstrings and 4 .md docs (ADRs + READMEs + narrative docs)
  had `docs/sap-spec/<file>` strings rewritten to their new locations.
- Two cases where the catch-all sed misfired (an ADR-0009 line about a
  PCDB extract; the pcdb __init__.py docstring about ETL output) were
  hand-corrected to point at tables/pcdb/data/ rather than docs/specs/.

docs/sap-spec/ is now empty (will be removed in a follow-up sweep or
left as a vestigial empty dir for future repurposing). ADRs 0009 and
0010 remain at docs/adr/ — they're part of the chronological
cross-cutting decision log, not calculator-specific narrative.

Verified:

- Calculator's 1e-4 production gate
  (test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly) GREEN.
- Wider sweep (domain/sap10_calculator/ + domain/sap10_ml/): 1654
  passed / 20 failed — exact pre-move baseline. All 20 failures
  pre-existing (10 hand-built skeleton + 4 cohort chain + 6 cohort
  diff).
- Pyright net-zero on the 4 touched runtime/test files (0 errors)
  and unchanged on heat_transmission.py (13) / cert_to_inputs.py (35) /
  mapper.py (33).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 13:17:18 +00:00
Khalim Conn-Kowlessar
68401c517a refactor: lift-and-shift packages/domain/src/domain/ml → domain/sap10_ml
Sibling migration to the sap10_calculator move — `domain.ml` now lives
at the root-level layout (`domain/sap10_ml/`) matching the pattern
already used by `domain.addresses`, `domain.tasks`, `domain.postcode`,
and `domain.sap10_calculator`.

Changes:

- `git mv packages/domain/src/domain/ml → domain/sap10_ml` (19 files;
  history preserved).
- Subpackage rename: `domain.ml` → `domain.sap10_ml`. 32 references
  rewritten across .py and .md files: 11 internal + 21 external
  (datatypes/epc/domain/mapper.py, 14 files in domain/sap10_calculator,
  2 backend tests, 2 ADRs, 1 README, 1 design doc).
- Path-string updates: `pytest.ini` testpath
  `packages/domain/src/domain/ml/tests` → `domain/sap10_ml/tests` so
  ML tests stay in the default auto-discovered sweep. `CONTEXT.md`
  also updated.

`packages/domain/src/domain/` is now empty — the workspace `domain/`
tree has been fully migrated. Together with the `domain/__init__.py`
deletions from the sap10_calculator commit (29ac35cc), `domain` is
now a single root-level namespace package with subpackages
{addresses, sap10_calculator, sap10_ml, tasks} + the standalone
`postcode.py` module.

Verified:

- Focused sweep (backend mapper-chain + sap10_calculator worksheet
  e2e + golden fixtures): 99 passed / 19 failed — identical baseline.
- Wider sweep (all sap10_calculator + sap10_ml): 1654 passed / 20
  failed (same pre-existing failures).
- domain/sap10_ml/tests: 210/210 PASSED at new path.
- Pyright net-zero: heat_transmission.py 13, cert_to_inputs.py 35,
  mapper.py 33, rdsap_uvalues.py 1 (all unchanged from baseline).

Note: `packages/domain/pyproject.toml` still declares
`packages = ["src/domain"]` for the hatchling wheel — that target
directory is now empty and the wheel build is effectively a no-op.
Retiring the workspace package or repointing the wheel is a follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 13:01:35 +00:00
Khalim Conn-Kowlessar
29ac35ccbe refactor: lift-and-shift packages/domain/src/domain/sap → domain/sap10_calculator
Migration of the SAP 10.2 calculator package from the uv-workspace
src-layout (`packages/domain/src/domain/sap`) to the root-level layout
(`domain/sap10_calculator`), matching the pattern already used by
`domain.addresses` / `domain.tasks` / `domain.postcode`.

Changes:

- `git mv packages/domain/src/domain/sap → domain/sap10_calculator`
  (92 files; git auto-detected all as renames so blame/history is
  preserved).
- Subpackage rename: `domain.sap` → `domain.sap10_calculator`. 48
  Python files rewritten (`from domain.sap.X` → `from domain.sap10_
  calculator.X`); zero remaining `domain.sap` refs after the sed pass.
- Path-string updates: 3 .py files (test fixtures + xlsx loader) +
  6 markdown docs (CONTEXT.md, 2 ADRs, 3 sap-spec docs, sap10_
  calculator/README.md) had hard-coded `packages/domain/src/domain/
  sap/...` paths rewritten to `domain/sap10_calculator/...`.
- `Path(__file__).parents[N]` rebasing: the old tree was 3 levels
  deeper than the new one (`packages/domain/src/`), so 4× `parents[7]`
  became `parents[4]` and 1× `parents[6]` became `parents[3]` across
  `tables/pcdb/{__init__.py, postcode_weather.py, etl.py}`,
  `worksheet/tests/_xlsx_loader.py`, and `tests/test_pcdb_etl.py`.
- PEP 420 namespace package: deleted both `domain/__init__.py`
  (root + workspace, both load-bearing only as empty/docstring) so
  Python combines `domain.sap10_calculator` (root) and `domain.ml`
  (workspace) into one namespace package. Confirmed via
  `domain.__path__ == ['/workspaces/model/domain',
  '/workspaces/model/packages/domain/src/domain']`. Without this,
  the root `domain/__init__.py` shadowed the workspace one and
  `domain.ml` was unreachable.

Verified:

- Full sweep (`backend/documents_parser/tests/test_summary_pdf_
  mapper_chain.py + domain/sap10_calculator/worksheet/tests/test_
  e2e_elmhurst_sap_score.py + domain/sap10_calculator/rdsap/tests/
  test_golden_fixtures.py`): 99 passed / 19 failed — exact same
  counts as pre-refactor. All 19 failures pre-existing (9 hand-built
  001479 + 6 cohort diff + 4 cohort chain non-spec).
- Wider sweep (all sap10_calculator + domain.ml): 1654 passed /
  20 failed (the +1 vs the focused sweep is the pre-existing
  `test_roof_insulated_assumed_with_ni_thickness_uses_50mm_per_
  section_5_11_4` which was already failing on the previous baseline).
- Pyright net-zero on the three load-bearing baselines:
  `heat_transmission.py` 13, `cert_to_inputs.py` 35, `mapper.py` 33.

Lift-and-shift only — no semantic renames (`Sap10Calculator` stays
`Sap10Calculator`), no testpaths edits in pytest.ini (sap tests
continue to be invoked by explicit pytest paths).

Note: `domain.ml` still lives at `packages/domain/src/domain/ml/`.
Migrating it would close out the dual-`domain/` layout but is
out of scope for this commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 12:22:37 +00:00
Jun-te Kim
d0cf3d14ad get rid of comments 2026-05-20 13:21:11 +00:00
Jun-te Kim
8bb90a5aa5 sanitisation of postcode 2026-05-20 12:57:03 +00:00
Jun-te Kim
914a8ed51e postcode splliter working e2e 2026-05-20 11:07:40 +00:00
Jun-te Kim
6198d7a46d postcode_splitter: pure domain (UserAddress, sanitise_postcode, postcode_batching)
Slice 1/6 of the postcode_splitter refactor (Hestia-Homes/Model#1100).
Introduces the pure-domain foundation under domain/, with no AWS, Postgres,
or pandas. UserAddress is a frozen dataclass that sanitises its postcode in
__post_init__ via the canonical sanitise_postcode helper, and
iter_postcode_grouped_batches preserves the legacy splitter's batching
invariants (group-by-postcode in insertion order, never split a group,
oversize single-postcode groups dispatched whole, final flush). Updates
UBIQUITOUS_LANGUAGE.md so the User Address term covers both the dataclass
sense (preferred in domain code) and the raw upstream-string sense.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 16:45:47 +00:00
Jun-te Kim
54a674b5c8 added postcode splitter rewrite to ddd 2026-05-19 16:35:09 +00:00