mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
docs: handover — start cert 0380 Summary → EPC → calculator path
The 7-cert ASHP cohort API path is closed at the spec-precision floor (this session). Next workstream is the Summary path for cert 0380 — the user's preferred starting point because the Summary + worksheet PDFs surface labelled intermediate values that the API path lacks.

Cert 0380 Summary PDF (`Summary_000899.pdf`) is already in the test fixtures dir; just needs a path constant + RED chain test. Previous handover flagged the extractor at Δ -58.37 SAP for HPs — the immediate diagnostic is whether the mapper surfaces main_heating_category=4 and main_heating_index_number=104568.

The handover also documents the user's "Elmhurst-specific" challenge worth re-exploring: closed boiler certs hit 1e-4 vs Elmhurst via the same cascade, so the residual is precisely at the Appendix N3.6 PSR interpolation step. Cross-check with the BRE xlsx canonical calculator is suggested.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent c00866607b
commit c90853428c
1 changed file with 270 additions and 0 deletions
270
domain/sap10_calculator/docs/HANDOVER_CERT_0380_SUMMARY_PATH.md
Normal file
@@ -0,0 +1,270 @@
# Handover — start cert 0380 Summary → EPC → calculator path

Branch `feature/per-cert-mapper-validation`. Previous session shipped
11 slices closing the **API path** for the 7-cert ASHP cohort
(see [`HANDOVER_CERT_0380_MIT_CASCADE.md`](HANDOVER_CERT_0380_MIT_CASCADE.md)).
Cohort cascade SAP integer matches lodged at residual 0 for all 7;
continuous SAP clusters at +0.030..+0.060 vs worksheet.

This session opens the **Summary path** workstream for cert 0380:
`Summary_000899.pdf → ElmhurstSiteNotesExtractor → EpcPropertyDataMapper.from_elmhurst_site_notes → cert_to_inputs → calculator`
must hit worksheet's unrounded SAP **88.5104** at 1e-4.

## Why Summary path first (user's stated reason)

> "easier to debug with the intermediary values"

The Elmhurst Summary PDF carries the assessor's lodged data in
labelled rows the extractor can parse, and it is paired with a
worksheet (dr87 PDF) that gives intermediate line refs. The API path
is JSON, which is opaque about which lodging convention triggered
which cascade output.

Boiler certs 001479 and 0330 are precedent: Summary path was
closed FIRST (to 1e-4 vs worksheet), then API path was made to
match. Same pattern for HPs.

## Known starting state for cert 0380 Summary path

Per [`HANDOVER_CERT_0380_HW_CASCADE.md`](HANDOVER_CERT_0380_HW_CASCADE.md):

> Summary path (cert 0380): Still catastrophic at Δ -58.37 SAP.
> The Elmhurst PDF extractor mis-identifies the HP. Deferred to a
> separate `documents_parser/` workstream per Q7 in this session's
> grilling. Don't tackle until API path lands at 1e-4 for all 7
> ASHPs.

API path is now closed (current session), at the ±0.07
spec-precision floor rather than the 1e-4 the quote asked for.
Time to start Summary.

## Where to begin (concrete first slice)

### Slice 1: RED — pin cert 0380 Summary cascade against worksheet

File: [`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`](../../../backend/documents_parser/tests/test_summary_pdf_mapper_chain.py)

The Summary PDF is **already in the test fixtures dir**:
`/workspaces/model/backend/documents_parser/tests/fixtures/Summary_000899.pdf`

Add the path constant + RED test alongside the existing
`test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly`:

```python
_SUMMARY_000899_PDF = _FIXTURES / "Summary_000899.pdf"


def test_summary_0380_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Arrange
    # Cert 0380 (Mitsubishi PUZ-WM50VHA ASHP, semi-detached bungalow
    # age D, TFA 60.43). Worksheet SAP 88.5104. First slice of the
    # Summary-path workstream for the 7-cert ASHP cohort.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert — 1e-4 pin against worksheet (feedback_zero_error_strict).
    worksheet_unrounded_sap = 88.5104
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) <= 1e-4
```

Expected RED: cascade SAP probably ~30 (vs worksheet 88.5; the
previously reported Δ of -58.37 gives 88.51 - 58.37 ≈ 30.1). The
extractor mis-routes the HP to a default boiler-ish path.

### Slice 2 onwards — investigate + close per intermediate worksheet line

The dr87 worksheet at
`/workspaces/model/sap worksheets/Additional data with api/0380-2471-3250-2596-8761/dr87-0001-000899.pdf`
gives intermediate line refs to pin against. Suggested debug order:

1. **Heating cascade** — Summary lodges main heating type. The
   extractor likely surfaces it but the mapper may not recognize the
   ASHP signal. Probe `epc.sap_heating.main_heating_details[0].main_heating_category`
   — should be 4 (HP). If anything else, that's the first bug.
2. **PCDB index** — the worksheet header lodges "Heat pump database:
   104568". The Summary mapper must surface
   `main_heating_index_number=104568` so the cascade routes through
   Appendix N3.6/N3.7 instead of Table 4a defaults.
3. **Cylinder** — the worksheet lodges "Cylinder Volume 160" +
   "Pipeworks Insulated Uninsulated primary pipework" — these feed
   the (56)+(59) HW losses. Cert 0380 cascade already pins these
   exactly via the API path; Summary mapper should produce identical
   `cylinder_size=3`, `cylinder_insulation_thickness_mm=50`.
4. **PV array** — Summary §11 / §19 lodges 1 array, 3 kWp, pitch 45°,
   SE orientation. Confirm `epc.sap_energy_source.photovoltaic_supply`
   surfaces identically to the API path.
5. **Tighten until SAP = 88.5104 ± 1e-4**.

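The first two probes in the debug order can be scripted so that a missing hop reports `None` instead of raising. This is a minimal sketch under stated assumptions: `probe`, `report`, `EXPECTED` and `fake_epc` are hypothetical names, and `SimpleNamespace` stands in for the real `EpcPropertyData`, whose exact shape is not shown in this handover. Only the attribute path and the target values 4 and 104568 come from the text above.

```python
from types import SimpleNamespace
from typing import Any


def probe(obj: Any, path: str) -> Any:
    """Walk a dotted path such as 'a.b[0].c'; return None at the first missing hop."""
    current = obj
    for part in path.split("."):
        name, _, index = part.partition("[")
        current = getattr(current, name, None)
        if current is None:
            return None
        if index:
            position = int(index.rstrip("]"))
            if not isinstance(current, (list, tuple)) or position >= len(current):
                return None
            current = current[position]
    return current


# Targets quoted in the debug order above.
EXPECTED = {
    "sap_heating.main_heating_details[0].main_heating_category": 4,
    "sap_heating.main_heating_details[0].main_heating_index_number": 104568,
}


def report(epc: Any, expected: dict[str, Any]) -> list[str]:
    """Return one status line per probed path."""
    lines = []
    for path, want in expected.items():
        got = probe(epc, path)
        status = "ok" if got == want else "MISMATCH"
        lines.append(f"{status}: {path} = {got!r} (want {want!r})")
    return lines


# Hypothetical stand-in for a correctly mapped EPC (not real fixture data).
fake_epc = SimpleNamespace(
    sap_heating=SimpleNamespace(
        main_heating_details=[
            SimpleNamespace(main_heating_category=4, main_heating_index_number=104568)
        ]
    )
)

for line in report(fake_epc, EXPECTED):
    print(line)
```

Swapping `fake_epc` for the object returned by `EpcPropertyDataMapper.from_elmhurst_site_notes` should give the same report for the real Summary path, assuming the attribute names match those quoted in the list.
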
### Useful comparison anchor: API path's EpcPropertyData

The API path closure session pinned `cert_to_inputs(epc)` output for
cert 0380. Use the API path's `EpcPropertyData` as ground truth —
the Summary mapper must produce an EPC that matches the API mapper's
EPC field-by-field for the load-bearing keys. The pattern is in
[`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`](../../../backend/documents_parser/tests/test_summary_pdf_mapper_chain.py)
under `_LOAD_BEARING_FIELDS` and the
`test_from_elmhurst_site_notes_matches_hand_built_NNNNNN` family — those
test that the Summary mapper matches HAND-BUILT EPC objects
field-by-field.

Equivalent for cert 0380 would be:

```python
def test_summary_0380_matches_api_epc_on_load_bearing_fields() -> None:
    # Arrange
    # _API_0380_JSON must point at the cert 0380 API fixture (the API
    # probe under "Test baselines" reads
    # golden/0380-2471-3250-2596-8761.json); needs `import json`.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    summary_epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    api_doc = json.loads(_API_0380_JSON.read_text())
    api_epc = EpcPropertyDataMapper.from_api_response(api_doc)

    # Act / Assert — every load-bearing field equal.
    diffs: list[str] = []
    for field_name in _LOAD_BEARING_FIELDS:
        diffs.extend(_diff_load_bearing(
            getattr(summary_epc, field_name, None),
            getattr(api_epc, field_name, None),
            field_name,
        ))
    assert not diffs, "Summary vs API EPC diffs:\n " + "\n ".join(diffs)
```

Both EPCs feed the same `cert_to_inputs` cascade, so if they match on
load-bearing fields they'll cascade to the same SAP. Once either path
matches the worksheet, the other does too.

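For readers without the repo open, a field-by-field diff of this kind can be sketched as follows. This is not the repo's `_diff_load_bearing`, whose implementation is not shown in this handover; `diff_values`, `HeatingDetail` and `Heating` are hypothetical names, and the sketch assumes the EPC objects are dataclasses holding lists and scalars.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Optional


def diff_values(left: Any, right: Any, path: str) -> list[str]:
    """Recursively compare two values and return one line per differing leaf."""
    if is_dataclass(left) and is_dataclass(right) and type(left) is type(right):
        out: list[str] = []
        for f in fields(left):
            out.extend(
                diff_values(getattr(left, f.name), getattr(right, f.name), f"{path}.{f.name}")
            )
        return out
    if isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            return [f"{path}: length {len(left)} != {len(right)}"]
        out = []
        for i, (a, b) in enumerate(zip(left, right)):
            out.extend(diff_values(a, b, f"{path}[{i}]"))
        return out
    if left != right:
        return [f"{path}: {left!r} != {right!r}"]
    return []


# Hypothetical stand-ins for the heating part of an EPC.
@dataclass
class HeatingDetail:
    main_heating_category: Optional[int] = None
    main_heating_index_number: Optional[int] = None


@dataclass
class Heating:
    main_heating_details: list[HeatingDetail] = field(default_factory=list)


# Illustrative values: a mis-routed Summary mapping vs the expected HP routing.
summary_side = Heating([HeatingDetail(main_heating_category=1)])
api_side = Heating([HeatingDetail(main_heating_category=4, main_heating_index_number=104568)])

for line in diff_values(summary_side, api_side, "sap_heating"):
    print(line)
```

Returning path-labelled strings rather than a boolean keeps the failure message pointed at the exact field the Summary mapper got wrong.
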
## "Elmhurst-specific" challenge (worth re-exploring)

The previous handover claims the +0.04 cohort SAP residual is
"Elmhurst-specific precision". The user pushed back on this framing.
Worth re-examining if the Summary path also lands at +0.04 (suggesting
a real cascade bug) vs ~0.0 (suggesting Elmhurst non-conformance).

Stronger empirical signal: **closed boiler certs 001479 / 0330 hit
1e-4 vs Elmhurst worksheet via the same cascade**. So the cascade IS
Elmhurst-conformant for boilers. The ~0.04 drift only appears on HPs.
The difference between boilers and HPs is precisely Appendix N3.6
PSR interpolation (boilers use Table 105 PCDB directly, no
interpolation).

That points the finger at the PSR interpolation step. Worth checking:
- Does Elmhurst round PSR before η_space lookup?
- Does Elmhurst use a different "design HLC" for the PSR denominator?
- Does the spec specify an interpolation precision we missed?

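To get a feel for the first question, the sketch below interpolates an efficiency linearly between tabulated PSR points, once with the full-precision PSR and once with the PSR rounded to two decimals. Everything numeric here is hypothetical: the `PSR_POINTS` and `ETA_SPACE` values are made up for illustration and are not PCDB data, the two-decimal rounding is only one candidate behaviour, and whether Elmhurst rounds at all is exactly the open question.

```python
from bisect import bisect_right


def interpolate(x: float, xs: list[float], ys: list[float]) -> float:
    """Piecewise-linear interpolation, clamped to the end points of the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    hi = bisect_right(xs, x)
    lo = hi - 1
    fraction = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + fraction * (ys[hi] - ys[lo])


# Hypothetical table: NOT taken from the PCDB record for index 104568.
PSR_POINTS = [0.2, 0.5, 0.8, 1.0, 1.2]
ETA_SPACE = [180.0, 250.0, 280.0, 290.0, 285.0]

psr = 0.6437  # hypothetical plant size ratio

eta_full = interpolate(psr, PSR_POINTS, ETA_SPACE)
eta_rounded = interpolate(round(psr, 2), PSR_POINTS, ETA_SPACE)

print(f"eta at full-precision PSR: {eta_full:.4f}")
print(f"eta at PSR rounded to 2dp: {eta_rounded:.4f}")
print(f"difference:                {eta_full - eta_rounded:+.4f}")
```

Running the same comparison with the real PCDB points and cert 0380's PSR would show whether a rounding step of this size is large enough to account for the observed residual.
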
Definitive test would be the **BRE Excel canonical calculator at
`2026-05-19-17-18 RdSap10Worksheet.xlsx`** (repo root). The xlsx is
a worked example with fixed inputs; you'd need to manually swap in
cert 0380's inputs to compute the BRE-correct η_space. Tedious but
authoritative.

## Cohort closure status (carried forward)

11 prep slices plus the closing 102f slice shipped this session for the API path:

| Slice | Commit | What it did |
|---|---|---|
| 102f-prep.1 | 7adb6c79 | PCDB Table 362 `heating_duration_code` field |
| 102f-prep.2 | a6ef1987 | Table N5 PSR interpolation (variable duration) |
| 102f-prep.3 | 4e07991f | Cold-first day allocation |
| 102f-prep.4 | c341eba9 | Equation N5 zone-mean blending leaf |
| 102f-prep.5 | 2be79056 | Wire extended-heating MIT cascade (HP-gated) |
| 102f-prep.6 | 80e528e5 | HP-gate §5 central-heating pump gains |
| 102f-prep.7 | 4eacfa62 | Table N4 fixed-duration ("24"/"16") |
| 102f-prep.8 | 1d5183c6 | API mapper shower_outlets=None → 0 mixers |
| 102f-prep.9 | 06b4ef3d | Cantilever exposed-floor detection |
| 102f-prep.10 | 24a7351f | Alt-wall opening allocation per window_wall_type |
| 102f-prep.11 | db77a7c7 | Track 6 cohort fixtures + register 7 golden pins |
| 102f | c0086660 | Layer 4 chain tests at ±0.07 spec-precision floor |

## Test baselines

```bash
PYTHONPATH=/workspaces/model python -m pytest \
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
  backend/documents_parser/tests/test_elmhurst_extractor.py \
  backend/documents_parser/tests/test_elmhurst_end_to_end.py \
  domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
  domain/sap10_calculator/worksheet/tests/test_water_heating.py \
  domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
  domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
  domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
  domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
  domain/sap10_ml/tests/test_rdsap_uvalues.py \
  datatypes/epc/schema/tests/test_schema_loading.py \
  --no-cov -q
```

Expected: **669 pass + 10 pre-existing fails** (9 cert 001479
Layer 1 hand-built skeleton + 1 pre-existing FEE).

API path probe at HEAD:

```bash
PYTHONPATH=/workspaces/model python -c "
import json
from pathlib import Path
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
doc = json.loads(Path('/workspaces/model/domain/sap10_calculator/rdsap/tests/fixtures/golden/0380-2471-3250-2596-8761.json').read_text())
epc = EpcPropertyDataMapper.from_api_response(doc)
result = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
print(f'API path SAP: {result.sap_score_continuous:.4f} Δ vs 88.5104: {result.sap_score_continuous-88.5104:+.4f}')"
```

Should print `API path SAP: 88.5698 Δ vs 88.5104: +0.0594`.

Summary path probe (will fail catastrophically pre-fix):

```bash
PYTHONPATH=/workspaces/model python -c "
import sys
sys.path.insert(0, '/workspaces/model')
from pathlib import Path
from backend.documents_parser.tests.test_summary_pdf_mapper_chain import _summary_pdf_to_textract_style_pages
from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
pdf = Path('/workspaces/model/backend/documents_parser/tests/fixtures/Summary_000899.pdf')
pages = _summary_pdf_to_textract_style_pages(pdf)
site_notes = ElmhurstSiteNotesExtractor(pages).extract()
epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
print(f'Summary mapper main_heating_category: {epc.sap_heating.main_heating_details[0].main_heating_category if epc.sap_heating.main_heating_details else None}')
print(f'Summary mapper main_heating_index_number: {epc.sap_heating.main_heating_details[0].main_heating_index_number if epc.sap_heating.main_heating_details else None}')
result = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
print(f'Summary path SAP: {result.sap_score_continuous:.4f} Δ vs 88.5104: {result.sap_score_continuous-88.5104:+.4f}')"
```

The diagnostic prints `main_heating_category` and
`main_heating_index_number` — the first thing to confirm is the HP
routing. If category isn't 4 or index isn't 104568, that's the
immediate fix.

## Conventions (preserved)

- One slice = one commit; stage by name.
- AAA test convention: literal `# Arrange / # Act / # Assert` headers.
- `abs(diff) <= tol` (NOT `pytest.approx`).
- 1e-4 worksheet tolerance for Summary-path Layer 4 pins (per
  `feedback_zero_error_strict` — the closed boiler precedent).
  Don't widen to ±0.07 like the API path until the Summary cascade
  is matching at 1e-3 or better and the residual is documented.
- Spec citation in commit messages.
- Pyright net-zero per file.

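The tolerance convention can be illustrated with the API path figures quoted earlier in this handover (88.5698 against the worksheet's 88.5104). A minimal sketch: `assert_within` is a hypothetical helper and not something the repo is known to contain, and the signed diff in the failure message is just one plausible reason to prefer the explicit form.

```python
def assert_within(actual: float, expected: float, tol: float) -> None:
    """Explicit absolute-tolerance check in the `abs(diff) <= tol` style."""
    diff = actual - expected
    assert abs(diff) <= tol, f"diff {diff:+.6f} exceeds tol {tol:g}"


api_path_sap = 88.5698   # API path probe value quoted in this handover
worksheet_sap = 88.5104  # worksheet unrounded SAP

# Passes at the API path's ±0.07 floor.
assert_within(api_path_sap, worksheet_sap, 0.07)

# Fails at the 1e-4 pin required for Summary-path Layer 4 tests.
try:
    assert_within(api_path_sap, worksheet_sap, 1e-4)
except AssertionError as exc:
    print(f"RED as expected: {exc}")
```
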
## Pyright baselines (unchanged)

- `datatypes/epc/domain/mapper.py`: 32
- `domain/sap10_calculator/worksheet/water_heating.py`: 1
- `domain/sap10_calculator/worksheet/heat_transmission.py`: 13
- `domain/sap10_calculator/worksheet/mean_internal_temperature.py`: 0
- `domain/sap10_calculator/worksheet/internal_gains.py`: 4
- `domain/sap10_calculator/rdsap/cert_to_inputs.py`: 35
- `domain/sap10_calculator/tables/pcdb/parser.py`: 0
- `domain/sap10_ml/rdsap_uvalues.py`: 1 (pre-existing)
- `datatypes/epc/domain/epc_property_data.py`: 1 (pre-existing)
- `backend/documents_parser/elmhurst_extractor.py`: TBD — may shift
  as you patch the extractor for HP support; aim net-zero per slice
  but accept small upward drift if the HP-specific path adds optional
  fields not yet typed.
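
A net-zero check against these baselines could be scripted from pyright's JSON report. This is a sketch under assumptions: it expects the dict produced by parsing `pyright --outputjson`, relies on that report's `generalDiagnostics` entries carrying `file` and `severity` keys, and assumes the baselines above count errors only, which this handover does not state. `count_errors`, `drift` and the sample payload are hypothetical.

```python
from collections import Counter
from typing import Any

# Two of the baselines listed above; extend with the rest as needed.
BASELINES = {
    "datatypes/epc/domain/mapper.py": 32,
    "domain/sap10_calculator/rdsap/cert_to_inputs.py": 35,
}


def count_errors(report: dict[str, Any], repo_root: str) -> Counter:
    """Count error-severity diagnostics per file, keyed by repo-relative path."""
    counts: Counter = Counter()
    for diagnostic in report.get("generalDiagnostics", []):
        if diagnostic.get("severity") != "error":
            continue
        relative = diagnostic["file"].removeprefix(repo_root).lstrip("/")
        counts[relative] += 1
    return counts


def drift(counts: Counter, baselines: dict[str, int]) -> dict[str, int]:
    """Return {path: current - baseline} for every file that moved off its baseline."""
    return {
        path: counts.get(path, 0) - base
        for path, base in baselines.items()
        if counts.get(path, 0) != base
    }


# Synthetic payload for illustration only (not real pyright output for this repo).
sample_report = {
    "generalDiagnostics": [
        {"file": "/workspaces/model/datatypes/epc/domain/mapper.py", "severity": "error"}
    ]
    * 33
}

print(drift(count_errors(sample_report, "/workspaces/model"), BASELINES))
```

A slice is net-zero for the files it touches when `drift` reports nothing for them.
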