docs: handover — Summary + API cohort expansion to 38 additional certs

Hands off the next workstream: the 38 cert subdirs at
`sap worksheets/additional with api 2/`. Each subdir is named after
the 20-digit EPC cert reference and contains a Summary PDF + dr87
worksheet PDF. API JSONs are NOT in the dataset but ARE fetchable
via the existing `EpcClientService` (token in `backend/.env` as
`OPEN_EPC_API_TOKEN`).

User's stated ordering: Elmhurst Summary mapping FIRST, API path
SECOND. Folder names = cert refs; need to verify the matching before
bulk-pinning (any mis-filed PDF would silently invalidate slice
work).

Handover ships with verified dataset and first-attempt baselines:

  - Folder-vs-cert sweep: **38/38 match** at handover (postcode
    parity check between Summary PDF and Open EPC API).
  - First-attempt Summary-path probe across 38 certs:
      24  closed at ±0.07 (first-try, zero new slices needed)
       9 ~ small gap (<1 SAP) — likely 1 slice each
       3 ✗ big gap (>1 SAP) — multi-slice investigation
       2 RAISES UnmappedElmhurstLabel: cylinder_size='Normal'

The two `Normal` cylinder raises are the immediate Phase 1 slice —
Slice S0380.15's strict-enum pattern paid off on its first new
cohort by surfacing the gap at extraction time instead of as a
downstream SAP delta.

Workstream phases documented in the handover:

  Phase 0: folder-vs-cert sweep (already done — 38/38)
  Phase 1: fix 'Normal' cylinder unmapped-label raise
  Phase 2: bulk-pin the 24 first-try-closures as chain tests
  Phase 3: close the 9 small-gap certs one slice each
  Phase 4: investigate the 3 big-gap certs (likely HP-routing)
  Phase 5: fetch + persist API JSON for all 38, run API path tests
  Phase 6: cross-mapper EPC parity (Summary EPC ≡ API EPC) — the
    user's stated north-star

Includes:
  - Paste-able diagnostic probe scripts (Summary path + folder-vs-
    cert sweep + .env loader + EpcClientService usage example).
  - Full table of first-attempt deltas per cert with classifications.
  - All 15 prior-session slice commits indexed.
  - Memory references to the slicing / methodology conventions.
  - Per-cert diagnostic recipe template.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-05-27 22:22:13 +00:00
parent d7ca179ec0
commit 92fc4f4f16

@ -0,0 +1,448 @@
# Handover — Summary + API cohort expansion to 38 additional certs
Branch `feature/per-cert-mapper-validation`. Previous session shipped 15 slices
(S0380.1 → S0380.15) closing the 7-cert ASHP cohort Summary path at the ±0.07
Appendix N3.6 PSR-precision floor and establishing the strict-enum pattern.
This handover opens the **38-cert cohort expansion** workstream.
**HEAD at handover start:** `d7ca179e` (Slice S0380.15: strict-enum raising
on unmapped cylinder labels).
## User's stated goal (preserved verbatim)
> Awesome - could you write a handover for a new agent to pick this up.
> I've added some more test cases, in the same format, in here:
> `sap worksheets/additional with api 2`
> We should check that the Elmhurst mapping works and then the api
> the folder name is the certificate number. We can use the EPC api to get
> the api responses. We should check I've matched correctly. The api token
> is in backend/.env and is OPEN_EPC_API_TOKEN
**Ordering:** Elmhurst Summary mapping FIRST (Summary PDFs + dr87 worksheets
ship in each folder), API path SECOND (fetched live via `EpcClientService`).
Along the way: **verify the folder name actually matches the cert**. The full
38-cert postcode-parity sweep came back clean at handover, but re-run it if
the dataset changes, before mapping work compounds errors on a mis-filed cert.
## The new dataset
`/workspaces/model/sap worksheets/additional with api 2/` — 38 cert subdirs.
Each subdir is named after the **20-digit EPC certificate reference** (e.g.
`0036-6325-1100-0063-1226`) and contains:
- `Summary_NNNNNN.pdf` — Elmhurst Summary PDF (drives the Summary path)
- `dr87-0001-NNNNNN.pdf` — dr87 worksheet PDF (spec anchor; lodges
`SAP value` + every cascade line ref)
The 6-digit suffix is the Elmhurst worksheet number, NOT the cert ref.
**Folder-name verification — full 38-cert sweep at handover time: 38/38 ✅**
All postcode-extracted-from-Summary-PDF values match the Open EPC API
postcode for the folder-name cert reference. Dataset is clean.
(Caveat: the sweep iterator picked up a `.DS_Store` macOS metadata file.
Skip non-directory entries in your iterators, e.g. `cert_dirs = [cd for cd in sorted(src.iterdir()) if cd.is_dir() and not cd.name.startswith('.')]`.)
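A minimal sketch of such an iterator follows. The helper name `iter_cert_dirs` and the folder-name regex are assumptions for illustration (the regex encodes the five-groups-of-four-digits shape seen in the example cert reference); neither exists in the repo.

```python
import re
from pathlib import Path
from typing import Iterator

# Assumed folder-name shape: five dash-separated groups of four digits.
CERT_REF_RE = re.compile(r"\d{4}(?:-\d{4}){4}")


def iter_cert_dirs(root: Path) -> Iterator[Path]:
    """Yield cert subdirectories in sorted order, skipping stray files."""
    for entry in sorted(root.iterdir()):
        if not entry.is_dir() or entry.name.startswith("."):
            continue  # e.g. .DS_Store or any other non-directory entry
        if CERT_REF_RE.fullmatch(entry.name) is None:
            # Fail loudly: a typo'd folder name would invalidate slice work.
            raise ValueError(f"unexpected folder name: {entry.name!r}")
        yield entry
```

Raising on an unexpected folder name is a design choice in this sketch, not something the existing sweep does; drop that check if the dataset legitimately holds other directories.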
## First-attempt Summary-path probe (run at HEAD `d7ca179e`)
24 of 38 certs (63%) close first-try at ±0.07 — strong validation that the
ASHP-cohort mapper work amortizes. Distribution:
| Status | Count | Disposition |
|---|---|---|
| ✅ Closed at ±0.07 | **24** | Add chain tests; zero new slices needed |
| ~ Small gap (<1 SAP) | 9 | 1-2 slices each, similar to certs 0350 / 2225 |
| ✗ Big gap (>1 SAP) | 3 | Multi-slice investigation per cert |
| RAISES UnmappedElmhurstLabel | **2** | First strict-enum catches — fix immediately |
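The bucketing in this table can be sketched as a small function. The name `classify_delta` is hypothetical, and the treatment of a delta of exactly 1 SAP as "big" is an assumption, since the table only says `<1` and `>1`.

```python
def classify_delta(delta: float, closed_tol: float = 0.07, big_gap: float = 1.0) -> str:
    """Bucket a (Summary SAP - worksheet SAP) delta into the table's statuses."""
    magnitude = abs(delta)
    if magnitude <= closed_tol:
        return "closed"  # inside the ±0.07 spec floor
    if magnitude < big_gap:
        return "small"   # under 1 SAP
    return "big"         # 1 SAP or more (boundary handling is an assumption)
```

For example, `classify_delta(-0.3737)` gives `"small"` and `classify_delta(-15.8075)` gives `"big"`.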
### Detailed first-attempt Summary deltas
```
cert WS SAP Summary delta result
0036-6325-1100-0063-1226 62.7471 62.3734 -0.3737 ~ small
0100-5141-0522-4696-3463 85.8332 85.8668 +0.0336 ✅
0200-3155-0122-2602-3563 80.8674 80.8674 -0.0000 ✅
0300-2403-2650-2206-0235 76.6541 76.6541 +0.0000 ✅
0310-2763-5450-2506-3501 78.3593 77.6061 -0.7532 ~ small
0320-2126-2150-2326-6161 71.7224 71.7224 +0.0000 ✅
0320-2756-8640-2296-1101 89.9458 89.9879 +0.0421 ✅
0330-2257-3640-2196-3145 84.6541 84.6966 +0.0425 ✅
0360-2266-5650-2106-8285 80.4680 80.4680 +0.0000 ✅
0380-2530-6150-2326-4161 65.7795 65.7795 +0.0000 ✅
0390-2066-4250-2026-4555 65.3253 64.9942 -0.3311 ~ small
0464-3032-0205-4276-3204 80.4533 79.9249 -0.5284 ~ small
0652-3022-1205-2826-1200 70.9577 72.8813 +1.9236 ✗ big
1536-9325-5100-0433-1226 65.8928 65.8928 -0.0000 ✅
2007-3011-9205-8136-3204 68.3914 68.3914 -0.0000 ✅
2031-3007-0205-1296-3204 64.1734 64.1734 +0.0000 ✅
2102-3018-0205-7886-5204 63.8732 48.0657 -15.8075 ✗ big (HW or HP?)
2130-3018-4205-4686-5204 71.3158 71.3158 +0.0000 ✅
2336-3124-3600-0517-1292 83.4955 83.5381 +0.0426 ✅
2536-2525-0600-0788-2292 79.7264 RAISES Unmapped: cylinder_size='Normal'
2590-3025-7205-9066-0200 65.9194 65.9194 -0.0000 ✅
2699-3025-5205-8066-0200 68.7535 68.7535 +0.0000 ✅
2800-7999-0322-4594-3563 78.1408 78.1665 +0.0257 ✅
3136-7925-4500-0246-6202 77.8872 77.1341 -0.7531 ~ small
3336-2825-9400-0512-8292 78.3739 78.4413 +0.0674 ✅
4536-5424-8600-0109-1226 82.4974 82.5412 +0.0438 ✅
4536-8325-3100-0409-1222 65.6000 65.1680 -0.4320 ~ small
4800-3992-0422-0599-3563 86.7192 86.7688 +0.0496 ✅
6835-3920-2509-0933-5226 80.1977 65.6387 -14.5590 ✗ big (HW or HP?)
7700-3362-0922-7022-3563 63.4425 63.0024 -0.4401 ~ small
7800-1501-0922-7127-3563 64.7504 64.5072 -0.2432 ~ small
7836-3125-0600-0526-2202 80.1792 80.1389 -0.0403 ✅
9036-0824-3500-0420-8222 84.2727 84.3227 +0.0500 ✅
9370-3060-1205-3546-4204 87.8687 87.8946 +0.0259 ✅
9380-2957-7490-2595-3141 74.5902 74.6175 +0.0273 ✅
9421-3045-3205-1646-6200 87.4495 RAISES Unmapped: cylinder_size='Normal'
9796-3058-6205-0346-9200 90.1318 90.6983 +0.5665 ~ small
9836-7525-9500-0575-1202 75.2223 75.2203 -0.0020 ✅
```
Run the probe yourself to confirm the baseline before slicing — script in
"Diagnostic probe script" below.
## API path is fetchable, not deferred
The Open EPC API is reachable via the existing client
[`backend/epc_client/epc_client_service.py`](../../../backend/epc_client/epc_client_service.py).
Token sits in `backend/.env` as `OPEN_EPC_API_TOKEN`. Minimal example
(confirmed working at handover time):
```python
import os
from pathlib import Path
# Load .env (no python-dotenv assumption — manual parse works)
for line in Path('/workspaces/model/backend/.env').read_text().splitlines():
    line = line.strip()
    if not line or line.startswith('#') or '=' not in line:
        continue
    k, v = line.split('=', 1)
    os.environ[k.strip()] = v.strip().strip('"').strip("'")
from backend.epc_client.epc_client_service import EpcClientService
svc = EpcClientService(auth_token=os.environ["OPEN_EPC_API_TOKEN"])
# Returns the raw API JSON dict (the same shape that
# `EpcPropertyDataMapper.from_api_response` consumes):
raw_json = svc._fetch_certificate("0036-6325-1100-0063-1226")
# Or skip straight to the mapped EPC:
epc = svc.get_by_certificate_number("0036-6325-1100-0063-1226")
```
For the 38-cert sweep, persist the raw JSON to disk so future runs are
offline + deterministic:
```bash
mkdir -p /workspaces/model/domain/sap10_calculator/rdsap/tests/fixtures/golden
# write each `raw_json` to <cert_ref>.json — matches the existing
# golden/<cert>.json convention used by the 7-cert ASHP cohort.
```
Rate-limit caveat: the client raises `EpcRateLimitError` with a
`retry_after` hint on HTTP 429. The existing `call_with_retry` wrapper at
`backend/epc_client/_retry.py` handles backoff. Be polite — sleep 0.5s
between fetches on the bulk sweep.
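A polite fetch-and-persist loop might look like the sketch below. `fetch_and_persist` is a hypothetical helper, and `fetch` is any callable that returns the raw JSON dict for a cert reference. In practice that would presumably be `svc._fetch_certificate`, possibly wrapped in the existing `call_with_retry`, whose signature is not shown here and so is not assumed.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Iterable


def fetch_and_persist(
    cert_refs: Iterable[str],
    fetch: Callable[[str], dict[str, Any]],
    out_dir: Path,
    delay_s: float = 0.5,
) -> list[Path]:
    """Fetch each cert's raw JSON once and write it to <out_dir>/<cert_ref>.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for cert_ref in cert_refs:
        target = out_dir / f"{cert_ref}.json"
        if target.exists():
            continue  # already persisted, so reruns stay offline
        raw = fetch(cert_ref)
        target.write_text(json.dumps(raw, indent=2, sort_keys=True))
        written.append(target)
        time.sleep(delay_s)  # be polite between live fetches
    return written
```

Skipping files that already exist means an interrupted sweep can be resumed without refetching the certs that succeeded.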
## Recommended workstream order
### Phase 0 — Folder-vs-cert sweep (already done at handover time — clean)
Already run at handover: **38/38 match**. Re-run if the dataset has
changed since handover. Fail loudly on any new mismatch. If mismatches
exist, audit the cert dir (likely a typo'd folder name or a misplaced
PDF) before sinking slice work into a wrong-cert mapping.
```python
# (uses the .env loader + svc from above)
import re
from pathlib import Path
src = Path('/workspaces/model/sap worksheets/additional with api 2')
from backend.documents_parser.tests.test_summary_pdf_mapper_chain import _summary_pdf_to_textract_style_pages
mismatches = []
for cd in sorted(src.iterdir()):
    if not cd.is_dir() or cd.name.startswith('.'):
        continue  # skip .DS_Store and other stray files
    cert_ref = cd.name
    sp = next(cd.glob("Summary_*.pdf"), None)
    if sp is None:
        mismatches.append((cert_ref, "no Summary PDF"))
        continue
    text = "\n".join(_summary_pdf_to_textract_style_pages(sp))
    # First UK-style postcode found in the Summary PDF text
    m = re.search(r"\b([A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2})\b", text)
    pdf_pc = (m.group(1) if m else "").replace(" ","").upper()
    try:
        api_pc = (svc._fetch_certificate(cert_ref).get("postcode","") or "").replace(" ","").upper()
        if pdf_pc != api_pc:
            mismatches.append((cert_ref, f"PDF={pdf_pc!r} vs API={api_pc!r}"))
    except Exception as e:
        mismatches.append((cert_ref, f"API ERROR: {type(e).__name__}"))
print(f"{len(mismatches)} mismatches:", mismatches)
```
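The postcode comparison inside the sweep can be isolated for unit testing, as in the sketch below. The helper names are hypothetical; the regex is the one used in the sweep. Note that it returns the first postcode-shaped token in the text, which is only the property's postcode if no other postcode (for example an assessor address) appears earlier in the Summary PDF.

```python
import re
from typing import Optional

UK_POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2})\b")


def normalise_postcode(raw: Optional[str]) -> str:
    """Strip spaces and upper-case so 'sw1a 1aa' and 'SW1A1AA' compare equal."""
    return (raw or "").replace(" ", "").upper()


def first_postcode(text: str) -> str:
    """Return the first postcode-shaped token in text, normalised, or ''."""
    m = UK_POSTCODE_RE.search(text)
    return normalise_postcode(m.group(1)) if m else ""
```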
### Phase 1 — Strict-enum catches (immediate, lowest-investigation)
**First slice:** `cylinder_size='Normal'` → cascade code. Two certs raise
on this label (2536, 9421). Look up the worksheet `Cylinder Volume` for
cert 2536 (`sap worksheets/additional with api 2/2536-2525-0600-0788-2292/dr87-0001-NNNNNN.pdf`)
to determine the correct cascade enum. The cascade lookup is at
[`domain/sap10_calculator/rdsap/cert_to_inputs.py:1878`](../../../domain/sap10_calculator/rdsap/cert_to_inputs.py#L1878):
`_CYLINDER_SIZE_CODE_TO_LITRES: Final[dict[int, float]] = {3: 160.0, 4: 210.0}`.
If 'Normal' maps to a volume not in this dict, the cascade itself needs an
entry too — but most likely 'Normal' is a different size band the cascade
already knows about (check RdSAP cylinder-size enums: Small/Normal/Medium/
Large/Very Large). After the fix, the
`test_all_seven_ashp_cohort_certs_extract_without_unmapped_label_raise`
test should be extended to include the new cohort certs.
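For orientation, the strict-enum pattern amounts to a lookup that raises instead of silently defaulting. The sketch below uses a stand-in exception class and a one-entry table. The only mapping taken from this handover is "Large" → 4 (Slice S0380.14). The real dict has more entries, and the correct code for 'Normal' is deliberately left out because it still has to be read off the worksheet.

```python
class UnmappedLabelError(ValueError):
    """Stand-in for the project's UnmappedElmhurstLabel (shape assumed)."""

    def __init__(self, field: str, value: str) -> None:
        super().__init__(f"unmapped label {field}={value!r}")
        self.field = field
        self.value = value


# Illustrative table: only the entry confirmed in this handover.
CYLINDER_SIZE_LABEL_TO_CODE = {"Large": 4}


def map_cylinder_size(label: str) -> int:
    """Return the cascade code for a cylinder-size label, raising if unknown."""
    try:
        return CYLINDER_SIZE_LABEL_TO_CODE[label]
    except KeyError:
        # Surface the gap at extraction time rather than as a SAP delta.
        raise UnmappedLabelError("cylinder_size", label) from None
```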
### Phase 2 — Bulk-pin the 24 already-closed certs
Add `test_summary_<cert>_full_chain_sap_within_spec_floor_of_worksheet`
tests for all 24 first-try-closures. Mostly mechanical: copy Summary PDFs
to `backend/documents_parser/tests/fixtures/Summary_NNNNNN.pdf`, add
path constants, register chain tests using `_ASHP_COHORT_CHAIN_TOLERANCE = 0.07`.
Probably 2-3 slices grouped by batch.
Chain-test body pattern — see
[`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`](../../../backend/documents_parser/tests/test_summary_pdf_mapper_chain.py)
`test_summary_3800_full_chain_sap_within_spec_floor_of_worksheet`
(zero-slice closure precedent).
### Phase 3 — Close the 9 small-gap certs
In delta order (smallest magnitude first, easier to debug):
- 7800 (Δ -0.24), 0390 (Δ -0.33), 0036 (Δ -0.37), 4536-8325 (Δ -0.43),
7700 (Δ -0.44), 0464 (Δ -0.53), 9796 (Δ +0.57), 3136 (Δ -0.75),
0310 (Δ -0.75): likely 1 fix each per the cohort precedent.
- 7836 (Δ -0.04) is already inside ±0.07 and is counted among the 24
first-try closures; re-run the probe to confirm and pin it in Phase 2.
For each, follow the [[feedback-worksheet-not-api-reference]] methodology:
extract worksheet line refs (26)..(39), (64), (216) for the cert, diff
against Summary cascade output. The dominant residual line ref points to
the missing mapper field.
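Finding the dominant residual can be sketched as a sort by absolute difference. `rank_residuals` is a hypothetical helper; it assumes both sides are flat dicts keyed by the same names.

```python
def rank_residuals(
    worksheet: dict[str, float], cascade: dict[str, float]
) -> list[tuple[str, float]]:
    """Return (key, cascade - worksheet) pairs, largest magnitude first."""
    residuals = []
    for key, ws_value in worksheet.items():
        if key not in cascade:
            raise KeyError(f"cascade output is missing {key!r}")
        residuals.append((key, cascade[key] - ws_value))
    return sorted(residuals, key=lambda pair: abs(pair[1]), reverse=True)
```

The first entry of the result is the line ref to investigate first; entries with a residual near zero can be ruled out.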
### Phase 4 — Investigate the 3 big-gap certs
- **cert 2102** (Δ -15.81) and **cert 6835** (Δ -14.56): both ~-15 SAP.
For scale, cert 0380's starting point pre-Slice 2 (HP mis-routing) was
-54 SAP, so -15 SAP suggests partial HP mis-routing or a major
HW/cylinder mis-config. Probe `main_heating_index_number` /
`main_heating_category` on the Summary EPC first.
- **cert 0652** (Δ +1.92) — moderate over-prediction. Could be PV
multi-array / extension / unusual fabric variant.
### Phase 5 — API path closure
Once Elmhurst is closed for all 38, run the **same** chain tests against
the API path:
1. Fetch raw JSON for each cert (see `_fetch_certificate` snippet above).
2. Persist to `domain/sap10_calculator/rdsap/tests/fixtures/golden/<cert_ref>.json`.
3. Run the API path: `EpcPropertyDataMapper.from_api_response(json) →
cert_to_inputs → calculate_sap_from_inputs`.
4. Pin against worksheet at ±0.07 (HPs) or 1e-4 (boilers).
5. Follow the existing pattern: the `test_api_<cert>_full_chain_sap_within_spec_floor_of_worksheet`
tests live in the same `test_summary_pdf_mapper_chain.py` file (yes,
confusing, but that's where the slice 102f-prep series put them).
Per the prior session's prediction memory: many API-path certs should
close first-try because Elmhurst's first pass paid down most cascade-
side gaps. Per-cert convergence should be ≤1 slice each for the API path
once Elmhurst is done.
### Phase 6 — Cross-mapper parity (Summary EPC ≡ API EPC)
The user's longstanding north-star ("the EPC objects matching is our
signal that we've done things correctly"). For each cert with both
Summary + API EPCs, diff load-bearing fields. Existing pattern:
`test_from_elmhurst_site_notes_matches_hand_built_*` family. Extend or
adapt to compare Summary EPC vs API EPC directly. Any divergence is
either (a) a mapper gap on one side or (b) a real Summary-vs-API source
discrepancy worth flagging.
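A direct Summary-vs-API comparison could start from a field-by-field diff like the sketch below. `diff_fields` is hypothetical, and it assumes the load-bearing fields are reachable as plain attributes on both EPC objects; nested structures would need a path-aware variant.

```python
from typing import Any, Iterable


def diff_fields(
    summary_epc: Any, api_epc: Any, fields: Iterable[str]
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (summary_value, api_value)} for every field that differs."""
    mismatches: dict[str, tuple[Any, Any]] = {}
    for name in fields:
        # getattr without a default: a missing attribute fails loudly.
        left = getattr(summary_epc, name)
        right = getattr(api_epc, name)
        if left != right:
            mismatches[name] = (left, right)
    return mismatches
```

An empty result means the two mappers agree on every listed field; each remaining entry is either a mapper gap or a genuine source discrepancy.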
## Methodology — preserved conventions
All from prior session memory:
- **Worksheet, not API, is the target** ([[feedback-worksheet-not-api-reference]]).
The dr87 worksheet's `SAP value` line is the pin. The API path is a
*signal* (useful for "what should the EPC field look like?") but never
the target.
- **One slice = one commit; stage by name** ([[feedback-commit-per-slice]]).
- **AAA test convention** with literal `# Arrange / # Act / # Assert`
headers ([[feedback-aaa-test-convention]]).
- **`abs(diff) <= tol`** not `pytest.approx` ([[feedback-abs-diff-over-pytest-approx]]).
- **±0.07 spec-floor tolerance** for HP cohort chain tests; **1e-4** for
boiler cohort chain tests.
- **Spec citation in commit messages** ([[feedback-spec-citation-in-commits]]).
- **Pyright net-zero per file**.
- **Worksheet-shape fidelity** ([[feedback-worksheet-shape-fidelity]]) when
adding new dataclass fields — mirror existing patterns, full structure
even without immediate consumer.
- **Strict-enum raises on unmapped labels** (Slice S0380.15 — currently
only cylinder helpers; extend to other label-mapping helpers as their
dicts get exercised). Exception is `UnmappedElmhurstLabel` from
`datatypes.epc.domain.mapper`.
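Put together, the AAA, `abs(diff) <= tol`, and tolerance conventions look roughly like the sketch below. The constant and function names are hypothetical, and the two SAP values are borrowed from the cert 0100 row of the first-attempt table purely as stand-ins for a real chain run.

```python
HP_TOLERANCE = 0.07      # spec-floor tolerance for HP cohort chain tests
BOILER_TOLERANCE = 1e-4  # tolerance for boiler cohort chain tests


def within_tolerance(actual: float, expected: float, tol: float) -> bool:
    """Explicit absolute-difference check, per the convention above."""
    return abs(actual - expected) <= tol


def test_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Arrange
    worksheet_sap = 85.8332  # illustrative worksheet `SAP value`

    # Act
    summary_sap = 85.8668  # stand-in for the real extractor → mapper → calculator run

    # Assert
    assert within_tolerance(summary_sap, worksheet_sap, HP_TOLERANCE)
```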
## Diagnostic probe script
Paste-able first-attempt probe (run from repo root):
```bash
PYTHONPATH=/workspaces/model python <<'PY'
import re, subprocess
from pathlib import Path
from backend.documents_parser.tests.test_summary_pdf_mapper_chain import _summary_pdf_to_textract_style_pages
from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.mapper import EpcPropertyDataMapper, UnmappedElmhurstLabel
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
src_root = Path('/workspaces/model/sap worksheets/additional with api 2')
for cd in sorted(src_root.iterdir()):
    summary_pdfs = list(cd.glob("Summary_*.pdf"))
    ws_pdfs = list(cd.glob("dr87-*.pdf"))
    if not (summary_pdfs and ws_pdfs):
        continue
    # The worksheet `SAP value` line is the pin for this cert.
    out = subprocess.run(["pdftotext", str(ws_pdfs[0]), "-"], capture_output=True, text=True).stdout
    m = re.search(r"SAP value\s*\n?\s*([\d.]+)", out)
    ws_sap = float(m.group(1)) if m else None
    if ws_sap is None:
        # Without a worksheet value there is nothing to diff against.
        print(f"  {cd.name:26s} ERROR no 'SAP value' found in worksheet")
        continue
    try:
        sn = ElmhurstSiteNotesExtractor(_summary_pdf_to_textract_style_pages(summary_pdfs[0])).extract()
        epc = EpcPropertyDataMapper.from_elmhurst_site_notes(sn)
        r = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
        d = r.sap_score_continuous - ws_sap
        tag = "✅" if abs(d) <= 0.07 else ""
        print(f"  {cd.name:26s} ws={ws_sap} summary={r.sap_score_continuous:.4f} delta={d:+.4f} {tag}")
    except UnmappedElmhurstLabel as e:
        print(f"  {cd.name:26s} ws={ws_sap} RAISES {e.field}={e.value!r}")
    except Exception as e:
        print(f"  {cd.name:26s} ERROR {type(e).__name__}: {e}")
PY
```
Worksheet line-ref grep (for any cert's HLC table):
```bash
pdftotext "/workspaces/model/sap worksheets/additional with api 2/<cert>/dr87-0001-<suffix>.pdf" - | sed -n '380,475p'
```
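The worksheet-SAP extraction the probe performs on the `pdftotext` output can be pulled out into a testable function, sketched below. `parse_worksheet_sap` is a hypothetical name, and the regex is a slightly tightened variant of the probe's (it requires a well-formed decimal rather than any run of digits and dots), which is an assumption about what the worksheet text looks like.

```python
import re
from typing import Optional

# "SAP value", optional whitespace or newline, then a decimal number.
SAP_VALUE_RE = re.compile(r"SAP value\s*(\d+(?:\.\d+)?)")


def parse_worksheet_sap(text: str) -> Optional[float]:
    """Return the first `SAP value` number in pdftotext output, or None."""
    m = SAP_VALUE_RE.search(text)
    return float(m.group(1)) if m else None
```

Returning `None` instead of raising lets the caller decide whether a missing value is an error for that cert.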
## Per-cert diagnostic recipe
When a Summary chain test fails, the worksheet-anchored diff at HLC line refs
is the canonical first step:
```python
# (paste in a probe shell after running cert_to_inputs/calculate)
ws = {
    "doors_w_per_k": 4.4400,  # (26), pull from worksheet PDF
    "windows_w_per_k": 6.8011,  # (27)
    "walls_w_per_k": 11.6150,  # (29a) Main + Ext sum
    "party_walls_w_per_k": 3.9050,  # (32) Main + Ext sum
    "heat_transfer_coefficient_w_per_k": 127.1578,  # (39) avg
}
for k, w in ws.items():
    v = r.intermediate.get(k)
    print(f" {k:36s} {v:.4f} vs ws {w:.4f} d={v-w:+.4f}")
```
If fabric all matches and SAP is still off, the gap is in HW (line refs
(64)/(216)), internal gains (66..73), or HP path (Appendix N3.6 PSR).
Compare against the API path as a *signal* (not a target) — the previous
session's Slice 6 work has a worked example.
## Test baselines at HEAD
```bash
PYTHONPATH=/workspaces/model python -m pytest \
backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
backend/documents_parser/tests/test_elmhurst_extractor.py \
backend/documents_parser/tests/test_elmhurst_end_to_end.py \
domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
domain/sap10_calculator/worksheet/tests/test_water_heating.py \
domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
domain/sap10_ml/tests/test_rdsap_uvalues.py \
datatypes/epc/schema/tests/test_schema_loading.py \
--no-cov -q
```
Expected: **689 pass + 10 pre-existing fails** (9 cert 001479 Layer 1
hand-built skeleton + 1 pre-existing FEE).
Pyright per-file baselines (unchanged across this session's slices):
- `datatypes/epc/domain/mapper.py`: 32
- `domain/sap10_calculator/worksheet/heat_transmission.py`: 13
- `domain/sap10_calculator/rdsap/cert_to_inputs.py`: 35
- `backend/documents_parser/elmhurst_extractor.py`: 0
- `datatypes/epc/surveys/elmhurst_site_notes.py`: 0
- `backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`: 0
## Cohort closure status (carried forward)
15 slices shipped in the previous session (S0380.1 → S0380.15), all on
branch `feature/per-cert-mapper-validation`:
| Slice | Commit | What |
|---|---|---|
| S0380.1 | dca2ff09 | RED pin: chain test for cert 0380 vs worksheet 88.5104 |
| S0380.2 | b1a1bb8d | main_heating_category=4 for PCDB Table 362 heat pumps |
| S0380.3 | 575cdd53 | wall_insulation_type=6 for "FE Filled Cavity + External" |
| S0380.4 | 2d15951b | wall_insulation_thickness from Summary §7.0 (mapper+extractor+dataclass) |
| S0380.5 | d4d0aa24 | insulated_door_u_value from Summary §10 "Average U-value" |
| S0380.6 | 16fe2262 | Full §15.1 cylinder block (size+insulation+thickness+thermostat) |
| S0380.7 | b6ae18f3 | Re-pin chain test to ±0.07 spec-floor tolerance |
| S0380.8 | 4c06865f | "As Main Wall" extension inheritance copies insulation_thickness_mm |
| S0380.9 | 43a86d66 | Multi-array PV refactor (Renewables.pv_arrays list) |
| S0380.10 | f546bd5d | Chain tests for first-try closures (certs 3800, 9285) |
| S0380.11 | 5de41d58 | Zero-shower lodgings resolve to explicit 0 counts |
| S0380.12 | 2f5e70e3 | Alt-wall window-location parses pre-data slice |
| S0380.13 | 7f099d98 | Cantilever gate accepts "House" descriptive form |
| S0380.14 | f878bf51 | "Large" cylinder → cascade code 4 (closes Daikin cert 9418) |
| S0380.15 | d7ca179e | Strict-enum raising on unmapped cylinder labels |
All 7 original ASHP cohort certs closed at ±0.07. Mean residual +0.044.
## Memory references
- [[project-summary-path-cohort-closure]] — cohort closure status table
and convergence trend.
- [[feedback-worksheet-not-api-reference]] — Summary-path targets pin to
the dr87 worksheet PDF, not the API EPC.
- [[feedback-cascade-pin-methodology]] — test the actual cascade against
PDF line refs at 1e-4 (or ±0.07 for the HP precision floor).
- [[feedback-zero-error-strict]] — every line ref of every output for
every fixture must pin against PDF at abs=1e-4 unless documented.
- [[feedback-commit-per-slice]] / [[feedback-aaa-test-convention]] /
[[feedback-abs-diff-over-pytest-approx]] / [[feedback-spec-citation-in-commits]]
/ [[feedback-worksheet-shape-fidelity]] — slicing + test conventions.
- [[reference-rdsap10-worksheet-xlsx]] — canonical SAP 10.2 calculator
spreadsheet at repo root (`2026-05-19-17-18 RdSap10Worksheet.xlsx`)
for spec-conformance cross-checks.
## First concrete actions
1. **Folder-vs-cert sweep** is already 38/38 ✅ at handover. Re-run if
the dataset has changed.
2. **Run the Summary-path diagnostic probe** to confirm the baseline
reproduces (24 ✅, 9 small, 3 big, 2 raises).
3. **Fix the 'Normal' cylinder raise** as Slice 1 (lowest-investigation
start). Look at the worksheet `Cylinder Volume` for cert 2536, decide
the cascade enum, extend `_ELMHURST_CYLINDER_SIZE_LABEL_TO_SAP10`,
add a unit test + chain test for both raising certs.
4. **Bulk-pin the 24 first-try-closures** as Slice 2 (or split into a
couple of batches by 6-digit suffix range).
5. **Iterate on the 9 small-gap certs** one by one, worksheet-anchored
diagnostic each time.
6. **Tackle the 3 big-gap certs** with deeper investigation (likely
HP-routing or HW-cascade gaps).
7. **Fetch + persist API JSON for all 38** (`_fetch_certificate` →
`golden/<cert>.json`). Then mirror the Summary closure tests on the
API path.
8. **Add cross-mapper EPC parity tests** for the load-bearing fields
per the user's longstanding north-star.
Good luck. The first concrete action is the folder-vs-cert sweep —
confirm the dataset is clean before starting any mapper slice.