Model/backend/documents_parser
Khalim Conn-Kowlessar 43a86d66c2 Slice S0380.9: multi-array PV support + close cert 0350 to ASHP spec floor
Refactors Elmhurst `Renewables` PV detail from four scalar fields
(pv_peak_power_kw / pv_orientation / pv_elevation_deg / pv_overshading
— single-array shape) to `pv_arrays: List[ElmhurstPvArray]`, then
walks the §19.0 PV Panel block in 4-tuples so dwellings with multiple
PV arrays surface every array.

Forced by cert 0350-2968-2650-2796-5255 (Summary_000903.pdf), the
second ASHP cohort cert through the Summary path and first to lodge
multiple PV arrays — the dr87 worksheet pins 2 arrays at 1.50 kWp
each (one SE at 45°, one NW at 45°). Pre-slice the extractor's
hardcoded "break at len(values) == 4" capped output at one array
regardless of how many the PDF lodged.

Three-layer end-to-end change:

1. `datatypes/epc/surveys/elmhurst_site_notes.py` — add
   `ElmhurstPvArray` dataclass (kw, orientation, elevation_deg,
   overshading); replace four `Renewables.pv_*` scalars with
   `pv_arrays: List[ElmhurstPvArray] = field(default_factory=list)`.
2. `backend/documents_parser/elmhurst_extractor.py` — rename
   `_extract_pv_array_detail` → `_extract_pv_arrays`; walk values
   after the "Photovoltaic panel details" anchor in 4-tuples until a
   stop token ("batteries"/"export"/etc.) or a §-header closes the
   block. §-header regex tightened to `\d{1,2}\.\d\s+\w` so kWp
   values like "1.50" don't trip the close (without the `\s+\w` the
   regex matched both "20.0 Wind Turbine" AND "1.50").
3. `datatypes/epc/domain/mapper.py` — `_elmhurst_pv_arrays` iterates
   the list and emits one `PhotovoltaicArray` per row; collapses
   empty list → None so the cascade keeps its no-PV fallback.

Forcing function: cert 0350 first-attempt Summary SAP closes from
Δ -4.5829 (Slice 8 baseline) to Δ **+0.0458** — within the ±0.07
ASHP-cohort spec-precision floor. PV export credit GBP moves from
158.91 (one array surfaced) to 265.99 (both arrays surfaced) — the
extra ~107 GBP of avoided cost lifts cert 0350's SAP by ~4.6 points.

This validates the structural-debt-amortizes hypothesis: cert 0350
needed only TWO new slices (S0380.8 inheritance + S0380.9 multi-PV)
beyond the cert 0380 closure work, vs cert 0380's 6 slices from
scratch. Subsequent cohort certs should converge similarly fast as
fixture-specific gaps are paid down.

Added two tests:
- `test_summary_0350_surfaces_two_pv_arrays` — unit test pinning
  the multi-array contract on the mapper boundary.
- `test_summary_0350_full_chain_sap_within_spec_floor_of_worksheet`
  — chain test pinning Δ < ±0.07 (matches cert 0380's chain test).

Cert 0380 (single-array, 3 kWp) continues to pass its chain test +
all 6 unit-level pins — the refactor preserves single-array behaviour.

Pyright net-zero across all four edited files:
  datatypes/epc/domain/mapper.py:                32 (baseline)
  datatypes/epc/surveys/elmhurst_site_notes.py:   0
  backend/documents_parser/elmhurst_extractor.py: 0
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0

Regression suite: 677 pass + 10 fail (= handover baseline 669 + 10
+ 8 new GREEN unit+chain tests across Slices S0380.2..S0380.9).

Fixtures added: `backend/documents_parser/tests/fixtures/Summary_
000903.pdf` (copied from `sap worksheets/Additional data with api/
0350-2968-2650-2796-5255/`).

Spec refs:
- SAP 10.2 Appendix M (PDF p.103) — multiple PV arrays sum to total
  electricity generation per Equation M-1 (each array's surface flux
  computed independently per Appendix U3.3).
- SAP 10.2 Appendix U3.3 (PDF p.124) — per-array surface flux keyed
  on orientation + tilt + overshading.
- Cert 0350 worksheet `dr87-0001-000903.pdf` (29a Main 19.4575 W/K
  + Ext1 1.3025 W/K = 20.7600 ≡ Summary cascade walls_w_per_k; (39)
  avg HTC 173.4202 ≡ Summary cascade; (64) HW 2084.66 ÷ (216) HW eff
  1.7285 = 1206.04 ≡ Summary cascade hot_water_kwh_per_yr).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:44:13 +00:00
..
handler address JTK review comments 2026-04-20 15:11:17 +00:00
tests Slice S0380.9: multi-array PV support + close cert 0350 to ASHP spec floor 2026-05-27 20:44:13 +00:00
__init__.py Map to RdSapSiteNotes from site notes JSON 🟥 2026-04-16 13:54:03 +00:00
db_writer.py include updating epc_property_data to pashub to ara workflow 2026-04-29 09:55:14 +00:00
elmhurst_extractor.py Slice S0380.9: multi-array PV support + close cert 0350 to ASHP spec floor 2026-05-27 20:44:13 +00:00
extractor.py Handle wall thickness "Unmeasurable" 🟩 2026-04-30 16:41:16 +00:00
local_runner.py update local runner to work for elmhurst 2026-04-24 14:01:36 +00:00
parser.py load ecmk site notes to db 2026-04-29 11:20:47 +00:00
pdf.py update local runner to work for elmhurst 2026-04-24 14:01:36 +00:00