Model/packages
Khalim Conn-Kowlessar 15613309df slice S-B24: parse measured U from full-SAP wall description
Full SAP assessments (~15% of corpus, 4 403 of 30 000 scanned bulk-zip
certs) lodge a measured/calculated wall U-value per BS EN ISO 6946 in
walls[i].description, e.g. "Average thermal transmittance 0.18 W/m²K".
These certs typically have wall_construction, wall_insulation_type and
construction_age_band all None, which the cascade defaults previously
resolved to U = 1.5 (uninsulated cavity at band E). RdSAP 10 §5.3:
"U values are obtained from … the construction type, date of
construction and, where applicable, thickness of additional insulation"
— but a measured value supersedes the cascade.

Corpus U-value distribution among parsed:
  median 0.21, mean 0.225, range 0.06-1.84
  80% at U ≈ 0.2 (Part L-compliant new-builds)
  10% at U ≈ 0.1 (passivhaus / very low)
  7%  at U ≈ 0.3 (older retrofitted full-SAP)
  3%  in the tail (conversions, edge cases)

Per affected cert (100 m² new-build at U 1.5 → 0.21):
  walls_w_per_k drops 129 → 21 W/K
  PEUI drops ≈ 120 kWh/m²

Implementation:
- _measured_u_from_description() regex-parses the phrase from the wall
  description; returns None on no-match or non-numeric so the cascade
  fall-through is preserved.
- u_wall checks the measured value FIRST, before any cascade logic.
- No range cap — calculator mirrors what the assessor lodged, per the
  "deterministic except for input errors" principle. Parse failure
  falls through cleanly.
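A minimal sketch of the parse-then-fall-through order described above. The regex wording, and the names `parse_measured_u` and `u_wall_with_override`, are hypothetical stand-ins for `_measured_u_from_description` and `u_wall`. The `cascade` callable is an assumed placeholder for the construction/age-band/insulation lookup. It is passed lazily so it only runs when no measured value was lodged.

```python
import re
from typing import Callable, Optional

# Assumed wording: "Average thermal transmittance <number> W/m²K", matched
# case-insensitively with an optional ":" or "=" before the number. Only a
# numeric token is captured, so a non-numeric value simply fails to match.
_MEASURED_U_RE = re.compile(
    r"average\s+thermal\s+transmittance\s*[:=]?\s*(\d+(?:\.\d+)?|\.\d+)",
    re.IGNORECASE,
)


def parse_measured_u(description: Optional[str]) -> Optional[float]:
    """Return the lodged wall U-value in W/m²K, or None if there isn't one."""
    if not description:
        return None
    match = _MEASURED_U_RE.search(description)
    return float(match.group(1)) if match else None


def u_wall_with_override(
    description: Optional[str], cascade: Callable[[], float]
) -> float:
    """Measured value first; otherwise fall through to the cascade.

    `cascade` is a hypothetical zero-argument callable standing in for the
    RdSAP construction-type / age-band / insulation lookup.
    """
    measured = parse_measured_u(description)
    if measured is not None:
        return measured  # no range cap: mirror what the assessor lodged
    return cascade()


print(u_wall_with_override("Average thermal transmittance 0.18 W/m²K", lambda: 1.5))
print(u_wall_with_override("Cavity wall, as built, no insulation (assumed)", lambda: 1.5))
```

Checking `is not None` rather than truthiness keeps a lodged value of 0.0 from silently falling through to the cascade.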

Parity probe at 300 certs, seed=7: headlines unchanged. Direct check
on the sample: 0/300 certs carry an "Average thermal transmittance"
description. The v18a parquet filters full-SAP certs out somewhere
upstream, so this slice is invisible in the parquet-based probe. The
slice's correctness is proved by:
- 4 unit tests in test_rdsap_uvalues.py (tracer + regression on
  ordinary descriptions + parse-failure fallback + filled-cavity
  description still routes correctly)
- 1 end-to-end test in test_heat_transmission.py exercising a
  synthetic full-SAP cert through heat_transmission_from_cert
- All 274 domain tests passing, no regressions

Follow-up tooling: a bulk-zip-based parity probe that doesn't filter
to the parquet's subset is needed to measure this slice's corpus
impact. Separate dig.
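The counting core of such a probe might look like the sketch below. It assumes certs have already been decoded from the bulk zip into dicts with a `walls` list whose items carry a `description` key, mirroring `walls[i].description`. The decoding step is not shown, and the cert shape, phrase regex and `probe_measured_u` name are all assumptions for illustration.

```python
import re
import statistics
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Mapping

# Assumed phrase, as in the lodged example "Average thermal transmittance 0.18 W/m²K".
_MEASURED_U_RE = re.compile(
    r"average\s+thermal\s+transmittance\s*[:=]?\s*(\d+(?:\.\d+)?|\.\d+)",
    re.IGNORECASE,
)


@dataclass
class ProbeResult:
    total: int = 0
    with_measured_u: int = 0
    u_values: List[float] = field(default_factory=list)

    @property
    def share(self) -> float:
        return self.with_measured_u / self.total if self.total else 0.0


def probe_measured_u(certs: Iterable[Mapping[str, Any]]) -> ProbeResult:
    """Count certs where at least one wall lodges a measured U-value."""
    result = ProbeResult()
    for cert in certs:
        result.total += 1
        cert_has_measured = False
        for wall in cert.get("walls") or []:
            match = _MEASURED_U_RE.search(wall.get("description") or "")
            if match:
                result.u_values.append(float(match.group(1)))
                cert_has_measured = True
        if cert_has_measured:
            result.with_measured_u += 1
    return result


certs = [
    {"walls": [{"description": "Average thermal transmittance 0.18 W/m²K"}]},
    {"walls": [{"description": "Cavity wall, filled cavity"}]},
    {"walls": []},
]
r = probe_measured_u(certs)
print(r.with_measured_u, r.total, statistics.median(r.u_values))  # 1 3 0.18
```

Feeding it from the bulk zip rather than the parquet is what would avoid the upstream full-SAP filter.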

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 20:50:39 +00:00
domain slice S-B24: parse measured U from full-SAP wall description 2026-05-18 20:50:39 +00:00
fetchers added potential file scaffolding: 2026-05-15 10:56:53 +00:00
repos added potential file scaffolding: 2026-05-15 10:56:53 +00:00
utils added potential file scaffolding: 2026-05-15 10:56:53 +00:00
README.md added potential file scaffolding: 2026-05-15 10:56:53 +00:00

Shared packages

Workspace packages consumed by services/*. Each package is its own Python distribution with its own pyproject.toml; services import via the workspace dependency mechanism ({ workspace = true }).

| Package | Purpose |
|---|---|
| domain/ | Shared domain types: Property, BaselinePerformance, Plan, Scenario, EpcPropertyData, etc. No persistence, no IO, no business logic. |
| repos/ | Persistence layer: one repo per aggregate. Owns the SQL. Depends on domain. |
| fetchers/ | External API clients (gov EPC, Ofgem, Google Solar, etc.). Depend on domain for response shapes. |
| utils/ | Cross-cutting infra: logging, S3, CloudWatch URL builders, SQS task helpers. |
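The domain/repos split can be pictured with a small sketch. A plain, IO-free type lives in domain, and a per-aggregate repo is the only place that knows the SQL. The `Property` fields are hypothetical, and sqlite3 is swapped in purely so the example runs on its own. The real repos' database, schema and method names are not described here.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


# --- domain/: plain types only (no persistence, no IO, no business logic) ---
@dataclass(frozen=True)
class Property:
    # Hypothetical fields for illustration; the real Property type may differ.
    property_id: str
    postcode: str
    floor_area_m2: float


# --- repos/: one repo per aggregate; the SQL lives here and nowhere else ---
class PropertyRepo:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        # Table creation here only keeps the sketch self-contained.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS property ("
            " property_id TEXT PRIMARY KEY,"
            " postcode TEXT NOT NULL,"
            " floor_area_m2 REAL NOT NULL)"
        )

    def save(self, prop: Property) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO property (property_id, postcode, floor_area_m2)"
            " VALUES (?, ?, ?)",
            (prop.property_id, prop.postcode, prop.floor_area_m2),
        )

    def get(self, property_id: str) -> Optional[Property]:
        row = self._conn.execute(
            "SELECT property_id, postcode, floor_area_m2 FROM property"
            " WHERE property_id = ?",
            (property_id,),
        ).fetchone()
        return Property(*row) if row else None


repo = PropertyRepo(sqlite3.connect(":memory:"))
repo.save(Property("p1", "AB1 2CD", 100.0))
print(repo.get("p1"))
```

Because `Property` imports nothing from repos, services and fetchers can depend on domain without pulling in a database driver.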

Adding a new shared package

Only when a real second consumer materialises. Don't pre-shatter (repos-epc, repos-property, ...) — split when a deployment needs to drop a dep, not before.

See ../ara_backend_design.md §11 for the broader monorepo layout and ../CONTEXT.md for the domain glossary that names the types living in domain/.