Model/packages
Khalim Conn-Kowlessar fe04cd3a35 pcdb slice 1: pcdb10.dat ETL → 8 per-table NDJSON files + parser + 8 tests
Parser/ETL for BRE PCDB pcdb10.dat (April 2026 revision). domain.sap.tables.pcdb.parser exposes parse_table_105 (typed GasOilBoilerRecord with brand/model/winter+summer+comparative-HW efficiency/output kW/final year) plus parse_table_raw for generic positional ingestion (pcdb_id + raw row only). etl.py runs the full ETL: reads pcdb10.dat as latin-1, writes per-table .jsonl files under docs/sap-spec/. Idempotent; runnable via PYTHONPATH=packages/domain/src python -m domain.sap.tables.pcdb.etl.

Per Q1=D grilling: all 8 tables of interest ingested — 105 (Gas/Oil Boilers, typed) plus 122/143/313/353/362/391/506 (raw). Per-table typed refinement deferred to the follow-up slices that wire each table's cert-side cascade. Per Q3=B: typed fields decode against ncm-pcdb.org.uk ground-truth records (Baxi 000098 + Potterton 000619 + Saunier Duval 000732 verified by user); full raw row preserved on every record for forensics. Per Q2 user choice: NDJSON .jsonl format chosen over indented JSON to keep diff-friendliness while halving file size (17MB total vs 31MB pretty-printed).

Edge cases handled: latin-1 encoding (manufacturer addresses carry the degree sign), `'obsolete'` status string where a year would otherwise live, `'>70kW'` range indicator on output-power fields — non-numeric values fall to None with the raw string preserved on `raw`.

Slice 2 lands the domain.sap.tables.pcdb runtime lookup module (per-table by-pcdb-id dicts loaded at import time). Slice 3 wires Table 105 into cert_to_inputs.main_heating_efficiency / water_efficiency precedence cascades per Q5=B (space heating + water heating scalar override; equation D1 monthly + Appendix N HP factor + FGHRS/WWHRS/HIU deferred).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 09:43:41 +00:00
..
domain pcdb slice 1: pcdb10.dat ETL → 8 per-table NDJSON files + parser + 8 tests 2026-05-21 09:43:41 +00:00
fetchers added potential file scaffolding: 2026-05-15 10:56:53 +00:00
repos added potential file scaffolding: 2026-05-15 10:56:53 +00:00
utils added potential file scaffolding: 2026-05-15 10:56:53 +00:00
README.md added potential file scaffolding: 2026-05-15 10:56:53 +00:00

Shared packages

Workspace packages consumed by services/*. Each package is its own Python distribution with its own pyproject.toml; services import via the workspace dependency mechanism ({ workspace = true }).

Package Purpose
domain/ Shared domain types — Property, BaselinePerformance, Plan, Scenario, EpcPropertyData, etc. No persistence, no IO, no business logic.
repos/ Persistence layer — one repo per aggregate. Owns the SQL. Depends on domain.
fetchers/ External API clients (gov EPC, Ofgem, Google Solar, etc.). Depend on domain for response shapes.
utils/ Cross-cutting infra — logging, S3, CloudWatch URL builders, SQS task helpers.

Adding a new shared package

Only when a real second consumer materialises. Don't pre-shatter (repos-epc, repos-property, ...) — split when a deployment needs to drop a dep, not before.

See ../ara_backend_design.md §11 for the broader monorepo layout and ../CONTEXT.md for the domain glossary that names the types living in domain/.