mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Parser/ETL for BRE PCDB pcdb10.dat (April 2026 revision).

domain.sap.tables.pcdb.parser exposes parse_table_105 (typed GasOilBoilerRecord with brand/model/winter+summer+comparative-HW efficiency/output kW/final year) plus parse_table_raw for generic positional ingestion (pcdb_id + raw row only). etl.py runs the full ETL: reads pcdb10.dat as latin-1, writes per-table .jsonl files under docs/sap-spec/. Idempotent; runnable via PYTHONPATH=packages/domain/src python -m domain.sap.tables.pcdb.etl.

Per Q1=D grilling: all 8 tables of interest ingested, namely 105 (Gas/Oil Boilers, typed) plus 122/143/313/353/362/391/506 (raw). Per-table typed refinement deferred to the follow-up slices that wire each table's cert-side cascade.

Per Q3=B: typed fields decode against ncm-pcdb.org.uk ground-truth records (Baxi 000098 + Potterton 000619 + Saunier Duval 000732 verified by user); full raw row preserved on every record for forensics.

Per Q2 user choice: NDJSON .jsonl format chosen over indented JSON to keep diff-friendliness while halving file size (17MB total vs 31MB pretty-printed).

Edge cases handled: latin-1 encoding (manufacturer addresses carry the degree sign), `'obsolete'` status string where a year would otherwise live, `'>70kW'` range indicator on output-power fields. Non-numeric values fall to None with the raw string preserved on `raw`.

Slice 2 lands the domain.sap.tables.pcdb runtime lookup module (per-table by-pcdb-id dicts loaded at import time). Slice 3 wires Table 105 into cert_to_inputs.main_heating_efficiency / water_efficiency precedence cascades per Q5=B (space heating + water heating scalar override; equation D1 monthly + Appendix N HP factor + FGHRS/WWHRS/HIU deferred).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
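The edge-case handling described in the commit message (non-numeric values falling to None while the raw row is kept, and one JSON object per line in the output) could look roughly like the sketch below. This is not the repository's parser: the three-column row layout, the helper names (`parse_optional_float`, `parse_optional_year`, `to_record`, `write_jsonl`) and the sample PCDB id are all hypothetical and exist only for illustration.

```python
import io
import json
import math
from typing import Iterable, List, Optional, TextIO


def parse_optional_float(raw: str) -> Optional[float]:
    """Return a finite float, or None for markers such as '>70kW' or ''."""
    try:
        value = float(raw.strip())
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def parse_optional_year(raw: str) -> Optional[int]:
    """Return an int year, or None for strings such as 'obsolete'."""
    try:
        return int(raw.strip())
    except ValueError:
        return None


def to_record(row: List[str]) -> dict:
    # ASSUMPTION: a made-up layout [pcdb_id, output_kw, final_year].
    # The real Table 105 rows carry many more positional fields.
    return {
        "pcdb_id": row[0],
        "output_kw": parse_optional_float(row[1]),
        "final_year": parse_optional_year(row[2]),
        "raw": row,  # full raw row kept for forensics
    }


def write_jsonl(records: Iterable[dict], fp: TextIO) -> None:
    # One compact JSON object per line (NDJSON); ensure_ascii=False keeps
    # characters such as the degree sign readable in diffs.
    for record in records:
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    buffer = io.StringIO()
    write_jsonl([to_record(["999999", ">70kW", "obsolete"])], buffer)
    print(buffer.getvalue(), end="")
```

Reading the source file with `open(path, encoding="latin-1")` before splitting rows would match the encoding noted above; the sketch leaves that step out because the field delimiter and column layout of pcdb10.dat are not stated here.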
Shared packages
Workspace packages consumed by services/*. Each package is its own Python distribution with its own pyproject.toml; services import via the workspace dependency mechanism ({ workspace = true }).
| Package | Purpose |
|---|---|
| `domain/` | Shared domain types: Property, BaselinePerformance, Plan, Scenario, EpcPropertyData, etc. No persistence, no IO, no business logic. |
| `repos/` | Persistence layer, one repo per aggregate. Owns the SQL. Depends on domain. |
| `fetchers/` | External API clients (gov EPC, Ofgem, Google Solar, etc.). Depend on domain for response shapes. |
| `utils/` | Cross-cutting infra: logging, S3, CloudWatch URL builders, SQS task helpers. |
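As a rough illustration of the dependency direction in the table (domain types carry no IO, while a repo owns the SQL and depends on domain), a minimal sketch might look like the following. Everything in it is assumed for the example: the `Property` fields, the `PropertyRepo` class and its methods, the table schema, and the use of `sqlite3` as a stand-in database are not taken from the actual packages.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


# --- would live in domain/: plain data, no persistence, no IO ---
@dataclass(frozen=True)
class Property:
    # ASSUMPTION: hypothetical fields; the real Property type may differ.
    property_id: str
    postcode: str


# --- would live in repos/: owns the SQL, imports the domain type ---
class PropertyRepo:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS property ("
            "property_id TEXT PRIMARY KEY, postcode TEXT NOT NULL)"
        )

    def save(self, prop: Property) -> None:
        # Upsert keyed on property_id so repeated saves are idempotent.
        self._conn.execute(
            "INSERT OR REPLACE INTO property (property_id, postcode) "
            "VALUES (?, ?)",
            (prop.property_id, prop.postcode),
        )

    def get(self, property_id: str) -> Optional[Property]:
        row = self._conn.execute(
            "SELECT property_id, postcode FROM property "
            "WHERE property_id = ?",
            (property_id,),
        ).fetchone()
        return Property(*row) if row else None


if __name__ == "__main__":
    repo = PropertyRepo(sqlite3.connect(":memory:"))
    repo.save(Property("p1", "AB1 2CD"))
    print(repo.get("p1"))
```

Keeping the dataclass free of any database import is what lets fetchers and services reuse it without pulling in the persistence layer.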
Adding a new shared package
Only when a real second consumer materialises. Don't pre-shatter (repos-epc, repos-property, ...) — split when a deployment needs to drop a dep, not before.
See ../ara_backend_design.md §11 for the broader monorepo layout and ../CONTEXT.md for the domain glossary that names the types living in domain/.