mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Parser/ETL for BRE PCDB pcdb10.dat (April 2026 revision). domain.sap.tables.pcdb.parser exposes parse_table_105 (typed GasOilBoilerRecord with brand/model/winter+summer+comparative-HW efficiency/output kW/final year) plus parse_table_raw for generic positional ingestion (pcdb_id + raw row only). etl.py runs the full ETL: reads pcdb10.dat as latin-1, writes per-table .jsonl files under docs/sap-spec/. Idempotent; runnable via PYTHONPATH=packages/domain/src python -m domain.sap.tables.pcdb.etl. Per Q1=D grilling: all 8 tables of interest ingested — 105 (Gas/Oil Boilers, typed) plus 122/143/313/353/362/391/506 (raw). Per-table typed refinement deferred to the follow-up slices that wire each table's cert-side cascade. Per Q3=B: typed fields decode against ncm-pcdb.org.uk ground-truth records (Baxi 000098 + Potterton 000619 + Saunier Duval 000732 verified by user); full raw row preserved on every record for forensics. Per Q2 user choice: NDJSON .jsonl format chosen over indented JSON to keep diff-friendliness while halving file size (17MB total vs 31MB pretty-printed). Edge cases handled: latin-1 encoding (manufacturer addresses carry the degree sign), `'obsolete'` status string where a year would otherwise live, `'>70kW'` range indicator on output-power fields — non-numeric values fall to None with the raw string preserved on `raw`. Slice 2 lands the domain.sap.tables.pcdb runtime lookup module (per-table by-pcdb-id dicts loaded at import time). Slice 3 wires Table 105 into cert_to_inputs.main_heating_efficiency / water_efficiency precedence cascades per Q5=B (space heating + water heating scalar override; equation D1 monthly + Appendix N HP factor + FGHRS/WWHRS/HIU deferred). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| src/domain | ||
| pyproject.toml | ||
| README.md | ||
domna-domain
Shared domain types — Property, Properties, BaselinePerformance, Plan, PlanPhase, Scenario, ScenarioPhase, ScenarioSnapshot, Recommendation, OptimisedPackage, EpcPropertyData, etc.
Boundary: types only. No persistence, no IO, no business logic. Other packages and services depend on domna-domain; this package depends on nothing internal.
Domain definitions live in ../../CONTEXT.md. New types added here must match the glossary terms.
Layout
src/domain/
├── __init__.py
├── property.py # Property, Properties, PropertyIdentity
├── site_notes.py
├── landlord_overrides.py
├── baseline_performance.py # lodged + effective pair (ADR-0004)
├── plan.py # Plan, PlanPhase, OptimisedPackage
├── scenario.py # Scenario, ScenarioPhase, ScenarioSnapshot (ADR-0005)
├── recommendation.py
├── geospatial.py
├── solar.py
├── anomaly_flags.py
└── ml/
├── __init__.py
├── transform.py # EpcMlTransform (versioned per §8.3)
└── schema.py
When datatypes/epc/domain/ folds in, the EPC schema types move under src/domain/epc/.