Model/packages/domain
Khalim Conn-Kowlessar fe04cd3a35 pcdb slice 1: pcdb10.dat ETL → 8 per-table NDJSON files + parser + 8 tests
Parser/ETL for BRE PCDB pcdb10.dat (April 2026 revision). domain.sap.tables.pcdb.parser exposes parse_table_105 (typed GasOilBoilerRecord with brand/model/winter+summer+comparative-HW efficiency/output kW/final year) plus parse_table_raw for generic positional ingestion (pcdb_id + raw row only). etl.py runs the full ETL: reads pcdb10.dat as latin-1, writes per-table .jsonl files under docs/sap-spec/. Idempotent; runnable via PYTHONPATH=packages/domain/src python -m domain.sap.tables.pcdb.etl.

Per Q1=D grilling: all 8 tables of interest ingested — 105 (Gas/Oil Boilers, typed) plus 122/143/313/353/362/391/506 (raw). Per-table typed refinement deferred to the follow-up slices that wire each table's cert-side cascade. Per Q3=B: typed fields decode against ncm-pcdb.org.uk ground-truth records (Baxi 000098 + Potterton 000619 + Saunier Duval 000732 verified by user); full raw row preserved on every record for forensics. Per Q2 user choice: NDJSON .jsonl format chosen over indented JSON to keep diff-friendliness while halving file size (17MB total vs 31MB pretty-printed).

Edge cases handled: latin-1 encoding (manufacturer addresses carry the degree sign), `'obsolete'` status string where a year would otherwise live, `'>70kW'` range indicator on output-power fields — non-numeric values fall to None with the raw string preserved on `raw`.

Slice 2 lands the domain.sap.tables.pcdb runtime lookup module (per-table by-pcdb-id dicts loaded at import time). Slice 3 wires Table 105 into cert_to_inputs.main_heating_efficiency / water_efficiency precedence cascades per Q5=B (space heating + water heating scalar override; equation D1 monthly + Appendix N HP factor + FGHRS/WWHRS/HIU deferred).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 09:43:41 +00:00
..
src/domain pcdb slice 1: pcdb10.dat ETL → 8 per-table NDJSON files + parser + 8 tests 2026-05-21 09:43:41 +00:00
pyproject.toml slice 13: to_rows(properties) returns pd.DataFrame 2026-05-16 16:43:28 +00:00
README.md added potential file scaffolding: 2026-05-15 10:56:53 +00:00

domna-domain

Shared domain types — Property, Properties, BaselinePerformance, Plan, PlanPhase, Scenario, ScenarioPhase, ScenarioSnapshot, Recommendation, OptimisedPackage, EpcPropertyData, etc.

Boundary: types only. No persistence, no IO, no business logic. Other packages and services depend on domna-domain; this package depends on nothing internal.

Domain definitions live in ../../CONTEXT.md. New types added here must match the glossary terms.

Layout

src/domain/
├── __init__.py
├── property.py             # Property, Properties, PropertyIdentity
├── site_notes.py
├── landlord_overrides.py
├── baseline_performance.py # lodged + effective pair (ADR-0004)
├── plan.py                 # Plan, PlanPhase, OptimisedPackage
├── scenario.py             # Scenario, ScenarioPhase, ScenarioSnapshot (ADR-0005)
├── recommendation.py
├── geospatial.py
├── solar.py
├── anomaly_flags.py
└── ml/
    ├── __init__.py
    ├── transform.py        # EpcMlTransform (versioned per §8.3)
    └── schema.py

When datatypes/epc/domain/ folds in, the EPC schema types move under src/domain/epc/.