Model/domain/sap10_calculator/tables/pcdb/etl.py
Khalim Conn-Kowlessar 433f4a49ce Slice S0380.99: PCDB Table 329 (MV In-Use Factors) ETL + parser + lookup (PCDF Spec §A.20)
PCDF Spec Rev 6b §A.20 (May 2021) Format 430 — Mechanical Ventilation
In-Use Factors Table. Pcdb10.dat carries Format 432 (header
`$329,432,4,2021,11,25,2`), an extended-field version where Format
430 fields 1-4 (system_type + 3 SFP factors for the "no approved
scheme" variant) align at positions 0..3. The remainder of Format
432 carries MVHR adjustments + "with approved scheme" variants +
additional Format 432 columns, preserved verbatim in `raw` for
follow-up slices.
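
The field alignment described above can be sketched as follows. This is a minimal illustration, not the parser shipped in this slice: `InUseFactorsSketch`, `decode_row`, and the attribute names are hypothetical, the row is assumed to arrive already split into string fields, and the two trailing values in the sample row are placeholders rather than real Format 432 data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InUseFactorsSketch:
    """Hypothetical stand-in for the typed Table 329 record."""

    system_type: int
    sfp_iuf_flexible: float  # Format 430 field 2
    sfp_iuf_rigid: float     # Format 430 field 3
    sfp_iuf_no_duct: float   # Format 430 field 4
    raw: tuple[str, ...]     # full Format 432 row, kept verbatim


def decode_row(raw: tuple[str, ...]) -> InUseFactorsSketch:
    """Type positions 0..3 and leave the remaining columns untouched in `raw`."""
    if len(raw) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(raw)}")
    return InUseFactorsSketch(
        system_type=int(raw[0]),
        sfp_iuf_flexible=float(raw[1]),
        sfp_iuf_rigid=float(raw[2]),
        sfp_iuf_no_duct=float(raw[3]),
        raw=raw,
    )


# First four values are the decentralised MEV factors quoted in this message;
# the trailing "x", "y" entries are placeholders for the extended columns.
row = ("2", "1.45", "1.30", "1.15", "x", "y")
record = decode_row(row)
```

Keeping the whole row in `raw` means a later slice can type the extended columns without re-running the ETL.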

Per PCDF Spec §A.20 field 1 — system types:
  1  = centralised MEV
  2  = decentralised MEV
  3  = balanced whole-house MV (with or without heat recovery)
  5  = positive input ventilation (PIV)
  10 = default data (used with SAP Table 4g defaults)
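
One way to carry these codes in Python is an `IntEnum`, sketched below. The class and member names are assumptions made for illustration; only the numeric codes and their meanings come from the list above.

```python
from enum import IntEnum


class MvSystemType(IntEnum):
    """Hypothetical enum for PCDF Spec §A.20 field 1 (names are assumed)."""

    CENTRALISED_MEV = 1
    DECENTRALISED_MEV = 2
    BALANCED_WHOLE_HOUSE_MV = 3  # with or without heat recovery
    POSITIVE_INPUT_VENTILATION = 5
    DEFAULT_DATA = 10  # used with SAP Table 4g defaults


def parse_system_type(field: str) -> MvSystemType:
    """Convert the raw field 1 string; unlisted codes raise ValueError."""
    return MvSystemType(int(field))


decentralised = parse_system_type("2")
```

Codes not in the list (4, for example) raise `ValueError`, so an unexpected value in a future pcdb10.dat revision fails at ETL time instead of passing through unnoticed.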

Decentralised MEV (system_type=2) IUFs:
  SFP × ducting type:
    flexible:   1.45 (field 2)
    rigid:      1.30 (field 3)
    no-duct:    1.15 (field 4 — through-wall fans)

Per spec Note: "If there is no applicable approved installation
scheme the values for with and without scheme are the same." Cert
000565 lodges "Approved Installation: No" → use the "no scheme"
IUFs.

Validation for cert 000565 against worksheet line (230a):
  Σ(SFP_j × FR_j × IUF_j) for the 4 lodged fans:
    in-room kitchen:        1×0.15×13×1.45 = 2.8275
    in-room other wet:      1×0.15× 8×1.45 = 1.7400
    through-wall kitchen:   2×0.11×13×1.15 = 3.2890
    through-wall other wet: 3×0.14× 8×1.15 = 3.8640
  Σ = 11.7205 W (matches worksheet "total watage = 11.7205")
  Σ(FR_j) = 92.0 l/s (matches worksheet "total flow = 92.0000")
  SFPav = 11.7205 / 92.0 = 0.1274 W/(l/s) ✓ matches worksheet
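
The arithmetic above can be re-checked with a few lines of Python. This is a sketch, not the cascade code (which lands in a later slice): the tuple layout is an assumption, the leading multiplier in each row is assumed to be the fan count, and the total flow is taken from the worksheet value quoted above instead of being derived from the rows.

```python
from math import isclose

# (leading multiplier, SFP in W/(l/s), flow rate in l/s, in-use factor),
# copied from the four rows listed for cert 000565.
fans = [
    (1, 0.15, 13.0, 1.45),  # in-room kitchen
    (1, 0.15, 8.0, 1.45),   # in-room other wet
    (2, 0.11, 13.0, 1.15),  # through-wall kitchen
    (3, 0.14, 8.0, 1.15),   # through-wall other wet
]

total_wattage = sum(n * sfp * fr * iuf for n, sfp, fr, iuf in fans)

# Worksheet "total flow" figure, used as lodged.
total_flow = 92.0

sfp_av = total_wattage / total_flow

# Compare with a tolerance, since the products are floating-point values.
assert isclose(total_wattage, 11.7205, abs_tol=1e-9)
assert round(sfp_av, 4) == 0.1274
```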

Foundation only this slice — typed parser + ETL + runtime lookup
`mv_in_use_factors_record(system_type)`. No cascade integration; no
behavioural change on any cert. Next slice S0380.100 wires the
SFPav formula.

5 Table 329 records ingested. Pyright net-zero per touched file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 15:20:02 +00:00

"""ETL: parse BRE PCDB pcdb10.dat into per-table JSON files.
Idempotent. Re-run when BRE publishes an updated pcdb10.dat. JSON files
are committed in-repo alongside the source .dat so callers can load
without a build step. Run via `python -m domain.sap10_calculator.tables.pcdb.etl`.
Reference: BRE PCDB pcdb10.dat (April 2026 revision).
"""

from __future__ import annotations

import json
from dataclasses import asdict
from pathlib import Path

from domain.sap10_calculator.tables.pcdb.parser import (
    DecentralisedMevRecord,
    GasOilBoilerRecord,
    MvInUseFactorsRecord,
    RawPcdbRecord,
    parse_table_105,
    parse_table_322,
    parse_table_329,
    parse_table_raw,
)

_TABLE_105_OUTPUT_FILENAME: str = "pcdb_table_105_gas_oil_boilers.jsonl"
_TABLE_322_OUTPUT_FILENAME: str = "pcdb_table_322_decentralised_mev.jsonl"
_TABLE_329_OUTPUT_FILENAME: str = "pcdb_table_329_mv_in_use_factors.jsonl"
# Tables ingested as `RawPcdbRecord` (pcdb_id + raw) — per-field typing is
# deferred to follow-up slices when the cert-side wiring for each table
# lands.
_RAW_TABLES: dict[str, str] = {
    "122": "pcdb_table_122_solid_fuel_boilers.jsonl",
    "143": "pcdb_table_143_micro_cogen.jsonl",
    "313": "pcdb_table_313_flue_gas_heat_recovery.jsonl",
    "353": "pcdb_table_353_waste_water_heat_recovery.jsonl",
    "362": "pcdb_table_362_heat_pumps.jsonl",
    "391": "pcdb_table_391_high_heat_retention_storage_heaters.jsonl",
    "506": "pcdb_table_506_heat_interface_units.jsonl",
}


def _gas_oil_record_to_jsonable(record: GasOilBoilerRecord) -> dict[str, object]:
    """Serialise a typed Table 105 record into a JSON-safe dict."""
    serialisable = asdict(record)
    serialisable["raw"] = list(record.raw)
    return serialisable


def _raw_record_to_jsonable(record: RawPcdbRecord) -> dict[str, object]:
    """Serialise a generic raw PCDB record into a JSON-safe dict."""
    return {"pcdb_id": record.pcdb_id, "raw": list(record.raw)}


def _write_ndjson(*, output_path: Path, records: list[dict[str, object]]) -> None:
    """Newline-delimited JSON: one record per line, no top-level array,
    no indent. Diffs are line-granular when records are added/changed."""
    lines = [json.dumps(record, ensure_ascii=False) for record in records]
    output_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def run_etl(*, source: Path, output_dir: Path) -> None:
    """Read `source` (pcdb10.dat), parse Tables 105, 322 and 329 via their
    typed parsers plus the raw tables enumerated in `_RAW_TABLES`, and
    write one newline-delimited JSON file (`.jsonl`) per table under
    `output_dir/`. Idempotent; record order preserves source order for
    diff-friendliness."""
    output_dir.mkdir(parents=True, exist_ok=True)
    dat_text = source.read_text(encoding="latin-1")

    _write_ndjson(
        output_path=output_dir / _TABLE_105_OUTPUT_FILENAME,
        records=[_gas_oil_record_to_jsonable(r) for r in parse_table_105(dat_text)],
    )

    # Table 322 (Decentralised MEV): typed via `parse_table_322` so the
    # per-fan-configuration block (config_code, flow, SFP triplets) is
    # exposed for the SAP 10.2 §2.6.4 SFPav cascade. Stored as raw row +
    # typed-on-load (consistent with Table 362 pattern at `__init__.py`).
    _write_ndjson(
        output_path=output_dir / _TABLE_322_OUTPUT_FILENAME,
        records=[
            _decentralised_mev_record_to_jsonable(r)
            for r in parse_table_322(dat_text)
        ],
    )

    # Table 329 (MV In-Use Factors): typed via `parse_table_329`,
    # exposing the per-ducting-type SFP IUF multipliers for "no
    # approved scheme" installations (the only variant our cohort
    # exercises). Stored as raw row + typed-on-load.
    _write_ndjson(
        output_path=output_dir / _TABLE_329_OUTPUT_FILENAME,
        records=[
            _mv_in_use_factors_record_to_jsonable(r)
            for r in parse_table_329(dat_text)
        ],
    )

    # Remaining tables are written untyped as `{pcdb_id, raw}` rows.
    for table_id, filename in _RAW_TABLES.items():
        _write_ndjson(
            output_path=output_dir / filename,
            records=[
                _raw_record_to_jsonable(r)
                for r in parse_table_raw(dat_text, table_id)
            ],
        )


def _decentralised_mev_record_to_jsonable(
    record: DecentralisedMevRecord,
) -> dict[str, object]:
    """Serialise a typed Table 322 record as `{pcdb_id, raw}`, the same
    shape as `_raw_record_to_jsonable`, so the on-disk format is
    identical between raw and typed tables. The lookup re-decodes via
    `parse_decentralised_mev_row` at import time."""
    return {"pcdb_id": record.pcdb_id, "raw": list(record.raw)}


def _mv_in_use_factors_record_to_jsonable(
    record: MvInUseFactorsRecord,
) -> dict[str, object]:
    """Serialise a typed Table 329 record. Table 329 is keyed by
    `system_type` rather than `pcdb_id`, so this dict uses `system_type`
    as the primary identifier; callers of the runtime lookup
    `mv_in_use_factors_record(system_type)` resolve via the same key."""
    return {"system_type": record.system_type, "raw": list(record.raw)}


if __name__ == "__main__":  # pragma: no cover (manual ETL invocation)
    data_dir = Path(__file__).resolve().parent / "data"
    run_etl(
        source=data_dir / "pcdb10.dat",
        output_dir=data_dir,
    )
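
The idempotency claim in the module docstring can be illustrated with a small round trip. This is a hedged sketch, not part of the module: `write_ndjson` mirrors the body of `_write_ndjson` above, while `read_ndjson`, the file name, and the sample record are hypothetical and exist only for this example.

```python
import json
import tempfile
from pathlib import Path


def write_ndjson(output_path: Path, records: list[dict[str, object]]) -> None:
    """Same serialisation as `_write_ndjson`: one JSON object per line."""
    lines = [json.dumps(record, ensure_ascii=False) for record in records]
    output_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_ndjson(path: Path) -> list[dict[str, object]]:
    """Hypothetical loader: parse each non-empty line as one record."""
    text = path.read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Sample record in the `{system_type, raw}` shape used for Table 329.
records = [{"system_type": 2, "raw": ["2", "1.45", "1.30", "1.15"]}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.jsonl"
    write_ndjson(path, records)
    first_bytes = path.read_bytes()
    write_ndjson(path, records)  # second run over the same input
    second_bytes = path.read_bytes()
    loaded = read_ndjson(path)
```

Writing the same records twice yields byte-identical output, which is why re-running the ETL on an unchanged pcdb10.dat should leave the committed `.jsonl` files without a diff.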