pcdb slice 2: runtime gas_oil_boiler_record lookup via Table 105 NDJSON

Adds the cert-side lookup surface for Table 105: gas_oil_boiler_record(pcdb_id) -> Optional[GasOilBoilerRecord]. The NDJSON file is read once at module import and parsed into a dict keyed by PCDB ID, which is held in a module-level constant for the life of the process. Each lookup is then an average-case O(1) dict access.
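
As a rough illustration of the load-once, look-up-many pattern described above, here is a minimal self-contained sketch. It reads from an in-memory string instead of the real Table 105 file, and the names `load_by_id`, `SAMPLE_NDJSON` and `TABLE`, as well as the second sample row, are made up for the example and are not part of this commit.

```python
import io
import json
from typing import Any, TextIO

# Hypothetical two-row NDJSON sample; the blank line mimics stray whitespace
# that a loader should tolerate. Only the first row reflects a real record.
SAMPLE_NDJSON = (
    '{"pcdb_id": 98, "brand_name": "Baxi Heating"}\n'
    "\n"
    '{"pcdb_id": 99, "brand_name": "Example Brand"}\n'
)


def load_by_id(stream: TextIO) -> dict[int, dict[str, Any]]:
    """Parse one JSON object per line and index the rows by pcdb_id."""
    by_id: dict[int, dict[str, Any]] = {}
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        by_id[row["pcdb_id"]] = row
    return by_id


# Built once when the module is imported; later lookups are plain dict access.
TABLE = load_by_id(io.StringIO(SAMPLE_NDJSON))
```

Because Python keeps an imported module in `sys.modules`, the module-level dict is built only on the first import in a process.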

Returns None when the cert's main_heating_index_number is not in Table 105; the caller then falls back to the existing seasonal_efficiency(...) Table 4a/4b cascade.
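
The fallback contract above might be consumed along these lines. This is a sketch under stated assumptions, not the slice 3 implementation: `BoilerRecord`, `lookup`, `category_default_pct` and `space_heating_efficiency_pct` are hypothetical stand-ins, and the 70.0 default is an arbitrary placeholder rather than a Table 4a/4b value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoilerRecord:
    """Hypothetical stand-in for GasOilBoilerRecord (only the field used here)."""
    pcdb_id: int
    winter_efficiency_pct: float


# Hypothetical in-memory table; the real module builds its dict from NDJSON.
_TABLE = {98: BoilerRecord(pcdb_id=98, winter_efficiency_pct=66.0)}


def lookup(pcdb_id: int) -> Optional[BoilerRecord]:
    """Return the record for pcdb_id, or None when the ID is unknown."""
    return _TABLE.get(pcdb_id)


def category_default_pct() -> float:
    """Placeholder for the Table 4a/4b cascade; the value is illustrative only."""
    return 70.0


def space_heating_efficiency_pct(main_heating_index_number: Optional[int]) -> float:
    """Prefer the PCDB winter efficiency; otherwise use the category default."""
    if main_heating_index_number is not None:
        record = lookup(main_heating_index_number)
        if record is not None:
            return record.winter_efficiency_pct
    return category_default_pct()
```

Returning None rather than raising keeps the "not in the PCDB" case on the ordinary control path, so the caller needs no exception handling to reach the default.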

Two tests pin the contract: looking up the verified Baxi record (PCDB ID 000098) returns the typed record with brand "Baxi Heating", winter efficiency 66.0% and summer efficiency 56.0%; an unknown PCDB ID returns None.

Slice 3 wires gas_oil_boiler_record into the cert_to_inputs.main_heating_efficiency and water_efficiency precedence cascades per Q5=B (space heating + water heating scalar override).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-05-21 09:45:28 +00:00
parent fe04cd3a35
commit 236782287e
2 changed files with 103 additions and 11 deletions


@ -8,19 +8,70 @@ that the PCDB winter seasonal efficiency overrides the Table 4b
category default, closing most of the cert-vs-rating efficiency gap
documented in [ADR-0010 §4](../../../../../../../docs/adr/0010-sap10-calculator-spec-target-and-validation.md#4-pcdb-integration-is-promoted-from-session-c-to-a-prerequisite).
This subpackage owns:
Public surface:
- `parser.py`: per-table row parsers (Tables 105, 122, 143, 362, 391,
313, 353, 506) that decode a CSV-shaped row into a typed record
dataclass with high-confidence fields named + the full raw row
preserved for forensics.
- `etl.py`: walks the multi-table `pcdb10.dat` source, dispatches each
table's records to its parser, and writes one JSON file per table
under `docs/sap-spec/`.
- `<table>.py` (planned): runtime lookup modules that import the JSON
and expose `gas_oil_boiler_record(pcdb_id) -> Optional[...]` style
functions for `cert_to_inputs` precedence cascades.
- `gas_oil_boiler_record(pcdb_id)`: Table 105 lookup.
- `GasOilBoilerRecord`: typed record dataclass.
- `parser.py`: per-table row parsers (Table 105 typed; raw walker for the
other 7 tables).
- `etl.py`: walks the multi-table `pcdb10.dat` source and writes one
newline-delimited JSON file per table under `docs/sap-spec/`.
Reference: BRE PCDB pcdb10.dat (April 2026 revision); SAP 10.2
specification (14-03-2025) Appendix D2.1.
"""
from __future__ import annotations
import json
from pathlib import Path
from typing import Final, Optional
from domain.sap.tables.pcdb.parser import GasOilBoilerRecord
__all__ = ["GasOilBoilerRecord", "gas_oil_boiler_record"]
_REPO_SAP_SPEC_DIR: Final[Path] = (
    Path(__file__).resolve().parents[7] / "docs" / "sap-spec"
)
_TABLE_105_JSONL: Final[Path] = (
    _REPO_SAP_SPEC_DIR / "pcdb_table_105_gas_oil_boilers.jsonl"
)
def _load_table_105() -> dict[int, GasOilBoilerRecord]:
    """Read the Table 105 NDJSON at import time and build a by-pcdb-id
    dict. ~5MB / ~4000 rows; one-off ~50ms cost. The Python runtime
    caches the dict so repeated lookups are O(1)."""
    records_by_id: dict[int, GasOilBoilerRecord] = {}
    with _TABLE_105_JSONL.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            data = json.loads(line)
            record = GasOilBoilerRecord(
                pcdb_id=data["pcdb_id"],
                brand_name=data["brand_name"],
                model_name=data["model_name"],
                model_qualifier=data["model_qualifier"],
                winter_efficiency_pct=data["winter_efficiency_pct"],
                summer_efficiency_pct=data["summer_efficiency_pct"],
                comparative_hot_water_efficiency_pct=data["comparative_hot_water_efficiency_pct"],
                output_kw_max=data["output_kw_max"],
                final_year_of_manufacture=data["final_year_of_manufacture"],
                raw=tuple(data["raw"]),
            )
            records_by_id[record.pcdb_id] = record
    return records_by_id
_TABLE_105_BY_ID: Final[dict[int, GasOilBoilerRecord]] = _load_table_105()
def gas_oil_boiler_record(pcdb_id: int) -> Optional[GasOilBoilerRecord]:
    """Table 105 lookup by `main_heating_index_number`. Returns None when
    the cert's index number is not in Table 105; the caller falls back to
    Table 4a/4b category defaults via `seasonal_efficiency(...)`."""
    return _TABLE_105_BY_ID.get(pcdb_id)


@ -0,0 +1,41 @@
"""Tests for the runtime PCDB lookup module.
The lookup loads pcdb_table_105_gas_oil_boilers.jsonl at import time and
caches it as a by-pcdb-id dict. Callers (cert_to_inputs) invoke
`gas_oil_boiler_record(pcdb_id)` to obtain the typed record or None when
the ID is not in the PCDB.
Reference: BRE PCDB pcdb10.dat (April 2026); user-verified records.
"""
from __future__ import annotations
from domain.sap.tables.pcdb import gas_oil_boiler_record
def test_gas_oil_boiler_record_returns_verified_baxi_98() -> None:
    """Baxi Heating Wm 20/3rs (user-verified against ncm-pcdb.org.uk):
    winter SAP seasonal efficiency 66.0%, summer 56.0%, comparative HW
    40.8%. Lookup by `main_heating_index_number = 98` returns the typed
    record."""
    # Arrange
    # Act
    record = gas_oil_boiler_record(98)
    # Assert
    assert record is not None
    assert record.brand_name == "Baxi Heating"
    assert record.model_name == "Wm"
    assert record.winter_efficiency_pct == 66.0
    assert record.summer_efficiency_pct == 56.0
def test_gas_oil_boiler_record_returns_none_for_unknown_pcdb_id() -> None:
    """`main_heating_index_number` values not in Table 105 return None so
    `cert_to_inputs` can fall back to the Table 4a/4b category default."""
    # Arrange
    # Act
    record = gas_oil_boiler_record(99999999)
    # Assert
    assert record is None