Model/docs/adr/0004-baseline-performance-lodged-effective-pair.md
Khalim Conn-Kowlessar 457d959b1f refactor(property-baseline): rename baseline → property_baseline aggregate (PR #1139 review)
Wholesale rename of the Baseline aggregate to PropertyBaseline for clarity /
to disambiguate from baselines that appear elsewhere in Modelling. Scoped to
this aggregate only — the distinct Rebaselining term (rebaseline_reason,
StubRebaseliner, RebaselineNotImplemented) is deliberately untouched.

- domain/baseline → domain/property_baseline; BaselinePerformance →
  PropertyBaselinePerformance.
- repositories/baseline → repositories/property_baseline; BaselineRepository
  / BaselinePostgresRepository → PropertyBaseline*.
- orchestration/baseline_orchestrator.py → property_baseline_orchestrator.py;
  BaselineOrchestrator → PropertyBaselineOrchestrator. BaselineStage →
  PropertyBaselineStage.
- infrastructure/postgres: baseline_performance_table.py →
  property_baseline_performance_table.py; table `baseline_performance` →
  `property_baseline_performance`; Model renamed.
- UnitOfWork attribute `.baseline` → `.property_baseline`.
- Docs: ADR-0004 references + migration doc (renamed to
  property-baseline-performance-table.md) updated.

CONTEXT.md glossary term ("Baseline Performance") left as-is pending a
ubiquitous-language call (raised on the PR). 123 tests pass; pyright strict
clean (only the unrelated pre-existing moto import errors remain).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00


# `PropertyBaselinePerformance` stores both lodged and effective values
A Property's current performance has two states we care about: the rating that was lodged on the government register (the "lodged" SAP / band / carbon / heat) and the rating produced by the modelling pipeline against the current Effective EPC (the "effective" values, which may have been rebaselined by ML when the EPC was pre-SAP10 or when Landlord Overrides / Site Notes changed physical state). We considered storing a single set of values — the rebaselined-if-needed-otherwise-lodged figures — and rejected that. Both are stored as a pair on every `PropertyBaselinePerformance`, equal when no rebaselining trigger fires.
The pair lets the FE show "this is what the gov register says vs this is the SAP10-equivalent we modelled against" side by side without a second query, and keeps the audit trail clean: a user looking at a property's plan can see exactly which figure drove the recommendation pipeline. Storing only one set forces a downstream consumer to recompute the missing one from raw EPC fields when it needs both, which is the kind of derivation creep we want to keep out of the FE.
The cost is a wider row and the discipline that **every** `PropertyBaselinePerformance` populates both halves, even when they're equal. Annual kWh, fuel split and bills are not paired: they are always derived deterministically by `EpcEnergyDerivationService` against the Effective state, because the EPC's recorded cost fields use fuel rates pinned to the inspection date and the UCL correction depends on the modelled band.
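
As a rough illustration of the "both halves always populated" discipline, here is a minimal sketch of the pair as a frozen dataclass. It is not the real aggregate: `PerformanceValues`, the `not_rebaselined` constructor, the field types and the use of `None` for "no trigger fired" are all assumptions made for the example; only the four paired quantities and `rebaseline_reason` come from this ADR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PerformanceValues:
    """One half of the pair (hypothetical helper type, not in the codebase)."""
    sap_score: int
    epc_band: str
    co2_emissions: float
    primary_energy_intensity: float


@dataclass(frozen=True)
class PropertyBaselinePerformance:
    property_id: str
    lodged: PerformanceValues
    effective: PerformanceValues
    # Assumption: None means no rebaselining trigger fired.
    rebaseline_reason: Optional[str] = None

    def __post_init__(self) -> None:
        # With no trigger, the ADR says the two halves are equal.
        if self.rebaseline_reason is None and self.lodged != self.effective:
            raise ValueError("lodged and effective must match when not rebaselined")

    @classmethod
    def not_rebaselined(
        cls, property_id: str, lodged: PerformanceValues
    ) -> "PropertyBaselinePerformance":
        # Both halves are still populated; effective simply mirrors lodged.
        return cls(property_id=property_id, lodged=lodged, effective=lodged)
```

Enforcing the invariant in the constructor means a consumer never has to guess whether a missing effective value means "same as lodged" or "not computed yet".
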
## Consequences
- Reversing this means rewriting every consumer that has learned to read both values. Hard to roll back once the FE depends on the pair.
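- Rebaselining has two possible triggers (`pre_sap10` and `physical_state_changed`), so the recorded reason takes one of three values: `pre_sap10`, `physical_state_changed`, or `both`. Store the reason alongside the pair so we know *why* a property was rebaselined when debugging.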
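
The mapping from the two triggers to the stored reason could look roughly like the sketch below. The three string values are taken from this ADR; the `RebaselineReason` enum, the `derive_rebaseline_reason` helper and returning `None` when neither trigger fires are assumptions for illustration, not the actual implementation.

```python
from enum import Enum
from typing import Optional


class RebaselineReason(str, Enum):
    PRE_SAP10 = "pre_sap10"
    PHYSICAL_STATE_CHANGED = "physical_state_changed"
    BOTH = "both"


def derive_rebaseline_reason(
    pre_sap10: bool, physical_state_changed: bool
) -> Optional[RebaselineReason]:
    """Collapse the two independent triggers into one stored value."""
    if pre_sap10 and physical_state_changed:
        return RebaselineReason.BOTH
    if pre_sap10:
        return RebaselineReason.PRE_SAP10
    if physical_state_changed:
        return RebaselineReason.PHYSICAL_STATE_CHANGED
    # Assumption: no trigger fired, so nothing was rebaselined.
    return None
```
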
### Amendment (2026-05-30, #1135): standalone `property_baseline_performance` table
The original consequence read *"`property_details_epc` (or its successor) carries 8 fields
instead of 4 for the SAP-equivalent block"* — i.e. the pair as columns on the EPC-details table.
That is superseded. `property_details_epc` is being **retired**: it is too tightly coupled to the
schema of the legacy EPC API, which the Ara rebuild is moving off. So the pair has no home there.
`PropertyBaselinePerformance` instead persists as its **own standalone `property_baseline_performance` table, one
row per Property**, behind a dedicated `PropertyBaselineRepository` port (`save` / `get_for_property`),
mirroring the EPC slice's repo shape. This is the cleaner model regardless of the retirement:
`PropertyBaselinePerformance` is its own aggregate (a Property's current performance), not a detail of any
single EPC.
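
A minimal sketch of what such a port might look like, assuming a `typing.Protocol` and an in-memory fake for tests. Only the names `PropertyBaselineRepository`, `save` and `get_for_property` come from this ADR; the signatures, the trimmed stand-in aggregate and `InMemoryPropertyBaselineRepository` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class PropertyBaselinePerformance:
    # Trimmed stand-in for the aggregate: one paired quantity only.
    property_id: str
    lodged_sap_score: int
    effective_sap_score: int


class PropertyBaselineRepository(Protocol):
    """Port: method signatures here are assumed, not confirmed."""

    def save(self, baseline: PropertyBaselinePerformance) -> None: ...

    def get_for_property(
        self, property_id: str
    ) -> Optional[PropertyBaselinePerformance]: ...


class InMemoryPropertyBaselineRepository:
    """Hypothetical test double satisfying the port structurally."""

    def __init__(self) -> None:
        self._rows: Dict[str, PropertyBaselinePerformance] = {}

    def save(self, baseline: PropertyBaselinePerformance) -> None:
        # One row per Property: saving again replaces the previous row.
        self._rows[baseline.property_id] = baseline

    def get_for_property(
        self, property_id: str
    ) -> Optional[PropertyBaselinePerformance]:
        return self._rows.get(property_id)
```

Keying the store by Property rather than by EPC reflects the point above: the aggregate belongs to the Property, not to any single EPC.
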
The row is **flat typed columns**, not a JSONB blob, because the FE both surfaces the block and
queries the lodged-vs-effective pair: `lodged_{sap_score, epc_band, co2_emissions,
primary_energy_intensity}`, the four `effective_*` mirrors, `rebaseline_reason`, and (for the part
of the energy block that needs no derivation) `space_heating_kwh` / `water_heating_kwh`. The
fourth paired quantity is **Primary Energy Intensity**, not "heat demand" — see CONTEXT.md
(the prose above predates that term being sharpened).
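
To show why flat typed columns help the lodged-vs-effective query, the sketch below builds the column set listed above in an in-memory SQLite database. SQLite stands in for Postgres only so the example runs without a server; the column types, the `property_id` key and the sample rows are made up for illustration and are not the real schema or real data.

```python
import sqlite3

# Column names follow the ADR; types and the primary key are assumptions.
DDL = """
CREATE TABLE property_baseline_performance (
    property_id                       TEXT PRIMARY KEY,
    lodged_sap_score                  INTEGER NOT NULL,
    lodged_epc_band                   TEXT    NOT NULL,
    lodged_co2_emissions              REAL    NOT NULL,
    lodged_primary_energy_intensity   REAL    NOT NULL,
    effective_sap_score               INTEGER NOT NULL,
    effective_epc_band                TEXT    NOT NULL,
    effective_co2_emissions           REAL    NOT NULL,
    effective_primary_energy_intensity REAL   NOT NULL,
    rebaseline_reason                 TEXT,
    space_heating_kwh                 REAL,
    water_heating_kwh                 REAL
)
"""

# Illustrative rows: p1 was not rebaselined, p2 was (pre-SAP10 EPC).
ROWS = [
    ("p1", 62, "D", 3.1, 240.0, 62, "D", 3.1, 240.0, None, 9000.0, 2100.0),
    ("p2", 58, "D", 3.6, 265.0, 64, "D", 3.2, 231.0, "pre_sap10", 10500.0, 2300.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany(
    "INSERT INTO property_baseline_performance VALUES (?,?,?,?,?,?,?,?,?,?,?,?)",
    ROWS,
)

# With flat columns, comparing the pair is a plain predicate, no JSON paths.
rebaselined = [
    row[0]
    for row in conn.execute(
        "SELECT property_id FROM property_baseline_performance "
        "WHERE lodged_sap_score <> effective_sap_score "
        "ORDER BY property_id"
    )
]
```
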
Fuel split and bills — the rest of the EPC Energy Derivation block — are **deferred to a
follow-up**: bills require a current Fuel Rates source (Ofgem-cap ETL) that does not yet exist, and
fuel split is produced by the same `EpcEnergyDerivationService`, so the two land together rather
than churning the table twice.
The SQLModel row is defined in `infrastructure/postgres/` so the ephemeral-Postgres tests build it
via `create_all`; the production migration is FE-owned (Drizzle ORM) and tracked in
`docs/migrations/`.