feat(baseline): run Sap10Calculator in shadow on Property Baseline (ADR-0013)

Wire Sap10Calculator into PropertyBaselineOrchestrator as a non-load-bearing shadow runner. For each property it scores the Effective EPC beside the load-bearing Lodged/Effective write, catches any strict-raise -> log.error (never aborts the batch), and on success log.warning's divergence from Lodged: SAP |continuous - lodged| > 0.5; PEUI/CO2 > 1% relative (CO2 after kg->tonnes). Every line is tagged with sap_version so SAP-10.2 signal separates from older-spec drift (ADR-0010 Validation Cohort).

Per ADR-0013, Calculated SAP10 Performance is not a persisted third value-set: effective = calculated in every baselining scenario, so the calculator IS the mechanism that produces Effective Performance (the Rebaseliner). It runs in shadow only while being hardened; when overrides/estimation land it is promoted to drive Effective and the failure posture flips to abort (ADR-0012, calculator now load-bearing). No table change.

- ADR-0013 + CONTEXT (Calculated SAP10 Performance / Effective Performance / Rebaselining) record the decision.
- CalculatorShadow port + LoggingCalculatorShadow + Calculator protocol.
- FakeCalculatorShadow for orchestrator unit tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-27 23:35:01 +00:00 · 2026-06-02 08:01:47 +00:00 · 2026-06-02 08:01:47 +00:00 · 561e1b8b49
commit 561e1b8b49
parent ce33cd94ef
9 changed files with 473 additions and 7 deletions
--- a/CONTEXT.md
+++ b/CONTEXT.md
@ -82,7 +82,7 @@ The EpcPropertyData scored by the modelling pipeline for a single Property, deri
 _Avoid_: modelling EPC, working EPC, resolved EPC, derived EPC

 **Rebaselining**:
-Re-predicting a Property's SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh via ML so the modelling pipeline scores it against the current SAP10 methodology. Triggered when either (a) the Effective EPC was lodged under a pre-SAP10 schema (`sap_version < 10.0`), so the recorded scores reflect a superseded methodology, or (b) Site Notes / Landlord Overrides changed the physical state of the Property (walls / heating / windows / etc.) so the lodged scores no longer reflect what's installed. Both triggers may fire together. Produces Effective Performance; Lodged Performance is preserved unchanged. kWh is included as ML targets per ADR-0007 — see [[epc-ml-transform]].
+Re-predicting a Property's SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh via **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013) so the modelling pipeline scores it against the current SAP10 methodology. Triggered when either (a) the Effective EPC was lodged under a pre-SAP10 schema (`sap_version < 10.0`), so the recorded scores reflect a superseded methodology, or (b) Site Notes / Landlord Overrides changed the physical state of the Property (walls / heating / windows / etc.) so the lodged scores no longer reflect what's installed. Both triggers may fire together. Produces Effective Performance; Lodged Performance is preserved unchanged. kWh is included as ML targets per ADR-0007 — see [[epc-ml-transform]].
 _Avoid_: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness)

 **Baseline Performance**:
@ -94,12 +94,12 @@ The SAP / EPC Band / carbon emissions / Primary Energy Intensity recorded on the
 _Avoid_: original performance, raw EPC values, recorded baseline

 **Effective Performance**:
-The SAP / EPC Band / carbon emissions / Primary Energy Intensity the modelling pipeline actually scored against — equal to Lodged Performance when no Rebaselining trigger fires, replaced by ML output when triggered. The half of Baseline Performance that says "what we modelled".
+The SAP / EPC Band / carbon emissions / Primary Energy Intensity the modelling pipeline actually scored against — equal to Lodged Performance when no Rebaselining trigger fires, replaced by **SAP10 Calculation** output (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013) when triggered. The half of Baseline Performance that says "what we modelled".
 _Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values

 **Calculated SAP10 Performance**:
-The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. Distinct from Effective Performance (ML output) and Lodged Performance (gov register) during the validation phase. Surfaced alongside Effective Performance in the UI; may supersede Effective Performance in a later ADR once parity is confirmed against the cert-reported SAP across ≥1000 sample certs lodged on the calculator's target spec version (see [[sap-spec-version]]). ADR-0009 (as amended by ADR-0010).
-_Avoid_: calculator output, computed performance, worksheet performance, SAP10 output
+The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. It is **not** a separately-persisted third value-set beside Lodged and Effective: in every baselining scenario the calculator's output *is* the **Effective Performance** (real lodged SAP10 EPC with no overrides ⇒ Calculated = Lodged = Effective; overrides or an estimated / pre-SAP10 EPC ⇒ Calculated = Effective, there being no lodged SAP10 figure to compare against). The calculator is therefore the mechanism that produces Effective Performance, having superseded the old ML-API rebaseliner. While it is being hardened it runs in **shadow** for the first baselining slice — computed on every Property, compared to Lodged, and any divergence (SAP > 0.5, or PEUI / CO2 beyond tolerance) or strict-raise **logged, not persisted** — then is promoted to drive Effective Performance once overrides / estimation land (ADR-0013). The ≥1000-cert parity confirmation against the cert-reported SAP (see [[sap-spec-version]]) gates that promotion. ADR-0009 introduced the term, as amended by ADR-0010 and realized by ADR-0013.
+_Avoid_: calculator output, computed performance, worksheet performance, SAP10 output, calculated value-set (it is not a stored third set)

 **SAP10 Calculation**:
 The process that runs the deterministic SAP 10.2 (14-03-2025 amendment) worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap10_calculator/` (`calculator.py`). Reads cert fabric/heating/geometry fields, applies the RdSAP 10 (10-06-2025) cert→input mapping, executes the 12-month heat balance per SAP 10.2 §§1-14, looks up boiler/heat-pump performance in the **PCDB** when the cert lodges a product index, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009 originally targeted SAP 10.3 (13-01-2026); ADR-0010 retargets to SAP 10.2 (14-03-2025) until the cert corpus migrates.
--- a/applications/ara_first_run/handler.py
+++ b/applications/ara_first_run/handler.py
@ -10,7 +10,9 @@ from sqlmodel import Session
 from applications.ara_first_run.ara_first_run_trigger_body import (
    AraFirstRunTriggerBody,
 )
+from domain.property_baseline.calculator_shadow import LoggingCalculatorShadow
 from domain.property_baseline.rebaseliner import StubRebaseliner
+from domain.sap10_calculator.calculator import Sap10Calculator
 from infrastructure.postgres.config import PostgresConfig
 from infrastructure.postgres.engine import make_engine
 from orchestration.property_baseline_orchestrator import PropertyBaselineOrchestrator
@ -81,6 +83,9 @@ def build_first_run_pipeline(
        baseline=PropertyBaselineOrchestrator(
            unit_of_work=unit_of_work,
            rebaseliner=StubRebaseliner(),
+            # Shadow only: validates the calculator over the wild cohort without
+            # gating the load-bearing baseline write (ADR-0013).
+            calculator_shadow=LoggingCalculatorShadow(Sap10Calculator()),
        ),
        modelling=ModellingOrchestrator(
            scenario_repo=ScenarioRepository(),
--- a/docs/adr/0013-calculator-produces-effective-performance-shadow-first.md
+++ b/docs/adr/0013-calculator-produces-effective-performance-shadow-first.md
@ -0,0 +1,88 @@
+---
+Status: accepted
+---
+
+# The `Sap10Calculator` produces Effective Performance (it is the Rebaseliner); Calculated SAP10 Performance is not a persisted third value-set, and is wired in shadow first
+
+Refines [ADR-0004](0004-baseline-performance-lodged-effective-pair.md) (the Lodged/Effective
+pair), [ADR-0009](0009-deterministic-sap-calculator.md)/[ADR-0010](0010-sap10-calculator-spec-target-and-validation.md)
+(the calculator + the **Calculated SAP10 Performance** term), [ADR-0011](0011-composable-stage-orchestrators.md)
+(the `Rebaseliner` seam) and [ADR-0012](0012-unit-of-work-per-stage-batch-transaction.md)
+(all-or-nothing per batch). Decided in a `/grill-with-docs` session (2026-06-01) before wiring
+`Sap10Calculator` into `PropertyBaselineOrchestrator`.
+
+## Context
+
+The old `model_engine` (`backend/engine/engine.py`) called out to an **ML API**
+(`model_api.predict_all` over `BASELINE_MODEL_PREFIXES`) to rebaseline the properties that needed
+it. The rebuild replaces that round-trip with the **deterministic `Sap10Calculator`, run live**.
+
+The handover and CONTEXT (line 100) framed **Calculated SAP10 Performance** as a *third* value-set
+persisted *alongside* Lodged and Effective (`calculated_*` columns). Walking the baselining
+scenarios shows that framing reifies a distinction that does not exist in the domain:
+
+- real lodged SAP10 EPC, no overrides ⇒ Calculated = Lodged = Effective;
+- real EPC + property/landlord overrides ⇒ Calculated = Lodged-plus-overrides = Effective;
+- estimated EPC (± overrides), or a pre-SAP10 EPC ⇒ Calculated = Effective (no lodged SAP10 to
+  compare against — Lodged Performance exists only for a *real lodged* EPC).
+
+In every scenario **Effective = Calculated**. There is no third quantity.
+
+## Decision
+
+**The calculator is the mechanism that produces Effective Performance** — i.e. the deterministic
+`Rebaseliner` (ADR-0011's seam), superseding the old ML-API rebaseliner. "Calculated SAP10
+Performance" is the *name of that output during validation*, **not** a separately-persisted third
+value-set. No `calculated_*` columns are added; `property_baseline_performance` keeps its
+Lodged/Effective shape (ADR-0004). The ADR-0009 ML model is repositioned as a *future residual head*
+over the calculator, not the baseline producer.
+
+**Shadow-first, then promotion.** The calculator still strict-raises (`UnmappedSapCode`,
+`MissingMainFuelType`, `UnresolvedPcdbCombiLoss`) on cert mappings it has not yet hardened, and the
+strict-typing of `EpcPropertyData` that will close most of those gaps is still pending. A ~40,000
+property test cohort is about to flow through baselining. So this lands in two steps:
+
+1. **This slice — shadow.** Performance is still **defined by the input data**: `StubRebaseliner`
+   keeps producing Effective (`= Lodged` for the only live scenario, real SAP10 + no overrides).
+   The calculator runs *beside* it, on every Property's Effective EPC, **purely to be battle-tested
+   in the wild**. It is **not load-bearing**, therefore:
+   - a calculator raise is **caught and logged at `error`, never aborts the batch** — otherwise one
+     unmappable cert would lose the load-bearing Lodged/Effective write for the whole batch, and
+     over a 40k run most batches would never baseline;
+   - on success, its output is **compared to Lodged and logged, not persisted** — `warning` when
+     `|sap_continuous − lodged_sap| > 0.5`, or PEUI / CO2 diverge beyond tolerance (CO2 after the
+     kg→tonnes conversion). Each log is tagged with the cert's `sap_version` so SAP-10.2 divergence
+     (a real calculator signal) is separable from older-spec drift (expected — see
+     [ADR-0010](0010-sap10-calculator-spec-target-and-validation.md) Validation Cohort).
+
+2. **Next slice or two — load-bearing.** When overrides + EPC estimation land (days away),
+   `StubRebaseliner` is replaced by a calculator-backed `Rebaseliner`: the calculator's output
+   **becomes Effective Performance**. The failure posture **flips to abort** per ADR-0012 — now that
+   the calculator *is* the baseline, a silent wrong answer is the expensive outcome, so a raise must
+   fail the batch noisily. Same exception, opposite handling, because the calculator went from
+   shadow to load-bearing. The shadow logging is then retired.
+
+## Considered options
+
+- **A third persisted `calculated_*` value-set on `PropertyBaselinePerformance`** (the handover's
+  recommendation) — rejected: `Effective = Calculated` in every scenario, so the columns would
+  store a distinction with no domain reality, and the future "supersede effective" promotion would
+  become a data move instead of a no-op.
+- **Promote the calculator to drive Effective immediately** — rejected for this one slice: it still
+  strict-raises on un-hardened mappings, so over the imminent 40k run it would gate the
+  load-bearing baseline write. Shadow-first surfaces every gap as an aggregatable error log without
+  blocking baselining.
+- **A separate `calculator_shadow` validation table** — held in reserve: log-only is enough while
+  the calculator is moving and the shadow step is a 1–2 day stepping stone; we add a queryable table
+  only if log aggregation proves too weak.
+
+## Consequences
+
+- `property_baseline_performance` is **unchanged** this slice — no migration.
+- CONTEXT **Calculated SAP10 Performance**, **Effective Performance**, and **Rebaselining** are
+  updated: the calculator (not ML) is the rebaseliner mechanism in the rebuilt engine; Calculated is
+  not a stored third set.
+- The shadow runner's broad `except` is deliberate (the point is to discover *what* breaks in the
+  wild); each caught exception is logged with its type and `property_id`.
+- This decision is short-lived in its shadow form by design; the durable half — "the calculator
+  produces Effective Performance; there is no third value-set" — outlives it.
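Reviewer sketch of the "same exception, opposite handling" point in the ADR above. It is a minimal illustration, not the project's code: `UnmappedCode`, `calculate`, `run_in_shadow` and `run_load_bearing` are hypothetical stand-ins for the real strict-raise, `Sap10Calculator.calculate`, the shadow runner and the future calculator-backed `Rebaseliner`.

```python
import logging

logger = logging.getLogger("shadow_sketch")


class UnmappedCode(Exception):
    """Hypothetical stand-in for a calculator strict-raise."""


def calculate(cert: dict) -> float:
    """Toy calculator: raises on a cert it cannot map."""
    if "heating_code" not in cert:
        raise UnmappedCode("heating_code")
    return 72.0


def run_in_shadow(cert: dict) -> None:
    """Shadow posture: log the failure and carry on; the result is never used."""
    try:
        calculate(cert)
    except Exception as exc:  # broad on purpose, as in the shadow runner
        logger.error("shadow calculation failed: %r", exc)


def run_load_bearing(cert: dict) -> float:
    """Load-bearing posture: the result is the baseline, so a raise propagates."""
    return calculate(cert)
```

In the shadow posture an unmappable cert costs one log line; in the load-bearing posture the same cert fails the caller, which is what lets the surrounding unit of work skip its commit.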
--- a/domain/property_baseline/calculator_shadow.py
+++ b/domain/property_baseline/calculator_shadow.py
@ -0,0 +1,141 @@
+from __future__ import annotations
+
+import logging
+from abc import ABC, abstractmethod
+from typing import TYPE_CHECKING, Optional, Protocol
+
+from domain.property_baseline.performance import Performance
+
+if TYPE_CHECKING:
+    from datatypes.epc.domain.epc_property_data import EpcPropertyData
+    from domain.sap10_calculator.calculator import SapResult
+
+logger = logging.getLogger(__name__)
+
+# A continuous SAP this far from the lodged integer would round to a different
+# band-driving score; PEUI / CO2 scale with dwelling size so they use a relative
+# tolerance (ADR-0013). Starting dials — tune against the wild-cohort logs.
+_SAP_ABS_TOL = 0.5
+_REL_TOL = 0.01
+_KG_PER_TONNE = 1000.0
+
+
+class CalculatorShadow(ABC):
+    """Runs SAP10 Calculation in shadow beside the load-bearing baseline write
+    and reports divergence from Lodged Performance (ADR-0013).
+
+    The calculator is not yet load-bearing — it is still being hardened, and a
+    large test cohort is about to flow through baselining. So an implementation
+    **must never raise**: a shadow failure may not abort the batch (ADR-0012's
+    all-or-nothing governs only the load-bearing Lodged/Effective write). It
+    observes, compares against Lodged, and logs; it does not feed Effective
+    Performance. The seam is retired when the calculator is promoted to the
+    Rebaseliner and its output *becomes* Effective Performance.
+    """
+
+    @abstractmethod
+    def observe(
+        self,
+        *,
+        property_id: int,
+        effective_epc: "EpcPropertyData",
+        lodged: Performance,
+    ) -> None: ...
+
+
+def _relative_diff(calculated: float, lodged: float) -> float:
+    """|calculated − lodged| / |lodged|; a zero lodged value diverges iff
+    calculated is non-zero (avoids a divide-by-zero on degenerate certs)."""
+    if lodged == 0:
+        return 0.0 if calculated == 0 else float("inf")
+    return abs(calculated - lodged) / abs(lodged)
+
+
+class Calculator(Protocol):
+    """The slice of `Sap10Calculator` the shadow needs: cert in, result out.
+    `Sap10Calculator` satisfies it structurally — no coupling to its module."""
+
+    def calculate(self, epc: "EpcPropertyData") -> "SapResult": ...
+
+
+class LoggingCalculatorShadow(CalculatorShadow):
+    """Runs the calculator and logs, never persists, never raises (ADR-0013).
+
+    A strict-raise (an un-mapped cert) is caught and logged at ``error`` so the
+    wild-cohort gap is greppable; a successful result whose SAP / PEUI / CO2
+    diverges from Lodged beyond tolerance is logged at ``warning``. Every line
+    is tagged with ``property_id`` and the cert's ``sap_version`` so SAP-10.2
+    divergence (a real calculator signal) is separable from older-spec drift.
+    """
+
+    def __init__(self, calculator: Calculator) -> None:
+        self._calculator = calculator
+
+    def observe(
+        self,
+        *,
+        property_id: int,
+        effective_epc: "EpcPropertyData",
+        lodged: Performance,
+    ) -> None:
+        sap_version = effective_epc.sap_version
+        try:
+            # Broad by design: the point is to discover *what* breaks in the
+            # wild, and a shadow failure must never abort the batch (ADR-0013).
+            result = self._calculator.calculate(effective_epc)
+        except Exception as exc:
+            logger.error(
+                "SAP10 shadow calculation failed for property_id=%s "
+                "sap_version=%s: %r",
+                property_id,
+                sap_version,
+                exc,
+            )
+            return
+        if abs(result.sap_score_continuous - lodged.sap_score) > _SAP_ABS_TOL:
+            self._warn_divergence(
+                quantity="sap_score",
+                property_id=property_id,
+                sap_version=sap_version,
+                lodged=lodged.sap_score,
+                calculated=result.sap_score_continuous,
+            )
+        if _relative_diff(
+            result.primary_energy_kwh_per_m2, lodged.primary_energy_intensity
+        ) > _REL_TOL:
+            self._warn_divergence(
+                quantity="primary_energy_intensity",
+                property_id=property_id,
+                sap_version=sap_version,
+                lodged=lodged.primary_energy_intensity,
+                calculated=result.primary_energy_kwh_per_m2,
+            )
+        # Lodged CO2 is tonnes/yr; the calculator emits kg/yr (ADR-0013).
+        calculated_co2_t = result.co2_kg_per_yr / _KG_PER_TONNE
+        if _relative_diff(calculated_co2_t, lodged.co2_emissions) > _REL_TOL:
+            self._warn_divergence(
+                quantity="co2_emissions",
+                property_id=property_id,
+                sap_version=sap_version,
+                lodged=lodged.co2_emissions,
+                calculated=calculated_co2_t,
+            )
+
+    def _warn_divergence(
+        self,
+        *,
+        quantity: str,
+        property_id: int,
+        sap_version: Optional[float],
+        lodged: float,
+        calculated: float,
+    ) -> None:
+        logger.warning(
+            "SAP10 shadow divergence on %s for property_id=%s sap_version=%s: "
+            "lodged=%s calculated=%s",
+            quantity,
+            property_id,
+            sap_version,
+            lodged,
+            calculated,
+        )
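Reviewer sketch: the three divergence checks in `observe`, pulled out as a standalone function so the thresholds can be tried on plain numbers. The constants and `relative_diff` mirror the module above; `diverging` and its flat argument list are hypothetical, introduced only for this illustration.

```python
SAP_ABS_TOL = 0.5
REL_TOL = 0.01
KG_PER_TONNE = 1000.0


def relative_diff(calculated: float, lodged: float) -> float:
    """|calculated - lodged| / |lodged|, with a guard for a zero lodged value."""
    if lodged == 0:
        return 0.0 if calculated == 0 else float("inf")
    return abs(calculated - lodged) / abs(lodged)


def diverging(
    lodged_sap: float,
    lodged_peui: float,
    lodged_co2_t: float,
    calc_sap: float,
    calc_peui: float,
    calc_co2_kg: float,
) -> list[str]:
    """Names of the quantities that would trigger a warning."""
    out: list[str] = []
    if abs(calc_sap - lodged_sap) > SAP_ABS_TOL:
        out.append("sap_score")
    if relative_diff(calc_peui, lodged_peui) > REL_TOL:
        out.append("primary_energy_intensity")
    # The calculator reports kg/yr; Lodged is tonnes/yr, so convert first.
    if relative_diff(calc_co2_kg / KG_PER_TONNE, lodged_co2_t) > REL_TOL:
        out.append("co2_emissions")
    return out
```

Both comparisons are strict, so a continuous SAP exactly 0.5 away from the lodged score stays silent, and a zero lodged PEUI or CO2 warns for any non-zero calculated value because the relative difference is treated as infinite.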
--- a/orchestration/property_baseline_orchestrator.py
+++ b/orchestration/property_baseline_orchestrator.py
@ -6,6 +6,7 @@ from datatypes.epc.domain.epc_property_data import (
    EpcPropertyData,
    RenewableHeatIncentive,
 )
+from domain.property_baseline.calculator_shadow import CalculatorShadow
 from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance
 from domain.property_baseline.performance import lodged_performance
 from domain.property_baseline.rebaseliner import Rebaseliner
@ -32,9 +33,11 @@ class PropertyBaselineOrchestrator:
        *,
        unit_of_work: Callable[[], UnitOfWork],
        rebaseliner: Rebaseliner,
+        calculator_shadow: CalculatorShadow,
    ) -> None:
        self._unit_of_work = unit_of_work
        self._rebaseliner = rebaseliner
+        self._calculator_shadow = calculator_shadow

    def run(self, property_ids: list[int]) -> None:
        with self._unit_of_work() as uow:
@ -54,6 +57,14 @@ class PropertyBaselineOrchestrator:
                    water_heating_kwh=rhi.water_heating_kwh,
                )
                uow.property_baseline.save(baseline, property_id)
+                # Shadow only: validate the calculator in the wild without
+                # gating the load-bearing write above (ADR-0013). `observe`
+                # never raises, so it cannot abort the batch.
+                self._calculator_shadow.observe(
+                    property_id=property_id,
+                    effective_epc=effective_epc,
+                    lodged=lodged,
+                )
            uow.commit()


--- a/tests/domain/property_baseline/test_calculator_shadow.py
+++ b/tests/domain/property_baseline/test_calculator_shadow.py
@ -0,0 +1,166 @@
+from __future__ import annotations
+
+import logging
+from typing import Optional
+
+import pytest
+
+from datatypes.epc.domain.epc import Epc
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from domain.property_baseline.calculator_shadow import LoggingCalculatorShadow
+from domain.property_baseline.performance import Performance
+from domain.sap10_calculator.calculator import SapResult
+from domain.sap10_calculator.exceptions import UnmappedSapCode
+
+
+def _epc(*, sap_version: Optional[float]) -> EpcPropertyData:
+    epc = object.__new__(EpcPropertyData)
+    epc.sap_version = sap_version
+    return epc
+
+
+def _lodged() -> Performance:
+    return Performance(
+        sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180
+    )
+
+
+def _sap_result(
+    *,
+    sap_score_continuous: float = 72.0,
+    primary_energy_kwh_per_m2: float = 180.0,
+    co2_kg_per_yr: float = 1800.0,
+) -> SapResult:
+    """A `SapResult` whose three compared quantities default to *matching*
+    `_lodged()`; each test perturbs one axis."""
+    return SapResult(
+        sap_score=round(sap_score_continuous),
+        sap_score_continuous=sap_score_continuous,
+        ecf=0.0,
+        total_fuel_cost_gbp=0.0,
+        co2_kg_per_yr=co2_kg_per_yr,
+        space_heating_kwh_per_yr=0.0,
+        space_cooling_kwh_per_yr=0.0,
+        fabric_energy_efficiency_kwh_per_m2_yr=0.0,
+        main_heating_fuel_kwh_per_yr=0.0,
+        main_2_heating_fuel_kwh_per_yr=0.0,
+        secondary_heating_fuel_kwh_per_yr=0.0,
+        space_cooling_fuel_kwh_per_yr=0.0,
+        hot_water_kwh_per_yr=0.0,
+        pumps_fans_kwh_per_yr=0.0,
+        lighting_kwh_per_yr=0.0,
+        primary_energy_kwh_per_yr=0.0,
+        primary_energy_kwh_per_m2=primary_energy_kwh_per_m2,
+        monthly=(),
+        intermediate={},
+    )
+
+
+class _RaisingCalculator:
+    def calculate(self, epc: EpcPropertyData) -> SapResult:
+        raise UnmappedSapCode("heat_emitter_type", 99)
+
+
+class _StubCalculator:
+    def __init__(self, result: SapResult) -> None:
+        self._result = result
+
+    def calculate(self, epc: EpcPropertyData) -> SapResult:
+        return self._result
+
+
+def test_observe_swallows_a_calculator_raise_and_logs_error(
+    caplog: pytest.LogCaptureFixture,
+) -> None:
+    # Arrange — the calculator strict-raises on a cert it cannot yet map.
+    shadow = LoggingCalculatorShadow(_RaisingCalculator())
+    epc = _epc(sap_version=10.2)
+
+    # Act — observe must not propagate the raise (ADR-0013: shadow is not
+    # load-bearing, so it cannot abort the batch).
+    with caplog.at_level(logging.ERROR):
+        shadow.observe(property_id=42, effective_epc=epc, lodged=_lodged())
+
+    # Assert — exactly one error record, tagged with property_id + sap_version
+    # and carrying the exception so the wild-cohort gap is greppable.
+    assert len(caplog.records) == 1
+    message = caplog.records[0].getMessage()
+    assert caplog.records[0].levelno == logging.ERROR
+    assert "property_id=42" in message
+    assert "sap_version=10.2" in message
+    assert "heat_emitter_type" in message
+
+
+def test_observe_warns_when_sap_diverges_beyond_half_a_point(
+    caplog: pytest.LogCaptureFixture,
+) -> None:
+    # Arrange — calculated SAP 75.0 vs lodged 72 is 3.0 out (> 0.5).
+    shadow = LoggingCalculatorShadow(
+        _StubCalculator(_sap_result(sap_score_continuous=75.0))
+    )
+    epc = _epc(sap_version=10.2)
+
+    # Act
+    with caplog.at_level(logging.WARNING):
+        shadow.observe(property_id=42, effective_epc=epc, lodged=_lodged())
+
+    # Assert — one warning, naming the diverging quantity + the tags.
+    assert len(caplog.records) == 1
+    message = caplog.records[0].getMessage()
+    assert caplog.records[0].levelno == logging.WARNING
+    assert "sap_score" in message
+    assert "property_id=42" in message
+    assert "sap_version=10.2" in message
+
+
+def test_observe_warns_when_peui_diverges_beyond_one_percent(
+    caplog: pytest.LogCaptureFixture,
+) -> None:
+    # Arrange — calculated PEUI 200 vs lodged 180 is ~11% out (> 1%).
+    shadow = LoggingCalculatorShadow(
+        _StubCalculator(_sap_result(primary_energy_kwh_per_m2=200.0))
+    )
+    epc = _epc(sap_version=10.2)
+
+    # Act
+    with caplog.at_level(logging.WARNING):
+        shadow.observe(property_id=42, effective_epc=epc, lodged=_lodged())
+
+    # Assert
+    assert len(caplog.records) == 1
+    assert "primary_energy_intensity" in caplog.records[0].getMessage()
+
+
+def test_observe_warns_when_co2_diverges_beyond_one_percent_after_kg_to_tonnes(
+    caplog: pytest.LogCaptureFixture,
+) -> None:
+    # Arrange — calculator emits kg/yr; 2000 kg = 2.0 t vs lodged 1.8 t (~11%).
+    shadow = LoggingCalculatorShadow(
+        _StubCalculator(_sap_result(co2_kg_per_yr=2000.0))
+    )
+    epc = _epc(sap_version=10.2)
+
+    # Act
+    with caplog.at_level(logging.WARNING):
+        shadow.observe(property_id=42, effective_epc=epc, lodged=_lodged())
+
+    # Assert — the kg→tonnes conversion is applied before comparison, so a
+    # matching 1800 kg would *not* fire (guarded by the silent-when-aligned test).
+    assert len(caplog.records) == 1
+    assert "co2_emissions" in caplog.records[0].getMessage()
+
+
+def test_observe_is_silent_when_the_calculator_agrees_with_lodged(
+    caplog: pytest.LogCaptureFixture,
+) -> None:
+    # Arrange — all three quantities at the matching defaults (SAP 72, PEUI 180,
+    # 1800 kg ≡ 1.8 t): nothing should be logged.
+    shadow = LoggingCalculatorShadow(_StubCalculator(_sap_result()))
+    epc = _epc(sap_version=10.2)
+
+    # Act
+    with caplog.at_level(logging.WARNING):
+        shadow.observe(property_id=42, effective_epc=epc, lodged=_lodged())
+
+    # Assert
+    assert caplog.records == []
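Reviewer sketch: the tests above pin `property_id=` and `sap_version=` into each message so the logs stay greppable. One way to aggregate warnings by spec version could look like the following; `count_divergences` and the regex are assumptions built on the message format in `_warn_divergence`, not part of this commit, and any prefix added by the logging handler is simply ignored.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the message emitted by `_warn_divergence`; a handler prefix
# (timestamp, level) may precede it, hence `search` rather than `match`.
_DIVERGENCE = re.compile(
    r"SAP10 shadow divergence on (?P<quantity>\w+) "
    r"for property_id=(?P<property_id>\d+) sap_version=(?P<sap_version>\S+):"
)


def count_divergences(lines: Iterable[str]) -> Counter:
    """Count divergence warnings per (sap_version, quantity)."""
    counts: Counter = Counter()
    for line in lines:
        match = _DIVERGENCE.search(line)
        if match:
            counts[(match["sap_version"], match["quantity"])] += 1
    return counts
```

Keying on `sap_version` first keeps SAP-10.2 divergence apart from older-spec drift, which is the separation ADR-0013 asks of the tags.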
--- a/tests/orchestration/fakes.py
+++ b/tests/orchestration/fakes.py
@ -10,6 +10,8 @@ from types import TracebackType
 from typing import Any, Optional

 from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from domain.property_baseline.calculator_shadow import CalculatorShadow
+from domain.property_baseline.performance import Performance
 from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance
 from domain.property.properties import Properties
 from domain.property.property import Property
@ -88,6 +90,23 @@ class FakePropertyBaselineRepo(PropertyBaselineRepository):
        raise NotImplementedError


+class FakeCalculatorShadow(CalculatorShadow):
+    """Records each `observe` call so a test can assert the orchestrator runs
+    the shadow per property without dragging in the real calculator."""
+
+    def __init__(self) -> None:
+        self.observed: list[tuple[int, EpcPropertyData, Performance]] = []
+
+    def observe(
+        self,
+        *,
+        property_id: int,
+        effective_epc: EpcPropertyData,
+        lodged: Performance,
+    ) -> None:
+        self.observed.append((property_id, effective_epc, lodged))
+
+
 class FakeUnitOfWork(UnitOfWork):
    """A unit that holds in-memory repos and counts commits."""

--- a/tests/orchestration/test_ara_first_run_pipeline_integration.py
+++ b/tests/orchestration/test_ara_first_run_pipeline_integration.py
@ -36,6 +36,7 @@ from repositories.geospatial.geospatial_repository import GeospatialRepository
 from repositories.materials.materials_repository import MaterialsRepository
 from repositories.postgres_unit_of_work import PostgresUnitOfWork
 from repositories.scenario.scenario_repository import ScenarioRepository
+from tests.orchestration.fakes import FakeCalculatorShadow

 _JSON_SAMPLES = Path(__file__).resolve().parents[2] / "backend/epc_api/json_samples"

@ -111,7 +112,9 @@ def test_first_run_baselines_through_repos_and_is_idempotent_on_rerun(
            solar_fetcher=_UnusedSolarFetcher(),
        ),
        baseline=PropertyBaselineOrchestrator(
-            unit_of_work=unit_of_work, rebaseliner=StubRebaseliner()
+            unit_of_work=unit_of_work,
+            rebaseliner=StubRebaseliner(),
+            calculator_shadow=FakeCalculatorShadow(),
        ),
        modelling=ModellingOrchestrator(
            scenario_repo=ScenarioRepository(),
--- a/tests/orchestration/test_property_baseline_orchestrator.py
+++ b/tests/orchestration/test_property_baseline_orchestrator.py
@ -13,6 +13,7 @@ from domain.property_baseline.rebaseliner import RebaselineNotImplemented, StubR
 from domain.property.property import Property, PropertyIdentity
 from orchestration.property_baseline_orchestrator import PropertyBaselineOrchestrator
 from tests.orchestration.fakes import (
+    FakeCalculatorShadow,
    FakePropertyBaselineRepo,
    FakePropertyRepo,
    FakeUnitOfWork,
@ -37,6 +38,34 @@ def _property(*, sap_version: float) -> Property:
    )


+def test_run_invokes_the_calculator_shadow_per_property_and_still_persists() -> None:
+    # Arrange
+    property_baseline_repo = FakePropertyBaselineRepo()
+    shadow = FakeCalculatorShadow()
+    prop = _property(sap_version=10.2)
+    uow = FakeUnitOfWork(
+        property=FakePropertyRepo({10: prop}),
+        property_baseline=property_baseline_repo,
+    )
+    orchestrator = PropertyBaselineOrchestrator(
+        unit_of_work=lambda: uow,
+        rebaseliner=StubRebaseliner(),
+        calculator_shadow=shadow,
+    )
+
+    # Act
+    orchestrator.run([10])
+
+    # Assert — the load-bearing write + single commit are unchanged, and the
+    # shadow observed the Effective EPC + Lodged Performance once (ADR-0013).
+    lodged = Performance(
+        sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180
+    )
+    assert len(property_baseline_repo.saved) == 1
+    assert uow.commits == 1
+    assert shadow.observed == [(10, prop.effective_epc, lodged)]
+
+
 def test_run_establishes_persists_and_commits_the_batch_once() -> None:
    # Arrange
    property_baseline_repo = FakePropertyBaselineRepo()
@ -45,7 +74,9 @@ def test_run_establishes_persists_and_commits_the_batch_once() -> None:
        property_baseline=property_baseline_repo,
    )
    orchestrator = PropertyBaselineOrchestrator(
-        unit_of_work=lambda: uow, rebaseliner=StubRebaseliner()
+        unit_of_work=lambda: uow,
+        rebaseliner=StubRebaseliner(),
+        calculator_shadow=FakeCalculatorShadow(),
    )

    # Act
@ -79,7 +110,9 @@ def test_run_raises_on_a_pre_sap10_property_and_does_not_commit() -> None:
        property_baseline=property_baseline_repo,
    )
    orchestrator = PropertyBaselineOrchestrator(
-        unit_of_work=lambda: uow, rebaseliner=StubRebaseliner()
+        unit_of_work=lambda: uow,
+        rebaseliner=StubRebaseliner(),
+        calculator_shadow=FakeCalculatorShadow(),
    )

    # Act / Assert — the raise propagates; the batch is neither persisted nor