Model/docs/adr/0003-strict-ingestion-modelling-separation.md

# Strict separation between Ingestion and Modelling

**Status: Accepted, refined by [ADR-0011](0011-composable-stage-orchestrators.md).** The one-way flow below stands. ADR-0011 generalises the chaining rule from "only a `RefreshOrchestrator` may chain" to *"only a top-level use-case pipeline orchestrator (e.g. `FirstRunPipeline`) may chain across the Ingestion→Modelling seam; the stage orchestrators communicate through repos and never call across it."*


Data flows one way only: **Ingestion → Repos → Modelling**. Modelling services never make external HTTP calls; Ingestion services never run business logic. If Modelling needs fresh data and finds only a stale record in a repo, it reports the staleness and returns; the caller (a refresh orchestrator or the front end) decides whether to ingest first. We considered letting modelling services call fetchers directly on a cache miss, which would be convenient, and rejected it.
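The flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `EpcRecord`, `InMemoryEpcRepo`, `ModellingResult`, the `max_age` staleness rule, and the `fetch` callable are all hypothetical names. `RefreshOrchestrator` follows this ADR's original wording; ADR-0011 generalises it to any top-level pipeline orchestrator.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class EpcRecord:  # hypothetical record shape
    property_id: str
    rating: str
    fetched_at: datetime


class InMemoryEpcRepo:
    """The seam: Ingestion writes here, Modelling only reads."""

    def __init__(self) -> None:
        self._rows: Dict[str, EpcRecord] = {}

    def save(self, record: EpcRecord) -> None:
        self._rows[record.property_id] = record

    def get(self, property_id: str) -> Optional[EpcRecord]:
        return self._rows.get(property_id)


@dataclass(frozen=True)
class ModellingResult:
    status: str  # "ok" or "stale"
    rating: Optional[str] = None


class ModellingService:
    """Reads the repo only; holds no fetcher and no HTTP client."""

    def __init__(self, repo: InMemoryEpcRepo, max_age: timedelta) -> None:
        self._repo = repo
        self._max_age = max_age

    def run(self, property_id: str, now: datetime) -> ModellingResult:
        record = self._repo.get(property_id)
        if record is None or now - record.fetched_at > self._max_age:
            # Surface staleness to the caller; do not repair it here.
            return ModellingResult(status="stale")
        return ModellingResult(status="ok", rating=record.rating)


class IngestionService:
    """Fetches and stores; no business logic. `fetch` stands in for the external call."""

    def __init__(self, repo: InMemoryEpcRepo, fetch: Callable[[str], str]) -> None:
        self._repo = repo
        self._fetch = fetch

    def refresh(self, property_id: str, now: datetime) -> None:
        self._repo.save(EpcRecord(property_id, self._fetch(property_id), now))


class RefreshOrchestrator:
    """The only place where Ingestion and Modelling are called in sequence."""

    def __init__(self, ingestion: IngestionService, modelling: ModellingService) -> None:
        self._ingestion = ingestion
        self._modelling = modelling

    def run(self, property_id: str, now: datetime) -> ModellingResult:
        result = self._modelling.run(property_id, now)
        if result.status == "stale":
            # The caller, not the modelling service, decides to ingest first.
            self._ingestion.refresh(property_id, now)
            result = self._modelling.run(property_id, now)
        return result
```

In this sketch the two services never reference each other; they share only the repo, and the decision to refetch lives entirely in the orchestrator.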

The trade-off is that modelling cannot "self-heal" by going to the government EPC API when it finds stale data. The benefit is that modelling becomes a deterministic function of repository state: same Property in the repos, same modelling output. That is the property that makes modelling unit-testable against fakes (no DB, no network, no ML lambda), reproducible, and debuggable. It also enables a per-property UI flow where fetched data is shown to the user for review and possible override **before** modelling runs.
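A small test against a fake repo illustrates the determinism claim. Everything here is an assumption made for illustration: the `Property` fields, `FakePropertyRepo`, the `model_score` function, and the per-rating factors are hypothetical and do not reflect the real modelling logic.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Property:  # hypothetical fields
    property_id: str
    floor_area_m2: float
    epc_rating: str


class FakePropertyRepo:
    """In-memory stand-in for a real repo: no DB, no network."""

    def __init__(self) -> None:
        self._rows: Dict[str, Property] = {}

    def save(self, prop: Property) -> None:
        self._rows[prop.property_id] = prop

    def get(self, property_id: str) -> Property:
        return self._rows[property_id]


# Hypothetical per-rating factors, for illustration only.
_FACTOR = {"A": 0.5, "B": 0.75, "C": 1.0, "D": 1.25}


def model_score(repo: FakePropertyRepo, property_id: str) -> float:
    """Depends only on repo state, so equal state gives equal output."""
    prop = repo.get(property_id)
    return prop.floor_area_m2 * _FACTOR[prop.epc_rating]


repo = FakePropertyRepo()
repo.save(Property("p1", floor_area_m2=80.0, epc_rating="C"))

# Same Property in the repo, same modelling output.
assert model_score(repo, "p1") == model_score(repo, "p1") == 80.0

# A user override is written to the repo before modelling runs,
# so modelling picks it up without knowing where the value came from.
repo.save(replace(repo.get("p1"), epc_rating="B"))
assert model_score(repo, "p1") == 60.0
```

Because the override lands in the repo like any ingested value, the review step needs no special path through the modelling code.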

Under the rushed timeline this constraint is more valuable, not less. Mixing fetchers into services is the easy thing to do when shipping fast; once it's done it's hard to extract.

## Consequences

- Every modelling service depends only on Repos (and other Services / domain logic). No HTTP libraries in the modelling import graph.
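- A top-level orchestrator is the only thing that calls Ingestion then Modelling in sequence; nothing else may. This ADR originally named `RefreshOrchestrator`; ADR-0011 generalises the role to any top-level use-case pipeline orchestrator (e.g. `FirstRunPipeline`).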
- "Modelling is stale, refetch in-line" is a forbidden pattern — surface staleness, do not silently repair it.
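One way the "no HTTP libraries in the modelling import graph" rule could be checked automatically is a small static scan built on the standard-library `ast` module. This is a sketch under assumptions: the banned list is illustrative, the function name is hypothetical, and it inspects only the direct imports of the source it is given, not transitive ones.

```python
import ast
from typing import FrozenSet, List

# Illustrative list of HTTP-capable top-level packages; adjust to the codebase.
BANNED_ROOTS: FrozenSet[str] = frozenset(
    {"requests", "httpx", "aiohttp", "urllib", "urllib3", "http"}
)


def forbidden_imports(source: str, banned: FrozenSet[str] = BANNED_ROOTS) -> List[str]:
    """Return the sorted banned modules that `source` imports directly."""
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute `from x.y import z`; relative imports are skipped.
            names = [node.module]
        else:
            continue
        found.extend(name for name in names if name.split(".")[0] in banned)
    return sorted(found)
```

A test suite could read each file under the modelling package, pass its text to `forbidden_imports`, and fail when the result is non-empty.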