Strict separation between Ingestion and Modelling

Data flows one way only: Ingestion → Repos → Modelling. Modelling services never make external HTTP calls; Ingestion services never run business logic. If Modelling finds a stale record in a repo, it reports the staleness and returns; the caller (a refresh orchestrator or the FE) decides whether to ingest first. We considered allowing modelling services to call fetchers directly on a cache miss, which would be convenient, and rejected it.

The trade-off is that modelling cannot "self-heal" by going to the gov EPC API when it finds stale data. The benefit is that modelling becomes a deterministic function of repository state: same Property in the repos, same modelling output. That is the property that makes modelling unit-testable against fakes (no DB, no network, no ML lambda), reproducible, and debuggable. It also enables a per-property UI flow where fetched data is shown to the user for review and possible override before modelling runs.
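To make the "deterministic function of repository state" claim concrete, here is a minimal sketch of a modelling service tested against an in-memory fake. All names (`EpcRecord`, `EpcRepo`, `HeatDemandService`, `FakeEpcRepo`), the 30-day staleness threshold, and the toy formula are illustrative assumptions, not the project's actual types or rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class EpcRecord:  # hypothetical repo record
    property_id: str
    floor_area_m2: float
    fetched_at: datetime


class EpcRepo(Protocol):  # hypothetical repo interface
    def get(self, property_id: str) -> Optional[EpcRecord]: ...


@dataclass(frozen=True)
class ModellingResult:
    status: str  # "ok", "stale" or "missing"
    heat_demand_kwh: Optional[float] = None


class HeatDemandService:
    """Depends only on a repo; imports no HTTP client."""

    MAX_AGE = timedelta(days=30)  # assumed staleness threshold
    KWH_PER_M2 = 120.0            # toy coefficient, for illustration only

    def __init__(self, repo: EpcRepo) -> None:
        self._repo = repo

    def model(self, property_id: str, now: datetime) -> ModellingResult:
        record = self._repo.get(property_id)
        if record is None:
            return ModellingResult(status="missing")
        if now - record.fetched_at > self.MAX_AGE:
            # Surface staleness to the caller; never refetch in-line.
            return ModellingResult(status="stale")
        return ModellingResult("ok", record.floor_area_m2 * self.KWH_PER_M2)


class FakeEpcRepo:
    """In-memory stand-in for the real repo: no DB, no network."""

    def __init__(self, records: Dict[str, EpcRecord]) -> None:
        self._records = dict(records)

    def get(self, property_id: str) -> Optional[EpcRecord]:
        return self._records.get(property_id)


now = datetime(2024, 6, 1)
repo = FakeEpcRepo({
    "p1": EpcRecord("p1", 80.0, datetime(2024, 5, 20)),  # fresh
    "p2": EpcRecord("p2", 50.0, datetime(2024, 1, 1)),   # stale
})
service = HeatDemandService(repo)
fresh_result = service.model("p1", now)
stale_result = service.model("p2", now)
missing_result = service.model("p3", now)
```

Passing `now` as an argument rather than reading the clock inside the service is one way to keep the output a function of its inputs and the repo contents alone.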

Under the rushed timeline this constraint is more valuable, not less. Mixing fetchers into services is the easy thing to do when shipping fast; once it's done it's hard to extract.

Consequences

Every modelling service depends only on Repos (and other Services / domain logic). No HTTP libraries in the modelling import graph.
A RefreshOrchestrator is the only thing that calls Ingestion then Modelling in sequence; nothing else may.
"Modelling is stale, refetch in-line" is a forbidden pattern — surface staleness, do not silently repair it.
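The consequences above can be sketched as follows. This is a hedged illustration only: the `run` signature, the `Outcome` type, and the fake ingest and model callables are assumptions made for the example, not the real `RefreshOrchestrator` API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Outcome:  # hypothetical modelling outcome
    status: str  # "ok" or "stale"
    value: Optional[float] = None


class RefreshOrchestrator:
    """The single place where Ingestion and Modelling are sequenced."""

    def __init__(
        self,
        ingest: Callable[[str], None],
        model: Callable[[str], Outcome],
    ) -> None:
        self._ingest = ingest
        self._model = model

    def run(self, property_id: str, ingest_first: bool) -> Outcome:
        # The caller decides whether to ingest; modelling never does.
        if ingest_first:
            self._ingest(property_id)
        return self._model(property_id)


# Fakes standing in for the real Ingestion and Modelling services.
store: Dict[str, dict] = {"p1": {"floor_area_m2": 80.0, "fresh": False}}
calls: List[str] = []


def fake_ingest(property_id: str) -> None:
    calls.append("ingest")
    store[property_id] = {"floor_area_m2": 80.0, "fresh": True}


def fake_model(property_id: str) -> Outcome:
    calls.append("model")
    record = store[property_id]
    if not record["fresh"]:
        return Outcome("stale")  # surfaced, not silently repaired
    return Outcome("ok", record["floor_area_m2"] * 120.0)


orchestrator = RefreshOrchestrator(fake_ingest, fake_model)
first = orchestrator.run("p1", ingest_first=False)   # staleness is reported
second = orchestrator.run("p1", ingest_first=True)   # caller opted to refresh
```

In this sketch the stale outcome travels back to the caller, who then chooses to rerun with `ingest_first=True`; the modelling callable itself has no way to trigger a fetch.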
