Model/docs/adr/0003-strict-ingestion-modelling-separation.md
Khalim Conn-Kowlessar 8291f29721 docs(ara): composable stage-orchestrator design (ADR-0011 + ADR-0003 amend + CONTEXT)
Records the grill-with-docs outcomes for the ara_first_run rebuild: three
composable stage orchestrators (Ingestion/Baseline/Modelling), one lambda per
use case chaining them through repos (not in-memory), and the Fetcher-vs-Repo
data-source taxonomy. Amends ADR-0003's chaining rule to generalise beyond
RefreshOrchestrator. Adds the pipeline-composition + First Run vocabulary to
CONTEXT.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00

16 lines
1.9 KiB
Markdown

# Strict separation between Ingestion and Modelling
**Status: Accepted, refined by [ADR-0011](0011-composable-stage-orchestrators.md).** The one-way flow below stands. ADR-0011 generalises the chaining rule: it is no longer "only a `RefreshOrchestrator` may chain" — it is *"only a top-level use-case pipeline orchestrator (e.g. `FirstRunPipeline`) may chain across the Ingestion→Modelling seam; the stage orchestrators communicate through repos and never call across it."*
Data flows one way only: **Ingestion → Repos → Modelling**. Modelling services never make external HTTP calls; Ingestion services never run business logic. If Modelling needs fresh data, it sees a stale record in a repo and returns; the caller (a refresh orchestrator or the FE) decides whether to ingest first. We considered allowing modelling services to call fetchers directly on cache miss — convenient — and rejected it.
The trade-off is that modelling cannot "self-heal" by going to the gov EPC API when it finds stale data. The benefit is that modelling becomes a deterministic function of repository state: same Property in the repos, same modelling output. That is the property that makes modelling unit-testable against fakes (no DB, no network, no ML lambda), reproducible, and debuggable. It also enables a per-property UI flow where fetched data is shown to the user for review and possible override **before** modelling runs.
Under the rushed timeline this constraint is more valuable, not less. Mixing fetchers into services is the easy thing to do when shipping fast; once it's done it's hard to extract.
## Consequences
- Every modelling service depends only on Repos (and other Services / domain logic). No HTTP libraries in the modelling import graph.
- A `RefreshOrchestrator` is the only thing that calls Ingestion then Modelling in sequence; nothing else may.
- "Modelling is stale, refetch in-line" is a forbidden pattern — surface staleness, do not silently repair it.