Strict separation between Ingestion and Modelling
Status: Accepted, refined by ADR-0011. The one-way flow below stands. ADR-0011 generalises the chaining rule: it is no longer "only a RefreshOrchestrator may chain" — it is "only a top-level use-case pipeline orchestrator (e.g. FirstRunPipeline) may chain across the Ingestion→Modelling seam; the stage orchestrators communicate through repos and never call across it."
Data flows one way only: Ingestion → Repos → Modelling. Modelling services never make external HTTP calls; Ingestion services never run business logic. If Modelling needs fresh data, it sees a stale record in a repo and returns; the caller (a refresh orchestrator or the FE) decides whether to ingest first. We considered allowing modelling services to call fetchers directly on cache miss — convenient — and rejected it.
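The "see stale data, return, let the caller decide" rule can be sketched as follows. This is a minimal illustration, not the project's code: `EpcRecord`, `Stale`, `ModelResult`, `InMemoryEpcRepo`, `model_property`, `refresh_then_model`, the 30-day `MAX_AGE` and the rating-to-score table are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional, Union


@dataclass(frozen=True)
class EpcRecord:
    property_id: str
    rating: str
    fetched_at: datetime


@dataclass(frozen=True)
class Stale:
    """Returned by modelling instead of fetching; the caller decides what to do."""
    property_id: str
    reason: str


@dataclass(frozen=True)
class ModelResult:
    property_id: str
    score: int


class InMemoryEpcRepo:
    """Stand-in repo; a real one would sit in front of a database."""

    def __init__(self) -> None:
        self._rows: Dict[str, EpcRecord] = {}

    def get(self, property_id: str) -> Optional[EpcRecord]:
        return self._rows.get(property_id)

    def save(self, record: EpcRecord) -> None:
        self._rows[record.property_id] = record


MAX_AGE = timedelta(days=30)  # assumed freshness threshold, for illustration only


def model_property(
    repo: InMemoryEpcRepo, property_id: str, now: datetime
) -> Union[ModelResult, Stale]:
    """Modelling side: reads the repo only and never calls a fetcher."""
    record = repo.get(property_id)
    if record is None:
        return Stale(property_id, "missing")
    if now - record.fetched_at > MAX_AGE:
        return Stale(property_id, "expired")
    score = {"A": 90, "B": 80}.get(record.rating, 50)  # placeholder business logic
    return ModelResult(property_id, score)


def refresh_then_model(
    repo: InMemoryEpcRepo,
    ingest: Callable[[str], None],
    property_id: str,
    now: datetime,
) -> Union[ModelResult, Stale]:
    """Caller side: the only place that runs Ingestion and then Modelling."""
    result = model_property(repo, property_id, now)
    if isinstance(result, Stale):
        ingest(property_id)  # Ingestion writes to the repo; Modelling re-reads it
        result = model_property(repo, property_id, now)
    return result
```

In this sketch the two stages share nothing but the repo, so the frontend could equally call `model_property`, show the `Stale` result to the user, and trigger ingestion as a separate step.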
The trade-off is that modelling cannot "self-heal" by going to the gov EPC API when it finds stale data. The benefit is that modelling becomes a deterministic function of repository state: same Property in the repos, same modelling output. That is the property that makes modelling unit-testable against fakes (no DB, no network, no ML lambda), reproducible, and debuggable. It also enables a per-property UI flow where fetched data is shown to the user for review and possible override before modelling runs.
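As an illustration of what "unit-testable against fakes" could look like, here is a minimal sketch. The `Property` fields, `PropertyRepo` protocol, `FakePropertyRepo`, `ExampleModellingService` and the per-rating factors are assumptions made up for the example; only the shape matters: the service receives a repo and its output is a function of that repo's contents.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class Property:
    property_id: str
    floor_area_m2: float
    epc_rating: str


class PropertyRepo(Protocol):
    def get(self, property_id: str) -> Property: ...


class FakePropertyRepo:
    """In-memory fake: no DB, no network."""

    def __init__(self, rows: Dict[str, Property]) -> None:
        self._rows = dict(rows)

    def get(self, property_id: str) -> Property:
        return self._rows[property_id]


class ExampleModellingService:
    """Hypothetical modelling service that depends only on a repo."""

    _FACTOR = {"A": 50.0, "B": 75.0, "C": 100.0}  # illustrative values only

    def __init__(self, repo: PropertyRepo) -> None:
        self._repo = repo

    def estimate(self, property_id: str) -> float:
        prop = self._repo.get(property_id)
        return prop.floor_area_m2 * self._FACTOR.get(prop.epc_rating, 150.0)


def test_same_repo_state_gives_same_output() -> None:
    rows = {"p1": Property("p1", 80.0, "B")}
    first = ExampleModellingService(FakePropertyRepo(rows)).estimate("p1")
    second = ExampleModellingService(FakePropertyRepo(rows)).estimate("p1")
    assert first == second == 6000.0
```

Because the service never reaches past the repo, the test needs no patching of HTTP clients and no fixtures beyond a dictionary.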
Under the rushed timeline this constraint is more valuable, not less. Mixing fetchers into services is the easy thing to do when shipping fast; once it's done it's hard to extract.
Consequences
- Every modelling service depends only on Repos (and other Services / domain logic). No HTTP libraries in the modelling import graph.
- A RefreshOrchestrator is the only thing that calls Ingestion then Modelling in sequence; nothing else may.
- "Modelling is stale, refetch in-line" is a forbidden pattern: surface staleness, do not silently repair it.
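One way the "no HTTP libraries in the modelling import graph" consequence might be enforced is a small static check run in CI. The sketch below is an assumption, not an existing project tool: the `FORBIDDEN` deny-list and the function names are hypothetical, and it inspects only the direct absolute imports of a single source string, not the transitive graph.

```python
import ast
from typing import Set

# Assumed deny-list of HTTP client packages; adjust to whatever the codebase uses.
FORBIDDEN = {"requests", "httpx", "urllib3", "aiohttp"}


def imported_top_level_modules(source: str) -> Set[str]:
    """Return the top-level package names imported absolutely by `source`."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped
            found.add(node.module.split(".")[0])
    return found


def forbidden_imports(source: str) -> Set[str]:
    """Names from the deny-list that `source` imports directly."""
    return imported_top_level_modules(source) & FORBIDDEN
```

A test could read every file under the modelling package and assert that `forbidden_imports` returns an empty set for each; catching indirect imports would need a real import-graph tool.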