mirror of https://github.com/Hestia-Homes/Model.git synced 2026-07-27 23:35:01 +00:00


Khalim Conn-Kowlessar a1f89b6033 slice 15c: stream build_features so 500k+ cert runs fit memory Previously kept the full list of EpcPropertyData in memory before calling EpcMlTransform.to_rows. For the 25k slice that's ~30 MB; for the 580k full-2026 corpus it OOM-killed the process silently. Now: parse cert -> to_row -> append dict -> drop EpcPropertyData reference, so memory is O(row-dict * n) instead of O(EpcPropertyData * n). Same end-of-frame post-processing (categorical casts, column-order pin). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>		2026-05-17 00:36:53 +00:00
ara	added potential file scaffolding:	2026-05-15 10:56:53 +00:00
ml_training_data	slice 15c: stream build_features so 500k+ cert runs fit memory	2026-05-17 00:36:53 +00:00
README.md	added potential file scaffolding:	2026-05-15 10:56:53 +00:00
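
The streaming change described in the slice 15c commit message above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `ParsedCert`, `parse_cert`, `to_row`, and `build_rows_streaming` are hypothetical stand-ins for `EpcPropertyData`, the cert parser, the per-cert transform, and `build_features`. The point is only that each heavy parsed object is converted to a flat row and released before the next one is parsed.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class ParsedCert:
    """Hypothetical stand-in for a heavy parsed certificate object."""
    uprn: str
    floor_area: float
    raw: dict = field(default_factory=dict)  # bulky payload we do not want to retain


def parse_cert(record: dict) -> ParsedCert:
    """Hypothetical parser: builds the heavy object from one raw record."""
    return ParsedCert(
        uprn=record["uprn"],
        floor_area=float(record["floor_area"]),
        raw=record,
    )


def to_row(cert: ParsedCert) -> dict:
    """Hypothetical per-cert transform: keeps only the flat feature values."""
    return {"uprn": cert.uprn, "floor_area": cert.floor_area}


def build_rows_streaming(records: Iterable[dict]) -> list[dict]:
    """Parse, flatten, and release one cert at a time.

    Only the small row dicts accumulate; no list of parsed objects is kept.
    """
    rows: list[dict] = []
    for record in records:
        cert = parse_cert(record)
        rows.append(to_row(cert))
        del cert  # drop the reference to the heavy object before the next parse
    return rows


# Passing a generator keeps the input side lazy as well.
example_records = ({"uprn": str(i), "floor_area": "50.5"} for i in range(3))
example_rows = build_rows_streaming(example_records)
```

Any end-of-frame post-processing (such as the categorical casts and column-order pin the commit mentions) would still run once over the accumulated rows.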

README.md

Services

Each subdirectory is a deployable unit, typically a Lambda image. Each has its own `pyproject.toml`, its own `Dockerfile`, and its own dependencies. A Lambda bundle contains only that service's dependencies plus its workspace dependencies.

| Service | Purpose |
| --- | --- |
| `ara/` | The Domna retrofit modelling backend: ingestion + modelling pipelines, all 9 services in PRD §9.2. |

Other Domna services (address2uprn, hubspot, pashub, ecmk, magicplan) live in the legacy `backend/` and `etl/` trees for now. They are slated to migrate here as their owners pick them up (see PRD §11). When that work starts, scaffold the service under `services/<name>/` and add it to the workspace members in the root `pyproject.toml`.
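
As a rough illustration of the scaffolding step, the sketch below creates a new service directory with placeholder files. The helper name `scaffold_service` and the file contents are hypothetical; the real `pyproject.toml` fields, the `Dockerfile`, and the exact workspace table in the root `pyproject.toml` depend on the project's tooling and are not specified here.

```python
import tempfile
from pathlib import Path


def scaffold_service(repo_root: Path, name: str) -> str:
    """Create services/<name>/ with placeholder pyproject.toml and Dockerfile.

    Returns the relative path to add to the workspace members in the
    root pyproject.toml. File contents are illustrative placeholders.
    """
    service_dir = repo_root / "services" / name
    # exist_ok=False: fail loudly rather than overwrite an existing service.
    service_dir.mkdir(parents=True, exist_ok=False)

    (service_dir / "pyproject.toml").write_text(
        f'[project]\nname = "{name}"\nversion = "0.0.0"\ndependencies = []\n'
    )
    (service_dir / "Dockerfile").write_text(
        "# Placeholder: base image and build steps for this service\n"
    )
    return f"services/{name}"


# Demonstration against a throwaway directory rather than a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    member_path = scaffold_service(Path(tmp), "hubspot")
    print(f"Add to workspace members: {member_path}")
```

Adding the returned path to the workspace members is left as a manual edit here, since rewriting the root `pyproject.toml` programmatically would require knowing its actual layout.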

Service boundary

A service may import `domain.*`, `repos.*`, `fetchers.*`, and `utils.*` (workspace deps). It cannot import another service's modules: services are separate distributions with no cross-import path. This is the structural enforcement of the modelling/ingestion separation (ADR-0003).
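
The boundary itself is enforced by packaging, but the rule can also be expressed as a small static check. The sketch below is illustrative only: it assumes each service is importable under a top-level package named after its directory (for example `ara`), and the function name `boundary_violations` and the module names in the example are hypothetical.

```python
import ast


def boundary_violations(source: str, own_service: str, all_services: set[str]) -> list[str]:
    """Return imported module names that belong to a different service.

    Shared workspace packages (domain, repos, fetchers, utils) and third-party
    imports are never flagged, because only names in `all_services` are checked.
    """
    violations: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports stay in-service.
            modules = [node.module]
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            if top_level in all_services and top_level != own_service:
                violations.append(module)
    return violations


# Hypothetical source file belonging to a service named "hubspot".
example_source = (
    "import domain.models\n"
    "from repos.epc import EpcRepo\n"
    "from ara.pipeline import run\n"
)
example_violations = boundary_violations(example_source, "hubspot", {"ara", "hubspot"})
```

In this example only the import from `ara` is flagged; the `domain` and `repos` imports are workspace deps and pass.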