Model/domain/sap10_ml/schema.py
Khalim Conn-Kowlessar 68401c517a refactor: lift-and-shift packages/domain/src/domain/ml → domain/sap10_ml
Sibling migration to the sap10_calculator move — `domain.ml` now lives
at the root-level layout (`domain/sap10_ml/`) matching the pattern
already used by `domain.addresses`, `domain.tasks`, `domain.postcode`,
and `domain.sap10_calculator`.

Changes:

- `git mv packages/domain/src/domain/ml → domain/sap10_ml` (19 files;
  history preserved).
- Subpackage rename: `domain.ml` → `domain.sap10_ml`. 32 references
  rewritten across .py and .md files: 11 internal + 21 external
  (datatypes/epc/domain/mapper.py, 14 files in domain/sap10_calculator,
  2 backend tests, 2 ADRs, 1 README, 1 design doc).
- Path-string updates: `pytest.ini` `testpaths` entry
  `packages/domain/src/domain/ml/tests` → `domain/sap10_ml/tests` so
  ML tests stay in the default auto-discovered sweep. `CONTEXT.md`
  also updated.

`packages/domain/src/domain/` is now empty — the workspace `domain/`
tree has been fully migrated. Together with the `domain/__init__.py`
deletions from the sap10_calculator commit (29ac35cc), `domain` is
now a single root-level namespace package with subpackages
{addresses, sap10_calculator, sap10_ml, tasks} + the standalone
`postcode.py` module.

Verified:

- Focused sweep (backend mapper-chain + sap10_calculator worksheet
  e2e + golden fixtures): 99 passed / 19 failed — identical baseline.
- Wider sweep (all sap10_calculator + sap10_ml): 1654 passed / 20
  failed (same pre-existing failures).
- domain/sap10_ml/tests: 210/210 PASSED at new path.
- Pyright net-zero: heat_transmission.py 13, cert_to_inputs.py 35,
  mapper.py 33, rdsap_uvalues.py 1 (all unchanged from baseline).

Note: `packages/domain/pyproject.toml` still declares
`packages = ["src/domain"]` for the hatchling wheel — that target
directory is now empty and the wheel build is effectively a no-op.
Retiring the workspace package or repointing the wheel is a follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 13:01:35 +00:00


"""Schema dataclasses for EpcMlTransform — the cross-repo ML data contract.
Consumed by the AutoGluon training repo (and by anything else that reads the
transform's parquet output) to know each column's dtype, nullability, and meaning.
"""
from dataclasses import dataclass
@dataclass(frozen=True)
class ColumnSpec:
"""Specification of a single column in the EPC ML training dataset.
`categorical=True` signals that the column carries a categorical value (raw
strings emitted by the transform) and should be cast to `pd.Categorical` at
parquet write time. The schema module stays pandas-free; the cast happens at
the I/O boundary in `services/ml_training_data/`.
"""
dtype: type
nullable: bool = True
description: str = ""
categorical: bool = False
@dataclass(frozen=True)
class TransformSchema:
"""The cross-repo ML data contract emitted by EpcMlTransform.schema()."""
transform_version: str
feature_columns: dict[str, ColumnSpec]
target_columns: dict[str, ColumnSpec]
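
A minimal sketch of how a consumer might read this contract, assuming nothing beyond the two dataclasses above. The column names, the version string, and the helpers `categorical_columns` and `validate_row` are hypothetical and exist only for illustration; the dataclasses are repeated so the snippet runs on its own, and it stays pandas-free like the schema module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    dtype: type
    nullable: bool = True
    description: str = ""
    categorical: bool = False


@dataclass(frozen=True)
class TransformSchema:
    transform_version: str
    feature_columns: dict[str, ColumnSpec]
    target_columns: dict[str, ColumnSpec]


# Hypothetical schema: these column names and the version are made up.
example_schema = TransformSchema(
    transform_version="0.0.0-example",
    feature_columns={
        "total_floor_area": ColumnSpec(float, nullable=False, description="Floor area in m2"),
        "built_form": ColumnSpec(str, categorical=True, description="Raw built-form string"),
    },
    target_columns={
        "sap_score": ColumnSpec(float, nullable=False, description="Target score"),
    },
)


def categorical_columns(schema: TransformSchema) -> list[str]:
    """Return the sorted names of columns flagged categorical=True.

    An I/O layer could use this list to decide which columns to cast
    to a categorical type before writing parquet.
    """
    all_columns = {**schema.feature_columns, **schema.target_columns}
    return sorted(name for name, spec in all_columns.items() if spec.categorical)


def validate_row(schema: TransformSchema, row: dict) -> list[str]:
    """Check one row (a plain dict) against dtype and nullability.

    Returns a list of human-readable problems; an empty list means the row fits.
    """
    problems = []
    all_columns = {**schema.feature_columns, **schema.target_columns}
    for name, spec in all_columns.items():
        value = row.get(name)
        if value is None:
            if not spec.nullable:
                problems.append(f"{name}: missing but not nullable")
        elif not isinstance(value, spec.dtype):
            problems.append(f"{name}: expected {spec.dtype.__name__}")
    return problems
```

Because both dataclasses are frozen, a schema built this way cannot have its fields reassigned after construction, which suits a contract shared across repos. Note that the column dicts themselves remain ordinary mutable dicts.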