UploadedFile, FileTypeEnum, FileSourceEnum importable from infrastructure.postgres.uploaded_file_table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 13:10:47 +00:00 · 2026-06-09 11:42:53 +00:00 · 2026-06-09 11:42:53 +00:00 · 41b282042f
commit 41b282042f
parent e5e67c203b
5 changed files with 76 additions and 22 deletions
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -65,6 +65,16 @@ _Avoid_: user input, raw address, user_inputed_address
 The reference cohort matched to a target Property by both geographic proximity (postcode prefix / UPRN range) and physical similarity (property type, built form, age band); used by the EPC Prediction Service for gap-filling and anomaly detection.
 _Avoid_: neighbours, similar properties, peer set

+### Survey documents
+
+**Ventilation Audit**:
+A machine-generated `.xlsx` spreadsheet produced by the `audit-generator` Lambda from a property's parsed **MagicPlan Plan**. Fields written per room: room name, width, length, area. Per window: dimensions, opening type, number of openings, percent openable (`pct_openable`), trickle vent count, and area per vent. Per door: width and undercut. Internal doors appear once per room they connect (so typically twice). Columns requiring human knowledge (Blocked, Pictured, FP reference numbers, door location labels) are left blank for the coordinator to complete. Recorded in `uploaded_files` with `file_type = VENTILATION_AUDIT` and `file_source = AUDIT_GENERATOR`. Distinct from a **PAS 2023 Ventilation** document, which is uploaded externally by a human.
+_Avoid_: ventilation report, audit report, PAS ventilation (that is the external survey form)
+
+**PAS 2023 Ventilation**:
+A ventilation survey document produced by a human assessor and uploaded from an external source (e.g. Coordination Hub). Recorded in `uploaded_files` with `file_type = PAS_2023_VENTILATION`. Distinct from a **Ventilation Audit**, which is machine-generated from MagicPlan floor plan data.
+_Avoid_: ventilation audit (that is the generated output)
+
 ### Source data

 **Site Notes**:
--- a/applications/audit-generator/d1_ventilation_template.xlsx
+++ b/applications/audit-generator/d1_ventilation_template.xlsx
--- a/backlog/ventilation-audit-generator.md
+++ b/backlog/ventilation-audit-generator.md
@@ -18,9 +18,7 @@ An AWS Lambda (`audit-generator`) triggered via SQS receives a HubSpot deal ID,
 6. As an engineer, I want the lambda to raise a clear error if no MagicPlan JSON has been uploaded for the deal, so that misconfigured triggers are diagnosed quickly.
 7. As an engineer, I want the lambda to raise a distinct error if a MagicPlan JSON exists but has not yet been parsed into the database, so that timing issues are distinguishable from missing data.
 8. As an engineer, I want the generated spreadsheet recorded in `uploaded_files` with a `VENTILATION_AUDIT` file type, so that the UI and other systems can query for its existence.
-9. As an engineer, I want the audit template to be resolved from an environment variable, so that different templates can be used in staging and production without a code deploy.
-10. As an engineer, I want the lambda to follow the `@subtask_handler()` pattern, so that it integrates with the task orchestration system and benefits from standard error handling and observability.
-11. As an engineer, I want the spreadsheet cells to be written via named ranges defined in the template, so that template layout changes do not require code changes.
+9. As an engineer, I want the lambda to follow the `@subtask_handler()` pattern, so that it integrates with the task orchestration system and benefits from standard error handling and observability.

 ## Implementation Decisions

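User stories 6 to 8 ask for two distinguishable failures (no MagicPlan JSON uploaded vs. uploaded but not yet parsed) and an `uploaded_files` record on success. A minimal sketch of that orchestration flow, assuming a `uow_factory` that yields an object exposing the `uploaded_file` and `magic_plan` repositories described below under Session management. `generate_ventilation_audit`, `build_xlsx`, `MagicPlanJsonNotFoundError`, `MagicPlanNotParsedError`, the `MAGIC_PLAN_JSON` enum member, `S3Client.upload_bytes`, and the repository `add` method are all hypothetical names, not the project's API; the enums are local stand-ins.

```python
from enum import Enum
from typing import Any, Callable


class FileTypeEnum(str, Enum):
    # Local stand-in. MAGIC_PLAN_JSON is a hypothetical member; only the
    # VENTILATION_AUDIT value comes from the backlog.
    MAGIC_PLAN_JSON = "magic_plan_json"
    VENTILATION_AUDIT = "ventilation_audit"


class FileSourceEnum(str, Enum):
    AUDIT_GENERATOR = "audit_generator"


class MagicPlanJsonNotFoundError(Exception):
    """No MagicPlan JSON has been uploaded for the deal (misconfigured trigger)."""


class MagicPlanNotParsedError(Exception):
    """The JSON was uploaded but is not yet parsed into Postgres (timing issue)."""


def generate_ventilation_audit(
    hubspot_deal_id: str,
    s3_client: Any,
    uow_factory: Callable[[], Any],
    build_xlsx: Callable[[Any], bytes],
) -> str:
    """Hypothetical orchestrator: fetch the parsed plan, upload the audit, record it."""
    key = f"documents/hubspot_deal_id/{hubspot_deal_id}/ventilation_audit.xlsx"
    with uow_factory() as uow:
        source = uow.uploaded_file.get_latest_by_hubspot_deal_id(
            hubspot_deal_id, FileTypeEnum.MAGIC_PLAN_JSON
        )
        if source is None:
            raise MagicPlanJsonNotFoundError(
                f"No MagicPlan JSON uploaded for HubSpot deal {hubspot_deal_id}"
            )
        plan = uow.magic_plan.get_plan_by_uploaded_file_id(source.id)
        if plan is None:
            raise MagicPlanNotParsedError(
                f"MagicPlan JSON {source.id} for deal {hubspot_deal_id} is not parsed yet"
            )
        # Hypothetical S3Client method; the fixed key means re-runs overwrite.
        s3_client.upload_bytes(key, build_xlsx(plan))
        # Hypothetical insert method; uploaded_files rows are only ever appended.
        uow.uploaded_file.add(
            hubspot_deal_id=hubspot_deal_id,
            s3_key=key,
            file_type=FileTypeEnum.VENTILATION_AUDIT,
            file_source=FileSourceEnum.AUDIT_GENERATOR,
        )
        uow.commit()
    return key
```

In a unit test the UoW can be a `MagicMock` whose `__enter__` returns itself; `MagicMock.__exit__` returns `False` by default, so the two error types still propagate to the caller.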
@@ -37,10 +35,11 @@ An AWS Lambda (`audit-generator`) triggered via SQS receives a HubSpot deal ID,

 - **Spreadsheet generation**:
  - Format: `.xlsx` via `openpyxl`.
-  - The template `.xlsx` is downloaded from S3 at the key given by env var `AUDIT_TEMPLATE_S3_KEY`.
-  - The template is loaded into memory (`openpyxl.load_workbook(BytesIO(template_bytes))`), populated, and serialised back to bytes for upload.
-  - Cell targeting uses named ranges defined in the template (`workbook.defined_names`). The initial stub implementation may use fixed cell addresses as a placeholder until the template is finalised.
-  - The template does not exist yet. For the initial implementation, the template download and population step is stubbed — the lambda generates a minimal valid `.xlsx` (e.g. one row per room with name and area) without a template.
+  - The template `d1_ventilation_template.xlsx` is bundled with the lambda at `applications/audit-generator/d1_ventilation_template.xlsx` and loaded from the deployment package via `importlib.resources` or a path relative to the handler file. No S3 round-trip for the template.
+  - The template is loaded with `openpyxl.load_workbook(path)` (default `data_only=False` to preserve formulas), populated, and serialised to bytes via `BytesIO` for upload.
+  - Cell targeting uses fixed column letters (see Spreadsheet Layout below). Named ranges are not defined in the template.
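+  - The template has formulas in columns J (`=H*I`), N (`=J*M`), S (`=Q*R`), and Y (`=W*X`), repeated on each data row (for example `=H6*I6` in J6). The lambda does not write to these cells; openpyxl does not evaluate formulas, so the values are calculated by Excel or Google Sheets when the file is opened.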
+  - The template has 50 data rows (rows 6–55), extended programmatically when the template was prepared; the lambda does not add rows. The footer is a merged range at A56:Z56, and legend rows occupy 57–60.

 - **Output S3 key**: `documents/hubspot_deal_id/{hubspot_deal_id}/ventilation_audit.xlsx`. Re-running the lambda overwrites the previous file.

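Loading the bundled template and writing only the non-formula cells can be sketched as below. This is a minimal sketch assuming `handler.py` lives in `applications/audit-generator/handler/` with the template one directory up, and that the deployment package preserves that layout. `template_path`, `write_cells`, `workbook_to_bytes`, and `FORMULA_COLUMNS` are hypothetical helpers; `write_cells` refuses columns J, N, S, and Y so a layout mistake fails loudly instead of overwriting a template formula.

```python
import re
from io import BytesIO
from pathlib import Path
from typing import Any, Mapping

TEMPLATE_FILENAME = "d1_ventilation_template.xlsx"
# Columns holding per-row formulas in the template; the lambda never writes them.
FORMULA_COLUMNS = frozenset({"J", "N", "S", "Y"})
_CELL_REF = re.compile(r"^([A-Z]+)([1-9][0-9]*)$")


def template_path(handler_file: str) -> Path:
    """Resolve the bundled template relative to the handler module.

    Assumes handler.py sits in applications/audit-generator/handler/ and the
    template one directory up, with the same layout in the deployment package.
    """
    return Path(handler_file).resolve().parent.parent / TEMPLATE_FILENAME


def write_cells(worksheet: Any, cells: Mapping[str, Any]) -> None:
    """Assign values by A1-style reference, refusing the formula columns."""
    for ref, value in cells.items():
        match = _CELL_REF.match(ref)
        if match is None:
            raise ValueError(f"Not an A1 cell reference: {ref!r}")
        if match.group(1) in FORMULA_COLUMNS:
            raise ValueError(f"{ref} holds a template formula and must not be written")
        worksheet[ref] = value  # openpyxl worksheets accept ws["B6"] = value


def workbook_to_bytes(workbook: Any) -> bytes:
    """Serialise a workbook (anything with .save(file_obj)) to bytes for upload."""
    buffer = BytesIO()
    workbook.save(buffer)
    return buffer.getvalue()
```

With openpyxl these would combine roughly as `wb = openpyxl.load_workbook(template_path(__file__))`, then `write_cells(wb["D1 Ventilation"], cells)`, then `workbook_to_bytes(wb)` for the upload body; `Workbook.save` accepts a file-like object such as `BytesIO`.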
@@ -50,13 +49,16 @@ An AWS Lambda (`audit-generator`) triggered via SQS receives a HubSpot deal ID,
  - `FileTypeEnum.VENTILATION_AUDIT = "ventilation_audit"`
  - `FileSourceEnum.AUDIT_GENERATOR = "audit_generator"`

- **New `UploadedFileRepository`**: A new repository (`UploadedFilePostgresRepository`) is introduced with at minimum a `get_latest_magic_plan_json_by_hubspot_deal_id(hubspot_deal_id: str) -> Optional[UploadedFile]` method. This queries the existing `uploaded_files` table via the existing SQLAlchemy model in `backend/app/db/models/uploaded_file.py`. Full DDD migration of the `UploadedFile` model to `infrastructure/postgres/` is out of scope for this PR.
+- **DDD migration of `UploadedFile`**: The existing `backend/app/db/models/uploaded_file.py` (SQLAlchemy `Base`) is replaced by `infrastructure/postgres/uploaded_file_table.py` (SQLModel). `FileTypeEnum`, `FileSourceEnum`, and `UploadedFile` all move there. The class name `UploadedFile` is kept (no `Model` suffix, since there is no domain counterpart). All seven consumers update their import path, and `backend/app/db/models/uploaded_file.py` is deleted. Because `UploadedFile` is now registered on `SQLModel.metadata`, the shared `tests/conftest.py` `db_engine` fixture must create the `file_type` and `file_source` enum types via raw SQL before calling `SQLModel.metadata.create_all(engine)`; otherwise table creation fails for all integration tests. PostgreSQL has no `CREATE TYPE IF NOT EXISTS`, so each type must be created idempotently by other means (for example, a `DO` block that ignores `duplicate_object`). The dedicated per-test conftest approach (Question 6) is therefore superseded.

- **Idempotency**: No duplicate guard. The lambda always overwrites and always inserts a new `uploaded_files` row. The UI surfaces whether a record exists; the timestamp on the most recent row is authoritative.
+- **New `UploadedFileRepository`**: A new repository (`UploadedFilePostgresRepository`) is introduced with a `get_latest_by_hubspot_deal_id(hubspot_deal_id: str, file_type: FileTypeEnum) -> Optional[UploadedFile]` method. Queries `uploaded_files` filtered by `hubspot_deal_id` and `file_type`, ordered by `s3_upload_timestamp DESC`, returning the most recent row.
+
+- **Session management**: A dedicated `AuditGeneratorUnitOfWork` context manager (standalone: it does not inherit from `PostgresUnitOfWork` or `UnitOfWork`) holds `uploaded_file: UploadedFilePostgresRepository` and `magic_plan: MagicPlanPostgresRepository`, both bound to the same session. It opens the session on `__enter__`, rolls back and closes it on `__exit__`, and exposes `commit()`. The handler holds a module-scoped engine (reused across warm Lambda invocations) and passes a `session_factory` callable to `AuditGeneratorUnitOfWork`, so a fresh session is created per invocation and is never long-lived.
+
+- **Idempotency**: No duplicate guard. `uploaded_files` is append-only — the lambda always inserts a new row; rows are never updated or deleted. The S3 file is always overwritten at the fixed key. The UI and any future queries treat the most recent row by `s3_upload_timestamp` as authoritative.

 - **Environment variables**:
  - `S3_BUCKET_NAME` (shared convention)
-  - `AUDIT_TEMPLATE_S3_KEY` (template location)
  - `DATABASE_URL` (shared convention)

 - **Trigger**: The SQS message is sent by a UI action in a separate repo. No SQS publishing client is required in this PR.
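The Session management decision can be sketched as a small standalone context manager. This is a minimal sketch, not the project's class: the repository constructors are injected as factories (`uploaded_file_repo_factory` and `magic_plan_repo_factory` are hypothetical parameters) so the block runs without SQLModel; in the real code they would build `UploadedFilePostgresRepository` and `MagicPlanPostgresRepository` around the session.

```python
from types import TracebackType
from typing import Any, Callable, Optional, Type

Session = Any  # stand-in for sqlmodel.Session so the sketch runs without a database


class AuditGeneratorUnitOfWork:
    """Standalone unit of work: both repositories share one short-lived session."""

    def __init__(
        self,
        session_factory: Callable[[], Session],
        uploaded_file_repo_factory: Callable[[Session], Any],
        magic_plan_repo_factory: Callable[[Session], Any],
    ) -> None:
        self._session_factory = session_factory
        self._uploaded_file_repo_factory = uploaded_file_repo_factory
        self._magic_plan_repo_factory = magic_plan_repo_factory
        self._session: Optional[Session] = None

    def __enter__(self) -> "AuditGeneratorUnitOfWork":
        # A fresh session per invocation; the engine behind the factory is reused.
        self._session = self._session_factory()
        self.uploaded_file = self._uploaded_file_repo_factory(self._session)
        self.magic_plan = self._magic_plan_repo_factory(self._session)
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        if self._session is None:
            return
        try:
            # Discard anything not explicitly committed
            # (for SQLAlchemy sessions this is a no-op after commit()).
            self._session.rollback()
        finally:
            self._session.close()
            self._session = None
        # Returning None lets any exception from the with-block propagate.

    def commit(self) -> None:
        if self._session is None:
            raise RuntimeError("commit() called outside the unit-of-work context")
        self._session.commit()
```

With SQLModel, the handler could create `engine = create_engine(os.environ["DATABASE_URL"])` once at module scope and pass `functools.partial(Session, engine)` as `session_factory`, which keeps the engine warm across invocations while each invocation gets its own session.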
@@ -72,28 +74,53 @@ Good tests assert observable outputs given controlled inputs — they do not ass
 - Use `handler.__wrapped__` to bypass the `@subtask_handler` decorator (prior art: `test_magic_plan_handler.py`).

 **Orchestrator tests** (`tests/orchestration/audit_generator/test_audit_generator_orchestrator.py`):
- Mock `S3Client`, `UploadedFileRepository`, and `MagicPlanPostgresRepository` with `MagicMock(spec=...)`.
- Test happy path: correct S3 key used for output upload; `uploaded_files` insert called with correct `file_type` and `file_source`.
- Test error path: raises with appropriate message when no `uploaded_files` row found.
- Test error path: raises with appropriate message when plan not in postgres.
- Prior art: `tests/orchestration/magic_plan/test_magic_plan_orchestrator.py`.
+- Mock `S3Client` with `MagicMock(spec=S3Client)`. Mock the `AuditGeneratorUnitOfWork` factory: the factory returns a mock UoW whose `__enter__` returns itself and whose `.uploaded_file` and `.magic_plan` attributes are mock repos.
+- Test happy path: correct S3 key used for output upload; `uploaded_files` insert called with correct `file_type` and `file_source`; `uow.commit()` called.
+- Test error path: when `uow.uploaded_file.get_latest_by_hubspot_deal_id` returns `None`, raises the missing-MagicPlan-JSON error with a clear message (user story 6).
+- Test error path: when `uow.magic_plan.get_plan_by_uploaded_file_id` returns `None`, raises the distinct not-yet-parsed error (user story 7).

 **Repository tests** (`tests/repositories/uploaded_file/test_uploaded_file_postgres_repository.py`):
- Integration tests against a real postgres `Engine` (prior art: `tests/repositories/magic_plan/test_magic_plan_postgres_repository.py`).
- Test that `get_latest_magic_plan_json_by_hubspot_deal_id` returns the most recent row by `s3_upload_timestamp` when multiple rows exist.
+- Integration tests using the shared `db_engine` fixture. The fixture already calls `SQLModel.metadata.create_all(engine)`; after the DDD migration `UploadedFile` is in `SQLModel.metadata`, so no dedicated conftest is needed. The shared `tests/conftest.py` must create the `file_type` and `file_source` enum types idempotently before `create_all` (PostgreSQL does not support `CREATE TYPE IF NOT EXISTS`).
+- Test that `get_latest_by_hubspot_deal_id` returns the most recent row by `s3_upload_timestamp` when multiple rows with the same `file_type` exist.
 - Test that it returns `None` when no matching row exists.
+- Test that it filters correctly by `file_type` (a row with a different `file_type` is not returned).

 ## Out of Scope

 - The SQS trigger — the UI button that sends the SQS message lives in a separate repo.
- The `.xlsx` template file itself — the template does not yet exist; the initial implementation stubs this step.
- Full DDD migration of the `UploadedFile` SQLAlchemy model from `backend/app/db/models/` to `infrastructure/postgres/` — this is a separate refactoring task.
- Named-range-based cell targeting — the stub uses fixed addresses or a minimal generated workbook; named ranges are the target interface once the template is designed.
 - Any ventilation calculation or compliance logic — the spreadsheet is populated with raw plan data only.

+## Spreadsheet Layout
+
+Sheet name: `D1 Ventilation`. Data starts at row 6. The three series (rooms, windows, doors) run in parallel column groups and are filled independently, so a single row may hold an unrelated room, window, and door; the longest series determines the last row used.
+
+| Column | Content | Source |
+|--------|---------|--------|
+| B | Room name | `Room.name` |
+| D | Room area (m²) | `Room.area_m2` |
+| G | Window location (room name) | `Room.name` (parent room) |
+| H | Window width (m) | `Window.width_m` |
+| I | Window height (m) | `Window.height_m` |
+| J | Window area (m²) | **formula** `=H*I` — do not write |
+| K | Opening type | `WindowVentilation.opening_type` |
+| L | Number of openings | `WindowVentilation.num_openings` |
+| M | % of window (decimal) | `WindowVentilation.pct_openable / 100` |
+| N | Total opening area (m²) | **formula** `=J*M` — do not write |
+| O | Blocked | leave blank (visual check by auditor) |
+| P | Pictured | leave blank (visual check by auditor) |
+| Q | Trickle vent effective area per vent (mm²) | `WindowVentilation.trickle_vent_area_mm2` |
+| R | Number of trickle vents | `WindowVentilation.num_trickle_vents` |
+| S | Total trickle vent area (mm²) | **formula** `=Q*R` — do not write |
+| V | Door location (room name) | `Room.name` (parent room) |
+| W | Door width (mm) | `Door.width_mm` |
+| X | Door undercut (mm) | `DoorVentilation.undercut_mm` |
+| Y | Door area (mm²) | **formula** `=W*X` — do not write |
+
+Internal doors appear once per room they connect (typically twice). `WindowVentilation` and `DoorVentilation` fields are `Optional`; write `0` when `None` so the formula cells (J, N, S, Y) always evaluate to a number (an input cell holding text, such as the string `"None"`, would make them show `#VALUE!`).
+
 ## Further Notes

 - The `audit-generator` application scaffold already exists at `applications/audit-generator/` with empty `handler.py` and `audit_generator_trigger_request.py` files.
- The `MagicPlanPostgresRepository.get_plan_by_uploaded_file_id` method (introduced in recent commits) is the correct entry point for fetching the parsed plan — no S3 re-parsing is needed.
- When the template is eventually created, it should define named ranges for every cell the lambda writes to. This decouples layout from code and means template redesigns require no code changes.
+- The `MagicPlanPostgresRepository.get_plan_by_uploaded_file_id` method is the correct entry point for fetching the parsed plan — no S3 re-parsing is needed.
 - The `openpyxl` library must be added to `applications/audit-generator/handler/requirements.txt`.
+- The template (`d1_ventilation_template.xlsx`) has 50 data rows (rows 6–55) with formulas in columns J, N, S, Y. If a property has more than 50 rooms, windows, or doors, the lambda should raise a clear error rather than silently truncating.
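The Spreadsheet Layout rules (three independent series from row 6, `pct_openable` converted to a decimal, `0` for missing numeric ventilation values, formula columns untouched, and an error past row 55) can be sketched as a pure function returning a `{cell_reference: value}` mapping. The dataclasses are simplified hypothetical stand-ins for the parsed plan (`Window.ventilation` and `Door.ventilation` in particular are assumed attributes), `layout_cells` and `TooManyRowsError` are hypothetical names, and internal doors are assumed to already appear in the `doors` list of each room they connect. This sketch applies the zero default only to numeric fields and leaves a missing opening type blank.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

FIRST_DATA_ROW = 6
LAST_DATA_ROW = 55  # the template has 50 data rows (6-55)
CellValue = Union[str, int, float]


# Simplified, hypothetical stand-ins for the parsed MagicPlan plan.
@dataclass
class WindowVentilation:
    opening_type: Optional[str] = None
    num_openings: Optional[int] = None
    pct_openable: Optional[float] = None
    trickle_vent_area_mm2: Optional[float] = None
    num_trickle_vents: Optional[int] = None


@dataclass
class Window:
    width_m: float
    height_m: float
    ventilation: Optional[WindowVentilation] = None


@dataclass
class DoorVentilation:
    undercut_mm: Optional[float] = None


@dataclass
class Door:
    width_mm: float
    ventilation: Optional[DoorVentilation] = None


@dataclass
class Room:
    name: str
    area_m2: float
    windows: List[Window] = field(default_factory=list)
    # Assumed: an internal door is listed under every room it connects.
    doors: List[Door] = field(default_factory=list)


class TooManyRowsError(ValueError):
    """Raised instead of silently truncating when a series outgrows the template."""


def _zero_if_none(value: Optional[float]) -> float:
    return 0 if value is None else value


def layout_cells(rooms: List[Room]) -> Dict[str, CellValue]:
    """Map each series to A1 references; formula columns J, N, S, Y are never set."""
    room_rows = [{"B": room.name, "D": room.area_m2} for room in rooms]
    window_rows = []
    door_rows = []
    for room in rooms:
        for window in room.windows:
            vent = window.ventilation or WindowVentilation()
            window_rows.append({
                "G": room.name, "H": window.width_m, "I": window.height_m,
                "K": vent.opening_type,  # stays blank when unknown
                "L": _zero_if_none(vent.num_openings),
                "M": _zero_if_none(vent.pct_openable) / 100,  # percent -> decimal
                "Q": _zero_if_none(vent.trickle_vent_area_mm2),
                "R": _zero_if_none(vent.num_trickle_vents),
            })
        for door in room.doors:
            door_vent = door.ventilation or DoorVentilation()
            door_rows.append({"V": room.name, "W": door.width_mm,
                              "X": _zero_if_none(door_vent.undercut_mm)})

    capacity = LAST_DATA_ROW - FIRST_DATA_ROW + 1
    series = {"rooms": room_rows, "windows": window_rows, "doors": door_rows}
    for label, rows in series.items():
        if len(rows) > capacity:
            raise TooManyRowsError(
                f"{len(rows)} {label} exceed the template's {capacity} data rows")

    cells: Dict[str, CellValue] = {}
    for rows in series.values():
        # Each series fills downward from row 6, independently of the others.
        for offset, row in enumerate(rows):
            for column, value in row.items():
                if value is not None:
                    cells[f"{column}{FIRST_DATA_ROW + offset}"] = value
    return cells
```

Keeping the layout pure makes the overflow and `None` handling testable without opening a workbook; the resulting mapping can then be assigned cell by cell to the `D1 Ventilation` worksheet.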
--- a/tests/infrastructure/postgres/__init__.py
+++ b/tests/infrastructure/postgres/__init__.py
--- a/tests/infrastructure/postgres/test_uploaded_file_table.py
+++ b/tests/infrastructure/postgres/test_uploaded_file_table.py
@@ -0,0 +1,17 @@
+from infrastructure.postgres.uploaded_file_table import (
+    FileSourceEnum,
+    FileTypeEnum,
+    UploadedFile,
+)
+
+
+def test_file_type_enum_has_ventilation_audit() -> None:
+    assert FileTypeEnum.VENTILATION_AUDIT.value == "ventilation_audit"
+
+
+def test_file_source_enum_has_audit_generator() -> None:
+    assert FileSourceEnum.AUDIT_GENERATOR.value == "audit_generator"
+
+
+def test_uploaded_file_is_importable() -> None:
+    assert UploadedFile.__tablename__ == "uploaded_files"
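The backlog requires the shared `db_engine` fixture to create the `file_type` and `file_source` enum types before `SQLModel.metadata.create_all`, and the statement must be safe to run repeatedly. PostgreSQL has no `CREATE TYPE IF NOT EXISTS`, so a common workaround is an anonymous `DO` block that swallows the `duplicate_object` error. A minimal sketch of a statement builder follows; `create_enum_type_sql` is a hypothetical helper name, and the labels passed in must match what the `uploaded_files` columns actually store.

```python
from typing import Iterable


def create_enum_type_sql(type_name: str, labels: Iterable[str]) -> str:
    """Build an idempotent PostgreSQL enum creation statement.

    PostgreSQL has no CREATE TYPE IF NOT EXISTS, so the CREATE TYPE runs inside
    an anonymous DO block that ignores the duplicate_object error raised when
    the type already exists.
    """
    if not type_name.isidentifier():
        raise ValueError(f"Unsafe type name: {type_name!r}")
    # Escape single quotes by doubling them, as in any SQL string literal.
    quoted = ", ".join("'" + label.replace("'", "''") + "'" for label in labels)
    return (
        "DO $$ BEGIN "
        f"CREATE TYPE {type_name} AS ENUM ({quoted}); "
        "EXCEPTION WHEN duplicate_object THEN NULL; "
        "END $$;"
    )
```

In `tests/conftest.py` the statement could be run with `conn.execute(text(create_enum_type_sql("file_type", labels)))` inside `engine.begin()` before `create_all`. SQLAlchemy's `Enum` type persists member names rather than values unless `values_callable` is set, so `labels` should be built from `[m.name for m in FileTypeEnum]` or `[m.value for m in FileTypeEnum]` to match the column definition. An existing type with stale labels is left unchanged by this pattern.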