temp commit to backup prd in case laptop dies

Daniel Roth 2026-06-08 16:48:37 +00:00
parent 209a0c401c
commit e5e67c203b


# PRD: Ventilation Audit Generator from MagicPlan
## Problem Statement
When a surveyor completes a MagicPlan survey for a property, the resulting floor plan data (rooms, windows, doors, ventilation measurements) needs to be transformed into a structured ventilation audit spreadsheet. Currently this transformation is manual — someone must extract plan data and populate a report by hand, which is slow and error-prone.
## Solution
An AWS Lambda (`audit-generator`) triggered via SQS receives a HubSpot deal ID, fetches the parsed MagicPlan `Plan` from the database, populates a pre-formatted `.xlsx` template with plan data, uploads the result to S3, and records it in `uploaded_files`. That record is what lets the UI show the user that an audit file exists for the deal.
## User Stories
1. As a coordinator, I want clicking a button in the UI to trigger generation of a ventilation audit spreadsheet, so that I do not have to manually populate it from the floor plan.
2. As a coordinator, I want the audit spreadsheet to be automatically populated with room, window, and door data from the MagicPlan survey, so that the data entry step is eliminated.
3. As a coordinator, I want the system to use a pre-formatted `.xlsx` template when generating the audit, so that conditional formatting and layout are preserved without requiring code changes.
4. As a coordinator, I want the UI to indicate whether a ventilation audit already exists for a deal, so that I do not trigger generation again when it is not needed.
5. As a coordinator, I want re-triggering generation to overwrite the previous audit file, so that I can regenerate after a corrected survey is uploaded.
6. As an engineer, I want the lambda to raise a clear error if no MagicPlan JSON has been uploaded for the deal, so that misconfigured triggers are diagnosed quickly.
7. As an engineer, I want the lambda to raise a distinct error if a MagicPlan JSON exists but has not yet been parsed into the database, so that timing issues are distinguishable from missing data.
8. As an engineer, I want the generated spreadsheet recorded in `uploaded_files` with a `VENTILATION_AUDIT` file type, so that the UI and other systems can query for its existence.
9. As an engineer, I want the audit template to be resolved from an environment variable, so that different templates can be used in staging and production without a code deploy.
10. As an engineer, I want the lambda to follow the `@subtask_handler()` pattern, so that it integrates with the task orchestration system and benefits from standard error handling and observability.
11. As an engineer, I want the spreadsheet cells to be written via named ranges defined in the template, so that template layout changes do not require code changes.
## Implementation Decisions
- **Lambda pattern**: `@subtask_handler()` decorator. Trigger body contains `task_id`, `sub_task_id`, and `hubspot_deal_id`.
- **MAGIC_PLAN_JSON lookup**: Query `uploaded_files` filtered by `hubspot_deal_id` and `file_type = MAGIC_PLAN_JSON`, ordered by `s3_upload_timestamp DESC`, taking the most recent row. Rationale: a re-upload supersedes the earlier file.
- **Plan retrieval**: Use the existing `MagicPlanPostgresRepository.get_plan_by_uploaded_file_id` to fetch the parsed domain `Plan` from postgres. The lambda does not re-parse from S3 — that is the magic_plan lambda's responsibility.
- **Error handling (two distinct cases)**:
  - No `uploaded_files` row found → raise with message indicating no MagicPlan has been uploaded for this deal.
  - Row found but `get_plan_by_uploaded_file_id` returns `None` → raise with message indicating the plan has been uploaded but not yet parsed.
  - Both use the same exception type; distinct messages enable diagnosis in CloudWatch.
- **Spreadsheet generation**:
  - Format: `.xlsx` via `openpyxl`.
  - The template `.xlsx` is downloaded from S3 at the key given by env var `AUDIT_TEMPLATE_S3_KEY`.
  - The template is loaded into memory (`openpyxl.load_workbook(BytesIO(template_bytes))`), populated, and serialised back to bytes for upload.
  - Cell targeting uses named ranges defined in the template (`workbook.defined_names`). The initial stub implementation may use fixed cell addresses as a placeholder until the template is finalised.
  - The template does not exist yet, so the initial implementation stubs the template download and population step: the lambda generates a minimal valid `.xlsx` (e.g. one row per room with name and area) without a template.
- **Output S3 key**: `documents/hubspot_deal_id/{hubspot_deal_id}/ventilation_audit.xlsx`. Re-running the lambda overwrites the previous file.
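- **Operation order**: S3 upload first, then the `uploaded_files` DB insert. If the insert fails, the orphaned S3 file is harmless and will be overwritten on retry. The reverse order is worse: a failed upload would leave a DB record pointing to a non-existent file, and the UI would report an audit that does not exist.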
- **New enum values** (added to `FileTypeEnum` and `FileSourceEnum`):
  - `FileTypeEnum.VENTILATION_AUDIT = "ventilation_audit"`
  - `FileSourceEnum.AUDIT_GENERATOR = "audit_generator"`
- **New `UploadedFileRepository`**: A new repository (`UploadedFilePostgresRepository`) is introduced with at minimum a `get_latest_magic_plan_json_by_hubspot_deal_id(hubspot_deal_id: str) -> Optional[UploadedFile]` method. This queries the existing `uploaded_files` table via the existing SQLAlchemy model in `backend/app/db/models/uploaded_file.py`. Full DDD migration of the `UploadedFile` model to `infrastructure/postgres/` is out of scope for this PR.
- **Idempotency**: No duplicate guard. The lambda always overwrites and always inserts a new `uploaded_files` row. The UI surfaces whether a record exists; the timestamp on the most recent row is authoritative.
- **Environment variables**:
  - `S3_BUCKET_NAME` (shared convention)
  - `AUDIT_TEMPLATE_S3_KEY` (template location)
  - `DATABASE_URL` (shared convention)
- **Trigger**: The SQS message is sent by a UI action in a separate repo. No SQS publishing client is required in this PR.
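
The lookup, error handling, output key and operation order above can be sketched as a small orchestrator. This is a minimal sketch under stated assumptions, not the implementation: the `Protocol` interfaces, the `insert` and `upload_bytes` method names, `AuditGenerationError`, the single `UploadedFile.id` field and the injected `render_workbook` callable are all hypothetical. Only `get_latest_magic_plan_json_by_hubspot_deal_id`, `get_plan_by_uploaded_file_id`, the two new enum values and the output key format come from this PRD.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional, Protocol


class FileTypeEnum(str, Enum):
    VENTILATION_AUDIT = "ventilation_audit"  # partial: only the new member


class FileSourceEnum(str, Enum):
    AUDIT_GENERATOR = "audit_generator"  # partial: only the new member


class AuditGenerationError(Exception):
    """Hypothetical exception type shared by both failure cases."""


@dataclass(frozen=True)
class UploadedFile:
    id: int  # hypothetical subset of the uploaded_files columns


class UploadedFileRepository(Protocol):
    def get_latest_magic_plan_json_by_hubspot_deal_id(
        self, hubspot_deal_id: str
    ) -> Optional[UploadedFile]: ...

    def insert(  # hypothetical write method for the new row
        self,
        hubspot_deal_id: str,
        s3_key: str,
        file_type: FileTypeEnum,
        file_source: FileSourceEnum,
    ) -> None: ...


class PlanRepository(Protocol):
    def get_plan_by_uploaded_file_id(self, uploaded_file_id: int) -> Optional[Any]: ...


class S3Uploader(Protocol):
    # Hypothetical method name; the real S3Client API may differ.
    def upload_bytes(self, bucket: str, key: str, data: bytes) -> None: ...


def audit_s3_key(hubspot_deal_id: str) -> str:
    return f"documents/hubspot_deal_id/{hubspot_deal_id}/ventilation_audit.xlsx"


class AuditGeneratorOrchestrator:
    def __init__(
        self,
        bucket: str,
        s3: S3Uploader,
        uploaded_files: UploadedFileRepository,
        plans: PlanRepository,
        render_workbook: Callable[[Any], bytes],
    ) -> None:
        self.bucket = bucket
        self.s3 = s3
        self.uploaded_files = uploaded_files
        self.plans = plans
        self.render_workbook = render_workbook

    def run(self, hubspot_deal_id: str) -> str:
        # Most recent MAGIC_PLAN_JSON row; a re-upload supersedes older ones.
        source = self.uploaded_files.get_latest_magic_plan_json_by_hubspot_deal_id(hubspot_deal_id)
        if source is None:
            raise AuditGenerationError(
                f"No MagicPlan JSON has been uploaded for deal {hubspot_deal_id}"
            )
        plan = self.plans.get_plan_by_uploaded_file_id(source.id)
        if plan is None:
            raise AuditGenerationError(
                f"MagicPlan JSON {source.id} for deal {hubspot_deal_id} "
                "has been uploaded but not yet parsed"
            )
        key = audit_s3_key(hubspot_deal_id)
        # Upload first: if the insert below fails, the orphaned object is
        # simply overwritten on retry.
        self.s3.upload_bytes(self.bucket, key, self.render_workbook(plan))
        self.uploaded_files.insert(
            hubspot_deal_id=hubspot_deal_id,
            s3_key=key,
            file_type=FileTypeEnum.VENTILATION_AUDIT,
            file_source=FileSourceEnum.AUDIT_GENERATOR,
        )
        return key
```

Injecting `render_workbook` is one way to keep the stubbed minimal workbook and a later template-based renderer interchangeable without touching the orchestrator.
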
## Testing Decisions
Good tests assert observable outputs given controlled inputs — they do not assert on internal call sequences or implementation details. Prefer mocking at the boundary of the system under test, not inside it.

**Handler tests** (`tests/applications/audit_generator/test_audit_generator_handler.py`):
- Test that an invalid trigger body raises `ValidationError`.
- Test that the orchestrator is constructed with values derived from env vars and the trigger body.
- Test that the handler returns the expected value on success.
- Use `handler.__wrapped__` to bypass the `@subtask_handler` decorator (prior art: `test_magic_plan_handler.py`).
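
The `handler.__wrapped__` approach works when the decorator is built with `functools.wraps`, which sets `__wrapped__` on the wrapper to the original function. The sketch below assumes that; `subtask_handler` here is a hypothetical stand-in for the real decorator, `AuditGeneratorConfig` and `build_config` are illustrative names, and `ValueError` stands in for whatever `ValidationError` the real trigger model raises.

```python
import functools
import os
from dataclasses import dataclass
from typing import Any, Callable, Mapping

REQUIRED_FIELDS = ("task_id", "sub_task_id", "hubspot_deal_id")


def subtask_handler() -> Callable:
    """Hypothetical stand-in for the real @subtask_handler() decorator."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # also sets wrapper.__wrapped__ = func
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # The real decorator adds task orchestration, error handling
            # and observability around this call.
            return func(*args, **kwargs)

        return wrapper

    return decorator


@dataclass(frozen=True)
class AuditGeneratorConfig:
    bucket: str
    template_key: str
    database_url: str
    hubspot_deal_id: str


def build_config(trigger_body: Mapping[str, Any], environ: Mapping[str, str]) -> AuditGeneratorConfig:
    # Reject trigger bodies that lack any required field.
    missing = [name for name in REQUIRED_FIELDS if not trigger_body.get(name)]
    if missing:
        raise ValueError(f"invalid trigger body, missing: {missing}")
    return AuditGeneratorConfig(
        bucket=environ["S3_BUCKET_NAME"],
        template_key=environ["AUDIT_TEMPLATE_S3_KEY"],
        database_url=environ["DATABASE_URL"],
        hubspot_deal_id=str(trigger_body["hubspot_deal_id"]),
    )


@subtask_handler()
def handler(trigger_body: Mapping[str, Any]) -> AuditGeneratorConfig:
    # A real handler would construct the orchestrator from this config and run it.
    return build_config(trigger_body, os.environ)
```

A test can set the three environment variables (for example with pytest's `monkeypatch.setenv`), call `handler.__wrapped__(body)`, and assert on the returned value without going through the decorator's side effects.
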

**Orchestrator tests** (`tests/orchestration/audit_generator/test_audit_generator_orchestrator.py`):
- Mock `S3Client`, `UploadedFileRepository`, and `MagicPlanPostgresRepository` with `MagicMock(spec=...)`.
- Test happy path: correct S3 key used for output upload; `uploaded_files` insert called with correct `file_type` and `file_source`.
- Test error path: raises with appropriate message when no `uploaded_files` row found.
- Test error path: raises with appropriate message when plan not in postgres.
- Prior art: `tests/orchestration/magic_plan/test_magic_plan_orchestrator.py`.
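
A sketch of the two error-path tests using `MagicMock(spec=...)`. The repository classes are hypothetical interface stubs standing in for the real ones, `load_plan` is an illustrative extract of the orchestrator's lookup step, and `RuntimeError` stands in for the real exception type. With `spec`, reading an attribute the real class does not define raises `AttributeError`, so a call to a misnamed repository method fails the test instead of passing silently. Plain `spec` does not check call signatures; `unittest.mock.create_autospec` does.

```python
from typing import Any, Optional
from unittest.mock import MagicMock


class UploadedFilePostgresRepository:
    """Hypothetical interface stub for the real repository."""

    def get_latest_magic_plan_json_by_hubspot_deal_id(self, hubspot_deal_id: str) -> Optional[Any]:
        raise NotImplementedError


class MagicPlanPostgresRepository:
    """Hypothetical interface stub for the real repository."""

    def get_plan_by_uploaded_file_id(self, uploaded_file_id: Any) -> Optional[Any]:
        raise NotImplementedError


def load_plan(hubspot_deal_id: str, uploaded_files: Any, plans: Any) -> Any:
    row = uploaded_files.get_latest_magic_plan_json_by_hubspot_deal_id(hubspot_deal_id)
    if row is None:
        raise RuntimeError(f"No MagicPlan JSON has been uploaded for deal {hubspot_deal_id}")
    plan = plans.get_plan_by_uploaded_file_id(row.id)
    if plan is None:
        raise RuntimeError(f"MagicPlan JSON for deal {hubspot_deal_id} has been uploaded but not yet parsed")
    return plan


def expect_error(fragment: str, uploaded_files: Any, plans: Any) -> None:
    # Assert on the observable output (the error message), not on call order.
    try:
        load_plan("42", uploaded_files, plans)
    except RuntimeError as exc:
        assert fragment in str(exc), str(exc)
    else:
        raise AssertionError("expected RuntimeError")


def test_raises_when_no_magic_plan_uploaded() -> None:
    uploaded_files = MagicMock(spec=UploadedFilePostgresRepository)
    uploaded_files.get_latest_magic_plan_json_by_hubspot_deal_id.return_value = None
    expect_error("No MagicPlan JSON", uploaded_files, MagicMock(spec=MagicPlanPostgresRepository))


def test_raises_when_plan_not_yet_parsed() -> None:
    uploaded_files = MagicMock(spec=UploadedFilePostgresRepository)
    uploaded_files.get_latest_magic_plan_json_by_hubspot_deal_id.return_value = MagicMock(id=7)
    plans = MagicMock(spec=MagicPlanPostgresRepository)
    plans.get_plan_by_uploaded_file_id.return_value = None
    expect_error("not yet parsed", uploaded_files, plans)


def test_spec_rejects_unknown_methods() -> None:
    uploaded_files = MagicMock(spec=UploadedFilePostgresRepository)
    try:
        uploaded_files.get_latest_uploaded_file  # not defined on the spec class
    except AttributeError:
        pass
    else:
        raise AssertionError("spec should reject unknown attributes")
```

In a pytest suite, `pytest.raises(SomeError, match=...)` would replace the `expect_error` helper.
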

**Repository tests** (`tests/repositories/uploaded_file/test_uploaded_file_postgres_repository.py`):
- Integration tests against a real postgres `Engine` (prior art: `tests/repositories/magic_plan/test_magic_plan_postgres_repository.py`).
- Test that `get_latest_magic_plan_json_by_hubspot_deal_id` returns the most recent row by `s3_upload_timestamp` when multiple rows exist.
- Test that it returns `None` when no matching row exists.
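
The "most recent row, else `None`" behaviour these tests pin down can be illustrated with an in-memory SQLite database. This is a stand-in only: the real tests run against postgres through SQLAlchemy, and the column subset and the `get_latest` / `make_db` helpers below are assumptions. Timestamps are stored here as ISO-8601 text in a single timezone, which sorts chronologically; postgres would use a real timestamp column.

```python
import sqlite3
from typing import Optional

# Hypothetical subset of the uploaded_files columns.
SCHEMA = """
CREATE TABLE uploaded_files (
    id INTEGER PRIMARY KEY,
    hubspot_deal_id TEXT NOT NULL,
    file_type TEXT NOT NULL,
    s3_key TEXT NOT NULL,
    s3_upload_timestamp TEXT NOT NULL
)
"""

# Latest row for one deal and file type, newest upload first.
LATEST_QUERY = """
SELECT id, s3_key
FROM uploaded_files
WHERE hubspot_deal_id = ? AND file_type = ?
ORDER BY s3_upload_timestamp DESC
LIMIT 1
"""


def get_latest(
    conn: sqlite3.Connection, hubspot_deal_id: str, file_type: str
) -> Optional[tuple[int, str]]:
    # fetchone() returns None when no row matches.
    return conn.execute(LATEST_QUERY, (hubspot_deal_id, file_type)).fetchone()


def make_db(rows: list[tuple[int, str, str, str, str]]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO uploaded_files VALUES (?, ?, ?, ?, ?)", rows)
    return conn
```

With SQLAlchemy, and assuming the existing model class is named `UploadedFile`, the equivalent statement would be roughly `select(UploadedFile).where(UploadedFile.hubspot_deal_id == deal_id, UploadedFile.file_type == FileTypeEnum.MAGIC_PLAN_JSON).order_by(UploadedFile.s3_upload_timestamp.desc()).limit(1)`.
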
## Out of Scope
- The SQS trigger — the UI button that sends the SQS message lives in a separate repo.
- The `.xlsx` template file itself — the template does not yet exist; the initial implementation stubs this step.
- Full DDD migration of the `UploadedFile` SQLAlchemy model from `backend/app/db/models/` to `infrastructure/postgres/` — this is a separate refactoring task.
- Named-range-based cell targeting — the stub uses fixed addresses or a minimal generated workbook; named ranges are the target interface once the template is designed.
- Any ventilation calculation or compliance logic — the spreadsheet is populated with raw plan data only.
## Further Notes
- The `audit-generator` application scaffold already exists at `applications/audit-generator/` with empty `handler.py` and `audit_generator_trigger_request.py` files.
- The `MagicPlanPostgresRepository.get_plan_by_uploaded_file_id` method (introduced in recent commits) is the correct entry point for fetching the parsed plan — no S3 re-parsing is needed.
- When the template is eventually created, it should define named ranges for every cell the lambda writes to. This decouples layout from code and means template redesigns require no code changes.
- The `openpyxl` library must be added to `applications/audit-generator/handler/requirements.txt`.
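
The named-range note above can be sketched without openpyxl: the code produces values keyed by named-range name from raw plan data, and checks that the template defines every name before anything is written. The `Room` shape and the `room_<n>_name` / `room_<n>_area_m2` naming scheme are hypothetical, since neither the template nor its names exist yet; a template with a variable number of rooms might instead name a single anchor cell and write rows below it. With openpyxl, the defined names would come from the workbook's `defined_names` mapping.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Room:
    # Hypothetical subset of the parsed MagicPlan Plan.
    name: str
    area_m2: float


def named_range_values(rooms: Iterable[Room]) -> dict[str, object]:
    """Map hypothetical template named ranges to raw plan values (no calculations)."""
    values: dict[str, object] = {}
    for index, room in enumerate(rooms, start=1):
        values[f"room_{index}_name"] = room.name
        values[f"room_{index}_area_m2"] = room.area_m2
    return values


def missing_named_ranges(values: dict[str, object], defined_names: Iterable[str]) -> list[str]:
    """Names the code would write that the template does not define, sorted."""
    return sorted(set(values) - set(defined_names))
```

Raising when `missing_named_ranges` returns a non-empty list would turn a template/code mismatch into a clear failure rather than a silently blank cell.
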