Mirror of https://github.com/Hestia-Homes/Model.git, synced 2026-06-08 11:17:27 +00:00
Three orthogonal issues surfaced by the full project test sweep:

1. Dockerfile.test: install poppler-utils alongside postgresql.
   The 20× `pdfinfo: No such file or directory` failures in
   test_summary_pdf_mapper_chain.py traced to the CI test image missing
   the poppler-utils system package (pdfinfo + pdftotext).
   `_summary_pdf_to_textract_style_pages` shells out to these for
   layout-preserving PDF text extraction. Pure-Python alternatives
   (pymupdf, pypdf) don't reproduce pdftotext -layout's row-major table
   cell ordering, which the Elmhurst Summary extractor depends on.
   System poppler is therefore the right fix; it is added to the
   apt-get install with an explanatory comment.

2. test_from_rdsap_schema.py::test_total_floor_area: expected 55.0,
   got 45.82. Slice 95 (commit f502db8c) changed the API mapper to
   compute total_floor_area_m2 from the precise sum of per-bp
   sap_floor_dimensions[*].total_floor_area rather than from the lodged
   scalar. The synthetic 21_0_1.json fixture has a lodged
   total_floor_area of 55 but a single fd of 45.82, so the per-bp sum
   doesn't match the lodged value. Updated the expectation to 45.82,
   with a comment explaining the Slice 95 per-bp-sum precedence.

3. test_elmhurst_end_to_end.py::test_emitter_temperature: expected
   "Unknown", got int 1. Pre-existing failure (confirmed by checking out
   commit 985a59e1 and reproducing). `_elmhurst_emitter_temperature_int`
   in datatypes/epc/domain/mapper.py converts the Elmhurst Summary §14
   "Design flow temperature: Unknown" to SAP10.2 Table 4d code 1
   (high-temp / ≥45 °C, the worst case for unmeasured boilers). The int
   encoding mirrors the API mapper's MainHeatingDetail.emitter_temperature
   for cross-mapper field parity. The test now expects 1 (with a
   comment), since the conversion is the correct production behaviour.

Verified:
- Layer 4 1e-4 gate
  (test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly)
  still GREEN.
- Wider domain sweep (domain/sap10_calculator + domain/sap10_ml):
  1654 passed / 20 failed, matching the pre-fix baseline exactly.
- All three originally-failing tests now PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
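The Slice 95 precedence in item 2 can be stated in a few lines of code. The following is a minimal Python sketch, not the API mapper itself. `BuildingPart` and `FloorDimension` are hypothetical stand-ins for the RdSAP schema objects. The fallback to the lodged scalar when no per-bp area is present is an assumption; the commit does not say what happens in that case.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FloorDimension:
    """Hypothetical stand-in for one sap_floor_dimensions entry."""
    total_floor_area: Optional[float]


@dataclass
class BuildingPart:
    """Hypothetical stand-in for one building part ("bp") of a lodgement."""
    sap_floor_dimensions: list[FloorDimension] = field(default_factory=list)


def total_floor_area_m2(
    parts: list[BuildingPart], lodged_total: Optional[float]
) -> Optional[float]:
    """Prefer the precise sum of per-bp floor areas over the lodged scalar."""
    areas = [
        fd.total_floor_area
        for part in parts
        for fd in part.sap_floor_dimensions
        if fd.total_floor_area is not None
    ]
    if areas:
        return sum(areas)
    # ASSUMPTION: fall back to the lodged value only when no per-bp area exists.
    return lodged_total
```

With the 21_0_1.json shape (lodged 55, one fd of 45.82), the sketch returns 45.82, which is the value test_total_floor_area now expects.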
35 lines
1.2 KiB
Text
FROM python:3.11-slim

# System binaries:
# - postgresql: pytest-postgresql spawns ephemeral test databases
# - poppler-utils: provides pdfinfo / pdftotext, used by
#   backend/documents_parser/tests/test_summary_pdf_mapper_chain.py's
#   `_summary_pdf_to_textract_style_pages` helper for layout-preserving
#   PDF text extraction. Pure-Python alternatives (pymupdf, pypdf) don't
#   reproduce pdftotext -layout's row-major table cell ordering, which
#   the Elmhurst Summary extractor depends on.
RUN apt-get update \
    && apt-get install -y --no-install-recommends postgresql poppler-utils \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
ENV PYTHONPATH=/app

# Copy requirements first so Docker can cache the install layer
COPY backend/engine/requirements.txt backend/engine/requirements.txt
COPY backend/app/requirements/requirements.txt backend/app/requirements/requirements.txt
COPY test.requirements.txt test.requirements.txt

RUN pip install --no-cache-dir \
    -r backend/engine/requirements.txt \
    -r backend/app/requirements/requirements.txt \
    -r test.requirements.txt

# Copy source
COPY . .

# pg_ctl refuses to run as root, so create an unprivileged user
RUN useradd -m testuser && chown -R testuser /app
USER testuser

CMD ["pytest"]
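The poppler-utils comment in the Dockerfile says `_summary_pdf_to_textract_style_pages` shells out to pdfinfo and pdftotext. The following is a minimal Python sketch of that pattern, not the helper itself:

- Every function name is hypothetical.
- It returns plain per-page strings rather than whatever Textract-style structure the real helper builds.
- It relies on pdftotext ending each page with a form feed, which is its default unless `-nopgbrk` is passed.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical sketch; not the project's `_summary_pdf_to_textract_style_pages`.

FORM_FEED = "\f"  # pdftotext terminates each page with a form feed by default


def pdftotext_command(pdf_path: Path) -> list[str]:
    """Build a pdftotext call that keeps the physical layout and writes to stdout."""
    return ["pdftotext", "-layout", str(pdf_path), "-"]


def parse_pdfinfo_pages(pdfinfo_output: str) -> int:
    """Read the page count from the 'Pages:' line of pdfinfo output."""
    for line in pdfinfo_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Pages":
            return int(value.strip())
    raise ValueError("no 'Pages:' line in pdfinfo output")


def split_pages(raw_text: str) -> list[str]:
    """Split pdftotext output into per-page strings on form feeds."""
    pages = raw_text.split(FORM_FEED)
    # The form feed after the last page leaves an empty trailing element.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def require_binaries(*names: str) -> None:
    """Fail fast with a clear message instead of a bare FileNotFoundError."""
    missing = [name for name in names if shutil.which(name) is None]
    if missing:
        raise RuntimeError(f"missing system binaries {missing}; install poppler-utils")


def layout_pages(pdf_path: Path) -> list[str]:
    """Return layout-preserving text for each page of `pdf_path`."""
    require_binaries("pdfinfo", "pdftotext")
    info = subprocess.run(
        ["pdfinfo", str(pdf_path)], capture_output=True, text=True, check=True
    ).stdout
    expected = parse_pdfinfo_pages(info)
    raw = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, text=True, check=True
    ).stdout
    pages = split_pages(raw)
    if len(pages) != expected:
        raise RuntimeError(
            f"pdfinfo reports {expected} pages, pdftotext produced {len(pages)}"
        )
    return pages
```

Checking for the binaries with `shutil.which` up front turns the opaque `No such file or directory` failures described in the commit message into one actionable error whenever the image lacks poppler-utils.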