Commit graph

74 commits

Author SHA1 Message Date
Khalim Conn-Kowlessar
066dce19e3 Slice 46b: Elmhurst extractor parses windows from layout-style Summary PDFs
The legacy `_extract_windows` regex anchors on "Permanent Shutters\n" which is broken across lines by the pdftotext-layout preprocessor. New fallback `_extract_windows_from_layout` anchors on the two stable per-window markers — a "W H Area" data line and the "Manufacturer <U_value>" line a few lines further down — and tolerates the variable-order optional fields (glazing_gap, inline building_part, inline orientation) between them. Prefix/suffix tokens around the data block are re-joined into glazing_type / building_part / orientation strings.

Cert U985-0001-000474's 7 windows across Main + 2 extensions now flow through the mapper to EpcPropertyData.sap_windows (was 0). Textract-style extraction (existing fixture) is unchanged — the legacy path runs first and only falls through when its regex misses.
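
As an illustration only, the legacy-first, fallback-second flow described above might look like the sketch below. The two regexes, the `max_gap` window and the dictionary keys are assumptions (the commit does not show the real patterns), and `extract_windows` is a hypothetical wrapper rather than the extractor's actual entry point.

```python
import re
from typing import Callable, Dict, List

Window = Dict[str, float]

# Assumed line shapes; the real regexes are not shown in the commit message.
DATA_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$")
U_VALUE_LINE = re.compile(r"^\s*Manufacturer\s+(\d+(?:\.\d+)?)\s*$")


def extract_windows_from_layout(text: str, max_gap: int = 6) -> List[Window]:
    """Pair each W/H/Area line with the next Manufacturer line within max_gap lines."""
    lines = text.splitlines()
    windows: List[Window] = []
    i = 0
    while i < len(lines):
        data = DATA_LINE.match(lines[i])
        if data:
            # Optional fields between the two markers are skipped, whatever their order.
            for j in range(i + 1, min(i + 1 + max_gap, len(lines))):
                u_value = U_VALUE_LINE.match(lines[j])
                if u_value:
                    width, height, area = (float(g) for g in data.groups())
                    windows.append({
                        "width": width,
                        "height": height,
                        "area": area,
                        "u_value": float(u_value.group(1)),
                    })
                    i = j  # resume scanning after the closing marker
                    break
        i += 1
    return windows


def extract_windows(
    text: str,
    legacy: Callable[[str], List[Window]],
    fallback: Callable[[str], List[Window]] = extract_windows_from_layout,
) -> List[Window]:
    """Run the legacy extractor first; fall through only when it finds nothing."""
    windows = legacy(text)
    return windows if windows else fallback(text)
```

This sketch simply ignores the lines between the two markers, whereas the commit re-joins them into glazing_type / building_part / orientation strings.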

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 18:03:29 +00:00
Khalim Conn-Kowlessar
36f2c7bbdf Slice 46a: Elmhurst mapper handles multi-bp Summary PDFs — Summary_000474 chain test flips green
ElmhurstSiteNotes had no representation for extensions: singular dimensions / walls / roof / floor fields could only describe the main bp. Summary PDFs lodge "1st Extension" / "2nd Extension" subsections in §4, §7, §8, §9 with optional "As Main: Yes" inheritance. This slice:

- Adds `ExtensionPart` dataclass and `ElmhurstSiteNotes.extensions: List[ExtensionPart]`.
- Adds `_split_section_by_bp` helper + per-bp parsing of dimensions / walls / roof / floor in the extractor; "As Main" inherits from the main bp.
- Refactors `_map_elmhurst_building_part` into a parameterised builder; adds `_map_elmhurst_building_parts` that yields Main + one SapBuildingPart per extension (capped at 4 per RdSAP10 §1.2).
- Scaffold test `test_summary_000474_mapper_produces_three_building_parts` flips from strict-xfail to passing.

Single-bp behaviour is unchanged (empty extensions list defaults). 752 existing tests stay green.
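
A minimal sketch of the "As Main" inheritance and the extension cap, under stated assumptions: the field names, the `SiteNotes` stand-in and `resolve_building_parts` are hypothetical, sections are reduced to plain strings, and the cap of 4 is assumed to count extensions (the message does not say whether Main counts toward it).

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

MAX_EXTENSIONS = 4  # assumption: the cap of 4 counts extensions, not Main


@dataclass
class ExtensionPart:
    name: str
    # Section name -> parsed value, e.g. {"walls": "Solid"} (simplified to strings).
    sections: Dict[str, str] = field(default_factory=dict)
    # Sections lodged with "As Main: Yes".
    as_main: FrozenSet[str] = frozenset()


@dataclass
class SiteNotes:
    main: Dict[str, str]
    # An empty list keeps single-bp behaviour unchanged.
    extensions: List[ExtensionPart] = field(default_factory=list)


def resolve_building_parts(notes: SiteNotes) -> List[Tuple[str, Dict[str, str]]]:
    """Return Main followed by each extension, with "As Main" sections inherited."""
    parts = [("Main", dict(notes.main))]
    for ext in notes.extensions[:MAX_EXTENSIONS]:
        resolved = dict(ext.sections)
        for section in ext.as_main:
            if section in notes.main:
                resolved[section] = notes.main[section]
        parts.append((ext.name, resolved))
    return parts
```

With no extensions the function returns a single Main entry, which mirrors the unchanged single-bp path.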

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 17:55:13 +00:00
Khalim Conn-Kowlessar
ccf7aa2118 Scaffold: end-to-end Summary→EpcPropertyData chain test for 000474 (xfail)
The 6 worksheet fixtures build EpcPropertyData by hand, validating the cascade in isolation from the mapper. This commit lands the first half of the OTHER validation: Summary_000474.pdf → ElmhurstSiteNotesExtractor → from_elmhurst_site_notes → EpcPropertyData, asserting it produces the same shape as the hand-built fixture. Test is strict-xfail on sap_building_parts count (mapper produces 1, cert lodges 3). Includes a pdftotext-layout preprocessor that converts spatial label/value layout into the Textract-style sequence the existing extractor expects (test-only). Full punch list of 28 mapper-output diffs captured in project memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 17:40:06 +00:00
Khalim Conn-Kowlessar
883028c89e P6.1 follow-on: unbox BuildingPartIdentifier at backend boundaries
Threads the strict BuildingPartIdentifier type (introduced in a8b443f6)
through the two remaining backend touchpoints:

- EpcBuildingPartModel.from_*: SQLModel column expects a string, so
  unbox the enum with .identifier.value before binding to the DB.
- documents_parser end-to-end tests: swap bare-string equality
  ("main" / "extension_1") for identity checks against the enum
  members (BuildingPartIdentifier.MAIN / EXTENSION_1).

Documents_parser test pack passes (105/105). No dedicated SQLModel test
covers EpcBuildingPartModel.from_*; the .value line is exercised
transitively via db_writer.py / local_runner.py in production.
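
For illustration, the unboxing at the DB boundary and the identity-style test checks could be sketched as follows. The enum values are assumed to match the bare strings quoted above, and `BuildingPart` / `to_db_row` are hypothetical stand-ins for the real model code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class BuildingPartIdentifier(Enum):
    # Values assumed to equal the bare strings the tests previously compared against.
    MAIN = "main"
    EXTENSION_1 = "extension_1"


@dataclass
class BuildingPart:
    identifier: BuildingPartIdentifier


def to_db_row(part: BuildingPart) -> Dict[str, str]:
    """Unbox the enum so a string column receives a plain str."""
    return {"identifier": part.identifier.value}


part = BuildingPart(identifier=BuildingPartIdentifier.EXTENSION_1)
row = to_db_row(part)

# Domain code compares enum members by identity; only the boundary sees strings.
is_extension = part.identifier is BuildingPartIdentifier.EXTENSION_1
```

Keeping `.value` at the boundary means the strict type survives everywhere else in the domain model.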

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 09:58:23 +00:00
Daniel Roth
78da2f88b6 Handle wall thickness "Unmeasurable" 🟩 2026-04-30 16:41:16 +00:00
Daniel Roth
6c70c5a535 Extract address when Property photo element is missing from PDF 🟩 2026-04-30 16:25:41 +00:00
Daniel Roth
b347039b80 load ecmk site notes to db 2026-04-29 11:20:47 +00:00
Daniel Roth
252657a374 include updating epc_property_data to pashub to ara workflow 2026-04-29 09:55:14 +00:00
Daniel Roth
51bd18e0d7 Rename window frame material column 🟩 2026-04-27 16:11:32 +00:00
Daniel Roth
01ebb2e0e1 extract window frame details from elmhurst site notes 🟩 2026-04-27 16:04:02 +00:00
Daniel Roth
8f94bb5435 extract window frame details from elmhurst site notes 🟥 2026-04-27 15:50:25 +00:00
Daniel Roth
9571ed608c map elmhurst window transmission details to epc property data class 🟥 2026-04-27 14:13:02 +00:00
Daniel Roth
00821c5c23 map elmhurst energy fields to epc property data class 🟥 2026-04-27 12:15:28 +00:00
Daniel Roth
7a68fbcae9 extract energy fields from elmhurst site notes 🟩 2026-04-27 12:11:53 +00:00
Daniel Roth
444eaa0c06 extract energy fields from elmhurst site notes 🟥 2026-04-27 12:10:40 +00:00
Daniel Roth
b36c8b884c map remaining Elmhurst fields to EpcPropertyData 🟩 2026-04-24 15:33:59 +00:00
Daniel Roth
20ef8cd489 update local runner to work for elmhurst 2026-04-24 14:01:36 +00:00
Daniel Roth
15ae46ec92 Map Elmhurst site notes to EpcPropertyData 🟥 2026-04-24 13:37:21 +00:00
Daniel Roth
f61add9544 Extract Elmhurst site notes to dataclass 🟩 2026-04-24 13:32:08 +00:00
Daniel Roth
1a53a8d83e Extract Elmhurst site notes to dataclass 🟥 2026-04-24 13:13:24 +00:00
Daniel Roth
a8579db4d9 elmhurst site notes fixture 2026-04-24 13:09:30 +00:00
Daniel Roth
e15646c341 rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
Daniel Roth
b3096b52ad move local runner 2026-04-24 12:49:54 +00:00
Daniel Roth
691d6f04a8 extend EpcPropertyData domain model with site-notes-only fields 🟩 2026-04-23 14:50:41 +00:00
Daniel Roth
146b5999dc additional fields mapped from pdf 6 🟩 2026-04-23 11:23:29 +00:00
Daniel Roth
2bf8bc6d5e additional fields mapped from pdf 5 🟩 2026-04-23 11:12:25 +00:00
Daniel Roth
c5c3f3fc83 additional fields mapped from pdf 4 🟥 2026-04-23 10:36:15 +00:00
Daniel Roth
0a1ba404ad map electric storage heater fuel type 🟥 2026-04-21 15:34:34 +00:00
Daniel Roth
5bd4e22886 map pv connection 🟥 2026-04-21 15:31:01 +00:00
Daniel Roth
8b4b345f7a extract renewables pv connection 🟥 2026-04-21 15:17:34 +00:00
Daniel Roth
0e69f8e7a5 extract cylinder thermostat 🟥 2026-04-21 15:16:49 +00:00
Daniel Roth
22893e8645 map heating immersion type 🟥 2026-04-21 15:11:31 +00:00
Daniel Roth
61dfb34e5b extract heating immersion type 🟩 2026-04-21 15:08:47 +00:00
Daniel Roth
691efcad72 extract heating immersion type 🟥 2026-04-21 15:07:18 +00:00
Daniel Roth
fc3742d84d extract cylinder thermostat 🟩 2026-04-21 15:04:45 +00:00
Daniel Roth
7700aac5bb extract cylinder thermostat 🟥 2026-04-21 15:03:12 +00:00
Daniel Roth
f43a4d20eb map secondary heating system to secondary_heating_type 🟥 2026-04-21 11:51:14 +00:00
Daniel Roth
771c700643 extract secondary heating system 🟩 2026-04-21 11:48:39 +00:00
Daniel Roth
1e0d72a805 extract secondary heating system 🟥 2026-04-21 11:47:35 +00:00
Daniel Roth
c558b79b68 map water heating cylinder size 🟥 2026-04-21 11:13:04 +00:00
Daniel Roth
b64c9d275f Rename cylinder_insulation_thickness to cylinder_insulation_thickness_mm 2026-04-21 11:06:21 +00:00
Daniel Roth
0e30d81fe1 map water heating cylinder thickness 🟥 2026-04-21 11:01:01 +00:00
Daniel Roth
0976088cc9 extract no extensions 🟩 2026-04-21 10:53:37 +00:00
Daniel Roth
da26e4a4cb extract no extensions 🟥 2026-04-21 10:53:14 +00:00
Daniel Roth
6b4a8dfef1 extract water heating cylinder thickness alternative field name 🟩 2026-04-21 10:45:56 +00:00
Daniel Roth
ac854f161a extract water heating cylinder thickness 🟥 2026-04-21 10:40:07 +00:00
Daniel Roth
825f5fb096 address JTK review comments 2026-04-20 15:11:17 +00:00
Daniel Roth
5945c31de5 Populate shower outlets 🟥 2026-04-20 13:15:24 +00:00
Daniel Roth
b9bb580ebd Get building part roof insulation information from Roof Space section 🟥 2026-04-20 12:58:19 +00:00
Daniel Roth
78773ca633 Get floor number from floor name 🟥 2026-04-20 12:50:30 +00:00