Use NullPool as a graceful ceiling for the one-connection-per-lambda design

The invocation is architecturally one DB connection at a time (read up front,
sequential write Units of Work, overrides resolved on the unit's own session).
Keep that as the design intent, but back it with NullPool instead of a fixed
pool_size=1 pool: each checkout opens a fresh connection and closes it on return,
so there is no pool slot to exhaust.

The difference is the failure mode if a path ever regresses and holds two
Sessions at once. A pool_size=1/max_overflow=0 pool turns that into a hard
30s deadlock that fails the whole invocation: the second checkout waits out the
pool timeout and raises "QueuePool limit of size 1 overflow 0 reached,
connection timed out". NullPool instead opens a transient second connection for
that instant and the Lambda keeps running. The design target stays one
connection; NullPool just keeps the invocation alive if we slip.
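
A minimal sketch of the two failure modes, assuming SQLAlchemy and an in-memory
SQLite database standing in for Postgres. The helper `overlap_two_connections`
and the shortened `pool_timeout` are illustrative only and are not part of this
commit:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.exc import TimeoutError as PoolTimeoutError
from sqlalchemy.pool import NullPool, QueuePool


def overlap_two_connections(engine) -> str:
    """Hold two connections at once and report how the pool reacted."""
    with engine.connect() as first:
        first.execute(text("SELECT 1"))
        try:
            # Second checkout while the first is still held: the regression case.
            with engine.connect() as second:
                second.execute(text("SELECT 1"))
        except PoolTimeoutError:
            return "timed out"
    return "ok"


# Strict pool: one slot, no overflow. pool_timeout is shortened from the
# 30-second default so the demonstration fails fast (illustrative value).
strict = create_engine(
    "sqlite://", poolclass=QueuePool, pool_size=1, max_overflow=0, pool_timeout=0.2
)

# NullPool: every checkout opens a fresh connection and closes it on return.
graceful = create_engine("sqlite://", poolclass=NullPool)

strict_result = overlap_two_connections(strict)
graceful_result = overlap_two_connections(graceful)
```

Under these assumptions the strict engine reports a pool timeout on the second
checkout, while the NullPool engine serves both connections.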

The single-connection invariant itself is still enforced in the Unit of Work
(overrides read on the unit's own session) and pinned by the regression test,
which uses its own strict pool_size=1 engine so it asserts the architecture
regardless of the production NullPool choice.
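
One possible shape for such a regression test, sketched under stated
assumptions: SQLAlchemy's own `Session` and in-memory SQLite replace the
project's SQLModel/Postgres setup, and `make_strict_engine` and
`read_then_write` are hypothetical stand-ins for the real handler and fixtures:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.exc import TimeoutError as PoolTimeoutError
from sqlalchemy.orm import Session
from sqlalchemy.pool import QueuePool


def make_strict_engine():
    """One slot, no overflow: any overlapping checkout must time out."""
    return create_engine(
        "sqlite://", poolclass=QueuePool, pool_size=1, max_overflow=0, pool_timeout=0.2
    )


def read_then_write(engine, property_ids):
    """Hypothetical handler shape: one read Session, then sequential writes."""
    with Session(engine) as read_session:
        read_session.execute(text("SELECT 1"))  # read everything up front
    written = []
    for property_id in property_ids:
        with Session(engine) as unit_session:
            # Overrides are resolved on the unit's own session, never a second one.
            unit_session.execute(text("SELECT 1"))
            unit_session.commit()
            written.append(property_id)
    return written


def test_sequential_sessions_fit_in_one_slot():
    assert read_then_write(make_strict_engine(), [1, 2, 3]) == [1, 2, 3]


def test_overlapping_sessions_are_caught():
    engine = make_strict_engine()
    with Session(engine) as outer:
        outer.execute(text("SELECT 1"))  # takes the only connection
        try:
            with Session(engine) as inner:
                inner.execute(text("SELECT 1"))
        except PoolTimeoutError:
            caught = True
        else:
            caught = False
    assert caught
```

Because the strict engine belongs to the test, the assertion holds whichever
pool class production uses.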

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Jun-te Kim 2026-06-24 17:10:23 +00:00
parent de71f9abb6
commit fb308cfaea

@@ -18,18 +18,19 @@ All Measure Types are considered: pricing goes through
 and heating gaps) are priced from the committed off-catalogue overlay instead of
 crashing.

-DB engine is module-scoped so the connection pool is reused across warm
-invocations (ADR-0012). The pool holds a single connection (``pool_size=1``): the
-handler reads everything up front overrides, Scenario, a catalogue snapshot, and
-stored Solar through one short-lived read Session, closes it, then writes each
-Property in a sequential Unit of Work, so the read and write Sessions never
-overlap. The orchestrator shares the same engine and releases its connection
-between bookkeeping commits, so one invocation uses one DB connection at a time.
+The DB engine is module-scoped (ADR-0012). Architecturally each invocation uses
+one DB connection at a time: the handler reads everything up front overrides,
+Scenario, a catalogue snapshot, and stored Solar through one short-lived read
+Session, closes it, then writes each Property in a sequential Unit of Work whose
+overrides resolve on its own session, so no two Sessions ever overlap. The engine
+uses ``NullPool`` rather than a fixed pool so that target is a graceful ceiling,
+not a hard one: a fresh connection is opened per checkout and closed on return,
+so there is no pool slot to exhaust any future accidental overlap opens a
+transient second connection instead of dead-locking the Lambda.
 """

 from __future__ import annotations

-import dataclasses
 import io
 import os
 from collections.abc import Callable, Generator
@@ -39,6 +40,7 @@ from typing import Any, Optional, cast
 import boto3
 import pandas as pd # pyright: ignore[reportMissingTypeStubs]
 from sqlalchemy import Engine, text
+from sqlalchemy.pool import NullPool
 from sqlmodel import Session

 from datatypes.epc.domain.epc_property_data import (
@@ -136,26 +138,34 @@ def _get_engine() -> Engine:
     global _engine
     if _engine is None:
         config = PostgresConfig.from_env(dict(os.environ))
-        # One connection per invocation: the handler reads everything up front
-        # through one short-lived read Session, closes it, then writes each
-        # Property in a sequential Unit of Work — so the read and write Sessions
-        # never overlap and a single pooled connection suffices. The orchestrator
-        # shares this engine (see ``_shared_engine_orchestrator``) and releases
-        # its connection between bookkeeping commits, so it holds none during the
-        # work. 32 concurrent containers × 1 connection = 32 against RDS.
-        _engine = make_engine(dataclasses.replace(config, pool_size=1, max_overflow=0))
+        # Architecturally one connection per invocation: the handler reads
+        # everything up front through one short-lived read Session, closes it,
+        # then writes each Property in a sequential Unit of Work — and the Unit of
+        # Work resolves overrides on its own session — so no two Sessions overlap
+        # and a single connection suffices. 32 concurrent containers × 1 = 32
+        # against RDS.
+        #
+        # NullPool, not a fixed pool, enforces that as a *graceful* ceiling rather
+        # than a hard one: each checkout opens a fresh connection and closes it on
+        # return, so there is no pool slot to exhaust. If a future code path ever
+        # holds two Sessions at once it opens a second connection for that instant
+        # instead of dead-locking on a 1-slot pool and failing the whole
+        # invocation (the "QueuePool limit of size 1 overflow 0 reached" timeout).
+        # The design target stays one connection; NullPool just keeps the Lambda
+        # running if we ever regress it.
+        _engine = make_engine(config, poolclass=NullPool)
     return _engine


 @contextmanager
 def _shared_engine_orchestrator() -> Generator[TaskOrchestrator, None, None]:
-    """A ``TaskOrchestrator`` on the same module-scoped pooled engine as the
-    modelling work not a separate per-invocation NullPool engine.
+    """A ``TaskOrchestrator`` on the same module-scoped engine as the modelling
+    work, not a separate one.

-    Its repositories commit on every ``save``/``create``, releasing the pooled
+    Its repositories commit on every ``save``/``create``, releasing the
     connection between bookkeeping calls, so it holds none while the wrapped
-    handler body runs. Combined with the read-then-write handler structure and
-    ``pool_size=1``, the whole invocation uses one DB connection at a time."""
+    handler body runs. Combined with the read-then-write handler structure, the
+    whole invocation uses one DB connection at a time."""
     engine = _get_engine()
     with Session(engine) as session:
         yield TaskOrchestrator(