Use NullPool as a graceful ceiling for the one-connection-per-lambda design

The invocation is architecturally one DB connection at a time (read up front,
sequential write Units of Work, overrides resolved on the unit's own session).
Keep that as the design intent, but back it with NullPool instead of a fixed
pool_size=1 pool: each checkout opens a fresh connection and closes it on return,
so there is no pool slot to exhaust.

The difference is the failure mode if a path ever regresses and holds two
Sessions at once. A pool_size=1/max_overflow=0 pool turns that into a hard
30s self-deadlock: the second checkout waits for the slot the first still
holds, times out, and fails the whole invocation ("QueuePool limit of size 1
overflow 0 reached, connection timed out"). NullPool instead opens a transient
second connection for that instant and the Lambda keeps running. The design
target stays one connection; NullPool just keeps it alive if we slip.
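
The two failure modes above can be illustrated with a toy model. This is a
sketch that uses only the standard library, not SQLAlchemy's pool code: the
class names `OneSlotPool` and `OpenPerCheckout`, the helper
`overlapping_checkouts`, and the short timeout are all hypothetical, chosen
only to mirror the pool_size=1/max_overflow=0 and NullPool behaviours
described here.

```python
import queue
import sqlite3
from contextlib import contextmanager


class OneSlotPool:
    """Toy stand-in for a pool_size=1 / max_overflow=0 pool (hypothetical)."""

    def __init__(self, timeout: float) -> None:
        self._timeout = timeout
        self._slots = queue.Queue(maxsize=1)
        self._slots.put(sqlite3.connect(":memory:"))

    @contextmanager
    def checkout(self):
        # Wait for the single connection to be returned; give up after timeout.
        try:
            conn = self._slots.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("pool limit of size 1 reached") from None
        try:
            yield conn
        finally:
            self._slots.put(conn)  # hand the slot back for the next checkout


class OpenPerCheckout:
    """Toy stand-in for NullPool-style behaviour (hypothetical)."""

    def __init__(self) -> None:
        self.opened = 0

    @contextmanager
    def checkout(self):
        # Fresh connection per checkout, closed on return: no slot to exhaust.
        conn = sqlite3.connect(":memory:")
        self.opened += 1
        try:
            yield conn
        finally:
            conn.close()


def overlapping_checkouts(pool) -> str:
    """Simulate the regression: a second checkout while the first is held."""
    try:
        with pool.checkout() as first, pool.checkout() as second:
            first.execute("SELECT 1")
            second.execute("SELECT 1")
        return "ok"
    except TimeoutError:
        return "timed out"
```

With these stand-ins, `overlapping_checkouts(OneSlotPool(timeout=0.05))`
times out, while the same overlap against `OpenPerCheckout()` succeeds and
leaves its `opened` counter at 2, which is the transient second connection.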

The single-connection invariant itself is still enforced in the Unit of Work
(overrides read on the unit's own session) and pinned by the regression test,
which uses its own strict pool_size=1 engine so it asserts the architecture
regardless of the production NullPool choice.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Jun-te Kim 2026-06-24 17:10:23 +00:00
parent de71f9abb6
commit fb308cfaea


@@ -18,18 +18,19 @@ All Measure Types are considered: pricing goes through
and heating gaps) are priced from the committed off-catalogue overlay instead of
crashing.
DB engine is module-scoped so the connection pool is reused across warm
invocations (ADR-0012). The pool holds a single connection (``pool_size=1``): the
handler reads everything up front (overrides, Scenario, a catalogue snapshot, and
stored Solar) through one short-lived read Session, closes it, then writes each
Property in a sequential Unit of Work, so the read and write Sessions never
overlap. The orchestrator shares the same engine and releases its connection
between bookkeeping commits, so one invocation uses one DB connection at a time.
The DB engine is module-scoped (ADR-0012). Architecturally each invocation uses
one DB connection at a time: the handler reads everything up front (overrides,
Scenario, a catalogue snapshot, and stored Solar) through one short-lived read
Session, closes it, then writes each Property in a sequential Unit of Work whose
overrides resolve on its own session, so no two Sessions ever overlap. The engine
uses ``NullPool`` rather than a fixed pool so that target is a graceful ceiling,
not a hard one: a fresh connection is opened per checkout and closed on return,
so there is no pool slot to exhaust; any future accidental overlap opens a
transient second connection instead of deadlocking the Lambda.
"""
from __future__ import annotations
import dataclasses
import io
import os
from collections.abc import Callable, Generator
@@ -39,6 +40,7 @@ from typing import Any, Optional, cast
import boto3
import pandas as pd # pyright: ignore[reportMissingTypeStubs]
from sqlalchemy import Engine, text
from sqlalchemy.pool import NullPool
from sqlmodel import Session
from datatypes.epc.domain.epc_property_data import (
@@ -136,26 +138,34 @@ def _get_engine() -> Engine:
global _engine
if _engine is None:
config = PostgresConfig.from_env(dict(os.environ))
# One connection per invocation: the handler reads everything up front
# through one short-lived read Session, closes it, then writes each
# Property in a sequential Unit of Work — so the read and write Sessions
# never overlap and a single pooled connection suffices. The orchestrator
# shares this engine (see ``_shared_engine_orchestrator``) and releases
# its connection between bookkeeping commits, so it holds none during the
# work. 32 concurrent containers × 1 connection = 32 against RDS.
_engine = make_engine(dataclasses.replace(config, pool_size=1, max_overflow=0))
# Architecturally one connection per invocation: the handler reads
# everything up front through one short-lived read Session, closes it,
# then writes each Property in a sequential Unit of Work — and the Unit of
# Work resolves overrides on its own session — so no two Sessions overlap
# and a single connection suffices. 32 concurrent containers × 1 = 32
# against RDS.
#
# NullPool, not a fixed pool, enforces that as a *graceful* ceiling rather
# than a hard one: each checkout opens a fresh connection and closes it on
# return, so there is no pool slot to exhaust. If a future code path ever
# holds two Sessions at once it opens a second connection for that instant
# instead of deadlocking on a 1-slot pool and failing the whole
# invocation (the "QueuePool limit of size 1 overflow 0 reached" timeout).
# The design target stays one connection; NullPool just keeps the Lambda
# running if we ever regress it.
_engine = make_engine(config, poolclass=NullPool)
return _engine
@contextmanager
def _shared_engine_orchestrator() -> Generator[TaskOrchestrator, None, None]:
"""A ``TaskOrchestrator`` on the same module-scoped pooled engine as the
modelling work, not a separate per-invocation NullPool engine.
"""A ``TaskOrchestrator`` on the same module-scoped engine as the modelling
work, not a separate one.
Its repositories commit on every ``save``/``create``, releasing the pooled
Its repositories commit on every ``save``/``create``, releasing the
connection between bookkeeping calls, so it holds none while the wrapped
handler body runs. Combined with the read-then-write handler structure and
``pool_size=1``, the whole invocation uses one DB connection at a time."""
handler body runs. Combined with the read-then-write handler structure, the
whole invocation uses one DB connection at a time."""
engine = _get_engine()
with Session(engine) as session:
yield TaskOrchestrator(