e7de7fb64b97836b1f6ca572c8148bd1b513dd5b
Reads new routing columns from stage_3_queue (state_type, state_type_confidence,
supersedes_prior_state, state_type_rationale) and dispatches each row to one of
two ingest pathways:
- BULK pathway (existing, renamed from ingest_to_graphiti to ingest_bulk):
the safer, cheaper default. Used when supersedes_prior_state=false OR
state_type_confidence=low OR the routing fields are NULL (legacy rows).
Skips edge invalidation, per graphiti-core's bulk semantics.
- SINGLE-EPISODE pathway (new, ingest_single_episode): used only when
supersedes_prior_state=true AND state_type_confidence is medium or high.
Each chunk is POSTed separately to /episodes (the singular endpoint) with a
shared saga tag. Each call is independent, with its own timeout and its own
retry envelope.
Routing decision isolated in should_route_single_episode() with unit-tested
truth table covering all eight (supersedes × confidence) combinations.
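
A minimal sketch of the routing predicate and its truth table, written from the rules stated above. The function name comes from this commit; the signature and the reading of "eight combinations" as 2 supersedes values × 4 confidence values (low, medium, high, NULL) are assumptions, not the committed code.

```python
from itertools import product
from typing import Optional


def should_route_single_episode(
    supersedes_prior_state: Optional[bool],
    state_type_confidence: Optional[str],
) -> bool:
    """Return True only for the single-episode pathway; everything else is bulk.

    Assumed signature. NULL routing fields (legacy rows) arrive as None and
    fall through to the bulk default.
    """
    return supersedes_prior_state is True and state_type_confidence in ("medium", "high")


# Assumed enumeration of the eight combinations: 2 supersedes values x 4
# confidence values, with None standing in for a NULL column.
SUPERSEDES = (True, False)
CONFIDENCE = ("low", "medium", "high", None)

TRUTH_TABLE = {
    (s, c): should_route_single_episode(s, c)
    for s, c in product(SUPERSEDES, CONFIDENCE)
}

if __name__ == "__main__":
    for (s, c), single in TRUTH_TABLE.items():
        print(f"supersedes={s!s:5} confidence={c!s:6} -> {'single' if single else 'bulk'}")
```

Under these assumptions only (true, medium) and (true, high) route to the single-episode pathway; every other row, including a NULL supersedes_prior_state, stays on bulk.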
Per-chunk heartbeat (heartbeat_row): the single-episode pathway updates
stage_3_queue.started_at after each successful chunk POST so a long-running
document doesn't cross the 10-minute stale threshold mid-process and get
re-dequeued. started_at now means 'last activity timestamp' rather than
'began at'. Best-effort: failures are logged, not raised.
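
A sketch of what a best-effort heartbeat could look like. Only the heartbeat_row name, the stage_3_queue table, and the started_at column come from this commit; the id column, the connect callable, and the function signature are hypothetical. sqlite3 is used here only so the sketch runs standalone; the real worker talks to Postgres, where the placeholder style would differ.

```python
import logging
import sqlite3
from datetime import datetime, timezone
from typing import Callable

log = logging.getLogger("stage_3_worker")  # hypothetical logger name


def heartbeat_row(connect: Callable[[], sqlite3.Connection], row_id: int) -> bool:
    """Bump started_at to 'now' for one queue row. Never raises.

    Returns True if the update was committed, False if anything went wrong.
    """
    try:
        conn = connect()  # short-lived connection, opened per heartbeat
        try:
            conn.execute(
                "UPDATE stage_3_queue SET started_at = ? WHERE id = ?",
                (datetime.now(timezone.utc).isoformat(), row_id),
            )
            conn.commit()
        finally:
            conn.close()
        return True
    except Exception:
        # Best-effort: a missed heartbeat is logged, not raised, so it
        # cannot abort an otherwise healthy ingest.
        log.warning("heartbeat failed for row %s", row_id, exc_info=True)
        return False
```

Because the function swallows its own errors, the caller can invoke it after every successful chunk POST without wrapping it in another try block.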
Partial success on chunk failure: previously committed chunks stay in the
graph, and the function raises with detail (single_episode_partial: chunk N/M
failed, succeeded K). The row is marked failed_at with that detail.
Re-ingestion would re-POST the already-committed chunks 1..N-1 against the
graph; graphiti's dedup handles them as no-ops.
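
The partial-success behaviour can be sketched as a loop that stops at the first failing chunk and reports how far it got. The error text follows the format quoted above; the ingest_single_episode signature, the post_chunk callable (standing in for the per-chunk POST to /episodes), and the on_chunk_ok hook are hypothetical.

```python
from typing import Callable, Optional, Sequence


def ingest_single_episode(
    chunks: Sequence[str],
    post_chunk: Callable[[str], None],
    on_chunk_ok: Optional[Callable[[int], None]] = None,
) -> int:
    """POST chunks one at a time; raise with detail on the first failure.

    Chunks posted before the failure are not rolled back. Returns the number
    of chunks posted when all succeed.
    """
    total = len(chunks)
    succeeded = 0
    for index, chunk in enumerate(chunks, start=1):
        try:
            post_chunk(chunk)
        except Exception as exc:
            raise RuntimeError(
                f"single_episode_partial: chunk {index}/{total} failed, "
                f"succeeded {succeeded}"
            ) from exc
        succeeded += 1
        if on_chunk_ok is not None:
            on_chunk_ok(index)  # e.g. a per-chunk heartbeat
    return succeeded
```

The caller would catch the exception and store its message alongside failed_at; a later retry replays the chunks from the start and relies on deduplication in the graph for the ones already committed.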
DB connection scoping: process_one no longer holds one Postgres connection
across the whole ingest call (which can run an hour for long single-episode
documents). Each DB write gets a short-lived connection.
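
One way to express the connection scoping is a small context manager that opens, commits, and closes a connection around each write, so nothing is held open during the long ingest call. This is a sketch under assumptions: the process_one signature, the connect and ingest callables, and the id and completed_at columns are hypothetical, and sqlite3 stands in for Postgres so the example runs standalone.

```python
import sqlite3
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Callable, Iterator


@contextmanager
def short_lived_connection(
    connect: Callable[[], sqlite3.Connection],
) -> Iterator[sqlite3.Connection]:
    """Open a connection for one write, commit on success, always close."""
    conn = connect()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def process_one(
    connect: Callable[[], sqlite3.Connection],
    row_id: int,
    ingest: Callable[[int], None],
) -> None:
    # Write 1: mark the row as started, then release the connection.
    with short_lived_connection(connect) as conn:
        conn.execute(
            "UPDATE stage_3_queue SET started_at = ? WHERE id = ?", (_now(), row_id)
        )

    # The ingest call may run for a long time; no connection is held here.
    ingest(row_id)

    # Write 2: record completion on a fresh connection.
    with short_lived_connection(connect) as conn:
        conn.execute(
            "UPDATE stage_3_queue SET completed_at = ? WHERE id = ?", (_now(), row_id)
        )
```

The trade-off is a little connection overhead per write in exchange for not pinning a connection for the duration of a long document.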
Phase A item 3 of three. Closes the mechanical-patches block. Item 4
(custom_extraction_instructions text design) is the remaining intellectual
work; sidecar and worker plumbing is now ready for it.