aaron f645b74b1c graphiti_service: v2.0 — Pattern 1 async job model + search_interface bridge
Major rewrite of the Graphiti sidecar. Two architectural changes:

PATTERN 1 ASYNC JOB MODEL

Submission and completion are decoupled. POST /episodes and
POST /episodes/bulk return job_id immediately; the actual graphiti-core
work happens in a background asyncio task. Submitters poll
GET /jobs/{job_id} until the job reaches a terminal status
(committed | failed).
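A minimal client-side sketch of the submit-then-poll contract. It assumes an injected async `fetch_status(job_id)` that returns the job's status string; that callable, the `deadline` parameter and the fake server in `demo` are hypothetical. Only the endpoint shape and the committed/failed terminal states come from this commit.

```python
import asyncio
from typing import Awaitable, Callable

TERMINAL = {"committed", "failed"}  # terminal states named in this commit


async def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Awaitable[str]],
    interval: float = 2.0,
    deadline: float = 3600.0,
) -> str:
    """Poll a job until it reaches a terminal status.

    fetch_status is a hypothetical stand-in for GET /jobs/{job_id}.
    Each poll is a short request, so no single HTTP read has to outlive
    the graph work. Hitting `deadline` means the client stopped watching,
    not that the job failed: the job keeps running server-side.
    """
    loop = asyncio.get_running_loop()
    give_up_at = loop.time() + deadline
    while True:
        status = await fetch_status(job_id)
        if status in TERMINAL:
            return status
        if loop.time() >= give_up_at:
            raise TimeoutError(f"stopped watching job {job_id}; last status {status!r}")
        await asyncio.sleep(interval)


async def demo() -> str:
    # Fake server: the job is queued, runs for two polls, then commits.
    states = iter(["queued", "running", "running", "committed"])

    async def fake_fetch(_job_id: str) -> str:
        return next(states)

    return await wait_for_job("job-1", fake_fetch, interval=0.01)
```

`asyncio.run(demo())` returns `"committed"` after four polls. A slow job only costs more polls; it can no longer surface as a read-timeout failure.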

Why: tonight's smoke test confirmed that bulk ingest against the
4,222-entity graph was committing successfully even when the worker's
HTTP read-timeout fired. The synchronous interface was producing
false-negative failures: the work succeeded, but the worker had stopped
listening at the 10-minute read-timeout. Three days of 'saga deadlock'
failures are better explained as a scaling pathology of unindexed
similarity search than as substrate deadlocks. Pattern 1 separates
submission from completion observation, so the worker can no longer
report a false negative this way.

Architectural commitments:

- One in-flight job per sidecar (per graph). Concurrent jobs against
  the same graph would race on graphiti-core's bulk-resolve path (no
  transaction boundary). Concurrent multi-tenancy is 'run multiple
  sidecars,' not 'make one sidecar concurrency-safe across graphs.'

- Postgres-backed job state. Survives sidecar restart. On startup the
  sidecar resets any 'running' rows to 'queued' (their previous run
  died); the background worker picks them up naturally.

- Both endpoints are async-shaped for parity. The bulk pathway is
  preserved; it is load-bearing for first-run corpus migration. The
  single-episode pathway is preserved; it is load-bearing for
  state-superseding content per the Stage 2/3 routing rule.
  graphiti-core's add_episode and add_episode_bulk are unchanged
  underneath; the async wrapper sits between the HTTP layer and the
  library call.

- Polling cadence: a flat 2s at the worker. Jobs are claimed with
  FOR UPDATE SKIP LOCKED, so the design is safe for a future
  multi-sidecar deployment without changes.
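The job lifecycle implied by these commitments can be sketched in memory. This is a hypothetical stand-in for the Postgres table, not the sidecar's code: the `JobStore` and `Job` classes, the column names, and the `CLAIM_SQL`/`RECOVERY_SQL` text (a `jobs` table with `id`, `status`, `payload`, `created_at`) are all assumptions. The SQL shows the usual shape of a `FOR UPDATE SKIP LOCKED` claim; the Python class mimics its effect for a single process.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical schema: jobs(id, status, payload, error, created_at).
# SKIP LOCKED lets a second claimer skip a row that another transaction
# holds instead of blocking on it.
CLAIM_SQL = """
UPDATE jobs SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
"""

RECOVERY_SQL = "UPDATE jobs SET status = 'queued' WHERE status = 'running';"


@dataclass
class Job:
    id: int
    payload: dict
    status: str = "queued"
    error: Optional[str] = None


@dataclass
class JobStore:
    """In-memory mimic of the table; dict insertion order stands in for created_at."""
    jobs: Dict[int, Job] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def insert(self, payload: dict) -> int:
        job = Job(next(self._ids), payload)
        self.jobs[job.id] = job
        return job.id

    def claim_next(self) -> Optional[Job]:
        # Oldest queued job becomes running, as CLAIM_SQL does.
        for job in self.jobs.values():
            if job.status == "queued":
                job.status = "running"
                return job
        return None

    def complete(self, job_id: int) -> None:
        self.jobs[job_id].status = "committed"

    def fail(self, job_id: int, error: str) -> None:
        self.jobs[job_id].status = "failed"
        self.jobs[job_id].error = error

    def startup_recovery(self) -> int:
        # A 'running' row at startup means the previous process died mid-job.
        stale = [j for j in self.jobs.values() if j.status == "running"]
        for job in stale:
            job.status = "queued"
        return len(stale)
```

Because recovery only flips `running` back to `queued`, the ordinary claim path picks those jobs up again; no separate retry mechanism is needed.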

Postgres helpers (_pg, _job_insert, _job_get, _job_claim_next,
_job_complete, _job_fail, _startup_recovery) replace the synchronous
graphiti.add_episode call path with persistent job state. The background
worker loop catches and logs every exception, so an unexpected error
never kills it.
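The "never dies" property comes from catching exceptions both around each job and around the claim/complete bookkeeping. A minimal asyncio sketch, assuming injected `claim_next`, `complete`, `fail` and `run` callables (hypothetical names standing in for the Postgres helpers and the graphiti-core call) plus a `stop` event added only so the loop can exit in a demo:

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Optional

log = logging.getLogger("job-worker")


async def worker_loop(
    claim_next: Callable[[], Optional[Any]],
    complete: Callable[[Any], None],
    fail: Callable[[Any, str], None],
    run: Callable[[Any], Awaitable[None]],
    stop: asyncio.Event,
    idle_interval: float = 2.0,
) -> None:
    """Run one job at a time until `stop` is set; no Exception escapes."""
    while not stop.is_set():
        try:
            job = claim_next()
            if job is None:
                await asyncio.sleep(idle_interval)  # flat idle poll
                continue
            try:
                await run(job)  # e.g. add_episode or add_episode_bulk
            except Exception as exc:
                log.exception("job %r failed", job)
                fail(job, repr(exc))
            else:
                complete(job)
        except Exception:
            # Bookkeeping errors (e.g. a DB hiccup): log, back off, keep going.
            log.exception("worker loop error")
            await asyncio.sleep(idle_interval)


async def demo() -> tuple:
    queue = ["a", "boom", "b"]
    done: list = []
    failed: list = []
    stop = asyncio.Event()

    def claim():
        if queue:
            return queue.pop(0)
        stop.set()  # queue drained: let the demo loop exit
        return None

    async def run(job):
        if job == "boom":
            raise RuntimeError("simulated graphiti-core failure")

    await worker_loop(claim, done.append, lambda job, err: failed.append(job),
                      run, stop, idle_interval=0.001)
    return done, failed
```

Catching `Exception` rather than `BaseException` is deliberate: `asyncio.CancelledError` still propagates, so the task can be cancelled cleanly at shutdown. Running one job at a time in a single loop also enforces the one-in-flight-job commitment.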

SEARCH_INTERFACE BRIDGE

graphiti-core 0.29.0 builds FalkorSearchOperations as
driver._search_ops in FalkorDriver.__init__ but never assigns it to
driver.search_interface. search_utils.py:edge_similarity_search and
node_similarity_search check 'if driver.search_interface:' and
delegate when present, falling through to interpreted-Cypher cosine
math when not. The naming mismatch between the two halves of
graphiti-core means the per-driver implementation never gets used.

Bridge after Graphiti instance construction:
  driver.search_interface = driver._search_ops
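The mismatch and the one-line bridge can be reproduced with stub classes. These stubs only mirror the attribute names described above (`_search_ops`, `search_interface`) and the truthiness check in search_utils.py; they are not graphiti-core's classes, and `StubSearchOps`, `StubDriver` and `similarity_search` are hypothetical.

```python
from typing import Optional


class StubSearchOps:
    def node_similarity_search(self, query: str) -> str:
        return f"index:{query}"  # stands in for the native vector-index path


class StubDriver:
    def __init__(self) -> None:
        # Mirrors the reported 0.29.0 behaviour: ops built under one name...
        self._search_ops = StubSearchOps()
        # ...but the attribute the search code checks is never set.
        self.search_interface: Optional[StubSearchOps] = None


def similarity_search(driver: StubDriver, query: str) -> str:
    # Same shape as the 'if driver.search_interface:' check.
    if driver.search_interface:
        return driver.search_interface.node_similarity_search(query)
    return f"cypher-scan:{query}"  # interpreted-Cypher fallback


driver = StubDriver()
before = similarity_search(driver, "alice")   # "cypher-scan:alice"
driver.search_interface = driver._search_ops  # the bridge
after = similarity_search(driver, "alice")    # "index:alice"
```

Since the bridge reaches into a private attribute, one option is to guard it (assign only when `search_interface` is still unset and `_search_ops` exists) so it becomes a no-op if a later graphiti-core release fixes the naming.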

This activates the per-driver path, which (with our vendored patches)
uses db.idx.vector.queryNodes to query FalkorDB's native vector index.
Empirical result: a single-episode add_episode against a 4,277-entity
graph went from an indefinite hang to 8.2 seconds.
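For orientation, a sketch of the kind of query the per-driver path issues. It assumes a vector index already exists on the chosen label and attribute; the `Entity` label, the `name_embedding` attribute and the `vector_search_query` helper are illustrative assumptions, not the exact query in the vendored patches.

```python
from typing import Any, Dict, List, Tuple


def vector_search_query(
    label: str, attribute: str, k: int, embedding: List[float]
) -> Tuple[str, Dict[str, Any]]:
    """Build a FalkorDB vector-index query plus its parameters.

    label/attribute are interpolated into the text, so they must come
    from code, never from user input; the vector travels as a parameter.
    """
    if not label.isidentifier() or not attribute.isidentifier():
        raise ValueError("label and attribute must be plain identifiers")
    cypher = (
        f"CALL db.idx.vector.queryNodes('{label}', '{attribute}', {int(k)}, vecf32($vec)) "
        "YIELD node, score "
        "RETURN node, score"
    )
    return cypher, {"vec": embedding}


query, params = vector_search_query("Entity", "name_embedding", 10, [0.1, 0.2, 0.3])
```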

The bridge is also a candidate for an upstream PR: pick one name and
use it consistently across the codebase. For now, the fix is local.
2026-05-02 05:19:46 +00:00