graphiti_service: v2.0 — Pattern 1 async job model + search_interface bridge

Major rewrite of the Graphiti sidecar, with two architectural changes.

PATTERN 1 ASYNC JOB MODEL

Submission and completion are decoupled. POST /episodes and POST /episodes/bulk return a job_id immediately; the actual graphiti-core work happens in a background asyncio task. Submitters poll GET /jobs/{job_id} until the job reaches a terminal status (committed | failed).

Why: tonight's smoke test confirmed that bulk ingest against the 4,222-entity graph was committing successfully even when the worker's HTTP read-timeout fired. The synchronous interface was producing false-negative failures: the work succeeded, but the worker had stopped listening at the 10-minute read-timeout. Three days of 'saga deadlock' failures are better explained as a scaling pathology of unindexed similarity search than as substrate deadlocks. Pattern 1 separates submission from completion observation, so the worker can't produce this kind of false negative.

Architectural commitments:
- One in-flight job per sidecar (per graph). Concurrent jobs against the same graph would race on graphiti-core's bulk-resolve path, which has no transaction boundary. Concurrent multi-tenancy means "run multiple sidecars," not "make one sidecar concurrency-safe across graphs."
- Postgres-backed job state. Survives sidecar restart. On startup the sidecar resets any 'running' rows to 'queued' (their previous run died); the background worker picks them up naturally.
- Both endpoints are async-shaped for parity. The bulk pathway is preserved (load-bearing for first-run corpus migration), and so is single-episode (load-bearing for state-superseding content per the Stage 2/3 routing rule). graphiti-core's add_episode and add_episode_bulk are unchanged underneath; the async wrapper sits between the HTTP layer and the library call.
- Polling cadence: 2s flat at the worker. Jobs are claimed with FOR UPDATE SKIP LOCKED, so the design is safe for a future multi-sidecar deployment without changes.
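A minimal client-side sketch of the submit-then-poll contract described above, assuming the submitter wraps GET /jobs/{job_id} in a callable that returns the job dict. The names wait_for_job, get_status, TERMINAL and the timeout default are hypothetical and not part of the sidecar; sleep and clock are injectable only so the loop can be exercised without a live server.

```python
import time
from typing import Callable

# Terminal statuses, as written by _job_complete / _job_fail.
TERMINAL = {"committed", "failed"}


def wait_for_job(
    get_status: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll get_status(job_id) until a terminal status or the deadline.

    get_status is a hypothetical wrapper around GET /jobs/{job_id} that
    returns the job dict (it must contain at least a "status" key).
    """
    deadline = clock() + timeout
    while True:
        job = get_status(job_id)
        if job["status"] in TERMINAL:
            return job
        if clock() >= deadline:
            # We stopped waiting; the job itself may still commit later.
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout}s")
        sleep(interval)


# Usage with a fake status source instead of a live sidecar.
_fake = iter(["queued", "running", "committed"])
result = wait_for_job(lambda jid: {"job_id": jid, "status": next(_fake)}, "demo", sleep=lambda s: None)
print(result["status"])  # committed
```

The key difference from the synchronous interface: a TimeoutError here means the submitter stopped waiting, not that the job failed. The job_id stays valid, so the caller can resume polling later and still observe a commit.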
Postgres helpers (_pg, _job_insert, _job_get, _job_claim_next, _job_complete, _job_fail, _startup_recovery) replace the synchronous graphiti.add_episode call with persistent job state. The background worker loop catches everything, logs everything, and never dies from an unexpected error.

SEARCH_INTERFACE BRIDGE

graphiti-core 0.29.0 builds FalkorSearchOperations as driver._search_ops in FalkorDriver.__init__ but never assigns it to driver.search_interface. search_utils.py:edge_similarity_search and node_similarity_search check 'if driver.search_interface:' and delegate when it is set, falling through to interpreted-Cypher cosine math when it is not. Because the two halves of graphiti-core use different names, the per-driver implementation never gets used.

Bridge, applied after the Graphiti instance is constructed:

    driver.search_interface = driver._search_ops

This activates the per-driver path, which (with our vendored patches) uses db.idx.vector.queryNodes for FalkorDB's native vector index. Empirical result: single-episode add_episode against a 4,277-entity graph went from an indefinite hang to 8.2 seconds.

The bridge is also a candidate for an upstream PR: pick one name and use it consistently across the codebase. Tonight it's local.
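The dispatch problem can be reproduced with stand-in classes. StandInDriver, VectorIndexSearchOps and bridge_search_interface are hypothetical; only the two attribute names (_search_ops, search_interface) and the truthiness check come from the description above. This is not graphiti-core's code, just the shape of the naming mismatch and a guarded version of the bridge.

```python
from typing import Any, Optional


class VectorIndexSearchOps:
    """Hypothetical stand-in for a per-driver search implementation."""

    def node_similarity_search(self, query_vector: list[float]) -> str:
        return "vector-index path"


class StandInDriver:
    """Hypothetical driver with the mismatch described above: the ops object
    is built as _search_ops, but dispatch reads search_interface."""

    def __init__(self) -> None:
        self._search_ops = VectorIndexSearchOps()
        self.search_interface: Optional[Any] = None  # _search_ops is never copied here


def node_similarity_search(driver: Any, query_vector: list[float]) -> str:
    # Same shape as the 'if driver.search_interface:' check in search_utils.py.
    if driver.search_interface:
        return driver.search_interface.node_similarity_search(query_vector)
    return "interpreted-Cypher fallback"


def bridge_search_interface(driver: Any) -> bool:
    """Point search_interface at _search_ops if it is still unset.
    Returns True only when the assignment was made."""
    ops = getattr(driver, "_search_ops", None)
    if ops is not None and getattr(driver, "search_interface", None) is None:
        driver.search_interface = ops
        return True
    return False


driver = StandInDriver()
print(node_similarity_search(driver, [0.1, 0.2]))  # interpreted-Cypher fallback
bridge_search_interface(driver)
print(node_similarity_search(driver, [0.1, 0.2]))  # vector-index path
```

In this sketch the guard makes the bridge a no-op once search_interface is already set (for example, by a future upstream fix) or once _search_ops no longer exists, so it can stay in place across upgrades without clobbering anything.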
2026-05-02 05:19:46 +00:00
parent c0e6159b5e
commit f645b74b1c
1 changed file with 419 additions and 69 deletions
@@ -1,14 +1,44 @@
 """
-Aaron AI — Graphiti Sidecar Service
+Aaron AI — Graphiti Sidecar Service (v2.0 — Pattern 1 async job model)
-Wraps graphiti-core in a FastAPI service to avoid asyncio event loop conflicts.
+
 Wraps graphiti-core in a FastAPI service. Pattern 1 architecture: ingest
 submission and completion are decoupled. Submitters POST to /episodes or
 /episodes/bulk and receive a job_id; an in-process background worker
 processes jobs serially against the graph; submitters poll GET /jobs/{id}
 until terminal status.
 Why Pattern 1: tonight's smoke test (2026-05-02) confirmed that bulk
 ingest against the 4,222-entity graph commits successfully even when the
 worker's HTTP read-timeout fires. The synchronous interface was producing
 false-negative failures — work succeeded but the worker stopped listening.
 Pattern 1 separates submission from completion observation so the worker
 can't false-negative this way.
 Architectural commitments:
 - One in-flight job per sidecar (per graph). Concurrent jobs against the
  same graph would race on graphiti-core's _resolve_nodes_and_edges_bulk
  (no transaction boundary, no internal coordination). Concurrent
  multi-tenancy is "run multiple sidecars," not "make one sidecar
  concurrency-safe across graphs."
 - Postgres-backed job state. Survives sidecar restart. On startup the
  sidecar resets any 'running' rows to 'queued' (their previous run died);
  the background worker picks them up naturally.
 - Both /episodes and /episodes/bulk are async-shaped for parity. graphiti-
  core operations underneath (add_episode, add_episode_bulk) are unchanged.
 - The bulk pathway is preserved — load-bearing for first-run corpus
  migration. Single-episode is preserved — load-bearing for state-
  superseding content per the Stage 2/3 routing rule.
 Port 8001 (internal only). No OpenAI dependency.
 """
-import os, logging, sys, traceback
+import os, logging, sys, asyncio, traceback, uuid, json
 from contextlib import asynccontextmanager
 from datetime import datetime
 from pathlib import Path
 import psycopg2
 import psycopg2.extras
 from dotenv import load_dotenv
 from fastapi import FastAPI, HTTPException
 from pydantic import BaseModel
@@ -31,8 +61,18 @@ FALKORDB_PORT = int(os.getenv("FALKORDB_PORT", "6379"))
 LLM_PROVIDER  = os.getenv("LLM_PROVIDER", "anthropic")
 LLM_MODEL     = os.getenv("LLM_MODEL", "claude-sonnet-4-6")
 LLM_API_KEY   = os.getenv("LLM_API_KEY") or os.getenv("ANTHROPIC_API_KEY")
 PG_DSN        = os.getenv("PG_DSN")
 SIDECAR_NAME  = os.getenv("SIDECAR_NAME", "graphiti-sidecar-1")
 os.environ["EMBEDDING_DIM"] = "384"
 # Background worker configuration. Polls Postgres for queued jobs every
 # WORKER_POLL_INTERVAL seconds when idle. Single-job-at-a-time by design;
 # no concurrency primitive beyond the serial loop. The sleep is brief
 # enough to feel responsive but long enough to avoid burning CPU on an
 # empty queue.
 WORKER_POLL_INTERVAL = 2.0
 def get_llm_client():
    from graphiti_core.llm_client.config import LLMConfig
    config = LLMConfig(api_key=LLM_API_KEY, model=LLM_MODEL)
@@ -50,16 +90,286 @@ def get_llm_client():
        return GroqClient(config)
    raise ValueError(f"Unsupported LLM provider: {LLM_PROVIDER}")
 graphiti_instance = None
-async def get_graphiti():
-    if graphiti_instance is None:
-        raise HTTPException(status_code=503, detail="Graphiti not initialized")
-    return graphiti_instance
+graphiti_instance = None
+worker_task = None
+
+
 # ---------------------------------------------------------------------------
 # Postgres job-state helpers. Synchronous psycopg2 calls inside async
 # functions: each call opens a fresh connection, runs one statement, closes.
 # Acceptable here because traffic is low (single-digit jobs/min steady state)
 # and the simplicity is worth more than connection pooling. If this ever
 # becomes a bottleneck, swap to asyncpg or psycopg3 async.
 # ---------------------------------------------------------------------------
 def _pg():
    return psycopg2.connect(PG_DSN)
 def _job_insert(job_id: str, job_type: str, payload: dict) -> None:
    """Write a new job row in 'queued' status."""
    pg = _pg()
    cur = pg.cursor()
    cur.execute(
        """
        INSERT INTO graphiti_jobs (job_id, job_type, payload, status, submitted_by)
        VALUES (%s, %s, %s::jsonb, 'queued', %s)
        """,
        (job_id, job_type, json.dumps(payload), SIDECAR_NAME),
    )
    pg.commit()
    pg.close()
 def _job_get(job_id: str) -> dict | None:
    """Read a single job by id. Returns None if not found."""
    pg = _pg()
    cur = pg.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    cur.execute(
        """
        SELECT job_id, job_type, status, enqueued_at, started_at, finished_at,
               error, summary, submitted_by
        FROM graphiti_jobs
        WHERE job_id = %s
        """,
        (job_id,),
    )
    row = cur.fetchone()
    pg.close()
    if row is None:
        return None
    # Convert UUID, datetimes for JSON serialization
    return {
        "job_id": str(row["job_id"]),
        "job_type": row["job_type"],
        "status": row["status"],
        "enqueued_at": row["enqueued_at"].isoformat() if row["enqueued_at"] else None,
        "started_at": row["started_at"].isoformat() if row["started_at"] else None,
        "finished_at": row["finished_at"].isoformat() if row["finished_at"] else None,
        "error": row["error"],
        "summary": row["summary"],
        "submitted_by": row["submitted_by"],
    }
 def _job_claim_next() -> dict | None:
    """Atomically claim the oldest queued job for processing.
    Uses SELECT ... FOR UPDATE SKIP LOCKED so multiple sidecar instances
    (future multi-tenant deployment) don't fight over the same row. For
    single-sidecar deployments this is just a clean atomic transition.
    Returns the full job row (including payload) or None if queue is empty.
    """
    pg = _pg()
    cur = pg.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    cur.execute(
        """
        WITH next_job AS (
            SELECT job_id
            FROM graphiti_jobs
            WHERE status = 'queued'
            ORDER BY enqueued_at ASC
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        )
        UPDATE graphiti_jobs g
        SET status = 'running', started_at = NOW()
        FROM next_job
        WHERE g.job_id = next_job.job_id
        RETURNING g.job_id, g.job_type, g.payload
        """
    )
    row = cur.fetchone()
    pg.commit()
    pg.close()
    if row is None:
        return None
    return {
        "job_id": str(row["job_id"]),
        "job_type": row["job_type"],
        "payload": row["payload"],  # already a dict via JSONB
    }
 def _job_complete(job_id: str, summary: dict) -> None:
    pg = _pg()
    cur = pg.cursor()
    cur.execute(
        """
        UPDATE graphiti_jobs
        SET status = 'committed', finished_at = NOW(), summary = %s::jsonb
        WHERE job_id = %s
        """,
        (json.dumps(summary), job_id),
    )
    pg.commit()
    pg.close()
 def _job_fail(job_id: str, error: str) -> None:
    pg = _pg()
    cur = pg.cursor()
    cur.execute(
        """
        UPDATE graphiti_jobs
        SET status = 'failed', finished_at = NOW(), error = %s
        WHERE job_id = %s
        """,
        (error[:2000], job_id),  # truncate to keep error column reasonable
    )
    pg.commit()
    pg.close()
 def _startup_recovery() -> int:
    """Reset any 'running' jobs to 'queued' on startup.
    Rationale: if the sidecar died while processing a job, that row is
    stuck in 'running' with no process advancing it. The right behavior
    on restart is to retry. graphiti-core's add_episode_bulk and
    add_episode are idempotent against the graph (dedup handles duplicate
    submission), so re-running a job is safe — at worst, a second run
    incurs API spend on resolve calls that no-op against an already-
    committed entity set.
    Returns the count of recovered jobs.
    """
    pg = _pg()
    cur = pg.cursor()
    cur.execute(
        """
        UPDATE graphiti_jobs
        SET status = 'queued', started_at = NULL
        WHERE status = 'running'
        """
    )
    count = cur.rowcount
    pg.commit()
    pg.close()
    return count
 # ---------------------------------------------------------------------------
 # Background worker — single asyncio task running for the sidecar lifetime.
 # Processes one job at a time. No concurrency. Restart recovery is handled
 # by _startup_recovery() before this task starts.
 # ---------------------------------------------------------------------------
 async def background_worker():
    """Serial job processor. Polls graphiti_jobs, processes one at a time."""
    log.info("Background worker started")
    from graphiti_core.nodes import EpisodeType
    from graphiti_core.utils.bulk_utils import RawEpisode
    while True:
        try:
            claimed = _job_claim_next()
            if claimed is None:
                await asyncio.sleep(WORKER_POLL_INTERVAL)
                continue
            job_id = claimed["job_id"]
            job_type = claimed["job_type"]
            payload = claimed["payload"]
            log.info(f"Processing job {job_id} (type={job_type})")
            start = datetime.now()
            try:
                if job_type == "bulk":
                    summary = await _process_bulk_job(payload, EpisodeType, RawEpisode)
                elif job_type == "single":
                    summary = await _process_single_job(payload, EpisodeType)
                else:
                    raise ValueError(f"Unknown job_type: {job_type}")
                duration = (datetime.now() - start).total_seconds()
                summary["duration_seconds"] = duration
                _job_complete(job_id, summary)
                log.info(f"Committed job {job_id} in {duration:.1f}s — {summary}")
            except Exception as e:
                duration = (datetime.now() - start).total_seconds()
                err = f"{type(e).__name__}: {e}"
                log.error(f"Job {job_id} failed after {duration:.1f}s: {err}\n{traceback.format_exc()}")
                _job_fail(job_id, err)
        except asyncio.CancelledError:
            log.info("Background worker cancelled")
            raise
        except Exception as e:
            # Defensive: don't let the worker loop die from an unexpected error.
            # Log it, sleep briefly, continue.
            log.error(f"Worker loop error: {e}\n{traceback.format_exc()}")
            await asyncio.sleep(5.0)
 async def _process_bulk_job(payload: dict, EpisodeType, RawEpisode) -> dict:
    """Run add_episode_bulk for a 'bulk' job. Payload mirrors BulkEpisodeRequest."""
    raw_episodes = []
    for ep in payload["episodes"]:
        ref_time = (
            datetime.fromisoformat(ep["timestamp"])
            if ep.get("timestamp") else datetime.now()
        )
        raw_episodes.append(RawEpisode(
            name=ep["name"],
            content=ep["content"],
            source_description=ep.get("source_description", ""),
            source=EpisodeType.text,
            reference_time=ref_time,
        ))
    kwargs = dict(
        bulk_episodes=raw_episodes,
        group_id=payload.get("group_id") or GROUP_ID,
        saga=payload.get("saga"),
    )
    if payload.get("custom_extraction_instructions") is not None:
        kwargs["custom_extraction_instructions"] = payload["custom_extraction_instructions"]
    result = await graphiti_instance.add_episode_bulk(**kwargs)
    return {
        "type": "bulk",
        "episodes": len(result.episodes) if result and result.episodes else len(raw_episodes),
        "nodes": len(result.nodes) if result and result.nodes else 0,
        "edges": len(result.edges) if result and result.edges else 0,
    }
 async def _process_single_job(payload: dict, EpisodeType) -> dict:
    """Run add_episode for a 'single' job. Payload mirrors EpisodeRequest."""
    ref_time = (
        datetime.fromisoformat(payload["timestamp"])
        if payload.get("timestamp") else datetime.now()
    )
    kwargs = dict(
        name=payload["name"],
        episode_body=payload["content"],
        source=EpisodeType.text,
        reference_time=ref_time,
        source_description=payload.get("source_description", ""),
        group_id=payload.get("group_id") or GROUP_ID,
        custom_extraction_instructions=payload.get("custom_extraction_instructions"),
    )
    if payload.get("saga") is not None:
        kwargs["saga"] = payload["saga"]
    await graphiti_instance.add_episode(**kwargs)
    return {"type": "single", "episodes": 1}
 # ---------------------------------------------------------------------------
 # Lifespan & app
 # ---------------------------------------------------------------------------
@asynccontextmanager
 async def lifespan(app: FastAPI):
-    global graphiti_instance
+    global graphiti_instance, worker_task
    sys.path.insert(0, str(Path.home() / "aaronai" / "scripts"))
    log.info("Loading embedding and reranker models...")
    from st_embedder import SentenceTransformerEmbedder
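An in-memory model of background_worker's control flow can help when reasoning about failure handling without Postgres or graphiti-core. InMemoryJobStore, worker, demo and the handler map are hypothetical stand-ins: claim_next plays the role of _job_claim_next (oldest queued job first, marked running), recover plays _startup_recovery, and the two except layers mirror the job-level and loop-level handlers in the worker above. The in-memory claim is not a substitute for FOR UPDATE SKIP LOCKED across processes.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable, Optional

Handler = Callable[[dict], Awaitable[dict]]


class InMemoryJobStore:
    """Hypothetical stand-in for the graphiti_jobs table."""

    def __init__(self) -> None:
        self.jobs: dict[str, dict[str, Any]] = {}
        self._seq = itertools.count()  # plays the role of enqueued_at ordering

    def insert(self, job_id: str, job_type: str, payload: dict, status: str = "queued") -> None:
        self.jobs[job_id] = {"job_type": job_type, "payload": payload, "status": status,
                             "seq": next(self._seq), "summary": None, "error": None}

    def recover(self) -> int:
        # Like _startup_recovery(): rows orphaned in 'running' go back to 'queued'.
        stuck = [j for j in self.jobs.values() if j["status"] == "running"]
        for j in stuck:
            j["status"] = "queued"
        return len(stuck)

    def claim_next(self) -> Optional[str]:
        # Like _job_claim_next(): oldest queued job, marked 'running'.
        queued = [(j["seq"], jid) for jid, j in self.jobs.items() if j["status"] == "queued"]
        if not queued:
            return None
        _, jid = min(queued)
        self.jobs[jid]["status"] = "running"
        return jid


async def worker(store: InMemoryJobStore, handlers: dict[str, Handler], poll: float = 0.01) -> None:
    while True:
        try:
            jid = store.claim_next()
            if jid is None:
                await asyncio.sleep(poll)  # idle: wait, then poll again
                continue
            job = store.jobs[jid]
            try:
                # An unknown job_type raises KeyError and lands in the job-level handler.
                job["summary"] = await handlers[job["job_type"]](job["payload"])
                job["status"] = "committed"
            except Exception as e:  # job-level failure: record it, keep serving
                job["status"] = "failed"
                job["error"] = f"{type(e).__name__}: {e}"[:2000]
        except asyncio.CancelledError:
            # A BaseException subclass since Python 3.8, so the Exception handlers
            # would not swallow it anyway; re-raising explicitly documents intent.
            raise
        except Exception:
            await asyncio.sleep(poll)  # loop-level failure: back off, continue


async def demo() -> InMemoryJobStore:
    store = InMemoryJobStore()
    store.insert("a", "single", {"name": "ep"}, status="running")  # orphan of a dead run
    store.insert("b", "bulk", {"episodes": [1, 2, 3]})
    store.insert("c", "mystery", {})
    print("recovered:", store.recover())  # recovered: 1

    async def single(p: dict) -> dict:
        return {"type": "single", "episodes": 1}

    async def bulk(p: dict) -> dict:
        return {"type": "bulk", "episodes": len(p["episodes"])}

    task = asyncio.create_task(worker(store, {"single": single, "bulk": bulk}))
    await asyncio.sleep(0.1)
    task.cancel()  # shutdown, as in the lifespan teardown
    try:
        await task
    except asyncio.CancelledError:
        pass
    return store


store = asyncio.run(demo())
print({jid: j["status"] for jid, j in store.jobs.items()})
# {'a': 'committed', 'b': 'committed', 'c': 'failed'}
```

A cancellation that lands while a handler is being awaited leaves that job in 'running', which is exactly the state recover() returns to 'queued' on the next start. Because the re-run repeats the whole job, this relies on the commit's stated assumption that graphiti-core ingest is safe to repeat.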
@@ -75,11 +385,51 @@ async def lifespan(app: FastAPI):
        max_coroutines=2,
    )
    await graphiti_instance.build_indices_and_constraints()
    # PATCHED 2026-05-02: bridge the per-driver SearchOperations to the
    # search_interface attribute that search_utils.py dispatches on.
    # graphiti-core 0.29.0 builds FalkorSearchOperations as driver._search_ops
    # but never assigns it to driver.search_interface — naming mismatch
    # between the two halves of the codebase. Without this, search_utils.py
    # falls through to interpreted-Cypher cosine math (full-table scan) even
    # when our patched FalkorSearchOperations exists. Setting search_interface
    # activates the per-driver vector-index path.
    if hasattr(graphiti_instance.driver, '_search_ops') and graphiti_instance.driver.search_interface is None:
        graphiti_instance.driver.search_interface = graphiti_instance.driver._search_ops
        log.info("Wired driver.search_interface = driver._search_ops (vector index path active)")
    log.info(f"Graphiti ready — provider: {LLM_PROVIDER}, group: {GROUP_ID}")
    # Recover any jobs left 'running' from a previous sidecar instance.
    # They become 'queued' again and the background worker picks them up.
    recovered = _startup_recovery()
    if recovered > 0:
        log.info(f"Startup recovery: reset {recovered} running job(s) to queued")
    # Start the background job worker.
    worker_task = asyncio.create_task(background_worker())
    log.info("Sidecar ready — accepting job submissions on :8001")
    yield
    # Shutdown: cancel worker, close graphiti.
    if worker_task is not None:
        worker_task.cancel()
        try:
            await worker_task
        except asyncio.CancelledError:
            pass
    await graphiti_instance.close()
-app = FastAPI(title="Aaron AI Graphiti Sidecar", lifespan=lifespan)
+
 app = FastAPI(title="Aaron AI Graphiti Sidecar (Pattern 1)", lifespan=lifespan)
 # ---------------------------------------------------------------------------
 # Request models — preserved from v1.0 with no payload-shape changes. The
 # only API change is the response shape: instead of blocking until
 # graphiti-core returns, submission endpoints return a job_id immediately.
 # ---------------------------------------------------------------------------
 class BulkEpisodeItem(BaseModel):
    name: str
@@ -92,11 +442,6 @@ class BulkEpisodeRequest(BaseModel):
    episodes: list[BulkEpisodeItem]
    group_id: str | None = None
    saga: str | None = None
    # Batch-level extraction guidance. graphiti-core inserts this into the
    # entity-extraction and edge-extraction prompts only — NOT into dedup
    # prompts. Use to bias *what* gets extracted, not *how* dedup runs.
    # Verified 2026-05-01 by reading extract_nodes.py, extract_edges.py,
    # dedupe_nodes.py, dedupe_edges.py in graphiti-core.
    custom_extraction_instructions: str | None = None
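Both request models keep saga and custom_extraction_instructions optional. _process_bulk_job forwards custom_extraction_instructions only when it is set, and _process_single_job does the same for saga, so graphiti-core's defaults apply when a caller omits them. A hypothetical helper, build_call_kwargs, can state that rule once. It is not part of this diff, and applying it uniformly would also drop saga=None on the bulk path and custom_extraction_instructions=None on the single path, which only matters if graphiti-core's defaults for those parameters are not None.

```python
from typing import Any


def build_call_kwargs(required: dict[str, Any], optional: dict[str, Any]) -> dict[str, Any]:
    """Merge required kwargs with only those optional kwargs that are not None
    (same 'is not None' test the job helpers use)."""
    kwargs = dict(required)
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


# Example payload shaped like BulkEpisodeRequest.model_dump();
# "default-group" is a placeholder for GROUP_ID, whose value isn't shown here.
payload = {"episodes": [], "group_id": None, "saga": None,
           "custom_extraction_instructions": "Focus on people and projects."}
kwargs = build_call_kwargs(
    {"bulk_episodes": [], "group_id": payload["group_id"] or "default-group"},
    {"saga": payload["saga"],
     "custom_extraction_instructions": payload["custom_extraction_instructions"]},
)
print(sorted(kwargs))  # ['bulk_episodes', 'custom_extraction_instructions', 'group_id']
```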
@@ -109,72 +454,76 @@ class EpisodeRequest(BaseModel):
    custom_extraction_instructions: str | None = None
    saga: str | None = None
 # ---------------------------------------------------------------------------
 # Endpoints
 # ---------------------------------------------------------------------------
@app.get("/health")
 async def health():
-    return {"ok": True, "provider": LLM_PROVIDER, "group": GROUP_ID}
+    return {
        "ok": True,
        "provider": LLM_PROVIDER,
        "group": GROUP_ID,
        "sidecar": SIDECAR_NAME,
        "version": "2.0",
    }
-@app.post("/episodes")
-async def add_episode(req: EpisodeRequest):
-    g = await get_graphiti()
-    from graphiti_core.nodes import EpisodeType
-    try:
-        ref_time = datetime.fromisoformat(req.timestamp) if req.timestamp else datetime.now()
-        kwargs = dict(
-            name=req.name,
-            episode_body=req.content,
-            source=EpisodeType.text,
-            reference_time=ref_time,
-            source_description=req.source_description,
-            group_id=req.group_id or GROUP_ID,
-            custom_extraction_instructions=req.custom_extraction_instructions,
-        )
-        # Saga is supported on graphiti-core add_episode but kept optional
-        # so older callers don't need to know about it.
-        if req.saga is not None:
-            kwargs["saga"] = req.saga
-        await g.add_episode(**kwargs)
-        return {"ok": True}
-    except Exception as e:
-        log.error(f"Episode ingestion failed: {e}\n{traceback.format_exc()}")
-        raise HTTPException(status_code=500, detail=str(e))
@app.post("/episodes/bulk")
-async def add_episodes_bulk(req: BulkEpisodeRequest):
-    g = await get_graphiti()
-    from graphiti_core.nodes import EpisodeType
-    from graphiti_core.utils.bulk_utils import RawEpisode
-    raw_episodes = []
-    for ep in req.episodes:
-        ref_time = datetime.fromisoformat(ep.timestamp) if ep.timestamp else datetime.now()
-        raw_episodes.append(RawEpisode(
-            name=ep.name,
-            content=ep.content,
-            source_description=ep.source_description,
-            source=EpisodeType.text,
-            reference_time=ref_time,
-        ))
-    try:
-        kwargs = dict(
-            bulk_episodes=raw_episodes,
-            group_id=req.group_id or GROUP_ID,
-            saga=req.saga or None,
-        )
-        # Pass-through only when set, so callers that don't supply
-        # instructions get graphiti-core's default behavior unchanged.
-        if req.custom_extraction_instructions is not None:
-            kwargs["custom_extraction_instructions"] = req.custom_extraction_instructions
-        result = await g.add_episode_bulk(**kwargs)
-        return {"ok": True, "count": len(raw_episodes)}
-    except Exception as e:
-        log.error(f"Bulk ingestion failed: {e}\n{traceback.format_exc()}")
-        raise HTTPException(status_code=500, detail=str(e))
+async def submit_bulk(req: BulkEpisodeRequest):
+    """Submit a bulk ingest job. Returns job_id for polling.
+
+    Job is processed serially by the sidecar's background worker; one
+    bulk-or-single job at a time per graph. No HTTP read-timeout
+    blocking. Submitter polls GET /jobs/{job_id} until terminal status.
+    """
+    if graphiti_instance is None:
+        raise HTTPException(status_code=503, detail="Graphiti not initialized")
+
+    job_id = str(uuid.uuid4())
+    payload = req.model_dump()
+    try:
+        _job_insert(job_id, "bulk", payload)
+    except Exception as e:
+        log.error(f"Failed to enqueue bulk job: {e}\n{traceback.format_exc()}")
+        raise HTTPException(status_code=500, detail=f"Job enqueue failed: {e}")
+    return {"job_id": job_id, "status": "queued"}
@app.post("/episodes")
 async def submit_single(req: EpisodeRequest):
    """Submit a single-episode ingest job. Returns job_id for polling."""
    if graphiti_instance is None:
        raise HTTPException(status_code=503, detail="Graphiti not initialized")
    job_id = str(uuid.uuid4())
    payload = req.model_dump()
    try:
        _job_insert(job_id, "single", payload)
    except Exception as e:
        log.error(f"Failed to enqueue single job: {e}\n{traceback.format_exc()}")
        raise HTTPException(status_code=500, detail=f"Job enqueue failed: {e}")
    return {"job_id": job_id, "status": "queued"}
@app.get("/jobs/{job_id}")
 async def get_job(job_id: str):
    """Poll a job's status. Returns 404 if job not found."""
    job = _job_get(job_id)
    if job is None:
        raise HTTPException(status_code=404, detail=f"Job {job_id} not found")
    return job
@app.get("/search")
 async def search(query: str, limit: int = 8, group_id: str | None = None):
-    g = await get_graphiti()
+    if graphiti_instance is None:
        raise HTTPException(status_code=503, detail="Graphiti not initialized")
    try:
-        results = await g.search(
+        results = await graphiti_instance.search(
            query=query,
            num_results=limit,
            group_ids=[group_id or GROUP_ID],
@@ -195,6 +544,7 @@ async def search(query: str, limit: int = 8, group_id: str | None = None):
        log.error(f"Search failed: {e}\n{traceback.format_exc()}")
        raise HTTPException(status_code=500, detail=str(e))
 if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8001, log_level="info")
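GET /jobs/{job_id} hands the raw path segment straight to _job_get. If graphiti_jobs.job_id is a Postgres UUID column (the str(row["job_id"]) conversion in _job_get suggests so, but the schema is not part of this diff), a malformed id would make psycopg2 raise, and the endpoint would return 500 instead of the 404 its docstring promises. A minimal guard, with normalize_job_id as a hypothetical helper:

```python
import uuid
from typing import Optional


def normalize_job_id(raw: str) -> Optional[str]:
    """Return the canonical (lowercase, hyphenated) UUID string for raw,
    or None if raw is not a valid UUID. Hypothetical helper."""
    try:
        return str(uuid.UUID(str(raw)))
    except ValueError:
        return None


# Sketch of the guard inside get_job (FastAPI handler body):
#     canonical = normalize_job_id(job_id)
#     if canonical is None:
#         raise HTTPException(status_code=404, detail=f"Job {job_id} not found")
#     job = _job_get(canonical)

print(normalize_job_id("not-a-job-id"))  # None
print(normalize_job_id("ABCDEF01-2345-6789-ABCD-EF0123456789"))
# abcdef01-2345-6789-abcd-ef0123456789
```

Typing the path parameter as uuid.UUID would also reject malformed ids before they reach Postgres, but FastAPI would then answer with its 422 validation error rather than 404; the explicit check keeps the endpoint's documented contract.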