Add Pattern 1 async job model migration
Adds the graphiti_jobs table for the sidecar's async ingest queue and an external_job_id column on stage_3_queue that the worker uses as its polling reference.

Tonight's smoke test (2026-05-02, ~01:40-01:50 UTC) diagnosed that bulk ingest against the 4,222-entity graph commits successfully, but the worker's 600s HTTP read timeout fires before the sidecar's response returns. Three days of 'saga deadlock' failures were false negatives: the work succeeded; the worker just stopped listening. Pattern 1 separates submission from completion observation so the worker can't false-negative this way.

Migration only; sidecar and worker code changes follow in subsequent commits.
@@ -0,0 +1,55 @@
-- Migration: 20260502-001_async_job_model
-- Purpose: Pattern 1 async job model. Sidecar processes ingest jobs serially
--          via Postgres-backed queue. Worker submits and polls rather than
--          blocking on synchronous HTTP response.
--
-- Architectural rationale: tonight's smoke test (2026-05-02 ~01:40-01:50 UTC)
-- diagnosed that bulk ingest against a 4,222-entity graph commits successfully
-- but the worker's HTTP read-timeout fires before the response returns. Three
-- days of "saga deadlock" failures were false negatives: the work succeeded;
-- the worker just stopped listening. Pattern 1 separates submission from
-- completion observation so the worker can't false-negative this way.
--
-- The job model is also the natural data source for Phase A items 6-7
-- (metrics tables): graphiti_jobs records duration, status transitions,
-- and per-job summary that those tables will aggregate.
--
-- Idempotent: safe to re-run.

-- Job state for sidecar's async ingest queue.
-- One row per submitted bulk-or-single ingest. Sidecar reads queued jobs
-- on startup to resume after restart. Worker polls status until terminal.
CREATE TABLE IF NOT EXISTS graphiti_jobs (
    job_id       UUID PRIMARY KEY,
    job_type     TEXT NOT NULL CHECK (job_type IN ('bulk', 'single')),
    payload      JSONB NOT NULL,                  -- full submitted request body
    status       TEXT NOT NULL DEFAULT 'queued'   -- 'queued'|'running'|'committed'|'failed'
                 CHECK (status IN ('queued', 'running', 'committed', 'failed')),
    enqueued_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    started_at   TIMESTAMPTZ,
    finished_at  TIMESTAMPTZ,
    error        TEXT,                            -- non-null when status='failed'
    summary      JSONB,                           -- {nodes: N, edges: N, episodes: N}
    submitted_by TEXT                             -- worker name for traceability
);

-- Index supporting sidecar's "pick next queued job" query
CREATE INDEX IF NOT EXISTS idx_graphiti_jobs_queued
    ON graphiti_jobs (enqueued_at)
    WHERE status = 'queued';

-- Index supporting status-filtered scans (e.g. counting jobs per status).
-- The worker's "poll my job by id" query is already served by the primary key.
CREATE INDEX IF NOT EXISTS idx_graphiti_jobs_status
    ON graphiti_jobs (status);

-- Stage 3 queue gains a reference to the sidecar job processing the row.
-- When set, worker polls graphiti_jobs.status rather than blocking on HTTP.
-- NULL means: row not yet submitted, or pre-Pattern-1 row.
ALTER TABLE stage_3_queue
    ADD COLUMN IF NOT EXISTS external_job_id UUID;

-- Index for "find rows that submitted but didn't complete" recovery scans
CREATE INDEX IF NOT EXISTS idx_stage_3_queue_external_job
    ON stage_3_queue (external_job_id)
    WHERE external_job_id IS NOT NULL AND completed_at IS NULL AND failed_at IS NULL;
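
The worker and sidecar changes are not part of this commit, so the following is only a sketch of how a worker might observe completion under Pattern 1, assuming it reads graphiti_jobs.status by job_id. The function name poll_until_terminal, its parameters, and the POLL_SQL text are hypothetical; only the table name, column names, and status values come from the migration above.

```python
import time
from typing import Callable, Optional

# Terminal states, taken from the CHECK constraint on graphiti_jobs.status.
TERMINAL_STATUSES = frozenset({"committed", "failed"})

# Hypothetical per-poll query (psycopg-style placeholder); not from the commit.
POLL_SQL = "SELECT status, error, summary FROM graphiti_jobs WHERE job_id = %s"


def poll_until_terminal(
    fetch_status: Callable[[], Optional[str]],
    interval_s: float = 5.0,
    max_wait_s: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll until the job reaches a terminal status.

    fetch_status is any callable returning the job's current status
    (for example, one that runs POLL_SQL and returns the first column).

    Returns 'committed' or 'failed', or None if max_wait_s elapses first.
    None means "still unknown", not "failed": the job may yet commit.
    """
    deadline = None if max_wait_s is None else clock() + max_wait_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if deadline is not None and clock() >= deadline:
            return None
        sleep(interval_s)
```

Returning None on timeout, rather than raising or marking the row failed, keeps "the worker stopped waiting" distinct from "the job failed", which is the false negative the commit message describes.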