aaron 7f07972109 stage2_worker: ON CONFLICT clause resets all run-state fields on re-enqueue
Bug: when a row in stage_3_queue is re-enqueued (the same source is
ingested again after Stage 2 re-runs), the ON CONFLICT (source) DO UPDATE
clause updated the content fields and reset enqueued_at, completed_at,
failed_at, and attempts, but did not reset started_at, failure_reason,
or external_job_id.

A stale started_at left over from a prior attempt makes the row
invisible to the Stage 3 worker's claim filter (which requires
started_at IS NULL). The row sits queued forever, Stage 3 never picks it
up, and the source effectively fails silently after a re-trigger.
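
The failure mode can be reproduced in isolation. The sketch below is an
illustration only: it uses an in-memory SQLite table rather than the
project's real database, a cut-down schema, placeholder timestamp
strings, and a hypothetical claimable() helper that stands in for the
Stage 3 claim query. Only the started_at IS NULL condition comes from
the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, assumed schema: only the columns needed to show the problem.
conn.execute(
    "CREATE TABLE stage_3_queue ("
    " source TEXT PRIMARY KEY,"
    " enqueued_at TEXT,"
    " started_at TEXT)"
)

# A never-started row, and a re-enqueued row that kept started_at from
# an earlier attempt (timestamps are placeholder strings, not real data).
conn.execute(
    "INSERT INTO stage_3_queue VALUES ('fresh-source', 'new-touch', NULL)"
)
conn.execute(
    "INSERT INTO stage_3_queue VALUES "
    "('re-enqueued-source', 'new-touch', 'prior-attempt')"
)


def claimable(conn):
    """Hypothetical stand-in for the Stage 3 claim filter."""
    rows = conn.execute(
        "SELECT source FROM stage_3_queue "
        "WHERE started_at IS NULL ORDER BY source"
    )
    return [row[0] for row in rows]


# The re-enqueued row is skipped because started_at is not NULL.
visible = claimable(conn)
```

Here visible contains only the never-started row; the re-enqueued row is
skipped even though its enqueued_at is current.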

Discovered tonight while testing the bulk pathway after the substrate
fix: a journal entry that had been ingested earlier (and manually marked
completed during recovery from a worker timeout) showed enqueued_at
from the new touch but started_at from the original 01:40 attempt. The
fix extends the upsert clause to NULL all run-state fields so that a
re-enqueue behaves as a fresh attempt.
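
A minimal sketch of the extended upsert, under stated assumptions: it
runs against in-memory SQLite (3.24 or later, for ON CONFLICT ... DO
UPDATE) rather than the project's real database, the single content
column and the enqueue() helper are hypothetical, and resetting attempts
to 0 is a guess at what "reset" means here. The column names for the
run-state fields are the ones listed in this message.

```python
import sqlite3

# Assumed schema; "content" is a hypothetical stand-in for the real
# content fields.
SCHEMA = """
CREATE TABLE stage_3_queue (
    source TEXT PRIMARY KEY,
    content TEXT,
    enqueued_at TEXT,
    started_at TEXT,
    completed_at TEXT,
    failed_at TEXT,
    attempts INTEGER NOT NULL DEFAULT 0,
    failure_reason TEXT,
    external_job_id TEXT
)
"""

# On conflict, refresh the content and clear every run-state field,
# including the three that were previously left untouched.
UPSERT_SQL = """
INSERT INTO stage_3_queue (source, content, enqueued_at)
VALUES (:source, :content, :enqueued_at)
ON CONFLICT (source) DO UPDATE SET
    content = excluded.content,
    enqueued_at = excluded.enqueued_at,
    completed_at = NULL,
    failed_at = NULL,
    attempts = 0,
    started_at = NULL,
    failure_reason = NULL,
    external_job_id = NULL
"""


def enqueue(conn, source, content, enqueued_at):
    """Hypothetical helper: insert a row or reset it to a fresh attempt."""
    conn.execute(
        UPSERT_SQL,
        {"source": source, "content": content, "enqueued_at": enqueued_at},
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
enqueue(conn, "journal-entry", "v1", "t0")

# Simulate leftovers from a prior attempt (placeholder values).
conn.execute(
    "UPDATE stage_3_queue SET started_at = 't1', attempts = 1, "
    "failure_reason = 'worker timeout', external_job_id = 'job-1' "
    "WHERE source = 'journal-entry'"
)

enqueue(conn, "journal-entry", "v2", "t2")
row = conn.execute(
    "SELECT content, enqueued_at, started_at, failure_reason, "
    "external_job_id, attempts FROM stage_3_queue "
    "WHERE source = 'journal-entry'"
).fetchone()
```

After the second enqueue() call the row carries the new content and
enqueued_at, and started_at, failure_reason, and external_job_id are all
NULL again, so a started_at IS NULL claim filter would see it.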

After the fix, the re-triggered journal entry routed cleanly through
Stage 2 → Stage 3 → bulk pathway → sidecar bulk job → 60ms commit
(worst-case dedup against already-known content).
2026-05-02 05:20:14 +00:00
2026-04-25 02:05:42 +00:00