aaron da980193dd stage2_worker: v2.1 — terminal failure states + sudo path fix
Three classes of silent failure converted to clean terminal states:

- Mistral timeout: previously left rows in zombie state (started_at set,
  failed_at null, attempts incremented past retry threshold, row invisible
  to selection query). Now sets failed_at with reason
  'mistral_timeout_after_300s'. Surfaced 2026-05-01 when 17 documents
  accumulated in this state during the Stage 3 saga deadlock incident.

- Mistral parse failure: run_mistral returns {'error': 'parse_failed'} on
  JSON decode failure but process_one wasn't checking, so empty orientation
  ('Active frames: . Frame relationships: ...') was shipped to Stage 3.
  This is F22 from the 2026-04-30 code review. Now sets failed_at with
  reason 'mistral_parse_failure'.

- Wedge recovery hammering: consecutive_failures was only reset on
  successful Ollama restart. With the sudo path bug (also fixed here),
  recovery always failed, so every subsequent failure re-attempted restart.
  Now resets the counter regardless and logs the failure visibly.

Also: subprocess.run now uses absolute paths (/usr/bin/sudo,
/bin/systemctl) instead of relying on PATH, fixing the 'No such file or
directory: sudo' error that broke Stage 2's recover_wedge() since
deployment. F45-adjacent — sudoers entries were added 2026-05-01 but the
PATH issue was masking that fix.

Worker version bumped to 2.1 to match Stage 3's resilience patch level.
2026-05-01 17:28:53 +00:00
2026-04-25 02:05:42 +00:00
S
Description
No description provided
12 MiB
Languages
Python 95.9%
HTML 3.7%
Shell 0.4%