63 Commits

Author SHA1 Message Date
aaron d7b2a850c4 stage3_worker: v2.4 — encoder extraction instructions v1.0
Adds EXTRACTION_INSTRUCTIONS_V1 constant passed to the sidecar via
custom_extraction_instructions on both bulk and single-episode pathways.
graphiti-core inserts the text into entity and edge extraction prompts
only; it does NOT enter dedup prompts (that's the encoder-stays-naive
commitment).

Architectural posture: the encoder is content-naive. It does not draw on
prior knowledge of the user, the substrate, or the cycle's accumulated
work. Schema and personality live in the cycle's consolidated substrate
where the dream phase shapes them. The encoder produces source-grounded
ground truth for the cycle to work from.

Empirical validation in tonight's smoke test: 30+ verb-shaped predicates
from 3 chunks of real content, including IS_AUTOBIOGRAPHICAL_TO,
INFORMED_DESIGN_OF, EVALUATED_DOMAIN_PURITY, DISCONFIRMED_HYPOTHESIS_ABOUT.
Compare to default extraction's 4 predicate types across 22,289 edges.
RELATES_TO appears once as appropriate fallback rather than collapsing
everything generic.

Bumps WORKER_VERSION to 2.4.
2026-05-02 05:15:17 +00:00
aaron e7de7fb64b stage3_worker: v2.3 — bulk-vs-single-episode routing on Stage 2 state-type
Reads new routing columns from stage_3_queue (state_type, state_type_confidence,
supersedes_prior_state, state_type_rationale) and dispatches each row to one of
two ingest pathways:

  - BULK pathway (existing, renamed from ingest_to_graphiti to ingest_bulk):
    safer-cheaper default. Used when supersedes=false OR confidence=low OR
    routing fields are NULL (legacy rows). Skips edge invalidation per
    graphiti-core's bulk semantics.

  - SINGLE-EPISODE pathway (new, ingest_single_episode): used only when
    supersedes_prior_state=true AND confidence in {medium, high}. Per-chunk
    POST to /episodes (singular endpoint) with shared saga tag. Each call
    independent — own timeout, own retry envelope.

Routing decision isolated in should_route_single_episode() with unit-tested
truth table covering all eight (supersedes × confidence) combinations.
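
The routing rule above can be sketched as a small pure function. The name
should_route_single_episode comes from the commit; the signature and the
reading of "eight combinations" as supersedes in {True, False} crossed with
confidence in {low, medium, high, NULL} are assumptions made for illustration.

```python
from typing import Optional

# Confidence levels that permit the single-episode pathway (from the commit text).
SINGLE_EPISODE_CONFIDENCES = {"medium", "high"}


def should_route_single_episode(
    supersedes_prior_state: Optional[bool],
    state_type_confidence: Optional[str],
) -> bool:
    """Return True only for supersedes=true AND confidence in {medium, high}.

    Anything else, including NULL routing fields on legacy rows, falls back
    to the bulk pathway.
    """
    if supersedes_prior_state is not True:
        return False
    return state_type_confidence in SINGLE_EPISODE_CONFIDENCES


# Assumed enumeration of the eight combinations (None stands in for SQL NULL).
TRUTH_TABLE = {
    (True, "high"): True,
    (True, "medium"): True,
    (True, "low"): False,
    (True, None): False,
    (False, "high"): False,
    (False, "medium"): False,
    (False, "low"): False,
    (False, None): False,
}

for (supersedes, confidence), expected in TRUTH_TABLE.items():
    assert should_route_single_episode(supersedes, confidence) is expected
```

Keeping the decision in one side-effect-free function is what makes the
table cheap to test exhaustively.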

Per-chunk heartbeat (heartbeat_row): single-episode pathway updates
stage_3_queue.started_at after each successful chunk POST so a long-running
document doesn't cross the 10-minute stale threshold mid-process and get
re-dequeued. started_at now means 'last activity timestamp' rather than
'began at'. The heartbeat is best-effort: failures are logged, not raised.
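
A minimal sketch of the heartbeat idea, assuming the database write is
injected as a callable (here named touch, a hypothetical stand-in for the
UPDATE on stage_3_queue.started_at). The 10-minute threshold is from the
commit; whether the real comparison is strict is an assumption.

```python
import logging
from datetime import datetime, timedelta
from typing import Callable

log = logging.getLogger("stage3_worker_sketch")

# Stale threshold quoted in the commit message.
STALE_THRESHOLD = timedelta(minutes=10)


def is_stale(started_at: datetime, now: datetime) -> bool:
    """A row counts as stale once its last activity is older than the threshold."""
    return now - started_at > STALE_THRESHOLD


def heartbeat_row(row_id: int, touch: Callable[[int], None]) -> bool:
    """Best-effort heartbeat: log a failed write instead of raising.

    Returns True when the write succeeded, False otherwise, so the ingest
    loop can carry on either way.
    """
    try:
        touch(row_id)
        return True
    except Exception:
        log.warning("heartbeat failed for row %s", row_id, exc_info=True)
        return False
```

Calling heartbeat_row after each successful chunk POST keeps started_at
fresh, so is_stale stays False for a document that is still making progress.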

Partial success on chunk failure: previously committed chunks stay in the
graph; the function raises with detail (single_episode_partial: chunk N/M
failed, succeeded K). The row is marked failed_at with that detail. Re-
ingestion would re-POST chunks 1..N-1 against the graph; graphiti's dedup
handles them as no-ops.
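
The partial-success behaviour can be sketched as a loop that stops at the
first failing chunk and reports how far it got. The error text follows the
format quoted above; the function signature and the injected post_chunk
callable are assumptions, not the worker's actual interface.

```python
from typing import Callable, Sequence


def ingest_single_episode(
    chunks: Sequence[str],
    post_chunk: Callable[[str], None],
) -> int:
    """POST chunks one at a time; on failure, raise with partial-success detail.

    Chunks posted before the failure are not rolled back. They stay in the
    graph, matching the behaviour described in the commit message.
    """
    total = len(chunks)
    succeeded = 0
    for index, chunk in enumerate(chunks, start=1):
        try:
            post_chunk(chunk)
        except Exception as exc:
            raise RuntimeError(
                f"single_episode_partial: chunk {index}/{total} failed, "
                f"succeeded {succeeded}"
            ) from exc
        succeeded += 1
    return succeeded
```

The caller would store the exception text alongside failed_at, so the row
itself records which chunk broke and how many were already committed.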

DB connection scoping: process_one no longer holds one Postgres connection
across the whole ingest call (which can run an hour for long single-episode
documents). Each DB write gets a short-lived connection.
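
One way to scope connections per write is a small context manager that
opens, commits, and closes around each statement. This sketch uses sqlite3
from the standard library as a runnable stand-in for Postgres; the helper
names and the id column are hypothetical.

```python
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def short_lived_connection(
    connect: Callable[[], sqlite3.Connection],
) -> Iterator[sqlite3.Connection]:
    """Open a connection for one unit of work, then commit and close it."""
    conn = connect()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def mark_started(
    connect: Callable[[], sqlite3.Connection], row_id: int, started_at: str
) -> None:
    """Write one timestamp using its own connection (assumed 'id' key column)."""
    with short_lived_connection(connect) as conn:
        conn.execute(
            "UPDATE stage_3_queue SET started_at = ? WHERE id = ?",
            (started_at, row_id),
        )
```

Nothing is held open between writes, so an hour-long ingest no longer pins
a database connection for its whole duration.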

Phase A item 3 of three. Closes the mechanical-patches block. Item 4
(custom_extraction_instructions text design) is the remaining intellectual
work; sidecar and worker plumbing is now ready for it.
2026-05-01 19:07:41 +00:00
aaron 70e87e3ab5 stage2_worker: v2.2 — add state-type classification for Stage 3 routing
Mistral pass now produces two concerns in a single flat JSON output:
  (a) orientation context (existing four fields, unchanged semantics)
  (b) state-type classification: state_type (current/reference/historical),
      state_type_confidence (low/medium/high), supersedes_prior_state (bool),
      state_type_rationale (text)

Routing fields written as explicit columns on stage_3_queue (separate
ALTER TABLE migration adds them: state_type, state_type_confidence,
supersedes_prior_state, state_type_rationale + index on supersedes).

Safe-cheap defaults on malformed Mistral output: state_type='reference',
confidence='low', supersedes=false. All defaults route to bulk pathway
(no temporal invalidation cost) so Mistral parse drift can't accidentally
trigger expensive single-episode ingest.
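
A minimal sketch of the safe-cheap fallback, assuming the classification
arrives as a JSON string. The default values for the first three fields are
from the commit; the empty-string rationale default and the all-or-nothing
fallback policy are assumptions.

```python
import json
from typing import Any, Dict

VALID_STATE_TYPES = {"current", "reference", "historical"}
VALID_CONFIDENCES = {"low", "medium", "high"}

# Defaults that always route to the bulk pathway.
SAFE_DEFAULTS: Dict[str, Any] = {
    "state_type": "reference",
    "state_type_confidence": "low",
    "supersedes_prior_state": False,
    "state_type_rationale": "",  # assumed default, not stated in the commit
}


def parse_state_type(raw: str) -> Dict[str, Any]:
    """Parse model output; fall back to SAFE_DEFAULTS on any malformed field."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return dict(SAFE_DEFAULTS)
    if not isinstance(data, dict):
        return dict(SAFE_DEFAULTS)

    state_type = data.get("state_type")
    confidence = data.get("state_type_confidence")
    supersedes = data.get("supersedes_prior_state")
    well_formed = (
        isinstance(state_type, str)
        and state_type in VALID_STATE_TYPES
        and isinstance(confidence, str)
        and confidence in VALID_CONFIDENCES
        and isinstance(supersedes, bool)
    )
    if not well_formed:
        return dict(SAFE_DEFAULTS)

    rationale = data.get("state_type_rationale")
    return {
        "state_type": state_type,
        "state_type_confidence": confidence,
        "supersedes_prior_state": supersedes,
        "state_type_rationale": rationale if isinstance(rationale, str) else "",
    }
```

Because the defaults carry confidence='low' and supersedes=false, a parse
failure can only ever land on the cheaper pathway.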

Phase A item 2 of three. Sidecar (item 1, commit 8b0a163) already plumbs
custom_extraction_instructions through to /episodes/bulk. Stage 3 routing
logic (item 3) follows.
2026-05-01 19:02:11 +00:00
aaron 8b0a163670 graphiti_service: expose custom_extraction_instructions on /episodes/bulk; add saga on /episodes
- BulkEpisodeRequest: new optional custom_extraction_instructions field
  with comment noting graphiti-core inserts it into extract_nodes/extract_edges
  prompts only, NOT dedupe prompts (verified by reading prompts directory)
- EpisodeRequest: new optional saga field, plumbed through to add_episode
  for upcoming Stage 3 single-episode pathway
- Both handlers use conditional kwargs construction so existing callers
  see no behavioral change
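
Conditional kwargs construction can be sketched as follows: optional fields
are forwarded only when the request actually set them. The helper and the
stand-in callee below are hypothetical and do not call graphiti-core.

```python
from typing import Any, Dict


def build_optional_kwargs(**optional: Any) -> Dict[str, Any]:
    """Keep only the optional arguments that were actually provided."""
    return {name: value for name, value in optional.items() if value is not None}


def fake_add_episode(name: str, body: str, **kwargs: Any) -> Dict[str, Any]:
    """Stand-in for the real call; it just reports which extras it received."""
    return kwargs


# An existing caller that sends no saga produces exactly the old call shape.
legacy = fake_add_episode("doc", "text", **build_optional_kwargs(saga=None))

# A new caller that sets the field gets it forwarded.
tagged = fake_add_episode("doc", "text", **build_optional_kwargs(saga="doc-123"))
```

Omitting the key entirely, rather than passing None, leaves the callee's
own defaults in force for callers that predate the new fields.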

Phase A item 1 of three. Items 2 (stage2_worker) and 3 (stage3_worker) follow.
2026-05-01 18:57:31 +00:00
aaron 1a8e0353f5 stage3_worker: v2.2 — absolute sudo/systemctl paths, error logging, reset failure counter on recovery failure
Mirrors stage2_worker v2.1 (da98019) resilience fixes:
- Absolute paths for /usr/bin/sudo and /bin/systemctl
- Log stdout/stderr when sidecar restart fails
- Reset consecutive_failures even when wedge recovery fails (prevents
  permanent stuck state if restart itself is broken)
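
The counter behaviour in the last bullet can be sketched as a pure update
step. The threshold of 2 follows the "threshold-2 escalation" wording in
the Stage 3 v2.1 commit; the function name and the recover callable are
assumptions.

```python
import logging
from typing import Callable

log = logging.getLogger("worker_sketch")

FAILURE_THRESHOLD = 2


def note_result(
    ok: bool,
    consecutive_failures: int,
    recover: Callable[[], bool],
    threshold: int = FAILURE_THRESHOLD,
) -> int:
    """Return the updated failure counter after one processing attempt."""
    if ok:
        return 0
    consecutive_failures += 1
    if consecutive_failures < threshold:
        return consecutive_failures
    try:
        recovered = recover()
    except Exception:
        log.exception("wedge recovery raised")
        recovered = False
    if not recovered:
        log.error("wedge recovery failed; resetting counter anyway")
    # Reset regardless of the outcome, so a broken restart path cannot
    # trigger a restart attempt on every subsequent failure.
    return 0
```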
2026-05-01 18:40:25 +00:00
aaron da980193dd stage2_worker: v2.1 — terminal failure states + sudo path fix
Three classes of silent failure converted to clean terminal states:

- Mistral timeout: previously left rows in zombie state (started_at set,
  failed_at null, attempts incremented past retry threshold, row invisible
  to selection query). Now sets failed_at with reason
  'mistral_timeout_after_300s'. Surfaced 2026-05-01 when 17 documents
  accumulated in this state during the Stage 3 saga deadlock incident.

- Mistral parse failure: run_mistral returns {'error': 'parse_failed'} on
  JSON decode failure but process_one wasn't checking, so empty orientation
  ('Active frames: . Frame relationships: ...') was shipped to Stage 3.
  This is F22 from the 2026-04-30 code review. Now sets failed_at with
  reason 'mistral_parse_failure'.

- Wedge recovery hammering: consecutive_failures was only reset on
  successful Ollama restart. With the sudo path bug (also fixed here),
  recovery always failed, so every subsequent failure re-attempted restart.
  Now resets the counter regardless and logs the failure visibly.
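
The first two terminal states can be sketched as a mapping from a Mistral
outcome to a failed_at reason. The reason strings and the
{'error': 'parse_failed'} shape are from the commit; the function name and
the timed_out flag are hypothetical.

```python
from typing import Any, Dict, Optional

MISTRAL_TIMEOUT_SECONDS = 300  # implied by 'mistral_timeout_after_300s'


def failure_reason(
    result: Optional[Dict[str, Any]], timed_out: bool
) -> Optional[str]:
    """Return a terminal failure reason, or None when the result is usable."""
    if timed_out:
        return f"mistral_timeout_after_{MISTRAL_TIMEOUT_SECONDS}s"
    if result is not None and result.get("error") == "parse_failed":
        return "mistral_parse_failure"
    return None
```

A non-None reason means the row gets failed_at set and stops there, rather
than lingering invisibly or shipping an empty orientation to Stage 3.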

Also: subprocess.run now uses absolute paths (/usr/bin/sudo,
/bin/systemctl) instead of relying on PATH, fixing the 'No such file or
directory: sudo' error that broke Stage 2's recover_wedge() since
deployment. F45-adjacent — sudoers entries were added 2026-05-01 but the
PATH issue was masking that fix.
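
A minimal sketch of the absolute-path restart call, with stdout and stderr
logged on failure. The two binary paths are from the commit; the helper
names and the timeout value are assumptions.

```python
import logging
import subprocess
from typing import List

log = logging.getLogger("worker_sketch")

SUDO = "/usr/bin/sudo"
SYSTEMCTL = "/bin/systemctl"


def restart_command(unit: str) -> List[str]:
    """Build the argv with absolute paths so PATH is never consulted."""
    return [SUDO, SYSTEMCTL, "restart", unit]


def restart_unit(unit: str, timeout: float = 60.0) -> bool:
    """Try to restart a systemd unit; log the output and return False on failure."""
    try:
        proc = subprocess.run(
            restart_command(unit),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # OSError covers a missing binary (FileNotFoundError).
        log.error("restart of %s could not run: %s", unit, exc)
        return False
    if proc.returncode != 0:
        log.error(
            "restart of %s failed (rc=%s) stdout=%r stderr=%r",
            unit, proc.returncode, proc.stdout, proc.stderr,
        )
        return False
    return True
```

A service started by systemd may run with a minimal PATH, which is why a
bare "sudo" can fail to resolve even though it works in a login shell.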

Worker version bumped to 2.1 to match Stage 3's resilience patch level.
2026-05-01 17:28:53 +00:00
aaron b936931668 Stage 3 worker v2.1 — saga-size limit + wedge detection + sudoers fixes
Production incident 2026-05-01: F14 re-cascade attempt surfaced three
compounding issues in cascade resilience.

stage3_worker.py changes:
- MAX_CHUNKS_PER_SAGA=10 — large documents split into multiple bulk
  commits, all sharing the same saga tag for Graphiti document linking.
  Original implementation sent all chunks as one saga; 17-19 chunk sagas
  deadlocked sidecar's Python-side coordination.
- recover_wedge() function — restarts aaronai-graphiti.service when
  consecutive_failures hits threshold. Mirrors Stage 2 pattern.
- run() loop adds consecutive_failures counter with threshold-2
  escalation. Resolves F28 + F29 from code review.
- Worker version bumped 2.0 -> 2.1.
- post_bulk() helper extracts shared HTTP POST + error handling.
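
The saga-size limit can be sketched as a batching step in which every batch
carries the same saga tag. MAX_CHUNKS_PER_SAGA=10 is from the commit; the
helper names and the payload shape are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Sequence

MAX_CHUNKS_PER_SAGA = 10


def split_into_batches(
    chunks: Sequence[str], size: int = MAX_CHUNKS_PER_SAGA
) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` chunks, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(chunks), size):
        yield list(chunks[start:start + size])


def build_bulk_payloads(chunks: Sequence[str], saga: str) -> List[Dict[str, Any]]:
    """One payload per bulk commit; all payloads share the document's saga tag."""
    return [
        {"saga": saga, "episodes": batch}  # assumed payload shape
        for batch in split_into_batches(chunks)
    ]
```

Under this sketch a 19-chunk document becomes two bulk commits of 10 and 9
chunks instead of a single 19-chunk saga.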

Outside-repo changes (system config, separately documented):
- WatchdogSec=600 commented out in stage2 + stage3 systemd unit files.
  Workers have no sd_notify support; per-request timeouts in code
  handle the actual failure modes.
- /etc/sudoers.d/aaron-aaronai created with NOPASSWD entries for
  systemctl restart ollama and restart aaronai-graphiti.service.
  Stage 2's existing recover_wedge() was silently broken since
  deployment due to this gap.

.gitignore — added rules for *.bak files, runtime artifacts
(watcher_heartbeat, dreamer_state.json, corpus_integrity_report.json,
watcher_state.json, watcher_status.json), Python cruft, virtual env,
.env, editor/OS files, and Aaron AI runtime data (conversations.db,
sessions.db, memory.md, settings.json).

Untracked 11 files that shouldn't have been committed in 465f2f7
(this morning): backup files and runtime artifacts.

Re-cascading Shop Class (414KB) and BirdAI-Experiments-Log.md (192KB)
through the patched worker after re-extracting full text from disk.
Cascade in progress at commit time.
2026-05-01 05:18:09 +00:00
aaron 465f2f725b Code review fixes: CV pinning, F1 (excluded_sources), F14 (50KB truncation), F37
- api.py: strip CV pinning workaround (parity violation, see architecture doc)
- dream.py: F1 — retrieve_graphiti() now accepts excluded_sources, over-fetches
  3x and filters in-process. Was silently dropping the parameter; would have
  confounded E3 with broken cross-stage exclusion in Graphiti arm.
- watcher.py + ingest.py: F14 — drop full_text[:50000] truncation. Was
  propagating through entire cascade. Postgres TEXT can hold up to 1GB.
- corpus_integrity.py: F37 — same truncation, third path now clean.
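
The F1 fix (over-fetch, then filter in-process) can be sketched as below.
The 3x factor is from the commit; the function name, the injected search
callable, and the "source" key on each hit are assumptions.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

OVERFETCH_FACTOR = 3

Hit = Dict[str, Any]


def retrieve_excluding(
    search: Callable[[str, int], List[Hit]],
    query: str,
    k: int,
    excluded_sources: Optional[Iterable[str]] = None,
) -> List[Hit]:
    """Fetch extra hits, drop excluded sources, and return at most k results."""
    excluded = set(excluded_sources or ())
    # Over-fetch only when something will be filtered out (an assumption).
    limit = k * OVERFETCH_FACTOR if excluded else k
    hits = search(query, limit)
    kept = [hit for hit in hits if hit.get("source") not in excluded]
    return kept[:k]
```

Over-fetching cannot guarantee k results: if more than two thirds of the
top hits come from excluded sources, the filtered list falls short of k.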

Backups: api.py.bak.*, dream.py.bak.*, watcher.py.bak.*, ingest.py.bak.*,
corpus_integrity.py.bak.* timestamped pre-fix.

Re-cascaded Shop Class as Soulcraft (only already-cascaded source affected
by F14, 414KB).
2026-05-01 02:26:37 +00:00
aaron 25e42c0231 corpus_integrity.py: write unreadables with retry_count=0 so OCR can retry when it ships 2026-04-30 22:03:48 +00:00
aaron 7822fb1cc1 corpus_integrity.py: write unreadable files to ingest_failures for UI visibility 2026-04-30 21:59:06 +00:00
aaron 74e2c34f43 corpus integrity: ingest_failures tracking in watcher, reconciliation script, corpus status/retry/reconcile endpoints 2026-04-30 21:54:39 +00:00
aaron f11cacd9c9 add experiment scripts and results; watcher.py latest changes 2026-04-30 18:06:03 +00:00
aaron 1cf26df450 api.py: return error_type=transcription_failed on Whisper crash, frontend retry logic can now distinguish from network failures 2026-04-30 17:45:47 +00:00
aaron 7cd765146a stage3_worker.py: log sidecar response body on non-200 2026-04-30 17:37:28 +00:00
aaron 58515ebec0 graphiti_service.py: add traceback logging, log file handler for all endpoints 2026-04-30 17:36:19 +00:00
aaron 91166367fa E3: add Graphiti retrieval branch to dream.py, E3 experiment script with blinding 2026-04-30 17:17:28 +00:00
aaron 2b3c2380a0 watcher.py: in-process ingest, embedder loaded once at startup, startup recovery, heartbeat, no duplicate logging 2026-04-30 16:42:44 +00:00
aaron 2fb50cce71 ingest.py: guard Stage 2 enqueue behind SKIP_STAGE2_ENQUEUE env var for migration runs 2026-04-30 16:20:11 +00:00
aaron c08f57a6f2 stage2/3 workers: remove duplicate StreamHandler, stdout captured by systemd 2026-04-30 16:12:51 +00:00
aaron cae7fb8775 dream.py v1.1: score-band exclusion for Early REM, DREAMER_VERSION constant, manifest versioning 2026-04-30 15:51:11 +00:00
aaron b53717af5b dream.py: enrich manifest with retrieval breadth metrics 2026-04-30 06:14:55 +00:00
aaron 2b9a1782c1 feat: stage2/3 pipeline, taxonomy-free cascade, E1.8/E4 experiments, corpus migration state 2026-04-30 04:04:31 +00:00
aaron 62b5b5453a fix: max_coroutines=2, saga support in sidecar; stage3 chunking; TIMEOUT_MAX 0 persistent in falkordb compose 2026-04-30 04:01:02 +00:00
aaron 95d022ec64 fix: FalkorDriver database=aaron, build indices on correct graph 2026-04-29 21:34:20 +00:00
aaron d91a5675ff capture: public SSE endpoint for transcription completion events 2026-04-29 18:00:54 +00:00
aaron c42d898504 emit capture_saved SSE event when async transcription completes 2026-04-29 17:58:01 +00:00
aaron a05fcec882 async voice transcription — return immediately, whisper runs in background 2026-04-29 17:48:22 +00:00
aaron eb7cf3be10 upgrade whisper small -> large-v3, bump cpu_threads to 8 2026-04-29 17:35:03 +00:00
aaron 3f6c435be4 add client_time to chat context — user-supplied, not logged 2026-04-29 17:26:03 +00:00
aaron 21557790d9 capture: return error_type on transcription failure instead of HTTP 500 2026-04-29 17:04:56 +00:00
aaron 794e0aeddd update whisper prompt: add BirdAI stack terms, remove stale ChromaDB 2026-04-29 16:47:30 +00:00
aaron d271e17929 add sourcing constraint to system prompt, close hallucination gap 2026-04-29 16:37:39 +00:00
aaron 5d83fb7601 fix: load_dotenv override=True, option b source exclusion 2026-04-29 16:32:09 +00:00
aaron 83d4f60d0d option b: cross-night source exclusion in dream pipeline 2026-04-29 16:19:52 +00:00
aaron b6fe350ab2 experiments: add consistency test and briefing generator results + scripts 2026-04-28 02:47:41 +00:00
aaron 037d747573 chore: archive deprecated chromadb and migration scripts 2026-04-28 00:15:46 +00:00
aaron d5b5c2ec14 Graphiti sidecar service + SentenceTransformer embedder — self-hosted, no OpenAI dependency 2026-04-27 18:21:22 +00:00
aaron 4ee2567400 Add SentenceTransformer embedder for Graphiti — self-hosted, no OpenAI dependency 2026-04-27 18:18:37 +00:00
aaron a1f732fc9e Dreamer: manifest writer, Late REM v1.2 (remove coherence pull) 2026-04-27 16:54:18 +00:00
aaron 03b3f012c3 Dreamer: prompt versioning, Early REM v1.1, prompt signature in headers 2026-04-27 16:50:21 +00:00
aaron 6776637178 Remove hardcoded PG password fallbacks — require PG_DSN env var in all scripts 2026-04-27 05:16:37 +00:00
aaron a1f5c1049a Fix dreamer status display, watcher excludes Media/, remove NVM debt item 2026-04-27 05:08:01 +00:00
aaron d3239aba17 Image capture — extend /api/capture for image+voice, Claude vision description, Media/ WebDAV, watcher excludes Media/ 2026-04-27 04:28:31 +00:00
aaron ef2fddc47f Redesign dreamer — interdependent pipeline, NREM→Early REM→Late REM→Synthesis 2026-04-26 23:41:24 -04:00
aaron 7af246ac01 APScheduler — replace systemd timers, in-process dream and ingest scheduling 2026-04-27 03:04:33 +00:00
aaron 9b312d936f Add SSE endpoint and dream notify — /api/events and /api/events/notify 2026-04-27 02:20:50 +00:00
aaron 9088b5643d Add /api/dreamer/status and /api/dreamer/run endpoints 2026-04-27 01:27:09 +00:00
aaron a07de922df Add /api/capture and /api/captures endpoints — auth-free, WebDAV delivery to Journal/Captures/ 2026-04-26 22:39:55 +00:00
aaron 8c8fba11b8 Add nightly conversation indexing — Aaron AI conversations into pgvector at 2:30AM 2026-04-26 21:28:40 +00:00
aaron f78b83042b Migrate to pgvector — remove ChromaDB from api.py, ingest scripts, dream.py 2026-04-26 21:16:04 +00:00