# BirdAI Component Inventory — 2026-05-02

*Track 1 stabilization, deliverable 1. Read-only investigation.*

**Repo state:** HEAD `7615ded` (NREM exclusion fix) on baseline `1a8e035`. Last night's experimental work was reverted.

**Method:** Each component is classified as Working / Working-degraded / Broken / In-flight / Experimental / Stopped / Deprecated. Each entry also records the last-touched date from `git log -1`, the component's dependencies and dependents, and a behavior-vs-intent assessment comparing observed code against `aaronai-architecture.md` and `aaronai-architecture-reframe-2026-05-01.md`.

**A note on terminology.** "Behavior matches intent" is read against two intent surfaces: (1) the architecture doc as written, which still frames graphiti as the target memory layer, and (2) the reframe doc, which supersedes parts of the architecture doc and which the bespoke decision now extends. Where the two diverge, the reframe is treated as canonical for purposes of this inventory; divergences from the architecture doc alone are flagged separately.

---

## Findings summary

This inventory's most useful work is identifying mechanisms that run silently, without errors, while doing something the architecture didn't ask for. The 2026-05-02 NREM exclusion bug had that exact shape: NREM was excluding prior traces, the dreamer logged "completed," files appeared on schedule, and the architecture's stated commitment (NREM is replay-and-consolidation) was being violated invisibly. Track 1's job is to find the rest of those before they accumulate.

### Top-priority NREM-shaped divergences (working, but doing something the architecture didn't request)

These are the items whose linked phase entries are most worth reading. They are ranked by potential impact on Track 1 or on subsequent E6-class work.

1. **`dream.py` cumulative cross-night exclusion (500-cap).** Phase 1, `dream.py`. Early REM and Late REM exclude up to 500 prior sources accumulated across nights.
   On a 1,200-source corpus this hides ~40% of the corpus from those modes once the cap fills, and the set is trimmed to 400 only when it overflows — a churn pattern, not an architectural choice. The architecture and reframe specify session-scoped novelty; cumulative-across-nights exclusion is documented nowhere. Same shape as the NREM bug: a deduplication mechanism the architecture didn't request, running silently, with nobody noticing. **This is the highest-priority finding from the inventory.**
2. **`api.py /api/corpus/retry` reintroduces 50KB truncation.** Phase 1, `api.py`. The F14 fix removed truncation from `watcher.py`, `ingest.py`, and `corpus_integrity.py` on 2026-05-01. The retry endpoint at line 1074 still writes `text[:50000]`. Clicking "Retry" on an ingest-failed file in the SettingsPanel reintroduces exactly the bug F14 fixed. Working without errors; doing the wrong thing.
3. **`aaronai-stage3.service` is `enabled` while `inactive`.** Phase 2. The session brief says Stage 3 is stopped manually. The unit is `enabled`, so on next reboot the worker auto-starts and resumes processing the `stage_3_queue` rows that Stage 2 has been adding. The "stopped" state is paper-thin. `systemctl disable` would harden it; nobody has done that yet.
4. **Stage 2 keeps enqueuing to `stage_3_queue` while Stage 3 is off.** Phase 3. As of inventory time, 6 pending rows sit in `stage_3_queue`, last enqueued 2026-05-02 22:33 UTC. The queue grows until Stage 3 is restarted (and catches up) or enqueuing is stopped at the producer. Nothing is broken, but the system is doing work whose output sits unconsumed.
5. **`embeddings.type` NULL for 71% of rows; `embeddings.created_at` text-typed and NULL for 87%.** Phase 3. The architecture treats these fields as load-bearing for "type-aware retrieval" and "temporal awareness." In production, most chunks lack both. Retrieval still works because nothing routes on either field.
   The doc's commitment and the data shape disagree, invisibly to anyone not querying the schema.
6. **`graphiti_jobs` documented as "empty" but holds 9 rows from the 2026-05-02 experimental run.** Phase 3. The current-state doc explicitly says "exists, empty (or near-empty)." Reality: 6 failed, 3 committed, all from the rolled-back code. The table is inert (no current code reads or writes it), but the rollback narrative is incomplete on this point.
7. **`aaronai-maintenance.service` references ChromaDB.** Phase 2. The unit invokes `chops hnsw rebuild --path ~/aaronai/db --collection aaronai`. ChromaDB was retired 2026-04-26, and `chops` is not in the venv. The `~/aaronai/db/` directory still exists with a ChromaDB SQLite file. The unit is kept from doing damage only because its timer is not enabled. A clean-room reading of `/etc/systemd/system/` would suggest BirdAI is still on ChromaDB.
8. **`aaronai-dreamer.service` hardcodes `--mode nrem`.** Phase 2. Production scheduling fires `dream.py` with no flag (default = full pipeline). The systemd entry point is the historical "manual NREM" wrapper. A future maintainer running `systemctl start aaronai-dreamer.service` from the shell will expect "the dreamer" and get only NREM.
9. **`dream_mode` setting in api.py defaults is silently ignored by the scheduler.** Phase 4. The setting lives in `DEFAULT_SETTINGS`, can be merged into `settings.json`, and is used by `update_settings` to decide whether to reschedule, but `run_dream_job` never reads it. It is a configurable scheduling parameter with no effect.
10. **Watcher-restart cron line uses a sudo command not covered by the sudoers entries the session brief documents.** Phase 5. The 2026-05-01 sudoers fix listed `restart ollama` and `restart aaronai-graphiti.service`. The watcher-restart cron line uses `sudo systemctl restart aaronai-watcher`. Either there's an additional sudoers entry the brief doesn't mention, or this watchdog has been failing silently on every run.
    Worth checking `/var/log/aaronai/watcher-cron.log` (out of scope for this read-only inventory).
11. **`prompt_hash()` in `dream.py` hashes function `__doc__` strings, but none of the synth functions have docstrings.** Phase 1, `dream.py` notes (folded into the "F8" reference). The hash is identical across all dreams (always the MD5 of `""`). This is the architecture-doc tech-debt item F8 ("`prompt_hash` broken") confirmed in code: the manifest field meant to "catch undeclared drift" carries a constant value. Same shape as NREM: a mechanism that is present and running but defeats its architecture-stated purpose.
12. **Two parallel scheduling stacks.** Phase 5. APScheduler in `api.py` and three dormant `aaronai-*.timer` files. The dormant timers aren't firing, so there is no actual harm, but their presence makes "what triggers the dream" harder to answer than it should be.

### Cross-cutting findings (not necessarily NREM-shaped)

- **The `scripts/` directory mixes 11 production files with 32 experimental scripts and ~20 `.bak` files.** From a directory listing it is hard to tell at a glance what is live. Track 1 cleanup candidate: move experimental files to `experiments/` (which already exists with a few) or `deprecated/`, and delete the `.bak*` files (git history is the durable record). This is mostly cosmetic but makes future inventories easier.
- **Two implementations of Stage 1 (F11) confirmed.** `watcher.py:ingest_file` and `ingest.py:ingest_file` (plus `corpus_integrity.py:extract_text_for_retry` and the api.py retry path) each reimplement all or part of the extract-chunk-embed-write path. The architecture doc records this as known tech debt; the inventory verifies that all four call sites still drift.
- **The bespoke decision dissolves several components without removing them.** `consolidator_v0_1.py`, `tier1_migration.py`, `graphiti_service.py`, `stage3_worker.py`, both Stage 3 unused-column sets in `stage_3_queue`, the `graphiti_jobs` table, and the experiment scripts.
  None is actively harmful in its current state, but collectively they make the bespoke direction harder to read out of the codebase. Track 1 stripping is the right venue for these.
- **Memory-and-state fan-out.** The system has at least 7 distinct files outside the database that hold state: `dreamer_state.json`, `watcher_state.json`, `watcher_status.json`, `watcher_heartbeat`, `corpus_integrity_report.json`, `tier1_migration_state.json`, `settings.json`, plus two SQLite DBs (`conversations.db`, `sessions.db`) and a markdown file (`memory.md`). The bespoke design will likely consolidate these.

### What looks fine

The watcher (`watcher.py` + `aaronai-watcher.service`) is a clean Stage 1 that matches the architecture doc and the parity principle exactly. The capture endpoint works as documented. The `ingest_failures` table reflects exactly the 129 unreadable files the architecture doc cites. The frontend route surface is minimal and entirely backed. The 2026-05-01 worker patches (saga-size limit, wedge detection, sudoers, no `WatchdogSec` without `sd_notify`) are visible and correct in code. The NREM exclusion fix is in place, and the manual run on 2026-05-02 21:34 UTC produced a real dream.

### Where I am uncertain

- I did not read the watcher-cron.log, sudo configuration, or systemd journal directly. The "sudo for `aaronai-watcher` restart" question (Phase 5 / divergence #10) is based on the session brief's stated sudoers contents only.
- I did not exhaustively read each of the 32 experimental scripts. I read enough of each (header docstring) to classify; deep behavioral inspection of these is unnecessary for Track 1 but means I cannot rule out additional NREM-shape divergences inside them.
- I did not deep-read frontend components (`~/aaronai-web/components/`), per Phase 6 scope.
- The session brief says Stage 3 is "stopped manually." I confirmed that `systemctl is-active aaronai-stage3.service` reports `inactive`.
  I did not confirm via `journalctl` when it was stopped, but the inventory doesn't need that, only the current state.

---

## Updates — 2026-05-03 session

*Layered updates from Track 1 improvement work on 2026-05-03. The 2026-05-02 inventory above is preserved as a point-in-time snapshot; corrections and resolutions are recorded here with provenance.*

### Resolved

- **NREM-shape divergence #1 (cumulative cross-night exclusion 500-cap, `dream.py`) — RESOLVED.** Replaced cumulative `retrieved_sources` with session-scoped novelty. Early REM now excludes only NREM high-scorers from the current session; Late REM excludes the current session's NREM ∪ Early REM. The legacy `retrieved_sources` key was cleared from `dreamer_state.json`. Verification: the post-fix dream-manifest source count rose to 24 (vs. 13 and 16 on the two prior comparable runs), consistent with the previously hidden ~40% of the corpus now being reachable by Early/Late REM as the architecture and reframe specify. The 2026-05-02 NREM exclusion fix is preserved.

### Corrections to existing findings

- **`stage2_metadata` location (Phase 1, `stage2_worker.py`):** the metadata column lives on `stage_3_queue.stage2_metadata` (jsonb), **not on `stage_2_queue`**. `stage_2_queue` has only basic queue fields (`id, source, full_text, char_length, timestamps, failure_reason, attempts`). The 2026-05-02 entry implied otherwise. Corrected via direct schema inspection on 2026-05-03.
- **Stage 2 char_length gate (Phase 1, `stage2_worker.py`):** the `char_length < 2000` check at line 139 runs *before* the Mistral call at line 149. For sub-2000-char docs, Mistral is **never invoked**: the worker logs `Processing → Skipping Stage 3 → completed_at = NOW()` with no Mistral pass between them. The earlier framing, "documents under 2000 chars skip Stage 3," was correct as written, but the implied architecture commitment that "Stage 2 produces orientation metadata for everything" is not what the code does.
  339 of 1,041 completed Stage 2 docs (33%) have **no frame data extracted at all**, not "frame data extracted then discarded."

### New findings from 2026-05-03 frame analysis (Improvement #3)

- **`ingest_conversations.py` bypasses Stage 2 entirely.** 198 distinct conversation sources (`Claude:`, `ChatGPT:`, `Aaron AI:`, plus `type='aaronai_conversation'`) write directly to pgvector `embeddings` and never enter `stage_2_queue`. Conversations have **zero frame coverage by design**, not by accident. Combined with the 339-doc char-gate exclusion and 12 Stage 2 failures, **only 56% of the embeddings corpus has any frame data**. Same NREM shape: a routing decision the architecture didn't explicitly request, silently contradicting the architecture's "Stage 2 produces orientation for everything" commitment.
- **Voice notes (14) and dream outputs (39) are systematically excluded from the frame system.** The 339-doc sub-2000-char gap contains all 14 voice notes and all 39 dreamer-output files (NREM, Early REM, Late REM, synthesis markdown). Voice is one of Aaron's primary capture channels, and dream outputs are the dreamer's own reflection. Both are invisible to the frame system that orients downstream extraction, which means the dreamer cannot frame-condition on its own output. Same NREM shape as the others.
- **File-type × frame stratification signal exists and is currently unused** (cross-link to the Phase 3 `embeddings.type` finding). The 2026-05-03 frame analysis (`docs/stage2-frame-analysis-2026-05-03.md` §5) shows that within frame-extracted docs, "Programming" pivots to pptx (n=15), "Application" pivots to pdf (n=13), and Education spreads across pdf+docx: file type adds discriminating signal to frame routing. Currently `embeddings.type` is NULL for 71% of rows; backfilling it (Improvement #2, not yet applied) would make this stratification queryable at retrieval time instead of having to be reverse-engineered from filenames.
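
The difference between "no frame data extracted" and "frame data extracted then discarded" comes down to where the length gate sits relative to the Mistral call. The sketch below contrasts the observed ordering with one that always extracts frames and gates only the Stage 3 enqueue. It is hypothetical: `process_gate_first`, `process_extract_first`, the `fake_mistral` stub, and the returned dict shape are illustrative names, and `len(text)` stands in for the queue's `char_length` column. Only the 2000-character threshold and the skip-Stage-3 behavior come from the findings above.

```python
from typing import Callable

# The 2000-char threshold comes from the stage2_worker.py finding.
# Function names and the return shape are hypothetical, for illustration only.
STAGE3_MIN_CHARS = 2000
FrameExtractor = Callable[[str], dict]


def process_gate_first(text: str, extract_frames: FrameExtractor) -> dict:
    """Observed order: the length gate runs before the Mistral call."""
    if len(text) < STAGE3_MIN_CHARS:
        # Short docs are marked complete with no frame extraction at all.
        return {"frames": None, "enqueue_stage3": False}
    return {"frames": extract_frames(text), "enqueue_stage3": True}


def process_extract_first(text: str, extract_frames: FrameExtractor) -> dict:
    """Alternative order: always extract frames; gate only the Stage 3 enqueue."""
    frames = extract_frames(text)
    return {"frames": frames, "enqueue_stage3": len(text) >= STAGE3_MIN_CHARS}


if __name__ == "__main__":
    calls = []

    def fake_mistral(text: str) -> dict:
        # Stand-in for the Mistral call; records how often it runs.
        calls.append(len(text))
        return {"active_frames": ["stub"]}

    short_doc = "word " * 100  # 500 chars, under the gate
    print(process_gate_first(short_doc, fake_mistral))     # {'frames': None, 'enqueue_stage3': False}
    print(process_extract_first(short_doc, fake_mistral))  # frames present, still no Stage 3
    print(len(calls))                                       # 1
```

The second ordering would give short documents such as voice notes and dream outputs frame data without sending them to Stage 3, at the cost of one Mistral call per short document.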
### Artifacts produced 2026-05-03

- **Code change:** `scripts/dream.py` (Improvement #1).
- **New SQL view:** `stage2_frames_v` (over `stage_3_queue.stage2_metadata`; `CREATE OR REPLACE`, idempotent, drop with `DROP VIEW stage2_frames_v;`).
- **New analysis script:** `scripts/experiments/frame_distribution_report.py` (read-only).
- **JSON sidecar:** `experiments/frame_distribution_2026-05-03.json`.
- **Report:** `docs/stage2-frame-analysis-2026-05-03.md`.

---

## Phase 1 — Scripts

Inventory of every file under `~/aaronai/scripts/` (and `~/aaronai/scripts/experiments/`). `.bak*` files are listed at the bottom of the section but not individually documented; they are point-in-time snapshots from the rollback work and are not part of any active code path.

### `api.py`

- **Path:** `scripts/api.py`
- **Status:** Working
- **Last-touched:** 2026-05-01
- **What it does:** FastAPI backend on port 8000. Hosts the chat endpoint (`/api/chat`), session-based auth (`/auth/login`, `/auth/logout`, `/auth/check`), conversation CRUD, settings panel API, memory editor, status endpoint, audio transcription via faster-whisper `large-v3`, capture endpoint (voice and image+voice), dreamer-status and dreamer-run, corpus-integrity status / retry / reconcile, and SSE streams for both authenticated dreamer notifications and the public capture page. Embeds an APScheduler `BackgroundScheduler` that drives the nightly dream cycle and conversation ingest. Loads SentenceTransformers `all-MiniLM-L6-v2` and the Anthropic SDK at startup. Auth is a session token in a 30-day cookie backed by `sessions.db` (sqlite). Conversations and messages are in `conversations.db` (sqlite). Document retrieval is pure cosine similarity over pgvector (top-8, threshold 0.3) — the CV-pinning workaround was stripped 2026-04-30.
- **Dependencies:** `.env` (`PG_DSN`, `ANTHROPIC_API_KEY`, `AARON_AI_PASSWORD`, `NEXTCLOUD_*`); `~/aaronai/conversations.db`, `~/aaronai/sessions.db`, `~/aaronai/memory.md`, `~/aaronai/settings.json`, `~/aaronai/watcher_status.json`, `~/aaronai/watcher_state.json`, `~/aaronai/dreamer_state.json`, `~/aaronai/corpus_integrity_report.json`; PostgreSQL (`embeddings`, `stage_2_queue`, `ingest_failures`); SentenceTransformer model files; faster-whisper model files; the `dream.py`, `ingest.py`, and `corpus_integrity.py` scripts, which it shells out to; Nextcloud WebDAV. Runs as `aaronai.service`.
- **What depends on it:** The frontend (`aaronai-web` Next.js) consumes every `/api/*` endpoint; the mobile capture layer consumes `/api/capture` and `/api/captures/events`; `dream.py` POSTs to `/api/events/notify` to push SSE to the frontend; the APScheduler embedded in this process is the only thing that triggers the nightly dream cycle and the nightly conversation ingest in production.
- **Behavior matches intent?** Partial. Pure-similarity retrieval matches the post-2026-04-30 architecture statement. The `chat` function ignores `client_time` for memory retrieval purposes (it only inserts it into the prompt), which is consistent with the doc. Two divergences worth flagging:
  1. `/auth/check` references `SESSIONS` (line 385), which is undefined: no `SESSIONS` set or dict exists anywhere in the file, so any request to this endpoint raises a `NameError` and returns a 500. Frontend auth checking evidently relies on the cookie being present rather than on this endpoint working. This is likely a leftover from an earlier in-memory session implementation that was migrated to SQLite without removing the check.
  2. `transcribe_and_save()` (the background voice capture path, line 670) does NOT save the raw audio file to `Journal/Media/`, only the transcript markdown to `Journal/Captures/`. The architecture doc's "Multimedia Ingest Pipeline" describes `Journal/Media/YYYY-MM/` as the raw-ground-truth location for all captured media.
     The image+voice path does write image bytes to Media, but the voice-only path does not. A future Late REM "raw images during synthesis" feature (listed as "not yet built" in the architecture doc) relies on Media as raw ground truth; for voice-only captures that ground truth does not exist, because the audio is gone after transcription. Flagged.
- **Notes:** APScheduler is created at module import (`scheduler = BackgroundScheduler()` at line 1105) and started in the lifespan. Stage 3 worker code is not invoked from here. The `/api/reindex` endpoint shells out to `ingest.py`, which still writes to pgvector and (since `SKIP_STAGE2_ENQUEUE` is unset by default) re-enqueues to `stage_2_queue`. A reindex can therefore put files back through Stage 2 and Stage 3, which under the bespoke decision is no longer the desired path. The retry endpoint at `/api/corpus/retry` writes `text[:50000]` to `stage_2_queue` (line 1074), reintroducing the 50KB truncation pattern that F14 fixed elsewhere. **NREM-shape divergence: the truncation cap was removed from `watcher.py`, `ingest.py`, and `corpus_integrity.py` per the F14 fix on 2026-05-01, but the `api.py` retry path was not patched.**

### `dream.py`

- **Path:** `scripts/dream.py`
- **Status:** Working (post NREM-fix)
- **Last-touched:** 2026-05-02
- **What it does:** The Active Inference engine. Provides the nightly pipeline (NREM → Early REM → Late REM → Synthesis) and a single-mode CLI entry point. Each stage retrieves chunks from pgvector (or Graphiti when `DREAMER_SUBSTRATE=graphiti`), prompts Claude Sonnet, writes a markdown file to Nextcloud `Journal/Dreams/` via WebDAV, and feeds its output as context into the next stage. The pipeline writes a per-night manifest JSON. Lucid mode is the on-demand path used by Settings → Dream Now. State is persisted in `~/aaronai/dreamer_state.json`; cumulative `retrieved_sources` is capped at 500 and trimmed to 400 on overflow. Score-band Early-REM exclusion (v1.1) is preserved.
  The 2026-05-02 NREM exclusion fix is at line 478: `nrem_chunks = retrieve("nrem", excluded_sources=None)`.
- **Dependencies:** `.env` (`PG_DSN`, `ANTHROPIC_API_KEY`, `NEXTCLOUD_*`); `pgvector` `embeddings` table (or graphiti sidecar `/search`); SentenceTransformer `all-MiniLM-L6-v2` (re-loaded inside `retrieve()`); `~/aaronai/dreamer_state.json`, `~/aaronai/watcher_state.json`, `~/aaronai/conversations.db`; Anthropic API; Nextcloud WebDAV; for SSE notify, the running `api.py` on `localhost:8000`.
- **What depends on it:** APScheduler in `api.py` shells out to it nightly; `/api/dreamer/run` shells out for on-demand runs; `aaronai-dreamer.service` (Type=oneshot) wraps it for manual invocation; `e3_dreamer_substrate.py` invokes it under `DREAMER_SUBSTRATE=graphiti`.
- **Behavior matches intent?** Yes for NREM (post-fix, it matches the reframe's replay-and-consolidation framing). Yes for Early REM and Late REM in that they still consult `previously_retrieved`, which the reframe permits as novelty bias (the cross-night scope of that state is flagged under Notes). Partial for Synthesis (no substrate mutation, which is fine under the architecture doc but is exactly what the reframe says is missing for E6 to work). "Lucid" is implemented even though the architecture doc lists Lucid mode as "not yet built" (the function exists and is reachable from the CLI/API).
- **Notes:** `retrieve_graphiti()` accepts and applies `excluded_sources` (the F1 fix), but the over-fetch is `n_results * 3` and the post-filter runs in-process, so if more than two-thirds of the over-fetched results are excluded, fewer than `n_results` chunks come back. The dreamer falls back gracefully to an empty result when the sidecar fails. **NREM-shape divergence candidate: the dreamer's exclusion-set state is *cumulative across all nights*, capped at 500, so every Early REM and Late REM excludes up to 500 prior sources. On a corpus of 1,200 sources, that leaves ~40% of the corpus invisible to Early/Late REM at any given time once the cap fills. The architecture doc and reframe don't specify cumulative-across-nights exclusion; they specify session-scoped novelty.
  The bug shape is the same as the NREM exclusion bug: a deduplication mechanism functioning silently in a way the architecture didn't request.** Flagged.

### `watcher.py`

- **Path:** `scripts/watcher.py`
- **Status:** Working
- **Last-touched:** 2026-05-01
- **What it does:** Stage 1 of the encoding pipeline. Watches `/home/aaron/nextcloud/data/data/aaron/files` recursively via watchdog. Loads SentenceTransformer `all-MiniLM-L6-v2` once at startup. On modify/create/move/close events, debounces 120s, then chunks (500-word chunks with 50-word overlap), embeds, and writes to pgvector `embeddings`. Enqueues full text to `stage_2_queue` unless `SKIP_STAGE2_ENQUEUE` is set. Records extraction or pgvector failures to `ingest_failures` and resolves them on success. Writes a heartbeat every loop tick to `~/aaronai/watcher_heartbeat` and status JSON to `~/aaronai/watcher_status.json`. Startup recovery scans for files whose mtimes changed since the last run. `on_moved` checks `dest_path` (Nextcloud writes a `.part` file, then renames it); `on_closed` is a belt-and-suspenders backup.
- **Dependencies:** `.env` (`PG_DSN`); pgvector; SentenceTransformer; `pypdf`, `python-docx`, `python-pptx`; watchdog; `~/aaronai/watcher_state.json`. Runs as `aaronai-watcher.service`.
- **What depends on it:** Anything that reads from pgvector `embeddings` (api.py chat, dream.py retrieval, tier1_migration.py); anything that polls `stage_2_queue` (stage2_worker); `corpus_integrity.py`; the watcher heartbeat is consumed by an external cron monitor mentioned in the tech-debt list.
- **Behavior matches intent?** Yes against the architecture's Stage 1 description and the parity principle (no filtering, no decisions). The full-text path no longer truncates to 50KB. Under the bespoke decision the Stage 2 enqueue path is on the chopping block; it is currently still active and runs by default.
- **Notes:** No truncation in `enqueue_stage2()`.
  `Admin/Backups` and `Journal/Media/` are excluded from indexing per the architecture's File Management Policy. The `SKIP_STAGE2_ENQUEUE` env var is the documented kill-switch for migration runs.

### `ingest.py`

- **Path:** `scripts/ingest.py`
- **Status:** Working-degraded (functional but architecturally redundant)
- **Last-touched:** 2026-05-01
- **What it does:** Bulk folder ingester. Loads SentenceTransformer at module import, walks a folder, extracts text, chunks, embeds, writes to `embeddings`, and (unless `SKIP_STAGE2_ENQUEUE` is set) enqueues to `stage_2_queue`. Invoked by `api.py`'s `/api/reindex` endpoint with `NEXTCLOUD_PATH` as argument. CLI default target is `~/aaronai/docs`.
- **Dependencies:** Same as `watcher.py` minus watchdog: `.env`, pgvector, SentenceTransformer. No service unit; invoked on demand only.
- **What depends on it:** The `api.py` `/api/reindex` button. The architecture's tech-debt entry mentions `ingest_chatgpt.py` and `ingest_claude.py` (manual one-shot scripts), but neither file is present in `scripts/`, so the only live caller is `/api/reindex`.
- **Behavior matches intent?** Partial. The architecture doc lists it as one of four ingest scripts in the Layer 1 table; only this file and `ingest_conversations.py` exist. The chunk-embed-store flow still matches Stage 1 intent. The Stage 2 enqueue side effect, which runs on every reindex, has a wide blast radius: clicking "Re-index" puts every changed file back through the cascade, which under the bespoke decision is increasingly unwanted work.
- **Notes:** Almost the entire chunk/embed/extract code path is duplicated verbatim from `watcher.py`. The architecture's tech-debt entry F11 (two implementations of the encoding pipeline) is real and visible side by side. Both scripts call their own inline `enqueue_stage2()`; both load SentenceTransformer at import (the model is loaded twice if both are imported in the same process, which only happens under unusual import patterns).
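
F11 shows up here as the same chunking logic living in both `watcher.py` and `ingest.py`. One low-risk consolidation step would be a single shared chunker that both scripts import. The sketch assumes word-based windows of 500 words with a 50-word overlap, as the `watcher.py` entry describes; the name `chunk_words`, whitespace tokenization, and the rule for ending the final window are assumptions, not a transcription of either script.

```python
def chunk_words(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words.

    Hypothetical shared helper; tokenization is a plain whitespace split.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window reaches the end, so the tail is not re-emitted
        # as a chunk made only of overlap words.
        if start + size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(1000))
    chunks = chunk_words(doc)
    print(len(chunks))             # 3
    print(chunks[1].split()[0])    # w450
    print(len(chunks[2].split()))  # 100
```

Keeping the window arithmetic in one function would make the kind of drift seen with F14 (one call site patched, another not) less likely to recur for chunking.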
### `stage2_worker.py`

- **Path:** `scripts/stage2_worker.py`
- **Status:** Working
- **Last-touched:** 2026-05-01
- **What it does:** Polls `stage_2_queue` for rows with no `completed_at`/`failed_at` and `attempts < 3`. Sends the document to local Mistral (`mistral:latest` via Ollama on port 11434) with a taxonomy-free prompt that returns four fields: `active_frames`, `frame_relationships`, `extraction_orientation`, `one_sentence_summary`. Documents under 2000 chars skip Stage 3 and are marked complete. Otherwise it builds an orientation string and enqueues to `stage_3_queue` with `(source, full_text, orientation, stage2_metadata)`. Wedge recovery: 2+ consecutive failures trigger `sudo systemctl restart ollama`. Logs to `/var/log/aaronai/stage2.log`. Heartbeat at `/var/log/aaronai/stage2-heartbeat`. Worker version 2.1.
- **Dependencies:** `.env` (`PG_DSN`); Ollama on `localhost:11434`; `mistral:latest` model loaded; passwordless sudo for `/bin/systemctl restart ollama` (per `/etc/sudoers.d/aaron-aaronai`); PostgreSQL `stage_2_queue` and `stage_3_queue` tables. Runs as `aaronai-stage2.service`.
- **What depends on it:** Anything that reads `stage_3_queue.completed_at` (corpus_integrity, api.py corpus status); the Stage 3 worker as the queue consumer.
- **Behavior matches intent?** Partial under the reframe. The taxonomy-free prompt matches the Stage 3.1 research direction the architecture doc described. Under the bespoke decision the entire Stage 2/3 pipeline is being re-evaluated; the worker itself is doing what it was redesigned to do.
- **Notes:** `recover_wedge()` calls absolute `/usr/bin/sudo` and `/bin/systemctl` paths (per the v2.1 patch). No `WatchdogSec`-driven SIGKILL pattern (commented out in the systemd unit per the 2026-05-01 fix). Mistral parse failure is detected and surfaces as `failure_reason='mistral_parse_failure'`.
  `RETRY_ATTEMPTS = 2` plus the original attempt gives a maximum of 3 attempts before the row is dead; this matches the worker's SQL `attempts < %s` bound to `RETRY_ATTEMPTS + 1`.

### `stage3_worker.py`

- **Path:** `scripts/stage3_worker.py`
- **Status:** Stopped (per session brief: service stopped manually 2026-05-02; code is unchanged)
- **Last-touched:** 2026-05-01
- **What it does:** Polls `stage_3_queue` for rows ready to process. For each row, it chunks the document at 500-word boundaries (matching Stage 1) and POSTs to the graphiti sidecar `/episodes/bulk`. Three paths by document size: (a) <1500 chars → single episode, no saga; (b) ≤10 chunks → single bulk commit with a saga tag; (c) >10 chunks → split into batches of 10, all tagged with the same saga so graphiti links them as one document unit. Wedge recovery: 2+ consecutive failures trigger `sudo systemctl restart aaronai-graphiti.service`, after which the worker waits 45s for sentence-transformers + BGE reranker + graphiti to re-init. Worker version 2.2.
- **Dependencies:** `.env` (`PG_DSN`); graphiti sidecar on `localhost:8001`; passwordless sudo for `/bin/systemctl restart aaronai-graphiti.service`; PostgreSQL `stage_3_queue`. Runs as `aaronai-stage3.service`.
- **What depends on it:** `corpus_integrity.py` reads `stage_3_queue.completed_at` to compute "Graphiti-side" coverage; `api.py`'s `/api/corpus/status` does the same.
- **Behavior matches intent?** No, against the bespoke decision. The architecture doc describes Stage 3 as the cascade ingest path into graphiti; the bespoke decision dissolves that path. The code itself does what it was patched to do (saga splitting, wedge detection, sudoers). What it represents, feeding documents into a graphiti substrate, is no longer the architectural target.
- **Notes:** The service is stopped per the session brief, but `stage2_worker.py` continues to create `stage_3_queue` rows, so the queue grows monotonically while the consumer is off.
  This is fine for the rollback baseline (no new rows of consequence with cascade prompts in the rolled-back form), but it is worth flagging in case the watcher picks up new files. The worker uses the absolute `/usr/bin/sudo` and `/bin/systemctl` paths (v2.2 patch). `start` and `end` chunk indices are 1-based in the saga-batch logging; this is cosmetic only.

### `graphiti_service.py`

- **Path:** `scripts/graphiti_service.py`
- **Status:** Working (per the session brief; will be deprecated when the bespoke substrate replaces graphiti)
- **Last-touched:** 2026-04-30 (commit), 2026-05-02 (working-copy mtime; same content, as the file was rewritten then reset during rollback)
- **What it does:** FastAPI sidecar on port 8001. Wraps `graphiti-core` to avoid asyncio event loop conflicts in the main FastAPI process. A single graphiti instance is built in the lifespan and closed on shutdown. Endpoints: `/health`, `POST /episodes` (single), `POST /episodes/bulk` (with optional `saga` link), `GET /search`. Uses `SentenceTransformerEmbedder` from `st_embedder.py` and `BGERerankerClient` from graphiti-core. `FalkorDriver` connects to FalkorDB at `localhost:6379`, database `aaron`. LLM provider is switchable via env (`anthropic` default → `claude-sonnet-4-6`). `max_coroutines=2`, `EMBEDDING_DIM=384`. Hard-coded group default `aaron`.
- **Dependencies:** `.env` (`ANTHROPIC_API_KEY` or `LLM_API_KEY`, `LLM_PROVIDER`, `LLM_MODEL`, `FALKORDB_HOST`, `FALKORDB_PORT`, `GRAPHITI_GROUP_ID`); FalkorDB Docker container on `127.0.0.1:6379`; graphiti-core 0.29.0 in venv; sentence-transformers, BGE reranker. Runs as `aaronai-graphiti.service`.
- **What depends on it:** `dream.py` `retrieve_graphiti()` (only when `DREAMER_SUBSTRATE=graphiti`); `stage3_worker.py` posts to it; `tier1_migration.py` posts to it; the bulk cost-test scripts post to it; `e3_dreamer_substrate.py` queries it; `e1_8_taxfree_cascade.py` and `e1_9_retroactive.py` post or query.
- **Behavior matches intent?** Yes against the architecture doc.
  Under the bespoke decision this whole sidecar is the layer being replaced; the doc still says it's the target memory layer.
- **Notes:** `add_episode_bulk()` is called with `saga=req.saga or None`; the saga param is what stage3_worker uses to link split-batch chunks. The result body returns `{"ok": true, "count": N}` rather than the underlying graphiti return value. Logs full tracebacks to `/var/log/aaronai/graphiti-sidecar.log` (the 2026-04-30 fix).

### `corpus_integrity.py`

- **Path:** `scripts/corpus_integrity.py`
- **Status:** Working
- **Last-touched:** 2026-05-01
- **What it does:** Three-way reconciliation. Compares filesystem (Nextcloud), pgvector (`embeddings.source`), and graphiti (`tier1_migration_state.json` ingested list ∪ `stage_3_queue.completed_at IS NOT NULL` source list). Reports the count in each set and the gaps (files in the filesystem but in neither pgvector nor graphiti). With `--fix`, attempts text extraction on each gap file and either enqueues it to `stage_2_queue` (full text, no truncation) or writes to `ingest_failures` if extraction returns empty. Writes `~/aaronai/corpus_integrity_report.json`.
- **Dependencies:** `.env`; pgvector `embeddings`, `stage_3_queue`, `ingest_failures`, `stage_2_queue`; `~/aaronai/experiments/tier1_migration_state.json`; pypdf, python-docx, python-pptx. No service unit; invoked by the `api.py /api/corpus/reconcile` background task and by the user manually.
- **What depends on it:** `api.py /api/corpus/status` reads the report it writes; the SettingsPanel UI's "Ingest Health" section consumes that.
- **Behavior matches intent?** Partial. Implements the architecture's "ingest_failures + reconciliation" tech-debt-resolved item correctly. Under the bespoke decision, the graphiti side of the reconciliation is meaningless once Stage 3 is shut off: the script will keep reporting "this many sources are in graphiti," but those numbers won't move and won't represent useful state.
Not broken, but the report's "graphiti only" / "Both" lines become semantically empty.
- **Notes:** Re-implements `extract_text` inline for the retry path rather than reusing the watcher's; another instance of F11.

### `ingest_conversations.py`

- **Path:** `scripts/ingest_conversations.py`
- **Status:** Working
- **Last-touched:** 2026-04-27
- **What it does:** Nightly job. Reads `conversations.db`, finds conversations with ≥3 user-assistant exchanges, slides a 2-exchange window over each, formats `[Aaron AI conversation: title]` chunks, embeds them with SentenceTransformer, and writes them to pgvector `embeddings` with `id = aaronai_conv_{conv_id}_{idx}` and `type='aaronai_conversation'`. Idempotent via `ON CONFLICT DO UPDATE`.
- **Dependencies:** `.env`; pgvector; `conversations.db`. Triggered by APScheduler in `api.py` at 02:30 UTC.
- **What depends on it:** Anything reading from pgvector. Indirectly, `dream.py` and chat retrieval pull these chunks.
- **Behavior matches intent?** Yes. Matches the architecture's Layer 1 ingest table.
- **Notes:** No watchdog or state tracking; re-runs each night and skips already-embedded ids. `cur.close()` is missing on the read connection at line 39 (the connection itself is closed, so this is harmless).

### `st_embedder.py`

- **Path:** `scripts/st_embedder.py`
- **Status:** Working
- **Last-touched:** 2026-04-27
- **What it does:** `EmbedderClient` adapter for graphiti-core. Wraps SentenceTransformer `all-MiniLM-L6-v2` (384-dim) so graphiti uses the same embedding model as Stage 1. No API cost for graphiti embeddings.
- **Dependencies:** `graphiti_core.embedder.client`, sentence-transformers.
- **What depends on it:** `graphiti_service.py` imports it at sidecar startup.
- **Behavior matches intent?** Yes. Implements the architectural commitment that the embedding layer stays on Sentence Transformers regardless of LLM.
- **Notes:** Will be obsolete when graphiti is replaced under the bespoke decision (the embedder pattern carries over, but this specific adapter does not).
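The `ingest_conversations.py` entry above says the job is idempotent via `ON CONFLICT DO UPDATE`. That only holds because chunk ids are derived from the conversation id and the window position. The following minimal sketch shows the windowing. It assumes a stride of one exchange and a `User:`/`Assistant:` line format, and the `Exchange` type is hypothetical. Only the ≥3-exchange threshold, the 2-exchange window, the title prefix, and the id pattern come from the entry.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One user turn plus the assistant reply (hypothetical shape)."""
    user: str
    assistant: str


MIN_EXCHANGES = 3  # conversations with fewer exchanges are skipped
WINDOW = 2         # exchanges per chunk


def conversation_chunks(conv_id, title, exchanges, stride=1):
    """Return (chunk_id, text) pairs for one conversation.

    stride=1 is an assumption: the inventory only says the window "slides".
    """
    if len(exchanges) < MIN_EXCHANGES:
        return []
    chunks = []
    starts = range(0, len(exchanges) - WINDOW + 1, stride)
    for idx, start in enumerate(starts):
        window = exchanges[start:start + WINDOW]
        body = "\n".join(f"User: {ex.user}\nAssistant: {ex.assistant}" for ex in window)
        # The id depends only on conv_id and window position, so a re-run
        # yields the same ids and an upsert overwrites rows in place.
        chunk_id = f"aaronai_conv_{conv_id}_{idx}"
        chunks.append((chunk_id, f"[Aaron AI conversation: {title}]\n{body}"))
    return chunks


if __name__ == "__main__":
    convo = [Exchange("q1", "a1"), Exchange("q2", "a2"), Exchange("q3", "a3")]
    for chunk_id, _text in conversation_chunks(42, "Demo", convo):
        print(chunk_id)
```

With stride 1, a conversation that gains an exchange keeps its existing window ids and gets one new id appended. A nightly re-run therefore rewrites old chunks in place instead of duplicating them.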
### `tier1_migration.py`

- **Path:** `scripts/tier1_migration.py`
- **Status:** Stable but unused (already-run one-shot)
- **Last-touched:** 2026-04-30
- **What it does:** Migrates the ~300 most-recent pgvector sources to graphiti via the sidecar's `/episodes/bulk` endpoint. Resumable via `~/aaronai/experiments/tier1_migration_state.json`. Adapts batch size to document length (`BATCH_SIZE=4`; `LONG_DOC_BATCH_SIZE=2` for docs ≥5,000 chars). Backs off on max-pending-queries errors, timeouts, and rate limits. Writes per-batch results to `tier1_migration_results.json`.
- **Dependencies:** `.env` (`PG_DSN`); graphiti sidecar; `~/aaronai/experiments/`. No service unit.
- **What depends on it:** `corpus_integrity.py` reads the state file, and so does `api.py`'s corpus status. Both treat the ingested list as part of the "graphiti coverage" answer.
- **Behavior matches intent?** Yes against the architecture's Tier 1 migration plan (already complete per the doc: 1,205 sources, 4,990 nodes, 22,289 edges). Obsolete under the bespoke decision but harmless if not run again.
- **Notes:** Hard-codes `timestamp: "2026-04-28T00:00:00"` for migration episodes, so every migrated source lands with that bi-temporal `valid_at`. The migration state file lives in `~/aaronai/experiments/` and is referenced by multiple downstream readers; moving or deleting it would break corpus integrity status.

### `consolidator_v0_1.py`

- **Path:** `scripts/consolidator_v0_1.py`
- **Status:** Deprecated (per reframe doc and bespoke decision)
- **Last-touched:** 2026-04-29 (commit), 2026-04-30 (working tree)
- **What it does:** Calibration-phase alias resolution. Pulls all `:Entity` nodes from the FalkorDB `aaron` graph, computes summary embeddings via Ollama `nomic-embed-text`, infers light type labels heuristically, computes pairwise (name, ego, neighbor) similarity within type blocks, and writes a markdown proposals log to `Nextcloud/Journal/Consolidation/proposals-{ts}.md` plus a JSON sibling.
**Does not execute merges.** The 0.1.5 in-place patch (containment metric replacing Jaccard, summary embeddings) is reflected in this file; the `.bak` is the pre-patch version.
- **Dependencies:** FalkorDB on port 6379 (direct, not via the sidecar); Ollama for embeddings; `Nextcloud/Journal/Consolidation/`.
- **What depends on it:** Nothing in production. Designed for human review of proposals.
- **Behavior matches intent?** No, under the reframe and bespoke decision. The reframe doc explicitly identifies "consolidator-as-separate-system" as the architectural mistake; its function moves into the dream phase. Track 1 should treat this as a removal candidate.
- **Notes:** No service unit, no scheduler entry; executed manually only. Calibration findings (2026-04-29) showed that alias resolution from graph features alone has structural problems on this corpus.

### `backup.sh`

- **Path:** `scripts/backup.sh`
- **Status:** Working
- **Last-touched:** 2026-04-26
- **What it does:** Daily-snapshot bash script. Copies `memory.md`, `settings.json`, and `conversations.db` into `~/nextcloud/.../Admin/Backups/` with date-stamped names and deletes anything older than 7 days. Output lands inside Nextcloud's `Admin/Backups/`, which the watcher excludes from indexing, so backups don't pollute the corpus.
- **Dependencies:** Read access to the three files; write access to `Admin/Backups/`.
- **What depends on it:** Nothing programmatic. Operationally, it is the only off-host backup of `memory.md` and `settings.json`.
- **Behavior matches intent?** Yes. Lightweight, no-judgement copy → Nextcloud → Nextcloud Desktop → off-machine.
- **Notes:** Cron-driven (Phase 5 will confirm). Uses `find -mtime +7 -delete`, so naming-format changes wouldn't break retention.

### Experimental scripts (one-shot research artifacts)

All of the following scripts except `e3_dreamer_substrate.py` (still in-flight) are completed experiments. None has a service unit, none is on a schedule, and none is a runtime dependency of any production code path.
They are kept as reproducibility artifacts for the experiments log. **All are candidates for moving out of `scripts/` into `experiments/` or `deprecated/`** — they crowd the production directory, and it is hard to tell at a glance which files are live workers.

| File | Experiment | Status | Notes |
|---|---|---|---|
| `audit_expansion_draw.py` | Type-aware stratified draw for n=20 audit expansion | Experimental | Sample-construction tool for `base_class_audit_rerun.py` |
| `base_class_test.py` | Base-class enrichment n=20 | Experimental | OOP framing experiment, validated 2026-04-28 |
| `base_class_validation.py` | Base-class enrichment n=50 | Experimental | Main validation study |
| `base_class_audit_rerun.py` | Base-class enrichment audit rerun | Experimental | n=8 paired-extraction audit, 0% fabrication |
| `briefing_generator_v2.py` | Experiment 002b (briefing v2) | Experimental | Validated local Mistral structural pattern recognition at 96% |
| `briefing_test.py` | Experiment 002 (briefing v1) | Experimental | Superseded by v2 |
| `cascade_test.py` | Entity-drafter cascade n=20 | Experimental | Falsified 2026-04-28 |
| `cascade_optimization_test.py` | Optimized entity-drafter cascade n=30 | Experimental | Confirmed entity-drafter cascade is dead |
| `consistency_test.py` | Mistral 3-pass consistency n=50 | Experimental | Experiment 001 |
| `consistency_test_v2.py` | Entity-only consistency, fixed sampling | Experimental | Experiment 003 |
| `cost_test_graphiti_bulk.py` | Bulk endpoint cost test | Experimental | Stratified n=50 |
| `cost_test_graphiti_bulk_retry.py` | Retry of failed bulk batches | Experimental | Pre-MAX_QUEUED_QUERIES bump |
| `cost_test_graphiti_bulk_retry2.py` | Second retry attempt | Experimental | Smaller batches, post-bump |
| `cost_test_graphiti_migration.py` | Single-episode migration cost test | Experimental | Stratified n=50 |
| `e1_select_sample.py` | E1 sample selection | Experimental | Cascade re-extraction sample |
| `e1_run_cascade.py` | E1 orchestration | Experimental | Initial cascade run, group `aaron_cascade_test` |
| `e1_run_cascade_corrected.py` | E1 corrected (custom_extraction_instructions path) | Experimental | Re-run with the fixed prompt path |
| `e1_per_source_predicates.py` | E1 per-source predicate count | Experimental | Corrected metric |
| `e1_compare_metrics.py` | E1 A vs B metrics comparison | Experimental | Reads from FalkorDB via redis-cli docker exec |
| `e14_select_sample.py` | E1.4 sample selection (n=30) | Experimental | Stratified, excludes E1's 10 |
| `e14_run_cascade.py` | E1.4 cascade orchestration | Experimental | Group `aaron_cascade_e14` |
| `e14_per_source_predicates.py` | E1.4 per-source predicate diversity | Experimental | Bucket-level analysis |
| `e16_rate_purity.py` | E1.6 domain-purity human rating UI | Experimental | Surfaces taxonomic-mismatch finding |
| `e16_analyze.py` | E1.6 Spearman correlation against E1.4 | Experimental | Pre-registered decision rules |
| `e2_resolution_check.py` | E2 entity resolution diagnostic | Experimental | Six test entities, FalkorDB query |
| `e2_alias_followup.py` | E2 alias follow-up | Experimental | Aaron AI variants etc. |
| `e2_source_diversity.py` | E2 episode count per entity | Experimental | Diagnostic |
| `token_measurement_test.py` | Experiment 005 — token reduction | Experimental | Validates 42.0% modeled estimate |
| `experiments/e1_8_eval.py` | E1.8 eval phase | Experimental | Pulls predicate counts |
| `experiments/e1_8_taxfree_cascade.py` | E1.8 ingest phase | Experimental | Taxonomy-free cascade |
| `experiments/e1_9_retroactive.py` | E1.9 retroactive validation | Experimental | Phase 1 parked 2026-04-30 (graph immature) |
| `experiments/e3_dreamer_substrate.py` | E3 dreamer substrate comparison | In-flight | "Genuinely ready" per the architecture doc post-F1 fix; per the bespoke decision, now confounded and unable to produce a trustworthy answer |

The `e3_dreamer_substrate.py` script is the only one with current relevance: its run was the proximate cause of the bespoke decision (per the decision doc, running E6 on graphiti is "a vibe check" because of issue #1325 and related issues). The code is functional, but under the bespoke decision the experiment it runs cannot produce a trustworthy answer.

### Backup files (`.bak*`)

The following are point-in-time copies left behind by the rollback work. None is on any code path. They are documented as a group rather than individually:

- `api.py.bak.20260501-001427`
- `consolidator_v0_1.py.bak` (pre-0.1.5-patch)
- `corpus_integrity.py.bak.20260501-021703`
- `dream.py.bak`, `dream.py.bak.20260501-002209`
- `graphiti_service.py.bak`, `graphiti_service.py.bak.20260501-185619`, `graphiti_service.py.bak.20260502-022307`
- `ingest.py.bak.20260501-004131`
- `stage2_worker.py.bak.20260501-171928`, `.20260501-172531`, `.20260501-185942`
- `stage3_worker.py.bak.20260501-050354`, `.20260501-050453`, `.20260501-050719`, `.20260501-173233`, `.20260501-190357`
- `watcher.py.bak`, `watcher.py.bak.20260501-004131`

Stage 3 alone has five `.bak` versions, which matches the v2.0 → v2.1 → v2.2 patch history.
Track 1 cleanup candidate: collapse all `.bak*` files into a `deprecated/` directory or remove them (git history is the durable artifact).

### `__pycache__/`

Compiled `.pyc` files for `api`, `corpus_integrity`, `dream`, `ingest`, `stage3_worker`, `st_embedder`, `watcher`. Notably, there is *no* `.pyc` for `stage2_worker.py`. The likeliest explanation is that its systemd unit only ever runs it as a script (`python3 stage2_worker.py`), and Python writes bytecode caches for imported modules, not for the `__main__` script. The `.pyc` files that do exist imply those modules were imported by something at least once. This is inferred from absence and has not been verified. Not a code path. Remove on next clean build if desired.

---

### Phase 1 summary

**Working and matching intent:**

- `watcher.py` (Stage 1)
- `ingest_conversations.py` (nightly conversation indexer)
- `st_embedder.py`
- `backup.sh`

**Working with behavior-vs-intent divergences:**

- `api.py` — dead `/auth/check` reference; voice capture doesn't archive raw audio to `Journal/Media/`; `/api/corpus/retry` reintroduces 50KB truncation.
- `dream.py` — cumulative 500-source exclusion across nights is an NREM-shaped divergence: it silently shrinks Early/Late REM's reachable corpus over time without architectural mandate. The NREM exclusion fix is in place, but the pattern that caused that bug exists at a different layer.
- `ingest.py` — duplicates Stage 1 logic (F11); default behavior re-enqueues to Stage 2 on every reindex.
- `stage2_worker.py` — works as designed; under the bespoke decision it is doing work that is no longer the architectural target.
- `corpus_integrity.py` — the graphiti side of the report becomes semantically empty after Stage 3 shutoff.
- `graphiti_service.py` — works as designed; same story as Stage 2 (not aligned with the bespoke direction).

**Stopped / deprecated / experimental:**

- `stage3_worker.py` — service stopped manually; code in repo, last modified 2026-05-01.
- `consolidator_v0_1.py` — reframe-deprecated.
- `tier1_migration.py` — already-run one-shot, kept as a reproducibility artifact.
- All 32 experimental scripts in `scripts/` and `scripts/experiments/`.
- `e3_dreamer_substrate.py` — in-flight per the architecture doc, confounded per the bespoke decision.

**Removal candidates (do not remove):**

- All `.bak*` files (~20 of them) — git history covers them.
- The 32 experimental scripts could move to `deprecated/` or `experiments/` to clean up `scripts/`.
- `consolidator_v0_1.py` — explicitly deprecated by the reframe.
- `tier1_migration.py` — completed migration; kept for reproducibility.

**NREM-shaped divergences (the most important class of finding):**

1. **`dream.py` cumulative exclusion 500-cap.** The `retrieved_sources` list grows across nights and is the exclusion set for Early REM and Late REM. Once the cap fills, it hides up to 500 of the 1,236 distinct sources in `embeddings` (~40% of the corpus) from those modes. The architecture and reframe specify session-scoped novelty, not corpus-lifetime exclusion. Same shape as the NREM bug: a deduplication mechanism running silently in a way the architecture didn't request.
2. **`api.py /api/corpus/retry` 50KB truncation.** The F14 fix removed truncation from `watcher.py`, `ingest.py`, and `corpus_integrity.py`, but the `api.py` retry path was missed: clicking "Retry" on an ingest failure still truncates. Working without errors, doing something the architecture explicitly says not to do.

---

## Phase 2 — Systemd services

Inventory of every `aaronai*.service` and `aaronai*.timer` in `/etc/systemd/system/`. Status is from `systemctl is-enabled` and `systemctl is-active`, taken during this session.

### `aaronai.service`

- **Status:** Working (enabled, active)
- **Unit-file mtime:** 2026-04-24
- **Type / trigger:** `simple`, `Restart=always`, `WantedBy=multi-user.target`. Always-running.
- **Command:** `/home/aaron/aaronai/venv/bin/python3 /home/aaron/aaronai/scripts/api.py`
- **Depends on:** `network.target`
- **What depends on it:** `aaronai-graphiti.service`, `aaronai-stage2.service`, `aaronai-stage3.service`, and `aaronai-watcher.service` are all `After=` it; Stage 2 and Stage 3 also declare `Requires=aaronai.service`.
- **Behavior matches intent?** Yes.
Hosts the FastAPI backend and the embedded APScheduler. The architecture doc lists this as the long-running `api.py` process that hosts the nightly cycles.
- **Notes:** No `WatchdogSec`. Restarts on crash. Has been "running since May 01" per the current-state doc.

### `aaronai-graphiti.service`

- **Status:** Working (enabled, active)
- **Unit-file mtime:** 2026-04-27
- **Type / trigger:** `simple`, `Restart=always`, always-running.
- **Command:** `/home/aaron/aaronai/venv/bin/python3 /home/aaron/aaronai/scripts/graphiti_service.py`
- **Depends on:** `aaronai.service` (`After=`, soft); FalkorDB Docker container at `127.0.0.1:6379`; `.env`.
- **What depends on it:** `aaronai-stage3.service` (`Requires=`); `dream.py` when `DREAMER_SUBSTRATE=graphiti`; the Stage 3 worker's `recover_wedge`, which runs `sudo systemctl restart aaronai-graphiti.service`.
- **Behavior matches intent?** Yes against the architecture doc. Under the bespoke decision this is the layer being replaced. The service still runs, and the sidecar still answers `/health`.
- **Notes:** The 2026-05-01 v2.1 patches (sudoers entry, error logging) were applied on the calling side, in the worker and its sudo permissions; the service unit itself is unchanged.

### `aaronai-stage2.service`

- **Status:** Working (enabled, active)
- **Unit-file mtime:** 2026-05-01
- **Type / trigger:** `simple`, `Restart=always`, `Requires=aaronai.service`. Always-running worker.
- **Command:** `/home/aaron/aaronai/venv/bin/python3 /home/aaron/aaronai/scripts/stage2_worker.py`
- **Depends on:** `aaronai.service` (`Requires=`); Ollama on port 11434; `.env`.
- **What depends on it:** The Stage 3 worker (consumes the queue this fills).
- **Behavior matches intent?** Yes for the worker code. Under the bespoke decision the cascade pipeline this feeds is no longer the architectural target, but the unit is doing what its code says.
- **Notes:** The `WatchdogSec` line is commented out (the 2026-05-01 fix). Logs to `/var/log/aaronai/stage2.log`.
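The status column in this phase pairs `systemctl is-enabled` with `systemctl is-active`. The combination that matters most is the one in the Stage 3 entry that follows: enabled but not active. Below is a small sketch of that check. It assumes Python 3 and `systemctl` on `PATH`, and the function names and labels are hypothetical, not part of the codebase.

```python
import subprocess


def unit_state(unit):
    """Return the (is-enabled, is-active) strings for a systemd unit.

    Both systemctl verbs print the state on stdout. A non-zero exit code only
    signals a state such as disabled or inactive, so it is not an error here.
    """
    def ask(verb):
        result = subprocess.run(
            ["systemctl", verb, unit], capture_output=True, text=True, check=False
        )
        return result.stdout.strip()

    return ask("is-enabled"), ask("is-active")


def classify(enabled, active):
    """Map the two systemctl answers to a coarse label (labels are hypothetical)."""
    if enabled == "enabled" and active != "active":
        # The Stage 3 case: stopped now, but an enabled unit is pulled in again at boot.
        return "stopped-until-reboot"
    if active == "active":
        return "running"
    if enabled == "static":
        # No [Install] section: runs only when started manually or by a trigger.
        return "static-idle"
    return "stopped"
```

Usage would look like `classify(*unit_state("aaronai-stage3.service"))`. Given the state recorded in this inventory, that call should report `stopped-until-reboot` until someone runs `systemctl disable aaronai-stage3.service`.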
### `aaronai-stage3.service`

- **Status:** Stopped (enabled, **inactive**) — manually stopped per the session brief
- **Unit-file mtime:** 2026-05-01
- **Type / trigger:** `simple`, `Restart=always`, `Requires=aaronai.service aaronai-graphiti.service`. Would be always-running if started.
- **Command:** `/home/aaron/aaronai/venv/bin/python3 /home/aaron/aaronai/scripts/stage3_worker.py`
- **Depends on:** `aaronai.service` and `aaronai-graphiti.service` (both `Requires=`); `.env`; passwordless sudo for `systemctl restart aaronai-graphiti.service`.
- **What depends on it:** Nothing technically requires it; corpus integrity reads `stage_3_queue.completed_at` and will see those numbers stop moving while the worker is off.
- **Behavior matches intent?** **Divergence.** The unit is `enabled` (i.e., it will start at next boot) but currently inactive. The bespoke decision parks this work, yet on reboot the service will start automatically and resume processing `stage_3_queue` rows. Track 1 cleanup should `systemctl disable` it before the next reboot; otherwise the manual stop is a soft guarantee that doesn't survive a power cycle.
- **Notes:** The `WatchdogSec` line is commented out (the 2026-05-01 fix). Logs to `/var/log/aaronai/stage3.log`. The unit's `Description` still says "Graphiti cascade ingest": accurate, but architecturally stale under the bespoke decision.

### `aaronai-watcher.service`

- **Status:** Working (enabled, active)
- **Unit-file mtime:** 2026-04-30
- **Type / trigger:** `simple`, `Restart=always`. Always-running.
- **Command:** `/home/aaron/aaronai/venv/bin/python3 /home/aaron/aaronai/scripts/watcher.py`
- **Environment:** `TRANSFORMERS_OFFLINE=1`, `HF_HUB_OFFLINE=1`, `PATH=/home/aaron/aaronai/venv/bin`. Resource caps: `MemoryMax=3G`, `MemorySwapMax=0`.
- **Depends on:** `aaronai.service` (`After=`); pgvector; SentenceTransformer model files (offline mode means they must already be cached).
- **What depends on it:** Anything that reads pgvector or `stage_2_queue` indirectly depends on this filling them.
- **Behavior matches intent?** Yes. Stage 1 architectural commitment. The 2026-04-30 in-process refactor matches the architecture doc.
- **Notes:** `MemorySwapMax=0` is the post-refactor commitment. The watcher heartbeat at `/home/aaron/aaronai/watcher_heartbeat` is consumed by an external cron monitor (Phase 5 confirms).

### `aaronai-web.service`

- **Status:** Working (enabled, active)
- **Unit-file mtime:** 2026-04-26
- **Type / trigger:** `simple`, `Restart=always`. Always-running.
- **Command:** `/usr/bin/node node_modules/next/dist/bin/next start` from `/home/aaron/aaronai-web` with `NODE_ENV=production` and `PORT=3000`.
- **Depends on:** `network.target`.
- **What depends on it:** nginx reverse-proxies to port 3000 (per the architecture doc); Cloudflare fronts `ai.aaronnelson.studio`.
- **Behavior matches intent?** Yes. Hosts the Next.js frontend per the Layer 3 architecture.
- **Notes:** The working directory is `~/aaronai-web/`, not `~/projects/aaronai-web/`: production runs from a separate clone of the repo. This is consistent with the architecture doc's "Local: `~/projects/aaronai-web/`, deployed: `~/aaronai-web/`" line.

### `aaronai-dreamer.service`

- **Status:** Working (oneshot; static)
- **Unit-file mtime:** 2026-04-26
- **Type / trigger:** `Type=oneshot`. No `[Install]` block, hence `static`: it cannot be enabled on its own and runs only when started manually or by a timer.
- **Command:** `/home/aaron/aaronai/venv/bin/python3 /home/aaron/aaronai/scripts/dream.py --mode nrem`
- **Depends on:** `network.target`.
- **What depends on it:** The session brief notes this service was used for the manual NREM run on 2026-05-02, 21:33–21:34 UTC. APScheduler in `api.py` is the production trigger and calls `subprocess.Popen` directly (not this unit), so the unit only matters for a manual `systemctl start aaronai-dreamer.service` from the shell.
- **Behavior matches intent?** Partial.
The unit exists and is the only systemd-tracked dream entry point. **It still hardcodes `--mode nrem`** in its command, so a manual `systemctl start aaronai-dreamer.service` runs only NREM, not the full pipeline. The architecture says the nightly run is the full pipeline, and the production scheduler in `api.py` runs `dream.py` with no flag (i.e., the default pipeline). The unit's `--mode nrem` is therefore an outdated invocation pattern preserved from when individual stages were run by hand.
- **Notes:** Has a paired `aaronai-dreamer.timer` (next entry) that is **not enabled**. APScheduler is the only thing actually triggering nightly dreams.

### `aaronai-dreamer.timer`

- **Status:** Stopped — exists but is **not in `timers.target.wants/`**, so not enabled
- **Unit-file mtime:** 2026-04-27
- **Schedule:** `OnCalendar=*-*-* 08:00:00`, `Persistent=true`.
- **Triggers:** `aaronai-dreamer.service`
- **Behavior matches intent?** Divergence: duplicate scheduling. APScheduler in `api.py` drives the actual 08:00 UTC dream run. This timer would do the same thing (with the wrong invocation, `--mode nrem`) if it were enabled. **NREM-shaped divergence: a scheduling mechanism that is present, configured, and inactive, but whose presence will confuse a future reader about what triggers the dream.** Track 1 cleanup candidate: remove it (it is already not enabled, so `systemctl disable` alone would change nothing).

### `aaronai-index-conversations.service`

- **Status:** Working (oneshot; static)
- **Unit-file mtime:** 2026-04-26
- **Type / trigger:** `Type=oneshot`. Static, no `[Install]` section.
- **Command:** `/home/aaron/aaronai/venv/bin/python3 /home/aaron/aaronai/scripts/ingest_conversations.py`
- **Depends on:** `network.target`.
- **What depends on it:** Manually triggerable. APScheduler in `api.py` runs `ingest_conversations.py` directly via `subprocess.run`, not through this unit.
- **Behavior matches intent?** Same shape as the dreamer unit: an alternate entry point that exists for manual debugging. Not on a path that fires.
- **Notes:** Logs to `/home/aaron/aaronai/dreamer.log`, the same log file as the dreamer service (likely a copy-paste artifact rather than deliberate co-mingling).

### `aaronai-index-conversations.timer`

- **Status:** Stopped — not enabled
- **Unit-file mtime:** 2026-04-26
- **Schedule:** `OnCalendar=*-*-* 02:30:00`, `Persistent=true`.
- **Triggers:** `aaronai-index-conversations.service`
- **Behavior matches intent?** Same divergence pattern as `aaronai-dreamer.timer`. APScheduler in `api.py` is the real driver at 02:30 UTC. This timer is dormant and would silently double-fire the job if enabled.

### `aaronai-maintenance.service`

- **Status:** Broken (oneshot; static; **command is unrunnable**)
- **Unit-file mtime:** 2026-04-26
- **Type / trigger:** `Type=oneshot`. Static.
- **Command:** `/home/aaron/aaronai/venv/bin/chops hnsw rebuild --path /home/aaron/aaronai/db --collection aaronai`
- **Depends on:** `chops` binary in the venv; ChromaDB at `/home/aaron/aaronai/db/`.
- **What depends on it:** Nothing. `aaronai-maintenance.timer` would trigger it weekly if enabled, but the timer is not enabled.
- **Behavior matches intent?** **No.** This unit is from the ChromaDB era. The architecture doc records the ChromaDB → pgvector migration on 2026-04-26. Verified during this inventory: `chops` is **not present** in `~/aaronai/venv/bin/`, and `~/aaronai/db/` still contains `chroma.sqlite3` and a UUID-named subdirectory but is no longer the active corpus store. **If anyone ever ran `systemctl start aaronai-maintenance.service`, it would fail immediately because the executable is missing (systemd reports this as `203/EXEC`).**
- **Notes:** Track 1 removal candidate. Both this unit and its timer are pure dead state; the `~/aaronai/db/` directory is a separate cleanup decision (it holds historical ChromaDB data, possibly recoverable).

### `aaronai-maintenance.timer`

- **Status:** Stopped — not enabled
- **Unit-file mtime:** 2026-04-26
- **Schedule:** `OnCalendar=Sun *-*-* 04:00:00`, `Persistent=true`.
- **Triggers:** `aaronai-maintenance.service` (broken).
- **Behavior matches intent?** No — points at a broken service.
- **Notes:** Track 1 removal candidate.

---

### Phase 2 summary

**Working and matching intent:**

- `aaronai.service`
- `aaronai-graphiti.service` (matches the existing-architecture intent; the bespoke decision will replace the layer it serves)
- `aaronai-stage2.service` (same caveat)
- `aaronai-watcher.service`
- `aaronai-web.service`

**Working with behavior-vs-intent divergences:**

- `aaronai-dreamer.service` — hardcodes `--mode nrem`, while the production trigger is APScheduler running the default pipeline. The systemd entry point and the production entry point disagree about what "dream" means.

**Stopped / broken:**

- `aaronai-stage3.service` — manually stopped 2026-05-02; **still `enabled`, so it will autostart on next reboot**.
- `aaronai-dreamer.timer`, `aaronai-index-conversations.timer` — not enabled; redundant with APScheduler.
- `aaronai-maintenance.service` and `aaronai-maintenance.timer` — broken (`chops` not installed); ChromaDB-era leftover.
- `aaronai-index-conversations.service` — static, harmless oneshot wrapper.

**Removal candidates (do not remove):**

- `aaronai-maintenance.service` and `.timer`
- `aaronai-dreamer.timer`, `aaronai-index-conversations.timer` (or, alternatively, disable APScheduler and use the timers; the duplication is the problem, not the choice)
- `aaronai-stage3.service` should be `disabled` even if not removed, so the manual stop survives a reboot.

**NREM-shaped divergences in Phase 2:**

1. **`aaronai-stage3.service` is `enabled` but `inactive`.** The manual stop does not survive a reboot; on next boot the worker resumes against `stage_3_queue`, which Stage 2 is still filling. Same shape as the NREM bug: the operationally stopped state is paper-thin. The intended "service stopped" state is undermined by a `systemctl is-enabled` value nobody changed.
2. **`aaronai-maintenance.service` against ChromaDB.** The service is still configured and would fail if anyone started it manually or enabled its timer. The architectural intent (ChromaDB retired) and the systemd state (unit still installed, `static`) are out of sync. The disabled timer is the only thing keeping this from running on a schedule.
3. **Double-scheduled triggers.** APScheduler in `api.py` plus the dreamer and index-conversations timer files amount to two competing schedulers configured for the same nightly work. Only APScheduler is firing; the timers are dormant. This is exactly the mechanism-still-present-but-not-architecturally-intended pattern.

---

## Phase 3 — Database tables

PostgreSQL `aaronai` database, `public` schema. Five tables. Connected via `PG_DSN` from `.env` (value not echoed in this document). All queries were `SELECT`-only or `\d`-style. Counts taken during this session.

### `embeddings`

- **Status:** Working (the production retrieval substrate)
- **Columns:**
  - `id text NOT NULL` (PK)
  - `document text NOT NULL` (chunk content)
  - `embedding USER-DEFINED` (pgvector `vector(384)`)
  - `source text` (filename/conversation title)
  - `type text` (document / chatgpt_conversation / claude_conversation / aaronai_conversation / claude_memory / NULL)
  - `created_at text` (string-typed, not timestamptz; many rows NULL)
  - `metadata jsonb`
- **Indexes:**
  - `embeddings_pkey` btree on `id`
  - `embeddings_vector_idx` HNSW (m=16, ef_construction=64, vector_cosine_ops)
  - `embeddings_source_idx` btree on `source`
- **Row count:** 13,874
- **Distinct sources:** 1,236
- **Type distribution:** `document` 1,368 | `chatgpt_conversation` 1,548 | `claude_conversation` 1,074 | `aaronai_conversation` 68 | `claude_memory` 1 | NULL 9,815
- **Writes:** `watcher.py:ingest_file()`, `ingest.py:ingest_file()`, `ingest_conversations.py:run()`. (`corpus_integrity.py:queue_for_retry()` is not a direct writer: it enqueues to `stage_2_queue`, and its chunks reach this table only through the normal ingest path.)
- **Reads:**
`api.py:retrieve_context()`, `dream.py:retrieve()` (pgvector branch), `corpus_integrity.py`, `tier1_migration.py:fetch_tier1_sources()`, several experiment scripts.
- **Behavior matches intent?** Partial. **9,815 of 13,874 rows have `type IS NULL` (~71%)**, which is unexpected given the architecture doc's commitment to typing every chunk. In the current code, `watcher.py:ingest_file()` writes `type='document'` and `ingest_conversations.py` writes `'aaronai_conversation'`. The 9,815 NULLs are likely artifacts of older ingest runs or of `ingest_chatgpt.py`/`ingest_claude.py`, which the architecture doc references but which are not present in `scripts/` (possibly run as one-shots at an earlier point and then deleted). **Additionally, `created_at` is stored as `text` rather than `timestamptz`**, and 12,109 rows (~87%) have it NULL. Both are NREM-shaped divergences: fields the architecture treats as load-bearing for "temporal awareness" exist in the schema but are mostly empty or mistyped.
- **Notes:** HNSW index parameters match the doc. The vector dimension is 384 (matches `all-MiniLM-L6-v2`).

### `stage_2_queue`

- **Status:** Working (active queue feeding `stage2_worker`)
- **Columns:**
  - `id integer NOT NULL` (PK, sequence)
  - `source text NOT NULL UNIQUE`
  - `full_text text NOT NULL` (no longer truncated post-F14)
  - `char_length integer NOT NULL`
  - `enqueued_at timestamptz NOT NULL default NOW()`
  - `started_at`, `completed_at`, `failed_at` timestamptz nullable
  - `failure_reason text`
  - `attempts integer NOT NULL default 0`
- **Indexes:** PK + unique on `source`.
- **Row count:** 48 (25 completed, 21 failed, 2 pending)
- **Failure breakdown:**
  - `park_pending_phase_2_reframe` — 19 rows (manually marked; the parked meta-documents per the reframe)
  - `mistral_timeout_after_300s` — 2 rows
- **Last enqueued:** 2026-05-02 22:22 UTC
- **Last completed:** 2026-05-02 22:33 UTC
- **Writes:** `watcher.py:enqueue_stage2()`, `ingest.py:enqueue_stage2()`, `corpus_integrity.py:queue_for_retry()`, `api.py:/api/corpus/retry`, `stage2_worker.py` (state updates)
- **Reads:** `stage2_worker.py:run()`
- **Behavior matches intent?** Yes. The queue is doing what it was redesigned to do post-F14. The 19 manually parked rows match the reframe doc's mention of parked meta-documents.
- **Notes:** **The watcher was still enqueuing rows at 2026-05-02 22:22 UTC and Stage 2 completed one at 22:33, so Stage 2 is still consuming this queue and feeding Stage 3.** That is fine architecturally for now, but worth flagging because Stage 3 is stopped (Phase 2). See Phase 3 summary divergence #1.

### `stage_3_queue`

- **Status:** Working-degraded
- **Columns (base):**
  - `id integer NOT NULL` (PK, sequence)
  - `source text NOT NULL UNIQUE`
  - `full_text text NOT NULL`
  - `orientation text NOT NULL`
  - `stage2_metadata jsonb`
  - `enqueued_at timestamptz NOT NULL default NOW()`
  - `started_at`, `completed_at`, `failed_at` timestamptz nullable
  - `failure_reason text`
  - `attempts integer NOT NULL default 0`
- **Columns (rolled-back-migration leftovers, all unused by current code):**
  - `state_type text` (added by `30beeb3`, unused)
  - `state_type_confidence text` (unused)
  - `supersedes_prior_state boolean` (unused)
  - `state_type_rationale text` (unused)
  - `external_job_id uuid` (added by `a0bf280`, unused)
- **Indexes:**
  - `stage_3_queue_pkey`
  - `stage_3_queue_source_key` (unique on source)
  - `stage_3_queue_supersedes_idx` btree on `supersedes_prior_state` — unused
  - `idx_stage_3_queue_external_job` partial btree on `external_job_id` where not-null and not-completed/failed — unused
- **Row count:** 19
(11 completed, 3 failed, 6 pending). 1 row has `state_type` populated (the smoke-test); 0 have `external_job_id`. - **Failure breakdown:** - 2 × `HTTPConnectionPool(host='localhost', port=8001): Read timed out. (read timeout=600)` (the May-1 incident period) - 1 × `Bulk path against new content unpatched; deferred until search_utils.py sites 4-9 are patched` (rolled-back work artifact) - **Last enqueued:** 2026-05-02 22:33 UTC (Stage 2 just enqueued a row). - **Writes:** `stage2_worker.py:enqueue_stage3()`, `stage3_worker.py` (state updates). - **Reads:** `stage3_worker.py:run()`, `corpus_integrity.py:get_graphiti_sources()`, `api.py:get_corpus_status_data()`. - **Behavior matches intent?** **Partial / multiple divergences.** - 5 columns and 2 indexes from rolled-back migrations remain. Inert under current code, but they are visible to anyone reading the schema and will mislead. The current-state doc said `idx_stage_3_queue_supersedes` "may also still exist" — confirmed: it does, **plus** `idx_stage_3_queue_external_job` which the current-state doc didn't mention. - The queue is filling without a consumer. Stage 3 worker is stopped (Phase 2); Stage 2 worker is enqueuing. As of 22:33 UTC there are 6 pending rows. - **Notes:** Cleanup SQL is in the current-state doc. Track 1 candidate for removal (low priority — no harm in leaving). ### `graphiti_jobs` - **Status:** Working-degraded (rolled-back-code artifact) - **Columns:** - `job_id uuid NOT NULL` (PK) - `job_type text NOT NULL` - `payload jsonb NOT NULL` - `status text NOT NULL default 'queued'` - `enqueued_at timestamptz NOT NULL default NOW()` - `started_at`, `finished_at` timestamptz nullable - `error text` - `summary jsonb` - `submitted_by text` - **Indexes:** - `graphiti_jobs_pkey` - `idx_graphiti_jobs_queued` partial btree on `enqueued_at` where status='queued' - `idx_graphiti_jobs_status` btree on `status` - **Row count:** **9 (NOT empty)** — 6 failed, 3 committed. 
- **Activity window:** All 9 jobs fall between 2026-05-02 02:26 UTC and 05:50 UTC, i.e. last night's experimental run, before the rollback. Mix of `single` and `bulk` job types.
- **Writes:** None in current code. The Pattern 1 async-job consumer/producer was rolled back.
- **Reads:** None in current code.
- **Behavior matches intent?** **No.** The current-state doc said this table "exists, empty (or near-empty)". It is not empty: 9 jobs from the May-2 experimental run remain. They are inert (nothing reads or writes the table now), but the documented state and the actual state disagree. Candidate for dropping via the current-state doc's cleanup SQL.
- **Notes:** Two of the 6 failures have `started_at IS NULL` and a non-null `finished_at`: jobs that were marked failed without ever being claimed by a worker. The pattern comes from the rolled-back code and is of historical interest only.

### `ingest_failures`

- **Status:** Working
- **Columns:**
  - `id integer NOT NULL` (PK, sequence)
  - `source text NOT NULL UNIQUE`
  - `filepath text NOT NULL`
  - `error text NOT NULL`
  - `retry_count integer NOT NULL default 0`
  - `first_failed_at`, `last_failed_at` timestamptz default NOW()
  - `resolved boolean NOT NULL default false`
  - `category text NOT NULL default 'transient'`
- **Indexes:** PK + unique on `source`.
- **Row count:** 129 (all `category='unreadable'`, all `resolved=false`)
- **Writes:** `watcher.py:record_ingest_failure()`, `corpus_integrity.py` (auto-queue path), `api.py:/api/corpus/retry`
- **Reads:** `api.py:get_corpus_status_data()`, `corpus_integrity.py:get_ingest_failures()`
- **Behavior matches intent?** Yes. Matches the architecture's "ingest_failures table for UI visibility" tech-debt-resolved entry. The 129 unreadable files match the 129 figure cited in the architecture doc; these are scanned, encrypted, or corrupt PDFs awaiting OCR (priority 21b).
- **Notes:** The `category` field has only one observed value (`'unreadable'`); `'transient'` is the default, but no rows currently carry it. This is consistent with the architecture: only persistent failures (after watcher retry) make it here.

---

### Phase 3 summary

**Working and matching intent:**

- `ingest_failures` (129 unreadable, awaiting OCR; all match the doc)
- `stage_2_queue` (functioning queue, post-F14)

**Working with behavior-vs-intent divergences:**

- `embeddings`: 71% of rows have `type IS NULL`; 87% have `created_at IS NULL`; `created_at` is `text`-typed, not `timestamptz`. The data actually in the table largely fails to support the architecture's temporal-awareness commitment.
- `stage_3_queue`: five rolled-back-migration columns and two unused indexes remain; Stage 2 keeps filling the queue with no consumer running.

**Broken / rolled-back:**

- `graphiti_jobs`: 9 rows from the rolled-back experimental work. The current-state doc says "empty"; the database says otherwise. No current code touches it.

**Removal candidates (do not remove):**

- `stage_3_queue` columns `state_type`, `state_type_confidence`, `supersedes_prior_state`, `state_type_rationale`, `external_job_id`, and the two related indexes.
- The `graphiti_jobs` table, entirely.
- `embeddings.created_at`: under the bespoke decision, the new substrate's temporal model replaces it, so the column will probably be dropped in the bespoke build.

**NREM-shaped divergences in Phase 3:**

1. **Stage 2 still enqueues to Stage 3 while Stage 3 is stopped.** The pending count grows over time. No architectural decision calls for this; it is a consequence of leaving Stage 2 running after turning off its consumer. The pending rows are inert until a consumer attaches, but the design says each queue stage feeds the next, and the consumer is gone. Same shape: a pipeline working "without errors" and producing state nobody is consuming.
2. **`embeddings.type` is NULL for 71% of rows.** The architecture treats `type` as a load-bearing field for distinguishing document chunks from conversation chunks at retrieval time. In production, more than two-thirds of chunks lack it. Retrieval still works because nothing routes on `type`. The mechanism is in place but doing nothing, and the gap is invisible to anyone not querying the schema directly.
3. **`embeddings.created_at` is `text`-typed and 87% NULL.** Same shape: the doc treats temporal awareness as architectural, but the data doesn't support time-based queries even where the column is populated.
4. **`graphiti_jobs` documented as empty, actually has 9 rows.** The current-state doc explicitly anticipates the wrong state; checking the doc against the database surfaced this.

---

## Phase 4 — Configuration

### `~/aaronai/.env`

Eight keys present. **Values redacted in this document; only key name, length, and shape are reported.**

| Key | Length | Shape | Used by | Still referenced? |
|---|---|---|---|---|
| `ANTHROPIC_API_KEY` | 108 | opaque | `api.py` (Anthropic client), `dream.py:_call_claude`, `graphiti_service.py` (fallback when `LLM_API_KEY` is unset), several experiment scripts | Yes |
| `AARON_AI_PASSWORD` | 16 | opaque | `api.py:/auth/login` | Yes |
| `NEXTCLOUD_URL` | 36 | URI | `api.py` capture endpoint, `dream.py:deliver` | Yes |
| `NEXTCLOUD_USER` | 5 | opaque | Same as above | Yes |
| `NEXTCLOUD_PASSWORD` | 29 | opaque | Same as above (WebDAV app password) | Yes |
| `PG_DSN` | 71 | opaque (Postgres connection string) | Every Postgres-touching script (`api.py`, `dream.py`, `watcher.py`, `ingest.py`, `ingest_conversations.py`, both workers, `corpus_integrity.py`, `tier1_migration.py`, all experiment scripts) | Yes |
| `LLM_PROVIDER` | 9 | opaque (matches `"anthropic"`) | `graphiti_service.py:get_llm_client` | Yes (graphiti only) |
| `LLM_MODEL` | 25 | opaque (does not match the length of `"claude-sonnet-4-6"`, which is 17 characters; exact model string unconfirmed) | `graphiti_service.py` | Yes (graphiti only) |

**Variables documented in the architecture doc but NOT present in `.env`:**

- `LLM_API_KEY`: listed in the architecture doc's table. `graphiti_service.py` reads `LLM_API_KEY` first and falls back to `ANTHROPIC_API_KEY`, so current behavior depends on the fallback. That is architecturally fine, but it means the "user brings their own key" LLM-agnostic framing (architecture doc Section 5) is achieved by a fallback rather than an explicit key. Track 1 candidate: either set `LLM_API_KEY` explicitly, or drop the unset `LLM_API_KEY` from the doc and document the fallback instead.
- `FALKORDB_HOST`, `FALKORDB_PORT`, `GRAPHITI_GROUP_ID`: referenced in `graphiti_service.py` with defaults (`localhost`, `6379`, `aaron`). The defaults are correct for the current deployment, so their absence from `.env` is fine. Worth flagging only because the architecture doc lists `group_id="aaron"` as a single-tenant assumption (F26).

**Variables loaded but worth flagging:**

- All Postgres-touching scripts call `load_dotenv(Path.home() / "aaronai" / ".env")`, some with `override=True` and some without. Override behavior therefore differs between scripts; this is harmless but inconsistent.

**Behavior matches intent?** Partial. The `.env` file works; the documented LLM-agnostic story is a fallback story, not an enforced one. Permissions are `chmod 600` per the architecture commitment (file mode confirmed in an earlier pass).

### `~/aaronai/settings.json`

Active contents:

```json
{
  "theme": "light",
  "font_size": "medium",
  "web_search": true,
  "show_sources": true
}
```

`api.py:DEFAULT_SETTINGS` (line 46) defines a wider key set:

```python
DEFAULT_SETTINGS = {
    "theme": "light",
    "font_size": "medium",
    "web_search": True,
    "show_sources": True,
    "dream_hour_utc": 8,
    "dream_minute_utc": 0,
    "dream_mode": "nrem",
    "ingest_hour_utc": 2,
    "ingest_minute_utc": 30,
    "share_time": True,
}
```

`load_settings()` merges the file over the defaults; `save_settings()` writes whatever it is given. The file currently holds only the four UI-tunable keys; the other six come from the defaults.

**What is referenced by current code:**

- `theme`, `font_size`: frontend only (Phase 6)
- `web_search`: `api.py:chat()` (line 307) toggles the `web_search` tool block
- `show_sources`: `api.py:/api/chat` (line 521) gates whether sources are returned in the chat response
- `dream_hour_utc`, `dream_minute_utc`: `api.py:reschedule_jobs()` (line 1149)
- `ingest_hour_utc`, `ingest_minute_utc`: `api.py:reschedule_jobs()` (line 1159)
- `dream_mode`: present in defaults; **not read anywhere in `api.py` or `dream.py`**. A codebase search finds `dream_mode` only in `DEFAULT_SETTINGS` and in the `schedule_keys` set in `update_settings`; `run_dream_job` always invokes `dream.py` with no flag (full pipeline). The setting is dead from the scheduler's perspective. The frontend SettingsPanel may read it as the default value for the on-demand "Dream Now" mode dropdown (Phase 6).
- `share_time`: **frontend-controlled UI flag; the backend stores and returns it.** The backend persists it via `/api/settings` but does not act on its value. The frontend reads it at `MessageInput.tsx:58` and `SettingsPanel.tsx:205` (both with a `?? true` fallback) and writes it back through the SettingsPanel toggle. The flag gates whether `client_time` is included in the `/api/chat` request payload (`lib/api.ts:51-57`); when it is off, the request omits the key, and the backend's unconditional prompt-side insertion at `chat()` line 293 has nothing to insert. *Verified by cross-repo grep 2026-05-02. The original "frontend-only or dead" / "removal candidate" framing was wrong; this is a working persistence pattern, structurally distinct from `dream_mode`.*

**Behavior matches intent?** Partial, but the two suspect keys behave very differently and should not be lumped together. **`dream_mode` is an NREM-shaped divergence:** it reads as a configurable scheduling parameter (declared in `DEFAULT_SETTINGS`, listed in `schedule_keys` for the reschedule trigger), but `run_dream_job` ignores it. A future maintainer who flips the value expects different nightly behavior and gets none. **`share_time`, in contrast, follows a backend-stores-and-returns persistence pattern:** the backend correctly persists a frontend-owned flag, and the frontend acts on it (with a `?? true` fallback if the key is missing). The distinction matters: removing a silently ignored key removes dead code, while removing a stores-and-returns key changes the seed default for new users. *Verification finding 2026-05-02 (cross-repo grep against `~/aaronai-web`).*

---

### Phase 4 summary

**Working and matching intent:**

- All eight `.env` keys are referenced by code.
- The four-key `settings.json` reflects the UI-tunable preferences.

**Working with behavior-vs-intent divergences:**

- `LLM_API_KEY` is documented but not set; `graphiti_service.py` relies on the `ANTHROPIC_API_KEY` fallback.
- `dream_mode` exists in the defaults but isn't read by the scheduler.

**Removal candidates (do not remove):**

- `dream_mode`: clarify in code or remove from defaults.
*(`share_time` was previously listed here in error; a cross-repo grep on 2026-05-02 confirmed it is a working frontend-controlled flag, not a removal candidate.)*

**NREM-shaped divergences in Phase 4:**

1. **`dream_mode` setting silently ignored.** A scheduler-shaped knob that exists, has a default, is mergeable from `settings.json`, and is not used. A future maintainer who flips it expects different nightly behavior and gets none.

---

## Phase 5 — Cron and scheduled work

### User crontab (`crontab -l`)

Two active entries:

| Schedule | Command | What it does |
|---|---|---|
| `0 3 * * *` (daily 03:00 UTC) | `/bin/bash /home/aaron/aaronai/scripts/backup.sh` | Snapshots `memory.md`, `settings.json`, and `conversations.db` into `Nextcloud/Admin/Backups/`. 7-day retention. |
| `*/5 * * * *` (every 5 min) | `test $(( $(date +%s) - $(cat /home/aaron/aaronai/watcher_heartbeat 2>/dev/null \|\| echo 0) )) -gt 600 && sudo systemctl restart aaronai-watcher >> /var/log/aaronai/watcher-cron.log 2>&1` | Heartbeat watchdog. Restarts the watcher service if the epoch timestamp stored in the heartbeat file is more than 600 seconds old (a missing file reads as 0 and therefore also triggers a restart). |

**Behavior matches intent?** Yes. The heartbeat watchdog corresponds to the architecture-doc tech-debt entry "Heartbeat file written every 5s … cron job restarts watcher if heartbeat older than 10 minutes." The 600s threshold matches the doc's "10 minutes" figure, and `backup.sh` runs on the documented daily schedule.

**Notes:** The watcher-restart entry relies on passwordless `sudo` for `systemctl restart aaronai-watcher`. That command is **not** in `/etc/sudoers.d/aaron-aaronai`, which the session brief lists as containing `restart ollama` and `restart aaronai-graphiti.service`. Either the rule lives in `/etc/sudoers` proper (near the original `aaronai-web` line), or the restart fails silently every time the watchdog triggers it. Worth verifying: the cron line redirects both stdout and stderr of the restart command to the log, so a `sudo` password failure would show up in `watcher-cron.log` (not read for this inventory). Separately, crontab(5) turns an unescaped `%` in the command into a newline, so if the installed line contains `date +%s` exactly as shown above (rather than `date +\%s`), cron truncates the command at the `%` and the staleness check itself never runs.
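The watchdog's staleness test is compact enough that its edge cases are easy to miss, so here it is restated in Python. This is a minimal sketch, assuming the heartbeat file holds a single integer epoch timestamp (which is what the shell arithmetic in the cron line implies). `read_heartbeat`, `heartbeat_is_stale`, and `STALE_AFTER_S` are hypothetical names, not part of the aaronai codebase.

```python
from pathlib import Path

# Hypothetical helpers for illustration; not part of the aaronai codebase.
STALE_AFTER_S = 600  # matches the `-gt 600` test in the cron line


def read_heartbeat(path: Path) -> int:
    """Return the epoch seconds stored in the heartbeat file.

    A missing or unreadable file yields 0, mirroring the cron line's
    `cat ... 2>/dev/null || echo 0` fallback. Malformed content raises
    ValueError here; this sketch does not try to mimic the shell's handling.
    """
    try:
        return int(path.read_text().strip())
    except OSError:
        return 0


def heartbeat_is_stale(last_beat: int, now: int, stale_after: int = STALE_AFTER_S) -> bool:
    """Strictly greater than, like `test ... -gt 600`: exactly 600s old is still fresh."""
    return now - last_beat > stale_after


if __name__ == "__main__":
    import time

    beat = read_heartbeat(Path.home() / "aaronai" / "watcher_heartbeat")
    print("stale" if heartbeat_is_stale(beat, int(time.time())) else "fresh")
```

Because a missing file reads as epoch 0, a watcher that never writes its heartbeat looks permanently stale, and the cron entry would attempt a restart on every 5-minute tick (assuming the `sudo` path works).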
### `/etc/cron.d/`

Stock OS files only: `certbot`, `e2scrub_all`, `sysstat`. The standard `/etc/cron.daily`, `/etc/cron.weekly`, and `/etc/cron.hourly` directories hold only default Ubuntu jobs (`apport`, `apt-compat`, `dpkg`, `logrotate`, `man-db`, `sysstat`). **No aaronai-specific entries** exist in `/etc/cron.d/` or anywhere outside the user crontab. `/etc/anacrontab` is not present. The root crontab was not inspected (it requires sudo, which this read-only inventory pass did not have).

### APScheduler jobs in `api.py`

`api.py:reschedule_jobs()` (line 1137) configures two jobs on an in-process `BackgroundScheduler`. The scheduler starts in the FastAPI lifespan; the jobs are re-registered whenever a settings update touches a schedule key.

| Job ID | Trigger | Function | What it does |
|---|---|---|---|
| `dream_job` | Cron, `hour=settings.dream_hour_utc`, `minute=settings.dream_minute_utc`, `tz=UTC` (default 08:00) | `run_dream_job` (line 1107) | `subprocess.run([PYTHON, dream.py], timeout=600)`: invokes the dreamer with no arguments, so it runs the full pipeline (NREM → Early REM → Late REM → Synthesis). |
| `ingest_job` | Cron, `hour=settings.ingest_hour_utc`, `minute=settings.ingest_minute_utc`, `tz=UTC` (default 02:30) | `run_ingest_job` (line 1123) | `subprocess.run([PYTHON, ingest_conversations.py], timeout=300)` |

Both jobs use `max_instances=1` and `replace_existing=True`.

**Behavior matches intent?** Mostly yes. The architecture's "Nightly Schedule" section gives 02:30 UTC for conversation indexing and 08:00 UTC for the dream pipeline; both match. **One divergence:** `run_dream_job` uses `subprocess.run`, which is synchronous and has a 600s timeout. When that timeout expires, `subprocess.run` kills the child process, so this is a hard cap on dream runtime. It is enough for a normal full-pipeline run, but Phase 5 of the reframe / E6 work would want longer runs, and nobody has hit the cap yet. The architecture doc doesn't specify a limit; flagging in case future longer runs need a bump.
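The timeout behavior that the divergence above depends on can be shown with the standard library alone. This is a sketch, not `run_dream_job` itself: `run_with_cap` is a hypothetical helper, and it assumes the caller only needs to know whether the child finished, exited non-zero, or was killed at the cap.

```python
import subprocess
import sys
from typing import List


def run_with_cap(cmd: List[str], timeout_s: float) -> str:
    """Run cmd synchronously and report how it ended (hypothetical helper).

    subprocess.run blocks the calling thread until the child exits. If
    timeout_s elapses first, it kills the child, waits for it, and raises
    subprocess.TimeoutExpired, so the timeout is a hard cap, not a warning.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else f"exit {result.returncode}"


if __name__ == "__main__":
    # A child that outlives its cap is killed; the caller sees "timeout".
    print(run_with_cap([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5))
```

With `timeout_s=600` and the dreamer in place of the toy command, the same call would tie up the scheduler's worker thread for up to 10 minutes before the kill, which is the hold described in the Notes paragraph that follows.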
**Notes:** The 600s `subprocess.run` timeout is the only thing protecting the FastAPI process from a stuck dreamer. If the dreamer hangs (e.g., an Anthropic API stall), the scheduler thread blocks for 10 minutes before the timeout fires. Acceptable, but worth knowing.

### Systemd timers

Already documented in Phase 2: three timer files exist (`aaronai-dreamer.timer`, `aaronai-index-conversations.timer`, `aaronai-maintenance.timer`), and **none of them is enabled** (none appear in `/etc/systemd/system/timers.target.wants/`). The dreamer and conversation-indexing timers duplicate APScheduler's jobs; the maintenance timer points at a broken service. APScheduler is the actual driver for the two paths the dreamer and ingest timers would cover.

### What is *not* scheduled

The architecture and reframe documents reference several mechanisms that have no scheduled runner today:

- **Asynchronous dreamer pruning pass** (per reframe). Designed but unimplemented; no schedule.
- **Consolidator 0.1 alias resolution.** The script exists, has no schedule, and was always run by hand. Track 1 will dissolve it.
- **`corpus_integrity.py` reconciliation.** Designed to be run on demand or from the SettingsPanel. There is no automated weekly run, and the 129 unreadable files have sat at zero `retry_count` because OCR (priority 21b) hasn't shipped.
- **`tier1_migration.py`** has no schedule (one-shot, already complete).

---

### Phase 5 summary

**Working and matching intent:**

- User crontab (backup + watcher heartbeat watchdog).
- APScheduler jobs (dream + ingest_conversations) match the architecture doc's nightly schedule.

**Working with behavior-vs-intent divergences:**

- The watcher-restart cron uses `sudo systemctl restart aaronai-watcher`, but the only documented sudoers entries for `aaron` cover ollama and aaronai-graphiti. The line either depends on a sudoers entry the session brief doesn't mention, or fails silently. **Worth verifying as part of Track 1.**
- `dream_job` runs under a 600s `subprocess.run` timeout: a cap nobody has hit yet, but one that narrows the operational envelope for any future longer-running dream work.

**Stopped / dormant:**

- All three `aaronai-*.timer` units (Phase 2). They are configured but not enabled, and they overlap APScheduler.

**Removal candidates (do not remove):**

- The three `aaronai-*.timer` files.

**NREM-shaped divergences in Phase 5:**

1. **Watcher-restart sudo path.** The cron entry was probably added on the assumption that `aaron` had broad NOPASSWD sudo for systemctl, which the 2026-05-01 sudoers fix narrowed to specific commands. If the `aaronai-watcher` restart isn't in sudoers, the watchdog has been failing silently. Whether or not it has, this is the same shape: a recovery mechanism that is configured, looks like it works, and may not work. Neither the session brief nor the architecture doc cross-checked it.
2. **Two parallel scheduling stacks.** APScheduler in `api.py` drives nightly work; three systemd `.timer` files exist but are not enabled. The duplication makes "what triggers a dream?" harder to answer than it should be.

---

## Phase 6 — Frontend routes

Next.js App Router under `~/aaronai-web/app/`. Three user-facing routes plus a catch-all API proxy.

| Route | File | Auth | What it does | Backend support? |
|---|---|---|---|---|
| `/` | `app/page.tsx` | Required (cookie check; redirects to `/login`) | Main chat UI, sidebar, settings panel, dreamer status, corpus integrity status. | Yes; every backed `/api/*` endpoint is proxied through the catch-all. |
| `/login` | `app/login/page.tsx` | None | Password login; sets the `aaronai_session` cookie. | Yes: `POST /auth/login`. |
| `/capture` | `app/capture/page.tsx` | None (mobile field recorder, public) | Voice + image capture; posts to `/api/capture`. SSE listener on `/api/captures/events`. | Yes. |
| `/api/[...slug]` | `app/api/[...slug]/route.ts` | Pass-through | Catch-all proxy: forwards every request to `${API_URL \|\| 'https://ai.aaronnelson.studio'}/api/` (or `/` for `auth/*`). Forwards `cookie`, `content-type`, `set-cookie`. | Always; it is the proxy. |

That is the entire route surface. The frontend has no `/dreams`, `/journal`, `/admin`, or similar pages; all dream output is delivered via Nextcloud and read out of band. The only data paths between the frontend and Aaron are chat, capture, and the SettingsPanel embedded in `/`.

**Behavior matches intent?** Yes, against the architecture doc's Layer 3 list ("Login/logout … Chat desktop and mobile … Sidebar … Voice: tap-to-toggle … `/capture` voice + image"). The doc's "Not yet built" entries (Consolidation agent UI, drag-and-drop capture, LLM provider selector) are correctly absent.

**Notes:**

- The catch-all proxy uses `process.env.API_URL` and falls back to `'https://ai.aaronnelson.studio'`. In production this works because the frontend talks back through the public domain, which nginx routes to the same machine. The path is a bit roundabout (frontend → public DNS → nginx → backend on the same host), but the deploy is consistent with what's documented.
- I did not deep-read the route components or the `components/` directory, per the Phase 6 scope ("don't go deep").

### Phase 6 summary

**Working and matching intent:** Three routes, all backed.

**Removal candidates:** None at this layer.

**NREM-shaped divergences:** None observed at the route level. (Component-level divergences would require deeper inspection.)

---
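For reference, the Phase 6 catch-all proxy's target-URL rule, restated as a small Python function so the `auth/*` special case and the `API_URL` fallback are easy to check. This is a sketch, not the TypeScript in `app/api/[...slug]/route.ts`: `proxy_target` and `DEFAULT_API_BASE` are hypothetical names, and it assumes that "(or `/` for `auth/*`)" in the route table means the slug is appended to the backend root whenever its first segment is `auth`.

```python
from typing import List, Optional

DEFAULT_API_BASE = "https://ai.aaronnelson.studio"  # fallback named in the Phase 6 route table


def proxy_target(slug: List[str], api_url: Optional[str] = None) -> str:
    """Upstream URL for a request to /api/<slug...> (hypothetical re-expression).

    Query strings and header forwarding are out of scope for this sketch.
    """
    # Python's `or` behaves like JS `||` here: None and "" both fall back.
    base = api_url or DEFAULT_API_BASE
    # auth/* goes to the backend root; everything else stays under /api/.
    prefix = "/" if slug and slug[0] == "auth" else "/api/"
    return f"{base}{prefix}{'/'.join(slug)}"


if __name__ == "__main__":
    for s in (["chat"], ["auth", "login"], ["corpus", "retry"]):
        print("/api/" + "/".join(s), "->", proxy_target(s))
```

Under this reading, `/api/auth/login` on the frontend reaches `POST /auth/login` on the backend, which matches the `/login` row of the route table.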