These are working artifacts of the 2026-05-02 Track 1 stabilization
work. Versioning them alongside the code keeps the operational
narrative coherent and gives future sessions clear reference docs.
The inventory document includes the cross-repo verification finding
on share_time — captured at the document level so future sessions
don't repeat the same dead-code mischaracterization.
- Fix the /auth/check endpoint, which referenced an undefined SESSIONS
(Phase 1 finding: every call would raise NameError and return a 500). Now uses
session_exists(token), the live session-validation mechanism
defined elsewhere in api.py.
- Remove the unused ChromaDB-era DB_PATH constant (paired with the
ChromaDB directory deletion and aaronai-maintenance.service
removal earlier this session).
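A minimal, framework-agnostic sketch of the corrected /auth/check logic. The in-memory token set, the stub session_exists() body, and the (status, body) return shape are hypothetical stand-ins; the real session_exists(token) lives in api.py and is not reproduced here.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for api.py's real session store.
_ACTIVE_TOKENS = {"tok-123"}


def session_exists(token: str) -> bool:
    """Stub only: the real session_exists() in api.py may query a database."""
    return token in _ACTIVE_TOKENS


def auth_check(token: Optional[str]) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body) for /auth/check without touching SESSIONS."""
    if token and session_exists(token):
        return 200, {"authenticated": True}
    # Missing or unknown token: a clean rejection instead of a NameError 500.
    return 401, {"authenticated": False}
```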
Found by Track 1 inventory 2026-05-02. Cross-repo verification of
share_time (third candidate from the original cleanup proposal)
showed that it performs working store-and-return persistence rather
than being dead code, so share_time is intentionally left unmodified.
Inventory document edits are committed separately under the docs/
tracking decision.
The dream_mode setting was defined in DEFAULT_SETTINGS and watched
by update_settings to trigger a reschedule, but run_dream_job never
read it, so the setting was silently ignored.
Two changes:
1. DEFAULT_SETTINGS["dream_mode"] flipped from "nrem" to "pipeline".
The default was a latent regression vector: wiring up the setting
without changing the default would have silently switched all
default-config users from full-pipeline (current production
behavior) to NREM-only nightly runs.
2. run_dream_job reads dream_mode at fire-time, validates it against
{"pipeline", "nrem", "early-rem", "late-rem"}, and falls back to
"pipeline" with a warning on invalid values. Lucid is intentionally
excluded: it is on-demand only by design and remains available
via the CLI and /api/dreamer/run.
Nightly dream production behavior is unchanged for current users
(no settings.json key → default "pipeline" → no flag passed → same
as before). Users can now meaningfully change the nightly mode by
editing settings.json or via the SettingsPanel.
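The fire-time resolution described above might look roughly like this. resolve_dream_mode() and dream_cli_args() are hypothetical names, and the "--mode" flag is an assumed placeholder, since the actual flag run_dream_job passes for non-pipeline modes is not stated here.

```python
import logging
from typing import List

logger = logging.getLogger("dreamer")

DEFAULT_SETTINGS = {"dream_mode": "pipeline"}
# Nightly modes; "lucid" is deliberately absent because it is on-demand only.
NIGHTLY_MODES = {"pipeline", "nrem", "early-rem", "late-rem"}


def resolve_dream_mode(settings: dict) -> str:
    """Read dream_mode at fire-time; invalid values fall back to "pipeline"."""
    mode = settings.get("dream_mode", DEFAULT_SETTINGS["dream_mode"])
    # isinstance() guards against non-string JSON values such as lists.
    if not isinstance(mode, str) or mode not in NIGHTLY_MODES:
        logger.warning("invalid dream_mode %r; falling back to 'pipeline'", mode)
        return "pipeline"
    return mode


def dream_cli_args(mode: str) -> List[str]:
    """Hypothetical flag name: pipeline passes no flag, other modes pass one."""
    return [] if mode == "pipeline" else ["--mode", mode]
```

With no dream_mode key in settings.json, resolve_dream_mode({}) yields "pipeline" and dream_cli_args() yields no flag, matching the unchanged default behavior.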
Found by Track 1 inventory 2026-05-02 (Finding 9 / divergence #9).
Moves 28 experiment scripts to scripts/experiments/ (E1, E1.4, E1.6, E2,
base_class, cascade, cost_test, briefing, consistency, token series).
Moves 2 dissolved-layer scripts to scripts/deprecated/ (consolidator_v0_1.py,
tier1_migration.py — under the bespoke decision both target retired
substrate work).
Removes 19 .bak* files from disk (gitignored, never tracked; git history
is the durable record of every prior version).
The 11 production scripts remain in scripts/. All systemd ExecStart paths,
api.py subprocess calls, and cron jobs continue to resolve correctly —
verified by grep against /etc/systemd/system/aaronai-*.service, scripts/
references in api.py, and the user crontab.
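The grep verification above could be approximated by a small Python check. Everything in it is hypothetical (the regex, the function name, and passing file contents in as strings); it only flags top-level scripts/*.py references that no longer resolve, and ignores paths under subdirectories such as scripts/experiments/.

```python
import re
from typing import Iterable, Set

# Matches "scripts/<name>.py" where <name> contains no further "/".
SCRIPT_REF = re.compile(r"scripts/([A-Za-z0-9_.-]+\.py)\b")


def missing_script_refs(texts: Iterable[str], existing: Set[str]) -> Set[str]:
    """Return referenced top-level script names that are not in `existing`."""
    missing = set()
    for text in texts:
        for name in SCRIPT_REF.findall(text):
            if name not in existing:
                missing.add(name)
    return missing
```

Fed the contents of /etc/systemd/system/aaronai-*.service, api.py, and the output of `crontab -l`, plus `{p.name for p in Path("scripts").glob("*.py")}` as `existing`, it should return an empty set after the move.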
Track 1 inventory cross-cutting finding: scripts/ mixed 11 production
files with 32 experimental scripts and ~20 .bak files. After this commit
a clean-room reader can identify the live workers from a directory listing
alone.
Found by Track 1 inventory 2026-05-02. See
~/aaronai/docs/scripts-reorg-plan-2026-05-02.md for full reasoning.
After commit, run:
1. git log --oneline -3: show the new commit on top
2. git status: confirm a clean working tree (apart from the intentionally untracked docs/ files)
The F14 fix on 2026-05-01 removed text[:50000] truncation from
watcher.py, ingest.py, and corpus_integrity.py. The retry endpoint
in api.py was missed — clicking 'Retry' on an ingest-failed file
in the SettingsPanel re-introduced the exact truncation pattern
F14 was meant to eliminate.
Found by Track 1 inventory 2026-05-02 (Finding 2 / divergence #2).
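The retry endpoint slipped through because the F14 fix was applied path by path. A hypothetical regression check along these lines, run over the source files, would flag any remaining [:50000] slice; the function name and the exact pattern are assumptions, not part of this commit.

```python
import re
from typing import Dict, List, Tuple

# Matches "[:50000]" with optional whitespace, e.g. text[:50000] or full_text[ : 50000 ].
TRUNCATION = re.compile(r"\[\s*:\s*50000\s*\]")


def find_truncations(sources: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (filename, line number) for every line containing the old slice."""
    hits = []
    for filename, code in sources.items():
        for lineno, line in enumerate(code.splitlines(), start=1):
            if TRUNCATION.search(line):
                hits.append((filename, lineno))
    return hits
```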
NREM in the reframe is replay-and-consolidation of recently encoded
content. Excluding previously_retrieved sources turns NREM into
novelty-finding, which is Late REM's job. NREM should re-traverse
already-encoded content; that's what consolidation is.
The May 2 abort surfaced this: 52 sources had accumulated in the
exclusion list, all of them in NREM's similarity band for the
recurring research/fabrication/teaching query. The dreamer found
zero retrievable chunks not because the corpus was empty, but
because everything semantically aligned was excluded.
Late REM and Early REM keep the exclusion mechanism, since novelty is
their job. Session-scoped exclusion (nrem_high_sources flowing
into Early REM) is also preserved.
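The per-stage exclusion policy described above can be sketched as a single function. exclusions_for() is a hypothetical name, and the stage labels are borrowed from the dream_mode values; how dream.py actually threads these sets between stages is not shown.

```python
from typing import Set

KNOWN_STAGES = {"nrem", "early-rem", "late-rem"}


def exclusions_for(stage: str, previously_retrieved: Set[str],
                   nrem_high_sources: Set[str]) -> Set[str]:
    """Return the sources a stage must skip during retrieval."""
    if stage not in KNOWN_STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if stage == "nrem":
        # Consolidation re-traverses already-encoded content: exclude nothing.
        return set()
    # REM stages are novelty-seeking, so prior retrievals stay excluded.
    excluded = set(previously_retrieved)
    if stage == "early-rem":
        # Session-scoped hand-off: this session's high-similarity NREM sources.
        excluded |= set(nrem_high_sources)
    return excluded
```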
The 500/400 trim on retrieved_sources is preserved for the
remaining stages that still use it.
Mirrors stage2_worker v2.1 (da98019) resilience fixes:
- Absolute paths for /usr/bin/sudo and /bin/systemctl
- Log stdout/stderr when sidecar restart fails
- Reset consecutive_failures even when wedge recovery fails (prevents
permanent stuck state if restart itself is broken)
Three classes of silent failure converted to clean terminal states:
- Mistral timeout: previously left rows in zombie state (started_at set,
failed_at null, attempts incremented past retry threshold, row invisible
to selection query). Now sets failed_at with reason
'mistral_timeout_after_300s'. Surfaced 2026-05-01 when 17 documents
accumulated in this state during the Stage 3 saga deadlock incident.
- Mistral parse failure: run_mistral returns {'error': 'parse_failed'} on
JSON decode failure, but process_one never checked for it, so an empty orientation
('Active frames: . Frame relationships: ...') was shipped to Stage 3.
This is F22 from the 2026-04-30 code review. Now sets failed_at with
reason 'mistral_parse_failure'.
- Wedge recovery hammering: consecutive_failures was only reset on
successful Ollama restart. With the sudo path bug (also fixed here),
recovery always failed, so every subsequent failure re-attempted restart.
Now resets the counter regardless and logs the failure visibly.
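The first two conversions amount to routing both failure classes to a terminal state before anything reaches Stage 3. A minimal sketch, assuming rows are plain dicts, that the Mistral client surfaces its timeout as TimeoutError, and a hypothetical failure_reason column; the real worker writes to the database instead.

```python
from datetime import datetime, timezone
from typing import Callable

MISTRAL_TIMEOUT_S = 300


def mark_failed(row: dict, reason: str) -> dict:
    """Put the row in a terminal state so it cannot linger as a zombie."""
    row["failed_at"] = datetime.now(timezone.utc)
    row["failure_reason"] = reason  # hypothetical column name
    return row


def process_one(row: dict, run_mistral: Callable[[str], dict]) -> dict:
    try:
        result = run_mistral(row["text"])
    except TimeoutError:  # assumption: the client raises TimeoutError after 300 s
        return mark_failed(row, f"mistral_timeout_after_{MISTRAL_TIMEOUT_S}s")
    if result.get("error") == "parse_failed":
        # Never forward an empty orientation to Stage 3.
        return mark_failed(row, "mistral_parse_failure")
    row["orientation"] = result
    return row
```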
Also: subprocess.run now uses absolute paths (/usr/bin/sudo,
/bin/systemctl) instead of relying on PATH, fixing the 'No such file or
directory: sudo' error that broke Stage 2's recover_wedge() since
deployment. F45-adjacent — sudoers entries were added 2026-05-01 but the
PATH issue was masking that fix.
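The recovery path combines three fixes: absolute binary paths, logged stdout/stderr on failure, and an unconditional counter reset. A sketch under stated assumptions: WEDGE_THRESHOLD, the 60 s timeout, and record_failure() are hypothetical, and `run` is injectable only so the logic can be exercised without sudo.

```python
import logging
import subprocess
from typing import Callable

logger = logging.getLogger("stage2_worker")

# Absolute paths avoid depending on the service's PATH, which did not resolve sudo.
SUDO = "/usr/bin/sudo"
SYSTEMCTL = "/bin/systemctl"
WEDGE_THRESHOLD = 2  # hypothetical value


def recover_wedge(unit: str, run: Callable = subprocess.run) -> bool:
    """Restart a systemd unit via sudo; log stdout/stderr if it fails."""
    cmd = [SUDO, SYSTEMCTL, "restart", unit]
    try:
        proc = run(cmd, capture_output=True, text=True, timeout=60)  # timeout is an assumption
    except (OSError, subprocess.TimeoutExpired) as exc:
        logger.error("could not run restart for %s: %s", unit, exc)
        return False
    if proc.returncode != 0:
        logger.error("restart of %s failed (rc=%d) stdout=%r stderr=%r",
                     unit, proc.returncode, proc.stdout, proc.stderr)
        return False
    return True


def record_failure(consecutive_failures: int, unit: str,
                   run: Callable = subprocess.run) -> int:
    """Bump the counter; at the threshold, attempt recovery and reset either way."""
    consecutive_failures += 1
    if consecutive_failures >= WEDGE_THRESHOLD:
        recover_wedge(unit, run=run)
        # Reset even if recovery failed, so a broken restart path does not
        # trigger another restart attempt on every subsequent failure.
        consecutive_failures = 0
    return consecutive_failures
```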
Worker version bumped to 2.1 to match Stage 3's resilience patch level.
Production incident 2026-05-01: F14 re-cascade attempt surfaced three
compounding issues in cascade resilience.
stage3_worker.py changes:
- MAX_CHUNKS_PER_SAGA=10 — large documents split into multiple bulk
commits, all sharing the same saga tag for Graphiti document linking.
The original implementation sent all chunks as one saga; sagas of 17-19
chunks deadlocked the sidecar's Python-side coordination.
- recover_wedge() function — restarts aaronai-graphiti.service when
consecutive_failures hits threshold. Mirrors Stage 2 pattern.
- run() loop adds consecutive_failures counter with threshold-2
escalation. Resolves F28 + F29 from code review.
- Worker version bumped 2.0 -> 2.1.
- post_bulk() helper extracts shared HTTP POST + error handling.
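The MAX_CHUNKS_PER_SAGA split can be sketched as a generator of bulk payloads that all carry the same saga tag. The payload shape and the post_bulk callable signature are assumptions; the real sidecar request format is not described here.

```python
from typing import Callable, Iterator, List

MAX_CHUNKS_PER_SAGA = 10


def saga_batches(chunks: List[str], saga: str) -> Iterator[dict]:
    """Yield payloads of at most MAX_CHUNKS_PER_SAGA chunks, all tagged with one saga."""
    for start in range(0, len(chunks), MAX_CHUNKS_PER_SAGA):
        yield {"saga": saga, "chunks": chunks[start:start + MAX_CHUNKS_PER_SAGA]}


def cascade_document(chunks: List[str], saga: str,
                     post_bulk: Callable[[dict], None]) -> int:
    """Commit each batch separately; returns the number of bulk commits made."""
    commits = 0
    for payload in saga_batches(chunks, saga):
        post_bulk(payload)  # hypothetical signature; the real helper wraps the HTTP POST
        commits += 1
    return commits
```

A 19-chunk document that previously went out as one saga becomes two commits of 10 and 9 chunks under the same tag.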
Outside-repo changes (system config, separately documented):
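- WatchdogSec=600 commented out in the stage2 and stage3 systemd unit files.
Workers have no sd_notify support, so without keep-alive pings systemd
would mark each worker failed and terminate it after 600 s; per-request
timeouts in code handle the actual failure modes.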
- /etc/sudoers.d/aaron-aaronai created with NOPASSWD entries for
systemctl restart ollama and restart aaronai-graphiti.service.
Stage 2's existing recover_wedge() was silently broken since
deployment due to this gap.
.gitignore — added rules for *.bak files, runtime artifacts
(watcher_heartbeat, dreamer_state.json, corpus_integrity_report.json,
watcher_state.json, watcher_status.json), Python cruft, virtual env,
.env, editor/OS files, and Aaron AI runtime data (conversations.db,
sessions.db, memory.md, settings.json).
Untracked 11 files that shouldn't have been committed in 465f2f7
(this morning): backup files and runtime artifacts.
Re-cascading Shop Class (414KB) and BirdAI-Experiments-Log.md (192KB)
through the patched worker after re-extracting full text from disk.
Cascade in progress at commit time.
- api.py: strip CV pinning workaround (parity violation, see architecture doc)
- dream.py: F1 — retrieve_graphiti() now accepts excluded_sources, over-fetches
3x and filters in-process. Was silently dropping the parameter, which would have
confounded E3 with broken cross-stage exclusion in the Graphiti arm.
- watcher.py + ingest.py: F14 — drop full_text[:50000] truncation. Was
propagating through the entire cascade. Postgres TEXT can hold up to 1GB.
- corpus_integrity.py: F37 — same truncation, third path now clean.
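The F1 over-fetch-and-filter approach in retrieve_graphiti() works roughly as follows. retrieve_filtered(), the injected `search` callable, and the "source" key on each hit are hypothetical; only the 3x factor and in-process filtering come from the change itself.

```python
from typing import Callable, Iterable, List

OVERFETCH = 3


def retrieve_filtered(query: str, k: int, excluded_sources: Iterable[str],
                      search: Callable[[str, int], List[dict]]) -> List[dict]:
    """Over-fetch k * OVERFETCH hits, drop excluded sources in-process, keep k."""
    excluded = set(excluded_sources)
    if not excluded:
        return search(query, k)[:k]
    candidates = search(query, k * OVERFETCH)
    kept = [hit for hit in candidates if hit["source"] not in excluded]
    # May return fewer than k hits if most candidates were excluded.
    return kept[:k]
```

Over-fetching is a heuristic: if more than 2k of the 3k candidates come from excluded sources, fewer than k hits are returned.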
Backups: api.py.bak.*, dream.py.bak.*, watcher.py.bak.*, ingest.py.bak.*,
corpus_integrity.py.bak.* timestamped pre-fix.
Re-cascaded Shop Class as Soulcraft (the only already-cascaded source
affected by F14, 414KB).