Move persistent memory from the user message into system blocks with
cache_control: ephemeral on the last block. The static prefix (system prompt +
memory, ~3-5K tokens typically) is identical between the two LLM calls of a
tool_use round-trip and stable across turns within the 5-minute cache TTL.
Without this, the tool-call retrieval architecture roughly doubled input
token cost on retrieval-needed turns (full context billed twice). With cache
reads at ~10% of standard input, the duplication cost drops by ~90% — the
"twice as expensive" hit becomes "slightly more expensive plus tool overhead."
client_time stays in the user message (per-turn dynamic, should not be in the
cached prefix).
Removes classify_retrieval_intent and the type/folder filter parameters on
retrieve_context. The keyword classifier was the same anti-pattern as the
formatting-driven docx chunker: a heuristic that locks the user into specific
phrasings and fails silently on anything novel. A scope enum (personal /
library / conversations / memory) would have been the same heuristic in a
fancier wrapper — the categories themselves are mine, not Aaron's.
New shape: a retrieve_documents tool exposed to Claude. Tool takes a single
query argument; the model decides when to call it, what to search for, and
how many times per turn (multi-query falls out naturally for compound asks).
Pre-LLM retrieval is gone — memory still rides as ground truth in the prompt,
but corpus content is fetched on demand by the model with concrete queries
it crafts itself, not the user's raw phrasing.
retrieve_context is now pure: hybrid retrieval + cross-encoder rerank + dedup,
no filters. The reranker ranks, the model judges relevance. When ranking
fails (e.g. abstract instructional queries pulling philosophy books), the
right fix is a better reranker, not another query-time taxonomy. That work
is acknowledged but deferred.
System prompt updated to teach the model about the tool and to prefer
concrete tokens (named entities, project names, course codes) over abstract
phrasing when constructing search queries.
extract_blocks(filepath) is the new structured-extraction entry point, returning
list[{heading, text, kind}]. chunk_and_embed accepts either str (blind-chunk
back-compat) or list[dict] (one chunk per block, blind-split if oversize, heading
prepended for retrieval context and stored in metadata).
- pptx: one block per slide. Slide title becomes block heading; speaker notes
fold into the body. Image-only decks with title-only slides now produce
heading-only chunks instead of being recorded as extraction failures.
- docx: deliberately single-block (back-compat). Heading-style section detection
was implemented and rolled back: hand-formatted CVs are Normal-styled with
bold-as-heading, and tying chunk boundaries to formatting choices would lock
future-user into preserving those choices forever. Lexical + cross-encoder
retrieval already handles substring matching inside blind-chunked CVs.
- pdf/txt/md: unchanged (single block, blind chunking).
Recency tiebreak in retrieve_context: pull created_at into the SELECT, use it
as secondary sort key in _rerank so memory/journal snapshots prefer the latest
copy among near-duplicate content.
reindex_docx_pptx.py now accepts --ext=pptx,docx... so re-ingest can target a
subset; previous hardcoded delete regex would have wiped both even with a
single-ext target.
Three refinements to retrieve_context, all keyed off observed failures from
test_retrieval.py:
- Library/personal split. classify_retrieval_intent now returns
(type_filter, folder_exclude_prefixes). Biographical document intent excludes
Library/* so philosophy/cognition books stop crowding out CVs and dossiers
for queries like "write me a bio".
- Near-duplicate collapse. Multi-folder copies of the same file (e.g., several
Teaching Philosophy.pdf in different application folders) used to fill the
top-N with the same content. Dedup by first-300-chars hash after rerank.
- Folder in source citations. Surface metadata.folder alongside basename so
the LLM can disambiguate among 21 CV.docx variants and the user can see
which copy a citation refers to.
Also: bump hnsw.ef_search to 500 when a WHERE filter is present.
pgvector 0.6 doesn't iterate past its initial HNSW candidate list, so a
restrictive filter that excludes the nearest neighbors otherwise returns
empty.
Replaces pure-dense top-8 retrieval with a three-stage pipeline:
- BM25 (tsvector + websearch_to_tsquery) and dense (pgvector) in parallel,
fused with Reciprocal Rank Fusion
- Optional type filter driven by classify_retrieval_intent() so questions
about prior conversations don't pull documents and vice versa
- Cross-encoder rerank (ms-marco-MiniLM-L-6-v2) over RRF candidates before
taking final top-N
Also adds scripts/reindex_docx_pptx.py — one-off re-ingest used to recover
table/header/text-box content in docx and pptx after the 93c0d89 extractor
upgrade — and scripts/test_retrieval.py to exercise the new pipeline against
representative queries.
Schema: requires GIN index on to_tsvector('english', document) (already
created out-of-band via psql since Apache AGE in shared_preload_libraries
blocks ALTER TABLE on this database).
Empty transcripts and transcription failures previously
deleted the temp audio and returned without writing any
record to disk — violating parity-at-encode (raw content
is episodic context, not noise).
- Preserve audio in Journal/Media/YYYY-MM/ on all paths
(success, empty, failure) instead of unlinking.
- Write a markdown entry to Journal/Captures/ on failure
paths with status, audio_path, and error fields.
- Add status: saved to successful captures so frontmatter
is uniform across success and failure.
- Fire SSE capture_saved events on all terminal paths,
with status included.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-user personal app threat model is theft-of-device, not
stolen-cookie. 30-day idle re-prompts created friction without
proportional security benefit. Server TTL and client max-age
remain in sync via shared constant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- session_exists() now rejects rows older than 30 days,
matching the client cookie max-age.
- Opportunistic cleanup of expired rows on session_exists()
calls, preventing unbounded growth of sessions.db from
orphaned tokens (PWA reinstalls, manual cookie clears).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The messages table declares FOREIGN KEY (conversation_id) REFERENCES
conversations(id), but PRAGMA foreign_keys was never enabled — SQLite
defaults it to OFF per connection, and _connect() did not set it. Two
orphan rows existed in messages (conversation_id='test123' pointing at
a never-existing conversation; both rows from one ~11-second test event
on 2026-04-26).
Audit before changing the PRAGMA:
- All FOREIGN KEY declarations across both DBs (conversations.db,
sessions.db) accounted for via PRAGMA foreign_key_list on each
table. Only one FK exists: messages.conversation_id ->
conversations.id, ON DELETE NO ACTION.
- All tables enumerated via sqlite_master. Two tables in
conversations.db (conversations, messages); one in sessions.db
(sessions). No surprises.
- PRAGMA foreign_key_check confirmed exactly the 2 known orphans and
zero violations elsewhere.
Both delete paths in api.py (delete_conversation at :471, and
clear_all_conversations at :986) already delete from messages BEFORE
conversations, so cascade behavior was correct in code. The orphan
state was caused by a direct INSERT against a non-existent
conversation_id at chat-test time, which an unenforced FK silently
accepted. Turning the PRAGMA on prevents this class of bug at insert
time, not delete time — no delete-path code changes were needed.
Order of operations followed the constraint that orphan cleanup must
precede PRAGMA-on (SQLite would not retroactively delete orphans, but
foreign_key_check would surface them confusingly on any future
operation that touched the messages table):
1. DELETE FROM messages WHERE conversation_id NOT IN (SELECT id FROM
conversations) — removed the 2 known orphans.
2. Added PRAGMA foreign_keys=ON to _connect() so every connection
from _connect_conversations() and _connect_sessions() gets FK
enforcement (SQLite requires per-connection setting).
3. Restarted aaronai.service.
Verification:
- Smoke: GET /api/conversations and /api/conversations/{id}/messages
both return 200 with expected payloads against the live api.
- E2E single-delete: synthetic conversation + 2 messages inserted via
the api's _connect helper (FK on); DELETE /api/conversations/{id}
via the live endpoint removed both rows from both tables.
- Clear-all e2e: skipped on live DB (destructive) — code shape is
structurally identical to single-delete, no FK-relevant logic
difference.
- Load-bearing negative test: INSERT into messages with a
non-existent conversation_id via _connect_conversations() raised
sqlite3.IntegrityError("FOREIGN KEY constraint failed"). This is
what proves the PRAGMA actually took effect, not just that we set
it.
Final counts: 7 conversations, 290 messages (down from 292 by the 2
orphans cleaned up).
Note: an explicit BEGIN/COMMIT around the two-execute delete paths
was considered and skipped. SQLite's implicit-transactional default
already gives the atomicity needed; explicit transactions would be
clarity-only and belong in a separate commit.
Followup to 4204806 (WAL + index + backup.sh). The previous commit
deferred synchronous=NORMAL because it's a per-connection PRAGMA and
api.py has 16 sqlite3.connect() call sites — setting it once at init
would have applied to nothing afterwards.
Adds three helpers near the *_DB constants:
- _connect(path): inner; sets PRAGMA synchronous=NORMAL and uses
timeout=5.0 (5000ms busy_timeout) on every new connection.
- _connect_conversations(), _connect_sessions(): named wrappers so call
sites read explicitly.
Mechanical replacement at all 16 call sites: 4 sessions, 12 conversations.
No semantic change beyond the PRAGMA + busy_timeout — every site still
opens-then-closes, no held-open connections.
busy_timeout=5000ms is cheap insurance: under WAL with api.py as sole
writer, contention should be near-zero, but the backup.sh online-backup
path briefly holds a read lock on the source, and any future second
writer would otherwise hit SQLITE_BUSY immediately on contention.
Combined effect with WAL: per-write fsync count drops from ~2 to ~1
(WAL alone) further reduced by synchronous=NORMAL deferring fsyncs to
checkpoint boundaries. No durability loss for the use case (single
host, app crash tolerated, OS crash gives at most one lost transaction).
Not included: foreign_keys=ON. Audit found 2 orphan rows in messages
(conversation_id pointing to deleted conversations) and untested write
paths that could begin raising IntegrityError. Tracked as separate
followup: inspect orphans, identify the delete path that didn't
cascade, clean up, then enable enforcement and test chat delete flow
end-to-end.
Both databases ran with journal_mode=delete — every write rewrote the
rollback journal per transaction. WAL eliminates the journal-rewrite and
lets readers run without blocking writers.
Index on messages(conversation_id, timestamp DESC) is preventive — only
280 rows today, but the access pattern (load conversation history in
order) is exactly what a composite index serves, and we don't want to
re-revisit this when the table grows.
backup.sh updated in the same commit because WAL changes the on-disk
layout: a bare `cp` of just the .db file can miss recently-committed
transactions that still live in the -wal sidecar, and can race with
concurrent writes to produce a torn file. Switched to the SQLite Online
Backup API via python3 -c "...src.backup(dst)..." — same mechanism as
the sqlite3 CLI's `.backup` (which isn't installed on this host),
handles WAL correctly without forcing a checkpoint, and is non-locking
from the writer's perspective. Verified backup integrity_check returns
ok and row counts match.
Note: synchronous=NORMAL was considered but deferred — it's a
per-connection PRAGMA, and applying it correctly requires a connect
helper that wraps every sqlite3.connect() call site in api.py (~14
sites). Out of scope for this commit; tracked as a follow-up. WAL alone
delivers the journal-rewrite elimination and reader/writer concurrency
improvements; the additional fsync reduction from synchronous=NORMAL is
a smaller marginal win on top.
Confirmed via concurrency audit that api.py is the sole writer to both
databases. ingest_conversations.py and dream.py are read-only consumers
of conversations.db; nothing else touches sessions.db.
Three changes to reduce voice-note transcription latency on the VPS:
- Model: large-v3 -> distil-large-v3 (~6x faster, near-identical English
accuracy; language is already hardcoded "en").
- beam_size: 5 (default) -> 1 (~3-4x faster on clean audio).
- cpu_threads: 8 -> 4 (the box has 8 cores running api, dreamer, watcher,
nextcloud concurrently; ctranslate2's inter-op pool plus context switching
makes 4 effectively faster than 8 here).
Combined effect expected ~10-15x over prior config. No accuracy regression
expected for the voice-note use case (English, clean audio, domain terms
already supplied via initial_prompt).
Consolidates four extract paths and two extract-chunk-embed-write pipelines
into a single shared encoding module. Fixes the embedder lifecycle
divergence between watcher and /api/reindex (no more 200MB reload per
reindex click) and unifies failure tracking so /api/reindex failures now
surface in SettingsPanel "Ingest Health".
New files:
- scripts/encoding.py — extract_text, chunk_text, chunk_and_embed,
write_embeddings_batch
- scripts/failures.py — record_ingest_failure, resolve_ingest_failure
(shared by watcher.py and ingest.py)
Refactored:
- scripts/watcher.py — drops local extract/chunk/embed implementations
and CHUNK_SIZE/CHUNK_OVERLAP/SUPPORTED constants; imports from encoding
and failures. Now writes ingest_failures row on empty-text-extract
(was silent return 0).
- scripts/ingest.py — substantial rewrite. Exposes ingest_directory(folder,
embedder=None) for in-process invocation; CLI back-compat preserved via
ingest_folder wrapper. Module-level SentenceTransformer load removed.
- scripts/corpus_integrity.py — imports extract_text from encoding;
extract_text_for_retry function removed.
- scripts/api.py — /api/reindex rewritten with BackgroundTasks (uses
module-level embedder; no subprocess); new /api/reindex/status endpoint
reading ~/aaronai/reindex_status.json; /api/corpus/retry imports
extract_text from encoding; INGEST_SCRIPT constant removed (dead after
this refactor); 409 reentrance guard prevents double-click stomping.
Behavior changes:
- /api/reindex no longer subprocess.Popens; runs in FastAPI BackgroundTasks
threadpool, doesn't block API thread.
- /api/reindex no longer reloads SentenceTransformer on each click.
- /api/reindex failures newly write to ingest_failures (visible in
SettingsPanel "Ingest Health" — badge will jump on first reindex).
- New embeddings rows always have created_at = NOW() (canonical, server-side).
- New embeddings rows always include metadata.folder field (None when not
derivable).
- /api/reindex returns 409 on second click while a job is running.
- New /api/reindex/status endpoint for polling.
Existing 9,815 NULL created_at rows remain unchanged; backfill is a
separate decision if desired.
199 insertions, 256 deletions across 6 files (codebase shrinks net).
Found by Track 1 inventory 2026-05-02 (Finding 11 / cross-cutting F11).
Pre-commit verification: BackgroundTasks already imported, sys.path
resolves correctly via script-path semantics, static import clean.
- Fix /auth/check endpoint that referenced undefined SESSIONS
(Phase 1 finding — would NameError 500 on every call). Now uses
session_exists(token), the live session-validation mechanism
defined elsewhere in api.py.
- Remove unused DB_PATH ChromaDB-era constant (paired with the
ChromaDB directory deletion and aaronai-maintenance.service
removal earlier this session).
Found by Track 1 inventory 2026-05-02. Cross-repo verification of
share_time (third candidate from the original cleanup proposal)
revealed it is working stores-and-returns persistence rather than
dead code; share_time intentionally not modified.
Inventory document edits are committed separately under the docs/
tracking decision.
The dream_mode setting was defined in DEFAULT_SETTINGS and watched
by update_settings for reschedule, but run_dream_job never read it —
silently-ignored configuration.
Two changes:
1. DEFAULT_SETTINGS["dream_mode"] flipped from "nrem" to "pipeline".
The default was a latent regression vector: wiring up the setting
without changing the default would have silently switched all
default-config users from full-pipeline (current production
behavior) to NREM-only nightly runs.
2. run_dream_job reads dream_mode at fire-time, validates against
{"pipeline", "nrem", "early-rem", "late-rem"}, falls back to
pipeline with a warning on invalid values. Lucid intentionally
excluded — it is on-demand only by design and remains available
via CLI and /api/dreamer/run.
Nightly dream production behavior is unchanged for current users
(no settings.json key → default "pipeline" → no flag passed → same
as before). Users can now meaningfully change the nightly mode by
editing settings.json or via the SettingsPanel.
Found by Track 1 inventory 2026-05-02 (Finding 9 / divergence #9).
The F14 fix on 2026-05-01 removed text[:50000] truncation from
watcher.py, ingest.py, and corpus_integrity.py. The retry endpoint
in api.py was missed — clicking 'Retry' on an ingest-failed file
in the SettingsPanel re-introduced the exact truncation pattern
F14 was meant to eliminate.
Found by Track 1 inventory 2026-05-02 (Finding 2 / divergence #2).
- api.py: strip CV pinning workaround (parity violation, see architecture doc)
- dream.py: F1 — retrieve_graphiti() now accepts excluded_sources, over-fetches
3x and filters in-process. Was silently dropping the parameter; would have
confounded E3 with broken cross-stage exclusion in Graphiti arm.
- watcher.py + ingest.py: F14 — drop full_text[:50000] truncation. Was
propagating through entire cascade. Postgres TEXT can hold up to 1GB.
- corpus_integrity.py: F37 — same truncation, third path now clean.
Backups: api.py.bak.*, dream.py.bak.*, watcher.py.bak.*, ingest.py.bak.*,
corpus_integrity.py.bak.* timestamped pre-fix.
Re-cascaded Shop Class as Soulcraft (only already-cascaded source affected
by F14, 414KB).