dream_observation: drop the 'go quiet' rule from select_mode

The earlier behavior never went quiet: it dreamed every night, even when that meant repeating itself. The 'return None on null delta' rule was a synthesis-doc invention (the dreamer-design-spec.md I treated as authoritative is itself LLM-generated) that didn't match the actual desired UX. Aaron called this out.

The repetition problem the quiet rule was claimed to solve is already addressed in the retrieve layer:
- LLM-generated queries from the observation signal vary nightly
- MMR diversity prevents within-night cluster lock-in
- NREM bias toward under-processed chunks (low consolidation_count) ensures fresh material gets selected over recently-replayed material

So select_mode now always returns a mode. NREM is the default. Staleness still routes to Late REM at 3+ days for cross-domain variety. Journal entries still route to Early REM.
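A minimal sketch of the routing described above, for illustration only. `DreamMode`, `Observation`, `has_journal_entries`, and `stale_days` are hypothetical names, and checking journal entries before staleness is an assumption; the message only says both routes still exist.

```python
from dataclasses import dataclass
from enum import Enum


class DreamMode(Enum):
    NREM = "nrem"
    EARLY_REM = "early_rem"
    LATE_REM = "late_rem"


@dataclass
class Observation:
    # Hypothetical signal fields; the real observation object may differ.
    has_journal_entries: bool
    stale_days: int  # days without new material


STALENESS_THRESHOLD_DAYS = 3


def select_mode(obs: Observation) -> DreamMode:
    """Always return a mode; there is no 'go quiet' (None) branch."""
    if obs.has_journal_entries:
        return DreamMode.EARLY_REM
    if obs.stale_days >= STALENESS_THRESHOLD_DAYS:
        return DreamMode.LATE_REM
    # Default: NREM, whose retrieval bias toward low consolidation_count
    # is what keeps consecutive nights from replaying the same chunks.
    return DreamMode.NREM
```

The substantive change is the return type: `DreamMode` rather than `Optional[DreamMode]`, so callers no longer need a skip path.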
dream_observation: reorder select_mode so 3-day staleness wins over the quiet rule
2026-05-22 23:49:27 +00:00 · 2026-05-22 23:18:00 +00:00 · 2026-05-20 22:41:02 +00:00 · 2026-05-20 18:11:07 +00:00 · 2026-05-20 18:04:43 +00:00 · 2026-05-20 17:57:38 +00:00
22 changed files with 4047 additions and 215 deletions
@@ -8,6 +8,7 @@ dreamer_state.json
 corpus_integrity_report.json
 watcher_state.json
 watcher_status.json
+reindex_status.json

 # Logs (these belong in /var/log/)
 *.log
@@ -0,0 +1,105 @@
+# OCR install record — 2026-05-04
+
+## Machine
+
+- Host: aaronai-01 (VPS)
+- OS: Ubuntu 24.04 noble (kernel 6.8.0-110-generic, x86_64)
+
+## apt packages installed
+
+| package | version | source |
+|---|---|---|
+| tesseract-ocr | 5.3.4-1build5 | noble |
+| tesseract-ocr-eng | 1:4.1.0-2 | noble |
+| tesseract-ocr-osd | 1:4.1.0-2 | noble (automatic) |
+| libtesseract5 | 5.3.4-1build5 | noble (automatic) |
+
+## pip packages installed (into /home/aaron/aaronai/venv)
+
+| package | version |
+|---|---|
+| pytesseract | 0.3.13 |
+| ocrmypdf | 17.4.2 |
+
+Dependencies pulled in by the two installs above, direct and transitive (all new in venv): `pikepdf 10.5.1`, `pdfminer-six 20260107`, `pypdfium2 5.7.1`, `img2pdf 0.6.3`, `pi-heif 1.3.0`, `cryptography 47.0.0`, `cffi 2.0.0`, `pycparser 3.0`, `Deprecated 1.3.1`, `deprecation 2.1.0`, `defusedxml 0.7.1`, `fonttools 4.62.1`, `fpdf2 2.8.7`, `uharfbuzz 0.54.1`, `wrapt 2.1.2`, `pluggy 1.6.0`. `pillow` was already at 12.2.0.
+
+## Smoke test 1 — `tesseract --version`
+
+```
+tesseract 5.3.4
+ leptonica-1.82.0
+  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.5) : libpng 1.6.43 : libtiff 4.5.1 : zlib 1.3 : libwebp 1.3.2 : libopenjp2 2.5.0
+ Found AVX512BW
+ Found AVX512F
+```
+
+## Smoke test 2 — `tesseract --list-langs`
+
+```
+List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (2):
+eng
+osd
+```
+
+## Smoke test 3 — pytesseract on a slide image
+
+- Input pptx: `/home/aaron/nextcloud/data/data/aaron/files/Academic/DDF555 3D Computational/GH Slicer Notes.pptx`
+- Extracted image: `ppt/media/image1.PNG` (1768×504 PNG)
+- Wall-clock: 0.220s
+- Chars extracted: 126
+- First 200 chars:
+
+```
+Generates the Bounding Box for NESS
+
+round(x, 4), round(y, 4), round(z, 4), round(a, 4))
+
+Format ("HSS5 X(0} ¥(1} W(2} H(3)",
+```
+
+Note: the first image in `Renders.pptx` (image1.jpg, 640×480) returned 0 chars on the first attempt. A sample of 15 images from `Renders.pptx` showed all 15 to be rendered designs or photographs with no text, so the test switched to `GH Slicer Notes.pptx` (from the original candidate list of 4 image-only pptx files), whose image1.PNG is a screenshot of code. Tesseract behaved correctly in both cases; `Renders.pptx` is simply not a useful OCR test target because it contains no text. The code screenshot shows some character-recognition noise (e.g. `¥(1}` for `Y(1)`, and misread parentheses/braces), which is acceptable for a baseline smoke test; production tuning is a worker-design concern.
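For the worker, the extraction step used here can be sketched with the standard library alone, since a .pptx is a zip archive with slide images under `ppt/media/`. This is a sketch, not the code used for the smoke test; `list_media_images` and `ocr_pptx_images` are hypothetical names, and the OCR callable is injectable so the default `pytesseract.image_to_string` path is only imported when actually used.

```python
import io
import zipfile
from pathlib import Path
from typing import Callable, Optional

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff"}


def list_media_images(pptx_path: Path) -> list[str]:
    """Return the image members under ppt/media/ in a .pptx (a zip archive)."""
    with zipfile.ZipFile(pptx_path) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.startswith("ppt/media/") and Path(name).suffix.lower() in IMAGE_SUFFIXES
        )


def _default_ocr(data: bytes) -> str:
    # Imported lazily so the listing helper works without OCR deps installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(io.BytesIO(data)))


def ocr_pptx_images(
    pptx_path: Path, ocr: Optional[Callable[[bytes], str]] = None
) -> dict[str, str]:
    """OCR every media image; returns {member_name: extracted_text}."""
    run = ocr or _default_ocr
    results: dict[str, str] = {}
    with zipfile.ZipFile(pptx_path) as zf:
        for name in list_media_images(pptx_path):
            results[name] = run(zf.read(name))
    return results
```

An image whose text comes back empty or whitespace-only (as with `Renders.pptx` image1.jpg) is a candidate for exclusion: `[n for n, t in ocr_pptx_images(path).items() if not t.strip()]`.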
+
+## Smoke test 4 — ocrmypdf on a Lexmark CX510de scan
+
+- Input PDF: `/home/aaron/nextcloud/data/data/aaron/files/Admin/Dossier/Tenure/Dossier Scan 2022/image2022-01-07-133846 - CAryn.pdf` (4 pages, Producer: Lexmark CX510de, Creator: HardCopy)
+- Command: `ocrmypdf --skip-text -l eng <input> /tmp/ocr_smoke/caryn_ocred.pdf`
+- Wall-clock: 3.72s (whole PDF, 4 pages)
+- Exit: 0
+- After OCR, `pdftotext` on the output produced 2347 chars (2270 non-whitespace).
+- First 200 chars of OCR'd text:
+
+```
+nN New Paltz
+STATE UNIVERSITY OF NEW YORK
+
+The Honors Program
+
+May 30, 2017
+
+Dear Aaron,
+
+Thank you for serving as a reader for Caryn Byllott’s thesis on "Recall/Reconstruct: The Exploration of
+Memory
+```
+
+Real readable English. The "nN" header is the Lexmark logo glyph; otherwise clean. ~0.93s/page on this scan, which is the reference number for sizing the async worker queue.
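A sketch of how a worker might reproduce this measurement in-process. It shells out to the same `ocrmypdf` command rather than using ocrmypdf's Python API, and it assumes the page count is known up front (the hypothetical `pages` argument; `pikepdf`, which the install pulled into the venv, can read it). `ocrmypdf_command` and `run_ocrmypdf` are hypothetical names.

```python
import subprocess
import time
from pathlib import Path


def ocrmypdf_command(src: Path, dst: Path, lang: str = "eng") -> list[str]:
    """Build the same invocation used in smoke test 4."""
    # --skip-text leaves pages that already carry a text layer untouched.
    return ["ocrmypdf", "--skip-text", "-l", lang, str(src), str(dst)]


def run_ocrmypdf(src: Path, dst: Path, pages: int) -> tuple[int, float]:
    """Run ocrmypdf; return (exit_code, wall-clock seconds per page)."""
    start = time.monotonic()
    proc = subprocess.run(ocrmypdf_command(src, dst), capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return proc.returncode, elapsed / max(pages, 1)
```

On the scan above this would report roughly 3.72 s / 4 pages = 0.93 s per page.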
+
+## Reference timing
+
+| operation | input size | wall-clock |
+|---|---|---|
+| pytesseract single image | 1768×504 PNG | 0.22s |
+| ocrmypdf 4-page scan | 4 pages, ~A4 | 3.72s (~0.93s/page) |
+
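A back-of-envelope sizing helper built on the ~0.93 s/page figure. Page counts for the backlog are not in this record, so `page_counts` is a hypothetical input; the estimate also assumes OCR time scales linearly with pages and that workers parallelize perfectly, which CPU contention on a single VPS may not honor. `estimate_drain_seconds` is a hypothetical name.

```python
def estimate_drain_seconds(
    page_counts: list[int], sec_per_page: float = 0.93, workers: int = 1
) -> float:
    """Rough wall-clock to OCR a backlog, treating whole documents as the unit of work."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    total = sum(page_counts) * sec_per_page
    # Ideal parallel split, but never faster than the single longest document.
    longest = max(page_counts, default=0) * sec_per_page
    return max(total / workers, longest)
```

For example, pretending every one of the 75 backlog PDFs is 4 pages like the smoke-test scan, `estimate_drain_seconds([4] * 75)` comes to about 279 s on one worker.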
+## Deferred — project dep-tracking
+
+The project has no dependency manifest on disk: no `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile`, or `poetry.lock`. Pip deps live only in `venv/`. The OCR install adds `pytesseract` and `ocrmypdf` (plus their transitive closure listed above) to that untracked venv state.
+
+This commit does not introduce a manifest. Tracking the dep-manifest decision as its own followup; the natural deadline is the capture-path integration commit, where `import pytesseract` will become load-bearing in the repo. If the manifest question is unresolved by then, that integration commit is the right place to address it.
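Until a manifest exists, a small check can at least detect drift between the venv and this record. A sketch using only `importlib.metadata` from the standard library; `RECORDED` is transcribed from the pip table above, `venv_drift` is a hypothetical name, and none of this substitutes for a real lock file.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions copied from the pip table in this install record.
RECORDED = {"pytesseract": "0.3.13", "ocrmypdf": "17.4.2"}


def venv_drift(expected: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    drift: dict[str, tuple[str, Optional[str]]] = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = version(name)
        except PackageNotFoundError:
            have = None
        if have != want:
            drift[name] = (want, have)
    return drift
```

Run inside the project venv, an empty result from `venv_drift(RECORDED)` means the two OCR packages still match this record.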
+
+## Followups
+
+- Async OCR worker (separate session). Use the reference timing above to size the queue.
+- Capture path integration: phone-camera images → `pytesseract.image_to_string` → existing chunk/embed pipeline.
+- Backlog processing of 75 scanned PDFs (Lexmark CX510de and similar) and the 4 image-only pptx (`Renders.pptx`, `Ribbon Cutting Slideshow.pptx`, two `GH Slicer Notes` variants). Per the smoke results, `Renders.pptx` is unlikely to yield useful OCR text — it is rendered-design content, not scanned documents — and may instead need exclusion rather than processing.
+- Project dep-manifest decision (see Deferred section above).
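A minimal shape for the async OCR worker, assuming jobs are file paths and OCR is a blocking callable (for example one that wraps `pytesseract.image_to_string` or shells out to the `ocrmypdf` command from smoke test 4). `drain_ocr_queue` is a hypothetical name; retries, persistence, and error handling are left out, so an exception in any job propagates out of `gather`.

```python
import asyncio
from typing import Callable, Iterable


async def drain_ocr_queue(
    paths: Iterable[str],
    ocr: Callable[[str], str],
    workers: int = 2,
) -> dict[str, str]:
    """OCR every path with a fixed pool of workers; blocking OCR runs in threads."""
    queue: asyncio.Queue[str] = asyncio.Queue()
    for p in paths:
        queue.put_nowait(p)
    results: dict[str, str] = {}

    async def worker() -> None:
        while True:
            try:
                path = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # queue drained; this worker exits
            # to_thread keeps the event loop responsive during CPU-bound OCR.
            results[path] = await asyncio.to_thread(ocr, path)

    await asyncio.gather(*(worker() for _ in range(workers)))
    return results
```

The `workers` count is the knob the reference timing is meant to size; on a single VPS it should stay near the number of cores tesseract can actually use.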
@@ -0,0 +1,4 @@
+# Local backups created by apply.sh — environment state, not source.
+# Keeping these out of version control prevents repo bloat and avoids
+# checking in graphiti-core's Apache-2.0 source under our repo's tree.
+backups/
@@ -0,0 +1,58 @@
+# graphiti-core Patches — FalkorDB Vector Index Support
+
+Vendored patches against graphiti-core 0.29.0 adding native FalkorDB
+vector index support. Three files modified: `graphiti_core/graph_queries.py`,
+`graphiti_core/driver/falkordb_driver.py`, and
+`graphiti_core/driver/falkordb/operations/search_ops.py`.
+No changes to Neo4j or Kuzu code paths.
+
+## Why this exists
+
+graphiti-core's FalkorDB driver uses interpreted Cypher cosine math
+(`vec.cosineDistance(...)`) for similarity search, so each query becomes
+a full scan over Entity nodes, RELATES_TO edges, or Community nodes. At
+around 4,000 entities, the resolve-against-existing-graph step of
+single-episode ingest takes 8+ minutes, and bulk ingest hangs FalkorDB.
+FalkorDB itself supports the `db.idx.vector.queryNodes` and
+`db.idx.vector.queryRelationships` procedures, backed by HNSW indexes;
+graphiti-core's driver doesn't use them.
+
+These patches:
+
+1. Add `get_vector_indices()` to `graph_queries.py` returning CREATE
+   VECTOR INDEX statements for FalkorDB on Entity.name_embedding,
+   RELATES_TO.fact_embedding, and Community.name_embedding.
+2. Extend `falkordb_driver.py:build_indices_and_constraints()` to create
+   the vector indexes alongside range and fulltext indexes.
+3. Rewrite the three vector-similarity call sites in
+   `falkordb/operations/search_ops.py` to use
+   `db.idx.vector.queryNodes` and `db.idx.vector.queryRelationships`
+   instead of full-scan cosine math. Over-fetches by a configurable
+   multiplier to handle filter rejections.
+
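Why item 3 over-fetches can be illustrated in plain Python. This is a sketch of the idea, not the Cypher the patch emits: the index returns the `candidate_k = limit * multiplier` nearest hits, the WHERE filters (group_id, score threshold) run afterwards, and only then is the result cut to `limit`. The multiplier default here is hypothetical; the real value is `VECTOR_INDEX_CANDIDATE_MULTIPLIER` in the patched `graph_queries.py`.

```python
from typing import Callable, NamedTuple


class Hit(NamedTuple):
    uuid: str
    group_id: str
    score: float  # similarity, higher is better


def top_k_with_overfetch(
    ranked: list[Hit],  # what the vector index would return, best first
    keep: Callable[[Hit], bool],  # stands in for the caller's WHERE filters
    limit: int,
    min_score: float,
    multiplier: int = 4,  # hypothetical; stands in for VECTOR_INDEX_CANDIDATE_MULTIPLIER
) -> list[Hit]:
    candidates = ranked[: limit * multiplier]  # the index's top candidate_k
    passed = [h for h in candidates if keep(h) and h.score > min_score]
    return sorted(passed, key=lambda h: h.score, reverse=True)[:limit]
```

If the first `limit` hits all belong to another group, a multiplier of 1 returns nothing even though matching hits sit further down the ranking; that is the recall loss the over-fetch guards against. It narrows the gap rather than closing it: a filter that rejects more than `candidate_k - limit` candidates can still under-fill the result.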
+## Files
+
+| Patched file | Change |
+|---|---|
+| `graphiti_core/graph_queries.py` | Adds `get_vector_indices()` |
+| `graphiti_core/driver/falkordb_driver.py` | Extends `build_indices_and_constraints` |
+| `graphiti_core/driver/falkordb/operations/search_ops.py` | Three query rewrites |
+
+## How to apply
+
+`./apply.sh` — backs up the originals into `./backups/<timestamp>/`
+and copies the patched files over.
+
+## How to revert
+
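+Copy the timestamped backup tree back over the venv (`apply.sh` mirrors
+the `graphiti_core/` layout under each backup), then restart the sidecar:
+
+    cp -r backups/<ts>/graphiti_core/. /home/aaron/aaronai/venv/lib/python3.12/site-packages/graphiti_core/
+    sudo systemctl restart aaronai-graphiti.service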
+
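The revert can also be scripted. A sketch, assuming the backup directories keep `apply.sh`'s `%Y%m%d-%H%M%S` naming (which sorts chronologically as plain strings) and its `backups/<timestamp>/graphiti_core/` layout; `latest_backup` and `restore_latest` are hypothetical helpers, not part of the patch set.

```python
import shutil
from pathlib import Path


def latest_backup(backups_dir: Path) -> Path:
    """Return the newest backups/<timestamp>/ directory."""
    stamps = sorted(p for p in backups_dir.iterdir() if p.is_dir())
    if not stamps:
        raise FileNotFoundError(f"no backups under {backups_dir}")
    return stamps[-1]


def restore_latest(backups_dir: Path, site_packages: Path) -> list[Path]:
    """Copy the newest backed-up graphiti_core tree back over the venv copy."""
    src = latest_backup(backups_dir) / "graphiti_core"
    dst = site_packages / "graphiti_core"
    restored: list[Path] = []
    for f in sorted(src.rglob("*")):
        if f.is_file():
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # overwrite the patched file, keep metadata
            restored.append(target)
    return restored
```

As with `apply.sh`, the sidecar needs a restart afterwards to pick up the restored modules.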
+## Upstream candidate
+
+This is a documented gap: issue #1263 references it indirectly via the
+vector-store overlay RFC. Maintainers' attention is on the Milvus/external
+vector DB overlay; this patch is the FalkorDB-native alternative for users
+who don't want a separate vector DB. Consider opening a PR after empirical
+validation in production.
@@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+# apply.sh — Apply the BirdAI vendored graphiti-core patches.
+#
+# Backs up the original venv files into ./backups/<timestamp>/ before
+# overwriting. The backup directory layout mirrors the venv layout so a
+# revert is just a tree copy back.
+#
+# Usage: ./apply.sh
+
+set -euo pipefail
+
+PATCH_DIR="$(cd "$(dirname "$0")" && pwd)"
+VENV_BASE="/home/aaron/aaronai/venv/lib/python3.12/site-packages"
+TIMESTAMP="$(date +%Y%m%d-%H%M%S)"
+BACKUP_DIR="$PATCH_DIR/backups/$TIMESTAMP"
+
+# Files to patch — paths relative to graphiti_core/.
+FILES=(
+    "graph_queries.py"
+    "driver/falkordb_driver.py"
+    "driver/falkordb/operations/search_ops.py"
+)
+
+echo "graphiti-core vendored patch apply — BirdAI"
+echo "Patch directory: $PATCH_DIR"
+echo "Venv target:     $VENV_BASE/graphiti_core/"
+echo "Backup to:       $BACKUP_DIR"
+echo
+
+# Pre-flight: confirm all source patch files exist.
+for rel in "${FILES[@]}"; do
+    if [ ! -f "$PATCH_DIR/graphiti_core/$rel" ]; then
+        echo "ERROR: missing patch file: $PATCH_DIR/graphiti_core/$rel" >&2
+        exit 1
+    fi
+done
+
+# Pre-flight: confirm all target venv files exist.
+for rel in "${FILES[@]}"; do
+    if [ ! -f "$VENV_BASE/graphiti_core/$rel" ]; then
+        echo "ERROR: missing venv file: $VENV_BASE/graphiti_core/$rel" >&2
+        echo "  graphiti-core may not be installed, or version differs from 0.29.0." >&2
+        exit 1
+    fi
+done
+
+# Backup originals.
+echo "[1/3] Backing up originals..."
+for rel in "${FILES[@]}"; do
+    backup_path="$BACKUP_DIR/graphiti_core/$rel"
+    mkdir -p "$(dirname "$backup_path")"
+    cp "$VENV_BASE/graphiti_core/$rel" "$backup_path"
+    echo "  backed up: $rel"
+done
+echo
+
+# Apply patches by copying.
+echo "[2/3] Applying patches..."
+for rel in "${FILES[@]}"; do
+    cp "$PATCH_DIR/graphiti_core/$rel" "$VENV_BASE/graphiti_core/$rel"
+    echo "  patched: $rel"
+done
+echo
+
+# Sanity check: confirm patched files have the marker.
+echo "[3/3] Verifying patched files..."
+for rel in "${FILES[@]}"; do
+    if grep -q "PATCHED 2026-05-02" "$VENV_BASE/graphiti_core/$rel"; then
+        echo "  OK: $rel contains patch marker"
+    else
+        echo "  WARNING: $rel missing patch marker (may be expected for graph_queries.py — its docstring uses the marker only in the module header)"
+    fi
+done
+echo
+echo "Done. Backup: $BACKUP_DIR"
+echo "Restart the sidecar to pick up changes:"
+echo "  sudo systemctl restart aaronai-graphiti.service"
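`apply.sh` only checks that the target files exist, yet it overwrites them wholesale with copies derived from 0.29.0, so running it against another graphiti-core release would silently swap those modules for 0.29.0-based ones. A possible pre-flight guard, sketched in Python; it assumes the distribution name is `graphiti-core` and that an exact version match is the right policy. `check_graphiti_version` is a hypothetical name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable

PATCHED_AGAINST = "0.29.0"


def check_graphiti_version(get_version: Callable[[str], str] = version) -> str:
    """Raise SystemExit unless the installed graphiti-core is the patched-against version."""
    try:
        installed = get_version("graphiti-core")
    except PackageNotFoundError as e:
        raise SystemExit("graphiti-core is not installed in this environment") from e
    if installed != PATCHED_AGAINST:
        raise SystemExit(
            f"graphiti-core {installed} installed; patches target {PATCHED_AGAINST}"
        )
    return installed
```

Run with the venv's interpreter before the backup step, this would turn the "version differs from 0.29.0" hint in the pre-flight error message into an enforced check.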
@@ -0,0 +1,904 @@
+"""
+Copyright 2024, Zep Software, Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import logging
+from typing import Any
+
+from graphiti_core.driver.driver import GraphProvider
+from graphiti_core.driver.falkordb import STOPWORDS
+from graphiti_core.driver.operations.search_ops import SearchOperations
+from graphiti_core.driver.query_executor import QueryExecutor
+from graphiti_core.driver.record_parsers import (
+    community_node_from_record,
+    entity_edge_from_record,
+    entity_node_from_record,
+    episodic_node_from_record,
+)
+from graphiti_core.edges import EntityEdge
+from graphiti_core.graph_queries import (
+    get_nodes_query,
+    get_relationships_query,
+    get_vector_cosine_func_query,
+)
+from graphiti_core.models.edges.edge_db_queries import get_entity_edge_return_query
+from graphiti_core.models.nodes.node_db_queries import (
+    COMMUNITY_NODE_RETURN,
+    EPISODIC_NODE_RETURN,
+    get_entity_node_return_query,
+)
+from graphiti_core.nodes import CommunityNode, EntityNode, EpisodicNode
+from graphiti_core.search.search_filters import (
+    SearchFilters,
+    edge_search_filter_query_constructor,
+    node_search_filter_query_constructor,
+)
+
+logger = logging.getLogger(__name__)
+
+MAX_QUERY_LENGTH = 128
+
+# ---------------------------------------------------------------------------
+# Vector index dispatcher (PATCHED 2026-05-02, BirdAI vendored patch).
+#
+# graphiti-core's FalkorDB driver historically composed similarity queries
+# using `vec.cosineDistance(...)` in interpreted Cypher, which produces a
+# full-table scan for every search. FalkorDB supports native vector indexes
+# via `db.idx.vector.queryNodes` and `db.idx.vector.queryRelationships`;
+# this dispatcher uses them when present and falls back to the cosine math
+# otherwise.
+#
+# Index existence is checked once per (label, attribute, entity_type) and
+# cached at module scope. The cache should be invalidated whenever
+# `build_indices_and_constraints` runs (since indexes may have been created
+# or dropped). FalkorDriver.build_indices_and_constraints is patched to
+# call `_invalidate_falkordb_vector_index_cache()` after building.
+#
+# Over-fetch factor (VECTOR_INDEX_CANDIDATE_MULTIPLIER from graph_queries)
+# preserves recall when WHERE filters reject some of the top-k candidates.
+# ---------------------------------------------------------------------------
+
+from graphiti_core.graph_queries import VECTOR_INDEX_CANDIDATE_MULTIPLIER  # noqa: E402
+
+# Cache: key = (label, attribute, entity_type), value = bool
+# entity_type is 'NODE' or 'RELATIONSHIP'.
+_FALKORDB_VECTOR_INDEX_CACHE: dict[tuple[str, str, str], bool] = {}
+
+
+def _invalidate_falkordb_vector_index_cache() -> None:
+    """Clear the vector-index existence cache. Call after build_indices_and_constraints."""
+    _FALKORDB_VECTOR_INDEX_CACHE.clear()
+
+
+async def _falkordb_vector_index_exists(
+    executor: QueryExecutor,
+    label: str,
+    attribute: str,
+    entity_type: str,
+) -> bool:
+    """Check whether a FalkorDB vector index exists for the given target.
+
+    entity_type is 'NODE' for node-label indexes, 'RELATIONSHIP' for edge-type indexes.
+    Result is cached at module scope; call _invalidate_falkordb_vector_index_cache()
+    after building or dropping indexes.
+    """
+    key = (label, attribute, entity_type)
+    if key in _FALKORDB_VECTOR_INDEX_CACHE:
+        return _FALKORDB_VECTOR_INDEX_CACHE[key]
+
+    try:
+        records, _, _ = await executor.execute_query(
+            "CALL db.indexes() YIELD label, properties, types, entitytype "
+            "RETURN label, properties, types, entitytype"
+        )
+    except Exception as e:
+        # If we cannot enumerate indexes, fall back to "no index" rather than
+        # propagating the error. The fallback cosine-math path is correct,
+        # just slower.
+        logger.warning(f"FalkorDB vector index probe failed; assuming none exist: {e}")
+        _FALKORDB_VECTOR_INDEX_CACHE[key] = False
+        return False
+
+    found = False
+    for r in records:
+        # Records come back as dict-like rows keyed by column name (not
+        # tuples). Access by string keys matching the YIELD clause above.
+        rec_label = r.get('label') if hasattr(r, 'get') else r['label']
+        rec_props = r.get('properties') if hasattr(r, 'get') else r['properties']
+        rec_types = r.get('types') if hasattr(r, 'get') else r['types']
+        rec_entitytype = r.get('entitytype') if hasattr(r, 'get') else r['entitytype']
+        if rec_props is None:
+            rec_props = []
+        if rec_types is None:
+            rec_types = {}
+
+        if rec_label != label:
+            continue
+        if rec_entitytype is not None and rec_entitytype != entity_type:
+            continue
+        if attribute not in rec_props:
+            continue
+
+        # rec_types is a dict like {attribute: ['VECTOR', ...], ...} or sometimes
+        # a flat list — handle both shapes.
+        if isinstance(rec_types, dict):
+            attr_types = rec_types.get(attribute, [])
+        else:
+            attr_types = rec_types
+        if 'VECTOR' in attr_types:
+            found = True
+            break
+
+    _FALKORDB_VECTOR_INDEX_CACHE[key] = found
+    return found
+
+
+def _falkordb_vector_node_search_cypher(
+    label: str,
+    embedding_attr: str,
+    search_vector_param: str,
+    use_index: bool,
+) -> tuple[str, str]:
+    """Build the cypher prefix and node-binding for a node-vector search.
+
+    Returns (prefix, node_var) where:
+      - prefix is the Cypher fragment that binds the node variable and a
+        `score` variable. With index, it's a CALL ... YIELD; without, it's
+        a MATCH plus WITH cosine math.
+      - node_var is the variable name the caller's downstream Cypher should
+        reference (always 'n' here for parity with the existing code).
+
+    The caller appends WHERE filters and RETURN/ORDER BY/LIMIT as usual.
+    The over-fetch parameter `$candidate_k` must be passed by the caller
+    when use_index is True.
+    """
+    if use_index:
+        return (
+            f"CALL db.idx.vector.queryNodes("
+            f"'{label}', '{embedding_attr}', $candidate_k, vecf32({search_vector_param})"
+            f") YIELD node, score "
+            f"WITH node AS n, score "
+        ), "n"
+    # Fallback: original cosine math path
+    cosine = get_vector_cosine_func_query(
+        f"n.{embedding_attr}", search_vector_param, GraphProvider.FALKORDB
+    )
+    return (
+        f"MATCH (n:{label}) "
+        f"WITH n, {cosine} AS score "
+    ), "n"
+
+
+def _falkordb_vector_edge_search_cypher(
+    relationship_type: str,
+    embedding_attr: str,
+    search_vector_param: str,
+    use_index: bool,
+) -> tuple[str, str]:
+    """Build the cypher prefix and edge-binding for an edge-vector search.
+
+    Returns (prefix, edge_var). With the index, the procedure binds the
+    relationship variable; we then MATCH source and target via the existing
+    edge to recover (n)-[e]->(m). Without the index, it's the original
+    MATCH-and-cosine path.
+
+    Variable name is 'e' for parity with existing code; source/target are
+    'n' and 'm' respectively, also for parity.
+    """
+    if use_index:
+        return (
+            f"CALL db.idx.vector.queryRelationships("
+            f"'{relationship_type}', '{embedding_attr}', $candidate_k, vecf32({search_vector_param})"
+            f") YIELD relationship, score "
+            f"MATCH (n:Entity)-[e:{relationship_type}]->(m:Entity) "
+            f"WHERE e = relationship "
+            f"WITH DISTINCT e, n, m, score "
+        ), "e"
+    # Fallback
+    cosine = get_vector_cosine_func_query(
+        f"e.{embedding_attr}", search_vector_param, GraphProvider.FALKORDB
+    )
+    return (
+        f"MATCH (n:Entity)-[e:{relationship_type}]->(m:Entity) "
+        f"WITH DISTINCT e, n, m, {cosine} AS score "
+    ), "e"
+
+
+
+# FalkorDB separator characters that break text into tokens
+_SEPARATOR_MAP = str.maketrans(
+    {
+        ',': ' ',
+        '.': ' ',
+        '<': ' ',
+        '>': ' ',
+        '{': ' ',
+        '}': ' ',
+        '[': ' ',
+        ']': ' ',
+        '"': ' ',
+        "'": ' ',
+        ':': ' ',
+        ';': ' ',
+        '!': ' ',
+        '@': ' ',
+        '#': ' ',
+        '$': ' ',
+        '%': ' ',
+        '^': ' ',
+        '&': ' ',
+        '*': ' ',
+        '(': ' ',
+        ')': ' ',
+        '-': ' ',
+        '+': ' ',
+        '=': ' ',
+        '~': ' ',
+        '?': ' ',
+        '|': ' ',
+        '/': ' ',
+        '\\': ' ',
+    }
+)
+
+
+def _sanitize(query: str) -> str:
+    """Replace FalkorDB special characters with whitespace."""
+    sanitized = query.translate(_SEPARATOR_MAP)
+    return ' '.join(sanitized.split())
+
+
+def _build_falkor_fulltext_query(
+    query: str,
+    group_ids: list[str] | None = None,
+    max_query_length: int = MAX_QUERY_LENGTH,
+) -> str:
+    """Build a fulltext query string for FalkorDB using RedisSearch syntax."""
+    if group_ids is None or len(group_ids) == 0:
+        group_filter = ''
+    else:
+        escaped_group_ids = [f'"{gid}"' for gid in group_ids]
+        group_values = '|'.join(escaped_group_ids)
+        group_filter = f'(@group_id:{group_values})'
+
+    sanitized_query = _sanitize(query)
+
+    # Remove stopwords and empty tokens
+    query_words = sanitized_query.split()
+    filtered_words = [word for word in query_words if word and word.lower() not in STOPWORDS]
+    sanitized_query = ' | '.join(filtered_words)
+
+    if len(sanitized_query.split(' ')) + len(group_ids or '') >= max_query_length:
+        return ''
+
+    full_query = group_filter + ' (' + sanitized_query + ')'
+    return full_query
+
+
+class FalkorSearchOperations(SearchOperations):
+    # --- Node search ---
+
+    async def node_fulltext_search(
+        self,
+        executor: QueryExecutor,
+        query: str,
+        search_filter: SearchFilters,
+        group_ids: list[str] | None = None,
+        limit: int = 10,
+    ) -> list[EntityNode]:
+        fuzzy_query = _build_falkor_fulltext_query(query, group_ids)
+        if fuzzy_query == '':
+            return []
+
+        filter_queries, filter_params = node_search_filter_query_constructor(
+            search_filter, GraphProvider.FALKORDB
+        )
+
+        if group_ids is not None:
+            filter_queries.append('n.group_id IN $group_ids')
+            filter_params['group_ids'] = group_ids
+
+        filter_query = ''
+        if filter_queries:
+            filter_query = ' WHERE ' + (' AND '.join(filter_queries))
+
+        cypher = (
+            get_nodes_query(
+                'node_name_and_summary', '$query', limit=limit, provider=GraphProvider.FALKORDB
+            )
+            + 'YIELD node AS n, score'
+            + filter_query
+            + """
+            WITH n, score
+            ORDER BY score DESC
+            LIMIT $limit
+            RETURN
+            """
+            + get_entity_node_return_query(GraphProvider.FALKORDB)
+        )
+
+        records, _, _ = await executor.execute_query(
+            cypher,
+            query=fuzzy_query,
+            limit=limit,
+            **filter_params,
+        )
+
+        return [entity_node_from_record(r) for r in records]
+
+    async def node_similarity_search(
+        self,
+        executor: QueryExecutor,
+        search_vector: list[float],
+        search_filter: SearchFilters,
+        group_ids: list[str] | None = None,
+        limit: int = 10,
+        min_score: float = 0.6,
+    ) -> list[EntityNode]:
+        filter_queries, filter_params = node_search_filter_query_constructor(
+            search_filter, GraphProvider.FALKORDB
+        )
+
+        if group_ids is not None:
+            filter_queries.append('n.group_id IN $group_ids')
+            filter_params['group_ids'] = group_ids
+
+        filter_query = ''
+        if filter_queries:
+            filter_query = ' WHERE ' + (' AND '.join(filter_queries))
+
+        # PATCHED 2026-05-02 (BirdAI vendored patch): use FalkorDB native vector
+        # index when available; fall back to interpreted-Cypher cosine math
+        # otherwise. In both paths the prefix ends in a WITH clause that binds
+        # `n` and `score`, so a single WHERE (caller filters AND the score
+        # threshold) attaches after it, and the filter expressions are
+        # identical because they reference the bound variable `n` either way.
+        use_index = await _falkordb_vector_index_exists(
+            executor, 'Entity', 'name_embedding', 'NODE'
+        )
+        prefix, _ = _falkordb_vector_node_search_cypher(
+            'Entity', 'name_embedding', '$search_vector', use_index
+        )
+        where_clauses = []
+        if filter_query:
+            where_clauses.append(filter_query.replace(' WHERE ', '', 1).strip())
+        where_clauses.append('score > $min_score')
+        unified_where = ' WHERE ' + ' AND '.join(where_clauses)
+
+        cypher = (
+            prefix
+            + unified_where
+            + """
+            RETURN
+            """
+            + get_entity_node_return_query(GraphProvider.FALKORDB)
+            + """
+            ORDER BY score DESC
+            LIMIT $limit
+            """
+        )
+        params = dict(
+            search_vector=search_vector,
+            limit=limit,
+            min_score=min_score,
+            **filter_params,
+        )
+        if use_index:
+            params['candidate_k'] = limit * VECTOR_INDEX_CANDIDATE_MULTIPLIER
+        records, _, _ = await executor.execute_query(cypher, **params)
+
+        return [entity_node_from_record(r) for r in records]
+
+    async def node_bfs_search(
+        self,
+        executor: QueryExecutor,
+        origin_uuids: list[str],
+        search_filter: SearchFilters,
+        max_depth: int,
+        group_ids: list[str] | None = None,
+        limit: int = 10,
+    ) -> list[EntityNode]:
+        if not origin_uuids or max_depth < 1:
+            return []
+
+        filter_queries, filter_params = node_search_filter_query_constructor(
+            search_filter, GraphProvider.FALKORDB
+        )
+
+        if group_ids is not None:
+            filter_queries.append('n.group_id IN $group_ids')
+            filter_queries.append('origin.group_id IN $group_ids')
+            filter_params['group_ids'] = group_ids
+
+        filter_query = ''
+        if filter_queries:
+            filter_query = ' AND ' + (' AND '.join(filter_queries))
+
+        cypher = (
+            f"""
+            UNWIND $bfs_origin_node_uuids AS origin_uuid
+            MATCH (origin {{uuid: origin_uuid}})-[:RELATES_TO|MENTIONS*1..{max_depth}]->(n:Entity)
+            WHERE n.group_id = origin.group_id
+            """
+            + filter_query
+            + """
+            RETURN
+            """
+            + get_entity_node_return_query(GraphProvider.FALKORDB)
+            + """
+            LIMIT $limit
+            """
+        )
+
+        records, _, _ = await executor.execute_query(
+            cypher,
+            bfs_origin_node_uuids=origin_uuids,
+            limit=limit,
+            **filter_params,
+        )
+
+        return [entity_node_from_record(r) for r in records]
+
+    # --- Edge search ---
+
+    async def edge_fulltext_search(
+        self,
+        executor: QueryExecutor,
+        query: str,
+        search_filter: SearchFilters,
+        group_ids: list[str] | None = None,
+        limit: int = 10,
+    ) -> list[EntityEdge]:
+        fuzzy_query = _build_falkor_fulltext_query(query, group_ids)
+        if fuzzy_query == '':
+            return []
+
+        filter_queries, filter_params = edge_search_filter_query_constructor(
+            search_filter, GraphProvider.FALKORDB
+        )
+
+        if group_ids is not None:
+            filter_queries.append('e.group_id IN $group_ids')
+            filter_params['group_ids'] = group_ids
+
+        filter_query = ''
+        if filter_queries:
+            filter_query = ' WHERE ' + (' AND '.join(filter_queries))
+
+        cypher = (
+            get_relationships_query(
+                'edge_name_and_fact', limit=limit, provider=GraphProvider.FALKORDB
+            )
+            + """
+            YIELD relationship AS rel, score
+            MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity)
+            """
+            + filter_query
+            + """
+            WITH e, score, n, m
+            RETURN
+            """
+            + get_entity_edge_return_query(GraphProvider.FALKORDB)
+            + """
+            ORDER BY score DESC
+            LIMIT $limit
+            """
+        )
+
+        records, _, _ = await executor.execute_query(
+            cypher,
+            query=fuzzy_query,
+            limit=limit,
+            **filter_params,
+        )
+
+        return [entity_edge_from_record(r) for r in records]
+
+    async def edge_similarity_search(
+        self,
+        executor: QueryExecutor,
+        search_vector: list[float],
+        source_node_uuid: str | None,
+        target_node_uuid: str | None,
+        search_filter: SearchFilters,
+        group_ids: list[str] | None = None,
+        limit: int = 10,
+        min_score: float = 0.6,
+    ) -> list[EntityEdge]:
+        filter_queries, filter_params = edge_search_filter_query_constructor(
+            search_filter, GraphProvider.FALKORDB
+        )
+
+        if group_ids is not None:
+            filter_queries.append('e.group_id IN $group_ids')
+            filter_params['group_ids'] = group_ids
+
+            if source_node_uuid is not None:
+                filter_params['source_uuid'] = source_node_uuid
+                filter_queries.append('n.uuid = $source_uuid')
+
+            if target_node_uuid is not None:
+                filter_params['target_uuid'] = target_node_uuid
+                filter_queries.append('m.uuid = $target_uuid')
+
+        filter_query = ''
+        if filter_queries:
+            filter_query = ' WHERE ' + (' AND '.join(filter_queries))
+
+        # PATCHED 2026-05-02 (BirdAI vendored patch): use FalkorDB native vector
+        # index on RELATES_TO.fact_embedding when available. The unindexed
+        # fallback is the same MATCH-and-cosine math that previously hung
+        # for 6+ minutes on a 4,000-entity graph; this is the load-bearing
+        # call site that motivated the patch.
+        use_index = await _falkordb_vector_index_exists(
+            executor, 'RELATES_TO', 'fact_embedding', 'RELATIONSHIP'
+        )
+        prefix, _ = _falkordb_vector_edge_search_cypher(
+            'RELATES_TO', 'fact_embedding', '$search_vector', use_index
+        )
+        where_clauses = []
+        if filter_query:
+            where_clauses.append(filter_query.replace(' WHERE ', '', 1).strip())
+        where_clauses.append('score > $min_score')
+        unified_where = ' WHERE ' + ' AND '.join(where_clauses)
+
+        cypher = (
+            prefix
+            + unified_where
+            + """
+            RETURN
+            """
+            + get_entity_edge_return_query(GraphProvider.FALKORDB)
+            + """
+            ORDER BY score DESC
+            LIMIT $limit
+            """
+        )
+        params = dict(
+            search_vector=search_vector,
+            limit=limit,
+            min_score=min_score,
+            **filter_params,
+        )
+        if use_index:
+            params['candidate_k'] = limit * VECTOR_INDEX_CANDIDATE_MULTIPLIER
+        records, _, _ = await executor.execute_query(cypher, **params)
+
+        return [entity_edge_from_record(r) for r in records]
+
+    async def edge_bfs_search(
+        self,
+        executor: QueryExecutor,
+        origin_uuids: list[str],
+        max_depth: int,
+        search_filter: SearchFilters,
+        group_ids: list[str] | None = None,
+        limit: int = 10,
+    ) -> list[EntityEdge]:
+        if not origin_uuids:
+            return []
+
+        filter_queries, filter_params = edge_search_filter_query_constructor(
+            search_filter, GraphProvider.FALKORDB
+        )
+
+        if group_ids is not None:
+            filter_queries.append('e.group_id IN $group_ids')
+            filter_params['group_ids'] = group_ids
+
+        filter_query = ''
+        if filter_queries:
+            filter_query = ' WHERE ' + (' AND '.join(filter_queries))
+
+        cypher = (
+            f"""
+            UNWIND $bfs_origin_node_uuids AS origin_uuid
+            MATCH path = (origin {{uuid: origin_uuid}})-[:RELATES_TO|MENTIONS*1..{max_depth}]->(:Entity)
+            UNWIND relationships(path) AS rel
+            MATCH (n:Entity)-[e:RELATES_TO {{uuid: rel.uuid}}]-(m:Entity)
+            """
+            + filter_query
+            + """
+            RETURN DISTINCT
+            """
+            + get_entity_edge_return_query(GraphProvider.FALKORDB)
+            + """
+            LIMIT $limit
+            """
+        )
+
+        records, _, _ = await executor.execute_query(
+            cypher,
+            bfs_origin_node_uuids=origin_uuids,
+            depth=max_depth,
+            limit=limit,
+            **filter_params,
+        )
+
+        return [entity_edge_from_record(r) for r in records]
+
+    # --- Episode search ---
+
+    async def episode_fulltext_search(
+        self,
+        executor: QueryExecutor,
+        query: str,
+        search_filter: SearchFilters,  # noqa: ARG002
+        group_ids: list[str] | None = None,
+        limit: int = 10,
+    ) -> list[EpisodicNode]:
+        fuzzy_query = _build_falkor_fulltext_query(query, group_ids)
+        if fuzzy_query == '':
+            return []
+
+        filter_params: dict[str, Any] = {}
+        group_filter_query = ''
+        if group_ids is not None:
+            group_filter_query += '\nAND e.group_id IN $group_ids'
+            filter_params['group_ids'] = group_ids
+
+        cypher = (
+            get_nodes_query(
+                'episode_content', '$query', limit=limit, provider=GraphProvider.FALKORDB
+            )
+            + """
+            YIELD node AS episode, score
+            MATCH (e:Episodic)
+            WHERE e.uuid = episode.uuid
+            """
+            + group_filter_query
+            + """
+            RETURN
+            """
+            + EPISODIC_NODE_RETURN
+            + """
+            ORDER BY score DESC
+            LIMIT $limit
+            """
+        )
+
+        records, _, _ = await executor.execute_query(
+            cypher, query=fuzzy_query, limit=limit, **filter_params
+        )
+
+        return [episodic_node_from_record(r) for r in records]
+
+    # --- Community search ---
+
+    async def community_fulltext_search(
+        self,
+        executor: QueryExecutor,
+        query: str,
+        group_ids: list[str] | None = None,
+        limit: int = 10,
+    ) -> list[CommunityNode]:
+        fuzzy_query = _build_falkor_fulltext_query(query, group_ids)
+        if fuzzy_query == '':
+            return []
+
+        filter_params: dict[str, Any] = {}
+        group_filter_query = ''
+        if group_ids is not None:
+            group_filter_query = 'WHERE c.group_id IN $group_ids'
+            filter_params['group_ids'] = group_ids
+
+        cypher = (
+            get_nodes_query(
+                'community_name', '$query', limit=limit, provider=GraphProvider.FALKORDB
+            )
+            + """
+            YIELD node AS c, score
+            WITH c, score
+            """
+            + group_filter_query
+            + """
+            RETURN
+            """
+            + COMMUNITY_NODE_RETURN
+            + """
+            ORDER BY score DESC
+            LIMIT $limit
+            """
+        )
+
+        records, _, _ = await executor.execute_query(
+            cypher, query=fuzzy_query, limit=limit, **filter_params
+        )
+
+        return [community_node_from_record(r) for r in records]
+
+    async def community_similarity_search(
+        self,
+        executor: QueryExecutor,
+        search_vector: list[float],
+        group_ids: list[str] | None = None,
+        limit: int = 10,
+        min_score: float = 0.6,
+    ) -> list[CommunityNode]:
+        query_params: dict[str, Any] = {}
+
+        group_filter_query = ''
+        if group_ids is not None:
+            group_filter_query += ' WHERE c.group_id IN $group_ids'
+            query_params['group_ids'] = group_ids
+
+        # PATCHED 2026-05-02 (BirdAI vendored patch): use FalkorDB native vector
+        # index on Community.name_embedding when available. The helper's prefix
+        # binds the node as `n`; we re-bind it to `c` via WITH so the existing
+        # group filter (built as `group_filter_query`, prefixed with ' WHERE '
+        # when non-empty) and COMMUNITY_NODE_RETURN, which both use `c`, work
+        # unchanged.
+        use_index = await _falkordb_vector_index_exists(
+            executor, 'Community', 'name_embedding', 'NODE'
+        )
+        prefix, _ = _falkordb_vector_node_search_cypher(
+            'Community', 'name_embedding', '$search_vector', use_index
+        )
+        prefix = prefix + ' WITH n AS c, score '
+        where_clauses = []
+        if group_filter_query:
+            where_clauses.append(group_filter_query.replace(' WHERE ', '', 1).strip())
+        where_clauses.append('score > $min_score')
+        unified_where = ' WHERE ' + ' AND '.join(where_clauses)
+
+        cypher = (
+            prefix
+            + unified_where
+            + """
+            RETURN
+            """
+            + COMMUNITY_NODE_RETURN
+            + """
+            ORDER BY score DESC
+            LIMIT $limit
+            """
+        )
+        params = dict(
+            search_vector=search_vector,
+            limit=limit,
+            min_score=min_score,
+            **query_params,
+        )
+        if use_index:
+            params['candidate_k'] = limit * VECTOR_INDEX_CANDIDATE_MULTIPLIER
+        records, _, _ = await executor.execute_query(cypher, **params)
+
+        return [community_node_from_record(r) for r in records]
+
+    # --- Rerankers ---
+
+    async def node_distance_reranker(
+        self,
+        executor: QueryExecutor,
+        node_uuids: list[str],
+        center_node_uuid: str,
+        min_score: float = 0,
+    ) -> list[EntityNode]:
+        filtered_uuids = [u for u in node_uuids if u != center_node_uuid]
+        scores: dict[str, float] = {center_node_uuid: 0.0}
+
+        cypher = """
+        UNWIND $node_uuids AS node_uuid
+        MATCH (center:Entity {uuid: $center_uuid})-[:RELATES_TO]-(n:Entity {uuid: node_uuid})
+        RETURN 1 AS score, node_uuid AS uuid
+        """
+
+        results, _, _ = await executor.execute_query(
+            cypher,
+            node_uuids=filtered_uuids,
+            center_uuid=center_node_uuid,
+        )
+
+        for result in results:
+            scores[result['uuid']] = result['score']
+
+        for uuid in filtered_uuids:
+            if uuid not in scores:
+                scores[uuid] = float('inf')
+
+        filtered_uuids.sort(key=lambda cur_uuid: scores[cur_uuid])
+
+        if center_node_uuid in node_uuids:
+            scores[center_node_uuid] = 0.1
+            filtered_uuids = [center_node_uuid] + filtered_uuids
+
+        reranked_uuids = [u for u in filtered_uuids if (1 / scores[u]) >= min_score]
+
+        if not reranked_uuids:
+            return []
+
+        get_query = """
+            MATCH (n:Entity)
+            WHERE n.uuid IN $uuids
+            RETURN
+            """ + get_entity_node_return_query(GraphProvider.FALKORDB)
+
+        records, _, _ = await executor.execute_query(get_query, uuids=reranked_uuids)
+
+        node_map = {r['uuid']: entity_node_from_record(r) for r in records}
+        return [node_map[u] for u in reranked_uuids if u in node_map]
+
+    async def episode_mentions_reranker(
+        self,
+        executor: QueryExecutor,
+        node_uuids: list[str],
+        min_score: float = 0,
+    ) -> list[EntityNode]:
+        if not node_uuids:
+            return []
+
+        scores: dict[str, float] = {}
+
+        results, _, _ = await executor.execute_query(
+            """
+            UNWIND $node_uuids AS node_uuid
+            MATCH (episode:Episodic)-[r:MENTIONS]->(n:Entity {uuid: node_uuid})
+            RETURN count(*) AS score, n.uuid AS uuid
+            """,
+            node_uuids=node_uuids,
+        )
+
+        for result in results:
+            scores[result['uuid']] = result['score']
+
+        for uuid in node_uuids:
+            if uuid not in scores:
+                scores[uuid] = float('inf')
+
+        sorted_uuids = list(node_uuids)
+        sorted_uuids.sort(key=lambda cur_uuid: scores[cur_uuid])
+
+        reranked_uuids = [u for u in sorted_uuids if scores[u] >= min_score]
+
+        if not reranked_uuids:
+            return []
+
+        get_query = """
+            MATCH (n:Entity)
+            WHERE n.uuid IN $uuids
+            RETURN
+            """ + get_entity_node_return_query(GraphProvider.FALKORDB)
+
+        records, _, _ = await executor.execute_query(get_query, uuids=reranked_uuids)
+
+        node_map = {r['uuid']: entity_node_from_record(r) for r in records}
+        return [node_map[u] for u in reranked_uuids if u in node_map]
+
+    # --- Filter builders ---
+
+    def build_node_search_filters(self, search_filters: SearchFilters) -> Any:
+        filter_queries, filter_params = node_search_filter_query_constructor(
+            search_filters, GraphProvider.FALKORDB
+        )
+        return {'filter_queries': filter_queries, 'filter_params': filter_params}
+
+    def build_edge_search_filters(self, search_filters: SearchFilters) -> Any:
+        filter_queries, filter_params = edge_search_filter_query_constructor(
+            search_filters, GraphProvider.FALKORDB
+        )
+        return {'filter_queries': filter_queries, 'filter_params': filter_params}
+
+    # --- Fulltext query builder ---
+
+    def build_fulltext_query(
+        self,
+        query: str,
+        group_ids: list[str] | None = None,
+        max_query_length: int = MAX_QUERY_LENGTH,
+    ) -> str:
+        return _build_falkor_fulltext_query(query, group_ids, max_query_length)
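The patched similarity searches above pass `candidate_k = limit * VECTOR_INDEX_CANDIDATE_MULTIPLIER` when the vector index exists. The database-free sketch below illustrates why, assuming (as the comment on `VECTOR_INDEX_CANDIDATE_MULTIPLIER` states) that the index returns its top-k hits before the `WHERE` filters run. `Edge`, `index_top_k`, and `similarity_search` are hypothetical names, and the index is simulated with brute-force cosine similarity rather than FalkorDB itself.

```python
import math
from dataclasses import dataclass

# Hypothetical copy of the constant defined in graph_queries.py.
VECTOR_INDEX_CANDIDATE_MULTIPLIER = 5


@dataclass
class Edge:
    uuid: str
    group_id: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def index_top_k(edges: list[Edge], query: list[float], k: int) -> list[tuple[float, Edge]]:
    """Simulated vector index: the k nearest edges, with no filters applied."""
    scored = [(cosine(e.embedding, query), e) for e in edges]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def similarity_search(
    edges: list[Edge],
    query: list[float],
    group_ids: set[str],
    limit: int,
    min_score: float,
    multiplier: int = VECTOR_INDEX_CANDIDATE_MULTIPLIER,
) -> list[str]:
    # 1. Over-fetch candidates from the index (the candidate_k parameter).
    candidates = index_top_k(edges, query, limit * multiplier)
    # 2. Apply the WHERE clause: group filter plus score threshold.
    kept = [(s, e) for s, e in candidates if e.group_id in group_ids and s > min_score]
    # 3. Candidates are already sorted by score, so ORDER BY + LIMIT is a slice.
    return [e.uuid for _, e in kept[:limit]]


# Four 'other' edges sit closer to the query than any 'mine' edge.
edges = [Edge(f'o{i}', 'other', [1.0, 0.1 * i]) for i in range(4)] + [
    Edge(f'm{i}', 'mine', [1.0, 0.5 + 0.1 * i]) for i in range(4)
]
query = [1.0, 0.0]
print(similarity_search(edges, query, {'mine'}, limit=2, min_score=0.6, multiplier=1))  # []
print(similarity_search(edges, query, {'mine'}, limit=2, min_score=0.6))  # ['m0', 'm1']
```

With `multiplier=1` both index hits belong to the wrong group and the search returns nothing even though matching edges exist; over-fetching by 5 recovers them. The cost is reading more candidates per query, which is why the multiplier is kept small.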
@@ -0,0 +1,444 @@
+"""
+Copyright 2024, Zep Software, Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import asyncio
+import datetime
+import logging
+from typing import TYPE_CHECKING, Any
+
+if TYPE_CHECKING:
+    from falkordb import Graph as FalkorGraph
+    from falkordb.asyncio import FalkorDB
+else:
+    try:
+        from falkordb import Graph as FalkorGraph
+        from falkordb.asyncio import FalkorDB
+    except ImportError:
+        # If falkordb is not installed, raise an ImportError
+        raise ImportError(
+            'falkordb is required for FalkorDriver. '
+            'Install it with: pip install graphiti-core[falkordb]'
+        ) from None
+
+from graphiti_core.driver.driver import GraphDriver, GraphDriverSession, GraphProvider
+from graphiti_core.driver.falkordb import STOPWORDS as STOPWORDS
+from graphiti_core.driver.falkordb.operations.community_edge_ops import (
+    FalkorCommunityEdgeOperations,
+)
+from graphiti_core.driver.falkordb.operations.community_node_ops import (
+    FalkorCommunityNodeOperations,
+)
+from graphiti_core.driver.falkordb.operations.entity_edge_ops import FalkorEntityEdgeOperations
+from graphiti_core.driver.falkordb.operations.entity_node_ops import FalkorEntityNodeOperations
+from graphiti_core.driver.falkordb.operations.episode_node_ops import FalkorEpisodeNodeOperations
+from graphiti_core.driver.falkordb.operations.episodic_edge_ops import FalkorEpisodicEdgeOperations
+from graphiti_core.driver.falkordb.operations.graph_ops import FalkorGraphMaintenanceOperations
+from graphiti_core.driver.falkordb.operations.has_episode_edge_ops import (
+    FalkorHasEpisodeEdgeOperations,
+)
+from graphiti_core.driver.falkordb.operations.next_episode_edge_ops import (
+    FalkorNextEpisodeEdgeOperations,
+)
+from graphiti_core.driver.falkordb.operations.saga_node_ops import FalkorSagaNodeOperations
+from graphiti_core.driver.falkordb.operations.search_ops import FalkorSearchOperations
+from graphiti_core.driver.operations.community_edge_ops import CommunityEdgeOperations
+from graphiti_core.driver.operations.community_node_ops import CommunityNodeOperations
+from graphiti_core.driver.operations.entity_edge_ops import EntityEdgeOperations
+from graphiti_core.driver.operations.entity_node_ops import EntityNodeOperations
+from graphiti_core.driver.operations.episode_node_ops import EpisodeNodeOperations
+from graphiti_core.driver.operations.episodic_edge_ops import EpisodicEdgeOperations
+from graphiti_core.driver.operations.graph_ops import GraphMaintenanceOperations
+from graphiti_core.driver.operations.has_episode_edge_ops import HasEpisodeEdgeOperations
+from graphiti_core.driver.operations.next_episode_edge_ops import NextEpisodeEdgeOperations
+from graphiti_core.driver.operations.saga_node_ops import SagaNodeOperations
+from graphiti_core.driver.operations.search_ops import SearchOperations
+from graphiti_core.graph_queries import get_fulltext_indices, get_range_indices, get_vector_indices
+from graphiti_core.helpers import validate_group_ids
+from graphiti_core.utils.datetime_utils import convert_datetimes_to_strings
+
+logger = logging.getLogger(__name__)
+
+
+class FalkorDriverSession(GraphDriverSession):
+    provider = GraphProvider.FALKORDB
+
+    def __init__(self, graph: FalkorGraph):
+        self.graph = graph
+
+    async def __aenter__(self):
+        return self
+
+    async def __aexit__(self, exc_type, exc, tb):
+        # No cleanup needed for Falkor, but method must exist
+        pass
+
+    async def close(self):
+        # No explicit close needed for FalkorDB, but method must exist
+        pass
+
+    async def execute_write(self, func, *args, **kwargs):
+        # Directly await the provided async function with `self` as the transaction/session
+        return await func(self, *args, **kwargs)
+
+    async def run(self, query: str | list, **kwargs: Any) -> Any:
+        # FalkorDB does not accept a label set as a query parameter, so callers may
+        # pass a list of (cypher, params) pairs, which are executed one at a time
+        if isinstance(query, list):
+            for cypher, params in query:
+                params = convert_datetimes_to_strings(params)
+                await self.graph.query(str(cypher), params)  # type: ignore[reportUnknownArgumentType]
+        else:
+            params = dict(kwargs)
+            params = convert_datetimes_to_strings(params)
+            await self.graph.query(str(query), params)  # type: ignore[reportUnknownArgumentType]
+        # Assuming `graph.query` is async (ideal); otherwise, wrap in executor
+        return None
+
+
+class FalkorDriver(GraphDriver):
+    provider = GraphProvider.FALKORDB
+    default_group_id: str = '\\_'
+    fulltext_syntax: str = '@'  # FalkorDB uses a RediSearch-like syntax for fulltext queries
+    aoss_client: None = None
+
+    def __init__(
+        self,
+        host: str = 'localhost',
+        port: int = 6379,
+        username: str | None = None,
+        password: str | None = None,
+        falkor_db: FalkorDB | None = None,
+        database: str = 'default_db',
+    ):
+        """
+        Initialize the FalkorDB driver.
+
+        FalkorDB is a multi-tenant graph database.
+        To connect, provide the host and port.
+        The default parameters assume a local (on-premises) FalkorDB instance.
+
+        Args:
+            host (str): The host where FalkorDB is running.
+            port (int): The port on which FalkorDB is listening.
+            username (str | None): The username for authentication (if required).
+            password (str | None): The password for authentication (if required).
+            falkor_db (FalkorDB | None): An existing FalkorDB instance to use instead of creating a new one.
+            database (str): The name of the database to connect to. Defaults to 'default_db'.
+        """
+        super().__init__()
+        self._database = database
+        if falkor_db is not None:
+            # If a FalkorDB instance is provided, use it directly
+            self.client = falkor_db
+        else:
+            self.client = FalkorDB(host=host, port=port, username=username, password=password)
+
+        # Instantiate FalkorDB operations
+        self._entity_node_ops = FalkorEntityNodeOperations()
+        self._episode_node_ops = FalkorEpisodeNodeOperations()
+        self._community_node_ops = FalkorCommunityNodeOperations()
+        self._saga_node_ops = FalkorSagaNodeOperations()
+        self._entity_edge_ops = FalkorEntityEdgeOperations()
+        self._episodic_edge_ops = FalkorEpisodicEdgeOperations()
+        self._community_edge_ops = FalkorCommunityEdgeOperations()
+        self._has_episode_edge_ops = FalkorHasEpisodeEdgeOperations()
+        self._next_episode_edge_ops = FalkorNextEpisodeEdgeOperations()
+        self._search_ops = FalkorSearchOperations()
+        self._graph_ops = FalkorGraphMaintenanceOperations()
+
+        # Schedule the indices and constraints to be built
+        try:
+            # Try to get the current event loop
+            loop = asyncio.get_running_loop()
+            # Schedule the build_indices_and_constraints to run
+            loop.create_task(self.build_indices_and_constraints())
+        except RuntimeError:
+            # No event loop running, this will be handled later
+            pass
+
+    # --- Operations properties ---
+
+    @property
+    def entity_node_ops(self) -> EntityNodeOperations:
+        return self._entity_node_ops
+
+    @property
+    def episode_node_ops(self) -> EpisodeNodeOperations:
+        return self._episode_node_ops
+
+    @property
+    def community_node_ops(self) -> CommunityNodeOperations:
+        return self._community_node_ops
+
+    @property
+    def saga_node_ops(self) -> SagaNodeOperations:
+        return self._saga_node_ops
+
+    @property
+    def entity_edge_ops(self) -> EntityEdgeOperations:
+        return self._entity_edge_ops
+
+    @property
+    def episodic_edge_ops(self) -> EpisodicEdgeOperations:
+        return self._episodic_edge_ops
+
+    @property
+    def community_edge_ops(self) -> CommunityEdgeOperations:
+        return self._community_edge_ops
+
+    @property
+    def has_episode_edge_ops(self) -> HasEpisodeEdgeOperations:
+        return self._has_episode_edge_ops
+
+    @property
+    def next_episode_edge_ops(self) -> NextEpisodeEdgeOperations:
+        return self._next_episode_edge_ops
+
+    @property
+    def search_ops(self) -> SearchOperations:
+        return self._search_ops
+
+    @property
+    def graph_ops(self) -> GraphMaintenanceOperations:
+        return self._graph_ops
+
+    def _get_graph(self, graph_name: str | None) -> FalkorGraph:
+        # FalkorDB requires a non-None database name for multi-tenant graphs; the default is "default_db"
+        if graph_name is None:
+            graph_name = self._database
+        return self.client.select_graph(graph_name)
+
+    async def execute_query(self, cypher_query_, **kwargs: Any):
+        graph = self._get_graph(self._database)
+
+        # Convert datetime objects to ISO strings (FalkorDB does not support datetime objects directly)
+        params = convert_datetimes_to_strings(dict(kwargs))
+
+        try:
+            result = await graph.query(cypher_query_, params)  # type: ignore[reportUnknownArgumentType]
+        except Exception as e:
+            if 'already indexed' in str(e):
+                # The index already exists; treat the CREATE as a no-op
+                logger.info(f'Index already exists: {e}')
+                return None
+            logger.error(f'Error executing FalkorDB query: {e}\n{cypher_query_}\n{params}')
+            raise
+
+        # Convert the result header to a list of strings
+        header = [h[1] for h in result.header]
+
+        # Convert FalkorDB's result format (list of lists) to the format expected by Graphiti (list of dicts)
+        records = []
+        for row in result.result_set:
+            record = {}
+            for i, field_name in enumerate(header):
+                if i < len(row):
+                    record[field_name] = row[i]
+                else:
+                    # If there are more fields in header than values in row, set to None
+                    record[field_name] = None
+            records.append(record)
+
+        return records, header, None
+
+    def session(self, database: str | None = None) -> GraphDriverSession:
+        return FalkorDriverSession(self._get_graph(database))
+
+    async def close(self) -> None:
+        """Close the driver connection."""
+        if hasattr(self.client, 'aclose'):
+            await self.client.aclose()  # type: ignore[reportUnknownMemberType]
+        elif hasattr(self.client.connection, 'aclose'):
+            await self.client.connection.aclose()
+        elif hasattr(self.client.connection, 'close'):
+            await self.client.connection.close()
+
+    async def delete_all_indexes(self) -> None:
+        result = await self.execute_query('CALL db.indexes()')
+        if not result:
+            return
+
+        records, _, _ = result
+        drop_tasks = []
+
+        for record in records:
+            label = record['label']
+            entity_type = record['entitytype']
+
+            for field_name, index_type in record['types'].items():
+                if 'RANGE' in index_type:
+                    drop_tasks.append(self.execute_query(f'DROP INDEX ON :{label}({field_name})'))
+                elif 'FULLTEXT' in index_type:
+                    if entity_type == 'NODE':
+                        drop_tasks.append(
+                            self.execute_query(
+                                f'DROP FULLTEXT INDEX FOR (n:{label}) ON (n.{field_name})'
+                            )
+                        )
+                    elif entity_type == 'RELATIONSHIP':
+                        drop_tasks.append(
+                            self.execute_query(
+                                f'DROP FULLTEXT INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
+                            )
+                        )
+
+        if drop_tasks:
+            await asyncio.gather(*drop_tasks)
+
+    async def build_indices_and_constraints(self, delete_existing=False):
+        if delete_existing:
+            await self.delete_all_indexes()
+        # PATCHED 2026-05-02 (BirdAI vendored patch): add vector indexes alongside
+        # range and fulltext. FalkorDB supports native vector indexes via
+        # db.idx.vector.queryNodes / queryRelationships; without these, similarity
+        # search runs as full-table-scan cosine math in interpreted Cypher.
+        index_queries = (
+            get_range_indices(self.provider)
+            + get_fulltext_indices(self.provider)
+            + get_vector_indices(self.provider)
+        )
+        for query in index_queries:
+            await self.execute_query(query)
+        # Invalidate the search_ops vector-index existence cache so subsequent
+        # similarity queries re-probe and discover the indexes we just built.
+        try:
+            from graphiti_core.driver.falkordb.operations.search_ops import (
+                _invalidate_falkordb_vector_index_cache,
+            )
+            _invalidate_falkordb_vector_index_cache()
+        except ImportError:
+            # Defensive: search_ops is already imported at module load, so this
+            # is expected to fail only if it lacks the helper (e.g. an unpatched
+            # upstream copy), in which case there is no cache to invalidate.
+            pass
+
+    def clone(self, database: str) -> 'GraphDriver':
+        """
+        Returns a shallow copy of this driver with a different default database.
+        Reuses the same underlying FalkorDB client connection.
+        """
+        if database == self._database:
+            cloned = self
+        elif database == self.default_group_id:
+            cloned = FalkorDriver(falkor_db=self.client)
+        else:
+            # Create a new instance of FalkorDriver with the same connection but a different database
+            cloned = FalkorDriver(falkor_db=self.client, database=database)
+
+        return cloned
+
+    async def health_check(self) -> None:
+        """Check FalkorDB connectivity by running a simple query."""
+        try:
+            await self.execute_query('MATCH (n) RETURN 1 LIMIT 1')
+            return None
+        except Exception as e:
+            logger.error(f'FalkorDB health check failed: {e}')
+            raise
+
+    @staticmethod
+    def convert_datetimes_to_strings(obj):
+        if isinstance(obj, dict):
+            return {k: FalkorDriver.convert_datetimes_to_strings(v) for k, v in obj.items()}
+        elif isinstance(obj, list):
+            return [FalkorDriver.convert_datetimes_to_strings(item) for item in obj]
+        elif isinstance(obj, tuple):
+            return tuple(FalkorDriver.convert_datetimes_to_strings(item) for item in obj)
+        elif isinstance(obj, datetime.datetime):  # `datetime` is the module here
+            return obj.isoformat()
+        else:
+            return obj
+
+    def sanitize(self, query: str) -> str:
+        """
+        Replace FalkorDB special characters with whitespace.
+        Based on FalkorDB tokenization rules: ,.<>{}[]"':;!@#$%^&*()-+=~
+        The characters ? | / and backslash are replaced as well.
+        """
+        # FalkorDB separator characters that break text into tokens
+        separator_map = str.maketrans(
+            {
+                ',': ' ',
+                '.': ' ',
+                '<': ' ',
+                '>': ' ',
+                '{': ' ',
+                '}': ' ',
+                '[': ' ',
+                ']': ' ',
+                '"': ' ',
+                "'": ' ',
+                ':': ' ',
+                ';': ' ',
+                '!': ' ',
+                '@': ' ',
+                '#': ' ',
+                '$': ' ',
+                '%': ' ',
+                '^': ' ',
+                '&': ' ',
+                '*': ' ',
+                '(': ' ',
+                ')': ' ',
+                '-': ' ',
+                '+': ' ',
+                '=': ' ',
+                '~': ' ',
+                '?': ' ',
+                '|': ' ',
+                '/': ' ',
+                '\\': ' ',
+            }
+        )
+        sanitized = query.translate(separator_map)
+        # Clean up multiple spaces
+        sanitized = ' '.join(sanitized.split())
+        return sanitized
+
+    def build_fulltext_query(
+        self, query: str, group_ids: list[str] | None = None, max_query_length: int = 128
+    ) -> str:
+        """
+        Build a fulltext query string for FalkorDB using RediSearch syntax.
+        FalkorDB uses RediSearch-like syntax where:
+        - Field queries use an @ prefix: @field:value
+        - Multiple values for the same field: (@field:value1|value2)
+        - Text search doesn't need the @ prefix for content fields
+        - AND is implicit with a space: (@group_id:value) (text)
+        - OR uses a pipe within parentheses: (@group_id:value1|value2)
+        """
+        validate_group_ids(group_ids)
+
+        if group_ids is None or len(group_ids) == 0:
+            group_filter = ''
+        else:
+            # Escape group_ids with quotes to prevent RediSearch syntax errors
+            # with reserved words like "main" or special characters like hyphens
+            escaped_group_ids = [f'"{gid}"' for gid in group_ids]
+            group_values = '|'.join(escaped_group_ids)
+            group_filter = f'(@group_id:{group_values})'
+
+        sanitized_query = self.sanitize(query)
+
+        # Remove stopwords and empty tokens from the sanitized query
+        query_words = sanitized_query.split()
+        filtered_words = [word for word in query_words if word and word.lower() not in STOPWORDS]
+        sanitized_query = ' | '.join(filtered_words)
+
+        # If the query is too long, return an empty query
+        if len(sanitized_query.split(' ')) + len(group_ids or []) >= max_query_length:
+            return ''
+
+        full_query = group_filter + ' (' + sanitized_query + ')'
+
+        return full_query
@@ -0,0 +1,242 @@
+"""
+Database query utilities for different graph database backends.
+
+This module provides database-agnostic query generation for Neo4j and FalkorDB,
+supporting index creation, fulltext search, and bulk operations.
+
+PATCHED for FalkorDB native vector index support (BirdAI vendored patch,
+2026-05-02). Adds:
+- get_vector_indices(): CREATE VECTOR INDEX statements for FalkorDB
+- get_vector_search_query(): Cypher fragment for vector similarity using
+  FalkorDB's db.idx.vector procedures, with fallback to cosine math when
+  the index does not yet exist
+- VECTOR_INDEX_CANDIDATE_MULTIPLIER: over-fetch factor for vector index
+  queries to handle filter rejections after index lookup
+
+No changes to Neo4j or Kuzu code paths.
+"""
+
+from typing_extensions import LiteralString
+
+from graphiti_core.driver.driver import GraphProvider
+
+# Mapping from Neo4j fulltext index names to FalkorDB node labels
+NEO4J_TO_FALKORDB_MAPPING = {
+    'node_name_and_summary': 'Entity',
+    'community_name': 'Community',
+    'episode_content': 'Episodic',
+    'edge_name_and_fact': 'RELATES_TO',
+}
+# Mapping from fulltext index names to Kuzu node labels
+INDEX_TO_LABEL_KUZU_MAPPING = {
+    'node_name_and_summary': 'Entity',
+    'community_name': 'Community',
+    'episode_content': 'Episodic',
+    'edge_name_and_fact': 'RelatesToNode_',
+}
+
+# Vector index over-fetch multiplier. When a vector index search is
+# combined with WHERE filters (group_id, source_uuid, etc.), some of
+# the top-k index results may be filtered out. Over-fetching by this
+# factor preserves recall against the final LIMIT after filtering.
+# Conservative default; tune per deployment by editing this constant. An
+# environment-variable override at the driver level is not implemented yet.
+VECTOR_INDEX_CANDIDATE_MULTIPLIER = 5
+
+
+def get_range_indices(provider: GraphProvider) -> list[LiteralString]:
+    if provider == GraphProvider.FALKORDB:
+        return [
+            # Entity node
+            'CREATE INDEX FOR (n:Entity) ON (n.uuid, n.group_id, n.name, n.created_at)',
+            # Episodic node
+            'CREATE INDEX FOR (n:Episodic) ON (n.uuid, n.group_id, n.created_at, n.valid_at)',
+            # Community node
+            'CREATE INDEX FOR (n:Community) ON (n.uuid)',
+            # Saga node
+            'CREATE INDEX FOR (n:Saga) ON (n.uuid, n.group_id, n.name)',
+            # RELATES_TO edge
+            'CREATE INDEX FOR ()-[e:RELATES_TO]-() ON (e.uuid, e.group_id, e.name, e.created_at, e.expired_at, e.valid_at, e.invalid_at)',
+            # MENTIONS edge
+            'CREATE INDEX FOR ()-[e:MENTIONS]-() ON (e.uuid, e.group_id)',
+            # HAS_MEMBER edge
+            'CREATE INDEX FOR ()-[e:HAS_MEMBER]-() ON (e.uuid)',
+            # HAS_EPISODE edge
+            'CREATE INDEX FOR ()-[e:HAS_EPISODE]-() ON (e.uuid, e.group_id)',
+            # NEXT_EPISODE edge
+            'CREATE INDEX FOR ()-[e:NEXT_EPISODE]-() ON (e.uuid, e.group_id)',
+        ]
+
+    if provider == GraphProvider.KUZU:
+        return []
+
+    return [
+        'CREATE INDEX entity_uuid IF NOT EXISTS FOR (n:Entity) ON (n.uuid)',
+        'CREATE INDEX episode_uuid IF NOT EXISTS FOR (n:Episodic) ON (n.uuid)',
+        'CREATE INDEX community_uuid IF NOT EXISTS FOR (n:Community) ON (n.uuid)',
+        'CREATE INDEX saga_uuid IF NOT EXISTS FOR (n:Saga) ON (n.uuid)',
+        'CREATE INDEX relation_uuid IF NOT EXISTS FOR ()-[e:RELATES_TO]-() ON (e.uuid)',
+        'CREATE INDEX mention_uuid IF NOT EXISTS FOR ()-[e:MENTIONS]-() ON (e.uuid)',
+        'CREATE INDEX has_member_uuid IF NOT EXISTS FOR ()-[e:HAS_MEMBER]-() ON (e.uuid)',
+        'CREATE INDEX has_episode_uuid IF NOT EXISTS FOR ()-[e:HAS_EPISODE]-() ON (e.uuid)',
+        'CREATE INDEX next_episode_uuid IF NOT EXISTS FOR ()-[e:NEXT_EPISODE]-() ON (e.uuid)',
+        'CREATE INDEX entity_group_id IF NOT EXISTS FOR (n:Entity) ON (n.group_id)',
+        'CREATE INDEX episode_group_id IF NOT EXISTS FOR (n:Episodic) ON (n.group_id)',
+        'CREATE INDEX community_group_id IF NOT EXISTS FOR (n:Community) ON (n.group_id)',
+        'CREATE INDEX saga_group_id IF NOT EXISTS FOR (n:Saga) ON (n.group_id)',
+        'CREATE INDEX relation_group_id IF NOT EXISTS FOR ()-[e:RELATES_TO]-() ON (e.group_id)',
+        'CREATE INDEX mention_group_id IF NOT EXISTS FOR ()-[e:MENTIONS]-() ON (e.group_id)',
+        'CREATE INDEX has_episode_group_id IF NOT EXISTS FOR ()-[e:HAS_EPISODE]-() ON (e.group_id)',
+        'CREATE INDEX next_episode_group_id IF NOT EXISTS FOR ()-[e:NEXT_EPISODE]-() ON (e.group_id)',
+        'CREATE INDEX name_entity_index IF NOT EXISTS FOR (n:Entity) ON (n.name)',
+        'CREATE INDEX saga_name IF NOT EXISTS FOR (n:Saga) ON (n.name)',
+        'CREATE INDEX created_at_entity_index IF NOT EXISTS FOR (n:Entity) ON (n.created_at)',
+        'CREATE INDEX created_at_episodic_index IF NOT EXISTS FOR (n:Episodic) ON (n.created_at)',
+        'CREATE INDEX valid_at_episodic_index IF NOT EXISTS FOR (n:Episodic) ON (n.valid_at)',
+        'CREATE INDEX name_edge_index IF NOT EXISTS FOR ()-[e:RELATES_TO]-() ON (e.name)',
+        'CREATE INDEX created_at_edge_index IF NOT EXISTS FOR ()-[e:RELATES_TO]-() ON (e.created_at)',
+        'CREATE INDEX expired_at_edge_index IF NOT EXISTS FOR ()-[e:RELATES_TO]-() ON (e.expired_at)',
+        'CREATE INDEX valid_at_edge_index IF NOT EXISTS FOR ()-[e:RELATES_TO]-() ON (e.valid_at)',
+        'CREATE INDEX invalid_at_edge_index IF NOT EXISTS FOR ()-[e:RELATES_TO]-() ON (e.invalid_at)',
+    ]
+
+
+def get_fulltext_indices(provider: GraphProvider) -> list[LiteralString]:
+    if provider == GraphProvider.FALKORDB:
+        from typing import cast
+
+        from graphiti_core.driver.falkordb import STOPWORDS
+
+        # Convert to string representation for embedding in queries
+        stopwords_str = str(STOPWORDS)
+
+        # cast() satisfies the LiteralString return type while keeping STOPWORDS as the single source of truth
+        return cast(
+            list[LiteralString],
+            [
+                f"""CALL db.idx.fulltext.createNodeIndex(
+                                                {{
+                                                    label: 'Episodic',
+                                                    stopwords: {stopwords_str}
+                                                }},
+                                                'content', 'source', 'source_description', 'group_id'
+                                                )""",
+                f"""CALL db.idx.fulltext.createNodeIndex(
+                                                {{
+                                                    label: 'Entity',
+                                                    stopwords: {stopwords_str}
+                                                }},
+                                                'name', 'summary', 'group_id'
+                                                )""",
+                f"""CALL db.idx.fulltext.createNodeIndex(
+                                                {{
+                                                    label: 'Community',
+                                                    stopwords: {stopwords_str}
+                                                }},
+                                                'name', 'group_id'
+                                                )""",
+                """CREATE FULLTEXT INDEX FOR ()-[e:RELATES_TO]-() ON (e.name, e.fact, e.group_id)""",
+            ],
+        )
+
+    if provider == GraphProvider.KUZU:
+        return [
+            "CALL CREATE_FTS_INDEX('Episodic', 'episode_content', ['content', 'source', 'source_description']);",
+            "CALL CREATE_FTS_INDEX('Entity', 'node_name_and_summary', ['name', 'summary']);",
+            "CALL CREATE_FTS_INDEX('Community', 'community_name', ['name']);",
+            "CALL CREATE_FTS_INDEX('RelatesToNode_', 'edge_name_and_fact', ['name', 'fact']);",
+        ]
+
+    return [
+        """CREATE FULLTEXT INDEX episode_content IF NOT EXISTS
+        FOR (e:Episodic) ON EACH [e.content, e.source, e.source_description, e.group_id]""",
+        """CREATE FULLTEXT INDEX node_name_and_summary IF NOT EXISTS
+        FOR (n:Entity) ON EACH [n.name, n.summary, n.group_id]""",
+        """CREATE FULLTEXT INDEX community_name IF NOT EXISTS
+        FOR (n:Community) ON EACH [n.name, n.group_id]""",
+        """CREATE FULLTEXT INDEX edge_name_and_fact IF NOT EXISTS
+        FOR ()-[e:RELATES_TO]-() ON EACH [e.name, e.fact, e.group_id]""",
+    ]
+
+
+def get_vector_indices(provider: GraphProvider, dimension: int = 384) -> list[LiteralString]:
+    """Return CREATE VECTOR INDEX statements for the given provider.
+
+    For FalkorDB: creates HNSW vector indexes on Entity.name_embedding,
+    RELATES_TO.fact_embedding, and Community.name_embedding. Backed by
+    FalkorDB's native vector index (db.idx.vector.queryNodes /
+    queryRelationships).
+
+    For Neo4j and Kuzu: returns an empty list. graphiti-core's queries on
+    those backends compute similarity at query time (Neo4j via the
+    vector.similarity.cosine function, Kuzu via array_cosine_similarity),
+    so no pre-built vector index is required.
+
+    Args:
+        provider: The graph database provider.
+        dimension: Embedding dimension. Defaults to 384 (all-MiniLM-L6-v2).
+            Embedders with different dimensions should pass their own value
+            through driver configuration. graphiti-core's default embedder
+            is 1536 (OpenAI ada-002); BirdAI uses 384 (sentence-transformers).
+
+    Returns:
+        List of CREATE VECTOR INDEX statements. Idempotent at FalkorDB level
+        if the index already exists with matching options.
+    """
+    if provider == GraphProvider.FALKORDB:
+        from typing import cast
+        return cast(
+            list[LiteralString],
+            [
+                f"CREATE VECTOR INDEX FOR (n:Entity) ON (n.name_embedding) "
+                f"OPTIONS {{dimension: {dimension}, similarityFunction: 'cosine'}}",
+                f"CREATE VECTOR INDEX FOR ()-[e:RELATES_TO]-() ON (e.fact_embedding) "
+                f"OPTIONS {{dimension: {dimension}, similarityFunction: 'cosine'}}",
+                f"CREATE VECTOR INDEX FOR (n:Community) ON (n.name_embedding) "
+                f"OPTIONS {{dimension: {dimension}, similarityFunction: 'cosine'}}",
+            ],
+        )
+
+    return []
+
+
+def get_nodes_query(name: str, query: str, limit: int, provider: GraphProvider) -> str:
+    if provider == GraphProvider.FALKORDB:
+        label = NEO4J_TO_FALKORDB_MAPPING[name]
+        return f"CALL db.idx.fulltext.queryNodes('{label}', {query})"
+
+    if provider == GraphProvider.KUZU:
+        label = INDEX_TO_LABEL_KUZU_MAPPING[name]
+        return f"CALL QUERY_FTS_INDEX('{label}', '{name}', {query}, TOP := $limit)"
+
+    return f'CALL db.index.fulltext.queryNodes("{name}", {query}, {{limit: $limit}})'
+
+
+def get_vector_cosine_func_query(vec1, vec2, provider: GraphProvider) -> str:
+    """Return a Cypher fragment for cosine similarity score in [0, 1].
+
+    PRESERVED for backward compatibility and as fallback when vector indexes
+    do not yet exist on the FalkorDB backend. New code paths should prefer
+    get_vector_search_query() which uses the native vector index when
+    available.
+    """
+    if provider == GraphProvider.FALKORDB:
+        # vec.cosineDistance returns a distance d = 1 - cos in [0, 2]; (2 - d)/2
+        # equals (1 + cos)/2, the same [0, 1] normalization that Neo4j's
+        # vector.similarity.cosine applies.
+        return f'(2 - vec.cosineDistance({vec1}, vecf32({vec2})))/2'
+
+    if provider == GraphProvider.KUZU:
+        return f'array_cosine_similarity({vec1}, {vec2})'
+
+    return f'vector.similarity.cosine({vec1}, {vec2})'
+
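The FalkorDB expression `(2 - vec.cosineDistance(a, b))/2` can be checked numerically. This sketch assumes `vec.cosineDistance` returns the plain cosine distance `1 - cos`; `cosine_similarity` and `normalized_score` are hypothetical helpers that recompute the score in Python rather than calling FalkorDB.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Raw cosine similarity, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_score(a: list[float], b: list[float]) -> float:
    """Mirror of '(2 - vec.cosineDistance(a, b)) / 2', assuming the
    FalkorDB function returns the cosine distance d = 1 - cos."""
    d = 1.0 - cosine_similarity(a, b)  # d in [0, 2]
    return (2.0 - d) / 2.0             # equals (1 + cos) / 2, in [0, 1]
```

Identical vectors score 1, orthogonal vectors 0.5, and opposite vectors 0, so scores from the FalkorDB and Neo4j paths land on the same scale.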
+
+def get_relationships_query(name: str, limit: int, provider: GraphProvider) -> str:
+    if provider == GraphProvider.FALKORDB:
+        label = NEO4J_TO_FALKORDB_MAPPING[name]
+        return f"CALL db.idx.fulltext.queryRelationships('{label}', $query)"
+
+    if provider == GraphProvider.KUZU:
+        label = INDEX_TO_LABEL_KUZU_MAPPING[name]
+        return f"CALL QUERY_FTS_INDEX('{label}', '{name}', cast($query AS STRING), TOP := $limit)"
+
+    return f'CALL db.index.fulltext.queryRelationships("{name}", $query, {{limit: $limit}})'
@@ -1,12 +1,14 @@
 import os
+import re
 import json
 import sqlite3
 import subprocess
 import hashlib
+import requests
 from pathlib import Path
-from datetime import datetime
+from datetime import datetime, timedelta
 from dotenv import load_dotenv
-from sentence_transformers import SentenceTransformer
+from sentence_transformers import SentenceTransformer, CrossEncoder
 import anthropic
 from fastapi import FastAPI, Request, Response, Depends, HTTPException, BackgroundTasks
 import psycopg2
@@ -38,6 +40,19 @@ load_dotenv(Path.home() / "aaronai" / ".env")

 MEMORY_PATH = Path.home() / "aaronai" / "memory.md"
 CONVERSATIONS_DB = str(Path.home() / "aaronai" / "conversations.db")
+
+def _connect(path):
+    conn = sqlite3.connect(path, timeout=5.0)
+    conn.execute("PRAGMA synchronous=NORMAL")
+    conn.execute("PRAGMA foreign_keys=ON")
+    return conn
+
+def _connect_conversations():
+    return _connect(CONVERSATIONS_DB)
+
+def _connect_sessions():
+    return _connect(SESSIONS_DB)
+
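SQLite enforces FOREIGN KEY clauses only on connections that have run `PRAGMA foreign_keys=ON`, so routing every open through `_connect` is what makes the `messages.conversation_id` reference in `init_conversations_db` bind. The sketch below uses an in-memory database and a trimmed-down copy of that schema (two columns per table; the real tables have more).

```python
import sqlite3


def _connect(path):
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA synchronous=NORMAL")
    # FK enforcement is off by default and must be enabled per connection,
    # outside any open transaction.
    conn.execute("PRAGMA foreign_keys=ON")
    return conn


conn = _connect(":memory:")
conn.execute("CREATE TABLE conversations (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE messages ("
    " id TEXT PRIMARY KEY,"
    " conversation_id TEXT NOT NULL,"
    " FOREIGN KEY (conversation_id) REFERENCES conversations(id))"
)

# An orphan message (no matching conversation) should be rejected.
try:
    conn.execute("INSERT INTO messages VALUES ('m1', 'no-such-conversation')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```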
 SETTINGS_PATH = Path.home() / "aaronai" / "settings.json"
 WATCHER_LOG = str(Path.home() / "aaronai" / "watcher.log")
 WATCHER_STATE = str(Path.home() / "aaronai" / "watcher_state.json")
@@ -73,11 +88,12 @@ WHISPER_PROMPT = (
 whisper_model = None
 if HAS_WHISPER:
    try:
-        whisper_model = WhisperModel("large-v3", device="cpu", compute_type="int8", cpu_threads=8)
+        whisper_model = WhisperModel("distil-large-v3", device="cpu", compute_type="int8", cpu_threads=4)
        print("Whisper model loaded")
    except Exception as e:
        print(f"Whisper not available: {e}")
 embedder = SentenceTransformer("all-MiniLM-L6-v2")
+reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
 # ChromaDB removed — using pgvector
 anthropic_client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

@@ -108,22 +124,65 @@ economical, specific, never performative. When answering questions,
 cite sources and acknowledge uncertainty rather than filling gaps with
 plausible-sounding content.

-You have access to his complete document corpus, conversation history,
-and a persistent memory file that carries his current context. Treat
-the memory file as ground truth for his present situation. Use web
-search automatically when current information is needed. Never
-re-brief on context that's already in memory or documents.
+You have a persistent memory file (always present below) that carries
+Aaron's current context — treat it as ground truth for his present
+situation.
+
+For anything beyond what's in memory, you have a retrieve_documents
+tool that searches his full knowledge base: personal documents,
+reading library, conversation transcripts, and journal entries. Call
+it whenever you need concrete information — names, dates, project
+specifics, prior thinking, exhibition records, syllabi, anything you
+don't already know. For compound questions, call it multiple times
+with different concrete queries; one call per distinct information
+need. Prefer specific tokens (named entities, project names, course
+codes) over abstract instructional phrasing — search "FWN3D
+consulting" not "my work." Results are unfiltered and ranked by
+semantic similarity; judge each chunk for relevance and ignore
+irrelevant hits rather than forcing them into the answer.
+
+You also have a search_facts tool that queries a knowledge graph of
+atomic facts about Aaron's entities and their relationships. The graph
+was populated through early May 2026 and is not currently being
+updated; treat it as a *historical* layer that holds biographical
+content (career, projects, consulting), exhibition records, key
+people, dossier-era claims, and time-stamped facts with explicit
+validity windows. For biographical or relational questions ("write
+me a bio", "what's the FWN3D / HVAMC relationship", "who did I
+consult for at IBM"), call search_facts *in addition to*
+retrieve_documents — the two return complementary shapes (atomic
+facts vs. document passages). For current-state questions, the
+persistent memory file is more authoritative than the graph.
+
+When Aaron asks for a document file — bio, cover letter, statement,
+CV section, anything he wants to send or edit outside chat — produce
+the full text as your chat reply first. NEVER call save_document on
+the same turn as the initial request, even when Aaron's phrasing
+includes words like "save", "output", "write", or "as docx/pdf" in
+the original ask. Those are part of the topic, not a save approval.
+The first call to save_document only happens in a *later* turn,
+after Aaron has read the draft and explicitly approves it — examples:
+"save it", "yes save it", "looks good, write it out", "go ahead".
+If Aaron asks for revisions, iterate in chat without calling
+save_document. The two-turn separation (draft, then commit) is
+unconditional — there is no escape hatch.
+
+Use web search automatically when current external information is
+needed. Never re-brief on context that's already in memory or
+retrieved chunks.

 When making factual claims about Aaron — his history, credentials, locations, dates, relationships, projects, or any specific event — you must ground the claim in a specific retrieved document or the memory file. Cite the source by name inline. If no source supports the claim, say so explicitly rather than filling the gap with plausible-sounding content. Do not confabulate. If you are inferring rather than citing, mark it as inference."""

 # Auth configuration
 import os
 SESSION_PASSWORD = os.getenv("AARON_AI_PASSWORD", "changeme")
+SESSION_MAX_AGE_SECONDS = 60 * 60 * 24 * 365  # one year; older tokens are purged in session_exists()
 SESSIONS_DB = str(Path.home() / "aaronai" / "sessions.db")

 def _init_sessions():
-    conn = sqlite3.connect(SESSIONS_DB)
+    conn = _connect_sessions()
    conn.execute("CREATE TABLE IF NOT EXISTS sessions (token TEXT PRIMARY KEY, created_at TEXT)")
+    conn.execute("PRAGMA journal_mode=WAL")
    conn.commit()
    conn.close()

@@ -136,20 +195,23 @@ def hash_password(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

 def save_session(token: str):
-    conn = sqlite3.connect(SESSIONS_DB)
+    conn = _connect_sessions()
    conn.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?)", (token, datetime.now().isoformat()))
    conn.commit()
    conn.close()

 def delete_session(token: str):
-    conn = sqlite3.connect(SESSIONS_DB)
+    conn = _connect_sessions()
    conn.execute("DELETE FROM sessions WHERE token = ?", (token,))
    conn.commit()
    conn.close()

 def session_exists(token: str) -> bool:
-    conn = sqlite3.connect(SESSIONS_DB)
-    row = conn.execute("SELECT 1 FROM sessions WHERE token = ?", (token,)).fetchone()
+    conn = _connect_sessions()
+    cutoff = (datetime.now() - timedelta(seconds=SESSION_MAX_AGE_SECONDS)).isoformat()
+    conn.execute("DELETE FROM sessions WHERE created_at < ?", (cutoff,))
+    conn.commit()
+    row = conn.execute("SELECT 1 FROM sessions WHERE token = ? AND created_at >= ?", (token, cutoff)).fetchone()
    conn.close()
    return row is not None
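The expiry check compares `created_at` strings lexicographically, which is chronological only because every row is written by `datetime.now().isoformat()` in the same naive format. This standalone sketch injects `now` (a change from the real function, for testability) and uses an in-memory `sessions` table.

```python
import sqlite3
from datetime import datetime, timedelta

SESSION_MAX_AGE_SECONDS = 60 * 60 * 24 * 365


def session_exists(conn: sqlite3.Connection, token: str, now: datetime) -> bool:
    # Same purge-then-lookup logic as above, with `now` passed in.
    cutoff = (now - timedelta(seconds=SESSION_MAX_AGE_SECONDS)).isoformat()
    conn.execute("DELETE FROM sessions WHERE created_at < ?", (cutoff,))
    conn.commit()
    row = conn.execute(
        "SELECT 1 FROM sessions WHERE token = ? AND created_at >= ?",
        (token, cutoff),
    ).fetchone()
    return row is not None


now = datetime(2026, 5, 22, 12, 0, 0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (token TEXT PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        ("fresh", (now - timedelta(days=30)).isoformat()),
        ("stale", (now - timedelta(days=400)).isoformat()),
    ],
)
conn.commit()
```

The first lookup also deletes the stale row, so the table shrinks as a side effect of ordinary auth checks.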

@@ -163,7 +225,7 @@ def require_auth(request: Request):
    return token

 def init_conversations_db():
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    c.execute('''CREATE TABLE IF NOT EXISTS conversations (
        id TEXT PRIMARY KEY,
@@ -182,6 +244,8 @@ def init_conversations_db():
        timestamp TEXT NOT NULL,
        FOREIGN KEY (conversation_id) REFERENCES conversations(id)
    )''')
+    c.execute("PRAGMA journal_mode=WAL")
+    c.execute("CREATE INDEX IF NOT EXISTS idx_messages_conv_ts ON messages(conversation_id, timestamp DESC)")
    conn.commit()
    conn.close()

@@ -223,34 +287,131 @@ def remove_from_memory(item):
    save_memory("\n".join(filtered))
    return len(lines) - len(filtered)

-def retrieve_context(query, n_results=8):
-    """Pure semantic retrieval over pgvector. Top-N by cosine similarity, threshold 0.3.
-    No CV pinning, no keyword routing — see architecture doc substrate-dependency section.
-    Substrate-level workarounds (entity-keyed routing, hybrid retrieval) live at the
-    Graphiti layer, not as wrapper logic above pgvector."""
+HYBRID_CANDIDATES = 30
+RRF_K = 60
+FINAL_LIMIT = 8
+MAX_RETRIEVALS_PER_TURN = 5
+MAX_CITED_SOURCES = 5
+
+_TSQUERY_SANITIZE_RE = re.compile(r"[^\w\s\"'-]")
+
+
+def _websearch_query(text: str) -> str:
+    """Replace every character except word characters, whitespace, quotes,
+    and hyphens with a space. websearch_to_tsquery itself interprets what
+    survives: quoted phrases, the word 'or', and a leading '-' for negation."""
+    return _TSQUERY_SANITIZE_RE.sub(" ", text).strip()
+
+
+def _rerank(query: str, candidates: list[tuple]) -> list[tuple]:
+    """Cross-encoder rerank. Candidates are (id, document, source, folder, created_at)
+    tuples. Returns the same tuples sorted by reranker score, with created_at as a
+    tie-breaker: when two chunks score identically (e.g., byte-identical copies of
+    a memory or journal file), the newer one wins. created_at is compared as a
+    string so NULL values ("") are never compared against datetimes."""
+    if not candidates:
+        return []
+    pairs = [(query, row[1]) for row in candidates]
+    scores = reranker.predict(pairs)
+    return [row for row, _ in sorted(
+        zip(candidates, scores),
+        key=lambda x: (float(x[1]), str(x[0][4] or "")),
+        reverse=True,
+    )]
+
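The created_at secondary key only matters on exact score ties, which is precisely what byte-identical copies produce. The sketch below replaces the cross-encoder with fixed scores (an assumption for testability) and shows one caveat: if created_at comes back from Postgres as a datetime and some rows are NULL, a key of `x[0][4] or ""` ends up comparing a datetime with a string on a tie and raises TypeError, while rendering it with `str()` does not.

```python
from datetime import datetime

# (id, document, source, folder, created_at) rows; scores are fixed
# stand-ins for reranker.predict() output.
rows = [
    (1, "cv text", "CV.docx", "apps/2019", datetime(2019, 1, 1)),
    (2, "cv text", "CV.docx", "apps/2024", datetime(2024, 1, 1)),
    (3, "cv text", "CV.docx", "", None),
]
scores = [0.9, 0.9, 0.9]  # identical chunks -> identical scores


def rank(rows, scores, key):
    return [row for row, _ in sorted(zip(rows, scores), key=key, reverse=True)]


def mixed_type_key(x):
    return (float(x[1]), x[0][4] or "")


def string_key(x):
    # str() puts NULL ("") and datetimes in one comparable space; timestamps
    # in a single ISO-style format sort chronologically as strings.
    return (float(x[1]), str(x[0][4] or ""))


try:
    rank(rows, scores, mixed_type_key)
    mixed_raised = False
except TypeError:
    mixed_raised = True

string_order = [row[0] for row in rank(rows, scores, string_key)]
```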
+
+def _format_source(source: str, folder: str) -> str:
+    """Surface folder context to the LLM so it can disambiguate same-named files
+    (e.g., 21 different CV.docx files across job-application folders)."""
+    source = source or "unknown"
+    if folder and folder not in ("", "."):
+        return f"{folder}/{source}"
+    return source
+
+
+def _dedup_key(doc: str) -> str:
+    """Collapse duplicate chunks by content: MD5 of the first 300 characters,
+    lowercased. Files copied to multiple folders produce byte-identical chunks;
+    this catches those while leaving different chunks of the same source
+    (e.g., separate sections of a conversation) alone. Chunks that share their
+    first 300 characters but diverge later are also collapsed."""
+    return hashlib.md5(doc[:300].lower().encode("utf-8", "ignore")).hexdigest()
+
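What `_dedup_key` does and does not collapse can be shown with a standalone copy plus a hypothetical `collapse_duplicates` helper that mirrors the `seen` loop in `retrieve_context`. The sample strings are placeholders, not corpus content.

```python
import hashlib


def _dedup_key(doc: str) -> str:
    return hashlib.md5(doc[:300].lower().encode("utf-8", "ignore")).hexdigest()


def collapse_duplicates(docs: list[str]) -> list[str]:
    """Keep the first chunk for each key, preserving input order."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        key = _dedup_key(doc)
        if key in seen:
            continue
        seen.add(key)
        kept.append(doc)
    return kept
```

Case differences collapse because of `.lower()`, and so do two long chunks that differ only after character 300.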
+
+def retrieve_context(query, n_results=FINAL_LIMIT):
+    """Hybrid retrieval (dense + lexical, RRF fused) followed by cross-encoder rerank.
+
+    - Dense (pgvector) handles paraphrase / semantic similarity.
+    - Lexical (tsvector) catches rare named tokens (FWN3D, Sono-Tek, course codes)
+      the embedding model has no signal for.
+    - RRF combines the two rankings without calibrating score scales.
+    - Cross-encoder rerank scores each (query, chunk) pair jointly.
+    - Duplicate collapse (see _dedup_key) on output so top-N slots aren't
+      spent on multi-folder copies of the same file.
+
+    No type or folder filtering: imposing a taxonomy at retrieval time is a
+    heuristic we've explicitly rejected. The reranker ranks, the caller (LLM)
+    decides what's relevant to its task."""
    query_embedding = embedder.encode([query]).tolist()[0]
+    ts_query = _websearch_query(query)
+
    context_pieces = []
    sources = []
+
    try:
        pg = get_pg()
        cur = pg.cursor()
+
        cur.execute("""
-            SELECT document, source, 1 - (embedding <=> %s::vector) as similarity
+            SELECT id, document, source, metadata->>'folder' AS folder, created_at
            FROM embeddings
            ORDER BY embedding <=> %s::vector
            LIMIT %s
-        """, (query_embedding, query_embedding, n_results))
-        for doc, source, similarity in cur.fetchall():
-            if similarity > 0.3:
-                context_pieces.append(doc)
-                sources.append(source or "unknown")
+        """, (query_embedding, HYBRID_CANDIDATES))
+        dense_hits = cur.fetchall()
+
+        lexical_hits = []
+        if ts_query:
+            cur.execute("""
+                SELECT id, document, source, metadata->>'folder' AS folder, created_at
+                FROM embeddings
+                WHERE to_tsvector('english', document)
+                      @@ websearch_to_tsquery('english', %s)
+                ORDER BY ts_rank(to_tsvector('english', document),
+                                 websearch_to_tsquery('english', %s)) DESC
+                LIMIT %s
+            """, (ts_query, ts_query, HYBRID_CANDIDATES))
+            lexical_hits = cur.fetchall()
+
        pg.close()
+
+        scores = {}
+        rows_by_id = {}
+        for rank, row in enumerate(dense_hits):
+            scores[row[0]] = scores.get(row[0], 0) + 1.0 / (RRF_K + rank + 1)
+            rows_by_id[row[0]] = row
+        for rank, row in enumerate(lexical_hits):
+            scores[row[0]] = scores.get(row[0], 0) + 1.0 / (RRF_K + rank + 1)
+            rows_by_id[row[0]] = row
+
+        rrf_ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
+        candidates = [rows_by_id[doc_id] for doc_id, _ in rrf_ranked]
+
+        seen = set()
+        for _id, doc, source, folder, _created_at in _rerank(query, candidates):
+            key = _dedup_key(doc)
+            if key in seen:
+                continue
+            seen.add(key)
+            context_pieces.append(doc)
+            sources.append(_format_source(source, folder))
+            if len(context_pieces) >= n_results:
+                break
+
    except Exception as e:
-        print(f"pgvector retrieval error: {e}")
+        print(f"hybrid retrieval error: {e}")
+
    return context_pieces, sources
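The fusion step in `retrieve_context` is Reciprocal Rank Fusion. Isolated from the database, it reduces to the sketch below; `rrf_fuse` is a hypothetical helper that takes ranked id lists instead of Postgres rows. Ranks are 1-based here, which matches the `1.0 / (RRF_K + rank + 1)` form over 0-based `enumerate` above.

```python
def rrf_fuse(*rankings: list[str], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank)
    to every id it contains; ids are returned by descending total."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Ids found by both retrievers outrank ids found by only one, using nothing but rank positions, which is why the cosine distances and `ts_rank` values never need to be put on a common scale.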

 def get_conversation_history(conversation_id, limit=20):
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    c.execute('''SELECT role, content FROM messages
                 WHERE conversation_id = ?
@@ -260,7 +421,7 @@ def get_conversation_history(conversation_id, limit=20):
    return [{"role": r[0], "content": r[1]} for r in reversed(rows)]

 def save_message(conversation_id, role, content, sources=None):
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    msg_id = hashlib.md5(f"{conversation_id}{role}{datetime.now().isoformat()}".encode()).hexdigest()
    timestamp = datetime.now().isoformat()
@@ -274,7 +435,7 @@ def save_message(conversation_id, role, content, sources=None):
    conn.close()

 def create_conversation(title="New conversation"):
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    conv_id = hashlib.md5(f"{datetime.now().isoformat()}".encode()).hexdigest()[:16]
    now = datetime.now().isoformat()
@@ -284,50 +445,370 @@ def create_conversation(title="New conversation"):
    conn.close()
    return conv_id

+NEXTCLOUD_URL = os.getenv("NEXTCLOUD_URL", "https://nextcloud.aaronnelson.studio")
+NEXTCLOUD_USER = os.getenv("NEXTCLOUD_USER", "aaron")
+NEXTCLOUD_PASSWORD = os.getenv("NEXTCLOUD_PASSWORD", "")
+DRAFTS_WEBDAV = f"{NEXTCLOUD_URL}/remote.php/dav/files/{NEXTCLOUD_USER}/Drafts"
+
+_FILENAME_SAFE_RE = re.compile(r"[^A-Za-z0-9_\-\. ]")
+
+
+GRAPHITI_URL = os.getenv("GRAPHITI_URL", "http://localhost:8001")
+GRAPHITI_GROUP_ID = os.getenv("GRAPHITI_GROUP_ID", "aaron")
+
+
+SEARCH_FACTS_TOOL = {
+    "name": "search_facts",
+    "description": (
+        "Search Aaron's knowledge graph for atomic facts about entities and "
+        "their relationships. The graph holds time-stamped facts captured up "
+        "to early May 2026 — biographical content (career, projects, "
+        "consulting), exhibition history, key relationships, dossier-era "
+        "claims. Returns short sentence-shaped facts with valid_at / "
+        "invalid_at timestamps so you can distinguish current state from "
+        "superseded history. Useful for: bios, 'who did I consult for', "
+        "'what's the relationship between X and Y', any question shaped like "
+        "a relational lookup. Complements retrieve_documents (which returns "
+        "longer chunk passages). Call this *in addition to* retrieve_documents "
+        "for biographical or relational questions — the two return "
+        "different shapes of evidence. The graph hasn't been updated since "
+        "early May 2026; for current-state questions, the persistent memory "
+        "file or recent documents are more authoritative."
+    ),
+    "input_schema": {
+        "type": "object",
+        "properties": {
+            "query": {
+                "type": "string",
+                "description": "The fact-shaped query. Concrete entity names work best.",
+            },
+        },
+        "required": ["query"],
+    },
+}
+
+
+def _push_chat_turn_to_graphiti(conversation_id, user_message, assistant_message):
+    """Fire-and-forget push of a chat turn into Graphiti on a daemon thread
+    (not asyncio). Single episode, default extraction, no
+    custom_extraction_instructions. Takes ~20 min in the background against
+    the current ~4,300-entity graph; the chat caller is not gated on this.
+    Errors are logged, never raised."""
+    if os.getenv("SKIP_GRAPHITI_CHAT_PUSH"):
+        return
+    if not (user_message or "").strip() and not (assistant_message or "").strip():
+        return
+    import threading
+    from datetime import datetime as _dt
+
+    def _work():
+        try:
+            episode_name = f"chat-{conversation_id[:8]}-{_dt.now().strftime('%Y%m%dT%H%M%S')}"
+            content = (
+                f"User: {user_message}\n\n"
+                f"Assistant: {assistant_message}"
+            )
+            payload = {
+                "name": episode_name,
+                "content": content,
+                "source_description": f"chat turn (conversation {conversation_id})",
+                "timestamp": _dt.now().isoformat(),
+                "group_id": GRAPHITI_GROUP_ID,
+            }
+            # Long timeout — sidecar add_episode against the current graph
+            # is empirically ~20 min wall-clock. We're patient; chat isn't.
+            r = requests.post(f"{GRAPHITI_URL}/episodes", json=payload, timeout=1800)
+            if r.status_code == 200:
+                print(f"[graphiti-push] turn ingested: {episode_name}", flush=True)
+            else:
+                print(f"[graphiti-push] non-200 ({r.status_code}) for {episode_name}: {r.text[:200]}", flush=True)
+        except requests.RequestException as e:
+            print(f"[graphiti-push] request failed: {e}", flush=True)
+        except Exception as e:
+            print(f"[graphiti-push] unexpected error: {e}", flush=True)
+
+    threading.Thread(target=_work, daemon=True).start()
+
+
+def _execute_search_facts(tool_input):
+    """Hit Graphiti /search, format the results as text for Claude."""
+    query = (tool_input or {}).get("query", "").strip()
+    if not query:
+        return "No query provided."
+    try:
+        r = requests.get(
+            f"{GRAPHITI_URL}/search",
+            params={"query": query, "limit": 8, "group_id": GRAPHITI_GROUP_ID},
+            timeout=15,
+        )
+    except requests.RequestException as e:
+        return f"search_facts: Graphiti unreachable ({e})."
+    if r.status_code != 200:
+        return f"search_facts: Graphiti returned {r.status_code}."
+    results = r.json().get("results", [])
+    if not results:
+        return f"No facts found for {query!r}."
+    lines = []
+    for i, f in enumerate(results, 1):
+        fact = (f.get("fact") or "").strip()  # tolerate "fact": null
+        valid_at = f.get("valid_at") or "?"
+        invalid_at = f.get("invalid_at")
+        validity = (f"valid {valid_at}" + (f" → superseded {invalid_at}"
+                                            if invalid_at and invalid_at != "None" else ""))
+        lines.append(f"[{i}] {fact}  ({validity})")
+    return "\n".join(lines)
+
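The text Claude receives from `search_facts` can be previewed offline. `format_facts` is a hypothetical extraction of the formatting loop above, and the result dicts are placeholders shaped like the fields the loop reads (`fact`, `valid_at`, `invalid_at`); they are not real Graphiti output.

```python
def format_facts(results: list[dict]) -> str:
    lines = []
    for i, f in enumerate(results, 1):
        fact = (f.get("fact") or "").strip()
        valid_at = f.get("valid_at") or "?"
        invalid_at = f.get("invalid_at")
        validity = f"valid {valid_at}"
        # A serialized "None" string counts as "still valid".
        if invalid_at and invalid_at != "None":
            validity += f" → superseded {invalid_at}"
        lines.append(f"[{i}] {fact}  ({validity})")
    return "\n".join(lines)
```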
+
+SAVE_DOCUMENT_TOOL = {
+    "name": "save_document",
+    "description": (
+        "Render markdown content to docx or pdf and save it to Aaron's Nextcloud "
+        "Drafts/ folder (syncs to his other devices and web UI). Use this for "
+        "document files (bios, cover letters, statements, CV sections, anything "
+        "he'll edit or send), but only after Aaron has read the draft in chat and "
+        "explicitly approved saving it; never on the same turn as the initial "
+        "request. Returns the saved filename. Pick a descriptive filename (no "
+        "extension) like 'Aaron_Nelson_Bio_Utah_2026-05'. Format is 'docx' for "
+        "editable drafts, 'pdf' for typeset/print-ready output. Content should be "
+        "well-formed markdown: # headings, **bold**, *italic*, - bulleted lists. "
+        "After saving, tell Aaron where it landed; no need to repeat the text."
+    ),
+    "input_schema": {
+        "type": "object",
+        "properties": {
+            "content": {
+                "type": "string",
+                "description": "Document content in markdown.",
+            },
+            "filename": {
+                "type": "string",
+                "description": "Descriptive filename without extension.",
+            },
+            "format": {
+                "type": "string",
+                "enum": ["docx", "pdf"],
+                "description": "Output format.",
+            },
+        },
+        "required": ["content", "filename", "format"],
+    },
+}
+
+
+def _safe_filename(name: str, ext: str) -> str:
+    """Strip path components and unsafe chars; force the requested extension."""
+    base = Path(name).name
+    base = _FILENAME_SAFE_RE.sub("_", base).strip().rstrip(".")
+    if not base:
+        base = "untitled"
+    base = Path(base).stem
+    return f"{base}.{ext}"
+
+
+def _webdav_unique_url(base_url: str, filename: str, auth) -> tuple[str, str]:
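+    """Return a WebDAV URL that doesn't collide with an existing file. Appends
+    _2, _3, ... until PROPFIND returns 404. Matches the convention dream.py uses.
+    Any status other than 404 or 200/207 (e.g. 401 on bad credentials) raises
+    instead of being mistaken for "name taken"."""
+    stem = Path(filename).stem
+    suffix = Path(filename).suffix
+    name = filename
+    for i in range(2, 52):  # original name, then _2 .. _50
+        url = f"{base_url}/{name}"
+        check = requests.request("PROPFIND", url, auth=auth,
+                                 headers={"Depth": "0"}, timeout=10)
+        if check.status_code == 404:
+            return url, name
+        if check.status_code not in (200, 207):
+            raise RuntimeError(f"PROPFIND {name} returned {check.status_code}")
+        name = f"{stem}_{i}{suffix}"
+    raise RuntimeError("could not find a free filename")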
+
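The collision-avoidance naming can be exercised without a WebDAV server by injecting the existence check. `unique_name` is a hypothetical helper: its `exists` callable stands in for the PROPFIND probe, where a 404 means the name is free.

```python
from pathlib import Path
from typing import Callable


def unique_name(filename: str, exists: Callable[[str], bool], max_tries: int = 50) -> str:
    """Try filename, then stem_2.suffix, stem_3.suffix, ... and return the
    first name for which exists() is False."""
    stem, suffix = Path(filename).stem, Path(filename).suffix
    name = filename
    for i in range(2, max_tries + 2):
        if not exists(name):
            return name
        name = f"{stem}_{i}{suffix}"
    raise RuntimeError("could not find a free filename")
```

Passing a set's `__contains__` as `exists` gives a quick in-memory model of the Drafts/ folder.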
+
+def _execute_save_document(tool_input):
+    """Generate a document via pandoc and PUT it to Nextcloud Drafts/.
+    Returns a user-facing status string for Claude to relay."""
+    if not NEXTCLOUD_PASSWORD:
+        return "save_document: NEXTCLOUD_PASSWORD not configured."
+
+    payload = tool_input or {}
+    content = payload.get("content", "")
+    raw_filename = payload.get("filename", "untitled")
+    fmt = payload.get("format", "docx")
+
+    if not content.strip():
+        return "save_document: empty content, nothing saved."
+    if fmt not in ("docx", "pdf"):
+        return f"save_document: unsupported format {fmt!r}; use 'docx' or 'pdf'."
+
+    safe_name = _safe_filename(raw_filename, fmt)
+    auth = (NEXTCLOUD_USER, NEXTCLOUD_PASSWORD)
+
+    # Ensure Drafts/ exists. 201 = created, 405 = already there — both fine.
+    try:
+        requests.request("MKCOL", DRAFTS_WEBDAV, auth=auth, timeout=10)
+    except requests.RequestException as e:
+        return f"save_document: could not reach Nextcloud ({e})."
+
+    try:
+        url, final_name = _webdav_unique_url(DRAFTS_WEBDAV, safe_name, auth)
+    except (requests.RequestException, RuntimeError) as e:
+        return f"save_document: filename probe failed ({e})."
+
+    cmd = ["pandoc", "-f", "markdown", "-t", fmt, "-o", "-"]
+    if fmt == "pdf":
+        cmd.insert(-2, "--pdf-engine=xelatex")
+    try:
+        proc = subprocess.run(
+            cmd, input=content.encode("utf-8"),
+            capture_output=True, timeout=120,
+        )
+    except subprocess.TimeoutExpired:
+        return "save_document: pandoc timed out (>120s)."
+    except FileNotFoundError:
+        return ("save_document: pandoc binary not reachable from the api process "
+                "(check that PATH in aaronai.service includes /usr/bin).")
+    if proc.returncode != 0:
+        err = proc.stderr.decode("utf-8", errors="replace")[:400]
+        return f"save_document: pandoc failed: {err}"
+
+    try:
+        put = requests.put(url, data=proc.stdout, auth=auth, timeout=60)
+    except requests.RequestException as e:
+        return f"save_document: WebDAV upload failed ({e})."
+    if put.status_code not in (200, 201, 204):
+        return f"save_document: WebDAV upload returned {put.status_code}."
+
+    return f"Saved to Nextcloud: Drafts/{final_name}"
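It is easy to misread where `cmd.insert(-2, ...)` places the engine flag. This hypothetical helper isolates the argv construction so it can be checked without running pandoc. The flags are the ones used above.

```python
def pandoc_cmd(fmt: str) -> list:
    """Build the pandoc argv used by _execute_save_document (sketch)."""
    cmd = ["pandoc", "-f", "markdown", "-t", fmt, "-o", "-"]
    if fmt == "pdf":
        # Index -2 is the position of "-o", so the flag lands just before it
        # and "-o -" (write to stdout) stays a contiguous pair.
        cmd.insert(-2, "--pdf-engine=xelatex")
    return cmd


print(pandoc_cmd("pdf"))
# ['pandoc', '-f', 'markdown', '-t', 'pdf', '--pdf-engine=xelatex', '-o', '-']
```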
+
+
+RETRIEVE_DOCUMENTS_TOOL = {
+    "name": "retrieve_documents",
+    "description": (
+        "Search Aaron's knowledge base — personal documents, reading library, "
+        "conversation transcripts, and journal entries — for content relevant "
+        "to a query. Call whenever you need concrete information you don't "
+        "already have from the persistent memory file. For compound questions "
+        "(e.g. 'bio emphasizing consulting work and recent research'), call "
+        "this tool multiple times with different concrete queries; one call "
+        "per distinct information need. Prefer specific named entities, "
+        "project names, course codes, or topic-specific terms over abstract "
+        "instructional phrasing — 'FWN3D consulting' retrieves better than "
+        "'my work'. Results are ranked by semantic + lexical hybrid retrieval "
+        "and a cross-encoder reranker; no taxonomy is applied, so judge each "
+        "returned chunk on its own merits and ignore irrelevant hits."
+    ),
+    "input_schema": {
+        "type": "object",
+        "properties": {
+            "query": {
+                "type": "string",
+                "description": "The search query. Use concrete terms.",
+            },
+        },
+        "required": ["query"],
+    },
+}
+
+
+def _execute_retrieve_documents(tool_input):
+    """Run retrieve_context for a tool call. Returns (tool_result_text, sources)."""
+    query = (tool_input or {}).get("query", "").strip()
+    if not query:
+        return ("No query provided.", [])
+    pieces, sources = retrieve_context(query)
+    if not pieces:
+        return (f"No results for query={query!r}.", [])
+    parts = []
+    for i, (piece, src) in enumerate(zip(pieces, sources), 1):
+        parts.append(f"[{i}] Source: {src}\n{piece}")
+    return ("\n\n---\n\n".join(parts), sources)
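The tool result Claude sees is a numbered list of chunks, each tagged with its source. Below is a sketch of that formatting with the retrieval call removed. `format_results` is a hypothetical name.

```python
def format_results(pieces: list, sources: list) -> str:
    """Number each chunk, prefix its source, and join with the same
    separator _execute_retrieve_documents uses."""
    parts = [f"[{i}] Source: {src}\n{piece}"
             for i, (piece, src) in enumerate(zip(pieces, sources), 1)]
    return "\n\n---\n\n".join(parts)


print(format_results(["alpha", "beta"], ["a.md", "b.pdf"]))
```

`zip` stops at the shorter list, so a length mismatch between `pieces` and `sources` silently drops chunks. On Python 3.10+ `zip(..., strict=True)` would raise instead.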
+
+
 def chat(user_message, conversation_id, settings, client_time=None):
    memory = load_memory()
-    context_pieces, sources = retrieve_context(user_message)
    history = get_conversation_history(conversation_id)

-    context_parts = []
-    if client_time:
-        context_parts.append(f"Current time (user-supplied, not logged): {client_time}")
+    # System prompt + persistent memory are stable across the tool_use round-trip
+    # and across turns within the 5-minute cache TTL. Putting cache_control on the
+    # last system block creates a cache breakpoint here — the second LLM call in a
+    # tool_use turn reads this prefix from cache (~10% of standard input cost)
+    # instead of re-billing it. Memory lives here (not in the user message) so its
+    # position stays stable for cache hits.
+    system_blocks = [{"type": "text", "text": SYSTEM_PROMPT}]
    if memory:
-        context_parts.append(f"Aaron's persistent memory:\n\n{memory}")
-    if context_pieces:
-        context_str = "\n\n---\n\n".join(context_pieces)
-        unique_sources = list(set(sources))
-        context_parts.append(
-            f"Relevant excerpts from Aaron's documents:\n\n{context_str}\n\nSources: {', '.join(unique_sources)}"
+        system_blocks.append({
+            "type": "text",
+            "text": f"Aaron's persistent memory:\n\n{memory}",
+        })
+    system_blocks[-1]["cache_control"] = {"type": "ephemeral"}
+
+    # client_time is per-turn dynamic, so it stays out of the cached prefix.
+    if client_time:
+        full_message = (
+            f"Current time (user-supplied, not logged): {client_time}\n\n"
+            f"---\n\n{user_message}"
        )
-    context_block = "\n\n====\n\n".join(context_parts) + "\n\n---\n\n" if context_parts else ""
-    full_message = context_block + user_message
+    else:
+        full_message = user_message

    messages = history + [{"role": "user", "content": full_message}]

-    tools = [{"type": "web_search_20250305", "name": "web_search"}] if settings.get("web_search", True) else []
+    tools = [RETRIEVE_DOCUMENTS_TOOL, SEARCH_FACTS_TOOL, SAVE_DOCUMENT_TOOL]
+    if settings.get("web_search", True):
+        tools.append({"type": "web_search_20250305", "name": "web_search"})
+
+    accumulated_sources = []
+    retrieval_count = 0

    while True:
-        kwargs = {
-            "model": "claude-sonnet-4-6",
-            "max_tokens": 2048,
-            "system": SYSTEM_PROMPT,
-            "messages": messages
-        }
-        if tools:
-            kwargs["tools"] = tools
-
-        response = anthropic_client.messages.create(**kwargs)
+        response = anthropic_client.messages.create(
+            model="claude-sonnet-4-6",
+            max_tokens=2048,
+            system=system_blocks,
+            messages=messages,
+            tools=tools,
+        )

        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
-                if block.type == "tool_use":
+                if block.type != "tool_use":
+                    continue
+                if block.name == "retrieve_documents":
+                    if retrieval_count >= MAX_RETRIEVALS_PER_TURN:
+                        result_text = (
+                            f"Retrieval budget exhausted "
+                            f"({MAX_RETRIEVALS_PER_TURN} calls used this turn). "
+                            "Answer with the information you already have or "
+                            "tell Aaron you need a more focused question."
+                        )
+                    else:
+                        result_text, result_sources = _execute_retrieve_documents(block.input)
+                        accumulated_sources.extend(result_sources)
+                        retrieval_count += 1
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
-                        "content": "Search completed"
+                        "content": result_text,
+                    })
+                elif block.name == "search_facts":
+                    result_text = _execute_search_facts(block.input)
+                    tool_results.append({
+                        "type": "tool_result",
+                        "tool_use_id": block.id,
+                        "content": result_text,
+                    })
+                elif block.name == "save_document":
+                    result_text = _execute_save_document(block.input)
+                    tool_results.append({
+                        "type": "tool_result",
+                        "tool_use_id": block.id,
+                        "content": result_text,
+                    })
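+                else:
+                    # Unrecognized client-side tool: report an error rather than
+                    # a fake success so the model doesn't act on an empty result.
+                    tool_results.append({
+                        "type": "tool_result",
+                        "tool_use_id": block.id,
+                        "content": f"Unknown tool {block.name!r}; no result.",
+                        "is_error": True,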
                    })
            messages.append({"role": "user", "content": tool_results})
        else:
@@ -335,7 +816,18 @@ def chat(user_message, conversation_id, settings, client_time=None):
            for block in response.content:
                if hasattr(block, "text"):
                    assistant_message += block.text
-            return assistant_message, list(set(sources))
+            # Async fire-and-forget into Graphiti so the turn lands in the
+            # graph as a single episode for future search_facts queries to
+            # find. Takes ~20 min wall-clock in the background; chat returns
+            # immediately. Disable via SKIP_GRAPHITI_CHAT_PUSH=1 if needed.
+            _push_chat_turn_to_graphiti(conversation_id, user_message, assistant_message)
+            # Cap citations: accumulated_sources can grow large across multiple
+            # retrieve_documents calls and not every chunk that came back was
+            # actually used in the answer. Insertion order preserves rank
+            # (each call returns chunks reranker-ordered, so the earliest
+            # entries are the highest-relevance from the most direct queries).
+            deduped = list(dict.fromkeys(accumulated_sources))
+            return assistant_message, deduped[:MAX_CITED_SOURCES]
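The per-turn retrieval budget and the citation cap can be separated from the API call. The sketch below uses a stubbed retriever. `MAX_RETRIEVALS_PER_TURN` and `MAX_CITED_SOURCES` are defined outside this hunk, so the values here are hypothetical, as are `TurnState`, `run_retrieval` and `cited_sources`.

```python
from dataclasses import dataclass, field

MAX_RETRIEVALS_PER_TURN = 3   # hypothetical; the real value is defined elsewhere
MAX_CITED_SOURCES = 5         # hypothetical; the real value is defined elsewhere


@dataclass
class TurnState:
    retrieval_count: int = 0
    sources: list = field(default_factory=list)


def run_retrieval(state, query, retrieve):
    """Budgeted retrieve_documents call. `retrieve` stands in for
    _execute_retrieve_documents and returns (text, sources)."""
    if state.retrieval_count >= MAX_RETRIEVALS_PER_TURN:
        return (f"Retrieval budget exhausted "
                f"({MAX_RETRIEVALS_PER_TURN} calls used this turn).")
    text, srcs = retrieve(query)
    state.sources.extend(srcs)
    state.retrieval_count += 1
    return text


def cited_sources(state):
    # dict.fromkeys dedupes while keeping first-seen order, so sources from
    # earlier, reranker-ordered calls stay ahead of later ones.
    return list(dict.fromkeys(state.sources))[:MAX_CITED_SOURCES]


def fake_retrieve(query):
    return f"results for {query}", [f"{query}.md", "shared.md"]


state = TurnState()
replies = [run_retrieval(state, q, fake_retrieve) for q in ["a", "b", "c", "d"]]
print(replies[-1])            # Retrieval budget exhausted (3 calls used this turn).
print(cited_sources(state))   # ['a.md', 'shared.md', 'b.md', 'c.md']
```

In `chat()`, only `retrieve_documents` calls count against the budget. `search_facts` and `save_document` are not counted, so the `while True` loop depends on the model stopping on its own for those.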

 from contextlib import asynccontextmanager

@@ -365,7 +857,7 @@ async def login(request: Request, response: Response):
        httponly=True,
        secure=True,
        samesite="lax",
-        max_age=60 * 60 * 24 * 30
+        max_age=SESSION_MAX_AGE_SECONDS
    )
    response.body = b'{"ok": true}'
    response.status_code = 200
@@ -409,7 +901,7 @@ async def update_settings(request: Request, auth: str = Depends(require_auth)):

@app.get("/api/conversations")
 async def list_conversations(auth: str = Depends(require_auth)):
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    c.execute('''SELECT id, title, created_at, updated_at, message_count
                 FROM conversations ORDER BY updated_at DESC LIMIT 100''')
@@ -429,7 +921,7 @@ async def new_conversation(request: Request, auth: str = Depends(require_auth)):

@app.get("/api/conversations/{conv_id}/messages")
 async def get_messages(conv_id: str, auth: str = Depends(require_auth)):
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    c.execute('''SELECT role, content, sources, timestamp FROM messages
                 WHERE conversation_id = ? ORDER BY timestamp ASC''', (conv_id,))
@@ -446,7 +938,7 @@ async def rename_conversation(conv_id: str, request: Request, auth: str = Depend
    title = data.get("title", "")
    if not title:
        return JSONResponse({"error": "Title required"}, status_code=400)
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    c.execute("UPDATE conversations SET title = ? WHERE id = ?", (title, conv_id))
    conn.commit()
@@ -455,7 +947,7 @@ async def rename_conversation(conv_id: str, request: Request, auth: str = Depend

@app.delete("/api/conversations/{conv_id}")
 async def delete_conversation(conv_id: str, auth: str = Depends(require_auth)):
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    c.execute("DELETE FROM messages WHERE conversation_id = ?", (conv_id,))
    c.execute("DELETE FROM conversations WHERE id = ?", (conv_id,))
@@ -500,14 +992,14 @@ async def chat_endpoint(request: Request, auth: str = Depends(require_auth)):
    save_message(conversation_id, "user", user_message)

    # Auto-title conversation from first message
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    c.execute("SELECT message_count, title FROM conversations WHERE id = ?", (conversation_id,))
    row = c.fetchone()
    conn.close()
    if row and row[0] <= 1 and row[1] == "New conversation":
        auto_title = user_message[:60] + ("..." if len(user_message) > 60 else "")
-        conn = sqlite3.connect(CONVERSATIONS_DB)
+        conn = _connect_conversations()
        c = conn.cursor()
        c.execute("UPDATE conversations SET title = ? WHERE id = ?", (auto_title, conversation_id))
        conn.commit()
@@ -587,7 +1079,7 @@ async def get_status(auth: str = Depends(require_auth)):
        pass

    # Conversation count
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    c.execute("SELECT COUNT(*) FROM conversations")
    conv_count = c.fetchone()[0]
@@ -623,6 +1115,7 @@ async def transcribe_audio(request: Request, audio: UploadFile = File(...), auth
            tmp_path,
            language="en",
            vad_filter=True,
+            beam_size=1,
            initial_prompt=WHISPER_PROMPT
        )
        transcript = " ".join(s.text.strip() for s in segments)
@@ -669,44 +1162,92 @@ async def run_dreamer(request: Request, auth: str = Depends(require_auth)):
        return JSONResponse({"started": False, "error": str(e)})

 def transcribe_and_save(tmp_path, timestamp, nextcloud_url, nextcloud_user, nextcloud_password):
-    """Background task — transcribes audio and saves to Nextcloud after endpoint returns."""
+    """Background task: transcribes audio and saves to Nextcloud after the endpoint returns.
+    Every terminal path attempts to archive the audio to Journal/Media/ and writes a
+    markdown record to Journal/Captures/ with a status field (saved, empty_transcript,
+    or failed_transcription). If archival fails, the record carries audio_archive_failed."""
    import requests as req_lib
    nc_auth = (nextcloud_user, nextcloud_password)
-    try:
-        segments, _ = whisper_model.transcribe(
-            tmp_path, language="en", vad_filter=True, initial_prompt=WHISPER_PROMPT
-        )
-        transcript = " ".join(s.text.strip() for s in segments).strip()
-        os.unlink(tmp_path)
-        if not transcript:
-            print(f"Async transcription empty for {timestamp} — nothing saved")
-            return
-        filename = f"{timestamp}-voice.md"
-        content_md = f"# Capture — {timestamp}\n\n**type:** voice\n**modality:** audio\n**status:** unprocessed\n\n---\n\n{transcript}\n"
-        captures_dir = f"{nextcloud_url}/remote.php/dav/files/{nextcloud_user}/Journal/Captures"
-        req_lib.request("MKCOL", captures_dir, auth=nc_auth, timeout=10)
-        url = f"{captures_dir}/{filename}"
-        req_lib.put(url, data=content_md.encode("utf-8"), auth=nc_auth, timeout=30)
-        print(f"Async transcription saved: {filename}")
-        # Notify SSE clients that transcription is complete
+    month_dir = timestamp[:7]
+    audio_ext = os.path.splitext(tmp_path)[1] or ".webm"
+    audio_filename = f"{timestamp}-voice{audio_ext}"
+    audio_relpath = f"Journal/Media/{month_dir}/{audio_filename}"
+
+    def archive_audio() -> bool:
        try:
-            import requests as _req
-            _req.post("http://localhost:8000/api/events/notify", json={
-                "type": "capture_saved",
-                "filename": filename,
-                "timestamp": timestamp,
-            }, timeout=3)
-            _req.post("http://localhost:8000/api/captures/events/notify", json={
-                "type": "capture_saved",
-                "filename": filename,
-                "timestamp": timestamp,
-            }, timeout=3)
+            with open(tmp_path, "rb") as f:
+                audio_bytes = f.read()
+            media_parent = f"{nextcloud_url}/remote.php/dav/files/{nextcloud_user}/Journal/Media"
+            media_dir = f"{media_parent}/{month_dir}"
+            req_lib.request("MKCOL", media_parent, auth=nc_auth, timeout=10)
+            req_lib.request("MKCOL", media_dir, auth=nc_auth, timeout=10)
+            put = req_lib.put(f"{media_dir}/{audio_filename}", data=audio_bytes, auth=nc_auth, timeout=60)
+            put.raise_for_status()  # a 4xx/5xx PUT means the audio was not archived
+            return True
+        except Exception as e:
+            print(f"Audio archival failed for {timestamp}: {e}")
+            return False
+        finally:
+            if os.path.exists(tmp_path):
+                os.unlink(tmp_path)
+
+    def write_capture(filename: str, content_md: str, status: str):
+        captures_dir = f"{nextcloud_url}/remote.php/dav/files/{nextcloud_user}/Journal/Captures"
+        try:
+            req_lib.request("MKCOL", captures_dir, auth=nc_auth, timeout=10)
+            put = req_lib.put(f"{captures_dir}/{filename}", data=content_md.encode("utf-8"), auth=nc_auth, timeout=30)
+            put.raise_for_status()  # treat a rejected upload as a failed write
+        except Exception as e:
+            print(f"Capture markdown write failed for {timestamp}: {e}")
+            return
+        try:
+            payload = {"type": "capture_saved", "filename": filename, "timestamp": timestamp, "status": status}
+            req_lib.post("http://localhost:8000/api/events/notify", json=payload, timeout=3)
+            req_lib.post("http://localhost:8000/api/captures/events/notify", json=payload, timeout=3)
        except Exception:
            pass
+
+    transcript = ""
+    transcribe_error = None
+    try:
+        segments, _ = whisper_model.transcribe(
+            tmp_path, language="en", vad_filter=True, beam_size=1, initial_prompt=WHISPER_PROMPT
+        )
+        transcript = " ".join(s.text.strip() for s in segments).strip()
    except Exception as e:
-        if os.path.exists(tmp_path):
-            os.unlink(tmp_path)
-        print(f"Async transcription failed for {timestamp}: {e}")
+        transcribe_error = str(e)
+
+    audio_archived = archive_audio()
+    audio_line = f"**audio_path:** {audio_relpath}\n" if audio_archived else "**audio_archive_failed:** true\n"
+
+    if transcribe_error is not None:
+        filename = f"{timestamp}-voice-failed.md"
+        content_md = (
+            f"# Capture — {timestamp}\n\n"
+            f"**type:** voice\n**modality:** audio\n**status:** failed_transcription\n"
+            f"{audio_line}"
+            f"**error:** {transcribe_error}\n"
+        )
+        write_capture(filename, content_md, "failed_transcription")
+        print(f"Async transcription failed for {timestamp}: {transcribe_error}")
+        return
+
+    if not transcript:
+        filename = f"{timestamp}-voice-empty.md"
+        content_md = (
+            f"# Capture — {timestamp}\n\n"
+            f"**type:** voice\n**modality:** audio\n**status:** empty_transcript\n"
+            f"{audio_line}"
+        )
+        write_capture(filename, content_md, "empty_transcript")
+        print(f"Async transcription empty for {timestamp}: audio archived")
+        return
+
+    filename = f"{timestamp}-voice.md"
+    content_md = (
+        f"# Capture — {timestamp}\n\n"
+        f"**type:** voice\n**modality:** audio\n**status:** saved\n"
+        f"{audio_line}\n---\n\n{transcript}\n"
+    )
+    write_capture(filename, content_md, "saved")
+    print(f"Async transcription saved: {filename}")
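The three terminal branches of `transcribe_and_save` reduce to a pure mapping from the transcription outcome to a filename and status, plus a path rule for the archived audio. Below is a sketch of that mapping. The helper names are hypothetical, and the timestamp format is not shown in this hunk, so the example assumes only that it starts with `YYYY-MM`.

```python
import os
from typing import Optional, Tuple


def capture_outcome(transcript: str, error: Optional[str], timestamp: str) -> Tuple[str, str]:
    """(capture filename, status) following transcribe_and_save's branches:
    a transcription error wins over an empty transcript."""
    if error is not None:
        return f"{timestamp}-voice-failed.md", "failed_transcription"
    if not transcript:
        return f"{timestamp}-voice-empty.md", "empty_transcript"
    return f"{timestamp}-voice.md", "saved"


def media_relpath(timestamp: str, tmp_path: str) -> str:
    """Journal/Media/<YYYY-MM>/<timestamp>-voice<ext>; .webm when tmp has no extension."""
    ext = os.path.splitext(tmp_path)[1] or ".webm"
    return f"Journal/Media/{timestamp[:7]}/{timestamp}-voice{ext}"


ts = "2026-05-20-180443"  # hypothetical timestamp; only the YYYY-MM prefix matters here
print(capture_outcome("", None, ts))        # ('2026-05-20-180443-voice-empty.md', 'empty_transcript')
print(media_relpath(ts, "/tmp/tmpx1.m4a"))  # Journal/Media/2026-05/2026-05-20-180443-voice.m4a
```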


@app.post("/api/capture")
@@ -760,7 +1301,7 @@ async def capture_endpoint(
                    tmp.write(audio_bytes)
                    tmp_audio_path = tmp.name
                segments, _ = whisper_model.transcribe(
-                    tmp_audio_path, language="en", vad_filter=True, initial_prompt=WHISPER_PROMPT
+                    tmp_audio_path, language="en", vad_filter=True, beam_size=1, initial_prompt=WHISPER_PROMPT
                )
                voice_annotation = " ".join(s.text.strip() for s in segments).strip() or None
                os.unlink(tmp_audio_path)
@@ -813,7 +1354,7 @@ Keep the full description to 150-250 words. Do not speculate beyond what is visi

 **type:** {capture_type}
 **modality:** {modality}
-**status:** unprocessed
+**status:** saved
 **media:** {media_path}
 {f"**project:** {project}" if project else ""}

@@ -969,7 +1510,7 @@ async def reindex_status(auth: str = Depends(require_auth)):

@app.delete("/api/conversations")
 async def clear_all_conversations(auth: str = Depends(require_auth)):
-    conn = sqlite3.connect(CONVERSATIONS_DB)
+    conn = _connect_conversations()
    c = conn.cursor()
    c.execute("DELETE FROM messages")
    c.execute("DELETE FROM conversations")
@@ -0,0 +1,128 @@
+"""One-off: backfill last_consolidated_at + consolidation_count on embeddings
+from the dream-manifest-*.json files already in Journal/Dreams/.
+
+Why this exists: the consolidation cursor columns added by the dreamer
+redesign migration default to NULL / 0. Without history, the
+underprocessed-count signal in dream_observation.observe_corpus() reports
+"every chunk is underprocessed" (degenerate percentile), and NREM has no
+basis to bias replay toward least-recently-consolidated chunks.
+
+We have ~25 historical dream manifests in Nextcloud/Journal/Dreams/, each
+listing the sources retrieved per stage. For each (manifest, source) pair
+this script:
+  - finds matching embeddings rows by source (basename match)
+  - increments consolidation_count by 1
+  - updates last_consolidated_at to the manifest date (UTC midnight)
+
+Idempotent: re-running will not double-count because we drop existing
+cursor values to NULL/0 before backfilling. Pass --dry-run to print what
+would change without writing.
+"""
+
+import json
+import os
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+from dotenv import load_dotenv
+import psycopg2
+
+load_dotenv(Path.home() / "aaronai" / ".env", override=True)
+
+PG_DSN = os.getenv("PG_DSN")
+DREAMS_DIR = Path("/home/aaron/nextcloud/data/data/aaron/files/Journal/Dreams")
+DRY_RUN = "--dry-run" in sys.argv
+
+
+def get_pg():
+    return psycopg2.connect(PG_DSN)
+
+
+def collect_manifest_records():
+    """Return a list of (source, manifest_date_utc) tuples from all
+    dream-manifest-*.json files, one per listing of a source in a manifest stage."""
+    pairs = []
+    if not DREAMS_DIR.exists():
+        return pairs
+    for path in sorted(DREAMS_DIR.glob("dream-manifest-*.json")):
+        try:
+            m = json.loads(path.read_text())
+        except Exception as e:
+            print(f"  skip {path.name}: {e}")
+            continue
+        date_str = m.get("date")
+        if not date_str:
+            continue
+        try:
+            dt = datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc)
+        except ValueError:
+            continue
+        stages = m.get("stages") or {}
+        for stage_name in ("nrem", "early_rem", "late_rem", "synthesis"):
+            stage = stages.get(stage_name) or {}
+            for src in (stage.get("sources") or []):
+                if src:
+                    pairs.append((src, dt))
+    return pairs
+
+
+def main():
+    print(f"Mode: {'DRY-RUN' if DRY_RUN else 'APPLY'}")
+    print(f"Scanning manifests in {DREAMS_DIR}")
+    pairs = collect_manifest_records()
+    print(f"Collected {len(pairs)} (source, manifest_date) pairs across all manifests")
+    if not pairs:
+        print("Nothing to backfill.")
+        return
+
+    # Aggregate per source: count + latest date
+    from collections import defaultdict
+    counts = defaultdict(int)
+    latest = {}
+    for src, dt in pairs:
+        counts[src] += 1
+        if src not in latest or dt > latest[src]:
+            latest[src] = dt
+    print(f"Unique sources to update: {len(counts)}")
+
+    # Sample what we'd write
+    print("Sample (top 5 by appearance count):")
+    for src, n in sorted(counts.items(), key=lambda kv: -kv[1])[:5]:
+        print(f"  {n:>3} appearances — {src} → last_consolidated_at = {latest[src].date()}")
+
+    if DRY_RUN:
+        print("\nDry-run only. Re-run without --dry-run to apply.")
+        return
+
+    pg = get_pg()
+    cur = pg.cursor()
+
+    # Reset cursor for any sources we're about to backfill so reruns are clean.
+    print("\nResetting cursor for sources we'll touch...")
+    sources = list(counts.keys())
+    cur.execute(
+        "UPDATE embeddings SET last_consolidated_at = NULL, consolidation_count = 0 "
+        "WHERE source = ANY(%s)",
+        (sources,),
+    )
+    print(f"  reset {cur.rowcount} embeddings rows")
+
+    # Apply per-source updates. For each source, set count and latest date.
+    print("Applying per-source backfill...")
+    updated_rows = 0
+    for src, n in counts.items():
+        cur.execute(
+            "UPDATE embeddings "
+            "SET consolidation_count = %s, last_consolidated_at = %s "
+            "WHERE source = %s",
+            (n, latest[src], src),
+        )
+        updated_rows += cur.rowcount
+    pg.commit()
+    pg.close()
+    print(f"Done. Updated {updated_rows} embeddings rows across {len(counts)} unique sources.")
+
+
+if __name__ == "__main__":
+    main()
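The core of the backfill is the per-source reduction in `main()`: a count of appearances and the latest manifest date. Below is that step in isolation. `aggregate` is a hypothetical name, and the sources and dates are made up.

```python
from collections import Counter
from datetime import datetime, timezone


def aggregate(pairs):
    """Collapse (source, manifest_date) pairs into
    {source: (consolidation_count, last_consolidated_at)}, as main() does."""
    counts = Counter(src for src, _ in pairs)
    latest = {}
    for src, dt in pairs:
        if src not in latest or dt > latest[src]:
            latest[src] = dt
    return {src: (counts[src], latest[src]) for src in counts}


may1 = datetime(2026, 5, 1, tzinfo=timezone.utc)
may3 = datetime(2026, 5, 3, tzinfo=timezone.utc)
result = aggregate([("a.md", may1), ("b.pdf", may1), ("a.md", may3)])
print(result["a.md"][0], result["a.md"][1].date())   # 2 2026-05-03
```

At ~25 manifests the one-UPDATE-per-source loop is fine. For a much larger backfill, `psycopg2.extras.execute_values` with an `UPDATE ... FROM (VALUES %s)` join would batch it into a single statement.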
@@ -6,7 +6,7 @@ mkdir -p "$BACKUP_DIR"
 # Copy critical files
 cp ~/aaronai/memory.md "$BACKUP_DIR/memory-$DATE.md"
 cp ~/aaronai/settings.json "$BACKUP_DIR/settings-$DATE.json"
-cp ~/aaronai/conversations.db "$BACKUP_DIR/conversations-$DATE.db"
+python3 -c "import sqlite3, sys; src = sqlite3.connect('$HOME/aaronai/conversations.db'); dst = sqlite3.connect('$BACKUP_DIR/conversations-$DATE.db'); src.backup(dst); dst.close(); src.close()"

 # Keep only last 7 days
 find "$BACKUP_DIR" -name "*.md" -mtime +7 -delete
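The backup line replaces `cp` with SQLite's online backup API (`sqlite3.Connection.backup`, Python 3.7+). That API produces a consistent snapshot even while the API process has the database open, whereas a plain file copy can catch it mid-write. Here is the same call unrolled into a readable function, with a throwaway demo database. `backup_sqlite` and `row_count` are hypothetical helpers.

```python
import os
import sqlite3
import tempfile


def backup_sqlite(src_path: str, dst_path: str) -> None:
    """Snapshot src_path into dst_path via SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def row_count(db_path: str, table: str) -> int:
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()


# Demo on a throwaway database, not the real conversations.db.
with tempfile.TemporaryDirectory() as tmp:
    live = os.path.join(tmp, "conversations.db")
    conn = sqlite3.connect(live)
    conn.execute("CREATE TABLE messages (content TEXT)")
    conn.executemany("INSERT INTO messages VALUES (?)", [("hi",), ("there",)])
    conn.commit()
    conn.close()

    copy = os.path.join(tmp, "conversations-backup.db")
    backup_sqlite(live, copy)
    copied_rows = row_count(copy, "messages")

print(copied_rows)  # 2
```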
@@ -16,12 +16,14 @@ import os
 import json
 import sqlite3
 import argparse
+from functools import lru_cache
 from collections import Counter
 from pathlib import Path
 from datetime import datetime, timedelta
 from dotenv import load_dotenv
 import psycopg2
 import hashlib
+import numpy as np

 load_dotenv(Path.home() / "aaronai" / ".env", override=True)

@@ -41,6 +43,26 @@ NEXTCLOUD_USER     = os.getenv("NEXTCLOUD_USER", "aaron")
 NEXTCLOUD_PASSWORD = os.getenv("NEXTCLOUD_PASSWORD", "")
 DREAMS_WEBDAV      = f"{NEXTCLOUD_URL}/remote.php/dav/files/{NEXTCLOUD_USER}/Journal/Dreams"

+# ─── Retrieval-window config (per dreamer-multimodal-design.md §2) ─────────
+# Biological grounding: NREM replays recent traces (24-72 hrs); REM links
+# across time on structural similarity, not temporal proximity. Synthesis
+# pulls from salience across the full corpus (no window). The spec asks for
+# these windows to be tunable rather than hardcoded; this dict is where they live.
+TIME_WINDOWS_HOURS = {
+    "nrem":      72,            # 24-72 hrs, take wider end
+    "early-rem": 24 * 30,       # 30 days
+    "late-rem":  24 * 90,       # 90 days
+    "lucid":     None,          # no window
+}
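This hunk only defines the windows. How `retrieve()` applies them is outside it. One plausible reading is a lower bound on chunk timestamps, for example a `WHERE <timestamp column> >= cutoff` filter. The column is not named here, and `window_cutoff` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Copied from the config above.
TIME_WINDOWS_HOURS = {"nrem": 72, "early-rem": 24 * 30, "late-rem": 24 * 90, "lucid": None}


def window_cutoff(mode: str, now: datetime) -> Optional[datetime]:
    """Earliest timestamp a chunk may carry to be eligible for `mode`,
    or None when the mode has no window (lucid)."""
    hours = TIME_WINDOWS_HOURS[mode]
    return None if hours is None else now - timedelta(hours=hours)


now = datetime(2026, 5, 22, tzinfo=timezone.utc)
print(window_cutoff("nrem", now).date())   # 2026-05-19
print(window_cutoff("lucid", now))         # None
```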
+
+# Maximal Marginal Relevance: λ=1 → pure relevance, λ=0 → pure diversity.
+# 0.5 is the standard balance; tune later if the dossier-cluster problem
+# isn't sufficiently broken up.
+MMR_LAMBDA = 0.5
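`MMR_LAMBDA` controls the relevance/diversity trade-off in Maximal Marginal Relevance. The re-ranking code that consumes it is not in this hunk. Below is a standard greedy MMR over cosine similarity with NumPy, as an illustration of the technique rather than dream.py's actual implementation. The `mmr` name and signature are hypothetical. In the dreamer, the inputs would presumably be all-MiniLM-L6-v2 vectors.

```python
import numpy as np


def mmr(query_vec, doc_vecs, k, lam=0.5):
    """Greedy Maximal Marginal Relevance with cosine similarity.
    Each step picks the candidate maximizing
        lam * sim(query, d) - (1 - lam) * max over selected s of sim(d, s).
    Returns indices into doc_vecs in selection order."""
    q = np.asarray(query_vec, dtype=float)
    D = np.asarray(doc_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    relevance = D @ q          # cosine(query, doc) for every doc
    pairwise = D @ D.T         # cosine(doc_i, doc_j)
    selected, candidates = [], list(range(len(D)))
    while candidates and len(selected) < k:
        if selected:
            redundancy = pairwise[np.ix_(candidates, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.10],    # on-topic
        [1.0, 0.12],    # near-duplicate of doc 0
        [0.7, -0.70]]   # less relevant but different
print(mmr(query, docs, k=2, lam=1.0))   # [0, 1]  pure relevance
print(mmr(query, docs, k=2, lam=0.5))   # [0, 2]  the MMR_LAMBDA balance
```

With λ=1 the near-duplicate wins the second slot. At 0.5 its redundancy with the first pick outweighs its small relevance edge, which is the cluster-breaking behavior the comment above is after.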
+
+# Fast/cheap model for query generation. Sonnet for synthesis (in synthesize_*).
+LLM_QUERY_MODEL = os.getenv("DREAMER_QUERY_MODEL", "claude-haiku-4-5-20251001")
+
 # Similarity ranges calibrated for all-MiniLM-L6-v2
 MODE_RANGES = {
    "nrem":      (0.48, 0.72),
@@ -283,71 +305,298 @@ def retrieve_graphiti(mode, task=None, n_results=8, excluded_sources=None):
        print(f"[Graphiti retrieval error: {e}] — falling back to empty.")
        return []

-def retrieve(mode, task=None, n_results=8, excluded_sources=None, type_filter=None):
-    # E3 experiment: DREAMER_SUBSTRATE=graphiti routes retrieval to Graphiti /search
-    # Default behavior: pgvector similarity search (unchanged)
-    # type_filter is experimental and applies to pgvector retrieval only — Graphiti
-    # facts are not embeddings rows and have no embeddings.type to filter on.
-    substrate = os.getenv("DREAMER_SUBSTRATE", "pgvector")
-    if substrate == "graphiti":
-        return retrieve_graphiti(mode, task=task, n_results=n_results, excluded_sources=excluded_sources)
+@lru_cache(maxsize=1)
+def _get_embedder():
    from sentence_transformers import SentenceTransformer
-    embedder = SentenceTransformer("all-MiniLM-L6-v2")
-    low, high = MODE_RANGES[mode]
+    return SentenceTransformer("all-MiniLM-L6-v2")
+
+def _llm_generate_queries(mode, signal, task=None, n_queries=4):
+    """Park et al. 2023 reflection-style query generation. Feeds the LLM the
+    observation signal + a mode-specific framing; emits N retrieval queries
+    that probe different corners of the recent corpus instead of the same
+    hardcoded string every night. Sources cited in dream_observation.py.
+
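+    Falls back to recent_questions from the signal if the LLM call fails."""
+    import anthropic
+
+    # The fallback below already guards against a missing signal; normalize
+    # once so the prompt builder's signal.get(...) calls are safe too.
+    signal = signal or {}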

    if task:
-        query = task
-    elif mode == "late-rem":
-        delta = observe_corpus()
-        topics = delta.get("recent_topics", [])
-        query = topics[0] if topics else "practice place memory making"
-    elif mode == "early-rem":
-        query = "career decision personal change what matters next"
+        # Lucid mode: decompose the user's task into sub-queries
+        prompt = (
+            f"Decompose this user task into {n_queries} distinct sub-questions, "
+            f"each suitable as a retrieval query against Aaron's personal corpus.\n\n"
+            f"TASK: {task}\n\n"
+            f'Output JSON ONLY: {{"queries": ["...", "...", ...]}}'
+        )
    else:
-        query = "research fabrication teaching practice recent work"
+        mode_framings = {
+            "nrem": (
+                "NREM is replay-and-consolidation of RECENT traces. Generate queries "
+                "that probe what Aaron has been working on or capturing in the last "
+                "few days. Concrete entities — project names, course codes, named "
+                "subjects. The dreamer is re-touching specific recent material to "
+                "strengthen schema connections, not finding novel content."
+            ),
+            "early-rem": (
+                "Early REM is associative bridging with emotional/personal register. "
+                "Generate queries that surface unresolved themes, career questions, "
+                "ongoing personal threads — material that connects intellectual and "
+                "emotional dimensions. Tone: thoughtful friend, not researcher."
+            ),
+            "late-rem": (
+                "Late REM tests novel connections across DISTANT material. Generate "
+                "queries that pair concrete subjects from DIFFERENT domains of Aaron's "
+                "work (e.g., one from academic teaching, one from consulting, one from "
+                "creative practice) to probe for surprising structural similarity. "
+                "Cross-domain is required."
+            ),
+        }
+        framing = mode_framings.get(mode, mode_framings["nrem"])
+        questions_snippet = "\n".join(
+            f"  - {q[:200]}" for q in signal.get("recent_questions", [])[:8]
+        ) or "  (no recent user questions)"
+        journal_snippet = ", ".join(signal.get("new_journal_entries", [])[:5]) or "(none)"
+        days_str = (
+            f"{signal['days_since_dream']:.1f}"
+            if signal.get("days_since_dream") not in (None, float("inf"))
+            else "infinite (first dream)"
+        )
+        prompt = (
+            f"You generate retrieval queries for an Active Inference dreamer. The "
+            f"dreamer surfaces prediction errors — gaps between Aaron's model and "
+            f"reality — not summaries or generic associations.\n\n"
+            f"MODE: {mode}\n"
+            f"FRAMING: {framing}\n\n"
+            f"OBSERVATION SIGNAL:\n"
+            f"- Days since last dream: {days_str}\n"
+            f"- New chunks since last dream: {signal.get('new_chunks', 0)}\n"
+            f"- New journal entries: {journal_snippet}\n"
+            f"- Underprocessed chunks pool: {signal.get('underprocessed_count', 0):,}\n\n"
+            f"RECENT USER QUESTIONS (last 14 days, top 8):\n{questions_snippet}\n\n"
+            f"Generate {n_queries} retrieval queries. Requirements:\n"
+            f"- Use concrete entities, named projects, course codes, specific topics "
+            f"— NOT generic phrasing like 'research work practice'\n"
+            f"- Each query probes a DIFFERENT corner of recent activity\n"
+            f"- Match the {mode} framing\n"
+            f"- 5-15 words each\n\n"
+            f'Output JSON ONLY: {{"queries": ["...", "...", ...]}}'
+        )

-    embedding = embedder.encode([query]).tolist()[0]
-    chunks = []
-    seen_sources = set()
+    try:
+        client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
+        resp = client.messages.create(
+            model=LLM_QUERY_MODEL,
+            max_tokens=512,
+            messages=[{"role": "user", "content": prompt}],
+        )
+        text = "".join(b.text for b in resp.content if hasattr(b, "text")).strip()
+        if text.startswith("```"):
+            text = text.split("```", 2)[1]
+            if text.startswith("json"):
+                text = text[4:]
+            text = text.strip()
+        data = json.loads(text)
+        queries = data.get("queries", [])
+        if isinstance(queries, list) and queries:
+            return [str(q).strip() for q in queries[:n_queries] if str(q).strip()]
+    except Exception as e:
+        print(f"[dream] LLM query generation failed ({e}); falling back to recent questions")

+    fallback = signal.get("recent_questions", [])[:n_queries] if signal else []
+    return fallback or [task or "recent activity decisions thinking"]
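The fence stripping above assumes the model wraps its JSON in at most one Markdown fence and adds no prose around it. Below is a minimal standalone sketch of the same parse step, pulled into a hypothetical `parse_queries` helper so it can be tested without calling the Anthropic API. Two behaviors differ from the code shown and are assumptions, not the author's method: it falls back to the outermost `{...}` span when the reply has prose around the JSON, and it drops blank entries before truncating to `n_queries`, so an empty string does not use up a slot.

```python
import json
import re

# Matches the first ```...``` or ```json...``` fence and captures its body.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


def parse_queries(text: str, n_queries: int) -> list[str]:
    """Extract up to n_queries non-empty strings from an LLM reply that should
    contain {"queries": [...]}. Returns [] if nothing usable is found."""
    text = text.strip()
    fenced = _FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1).strip()
    else:
        # Hypothetical extra fallback: keep only the outermost {...} span in
        # case the model wrapped the JSON in explanatory prose.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            text = text[start:end + 1]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    queries = data.get("queries", []) if isinstance(data, dict) else []
    if not isinstance(queries, list):
        return []
    cleaned = [str(q).strip() for q in queries]
    # Filter blanks first, then truncate, so blanks never consume a slot.
    return [q for q in cleaned if q][:n_queries]
```

An empty return here would let the caller fall through to the `recent_questions` fallback, as the code above already does.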
+
+
+def _mmr_select(candidate_embeddings, query_embedding, n, lambda_=MMR_LAMBDA):
+    """Maximal Marginal Relevance — greedy selection that balances relevance
+    against pairwise diversity. Carbonell & Goldstein 1998. Used to prevent
+    cluster lock-in (e.g., 8 dossier-narrative variants filling all 8 slots).
+
+    candidate_embeddings: (N, D) numpy array
+    query_embedding: (D,) numpy array
+    Returns: list of indices into candidate_embeddings, len ≤ n."""
+    if len(candidate_embeddings) == 0:
+        return []
+    n = min(n, len(candidate_embeddings))
+    cands = candidate_embeddings / (np.linalg.norm(candidate_embeddings, axis=1, keepdims=True) + 1e-9)
+    q = query_embedding / (np.linalg.norm(query_embedding) + 1e-9)
+    relevance = cands @ q
+    selected = []
+    remaining = list(range(len(cands)))
+    while len(selected) < n and remaining:
+        if not selected:
+            best = max(remaining, key=lambda i: relevance[i])
+        else:
+            sel = cands[selected]
+            scores = {
+                i: lambda_ * relevance[i] - (1 - lambda_) * float((cands[i] @ sel.T).max())
+                for i in remaining
+            }
+            best = max(scores, key=scores.get)
+        selected.append(best)
+        remaining.remove(best)
+    return selected
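`_mmr_select` rescores every remaining candidate against every selected one on each pass. A sketch of an incremental variant (the name `mmr_select_incremental` is hypothetical) keeps a running maximum similarity per candidate, so each pass costs one matrix-vector product. It applies the same greedy MMR rule, and NumPy's `argmax` breaks ties toward the lowest index just like the `max` calls above, so it should produce the same order up to floating-point near-ties. The `lambda_=0.5` default is an assumption; the actual `MMR_LAMBDA` value is not shown in this diff.

```python
import numpy as np


def mmr_select_incremental(cands: np.ndarray, query: np.ndarray, n: int,
                           lambda_: float = 0.5) -> list[int]:
    """Greedy MMR (Carbonell & Goldstein 1998) over cosine similarity."""
    if len(cands) == 0:
        return []
    n = min(n, len(cands))
    c = cands / (np.linalg.norm(cands, axis=1, keepdims=True) + 1e-9)
    q = query / (np.linalg.norm(query) + 1e-9)
    relevance = c @ q                       # cosine similarity to the query
    max_sim = np.zeros(len(c))              # max similarity to any selected item
    available = np.ones(len(c), dtype=bool)
    selected: list[int] = []
    for _ in range(n):
        if selected:
            scores = lambda_ * relevance - (1 - lambda_) * max_sim
        else:
            scores = relevance.copy()       # first pick: pure relevance
        scores[~available] = -np.inf
        best = int(np.argmax(scores))       # first index wins on ties
        selected.append(best)
        available[best] = False
        # Fold the new pick into the running max instead of rescoring all pairs.
        max_sim = np.maximum(max_sim, c @ c[best])
    return selected


cands = np.array([[1.0, 0.0, 0.0],   # candidate 0
                  [1.0, 0.0, 0.0],   # candidate 1: exact duplicate of 0
                  [0.0, 1.0, 0.0]])  # candidate 2: a different direction
query = np.array([1.0, 0.9, 0.0])
print(mmr_select_incremental(cands, query, n=2))               # → [0, 2]
print(mmr_select_incremental(cands, query, n=2, lambda_=1.0))  # → [0, 1]
```

With `lambda_=1.0` the diversity term vanishes and the result is plain top-k by relevance, which is a quick way to see what MMR changes: the duplicate (candidate 1) is skipped in favor of the distinct candidate 2.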
+
+
+def _bump_consolidation_cursor(chunks):
+    """Increment consolidation_count + set last_consolidated_at=NOW() for each
+    source represented in chunks. Called from dream_pipeline after NREM
+    completes. Per sharp-wave-ripples biology, NREM does the actual
+    consolidation; REM is associative use, so we only bump on NREM."""
+    if not chunks:
+        return
+    sources = list({c["source"] for c in chunks if c.get("source")})
+    if not sources:
+        return
    try:
        pg = get_pg()
        cur = pg.cursor()
-        excluded_sources = excluded_sources or set()
-        where, params = [], []
-        if excluded_sources:
-            where.append("source NOT IN %s")
-            params.append(tuple(excluded_sources))
-        if type_filter:
-            where.append("type = ANY(%s)")
-            params.append(list(type_filter))
-        where_clause = ("WHERE " + " AND ".join(where)) if where else ""
-        cur.execute(f"""
-            SELECT document, source, type, 1 - (embedding <=> %s::vector) as similarity
-            FROM embeddings
-            {where_clause}
-            ORDER BY embedding <=> %s::vector
-            LIMIT %s
-        """, [embedding, *params, embedding, n_results * 3])
-
-        for doc, source, etype, similarity in cur.fetchall():
-            if not (low <= similarity <= high):
-                continue
-            if source in seen_sources:
-                continue
-            chunks.append({
-                "source": source or "unknown",
-                "content": doc,
-                "relevance": similarity,
-                "similarity": similarity,
-                "type": etype,
-            })
-            seen_sources.add(source)
-            if len(chunks) >= n_results:
-                break
+        cur.execute(
+            "UPDATE embeddings "
+            "SET consolidation_count = consolidation_count + 1, "
+            "    last_consolidated_at = NOW() "
+            "WHERE source = ANY(%s)",
+            (sources,),
+        )
+        pg.commit()
        pg.close()
    except Exception as e:
-        print(f"pgvector retrieval error: {e}")
+        print(f"[dream] cursor bump failed (non-fatal): {e}")
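`_bump_consolidation_cursor` updates by `source`, so every chunk of a retrieved document is bumped, not only the chunks NREM actually replayed. A minimal sketch of that behavior against an in-memory SQLite table; SQLite stands in for Postgres here (`= ANY(%s)` becomes an `IN (...)` list and `NOW()` becomes `CURRENT_TIMESTAMP`), and both the reduced `embeddings` table and the `bump_sources` helper are hypothetical.

```python
import sqlite3


def bump_sources(conn: sqlite3.Connection, sources: list[str]) -> int:
    """Increment consolidation_count for every row whose source is listed.
    Returns the number of rows updated."""
    unique = sorted({s for s in sources if s})
    if not unique:
        return 0
    placeholders = ",".join("?" for _ in unique)  # only "?" characters
    cur = conn.execute(
        "UPDATE embeddings "
        "SET consolidation_count = consolidation_count + 1, "
        "    last_consolidated_at = CURRENT_TIMESTAMP "
        f"WHERE source IN ({placeholders})",
        unique,
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings (id INTEGER PRIMARY KEY, source TEXT, "
    "consolidation_count INTEGER NOT NULL DEFAULT 0, last_consolidated_at TEXT)"
)
conn.executemany("INSERT INTO embeddings (source) VALUES (?)",
                 [("cv.docx",), ("cv.docx",), ("cv.docx",), ("notes.md",)])
# Suppose NREM replayed one chunk of cv.docx: all three cv.docx rows move.
updated = bump_sources(conn, ["cv.docx"])
```

If per-chunk granularity were wanted instead, keying the update on the chunk `id` (which `retrieve` already selects) would be the obvious alternative; the code above deliberately works per source.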
+
+
+def retrieve(mode, task=None, n_results=8, excluded_sources=None,
+             type_filter=None, signal=None):
+    """Refactored retrieval — see dreamer-design-spec.md Stage 3 + the
+    external-literature prescription in birdai-dreamer-exclusion-finding-2026-05-02.md.
+
+    Changes from the prior hardcoded-query version:
+    - Queries are LLM-generated from the observation signal (Park et al.
+      reflection pattern) instead of fixed strings. Solves the "same 8 sources
+      every night" failure where fixed seeds locked into one neighborhood.
+    - Per-mode time windows (24-72hr NREM / 30d Early REM / 90d Late REM)
+      filter candidates before vector search. Spec calls for these to be
+      mutable; they live in TIME_WINDOWS_HOURS.
+    - NREM biases toward under-processed chunks (low consolidation_count).
+      Biologically motivated: sharp-wave ripples tag what to replay, not
+      uniform sampling.
+    - Multiple queries (4 by default) → over-fetch → MMR merge for
+      within-night diversity. Prevents cluster domination.
+
+    signal is the observation-signal dict from dream_observation.observe_corpus().
+    If None, observe_corpus is called inline (back-compat for ad-hoc invocation).
+    """
+    # E3 substrate experiment unchanged
+    substrate = os.getenv("DREAMER_SUBSTRATE", "pgvector")
+    if substrate == "graphiti":
+        return retrieve_graphiti(mode, task=task, n_results=n_results,
+                                 excluded_sources=excluded_sources)
+
+    if signal is None:
+        from dream_observation import observe_corpus as _obs
+        signal = _obs()
+
+    queries = _llm_generate_queries(mode, signal, task=task, n_queries=4)
+    if not queries:
+        print(f"[dream:{mode}] no queries generated; bailing")
+        return []
+    print(f"[dream:{mode}] generated queries: {queries}")
+
+    embedder = _get_embedder()
+    excluded_sources = excluded_sources or set()
+    window_hours = TIME_WINDOWS_HOURS.get(mode)
+    per_query_n = 12   # over-fetch for MMR
+
+    candidates = []
+    seen_ids = set()
+    try:
+        pg = get_pg()
+        cur = pg.cursor()
+        for q in queries:
+            q_emb = embedder.encode([q]).tolist()[0]
+            where, params = [], []
+            if excluded_sources:
+                where.append("source NOT IN %s")
+                params.append(tuple(excluded_sources))
+            if type_filter:
+                where.append("type = ANY(%s)")
+                params.append(list(type_filter))
+            if window_hours is not None:
+                # created_at is TEXT (legacy); cast it. NULL created_at fails
+                # the comparison so legacy rows are excluded from windowed
+                # modes — correct: NULL means "indexed before cursor existed,"
+                # which by definition is older than any window.
+                where.append(
+                    f"(created_at IS NOT NULL AND "
+                    f"created_at::timestamptz > NOW() - INTERVAL '{int(window_hours)} hours')"
+                )
+            where_clause = ("WHERE " + " AND ".join(where)) if where else ""
+            # NREM bias: consolidation_count ASC is the primary sort key, so the
+            # least-replayed chunks fill the LIMIT first; vector distance only
+            # breaks ties among equal counts. Other modes: vector distance only.
+            order_clause = (
+                "ORDER BY consolidation_count ASC, embedding <=> %s::vector"
+                if mode == "nrem"
+                else "ORDER BY embedding <=> %s::vector"
+            )
+            cur.execute(f"""
+                SELECT id, document, source, type, embedding,
+                       1 - (embedding <=> %s::vector) as similarity
+                FROM embeddings
+                {where_clause}
+                {order_clause}
+                LIMIT %s
+            """, [q_emb, *params, q_emb, per_query_n])
+            for row in cur.fetchall():
+                if row[0] in seen_ids:
+                    continue
+                seen_ids.add(row[0])
+                emb = row[4]
+                # pgvector returns embeddings as string "[...]" by default
+                if isinstance(emb, str):
+                    emb = np.array([float(x) for x in emb.strip("[]").split(",")])
+                else:
+                    emb = np.array(emb)
+                candidates.append({
+                    "id": row[0],
+                    "content": row[1],
+                    "source": row[2] or "unknown",
+                    "type": row[3],
+                    "embedding": emb,
+                    "similarity": float(row[5]),
+                })
+        pg.close()
+    except Exception as e:
+        import traceback
+        print(f"[dream:{mode}] retrieval SQL error: {e}")
+        traceback.print_exc()
+        return []
+
+    if not candidates:
+        print(f"[dream:{mode}] zero candidates after filters")
+        return []
+
+    # MMR over the union, using the first query as pivot for the relevance term.
+    # Averaging query embeddings would be theoretically cleaner but adds
+    # complexity for marginal benefit at this scale.
+    pivot_emb = np.array(embedder.encode([queries[0]]).tolist()[0])
+    cand_embs = np.array([c["embedding"] for c in candidates])
+    selected_idx = _mmr_select(cand_embs, pivot_emb, n=n_results * 2)
+
+    # Post-MMR source-level dedup (multi-chunk same source collapses to one).
+    chunks = []
+    seen_sources = set()
+    for i in selected_idx:
+        c = candidates[i]
+        if c["source"] in seen_sources:
+            continue
+        seen_sources.add(c["source"])
+        chunks.append({
+            "source": c["source"],
+            "content": c["content"],
+            "relevance": c["similarity"],
+            "similarity": c["similarity"],
+            "type": c["type"],
+        })
+        if len(chunks) >= n_results:
+            break

    return chunks
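In NREM, `ORDER BY consolidation_count ASC, embedding <=> %s::vector` makes the count the primary sort key: a distant chunk that has never been replayed outranks a close chunk that has been replayed once, and distance only orders chunks with equal counts. A pure-Python sketch of the two `ORDER BY` variants, with hypothetical rows standing in for query results (`distance` plays the role of pgvector's `<=>` cosine distance):

```python
from typing import NamedTuple


class Row(NamedTuple):
    source: str
    consolidation_count: int
    distance: float  # cosine distance to the query embedding


def order_rows(rows: list[Row], mode: str, limit: int) -> list[Row]:
    """Mirror the ORDER BY / LIMIT used per mode in retrieve()."""
    if mode == "nrem":
        # Count first, distance only as the tiebreak.
        ranked = sorted(rows, key=lambda r: (r.consolidation_count, r.distance))
    else:
        ranked = sorted(rows, key=lambda r: r.distance)
    return ranked[:limit]


rows = [
    Row("near-but-replayed.md", 1, 0.10),
    Row("far-never-replayed.md", 0, 0.80),
    Row("mid-never-replayed.md", 0, 0.40),
]
nrem = [r.source for r in order_rows(rows, "nrem", 2)]
late = [r.source for r in order_rows(rows, "late-rem", 2)]
```

Here NREM keeps the two never-replayed rows even though one is far from the query, while the other modes keep the two nearest. Once counts spread out, this ordering can fill the 12-row LIMIT with low-count chunks that are only loosely related to the query.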

@@ -480,16 +729,23 @@ def write_manifest(date_str, stage_data, corpus_data):
    auth = (NEXTCLOUD_USER, NEXTCLOUD_PASSWORD)
    url = f"{DREAMS_WEBDAV}/dream-manifest-{date_str}.json"
    try:
-        requests.put(url, data=content.encode("utf-8"), auth=auth, timeout=30)
+        response = requests.put(url, data=content.encode("utf-8"), auth=auth, timeout=30)
+        response.raise_for_status()
        print(f"Manifest written: Journal/Dreams/dream-manifest-{date_str}.json")
    except Exception as e:
-        print(f"Manifest write failed (non-critical): {e}")
+        print(f"Manifest write failed — manifest not persisted: {e}")


 def dream_pipeline(type_filter=None):
    """
    Full nightly pipeline — interdependent stages.
    NREM output feeds Early REM. Both feed Late REM. All three feed Synthesis.
+
+    Per dreamer-design-spec.md, this now runs Stage 1 (observe) and Stage 2
+    (select) first. select_mode always returns a mode, so the dreamer fires
+    every scheduled night; repetition is handled in retrieve, not by skipping.
+    NREM/Early-REM/Late-REM then run with LLM-generated queries seeded from
+    the observation signal.
    """
    print(f"Dreamer pipeline starting — {datetime.now().strftime('%Y-%m-%d %H:%M')}")

@@ -497,21 +753,47 @@ def dream_pipeline(type_filter=None):
    state.pop("retrieved_sources", None)  # legacy key; session-scoped novelty now
    session_retrieved = set()

-    delta = observe_corpus()
-    print(f"Corpus: {delta['new_chunks']} new chunks, {delta['days_since_dream']:.1f} days since last dream")
-    print("Novelty: session-scoped (no across-night exclusion)")
+    # ── Stage 1 + 2: Observe + Select ──────────────────────────────────────
+    from dream_observation import observe_corpus as _obs, select_mode as _select
+    signal = _obs()
+    print(
+        f"Signal: new_chunks={signal['new_chunks']}, "
+        f"new_journal={len(signal['new_journal_entries'])}, "
+        f"days_since={signal['days_since_dream']:.1f}, "
+        f"underprocessed={signal['underprocessed_count']:,}"
+    )
+    selected = _select(signal)
+    if selected is None:
+        # Defensive only: select_mode always returns a mode (NREM by default),
+        # so this branch should never run. It is kept so a regression is logged
+        # and recorded in dreamer state instead of crashing the stages below.
+        print("[select_mode] returned None unexpectedly; skipping tonight")
+        state["last_select_quiet_at"] = datetime.now().isoformat()
+        save_dreamer_state(state)
+        return None
+    print(f"[select_mode] → {selected}")

-    # ── Stage 1: NREM ──────────────────────────────────────────────────────
+    # The pipeline always runs all three modes for the manifest's continuity.
+    # select_mode's choice signals the *primary* focus; the others still run
+    # but draw from their own mode-appropriate windows.
+    primary_mode = selected
+
+    # ── Stage 3: NREM ──────────────────────────────────────────────────────
    print("\n[NREM] Retrieving...")
    # NREM is replay-and-consolidation — does not exclude prior traces.
    # Late REM and Early REM exclude prior content for novelty; NREM does not.
-    nrem_chunks = retrieve("nrem", excluded_sources=None, type_filter=type_filter)
+    nrem_chunks = retrieve("nrem", excluded_sources=None,
+                           type_filter=type_filter, signal=signal)
    session_retrieved.update(c["source"] for c in nrem_chunks)
    # Track sources that scored above Early REM ceiling — these are the only ones Early REM should exclude
    nrem_high_sources = {c["source"] for c in nrem_chunks if c["similarity"] > 0.55}
    if not nrem_chunks:
        print("[NREM] No suitable chunks — aborting pipeline")
        return None
+    # Cursor bump: NREM is the consolidation stage. Each appearance increments
+    # consolidation_count + updates last_consolidated_at, so the next dream's
+    # observation sees these sources as less under-processed.
+    _bump_consolidation_cursor(nrem_chunks)

    print(f"[NREM] Retrieved {len(nrem_chunks)} chunks. Synthesizing...")
    nrem_output = synthesize_nrem(nrem_chunks)
@@ -522,7 +804,7 @@ def dream_pipeline(type_filter=None):
        "nrem": {
            "chunks_retrieved": len(nrem_chunks),
            "avg_similarity": round(sum(c["relevance"] for c in nrem_chunks) / len(nrem_chunks), 3),
-            "query": "research fabrication teaching practice recent work",
+            "query": "[llm-generated from observation signal]",
            "word_count": len(nrem_output.split()),
            "sources": nrem_sources,
            "distinct_folders": nrem_folders,
@@ -540,7 +822,8 @@ def dream_pipeline(type_filter=None):
    print("\n[Early REM] Retrieving...")
    # Early REM excludes previously retrieved + NREM high-scorers only (not full session_retrieved)
    # Sources that scored in Early REM band during NREM remain available
-    early_chunks = retrieve("early-rem", excluded_sources=nrem_high_sources, type_filter=type_filter)
+    early_chunks = retrieve("early-rem", excluded_sources=nrem_high_sources,
+                            type_filter=type_filter, signal=signal)
    session_retrieved.update(c["source"] for c in early_chunks)
    if not early_chunks:
        print("[Early REM] No suitable chunks — skipping")
@@ -554,7 +837,7 @@ def dream_pipeline(type_filter=None):
        stage_data["early_rem"] = {
            "chunks_retrieved": len(early_chunks),
            "avg_similarity": round(sum(c["relevance"] for c in early_chunks) / len(early_chunks), 3),
-            "query": "career decision personal change what matters next",
+            "query": "[llm-generated from observation signal]",
            "word_count": len(early_rem_output.split()),
            "sources": early_sources,
            "distinct_folders": early_folders,
@@ -566,7 +849,8 @@ def dream_pipeline(type_filter=None):

    # ── Stage 3: Late REM — informed by NREM + Early REM ──────────────────
    print("\n[Late REM] Retrieving...")
-    late_chunks = retrieve("late-rem", excluded_sources=session_retrieved, type_filter=type_filter)
+    late_chunks = retrieve("late-rem", excluded_sources=session_retrieved,
+                           type_filter=type_filter, signal=signal)
    session_retrieved.update(c["source"] for c in late_chunks)
    if not late_chunks:
        print("[Late REM] No suitable chunks — skipping")
@@ -585,7 +869,7 @@ def dream_pipeline(type_filter=None):
        stage_data["late_rem"] = {
            "chunks_retrieved": len(late_chunks),
            "avg_similarity": round(sum(c["relevance"] for c in late_chunks) / len(late_chunks), 3),
-            "query": "practice place memory making",
+            "query": "[llm-generated from observation signal]",
            "word_count": len(late_rem_output.split()),
            "sources": late_sources,
            "distinct_folders": list(set(late_folders)),
@@ -613,8 +897,20 @@ def dream_pipeline(type_filter=None):
    # Write manifest
    all_session_sources = list(session_retrieved)
    all_session_folders = list({extract_folder(s) for s in all_session_sources})
+    total_chunks = 0
+    pg = None
+    try:
+        pg = get_pg()
+        cur = pg.cursor()
+        cur.execute("SELECT COUNT(*) FROM embeddings")
+        total_chunks = cur.fetchone()[0]
+    except Exception as e:
+        print(f"total_chunks query failed (non-critical): {e}")
+    finally:
+        if pg is not None:
+            pg.close()
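    corpus_data = {
-        "total_chunks": delta.get("new_chunks", 0),
-        "new_chunks_since_last_dream": delta.get("new_chunks", 0),
-        "days_since_last_dream": round(delta.get("days_since_dream", 0), 2),
+        "total_chunks": total_chunks,
+        "new_chunks_since_last_dream": signal.get("new_chunks", 0),
+        "days_since_last_dream": round(signal.get("days_since_dream", 0), 2),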
        "substrate": "pgvector",
@@ -0,0 +1,235 @@
+"""
+Dreamer Stages 1 + 2 — Observe and Select.
+
+Implements `dreamer-design-spec.md`'s Stage 1 (observe_corpus) and Stage 2
+(select_mode). These have been latent in dream.py — observe_corpus existed
+in skeletal form but its output was largely unused; select_mode did not
+exist at all. The dreamer always ran all stages with hardcoded queries.
+
+Per spec (lines 27–34 of dreamer-design-spec.md):
+    delta = observe_corpus()
+    selected_mode = select_mode(delta, task, project)
+    if selected_mode is None:
+        return                         # nothing worth dreaming
+
+The spec's "return None, go quiet" semantics (spec line 67) are deliberately
+NOT implemented: select_mode always returns a mode. The repetition problem in
+birdai-dreamer-exclusion-finding-2026-05-02.md is handled in retrieve instead.
+
+Grounded in:
+- Active Inference (Friston 2010, 2017) — observe error, choose action that
+  minimizes free energy. The dreamer is a prediction-error machine; observe
+  what's diverged from the model, dream about that.
+- Sleep stages (Stickgold 2005; Walker 2017; Diekelmann & Born 2010) — NREM
+  for replay of new traces, REM for associative cross-cluster integration.
+- Sharp-wave ripples (Buzsáki, Wilson) — biology tags WHAT to replay
+  (under-processed chunks); not uniform. Implemented via the consolidation
+  cursor on the embeddings table.
+"""
+
+import json
+import os
+import sqlite3
+from datetime import datetime, timedelta
+from pathlib import Path
+
+from dotenv import load_dotenv
+import psycopg2
+
+load_dotenv(Path.home() / "aaronai" / ".env", override=True)
+
+# ─── Paths ──────────────────────────────────────────────────────────────────
+
+PG_DSN           = os.getenv("PG_DSN")
+CONVERSATIONS_DB = str(Path.home() / "aaronai" / "conversations.db")
+WATCHER_STATE    = str(Path.home() / "aaronai" / "watcher_state.json")
+DREAMER_STATE    = str(Path.home() / "aaronai" / "dreamer_state.json")
+JOURNAL_DAILY    = "/home/aaron/nextcloud/data/data/aaron/files/Journal/Daily"
+
+# ─── Thresholds ─────────────────────────────────────────────────────────────
+# Per spec, these become settings-panel controls eventually. For now they're
+# constants here; moving them to a config module is task #48.
+
+NEW_CHUNK_THRESHOLD       = 5    # no longer consulted by select_mode (NREM is the default)
+STALENESS_TRIGGER_DAYS    = 3    # corpus quiet ≥3 days → Late REM ("shake things loose")
+QUESTION_LOOKBACK_DAYS    = 14   # spec line 61: "the last 14 days"
+UNDERPROCESSED_PERCENTILE = 0.25  # bottom quartile of consolidation_count
+
+
+# ─── Helpers ────────────────────────────────────────────────────────────────
+
+def _get_pg():
+    return psycopg2.connect(PG_DSN)
+
+
+def _load_json(path, default):
+    try:
+        return json.loads(Path(path).read_text())
+    except Exception:
+        return default
+
+
+def _recent_user_questions(days=QUESTION_LOOKBACK_DAYS, limit=20):
+    """Pull recent user-turn content from conversations.db. The spec calls
+    these 'live questions' — what Aaron has been asking about. They become
+    seed material for the REM modes."""
+    try:
+        conn = sqlite3.connect(CONVERSATIONS_DB)
+        cutoff = (datetime.now() - timedelta(days=days)).isoformat()
+        cur = conn.cursor()
+        cur.execute(
+            """
+            SELECT m.content FROM messages m
+            JOIN conversations c ON m.conversation_id = c.id
+            WHERE m.role = 'user' AND c.updated_at > ?
+            ORDER BY m.timestamp DESC LIMIT ?
+            """,
+            (cutoff, limit),
+        )
+        rows = cur.fetchall()
+        conn.close()
+        return [r[0][:280] for r in rows]
+    except Exception:
+        return []
+
+
+def _new_journal_entries(since_ts):
+    """Files in Journal/Daily/ created or modified since the last dream.
+    Journal entries with emotional/personal register route to Early REM per
+    the spec (line 71)."""
+    journal_path = Path(JOURNAL_DAILY)
+    if not journal_path.exists():
+        return []
+    new = []
+    for p in journal_path.rglob("*.md"):
+        try:
+            if p.stat().st_mtime > since_ts:
+                new.append(str(p.relative_to(journal_path)))
+        except OSError:
+            continue
+    return new
+
+
+def _new_chunks_count(since_ts):
+    """Count of watcher-tracked files (not chunks) with mtime > last_dream.
+    The spec calls this 'what changed' (line 58). Used as the NREM novelty signal."""
+    state = _load_json(WATCHER_STATE, {})
+    count = 0
+    for _path, mtime in state.items():
+        try:
+            if float(mtime) > since_ts:
+                count += 1
+        except (ValueError, TypeError):
+            continue
+    return count
+
+
+def _underprocessed_chunk_count():
+    """Chunks at or below the UNDERPROCESSED_PERCENTILE consolidation_count
+    (heavy ties, e.g. many zeros, can push this above 25% of the corpus).
+    Biologically motivated: sharp-wave ripples bias replay toward novel /
+    under-encoded experience, not uniform sampling; NREM favors this pool."""
+    try:
+        pg = _get_pg()
+        cur = pg.cursor()
+        cur.execute(
+            """
+            WITH t AS (
+              SELECT percentile_cont(%s) WITHIN GROUP (ORDER BY consolidation_count)
+                  AS threshold
+              FROM embeddings
+            )
+            SELECT COUNT(*) FROM embeddings, t
+            WHERE consolidation_count <= t.threshold
+            """,
+            (UNDERPROCESSED_PERCENTILE,),
+        )
+        result = cur.fetchone()[0]
+        pg.close()
+        return int(result or 0)
+    except Exception:
+        return 0
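`percentile_cont(0.25)` interpolates linearly between ordered values, which is the same rule as NumPy's default `np.percentile`. Because the pool is `consolidation_count <= threshold`, heavy ties (for example a young corpus where most chunks are still at 0) make the "bottom 25%" larger than 25%. A small sketch reproducing the SQL rule in NumPy; `underprocessed_pool` is a hypothetical helper, not part of the module:

```python
import numpy as np


def underprocessed_pool(counts: list[int], percentile: float = 0.25) -> tuple[float, int]:
    """Return (threshold, pool_size) under the same rule as the SQL:
    threshold = percentile_cont(percentile); pool = rows with count <= threshold."""
    arr = np.asarray(counts, dtype=float)
    threshold = float(np.percentile(arr, percentile * 100))  # linear interpolation
    return threshold, int((arr <= threshold).sum())


# Young corpus: 6 of 10 chunks never consolidated -> threshold 0, pool of 6 (60%).
young = underprocessed_pool([0] * 6 + [1, 2, 3, 4])
# Evenly spread counts 0..7 -> threshold 1.75, pool of 2 (25%).
spread = underprocessed_pool(list(range(8)))
```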
+
+
+# ─── Stage 1: observe_corpus ────────────────────────────────────────────────
+
+def observe_corpus():
+    """Build the signal vector consumed by select_mode and (downstream) by
+    retrieve. Concrete observations only — no interpretation. Each key is
+    a direct measurement from the corpus, watcher, journal, or conversation
+    log.
+
+    Returns a dict with:
+      now_ts                 -- current Unix timestamp
+      last_dream_ts          -- last completed dream timestamp (0 if never)
+      days_since_dream       -- float; inf if never dreamed
+      new_chunks             -- count of files newer than last_dream
+      new_journal_entries    -- relative paths of Journal/Daily/**/*.md modified since last_dream
+      recent_questions       -- user-turn content from last 14 days
+      underprocessed_count   -- chunks at or below the 25th-percentile consolidation_count
+    """
+    state = _load_json(DREAMER_STATE, {})
+    last_dream_ts = float(state.get("last_dream_timestamp", 0) or 0)
+    now_ts = datetime.now().timestamp()
+
+    return {
+        "now_ts": now_ts,
+        "last_dream_ts": last_dream_ts,
+        "days_since_dream": (now_ts - last_dream_ts) / 86400 if last_dream_ts else float("inf"),
+        "new_chunks": _new_chunks_count(last_dream_ts),
+        "new_journal_entries": _new_journal_entries(last_dream_ts),
+        "recent_questions": _recent_user_questions(),
+        "underprocessed_count": _underprocessed_chunk_count(),
+    }
+
+
+# ─── Stage 2: select_mode ───────────────────────────────────────────────────
+
+def select_mode(signal, task=None, explicit_mode=None):
+    """Return one of {'nrem', 'early-rem', 'late-rem', 'lucid'}. Never None.
+
+    The dreamer fires every scheduled night. The earlier "go quiet on null
+    delta" rule was a synthesis-doc invention that didn't match the actual
+    desired UX — the original dreamer always dreamed, even if it repeated
+    itself. The cure for repetition lives in the retrieve layer
+    (LLM-generated queries from the observation signal, MMR diversity,
+    cursor bias toward under-processed chunks), not in skipping nights.
+
+    Routing logic:
+      - explicit_mode argument wins
+      - task supplied → 'lucid' (question-anchored)
+      - days_since_dream ≥ STALENESS_TRIGGER_DAYS → 'late-rem' (shake loose
+        via cross-domain pairs when nothing's been added in a while)
+      - new journal entry → 'early-rem' (emotional/personal register)
+      - default → 'nrem' (replay-and-consolidation; always has something to
+        do because the corpus always has under-processed chunks)
+    """
+    if explicit_mode:
+        return explicit_mode
+    if task:
+        return "lucid"
+
+    days_since = signal["days_since_dream"]
+    new_journal = signal["new_journal_entries"]
+
+    if days_since >= STALENESS_TRIGGER_DAYS:
+        return "late-rem"
+
+    if new_journal:
+        return "early-rem"
+
+    return "nrem"
+
+
+# ─── CLI for manual inspection ──────────────────────────────────────────────
+
+if __name__ == "__main__":
+    signal = observe_corpus()
+    short = {k: v for k, v in signal.items() if k != "recent_questions"}
+    print("Signal (excluding recent_questions):")
+    print(json.dumps(short, indent=2, default=str))
+    print(f"\nRecent user questions ({len(signal['recent_questions'])}):")
+    for q in signal["recent_questions"][:5]:
+        print(f"  - {q[:140]}")
+    mode = select_mode(signal)
+    print(f"\nselect_mode() → {mode!r}")
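The routing precedence in `select_mode` can be pinned down with a table-driven check. The sketch uses a hypothetical local `route` function that mirrors the rules shown above (explicit mode, then task, then staleness, then journal, then NREM) instead of importing `dream_observation`, so it runs without Postgres or the `.env` file. One consequence worth noting: a first-ever dream (`days_since_dream == inf`) routes to Late REM.

```python
import math

STALENESS_TRIGGER_DAYS = 3  # same value as in dream_observation.py


def route(signal: dict, task=None, explicit_mode=None) -> str:
    """Mirror of select_mode's precedence; never returns None."""
    if explicit_mode:
        return explicit_mode
    if task:
        return "lucid"
    if signal["days_since_dream"] >= STALENESS_TRIGGER_DAYS:
        return "late-rem"
    if signal["new_journal_entries"]:
        return "early-rem"
    return "nrem"


CASES = [
    # (days_since_dream, new_journal_entries, task, explicit_mode, expected)
    (1.0, [], None, None, "nrem"),
    (1.0, ["2026-05-20.md"], None, None, "early-rem"),
    (3.0, ["2026-05-20.md"], None, None, "late-rem"),   # staleness beats journal
    (math.inf, [], None, None, "late-rem"),              # first-ever dream
    (5.0, [], "why did the grant stall?", None, "lucid"),  # task beats staleness
    (5.0, [], "q", "nrem", "nrem"),                       # explicit mode wins
]

for days, journal, task, explicit, expected in CASES:
    signal = {"days_since_dream": days, "new_journal_entries": journal}
    assert route(signal, task=task, explicit_mode=explicit) == expected
```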
@@ -1,17 +1,20 @@
 """
 Aaron AI Stage 1 encoding helpers — single canonical implementation of:
-  - extract_text(filepath) — four-extension text extraction
-  - chunk_text(text, chunk_size, overlap) — word-based chunking
-  - chunk_and_embed(text, source, embedder, filepath, folder) — produce ready-to-write rows
+  - extract_blocks(filepath) — block extraction (docx single block,
+    pptx per-slide, pdf/txt/md single-block)
+  - extract_text(filepath) — back-compat string concatenation over blocks
+  - chunk_text(text, chunk_size, overlap) — word-based blind chunking
+  - chunk_and_embed(text_or_blocks, source, embedder, filepath, folder) —
+    produce ready-to-write rows. Accepts str (blind) or list[dict] (section-aware).
  - write_embeddings_batch(conn, batch) — server-side NOW() canonical INSERT

 Used by watcher.py, ingest.py, corpus_integrity.py, and api.py /api/corpus/retry.
-Replaces four separate extract reimplementations and two extract-chunk-embed paths.
 """

 import hashlib
 import json
 import logging
+import re
 from pathlib import Path

 from docx import Document as DocxDocument
@@ -24,33 +27,187 @@ SUPPORTED = {".docx", ".pdf", ".pptx", ".txt", ".md"}
 DEFAULT_CHUNK_SIZE = 500
 DEFAULT_CHUNK_OVERLAP = 50

+_BOLD_KV_RE = re.compile(r"^\*\*[\w +/-]+?:\*\*")

-def extract_text(filepath: Path) -> str:
-    """Return the text of a supported file. Returns "" on any failure or
-    unsupported extension. Does not write to ingest_failures — caller decides."""
+
+def _strip_md_frontmatter(text: str) -> str:
+    """Strip a leading frontmatter block from markdown, if present.
+
+    Recognizes two formats:
+      - YAML-style: file's first non-empty line is `---`, terminated by `---`.
+        Only triggered when no heading precedes — guards against `---`
+        horizontal rules that follow an H1.
+      - Capture-style: optional H1 heading, then one or more `**key:** value`
+        lines (and blanks), terminated by `---`. The H1 is preserved; the
+        key/value block + separator are removed.
+
+    Body `---` rules and body `**bold:**` lines are never touched — the scan
+    aborts as soon as a non-frontmatter line appears in the leading block.
+    """
+    lines = text.splitlines()
+    n = len(lines)
+    i = 0
+    while i < n and not lines[i].strip():
+        i += 1
+    heading = None
+    if i < n and lines[i].startswith("# "):
+        heading = lines[i]
+        i += 1
+        while i < n and not lines[i].strip():
+            i += 1
+    if i >= n:
+        return text
+    first = lines[i].strip()
+    if heading is None and first == "---":
+        j = i + 1
+        while j < n and lines[j].strip() != "---":
+            j += 1
+        if j >= n:
+            return text
+        body_start = j + 1
+    elif _BOLD_KV_RE.match(first):
+        j = i
+        while j < n:
+            s = lines[j].strip()
+            if not s or _BOLD_KV_RE.match(s):
+                j += 1
+                continue
+            if s == "---":
+                body_start = j + 1
+                break
+            return text
+        else:
+            return text
+    else:
+        return text
+    body = "\n".join(lines[body_start:]).lstrip("\n")
+    return f"{heading}\n\n{body}" if heading else body
+
+
+def _docx_cell_paragraphs(cell):
+    yield from (p for p in cell.paragraphs if p.text.strip())
+    for nested in cell.tables:
+        for row in nested.rows:
+            for c in row.cells:
+                yield from _docx_cell_paragraphs(c)
+
+
+def _pptx_shape_text(shape):
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    parts = []
+    if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
+        for sub in shape.shapes:
+            parts.extend(_pptx_shape_text(sub))
+        return parts
+    if hasattr(shape, "text") and shape.text.strip():
+        parts.append(shape.text)
+    if getattr(shape, "has_table", False):
+        for cell in shape.table.iter_cells():
+            if cell.text.strip():
+                parts.append(cell.text)
+    return parts
+
+
+def _extract_docx_blocks(filepath: Path) -> list[dict]:
+    """Return docx body, tables (incl. nested), headers/footers and text boxes
+    as a single block. Section-aware chunking via Heading styles was rolled
+    back: the user's docs are mostly Normal-styled with bold-as-heading, and
+    tying chunk boundaries to formatting would lock them into those choices
+    forever. Lexical + cross-encoder retrieval already finds the right
+    substrings within a blind-chunked CV, so section structure isn't load-bearing."""
+    from docx.oxml.ns import qn
+
+    doc = DocxDocument(filepath)
+    parts = [p.text for p in doc.paragraphs if p.text.strip()]
+    for tbl in doc.tables:
+        for row in tbl.rows:
+            for cell in row.cells:
+                parts.extend(p.text for p in _docx_cell_paragraphs(cell))
+    for section in doc.sections:
+        parts.extend(p.text for p in section.header.paragraphs if p.text.strip())
+        parts.extend(p.text for p in section.footer.paragraphs if p.text.strip())
+    for txbx in doc.element.body.findall(".//" + qn("w:txbxContent")):
+        for p in txbx.findall(".//" + qn("w:p")):
+            text = "".join(t.text or "" for t in p.findall(".//" + qn("w:t")))
+            if text.strip():
+                parts.append(text)
+    text = "\n".join(parts)
+    return [{"heading": None, "text": text, "kind": "doc"}] if text.strip() else []
+
+
+def _extract_pptx_blocks(filepath: Path) -> list[dict]:
+    """One block per slide. Heading = slide title (or 'Slide N' fallback).
+    Body = non-title shape text + speaker notes."""
+    prs = Presentation(filepath)
+    blocks = []
+    for i, slide in enumerate(prs.slides, 1):
+        title_shape = None
+        try:
+            title_shape = slide.shapes.title
+        except (AttributeError, KeyError):
+            pass
+        title = None
+        body_parts = []
+        for shape in slide.shapes:
+            if title_shape is not None and shape == title_shape and shape.has_text_frame:
+                title = shape.text_frame.text.strip() or None
+                continue
+            body_parts.extend(_pptx_shape_text(shape))
+        if slide.has_notes_slide:
+            notes = slide.notes_slide.notes_text_frame.text
+            if notes.strip():
+                body_parts.append(f"[Notes] {notes}")
+        if title or body_parts:
+            blocks.append({
+                "heading": title or f"Slide {i}",
+                "text": "\n".join(body_parts),
+                "kind": "slide",
+            })
+    return blocks
+
+
+def extract_blocks(filepath: Path) -> list[dict]:
+    """Structured extraction. Returns list of {heading, text, kind} blocks.
+
+    - docx: single block covering body, tables, headers/footers and text boxes (kind='doc').
+    - pptx: one block per slide (kind='slide').
+    - pdf/txt/md: single block, no heading (kind='doc').
+
+    Empty list on any failure or unsupported extension."""
    suffix = filepath.suffix.lower()
    try:
        if suffix == ".docx":
-            doc = DocxDocument(filepath)
-            return "\n".join(p.text for p in doc.paragraphs if p.text.strip())
-        elif suffix == ".pdf":
+            return _extract_docx_blocks(filepath)
+        if suffix == ".pptx":
+            return _extract_pptx_blocks(filepath)
+        if suffix == ".pdf":
            reader = PdfReader(filepath)
-            return "".join(
+            text = "".join(
                page.extract_text() + "\n"
                for page in reader.pages if page.extract_text()
            )
-        elif suffix == ".pptx":
-            prs = Presentation(filepath)
-            return "\n".join(
-                shape.text for slide in prs.slides
-                for shape in slide.shapes
-                if hasattr(shape, "text") and shape.text.strip()
-            )
-        elif suffix in {".txt", ".md"}:
-            return filepath.read_text(encoding="utf-8", errors="ignore")
+            return [{"heading": None, "text": text, "kind": "doc"}] if text.strip() else []
+        if suffix in {".txt", ".md"}:
+            text = filepath.read_text(encoding="utf-8", errors="ignore")
+            if suffix == ".md":
+                text = _strip_md_frontmatter(text)
+            return [{"heading": None, "text": text, "kind": "doc"}] if text.strip() else []
    except Exception as e:
-        log.warning(f"Text extraction failed for {filepath.name}: {e}")
-    return ""
+        log.warning(f"Extraction failed for {filepath.name}: {e}")
+    return []
+
+
+def extract_text(filepath: Path) -> str:
+    """Back-compat wrapper: concatenate extract_blocks() output. Section
+    structure is lost; use extract_blocks() directly for chunking."""
+    blocks = extract_blocks(filepath)
+    parts = []
+    for b in blocks:
+        if b.get("heading"):
+            parts.append(b["heading"])
+        if b.get("text"):
+            parts.append(b["text"])
+    return "\n".join(parts)


 def chunk_text(text: str,
@@ -73,18 +230,49 @@ def _chunk_id(filepath, source: str, index: int) -> str:
    return f"{hashlib.md5(basis.encode()).hexdigest()[:8]}_{index}"


-def chunk_and_embed(text: str,
+def chunk_and_embed(text_or_blocks,
                    source: str,
                    embedder,
                    filepath=None,
                    folder=None) -> list[dict]:
-    """Chunk text, embed each chunk, return rows ready for write_embeddings_batch."""
-    chunks = chunk_text(text)
+    """Chunk + embed for write_embeddings_batch. Accepts either:
+
+      - str: wrapped as one untitled block, i.e. blind 500-word windows (legacy).
+      - list[dict]: extract_blocks() output (pptx slides, or one 'doc' block
+        for docx/pdf/txt/md). Each block emits one chunk if heading + text fit
+        within DEFAULT_CHUNK_SIZE words, otherwise it is blind-split with overlap.
+
+    The block heading is prepended to the chunk text (so retrieval sees the
+    section context) and stored in metadata as heading/kind."""
+    if isinstance(text_or_blocks, str):
+        blocks = [{"heading": None, "text": text_or_blocks, "kind": "doc"}]
+    else:
+        blocks = text_or_blocks
+
+    chunks = []
+    for block in blocks:
+        body = block.get("text") or ""
+        heading = block.get("heading")
+        kind = block.get("kind", "doc")
+        if not body.strip() and not (heading and heading.strip()):
+            continue
+        if heading and body.strip():
+            contextualized = f"{heading}\n\n{body}"
+        elif heading:
+            contextualized = heading
+        else:
+            contextualized = body
+        if len(contextualized.split()) <= DEFAULT_CHUNK_SIZE:
+            chunks.append((contextualized, heading, kind))
+        else:
+            for sub in chunk_text(contextualized):
+                chunks.append((sub, heading, kind))
+
    if not chunks:
        return []
-    embeddings = embedder.encode(chunks).tolist()
+    embeddings = embedder.encode([c[0] for c in chunks]).tolist()
    rows = []
-    for i, (chunk, emb) in enumerate(zip(chunks, embeddings)):
+    for i, ((chunk, heading, kind), emb) in enumerate(zip(chunks, embeddings)):
        rows.append({
            "id": _chunk_id(filepath, source, i),
            "document": chunk,
@@ -95,13 +283,15 @@ def chunk_and_embed(text: str,
                "source": source,
                "filepath": str(filepath) if filepath else source,
                "folder": folder,
+                "heading": heading,
+                "kind": kind,
            },
        })
    return rows


-def write_embeddings_batch(conn, batch: list[dict]) -> int:
-    """Single canonical INSERT. Sets created_at = NOW() server-side. Commits.
+def write_embeddings_batch(conn, batch: list[dict], commit: bool = True) -> int:
+    """Single canonical INSERT. Sets created_at = NOW() server-side.

    Every row dict must supply 'type'. created_at is SQL-supplied (NOW()), so
    callers do not need to provide it. The application-layer assertion is the
@@ -109,6 +299,11 @@ def write_embeddings_batch(conn, batch: list[dict]) -> int:
    historical NULLs were resolved by the Improvement #2 backfill, and a
    Python-level raise gives a faster, more debuggable failure than a
    Postgres constraint error.
+
+    When commit=True (default), this function commits the connection itself.
+    When commit=False, the caller is responsible for committing. Use
+    commit=False when composing this write with other writes that must land
+    atomically in the same transaction.
    """
    if not batch:
        return 0
@@ -131,5 +326,6 @@ def write_embeddings_batch(conn, batch: list[dict]) -> int:
                metadata   = EXCLUDED.metadata
        """, (row["id"], row["document"], row["embedding"],
              row["source"], row["type"], json.dumps(row["metadata"])))
-    conn.commit()
+    if commit:
+        conn.commit()
    return len(batch)
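For review, a trimmed standalone sketch of the two-format scan in `_strip_md_frontmatter`. `_BOLD_KV_RE` is not shown in this diff, so `BOLD_KV_RE` below is a hypothetical stand-in; the real pattern may be stricter or looser.

```python
import re

# Hypothetical stand-in for _BOLD_KV_RE (defined elsewhere in encoding.py):
# matches capture-style lines such as "**date:** 2026-05-04".
BOLD_KV_RE = re.compile(r"^\*\*[^*]+:\*\*")


def strip_frontmatter(text: str) -> str:
    """Sketch of the YAML-style / capture-style frontmatter scan."""
    lines = text.splitlines()
    i = 0
    while i < len(lines) and not lines[i].strip():  # skip leading blanks
        i += 1
    heading = None
    if i < len(lines) and lines[i].startswith("# "):
        heading = lines[i]  # an H1 is kept; only the block after it can go
        i += 1
        while i < len(lines) and not lines[i].strip():
            i += 1
    if i >= len(lines):
        return text
    first = lines[i].strip()
    if heading is None and first == "---":
        # YAML-style: everything up to the closing '---' is frontmatter.
        for j in range(i + 1, len(lines)):
            if lines[j].strip() == "---":
                body_start = j + 1
                break
        else:
            return text  # unterminated block: leave the file alone
    elif BOLD_KV_RE.match(first):
        # Capture-style: only blanks and **key:** lines allowed before '---'.
        for j in range(i, len(lines)):
            s = lines[j].strip()
            if not s or BOLD_KV_RE.match(s):
                continue
            if s == "---":
                body_start = j + 1
                break
            return text  # a body line came first: not frontmatter
        else:
            return text
    else:
        return text
    body = "\n".join(lines[body_start:]).lstrip("\n")
    return f"{heading}\n\n{body}" if heading else body
```

A YAML block is stripped, a capture-style block keeps its H1, and a `---` rule after an H1 or a bold line followed by body text leaves the input untouched.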
@@ -75,6 +75,17 @@ async def lifespan(app: FastAPI):
        max_coroutines=2,
    )
    await graphiti_instance.build_indices_and_constraints()
+    # Bridge driver._search_ops to driver.search_interface — graphiti-core 0.29.0
+    # builds FalkorSearchOperations as driver._search_ops in FalkorDriver.__init__
+    # but never assigns it to driver.search_interface. search_utils.py dispatches
+    # on driver.search_interface; without this assignment it falls back to
+    # interpreted-Cypher cosine math (full table scans). Together with the
+    # vendored patches in graphiti_patches/, this activates FalkorDB's native
+    # vector index for entity dedup similarity search.
+    if (hasattr(graphiti_instance.driver, "_search_ops")
+            and graphiti_instance.driver.search_interface is None):
+        graphiti_instance.driver.search_interface = graphiti_instance.driver._search_ops
+        log.info("Wired driver.search_interface = driver._search_ops (vector index path active)")
    log.info(f"Graphiti ready — provider: {LLM_PROVIDER}, group: {GROUP_ID}")
    yield
    await graphiti_instance.close()
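The `search_interface` bridge is a guarded one-time assignment. A minimal sketch with a hypothetical `FakeDriver` (not graphiti-core's FalkorDriver) shows what the guard buys: the attribute is wired once, and the code becomes a no-op if a later graphiti-core release assigns `search_interface` itself.

```python
class FakeDriver:
    """Hypothetical driver that builds _search_ops but never assigns
    search_interface, the situation described in the lifespan comment."""

    def __init__(self):
        self._search_ops = object()
        self.search_interface = None


def wire_search_interface(driver) -> bool:
    """Copy driver._search_ops into driver.search_interface only if the
    private attribute exists and the public slot is still empty.
    Returns True when the assignment happened."""
    ops = getattr(driver, "_search_ops", None)
    if ops is not None and getattr(driver, "search_interface", None) is None:
        driver.search_interface = ops
        return True
    return False
```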
@@ -15,7 +15,7 @@ from dotenv import load_dotenv
 import psycopg2
 from sentence_transformers import SentenceTransformer

-from encoding import extract_text, chunk_and_embed, write_embeddings_batch, SUPPORTED
+from encoding import extract_blocks, chunk_and_embed, write_embeddings_batch, SUPPORTED
 from failures import (
    record_ingest_failure as _record_failure_sql,
    resolve_ingest_failure as _resolve_failure_sql,
@@ -77,14 +77,29 @@ def _resolve_failure(source: str) -> None:
        print(f"  Could not resolve ingest failure record (non-fatal): {e}")


+IGNORED_TOP_FOLDERS = {"Drafts"}
+
+
 def _ingest_one(filepath: Path, embedder, root: Path = None) -> int:
    """Ingest a single file. Returns chunk count, 0 on skip/failure."""
-    if filepath.name.startswith(("~$", ".")):
+    # "~" catches Office lock files (~$) including the case where Nextcloud
+    # filesystem encoding has mangled the "$" to a unicode replacement char.
+    if filepath.name.startswith(("~", ".")):
        return 0
    if filepath.suffix.lower() not in SUPPORTED:
        return 0
-    text = extract_text(filepath)
-    if not text.strip():
+    if root is not None:
+        try:
+            rel = filepath.parent.relative_to(root)
+            if rel.parts and rel.parts[0] in IGNORED_TOP_FOLDERS:
+                return 0
+        except ValueError:
+            pass
+    blocks = extract_blocks(filepath)
+    if not blocks or not any(
+        (b.get("text") or "").strip() or (b.get("heading") or "").strip()
+        for b in blocks
+    ):
        _record_failure(filepath, "Text extraction failed or empty")
        return 0
    folder_rel = None
@@ -94,7 +109,7 @@ def _ingest_one(filepath: Path, embedder, root: Path = None) -> int:
        except ValueError:
            pass
    try:
-        rows = chunk_and_embed(text, filepath.name, embedder,
+        rows = chunk_and_embed(blocks, filepath.name, embedder,
                               filepath=filepath, folder=folder_rel)
    except Exception as e:
        _record_failure(filepath, f"Embedding failed: {e}")
@@ -113,7 +128,11 @@ def _ingest_one(filepath: Path, embedder, root: Path = None) -> int:
    print(f"  Indexed {len(rows)} chunks: {filepath.name}")
    _resolve_failure(filepath.name)
    if not os.getenv("SKIP_STAGE2_ENQUEUE"):
-        enqueue_stage2(filepath.name, text)
+        full_text = "\n".join(
+            f"{b['heading']}\n{b['text']}" if b.get("heading") else b.get("text", "")
+            for b in blocks
+        )
+        enqueue_stage2(filepath.name, full_text)
    return len(rows)
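The early returns in `_ingest_one` can be read as one predicate. A sketch covering only the lock-file and `IGNORED_TOP_FOLDERS` rules (the `SUPPORTED` suffix check is omitted); `should_skip` is a hypothetical helper name, not a function in this PR.

```python
from pathlib import Path

IGNORED_TOP_FOLDERS = {"Drafts"}


def should_skip(filepath: Path, root: Path) -> bool:
    """Sketch of _ingest_one's skip rules for a file under root."""
    # "~" also covers Office lock files whose "$" was mangled on sync.
    if filepath.name.startswith(("~", ".")):
        return True
    try:
        rel = filepath.parent.relative_to(root)
    except ValueError:
        return False  # outside root: the folder rule does not apply
    # Only the first folder below root is checked (e.g. root/Drafts/...).
    return bool(rel.parts) and rel.parts[0] in IGNORED_TOP_FOLDERS
```

Only the top-level folder name is matched, so a nested `CV/Drafts/` directory is still ingested.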


@@ -18,8 +18,14 @@ CONVERSATIONS_DB = str(Path.home() / "aaronai" / "conversations.db")
 PG_DSN = os.getenv("PG_DSN")
 MIN_EXCHANGES = 3

-print("Loading embedding model...")
-embedder = SentenceTransformer("all-MiniLM-L6-v2")
+_embedder = None
+
+def get_embedder():
+    global _embedder
+    if _embedder is None:
+        print("Loading embedding model...")
+        _embedder = SentenceTransformer("all-MiniLM-L6-v2")
+    return _embedder

 def get_conversations():
    conn = sqlite3.connect(CONVERSATIONS_DB)
@@ -123,7 +129,7 @@ def run():
        
        # Embed and insert
        texts = [c[1] for c in new_chunks]
-        embeddings = embedder.encode(texts, show_progress_bar=False).tolist()
+        embeddings = get_embedder().encode(texts, show_progress_bar=False).tolist()
        
        for (chunk_id, chunk_text, meta), embedding in zip(new_chunks, embeddings):
            if not meta.get("type"):
@@ -0,0 +1,136 @@
+"""
+Orientation Indexer — feeds Stage 2's document-level orientations into pgvector
+so they're searchable alongside chunk text by the retrieve_documents tool.
+
+Each completed row in stage_3_queue has an `orientation` string (combining
+active_frames, frame_relationships, extraction_orientation and
+one_sentence_summary) that describes the document at a conceptual level.
+Indexing it as its own embeddings row gives the cross-encoder a second surface
+to rank against: "what is this document about," not just "what does this chunk say."
+
+This worker is part of the "read-only Graphiti + orientation-into-pgvector"
+plan B that replaced the Stage 3 → Graphiti write path. The graph layer is
+queried directly via the search_facts chat tool; orientations land here.
+
+State tracking: a row is considered indexed if the embeddings table already
+holds a row with source=<source> and metadata->>'kind'='orientation'. The
+worker is idempotent — restart-safe, resumable.
+
+Runs as systemd: aaronai-orientation-indexer.service
+"""
+
+import logging
+import os
+import sys
+import time
+from pathlib import Path
+
+from dotenv import load_dotenv
+import psycopg2
+from sentence_transformers import SentenceTransformer
+
+load_dotenv(Path.home() / "aaronai" / ".env", override=True)
+
+sys.path.insert(0, str(Path(__file__).parent))
+from encoding import write_embeddings_batch
+
+PG_DSN = os.getenv("PG_DSN")
+EMBED_MODEL = "all-MiniLM-L6-v2"
+BATCH_SIZE = 25
+POLL_INTERVAL_SECS = 30
+LOG_FILE = "/var/log/aaronai/orientation-indexer.log"
+HEARTBEAT_FILE = "/var/log/aaronai/orientation-indexer-heartbeat"
+
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s [orientation-indexer] %(levelname)s %(message)s",
+    handlers=[logging.FileHandler(LOG_FILE, mode="a")],
+)
+log = logging.getLogger("orientation-indexer")
+
+
+def get_pg():
+    return psycopg2.connect(PG_DSN)
+
+
+def fetch_unindexed(cur, limit):
+    """Pull stage_3_queue rows with a non-null orientation whose orientation
+    hasn't been written to the embeddings table yet."""
+    cur.execute(
+        """
+        SELECT s.source, s.orientation
+        FROM stage_3_queue s
+        WHERE s.orientation IS NOT NULL
+          AND NOT EXISTS (
+              SELECT 1 FROM embeddings e
+              WHERE e.source = s.source
+                AND e.metadata->>'kind' = 'orientation'
+          )
+        ORDER BY s.enqueued_at
+        LIMIT %s
+        """,
+        (limit,),
+    )
+    return cur.fetchall()
+
+
+def _row_for(source: str, orientation: str, embedding) -> dict:
+    """Build an embeddings row for the orientation. The id is deterministic, so
+    a re-run overwrites rather than duplicates if fetch_unindexed() ever races."""
+    import hashlib
+    chunk_id = hashlib.md5(f"orientation:{source}".encode()).hexdigest()[:8] + "_orient"
+    return {
+        "id": chunk_id,
+        "document": orientation,
+        "embedding": embedding,
+        "source": source,
+        "type": "document",
+        "metadata": {
+            "source": source,
+            "kind": "orientation",
+        },
+    }
+
+
+def write_heartbeat():
+    try:
+        Path(HEARTBEAT_FILE).write_text(str(time.time()))
+    except Exception:
+        pass
+
+
+def main():
+    log.info("Orientation indexer starting...")
+    log.info(f"Loading embedding model: {EMBED_MODEL}")
+    embedder = SentenceTransformer(EMBED_MODEL)
+    log.info("Embedding model ready.")
+
+    while True:
+        write_heartbeat()
+        try:
+            pg = get_pg()
+            try:
+                cur = pg.cursor()
+                rows = fetch_unindexed(cur, BATCH_SIZE)
+                if not rows:
+                    pg.close()
+                    time.sleep(POLL_INTERVAL_SECS)
+                    continue
+
+                orientations = [r[1] for r in rows]
+                embeddings = embedder.encode(orientations).tolist()
+                batch = [
+                    _row_for(source, orient, emb)
+                    for (source, orient), emb in zip(rows, embeddings)
+                ]
+                write_embeddings_batch(pg, batch)
+                log.info(f"Indexed {len(batch)} orientation(s)")
+            finally:
+                pg.close()
+        except Exception as e:
+            log.error(f"Indexing loop iteration failed: {e}")
+            time.sleep(POLL_INTERVAL_SECS)
+
+
+if __name__ == "__main__":
+    main()
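Idempotence in the orientation indexer rests on the deterministic id from `_row_for` plus the upsert that `write_embeddings_batch` appears to perform (its `EXCLUDED.metadata` clause). A sketch using `sqlite3` as a stand-in for Postgres; the table is a hypothetical three-column subset of `embeddings`.

```python
import hashlib
import sqlite3


def orientation_id(source: str) -> str:
    # Same derivation as _row_for: md5("orientation:<source>")[:8] + "_orient".
    return hashlib.md5(f"orientation:{source}".encode()).hexdigest()[:8] + "_orient"


def upsert(conn: sqlite3.Connection, source: str, text: str) -> None:
    # ON CONFLICT ... DO UPDATE stands in for the Postgres upsert
    # (requires SQLite 3.24+).
    conn.execute(
        "INSERT INTO embeddings (id, source, document) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET document = excluded.document",
        (orientation_id(source), source, text),
    )
```

Writing the same source twice leaves one row holding the latest orientation, which is why a race between two passes cannot duplicate rows.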
@@ -0,0 +1,146 @@
+"""One-off: re-ingest docx+pptx after the 2026-05-04 extractor upgrade (commit 93c0d89).
+
+Pre-upgrade extraction missed tables, headers/footers, text boxes, group shapes,
+and pptx notes — leaving CVs/dossiers as section-header skeletons in the index.
+
+Steps when run with --apply:
+  1. DELETE all embeddings rows whose source ends in a target extension
+     (.docx/.pptx by default; override with --ext=pdf,md)
+  2. Walk NEXTCLOUD_PATH and re-ingest every matching file via _ingest_one
+  3. Stage 2 enqueue is suppressed (SKIP_STAGE2_ENQUEUE=1)
+
+Without --apply: dry-run. Counts files and chunks, prints a sample, writes nothing.
+"""
+
+import os
+import re
+import sys
+import time
+from pathlib import Path
+
+os.environ["SKIP_STAGE2_ENQUEUE"] = "1"
+
+from dotenv import load_dotenv
+load_dotenv(Path.home() / "aaronai" / ".env", override=True)
+
+import psycopg2
+from sentence_transformers import SentenceTransformer
+
+sys.path.insert(0, str(Path(__file__).parent))
+from ingest import _ingest_one, get_pg
+
+NEXTCLOUD_PATH = Path("/home/aaron/nextcloud/data/data/aaron/files")
+
+APPLY = "--apply" in sys.argv
+_ext_args = [a for a in sys.argv[1:] if a.startswith("--ext=")]
+if _ext_args:
+    TARGET_EXTS = {("." + e.lstrip(".")) for arg in _ext_args
+                   for e in arg.split("=", 1)[1].split(",")}
+else:
+    TARGET_EXTS = {".docx", ".pptx"}
+
+
+def _ext_regex():
+    inner = "|".join(re.escape(e.lstrip(".")) for e in sorted(TARGET_EXTS))
+    return f"\\.({inner})$"
+
+
+def count_stale():
+    pg = get_pg()
+    cur = pg.cursor()
+    cur.execute(
+        "SELECT lower(substring(source from '\\.[^.]+$')) AS ext, "
+        "COUNT(DISTINCT source) AS files, COUNT(*) AS chunks "
+        "FROM embeddings WHERE lower(source) ~ %s "
+        "GROUP BY 1 ORDER BY 1",
+        (_ext_regex(),),
+    )
+    rows = cur.fetchall()
+    pg.close()
+    return rows
+
+
+def delete_stale():
+    pg = get_pg()
+    cur = pg.cursor()
+    cur.execute("DELETE FROM embeddings WHERE lower(source) ~ %s", (_ext_regex(),))
+    deleted = cur.rowcount
+    pg.commit()
+    pg.close()
+    return deleted
+
+
+def find_files():
+    files = []
+    for f in NEXTCLOUD_PATH.rglob("*"):
+        if not f.is_file():
+            continue
+        if f.suffix.lower() not in TARGET_EXTS:
+            continue
+        if f.name.startswith(("~", ".")):
+            continue
+        files.append(f)
+    return files
+
+
+def main():
+    print(f"Mode: {'APPLY (destructive)' if APPLY else 'DRY-RUN (no writes)'}")
+    print(f"Target: {NEXTCLOUD_PATH}")
+    print(f"Extensions: {sorted(TARGET_EXTS)}")
+    print(f"SKIP_STAGE2_ENQUEUE={os.environ.get('SKIP_STAGE2_ENQUEUE')}")
+    print()
+
+    print("Stale chunks currently in DB:")
+    for ext, files, chunks in count_stale():
+        print(f"  {ext}: {files} files, {chunks} chunks")
+    print()
+
+    files = find_files()
+    by_ext = {}
+    for f in files:
+        by_ext.setdefault(f.suffix.lower(), []).append(f)
+    print("Files on disk to re-ingest:")
+    for ext, lst in sorted(by_ext.items()):
+        print(f"  {ext}: {len(lst)} files")
+    print(f"  total: {len(files)}")
+    print()
+    print("Sample (5 random):")
+    import random
+    for f in random.sample(files, min(5, len(files))):
+        print(f"  {f}")
+    print()
+
+    if not APPLY:
+        print("Dry-run only. Re-run with --apply to delete + re-ingest.")
+        return
+
+    print("Deleting stale chunks...")
+    n = delete_stale()
+    print(f"  deleted {n} rows")
+    print()
+
+    print("Loading embedder...")
+    embedder = SentenceTransformer("all-MiniLM-L6-v2")
+    print()
+
+    print(f"Re-ingesting {len(files)} files...")
+    started = time.time()
+    ingested = failed = total_chunks = 0
+    for i, f in enumerate(files, 1):
+        n = _ingest_one(f, embedder, root=NEXTCLOUD_PATH)
+        if n > 0:
+            ingested += 1
+            total_chunks += n
+        else:
+            failed += 1
+        if i % 25 == 0 or i == len(files):
+            elapsed = time.time() - started
+            rate = i / elapsed if elapsed else 0
+            print(f"  [{i}/{len(files)}] ingested={ingested} failed={failed} "
+                  f"chunks={total_chunks} ({rate:.1f} files/s)")
+    elapsed = time.time() - started
+    print()
+    print(f"Done in {elapsed:.0f}s: {ingested} ingested, {failed} failed, "
+          f"{total_chunks} chunks written.")
+
+
+if __name__ == "__main__":
+    main()
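`_ext_regex()` and the `--ext=` parsing are easy to check in isolation. A sketch reimplementing both as pure functions (`ext_regex` and `parse_ext_args` are hypothetical names). Python's `re` and Postgres's `~` should agree on a pattern this simple, but that is an assumption worth confirming against the database.

```python
import re


def ext_regex(exts: set) -> str:
    # Same construction as _ext_regex(): escaped, sorted, anchored at the end.
    inner = "|".join(re.escape(e.lstrip(".")) for e in sorted(exts))
    return f"\\.({inner})$"


def parse_ext_args(argv: list) -> set:
    # Mirrors the module-level --ext= handling; defaults to docx/pptx.
    args = [a for a in argv if a.startswith("--ext=")]
    if not args:
        return {".docx", ".pptx"}
    return {"." + e.lstrip(".") for a in args for e in a.split("=", 1)[1].split(",")}
```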
@@ -0,0 +1,123 @@
+"""One-off: remove embeddings rows that no longer correspond to a file on disk.
+
+Two passes:
+  1. Modern rows (metadata.filepath set): check each filepath, delete if missing.
+  2. Legacy rows (metadata.filepath null, type='document' only): build a set
+     of all basenames present anywhere under NEXTCLOUD_PATH, then delete rows
+     whose `source` basename isn't in that set.
+
+Default mode is a dry-run (counts + sample paths, no writes). Pass --apply to
+actually delete.
+"""
+
+import os
+import sys
+from pathlib import Path
+from collections import defaultdict
+
+from dotenv import load_dotenv
+load_dotenv(Path.home() / "aaronai" / ".env", override=True)
+
+import psycopg2
+
+NEXTCLOUD_PATH = Path("/home/aaron/nextcloud/data/data/aaron/files")
+APPLY = "--apply" in sys.argv
+
+
+def get_pg():
+    return psycopg2.connect(os.environ["PG_DSN"])
+
+
+def scan_modern_orphans():
+    """Rows with metadata.filepath whose file doesn't exist on disk."""
+    pg = get_pg()
+    cur = pg.cursor()
+    cur.execute(
+        "SELECT id, source, metadata->>'filepath' AS filepath "
+        "FROM embeddings WHERE metadata->>'filepath' IS NOT NULL"
+    )
+    orphans = []
+    by_source = defaultdict(int)
+    for row in cur.fetchall():
+        fp = row[2]
+        if fp and not Path(fp).exists():
+            orphans.append(row)
+            by_source[row[1]] += 1
+    pg.close()
+    return orphans, by_source
+
+
+def scan_legacy_orphans():
+    """Rows without metadata.filepath whose basename isn't anywhere under
+    NEXTCLOUD_PATH. Restricted to type='document' so conversations and memory
+    snapshots (which are synthetic sources, not files on disk) aren't flagged
+    as orphans. Walks the filesystem once to build the basename set."""
+    print(f"  walking {NEXTCLOUD_PATH} to build basename index...")
+    on_disk = set()
+    for p in NEXTCLOUD_PATH.rglob("*"):
+        if p.is_file():
+            on_disk.add(p.name)
+    print(f"  {len(on_disk):,} files on disk")
+
+    pg = get_pg()
+    cur = pg.cursor()
+    cur.execute(
+        "SELECT id, source FROM embeddings "
+        "WHERE metadata->>'filepath' IS NULL AND type = 'document'"
+    )
+    orphans = []
+    by_source = defaultdict(int)
+    for row in cur.fetchall():
+        if row[1] not in on_disk:
+            orphans.append(row)
+            by_source[row[1]] += 1
+    pg.close()
+    return orphans, by_source
+
+
+def delete_rows(ids):
+    pg = get_pg()
+    cur = pg.cursor()
+    cur.execute("DELETE FROM embeddings WHERE id = ANY(%s)", (list(ids),))
+    deleted = cur.rowcount
+    pg.commit()
+    pg.close()
+    return deleted
+
+
+def main():
+    print(f"Mode: {'APPLY (destructive)' if APPLY else 'DRY-RUN (no writes)'}")
+    print(f"Target: {NEXTCLOUD_PATH}")
+    print()
+
+    print("Pass 1 — modern rows (metadata.filepath set):")
+    modern, modern_by_src = scan_modern_orphans()
+    print(f"  {len(modern):,} orphan rows across {len(modern_by_src):,} files")
+    for src, n in sorted(modern_by_src.items(), key=lambda kv: -kv[1])[:10]:
+        print(f"    {n:>4} chunks — {src}")
+    print()
+
+    print("Pass 2 — legacy rows (no metadata.filepath):")
+    legacy, legacy_by_src = scan_legacy_orphans()
+    print(f"  {len(legacy):,} orphan rows across {len(legacy_by_src):,} files")
+    for src, n in sorted(legacy_by_src.items(), key=lambda kv: -kv[1])[:10]:
+        print(f"    {n:>4} chunks — {src}")
+    print()
+
+    total = len(modern) + len(legacy)
+    if total == 0:
+        print("Nothing to delete.")
+        return
+
+    if not APPLY:
+        print(f"Dry-run only. Re-run with --apply to delete {total:,} rows.")
+        return
+
+    print(f"Deleting {total:,} orphan rows...")
+    n1 = delete_rows([r[0] for r in modern]) if modern else 0
+    n2 = delete_rows([r[0] for r in legacy]) if legacy else 0
+    print(f"  modern: {n1:,}  legacy: {n2:,}  total: {n1 + n2:,}")
+
+
+if __name__ == "__main__":
+    main()
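Pass 2's rule, sketched against a temporary directory. `legacy_orphans` is a hypothetical helper mirroring `scan_legacy_orphans` without the database query; `rows` stands in for the `(id, source)` tuples that query returns.

```python
import tempfile
from pathlib import Path


def legacy_orphans(rows, root: Path) -> list:
    """Return (id, source) rows whose source basename matches no file
    anywhere under root."""
    on_disk = {p.name for p in root.rglob("*") if p.is_file()}
    return [row for row in rows if row[1] not in on_disk]
```

Because matching is by basename, a legacy row survives if any file anywhere under the root shares its name, so this pass errs toward keeping rows.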
@@ -0,0 +1,53 @@
+"""End-to-end test of retrieve_context with intent routing + reranking.
+
+Avoids loading the full FastAPI app; replicates the chat-handler retrieval
+call shape and prints classifier output + final ranked sources for each query.
+"""
+
+import os
+import sys
+from pathlib import Path
+
+from dotenv import load_dotenv
+load_dotenv(Path.home() / "aaronai" / ".env", override=True)
+
+sys.path.insert(0, str(Path(__file__).parent))
+
+# Stub out anthropic so importing api.py doesn't require the SDK;
+# only retrieve_context is exercised here.
+import types
+sys.modules.setdefault("anthropic", types.ModuleType("anthropic"))
+sys.modules["anthropic"].Anthropic = lambda **kw: None
+
+# Same for faster_whisper, unless it is already imported.
+if "faster_whisper" not in sys.modules:
+    sys.modules["faster_whisper"] = types.ModuleType("faster_whisper")
+
+import importlib.util
+spec = importlib.util.spec_from_file_location("api", Path(__file__).parent / "api.py")
+api = importlib.util.module_from_spec(spec)
+# This executes all of api.py, including import-time side effects. Errors are
+# caught so the run can continue, provided retrieve_context was already defined.
+try:
+    spec.loader.exec_module(api)
+except Exception as e:
+    print(f"(continuing despite api.py side-effect error: {e})")
+
+retrieve_context = api.retrieve_context
+
+QUERIES = [
+    "write me a bio",
+    "my professional bio",
+    "Aaron Nelson CV consulting and design work",
+    "FWN3D consulting",
+    "syllabi I have taught",
+    "philosophy of teaching",
+    "Hudson Valley Additive Manufacturing Center",
+    "Aaron Nelson is an artist and educator working in additive manufacturing",
+]
+
+for q in QUERIES:
+    pieces, sources = retrieve_context(q)
+    print(f"\n=== {q!r} ===")
+    for i, src in enumerate(sources, 1):
+        print(f"  {i}. {src}")
@@ -29,7 +29,7 @@ from sentence_transformers import SentenceTransformer
 from watchdog.observers import Observer
 from watchdog.events import FileSystemEventHandler

-from encoding import extract_text, chunk_and_embed, write_embeddings_batch, SUPPORTED
+from encoding import extract_blocks, chunk_and_embed, write_embeddings_batch, SUPPORTED
 from failures import (
    record_ingest_failure as _record_failure_sql,
    resolve_ingest_failure as _resolve_failure_sql,
@@ -123,13 +123,61 @@ def resolve_ingest_failure(source: str):
        log.warning(f"Could not resolve ingest failure record (non-fatal): {e}")


+def delete_embeddings_for_path(filepath: Path):
+    """Remove embeddings rows for a file that no longer exists. Matches by
+    metadata.filepath so multi-folder same-basename files don't collide.
+    Legacy rows without filepath metadata are left alone — they get cleaned
+    by sweep_orphans.py."""
+    try:
+        pg = get_pg()
+        try:
+            cur = pg.cursor()
+            cur.execute(
+                "DELETE FROM embeddings WHERE metadata->>'filepath' = %s",
+                (str(filepath),),
+            )
+            deleted = cur.rowcount
+            pg.commit()
+            if deleted:
+                log.info(f"Deleted {deleted} chunks for removed file: {filepath}")
+        finally:
+            pg.close()
+    except Exception as e:
+        log.warning(f"Could not delete embeddings for {filepath} (non-fatal): {e}")
+
+
+def remove_from_state(filepath: Path):
+    """Drop a deleted file from watcher_state.json so it isn't carried as
+    'known mtime' indefinitely."""
+    try:
+        state = load_state()
+        key = str(filepath)
+        if key in state:
+            del state[key]
+            save_state(state)
+    except Exception as e:
+        log.warning(f"Could not update state for deleted {filepath} (non-fatal): {e}")
+
+
+IGNORED_TOP_FOLDERS = {"Drafts"}
+
+
 def ingest_file(filepath: Path, embedder) -> int:
-    if filepath.name.startswith(("~$", ".")):
+    if filepath.name.startswith(("~", ".")):
        return 0
    if filepath.suffix.lower() not in SUPPORTED:
        return 0
-    text = extract_text(filepath)
-    if not text.strip():
+    try:
+        rel = filepath.parent.relative_to(NEXTCLOUD_PATH)
+        if rel.parts and rel.parts[0] in IGNORED_TOP_FOLDERS:
+            return 0
+    except ValueError:
+        pass
+    blocks = extract_blocks(filepath)
+    if not blocks or not any(
+        (b.get("text") or "").strip() or (b.get("heading") or "").strip()
+        for b in blocks
+    ):
        record_ingest_failure(filepath, "Text extraction failed or empty")
        return 0
    folder_rel = None
@@ -138,7 +186,7 @@ def ingest_file(filepath: Path, embedder) -> int:
    except ValueError:
        pass
    try:
-        rows = chunk_and_embed(text, filepath.name, embedder,
+        rows = chunk_and_embed(blocks, filepath.name, embedder,
                               filepath=filepath, folder=folder_rel)
    except Exception as e:
        log.error(f"Embedding failed for {filepath.name}: {e}")
@@ -159,7 +207,11 @@ def ingest_file(filepath: Path, embedder) -> int:
        return 0
    log.info(f"Indexed {len(rows)} chunks: {filepath.name}")
    resolve_ingest_failure(source)
-    enqueue_stage2(source, text)
+    full_text = "\n".join(
+        f"{b['heading']}\n{b['text']}" if b.get("heading") else b.get("text", "")
+        for b in blocks
+    )
+    enqueue_stage2(source, full_text)
    return len(rows)


@@ -168,7 +220,8 @@ def ingest_files(paths: list, embedder, state: dict) -> dict:
    for path in paths:
        count = ingest_file(path, embedder)
        total += count
-        state[str(path)] = str(path.stat().st_mtime)
+        if count > 0:
+            state[str(path)] = str(path.stat().st_mtime)
    log.info(f"Ingestion complete. {total} chunks across {len(paths)} files.")
    return state

@@ -196,12 +249,24 @@ def get_changed_files(state: dict) -> list:
            continue
        if path.suffix.lower() not in SUPPORTED:
            continue
-        if path.name.startswith((".", "~$")):
+        if path.name.startswith((".", "~$", "~")):
            continue
        if "Admin/Backups" in str(path) or "Backups" in path.parts:
            continue
        if "Journal/Media" in str(path):
            continue
+        if "Generative Design" in path.parts and "Processing" in path.parts:
+            continue
+        if "Computational Design 2017" in path.parts and "Student Work" in path.parts:
+            continue
+        if path.name in ("Renders.pptx", "Ribbon Cutting Slideshow.pptx") \
+                and "Presentations" in path.parts:
+            continue
+        if path.name == "GH Slicer Notes [Autosaved].pptx" \
+                and "DDF555 3D Computational" in path.parts:
+            continue
+        if path.stat().st_size == 0:
+            continue
        if state.get(str(path)) != str(path.stat().st_mtime):
            changed.append(path)
    return changed
@@ -280,12 +345,22 @@ class IngestHandler(FileSystemEventHandler):
        self.last_event = 0

    def _should_ignore(self, path: Path) -> bool:
-        if path.name.startswith((".", "~$")):
+        if path.name.startswith((".", "~$", "~")):
            return True
        if "Admin/Backups" in str(path) or "Backups" in path.parts:
            return True
        if "Journal/Media" in str(path):
            return True
+        if "Generative Design" in path.parts and "Processing" in path.parts:
+            return True
+        if "Computational Design 2017" in path.parts and "Student Work" in path.parts:
+            return True
+        if path.name in ("Renders.pptx", "Ribbon Cutting Slideshow.pptx") \
+                and "Presentations" in path.parts:
+            return True
+        if path.name == "GH Slicer Notes [Autosaved].pptx" \
+                and "DDF555 3D Computational" in path.parts:
+            return True
        return False

    def on_created(self, event):
@@ -311,15 +386,47 @@ class IngestHandler(FileSystemEventHandler):
    def on_moved(self, event):
        if event.is_directory:
            return
+        src = Path(event.src_path)
+        dest = Path(event.dest_path)
+        # If destination is outside NEXTCLOUD_PATH (e.g., Nextcloud trashbin at
+        # /home/aaron/nextcloud/data/data/aaron/files_trashbin/), treat as a
+        # delete — the file is no longer in the watched corpus.
+        try:
+            dest.relative_to(NEXTCLOUD_PATH)
+        except ValueError:
+            if src.suffix.lower() in SUPPORTED:
+                log.info(f"Event: moved out of tree {src} -> {dest}")
+                threading.Thread(
+                    target=lambda: (
+                        delete_embeddings_for_path(src),
+                        remove_from_state(src),
+                    ),
+                    daemon=True,
+                ).start()
+            return
        # Nextcloud WebDAV writes .part temp files then renames to final path.
        # src_path is the .part file; dest_path is the final filename.
-        dest = Path(event.dest_path)
        if dest.suffix.lower() not in SUPPORTED or self._should_ignore(dest):
            return
        log.info(f"Event: moved -> {dest}")
        self.pending = True
        self.last_event = time.time()

+    def on_deleted(self, event):
+        if event.is_directory:
+            return
+        path = Path(event.src_path)
+        if path.suffix.lower() not in SUPPORTED:
+            return
+        log.info(f"Event: deleted {path}")
+        threading.Thread(
+            target=lambda: (
+                delete_embeddings_for_path(path),
+                remove_from_state(path),
+            ),
+            daemon=True,
+        ).start()
+
    def on_closed(self, event):
        # FileClosedEvent fires on the final file after Nextcloud completes write.
        # Belt-and-suspenders catch for any write pattern not caught by on_moved.