graphiti_patches: vendored FalkorDB vector index support for graphiti-core 0.29.0

Adds native FalkorDB vector index support to graphiti-core's FalkorDB
driver. Three patched files (graph_queries.py, falkordb_driver.py,
falkordb/operations/search_ops.py) plus apply.sh that backs up venv
files and copies patches over.

Why this exists: graphiti-core 0.29.0 builds similarity queries using
interpreted Cypher cosine math (vec.cosineDistance), which forces a full
scan over Entity nodes, RELATES_TO edges, and Community nodes on every
search. At 4,000+ entities, single-episode add_episode took 8+ minutes for
the resolve-against-existing-graph step and bulk ingest hung indefinitely.
FalkorDB itself supports the db.idx.vector.queryNodes and
db.idx.vector.queryRelationships procedures backed by HNSW indexes; the
driver just doesn't use them.

Patches:

1. graph_queries.py: adds get_vector_indices(), which returns CREATE
   VECTOR INDEX statements for FalkorDB (Entity.name_embedding,
   RELATES_TO.fact_embedding, Community.name_embedding) using HNSW with
   cosine similarity. Adds VECTOR_INDEX_CANDIDATE_MULTIPLIER to over-fetch
   candidates, since WHERE filters can reject some of the top-k results.
   The original get_vector_cosine_func_query is preserved for fallback.

2. falkordb_driver.py: extends build_indices_and_constraints() to call
   get_vector_indices() alongside the range and fulltext indexes. Adds a
   cache invalidation hook so the search_ops dispatcher re-probes for
   indexes after they're built.

3. falkordb/operations/search_ops.py: adds vector-index dispatcher
   helpers (_falkordb_vector_index_exists with a module-level cache,
   _falkordb_vector_node_search_cypher, _falkordb_vector_edge_search_cypher).
   Rewrites the three vector-similarity call sites (Entity.name_embedding,
   RELATES_TO.fact_embedding, Community.name_embedding) to use
   db.idx.vector.queryNodes / queryRelationships when an index is
   available and to fall back to interpreted-Cypher cosine math when it is
   not. Index existence is probed once per (label, attribute, entity_type)
   and cached.
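
The three patches above can be sketched roughly as follows. This is a
minimal illustration, not the patched code: the multiplier value, the
embedding dimension, the probe callable, the public function names, and the
exact Cypher text (including the group_id filter and the assumption that
score is a distance where smaller means closer) are all assumptions made
for the example.

```python
from typing import Callable

# Hypothetical values: the commit names the multiplier but does not state it,
# and the embedding dimension depends on the embedder in use.
VECTOR_INDEX_CANDIDATE_MULTIPLIER = 4
EMBEDDING_DIM = 1024

# Cache keyed by (label, attribute, entity_type),
# e.g. ("Entity", "name_embedding", "NODE").
_index_cache: dict[tuple[str, str, str], bool] = {}

# A probe answers "does a vector index exist for this key?" (hypothetical type).
Probe = Callable[[str, str, str], bool]


def get_vector_indices() -> list[str]:
    """Index statements for the three embedded properties (illustrative text)."""
    opts = f"OPTIONS {{dimension: {EMBEDDING_DIM}, similarityFunction: 'cosine'}}"
    return [
        f"CREATE VECTOR INDEX FOR (n:Entity) ON (n.name_embedding) {opts}",
        f"CREATE VECTOR INDEX FOR ()-[e:RELATES_TO]->() ON (e.fact_embedding) {opts}",
        f"CREATE VECTOR INDEX FOR (n:Community) ON (n.name_embedding) {opts}",
    ]


def invalidate_index_cache() -> None:
    """Forget probe results, e.g. right after indexes are built."""
    _index_cache.clear()


def vector_index_exists(probe: Probe, label: str, attribute: str, entity_type: str) -> bool:
    """Run the probe at most once per key; later calls hit the cache."""
    key = (label, attribute, entity_type)
    if key not in _index_cache:
        _index_cache[key] = probe(label, attribute, entity_type)
    return _index_cache[key]


def node_search_cypher(probe: Probe, label: str, attribute: str, limit: int) -> str:
    """Return an index-backed query when possible, else the scan-based fallback."""
    if vector_index_exists(probe, label, attribute, "NODE"):
        # Over-fetch so that filtering after the index lookup can still
        # leave up to `limit` rows.
        k = limit * VECTOR_INDEX_CANDIDATE_MULTIPLIER
        return (
            f"CALL db.idx.vector.queryNodes('{label}', '{attribute}', {k}, "
            "vecf32($search_vector)) YIELD node, score "
            "WITH node AS n, score WHERE n.group_id IN $group_ids "
            f"RETURN n.uuid AS uuid, score ORDER BY score LIMIT {limit}"
        )
    # Fallback: cosine math evaluated for every node carrying this label.
    return (
        f"MATCH (n:{label}) WHERE n.group_id IN $group_ids "
        f"WITH n, vec.cosineDistance(n.{attribute}, vecf32($search_vector)) AS score "
        f"RETURN n.uuid AS uuid, score ORDER BY score LIMIT {limit}"
    )
```

Calling invalidate_index_cache() after the index statements run is what
lets a dispatcher that previously cached "no index" switch to the
index-backed query without a restart.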

Empirical result: single-episode add_episode against a 4,277-entity
graph went from indefinite hang to 8.2 seconds. Bulk re-ingest of
already-known content (worst case for entity dedup) committed in 60ms.

Activation requires bridging driver._search_ops to driver.search_interface
in the sidecar (see graphiti_service.py). graphiti-core declares
search_interface as the dispatcher attribute but never assigns the
per-driver implementation to it, a naming mismatch in their internal
refactor. The bridge is one line in our sidecar's lifespan.
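
A minimal sketch of that bridge, assuming only the two attribute names
given above. StubDriver and bridge_search_ops are hypothetical stand-ins
for illustration; the real assignment lives in graphiti_service.py and is
not reproduced here.

```python
class StubDriver:
    """Hypothetical stand-in exposing only the two attributes named above."""

    def __init__(self) -> None:
        self._search_ops = object()   # per-driver search implementation
        self.search_interface = None  # declared dispatcher slot, left unset


def bridge_search_ops(driver):
    """Point the dispatcher attribute at the per-driver implementation."""
    if getattr(driver, "search_interface", None) is None:
        # The one-line bridge: reuse the driver's own search ops object.
        driver.search_interface = getattr(driver, "_search_ops", None)
    return driver


driver = bridge_search_ops(StubDriver())
```

The guard keeps the bridge harmless if a later graphiti-core release
starts assigning search_interface itself.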

Upstream candidate: this is a known gap (referenced indirectly in
upstream issue #1263, the RFC for an external vector store overlay).
Maintainers' attention is on the Milvus/Qdrant/Pinecone overlay; this is
the FalkorDB-native alternative for users who don't want to run a separate
vector DB. A PR will follow after empirical validation in production.
Apache-2.0 graphiti-core source is NOT vendored; backups/ is gitignored to
keep the upstream source out of this repo.
2026-05-02 05:19:01 +00:00
parent 10bb29290a
commit d2ec20e373
6 changed files with 1729 additions and 0 deletions
@@ -0,0 +1,77 @@
#!/usr/bin/env bash
# apply.sh — Apply the BirdAI vendored graphiti-core patches.
#
# Backs up the original venv files into ./backups/<timestamp>/ before
# overwriting. The backup directory layout mirrors the venv layout so a
# revert is just a tree copy back.
#
# Usage: ./apply.sh
set -euo pipefail
PATCH_DIR="$(cd "$(dirname "$0")" && pwd)"
VENV_BASE="/home/aaron/aaronai/venv/lib/python3.12/site-packages"
TIMESTAMP="$(date +%Y%m%d-%H%M%S)"
BACKUP_DIR="$PATCH_DIR/backups/$TIMESTAMP"
# Files to patch — paths relative to graphiti_core/.
FILES=(
  "graph_queries.py"
  "driver/falkordb_driver.py"
  "driver/falkordb/operations/search_ops.py"
)
echo "graphiti-core vendored patch apply — BirdAI"
echo "Patch directory: $PATCH_DIR"
echo "Venv target: $VENV_BASE/graphiti_core/"
echo "Backup to: $BACKUP_DIR"
echo
# Pre-flight: confirm all source patch files exist.
for rel in "${FILES[@]}"; do
  if [ ! -f "$PATCH_DIR/graphiti_core/$rel" ]; then
    echo "ERROR: missing patch file: $PATCH_DIR/graphiti_core/$rel" >&2
    exit 1
  fi
done
# Pre-flight: confirm all target venv files exist.
for rel in "${FILES[@]}"; do
  if [ ! -f "$VENV_BASE/graphiti_core/$rel" ]; then
    echo "ERROR: missing venv file: $VENV_BASE/graphiti_core/$rel" >&2
    echo "       graphiti-core may not be installed, or version differs from 0.29.0." >&2
    exit 1
  fi
done
# Backup originals.
echo "[1/3] Backing up originals..."
for rel in "${FILES[@]}"; do
  backup_path="$BACKUP_DIR/graphiti_core/$rel"
  mkdir -p "$(dirname "$backup_path")"
  cp "$VENV_BASE/graphiti_core/$rel" "$backup_path"
  echo "  backed up: $rel"
done
echo
# Apply patches by copying.
echo "[2/3] Applying patches..."
for rel in "${FILES[@]}"; do
  cp "$PATCH_DIR/graphiti_core/$rel" "$VENV_BASE/graphiti_core/$rel"
  echo "  patched: $rel"
done
echo
# Sanity check: confirm patched files have the marker.
echo "[3/3] Verifying patched files..."
for rel in "${FILES[@]}"; do
  if grep -q "PATCHED 2026-05-02" "$VENV_BASE/graphiti_core/$rel"; then
    echo "  OK: $rel contains patch marker"
  else
    echo "  WARNING: $rel missing patch marker (may be expected for graph_queries.py: its docstring uses the marker only in the module header)"
  fi
done
echo
echo "Done. Backup: $BACKUP_DIR"
echo "Restart the sidecar to pick up changes:"
echo " sudo systemctl restart aaronai-graphiti.service"