d2ec20e373
Adds native FalkorDB vector index support to graphiti-core's FalkorDB driver. Three patched files (graph_queries.py, falkordb_driver.py, falkordb/operations/search_ops.py) plus apply.sh, which backs up venv files and copies patches over.

Why this exists: graphiti-core 0.29.0 builds similarity queries using interpreted Cypher cosine math (vec.cosineDistance), which produces a full-table scan over Entity/RELATES_TO/Community nodes for every search. At ~4,000+ entities, single-episode add_episode took 8+ minutes for the resolve-against-existing-graph step and bulk ingest hung indefinitely. FalkorDB itself supports db.idx.vector.queryNodes and queryRelationships procedures backed by HNSW indexes; the driver just doesn't use them.

Patches:

1. graph_queries.py: adds get_vector_indices() returning CREATE VECTOR INDEX statements for FalkorDB (Entity.name_embedding, RELATES_TO.fact_embedding, Community.name_embedding). HNSW with cosine similarity. Adds VECTOR_INDEX_CANDIDATE_MULTIPLIER for over-fetch when WHERE filters reject some top-k results. Original get_vector_cosine_func_query preserved for fallback.
2. falkordb_driver.py: extends build_indices_and_constraints() to call get_vector_indices() alongside range and fulltext. Adds cache invalidation hook so the search_ops dispatcher re-probes for indexes after they're built.
3. falkordb/operations/search_ops.py: adds vector-index dispatcher helpers (_falkordb_vector_index_exists with module-level cache, _falkordb_vector_node_search_cypher, _falkordb_vector_edge_search_cypher). Rewrites the three vector-similarity call sites (Entity.name_embedding, RELATES_TO.fact_embedding, Community.name_embedding) to use db.idx.vector.queryNodes / queryRelationships when available, and to fall back to interpreted-Cypher cosine math when not. Index existence is probed once per (label, attribute, entity_type) and cached.

Empirical result: single-episode add_episode against a 4,277-entity graph went from indefinite hang to 8.2 seconds. Bulk re-ingest of already-known content (worst case for entity dedup) committed in 60ms.

Activation requires bridging driver._search_ops to driver.search_interface in the sidecar (see graphiti_service.py). graphiti-core declares search_interface as the dispatcher attribute but never assigns the per-driver implementation to it, a naming mismatch in their internal refactor. The bridge is one line in our sidecar's lifespan.

Upstream candidate: this is a known gap (referenced indirectly in upstream issue #1263 RFC for external vector store overlay). Maintainers' attention is on Milvus/Qdrant/Pinecone overlay; this is the FalkorDB-native alternative for users who don't want to run a separate vector DB. PR after empirical validation in production.

Apache-2.0 graphiti-core source is NOT vendored; backups/ is gitignored to keep the upstream source out of this repo.
59 lines
2.3 KiB
Markdown
# graphiti-core Patches: FalkorDB Vector Index Support

Vendored patches against graphiti-core 0.29.0 adding native FalkorDB
vector index support. Three files are modified: two under
`graphiti_core/driver/falkordb/`, plus `graphiti_core/graph_queries.py`.
No changes to Neo4j or Kuzu code paths.

## Why this exists

graphiti-core's FalkorDB driver uses interpreted Cypher cosine math
(`vec.cosineDistance(...)`) for similarity search, so every query is a
full scan over all Entity nodes, RELATES_TO edges, or Community nodes.
At roughly 4,000 entities and up, the resolve-against-existing-graph
step of a single-episode ingest takes 8+ minutes, and bulk ingest hangs
FalkorDB. FalkorDB itself supports the `db.idx.vector.queryNodes` and
`db.idx.vector.queryRelationships` procedures, backed by HNSW indexes;
graphiti-core's driver doesn't use them.

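
The difference between the two query shapes can be sketched roughly as
follows. This is an illustration only: the function names, the returned
columns, and the exact query text are assumptions modelled on FalkorDB's
Cypher surface, not the text that graphiti-core or these patches emit.

```python
# Illustrative sketch. Query text is an assumption, not copied from
# graphiti-core or from the patched files.


def full_scan_query(limit: int) -> str:
    """Interpreted cosine math: every :Entity node is visited and scored."""
    return (
        "MATCH (n:Entity) "
        "WITH n, (2 - vec.cosineDistance(n.name_embedding, "
        "vecf32($search_vector))) / 2 AS score "
        "WHERE score > $min_score "
        "RETURN n.uuid AS uuid, score "
        f"ORDER BY score DESC LIMIT {limit}"
    )


def indexed_query(limit: int) -> str:
    """HNSW lookup: the index hands back only the k nearest candidates."""
    return (
        "CALL db.idx.vector.queryNodes("
        f"'Entity', 'name_embedding', {limit}, vecf32($search_vector)) "
        "YIELD node AS n, score "
        "RETURN n.uuid AS uuid, score"
    )


if __name__ == "__main__":
    print(full_scan_query(10))
    print(indexed_query(10))
```

How the `score` yielded by the index procedure is scaled should be
checked against FalkorDB's documentation before comparing it with a
threshold tuned for the interpreted query.
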
These patches:

1. Add `get_vector_indices()` to `graph_queries.py`, returning
   `CREATE VECTOR INDEX` statements for FalkorDB on Entity.name_embedding,
   RELATES_TO.fact_embedding, and Community.name_embedding.
2. Extend `build_indices_and_constraints()` in `falkordb_driver.py` to
   create the vector indexes alongside the range and fulltext indexes.
3. Rewrite the three vector-similarity call sites in
   `falkordb/operations/search_ops.py` to use
   `db.idx.vector.queryNodes` and `db.idx.vector.queryRelationships`
   instead of full-scan cosine math (falling back to it when no vector
   index exists). Because WHERE filters can reject some of the top-k
   results, the queries over-fetch candidates by a configurable
   multiplier.

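
As a rough picture of step 1, a statement generator might look like the
sketch below. The function name matches the one listed above, but its
signature, the `EMBEDDING_DIM` default, the `_VECTOR_INDEX_TARGETS`
helper, and the exact statement text are assumptions; the patched
`graph_queries.py` is the authority.

```python
# Hypothetical sketch of a get_vector_indices() helper. Signature,
# dimension default, and statement text are assumptions.

EMBEDDING_DIM = 1024  # assumed; must match the embedder actually in use

# (pattern, property) pairs for the three indexed embeddings
_VECTOR_INDEX_TARGETS = [
    ("(n:Entity)", "n.name_embedding"),
    ("()-[e:RELATES_TO]->()", "e.fact_embedding"),
    ("(n:Community)", "n.name_embedding"),
]


def get_vector_indices(dimension: int = EMBEDDING_DIM) -> list[str]:
    """Return one CREATE VECTOR INDEX statement per indexed embedding."""
    return [
        f"CREATE VECTOR INDEX FOR {pattern} ON ({prop}) "
        f"OPTIONS {{dimension: {dimension}, similarityFunction: 'cosine'}}"
        for pattern, prop in _VECTOR_INDEX_TARGETS
    ]


if __name__ == "__main__":
    for statement in get_vector_indices():
        print(statement)
```

The dimension has to agree with the stored embeddings, so a real
implementation would likely take it from configuration rather than a
constant.
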
## Files

| Patched file | Change |
|---|---|
| `graphiti_core/graph_queries.py` | Adds `get_vector_indices()` |
| `graphiti_core/driver/falkordb/falkordb_driver.py` | Extends `build_indices_and_constraints` |
| `graphiti_core/driver/falkordb/operations/search_ops.py` | Three query rewrites |

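
The dispatch logic behind the `search_ops.py` rewrites, as the commit
message describes it (probe once per `(label, attribute, entity_type)`,
cache the answer, over-fetch, then filter), can be sketched as below.
All helper names, the `probe` callback, and the multiplier value of 4
are assumptions for illustration, not the names or values used in the
patch.

```python
# Hypothetical sketch of index probing with a cache plus over-fetch.
# Helper names and the multiplier value are assumptions.
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

VECTOR_INDEX_CANDIDATE_MULTIPLIER = 4  # assumed value

# Module-level cache keyed by (label, attribute, entity_type)
_index_cache: dict[tuple[str, str, str], bool] = {}


def vector_index_exists(
    label: str,
    attribute: str,
    entity_type: str,
    probe: Callable[[str, str, str], bool],
) -> bool:
    """Ask the database once per key; later calls reuse the cached answer."""
    key = (label, attribute, entity_type)
    if key not in _index_cache:
        _index_cache[key] = probe(label, attribute, entity_type)
    return _index_cache[key]


def invalidate_index_cache() -> None:
    """Call after building indexes so the next search re-probes."""
    _index_cache.clear()


def candidate_limit(limit: int) -> int:
    """Number of candidates to request from the index for a top-`limit` search."""
    return limit * VECTOR_INDEX_CANDIDATE_MULTIPLIER


def top_k_after_filter(
    candidates: Iterable[T], keep: Callable[[T], bool], limit: int
) -> list[T]:
    """Apply the post-filter to the over-fetched candidates, then truncate."""
    return [c for c in candidates if keep(c)][:limit]
```

Over-fetching trades a larger index lookup for a better chance that
`limit` results survive the filter; it does not guarantee it, so a
selective filter can still return fewer rows than requested.
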
## How to apply

Run `./apply.sh`. It backs up the originals into
`./backups/<timestamp>/` and copies the patched files over them.

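
`apply.sh` is a shell script; the back-up-then-overwrite logic it is
described as performing looks roughly like this in Python. The function
name, its parameters, the timestamp format, and the backup directory
layout are assumptions, not a transcription of the script.

```python
# Hypothetical Python rendering of a back-up-then-copy step.
# Names, timestamp format, and directory layout are assumptions.
import shutil
import time
from pathlib import Path


def apply_patches(
    patch_dir: Path, target_root: Path, backup_root: Path, files: list[str]
) -> Path:
    """Back up each original under backup_root/<timestamp>/, then overwrite it."""
    backup_dir = backup_root / time.strftime("%Y%m%d-%H%M%S")
    for rel in files:
        original = target_root / rel
        backup = backup_dir / rel
        backup.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(original, backup)           # keep the pristine file
        shutil.copy2(patch_dir / rel, original)  # install the patched one
    return backup_dir
```

Backing up before every copy means a failed or unwanted patch can be
undone by copying the files in the returned directory back into place.
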
## How to revert

Copy the timestamped backup back over the venv:

    cp backups/<ts>/graph_queries.py /home/aaron/aaronai/venv/lib/python3.12/site-packages/graphiti_core/graph_queries.py
    # ...etc

## Upstream candidate

This is a known gap: upstream issue #1263, an RFC for a vector store
overlay, references it indirectly. Maintainers' attention is on the
Milvus/external vector DB overlay; this patch is the FalkorDB-native
alternative for users who don't want a separate vector DB. Consider
opening a PR after empirical validation in production.