Commit Graph

1 Commit

Author SHA1 Message Date
aaron 10bb29290a watcher: handle deletes; sweep_orphans cleans existing phantom chunks
watcher.py now listens for on_deleted events and treats on_moved
destinations that fall outside NEXTCLOUD_PATH (Nextcloud trashbin, moves
to other volumes) as deletes. Both cases call delete_embeddings_for_path
(DELETE WHERE metadata.filepath = ...) and remove_from_state, which drops
the file from watcher_state.json so its last-known mtime is no longer tracked.
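
A minimal sketch of the delete decision described above, assuming the
handler reduces each event to a type string plus source and destination
paths. paths_to_purge and is_inside are hypothetical names, not the
functions in watcher.py, and the check is purely lexical (it does not
resolve symlinks or ".." segments):

```python
from pathlib import PurePosixPath


def is_inside(path: str, root: str) -> bool:
    """Return True if path is root itself or lies under root (lexical check only)."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def paths_to_purge(event_type, src_path, dest_path, root):
    """Return the paths whose embeddings and state entries should be dropped.

    event_type is assumed to be "deleted" or "moved"; anything else purges nothing.
    """
    if event_type == "deleted":
        return [src_path]
    # A move whose destination leaves the watched root (trashbin, another
    # volume) is treated like a delete of the source path.
    if event_type == "moved" and not is_inside(dest_path, root):
        return [src_path]
    return []
```

Comparing path components rather than string prefixes keeps a sibling
directory such as /data/nc_trash from being mistaken for a location inside
/data/nc.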

Match is by metadata.filepath, not source basename, so files that share a
name across folders don't collide.
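
To illustrate why the match key matters, here is a small in-memory sketch.
The row layout (a "source" basename plus a "metadata" dict holding
"filepath") and both helper names are assumptions for illustration, not the
real schema or the real delete_embeddings_for_path:

```python
import os


def delete_by_filepath(rows, filepath):
    """Keep every row except those whose metadata.filepath equals filepath."""
    return [r for r in rows if r["metadata"].get("filepath") != filepath]


def delete_by_basename(rows, filepath):
    """Keep every row except those whose source equals the file's basename."""
    name = os.path.basename(filepath)
    return [r for r in rows if r["source"] != name]


rows = [
    {"id": 1, "source": "notes.md", "metadata": {"filepath": "/nc/Journal/notes.md"}},
    {"id": 2, "source": "notes.md", "metadata": {"filepath": "/nc/Library/notes.md"}},
]

# Deleting /nc/Journal/notes.md by full path leaves the Library copy alone.
kept_by_path = delete_by_filepath(rows, "/nc/Journal/notes.md")

# Matching on basename would also remove the unrelated Library copy.
kept_by_name = delete_by_basename(rows, "/nc/Journal/notes.md")
```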

scripts/sweep_orphans.py is the one-time cleanup for chunks the watcher
missed before this fix:
- Modern pass: rows with metadata.filepath whose file no longer exists.
- Legacy pass: rows with NULL filepath and type='document' whose basename
  isn't anywhere on disk. The type='document' restriction skips
  conversations and memory snapshots (synthetic sources, not files on disk).
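
The two passes can be sketched roughly as follows. This is not the code in
scripts/sweep_orphans.py: it assumes rows arrive as dicts with hypothetical
"id", "filepath", "type", and "source" keys, where "source" carries the
basename for legacy rows, and it only reports orphan ids rather than
deleting anything:

```python
import os


def collect_basenames(root):
    """Return the set of every file name found anywhere under root."""
    names = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.update(filenames)
    return names


def find_orphans(rows, root):
    """Return (modern_ids, legacy_ids) for rows whose backing file is gone."""
    on_disk = collect_basenames(root)
    modern, legacy = [], []
    for row in rows:
        filepath = row.get("filepath")
        if filepath is not None:
            # Modern pass: the recorded path must still exist.
            if not os.path.exists(filepath):
                modern.append(row["id"])
        elif row.get("type") == "document":
            # Legacy pass: no filepath, so fall back to a basename search.
            name = os.path.basename(row.get("source") or "")
            if name and name not in on_disk:
                legacy.append(row["id"])
        # Rows with no filepath and any other type are left untouched.
    return modern, legacy
```

The legacy pass is deliberately conservative: a row survives as long as any
file with the same name exists somewhere under the root.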

First run cleaned 629 rows: 628 from moved-file duplicates (e.g., BirdAI
docs that traveled across Journal/, Library/, Journal/Projects/BirdAI/)
plus the AARON_NELSON_BIO.pdf phantom Aaron flagged.
2026-05-20 02:52:00 +00:00