3c7c228db028d9b7d3a6559a4a95823eed849de5
Backfills 9,815 type-NULL rows to 'document' (extension classifier, 100% hit)
and 12,109 created_at-NULL rows via five batches:
C1 filepath_stat: 9,649 filesystem mtime via metadata.filepath
C2 watcher_state_unique: 676 unique source-name lookup in watcher_state
C3 watcher_state_collision_pick_latest_of_N:
234 collision; most-recent watcher mtime
C4 chatgpt_export: 1,548 convo create_time from export JSONs
(168/168 distinct convo_ids resolved)
C5 sentinel: 2 2026-04-26T00:00:00Z (pgvector migration date)
Provenance written to metadata.type_source and metadata.created_at_source
on every row changed by this run. type_source is empty on rows where the
type field was already populated pre-run; in those cases the snapshot
table is the source of truth for what changed.
Snapshot: embeddings_backup_2026_05_03 (CREATE TABLE AS SELECT id, type,
created_at, metadata FROM embeddings; 14,069 rows; revertable via id-join).
Verification:
V1 live counts: type_null=0 ca_null=0
V2 spot-check 11 rows across cohorts: provenance correct
V3 snapshot intact: 14,069 rows, pre-backfill NULL counts preserved
V4 cross-check vs snapshot: reconciles per-provenance to dry-run
Read-side use (B + C: writer enforcement + minimal retrieval read) deferred
to a separate session. The backfill is complete and verified, but the type
and created_at fields are not yet load-bearing — every current reader still
ignores them. Without B+C this lands as data prep, not behavior change.
Description
No description provided
Languages
Python
95.9%
HTML
3.7%
Shell
0.4%