aaronAI

aaron/aaronAI

Commit Graph

Author

SHA1

Message

Date

aaron

2df1a2fe01

docs/inventory: layer 2026-05-03 updates (resolutions, corrections, new findings)

Inventory dated 2026-05-02 is preserved as a point-in-time snapshot. Today's
updates are layered on top in a dated addendum section after "Findings
summary" and before "Phase 1 — Scripts" so the original snapshot reads as
written and readers can see what changed and when.

Resolved:
- NREM-shape divergence #1 (`dream.py` cumulative cross-night exclusion
  500-cap) — replaced with session-scoped novelty.
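
A minimal sketch of what session-scoped novelty could look like, for readers who have not seen `dream.py`. The class name `SessionNovelty`, the `select` method, and the document IDs are hypothetical; the commit does not show the actual implementation. The assumption is that the exclusion set lives only as long as one session, so nothing carries over between nights and no cap is needed.

```python
from typing import Iterable, List, Set


class SessionNovelty:
    """Tracks document IDs already used in the current session only (hypothetical)."""

    def __init__(self) -> None:
        # Fresh, empty exclusion set per session: no cross-night state, no cap.
        self._seen: Set[str] = set()

    def select(self, candidates: Iterable[str], k: int) -> List[str]:
        """Return up to k candidates not yet used in this session."""
        picked: List[str] = []
        if k <= 0:
            return picked
        for doc_id in candidates:
            if doc_id in self._seen:
                continue  # already used earlier in this session
            picked.append(doc_id)
            self._seen.add(doc_id)
            if len(picked) == k:
                break
        return picked


# Two selections within one session never repeat a document...
night_one = SessionNovelty()
first = night_one.select(["a", "b", "c"], k=2)
second = night_one.select(["a", "b", "c"], k=2)

# ...but a new session starts with a clean slate.
night_two = SessionNovelty()
third = night_two.select(["a", "b", "c"], k=2)
```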

Corrections to existing findings:
- `stage2_metadata` lives on `stage_3_queue`, not `stage_2_queue` (the
  2026-05-02 entry implied otherwise). Verified by direct schema read.
- Stage 2 char_length gate runs *before* the Mistral call. For sub-2000-char
  docs, Mistral is never invoked — frames are not extracted then discarded,
  they are simply not extracted. Reframes the architecture's "Stage 2
  produces orientation for everything" commitment.
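
The ordering described in the second correction can be sketched as follows. The function `stage2_frames`, the constant name, and the injected `extract` callable are hypothetical stand-ins for the real Stage 2 worker and the Mistral call; only the 2000-character threshold and the gate-before-call ordering come from the message above.

```python
from typing import Callable, List, Optional

MIN_CHARS_FOR_FRAMES = 2000  # threshold taken from the commit message


def stage2_frames(
    text: str, extract: Callable[[str], List[str]]
) -> Optional[List[str]]:
    """Run frame extraction only for documents that pass the length gate."""
    if len(text) < MIN_CHARS_FOR_FRAMES:
        # Gate fires first: the extractor is never invoked for short docs,
        # so there is nothing to discard afterwards.
        return None
    return extract(text)


calls: List[int] = []


def fake_extract(text: str) -> List[str]:
    """Stand-in for the model call; records that it was invoked."""
    calls.append(len(text))
    return ["Education"]


short_result = stage2_frames("x" * 1999, fake_extract)
long_result = stage2_frames("x" * 2000, fake_extract)
```

Returning `None` rather than an empty list keeps "never extracted" distinguishable from "extracted, no frames found", which is the distinction the correction draws.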

New findings (from the 2026-05-03 frame analysis):
- `ingest_conversations.py` bypasses Stage 2 entirely. 198 conversation
  sources have zero frame coverage by design. Combined with the char-gate
  exclusion and Stage 2 failures, only 56% of corpus has any frame data.
- All 14 voice notes and all 39 dream outputs are in the 339-doc gap.
  Primary capture and self-reflection channels are silent to the frame
  system; dreamer cannot frame-condition on its own output.
- File-type × frame stratification provides discriminating signal that
  cross-links Improvement #3 to the existing `embeddings.type` NULL-rate
  finding.
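
A back-of-envelope check of the coverage figure, as a sketch only. The gap counts are the ones reported here and in commit ed2d090afc (198 conversation sources, 339 short docs, 12 Stage 2 failures). The helper names are hypothetical, and the calculation assumes the three categories are disjoint, that they account for the whole gap, and that "sources" and "docs" are counted in the same unit. None of these assumptions is confirmed by the commits.

```python
from typing import Dict


def total_gap(gap_counts: Dict[str, int]) -> int:
    """Sum the uncovered documents across gap categories."""
    return sum(gap_counts.values())


def implied_corpus_size(gap_docs: int, coverage: float) -> int:
    """Corpus size implied if gap_docs is exactly the uncovered share."""
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1)")
    return round(gap_docs / (1.0 - coverage))


gap_counts = {
    "conversation_bypass": 198,  # ingest_conversations.py skips Stage 2
    "char_gate": 339,            # docs under 2000 chars
    "stage2_failures": 12,
}

gap = total_gap(gap_counts)
corpus_estimate = implied_corpus_size(gap, coverage=0.56)
framed_estimate = corpus_estimate - gap
```

Because the 56% figure is rounded, the estimate is only good to within a few dozen documents; the report's own corpus-wide coverage section is the authoritative source.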

Same NREM shape as the original cumulative-exclusion bug — the architecture's
stated commitment and what the code actually does diverge silently. This is
exactly what the inventory exists to surface.

2026-05-03 20:32:55 +00:00

aaron

ed2d090afc

experiments/frame_distribution_report: Stage 2 frame analysis (Track 1 Improvement #3)

Read-only inspection of the frame data Mistral produces in Stage 2, in
service of Track 2 substrate design (Step 2.4 operation set spec).

Artifacts:
- New SQL view `stage2_frames_v` over `stage_3_queue.stage2_metadata`
  (CREATE OR REPLACE; idempotent; raw JSONB exposed alongside structured
  fields so worker-version drift is inspectable).
- Analysis script: frequency, label-hygiene collisions, per-doc count,
  co-occurrence (top-K), file-type × frame cross-tab, worker-version split,
  data-gap accounting, corpus-wide coverage.
- JSON sidecar for diff-across-runs reproducibility.
- Markdown report with explicit Track 2 viability section.
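
Two of the analyses listed above, frame frequency and top-K co-occurrence, can be sketched in a few lines. This is an illustration, not the analysis script itself: the function names, the `docs` mapping, and the sample labels other than "Education" are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def frame_frequency(docs: Dict[str, List[str]]) -> Counter:
    """Count how many documents carry each frame (once per document)."""
    freq: Counter = Counter()
    for frames in docs.values():
        freq.update(set(frames))
    return freq


def top_co_occurrences(
    docs: Dict[str, List[str]], top_k: int
) -> List[Tuple[Tuple[str, str], int]]:
    """Return the top_k frame pairs that appear together in a document."""
    pairs: Counter = Counter()
    for frames in docs.values():
        # Sort so (A, B) and (B, A) are counted as the same pair.
        pairs.update(combinations(sorted(set(frames)), 2))
    return pairs.most_common(top_k)


# Hypothetical sample data: doc ID -> frames extracted for that doc.
docs = {
    "doc1": ["Education", "Career"],
    "doc2": ["Education", "Career", "Health"],
    "doc3": ["Education"],
}

freq = frame_frequency(docs)
top_pairs = top_co_occurrences(docs, top_k=1)
```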

Headline findings:
- Frames cluster meaningfully on the framed-doc subset (subject to
  validation on larger samples for the file-type cross-tab).
- Only 56% of corpus has frame coverage. 198 conversation sources bypass
  Stage 2 by design (`ingest_conversations.py` writes directly to
  embeddings); 339 short docs (<2000 chars) skip Mistral by char-gate;
  12 Stage 2 failures.
- All 14 voice notes and all 39 dream outputs are in the data gap.
  Primary capture and self-reflection channels are silent to the frame
  system. Dreamer cannot frame-condition on its own output.
- 54 normalized label collisions (`Professional Experience` vs
  `Professional_Experience`, etc.) — any router must normalize first.
- "Education" is a near-universal frame (36% of frame-extracted docs);
  cheap 20-doc hand-inspection diagnostic in report §8 to distinguish
  prompt artifact from corpus shape.
- File-type × frame stratification is concrete signal that ties to
  Improvement #2 (`embeddings.type` backfill); currently NULL for 71% of
  rows.
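
The label-collision finding above suggests a normalization pass before any routing. A minimal sketch, assuming collisions differ only in case and in whitespace, underscore, or hyphen separators; the report's actual normalization rule is not given here, and `normalize_label` and `find_collisions` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize_label(label: str) -> str:
    """Collapse separator runs to one space and fold case."""
    return re.sub(r"[\s_\-]+", " ", label).strip().casefold()


def find_collisions(labels: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw labels by normalized form; keep groups with 2+ variants."""
    groups: Dict[str, set] = defaultdict(set)
    for label in labels:
        groups[normalize_label(label)].add(label)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


labels = ["Professional Experience", "Professional_Experience", "Education"]
collisions = find_collisions(labels)
```

Keeping the raw variants alongside the normalized key makes it possible to audit which spellings were merged before committing to a canonical label.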

No production code touched. View is droppable; script is read-only.

2026-05-03 20:32:37 +00:00

aaron

ec67e19b4f

docs/: track Track 1 inventory and reorg plan

These are working artifacts of the 2026-05-02 Track 1 stabilization
work. Versioning them alongside the code keeps the operational
narrative coherent and gives future sessions clear reference docs.

The inventory document includes the cross-repo verification finding
on share_time — captured at the document level so future sessions
don't repeat the same dead-code mischaracterization.

2026-05-03 00:00:16 +00:00

3 Commits