The cumulative `retrieved_sources` list (capped at 500, trimmed to 400 on
overflow) was hiding ~40% of the corpus from Early REM and Late REM after the
cap filled. The architecture and reframe both specify session-scoped novelty,
not corpus-lifetime exclusion. Same NREM-shape divergence as the 2026-05-02
NREM exclusion fix.
Changes:
- Drop `previously_retrieved` load; pop the legacy `retrieved_sources` key
from `dreamer_state.json` at pipeline start.
- Early REM excludes only the current session's NREM high-scorers.
- Late REM excludes only the current session's NREM \u222a Early REM.
- Remove the across-night accumulation block at the end of the pipeline; reuse
the in-scope state object for the post-pipeline metadata write (eliminates a
redundant disk re-read that was reintroducing the legacy key).
NREM exclusion fix from 2026-05-02 preserved (`nrem_chunks = retrieve("nrem",
excluded_sources=None)`).
Verification: post-fix dream-manifest source count rose to 24 (NREM 8 + Early
REM 8 + Late REM 8) vs. 13 / 16 on the two prior comparable runs. Legacy key
absent from `dreamer_state.json` post-run.
prompt_hash() in dream.py was hashing function __doc__ strings, but the
synth functions don't have docstrings, so the hash was always MD5("") =
d41d8cd9 for every dream. The manifest field meant to detect undeclared
prompt drift carried no useful information.
Refactor:
- Each synth function's prompt template moved to a module-level constant
(NREM_PROMPT_TEMPLATE, EARLY_REM_PROMPT_TEMPLATE, LATE_REM_PROMPT_TEMPLATE,
SYNTHESIS_PROMPT_TEMPLATE, LUCID_PROMPT_TEMPLATE) using str.format()
placeholders instead of f-string interpolation.
- Synth functions call TEMPLATE.format(...) at use time. Output is byte-
identical to the previous f-string implementation.
- prompt_hash() now hashes the four pipeline template constants (lucid is
on-demand, not part of the nightly manifest — preserves prior scope).
- LUCID_DEFAULT_TASK extracted as a named constant from the lucid fallback
question (factoring only, no behavior change).
- PROMPT_VERSION_* constants and synth function signatures untouched.
- v1.1 register-shift comment in synthesize_early_rem preserved inline.
The post-fix hash will differ from d41d8cd9 (verified: b65695a1 in static
test). Historical manifests still carry d41d8cd9; the discontinuity is
intentional — pre-fix hashes were equally meaningless and faking continuity
would be worse than acknowledging the break.
Found by Track 1 inventory 2026-05-02 (Finding 11 / divergence #11).
Verified static import + hash determinism before commit.
NREM in the reframe is replay-and-consolidation of recent encoded
content. Excluding previously_retrieved sources turns NREM into
novelty-finding, which is Late REM's job. NREM should re-traverse
already-encoded content; that's what consolidation is.
The May 2 abort surfaced this — 52 sources accumulated in the
exclusion list, all of them in NREM's similarity band for the
recurring research/fabrication/teaching query. The dreamer hit
zero retrievable chunks not because the corpus was empty, but
because everything semantically aligned was excluded.
Late REM and Early REM keep the exclusion mechanism — novelty is
their job. Session-scoped exclusion (nrem_high_sources flowing
into Early REM) also preserved.
The 500/400 trim on retrieved_sources is preserved for the
remaining stages that still use it.
- api.py: strip CV pinning workaround (parity violation, see architecture doc)
- dream.py: F1 — retrieve_graphiti() now accepts excluded_sources, over-fetches
3x and filters in-process. Was silently dropping the parameter; would have
confounded E3 with broken cross-stage exclusion in Graphiti arm.
- watcher.py + ingest.py: F14 — drop full_text[:50000] truncation. Was
propagating through entire cascade. Postgres TEXT can hold up to 1GB.
- corpus_integrity.py: F37 — same truncation, third path now clean.
Backups: api.py.bak.*, dream.py.bak.*, watcher.py.bak.*, ingest.py.bak.*,
corpus_integrity.py.bak.* timestamped pre-fix.
Re-cascaded Shop Class as Soulcraft (only already-cascaded source affected
by F14, 414KB).