api.py: tool-call retrieval, drop the keyword intent classifier

Removes classify_retrieval_intent and the type/folder filter parameters on
retrieve_context. The keyword classifier was the same anti-pattern as the
formatting-driven docx chunker: a heuristic that locks the user into specific
phrasings and fails silently on anything novel. A scope enum (personal /
library / conversations / memory) would have been the same heuristic in a
fancier wrapper — the categories themselves are mine, not Aaron's.

New shape: a retrieve_documents tool exposed to Claude. Tool takes a single
query argument; the model decides when to call it, what to search for, and
how many times per turn (multi-query falls out naturally for compound asks).
Pre-LLM retrieval is gone — memory still rides as ground truth in the prompt,
but corpus content is fetched on demand by the model with concrete queries
it crafts itself, not the user's raw phrasing.

retrieve_context is now pure: hybrid retrieval + cross-encoder rerank + dedup,
no filters. The reranker ranks, the model judges relevance. When ranking
fails (e.g. abstract instructional queries pulling philosophy books), the
right fix is a better reranker, not another query-time taxonomy. That work
is acknowledged but deferred.

System prompt updated to teach the model about the tool and to prefer
concrete tokens (named entities, project names, course codes) over abstract
phrasing when constructing search queries.
This commit is contained in:
2026-05-19 23:05:25 +00:00
parent 9955c7e383
commit 9e86297e2a
2 changed files with 102 additions and 116 deletions
+99 -106
View File
@@ -123,11 +123,26 @@ economical, specific, never performative. When answering questions,
cite sources and acknowledge uncertainty rather than filling gaps with cite sources and acknowledge uncertainty rather than filling gaps with
plausible-sounding content. plausible-sounding content.
You have access to his complete document corpus, conversation history, You have a persistent memory file (always present below) that carries
and a persistent memory file that carries his current context. Treat Aaron's current context — treat it as ground truth for his present
the memory file as ground truth for his present situation. Use web situation.
search automatically when current information is needed. Never
re-brief on context that's already in memory or documents. For anything beyond what's in memory, you have a retrieve_documents
tool that searches his full knowledge base: personal documents,
reading library, conversation transcripts, and journal entries. Call
it whenever you need concrete information — names, dates, project
specifics, prior thinking, exhibition records, syllabi, anything you
don't already know. For compound questions, call it multiple times
with different concrete queries; one call per distinct information
need. Prefer specific tokens (named entities, project names, course
codes) over abstract instructional phrasing — search "FWN3D
consulting" not "my work." Results are unfiltered and ranked by
semantic similarity; judge each chunk for relevance and ignore
irrelevant hits rather than forcing them into the answer.
Use web search automatically when current external information is
needed. Never re-brief on context that's already in memory or
retrieved chunks.
When making factual claims about Aaron — his history, credentials, locations, dates, relationships, projects, or any specific event — you must ground the claim in a specific retrieved document or the memory file. Cite the source by name inline. If no source supports the claim, say so explicitly rather than filling the gap with plausible-sounding content. Do not confabulate. If you are inferring rather than citing, mark it as inference.""" When making factual claims about Aaron — his history, credentials, locations, dates, relationships, projects, or any specific event — you must ground the claim in a specific retrieved document or the memory file. Cite the source by name inline. If no source supports the claim, say so explicitly rather than filling the gap with plausible-sounding content. Do not confabulate. If you are inferring rather than citing, mark it as inference."""
@@ -251,24 +266,6 @@ FINAL_LIMIT = 8
_TSQUERY_SANITIZE_RE = re.compile(r"[^\w\s\"'-]") _TSQUERY_SANITIZE_RE = re.compile(r"[^\w\s\"'-]")
CONVERSATION_TYPES = ["chatgpt_conversation", "claude_conversation", "aaronai_conversation"]
DOCUMENT_TYPES = ["document"]
MEMORY_TYPES = ["claude_memory"]
_CONVO_SIGNALS = (
"what did i tell", "what did we discuss", "what did we talk",
"in our conversation", "you mentioned", "we talked about",
"earlier you said", "earlier i said", "did i tell you",
"did i say", "what did chatgpt", "what did claude",
)
_DOC_SIGNALS = (
"write me a bio", "draft a bio", "my bio", "my cv", "my resume",
"my professional", "my work history", "my exhibitions",
"my publications", "my syllabi", "my courses", "my teaching",
"my philosophy", "about my career", "draft a cover letter",
"draft my", "write a bio", "professional bio",
)
def _websearch_query(text: str) -> str: def _websearch_query(text: str) -> str:
"""Strip characters websearch_to_tsquery doesn't handle cleanly. Quoted """Strip characters websearch_to_tsquery doesn't handle cleanly. Quoted
@@ -276,31 +273,6 @@ def _websearch_query(text: str) -> str:
return _TSQUERY_SANITIZE_RE.sub(" ", text).strip() return _TSQUERY_SANITIZE_RE.sub(" ", text).strip()
def classify_retrieval_intent(query: str):
"""Return (type_filter, folder_exclude_prefixes). Either may be None.
type_filter restricts the candidate pool by `type`; folder_exclude_prefixes
excludes any chunk whose metadata.folder matches a LIKE 'prefix%' pattern.
Implementation is a low-effort keyword classifier — explicitly tunable and
swappable. For nuanced routing, replace with an LLM classifier returning
the same shape.
Precedence: conversation signals win over document signals — "what did I
tell you about my CV" is asking about the conversation, not the CV.
For biographical/document intent, also exclude the reference library
(Library/Foundations/* — philosophy and cognition books), which is
categorically different from personal artifacts but lives in the same
`type='document'` bucket."""
q = query.lower()
if any(s in q for s in _CONVO_SIGNALS):
return (CONVERSATION_TYPES, None)
if any(s in q for s in _DOC_SIGNALS):
return (DOCUMENT_TYPES, ["Library/"])
return (None, None)
def _rerank(query: str, candidates: list[tuple]) -> list[tuple]: def _rerank(query: str, candidates: list[tuple]) -> list[tuple]:
"""Cross-encoder rerank. Candidates are (id, document, source, folder, created_at) """Cross-encoder rerank. Candidates are (id, document, source, folder, created_at)
tuples. Returns the same tuples reordered by reranker score with created_at as tuples. Returns the same tuples reordered by reranker score with created_at as
@@ -334,71 +306,49 @@ def _dedup_key(doc: str) -> str:
return hashlib.md5(doc[:300].lower().encode("utf-8", "ignore")).hexdigest() return hashlib.md5(doc[:300].lower().encode("utf-8", "ignore")).hexdigest()
def retrieve_context(query, n_results=FINAL_LIMIT, def retrieve_context(query, n_results=FINAL_LIMIT):
type_filter=None, folder_exclude_prefixes=None):
"""Hybrid retrieval (dense + lexical, RRF fused) followed by cross-encoder rerank. """Hybrid retrieval (dense + lexical, RRF fused) followed by cross-encoder rerank.
- Dense (pgvector) handles paraphrase / semantic similarity. - Dense (pgvector) handles paraphrase / semantic similarity.
- Lexical (tsvector) catches rare named tokens (FWN3D, Sono-Tek, course codes) - Lexical (tsvector) catches rare named tokens (FWN3D, Sono-Tek, course codes)
the embedding model has no signal for. the embedding model has no signal for.
- RRF combines the two rankings without calibrating score scales. - RRF combines the two rankings without calibrating score scales.
- Cross-encoder rerank scores each (query, chunk) pair jointly, bridging - Cross-encoder rerank scores each (query, chunk) pair jointly.
semantic gaps that bi-encoders can't (e.g., "write me a bio" -> CV chunk).
- Near-duplicate collapse on output so top-N slots aren't burned by - Near-duplicate collapse on output so top-N slots aren't burned by
multi-folder copies of the same file. multi-folder copies of the same file.
type_filter: optional list of `type` values to restrict the candidate pool to. No type or folder filtering: imposing a taxonomy at retrieval time is a
folder_exclude_prefixes: optional list of folder LIKE prefixes to exclude. heuristic we've explicitly rejected. The reranker ranks, the caller (LLM)
Both default to None (no restriction). Use classify_retrieval_intent() to derive.""" decides what's relevant to its task."""
query_embedding = embedder.encode([query]).tolist()[0] query_embedding = embedder.encode([query]).tolist()[0]
ts_query = _websearch_query(query) ts_query = _websearch_query(query)
context_pieces = [] context_pieces = []
sources = [] sources = []
where_clauses = []
extra_params = []
if type_filter:
where_clauses.append("type = ANY(%s)")
extra_params.append(list(type_filter))
for prefix in (folder_exclude_prefixes or []):
where_clauses.append("(metadata->>'folder' IS NULL OR metadata->>'folder' NOT LIKE %s)")
extra_params.append(prefix + "%")
common_where = ("WHERE " + " AND ".join(where_clauses)) if where_clauses else ""
try: try:
pg = get_pg() pg = get_pg()
cur = pg.cursor() cur = pg.cursor()
# pgvector 0.6 HNSW doesn't iterate past its initial candidate list when cur.execute("""
# a restrictive WHERE filter is present — so a filter that excludes the
# top-N nearest leaves nothing. Bumping ef_search forces the index to
# explore more graph nodes. Cheap when unfiltered; load-bearing when filtered.
if where_clauses:
cur.execute("SET LOCAL hnsw.ef_search = 500")
cur.execute(f"""
SELECT id, document, source, metadata->>'folder' AS folder, created_at SELECT id, document, source, metadata->>'folder' AS folder, created_at
FROM embeddings FROM embeddings
{common_where}
ORDER BY embedding <=> %s::vector ORDER BY embedding <=> %s::vector
LIMIT %s LIMIT %s
""", (*extra_params, query_embedding, HYBRID_CANDIDATES)) """, (query_embedding, HYBRID_CANDIDATES))
dense_hits = cur.fetchall() dense_hits = cur.fetchall()
lexical_hits = [] lexical_hits = []
if ts_query: if ts_query:
lex_match = "to_tsvector('english', document) @@ websearch_to_tsquery('english', %s)" cur.execute("""
lex_where = ("WHERE " + " AND ".join([lex_match] + where_clauses))
cur.execute(f"""
SELECT id, document, source, metadata->>'folder' AS folder, created_at SELECT id, document, source, metadata->>'folder' AS folder, created_at
FROM embeddings FROM embeddings
{lex_where} WHERE to_tsvector('english', document)
@@ websearch_to_tsquery('english', %s)
ORDER BY ts_rank(to_tsvector('english', document), ORDER BY ts_rank(to_tsvector('english', document),
websearch_to_tsquery('english', %s)) DESC websearch_to_tsquery('english', %s)) DESC
LIMIT %s LIMIT %s
""", (ts_query, *extra_params, ts_query, HYBRID_CANDIDATES)) """, (ts_query, ts_query, HYBRID_CANDIDATES))
lexical_hits = cur.fetchall() lexical_hits = cur.fetchall()
pg.close() pg.close()
@@ -466,13 +416,51 @@ def create_conversation(title="New conversation"):
conn.close() conn.close()
return conv_id return conv_id
RETRIEVE_DOCUMENTS_TOOL = {
"name": "retrieve_documents",
"description": (
"Search Aaron's knowledge base — personal documents, reading library, "
"conversation transcripts, and journal entries — for content relevant "
"to a query. Call whenever you need concrete information you don't "
"already have from the persistent memory file. For compound questions "
"(e.g. 'bio emphasizing consulting work and recent research'), call "
"this tool multiple times with different concrete queries; one call "
"per distinct information need. Prefer specific named entities, "
"project names, course codes, or topic-specific terms over abstract "
"instructional phrasing — 'FWN3D consulting' retrieves better than "
"'my work'. Results are ranked by semantic + lexical hybrid retrieval "
"and a cross-encoder reranker; no taxonomy is applied, so judge each "
"returned chunk on its own merits and ignore irrelevant hits."
),
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query. Use concrete terms.",
},
},
"required": ["query"],
},
}
def _execute_retrieve_documents(tool_input):
"""Run retrieve_context for a tool call. Returns (tool_result_text, sources)."""
query = (tool_input or {}).get("query", "").strip()
if not query:
return ("No query provided.", [])
pieces, sources = retrieve_context(query)
if not pieces:
return (f"No results for query={query!r}.", [])
parts = []
for i, (piece, src) in enumerate(zip(pieces, sources), 1):
parts.append(f"[{i}] Source: {src}\n{piece}")
return ("\n\n---\n\n".join(parts), sources)
def chat(user_message, conversation_id, settings, client_time=None): def chat(user_message, conversation_id, settings, client_time=None):
memory = load_memory() memory = load_memory()
type_filter, folder_excludes = classify_retrieval_intent(user_message)
context_pieces, sources = retrieve_context(
user_message, type_filter=type_filter,
folder_exclude_prefixes=folder_excludes,
)
history = get_conversation_history(conversation_id) history = get_conversation_history(conversation_id)
context_parts = [] context_parts = []
@@ -480,40 +468,45 @@ def chat(user_message, conversation_id, settings, client_time=None):
context_parts.append(f"Current time (user-supplied, not logged): {client_time}") context_parts.append(f"Current time (user-supplied, not logged): {client_time}")
if memory: if memory:
context_parts.append(f"Aaron's persistent memory:\n\n{memory}") context_parts.append(f"Aaron's persistent memory:\n\n{memory}")
if context_pieces:
context_str = "\n\n---\n\n".join(context_pieces)
unique_sources = list(set(sources))
context_parts.append(
f"Relevant excerpts from Aaron's documents:\n\n{context_str}\n\nSources: {', '.join(unique_sources)}"
)
context_block = "\n\n====\n\n".join(context_parts) + "\n\n---\n\n" if context_parts else "" context_block = "\n\n====\n\n".join(context_parts) + "\n\n---\n\n" if context_parts else ""
full_message = context_block + user_message full_message = context_block + user_message
messages = history + [{"role": "user", "content": full_message}] messages = history + [{"role": "user", "content": full_message}]
tools = [{"type": "web_search_20250305", "name": "web_search"}] if settings.get("web_search", True) else [] tools = [RETRIEVE_DOCUMENTS_TOOL]
if settings.get("web_search", True):
tools.append({"type": "web_search_20250305", "name": "web_search"})
accumulated_sources = []
while True: while True:
kwargs = { response = anthropic_client.messages.create(
"model": "claude-sonnet-4-6", model="claude-sonnet-4-6",
"max_tokens": 2048, max_tokens=2048,
"system": SYSTEM_PROMPT, system=SYSTEM_PROMPT,
"messages": messages messages=messages,
} tools=tools,
if tools: )
kwargs["tools"] = tools
response = anthropic_client.messages.create(**kwargs)
if response.stop_reason == "tool_use": if response.stop_reason == "tool_use":
messages.append({"role": "assistant", "content": response.content}) messages.append({"role": "assistant", "content": response.content})
tool_results = [] tool_results = []
for block in response.content: for block in response.content:
if block.type == "tool_use": if block.type != "tool_use":
continue
if block.name == "retrieve_documents":
result_text, result_sources = _execute_retrieve_documents(block.input)
accumulated_sources.extend(result_sources)
tool_results.append({ tool_results.append({
"type": "tool_result", "type": "tool_result",
"tool_use_id": block.id, "tool_use_id": block.id,
"content": "Search completed" "content": result_text,
})
else:
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": "Search completed",
}) })
messages.append({"role": "user", "content": tool_results}) messages.append({"role": "user", "content": tool_results})
else: else:
@@ -521,7 +514,7 @@ def chat(user_message, conversation_id, settings, client_time=None):
for block in response.content: for block in response.content:
if hasattr(block, "text"): if hasattr(block, "text"):
assistant_message += block.text assistant_message += block.text
return assistant_message, list(set(sources)) return assistant_message, list(dict.fromkeys(accumulated_sources))
from contextlib import asynccontextmanager from contextlib import asynccontextmanager
+3 -10
View File
@@ -14,7 +14,7 @@ load_dotenv(Path.home() / "aaronai" / ".env", override=True)
sys.path.insert(0, str(Path(__file__).parent)) sys.path.insert(0, str(Path(__file__).parent))
# Stub anthropic so api.py import doesn't fail without the SDK loaded. # Stub anthropic so api.py import doesn't fail without the SDK loaded.
# We only need retrieve_context + classify_retrieval_intent. # We only need retrieve_context.
import types import types
sys.modules.setdefault("anthropic", types.ModuleType("anthropic")) sys.modules.setdefault("anthropic", types.ModuleType("anthropic"))
sys.modules["anthropic"].Anthropic = lambda **kw: None sys.modules["anthropic"].Anthropic = lambda **kw: None
@@ -34,27 +34,20 @@ except Exception as e:
print(f"(continuing despite api.py side-effect error: {e})") print(f"(continuing despite api.py side-effect error: {e})")
retrieve_context = api.retrieve_context retrieve_context = api.retrieve_context
classify_retrieval_intent = api.classify_retrieval_intent
QUERIES = [ QUERIES = [
"write me a bio", "write me a bio",
"my professional bio", "my professional bio",
"draft a bio for the Utah application",
"Aaron Nelson CV consulting and design work", "Aaron Nelson CV consulting and design work",
"FWN3D consulting", "FWN3D consulting",
"syllabi I have taught", "syllabi I have taught",
"philosophy of teaching", "philosophy of teaching",
"what did I tell Claude about FWN3D",
"what did we discuss about the Utah job",
"Hudson Valley Additive Manufacturing Center", "Hudson Valley Additive Manufacturing Center",
"Aaron Nelson is an artist and educator working in additive manufacturing",
] ]
for q in QUERIES: for q in QUERIES:
type_filter, folder_excludes = classify_retrieval_intent(q) pieces, sources = retrieve_context(q)
pieces, sources = retrieve_context(
q, type_filter=type_filter, folder_exclude_prefixes=folder_excludes,
)
print(f"\n=== {q!r} ===") print(f"\n=== {q!r} ===")
print(f" type_filter: {type_filter} folder_excludes: {folder_excludes}")
for i, src in enumerate(sources, 1): for i, src in enumerate(sources, 1):
print(f" {i}. {src}") print(f" {i}. {src}")