aaronAI

aaron a27f22ceaf api.py: switch whisper to distil-large-v3, beam_size=1, cpu_threads=4

Three changes to reduce voice-note transcription latency on the VPS:
- Model: large-v3 -> distil-large-v3 (~6x faster, near-identical English
  accuracy; language is already hardcoded "en").
- beam_size: 5 (default) -> 1 (~3-4x faster on clean audio).
- cpu_threads: 8 -> 4 (the box has 8 cores running api, dreamer, watcher,
  nextcloud concurrently; ctranslate2's inter-op pool plus context switching
  makes 4 effectively faster than 8 here).

Combined effect expected ~10-15x over prior config. No accuracy regression
expected for the voice-note use case (English, clean audio, domain terms
already supplied via initial_prompt).

2026-05-04 01:00:32 +00:00
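
As a rough illustration of the three settings named in the commit message above, here is a minimal sketch. It assumes the transcription code uses the faster-whisper package (the message mentions ctranslate2, which faster-whisper is built on, but api.py itself is not shown here). The helper names `model_kwargs`, `transcribe_kwargs`, and `transcribe`, and the `device="cpu"` choice, are hypothetical and not taken from the repository.

```python
from typing import Any, Dict, Optional

# Values taken from the commit message; everything else is an assumption.
MODEL_NAME = "distil-large-v3"  # previously "large-v3"
CPU_THREADS = 4                 # previously 8
BEAM_SIZE = 1                   # previously 5 (the library default)
LANGUAGE = "en"                 # the commit says the language is hardcoded


def model_kwargs() -> Dict[str, Any]:
    """Keyword arguments for the model constructor (device is assumed)."""
    return {"device": "cpu", "cpu_threads": CPU_THREADS}


def transcribe_kwargs(initial_prompt: Optional[str] = None) -> Dict[str, Any]:
    """Keyword arguments for a transcribe call using greedy decoding."""
    kwargs: Dict[str, Any] = {"beam_size": BEAM_SIZE, "language": LANGUAGE}
    if initial_prompt:
        # Domain terms can be passed as a prompt, as the commit describes.
        kwargs["initial_prompt"] = initial_prompt
    return kwargs


def transcribe(path: str, initial_prompt: Optional[str] = None) -> str:
    """Transcribe one audio file and return the joined segment text."""
    # Imported lazily so the config helpers work without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(MODEL_NAME, **model_kwargs())
    segments, _info = model.transcribe(path, **transcribe_kwargs(initial_prompt))
    return " ".join(segment.text.strip() for segment in segments)
```

Keeping the settings in small helper functions makes it easy to time the old and new values side by side on the same voice note before relying on the expected speedup.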

deprecated

chore: archive deprecated chromadb and migration scripts

2026-04-28 00:15:46 +00:00

docs

docs/inventory: layer 2026-05-03 updates (resolutions, corrections, new findings)

2026-05-03 20:32:55 +00:00

experiments

embeddings: backfill type and created_at (Improvement #2 part A)

2026-05-03 23:58:53 +00:00

scripts

api.py: switch whisper to distil-large-v3, beam_size=1, cpu_threads=4