8e61e4dedb3bfdc7ae3eb4ffd4af2c57ba31ded8
Tesseract OCR installed on the VPS (apt: tesseract-ocr, tesseract-ocr-eng).
Python wrappers added to venv (pip: pytesseract, ocrmypdf).

This commit is the install record only. No code change: the async OCR worker, capture-path integration, and backlog processing are separate followups.

Smoke test results captured in the file:
- pytesseract on a text-bearing slide image from GH Slicer Notes.pptx: 126 chars in 0.22s. (Renders.pptx, also in the cohort of 4 image-only .pptx files, was tried first but contains only rendered designs with no text; noted as a likely candidate for exclusion rather than OCR.)
- ocrmypdf on a 4-page Lexmark CX510de scan from the Tenure/Dossier Scan 2022 set: 2270 non-whitespace chars in 3.72s (~0.93s/page). Output is real, readable English; usable as the reference timing for the eventual async worker queue.

Deferred decision: the project has no dependency manifest (no requirements.txt, pyproject.toml, etc.). Tracking that as its own followup rather than bolting it onto this install. The capture-path integration commit will be the natural point to address it if it hasn't been resolved by then.
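A minimal sketch of how the two smoke-test measurements above could be reproduced in Python, assuming the apt and pip packages listed in this commit (Pillow, which pip installs as a pytesseract dependency, is used to open the image). The helper names (`count_non_whitespace`, `seconds_per_page`, `ocr_image`, `ocr_pdf`) are hypothetical and are not code in this repository. Counting characters with whitespace stripped is also an assumption about how the reported figures were computed.

```python
import re
import tempfile
import time
from pathlib import Path


def count_non_whitespace(text: str) -> int:
    """Count characters that are not whitespace (spaces, tabs, newlines)."""
    return len(re.sub(r"\s", "", text))


def seconds_per_page(elapsed_s: float, pages: int) -> float:
    """Average wall-clock OCR time per page (e.g. 3.72s over 4 pages)."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return elapsed_s / pages


def ocr_image(path: str) -> tuple[str, float]:
    """Hypothetical helper: OCR one image with pytesseract; return (text, seconds)."""
    # Imported lazily so the pure helpers above work without the OCR stack.
    import pytesseract
    from PIL import Image

    start = time.perf_counter()
    with Image.open(path) as img:
        text = pytesseract.image_to_string(img, lang="eng")
    return text, time.perf_counter() - start


def ocr_pdf(path: str) -> tuple[str, float]:
    """Hypothetical helper: OCR a scanned PDF with ocrmypdf; return (sidecar text, seconds)."""
    import ocrmypdf

    with tempfile.TemporaryDirectory() as tmp:
        out_pdf = Path(tmp) / "out.pdf"
        sidecar = Path(tmp) / "out.txt"  # plain-text OCR output written by ocrmypdf
        start = time.perf_counter()
        ocrmypdf.ocr(path, out_pdf, sidecar=sidecar)
        elapsed = time.perf_counter() - start
        return sidecar.read_text(encoding="utf-8"), elapsed
```

For example, `text, secs = ocr_pdf(path)` followed by `count_non_whitespace(text)` and `seconds_per_page(secs, 4)` would yield numbers comparable to the Lexmark scan figures above. The ocrmypdf documentation recommends calling `ocrmypdf.ocr` from under an `if __name__ == "__main__":` guard, since it may spawn worker processes.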