SOPs
Whisper Model Inventory
The Rule
Section titled “The Rule”openai-whisper caches downloaded model weights at ~/.cache/whisper/ as .pt files. Each model is one file. The whisper CLI and the Python whisper.load_model(name) both read from this directory, so whatever is present is immediately usable — no install step beyond the initial download.
What is downloaded where
Section titled “What is downloaded where”Audited 2026-04-19. Paths are ~/.cache/whisper/<name>.pt on each machine.
| Model | Size | MacBook Pro | Mac Studio | Notes |
|---|---|---|---|---|
tiny | 72 MB | Yes | No | Fastest, low accuracy. Debugging only. |
base | 139 MB | Yes | Yes | Decent default for English word timestamps. |
small | 461 MB | Yes | Yes | Better accuracy, still fast on M-series. |
medium | 1.4 GB | Yes | Yes | English near-human quality; slower. |
large-v3-turbo | 1.5 GB | Yes | Yes | Whisper Turbo — fastest large-quality model. Preferred for production transcription. |
large-v3 | 2.9 GB | Yes | No | Highest accuracy, slowest. Rarely needed over turbo. |
Total cache size: MacBook Pro ~6.5 GB, Mac Studio ~3.5 GB.
Turbo (large-v3-turbo)
Section titled “Turbo (large-v3-turbo)”Released by OpenAI October 2024. Same encoder as large-v3, much smaller decoder trained to match large-v3 output. Roughly 8x faster than large-v3 at near-identical quality for English. This is the right default when the task needs real-quality transcription (word timestamps, long-form audio, noisy environments).
In code: whisper.load_model("large-v3-turbo") or whisper.load_model("turbo") (alias).
Install / verify
Section titled “Install / verify”Whisper itself is a Python package:
pip show openai-whisper # confirm installedwhich whisper # confirm CLI on PATHMacBook Pro: installed, CLI at ~/.local/bin/whisper.
Mac Studio: models are cached but the whisper CLI is not on PATH — Python import-and-use still works if a project imports whisper directly, but the CLI is not wired up. Re-install if CLI usage is needed: pip install --user openai-whisper.
Downloading a missing model
Section titled “Downloading a missing model”First use of a model auto-downloads it from OpenAI’s CDN to ~/.cache/whisper/. Either trigger it from Python:
import whisperwhisper.load_model("large-v3-turbo") # downloads if missing, otherwise loads from cache…or via the CLI:
whisper --model large-v3-turbo some-audio.mp3Download happens once per model per machine. No re-download on subsequent runs.
Syncing models across machines
Section titled “Syncing models across machines”Models are large binary files. They do NOT belong in a git repo. Two valid ways to get a model onto another machine:
- Let the machine download it on first use — simplest, works offline from that point on.
- Copy the
.ptfile over SSH — faster if the other machine already has the bandwidth-heavy download on local network:Terminal window scp ~/.cache/whisper/large-v3-turbo.pt studio:~/.cache/whisper/
Do not symlink between machines or share the cache dir over cloud storage — the files are binary blobs that openai-whisper loads once into memory, and any lazy-loading cloud sync (Google Drive, iCloud Drive) will add startup latency or corrupt the read.
Picking a model for a task
Section titled “Picking a model for a task”| Task | Recommended |
|---|---|
| Quick word-boundary detection on English voice notes | base |
| Real transcription of recorded calls / podcasts | large-v3-turbo |
| Multi-language audio | large-v3-turbo or large-v3 |
| Running on the VPS or low-memory machine | small (fits in 1 GB RAM with headroom) |
Rule of thumb: start with base while iterating, upgrade to large-v3-turbo for the final pass. Going straight to large-v3 is rarely worth the 2x slowdown over turbo.
Related
Section titled “Related”- audio-slow
--wordsmode uses Whisper for editor-precision word boundaries (~/apps/audio-slow/audio_slow.py). - youtube-auto-commenter and tms-ops use Whisper for transcription of captured audio.