Capabilities

Redacting Images with OCR

~/apps/cc/recipes/redact-image.py takes a screenshot and one or more strings, finds each string with OCR, and blurs or black-bars the pixels behind it. Output is written next to the original with .redacted inserted before the extension (input.png becomes input.redacted.png) unless -o is passed.

python3 ~/apps/cc/recipes/redact-image.py input.png "ojhurst@gmail.com"

Multi-word strings work, and one call can take multiple strings. The script reports any string it could not match, so an unredacted screenshot does not ship by accident.

  • input — image path (positional)
  • -o / --output — output path (default: input.redacted.ext)
  • --mode blur|black — default blur; black draws a flat rectangle
  • --radius N — Gaussian blur radius (default 10)
  • --pad N — pixels of padding around each match (default 4)
  • trailing positional args — one or more strings to redact
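As a rough illustration of the options above, here is a minimal argparse sketch. It is not the script's actual source: the function names build_parser and default_output are hypothetical, and the integer types for --radius and --pad are an assumption. Only the flag names and defaults come from the list above.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Redact strings in an image via OCR.")
    parser.add_argument("input", help="image path")
    parser.add_argument("-o", "--output", help="output path (default: input.redacted.ext)")
    parser.add_argument("--mode", choices=["blur", "black"], default="blur")
    parser.add_argument("--radius", type=int, default=10, help="Gaussian blur radius")
    parser.add_argument("--pad", type=int, default=4, help="pixels of padding around each match")
    parser.add_argument("targets", nargs="+", help="one or more strings to redact")
    return parser


def default_output(path: Path) -> Path:
    """Insert .redacted before the extension: shot.png -> shot.redacted.png."""
    return path.with_name(f"{path.stem}.redacted{path.suffix}")


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["shot.png", "--mode", "black", "472 S Loafer View Dr", "All Things Handy"]
)
output = Path(args.output) if args.output else default_output(Path(args.input))
print(args.mode, args.targets, output)
```

Because targets uses nargs="+", the image path can come first and the strings last, with flags in between, which matches the invocation style shown in this recipe.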

Example with black bars:

python3 ~/apps/cc/recipes/redact-image.py shot.png --mode black "472 S Loafer View Dr" "All Things Handy"

How it works:

  1. Pillow opens the image.
  2. pytesseract.image_to_data runs Tesseract and returns bounding boxes for every word it detected.
  3. For each target string, the script walks the OCR token stream and matches multi-word phrases by normalizing out punctuation and casing.
  4. For each match, it crops the region, applies a Gaussian blur (or a black fill), and pastes it back.
  5. Saves to the output path and exits non-zero if any target string was not found in OCR.
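The steps above can be sketched roughly as follows. This is an approximation under stated assumptions, not the script's real code: the names normalize, find_phrase, and redact are hypothetical, and the exact normalization rule (keep only ASCII letters and digits) is a guess. The pytesseract and Pillow calls are real library APIs. A phrase that wraps across lines would get one large bounding box here, which the actual script may handle differently.

```python
import re
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_phrase(words: List[str], phrase: str) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges in words whose tokens match phrase."""
    targets = [t for t in map(normalize, phrase.split()) if t]
    tokens = [normalize(w) for w in words]
    hits: List[Tuple[int, int]] = []
    if not targets:
        return hits
    for start in range(len(tokens) - len(targets) + 1):
        window = tokens[start:start + len(targets)]
        # Substring test rather than equality, to tolerate OCR noise.
        if all(t in tok for t, tok in zip(targets, window)):
            hits.append((start, start + len(targets)))
    return hits


def redact(src, dst, phrases, mode="blur", radius=10, pad=4) -> List[str]:
    """Redact each phrase in src, save to dst, and return the unmatched phrases."""
    # Imported here so the matcher above is usable without OCR installed.
    import pytesseract
    from PIL import Image, ImageFilter

    img = Image.open(src).convert("RGB")
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    # image_to_data also returns rows with empty text (blocks, lines); skip them.
    rows = [i for i, text in enumerate(data["text"]) if text.strip()]
    words = [data["text"][i] for i in rows]
    missed = []
    for phrase in phrases:
        hits = find_phrase(words, phrase)
        if not hits:
            missed.append(phrase)
        for start, end in hits:
            span = rows[start:end]
            # Union of the word boxes, padded and clamped to the image bounds.
            box = (
                max(0, min(data["left"][i] for i in span) - pad),
                max(0, min(data["top"][i] for i in span) - pad),
                min(img.width, max(data["left"][i] + data["width"][i] for i in span) + pad),
                min(img.height, max(data["top"][i] + data["height"][i] for i in span) + pad),
            )
            if mode == "black":
                img.paste((0, 0, 0), box)
            else:
                img.paste(img.crop(box).filter(ImageFilter.GaussianBlur(radius)), box)
    img.save(dst)
    return missed
```

A caller could then print the returned list to stderr and exit with status 1 when it is non-empty, which gives the non-zero exit behavior described in step 5.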

No network calls. No image leaves the machine. No paid APIs. Matches the super-CLAUDE rule against paid API usage for automation.

Dependencies:

  • tesseract: brew install tesseract
  • pytesseract: pip3 install pytesseract --break-system-packages
  • Pillow: typically already present

Use it for:

  • Publishing a screenshot that shows an email, phone, address, or business name
  • Scrubbing a CRM contact ID out of a screen recording frame
  • Any image headed to a public blog post, social, or external support ticket

Pair with the Screenshot Triage flow: grab the screenshot, identify the strings to blur, redact, commit the clean copy to the target repo.

Failure modes:

  • OCR miss on stylized fonts or low-contrast text: Tesseract sometimes fails to find the text at all. The script prints the missed strings to stderr. Fix: upscale the image before feeding it in, or fall back to --mode black with manual coordinates.
  • Over-matching from substring comparison: the matcher compares with Python's in (a substring test) rather than equality to tolerate OCR noise, so a short target can also match longer words that contain it. If you see extra redactions, pass the full phrase instead of a single word.

Related:

  • Chrome Extension SOP: the “never send James hunting for logs” sibling principle (if Claude can do it in a tool, the tool owns it)
  • Screenshot storage: ~/apps/cc/screenshots/ (auto-capture by the screenshot daemon)
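To make the over-matching failure mode listed above concrete, here is a small toy example of substring matching. The helper names and the sample OCR words are hypothetical and are not taken from the script; the point is only to show why a full phrase is a tighter target than a single word.

```python
from typing import List


def single_word_hits(words: List[str], target: str) -> List[str]:
    """Words that contain target as a substring (case-insensitive)."""
    needle = target.lower()
    return [w for w in words if needle in w.lower()]


def phrase_hits(words: List[str], phrase: str) -> List[List[str]]:
    """Consecutive runs where each target word is a substring of its token."""
    targets = phrase.lower().split()
    if not targets:
        return []
    runs = []
    for i in range(len(words) - len(targets) + 1):
        window = words[i:i + len(targets)]
        if all(t in w.lower() for t, w in zip(targets, window)):
            runs.append(window)
    return runs


ocr_words = ["Annual", "report", "prepared", "by", "Ann", "Lee"]

# A single short word also hits "Annual", which would be redacted too.
print(single_word_hits(ocr_words, "Ann"))  # ['Annual', 'Ann']

# The full phrase only hits the intended name.
print(phrase_hits(ocr_words, "Ann Lee"))   # [['Ann', 'Lee']]
```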