

Generate an Image

Takes a text prompt and returns a PNG saved to disk. Three image backends behind an escalation ladder — start free, pay only when the output demands it.

| Tier | Script | Model | Cost | Strength |
|---|---|---|---|---|
| 1 | generate-image-cloudflare.py | Flux Schnell | Free (~250 neurons per image, against a 10k-neuron daily allotment) | Fast, nails the whiteboard-marker voice, open source |
| 2 | generate-image-openai.py | gpt-image-2 (launched Apr 21, 2026) | Paid | Near-perfect text rendering, 4K output, reasoning-powered |
| 3 | generate-image.py | nano-banana-pro-preview (Gemini 3 Pro Image) | Paid | Highest-detail finish, reference-photo support for consistent characters |
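The ladder can be driven by a small helper that builds the command line for a given tier. This is a minimal sketch that assumes only the script paths and the --prompt, --out, and --ref flags shown on this page; `TIERS` and `build_command` are hypothetical names, not part of the scripts.

```python
from pathlib import Path
from typing import List, Optional

# Tier number -> script, in escalation order (paths as documented on this page).
TIERS = {
    1: "~/apps/cc/generate-image-cloudflare.py",  # Flux Schnell, free
    2: "~/apps/cc/generate-image-openai.py",      # gpt-image-2, paid
    3: "~/apps/cc/generate-image.py",             # Nano Banana Pro, paid
}


def build_command(tier: int, prompt: str, out: str, ref: Optional[str] = None) -> List[str]:
    """Return an argv list suitable for subprocess.run() (hypothetical helper)."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier {tier}; expected one of {sorted(TIERS)}")
    script = str(Path(TIERS[tier]).expanduser())
    cmd = ["python3", script, "--prompt", prompt, "--out", str(Path(out).expanduser())]
    if ref is not None:
        # --ref is only documented for the Gemini script (tier 3).
        if tier != 3:
            raise ValueError("--ref is only supported by the Gemini backend (tier 3)")
        cmd += ["--ref", str(Path(ref).expanduser())]
    return cmd
```

Usage would be along the lines of `subprocess.run(build_command(1, prompt, "~/Desktop/output.png"), check=True)`, bumping the tier number only when the output falls short.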

It is strongest at three things:

  1. Marker-style marketing doodles: Russell Brunson-style stick-figure playbooks with readable all-caps labels. Flux nails the voice for free. When small labels come back garbled (“YOUUBST” for “YouTube”), re-running through gpt-image-2 fixes them. For a publish-ready finish on tms-internal, escalate to Nano Banana Pro.
  2. YouTube thumbnails: large bold headlines, face + text layouts, dramatic compositions. All three backends handle this; pick based on cost and on how precise the text needs to be.
  3. Consistent characters across a series: Nano Banana Pro only. Pass a reference photo with --ref and the model keeps the same face, outfit, or mascot across generations.

For stick-figure marketing doodles, use the /doodle skill (defined at ~/.claude/commands/doodle.md). It auto-invokes on phrases like “make a doodle of that” and generates three variants in parallel, each built on a different visual metaphor, via Cloudflare Flux (the free default). If the batch comes back with garbled labels, say “try OpenAI” to re-run it with gpt-image-2. For a publish-ready finish, say “try Gemini” to escalate to Nano Banana Pro. Full documentation lives in the doodles repo's CLAUDE.md.
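The “three metaphors in parallel” step is a fan-out over independent renders. A minimal sketch using the standard library's ThreadPoolExecutor; `render_all` is a hypothetical name, and `render` stands in for whatever actually runs a backend (for example a subprocess call into the doodle wrapper). Threads are enough here because each render mostly waits on an HTTP call or a child process rather than doing Python CPU work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def render_all(concept: str, metaphors: List[str],
               render: Callable[[str], str]) -> Dict[str, str]:
    """Render one image per metaphor concurrently; returns {metaphor: output_path}.

    `render` receives a full prompt and returns the path it wrote. It is a
    hypothetical stand-in for a call into one of the generate-image scripts.
    """
    prompts = {m: f"{concept}, illustrated as {m}" for m in metaphors}
    with ThreadPoolExecutor(max_workers=len(metaphors) or 1) as pool:
        futures = {m: pool.submit(render, p) for m, p in prompts.items()}
        # .result() waits for each render and re-raises any worker exception.
        return {m: f.result() for m, f in futures.items()}
```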

  • Marketing playbook doodles: “make me a doodle of {concept}”, stick-figure explainers for blog posts, social posts, or whiteboard-style lead magnets. Start with /doodle, which defaults to CF Flux.
  • YouTube thumbnails: especially face-plus-headline layouts. gpt-image-2 is the new sweet spot since it renders text cleanly at thumbnail scale.
  • Illustrated explainers: diagrams, flowcharts, funnels, metaphorical illustrations (fishing, ladders, conveyor belts). CF Flux is usually enough; escalate when labels need to be readable at small sizes.
  • Series-consistent imagery: Nano Banana Pro with --ref. Hero images for a multi-part blog series, character mascots, repeated icons.

Not for: photorealistic portraits, licensed character work, or production logos.

Cloudflare Flux Schnell (free default)

python3 ~/apps/cc/generate-image-cloudflare.py \
--prompt "black marker on white whiteboard, stick figures..." \
--out ~/Desktop/output.png
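Under the hood, a Flux Schnell call is one authenticated POST to Workers AI. The sketch below uses only the standard library and assumes the REST route `/accounts/{account_id}/ai/run/@cf/black-forest-labs/flux-1-schnell` and a JSON response carrying the base64 image at `result.image`; check both against Cloudflare's model page. It is not the source of generate-image-cloudflare.py: it takes the token and account ID as arguments instead of reading secrets.json, and `build_request`, `decode_image`, and `generate` are hypothetical names.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Assumed model id and REST route; verify against Cloudflare's Workers AI docs.
MODEL = "@cf/black-forest-labs/flux-1-schnell"


def build_request(account_id: str, token: str, prompt: str) -> urllib.request.Request:
    url = f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{MODEL}"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,  # supplying a body makes urllib send a POST
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def decode_image(payload: dict) -> bytes:
    # Assumed response shape: {"result": {"image": "<base64 string>"}, "success": true, ...}
    return base64.b64decode(payload["result"]["image"])


def generate(account_id: str, token: str, prompt: str, out: str) -> Path:
    with urllib.request.urlopen(build_request(account_id, token, prompt), timeout=60) as resp:
        payload = json.load(resp)
    path = Path(out).expanduser()
    # Bytes are written as returned; the image encoding is whatever the API sends.
    path.write_bytes(decode_image(payload))
    return path
```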

OpenAI gpt-image-2 (first paid escalation — fixes garbled text)

python3 ~/apps/cc/generate-image-openai.py \
--prompt "..." \
--out ~/Desktop/output.png
# default model is gpt-image-2; pass --model gpt-image-1 to hit the older tier
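For reference, the core of an OpenAI image call is short. A minimal sketch assuming the official openai Python package, whose `images.generate` call returns base64 data in `data[0].b64_json` for the GPT image models; that gpt-image-2 takes the same parameters as gpt-image-1 is an assumption to verify. `generate_openai` is a hypothetical name, the sketch reads the key from the environment (the real scripts use secrets.json), and the client is injectable so the function can be exercised without a network call.

```python
import base64
from pathlib import Path


def generate_openai(prompt: str, out: str, model: str = "gpt-image-2", client=None) -> Path:
    """Generate one image and write it to `out` (hypothetical helper)."""
    if client is None:
        # Imported lazily so this module loads even without the openai package.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.images.generate(model=model, prompt=prompt)
    path = Path(out).expanduser()
    path.write_bytes(base64.b64decode(result.data[0].b64_json))
    return path
```

Passing `model="gpt-image-1"` mirrors the script's `--model gpt-image-1` fallback.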

Gemini Nano Banana Pro (premium, reference-photo support)

python3 ~/apps/cc/generate-image.py \
--prompt "..." \
--out ~/Desktop/output.png
# with reference photo for consistent faces on thumbnails
python3 ~/apps/cc/generate-image.py \
--prompt "YouTube thumbnail: bold yellow headline..." \
--ref ~/apps/james-voice/headshot.jpg \
--out ~/Desktop/thumbnail.png
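The --ref path maps onto a single multimodal request: the text prompt plus the reference photo's bytes. A minimal sketch assuming the google-genai Python SDK (`genai.Client`, `types.Part.from_bytes`, and `types.ImageConfig` for aspect ratio, present in recent SDK versions); the model id is copied from this page, and `generate_gemini` / `first_image_bytes` are hypothetical names. The optional `aspect_ratio` argument shows the SDK parameter the Gemini script does not yet expose.

```python
from pathlib import Path
from typing import Optional


def first_image_bytes(response) -> bytes:
    """Return the first inline image in a generate_content response."""
    for part in response.candidates[0].content.parts:
        if getattr(part, "inline_data", None) is not None:
            return part.inline_data.data
    raise RuntimeError("response contained no image part")


def generate_gemini(prompt: str, out: str, ref: Optional[str] = None,
                    aspect_ratio: Optional[str] = None,
                    model: str = "nano-banana-pro-preview") -> Path:
    # Lazy import: needs `pip install google-genai` and GEMINI_API_KEY in the environment.
    from google import genai
    from google.genai import types

    contents = [prompt]
    if ref is not None:
        ref_path = Path(ref).expanduser()
        mime = "image/png" if ref_path.suffix.lower() == ".png" else "image/jpeg"
        contents.append(types.Part.from_bytes(data=ref_path.read_bytes(), mime_type=mime))

    config = None
    if aspect_ratio is not None:  # e.g. "16:9" for thumbnails
        config = types.GenerateContentConfig(
            image_config=types.ImageConfig(aspect_ratio=aspect_ratio))

    client = genai.Client()
    response = client.models.generate_content(model=model, contents=contents, config=config)
    path = Path(out).expanduser()
    path.write_bytes(first_image_bytes(response))
    return path
```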

Prompts can also come from stdin:

cat prompt.txt | python3 ~/apps/cc/generate-image-cloudflare.py --out ~/Desktop/output.png
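Stdin support typically means: use --prompt if given, otherwise read all of standard input. A minimal argparse sketch of that behavior, assuming the flag names shown on this page; `parse_cli` is hypothetical and not taken from the scripts' source.

```python
import argparse
import sys
from typing import Optional, Sequence, TextIO


def parse_cli(argv: Optional[Sequence[str]] = None, stdin: TextIO = sys.stdin) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate an image from a text prompt.")
    parser.add_argument("--prompt", help="prompt text; read from stdin when omitted")
    parser.add_argument("--out", required=True, help="output image path")
    args = parser.parse_args(argv)
    if args.prompt is None:
        # Refuse to block on an interactive terminal when nothing was piped in.
        if stdin.isatty():
            parser.error("no --prompt given and nothing piped on stdin")
        args.prompt = stdin.read().strip()
    if not args.prompt:
        parser.error("prompt is empty")  # exits with status 2
    return args
```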

All three backends respond well to specific, scene-level direction. A good prompt names:

  • Style — “hand-drawn marker on white paper”, “whiteboard sketch”, “Russell Brunson DotCom Secrets diagram”, “Saul Bass poster”
  • Composition — “three panels left-to-right”, “single scene”, “top-down diagram”, “funnel narrowing from top to bottom”
  • Characters — “stick figures with circle heads”, “cartoon mascot”, “stylized face”
  • Text rules — “all-caps block printed handwriting”, “labels in black marker”, “one bold headline”
  • Negatives — “no color except black on white”, “no shading”, “no photorealism”

The more specific the scene-level description, the closer the first draft lands. Vague prompts produce generic stock-illustration-looking output on every backend.
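The five ingredients above can be assembled mechanically so none gets forgotten. A small sketch; `build_prompt` and its parameter names are hypothetical, and the sample phrases come from the list above.

```python
from typing import Sequence


def build_prompt(scene: str, style: str = "", composition: str = "", characters: str = "",
                 text_rules: str = "", negatives: Sequence[str] = ()) -> str:
    """Join the prompt ingredients into one comma-separated prompt, skipping empty ones."""
    parts = [style, composition, characters, scene, text_rules, *negatives]
    return ", ".join(p.strip() for p in parts if p and p.strip())
```

For example, `build_prompt("one stick figure climbing a ladder", style="whiteboard sketch", composition="single scene", negatives=("no shading",))` yields `whiteboard sketch, single scene, one stick figure climbing a ladder, no shading`.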

Flux is known to be weak at small hand-lettered text. If the prompt calls for labels that will render below roughly 18pt, escalate to gpt-image-2, which fixes exactly that failure mode.

| Requirement | Where |
|---|---|
| Cloudflare Workers AI token | CF_WORKERS_AI_TOKEN (fallback: CF_API_TOKEN) in secrets.json |
| Cloudflare account ID | CF_ACCOUNT_ID in secrets.json |
| OpenAI API key | OPENAI_API_KEY in secrets.json |
| Gemini API key | GEMINI_API_KEY in secrets.json |
| Python 3 | system |

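The token fallback in the first row can be expressed as a short lookup. A sketch assuming secrets.json is a flat JSON object of name/value pairs; its location is not given on this page, so the path is a parameter, and `load_cloudflare_credentials` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Tuple


def load_cloudflare_credentials(secrets_path: str) -> Tuple[str, str]:
    """Return (api_token, account_id) from a flat secrets.json."""
    secrets = json.loads(Path(secrets_path).expanduser().read_text())
    # Prefer CF_WORKERS_AI_TOKEN; fall back to CF_API_TOKEN.
    token = secrets.get("CF_WORKERS_AI_TOKEN") or secrets.get("CF_API_TOKEN")
    if not token:
        raise KeyError("set CF_WORKERS_AI_TOKEN (or CF_API_TOKEN) in secrets.json")
    account_id = secrets.get("CF_ACCOUNT_ID")
    if not account_id:
        raise KeyError("set CF_ACCOUNT_ID in secrets.json")
    return token, account_id
```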
  • Scripts:
    • ~/apps/cc/generate-image-cloudflare.py — Cloudflare Flux Schnell (free)
    • ~/apps/cc/generate-image-openai.py — OpenAI gpt-image-2 (default) / gpt-image-1
    • ~/apps/cc/generate-image.py — Gemini nano-banana-pro-preview
  • Doodle workflow wrapper: ~/apps/doodles/generate-doodle.py (adds whiteboard preamble + concept folder output)
  • Skill: ~/.claude/commands/doodle.md
  • Capability registry entry: ~/apps/cc/memory/capabilities.md under Generate / Create
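The wrapper's two additions, a fixed whiteboard preamble and a per-concept output folder, might look roughly like this. The preamble wording and the one-file-per-metaphor naming are hypothetical; only the folder root and the slug style (comment-ladder-dm-technique) are taken from the example session on this page. `concept_slug` and `doodle_job` are illustrative names.

```python
import re
from pathlib import Path
from typing import Tuple

# Hypothetical preamble; the real wrapper's wording is not documented here.
PREAMBLE = ("Black marker on a white whiteboard, hand-drawn stick figures with circle heads, "
            "all-caps block printed labels, no color, no shading. ")


def concept_slug(concept: str) -> str:
    """'comment ladder DM technique' -> 'comment-ladder-dm-technique'."""
    return re.sub(r"[^a-z0-9]+", "-", concept.lower()).strip("-")


def doodle_job(concept: str, metaphor: str, root: str = "~/apps/doodles/doodles") -> Tuple[str, Path]:
    """Return (full_prompt, output_path) for one metaphor; the caller creates the folder."""
    folder = Path(root).expanduser() / concept_slug(concept)
    prompt = f"{PREAMBLE}Illustrate {concept} as {metaphor}."
    return prompt, folder / f"{concept_slug(metaphor)}.png"
```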

Example session

James: make me a doodle of the comment ladder DM technique: hand raise, DM, then by-the-way offer

Claude: [picks three different metaphors: fishing, ladder, funnel]
[runs generate-doodle.py three times in parallel (Cloudflare Flux, the free default); outputs land in ~/apps/doodles/doodles/comment-ladder-dm-technique/]
[opens the folder in Finder]
Three variants in Finder. The small labels in the fishing one look a bit garbled. Want me to re-run that through gpt-image-2?

James: yes, try OpenAI

Claude: [re-runs with --backend openai; three new files land next to the CF ones]

James: the OpenAI fishing one is the winner

Claude: [copies the winning PNG to tms-internal/public/playbooks/ and drafts the matching playbook page]


Cost

  • Cloudflare Flux Schnell: free within the 10k-neuron daily allotment (roughly 40 images/day)
  • OpenAI gpt-image-2: paid, priced by quality and resolution tier (announced April 21, 2026)
  • OpenAI gpt-image-1: older tier, still available via --model gpt-image-1
  • Gemini Nano Banana Pro (nano-banana-pro-preview): ~11–15¢ per image
  • Gemini Nano Banana (gemini-2.5-flash-image): ~4¢ per image, lower quality

Default to Tier 1. Escalate only on actual failure or a live-publish need. For ten iterations on the same prompt via Flux, total cost is zero.
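The figures above reduce to simple arithmetic. A sketch using only the numbers quoted on this page (about 250 neurons per Flux image, a 10,000-neuron daily allotment, roughly 11 to 15 cents per Nano Banana Pro image); prices and neuron counts can change, so treat the results as estimates.

```python
from typing import Tuple

FLUX_NEURONS_PER_IMAGE = 250      # approximate, per the tier table
FLUX_DAILY_FREE_NEURONS = 10_000  # Workers AI free daily allotment
NANO_BANANA_PRO_CENTS = (11, 15)  # low/high estimate per image


def free_flux_images_per_day() -> int:
    return FLUX_DAILY_FREE_NEURONS // FLUX_NEURONS_PER_IMAGE


def flux_cost_cents(images: int) -> int:
    """Zero while the run fits in one day's free allotment; overage pricing is not modeled."""
    if images * FLUX_NEURONS_PER_IMAGE > FLUX_DAILY_FREE_NEURONS:
        raise ValueError("exceeds the free daily allotment; overage pricing not modeled here")
    return 0


def nano_banana_pro_cost_cents(images: int) -> Tuple[int, int]:
    low, high = NANO_BANANA_PRO_CENTS
    return images * low, images * high


print(free_flux_images_per_day())      # 40
print(flux_cost_cents(10))             # 0
print(nano_banana_pro_cost_cents(10))  # (110, 150)
```

So ten Flux iterations use about a quarter of the day's free neurons, while the same ten on Nano Banana Pro would run roughly $1.10 to $1.50.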

Known limitations

  • No native aspect-ratio flag on the Gemini script. State the ratio in the prompt instead (“16:9 landscape composition”). The Gemini SDK does accept an aspect_ratio parameter, so wiring it up explicitly is a worthwhile addition.
  • Reference-image weighting is implicit. Gemini --ref works, but there is no way to say “face from ref 1, outfit from ref 2”; the model blends the references based on the prompt.
  • No edit-in-place in the scripts. gpt-image-2 supports multi-turn editing (“change the background to sunset, make the text larger” on an existing frame), but the current CLI only generates from scratch. An --edit <path> mode is a natural next addition.
  • Auto-escalation is not implemented. Today the operator escalates manually, CF → OpenAI → Gemini. A possible future step: OCR the Flux output and escalate automatically when garbled text is detected.
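The OCR-driven auto-escalation idea could look roughly like the loop below. It is only a sketch: `generate` and `labels_ok` are hypothetical stand-ins (for example, one of the scripts above, and an OCR pass such as Tesseract that compares recognized words with the intended labels), and nothing like this exists in the current scripts.

```python
from typing import Callable, Sequence, Tuple

LADDER = ("cloudflare", "openai", "gemini")  # cheapest first, as in the manual flow


def generate_with_escalation(
    prompt: str,
    expected_labels: Sequence[str],
    generate: Callable[[str, str], str],
    labels_ok: Callable[[str, Sequence[str]], bool],
    ladder: Sequence[str] = LADDER,
) -> Tuple[str, str]:
    """Try each backend in order; return (backend, image_path) for the first image
    whose labels pass the check, or the last attempt if none pass."""
    if not ladder:
        raise ValueError("ladder must name at least one backend")
    for backend in ladder:
        path = generate(backend, prompt)
        if labels_ok(path, expected_labels):
            return backend, path
    # Nothing passed the check; hand the top-tier result to a human for review.
    return backend, path
```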