Generate an Image
What It Does
Takes a text prompt and returns a PNG saved to disk. Three image backends sit behind an escalation ladder: start free, pay only when the output demands it.
| Tier | Script | Model | Cost | Strength |
|---|---|---|---|---|
| 1 | generate-image-cloudflare.py | Flux Schnell | Free (~250 neurons against 10k/day) | Fast, nails the whiteboard-marker voice, open source |
| 2 | generate-image-openai.py | gpt-image-2 (launched Apr 21, 2026) | Paid | Near-perfect text rendering, 4K output, reasoning-powered |
| 3 | generate-image.py | nano-banana-pro-preview (Gemini 3 Pro Image) | Paid | Highest-detail finish, reference-photo support for consistent characters |
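The ladder in the table can be pictured as an ordered list that is walked one rung at a time. The sketch below is illustrative only: the `Tier` dataclass, the `LADDER` list, and the `next_tier` helper are hypothetical names that do not exist in the scripts, while the script and model names are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    level: int
    script: str
    model: str
    paid: bool


# Rows copied from the tier table; the data structure itself is an assumption.
LADDER = [
    Tier(1, "generate-image-cloudflare.py", "Flux Schnell", paid=False),
    Tier(2, "generate-image-openai.py", "gpt-image-2", paid=True),
    Tier(3, "generate-image.py", "nano-banana-pro-preview", paid=True),
]


def next_tier(current: Tier) -> Optional[Tier]:
    """Return the next rung up, or None when already at the top."""
    for tier in LADDER:
        if tier.level == current.level + 1:
            return tier
    return None


if __name__ == "__main__":
    tier = LADDER[0]  # always start on the free tier
    while tier is not None:
        print(f"Tier {tier.level}: {tier.script} ({tier.model})")
        tier = next_tier(tier)
```

Keeping the rungs in one ordered list makes the "start free" default explicit: the first element is always the starting point, and escalation is a single step upward.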
Strong at three things specifically:
- Marker-style marketing doodles: Russell-Brunson-style stick-figure playbooks with readable all-caps labels. Flux nails the voice for free. When small labels come back garbled (“YOUUBST” for “YouTube”), `gpt-image-2` fixes it directly. For a tms-internal publish-ready finish, escalate to Nano Banana Pro.
- YouTube thumbnails: large bold headlines, face + text layouts, dramatic compositions. All three backends handle this; pick based on cost and text precision needs.
- Consistent characters across a series: Nano Banana Pro only. Pass a reference photo with `--ref` and the model keeps the same face, outfit, or mascot across generations.
The /doodle Skill Wires It Together
For stick-figure marketing doodles, use the /doodle skill (defined at ~/.claude/commands/doodle.md). It auto-invokes on phrases like “make a doodle of that” and generates three different metaphors in parallel via Cloudflare Flux (free default). If the batch has garbled labels, say “try OpenAI” to re-run with gpt-image-2. For publish-ready finishes, say “try Gemini” to escalate to Nano Banana Pro. Full documentation is in the doodles repo's CLAUDE.md.
When To Use
- Marketing playbook doodles: “make me a doodle of {concept}”, stick-figure explainers for blog posts, social posts, or whiteboard-style lead magnets. Start with `/doodle`, which defaults to CF Flux.
- YouTube thumbnails: especially face-plus-headline layouts. `gpt-image-2` is the new sweet spot since it renders text cleanly at thumbnail scale.
- Illustrated explainers: diagrams, flowcharts, funnels, metaphorical illustrations (fishing, ladders, conveyor belts). CF Flux is usually enough; escalate when labels need to be readable at small sizes.
- Series-consistent imagery: Nano Banana Pro with `--ref`. Hero images for a multi-part blog series, character mascots, repeated icons.
Not for: photorealistic portraits, licensed character work, or production logos.
How To Invoke
Cloudflare Flux (free, start here)

    python3 ~/apps/cc/generate-image-cloudflare.py \
      --prompt "black marker on white whiteboard, stick figures..." \
      --out ~/Desktop/output.png

OpenAI gpt-image-2 (first paid escalation, fixes garbled text)

    python3 ~/apps/cc/generate-image-openai.py \
      --prompt "..." \
      --out ~/Desktop/output.png
    # default model is gpt-image-2; pass --model gpt-image-1 to hit the older tier

Gemini Nano Banana Pro (premium, reference-photo support)

    python3 ~/apps/cc/generate-image.py \
      --prompt "..." \
      --out ~/Desktop/output.png

    # with reference photo for consistent faces on thumbnails
    python3 ~/apps/cc/generate-image.py \
      --prompt "YouTube thumbnail: bold yellow headline..." \
      --ref ~/apps/james-voice/headshot.jpg \
      --out ~/Desktop/thumbnail.png

Prompts can also come from stdin:

    cat prompt.txt | python3 ~/apps/cc/generate-image-cloudflare.py --out ~/Desktop/output.png

Prompt Patterns That Work
All three backends respond well to specific, scene-level direction. A good prompt names:
- Style — “hand-drawn marker on white paper”, “whiteboard sketch”, “Russell Brunson DotCom Secrets diagram”, “Saul Bass poster”
- Composition — “three panels left-to-right”, “single scene”, “top-down diagram”, “funnel narrowing from top to bottom”
- Characters — “stick figures with circle heads”, “cartoon mascot”, “stylized face”
- Text rules — “all-caps block printed handwriting”, “labels in black marker”, “one bold headline”
- Negatives — “no color except black on white”, “no shading”, “no photorealism”
The more specific the scene-level description, the closer the first draft lands. Vague prompts produce generic stock-illustration-looking output on every backend.
Flux is known to be weak at small hand-lettered text. If the prompt calls for labels that would render under ~18pt, escalate to gpt-image-2, which fixes exactly that failure mode.
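The five ingredients listed above (style, composition, characters, text rules, negatives) can be assembled mechanically. The helper below is a minimal sketch; `build_prompt` is a hypothetical name, not something the scripts provide, and joining the parts with commas is just one reasonable convention.

```python
def build_prompt(style, composition, characters, text_rules, negatives):
    """Join the five prompt ingredients into one comma-separated prompt."""
    parts = [style, composition, characters, text_rules, *negatives]
    # Skip empty ingredients so optional parts can be passed as "".
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    style="hand-drawn marker on white paper",
    composition="three panels left-to-right",
    characters="stick figures with circle heads",
    text_rules="all-caps block printed handwriting",
    negatives=["no shading", "no photorealism"],
)
print(prompt)
```

The resulting string can be passed to any of the three scripts via `--prompt` or piped in on stdin.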
Prerequisites
| Requirement | Where |
|---|---|
| Cloudflare Workers AI token | CF_WORKERS_AI_TOKEN (fallback: CF_API_TOKEN) in secrets.json |
| Cloudflare account ID | CF_ACCOUNT_ID in secrets.json |
| OpenAI API key | OPENAI_API_KEY in secrets.json |
| Gemini API key | GEMINI_API_KEY in secrets.json |
| Python 3 | system |
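A quick pre-flight check can confirm that the keys in the table are present before a backend is called. This is a sketch under assumptions: it treats secrets.json as a flat JSON object of string values, its location is not stated here so the path is left to the caller, and `REQUIRED`, `missing_keys`, and `load_secrets` are hypothetical names.

```python
import json
from pathlib import Path

# Key names come from the Prerequisites table. Each inner tuple lists
# acceptable alternatives (primary name first, then fallback).
REQUIRED = {
    "cloudflare": [("CF_WORKERS_AI_TOKEN", "CF_API_TOKEN"), ("CF_ACCOUNT_ID",)],
    "openai": [("OPENAI_API_KEY",)],
    "gemini": [("GEMINI_API_KEY",)],
}


def missing_keys(secrets: dict, backend: str) -> list:
    """Return the primary name of each requirement no alternative satisfies."""
    missing = []
    for alternatives in REQUIRED[backend]:
        if not any(secrets.get(name) for name in alternatives):
            missing.append(alternatives[0])
    return missing


def load_secrets(path: str) -> dict:
    """Assumption: secrets.json is a flat JSON object; path is caller-supplied."""
    return json.loads(Path(path).expanduser().read_text())


if __name__ == "__main__":
    # In-memory example: the fallback token satisfies the Cloudflare requirement.
    example = {"CF_API_TOKEN": "placeholder", "CF_ACCOUNT_ID": "placeholder"}
    print(missing_keys(example, "cloudflare"))
    print(missing_keys(example, "openai"))
```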
Where It Lives
- Scripts:
  - `~/apps/cc/generate-image-cloudflare.py`: Cloudflare Flux Schnell (free)
  - `~/apps/cc/generate-image-openai.py`: OpenAI `gpt-image-2` (default) / `gpt-image-1`
  - `~/apps/cc/generate-image.py`: Gemini `nano-banana-pro-preview`
- Doodle workflow wrapper: `~/apps/doodles/generate-doodle.py` (adds whiteboard preamble + concept folder output)
- Skill: `~/.claude/commands/doodle.md`
- Capability registry entry: `~/apps/cc/memory/capabilities.md` under Generate / Create
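For callers that want to drive the scripts from Python rather than the shell, the script paths and the flags shown in How To Invoke can be combined into an argument list. The sketch below is an assumption-laden illustration: `build_command` and `SCRIPT_DIR` are hypothetical names, and only flags that appear in this document (`--prompt`, `--out`, `--ref`, `--model`) are used.

```python
import os

SCRIPT_DIR = "~/apps/cc"  # directory taken from the Scripts list


def build_command(script, prompt, out, ref=None, model=None):
    """Build an argv list for one of the generate-image scripts.

    --ref is documented for the Gemini script and --model for the
    OpenAI script; this helper does not enforce that pairing.
    """
    argv = [
        "python3",
        os.path.expanduser(os.path.join(SCRIPT_DIR, script)),
        "--prompt", prompt,
        "--out", os.path.expanduser(out),
    ]
    if ref is not None:
        argv += ["--ref", os.path.expanduser(ref)]
    if model is not None:
        argv += ["--model", model]
    return argv


cmd = build_command(
    "generate-image-openai.py",
    prompt="whiteboard sketch, one bold headline",
    out="~/Desktop/output.png",
    model="gpt-image-1",
)
print(cmd)
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when the prompt contains quotes or commas.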
Example Session
James: make me a doodle of the comment ladder DM technique: hand raise, DM, then by-the-way offer
Claude: [picks three different metaphors: fishing, ladder, funnel]
[runs generate-doodle.py three times in parallel (Cloudflare Flux, free default); outputs land in ~/apps/doodles/doodles/comment-ladder-dm-technique/]
[opens the folder in Finder]
Three variants in Finder. The small labels in the fishing one look a bit garbled. Want me to re-run that through gpt-image-2?
James: yes, try OpenAI
Claude: [re-runs with --backend openai, three new files land next to the CF ones]
James: the OpenAI fishing one is the winner
Claude: [copies the winning PNG to tms-internal/public/playbooks/, drafts the matching playbook page]
Cost
- Cloudflare Flux Schnell: free within the 10k-neuron daily allotment (roughly 40 images/day)
- OpenAI `gpt-image-2`: paid, priced by quality + resolution tier (announced April 21, 2026)
- OpenAI `gpt-image-1`: older tier, still available via `--model gpt-image-1`
- Gemini Nano Banana Pro (`nano-banana-pro-preview`): ~11–15¢ per image
- Gemini Nano Banana (`gemini-2.5-flash-image`): ~4¢ per image, lower quality
Default to Tier 1. Escalate only on actual failure or a live-publish need. For ten iterations on the same prompt via Flux, total cost is zero.
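The "roughly 40 images/day" figure follows from the numbers quoted in this document, and the same arithmetic gives a rough budget for the paid Gemini tier. The constants below are the approximate values stated here, not authoritative pricing; `gemini_pro_cost_cents` is a hypothetical helper.

```python
NEURONS_PER_DAY = 10_000   # free daily allotment quoted in this document
NEURONS_PER_IMAGE = 250    # approximate per-image usage from the tier table

free_images_per_day = NEURONS_PER_DAY // NEURONS_PER_IMAGE
print(free_images_per_day)


def gemini_pro_cost_cents(n_images):
    """(low, high) cost in cents at roughly 11 to 15 cents per image."""
    return (11 * n_images, 15 * n_images)


low, high = gemini_pro_cost_cents(10)
print(f"${low / 100:.2f} to ${high / 100:.2f}")
```

Ten Nano Banana Pro iterations land between about $1.10 and $1.50, while the same ten iterations on Flux stay inside the free allotment.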
Known Gaps / TODOs
- No native aspect-ratio flag on the Gemini script. State the ratio in the prompt (“16:9 landscape composition”). The SDK does accept an `aspect_ratio` parameter worth wiring up explicitly.
- Reference-image weighting is implicit. Gemini `--ref` works, but there is no way to say “face from ref 1, outfit from ref 2”; the model blends based on the prompt.
- No edit-in-place in the scripts. `gpt-image-2` supports multi-turn editing (“change the background to sunset, make the text larger” on an existing frame) but the current CLI is generate-from-scratch only. A `--edit <path>` mode is a natural next add.
- Auto-escalation not implemented. Today the operator manually escalates CF → OpenAI → Gemini. Future: OCR the Flux output, auto-escalate when garbled text is detected.
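One possible shape for the garbled-text check behind auto-escalation is to compare the labels the prompt asked for against whatever tokens an OCR pass returns. This is a speculative sketch, not the planned implementation: the OCR step is out of scope and `ocr_tokens` is assumed to arrive as a list of strings, the function names are hypothetical, and the 0.85 similarity threshold is an arbitrary starting point. It uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def best_match(label, ocr_tokens):
    """Highest similarity ratio between a label and any OCR token."""
    return max(
        (
            SequenceMatcher(None, label.upper(), tok.upper()).ratio()
            for tok in ocr_tokens
        ),
        default=0.0,  # no OCR tokens at all counts as no match
    )


def garbled_labels(expected, ocr_tokens, threshold=0.85):
    """Expected labels with no sufficiently close match in the OCR output."""
    return [lab for lab in expected if best_match(lab, ocr_tokens) < threshold]


def should_escalate(expected, ocr_tokens):
    """True when at least one expected label appears garbled or missing."""
    return bool(garbled_labels(expected, ocr_tokens))


if __name__ == "__main__":
    expected = ["YouTube", "DM", "OFFER"]
    ocr_tokens = ["YOUUBST", "DM", "OFFER"]  # hypothetical OCR result
    print(garbled_labels(expected, ocr_tokens))
    print(should_escalate(expected, ocr_tokens))
```

Fuzzy matching rather than exact equality leaves room for minor OCR noise on labels that actually rendered correctly; the threshold would need tuning against real Flux output.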
Related
- Create a Blog Post: pair with this to illustrate new posts
- Redact Images: for scrubbing PII out of screenshots before publishing
- ChatGPT Images 2.0 announcement (April 2026)
- Gemini image generation docs
- Cloudflare Workers AI — Flux Schnell