Capabilities

Generate an Image

What It Does

Takes a text prompt and saves a PNG to disk. Three image backends sit behind an escalation ladder: start free, and pay only when the output demands it.

Tier	Script	Model	Cost	Strength
1	`generate-image-cloudflare.py`	Flux Schnell	Free (~250 neurons against 10k/day)	Fast, nails the whiteboard-marker voice, open source
2	`generate-image-openai.py`	`gpt-image-2` (launched Apr 21, 2026)	Paid	Near-perfect text rendering, 4K output, reasoning-powered
3	`generate-image.py`	`nano-banana-pro-preview` (Gemini 3 Pro Image)	Paid	Highest-detail finish, reference-photo support for consistent characters

Strong at three things specifically:

Marker-style marketing doodles — Russell-Brunson-style stick-figure playbooks with readable all-caps labels. Flux nails the voice for free. When small labels come back garbled (“YOUUBST” for “YouTube”), gpt-image-2 fixes it directly. For a tms-internal publish-ready finish, escalate to Nano Banana Pro.
YouTube thumbnails — large bold headlines, face + text layouts, dramatic compositions. All three backends handle this; pick based on cost and text precision needs.
Consistent characters across a series — Nano Banana Pro only. Pass a reference photo with --ref and the model keeps the same face, outfit, or mascot across generations.
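The ladder above can be read as a small decision rule. The sketch below is illustrative only: `Backend`, `choose_backend`, and the three flag names are hypothetical and do not exist in any of the scripts. It assumes a reference photo or a publish-ready finish always means Tier 3, and garbled labels mean Tier 2.

```python
from enum import Enum


class Backend(Enum):
    """Hypothetical names for the three tiers in the table above."""
    CLOUDFLARE = 1  # Flux Schnell, free
    OPENAI = 2      # gpt-image-2, paid
    GEMINI = 3      # Nano Banana Pro, paid


def choose_backend(needs_reference_photo=False,
                   publish_ready=False,
                   labels_garbled=False):
    """Pick the cheapest tier that satisfies the stated needs."""
    # Reference photos are a Nano Banana Pro only feature.
    if needs_reference_photo or publish_ready:
        return Backend.GEMINI
    # Garbled small labels are the failure mode gpt-image-2 fixes.
    if labels_garbled:
        return Backend.OPENAI
    # Otherwise stay on the free tier.
    return Backend.CLOUDFLARE
```

Checking the flags in that order keeps the free tier as the default and makes each escalation an explicit choice.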

The `/doodle` Skill Wires It Together

For stick-figure marketing doodles, use the /doodle skill (defined at ~/.claude/commands/doodle.md). It auto-invokes on phrases like “make a doodle of that” and generates three different metaphors in parallel via Cloudflare Flux (the free default). If the batch has garbled labels, say “try OpenAI” to re-run with gpt-image-2. For publish-ready finishes, say “try Gemini” to escalate to Nano Banana Pro. Full documentation is in the doodles repo's CLAUDE.md.
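The "three metaphors in parallel" step could look roughly like the sketch below. It is not the skill's actual implementation: `generate_variants` and the prompt template are made up for illustration, and `render` stands in for whatever really calls the backend.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def generate_variants(concept: str,
                      metaphors: list[str],
                      render: Callable[[str], str]) -> list[str]:
    """Render one doodle per metaphor concurrently, keeping input order."""
    # Hypothetical prompt template; the real skill's wording is not shown here.
    prompts = [f"{concept}, drawn as a {m} metaphor" for m in metaphors]
    # Threads suit this because each render is a blocking network call.
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        # pool.map returns results in the same order as the prompts.
        return list(pool.map(render, prompts))


def fake_render(prompt: str) -> str:
    """Stand-in for a real backend call, so the sketch runs offline."""
    return f"rendered:{prompt}"


variants = generate_variants("comment ladder DM technique",
                             ["fishing", "ladder", "funnel"],
                             fake_render)
```

Passing the renderer in as a callable makes it possible to swap the backend for a re-run without touching the fan-out logic.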

When To Use

Marketing playbook doodles — “make me a doodle of {concept}”, stick-figure explainers for blog posts, social posts, or whiteboard-style lead magnets. Start with /doodle which defaults to CF Flux.
YouTube thumbnails — especially face-plus-headline layouts. gpt-image-2 is the new sweet spot since it renders text cleanly at thumbnail scale.
Illustrated explainers — diagrams, flowcharts, funnels, metaphorical illustrations (fishing, ladders, conveyor belts). CF Flux is usually enough; escalate when labels need to be readable at small sizes.
Series-consistent imagery — Nano Banana Pro with --ref. Hero images for a multi-part blog series, character mascots, repeated icons.

Not for: photorealistic portraits, licensed character work, or production logos.

How To Invoke

Cloudflare Flux (free, start here)

python3 ~/apps/cc/generate-image-cloudflare.py \
  --prompt "black marker on white whiteboard, stick figures..." \
  --out ~/Desktop/output.png

OpenAI gpt-image-2 (first paid escalation — fixes garbled text)

python3 ~/apps/cc/generate-image-openai.py \
  --prompt "..." \
  --out ~/Desktop/output.png
# default model is gpt-image-2; pass --model gpt-image-1 to hit the older tier

Gemini Nano Banana Pro (premium, reference-photo support)

python3 ~/apps/cc/generate-image.py \
  --prompt "..." \
  --out ~/Desktop/output.png

# with reference photo for consistent faces on thumbnails
python3 ~/apps/cc/generate-image.py \
  --prompt "YouTube thumbnail: bold yellow headline..." \
  --ref ~/apps/james-voice/headshot.jpg \
  --out ~/Desktop/thumbnail.png
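To call the three scripts from Python rather than the shell, a thin argv builder is enough. This is a sketch under assumptions: `build_command`, the short backend keys, and the `scripts_dir` parameter are hypothetical. Only the script names and the `--prompt`, `--out`, and `--ref` flags come from the commands above.

```python
from pathlib import Path

# Script names as listed in this document; the dictionary keys are illustrative.
SCRIPTS = {
    "cloudflare": "generate-image-cloudflare.py",
    "openai": "generate-image-openai.py",
    "gemini": "generate-image.py",
}


def build_command(backend, prompt, out, ref=None, scripts_dir="~/apps/cc"):
    """Return an argv list suitable for subprocess.run(cmd, check=True)."""
    if ref is not None and backend != "gemini":
        # Assumption: only the Gemini script accepts --ref.
        raise ValueError("--ref is only supported by the Gemini script")
    script = Path(scripts_dir).expanduser() / SCRIPTS[backend]
    cmd = ["python3", str(script),
           "--prompt", prompt,
           "--out", str(Path(out).expanduser())]
    if ref is not None:
        cmd += ["--ref", str(Path(ref).expanduser())]
    return cmd
```

Building a list instead of a shell string avoids quoting problems when the prompt contains quotes or commas.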

Prompts can also come from stdin:

cat prompt.txt | python3 ~/apps/cc/generate-image-cloudflare.py --out ~/Desktop/output.png
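For reference, a script can support both `--prompt` and stdin along the lines below. This is a guess at the pattern, not the scripts' actual source: `resolve_prompt` is a hypothetical helper, and the real argument handling may differ.

```python
import argparse
import sys


def resolve_prompt(argv, stdin=sys.stdin):
    """Return (prompt, out_path), preferring --prompt over piped stdin."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt")
    parser.add_argument("--out", required=True)
    args = parser.parse_args(argv)
    # Fall back to stdin only when --prompt was not given.
    prompt = args.prompt if args.prompt is not None else stdin.read()
    prompt = prompt.strip()
    if not prompt:
        parser.error("no prompt given via --prompt or stdin")
    return prompt, args.out
```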

Prompt Patterns That Work

All three backends respond well to specific, scene-level direction. A good prompt names:

Style — “hand-drawn marker on white paper”, “whiteboard sketch”, “Russell Brunson DotCom Secrets diagram”, “Saul Bass poster”
Composition — “three panels left-to-right”, “single scene”, “top-down diagram”, “funnel narrowing from top to bottom”
Characters — “stick figures with circle heads”, “cartoon mascot”, “stylized face”
Text rules — “all-caps block printed handwriting”, “labels in black marker”, “one bold headline”
Negatives — “no color except black on white”, “no shading”, “no photorealism”

The more specific the scene-level description, the closer the first draft lands. Vague prompts produce generic stock-illustration-looking output on every backend.
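One way to keep prompts specific is to fill in the five slots above explicitly and join them. The `DoodlePrompt` class below is a hypothetical helper, not part of any script. It only shows the pattern.

```python
from dataclasses import dataclass, field


@dataclass
class DoodlePrompt:
    """One field per element of a good prompt, as listed above."""
    style: str
    composition: str
    characters: str
    text_rules: str
    negatives: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the non-empty parts into a single comma-separated prompt."""
        parts = [self.style, self.composition, self.characters, self.text_rules]
        parts += self.negatives
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = DoodlePrompt(
    style="hand-drawn marker on white paper",
    composition="three panels left-to-right",
    characters="stick figures with circle heads",
    text_rules="all-caps block printed handwriting",
    negatives=["no shading", "no photorealism"],
).render()
```

A missing slot shows up as an empty field in the constructor, which is easier to spot than a gap in a long free-form sentence.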

Flux is known to be weak at small hand-lettered text. If the prompt calls for labels that will render under roughly 18pt, escalate to gpt-image-2, which fixes exactly that failure mode.

Prerequisites

Requirement	Where
Cloudflare Workers AI token	`CF_WORKERS_AI_TOKEN` (fallback: `CF_API_TOKEN`) in `secrets.json`
Cloudflare account ID	`CF_ACCOUNT_ID` in `secrets.json`
OpenAI API key	`OPENAI_API_KEY` in `secrets.json`
Gemini API key	`GEMINI_API_KEY` in `secrets.json`
Python 3	system
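The token fallback in the table can be expressed as a short lookup. The sketch below assumes `secrets.json` is a flat JSON object keyed by the names above. Its location is not stated here, so the path is left as a parameter, and both function names are hypothetical.

```python
import json
from pathlib import Path


def load_secrets(secrets_path):
    """Read secrets.json; the caller supplies the path (not specified here)."""
    return json.loads(Path(secrets_path).read_text())


def pick_cloudflare_credentials(secrets):
    """Return (token, account_id), preferring CF_WORKERS_AI_TOKEN."""
    # Fall back to the general API token when the dedicated one is absent.
    token = secrets.get("CF_WORKERS_AI_TOKEN") or secrets.get("CF_API_TOKEN")
    account_id = secrets.get("CF_ACCOUNT_ID")
    if not token or not account_id:
        raise KeyError("Cloudflare token or CF_ACCOUNT_ID missing from secrets")
    return token, account_id
```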

Where It Lives

Scripts:
- ~/apps/cc/generate-image-cloudflare.py — Cloudflare Flux Schnell (free)
- ~/apps/cc/generate-image-openai.py — OpenAI gpt-image-2 (default) / gpt-image-1
- ~/apps/cc/generate-image.py — Gemini nano-banana-pro-preview
Doodle workflow wrapper: ~/apps/doodles/generate-doodle.py (adds whiteboard preamble + concept folder output)
Skill: ~/.claude/commands/doodle.md
Capability registry entry: ~/apps/cc/memory/capabilities.md under Generate / Create

Example Session

James: make me a doodle of the comment ladder DM technique —
       hand raise, DM, then by-the-way offer

Claude: [picks three different metaphors: fishing, ladder, funnel]
        [runs generate-doodle.py three times in parallel — Cloudflare
         Flux, free default — outputs land in
         ~/apps/doodles/doodles/comment-ladder-dm-technique/]
        [opens the folder in Finder]
        Three variants in Finder. The small labels in the fishing
        one look a bit garbled — want me to re-run that through
        gpt-image-2?

James: yes, try OpenAI

Claude: [re-runs with --backend openai, three new files land
         next to the CF ones]

James: the OpenAI fishing one is the winner

Claude: [copies the winning PNG to tms-internal/public/playbooks/,
         drafts the matching playbook page]

Cost

Cloudflare Flux Schnell: free within the 10k-neuron daily allotment (roughly 40 images/day)
OpenAI gpt-image-2: paid, priced by quality + resolution tier (announced April 21, 2026)
OpenAI gpt-image-1: older tier, still available via --model gpt-image-1
Gemini Nano Banana Pro (nano-banana-pro-preview): ~11–15¢ per image
Gemini Nano Banana (gemini-2.5-flash-image): ~4¢ per image, lower quality

Default to Tier 1. Escalate only on actual failure or a live-publish need. For ten iterations on the same prompt via Flux, total cost is zero.
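The numbers above reduce to simple arithmetic. The sketch below uses the approximate figures from this document (about 250 neurons per Flux image, 10k neurons per day, 11 to 15 cents per Nano Banana Pro image). The function names are illustrative.

```python
NEURONS_PER_IMAGE = 250        # approximate Flux Schnell cost per image
DAILY_NEURON_BUDGET = 10_000   # free Workers AI allotment per day


def free_images_per_day(budget=DAILY_NEURON_BUDGET, per_image=NEURONS_PER_IMAGE):
    """How many Flux images fit inside the free daily allotment."""
    return budget // per_image


def paid_cost_range_cents(n_images, low_cents, high_cents):
    """Low and high cost estimate, in cents, for a paid backend."""
    return n_images * low_cents, n_images * high_cents


print(free_images_per_day())              # 40
print(paid_cost_range_cents(10, 11, 15))  # (110, 150)
```

Ten Nano Banana Pro iterations therefore land at roughly $1.10 to $1.50, against zero for the same ten on Flux.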

Known Gaps / TODOs

No native aspect-ratio flag on the Gemini script. State the ratio in the prompt (“16:9 landscape composition”). The SDK does accept an aspect_ratio parameter, which is worth wiring up explicitly.
Reference-image weighting is implicit. Gemini --ref works, but there is no way to say “face from ref 1, outfit from ref 2”; the model blends based on the prompt.
No edit-in-place in the scripts. gpt-image-2 supports multi-turn editing (“change the background to sunset, make the text larger” on an existing frame), but the current CLI is generate-from-scratch only. A --edit <path> mode is a natural next addition.
Auto-escalation not implemented. Today the operator manually escalates CF → OpenAI → Gemini. Future: OCR the Flux output, auto-escalate when garbled text is detected.
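The auto-escalation idea could start from a check like the one below. It is a sketch, not an existing feature. It assumes some OCR step (not shown, and not chosen here) has already turned the Flux output into a list of text tokens, and the 0.85 similarity threshold is an arbitrary illustrative value. `garbled_labels` and `should_escalate` are hypothetical names.

```python
from difflib import SequenceMatcher


def garbled_labels(expected, ocr_tokens, threshold=0.85):
    """Return expected labels with no close-enough match in the OCR output."""
    missing = []
    for label in expected:
        # Best case-insensitive similarity between this label and any OCR token.
        best = max(
            (SequenceMatcher(None, label.upper(), tok.upper()).ratio()
             for tok in ocr_tokens),
            default=0.0,
        )
        if best < threshold:
            missing.append(label)
    return missing


def should_escalate(expected, ocr_tokens, threshold=0.85):
    """Escalate to the next tier when any expected label came back garbled."""
    return bool(garbled_labels(expected, ocr_tokens, threshold))
```

With the example from earlier in this document, “YOUUBST” scores about 0.71 against “YouTube”, so that label is flagged and the batch would escalate.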

Related

Create a Blog Post: pair with this to illustrate new posts
Redact Images: for scrubbing PII out of screenshots before publishing
ChatGPT Images 2.0 announcement (April 2026)
Gemini image generation docs
Cloudflare Workers AI — Flux Schnell