Skill: Image Generation via Browser (Gemini)
What
Generate panel images for investigations using Gemini in the browser via the Chrome MCP extension. The multi-tab fire-and-harvest approach gives multi-tab throughput while keeping the download accuracy of the single-tab workflow.
Evolution Log
- v1: Multi-tab parallel with parallel downloads → abandoned (panel swaps, download failures)
- v2: Single-tab sequential → reliable but slow (~60-90s per panel including text response wait)
- v3 (current): Multi-tab fire-and-harvest → 4 panels per batch, ~2 min per batch = ~30s/panel effective
ONE MODEL PER PROJECT (NON-NEGOTIABLE)
Use ONE model for an entire investigation. Mixing models creates visual inconsistency.
Default: Gemini. Superior quality, better artistic interpretation, better atmosphere.
Switch to ChatGPT ONLY when:
- Gemini's daily image generation quota is exhausted
- Gemini repeatedly refuses a prompt (content policy)
- Gemini's download mechanism still fails after all retries
Prerequisites
- scene-plan.md exists with panel prompts
- Panel folder exists: projects/{id}/panels/{style}/
- Chrome MCP tools available (Claude in Chrome extension)
- User logged into Gemini in browser
The Workflow: Multi-Tab Fire-and-Harvest
Concept
Instead of generating one panel at a time (waiting 60-90s each for image + text response), open N tabs, fire a prompt into each one, wait once for all to finish, then go back and download from each tab.
PASS 1 — FIRE (fast, ~30s for 4 tabs):
Tab A: click input → prime → type prompt → send
Tab B: click input → prime → type prompt → send
Tab C: click input → prime → type prompt → send
Tab D: click input → prime → type prompt → send
WAIT (~60-90s for all 4 to generate images + text responses)
PASS 2 — HARVEST (fast, ~30s for 4 tabs):
Tab A: scroll down → hover image → click download icon
Tab B: scroll down → hover image → click download icon
Tab C: scroll down → hover image → click download icon
Tab D: scroll down → hover image → click download icon
MOVE FILES (~10s):
Check ~/Downloads for Gemini files
Move in timestamp order → panel-NN.png (oldest = first tab fired)
Effective throughput: ~30s per panel vs ~90s sequential. 3x faster.
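The throughput figure can be checked with a little arithmetic. The sketch below simply adds up the phase timings quoted above for a 4-tab batch; the constant and function names are illustrative, not part of any tool. Summed exactly, the phases come out slightly above the rounded ~30s/panel.

```python
# Phase durations in seconds for one 4-tab batch, as quoted in the concept outline.
FIRE_S = 30      # Pass 1: fire prompts into 4 tabs
HARVEST_S = 30   # Pass 2: trigger downloads in 4 tabs
MOVE_S = 10      # rename the downloaded files
TABS = 4


def per_panel_seconds(wait_s: int, tabs: int = TABS) -> float:
    """Effective seconds per panel when the generation wait is shared by all tabs."""
    batch_total = FIRE_S + wait_s + HARVEST_S + MOVE_S
    return batch_total / tabs


best = per_panel_seconds(60)   # shortest quoted wait
worst = per_panel_seconds(90)  # longest quoted wait
print(f"{best:.1f}-{worst:.1f} s/panel, versus ~90 s/panel sequentially")
```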
Phase 1: Setup
1. Create N tabs (4 is the sweet spot; more tabs add context overhead):
tabs_create_mcp (repeat N-1 times, first tab already exists)
2. Navigate ALL tabs to Gemini simultaneously:
navigate(url: "https://gemini.google.com/app", tabId: each)
→ Can fire all navigate calls in parallel
3. Wait 5 seconds for all pages to fully load
4. WARM UP each tab by taking a screenshot:
screenshot(tabId: each)
→ This is CRITICAL. The Chrome extension doesn't properly connect to a tab
→ until it takes a screenshot of it. Without this, type() silently drops text.
→ This was the #1 cause of failed prompts on fresh tabs.
5. Record the tab-to-panel mapping:
Tab {tabId_A} → Panel {NN}
Tab {tabId_B} → Panel {NN+1}
Tab {tabId_C} → Panel {NN+2}
Tab {tabId_D} → Panel {NN+3}
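One way to keep the mapping explicit is a small dictionary keyed by tab ID. This is a minimal sketch: the helper name is hypothetical, and the tab IDs are the ones from the example batch in this document.

```python
def map_tabs_to_panels(tab_ids: list[int], first_panel: int) -> dict[int, int]:
    """Assign consecutive panel numbers to tabs, in the order the tabs will be fired."""
    return {tab_id: first_panel + offset for offset, tab_id in enumerate(tab_ids)}


mapping = map_tabs_to_panels(
    [2082987463, 2082987464, 2082987465, 2082987466], first_panel=15
)
for tab_id, panel in mapping.items():
    print(f"Tab {tab_id} -> Panel {panel:02d}")
```

Dictionaries preserve insertion order, so iterating the mapping also gives the fire order.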
Phase 2: Fire Prompts (Pass 1)
For EACH tab, in sequence:
PER TAB:
1. Click the input field area (coordinate ~762, 286 for full viewport)
2. Wait 1 second
3. PRIME the input: type "x" then Ctrl+A → Backspace
→ This is CRITICAL. Fresh Gemini pages silently drop the first typed text.
→ Typing a single character "x" then clearing it primes the input to accept real text.
4. Type the prompt (ASCII only — no em dashes, curly quotes, or Unicode)
5. Click the send arrow (coordinate ~1143, 372 for full viewport)
→ If viewport is narrower (multiple tabs visible), coordinates shift:
1145px wide: send at ~955, 362
1524px wide: send at ~1143, 372
→ The active/focused tab gets full viewport width
6. Verify: URL should change from /app to /app/{hash}
→ If URL didn't change: the text or send click failed. Retry.
7. Move to next tab immediately — DON'T wait for generation
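Steps 4 and 6 can be prepared and checked outside the browser. The sketch below is an assumption about how one might do it, not part of the Chrome MCP tooling: to_ascii_prompt converts common typographic characters before typing, and prompt_was_sent checks whether a URL has moved from /app to /app/{hash}. Both function names and the replacement table are hypothetical.

```python
import unicodedata
from urllib.parse import urlparse

# Typographic characters mapped to plain ASCII stand-ins (illustrative list).
REPLACEMENTS = {
    " \u2014 ": ", ",   # spaced em dash becomes a comma
    "\u2014": ", ",     # bare em dash
    "\u2013": "-",      # en dash
    "\u2018": "'",      # left single quote
    "\u2019": "'",      # right single quote
    "\u201c": '"',      # left double quote
    "\u201d": '"',      # right double quote
    "\u2026": "...",    # ellipsis
}


def to_ascii_prompt(text: str) -> str:
    """Return an ASCII-only version of a prompt."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # Decompose accented letters, then drop anything still outside ASCII.
    text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", "ignore").decode("ascii")


def prompt_was_sent(url: str) -> bool:
    """True when the URL path looks like /app/{hash} rather than plain /app."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return len(parts) >= 2 and parts[0] == "app"
```

Characters with no ASCII stand-in are dropped silently, so it is worth reading the sanitized prompt once before typing it.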
Critical: The "x" prime trick.
Fresh Gemini pages have a bug where the first computer(type) action is silently dropped.
The input field appears to accept text but then clears it. Typing "x" → Ctrl+A → Backspace
forces the input into an active state. After priming, the real prompt types correctly.
Coordinate awareness:
When you switch between tabs, the viewport width may change (Chrome resizes tabs).
The input field and send button coordinates shift. Always use coordinates appropriate
for the current viewport width. Screenshot to verify if unsure.
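The two recorded send-button positions can be kept in a lookup table. The sketch below picks the entry for the nearest known width; nearest-match is an assumption made for illustration, and a screenshot remains the authoritative check for any other width.

```python
# Send-button coordinates recorded at two viewport widths (from the list above).
SEND_COORDS = {
    1145: (955, 362),
    1524: (1143, 372),
}


def send_coordinate(viewport_width: int) -> tuple[int, int]:
    """Return the recorded send-button coordinate for the nearest known width."""
    nearest = min(SEND_COORDS, key=lambda width: abs(width - viewport_width))
    return SEND_COORDS[nearest]


print(send_coordinate(1524))
```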
Phase 3: Wait
Wait 60-90 seconds for all tabs to generate images + text responses.
Use: computer(wait, duration: 10) repeated 6-9 times.
Don't check individual tabs during this time — let them all cook.
The text responses are the bottleneck (~30-60s each), but they run in parallel.
Phase 4: Harvest Downloads (Pass 2)
For EACH tab, in sequence:
PER TAB:
1. Scroll down to find the generated image:
scroll(direction: down, amount: 10)
→ The image is below the prompt message, above the input field
2. Verify the image is fully generated:
→ Mic icon visible (not stop button) = generation complete
→ If stop button still visible: wait 10s more, or skip to next tab and come back
3. Hover over the image to reveal download icons:
hover(coordinate: ~700, 240)
→ Three icons appear in top-right corner: [share] [copy] [download ↓]
4. Click the download icon (rightmost):
click(coordinate: ~900, 135)
→ This is the arrow-into-tray icon
→ DO NOT click the image center — that opens the lightbox
→ Click the TOP-RIGHT area where the download icon appears
5. Move to next tab immediately — don't wait for download to complete
Phase 5: Move Files
1. Wait 15 seconds after last download click (files are 8-10MB each)
2. List downloaded files by timestamp, oldest first:
ls -ltr ~/Downloads/Gemini_Generated_Image*.png
3. Map files to panels by timestamp order:
→ Oldest file = first tab's download (Panel NN)
→ Newest file = last tab's download (Panel NN+3)
→ This works because hover-downloads are triggered sequentially
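4. Move each file individually (mv cannot rename several wildcard matches to a single target):
mv ~/Downloads/Gemini_Generated_Image_{hash}.png projects/{id}/panels/{style}/panel-{NN}.png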
5. Verify count matches expected (should be N files for N tabs)
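The listing, mapping, moving and count check can be scripted as one step. This is a minimal sketch, assuming file modification time reflects the order in which downloads were triggered; the function name move_batch and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def move_batch(downloads: Path, panel_dir: Path, first_panel: int, expected: int) -> list[Path]:
    """Rename Gemini downloads to panel-NN.png, oldest file first."""
    files = sorted(
        downloads.glob("Gemini_Generated_Image*.png"),
        key=lambda p: p.stat().st_mtime,  # oldest first = first tab harvested
    )
    # Refuse to guess if a download is missing or stale files are present.
    if len(files) != expected:
        raise RuntimeError(f"expected {expected} downloads, found {len(files)}")

    targets = [
        panel_dir / f"panel-{first_panel + offset:02d}.png"
        for offset in range(len(files))
    ]
    # Check every target before moving anything, so a clash leaves Downloads untouched.
    for target in targets:
        if target.exists():
            raise FileExistsError(f"{target} already exists")

    panel_dir.mkdir(parents=True, exist_ok=True)
    for src, dst in zip(files, targets):
        shutil.move(str(src), str(dst))
    return targets
```

For the example batch this would be called as move_batch(Path.home() / "Downloads", panel_folder, first_panel=15, expected=4), where panel_folder is the project's panel directory. Clearing old Gemini files out of Downloads before a batch keeps the count check meaningful.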
Phase 6: Reuse Tabs for Next Batch
The same tabs can be reused for the next batch of panels.
Each tab already has one conversation — the next prompt goes into the same chat.
PER TAB (for subsequent batches):
1. Click the input field (coordinate ~762, 491 — below the image)
2. Wait 1 second
3. Type the next prompt (no need to prime — only fresh pages need priming)
4. Click send (coordinate ~1143, 542 — send arrow shifts down when input has text)
5. Move to next tab
IMPORTANT: Wait for ALL tabs' previous responses to fully complete (mic icon visible)
before typing the next prompt. If Gemini is still generating text (stop button visible),
the input field is locked and typed text gets dropped.
Download Method: Hover-Icon (THE ONLY RELIABLE WAY)
1. DO NOT click the image (that opens the lightbox)
2. HOVER over the image thumbnail in the chat
3. Three small circular icons appear in the top-right corner:
[share] [copy] [download ↓]
4. Click the download icon (rightmost, arrow-into-tray)
5. File appears as: Gemini_Generated_Image_{hash}.png in ~/Downloads
Why not lightbox? The lightbox "Download full-sized image" button works ~30% of the time.
The hover-icon method works consistently.
Why 15 seconds wait? Files are 8-10MB. They take time to write to disk.
Checking at 3-5 seconds causes false negatives that waste time in retry loops.
Critical Rules
- Screenshot each tab before typing. The Chrome extension must screenshot a tab before it can reliably type into it. Without this warmup, type() silently drops all text. This is the #1 reliability fix.
- ASCII only in prompts. Em dashes, curly quotes, Unicode get garbled. Use commas and periods.
- Prime fresh pages if screenshot warmup isn't enough. Type "x" → Ctrl+A → Backspace. Usually screenshot alone is sufficient.
- Subsequent prompts in same tab don't need priming or warmup. Only fresh pages need it.
- Wait for full completion before reusing a tab. Stop button = locked input. Mic icon = ready.
- NEVER stop Gemini's text response. Stopping leaves stale text in the input field. Next prompt gets appended to it, creating merged prompts that generate wrong images.
- If you DO stop a response: Ctrl+A → Backspace to clear before typing next prompt.
- Hover-download, not lightbox. The lightbox method is unreliable.
- 15 second download wait. Don't check earlier for large files.
- Timestamp ordering for batch moves. Files are created in the order you triggered downloads.
- Record tab-to-panel mapping. Use tab IDs as keys to avoid panel swaps.
Gemini Session Limit
No hard session limit observed. Sessions can handle many images without refreshing.
If text responses consistently take longer than the usual 30-60s, navigate that tab to a fresh /app.
Batch Size Recommendations
| Panels remaining | Batch size | Why |
|---|---|---|
| 1-3 | 1-3 tabs | One tab per panel; opening unused tabs only adds setup overhead |
| 4-8 | 4 tabs | Sweet spot — manageable, good throughput |
| 9-16 | 4 tabs x 3-4 batches | Reuse tabs across batches |
| 17-25 | 4 tabs x 5-7 batches | Full investigation |
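The batch counts in the table follow from splitting the remaining panels into groups of at most four. A minimal sketch, with a hypothetical helper name:

```python
def plan_batches(first_panel: int, last_panel: int, tabs: int = 4) -> list[list[int]]:
    """Split an inclusive panel range into batches of at most `tabs` panels."""
    panels = list(range(first_panel, last_panel + 1))
    return [panels[i:i + tabs] for i in range(0, len(panels), tabs)]


for batch_number, batch in enumerate(plan_batches(1, 25), start=1):
    print(f"Batch {batch_number}: panels {batch[0]:02d}-{batch[-1]:02d}")
```

The last batch may be short (25 panels gives six full batches plus one single-panel batch), in which case only that many tabs need to be fired.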
Failure Modes and Fixes
| Problem | Fix |
|---|---|
| Text vanishes on fresh page | Prime with "x" → Ctrl+A → Backspace |
| Send button doesn't click | Screenshot to find exact coordinates — they shift with viewport width |
| Image not visible after scrolling | Scroll more (down 10 ticks), or scroll up if overshot |
| Download icon click opens lightbox | You clicked the image center, not the icon. Hover first, then click top-right (~900, 135) |
| Multiple Gemini files, unsure which is which | Use timestamp order — oldest = first tab downloaded |
| Tab still generating when trying to type next batch | Wait for mic icon. Don't type during generation. |
| Merged prompts (two prompts concatenated) | Previous response was stopped, leaving stale text. Clear with Ctrl+A → Backspace. |
Example: Full 4-Panel Batch
SETUP:
Tab 2082987463 → Panel 15
Tab 2082987464 → Panel 16
Tab 2082987465 → Panel 17
Tab 2082987466 → Panel 18
FIRE (30s):
Tab A: click input → x → Ctrl+A Backspace → type prompt 15 → click send ✓
Tab B: click input → x → Ctrl+A Backspace → type prompt 16 → click send ✓
Tab C: click input → x → Ctrl+A Backspace → type prompt 17 → click send ✓
Tab D: click input → x → Ctrl+A Backspace → type prompt 18 → click send ✓
WAIT (60-90s):
wait 10s x 6-9 times
HARVEST (30s):
Tab A: scroll down → hover → click download icon
Tab B: scroll down → hover → click download icon
Tab C: scroll down → hover → click download icon
Tab D: scroll down → hover → click download icon
MOVE (10s):
ls -ltr ~/Downloads/Gemini*.png → 4 files, oldest first
oldest → panel-15.png
next → panel-16.png
next → panel-17.png
newest → panel-18.png
TOTAL: ~2 minutes for 4 panels = 30s/panel
vs sequential: ~6 minutes for 4 panels = 90s/panel
Naming Convention
- Panel files: panel-{NN}.png (zero-padded: 01-25)
- Never use UUID filenames — always rename when moving
- One model per project for visual consistency
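The naming rule is easy to enforce with a pattern check, which also catches hash-style filenames that were never renamed. This is a sketch only; the helper names are hypothetical and the upper bound of 25 comes from the range given above.

```python
import re

PANEL_NAME = re.compile(r"panel-(\d{2})\.png")


def panel_filename(number: int) -> str:
    """Build the zero-padded filename for a panel number."""
    return f"panel-{number:02d}.png"


def is_panel_filename(name: str, max_panel: int = 25) -> bool:
    """True for names like panel-07.png within the 01..max_panel range."""
    match = PANEL_NAME.fullmatch(name)
    return bool(match) and 1 <= int(match.group(1)) <= max_panel


print(panel_filename(7), is_panel_filename("Gemini_Generated_Image_ab12.png"))
```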
Validation
After each batch, spot-check 1-2 panels:
Read projects/{id}/panels/{style}/panel-{NN}.png
Compare to scene-plan.md description
If it doesn't match: delete and regenerate in next batch
Full validation of all 25 panels happens after all batches complete.
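Before the full validation pass it helps to confirm that every panel file actually exists. The sketch below only checks presence on disk, not image content; the function name is hypothetical and the default of 25 matches the panel count used in this document.

```python
from pathlib import Path


def missing_panels(panel_dir: Path, total: int = 25) -> list[int]:
    """Return the panel numbers whose panel-NN.png file is absent from panel_dir."""
    return [
        number
        for number in range(1, total + 1)
        if not (panel_dir / f"panel-{number:02d}.png").is_file()
    ]
```

An empty list means all files are in place; anything else is the list of panels to regenerate in the next batch.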