Skill: Image Generation via Browser (Gemini)

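Generate panel images for investigations using Gemini in the browser via the Chrome MCP extension. Uses the **multi-tab fire-and-harvest** approach: prompts are fired into several tabs in parallel for throughput, then images are downloaded one tab at a time, which keeps downloads as reliable as in a single-tab workflow.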

Published May 30, 2026


Evolution Log

v1: Multi-tab parallel with parallel downloads → abandoned (panel swaps, download failures)
v2: Single-tab sequential → reliable but slow (~60-90s per panel including text response wait)
v3 (current): Multi-tab fire-and-harvest → 4 panels per batch, ~2 min per batch = ~30s/panel effective

ONE MODEL PER PROJECT (NON-NEGOTIABLE)

Use ONE model for an entire investigation. Mixing models creates visual inconsistency.

Default: Gemini. Superior quality, better artistic interpretation, better atmosphere.

Switch to ChatGPT ONLY when:

Gemini's daily image generation quota is exhausted
Gemini repeatedly refuses a prompt (content policy)
Gemini's download mechanism is still completely broken after all retries

Prerequisites

scene-plan.md exists with panel prompts
Panel folder exists: projects/{id}/panels/{style}/
Chrome MCP tools available (Claude in Chrome extension)
User logged into Gemini in browser

The Workflow: Multi-Tab Fire-and-Harvest

Concept

Instead of generating one panel at a time (waiting 60-90s each for image + text response), open N tabs, fire a prompt into each one, wait once for all to finish, then go back and download from each tab.

PASS 1 — FIRE (fast, ~30s for 4 tabs):
  Tab A: click input → prime → type prompt → send
  Tab B: click input → prime → type prompt → send
  Tab C: click input → prime → type prompt → send
  Tab D: click input → prime → type prompt → send

WAIT (~60-90s for all 4 to generate images + text responses)

PASS 2 — HARVEST (fast, ~30s for 4 tabs):
  Tab A: scroll down → hover image → click download icon
  Tab B: scroll down → hover image → click download icon
  Tab C: scroll down → hover image → click download icon
  Tab D: scroll down → hover image → click download icon

MOVE FILES (~10s):
  Check ~/Downloads for Gemini files
  Move in timestamp order → panel-NN.png (oldest = first tab fired)

Effective throughput: ~30s per panel vs ~90s sequential. 3x faster.
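
The throughput figure follows from the rough phase timings above. A minimal sketch of the arithmetic, using only the durations quoted in this section (the constant and function names are illustrative, not part of any tool):

```python
# Rough phase durations in seconds, taken from the workflow outline above.
FIRE_S = 30
WAIT_RANGE_S = (60, 90)
HARVEST_S = 30
MOVE_S = 10
SEQUENTIAL_S_PER_PANEL = 90  # upper end of the single-tab estimate


def per_panel_seconds(tabs: int, wait_s: int) -> float:
    """Effective seconds per panel for one fire-and-harvest batch."""
    batch_total = FIRE_S + wait_s + HARVEST_S + MOVE_S
    return batch_total / tabs


best = per_panel_seconds(tabs=4, wait_s=WAIT_RANGE_S[0])   # 130 / 4 = 32.5
worst = per_panel_seconds(tabs=4, wait_s=WAIT_RANGE_S[1])  # 160 / 4 = 40.0
print(f"{best:.1f}-{worst:.1f} s per panel")
print(f"speed-up: {SEQUENTIAL_S_PER_PANEL / worst:.2f}x to {SEQUENTIAL_S_PER_PANEL / best:.2f}x")
```

With these inputs a 4-tab batch works out to roughly 32-40 s per panel, so the "~30s" and "3x" figures sit at the optimistic end of the range.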

Phase 1: Setup

1. Create N tabs (4 is the sweet spot — more creates context overhead):
   tabs_create_mcp (repeat N-1 times, first tab already exists)

2. Navigate ALL tabs to Gemini simultaneously:
   navigate(url: "https://gemini.google.com/app", tabId: each)
   → Can fire all navigate calls in parallel

3. Wait 5 seconds for all pages to fully load

4. WARM UP each tab by taking a screenshot:
   screenshot(tabId: each)
   → This is CRITICAL. The Chrome extension doesn't properly connect to a tab
   → until it takes a screenshot of it. Without this, type() silently drops text.
   → This was the #1 cause of failed prompts on fresh tabs.

5. Record the tab-to-panel mapping:
   Tab {tabId_A} → Panel {NN}
   Tab {tabId_B} → Panel {NN+1}
   Tab {tabId_C} → Panel {NN+2}
   Tab {tabId_D} → Panel {NN+3}
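
The mapping in step 5 can be kept as a small dictionary keyed by tab ID. This is only a sketch: `map_tabs_to_panels` is a hypothetical helper, and the tab IDs below are placeholders for whatever the tab-creation calls actually return.

```python
def map_tabs_to_panels(tab_ids: list[int], first_panel: int) -> dict[int, int]:
    """Assign consecutive panel numbers to tabs in the order they will be fired."""
    return {tab_id: first_panel + offset for offset, tab_id in enumerate(tab_ids)}


# Placeholder tab IDs; real IDs come from the browser.
mapping = map_tabs_to_panels([101, 102, 103, 104], first_panel=15)
for tab_id, panel in mapping.items():
    print(f"Tab {tab_id} -> Panel {panel:02d}")
```

Because Python dictionaries preserve insertion order, iterating over `mapping` later visits the tabs in the same order they were fired, which is the order the harvest pass relies on.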

Phase 2: Fire Prompts (Pass 1)

For EACH tab, in sequence:

PER TAB:
1. Click the input field area (coordinate ~762, 286 for full viewport)
2. Wait 1 second
3. PRIME the input: type "x" then Ctrl+A → Backspace
   → This is CRITICAL. Fresh Gemini pages silently drop the first typed text.
   → Typing a single character "x" then clearing it primes the input to accept real text.
4. Type the prompt (ASCII only — no em dashes, curly quotes, or Unicode)
5. Click the send arrow (coordinate ~1143, 372 for full viewport)
   → If viewport is narrower (multiple tabs visible), coordinates shift:
     1145px wide: send at ~955, 362
     1524px wide: send at ~1143, 372
   → The active/focused tab gets full viewport width
6. Verify: URL should change from /app to /app/{hash}
   → If URL didn't change: the text or send click failed. Retry.
7. Move to next tab immediately — DON'T wait for generation
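
Steps 4 and 6 above can be sketched as two small helpers: one that forces a prompt down to ASCII before typing, and one that checks whether the URL has moved from /app to /app/{hash}. Both function names and the replacement table are assumptions made for illustration; they are not part of the Chrome extension.

```python
from urllib.parse import urlparse

# Common typographic characters mapped to ASCII stand-ins (illustrative list).
REPLACEMENTS = {
    "\u2014": ", ",                 # em dash
    "\u2013": "-",                  # en dash
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2026": "...",                # ellipsis
}


def to_ascii_prompt(prompt: str) -> str:
    """Replace known typographic characters, then drop anything still non-ASCII."""
    for char, plain in REPLACEMENTS.items():
        prompt = prompt.replace(char, plain)
    return prompt.encode("ascii", errors="ignore").decode("ascii")


def prompt_was_sent(url: str) -> bool:
    """True when the URL path has at least one segment after /app."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return len(parts) >= 2 and parts[0] == "app"


print(to_ascii_prompt("a\u2014b"))
print(prompt_was_sent("https://gemini.google.com/app"))
```

Note that characters outside the replacement table are dropped rather than transliterated, so it is worth rereading the cleaned prompt before typing it.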

Critical: The "x" prime trick. Fresh Gemini pages have a bug where the first computer(type) action is silently dropped. The input field appears to accept text but then clears it. Typing "x" → Ctrl+A → Backspace forces the input into an active state. After priming, the real prompt types correctly.

Coordinate awareness: When you switch between tabs, the viewport width may change (Chrome resizes tabs). The input field and send button coordinates shift. Always use coordinates appropriate for the current viewport width. Screenshot to verify if unsure.
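
One way to keep the two recorded coordinate sets straight is a lookup keyed by viewport width. The sketch below only knows the two widths listed in this skill and returns the closest match; `send_coords_for` is a hypothetical helper, and picking the nearest width is a guess that still needs a screenshot to confirm.

```python
# Send-arrow positions recorded in this skill, keyed by viewport width in pixels.
KNOWN_SEND_COORDS = {
    1145: (955, 362),
    1524: (1143, 372),
}


def send_coords_for(viewport_width: int) -> tuple[int, int]:
    """Return the recorded send-arrow position for the closest known width."""
    closest = min(KNOWN_SEND_COORDS, key=lambda w: abs(w - viewport_width))
    return KNOWN_SEND_COORDS[closest]


print(send_coords_for(1524))  # (1143, 372)
print(send_coords_for(1200))  # (955, 362), since 1145 is the nearer width
```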

Phase 3: Wait

Wait 60-90 seconds for all tabs to generate images + text responses.
Use: computer(wait, duration: 10) repeated 6-9 times.

Don't check individual tabs during this time — let them all cook.
The text responses are the bottleneck (~30-60s each), but they run in parallel.

Phase 4: Harvest Downloads (Pass 2)

For EACH tab, in sequence:

PER TAB:
1. Scroll down to find the generated image:
   scroll(direction: down, amount: 10)
   → The image is below the prompt message, above the input field

2. Verify the image is fully generated:
   → Mic icon visible (not stop button) = generation complete
   → If stop button still visible: wait 10s more, or skip to next tab and come back

3. Hover over the image to reveal download icons:
   hover(coordinate: ~700, 240)
   → Three icons appear in top-right corner: [share] [copy] [download ↓]

4. Click the download icon (rightmost):
   click(coordinate: ~900, 135)
   → This is the arrow-into-tray icon
   → DO NOT click the image center — that opens the lightbox
   → Click the TOP-RIGHT area where the download icon appears

5. Move to next tab immediately — don't wait for download to complete

Phase 5: Move Files

1. Wait 15 seconds after last download click (files are 8-10MB each)

2. List downloaded files by timestamp:
   ls -ltr ~/Downloads/Gemini_Generated_Image*.png
   → -r reverses the time sort so the oldest file is listed first

3. Map files to panels by timestamp order:
   → Oldest file = first tab's download (Panel NN)
   → Newest file = last tab's download (Panel NN+3)
   → This works because hover-downloads are triggered sequentially

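4. Move each file individually (mv cannot rename several globbed files to one target name):
   mv ~/Downloads/Gemini_Generated_Image_{hash}.png projects/{id}/panels/{style}/panel-{NN}.png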

5. Verify count matches expected (should be N files for N tabs)
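
The five steps above can be sketched as one function that sorts the downloads by modification time, checks the count, and renames each file. This is an illustration under assumptions: `move_batch` is a hypothetical helper, and it relies on the Gemini_Generated_Image filename prefix described in this skill.

```python
import shutil
from pathlib import Path


def move_batch(downloads: Path, panel_dir: Path, panel_numbers: list[int]) -> list[Path]:
    """Move Gemini downloads to panel-NN.png, oldest file first."""
    files = sorted(
        downloads.glob("Gemini_Generated_Image*.png"),
        key=lambda p: p.stat().st_mtime,
    )
    # Refuse to guess when the count is off; a wrong mapping swaps panels.
    if len(files) != len(panel_numbers):
        raise RuntimeError(
            f"expected {len(panel_numbers)} downloads, found {len(files)}"
        )
    panel_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for src, number in zip(files, panel_numbers):
        dest = panel_dir / f"panel-{number:02d}.png"
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

A call might look like `move_batch(Path.home() / "Downloads", panel_dir, [15, 16, 17, 18])`, where `panel_dir` points at the project's panel folder. Modification time reflects when each file finished writing, so if one download is much slower than the others the order could in principle differ from the click order; a spot-check of the first and last panel is a cheap safeguard.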

Phase 6: Reuse Tabs for Next Batch

The same tabs can be reused for the next batch of panels.
Each tab already has one conversation — the next prompt goes into the same chat.

PER TAB (for subsequent batches):
1. Click the input field (coordinate ~762, 491 — below the image)
2. Wait 1 second
3. Type the next prompt (no need to prime — only fresh pages need priming)
4. Click send (coordinate ~1143, 542 — send arrow shifts down when input has text)
5. Move to next tab

IMPORTANT: Wait for ALL tabs' previous responses to fully complete (mic icon visible)
before typing the next prompt. If Gemini is still generating text (stop button visible),
the input field is locked and typed text gets dropped.

Download Method: Hover-Icon (THE ONLY RELIABLE WAY)

1. DO NOT click the image (that opens the lightbox)
2. HOVER over the image thumbnail in the chat
3. Three small circular icons appear in the top-right corner:
   [share]  [copy]  [download ↓]
4. Click the download icon (rightmost, arrow-into-tray)
5. File appears as: Gemini_Generated_Image_{hash}.png in ~/Downloads

Why not lightbox? The lightbox "Download full-sized image" button works ~30% of the time. The hover-icon method works consistently.

Why 15 seconds wait? Files are 8-10MB. They take time to write to disk. Checking at 3-5 seconds causes false negatives that waste time in retry loops.
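
After the 15-second wait, a quick check can confirm the batch has landed before any files are moved. The sketch below assumes Chrome's usual behaviour of writing in-progress downloads with a .crdownload suffix; `downloads_settled` is a hypothetical helper, not something the extension provides.

```python
from pathlib import Path


def downloads_settled(downloads: Path, expected: int) -> bool:
    """True when enough Gemini PNGs exist and no Chrome download is in progress."""
    in_progress = list(downloads.glob("*.crdownload"))
    finished = list(downloads.glob("Gemini_Generated_Image*.png"))
    return not in_progress and len(finished) >= expected
```

If this returns False after the wait, waiting another 10 seconds and checking again is cheaper than re-triggering downloads that may still be writing.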

Critical Rules

Screenshot each tab before typing. The Chrome extension must screenshot a tab before it can reliably type into it. Without this warmup, type() silently drops all text. This is the #1 reliability fix.
ASCII only in prompts. Em dashes, curly quotes, Unicode get garbled. Use commas and periods.
Prime fresh pages if screenshot warmup isn't enough. Type "x" → Ctrl+A → Backspace. Usually screenshot alone is sufficient.
Subsequent prompts in same tab don't need priming or warmup. Only fresh pages need it.
Wait for full completion before reusing a tab. Stop button = locked input. Mic icon = ready.
NEVER stop Gemini's text response. Stopping leaves stale text in the input field. Next prompt gets appended to it, creating merged prompts that generate wrong images.
If you DO stop a response: Ctrl+A → Backspace to clear before typing next prompt.
Hover-download, not lightbox. The lightbox method is unreliable.
15 second download wait. Don't check earlier for large files.
Timestamp ordering for batch moves. Files are created in the order you triggered downloads.
Record tab-to-panel mapping. Use tab IDs as keys to avoid panel swaps.

Gemini Session Limit

No hard session limit observed. Sessions can handle many images without refreshing. If text responses consistently take longer than the usual 30-60s, navigate to a fresh /app for that tab.

Batch Size Recommendations

Panels remaining	Batch size	Why
1-3	1-3 tabs	One tab per panel; a full 4-tab setup is not worth the overhead
4-8	4 tabs	Sweet spot — manageable, good throughput
9-16	4 tabs x 3-4 batches	Reuse tabs across batches
17-25	4 tabs x 5-7 batches	Full investigation
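
The batch counts in the table come from splitting the remaining panels into groups of at most four. A minimal sketch of that split (the helper name `plan_batches` is an assumption for illustration):

```python
def plan_batches(first_panel: int, last_panel: int, tabs: int = 4) -> list[list[int]]:
    """Split an inclusive panel range into batches of at most `tabs` panels."""
    panels = list(range(first_panel, last_panel + 1))
    return [panels[i:i + tabs] for i in range(0, len(panels), tabs)]


batches = plan_batches(1, 25)
print(len(batches))   # 7
print(batches[-1])    # [25]
```

The final batch is usually short, so it only needs as many tabs as it has panels.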

Failure Modes and Fixes

Problem	Fix
Text vanishes on fresh page	Prime with "x" → Ctrl+A → Backspace
Send button doesn't click	Screenshot to find exact coordinates — they shift with viewport width
Image not visible after scrolling	Scroll more (down 10 ticks), or scroll up if overshot
Download icon click opens lightbox	You clicked the image center, not the icon. Hover first, then click top-right (~900, 135)
Multiple Gemini files, unsure which is which	Use timestamp order — oldest = first tab downloaded
Tab still generating when trying to type next batch	Wait for mic icon. Don't type during generation.
Merged prompts (two prompts concatenated)	Previous response was stopped, leaving stale text. Clear with Ctrl+A → Backspace.

Example: Full 4-Panel Batch

SETUP:
  Tab 2082987463 → Panel 15
  Tab 2082987464 → Panel 16
  Tab 2082987465 → Panel 17
  Tab 2082987466 → Panel 18

FIRE (30s):
  Tab A: click input → x → Ctrl+A Backspace → type prompt 15 → click send ✓
  Tab B: click input → x → Ctrl+A Backspace → type prompt 16 → click send ✓
  Tab C: click input → x → Ctrl+A Backspace → type prompt 17 → click send ✓
  Tab D: click input → x → Ctrl+A Backspace → type prompt 18 → click send ✓

WAIT (60-90s):
  wait 10s x 6-9 times

HARVEST (30s):
  Tab A: scroll down → hover → click download icon
  Tab B: scroll down → hover → click download icon
  Tab C: scroll down → hover → click download icon
  Tab D: scroll down → hover → click download icon

MOVE (10s):
  ls -ltr ~/Downloads/Gemini*.png → 4 files, oldest listed first
  oldest → panel-15.png
  next → panel-16.png
  next → panel-17.png
  newest → panel-18.png

TOTAL: ~2 minutes for 4 panels = 30s/panel
vs sequential: ~6 minutes for 4 panels = 90s/panel

Naming Convention

Panel files: panel-{NN}.png (zero-padded: 01-25)
Never keep the hash-based download filenames (Gemini_Generated_Image_{hash}.png); always rename when moving
One model per project for visual consistency

Validation

After each batch, spot-check 1-2 panels:

Read projects/{id}/panels/{style}/panel-{NN}.png
Compare to scene-plan.md description
If it doesn't match: delete and regenerate in next batch

Full validation of all 25 panels happens after all batches complete.
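
Before the visual comparison, a completeness check can flag any panel that never made it into the folder. This sketch assumes the panel-{NN}.png naming convention and the 25-panel total mentioned above; `missing_panels` is a hypothetical helper.

```python
from pathlib import Path


def missing_panels(panel_dir: Path, total: int = 25) -> list[int]:
    """Return panel numbers whose panel-NN.png file is absent."""
    return [
        n for n in range(1, total + 1)
        if not (panel_dir / f"panel-{n:02d}.png").is_file()
    ]
```

An empty list means every panel file exists; it says nothing about whether the images match scene-plan.md, which still needs the manual comparison.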
