PM Local AI Prototyping Guide

01

Prerequisites & Hardware Requirements

What you need before touching a terminal

Gemma 4 model variants & RAM requirements

Model	Parameters	Context	Min VRAM/RAM	Best for PMs	Fit
`gemma4:e2b`	2.3B eff.	128K	5 GB	Quick idea drafts, ultra-low-spec laptops	Limited
`gemma4:e4b`	4.5B eff.	128K	8 GB	Most dev laptops — ideal starting point	Recommended
`gemma4:26b`	26B MoE	256K	18 GB	Complex multi-file prototypes, richer reasoning	Best quality
`gemma4:31b`	31B dense	256K	24 GB	Frontier intelligence locally — workstation/DGX	High-end only

PM tip: Start with gemma4:e4b. It fits a standard dev laptop (16 GB RAM), supports 128K context, handles function calling natively, and responds quickly enough for interactive use on modern hardware. Upgrade to 26B when you need richer multi-step reasoning.
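
The table above can be read as a simple lookup. The sketch below picks the largest listed model whose minimum memory fits what you have. The function name is hypothetical, and it treats the "Min VRAM/RAM" column as a hard floor with no headroom for the OS or other apps, so treat the result as an upper bound rather than a recommendation.

```python
# Minimum memory per model in GB, copied from the table above.
MODEL_MIN_GB = {
    "gemma4:e2b": 5,
    "gemma4:e4b": 8,
    "gemma4:26b": 18,
    "gemma4:31b": 24,
}


def largest_model_that_fits(available_gb):
    """Return the largest listed model whose minimum fits, or None."""
    fitting = [(need, name) for name, need in MODEL_MIN_GB.items()
               if need <= available_gb]
    if not fitting:
        return None
    # max() compares the memory requirement first, so the biggest model wins.
    return max(fitting)[1]


print(largest_model_that_fits(16))  # gemma4:e4b
```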

Windows WSL2 Requirements

Windows 10 v2004+ or Windows 11 (check: winver)
Virtualization enabled in BIOS/UEFI
WSL2 with Ubuntu 22.04 or 24.04
Node.js 18+ inside WSL (not Windows)
Git installed inside WSL environment
Anthropic account (Claude Pro/Max or API key)
~1 GB download for setup; model sizes vary

Linux Requirements

Ubuntu 20.04+, Debian 11+, or Arch
Node.js 18+ (via NVM recommended)
Git + ripgrep installed
NVIDIA GPU with 8+ GB VRAM (optional but faster)
NVIDIA drivers + CUDA 11.8+ for GPU mode
Anthropic account (Claude Pro/Max or API key)
curl, bash, python3 (usually pre-installed)

02

Install Ollama

The local model runtime that exposes an OpenAI-compatible API at localhost:11434

Windows — Enable WSL2 First

Open PowerShell as Administrator:

# Step 1: Install WSL2 + Ubuntu
wsl --install
# Reboot when prompted

# Step 2: Set Ubuntu as default
wsl --set-default Ubuntu

# Step 3: Open Ubuntu terminal
# All remaining steps run INSIDE WSL

Now inside your Ubuntu (WSL) terminal:

# Install Ollama inside WSL
curl -fsSL https://ollama.com/install.sh | sh

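# Start the Ollama server
# (skip this if the installer already started it as a service;
#  a second instance fails with "address already in use")
ollama serve &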

# Verify it's running
curl http://localhost:11434
# Should return: Ollama is running

Important: Keep all work on the Linux filesystem (~/projects/), not under /mnt/c/. This gives up to 20× faster I/O for Claude Code's file operations.

Linux — Direct Install

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Enable as a system service (auto-start)
sudo systemctl enable ollama
sudo systemctl start ollama

# Check status
systemctl status ollama

# Verify API endpoint
curl http://localhost:11434
# Should return: Ollama is running

Optional: expose to LAN for team access:

# Override service environment
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<EOF
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
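
Once the service listens on the LAN, a teammate could confirm it is reachable from their own machine. The sketch below repeats the curl check from this section in Python. The address in the usage note is a hypothetical placeholder, and the fetch function is injectable so the logic can be tried without a network.

```python
import urllib.error
import urllib.request


def fetch_banner(base_url, timeout=3):
    """Return the body served at base_url, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        return None


def is_ollama_up(base_url, fetch=fetch_banner):
    """True when the endpoint answers with the 'Ollama is running' banner."""
    banner = fetch(base_url)
    return banner is not None and "Ollama is running" in banner
```

Usage: is_ollama_up("http://192.168.1.50:11434") returns True when the server at that (hypothetical) address answers. With OLLAMA_HOST=0.0.0.0 the API is open to anyone on the network, so this check will also succeed for machines you did not intend to share with.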

03

Pull & Run Google Gemma 4

One command downloads pre-quantized weights and makes the model available to the Ollama server. The steps are identical on both platforms inside a WSL or Linux terminal.

1

Pull the model (choose your size)

# For 8–16 GB RAM machines (recommended for PMs)
ollama pull gemma4:e4b

# For 18+ GB RAM / dedicated GPU machines
ollama pull gemma4:26b

# Check download & list installed models
ollama list

First pull downloads 5–16 GB. Subsequent starts are instant — Ollama caches model weights locally under ~/.ollama/models/.

2

Test Gemma 4 in the terminal

ollama run gemma4:e4b
# You'll get an interactive chat prompt
# Type: "Write a Python Flask hello-world app"
# Type: /bye to exit

3

Verify the API endpoint works

# REST API test (OpenAI-compatible format)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma4:e4b",
    "messages": [
      {"role": "user", "content": "Write a 3-step user story for a food delivery app"}
    ]
  }'
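
The same request can be made from a script. This is a minimal sketch using only the Python standard library. The function names are hypothetical, and the response parsing assumes the usual OpenAI-style shape (choices, then message, then content).

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(prompt, model="gemma4:e4b", url=OLLAMA_CHAT_URL):
    """Build the POST request that the curl command above sends."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style response dict."""
    return response_json["choices"][0]["message"]["content"]


def ask(prompt, model="gemma4:e4b"):
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=120) as resp:
        return extract_reply(json.load(resp))
```

Usage: ask("Write a 3-step user story for a food delivery app") returns the model's reply as a string, provided the Ollama server is running and the model has been pulled.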

4

Keep the model warm (prevent slow cold starts)

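# Set keep-alive to prevent model unloading. The Ollama server reads this
# variable, so set it before starting `ollama serve` from this shell.
export OLLAMA_KEEP_ALIVE="-1"  # keeps model loaded until the server stops

# Or add to ~/.bashrc / ~/.zshrc for persistence
echo 'export OLLAMA_KEEP_ALIVE="-1"' >> ~/.bashrc
source ~/.bashrc

# Running Ollama as a systemd service? Add Environment="OLLAMA_KEEP_ALIVE=-1"
# to the override.conf from the Install Ollama section, then restart the service.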

Without this, Ollama unloads the model after ~5 minutes of idle. Your first request after that pause takes 10–30 seconds. For a prototyping session, keep it always loaded.
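
Keep-alive can also be requested per call. Ollama's native API accepts a keep_alive field, and a generate request with no prompt simply loads the model. The sketch below builds such a preload request. The function names are hypothetical, and it assumes the server is at the default localhost:11434.

```python
import json
import urllib.request


def build_preload_request(model="gemma4:e4b", keep_alive=-1,
                          base_url="http://localhost:11434"):
    """Build a prompt-less /api/generate request that loads the model.

    keep_alive=-1 asks the server to keep the model loaded indefinitely.
    """
    payload = {"model": model, "keep_alive": keep_alive}
    return urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model="gemma4:e4b"):
    """Send the preload request; returns True on HTTP 200."""
    with urllib.request.urlopen(build_preload_request(model), timeout=300) as resp:
        return resp.status == 200
```

Calling preload() at the start of a prototyping session warms the model without changing any server configuration.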

04

Install Claude Code CLI

Anthropic's terminal agent — reads, writes, and executes your entire codebase via natural language

Windows — Run Inside WSL Ubuntu

# Step 1: Install NVM (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash
source ~/.bashrc

# Step 2: Install Node.js 20 LTS
nvm install 20
nvm use 20
node --version   # Must show v20.x.x
which node       # Must show Linux path, NOT /mnt/c/

# Step 3: No npm prefix change needed
# nvm installs Node under ~/.nvm, so global installs work without sudo.
# Setting a custom npm prefix conflicts with nvm, so leave it unset.
npm config get prefix   # Should show a path under ~/.nvm

# Step 4: Install Claude Code
npm install -g @anthropic-ai/claude-code

# Step 5: Verify install
claude --version

Linux — Native Terminal

# Step 1: Install NVM
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash
source ~/.bashrc

# Step 2: Install Node.js 20 LTS
nvm install 20 && nvm use 20
node --version

# Step 3: No npm prefix change needed
# nvm installs Node under ~/.nvm, so global installs work without sudo.
# Setting a custom npm prefix conflicts with nvm, so leave it unset.
npm config get prefix   # Should show a path under ~/.nvm

# Step 4: Install ripgrep (improves code search)
# Debian/Ubuntu shown; on Arch use: sudo pacman -S ripgrep
sudo apt install ripgrep -y

# Step 5: Install Claude Code
npm install -g @anthropic-ai/claude-code

# Step 6: Authenticate
claude
# Browser opens → sign in with Anthropic account

Authentication options: Claude Code supports OAuth (browser sign-in with a Claude Pro/Max subscription) or API key via ANTHROPIC_API_KEY environment variable. For teams behind corporate proxies, API key auth is more reliable.

05

Wire Claude Code → Ollama → Gemma 4

Redirect Claude Code to use your local Gemma 4 model instead of Anthropic's cloud API

How the connection works

Claude Code CLI (your terminal, natural language commands)
   │
   │  ANTHROPIC_BASE_URL override
   ↓
Ollama HTTP Server (localhost:11434, Anthropic-compatible API)
   │
   │  routes request to loaded model
   ↓
Google Gemma 4 (running in VRAM/RAM on your machine, private)
   │
   └─ response streams back → Claude Code executes file edits / runs code

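Ollama v0.14+ natively supports the Anthropic Messages API format, so no proxy or translation layer is needed. You redirect Claude Code with environment variables: ANTHROPIC_BASE_URL points at Ollama, ANTHROPIC_AUTH_TOKEN supplies a placeholder token, and ANTHROPIC_MODEL selects the model. ANTHROPIC_API_KEY is set to an empty string so a real key is never sent.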
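
To see what travels over that connection, here is a rough sketch of a single Anthropic Messages-style request aimed at the local server. The /v1/messages path is an assumption based on how Anthropic clients append it to the base URL, the token value mirrors the placeholder used in this guide, and max_tokens=512 is an arbitrary illustrative value. The function names are hypothetical.

```python
import json
import urllib.request


def build_messages_request(prompt, model="gemma4:e4b",
                           base_url="http://localhost:11434"):
    """Build an Anthropic Messages-format request for the local server."""
    body = {
        "model": model,
        "max_tokens": 512,  # required by the Messages format; value is arbitrary
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "ollama",              # placeholder, as in this guide
        "anthropic-version": "2023-06-01",
    }
    return urllib.request.Request(
        base_url + "/v1/messages",          # assumed path, see lead-in
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def extract_text(response_json):
    """Join the text blocks of a Messages-format response."""
    return "".join(
        block.get("text", "")
        for block in response_json.get("content", [])
        if block.get("type") == "text"
    )
```

Claude Code builds far richer requests than this (system prompts, tool definitions, streaming), but the envelope is the same: a model name, a message list, and a token budget.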

Windows WSL — Persistent Setup

# Add to ~/.bashrc for permanence
cat >> ~/.bashrc <<'EOF'

# Ollama + Claude Code local backend
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_MODEL=gemma4:e4b

# Alias: switch between local and cloud
alias claude-local='ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_MODEL=gemma4:e4b claude'
alias claude-cloud='unset ANTHROPIC_AUTH_TOKEN ANTHROPIC_BASE_URL ANTHROPIC_MODEL && claude'
EOF

source ~/.bashrc

Linux — Persistent Setup

# Same config — add to ~/.bashrc or ~/.zshrc
cat >> ~/.bashrc <<'EOF'

export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_MODEL=gemma4:e4b

alias claude-local='ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_MODEL=gemma4:e4b claude'
alias claude-cloud='unset ANTHROPIC_AUTH_TOKEN ANTHROPIC_BASE_URL ANTHROPIC_MODEL && claude'
EOF

source ~/.bashrc
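
If Claude Code still talks to the cloud after this setup, the usual cause is a variable that did not make it into the current shell. The sketch below checks the four variables from the block above against the values this guide uses. The function name is hypothetical.

```python
import os

# Values taken from the persistent setup in this guide.
EXPECTED = {
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_BASE_URL": "http://localhost:11434",
    "ANTHROPIC_MODEL": "gemma4:e4b",
}


def local_backend_problems(env):
    """Return a list of human-readable mismatches; empty means consistent."""
    problems = []
    for name, want in EXPECTED.items():
        got = env.get(name)
        if got != want:
            problems.append(f"{name} is {got!r}, expected {want!r}")
    # The guide blanks the API key so no real key is sent to the local server.
    if env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is set; this guide expects it empty")
    return problems


for line in local_backend_problems(os.environ) or ["Local backend variables look consistent."]:
    print(line)
```

If you chose a different model, such as gemma4:26b, adjust the ANTHROPIC_MODEL entry in EXPECTED to match.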

Test the full stack

# Navigate to a project folder
mkdir -p ~/pm-prototypes && cd ~/pm-prototypes

# Launch Claude Code — it will use Gemma 4 via Ollama
claude

# Inside Claude Code, type your first PM prompt:
# "Create a landing page HTML for a SaaS invoicing tool.
#  Include a hero, 3 pricing tiers, and a CTA form."

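Context window note: Claude Code works best with at least 32K–64K token context. Gemma 4 E4B supports 128K, so you're covered for most single-feature prototypes. Ollama may load a model with a smaller default context than the model's maximum; if the assistant loses track of earlier files, raise the limit on the server (for example with the OLLAMA_CONTEXT_LENGTH environment variable). For very large codebases (hundreds of files), be explicit in prompts about which files are relevant.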
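
A quick way to judge whether a set of files is likely to fit is a character-count estimate. The sketch below assumes roughly 4 characters per token, which is a common rule of thumb for English text and code, not a property of Gemma 4's tokenizer. The reserve for prompts and replies is likewise an illustrative guess.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate: ceiling of len(text) / chars_per_token."""
    return (len(text) + chars_per_token - 1) // chars_per_token


def fits_context(texts, context_tokens=128_000, reserve_tokens=16_000):
    """Return (fits, estimated_tokens) for a list of file contents.

    reserve_tokens leaves room for instructions and the model's reply.
    """
    used = sum(estimate_tokens(t) for t in texts)
    return used <= context_tokens - reserve_tokens, used


print(fits_context(["x" * 400]))  # (True, 100)
```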

06

PM Prototyping Workflow

The repeatable loop for validating business ideas from hypothesis to working demo

1

Define the hypothesis (before touching code)

Ask Gemma 4 to help you structure your idea into a testable hypothesis. Open Claude Code and run:

claude
> "I want to validate this idea: [YOUR IDEA].
   Help me write a lean hypothesis:
   Problem, proposed solution, target user, success metric, 
   and the riskiest assumption. Format it as a YAML file."

Claude Code will create a hypothesis.yaml file in your project folder. This becomes your north star for the prototype's scope.
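
Small local models sometimes drop a field, so it can help to check the result. The sketch below lists the five parts named in the prompt and reports any that are missing or blank. The key names are assumptions, since the model chooses the actual keys, and the tiny YAML writer only handles flat string fields.

```python
import json

# Hypothetical key names for the five parts requested in the prompt above.
REQUIRED_FIELDS = (
    "problem",
    "proposed_solution",
    "target_user",
    "success_metric",
    "riskiest_assumption",
)


def missing_fields(hypothesis):
    """Return required fields that are absent or blank in the dict."""
    return [name for name in REQUIRED_FIELDS
            if not str(hypothesis.get(name, "")).strip()]


def to_yaml(hypothesis):
    """Render flat string fields as YAML, using JSON-style quoted values."""
    lines = [f"{name}: {json.dumps(str(hypothesis[name]))}"
             for name in REQUIRED_FIELDS if name in hypothesis]
    return "\n".join(lines) + "\n"
```

An empty list from missing_fields means the hypothesis is complete enough to scope the prototype. Anything it returns is a good follow-up question for the next prompt.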

2

Generate the prototype scaffold

Tell Claude Code exactly what to build — be specific about the tech stack that's easiest to demo quickly:

# Example: no-database, HTML/JS prototype
claude
> "Based on hypothesis.yaml, build a single-page HTML prototype.
   No backend needed — use localStorage for data.
   Include: [key flows from hypothesis].
   Make it look polished enough to test with real users."

Claude Code reads your hypothesis file, writes all the HTML/CSS/JS, and confirms each file creation step.

3

Iterate in conversation

Claude Code keeps the conversation in context and re-reads project files as needed, so you can iterate naturally:

# In the same Claude Code session
> "The pricing page feels cluttered. Move the feature list
   to a toggle/accordion. Keep the CTA above the fold."

> "Add a fake 'loading' spinner when the user submits the
   sign-up form, then show a success state after 1.5s."

> "Generate 5 realistic dummy user profiles and
   pre-populate the dashboard with their data."

4

Generate user research artifacts

> "Based on this prototype, write:
   1. A 5-question usability test script
   2. A discussion guide for a 30-min discovery interview
   3. A RICE prioritization table for the next 3 features
   Save each as a separate markdown file."
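
To sanity-check the RICE table the model produces, the standard formula is Reach × Impact × Confidence ÷ Effort. The sketch below applies it and sorts features by score. The dictionary keys are hypothetical and should match whatever columns your table uses.

```python
def rice_score(reach, impact, confidence, effort):
    """Standard RICE score: (reach * impact * confidence) / effort."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


def prioritize(features):
    """Return features sorted by RICE score, highest first."""
    return sorted(
        features,
        key=lambda f: rice_score(f["reach"], f["impact"],
                                 f["confidence"], f["effort"]),
        reverse=True,
    )
```

Confidence is usually entered as a fraction (0.8 for 80%) and effort in person-weeks or person-months. Whichever units you pick, keep them the same across all features so the ranking is meaningful.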

5

Switch to cloud Claude for investor/stakeholder decks

When you need maximum quality for external deliverables, flip to the Anthropic cloud model:

# Switch to full Claude (cloud)
claude-cloud
> "Based on hypothesis.yaml and this prototype, write a 
   one-pager executive summary and a 5-slide pitch deck outline
   for a Series A investor meeting."

# When done, switch back to local (free, private)
claude-local

07

Prototype Templates for Common PM Use Cases

Copy-paste prompts to get working prototypes in minutes

🛒

SaaS Landing Page + Waitlist

Hero, features, pricing, email capture form. Tests value proposition messaging before building anything.

"Build a SaaS landing page for [product]. Include hero, 3 features, pricing table with Free/Pro/Enterprise, and an email waitlist form that stores signups in localStorage."

📊

Analytics Dashboard Mockup

Fake-data dashboard to validate the most important metrics before building real data pipelines.

"Create an analytics dashboard HTML page with Chart.js. Show KPIs: [your metrics]. Populate with realistic dummy data. Add a date range picker that updates the charts."

🔁

User Onboarding Flow

Multi-step wizard to test onboarding copy, step ordering, and drop-off risk.

"Build a 5-step onboarding wizard for [product]. Each step should validate its fields before proceeding. Add a progress bar. Final step shows a personalized welcome summary."

💬

AI-Powered Feature Mockup

Prototype an AI chat/assistant feature that calls your local Gemma 4 API directly.

"Build a chat UI that sends user messages to http://localhost:11434/v1/chat/completions with model gemma4:e4b. System prompt: [your product's AI persona]. Style it like Intercom."

🗺️

Feature Prioritization Tool

Drag-and-drop RICE/MoSCoW board to facilitate team prioritization workshops.

"Build an interactive RICE scoring tool. Users can add features, score Reach/Impact/Confidence/Effort, and see an auto-sorted priority list. Export to CSV button."

🧾

B2B Quote / Proposal Generator

Validates pricing model and quote structure with sales teams before building CPQ.

"Build a quote generator form. User picks product modules, quantities, and contract length. Auto-calculates tiered pricing and generates a printable PDF-style HTML quote."

Setting up a Python backend prototype (when localStorage isn't enough)

Windows WSL

# Inside WSL terminal
cd ~/pm-prototypes
python3 -m venv venv && source venv/bin/activate
pip install flask requests

# Ask Claude Code to scaffold the Flask app
claude
> "Create a Flask API with SQLite for [feature].
   Include endpoints for [user actions].
   Add a simple HTML frontend that calls these endpoints."

Linux

# Same commands — Linux terminal
cd ~/pm-prototypes
python3 -m venv venv && source venv/bin/activate
pip install flask requests

# Ask Claude Code to scaffold the Flask app
claude
> "Create a minimal Flask REST API with SQLite.
   Implement CRUD for [resource].
   Add CORS headers for the frontend to call it."
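
For orientation, the storage layer behind such a scaffold tends to be small. The sketch below shows SQLite CRUD with the standard library only. The table and column names are hypothetical, and the Flask routes that would call these helpers are left out. What Claude Code actually generates will differ.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) items table."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


def create_item(conn, name):
    """Insert a row and return its new id."""
    cur = conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def get_item(conn, item_id):
    """Return the row as a dict, or None if it does not exist."""
    row = conn.execute(
        "SELECT id, name FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    return dict(row) if row else None


def update_item(conn, item_id, name):
    """Rename a row; returns True if a row was changed."""
    cur = conn.execute(
        "UPDATE items SET name = ? WHERE id = ?", (name, item_id)
    )
    conn.commit()
    return cur.rowcount == 1


def delete_item(conn, item_id):
    """Delete a row; returns True if a row was removed."""
    cur = conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
    conn.commit()
    return cur.rowcount == 1
```

The parameterized queries (the ? placeholders) are worth asking for explicitly in your prompt, since they keep user input from being interpreted as SQL.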

08

Validate, Measure & Iterate

Closing the loop from prototype to business decision

Use Gemma 4 to analyze user feedback directly

# Create a feedback analysis script
claude
> "I have a CSV of 47 user interview responses in feedback.csv.
   Write a Python script that:
   1. Reads the CSV
   2. Sends each row to Ollama (gemma4:e4b) for sentiment + theme tagging
   3. Aggregates the top 5 themes and outputs a summary markdown report"
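
The script Claude Code writes for this will vary, but its shape is predictable. The sketch below follows the three steps in the prompt. The "response" column name and the JSON reply format are assumptions, and the tagger argument stands in for whatever function sends a prompt to Ollama and returns the reply text.

```python
import csv
import io
import json
from collections import Counter


def read_responses(csv_text, column="response"):
    """Step 1: return the non-empty values of one CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if row.get(column)]


def build_tag_prompt(response_text):
    """Prompt asking the model for a small JSON object per response."""
    return (
        "Return only JSON with keys 'sentiment' (positive, neutral or "
        "negative) and 'themes' (a list of short strings) for this "
        "interview response:\n" + response_text
    )


def parse_tags(model_reply):
    """Parse the model's JSON reply, tolerating malformed output."""
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        return {"sentiment": "unknown", "themes": []}
    raw = data.get("themes")
    if not isinstance(raw, list):
        raw = []
    themes = [str(t).strip().lower() for t in raw if str(t).strip()]
    return {"sentiment": str(data.get("sentiment", "unknown")).lower(),
            "themes": themes}


def summarize(responses, tagger, top_n=5):
    """Steps 2 and 3: tag each response, then build a markdown report."""
    theme_counts = Counter()
    sentiment_counts = Counter()
    for text in responses:
        tags = parse_tags(tagger(build_tag_prompt(text)))
        theme_counts.update(set(tags["themes"]))  # count once per response
        sentiment_counts[tags["sentiment"]] += 1
    lines = ["# Feedback summary", "",
             f"Responses analyzed: {len(responses)}", "", "## Sentiment"]
    lines += [f"- {name}: {count}"
              for name, count in sentiment_counts.most_common()]
    lines += ["", f"## Top {top_n} themes"]
    lines += [f"- {theme}: {count}"
              for theme, count in theme_counts.most_common(top_n)]
    return "\n".join(lines) + "\n"
```

Passing the tagger in as a function keeps the aggregation testable without a running model. In real use it would post the prompt to the /v1/chat/completions endpoint on localhost:11434 with model gemma4:e4b. Small models often wrap JSON in extra prose, which is why parse_tags falls back to "unknown" instead of failing.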

Recommended workflow for each validation cycle

W1

Week 1 — Build the smoke test

Use Claude Code + Gemma 4 to build a static HTML prototype (no backend). Share it via GitHub Pages, or serve it locally by running npx serve . in the project folder. Get 5 users to click through it. Your goal: does the value proposition land?

W2

Week 2 — Add interactivity based on feedback

In the same Claude Code session, describe the feedback you got and ask it to modify the prototype. Add realistic data, smooth rough edges, test the highest-risk assumption from your hypothesis.

W3

Week 3 — Build the riskiest feature for real

If you've validated enough signal, use Claude Code to build a minimal Python/Node backend for the one feature that makes or breaks the idea. Keep everything else as prototype UI.

W4

Week 4 — Go/no-go decision artifact

claude-cloud  # Switch to full Claude for best quality
> "Given hypothesis.yaml, the prototype code, and these user
   interview notes [paste notes], write a one-page go/no-go
   recommendation document. Include: evidence summary, key risks,
   proposed next steps if GO, pivot options if NO-GO."

When to switch from local Gemma 4 to cloud Claude

Task	Use local Gemma 4	Use cloud Claude
Generating prototype code	Local ✓
Iterating on UI/UX	Local ✓
Analyzing user feedback CSVs	Local ✓
Writing user stories & PRDs	Local ✓
Investor / exec pitch decks		Cloud ✓
Complex architectural decisions		Cloud ✓
Processing sensitive user PII	Local only
Multi-file codebase refactors		Cloud preferred

The core PM advantage: When AI assistance is free and private, you stop rationing it. You ask for throwaway prototypes you'd never pay API costs for. You run 10 variations instead of 2. You use it for internal artifacts — research summaries, meeting notes, draft stakeholder emails — without worrying about cost or data leaving your machine. That behavioral shift is where the real productivity gain lives.

Build & Validate Ideas with
Local AI — Zero Cloud Bills