
Google just launched Gemini 3 Pro on November 18, 2025, and it's not just another model update.
This is Google's smartest AI yet, built on state-of-the-art reasoning that handles agentic workflows, autonomous coding, and complex multimodal tasks better than anything they've released before.
If you're a developer building with AI, Gemini 3 changes what's possible.
The catch: it works differently than previous models, and using it the same way you used Gemini 2.5 means missing most of its capabilities.
This guide shows you exactly how to integrate Gemini 3 into your applications, optimize for cost and performance, and avoid the common mistakes that break reasoning quality.
Let's start with what makes Gemini 3 actually different.

Gemini 3 Pro is the first model in Google's new Gemini 3 family, and it represents a genuine leap forward in AI capability.
Better reasoning depth
Gemini 3 scored 91.9% on GPQA Diamond (PhD-level science questions) and 95% on AIME 2025 (high school math competition). It handles complex, multi-step reasoning significantly better than Gemini 2.5.
Stronger multimodal understanding
The model achieved 81% on MMMU-Pro (visual understanding) and 87.6% on Video-MMMU. It processes text, images, and video with much higher accuracy.
Autonomous coding capability
On Terminal-Bench 2.0 (which tests a model's ability to use tools and operate a computer via terminal), Gemini 3 Pro scored 54.2%. That's production-ready agentic coding performance.

Context limits:
Input context window: 1 million tokens
Maximum output: 64,000 tokens
Pricing (per 1 million tokens):
Input: $2 under 200k input tokens, $4 above 200k
Output: $12 under 200k input tokens, $18 above 200k
The pricing jumps when you cross 200k input tokens, so staying under that threshold saves money.
Google AI Studio: Free tier with rate limits, perfect for prototyping
Vertex AI: Enterprise deployment with full controls and monitoring
Gemini CLI: Terminal-based development with Gemini 3 integration
Google Antigravity: New agentic development platform for autonomous coding
Third-party platforms: Cursor, GitHub, JetBrains, Replit, and more
The model is available now as gemini-3-pro-preview, with more Gemini 3 variants coming soon.
Setting up Gemini 3 takes less than 15 minutes. Here's exactly what to do.
Step 1: Create your Google AI Studio account
Visit aistudio.google.com and sign in with your Google account. You'll land in the AI Studio interface where you can test models and generate API keys.
Step 2: Generate your API key

Click the "Get API Key" button in the top right. Google generates a key immediately. Copy it and store it somewhere safe.
Never paste your API key directly in code or commit it to GitHub. Use environment variables instead (there's an example at the end of Step 4).
Step 3: Install the SDK
Choose your language:
Python (requires Python 3.9+):
pip install google-genai
JavaScript/Node:
npm install @google/genai
REST API: No installation needed, just use curl or your HTTP client of choice.
Step 4: Make your first API call
Python:
from google import genai
client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents="Explain quantum entanglement in simple terms"
)
print(response.text)
JavaScript:
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });
async function run() {
const response = await ai.models.generateContent({
model: "gemini-3-pro-preview",
contents: "Explain quantum entanglement in simple terms"
});
console.log(response.text);
}
run();
REST:
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent" \
-H "x-goog-api-key: YOUR_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{
"parts": [{"text": "Explain quantum entanglement in simple terms"}]
}]
}'
If you get a response, you're set up correctly.
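Once that first call works, swap the hard-coded key for an environment variable, per the note in Step 2. A minimal sketch in Python; the google-genai client also picks up a GEMINI_API_KEY variable automatically if you construct it with no arguments:
import os
from google import genai
# Read the key from the environment instead of hard-coding it
# (export GEMINI_API_KEY="..." in your shell first).
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Explain quantum entanglement in simple terms",
)
print(response.text)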

What's included in the free tier:
Google AI Studio gives you free access to Gemini 3 Pro with rate limits. You can prototype, test, and build small applications without paying anything.
Rate limits (approximate):
The free tier caps requests per minute and per day. These limits are enough for development and testing, but not for production traffic.
When to upgrade to paid access:
You need paid access when you're serving production traffic, regularly hitting the free-tier rate limits, or need billing controls and higher quotas.
Cost optimization tip:
If you're staying on the free tier, create multiple Google accounts (each gets its own key) and rotate between them. This is a common practice among developers staying within free limits.
Gemini 3 introduces a new way to control how much reasoning the model uses.
The thinking_level parameter controls the maximum depth of the model's internal reasoning process before responding.
Think of it like this: Gemini 3 can think longer about hard problems and respond quickly to easy ones. The thinking level sets the upper limit on how much time it takes to reason.
Important: Gemini 3 uses dynamic thinking by default. Even on high thinking level, it won't waste time on simple questions. It only uses deep reasoning when the prompt needs it.
low - Minimize latency and cost
Use when: the task is simple or well-defined, latency matters, or you're handling high request volumes.
Example: Customer support chatbot answering FAQs
high - Maximize reasoning depth (default)
Use when: the task needs multi-step reasoning, such as complex debugging, analysis, planning, or agentic workflows.
Example: Debugging race conditions in multi-threaded code
medium - Coming soon
Will offer a balance between the two. Not available at launch.
Default behavior:
If you don't specify thinking_level, Gemini 3 Pro defaults to high. For most applications, this is what you want.
Python (setting to low):
from google import genai
client = genai.Client()
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents="What's the capital of France?",
config={"thinking_level": "low"}
)
print(response.text)
Python (default high):
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents="Find the race condition in this multi-threaded C++ snippet: [code here]"
)
# Uses high thinking level by default
JavaScript:
const response = await ai.models.generateContent({
model: "gemini-3-pro-preview",
contents: "What's the capital of France?",
config: { thinkingLevel: "low" }
});
REST:
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{"parts": [{"text": "What is the capital of France?"}]}],
"generationConfig": {"thinkingLevel": "low"}
}'
Don't use both thinking_level and thinking_budget
The old thinking_budget parameter is still supported for backward compatibility, but using both in the same request returns a 400 error.
If you have old code using thinking_budget, migrate to thinking_level for better performance.
Don't set low for everything
It's tempting to use low thinking for speed, but complex tasks need deep reasoning. If your results are worse after switching to low, try high instead.
Gemini 3 gives you granular control over how it processes visual content through the media_resolution parameter.

Higher resolution means more tokens per image or frame, better recognition of fine detail and small text, and higher cost.
Lower resolution means fewer tokens, faster processing, and lower cost, at the expense of fine detail.
The key is matching resolution to your use case.
Images: Use media_resolution_high (1120 tokens)
Recommended for most image analysis tasks. This ensures maximum quality for reading text, identifying small objects, and understanding complex visuals.
PDFs: Use media_resolution_medium (560 tokens)
Optimal for document understanding. Quality saturates at medium for standard documents. Going to high rarely improves OCR results and just costs more.
Video (general): Use media_resolution_low (70 tokens per frame)
Sufficient for action recognition, scene description, and general video understanding.
Note: For video, both low and medium are treated identically at 70 tokens per frame to optimize context usage.
Video (text-heavy): Use media_resolution_high (280 tokens per frame)
Required only when reading dense text within video frames or identifying very small details.
Setting resolution per image (Python):
from google import genai
from google.genai import types
import base64
# Media resolution is in v1alpha API
client = genai.Client(http_options={'api_version': 'v1alpha'})
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents=[
types.Content(
parts=[
types.Part(text="What is in this image?"),
types.Part(
inline_data=types.Blob(
mime_type="image/jpeg",
data=base64.b64decode("..."),
),
media_resolution={"level": "media_resolution_high"}
)
]
)
]
)
print(response.text)
JavaScript:
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiVersion: "v1alpha" });
async function run() {
const response = await ai.models.generateContent({
model: "gemini-3-pro-preview",
contents: [
{
parts: [
{ text: "What is in this image?" },
{
inlineData: {
mimeType: "image/jpeg",
data: "...",
},
mediaResolution: {
level: "media_resolution_high"
}
}
]
}
]
});
console.log(response.text);
}
run();
REST:
curl "https://generativelanguage.googleapis.com/v1alpha/models/gemini-3-pro-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{
"parts": [
{ "text": "What is in this image?" },
{
"inlineData": {
"mimeType": "image/jpeg",
"data": "..."
},
"mediaResolution": {
"level": "media_resolution_high"
}
}
]
}]
}'
Global resolution configuration:
You can also set media resolution globally in generation_config instead of per-image. This applies the same setting to all media in the request.
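A minimal sketch of the global form in Python, assuming the generation config exposes a media_resolution field with enum-style values (verify the exact field name against the current SDK docs); pdf_part stands in for a Part you built earlier:
from google import genai
from google.genai import types
client = genai.Client()
# One resolution setting applied to every media part in this request
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=["Summarize this document.", pdf_part],
    config=types.GenerateContentConfig(
        media_resolution=types.MediaResolution.MEDIA_RESOLUTION_MEDIUM
    ),
)
print(response.text)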
Cost optimization:
If token usage is exceeding your context window after migrating from Gemini 2.5, explicitly reduce media resolution:
PDFs: medium instead of high
Video: low unless OCR is critical
Images: high only when detail truly matters
Performance optimization:
Lower resolution processes faster. For real-time applications where speed matters, start with low and only increase if results aren't good enough.
For Gemini 3, Google strongly recommends keeping temperature at the default value of 1.0.
Previous models often benefited from tuning temperature to control creativity versus determinism. Gemini 3's reasoning capabilities are optimized specifically for temperature 1.0.
Setting temperature below 1.0 can cause looping, repetitive outputs, and degraded performance on complex reasoning tasks.
This is especially true for reasoning models. The internal reasoning process works best at the default setting.
If you're doing something very specific that absolutely requires deterministic outputs and you've tested thoroughly, you might adjust temperature. But start at 1.0 and only change it if you have a compelling reason.
Thought signatures are one of the most important new concepts in Gemini 3.
Thought signatures are encrypted representations of the model's internal reasoning process. They let Gemini 3 maintain its reasoning chain across multiple API calls.
Think of them as breadcrumbs the model leaves to remember how it arrived at conclusions. When you send a thought signature back to the model, it can continue from where it left off instead of starting fresh.
Function calling: strict validation enforced
When using function calling, thought signatures are required. Missing them returns a 400 error.
The model needs the signature to process tool outputs correctly. Without it, the reasoning chain breaks.
Text/chat: recommended but not required
For standard text generation, signatures aren't strictly validated. But including them significantly improves reasoning quality across multi-turn conversations.
What happens if you omit them:
For function calling: Your request fails with a 400 error.
For text/chat: The model loses context and reasoning quality degrades. Follow-up questions won't have access to the full reasoning chain.
Good news: If you use official SDKs (Python, Node, Java) with standard chat history, thought signatures are handled automatically. You don't need to manually manage them.
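For example, the Python SDK's chat helper keeps the full turn history, including any signatures returned in the model's parts, and resends it on each call. A minimal sketch:
from google import genai
client = genai.Client()
# The chat object stores each turn's parts, so thought signatures returned
# by the model ride along automatically on the next request.
chat = client.chats.create(model="gemini-3-pro-preview")
first = chat.send_message("What are the risks of this investment? [details here]")
print(first.text)
follow_up = chat.send_message("Summarize that in one sentence.")
print(follow_up.text)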
When you need to handle them manually: if you call the REST API directly, manage conversation history yourself, or inject function calls that Gemini 3 didn't generate.
Rules for handling signatures:
Single function call: The functionCall part contains one signature. Return it in your next request.
Parallel function calls: Only the first functionCall in the list has a signature. Return parts in exact order received.
Multi-step sequential calls: Each function call has its own signature. Return all accumulated signatures in the conversation history.
Multi-step function calling (sequential):
The user asks something requiring two separate steps (like checking a flight, then booking a taxi based on the result).
// Step 1: Model calls flight tool and returns Signature A
{
"role": "model",
"parts": [
{
"functionCall": { "name": "check_flight", "args": {...} },
"thoughtSignature": "<Sig_A>" // Save this
}
]
}
// Step 2: You send flight result, must include Sig_A
[
{ "role": "user", "parts": [{ "text": "Check flight AA100..." }] },
{
"role": "model",
"parts": [
{
"functionCall": { "name": "check_flight", "args": {...} },
"thoughtSignature": "<Sig_A>" // Required
}
]
},
{ "role": "user", "parts": [{ "functionResponse": { "name": "check_flight", "response": {...} } }] }
]
// Step 3: Model calls taxi tool, returns Signature B
{
"role": "model",
"parts": [
{
"functionCall": { "name": "book_taxi", "args": {...} },
"thoughtSignature": "<Sig_B>" // Save this too
}
]
}
// Step 4: You send taxi result, must include both Sig_A and Sig_B
[
// ... previous history including Sig_A ...
{
"role": "model",
"parts": [
{ "functionCall": { "name": "book_taxi", ... }, "thoughtSignature": "<Sig_B>" }
]
},
{ "role": "user", "parts": [{ "functionResponse": {...} }] }
]
Parallel function calls:
User asks: "Check the weather in Paris and London."
// Model response with parallel calls
{
"role": "model",
"parts": [
// First call has the signature
{
"functionCall": { "name": "check_weather", "args": { "city": "Paris" } },
"thoughtSignature": "<Signature_A>"
},
// Subsequent parallel calls don't have signatures
{
"functionCall": { "name": "check_weather", "args": { "city": "London" } }
}
]
}
// Your response with results
{
"role": "user",
"parts": [
{
"functionResponse": { "name": "check_weather", "response": { "temp": "15C" } }
},
{
"functionResponse": { "name": "check_weather", "response": { "temp": "12C" } }
}
]
}
Text/chat with signatures:
// User asks a reasoning question
[
{
"role": "user",
"parts": [{ "text": "What are the risks of this investment?" }]
},
{
"role": "model",
"parts": [
{
"text": "Let me analyze the risks step by step. First, volatility...",
"thoughtSignature": "<Signature_C>" // Recommended to include in follow-ups
}
]
},
{
"role": "user",
"parts": [{ "text": "Summarize that in one sentence." }]
}
]
If you're transferring a conversation from Gemini 2.5 or injecting a custom function call not generated by Gemini 3, you won't have a valid signature.
To bypass strict validation in these cases, use this dummy string:
"thoughtSignature": "context_engineering_is_the_way_to_go"
This lets you migrate existing conversations without breaking function calling validation.
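For example, a model turn you construct yourself (say, a function call replayed from a Gemini 2.5 transcript) might carry the dummy signature like this; the function name and arguments here are placeholders:
# Manually injected model turn with the dummy signature so validation passes
injected_turn = {
    "role": "model",
    "parts": [{
        "functionCall": {"name": "check_flight", "args": {"flight": "AA100"}},
        "thoughtSignature": "context_engineering_is_the_way_to_go",
    }],
}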
Gemini 3 supports powerful tool integration for building agentic workflows.
Google Search grounding:
Lets the model search the web for current information.
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents="What are the latest developments in quantum computing?",
config={"tools": [{"google_search": {}}]}
)
File Search:
Search through uploaded files for relevant information.
Code Execution:
The model can write and run Python code to solve problems (see the sketch after this tool list).
URL Context:
Fetch and process content from specific URLs.
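As an example of wiring up one of these built-ins, here's a sketch that enables code execution; it assumes the code_execution tool key follows the same dict pattern as google_search above:
# The model can write and run Python to work out the answer
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="What is the sum of the first 50 prime numbers? Compute it.",
    config={"tools": [{"code_execution": {}}]},
)
print(response.text)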
What's NOT supported:
Google Maps and Computer Use are currently not supported in Gemini 3.
You can define your own tools for Gemini 3 to use.
Example: Weather tool
weather_tool = {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name, e.g. 'London' or 'New York'"
}
},
"required": ["location"]
}
}
# Function declarations are wrapped in a tool definition
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="What's the weather like in Tokyo?",
    config={"tools": [{"function_declarations": [weather_tool]}]}
)
# Model returns a function call
# You execute the function and return results
# Model continues with the result
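Continuing from the snippet above, here's a minimal sketch of that round trip; get_weather is a placeholder implementation, and it assumes the google-genai types helpers for building the follow-up request:
from google.genai import types
def get_weather(location: str) -> dict:
    # Placeholder: call a real weather API here
    return {"location": location, "temp_c": 18, "conditions": "clear"}
call = response.candidates[0].content.parts[0].function_call
if call and call.name == "get_weather":
    result = get_weather(**call.args)
    follow_up = client.models.generate_content(
        model="gemini-3-pro-preview",
        contents=[
            types.Content(role="user", parts=[types.Part(text="What's the weather like in Tokyo?")]),
            response.candidates[0].content,  # model turn with the functionCall (and its signature)
            types.Content(
                role="user",
                parts=[types.Part.from_function_response(name="get_weather", response=result)],
            ),
        ],
        config={"tools": [{"function_declarations": [weather_tool]}]},
    )
    print(follow_up.text)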
Handling function responses:
When the model calls a function, you:
1. Execute the function with the arguments the model provided
2. Return the output to the model in a functionResponse part
3. Send the updated history back so the model continues with the result
Gemini 3 lets you combine built-in tools with structured JSON output.
This is powerful for agentic workflows: fetch data with search or URL context, then extract it in a specific format for downstream tasks.
Example: Extract match results from search
Python:
from google import genai
from google.genai import types
from pydantic import BaseModel, Field
from typing import List
class MatchResult(BaseModel):
winner: str = Field(description="The name of the winner.")
final_match_score: str = Field(description="The final match score.")
scorers: List[str] = Field(description="The name of the scorer.")
client = genai.Client()
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents="Search for all details for the latest Euro.",
config={
"tools": [
{"google_search": {}},
{"url_context": {}}
],
"response_mime_type": "application/json",
"response_json_schema": MatchResult.model_json_schema(),
},
)
result = MatchResult.model_validate_json(response.text)
print(result)
JavaScript:
import { GoogleGenAI } from "@google/genai";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
const ai = new GoogleGenAI({});
const matchSchema = z.object({
winner: z.string().describe("The name of the winner."),
final_match_score: z.string().describe("The final score."),
scorers: z.array(z.string()).describe("The name of the scorer.")
});
async function run() {
const response = await ai.models.generateContent({
model: "gemini-3-pro-preview",
contents: "Search for all details for the latest Euro.",
config: {
tools: [
{ googleSearch: {} },
{ urlContext: {} }
],
responseMimeType: "application/json",
responseJsonSchema: zodToJsonSchema(matchSchema),
},
});
const match = matchSchema.parse(JSON.parse(response.text));
console.log(match);
}
run();
This combines:
Google Search grounding to find current information
URL context to read the source pages
Structured JSON output validated against your schema
All in one API call.
Gemini 3 introduces new tools specifically for agentic coding workflows.
The bash tool lets Gemini 3 propose shell commands as part of agentic workflows.
What it enables: the model can propose shell commands as intermediate steps, such as running tests, installing dependencies, or inspecting files, instead of only emitting code.
Security consideration:
Only use the bash tool in controlled environments. Never give it unrestricted access to production systems. Implement command allowlists if exposing to untrusted inputs.
Use cases: automated test runs, build and dependency management, and exploring an unfamiliar repository during an agentic coding session.
The model proposes commands, you execute them (with proper sandboxing), and return the results.
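As a sketch of the "execute with proper sandboxing" step, here's a hypothetical allowlist gate you could put in front of any command the model proposes (the allowlist contents are illustrative):
import shlex
import subprocess
# Hypothetical allowlist: only these executables may run
ALLOWED_COMMANDS = {"ls", "cat", "pytest", "npm"}
def run_proposed_command(command: str) -> str:
    args = shlex.split(command)
    if not args or args[0] not in ALLOWED_COMMANDS:
        return f"Refused: command not on the allowlist: {command}"
    # No shell, bounded runtime, output captured so it can be returned to the model
    completed = subprocess.run(args, capture_output=True, text=True, timeout=60)
    return completed.stdout + completed.stderr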
Google also offers a hosted server-side bash tool for secure prototyping.
What it's for: prototyping command execution in a sandbox that Google manages, without wiring up your own execution environment.
Availability:
Currently in early access with partners. General availability coming soon.
The real power comes from combining multiple tools in one workflow.
Example: Research and extract
# Combine search + URL context + structured output
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents="Find the top 3 AI models released this month and extract their key specs",
config={
"tools": [
{"google_search": {}},
{"url_context": {}}
],
"response_mime_type": "application/json",
"response_json_schema": {
"type": "object",
"properties": {
"models": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"release_date": {"type": "string"},
"context_window": {"type": "string"},
"key_features": {"type": "array", "items": {"type": "string"}}
}
}
}
}
}
}
)
This workflow searches for recent model announcements, fetches the relevant pages via URL context, and extracts each model's specs into the JSON schema you defined.
All automated in a single API call.
Reasoning models require different prompting strategies than instruction-following models.
Be precise, not verbose
Gemini 3 responds best to direct, clear instructions. It doesn't need elaborate prompting techniques designed for older models.
Bad prompt:"I would like you to carefully consider the following code snippet. Please take your time to thoroughly analyze each line, paying special attention to potential issues. Think step by step about what might go wrong..."
Good prompt:"Find the bug in this code: [code]"
Gemini 3's reasoning happens internally. You don't need to prompt it to "think step by step" — it does that automatically when needed.
Default behavior:
Gemini 3 is less verbose than Gemini 2.5. It prefers direct, efficient answers.
If your use case requires a conversational or detailed persona, you must explicitly steer the model.
Example:
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents="Explain machine learning to a beginner. Be friendly, conversational, and include examples."
)
Without the explicit instruction to be conversational, Gemini 3 gives a concise, technical answer. With it, you get the tone you want.
When working with large datasets (entire books, codebases, long videos), place your instructions at the end of the prompt, after the data.
Structure:
[Large context data here - could be a full codebase, document, or video]
Based on the information above, [your specific question or instruction]
This anchors the model's reasoning to the provided data instead of relying on its training knowledge.
Example:
prompt = f"""
{entire_codebase}
Based on the code above, identify all instances where database queries don't use parameterized statements and could be vulnerable to SQL injection.
"""
The phrase "Based on the information/code/data above" signals to the model that it should reason exclusively from the provided context.
If you're using Gemini 2.5, here's what you need to update.
1. Thinking levels replace thinking budget
Old way (Gemini 2.5):
config={"thinking_budget": 10000}
New way (Gemini 3):
config={"thinking_level": "high"}
The old thinking_budget still works for backward compatibility, but thinking_level gives more predictable performance.
2. Temperature should stay at 1.0
If your existing code explicitly sets temperature (especially to low values for deterministic outputs), remove this parameter.
Gemini 3's reasoning is optimized for temperature 1.0. Lower values can cause looping or performance degradation.
3. Media resolution defaults changed
Default OCR resolution for PDFs is now higher. This improves accuracy but increases token usage.
If your requests now exceed the context window, explicitly reduce media resolution to medium for PDFs.
4. Token consumption differences
Migrating to Gemini 3 Pro defaults may increase token usage on PDF-heavy requests (because of the higher default OCR resolution) and produce noticeably shorter, more direct outputs.
Monitor your token usage after migration and adjust media resolution if needed.
Before migrating:
Audit code that sets thinking_budget and update it to thinking_level
Remove explicit temperature settings so requests run at the default of 1.0
During migration:
Run your existing prompts against gemini-3-pro-preview and compare outputs against Gemini 2.5
After migration:
Monitor token usage and output style, and lower media resolution if requests exceed the context window
Issue: Exceeding context window with new defaults
Solution: Explicitly set media_resolution: "medium" for PDFs or media_resolution: "low" for video.
Issue: Image segmentation doesn't work
Gemini 3 Pro doesn't support pixel-level mask segmentation. For workloads requiring this, continue using Gemini 2.5 Flash with thinking turned off.
Issue: Results feel less verbose than Gemini 2.5
Solution: Add explicit instructions for desired verbosity. Gemini 3 defaults to concise outputs.
Use Gemini 2.5 if you need pixel-level image segmentation, or if a cheaper, faster model like 2.5 Flash is good enough for your task.
For most use cases, Gemini 3's reasoning improvements outweigh these limitations.
If you're using OpenAI-compatible endpoints, parameter mapping happens automatically.
OpenAI → Gemini:
reasoning_effort: "low" → thinking_level: "low"
reasoning_effort: "medium" → thinking_level: "high"
reasoning_effort: "high" → thinking_level: "high"
Note that OpenAI's medium maps to Gemini's high because Gemini only has two levels at launch.
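A minimal sketch using the OpenAI Python SDK pointed at Gemini's OpenAI-compatible endpoint; the base URL is the one Google documents for this compatibility layer, but treat the details as assumptions and check the current docs:
from openai import OpenAI
client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
response = client.chat.completions.create(
    model="gemini-3-pro-preview",
    reasoning_effort="low",  # mapped to thinking_level: "low"
    messages=[{"role": "user", "content": "Explain thought signatures in one paragraph."}],
)
print(response.choices[0].message.content)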
Temperature handling:
OpenAI defaults vary by endpoint. Gemini defaults to 1.0. If your OpenAI code relies on specific temperature behavior, test thoroughly.
Token counting:
Gemini's tokenization differs slightly from OpenAI's. Budget for ~10-15% variance in token counts.
Gemini 3 can be expensive if you're not careful. Here's how to control costs.
Tier 1 (under 200k input tokens): $2 per 1M input tokens, $12 per 1M output tokens
Tier 2 (over 200k input tokens): $4 per 1M input tokens, $18 per 1M output tokens
Staying under 200k input tokens saves 50% on input costs and 33% on output costs.
1. Optimize media resolution
Default settings may use more tokens than you need:
For PDFs:
# Default might use high, but medium is often enough
config={"media_resolution": "medium"}
For video:
# Use low unless reading text in frames
config={"media_resolution": "low"}
For images:
Only use high when detail truly matters. Try medium first.
2. Use context caching
Context caching costs 90% less than regular tokens. For repeated prompts with the same context, caching saves money.
Minimum tokens for caching: 2,048
Example:
# First request: pays full price for context
# Subsequent requests within cache window: 90% discount on cached portion
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents=prompt,
config={"cached_content": "your-cache-id"}
)
3. Batch API for high-volume requests
The Batch API processes requests asynchronously at lower cost. Good for:
Trade latency for significant cost savings.
4. Smart chunking strategies
Instead of sending entire documents:
Example:
Instead of sending a 500-page document, search for relevant sections and only send those 20 pages.
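A rough sketch of that idea: split the document into sections, score them against a few query keywords, and send only the top matches. The splitting and scoring here are deliberately naive placeholders:
def select_relevant_sections(document: str, keywords: list[str], max_sections: int = 20) -> str:
    # Naive chunking on blank lines; swap in a real heading/page splitter as needed
    sections = [s for s in document.split("\n\n") if s.strip()]
    scored = [(sum(s.lower().count(k.lower()) for k in keywords), s) for s in sections]
    # Keep only the highest-scoring sections that actually mention a keyword
    relevant = [s for score, s in sorted(scored, key=lambda x: x[0], reverse=True) if score > 0]
    return "\n\n".join(relevant[:max_sections])
# Send only the selected sections, not the entire document
context = select_relevant_sections(full_document, ["authentication", "session", "token"])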
Set up usage alerts in Google Cloud:
Configure billing alerts to notify you when spending crosses thresholds.
Track per-request costs:
Log token usage for each request:
print(f"Input tokens: {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
Calculate cost:
input_cost = (prompt_tokens / 1_000_000) * (2 if prompt_tokens <= 200_000 else 4)
output_cost = (output_tokens / 1_000_000) * (12 if prompt_tokens <= 200_000 else 18)
total_cost = input_cost + output_cost
Identify expensive patterns:
Review logs weekly to find requests with unusually large inputs, media processed at higher resolution than necessary, and simple prompts running at high thinking level.
Here's how developers are using Gemini 3 in production.
Use case: AI pair programmer that handles full features end-to-end
Why Gemini 3: strong agentic coding performance (54.2% on Terminal-Bench 2.0), deep reasoning for multi-step changes, and tool support for proposing and reacting to shell commands.
Implementation:
# Agent gets a feature request
prompt = """
Add user authentication to this Express.js API:
[codebase context]
Requirements:
- JWT-based auth
- Login and signup endpoints
- Middleware for protected routes
- Password hashing with bcrypt
"""
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents=prompt,
config={
"thinking_level": "high",
"tools": [{"bash": {}}] # Lets it run commands
}
)
# Agent proposes code changes, runs tests, fixes issues
# All autonomously through multiple tool calls
Result: Complete feature implementation with tests, done in minutes instead of hours.
Use case: Analyze product images and generate descriptions for e-commerce
Why Gemini 3: state-of-the-art image understanding (81% on MMMU-Pro) combined with structured JSON output, so results drop straight into a database.
Implementation:
from pydantic import BaseModel
from typing import List
class ProductInfo(BaseModel):
name: str
category: str
colors: List[str]
key_features: List[str]
suggested_description: str
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents=[
"Analyze this product image and generate details",
image_data
],
config={
"media_resolution": "high",
"response_mime_type": "application/json",
"response_json_schema": ProductInfo.model_json_schema()
}
)
product = ProductInfo.model_validate_json(response.text)
Result: Accurate product details extracted from images, ready for database insertion.
Use case: Research assistant that finds, reads, and synthesizes information
Why Gemini 3: Google Search grounding and URL context to gather sources, high thinking level to synthesize them, and structured output for the final report.
Implementation:
class ResearchReport(BaseModel):
summary: str
key_findings: List[str]
sources: List[str]
confidence_level: str
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents="Research the latest developments in solid-state batteries and summarize key breakthroughs",
config={
"tools": [
{"google_search": {}},
{"url_context": {}}
],
"thinking_level": "high",
"response_mime_type": "application/json",
"response_json_schema": ResearchReport.model_json_schema()
}
)
report = ResearchReport.model_validate_json(response.text)
Result: Comprehensive research report with sources, generated in one API call.
Here are the issues developers hit most often with Gemini 3.
Issue: Responses are slower or more expensive than expected
Possible causes: very large inputs approaching the 1M-token context window, media processed at high resolution, or high thinking level applied to simple tasks.
Diagnosis:
Check token count:
print(f"Tokens used: {response.usage_metadata.prompt_token_count}")
Check if it's approaching 1M.
Quick fixes:
For large documents:
# Reduce PDF resolution
config={"media_resolution": "medium"}
For video:
# Use low resolution unless OCR needed
config={"media_resolution": "low"}
For simple tasks:
# Don't use high thinking for easy questions
config={"thinking_level": "low"}
Error: 400 bad request with message about missing thought signatures
Cause: You're using function calling but not returning thought signatures.
Fix:
If using official SDKs, make sure you're using standard conversation history format. Signatures are handled automatically.
If using REST directly:
// When model returns function call with signature
{
"role": "model",
"parts": [{
"functionCall": {...},
"thoughtSignature": "xyz123" // Save this
}]
}
// You must return it:
{
"role": "model",
"parts": [{
"functionCall": {...},
"thoughtSignature": "xyz123" // Include it
}]
}
Issue: Token usage is higher than expected
Diagnosis steps: log usage_metadata for each request and identify which inputs (PDFs, video frames, images, or long prompts) contribute the most tokens.
Solutions:
Reduce media resolution:
# Instead of high (1120 tokens per image)
config={"media_resolution": "medium"} # 560 tokens
Sample video frames:
# Instead of every frame, sample every 5th frame
# Most video understanding doesn't need every frame
Trim prompts:
# Send only relevant code sections, not entire repo
# Extract key document pages, not full PDF
Issue: Output quality is worse than expected
Diagnosis: run the same prompt at both thinking levels and check whether a custom temperature setting is interfering.
Solutions:
Increase thinking level:
# If using low, try high for complex tasks
config={"thinking_level": "high"}
Verify temperature:
# Remove any custom temperature settings
# Let it default to 1.0
Improve prompt:
# Be more specific about what you want
# Add examples if helpful
# State constraints clearly
Consider model choice:
For some tasks, Gemini 2.5 Flash might be better (faster, cheaper, good enough). Gemini 3 Pro is overkill for simple tasks.
For production deployments, these features matter.
Context caching lets you reuse expensive context across multiple requests at 90% lower cost.
When caching helps:
Minimum token requirement: 2,048 tokens
Implementation:
# First request: create an explicit cache from the large shared context
from google import genai
from google.genai import types
client = genai.Client()
cache = client.caches.create(
    model="gemini-3-pro-preview",
    config=types.CreateCachedContentConfig(
        contents=[prompt_with_large_context],
        ttl="3600s",  # keep the cache for one hour
    ),
)
# Subsequent requests: reference the cache instead of resending the context
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=new_question,
    config=types.GenerateContentConfig(cached_content=cache.name),
)
Cost savings calculation:
Without caching: every request pays full price for the entire repeated context.
With caching: you pay full price once to create the cache, then the cached portion is billed at roughly a 90% discount on every request that reuses it within the cache window.
The Batch API processes requests asynchronously for lower cost and higher throughput.
When to use batch: bulk document processing, periodic analysis jobs, evaluations, and any workload that doesn't need an immediate response.
Trade-offs: results return asynchronously rather than in real time, in exchange for lower cost and higher throughput.
Implementation:
Check the official Batch API documentation for current setup. The API is supported but pricing and specific implementation details are still being finalized.
For building chatbots or assistants, proper conversation management is critical.
Best practices:
1. Maintain full conversation history
conversation = []
# Add user message
conversation.append({
"role": "user",
"parts": [{"text": user_input}]
})
# Get model response
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents=conversation
)
# Add model response to history
conversation.append({
"role": "model",
"parts": response.candidates[0].content.parts
})
2. Handle thought signatures automatically
If using official SDKs with this pattern, signatures are included automatically in response.candidates[0].content.parts.
3. Manage context window
Conversations can grow beyond 1M tokens. When approaching the limit:
def trim_conversation(conversation, max_tokens=900_000):
    # Rough estimate: ~4 characters per token.
    # Keep the first message and drop the oldest turns until the history fits.
    def approx_tokens(msgs):
        return sum(len(str(m)) for m in msgs) // 4
    while len(conversation) > 2 and approx_tokens(conversation) > max_tokens:
        del conversation[1]  # drop the oldest turn after the first message
    return conversation
How does Gemini 3 stack up against other frontier models?
Gemini 3 wins on:
GPT-5.1 wins on:
Pricing:
Roughly comparable for most use cases. Gemini 3 is slightly cheaper under 200k tokens.
When to choose Gemini 3:
When to choose GPT-5.1:
Gemini 3 wins on:
Claude Sonnet 4.5 wins on:
When to choose Gemini 3:
When to choose Claude Sonnet 4.5:
Gemini 3 improvements: deeper reasoning, stronger multimodal understanding, agentic coding and tool use, and the new thinking_level and media_resolution controls.
Gemini 2.5 advantages: pixel-level image segmentation support, plus lower cost and latency with 2.5 Flash for simple tasks.
Migration recommendation:
Migrate to Gemini 3 for new projects. For existing projects, migrate gradually and test thoroughly.
Here's what you need to build with Gemini 3 effectively.
Python SDK:
pip install google-genai
Documentation: ai.google.dev/gemini-api/docs
JavaScript/Node SDK:
npm install @google/genai
REST API:
Direct HTTP access for any language. Full reference at Google AI docs.
Community libraries:
Google AI Studio:
Free prototyping environment. Test prompts, compare models, generate API keys. Essential for development.
Apidog:
API testing tool with Gemini support. Handles multimodal inputs, inspects responses, debugs endpoints efficiently.
Debugging techniques:
Log token usage:
print(f"Input: {response.usage_metadata.prompt_token_count}")
print(f"Output: {response.usage_metadata.candidates_token_count}")
Inspect thought process:
In AI Studio, enable "Show reasoning" to see how Gemini 3 thinks through problems.
Test with different thinking levels:
Try the same prompt with low and high to understand the trade-offs.
Official documentation:
Gemini 3 Cookbook:
Interactive Colab notebooks with working examples:
Community forums:
Gemini 3 is just the beginning of this model family.
What it is:
Enhanced reasoning mode with even deeper thought chains. Achieves 93.8% on GPQA Diamond and 45.1% on ARC-AGI.
Status:
Currently undergoing additional safety evaluations. Expected release: within weeks of launch.
Who gets access first:
Google AI Ultra subscribers.
Use cases: the hardest reasoning problems, such as research-level math and science or complex planning, where maximum reasoning depth matters more than latency or cost.
Expected releases:
Gemini 3 Flash:
Optimized for speed and cost. Lower reasoning depth but much faster responses. Good for:
Gemini 3 Ultra:
Potentially even more capable than Pro, though not yet announced.
Official channels:
Developer updates:
Subscribe to the Gemini API mailing list in Google AI Studio for release announcements, new features, and breaking changes.
Community:
Join the Google AI Developer Forum for discussions, tips, and early feature testing opportunities.
Gemini 3 represents a significant leap in what's possible with AI.
The combination of state-of-the-art reasoning, multimodal understanding, and built-in agentic capabilities makes it Google's most capable model yet. But capability means nothing if you don't use it correctly.
The key takeaways:
Use thinking_level low for simple tasks and high for complex reasoning
Keep temperature at the default of 1.0
Match media resolution to your content type (high for images, medium for PDFs, low for most video)
Let the official SDKs handle thought signatures for you
Pick one current project using an older model. Set up a Gemini 3 API key, make one API call with your actual use case, and compare the results.
You'll immediately see where Gemini 3 excels and where you might need to adjust your approach. That hands-on experience is worth more than reading any guide.
Start building with Gemini 3 today and see what you can create with Google's most intelligent model.

