Google shipped Gemini 3.1 Pro on February 19th, and the benchmark numbers grabbed me before the marketing copy did. ARC-AGI-2 score: 77.1%, more than doubling Gemini 3 Pro's 31.1%. That's not an incremental improvement. That's a fundamentally different model.

Dassi supports it today. If you have a Google AI Studio API key, you can plug it in and start running browser tasks with it right now.

What actually changed under the hood

The standout feature is a three-tier thinking system. Previous Gemini models gave you a binary toggle between low and high compute modes, which always felt like choosing between a quick guess and an expensive meditation, with no middle ground for the moderately complex tasks that fill most of an actual workday. Gemini 3.1 Pro adds a medium tier that lands right where most real work happens: complex enough to need reasoning, but not so demanding that you want the model burning 30,000 thinking tokens on a single request.

Google's thinking_level parameter controls this. Low for autocomplete-style speed. Medium for code review and moderately complex analysis. High for deep debugging where you want the model to slow down. If you don't specify, it defaults to high, which is generous if you're working on something hard and wasteful if you're not.
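To make the tiers concrete, here's a minimal sketch of how a request payload with an explicit thinking level might be built. The `thinkingConfig`/`thinkingLevel` field names and casing follow the general pattern of Google's Gemini REST API, but treat them, and the exact model ID, as assumptions to check against the current API reference rather than as a definitive implementation.

```python
import json

def build_request(prompt: str, thinking_level: str = "high") -> dict:
    """Build a generateContent-style payload with an explicit thinking level.

    Field names (generationConfig.thinkingConfig.thinkingLevel) are assumed
    from Google's documented pattern for Gemini 3 — verify against the docs.
    The default mirrors the API's own default of high.
    """
    if thinking_level not in ("low", "medium", "high"):
        raise ValueError(f"unknown thinking level: {thinking_level}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level}
        },
    }

# Medium is the sweet spot for something like a code review pass.
payload = build_request("Review this diff for off-by-one errors.", "medium")
print(json.dumps(payload, indent=2))
```

The point of wrapping this in a helper is that you set the tier deliberately per task instead of silently inheriting the expensive high default.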

Context window stays at 1M tokens. But the output ceiling jumped to 65,536 tokens, which matters more than it sounds, because Gemini 3 Pro had an infuriating habit of truncating code generation around 21,000 tokens. You'd get three-quarters of a response and then silence. That ceiling is gone.

The benchmarks

Some numbers, because they’re genuinely worth looking at. GPQA Diamond for scientific knowledge: 94.3%. SWE-Bench Verified for agentic coding: 80.6%, putting it a hair below Claude Opus 4.6’s 80.8%. MMMLU for multimodal understanding: 92.6%.

But speed tells a stranger story. Artificial Analysis clocked output at 104.7 tokens per second, solidly above the 71.6 median for comparable models. But time to first token sits at nearly 34 seconds, against a median of about 1.2 seconds for similar models. So you stare at a blank screen, then text floods in all at once. For browser automation tasks that require quick back-and-forth interactions, like form filling, clicking through navigation sequences, or extracting specific data points from a page, that 34-second wait before the model even starts responding is a real obstacle. For research-heavy analysis, where you're feeding in large pages and expecting synthesis, it's probably fine.

Connecting it to dassi

If you've used dassi with Claude or GPT before, setup is the same: grab a key from Google AI Studio, paste it into dassi's settings, select Gemini 3.1 Pro. Your key stays in your browser; nothing routes through our servers.

This is why BYOK matters. When a new model drops, you don’t wait for anyone to “add support” or negotiate a partnership deal. You switch. Or switch back if it doesn’t fit your tasks. We covered why model flexibility matters for browser agents a few weeks back, and every major release keeps reinforcing it.

Cost math

Pricing matches Gemini 3 Pro exactly: $2 per million input tokens under 200K context, $12 per million output tokens. That makes it effectively a free performance upgrade.

But the thinking tiers create a trap for the inattentive. A complex request on high mode can burn through 30,000+ thinking tokens at the output rate, costing around $0.36. That same request on low uses about 1,000 thinking tokens for $0.012. Thirty times cheaper. For browser sessions where you’re sending dozens of requests, that gap compounds damn fast.
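The arithmetic behind that gap is simple enough to sketch. Thinking tokens bill at the output rate, so:

```python
OUTPUT_RATE_PER_M = 12.0  # $12 per million output tokens; thinking tokens bill at this rate

def thinking_cost(tokens: int) -> float:
    """Dollar cost of a request's thinking tokens at the output rate."""
    return tokens * OUTPUT_RATE_PER_M / 1_000_000

high = thinking_cost(30_000)  # a complex request on high mode
low = thinking_cost(1_000)    # the same request on low mode

print(f"high: ${high:.3f}, low: ${low:.3f}, ratio: {high / low:.0f}x")
```

That's $0.36 versus $0.012 per request, and over a 50-request browser session the difference compounds to roughly $18 versus $0.60 in thinking tokens alone.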

Still in preview

Google hasn't moved this to general availability yet. Preview models can change in behavior, rate limits, or pricing before GA, so factor that in before wiring it into anything production-critical.

Worth trying anyway

The best way to find out if Gemini 3.1 Pro fits your workflow is to just try it. Swap the model, run your usual tasks, compare. Nobody can tell you which LLM works best for the specific stuff you do in your browser; a few minutes of actually testing it yourself will.