Google has officially introduced Gemini 3.1 Pro, the first update in its Gemini 3 series, signaling a strategic focus on agentic artificial intelligence. The release targets reasoning stability, software development efficiency, and dependable tool use within AI systems.
This iteration signifies a notable shift in AI model design, moving beyond conversational interfaces to solutions capable of performing tangible work. Gemini 3.1 Pro is positioned as a foundational engine for autonomous agents designed to navigate file systems, execute code, and tackle intricate scientific challenges. Its reported success rates now rival, and in some instances surpass, those of other leading frontier models in the industry.
Vast Context, Precise Output
A key technical enhancement is its ability to manage large-scale data. Gemini 3.1 Pro Preview retains an impressive 1 million token input context window. For software engineers, this capacity means the model can process an entire medium-sized code repository, retaining sufficient 'memory' to comprehend cross-file dependencies without losing coherence.
Equally significant is the expanded 65,000 token output limit. This higher cap benefits developers generating long-form content, such as comprehensive technical manuals or multi-module Python applications: the model can now complete such large-scale tasks in a single interaction rather than running into the smaller output limits of earlier versions.
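As a rough planning aid, the limits above can be sanity-checked client-side before sending a request. This is a minimal sketch using the common ~4 characters-per-token heuristic; the ratio is an approximation for planning only, not the model's actual tokenizer, and real counts come from the API.

```python
# Heuristic: ~4 characters per token (an approximation, not the
# model's real tokenizer; authoritative counts come from the API).
CHARS_PER_TOKEN = 4
INPUT_WINDOW = 1_000_000   # Gemini 3.1 Pro Preview input context
OUTPUT_LIMIT = 65_000      # expanded output token limit

def estimate_tokens(text: str) -> int:
    """Crude token estimate for planning purposes."""
    return len(text) // CHARS_PER_TOKEN

def repo_fits(files: dict) -> bool:
    """Check whether a set of source files fits in the input window."""
    total = sum(estimate_tokens(src) for src in files.values())
    return total <= INPUT_WINDOW

files = {"main.py": "print('hello')\n" * 500}
print(repo_fits(files))  # True: a small repo fits easily
```

A real agent would refine this with per-file filtering (skipping binaries, vendored dependencies) before deciding what to include in context.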
Doubled Down on Advanced Reasoning
Building on the "Deep Thinking" capabilities introduced with Gemini 3.0, version 3.1 focuses on optimizing this cognitive efficiency. Performance gains across stringent benchmarks are clearly evident:
- ARC-AGI-2: Achieved 77.1%, demonstrating a superior ability to solve novel logic patterns.
- GPQA Diamond: Scored 94.1% for graduate-level scientific reasoning.
- SciCode: Registered 58.9% in Python programming for scientific computing.
- Terminal-Bench Hard: Reached 53.8% for agentic coding and terminal operations.
- Humanity’s Last Exam (HLE): Recorded 44.7% for reasoning near human thresholds.
The 77.1% score on ARC-AGI-2 is particularly noteworthy, reportedly representing more than double the reasoning performance of the original Gemini 3 Pro. This advancement suggests the model exhibits a reduced reliance on mere pattern matching from its training data, instead demonstrating a greater capacity to solve unfamiliar problems and novel edge cases.
Specialized Tools for Agentic Development
Google has also introduced gemini-3.1-pro-preview-customtools, a specialized endpoint tailored for developers integrating bash commands with bespoke functions. Prior versions sometimes struggled to prioritize tool usage, occasionally defaulting to web searches when a local file examination would have been more appropriate. This custom tools variant is specifically optimized to favor internal resources like view_file or search_code, enhancing its reliability for autonomous coding agents.
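Local-first tools such as view_file and search_code would be exposed to the model as function declarations. The sketch below assumes the standard OpenAPI-style JSON schema used for Gemini function calling; the field contents and parameter names are illustrative, and the exact wiring to the customtools endpoint is not shown.

```python
# Illustrative tool declarations in OpenAPI-style JSON schema form.
# Names and descriptions are examples; only the general declaration
# shape (name / description / parameters) follows documented
# function-calling conventions.
view_file = {
    "name": "view_file",
    "description": "Read a file from the local workspace.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Workspace-relative path."},
        },
        "required": ["path"],
    },
}

search_code = {
    "name": "search_code",
    "description": "Search the repository for a pattern.",
    "parameters": {
        "type": "object",
        "properties": {
            "pattern": {"type": "string", "description": "Regex to search for."},
        },
        "required": ["pattern"],
    },
}

tools = [view_file, search_code]
print([t["name"] for t in tools])  # ['view_file', 'search_code']
```

Declaring local tools explicitly like this is what lets the customtools variant prefer a file read over a web search when both could answer the prompt.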
Integration with Google Antigravity, the company's new agentic development platform, is another key feature. Developers can now select a 'medium' thinking level, allowing flexible management of the reasoning budget: high-depth thinking for complex debugging scenarios, and medium or low levels for standard API calls, optimizing latency and operational costs.
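The budget-management idea can be expressed as a simple routing policy. This is a sketch only: the level names (low/medium/high) follow the article, but the task categories and how a level maps onto the API's actual configuration field are assumptions for illustration.

```python
# Sketch of a reasoning-budget policy: choose a thinking level per task.
# Task categories are illustrative; the low/medium/high names follow
# the article's description of Antigravity's thinking levels.
def thinking_level(task: str) -> str:
    if task in {"complex_debugging", "architecture_review"}:
        return "high"    # spend more reasoning on hard problems
    if task in {"code_review", "refactor"}:
        return "medium"  # moderate depth for routine engineering work
    return "low"         # cheap, low-latency default for standard calls

print(thinking_level("complex_debugging"))  # high
print(thinking_level("summarize"))          # low
```

Routing like this keeps expensive deep reasoning reserved for the calls that actually need it, which is the latency/cost trade-off the platform is designed around.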
API Updates and Enhanced Data Handling
Developers working with the Gemini API should note a small but breaking change. Within the Interactions API v1beta, the field total_reasoning_tokens has been renamed total_thought_tokens. The new name aligns with the 'thought signatures' feature introduced with the Gemini 3 family, which represents the model's internal reasoning and is needed to maintain context across multi-turn agentic workflows.
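Code that reads this field can absorb the rename with a small compatibility shim. The helper below is a sketch: the two field names come from the article, but the surrounding usage-metadata shape (a plain dict here) is an assumption.

```python
# Compatibility shim for the v1beta rename: prefer the new
# total_thought_tokens field, but fall back to the old name so the
# same code works against responses from either version.
def thought_tokens(usage: dict) -> int:
    for key in ("total_thought_tokens", "total_reasoning_tokens"):
        if key in usage:
            return usage[key]
    return 0

old = {"total_reasoning_tokens": 128}
new = {"total_thought_tokens": 256}
print(thought_tokens(old), thought_tokens(new))  # 128 256
```

Checking the new name first means the shim keeps doing the right thing even if a response ever carries both fields during a transition.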
The model’s capacity for data ingestion has also expanded significantly:
- File Size Limit: The maximum API upload size has increased fivefold, from 20MB to 100MB.
- Direct YouTube Integration: Users can now provide a YouTube URL directly as a media source, allowing the model to process video content without requiring manual uploads.
- Cloud Data Access: Support has been added for Cloud Storage buckets and private database pre-signed URLs as direct data sources.
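The raised file size cap lends itself to a client-side pre-flight check, so oversized uploads fail fast before any bytes are sent. A minimal sketch; the helper name is illustrative, and 100MB is interpreted here as binary megabytes, which is an assumption.

```python
# Pre-flight check against the new 100MB API upload cap (was 20MB).
# 100 * 1024 * 1024 assumes binary megabytes; adjust if the limit is
# defined decimally.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024

def can_upload(size_bytes: int) -> bool:
    """Return True if a file of this size fits under the upload cap."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES

print(can_upload(25 * 1024 * 1024))   # True: 25MB now fits
print(can_upload(150 * 1024 * 1024))  # False: still over the cap
```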
The Economics of Advanced Intelligence
Pricing for Gemini 3.1 Pro Preview remains highly competitive. For prompts under 200,000 tokens, input processing costs are $2 per million tokens, with output priced at $12 per million. For contexts exceeding 200,000 tokens, these rates adjust to $4 for input and $18 for output per million tokens.
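The quoted rates translate into a straightforward cost estimate. One caveat: the article does not say whether the higher long-context rate applies to the whole prompt or only to tokens past 200,000, so this sketch assumes the whole prompt, which is the simpler reading.

```python
# Cost estimate from the quoted rates: $2/M input and $12/M output for
# prompts up to 200K tokens; $4/M and $18/M above that threshold.
# Assumes the higher rate applies to the entire prompt once crossed.
def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    if input_tokens <= 200_000:
        in_rate, out_rate = 2.0, 12.0
    else:
        in_rate, out_rate = 4.0, 18.0
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 50K-token prompt with a 10K-token reply:
print(estimate_cost_usd(50_000, 10_000))  # 0.22
```

At these rates, a full 1M-token context with the maximum 65K-token output would run a few dollars per call, which is why per-call budgeting like this matters for agentic workloads.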
Compared to rival models, Google is positioning Gemini 3.1 Pro as a leader in efficiency. Data from Artificial Analysis indicates that Gemini 3.1 Pro currently ranks highest on their Intelligence Index, reportedly achieving this performance at approximately half the operational cost of its closest frontier model competitors.
This article is a rewritten summary based on publicly available reporting. For the original story, visit the source.
Source: MarkTechPost