AI2 Unveils SERA: Open-Source AI Coding Agents Match Top Proprietary Systems Through Novel Supervised Learning

Revolutionizing AI-Powered Software Development

The Allen Institute for AI (AI2) recently introduced SERA (Soft Verified Efficient Repository Agents), a family of coding agents that AI2 reports can perform comparably to significantly larger, closed-source systems. SERA is trained exclusively with supervised fine-tuning on carefully generated synthetic agent trajectories. It is the first release in AI2’s 'Open Coding Agents' initiative.

At the core of the SERA family is SERA-32B, which is built on Qwen 3 32B and designed for repository-level coding tasks. On the SWE-bench Verified benchmark, SERA-32B achieves a 49.5 percent resolve rate with a 32K context window, rising to 54.2 percent with a 64K context window. These figures put it in direct competition with prominent open-weight systems, including the 24-billion-parameter Devstral-Small-2 and the much larger 110-billion-parameter GLM-4.5-Air. SERA is also fully open: its code, training data, and model weights are publicly available. The series currently comprises four models (SERA-8B, SERA-8B GA, SERA-32B, and SERA-32B GA), all released on Hugging Face under the Apache 2.0 license.

The Innovative Soft Verified Generation (SVG) Method

SERA's training methodology rests on a new approach called Soft Verified Generation (SVG). This pipeline generates agent trajectories that mimic authentic developer workflows. Instead of verifying results against a strict test suite, SVG uses the agreement between the patches produced by two independent rollouts as a 'soft' signal of correctness. The process unfolds in several stages:

Initial Rollout: The pipeline starts from a function in a real repository. A teacher model (GLM-4.6 in the SERA-32B configuration) is given a bug description or change request. It then uses developer tools to inspect files, modify code, and execute commands, producing a first trajectory (T1) and a corresponding code patch (P1).
Synthetic Pull Request: The system converts this trajectory into a pull-request-style description that summarizes the intended change and its key edits.
Second Rollout: The teacher model restarts from the original repository state, this time given only the synthetic pull request description and the same tools. It generates a new trajectory (T2) and patch (P2) that implement the described change.
Soft Verification: Patches P1 and P2 are compared line by line. A recall score (r) measures the proportion of modified lines in P1 that also appear in P2. A score of 1 signifies 'hard verification,' while values between 0 and 1 indicate 'soft verification.'
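
The recall step can be illustrated with a short Python sketch. It assumes both patches are unified diffs and that a "modified line" is any added (`+`) or removed (`-`) line, matched by exact text after trimming whitespace. The helper names `modified_lines` and `patch_recall` are hypothetical, and AI2's actual matching rules may differ.

```python
def modified_lines(patch: str) -> set[tuple[str, str]]:
    """Collect added/removed lines from a unified diff as (sign, text) pairs."""
    lines = set()
    for raw in patch.splitlines():
        # Skip file headers ('+++ b/...', '--- a/...'); they are not edits.
        if raw.startswith(("+++", "---")):
            continue
        if raw.startswith(("+", "-")):
            text = raw[1:].strip()
            if text:  # assumption: blank-line edits are ignored
                lines.add((raw[0], text))
    return lines


def patch_recall(p1: str, p2: str) -> float:
    """Fraction of P1's modified lines that also appear in P2 (hypothetical helper)."""
    reference = modified_lines(p1)
    if not reference:
        return 0.0  # assumption: an empty reference patch gives no evidence
    return len(reference & modified_lines(p2)) / len(reference)


# Usage: P2 reproduces P1's removed line but writes the new line differently.
P1 = """--- a/utils.py
+++ b/utils.py
@@ -1,2 +1,2 @@
 def area(r):
-    return 3.14 * r * r
+    return math.pi * r ** 2
"""
P2 = """--- a/utils.py
+++ b/utils.py
@@ -1,2 +1,2 @@
 def area(r):
-    return 3.14 * r * r
+    return math.pi * r * r
"""
print(patch_recall(P1, P2))  # 0.5
```

Here r = 0.5 would count as soft verification, while comparing P1 with itself gives r = 1.0, the hard-verification case.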

A key finding is that strict verification is not required. When models were trained on T2 trajectories filtered at different thresholds of r, including r = 0 (no filtering at all), performance on SWE-bench Verified was similar across thresholds for a fixed number of training samples. This suggests that realistic multi-step traces, even imperfect ones, provide valuable supervision for training effective coding agents.
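
A minimal sketch of this threshold experiment, assuming each T2 trajectory carries its recall score: filter on r, then draw the same number of samples at every threshold, so that only the filtering criterion varies. `Trajectory` and `select_training_set` are hypothetical names, not part of any released SERA code.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record: one second-rollout (T2) trajectory and its recall score r.
    messages: list[str]
    recall: float


def select_training_set(pool, threshold, n_samples, seed=0):
    """Keep trajectories with recall >= threshold, then draw a fixed-size sample."""
    eligible = [t for t in pool if t.recall >= threshold]
    if len(eligible) < n_samples:
        raise ValueError(f"only {len(eligible)} trajectories pass r >= {threshold}")
    rng = random.Random(seed)  # fixed seed so runs at different thresholds are comparable
    return rng.sample(eligible, n_samples)
```

With `threshold=0.0` every trajectory is eligible, which corresponds to the r = 0 setting described above.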

Efficiency, Scale, and Specialized Adaptation

The SVG pipeline was applied to 121 Python repositories from the SWE-smith corpus, producing more than 200,000 trajectories across several teacher-model runs. This makes it one of the largest open datasets built specifically for coding agents. SERA-32B was trained on a subset of 25,000 T2 trajectories from the Sera-4.6-Lite T2 dataset, using standard supervised fine-tuning with Axolotl on Qwen 3 32B for three epochs, with a learning rate of 1e-5, a weight decay of 0.01, and a maximum sequence length of 32,768 tokens. For trajectories longer than the context limit, an ordered truncation strategy that prefers slices fitting within the window produced better SWE-bench Verified scores than random truncation.
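
The article does not spell out how ordered truncation works. The sketch below encodes one plausible reading, stated as an assumption: score each trajectory by the fraction of its steps that fit in the 32,768-token window, fill the training set in descending order of that score so that fully fitting trajectories come first, and cut each selected trajectory after the last step that fits. Trajectories are represented only by hypothetical per-step token counts, and the function names are illustrative.

```python
MAX_TOKENS = 32_768  # maximum sequence length reported for SERA-32B training


def steps_that_fit(step_tokens, max_tokens=MAX_TOKENS):
    """Return how many leading steps fit within max_tokens."""
    total = 0
    for i, n in enumerate(step_tokens):
        if total + n > max_tokens:
            return i
        total += n
    return len(step_tokens)


def truncation_ratio(step_tokens, max_tokens=MAX_TOKENS):
    """Fraction of a trajectory's steps kept after truncation (1.0 = fits whole)."""
    if not step_tokens:
        return 1.0
    return steps_that_fit(step_tokens, max_tokens) / len(step_tokens)


def ordered_selection(trajectories, n_samples, max_tokens=MAX_TOKENS):
    """Pick n_samples trajectories, preferring those that lose the fewest steps."""
    # sorted() is stable, so ties keep their original order even with reverse=True.
    ranked = sorted(
        trajectories,
        key=lambda steps: truncation_ratio(steps, max_tokens),
        reverse=True,
    )
    # Cut each selected trajectory to the prefix of whole steps that fits.
    return [steps[: steps_that_fit(steps, max_tokens)] for steps in ranked[:n_samples]]
```

A random-truncation baseline would instead pick trajectories and cut points without regard to how much of each trajectory survives.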

AI2 reports that SERA-32B required approximately 40 GPU days in total, covering both data generation and training. A scaling-law analysis of dataset size versus performance estimates that, to reach the same SWE-bench performance, SVG is roughly 26 times cheaper than reinforcement-learning-based systems such as SkyRL-Agent and approximately 57 times cheaper than earlier synthetic data pipelines such as SWE-smith.
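
As a rough back-of-envelope illustration, assuming the reported cost ratios can be applied directly to SERA-32B's 40 GPU-day budget (the scaling-law estimate does not necessarily guarantee this), the implied cost of reaching the same performance with the other approaches can be computed as:

```python
SERA_32B_GPU_DAYS = 40  # reported total for data generation plus training

# Reported cost ratios from the scaling-law analysis, at equal SWE-bench performance.
COST_RATIOS = {"RL-based (SkyRL-Agent)": 26, "SWE-smith pipeline": 57}

# Assumption: each ratio scales SERA-32B's own budget linearly.
implied_gpu_days = {name: SERA_32B_GPU_DAYS * ratio for name, ratio in COST_RATIOS.items()}

for name, days in implied_gpu_days.items():
    print(f"{name}: ~{days:,} GPU days")
```

This gives roughly 1,040 GPU days for the RL-based route and 2,280 for the SWE-smith pipeline; these are order-of-magnitude figures only.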

The research also explored adapting agents to specific repositories. Experiments on three major SWE-bench Verified projects (Django, SymPy, and Sphinx) showed promising results. For each repository, SVG generated between 46,000 and 54,000 trajectories. Training on 8,000 trajectories per repository, a mix of soft-verified T2 and filtered T1 trajectories, produced specialized agents that matched or slightly surpassed the GLM-4.5-Air teacher model and compared favorably with Devstral-Small-2 on those codebases. For instance, the Django-specialized agent achieved a 52.23 percent resolve rate versus GLM-4.5-Air’s 51.20 percent, and the SymPy-specialized model reached 51.11 percent versus GLM-4.5-Air’s 48.89 percent.
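
The per-repository data mix can be sketched as follows. The article gives the total (8,000 trajectories per repository) but neither the split between soft-verified T2 and filtered T1 trajectories nor the T1 filtering rule, so `t2_fraction` is a hypothetical parameter, `build_repo_mix` is an illustrative name, and both pools are assumed to be filtered already.

```python
import random


def build_repo_mix(t2_pool, t1_pool, total=8_000, t2_fraction=0.5, seed=0):
    """Sample a fixed-size training set for one repository from two pre-filtered pools.

    t2_fraction is a hypothetical knob; the actual T2/T1 ratio is not reported.
    """
    rng = random.Random(seed)
    n_t2 = round(total * t2_fraction)
    n_t1 = total - n_t2
    if n_t2 > len(t2_pool) or n_t1 > len(t1_pool):
        raise ValueError("not enough trajectories in one of the pools")
    mix = rng.sample(t2_pool, n_t2) + rng.sample(t1_pool, n_t1)
    rng.shuffle(mix)  # interleave the two sources before fine-tuning
    return mix
```

With pools in the 46,000 to 54,000 range reported per repository, an 8,000-sample mix draws only a small fraction of the generated data.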

SERA's introduction marks a significant step in the development of AI coding agents. It shows that supervised learning on well-generated synthetic trajectories can produce competitive agents at a fraction of the cost of reinforcement learning, making high-performance, repository-level automation more accessible.
