Agents-A1 claims trillion-parameter-level performance with a 35B model by extending agent horizon

A new arXiv preprint says Agents-A1, a 35B Mixture-of-Experts agentic model, can reach trillion-parameter-level performance by changing how it is trained: scaling long agent trajectories and heterogeneous abilities rather than simply adding parameters.

The paper, posted on June 29, 2026, describes a model that scales along two axes: longer trajectories and heterogeneous agent abilities. The authors say the system builds on a long-horizon knowledge-action infrastructure that connects external knowledge, actions, observations, and verifier outcomes, producing agentic trajectories that average 45,000 tokens.
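To make the idea of a trajectory that links knowledge, actions, observations, and verifier outcomes more concrete, here is a minimal Python sketch of how such a record might be represented. The class names, field names, and token counts are hypothetical illustrations and are not taken from the preprint or its code release.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Step:
    """One turn of a hypothetical knowledge-action loop (illustrative only)."""
    knowledge: str    # external knowledge retrieved for this step
    action: str       # tool call or command the agent issued
    observation: str  # what the environment returned
    tokens: int       # tokens consumed by this step


@dataclass
class Trajectory:
    """A sequence of steps plus an optional verifier outcome."""
    steps: List[Step] = field(default_factory=list)
    verifier_passed: Optional[bool] = None  # None means not yet verified

    def total_tokens(self) -> int:
        return sum(step.tokens for step in self.steps)


def average_length(trajectories: List[Trajectory]) -> float:
    """Mean token count per trajectory."""
    return mean(t.total_tokens() for t in trajectories)


# Toy data: the token counts below are made up for the example.
short = Trajectory(
    steps=[Step("doc A", "search", "3 hits", 15_000),
           Step("doc B", "run_code", "ok", 25_000)],
    verifier_passed=True,
)
long = Trajectory(
    steps=[Step("doc C", "browse", "page text", 50_000)],
    verifier_passed=False,
)
print(average_length([short, long]))
```

A record like this keeps the verifier outcome next to the steps that produced it, which is one plausible way to filter or weight trajectories during training.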

The work is a research claim, not an independently confirmed result. The authors say the model is highly competitive across a set of long-horizon and reasoning-focused evaluations, including SEAL-0, IFBench, HiPhO, FrontierScience-Olympiad, MolBench-Bind, SciCode, HLE, and BrowseComp.

What the paper claims

According to the preprint, Agents-A1 is designed to show that scaling the structure and length of agent trajectories can narrow the gap between much smaller systems and much larger frontier models on selected tasks.

The paper reports leading results on SEAL-0 (56.4), IFBench (80.6), HiPhO (46.4), FrontierScience-Olympiad (79.0), and MolBench-Bind (56.8). It describes the model as competitive on SciCode (44.3), HLE (47.6), and BrowseComp (75.5).

The authors describe a three-stage training recipe: full-domain supervised fine-tuning, domain-level teacher models, and multi-teacher domain-routed on-policy distillation with salient vocabulary alignment. They also say six heterogeneous domains were unified into one deployable student model.
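The third stage can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: it assumes the distillation signal is a per-token reverse KL divergence between the student and the teacher for the sample's domain, computed on tokens the student itself generated. The function names and the toy two-word vocabulary are hypothetical, and the sketch leaves out salient vocabulary alignment, which this summary of the paper does not define.

```python
import math
from typing import Callable, Dict, List

Dist = List[float]  # probabilities over a toy vocabulary
# A teacher scores a student-sampled token sequence, one distribution per position.
Teacher = Callable[[List[int]], List[Dist]]


def reverse_kl(student: Dist, teacher: Dist, eps: float = 1e-12) -> float:
    """KL(student || teacher) at one token position (assumed loss, see lead-in)."""
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student, teacher)
        if s > 0.0
    )


def routed_distill_loss(
    domain: str,
    sampled_tokens: List[int],
    student_dists: List[Dist],
    teachers: Dict[str, Teacher],
) -> float:
    """Route an on-policy sample to its domain teacher and average per-token KL."""
    teacher_dists = teachers[domain](sampled_tokens)  # domain routing
    per_token = [reverse_kl(s, t) for s, t in zip(student_dists, teacher_dists)]
    return sum(per_token) / len(per_token)


# Toy teachers with fixed preferences; real teachers would be domain-tuned models.
teachers: Dict[str, Teacher] = {
    "physics": lambda toks: [[0.5, 0.5] for _ in toks],
    "chemistry": lambda toks: [[0.9, 0.1] for _ in toks],
}
student_sample = [0, 1]                       # tokens the student generated
student_dists = [[0.5, 0.5], [0.5, 0.5]]      # student's distribution at each step
print(routed_distill_loss("physics", student_sample, student_dists, teachers))
print(routed_distill_loss("chemistry", student_sample, student_dists, teachers))
```

The design point the sketch tries to capture is that each sample is graded only by the teacher for its own domain, so a single student can absorb several specialists without averaging their conflicting preferences.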

Why it matters

If the results hold up, the paper would add evidence that long-horizon agent training can be a practical scaling path even when parameter counts stay relatively modest. That matters for teams exploring ways to improve agent performance without relying only on ever-larger base models.

The claim should still be treated as a research result pending broader validation. The current evidence comes from the authors' own paper and release package, so the key open question is whether the reported gains are reproducible and whether they transfer beyond the benchmarks named in the preprint.

What comes next

The research team has linked the release to an official project page, a code repository, and a Hugging Face model card. The next questions are whether the authors publish fuller details on training data and evaluation setup, and whether independent testing confirms the benchmark results.

For now, Agents-A1 is presented as a 35B agentic model that tries to win by extending the horizon of its reasoning and action loops, not by scaling parameter count alone.
