Claude Opus 4.5 beats Gemini 3 Pro, GPT 5.1 in key coding test, uses up to 65% fewer tokens

Anthropic has released Claude Opus 4.5, its most powerful AI model so far, designed to excel at coding, autonomous agents, and very long, complex tasks while using dramatically fewer tokens than earlier versions. The model is already live in the Claude app and API, and on major cloud platforms such as Google Vertex AI and Amazon Bedrock, targeting enterprise-scale workflows at roughly one-third the cost of the previous Opus 4.1 tier.

26/11/2025, 3:34 pm

Palpal News Network

Key points

Claude Opus 4.5 is Anthropic’s new flagship model, positioned as its best system for coding, agents, computer use and office workflows, and it fully replaces Opus 4.1 in the product line.
Anthropic says Opus 4.5 can complete long-horizon coding tasks with up to 65% fewer tokens than previous Claude 4.x models, thanks to improved internal planning and a new “effort” control in the API.
The model scored 80.9% on SWE-bench Verified, a benchmark of real-world software engineering tasks, ahead of GPT 5.1 Codex Max at 77.9% and Gemini 3 Pro at 76.2%.
Opus 4.5 is optimised for long-context use, with a 200k-token context window and 64k-token outputs, and can generate 10–15 page narratives in a consistent tone, a notable upgrade for long-form content.
New tool and agent features include more efficient tool loading, tool search and multi agent coordination, which can cut tool related context usage by around 80–85% in Anthropic’s internal tests.
Anthropic and its cloud partners advertise Opus 4.5 as delivering Opus level intelligence at about one third of the previous Opus 4.1 price for many enterprise workloads.
The model is available via the Claude website and API, and on Amazon Bedrock, Google Cloud Vertex AI and other partner platforms, with integrations rolling out into products like Notion Agent and enterprise coding tools.

Anthropic describes Claude Opus 4.5 as a clear step up from Sonnet 4.5 and Opus 4.1, tuned specifically for difficult, multi-step reasoning and software engineering work across very large codebases. The system maintains a 200k token context and a 64k token output limit, allowing it to ingest entire repositories, long specification documents, or multi-chapter drafts in a single session.
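To make the context figures concrete, here is a minimal Python sketch of a budget check before sending a large batch of files in one session. The 200k and 64k limits come from the article; the four-characters-per-token ratio is a rough heuristic rather than a real tokenizer, the function names are hypothetical, and the sketch assumes that reserved output tokens count against the same window.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # context window quoted in the article
MAX_OUTPUT_TOKENS = 64_000       # output limit quoted in the article
CHARS_PER_TOKEN = 4              # assumption: crude heuristic, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a string (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents, reserved_output=MAX_OUTPUT_TOKENS):
    """Return (fits, estimated_input_tokens, input_budget).

    Assumption: the space reserved for the model's output is subtracted
    from the context window to get the budget left for input.
    """
    used = sum(estimate_tokens(doc) for doc in documents)
    budget = CONTEXT_WINDOW_TOKENS - reserved_output
    return used <= budget, used, budget


if __name__ == "__main__":
    repo_files = ["x" * 120_000, "y" * 200_000]  # stand-ins for file contents
    fits, used, budget = fits_in_context(repo_files)
    print(f"fits={fits} used~{used} budget={budget}")
```

In practice a real tokenizer or the provider's own token counting would replace the heuristic; the point is only that input and expected output have to be budgeted together.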

Based on its internal testing, Anthropic reports that Opus 4.5 handles “long-horizon” coding problems more efficiently than any previous Claude model, combining higher pass rates on held-out tests with up to 65% fewer tokens used on the same task, which directly lowers API bills. A new “effort” parameter on the Claude API lets developers choose between faster, cheaper responses and deeper step-by-step reasoning; at medium effort, the company says, the model matches the best scores of earlier models while sharply cutting output tokens.

For long form writing and content workflows, Anthropic says Opus 4.5 can reliably produce 10–15 page chapters or reports with stable structure and tone, something earlier models struggled to maintain over very long outputs. This makes it attractive for drafting technical documentation, policy reports, financial analysis and other narrative documents where consistency matters as much as creativity.

On the “agentic” side, Opus 4.5 has been trained to manage multi-step workflows, tool orchestration, and multi-agent coordination. One example is refactoring separate codebases while directing several specialised agents in parallel. The new tool-calling stack introduces tool search and tool use examples and loads only the tools that are actually needed, which Amazon and Anthropic say can shrink tool-related context by around 80–85% in realistic enterprise setups.
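The idea of loading only the tools a task needs can be illustrated with a small Python sketch. This is not Anthropic's implementation: the `Tool` class, the keyword-overlap search, and the token sizes are all hypothetical, chosen only to show why on-demand loading reduces the context spent on tool definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    schema_tokens: int  # hypothetical size of the full tool definition


def search_tools(registry, query, limit=3):
    """Return up to `limit` tools whose description shares words with the query."""
    terms = set(query.lower().split())
    scored = []
    for tool in registry:
        overlap = len(terms & set(tool.description.lower().split()))
        if overlap > 0:
            scored.append((overlap, tool))
    # Stable sort: the best overlap comes first, ties keep registry order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


def context_saving(registry, selected):
    """Fraction of tool-definition tokens avoided by loading only `selected`."""
    total = sum(tool.schema_tokens for tool in registry)
    loaded = sum(tool.schema_tokens for tool in selected)
    return 1 - loaded / total


REGISTRY = [  # made-up tools and sizes, for illustration only
    Tool("jira_create", "create a jira ticket", 1000),
    Tool("git_diff", "show a git diff for a branch", 1000),
    Tool("sql_query", "run a sql query on the warehouse", 1000),
    Tool("send_email", "send an email message", 1000),
    Tool("calendar_book", "book a calendar meeting", 1000),
]

if __name__ == "__main__":
    selected = search_tools(REGISTRY, "jira ticket")
    print([tool.name for tool in selected], context_saving(REGISTRY, selected))
```

The saving in a real deployment depends on how many tools are registered and how large their definitions are; the numbers here are placeholders, not measurements.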

Benchmarks, pricing, and where to use it

On benchmarks, Claude Opus 4.5 sets a new high-water mark on SWE-bench Verified, scoring 80.9% and becoming the first model to cross the 80% line on this real-world coding challenge. Public comparisons show GPT 5.1 Codex Max at 77.9% and Gemini 3 Pro at 76.2% on the same test, with analysis sites also noting strong results for Opus 4.5 on terminal-style coding and agent-based evaluations.

Cloud partners state that Opus 4.5 offers similar or better performance than Opus 4.1 at roughly one-third of the earlier Opus price tier for many enterprise use cases, largely because fewer tokens are consumed per successful solution. That combination of higher accuracy, long context and lower per-task cost is aimed squarely at large customers who want to automate software maintenance, document generation, back-office processes and data analysis at scale.
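How token count and per-token price combine into a per-task cost can be sketched in a few lines of Python. The prices and token counts below are hypothetical placeholders, not published rates, and the article does not say whether the "one-third" figure already includes the token savings, so the combined ratio is only what would follow if the two effects applied independently.

```python
def cost_per_task(tokens: int, price_per_million: float) -> float:
    """Cost of one task given tokens used and a price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def relative_cost(token_reduction: float, price_ratio: float) -> float:
    """New cost as a fraction of the old cost.

    token_reduction: fraction of tokens saved (0.65 means 65% fewer tokens).
    price_ratio: new per-token price divided by old per-token price.
    Assumption: both effects apply independently and multiply.
    """
    return (1 - token_reduction) * price_ratio


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 tokens at 30.0 units per million before,
    # 65% fewer tokens at one-third of the price after.
    old = cost_per_task(1_000_000, 30.0)
    new = cost_per_task(350_000, 10.0)
    print(f"old={old:.2f} new={new:.2f} ratio={new / old:.3f}")
    print(f"relative_cost={relative_cost(0.65, 1 / 3):.3f}")
```

If the one-third figure is itself mostly a result of lower token use, as the partners' wording suggests, multiplying the two would double-count the saving, so the ratio should be read as a lower bound on cost under these assumptions.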

Claude Opus 4.5 is already available in the Claude app and developer API, and is generally available on Amazon Bedrock and Google Vertex AI. Partner announcements also highlight multi-cloud availability, including Microsoft-aligned offerings. Anthropic notes that the 4.5 generation is intended to span whole product stacks, with Opus 4.5 as the top-tier production agent, Sonnet 4.5 for fast iteration, and Haiku 4.5 for lightweight or free-tier usage, giving companies a full ladder of models for different workloads.
