Anthropic unveils Claude Opus 4.6 for deeper coding work

Thu, 12th Feb 2026

Anthropic has introduced Claude Opus 4.6, the latest version of its flagship language model, with upgrades focused on coding, long-context reasoning and enterprise productivity tools.

The model adds a 1M-token context window in beta, the first time an Opus-class model has supported input at that scale.

Coding focus

Anthropic said Claude Opus 4.6 improves on its predecessor's coding performance by planning more carefully, sustaining agentic tasks for longer periods and operating more reliably in large codebases.

The model is designed to handle code review and debugging with greater accuracy, including identifying its own mistakes.

On Terminal-Bench 2.0, an agentic coding evaluation, the model achieved the highest score among the systems tested. It also led on Humanity's Last Exam, a multidisciplinary reasoning test, and outperformed OpenAI's GPT-5.2 by around 144 Elo points on GDPval-AA, which measures performance on economically valuable knowledge-work tasks.
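An Elo gap is easier to read as a head-to-head win rate. The article does not say how GDPval-AA computes its ratings. Assuming the standard logistic Elo formula with a 400-point scale, a lead of about 144 points corresponds to an expected win rate of roughly 70%:

```latex
E = \frac{1}{1 + 10^{-\Delta/400}}, \qquad \Delta = 144
\;\Rightarrow\;
E = \frac{1}{1 + 10^{-0.36}} \approx \frac{1}{1 + 0.437} \approx 0.70
```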

Long context

Anthropic positioned the expanded context window as a response to "context rot", where model performance degrades as conversations grow longer.

On the 8-needle 1M variant of MRCR v2, a long-context retrieval benchmark, Claude Opus 4.6 scored 76%, compared with 18.5% for Claude Sonnet 4.5. The company described this as a qualitative shift in how much context a model can use while maintaining performance.

The model also showed improved retrieval across large document sets and better tracking of information over hundreds of thousands of tokens, according to Anthropic's published evaluation results.

Agent controls

Several new controls have been added to the Claude Developer Platform.

Adaptive thinking allows the model to determine when deeper reasoning is required, rather than relying on a binary setting for extended thinking.

Developers can now choose among four effort levels: low, medium, high and max.

Context compaction, available in beta, summarises older conversation history when a session approaches its token limit.
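Anthropic has not published how compaction works internally. The function below is only a rough illustration of the idea. It sums per-message token counts. Once the total crosses a chosen fraction of the limit, it replaces all but the most recent turns with a single summary produced by a caller-supplied `summarize` function. The names `Message`, `compact`, `threshold` and `keep_recent` are hypothetical and are not part of the Claude Developer Platform API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    # Hypothetical message record; token counts are assumed to be precomputed.
    role: str
    text: str
    tokens: int


def compact(
    history: List[Message],
    limit: int,
    threshold: float,
    summarize: Callable[[List[Message]], Message],
    keep_recent: int = 4,
) -> List[Message]:
    """Illustrative compaction: summarise older turns once the session nears `limit` tokens."""
    total = sum(m.tokens for m in history)
    # Still comfortably under the limit, or too short to compact: leave history unchanged.
    if total < threshold * limit or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    # Collapse the older turns into one summary message and keep recent turns verbatim.
    return [summarize(older)] + recent
```

For example, take ten 100-token messages with a 1,000-token limit and a 0.8 threshold. The first six messages are folded into one summary and the last four are kept as they are.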

The model supports outputs of up to 128k tokens and offers US-only inference at 1.1 times standard token pricing.

Standard pricing remains at US$5 per million input tokens and US$25 per million output tokens, with premium pricing applied to prompts exceeding 200k tokens.
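The pricing figures above can be turned into a quick cost estimate. The sketch below is a hypothetical helper, `estimate_cost_usd`, that applies the stated standard rates and the 1.1x US-only multiplier. It assumes the multiplier applies to both input and output tokens. The article does not give the premium rate for prompts over 200k tokens, so the function refuses those inputs rather than guess. It also rejects outputs above the stated 128k-token ceiling, which it treats as 128,000 tokens (an assumption).

```python
# Rates as stated in the article; the premium long-prompt rate is not given, so it is not modelled.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0
US_ONLY_MULTIPLIER = 1.1
PREMIUM_THRESHOLD = 200_000   # prompts above this use an unstated premium rate
MAX_OUTPUT_TOKENS = 128_000   # assumption: "128k" read as 128,000


def estimate_cost_usd(input_tokens: int, output_tokens: int, us_only: bool = False) -> float:
    """Hypothetical estimate of one request's cost at standard Opus 4.6 rates."""
    if input_tokens > PREMIUM_THRESHOLD:
        raise ValueError("premium long-prompt rate is not specified; cannot estimate")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the stated 128k-token maximum")
    cost = (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    if us_only:
        # Assumption: the 1.1x multiplier applies to the whole token bill.
        cost *= US_ONLY_MULTIPLIER
    return cost
```

For example, a 100,000-token prompt with a 10,000-token response costs US$0.75 at standard rates (0.1 × $5 + 0.01 × $25). With US-only inference the same request costs about US$0.825.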

Office tools

Anthropic has also extended Claude's integration with office software.

Claude in Excel has been upgraded to handle longer-running and multi-step tasks, including ingesting unstructured data and inferring structure without explicit guidance.

Claude in PowerPoint is available in research preview for Max, Team and Enterprise plans. The model can read slide layouts, fonts and masters to maintain formatting consistency when generating presentations.

Within Claude Code, users can assemble agent teams that work in parallel on independent tasks, such as codebase reviews.

Safety testing

Anthropic said the performance gains do not reduce safety.

In its automated behavioural audit, Claude Opus 4.6 showed a low rate of misaligned behaviours such as deception, sycophancy, encouragement of user delusions and cooperation with misuse.

The company said the model is as well-aligned as Claude Opus 4.5 and records the lowest rate of over-refusals among recent Claude models.

Anthropic also developed six new cybersecurity probes to detect harmful responses, citing the model's enhanced cybersecurity capabilities. As part of its cyberdefensive work, the company is also using the model to identify and patch vulnerabilities in open-source software.

Claude Opus 4.6 is available through Anthropic's web interface, API and major cloud platforms.
