Anthropic’s new Claude Opus 4.5 is the #2 most intelligent model in the Artificial Analysis Intelligence Index, narrowly behind Google’s Gemini 3 Pro and tying OpenAI’s GPT-5.1 (high) Claude Opus 4.5 delivers a substantial intelligence uplift over Claude Sonnet 4.5 (+7 points on the Artificial Analysis Intelligence Index) and Claude Opus 4.1 (+11 points), establishing it as @AnthropicAI's new leading model. Anthropic has dramatically cut per-token pricing for Claude Opus 4.5 to $5/$25 per million input/output tokens. However, compared to the prior Claude Opus 4.1 model it used 60% more tokens to complete our Intelligence Index evaluations (48M vs. 30M). This translates to a substantial reduction in the cost to run our Intelligence Index evaluations from $3.1k to $1.5k, but not as significant as the headline price cut implies. Despite Claude Opus 4.5 using substantially more tokens to complete our Intelligence Index, the model still cost significantly more than other models including Gemini 3 Pro (high), GPT-5.1 (high), and Claude Sonnet 4.5 (Thinking), and among all models only cost less than Grok 4 (Reasoning). Key benchmarking takeaways: ➤ 🧠 Anthropic’s most intelligent model: In reasoning mode, Claude Opus 4.5 scores 70 on the Artificial Analysis Intelligence Index. This is a jump of +7 points from Claude Sonnet 4.5 (Thinking), which was released in September 2025, and +11 points from Claude Opus 4.1 (Thinking). Claude Opus 4.5 is now the second most intelligent model. It places ahead of Grok 4 (65) and Kimi K2 Thinking (67), ties GPT-5.1 (high, 70), and trails only Gemini 3 Pro (73). Claude Opus 4.5 (Thinking) scores 5% on CritPt, a frontier physics eval reflective of research assistant capabilities. It sits only behind Gemini 3 Pro (9%) and ties GPT-5.1 (high, 5%) ➤ 📈 Largest increases in coding and agentic tasks: Compared to Claude Sonnet 4.5 (Thinking), the biggest uplifts appear across coding, agentic tasks, and long-context reasoning, including LiveCodeBench (+16 p.p.), Terminal-Bench Hard (+11 p.p.), 𝜏²-Bench Telecom (+12 p.p.), AA-LCR (+8 p.p.), and Humanity's Last Exam (+11 p.p.). Claude Opus achieves Anthropic’s best scores yet across all 10 benchmarks in the Artificial Analysis Intelligence Index. It also earns the highest score on Terminal-Bench Hard (44%) of any model and ties Gemini 3 Pro on MMLU-Pro (90%) ➤ 📚 Knowledge and Hallucination: In our recently launched AA-Omniscience Index, which measures embedded knowledge and hallucination of language models, Claude Opus 4.5 places 2nd with a score of 10. It sits only behind Gemini 3 Pro Preview (13) and ahead of Claude Opus 4.1 (Thinking, 5) and GPT-5.1 (high, 2). Claude Opus 4.5 (Thinking) scores the second-highest accuracy (43%) and has the 4th-lowest hallucination rate (58%), trailing only Claude Haiku (Thinking, 26%), Claude Sonnet 4.5 (Thinking, 48%), and GPT-5.1 (high). Claude Opus 4.5 continues to demonstrate Anthropic’s leadership in AI safety with a lower hallucination rate than select other frontier models such as Grok 4 and Gemini 3 Pro ➤ ⚡ Non-reasoning performance: In non-reasoning mode, Claude Opus 4.5 scores 60 on the Artificial Analysis Intelligence Index and is the most intelligent non-reasoning model. It places ahead of Qwen3 Max (55), Kimi K2 0905 (50), and Claude Sonnet 4.5 (50) ➤ ⚙️ Token efficiency: Anthropic continues to demonstrate impressive token efficiency. It has improved intelligence without a significant increase in token usage (compared to Claude Sonnet 4.5, evaluated with a maximum reasoning budget of 64k tokens). Claude Opus 4.5 uses 48M output tokens to run the Artificial Analysis Intelligence Index. This is lower than other frontier models, such as Gemini 3 Pro (high, 92M), GPT-5.1 (high, 81M), and Grok 4 (Reasoning, 120M) ➤ 💲 Pricing: Anthropic has reduced the per-token pricing of Claude Opus 4.5 compared to Claude Opus 4.1. Claude Opus 4.5 is priced at $5/$25 per 1M input/output tokens (vs. $15/$75 for Claude Opus 4.1). This positions it much closer to Claude Sonnet 4.5 ($3/$15 per 1M tokens) while offering higher intelligence in thinking mode Key model details: ➤ 📏 Context window: 200K tokens ➤ 🪙 Max output tokens: 64K tokens ➤ 🌐 Availability: Claude Opus 4.5 is available via Anthropic‘s API, Google Vertex, Amazon Bedrock and Microsoft Azure. Claude Opus 4.5 is also available via Claude app and Claude Code
Claude Opus 4.5 (Thinking) takes the #2 spot on the Artificial Analysis Omniscience Index, our new benchmark for measuring knowledge and hallucination across domains. Claude Opus 4.5 (Thinking) comes in second for both Omniscience Index (our lead metric that takes off points for incorrect answers) and Omniscience Accuracy (percentage correct), offering a balance of high accuracy and low hallucination rate compared to peer models.
Individual results across all benchmarks in our Artificial Analysis Intelligence Index. We have run all these benchmarks independently and like-for-like across all models.
While Claude 4.5 Opus is significantly more token efficient than nearly all other reasoning models, it did use more ~50% more token than Claude 4.1 Opus. Further, given its relatively high pricing, Claude 4.5 Opus is amongst the most expensive to run the Artificial Analysis Intelligence Index at ~$1.5k.
Compare Claude Opus 4.5 to other models on Artificial Analysis:
29,26 tys.
294
Treści na tej stronie są dostarczane przez strony trzecie. O ile nie zaznaczono inaczej, OKX nie jest autorem cytowanych artykułów i nie rości sobie żadnych praw autorskich do tych materiałów. Treść jest dostarczana wyłącznie w celach informacyjnych i nie reprezentuje poglądów OKX. Nie mają one na celu jakiejkolwiek rekomendacji i nie powinny być traktowane jako porada inwestycyjna lub zachęta do zakupu lub sprzedaży aktywów cyfrowych. Treści, w zakresie w jakim jest wykorzystywana generatywna sztuczna inteligencja do dostarczania podsumowań lub innych informacji, mogą być niedokładne lub niespójne. Przeczytaj podlinkowany artykuł, aby uzyskać więcej szczegółów i informacji. OKX nie ponosi odpowiedzialności za treści hostowane na stronach osób trzecich. Posiadanie aktywów cyfrowych, w tym stablecoinów i NFT, wiąże się z wysokim stopniem ryzyka i może podlegać znacznym wahaniom. Musisz dokładnie rozważyć, czy handel lub posiadanie aktywów cyfrowych jest dla Ciebie odpowiednie w świetle Twojej sytuacji finansowej.