OpenAI Releases GPT-5.2 Model Surpassing Human Experts In Knowledge Work

OpenAI’s latest artificial intelligence model challenges the boundaries of professional productivity by outperforming human specialists across diverse tasks. GPT-5.2 achieves state-of-the-art results in benchmarks measuring spreadsheet creation, code generation, and long-context analysis. The release intensifies competition with Google’s Gemini 3, narrowing the performance gap in foundational AI capabilities.

Engineers at OpenAI accelerated development following an internal directive to prioritize core advancements over peripheral projects. The model deploys in three variants: Instant for routine queries, Thinking for structured reasoning, and Pro for high-complexity domains. Rollout begins immediately for paid subscribers on Plus, Pro, Business, and Enterprise plans, with API access available under identifiers gpt-5.2 and gpt-5.2-pro.

GPT-5.2 Thinking scores 70.9 percent on GDPval, a benchmark evaluating 44 occupations in knowledge work such as presentations and data artifacts. This marks the first instance of an OpenAI model matching or exceeding expert human performance on these evaluations. The system handles up to 256,000 tokens in context, enabling near-perfect accuracy on multi-needle retrieval tasks.

Software engineering benchmarks reveal further gains. On SWE-Bench Verified, GPT-5.2 attains 80.0 percent resolution for coding issues, while SWE-Bench Pro reaches 55.6 percent. Scientific reasoning improves to 92.4 percent on GPQA Diamond without tools and 88.7 percent on CharXiv with Python execution. In mathematics, it solves 100 percent of AIME 2025 problems and 40.3 percent of FrontierMath Tier 1-3 challenges.

Vision capabilities halve error rates in chart interpretation and user interface navigation compared to predecessors. Tool-calling accuracy hits 98.7 percent on Tau2-bench Telecom, supporting agentic workflows that integrate spreadsheets, slideshows, and external APIs. Factuality rises with 30 percent fewer hallucinations in generated outputs.

Pricing reflects the model’s frontier positioning. GPT-5.2 costs $1.75 per million input tokens and $14 per million output tokens via API, with 90 percent discounts on cached prompts. The Pro variant commands $21 per million inputs and $168 per million outputs. These rates position it below competitors like Anthropic’s Claude 3.5 while exceeding GPT-5.1.

Availability extends to ChatGPT interfaces on web, iOS, and Android, with free-tier access deferred to the following day. Developers integrate it through Codex for optimized code tasks, without plans to deprecate prior models like GPT-5.1 for three months. OpenAI commits to addressing latency and over-refusal issues in upcoming iterations.

Partner feedback underscores practical impacts. Jeff Wang, CEO of Windsurf, states, “GPT-5.2 represents the biggest leap for GPT models in agentic coding since GPT-5 and is a SOTA coding model in its price range.” AJ Orbach, CEO of Triple Whale, adds, “We collapsed a fragile, multi-agent system into a single mega-agent with 20+ tools… It feels like pure magic.”

The model’s abstract reasoning prowess shines on ARC-AGI-1 at 86.2 percent and ARC-AGI-2 at 52.9 percent, verified editions. These scores signal progress toward general intelligence, though OpenAI emphasizes ongoing refinements for reliability. Integration with enterprise tools promises streamlined operations in sectors from finance to research.

As adoption scales, GPT-5.2 equips users with autonomous agents for end-to-end project execution. Early tests demonstrate reduced dependency on multi-step prompts, collapsing complex systems into single interactions. This shift redefines AI’s role from assistive to operational, potentially reshaping workflows in knowledge-intensive industries.

OpenAI Releases GPT-5.2 Model Surpassing Human Experts in Knowledge Work

Google Disco: A Radical AI Experiment Transforming the Web into Instant Apps

Alibaba Reports 5% Revenue Growth Driven by AI and Cloud Surge

Google’s Antigravity AI Wipes User Drive During Routine Task

Take Two Boss Strauss Zelnick Isn’t Afraid of AI Games: “It’s backward-looking.”

Nvidia Invests $2 Billion in Synopsys to Accelerate AI Chip Design Tools

Google Launches Gemini 3 Flash as Default Model

Similar Posts