Jane Arandelovic

GPT-5.2: Why the boring work just got 11× faster

Published on 2025-12-14 · by Jane Arandelovic

While most people were already winding down for Christmas parties and summer holidays, OpenAI quietly released GPT-5.2, its latest model for professional knowledge work and AI agents. It is not a fun little toy update. It is clearly aimed at the spreadsheets, reports and project plans that keep businesses running.

Short version: your boring work just got a lot faster and cheaper to process.

In this post I am going to stay away from jargon and focus on what actually matters if you are a business owner, manager or AI-curious builder.


TL;DR for non-technical humans

Here is the quick summary:

  • 11× faster than experts on a big benchmark of real office work
  • Roughly 390× cheaper for some very hard reasoning tasks compared with last year’s frontier model
  • Can work across hundreds of pages at once without losing the thread
  • Produces around 30% fewer wrong answers on real user questions
  • Much better at understanding images, charts and software screens

Most of the upside is not in “write me a blog post”. It is in the boring but important work, like workforce planning, financial analysis and operations.


  1. Built for the boring but valuable work

OpenAI designed GPT-5.2 Thinking for economically valuable tasks, which is their polite way of saying real work, not just cute demos. On an evaluation called GDPval, which measures well-specified knowledge work across 44 occupations, GPT-5.2 Thinking beats or ties top industry professionals on 70.9% of comparisons.

These tasks include things like:

  • Workforce planning models
  • Accounting and finance spreadsheets
  • Schedules for hospitals and urgent care
  • Sales presentations and manufacturing diagrams

In other words, the kind of work that actually moves budgets and headcount around.

Even more interesting, on GDPval the model produces those outputs more than 11× faster and at under 1% of the cost of human experts.

That is the shift, from “help me write an email” to “help me build and maintain the spreadsheet that runs this part of the business”.


  2. The cost of deep reasoning has fallen off a cliff

The second big story is price.

The ARC Prize team, an independent group that tests how well models handle hard abstract reasoning, compared last year’s OpenAI o3 (High) with GPT-5.2 Pro (X High) on the ARC-AGI-1 benchmark.

  • About a year ago, ARC verified an unreleased o3 run at 88% on ARC-AGI-1 at an estimated US$4,500 per task
  • This year, they verified GPT-5.2 Pro (X High) at 90.5% accuracy for about US$11.64 per task

That is roughly a 390× efficiency improvement in a year.
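If you want to check that figure yourself, here is the arithmetic as a tiny sketch. The two dollar amounts are the per-task costs quoted above; the variable names are mine, purely for illustration.

```python
# Per-task costs as quoted above from the ARC Prize results.
o3_cost_per_task = 4500.00     # US$, o3 (High), estimated
gpt52_cost_per_task = 11.64    # US$, GPT-5.2 Pro (X High)

# How many times cheaper the newer run is per task.
ratio = o3_cost_per_task / gpt52_cost_per_task
print(f"Cost per task fell by about {ratio:.0f}x")
```

The division comes out at about 387, which is where the "roughly 390×" comes from. Note that this only compares cost per task; the accuracy also went up slightly, from 88% to 90.5%.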

If you like analogies, we have basically gone from the price of a small second-hand car per task to the price of a nice coffee per task for this kind of high-end reasoning.

Why this matters:

  • It becomes far more realistic to run long-running agents that think through complex problems step by step
  • You can afford to have an AI sit on a problem for longer, try multiple approaches, and still stay within a sensible budget
  • Workflows that looked too expensive to automate in 2024 start to look viable in 2026

  3. It can actually chew through huge documents

GPT-5.2 Thinking also improves at long-context reasoning, which is the ability to read and remember large amounts of text in one go.

On OpenAI’s internal MRCRv2 benchmark, it is the first model they have seen that reaches near 100% accuracy on a “four needle” test at 256k tokens, which roughly corresponds to hundreds of pages of material.

In practical terms, that means you can now:

  • Feed in a full board pack, not just one section
  • Analyse a long contract plus its annexes
  • Review several research papers together
  • Ask questions about multi-file projects or long transcripts

And you can expect the model to keep track of details across the whole pile, not just the last few pages it saw.
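For a rough sense of what 256k tokens means in pages, here is a back-of-the-envelope sketch. The conversion factors are my assumptions, not OpenAI's: about 0.75 English words per token (a common rule of thumb) and about 500 words per page. Real documents will vary, so treat the result as an order of magnitude only.

```python
def estimate_pages(tokens: int,
                   words_per_token: float = 0.75,  # assumption: rule of thumb for English
                   words_per_page: int = 500) -> float:  # assumption: dense single-spaced page
    """Rough page count for a given number of tokens."""
    return tokens * words_per_token / words_per_page

pages = estimate_pages(256_000)
print(f"About {pages:.0f} pages")
```

With those assumptions, 256,000 tokens comes to about 384 pages. Change either default and the answer moves, but it stays in "hundreds of pages" territory.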

For people designing AI agents, this opens up workflows like:

  • “Read all of last quarter’s performance reports and summarise what actually changed.”
  • “Scan these policy documents, find conflicts and recommend a clean, combined version.”
  • “Map every dependency in this multi-file project and suggest where automation would have the biggest impact.”

  4. Slightly less confident nonsense

Let us talk hallucinations.

OpenAI reports that GPT-5.2 Thinking hallucinates less often than GPT-5.1 Thinking. On a set of de-identified ChatGPT queries, responses with errors were about 30% less common when the model was allowed to think hard and use search tools.

Important nuance:

  • It still makes mistakes
  • It still sounds confident when it is wrong
  • You still need checks, especially around anything legal, medical or financial

For everyday knowledge work, the trend line is moving in the right direction. Fewer confidently wrong answers means less time spent double-checking and more trust that the first draft is at least in the right ballpark.

If you are building internal tools or agents, you still need guardrails, validation steps and human review. GPT-5.2 just makes it less painful to get a good first answer most of the time.
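To make "validation steps and human review" a bit more concrete, here is a minimal sketch of a check that decides whether a draft answer should go to a person before anyone acts on it. Everything in it is hypothetical: the `Draft` class, the `SENSITIVE_TOPICS` list and the rules are my own illustration, not part of any OpenAI tooling, and a real setup would need rules that fit your business.

```python
from dataclasses import dataclass, field

# Hypothetical list, based on the "legal, medical or financial" caution above.
SENSITIVE_TOPICS = {"legal", "medical", "financial"}

@dataclass
class Draft:
    text: str                      # the model's answer
    topic: str                     # a label you assign to the request
    sources: list[str] = field(default_factory=list)  # documents the answer cites

def needs_human_review(draft: Draft) -> list[str]:
    """Return the reasons a draft should go to a person; an empty list means none were found."""
    reasons = []
    if draft.topic.lower() in SENSITIVE_TOPICS:
        reasons.append(f"sensitive topic: {draft.topic}")
    if not draft.sources:
        reasons.append("no sources cited")
    if not draft.text.strip():
        reasons.append("empty answer")
    return reasons

draft = Draft(text="Margins dipped 2% in Q3.", topic="financial")
print(needs_human_review(draft))
```

The point of returning reasons rather than a plain yes or no is that the reviewer can see why a draft was flagged. An empty list does not mean the answer is right, only that none of these simple checks fired.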


  5. Better at looking at screens, not just reading text

GPT-5.2 is also OpenAI’s strongest vision model so far. It roughly halves error rates on tests that involve reading charts from scientific papers and understanding software interfaces.

That shows up in very practical ways:

  • Interpreting dashboards and KPI charts
  • Reading product screenshots and giving feedback
  • Understanding complex UIs, step by step
  • Working with technical diagrams and schematics

For teams, that might look like:

  • “Here is a screenshot of our analytics dashboard, what changed this month?”
  • “Here is a spreadsheet and a chart, explain why our margins dipped.”
  • “Here is a screenshot of an error page, walk me through what to check.”

It is still not magic, but it is much closer to having a colleague who can glance at your screen and make sense of what is going on.


So what does this actually change for businesses?

Here is how I see it at a high level.

  1. Most value will come from “boring” workflows

The real upside is not flashy content generation. For most organisations, the value will come from:

  • Planning and scheduling
  • Financial analysis and modelling
  • Reporting and board papers
  • Project tracking and risk analysis
  • Policy reviews and compliance checks

Content still matters, especially if you are in marketing or media, but the quiet revolution is all the admin and operations work that can be offloaded to an AI that is:

  • Fast enough to use in the flow of work
  • Cheap enough to run often
  • Accurate enough that you are not constantly babysitting it

  2. Agent ideas that sounded like science fiction a year ago now look practical

Because the cost of deep reasoning has dropped so sharply, more ambitious agent workflows start to make sense:

  • An agent that keeps a living workforce plan up to date
  • An agent that watches your cashflow model and flags issues early
  • An agent that reads every weekly report, rolls up the insights, and drafts your monthly summary

You still need to design them carefully, but the economics are shifting in your favour.

  3. This is a planning moment, not just a hype moment

If you are a leader, now is a good time to ask:

  • Which repetitive, thinking-heavy tasks slow my team down?
  • Where do we rely on big spreadsheets, long reports or complex dashboards?
  • How could we safely give an AI first-pass responsibility, with humans checking and adjusting?

If you are a builder or AI-curious operator, the question becomes:

  • What is the smallest, most useful boring workflow I could automate with GPT-5.2 and a bit of glue code?

That might be workforce planning, a recurring financial analysis, or a recurring operational report. Start there, not with a chatbot that tries to answer everything.
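As an illustration of how small that glue code can be, here is a sketch of a recurring report roll-up. It is deliberately model-agnostic: `ask_model` is a placeholder for whatever function you use to call GPT-5.2, and `fake_model` is a stand-in so the example runs without any API access. All function names here are my own, not from an OpenAI library.

```python
from typing import Callable

def build_rollup_prompt(reports: dict[str, str]) -> str:
    """Combine weekly reports into one prompt asking for a summary of changes."""
    sections = [f"## {name}\n{body.strip()}" for name, body in sorted(reports.items())]
    return (
        "Summarise what changed across these weekly reports. "
        "List open risks separately.\n\n" + "\n\n".join(sections)
    )

def monthly_summary(reports: dict[str, str], ask_model: Callable[[str], str]) -> str:
    """Draft a monthly summary; ask_model is a placeholder for your own model call."""
    if not reports:
        raise ValueError("no reports to summarise")
    draft = ask_model(build_rollup_prompt(reports))
    # Label the output so nobody mistakes a first pass for a final document.
    return "DRAFT FOR HUMAN REVIEW\n\n" + draft

def fake_model(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"(model output for a prompt of {len(prompt)} characters)"

reports = {"week-1": "Hiring on track.", "week-2": "Two roles slipped."}
print(monthly_summary(reports, fake_model))
```

Passing the model call in as a parameter keeps the workflow testable and makes it easy to swap models later. The "DRAFT FOR HUMAN REVIEW" label is the first-pass idea from above in its simplest form: the AI drafts, a person checks and adjusts.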


Final thoughts

GPT-5.2 is being presented as OpenAI’s most capable model so far for professional knowledge work, and from the early numbers it earns that label.

For me, the most important shift is this:

AI is moving from a nice-to-have assistant to a serious operator on the boring stuff, and it is finally getting cheap enough to use that way.

If you are looking at this release and thinking, “Alright, but what would I actually use this for in my team or business?”, that is the right question. Over the next year, I will be focusing more on exactly that, and sharing practical examples as I go.


References

  • OpenAI, “Introducing GPT-5.2” (official release, December 2025).
  • ARC Prize (@arcprize), ARC-AGI-1 benchmark results for GPT-5.2 Pro (X High) and o3 (High), shared on X.

Edited with ChatGPT (GPT-5.2). All thoughts my own :)