IT Brief Asia - Technology news for CIOs & IT decision-makers

OpenAI unveils GPT-5.4 with advanced computer use tools

Sat, 7th Mar 2026

OpenAI has released GPT-5.4 across ChatGPT, its developer API, and Codex, positioning it as a step forward in reasoning, coding, and software-based workflows for professional users.

It has also introduced GPT-5.4 Pro in ChatGPT and the API, aimed at users who want higher performance on complex tasks.

In ChatGPT, GPT-5.4 appears as GPT-5.4 Thinking. For longer, more complex requests, it can provide an upfront plan so users can adjust instructions while a response is in progress, reducing the need for follow-up prompts.

GPT-5.4 Thinking also improves web research, performing better on highly specific queries and maintaining context across longer prompts that require more reasoning.

Computer use

In the API and Codex, GPT-5.4 adds native computer-use functions within a general-purpose model. This allows agents to operate computers and run workflows across applications by reacting to screenshots and issuing mouse and keyboard commands.

Developers can steer behaviour using developer messages and set confirmation policies to adjust safety behaviour based on risk tolerance.
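OpenAI has not published the exact interface here, but the screenshot-and-action loop with a confirmation policy can be sketched as follows. All names (`Action`, `run_agent`, the `confirm` callback) are hypothetical stand-ins, not the actual API:

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical action a model might emit after observing a screenshot.
@dataclass
class Action:
    kind: str    # e.g. "click", "type", "delete_file"
    risky: bool  # whether the action needs confirmation under the policy

def run_agent(actions: List[Action], confirm: Callable[[Action], bool]) -> List[str]:
    """Execute a stream of model-proposed actions, pausing on risky ones.

    `confirm` stands in for a developer-configured confirmation policy:
    it is consulted before any action flagged as risky is executed.
    """
    log = []
    for action in actions:
        if action.risky and not confirm(action):
            log.append(f"blocked:{action.kind}")
            continue
        log.append(f"executed:{action.kind}")
    return log

# A conservative policy: never auto-approve risky actions.
log = run_agent(
    [Action("click", False), Action("delete_file", True)],
    confirm=lambda a: False,
)
```

The point of a confirmation policy in this shape is that risk tolerance lives in one developer-controlled callback rather than being scattered through the agent loop.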

OpenAI reported benchmark results for computer use. On OSWorld-Verified, it said GPT-5.4 achieved a 75.0% success rate, up from 47.3% for GPT-5.2 and above a reported human performance level of 72.4%.

For browser tasks, it said GPT-5.4 scored 67.3% on WebArena-Verified when using both DOM- and screenshot-driven interaction, compared with 65.4% for GPT-5.2. On Online-Mind2Web, it reported a 92.8% success rate using screenshot-based observations alone, compared with 70.9% for ChatGPT Atlas's Agent Mode.

Vision updates

GPT-5.4 includes updates to visual understanding that OpenAI linked to computer-use performance and document parsing. On MMMU-Pro, it reported an 81.2% success rate without tool use, compared with 79.5% for GPT-5.2.

On OmniDocBench, it reported an average error of 0.109 without "reasoning effort", compared with 0.140 for GPT-5.2. Improved visual perception, it said, can translate into better document parsing.

OpenAI also introduced a new "original" image input detail level for full-fidelity perception up to 10.24 million total pixels or a 6000-pixel maximum dimension. It also raised the "high" image input detail level to 2.56 million total pixels or a 2048-pixel maximum dimension.
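The two detail levels each impose a total-pixel budget and a maximum single dimension. A minimal sketch of the resizing arithmetic, using the limits quoted above (the function itself is illustrative, not OpenAI's actual preprocessing code):

```python
import math

# Pixel budgets from the announcement: (total pixels, max single dimension).
DETAIL_LIMITS = {
    "original": (10_240_000, 6000),
    "high": (2_560_000, 2048),
}

def fit_to_detail(width: int, height: int, level: str) -> tuple[int, int]:
    """Downscale (width, height) just enough to fit the detail level's limits."""
    max_pixels, max_dim = DETAIL_LIMITS[level]
    scale = min(
        1.0,                                       # never upscale
        math.sqrt(max_pixels / (width * height)),  # cap total pixel count
        max_dim / max(width, height),              # cap the longest side
    )
    return (int(width * scale), int(height * scale))
```

For example, an 8000×4000 screenshot (32 MP) exceeds both budgets at `"original"` and would be scaled down until its total pixel count fits under 10.24 MP.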

Knowledge work

The release emphasises tasks common in office settings, including spreadsheets, presentations, and documents. OpenAI said GPT-5.4 achieved a new state of the art on GDPval, a benchmark that tests well-specified knowledge work across 44 occupations. It said the model matched or exceeded industry professionals in 83.0% of comparisons, compared with 70.9% for GPT-5.2.

On an internal benchmark of spreadsheet modelling tasks that a junior investment banking analyst might do, it reported a mean score of 87.3% for GPT-5.4, compared with 68.4% for GPT-5.2.

For presentation evaluation prompts, OpenAI said human raters preferred GPT-5.4 outputs 68.0% of the time over GPT-5.2. It attributed the difference to aesthetics, visual variety, and use of image generation.

Enterprise users also gained access to a ChatGPT for Excel add-in launched alongside the model, and OpenAI updated spreadsheet and presentation skills in Codex and the API.

OpenAI also highlighted factuality metrics. On a set of de-identified prompts where users flagged factual errors, it said GPT-5.4's individual claims were 33% less likely to be false than GPT-5.2's, and full responses were 18% less likely to contain any errors.

Coding focus

OpenAI said GPT-5.4 combines the coding strengths of GPT-5.3-Codex with broader knowledge-work and computer-use functions. It said GPT-5.4 matched or outperformed GPT-5.3-Codex on SWE-Bench Pro while running at lower latency across reasoning efforts.

Codex also gained a "/fast mode" option, which OpenAI said delivers up to 1.5 times faster token velocity with GPT-5.4. In the API, developers can get similar speed via Priority processing.

OpenAI also released an experimental Codex skill called "Playwright (Interactive)", which it said allows Codex to visually debug web and Electron apps and test an app while it is being built.

Tool search

GPT-5.4 in the API introduces tool search for systems that connect models to many external tools. It provides a lightweight list of available tools and retrieves tool definitions only when needed.

OpenAI said older approaches inserted all tool definitions into the prompt upfront, increasing token usage and response time for tool-heavy systems. Tool search, it said, reduces token usage and preserves cache behaviour.
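The token-saving mechanism can be illustrated with some rough accounting. The helper names and token costs below are invented for the sketch; the idea is simply that a lightweight name listing plus on-demand retrieval beats inlining every definition:

```python
def upfront_tokens(tool_defs: dict[str, int]) -> int:
    """Older approach: every tool's full definition goes into the prompt."""
    return sum(tool_defs.values())

def tool_search_tokens(tool_defs: dict[str, int], used: set[str],
                       name_cost: int = 8) -> int:
    """Tool search: a cheap name list, plus only the definitions retrieved."""
    listing = name_cost * len(tool_defs)
    retrieved = sum(cost for name, cost in tool_defs.items() if name in used)
    return listing + retrieved

# Illustrative numbers: 36 tools at ~400 tokens per definition,
# with only two tools actually needed for the task.
tools = {f"tool_{i}": 400 for i in range(36)}
all_in = upfront_tokens(tools)                           # 36 * 400 = 14400
lazy = tool_search_tokens(tools, {"tool_0", "tool_1"})   # 36 * 8 + 800 = 1088
```

A side effect noted in the announcement is cache friendliness: the short listing stays stable across requests, while the older approach invalidates more of the prompt prefix whenever the tool set changes.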

In a test using 250 tasks from Scale's MCP Atlas benchmark with 36 MCP servers enabled, OpenAI reported that tool search reduced total token usage by 47% while achieving the same accuracy.

Safety measures

OpenAI said it is treating GPT-5.4 as "High cyber capability" under its Preparedness Framework and is deploying it with an expanded cyber safety stack. Measures include monitoring systems, trusted access controls, and asynchronous blocking for higher-risk requests for customers on Zero Data Retention surfaces.

Request-level blocking also remains in place for certain customers on Zero Data Retention surfaces, and OpenAI said false positives may occur as classifiers improve.

The company also introduced a new open-source evaluation called CoT controllability, which measures whether models can obfuscate reasoning to evade monitoring. "We find that GPT-5.4 Thinking's ability to control its CoT is low, which is a positive property for safety, suggesting that the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool," OpenAI said.

Access and pricing

GPT-5.4 is rolling out across ChatGPT and Codex, with API access under the model name gpt-5.4. GPT-5.4 Pro is available as gpt-5.4-pro.

In ChatGPT, GPT-5.4 Thinking is available to Plus, Team, and Pro users and replaces GPT-5.2 Thinking. GPT-5.2 Thinking will remain available for three months in a Legacy Models section before retirement. Enterprise and Edu plans can enable early access via admin settings, and GPT-5.4 Pro is available to Pro and Enterprise plans.

In Codex, GPT-5.4 includes experimental support for a 1 million-token context window. OpenAI said requests that exceed the standard 272,000-token context window count against usage limits at twice the normal rate.
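Read literally, that means a request whose context exceeds the standard window is counted against usage limits at double weight. A sketch of that accounting, under the assumption that the entire over-threshold request is billed at the doubled rate (the announcement does not spell out whether only the excess tokens double):

```python
STANDARD_CONTEXT = 272_000  # tokens; requests above this count double

def billed_usage(tokens: int) -> int:
    """Usage-limit weight for a request, per the stated 2x rule.

    Assumption: the whole request counts at twice the normal rate once
    it exceeds the standard context window.
    """
    return tokens * 2 if tokens > STANDARD_CONTEXT else tokens
```

So a 300,000-token request would count as 600,000 tokens against a usage limit, while a 100,000-token request counts at face value.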

OpenAI said GPT-5.4 is priced higher per token than GPT-5.2 in the API. Batch and flex pricing are available at half the standard rate, and Priority processing is available at twice the standard rate.
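Those tier multipliers are simple to express. The sketch below assumes a placeholder per-million-token rate (OpenAI's actual GPT-5.4 rates are not quoted in the article); only the 0.5x/1x/2x relationships come from the announcement:

```python
# Rate multipliers relative to standard per-token pricing, per the article.
TIER_MULTIPLIER = {"batch": 0.5, "flex": 0.5, "standard": 1.0, "priority": 2.0}

def token_cost(tokens: int, rate_per_million: float, tier: str = "standard") -> float:
    """Dollar cost for `tokens` at `rate_per_million` under a service tier."""
    return tokens / 1_000_000 * rate_per_million * TIER_MULTIPLIER[tier]
```

At a hypothetical $10 per million tokens, a million tokens would cost $5 in batch or flex, $10 standard, and $20 with Priority processing.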