OpenAI Debuts GPT-5.3-Codex-Spark, a Near-Instant AI for Real-Time Coding
OpenAI has released a research preview of GPT-5.3-Codex-Spark, a lightweight version of its latest Codex system built specifically for real-time coding.
The model is designed to deliver rapid responses for tasks such as editing functions, refining logic, or adjusting interfaces inside development tools. According to OpenAI, the model is optimized to feel “near-instant” and can produce more than 1,000 tokens per second when running on ultra-low-latency hardware.
At launch, Codex-Spark is text-only with a 128K context window and is available to ChatGPT Pro users through the Codex app, command-line interface, and VS Code extension. Usage during the preview has separate rate limits and may be queued during periods of high demand.
Powered by Cerebras hardware
The new model marks the first major product from OpenAI’s infrastructure partnership with AI chip company Cerebras. Codex-Spark runs on the Wafer Scale Engine 3 (WSE-3), a specialized processor built for high-speed inference. The goal is to create what the company calls a latency-first serving tier that complements traditional GPU infrastructure.
“What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible—new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning,” said Sean Lie, Cerebras’ CTO and co-founder.
OpenAI stressed that the model itself is only part of the story. The company says it has made broader changes across its inference stack to reduce delays.
According to OpenAI, it introduced a persistent WebSocket connection and optimized its Responses API. The company said these changes cut client-server roundtrip overhead by 80%, per-token overhead by 30%, and time-to-first-token by 50%.
The WebSocket path is enabled by default for Codex-Spark and is expected to become standard across other models.
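OpenAI has not published the protocol details behind those figures, but the arithmetic is easy to sketch. The toy latency model below is purely illustrative — every baseline number in it is an assumption, not a measured value — and simply applies the reported reductions (80% roundtrip, 30% per-token, 50% time-to-first-token) to a hypothetical 200-token code edit:

```python
# Illustrative latency model for streaming n tokens in one request.
# All baseline figures are assumed for illustration; OpenAI has not
# disclosed its actual roundtrip or per-token overheads.

def stream_time_ms(n_tokens: int, roundtrip_ms: float,
                   per_token_ms: float, ttft_ms: float) -> float:
    """Total wall-clock time: fixed roundtrip cost + first-token wait
    + per-token streaming cost for the remaining tokens."""
    return roundtrip_ms + ttft_ms + (n_tokens - 1) * per_token_ms

# Hypothetical "before" figures for a fresh HTTPS request per call.
before = stream_time_ms(n_tokens=200, roundtrip_ms=100,
                        per_token_ms=1.0, ttft_ms=400)

# Apply the reductions OpenAI reported: 80% less roundtrip overhead,
# 30% less per-token overhead, 50% lower time-to-first-token.
after = stream_time_ms(n_tokens=200, roundtrip_ms=100 * 0.2,
                       per_token_ms=1.0 * 0.7, ttft_ms=400 * 0.5)

print(f"before: {before:.0f} ms, after: {after:.0f} ms")
```

Under these assumed numbers, the fixed costs — connection setup and the wait for the first token — dominate a short edit, which is why a persistent WebSocket, negotiated once and reused across requests, pays off most for the small, rapid-fire completions Codex-Spark targets.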
Performance trade-offs
As with most things in tech, there is a trade-off. Because Spark is a smaller version of the flagship model, it isn't quite as sharp: on benchmarks such as SWE-Bench Pro it performs well but still trails the full GPT-5.3-Codex.
It also stops short of frontier-level capability in sensitive domains. OpenAI stated that the model “does not have a plausible chance of reaching our Preparedness Framework threshold for high capability in cybersecurity or biology.”
The launch comes amid OpenAI’s broader compute diversification strategy. eWeek previously reported that OpenAI had agreed to purchase compute capacity from Cerebras in a deal valued at more than $10 billion, though OpenAI’s official partnership announcement did not disclose financial details.
Cerebras recently announced it raised $1 billion in fresh funding at a $23 billion valuation, underscoring its growing role in AI infrastructure.
For more on the company’s financial headwinds, read our breakdown of OpenAI’s growing profitability challenge and what it means for the AI race.
The post OpenAI Debuts GPT-5.3-Codex-Spark, a Near-Instant AI for Real-Time Coding appeared first on eWEEK.