Google Cloud Splits AI TPUs to Challenge Nvidia
Google Cloud is competing more directly with Nvidia in AI infrastructure.
At Cloud Next 2026, Google introduced TPU 8t for training and TPU 8i for inference, splitting its latest AI chips between building large models and serving them in production.
Why Google is splitting training and inference
According to Google Cloud’s eighth-generation TPU overview, TPU 8t is built for training and TPU 8i for inference, with both deployed alongside Google’s Axion Arm-based host processors. Google says the split reflects a growing divide between the hardware needed to train large models and the hardware needed to run AI services efficiently once those models are in use.
Training large models depends on scale, dense compute, and fast communication across large clusters. Inference places greater emphasis on latency, memory use, power efficiency, and the cost of serving requests over time.
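The difference is easiest to see in code. Below is a minimal, hypothetical sketch in JAX, Google’s TPU-native framework; the model, shapes, and learning rate are invented for illustration, and nothing here is specific to TPU 8t or 8i. The point is the contrast in communication patterns: the training step ends with an all-reduce across chips on every iteration, while the inference path is a single compiled forward pass where latency and per-query cost dominate.

```python
# A minimal, hypothetical sketch: the model and numbers are invented,
# and this is not Google's actual TPU software stack.
from functools import partial

import jax
import jax.numpy as jnp


def predict(params, x):
    return x @ params["w"] + params["b"]  # toy linear "model"


def loss_fn(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)


# Training: each chip computes gradients on its shard of the batch, then
# an all-reduce (pmean) averages them across chips every step. Sustaining
# that collective at cluster scale is what training hardware is sized for.
@partial(jax.pmap, axis_name="chips")
def train_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    grads = jax.lax.pmean(grads, axis_name="chips")
    return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)


# Inference: one compiled forward pass, no collectives. The costs that
# dominate here are per-request latency, memory footprint, and power.
serve = jax.jit(predict)

if __name__ == "__main__":
    n = jax.local_device_count()
    params = jax.device_put_replicated(
        {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}, jax.local_devices())
    x, y = jnp.ones((n, 8, 4)), jnp.ones((n, 8, 1))
    params = train_step(params, x, y)          # per-step gradient all-reduce
    out = serve(jax.tree_util.tree_map(lambda a: a[0], params),
                jnp.ones((8, 4)))              # latency-bound serving call
```

The hardware split follows from that pattern: a training part earns its keep on interconnect bandwidth for the per-step collective, while an inference part earns it on latency, memory, and watts per request.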
In its technical deep dive on TPU 8t and TPU 8i, Google says TPU 8i delivers up to 80% better performance per dollar and twice the performance per watt for inference compared with the prior generation. Those figures come from Google rather than independent testing, so they are best read as product claims until customers can measure them in live deployments.
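Taken at face value, those ratios are easy to turn into serving economics. The sketch below plugs Google’s stated multipliers into invented baseline numbers; every input is hypothetical, and the output only illustrates the shape of the claim, not a real price.

```python
# Hypothetical arithmetic only: the baseline figures are invented to show
# what Google's stated ratios would mean for serving costs if they hold.
BASELINE_QUERIES_PER_DOLLAR = 1_000      # prior-generation TPU (assumed)
BASELINE_QUERIES_PER_WATT_HOUR = 500     # prior-generation TPU (assumed)

PER_DOLLAR_GAIN = 1.8   # Google's "up to 80% better performance per dollar"
PER_WATT_GAIN = 2.0     # Google's "twice the performance per watt"

old_cost = 1_000_000 / BASELINE_QUERIES_PER_DOLLAR
new_cost = 1_000_000 / (BASELINE_QUERIES_PER_DOLLAR * PER_DOLLAR_GAIN)
old_energy = 1_000_000 / BASELINE_QUERIES_PER_WATT_HOUR
new_energy = 1_000_000 / (BASELINE_QUERIES_PER_WATT_HOUR * PER_WATT_GAIN)

print(f"cost per 1M queries: ${new_cost:,.2f} (was ${old_cost:,.2f})")
print(f"energy per 1M queries: {new_energy:,.0f} Wh (was {old_energy:,.0f} Wh)")
# -> $555.56 vs $1,000.00 (a ~44% cut, since 1 / 1.8 ≈ 0.56) and
#    1,000 Wh vs 2,000 Wh (halved, by construction).
```

The “up to” matters: 1.8x is a ceiling, so the roughly 44% per-query saving is a best case, and real workloads may land well below it.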
Google is also packaging the chips inside a broader system. Its AI Hypercomputer combines compute, networking, storage, and software into one managed stack, giving Google Cloud a stronger pitch to enterprises that, over time, spend more on running AI tools than on training them.
That shift toward custom hardware is showing up elsewhere across the market, including Anthropic’s latest custom chip expansion with Google and Broadcom and Intel’s recent push into Nvidia’s GPU territory.
What Google’s TPU split means for Nvidia
At the same time, Google continues to work with Nvidia. In its GTC 2026 infrastructure update, Google said it plans to offer Nvidia Vera Rubin NVL72 systems in the second half of 2026.
Google’s cloud strategy now covers both options: its own TPUs for customers willing to build around Google’s stack, and Nvidia systems for customers that want established GPU tools and CUDA-based workflows.
For customers, the choice is likely to come down to the workload and the cost of running it. Companies training frontier models may care most about cluster scale and utilization. Enterprises serving AI features to large numbers of users may care more about response time, power use, and per-query cost.
Nvidia is making its own case around those same pressures in its latest AI infrastructure rollout, while Google is arguing that separate chips can better match how AI systems are built and deployed.
Whether that argument translates into broader adoption will depend on what customers see after launch. Pricing, performance, and software friction will matter more than architecture slides once these systems are in use.
For now, Google is presenting TPU 8t and TPU 8i as a more specialized alternative to a one-chip approach, and as a more direct challenge to Nvidia in cloud AI infrastructure.
Also read: Nvidia’s CoreWeave investment shows how fast AI infrastructure spending is rising.