xAI’s Colossus supercomputer cluster uses 100,000 Nvidia Hopper GPUs — and it was all made possible using Nvidia’s Spectrum-X Ethernet networking platform

by huewire
November 7, 2024
in TECHNOLOGY

  • Nvidia and xAI collaborate on Colossus development
  • xAI has markedly cut down ‘flow collisions’ during AI model training
  • Spectrum-X has been crucial in training the Grok AI model family

Nvidia has shed light on how xAI’s ‘Colossus’ supercomputer cluster can keep a handle on 100,000 Hopper GPUs – and it’s all down to using the chipmaker’s Spectrum-X Ethernet networking platform.

Spectrum-X, the company revealed, is designed to provide massive performance capabilities to multi-tenant, hyperscale AI factories using its Remote Direct Memory Access (RDMA) network.

The platform has been deployed at Colossus, the world’s largest AI supercomputer, since its inception. xAI, the Elon Musk-owned firm, has been using the cluster to train its Grok series of large language models (LLMs), which power the chatbots offered to X users.

The facility was built in collaboration with Nvidia in just 122 days, and xAI is currently in the process of expanding it, with plans to deploy a total of 200,000 Nvidia Hopper GPUs.

Training Grok takes serious firepower

The Grok AI models are extremely large, with Grok-1 weighing in at 314 billion parameters and Grok-2 outperforming Claude 3.5 Sonnet and GPT-4 Turbo at the time of its launch in August.

Naturally, training these models requires significant network performance. Using Nvidia’s Spectrum-X platform, xAI recorded zero application latency degradation or packet loss as a result of ‘flow collisions’, or bottlenecks within AI networking paths.
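One rough way to picture a flow collision is several traffic flows landing on the same link while other links sit idle. The Python sketch below is illustrative only: it assumes a switch that pins each flow to one of several equal-cost links pseudo-randomly (a stand-in for static hashing in standard Ethernet) and compares that with an idealized even spread. The function names and the eight-flow, eight-link setup are hypothetical and do not describe how Spectrum-X or Colossus actually work.

```python
import random
from collections import Counter


def hashed_link_loads(num_flows: int, num_links: int, seed: int = 0) -> Counter:
    """Pin each flow to a pseudo-random link, mimicking static hashing (assumption)."""
    rng = random.Random(seed)
    return Counter(rng.randrange(num_links) for _ in range(num_flows))


def balanced_link_loads(num_flows: int, num_links: int) -> Counter:
    """Spread flows evenly across links, an idealized load balancer (assumption)."""
    return Counter(flow % num_links for flow in range(num_flows))


def count_collisions(loads: Counter) -> int:
    """Count flows that share a link with at least one other flow placed before them."""
    return sum(count - 1 for count in loads.values() if count > 1)


if __name__ == "__main__":
    hashed = hashed_link_loads(num_flows=8, num_links=8)
    balanced = balanced_link_loads(num_flows=8, num_links=8)
    print("hashed collisions:  ", count_collisions(hashed))
    print("balanced collisions:", count_collisions(balanced))
```

With as many links as flows, the even spread never doubles up, whereas pseudo-random placement usually does, and every doubled-up link becomes a potential bottleneck.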

xAI revealed it has been able to maintain 95% data throughput enabled by Spectrum-X’s congestion control capabilities. The company added this level of performance cannot be delivered at this scale via standard Ethernet.

According to Nvidia, traditional Ethernet at this scale typically creates thousands of flow collisions while delivering only 60% data throughput.
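To put the two throughput figures side by side, the short Python sketch below applies them to a single link. The 800Gb/s line rate is picked purely for illustration, matching the top port speed Nvidia quotes for the SN5600 switch; the article does not say what link speeds xAI actually runs, and the function name is hypothetical.

```python
def effective_gbps(line_rate_gbps: float, throughput_percent: float) -> float:
    """Return usable bandwidth given a line rate and a throughput percentage."""
    return line_rate_gbps * throughput_percent / 100


# Illustrative line rate only (assumption): one 800Gb/s port.
LINE_RATE_GBPS = 800

spectrum_x_gbps = effective_gbps(LINE_RATE_GBPS, 95)  # 95% figure reported by xAI
standard_gbps = effective_gbps(LINE_RATE_GBPS, 60)    # 60% figure cited by Nvidia

if __name__ == "__main__":
    print(f"Spectrum-X:        {spectrum_x_gbps:.0f} Gb/s")
    print(f"Standard Ethernet: {standard_gbps:.0f} Gb/s")
    print(f"Ratio:             {spectrum_x_gbps / standard_gbps:.2f}x")
```

On those assumptions, the gap between 95% and 60% works out to roughly 1.6 times more usable bandwidth per link.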

A spokesperson for xAI said the combination of Hopper GPUs and Spectrum-X has allowed the company to “push the boundaries of training AI models” and created a “super-accelerated and optimized AI factory”.

“AI is becoming mission-critical and requires increased performance, security, scalability and cost-efficiency,” said Gilad Shainer, senior vice president of networking at Nvidia.

“The Nvidia Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis and execution of AI workloads, and in turn accelerates the development, deployment and time to market of AI solutions.”

Part of the Spectrum-X platform includes the Spectrum SN5600 Ethernet switch – this supports port speeds of up to 800Gb/s and is based on the Spectrum-4 switch ASIC, according to Nvidia.

xAI opted to combine the Spectrum SN5600 switch with Nvidia BlueField-3 SuperNICs for higher performance.
