New Cerebras Wafer-Scale ‘Andromeda’ supercomputer has 13.5 million cores

Cerebras unveiled its new AI supercomputer, Andromeda, at SC22. With 13.5 million cores across 16 Cerebras CS-2 systems, Andromeda delivers an exaflop of AI compute and 120 petaflops of dense compute. Its computing workhorse is Cerebras’ wafer-scale multi-core processor, the WSE-2.

Each WSE-2 wafer has three physical planes, which handle arithmetic, memory, and communications. The memory plane’s 40GB of onboard SRAM alone can hold an entire BERT-Large model. Meanwhile, the arithmetic plane has some 850,000 independent cores and 3.4 million FPUs. Those cores have a collective internal bandwidth of approximately 20 PB/s across the communication plane’s Cartesian mesh.

Each of Andromeda’s wafer-scale processors is the size of a salad plate, 8.5″ square. Image: Cerebras

Cerebras emphasizes what it calls “near-perfect linear scaling,” which means that for a given job, two CS-2s will finish it twice as fast as one, three will take one third of the time, and so on. How? Andromeda’s CS-2 systems rely on data parallelism, Cerebras said, from the cores on each wafer to the SwarmX fabric that coordinates them all. But the supercomputer’s talents extend beyond its already impressive 16 nodes. Using the same data parallelism, researchers can combine up to 192 CS-2 systems for a single task.
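
Linear scaling is easy to state as arithmetic: if a job takes T1 on one CS-2, it should take about T1 / N on N systems. The minimal sketch below computes that ideal runtime and the resulting parallel efficiency (speedup divided by N) for a few system counts. The 24-hour single-system runtime is a made-up figure, and `ideal_runtime` and `efficiency` are hypothetical helper names for illustration only.

```python
def ideal_runtime(t1_hours: float, n_systems: int) -> float:
    """Runtime under perfect linear scaling: T(N) = T1 / N."""
    if n_systems < 1:
        raise ValueError("need at least one system")
    return t1_hours / n_systems


def efficiency(t1_hours: float, tn_hours: float, n_systems: int) -> float:
    """Parallel efficiency = speedup / N, where speedup = T1 / T(N)."""
    return (t1_hours / tn_hours) / n_systems


T1 = 24.0  # hypothetical: hours for the job on a single CS-2

for n in (1, 2, 3, 16, 192):
    tn = ideal_runtime(T1, n)
    print(f"{n:>3} systems: {tn:7.3f} h, efficiency {efficiency(T1, tn, n):.2f}")
```

Under ideal scaling every row reports an efficiency of 1.00. Plugging in measured runtimes instead of `ideal_runtime` would show how close a real system comes; “near-perfect” scaling would mean efficiencies that stay close to 1.00 as N grows.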

Andromeda runs on Epyc wins

Andromeda gets its data from a bank of 64-core, third-generation AMD EPYC processors. These processors, AMD said via email, work in tandem with the CS-2 wafers, performing “a wide range of data pre- and post-processing.”


“AMD EPYC is the best choice for this type of cluster,” Andrew Feldman, founder and CEO of Cerebras, told us, “because it offers unmatched core density, memory capacity, and I/O. This has made it the obvious choice for supplying data to the Andromeda supercomputer.”

Alongside its sixteen second-generation wafer-scale engines, Andromeda runs on 18,176 third-generation Epyc cores. All that throughput comes at a price, though: in total, the system consumes about 500 kilowatts when running at maximum.
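
Dividing the headline compute figures by the power draw gives a rough sense of efficiency. This back-of-the-envelope sketch assumes that the one-exaflop AI figure, the 120-petaflop dense figure, and the roughly 500 kW maximum draw all apply at the same moment, which the quoted numbers do not guarantee; the results are peak ratios, not measured efficiency on any workload.

```python
# Figures as quoted in the article (peak values, assumed to coincide).
AI_FLOPS = 1e18       # "an exaflop of AI compute"
DENSE_FLOPS = 120e15  # "120 petaflops of dense compute"
POWER_WATTS = 500e3   # "about 500 kilowatts" at maximum


def flops_per_watt(flops: float, watts: float) -> float:
    """Simple ratio of peak operations per second to power draw."""
    return flops / watts


ai_eff = flops_per_watt(AI_FLOPS, POWER_WATTS)
dense_eff = flops_per_watt(DENSE_FLOPS, POWER_WATTS)
print(f"AI:    {ai_eff / 1e12:.2f} TFLOPS per watt")
print(f"Dense: {dense_eff / 1e9:.0f} GFLOPS per watt")
```

The ratios come out to about 2 TFLOPS per watt for the AI figure and about 240 GFLOPS per watt for the dense figure.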

Go big or go home

Andromeda is not the fastest supercomputer in the world. Frontier, a supercomputer at Oak Ridge National Laboratory capable of simulating nuclear weapons, passed the exaflop mark earlier this year. Frontier also operates at higher precision: 64-bit, versus Andromeda’s 16-bit half precision. But not all operations require nuclear-grade precision. Andromeda is not trying to be Frontier.

“They’re a bigger machine. We’re not beating them. They cost $600 million to build. This is less than $35 million,” Feldman said.

Nor is Andromeda trying to usurp Polaris, a cluster of more than 2,000 Nvidia A100 GPUs at Argonne National Lab. Indeed, like Andromeda, Polaris itself uses AMD EPYC cores to do pre- and post-processing. Rather, each supercomputer excels at a slightly different type of job.

Broadly speaking, CPUs are generalists, while GPUs, ASICs, and FPGAs are more specialized. That’s why crypto miners love GPUs: mining a blockchain involves a lot of repetitive calculations. But Andromeda is even more specialized. It excels at handling large sparse matrices, multi-dimensional arrays of tensor data that are mostly zeros.
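
To see why mostly-zero data rewards specialized handling, it helps to look at how software commonly stores it. The minimal sketch below uses the standard compressed sparse row (CSR) format, which keeps only the nonzero values plus two index arrays, and multiplies the matrix by a vector while touching only those nonzeros. It illustrates the general idea; it is not a description of how the WSE-2 represents sparse tensors internally, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def to_csr(dense: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR: (values, col_indices, row_ptr)."""
    values: List[float] = []
    col_indices: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_indices.append(j)
        row_ptr.append(len(values))  # marks where this row's nonzeros end
    return values, col_indices, row_ptr


def csr_matvec(values: List[float], col_indices: List[int],
               row_ptr: List[int], vec: List[float]) -> List[float]:
    """Multiply a CSR matrix by a dense vector, visiting only stored nonzeros."""
    out: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * vec[col_indices[k]]
        out.append(acc)
    return out


dense = [
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 2.0],
]
vals, cols, ptr = to_csr(dense)
print(vals, cols, ptr)  # [3.0, 1.0, 2.0] [2, 0, 3] [0, 1, 1, 3]
print(csr_matvec(vals, cols, ptr, [1.0, 1.0, 1.0, 1.0]))  # [3.0, 0.0, 3.0]
```

Here only 3 of 12 entries are stored, and the multiply performs 3 multiply-adds instead of 12. The savings grow with the fraction of zeros.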

AI is extremely data intensive, both in the pipeline and in the actual AI computation. So, Feldman said, Andromeda uses Epyc processors to streamline the process. “The AMD Epyc-based machines are installed on servers outside of the Cerebras CS-2s,” Feldman said, to coordinate and prepare the data. Then Andromeda’s SwarmX and MemoryX fabrics take over.

Andromeda, at home in its Santa Clara data center. Image: Cerebras

A GPU cluster must coordinate between every server core, card, and rack, which introduces unavoidable delay. Memory overhead also grows rapidly as networks become larger and more complex. In contrast, the WSE-2 manages much of its information pipeline within the same hardware. At the same time, a Cerebras wafer-scale processor can do more on a single (gigantic) piece of silicon than a mainstream CPU or GPU. This allows Andromeda to handle deeply parallel tasks.
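
A toy model makes that coordination delay concrete. Suppose each training step takes t_compute / N on N devices, but every step also pays a synchronization delay that grows with the number of devices, here assumed to be proportional to log2(N), as in a tree-shaped exchange. Both constants are invented, and the model is not a measurement of any real GPU cluster or of Andromeda; it only shows how a fixed per-step delay caps speedup.

```python
import math


def step_time(n: int, t_compute: float, sync_cost: float) -> float:
    """Hypothetical per-step time: ideal compute share plus a log2(N) sync delay."""
    return t_compute / n + sync_cost * math.log2(n)


T_COMPUTE = 1.0  # invented: seconds per step on a single device
SYNC = 0.01      # invented: extra seconds per doubling of device count

baseline = step_time(1, T_COMPUTE, SYNC)
for n in (1, 2, 16, 192):
    speedup = baseline / step_time(n, T_COMPUTE, SYNC)
    print(f"{n:>3} devices: speedup {speedup:6.1f} (ideal {n})")
```

With these made-up constants, 16 devices deliver a speedup of roughly 9.8 and 192 devices only about 12, because the synchronization term soon outweighs the shrinking compute share.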

Large language models

Just as a Formula 1 race car is wasted on surface streets, Andromeda finds its rhythm at grand scale. Nowhere is this more evident than in its runaway success with large language models (LLMs).

Imagine an Excel spreadsheet with one row and one column for each word in the entire English language. Natural language processing models use matrices, spreadsheet-like grids of numbers, to track relationships between words. These models can have billions or even tens of billions of parameters, and their input sequences can be 50,000 tokens long. You might expect that, as the training set grew, this runaway overhead would strike again. But LLMs often work with the sparse tensors that Andromeda loves.
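
Some quick arithmetic shows why sequence length matters. If a model kept a dense grid with one row and one column per token, a 50,000-token sequence would need 50,000 × 50,000 entries. The sketch below assumes a single such grid stored at 16-bit half precision (2 bytes per value) and compares it with a WSE-2’s 40GB of SRAM. It then estimates a sparse version under two further assumptions: a hypothetical 1% of entries are nonzero, and each stored entry carries a 4-byte column index. Real models may keep many such grids or avoid materializing them at all, so treat this as an order-of-magnitude illustration.

```python
SEQ_LEN = 50_000         # tokens, figure from the article
BYTES_PER_VALUE = 2      # 16-bit half precision
BYTES_PER_INDEX = 4      # assumed 32-bit column index for sparse storage
SRAM_BYTES = 40 * 10**9  # 40GB of SRAM on one WSE-2


def dense_bytes(n: int) -> int:
    """Bytes for a dense n x n grid of 16-bit values."""
    return n * n * BYTES_PER_VALUE


def sparse_bytes(n: int, density: float) -> int:
    """Approximate bytes if only a `density` fraction of entries is stored,
    each as a value plus a column index (row pointers are ignored)."""
    nonzeros = round(n * n * density)
    return nonzeros * (BYTES_PER_VALUE + BYTES_PER_INDEX)


d = dense_bytes(SEQ_LEN)
s = sparse_bytes(SEQ_LEN, 0.01)  # hypothetical: 1% of entries nonzero
print(f"dense:  {d / 1e9:.1f} GB ({d / SRAM_BYTES:.1%} of 40GB)")
print(f"sparse: {s / 1e9:.2f} GB at 1% density")
```

One dense grid comes to 5GB, an eighth of the wafer’s SRAM, while the 1%-dense version needs about 0.15GB even after paying for indices.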

Andromeda’s sixteen CS-2 nodes. Image: Cerebras

Andromeda customers, including AstraZeneca and GlaxoSmithKline, report success using LLMs on Andromeda to research “omics,” including the COVID genome and epigenome. In an experiment at the National Energy Technology Lab, scientists described “GPU-impossible” work with Andromeda that Polaris simply couldn’t complete. And while it may not crunch the numbers for nukes, Andromeda is also working on fusion research.

“Combining the AI power of the CS-2 with the precision simulation of Lassen creates a CogSim computer that opens new doors for inertial confinement fusion (ICF) experiments at the National Ignition Facility,” said Brian Spears of Lawrence Livermore National Lab.

Andromeda meets academia

Andromeda currently lives at Colovore, an HPC data center in Santa Clara. But Cerebras has also set aside time for academics and graduate students to use Andromeda for free.

And there’s one other thing graduate students, in machine learning and elsewhere, might want to note: Andromeda works well with Python. In machine learning that’s table stakes, but we mean really well. According to Cerebras, you can send AI work to Andromeda “quickly and painlessly from a Jupyter notebook, and users can switch between models with just a few keystrokes.”

“It is extraordinary that Cerebras has offered graduate students free access to such a large cluster,” said Mateo Espinosa, a PhD student at the University of Cambridge in the UK. Espinosa, who previously worked at Cerebras, is working with Andromeda for his thesis on explainable artificial intelligence. “Andromeda provides 13.5 million AI cores and near-perfect linear scaling on the largest language models, without the pain of distributed computing and parallel programming. It’s every ML grad student’s dream.”

Machine learning must swim upstream in an ever-growing river of data. Up to a point, we can just throw more hardware at the job. But within and between networks, latency quickly builds up. To get the same amount of work done in a given time, you have to put more energy into the problem. And the sheer volume of data makes throughput a bottleneck of its own. This intersection of latency, energy, and throughput is where Cerebras seeks to make its mark.

All Andromeda images courtesy of Cerebras.
