Google uses SiFive RISC-V cores in AI compute nodes • The Register

RISC-V chip company SiFive says its processors are being used, to some degree, to manage AI workloads in Google’s data centers.

According to SiFive, the processor in question is its Intelligence X280, a multi-core RISC-V design with vector extensions, optimized for AI/ML applications in the data center. When combined with the Matrix Multiplication Units (MXUs) taken from Google’s Tensor Processing Units (TPUs), this is claimed to offer greater flexibility for programming machine learning workloads.

Essentially, the X280’s general-purpose RV64 CPU cores run the code that manages the device, and feed machine learning calculations into Google’s MXUs as required to complete jobs. The X280 also includes its own vector math unit, which can handle operations that the accelerator units cannot.

SiFive and Google were a little coy, perhaps for commercial reasons, about exactly how this is packaged and used, although it seems to us that Google has placed its custom acceleration units in a multi-core X280 system-on-chip, connecting the Google-designed MXU blocks directly to the RISC-V core complex. These chips are used in Google data centers, in “AI compute hosts” according to SiFive, to accelerate machine learning work.

We imagine that these chips, if used in production, handle tasks within Google’s services. We note that you cannot rent this hardware directly from Google Cloud, which offers AI-optimized virtual machines powered by traditional x86, Arm, TPU, and GPU technologies.

The details were disclosed at the AI Hardware Summit in Silicon Valley earlier this month, in a talk by SiFive co-founder and chief architect Krste Asanović and Google TPU architect Cliff Young, and in a SiFive blog post this week.

According to SiFive, it noticed that, following the introduction of the X280, some customers started using it as a companion core alongside an accelerator, to handle all the housekeeping and general-purpose processing tasks that the accelerator was not designed for.

Many found that a full software stack was needed to run the accelerator, according to the chip company, and customers realized they could solve this by putting an X280 core complex alongside their big accelerator, with the RISC-V CPU cores handling all the maintenance and operations code, performing mathematical operations that the large accelerator cannot, and providing various other functions. Essentially, the X280 can act as a sort of management node for the accelerator.

To take advantage of this, SiFive has worked with customers such as Google to develop what it calls the Vector Coprocessor Interface eXtension (VCIX), which allows customers to tightly link an accelerator directly to the X280’s vector register file, offering increased performance and greater data bandwidth.

According to Asanović, the benefit is that customers can bring their own coprocessor into the RISC-V ecosystem and run a full software stack and programming environment, with the ability to boot Linux with full virtual memory and cache-coherent support, on a chip containing a mix of general-purpose processor cores and acceleration units.

From Google’s point of view, it wanted to focus on improving its family of TPU technologies rather than spend time building its own application processor from scratch, so pairing these acceleration functions with an off-the-shelf general-purpose processor seemed like the right way to go, according to Young.

VCIX essentially glues the MXUs to the RISC-V cores with low latency, eliminating the need to spend many cycles waiting to shuttle data between the CPU and the acceleration unit via memory, cache, or PCIe. Instead, we’re told, it takes just tens of cycles through vector register access. This also suggests that everything (the RISC-V CPU complex and the custom accelerators) is on the same die, packaged as a system-on-chip.

Application code runs on general-purpose RISC-V cores, and any work that can be accelerated by the MXU is passed through the VCIX. According to Young, besides efficiency, this approach has other advantages. The programming model is simplified, resulting in a single program with interleaved scalar, vector, and coprocessor instructions, and allowing a single software toolchain where developers can code in C/C++ or assembly to their preference.
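
As a rough illustration of that single-program model, here is a toy software sketch in Python. It is not VCIX, and none of the names below are SiFive or Google APIs: `mxu_matmul`, `vector_relu`, and `run_layer` are hypothetical stand-ins, and the choice of ReLU as an operation left to the CPU's vector unit is an assumption made purely for the example. It only shows the shape of the idea: one function in which coprocessor, vector, and scalar steps are interleaved.

```python
# Toy software model only: none of these names are SiFive or Google APIs.

def mxu_matmul(a, b):
    """Hypothetical stand-in for a matrix-multiply coprocessor operation."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def vector_relu(row):
    """Stand-in for a CPU vector-unit step (assumed not offered by the accelerator)."""
    return [max(0, x) for x in row]

def run_layer(a, b):
    """One program interleaving coprocessor, vector, and scalar steps."""
    product = mxu_matmul(a, b)                        # coprocessor step
    activated = [vector_relu(r) for r in product]     # vector step
    total = 0
    for r in activated:                               # scalar housekeeping
        for x in r:
            total += x
    return activated, total

if __name__ == "__main__":
    out, total = run_layer([[1, -2], [3, 4]], [[1, 0], [0, 1]])
    print(out, total)
```

On the real hardware, as described above, the hand-off to the coprocessor would be an instruction in the same stream rather than a function call, and developers would write C/C++ or assembly rather than Python.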

“With SiFive VCIX-based general purpose cores ‘hybridized’ with Google’s MXUs, you can build a machine that lets you ‘have your cake and eat it too’, taking full advantage of all the performance of the MXU and the programmability of a general CPU as well as the vector performance of the X280 processor,” said Young.

The ability to fabricate a custom chip like this will likely remain the domain of hyperscalers like Google, or those with niche needs and deep pockets, but it demonstrates what can be achieved through the flexibility of the open RISC-V ecosystem.

This flexibility and openness seem to be enough to entice Google, a longtime proponent of RISC-V with RV cores used in some of its other products, to use the upstart architecture instead of putting its custom coprocessors into x86 chips or Arm-licensed designs. ®

PS: Remember when Google was playing around with POWER CPU architecture in its data centers?
