Artificial intelligence

How OpenAI uses Anyscale’s A16z-backed Ray framework to train tools like ChatGPT

  • OpenAI’s ChatGPT, which creates realistic responses to text prompts, is taking the internet by storm.
  • Beneath the buzz, the next-generation development framework Ray played a key role in shaping the viral model.
  • Ray hails from the billion-dollar startup Anyscale and is also the likely framework behind GPT-4.

Another new artificial intelligence tool has created a storm on the Internet: a chatbot, called ChatGPT, which provides immensely detailed and almost realistic answers to almost any question you can imagine. But while ChatGPT and other viral tools like Prisma Labs’ Lensa are capturing all the buzz, there’s a little-known distributed framework powering this new generative AI revolution that’s flying under the radar.

Ray, the framework from A16z-backed startup Anyscale, has been instrumental in enabling OpenAI to strengthen its ability to train ChatGPT and similar models. Ray operates under the hood for all of OpenAI’s major recent language models – and it’s also the likely framework behind OpenAI’s highly anticipated next act, commonly referred to as GPT-4. Industry insiders believe it could create a new wave of billion-dollar businesses by generating near-human content.

Ray already gets top marks in the field. Before deploying it, OpenAI used a hodgepodge of custom tools to develop early models and products. But as those tools’ weaknesses became more apparent, the company went with Ray, OpenAI President Greg Brockman told the Ray Summit earlier this year.

Lukas Biewald, CEO of Weights & Biases, which helps companies track machine learning experiments and is considered a rising star in the world of AI, said his company’s most forward-thinking customers, including OpenAI, love the product. That makes him think Ray has promise, he said.

“The idea that you can run the same code on your laptop and on a large set of distributed servers is a huge deal, and its importance grows as the models get bigger,” Biewald told Insider. “I think the devil is in the details, and they seem to have done a good job with them.”

A billion dollar bet on Ray

Anyscale has proven so valuable that Ben Horowitz, the namesake of Andreessen Horowitz (A16z), sits on its board. Its most recent round, a Series C expansion that valued the company at more than $1 billion, closed within days, people familiar with the deal said.

Some investors have described Anyscale as Horowitz’s hopeful “next Databricks” — an apt description given that one of the startup’s founders, Ion Stoica, was a co-founder of the $31 billion data giant.

“AI is evolving incredibly quickly, and people are trying new approaches all the time,” Anyscale CEO Robert Nishihara told Insider. “ChatGPT also combined a lot of the previous work on large language models with reinforcement learning. Underlying this, you need infrastructure that gives you the flexibility to quickly innovate and adopt different algorithms and approaches. The flexibility Ray provides comes from the ability to use both tasks and actors in Python.”

Because these hot new tools like ChatGPT require increasingly massive models, companies have had to rethink how they develop them from the ground up. Ray fills this gap, making it easier to train these colossal models and to include the hundreds of billions of data points that make every answer feel near-realistic.

How Ray Became a Must-Have Tool for Machine Learning

Ray provides an underlying framework that handles the complex task of distributing the work of training a machine learning model. Machine learning experts can often run small models that use limited sets of data — for example, a model to predict whether a customer will stop buying a product — on their own laptop. For something like ChatGPT, however, a laptop won’t suffice. Instead, these models require an army of servers to train their tools.

But one of the biggest challenges is orchestrating this training across all these different hardware components. Ray provides a mechanism that lets a programmer manage disparate hardware as a single unit – determining what data goes where, handling failures, and so on. Ray also extends a key programming concept from other languages, “actors,” to Python, the language of choice for machine learning.

The hardware often isn’t even uniform – a single training job may span a mix of offerings from Google Cloud, AWS, and other providers, all working on the same problem.

Prior to deploying Ray, OpenAI used a hodgepodge of custom tools built on the “neural programmer-interpreter” model. As the company grew, it found itself creating new custom tweaks to its developer tools and infrastructure, said Brockman, president of OpenAI.

“It was the minimum investment we could make and not be unhappy with,” Brockman said at the conference, describing the NPI-era tooling. “If something isn’t your core skill, you think, ‘Why am I mixing up the bits and dealing with a TCP stream with pickles in it?’ It’s not our burning passion.”

Tapping Ray removes this huge layer of complexity, freeing up more time and energy for a company like OpenAI to focus on its core competency.

A new generation of AI requires new development tools like Ray and JAX

Ray is just one in a series of fast-emerging next-generation machine learning tools that are rapidly changing the way development happens. Google’s JAX framework, for example, is also garnering huge popularity. Many expect JAX to become the backbone of Google’s core machine learning tools, as it has already been widely adopted in its DeepMind and Google Brain divisions.

It’s also not the only tool aimed at this problem. Coiled, another startup backed by FirstMark Capital and Bessemer Venture Partners, is building on a framework called Dask to handle the same distribution challenge.

All of these tools, Ray and JAX included, serve a new generation of AI engines called large language models. These models, trained on billions of data points, attempt to predict sentence and response structure and spit out realistic text responses to incoming queries. Several companies, both startups and giants, are building their own large language models, including Meta, Hugging Face, OpenAI, and Google.

“It’s extremely important to understand how difficult it is to divide the work (large models) and spread it over many small chips,” Andrew Feldman, CEO of AI chip startup Cerebras Systems, told Insider. “It’s an extremely difficult problem on all levels.”
