Proteins are the molecules that do the work in nature, and an entire industry is emerging around modifying and manufacturing them for various uses. But the process is time-consuming and haphazard; Cradle aims to change that with an AI-powered tool that tells scientists what new structures and sequences will make a protein do what they want it to. The company emerged from stealth today with a substantial funding round.
AI and proteins have been in the news lately, largely thanks to the efforts of research teams like DeepMind and Baker Lab. Their machine learning models take easily collected RNA sequence data and predict the structure a protein will take, a step that previously took weeks and expensive special equipment.
But as amazing as this ability is in some areas, it’s just a starting point for others. Modifying a protein to make it more stable or to bind to some other molecule involves much more than just understanding its general shape and size.
“If you’re a protein engineer and you want to design a certain property or function in a protein, just knowing what it looks like doesn’t help. It’s like having a picture of a bridge, it doesn’t tell you whether it’s going to collapse or not,” explained Stef van Grieken, CEO and co-founder of Cradle.
“AlphaFold takes a sequence and predicts what the protein will look like,” he continued. “We’re the generative sibling of that: you choose the properties you want to engineer, and the model will generate sequences that you can test in your lab.”
Predicting what proteins, especially those new to science, will do in situ is a difficult task for many reasons, but in the context of machine learning, the biggest problem is that there is not enough data available. So Cradle created much of its own dataset in a wet lab, testing protein after protein and seeing which changes in their sequences seemed to lead to which effects.
Interestingly, the model itself is not exactly specific to biotechnology, but a derivative of the same “large language models” that have produced text generation engines like GPT-3. Van Grieken noted that these models are not strictly limited to language in how they understand and predict data, an interesting “generalization” feature that researchers are still exploring.
The protein sequences that Cradle ingests and predicts aren’t in any language we know of, of course, but they are relatively simple linear text sequences that have associated meanings. “It’s like an alien programming language,” van Grieken said.
Protein engineers are not helpless, of course, but their work necessarily involves a lot of guesswork. An engineer may be fairly sure that somewhere among the 100 sequences they are modifying is the combination that will produce the desired effect, but beyond that it comes down to exhaustive testing. A little hint here could speed things up considerably and avoid a huge amount of wasted work.
The model works in three basic layers, he explained. First, it evaluates whether a given sequence is “natural,” i.e., whether it is a meaningful sequence of amino acids or just random ones. It’s akin to a language model that can say with 99% confidence that a sentence is in English (or Swedish, in van Grieken’s example), and that the words are in the correct order. It knows this from “reading” millions of such sequences determined by laboratory analyses.
Next, it examines the protein’s actual or potential meaning in that alien language. “Imagine we give you a sequence, and this is the temperature at which this sequence will melt,” he said. “If you do this for many sequences, you can say not only ‘this looks natural,’ but ‘this looks like 26 degrees Celsius.’” This helps the model determine which regions of the protein to focus on.
The model can then suggest sequences to slot in: educated guesses, essentially, but a stronger starting point than scratch. The engineer or lab can then try them out and bring that data back to the Cradle platform, where it can be re-ingested and used to fine-tune the model for the situation at hand.
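To make those three layers concrete, here is a deliberately tiny Python sketch. It is not Cradle's model, which the company describes only as a derivative of large language models; every name, sequence and number below is invented for illustration. A bigram counter stands in for the "naturalness" check, a distance-weighted average over a couple of made-up lab measurements stands in for the property prediction, and a brute-force scan of single-letter changes stands in for the suggestion step. All sequences are assumed to have the same length.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def train_bigram(sequences):
    """Count adjacent amino-acid pairs seen in known sequences."""
    pairs, firsts = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[a, b] += 1
            firsts[a] += 1
    return pairs, firsts


def naturalness(seq, model):
    """Layer 1 (toy): mean log-probability per transition, add-one smoothed."""
    pairs, firsts = model
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        total += math.log((pairs[a, b] + 1) / (firsts[a] + len(AMINO_ACIDS)))
    return total / (len(seq) - 1)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_property(seq, measured):
    """Layer 2 (toy): average of measured values, weighted by similarity."""
    weights = {s: 1.0 / (1 + hamming(seq, s)) for s in measured}
    return sum(w * measured[s] for s, w in weights.items()) / sum(weights.values())


def propose(seq, model, measured, min_naturalness, top_k=3):
    """Layer 3 (toy): rank natural-looking single-letter variants by prediction."""
    candidates = []
    for i in range(len(seq)):
        for aa in AMINO_ACIDS:
            if aa == seq[i]:
                continue
            variant = seq[:i] + aa + seq[i + 1:]
            if naturalness(variant, model) >= min_naturalness:
                candidates.append((predict_property(variant, measured), variant))
    candidates.sort(reverse=True)  # highest predicted value first
    return [variant for _, variant in candidates[:top_k]]


# Hypothetical data: three "known" sequences, two with invented measurements.
known = ["MKTAYIAKQR", "MKTAYLAKQR", "MKSAYIAKQR"]
measured = {"MKTAYIAKQR": 26.0, "MKTAYLAKQR": 31.0}  # made-up temperatures, deg C

model = train_bigram(known)
start = "MKTAYIAKQR"
threshold = naturalness(start, model) - 0.5  # arbitrary tolerance
suggestions = propose(start, model, measured, threshold)
print(suggestions)
```

In the workflow described above, the lab would then test the suggestions and add the new measurements to `measured`, so the next round of predictions starts from more data.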
Modifying proteins for a variety of purposes is useful across biotechnology, from drug design to biomanufacturing, and the path from vanilla molecule to custom, effective and efficient molecule can be long and expensive. Any way to shorten it will probably be welcomed, at the very least, by the lab technicians who have to perform hundreds of experiments just to get one good result.
Cradle had been operating in stealth and is now emerging having raised $5.5 million in a funding round co-led by Index Ventures and Kindred Capital, with participation from angels John Zimmer, Feike Sijbesma and Emily Leproust.
Van Grieken said the funding would allow the team to scale up data collection (the more the better when it comes to machine learning) and work on the product to make it “more self-service.”
“Our goal is to reduce the cost and time to market of a bio-based product by an order of magnitude,” van Grieken said in the press release, “so that anyone – even ‘two kids in their garage’ – can bring a bio-based product to market.”