“In a sense, the problem is solved,” computational biologist John Moult said in late 2020. London-based company DeepMind had just used its revolutionary artificial intelligence (AI) tool AlphaFold to win a biennial competition co-founded by Moult. The competition tests teams’ abilities to predict protein structures, one of the greatest challenges in biology.
Two years later, Moult’s competition, the Critical Assessment of Structure Prediction (CASP), still stands in AlphaFold’s shadow. Results from this year’s edition (CASP15) – which were unveiled over the weekend at a conference in Antalya, Turkey – show that the most successful approaches to predicting protein structures from their amino acid sequences incorporated AlphaFold, which relies on an AI approach called deep learning. “Everyone uses AlphaFold,” says Yang Zhang, a computational biologist at the University of Michigan in Ann Arbor.
Yet AlphaFold’s progress has opened the floodgates to new challenges in protein structure prediction – some included in this year’s CASP – that may require new approaches and more time to fully tackle. “The low-hanging fruit has been picked,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York. “Some of the next problems are going to be more difficult.”
CASP began in 1994 with the goal of bringing rigor to protein structure prediction, a field whose advances could accelerate efforts to understand the building blocks of cells and speed up drug discovery. In each competition year, teams use computational tools to predict protein structures that have been determined with experimental methods such as X-ray crystallography and cryo-electron microscopy, but not yet published.
Entries are evaluated on how well predictions for whole proteins, or for independently folding subunits called domains, match the experimental structures. Some of AlphaFold’s predictions at CASP14 were more or less indistinguishable from the experimental models – the first time such precision had been achieved.
Since its unveiling at CASP14, AlphaFold has become ubiquitous in life science research. DeepMind released the software’s underlying code in 2021 so anyone can run the program, and an AlphaFold database updated this year contains predicted structures – of varying quality – for nearly every protein from every organism represented in genomic databases, more than 200 million proteins in total.
The success and newfound ubiquity of AlphaFold presented a challenge to Moult, who is at the University of Maryland in Rockville, and his colleagues as they planned this year’s CASP. “People say, ‘Oh, we don’t need CASP anymore, the problem has been solved.’ And I think that’s exactly the wrong way.”
At CASP15, the most successful teams were those that had adapted and expanded on AlphaFold in various ways, leading to modest gains in predicting the shape of individual proteins and domains. “The accuracy is already so high that it’s hard to do much better,” says Moult.
To make the competition more relevant in a post-AlphaFold world, Moult and his team added new challenges and tweaked some existing ones. New tests include determining how proteins interact with other molecules such as drugs and predicting the multiple forms certain proteins can take. For the past decade, CASP has included “complexes” of multiple interacting proteins, Moult says, but accurately predicting the structure of these molecules has taken on increased importance this year.
“It’s the right thing to do,” says Zhang, because predicting the structures of single proteins or domains, the bread and butter of past CASPs, has largely been solved by AlphaFold. Determining the shape of protein complexes, in particular, represents an important new challenge for the field because there is still much room for improvement, says Arne Elofsson, a protein bioinformatician at Stockholm University.
AlphaFold was originally designed to predict the shape of individual proteins. But, a few days after its public release, other scientists showed that the software could be “hacked” to model the interaction of several proteins. In the months that followed, researchers developed a myriad of approaches to improve AlphaFold’s ability to tackle complexes. DeepMind even released an update called AlphaFold-Multimer with this goal in mind.
These efforts appear to have paid off: CASP15 saw a marked increase in the number of accurately predicted complexes compared with previous contests, driven mainly by methods that adapted AlphaFold. “It’s a new game for us to be close to experimental precision with complexes,” said Moult. “We also have failures.”
For example, teams made surprisingly accurate predictions of a viral molecule of unknown function made up of two identical, intertwined proteins. This type of shape fooled pre-AlphaFold tools, says Ezgi Karaca, a computational structural biologist at the Izmir Center for Biomedicine and Genomics in Turkey, who evaluated the complex predictions. The standard version of AlphaFold failed to accurately model the shape of a giant bacterial enzyme made of 20 chains, but some teams predicted its structure by applying additional hacks to the network, Karaca adds.
Meanwhile, teams struggled to predict complexes involving immune molecules called antibodies – several of which were bound to a SARS-CoV-2 protein – and related molecules called nanobodies. But there were glimpses of success in some teams’ predictions, Karaca says, suggesting that AlphaFold hacks will be useful in predicting the shape of these medically important molecules.
This year’s CASP also stood out for the absence of DeepMind. The company did not say why it sat out, but released a short statement during CASP15 congratulating the teams that did take part. (At the same time, it rolled out an update to AlphaFold to help researchers gauge their progress against the network.)
Some researchers suggest that the competition represents a considerable time commitment, which the company might have felt was better spent on other challenges. “It would have been good for us if they had participated,” Moult said. But he adds that “because the methods are so good, they couldn’t take another big leap”.
Big improvements to AlphaFold will take time and will probably require innovations in both machine learning and protein structure prediction, researchers say. One area under development is applying “language models”, like those that power predictive-text tools, to protein structure prediction. But these methods – including one developed by social media giant Meta – did not perform as well at CASP15 as AlphaFold-based tools.
Such tools could, however, be useful in predicting how mutations alter a protein’s structure, one of many key challenges in protein structure prediction that have come to the fore since AlphaFold’s success. As a result, the field is no longer focused on a single goal, says AlQuraishi. “There’s a whole host of these issues.”