4.1: CASP and the Grand Challenge
CASP (Critical Assessment of protein Structure Prediction) has been held biennially since 1994, and for structural biologists it is the definitive benchmark. The scoring metric, GDT (Global Distance Test), ranges from zero to one hundred and roughly measures the percentage of residues placed close to their experimentally determined positions. A score above ninety is generally considered on par with experimental accuracy, because even two experimental structures of the same protein can differ by that remaining margin, owing to inherent noise in methods such as X-ray crystallography. For over two decades, the best computational methods plateaued, with top scores on the hardest targets in the forties and fifties. Real progress, agonizing pace.
In 2018, at CASP13, a team from DeepMind, the London-based AI lab owned by Alphabet, entered for the first time with a system called AlphaFold. It won decisively, but its median score on the most challenging targets was still only around sixty. Impressive, not transformative. The community took notice, but no one declared the problem solved.
Two years later, at CASP14 in 2020, DeepMind returned with AlphaFold 2. The system achieved a median GDT score of 92.4 across all targets, and 87.0 on the hardest category, where no related structure is available as a template. For many proteins, its predictions were indistinguishable from experimental structures. John Moult, CASP's co-founder, declared the protein structure prediction problem practically solved. A problem that had consumed fifty years and billions in research funding had been cracked by a neural network.
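To make the GDT numbers above concrete, here is a minimal sketch of the GDT_TS variant of the score, which averages the fraction of residues that fall within 1, 2, 4, and 8 angstroms of their experimental positions. It assumes the two structures have already been superimposed and that per-residue distances are supplied directly; the official CASP evaluation searches for the best superposition at each cutoff, which this simplification skips. The function name and the toy distances are hypothetical.

```python
from typing import Sequence

# Standard GDT_TS distance cutoffs, in angstroms.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT_TS from per-residue C-alpha distances (angstroms).

    Assumes predicted and experimental structures are already superimposed.
    The real procedure optimizes the superposition per cutoff; this does not.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each cutoff.
    fractions = [
        sum(1 for d in distances if d <= cutoff) / n
        for cutoff in GDT_TS_CUTOFFS
    ]
    # Average the four fractions and express as a 0-100 score.
    return 100.0 * sum(fractions) / len(GDT_TS_CUTOFFS)


# Hypothetical distances for a ten-residue toy protein.
example = [0.4, 0.7, 0.9, 1.5, 1.8, 2.5, 3.0, 5.0, 7.5, 12.0]
print(round(gdt_ts(example), 1))  # prints 60.0
```

In this toy case, 30%, 50%, 70%, and 90% of residues fall within the four cutoffs, so the score is their average, 60. A prediction scoring above ninety has to place nearly every residue within a couple of angstroms of where experiment puts it.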
4.2: How AlphaFold Works (for Non-Experts)
AlphaFold's core idea is that evolution has already run billions of experiments on protein folding. We just need to read the results. When predicting a structure, AlphaFold starts by searching databases for related proteins across the tree of life. If a particular amino acid at one position always changes in tandem with one at another position (say, every time position 42 shifts from small to large, position 117 shifts from large to small), those two positions are probably close together in three-dimensional space. They are co-evolving to maintain a physical interaction. AlphaFold assembles these co-evolutionary signals from a multiple sequence alignment (MSA) of thousands of related proteins, building a kind of origami guide hinting at which parts of the chain are close together in space.
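The co-evolution idea can be illustrated with a classic pre-deep-learning statistic: the mutual information between two columns of an alignment. This is a sketch of the underlying signal, not of AlphaFold itself, which learns from the raw MSA rather than computing this statistic, and real methods also correct for biases such as shared ancestry. The four-sequence alignment and the function name are invented for illustration.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(msa: Sequence[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi


# Hypothetical toy alignment: columns 1 and 3 always change together
# (G pairs with W, W pairs with G); column 2 varies independently.
toy_msa = [
    "AGKW",
    "AGRW",
    "AWKG",
    "AWRG",
]

print(mutual_information(toy_msa, 1, 3))  # prints 1.0 (perfectly coupled)
print(mutual_information(toy_msa, 1, 2))  # prints 0.0 (independent)
```

A high score between two columns is the kind of hint described above: the positions change in tandem, so they are candidates for being in physical contact. With thousands of sequences instead of four, such signals become statistically meaningful.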
This evolutionary information feeds into a neural network architecture called the Evoformer, a stack of forty-eight blocks built around attention mechanisms, the same mathematics that powers large language models like ChatGPT. The Evoformer processes co-evolutionary signals alongside learned representations of amino acid chemistry and produces a refined picture of how every pair of residues relates in space, which acts as a set of spatial constraints. A separate structure module takes those constraints and iteratively refines a three-dimensional model, adjusting atomic positions until the predicted structure satisfies them. The system was trained on approximately one hundred and seventy thousand experimentally determined structures from the Protein Data Bank.
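For readers curious what "attention" means mathematically, the following is a minimal sketch of generic scaled dot-product attention in plain Python. It is the textbook operation, not the Evoformer's actual attention variants, which add further machinery on top of it. All names and the tiny example vectors are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Turn raw scores into positive weights that sum to one."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector],
              values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        outputs.append([
            sum(w * v[col] for w, v in zip(weights, values))
            for col in range(len(values[0]))
        ])
    return outputs


# Two identical keys receive equal weight, so the output is the
# plain average of the two value vectors.
result = attention(
    queries=[[1.0, 1.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
print(result)
```

Each output is a blend of the values, weighted by how well the query matches each key. In a protein setting, this lets every position gather information from the positions most relevant to it, however far apart they sit in the chain.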
AlphaFold also provides a confidence score, pLDDT (predicted Local Distance Difference Test), for each residue of its prediction, on a scale from zero to one hundred. Regions above ninety are predicted with high confidence and are typically reliable for detailed structural analysis. Regions below fifty often correspond to genuinely disordered parts of the protein that do not adopt a fixed structure. This transparency tells researchers not just what AlphaFold thinks the structure is, but how much to trust each part of the prediction.
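In practice, researchers often summarize a prediction by sorting residues into confidence bands. The sketch below does that for a list of per-residue pLDDT values. The ninety and fifty thresholds come from the text above; the intermediate cut at seventy follows the four-band color scheme used by the AlphaFold Protein Structure Database, and the handling of values exactly on a boundary is an assumption. The function names and example scores are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

BANDS = ("very high", "confident", "low", "very low")


def plddt_band(score: float) -> str:
    """Map one pLDDT value (0-100) to a confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score > 90.0:
        return "very high"   # usually suitable for detailed analysis
    if score > 70.0:
        return "confident"   # backbone generally trustworthy
    if score > 50.0:
        return "low"         # treat with caution
    return "very low"        # often intrinsically disordered


def band_fractions(scores: Iterable[float]) -> Dict[str, float]:
    """Fraction of residues falling in each confidence band."""
    counts = Counter(plddt_band(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one score")
    return {band: counts[band] / total for band in BANDS}


# Hypothetical per-residue scores for a short eight-residue region.
scores = [95.2, 91.0, 88.4, 72.5, 61.0, 45.3, 30.1, 93.7]
print(band_fractions(scores))
```

For a vaccine-design question such as whether a mutated residue sits in a well-defined surface loop, a summary like this is a quick first check: a mutation buried in a "very low" region says little about the local structure either way.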
4.3: AlphaFold 3 and Beyond
In May 2024, DeepMind unveiled AlphaFold 3, extending beyond single protein chains to predict how proteins interact with DNA, RNA, small molecules, and other proteins. The architecture changed substantially: the Evoformer gave way to a leaner successor called the Pairformer, and the structure module was replaced by a diffusion-based model, the same class of generative AI used in image generators like DALL-E, but operating in three-dimensional molecular space. AlphaFold 3 can predict the structure of an entire molecular complex: a protein bound to a drug molecule, a transcription factor wrapped around DNA. For drug design, where understanding how a molecule fits into a protein's binding pocket is the central challenge, this has immediate practical value.
In October 2024, the Nobel Prize in Chemistry was awarded one half jointly to Demis Hassabis and John Jumper of DeepMind for AlphaFold, and the other half to David Baker of the University of Washington for computational protein design. The AlphaFold Protein Structure Database now contains predicted structures for more than two hundred million proteins, covering nearly every catalogued protein known to science, compared to roughly two hundred thousand structures determined by experiment over fifty-plus years.
4.4: What AlphaFold Meant for Rosie
For Conyngham, AlphaFold was the tool that made the leap from "list of mutations" to "vaccine design" possible. Once he had Rosie's mutation list, he needed to know what the corresponding mutated proteins looked like in three dimensions. Which parts were exposed on the surface, where the immune system could see them? How did the mutated region sit relative to the rest of the protein? Would the neoantigen peptide fold so that MHC molecules could grab and display it? Five years earlier, answering these questions would have required months of experimental lab work or simply gone unanswered. AlphaFold answered them computationally.
The AlphaFold Server is free for non-commercial research. Anyone can submit a protein sequence and receive a predicted structure, often within minutes. ColabFold, a community-built implementation, runs on free Google Colab GPUs, so you do not even need your own hardware. Nobel Prize-winning technology, available to anyone with a web browser. Conyngham used these tools to model the mutated proteins, visualize where neoantigens would sit on the protein surface, and make informed decisions about which targets to include in the vaccine.
Key Takeaways
- AlphaFold 2 effectively solved the 50-year protein structure prediction problem at CASP14 in 2020, with a median GDT of 92.4, comparable to experimental accuracy.
- pLDDT scores tell you how confident AlphaFold is in each part of the predicted structure, which is critical for vaccine target assessment.
- AlphaFold 3 extends to protein–DNA, protein–small molecule, and molecular complex prediction, opening drug design applications.
- The AlphaFold Protein Structure Database contains predicted structures for ~200 million proteins, freely available to any researcher.