Understanding Genetics in the Age of AI
Chapter 3

Proteins, the Machines That Run Your Body

3.1: What Proteins Actually Do

If DNA is the instruction manual and RNA is the messenger, proteins are the workers that build and run the factory. Almost everything your body does (digesting food, fighting infections, sending nerve signals, carrying oxygen, contracting muscles) is done by proteins. Enzymes catalyze chemical reactions. Hemoglobin carries oxygen. Antibodies neutralize invaders. Receptors on cell surfaces detect signals from other cells. Your body produces more than twenty thousand different kinds of protein, and their diversity of function is vast.

All proteins are built from the same set of twenty amino acids, strung together in a specific order determined by the encoding gene. Each amino acid has a different chemical personality. Some are hydrophilic: they love water and tend to sit on the protein's exterior. Others are hydrophobic: they avoid water and tend to bury themselves in the interior. Some carry positive charges, others negative. A typical protein is a chain of three hundred to five hundred amino acids. Change the gene, change the sequence. Change the sequence, change the protein.
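The "chemical personality" idea can be made concrete with a small sketch. It assumes the Kyte-Doolittle hydropathy scale, one of several published scales, and a deliberately crude rule that treats any positive value as hydrophobic. The fragment used is the start of mature human beta-globin, where the sickle-cell mutation swaps a glutamate (E) for a valine (V). The function names are illustrative, not taken from any library.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def classify(residue):
    """Label a one-letter amino acid code by the sign of its hydropathy value.

    This is a simplification: residues near zero are not strongly either.
    """
    return "hydrophobic" if KYTE_DOOLITTLE[residue] > 0 else "hydrophilic"


def substitute(sequence, position, new_residue):
    """Return a copy of sequence with one residue replaced (0-based position)."""
    return sequence[:position] + new_residue + sequence[position + 1:]


normal = "VHLTPEEK"                  # illustrative fragment
mutant = substitute(normal, 5, "V")  # single amino acid change, E to V

for name, seq in (("normal", normal), ("mutant", mutant)):
    labels = ", ".join(f"{aa}:{classify(aa)}" for aa in seq)
    print(name, seq, "->", labels)
```

One changed letter turns a water-loving, negatively charged position into a water-avoiding one, which is the kind of change that can alter how a protein folds or sticks to its neighbors.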

But a protein's function depends not just on its amino acid sequence but on the three-dimensional shape the chain folds into. A chain of amino acids does not stay a floppy string. It twists and collapses into a precise, compact structure (helices, flat sheets, loops) driven by the physics and chemistry of how amino acids interact with each other and with water. The shape determines what the protein binds to, what reactions it catalyzes, how other molecules interact with it. This took scientists decades to fully appreciate.

Figure: Protein folding in three stages. The final compact shape is determined by the amino acid sequence, which is determined by the gene. Teal = hydrophilic (water-loving); gray = hydrophobic (water-fearing).

Stage 1: A newly synthesized chain of 12 amino acids, the raw output of the ribosome. At this stage the chain is floppy and non-functional.

Stage 2: Hydrophobic residues avoid the surrounding water and begin migrating toward each other. The chain starts to curve and collapse. This is driven by thermodynamics; no external energy input is required.

Stage 3: The folded protein, with its hydrophobic core buried away from water and its hydrophilic surface facing the cell's watery interior. Shape = function. A single amino acid change from a cancer mutation can alter this shape, creating a neoantigen the immune system can recognize.

3.2: The Protein Folding Problem

In 1969, a molecular biologist named Cyrus Levinthal posed a paradox that would haunt the field for fifty years. Consider a modest protein of just one hundred amino acids. Each amino acid can adopt multiple orientations relative to its neighbors. If you enumerated every possible three-dimensional configuration, the number would be on the order of ten to the power of three hundred. If the protein tried each one at a rate of trillions per second, it would take longer than the age of the universe to find the right shape. Yet real proteins fold correctly in milliseconds to seconds. They are not searching randomly.
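The arithmetic behind the paradox can be checked with a short sketch. The number of shapes each amino acid can adopt is an assumption rather than a measured value, so the code tries several. About a thousand states per residue reproduces the ten-to-the-three-hundred figure, and even three states per residue leaves a search far longer than the age of the universe. Working in base-10 logarithms keeps the numbers manageable.

```python
import math

AGE_OF_UNIVERSE_SECONDS = 4.35e17  # roughly 13.8 billion years


def log10_search_time_seconds(n_residues, states_per_residue, trials_per_second):
    """Return log10 of the seconds needed to try every conformation once.

    Assumes each residue independently adopts one of `states_per_residue`
    shapes, so the total count is states_per_residue ** n_residues.
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    return log10_conformations - math.log10(trials_per_second)


# Hypothetical state counts; one trillion trials per second, as in the text.
for states in (3, 10, 1000):
    log_t = log10_search_time_seconds(100, states, 1e12)
    log_ratio = log_t - math.log10(AGE_OF_UNIVERSE_SECONDS)
    print(f"{states:>4} states per residue: about 10^{log_t:.0f} seconds, "
          f"or 10^{log_ratio:.0f} times the age of the universe")
```

The conclusion does not depend on the exact assumption: any exhaustive search over a hundred-residue chain is hopeless.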

The resolution: protein folding is guided by an energy landscape. The protein does not try every shape. Local structures form first (a helix here, a sheet there), and the chain progressively collapses toward its lowest-energy, most stable configuration. Specialized helper proteins called chaperones help prevent misfolding. The principle that sequence determines structure had already been demonstrated experimentally by Christian Anfinsen in 1961, when he showed that a denatured protein could spontaneously refold into its correct shape. He won the Nobel Prize in Chemistry in 1972.
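A toy model shows why an energy landscape changes the arithmetic. This is a sketch under strong assumptions, not a simulation of real folding. Each position has a few possible states, the "energy" is simply the number of positions not yet in their native state, and the chain changes one position at a time, keeping only moves that lower the energy. Real residues interact with one another, so real landscapes are much rougher. All names here are hypothetical.

```python
import random


def energy(conformation, native):
    """Toy energy: the number of positions not yet in their native state."""
    return sum(c != n for c, n in zip(conformation, native))


def downhill_fold(native, n_states, rng):
    """Fold by local moves, keeping only changes that lower the toy energy.

    Returns the final conformation and the number of trial moves evaluated.
    """
    conformation = [rng.randrange(n_states) for _ in native]
    evaluations = 0
    for position in range(len(native)):
        for state in range(n_states):
            trial = conformation.copy()
            trial[position] = state
            evaluations += 1
            if energy(trial, native) < energy(conformation, native):
                conformation = trial  # accept the downhill move
                break
    return conformation, evaluations


rng = random.Random(0)
n_residues, n_states = 100, 3
native = [rng.randrange(n_states) for _ in range(n_residues)]
folded, evaluations = downhill_fold(native, n_states, rng)

print("reached native state:", folded == native)
print("trial moves evaluated:", evaluations)
print("conformations in total:", n_states ** n_residues)
```

Following the slope takes at most a few hundred trial moves here, while the full space contains three to the power of one hundred conformations. The contrast between guided and exhaustive search is the point, not the specific numbers.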

For Conyngham's project, this matters directly. A cancer mutation that changes a single amino acid can alter a protein's three-dimensional shape. As a result, the cell's surface can display a different molecular fragment, a neoantigen, and the immune system may recognize it as foreign. To design a vaccine targeting that neoantigen, you need to know what the mutated protein looks like in three dimensions: which parts are exposed, which are buried, and how the mutated region interacts with the immune system's detection machinery. For decades, the only way to find out was laborious experimentation. Then AlphaFold arrived.

3.3: Why We Couldn't Predict Shapes Until Now

Before AlphaFold, determining a protein's three-dimensional structure required experimental techniques that are slow, expensive, and technically demanding. X-ray crystallography, the gold standard, requires growing a crystal of purified protein, bombarding it with X-rays, and interpreting the diffraction pattern. Not all proteins crystallize easily, and the process can take months or years. Cryo-electron microscopy (cryo-EM) requires equipment costing millions of dollars. Over more than fifty years, the Protein Data Bank (PDB) has accumulated roughly two hundred thousand experimentally determined structures. That is an extraordinary achievement, but it covers less than a tenth of one percent of all known protein sequences.

In 1994, computational biologist John Moult launched CASP, the Critical Assessment of protein Structure Prediction, to evaluate how well computers could predict protein shapes from sequences alone. Teams would receive sequences, submit predictions, and be scored against experimental structures that had been solved but not yet published. For twenty-five years, progress was incremental. On the hardest targets, the best methods scored in the forties and fifties on a scale called GDT (Global Distance Test), where one hundred is a perfect prediction and anything above about ninety is considered comparable to experimental accuracy. Then, in 2020, the curve broke.
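A zero-to-one-hundred score of this kind can be sketched as follows. The example uses the Global Distance Test total score (GDT_TS), the measure CASP reports. It averages the percentage of residues whose predicted position lies within 1, 2, 4, and 8 angstroms of the experimental position. The full method also searches for the best superposition of the two structures at each cutoff. This simplified version assumes the per-residue distances have already been measured after a single alignment, and the input values are invented for illustration.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS from per-residue distances in angstroms.

    For each cutoff, compute the fraction of residues within that distance,
    then average the fractions and scale to 0-100.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Invented distances for a six-residue toy prediction.
example = [0.5, 0.9, 1.5, 3.0, 6.0, 12.0]
print(f"GDT_TS = {gdt_ts(example):.1f}")
```

Because the score averages over loose and tight cutoffs, a prediction with the right overall fold but imprecise details lands in the middle of the scale, while a score above ninety requires most residues to sit within an angstrom or two of their true positions.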

Key Takeaways

  • Proteins are built from 20 amino acids; their 3D shape — determined by sequence — is what gives them their function.
  • Hydrophobic residues bury themselves in the protein core; hydrophilic residues face the watery cell interior.
  • A single amino acid change from a cancer mutation can alter protein shape and create a neoantigen the immune system can target.
  • For 50 years, predicting protein shape from sequence alone was unsolved — until AlphaFold.