Chapter 7: The Full Pipeline — Understanding Genetics in the Age of AI

7.1: Overview

Everything in the previous six chapters (DNA, the genetic code, cancer mutations, genome sequencing, protein folding, AlphaFold, the immune system, MHC binding, mRNA vaccines, lipid nanoparticles) converges into a single pipeline: a process that transforms a tumor biopsy into a personalized cancer vaccine. For Rosie, this pipeline, from sequencing to vaccine design, took less than two months.


1. Tumor DNA Sequencing — ~2 weeks

For Rosie: Paired-end sequencing at UNSW Ramaciotti Centre for Genomics. Illumina platform, ~50× coverage of tumor and matched normal blood. Output: two FASTQ files totalling ~48 GB. Cost: $3,000 AUD. Both tumor biopsy and blood sample were required to distinguish cancer mutations from inherited variants.

2. Variant Calling — ~3 days

For Rosie: FASTQ files aligned to the canine reference genome (CanFam4) using BWA-MEM2. GATK Mutect2 identified somatic mutations by comparing tumor vs normal. Output: a VCF file containing thousands of candidate mutations, filtered to ~300 high-confidence variants affecting protein-coding regions. Ran on an ordinary cloud computing instance; no specialized hardware needed.

3. Neoantigen Prediction — ~1 day

For Rosie: Each mutation translated to all overlapping 8–11 amino acid peptides spanning the mutation site. NetMHCpan 4.1 predicted binding affinity to Rosie's MHC alleles (in dogs these are DLA alleles, the canine counterpart of human HLA). Candidates with predicted affinity below 500 nM were ranked by Vaxrank, incorporating gene expression data. Final shortlist: 17 high-confidence neoantigen candidates from ~300 mutations.

4. Protein Structure Prediction — hours

For Rosie: Top candidates submitted to AlphaFold Server. Predicted structures visualized in PyMOL to confirm neoantigen peptides sat on the protein surface (not buried). pLDDT scores above 90 for all selected regions. Structural confirmation reduced the list to 12 final vaccine targets. Entire step took one afternoon from a standard laptop, with the structure predictions themselves computed by AlphaFold Server.

5. mRNA Vaccine Design — ~1 day

For Rosie: Poly-epitope mRNA construct: 12 neoantigen peptides connected by flexible glycine–serine linkers. Codon-optimized for canine codon usage using CAMEOS. N1-methylpseudouridine substituted for uridine to reduce inflammation. 5' cap, poly-A tail, and UTR sequences selected for maximum stability. The final construct: a 2,100-nucleotide text file, emailable and reproducible anywhere.

6. Manufacturing & Administration — ~3 months

For Rosie: mRNA synthesized by Prof. Pall Thordarson at the UNSW RNA Institute. Formulated into lipid nanoparticles. Ethics approval via Prof. Rachel Allavena at University of Queensland — ~100 pages of documentation, initially denied, resubmitted, approved after 3 months. First injection: December 2025. Booster: February 2026. By mid-March 2026: largest tumor shrunk 75%, most others 50–75%.

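The complete pipeline from biopsy to injected vaccine. Step 1 requires a sequencing facility (for Rosie, the UNSW Ramaciotti Centre for Genomics); Steps 2–5 are computational and can run anywhere with a laptop and internet access. Step 6 requires a specialist RNA synthesis lab and ethics approval.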
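The ~50× coverage in Step 1 can be connected to read counts with the standard mean-coverage relation C = N × L / G (number of reads × read length / genome size). The minimal sketch below solves it for N. The ~2.4 Gb dog genome size and 150-base read length are assumptions for illustration, not figures reported for Rosie's run.

```python
def reads_needed(coverage: float, genome_size_bp: float, read_length_bp: float) -> float:
    """Solve the mean-coverage relation C = N * L / G for N, the number of reads."""
    return coverage * genome_size_bp / read_length_bp


# Assumed inputs (not from Rosie's data): ~2.4 Gb dog genome, 150-base reads.
# N counts individual reads, so with paired-end data each fragment contributes two.
per_sample = reads_needed(coverage=50, genome_size_bp=2.4e9, read_length_bp=150)
both_samples = 2 * per_sample  # tumor plus matched normal, each at ~50x

print(f"reads per sample: {per_sample:,.0f}")
print(f"reads for tumor + normal: {both_samples:,.0f}")
```

Under these assumptions each sample needs roughly 800 million reads and the pair about 1.6 billion, which is why the raw FASTQ output runs to tens of gigabytes.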

7.2: Step 1 — Tumor DNA Sequencing

The pipeline begins with biological material: two tissue samples, one from the tumor and one from healthy tissue (usually blood). Paired sequencing is essential because you need to distinguish somatic mutations (cancer-specific changes) from germline variants (inherited differences that make every individual unique). Without the healthy comparison, you cannot tell which mutations belong to the cancer. For Rosie, paired sequencing was performed at the UNSW Ramaciotti Centre for Genomics at a cost of about three thousand Australian dollars.
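The logic of paired sequencing can be illustrated with a deliberately naive sketch: a variant counts as somatic only if it appears in the tumor but not in the matched normal. Real callers such as Mutect2 weigh read counts and error models rather than taking an exact set difference, and the variant tuples below are made-up examples.

```python
# A variant is keyed here as (chromosome, position, reference base, alternate base).
Variant = tuple[str, int, str, str]


def naive_somatic(tumor: set[Variant], normal: set[Variant]) -> set[Variant]:
    """Variants observed in the tumor that are absent from the matched normal sample."""
    return tumor - normal


# Made-up calls for illustration only.
tumor_calls = {("chr5", 1000, "C", "T"), ("chr12", 2500, "G", "A"), ("chr1", 42, "A", "G")}
normal_calls = {("chr1", 42, "A", "G")}  # germline: present in both samples

for variant in sorted(naive_somatic(tumor_calls, normal_calls)):
    print(variant)
```

Without the normal sample, the inherited chr1 variant would be indistinguishable from a tumor mutation.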

The raw output is FASTQ files: enormous text files containing hundreds of millions to billions of short DNA reads, each with a quality score for every base. These reads are aligned to a reference genome with an aligner such as BWA-MEM2, producing a sorted, indexed BAM file that maps every sequenced fragment to its chromosomal position. At this stage, you have a comprehensive picture of the DNA in both tumor and healthy tissue.
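Each FASTQ record spans four lines: an `@` header, the bases, a `+` separator, and one quality character per base. Quality characters encode Phred scores, most commonly as the ASCII code minus 33 (the Phred+33 convention); a score Q means an estimated error probability of 10^(-Q/10). The minimal parser below assumes uncompressed four-line records with Phred+33 encoding; real files are usually gzip-compressed and could be opened with `gzip.open(path, "rt")` instead.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]  # one Phred score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with 4 lines per record (Phred+33 assumed)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(ch) - offset for ch in quality])


def error_probability(phred: int) -> float:
    """A Phred score Q corresponds to an error probability of 10 ** (-Q / 10)."""
    return 10 ** (-phred / 10)


example = io.StringIO("@read1\nACGT\n+\nII5#\n")
for record in parse_fastq(example):
    print(record.name, record.sequence, record.qualities)
    print([error_probability(q) for q in record.qualities])
```

Here `I` decodes to Q40 (about 1 error in 10,000) and `#` to Q2, a base the aligner and variant caller should barely trust.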

7.3: Step 2 — Variant Calling

Variant calling compares the tumor and normal sequences to identify somatic mutations. The most widely used tool is GATK's Mutect2, from the Broad Institute, which uses statistical models to distinguish real mutations from sequencing errors, alignment artifacts, and germline variants. The output is a VCF (Variant Call Format) file: a tab-delimited text file with one row per detected mutation, listing its genomic location, the reference base, the alternate base, and quality metrics.
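A VCF data row begins with eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), and lines starting with `#` are headers. After a filtering step such as GATK's FilterMutectCalls, the FILTER column reads `PASS` for calls that survived. The minimal sketch below reads only the fixed columns and keeps PASS rows; the example text is made up and heavily truncated.

```python
from typing import Iterable, Iterator, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int            # 1-based position, as defined by the VCF specification
    ref: str
    alts: list[str]     # ALT can hold several comma-separated alleles
    qual: str           # kept as text because "." means missing
    filter_status: str  # "PASS" or the names of failed filters


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield data rows, skipping '##' meta-information lines and the '#CHROM' header."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, qual, filt = line.rstrip("\n").split("\t")[:7]
        yield VcfRecord(chrom, int(pos), ref, alt.split(","), qual, filt)


def passing(records: Iterable[VcfRecord]) -> list[VcfRecord]:
    """Keep only rows whose FILTER column is PASS."""
    return [r for r in records if r.filter_status == "PASS"]


# Made-up example; real Mutect2 output carries many more INFO and per-sample fields.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr5\t1000\t.\tC\tT\t.\tPASS\t.\n"
    "chr12\t2500\t.\tG\tA\t.\tweak_evidence\t.\n"
)
for record in passing(parse_vcf_lines(vcf_text.splitlines())):
    print(record.chrom, record.pos, record.ref, ">", ",".join(record.alts))
```

In practice a dedicated VCF library is safer, since real files are compressed and indexed, but the column layout is the same.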

A typical solid tumor harbors one hundred to ten thousand somatic mutations, depending on cancer type and mutational burden. These are filtered for quality, removing low-confidence calls, variants in problematic regions, and mutations unlikely to affect protein function. Each remaining mutation is annotated: does it change an amino acid (missense), introduce a premature stop codon (nonsense), or disrupt the reading frame (frameshift)? Only mutations that change the protein sequence are neoantigen candidates, and among those, only a small fraction will produce neoantigens the immune system can recognize.
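Annotation tools (for example Ensembl VEP or SnpEff) map each variant onto transcript models to find the codon it falls in. Assuming that codon is already known, the core classification reduces to translating the reference and alternate codons with the standard genetic code, as in this simplified sketch. The function names are hypothetical, and the sketch handles only single-codon substitutions and the frameshift length rule.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"  # protein unchanged: not a neoantigen candidate
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    if ref_aa == "*":
        return "stop_lost"   # stop codon replaced by an amino acid
    return "missense"


def is_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """An indel shifts the reading frame when its net length change is not a multiple of 3."""
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


print(classify_codon_change("GAG", "GTG"))  # Glu -> Val
print(classify_codon_change("CAG", "TAG"))  # Gln -> stop
print(is_frameshift("A", "AT"))             # 1-base insertion
```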

7.4: Step 3 — Neoantigen Prediction

The pipeline narrows here, and AI becomes indispensable. Of all somatic mutations in a tumor, only about 0.5 to 2 percent produce peptides that bind strongly enough to MHC molecules to be displayed on the cell surface and potentially recognized by T-cells.

NetMHCpan is the key tool. For each mutated protein, the pipeline generates every peptide fragment of eight to eleven amino acids that spans the mutation site, and NetMHCpan predicts how strongly each one binds the patient's specific MHC molecules. The usual threshold is a predicted affinity (IC50) below five hundred nanomolar; lower values mean tighter binding. Peptides that do not meet it are filtered out.
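The peptide-generation and threshold steps can be sketched in a few lines. A window of length k covers the mutated residue when it starts no more than k − 1 positions before it, so a mutation far from either end yields k peptides per length (8 + 9 + 10 + 11 = 38 in total). The protein fragment, mutation position, and affinity values below are hypothetical; the affinities stand in for NetMHCpan output and are not real predictions.

```python
def mutation_spanning_peptides(protein: str, mut_index: int,
                               lengths: range = range(8, 12)) -> list[tuple[str, int]]:
    """Return (peptide, start) pairs for every k-mer that contains residue `mut_index`.

    `mut_index` is 0-based; `lengths` defaults to 8- to 11-mers.
    """
    if not 0 <= mut_index < len(protein):
        raise IndexError("mutation index outside the protein")
    peptides = []
    for k in lengths:
        # A window [start, start + k) covers mut_index when start <= mut_index < start + k.
        first = max(0, mut_index - k + 1)
        last = min(mut_index, len(protein) - k)
        for start in range(first, last + 1):
            peptides.append((protein[start:start + k], start))
    return peptides


def strong_binders(predicted_ic50_nm: dict[str, float],
                   threshold_nm: float = 500.0) -> dict[str, float]:
    """Keep peptides whose predicted IC50 is below the threshold (lower nM = tighter binding)."""
    return {pep: nm for pep, nm in predicted_ic50_nm.items() if nm < threshold_nm}


# Hypothetical mutant protein fragment; the mutated residue sits at 0-based index 15.
mutant = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
candidates = mutation_spanning_peptides(mutant, mut_index=15)
print(len(candidates), "candidate peptides")

# Made-up affinities standing in for NetMHCpan output (not real predictions).
predictions = {candidates[0][0]: 35.0, candidates[1][0]: 820.0}
print(strong_binders(predictions))
```

Near either end of a protein fewer windows fit, which the `max`/`min` bounds handle.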

Surviving candidates are ranked using tools like Vaxrank (from the OpenVax project) or pVACtools, integrating binding affinity, gene expression level, whether the mutation creates a novel sequence absent from the normal proteome, and other metrics. From thousands of mutations, this pipeline typically yields ten to twenty high-confidence neoantigen candidates for vaccine inclusion, the targets most likely to provoke a strong, tumor-specific immune response.
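Ranking tools combine several signals into one ordering. The toy score below is hypothetical and is not the actual Vaxrank or pVACtools scoring: it multiplies a log-scaled binding term by log-scaled expression, and gives zero to non-binders, unexpressed genes, and peptides that also occur in the normal proteome. The candidate peptides and their values are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    ic50_nm: float         # predicted MHC binding affinity; lower is stronger
    expression_tpm: float  # expression of the source gene in the tumor
    novel: bool            # True if the peptide is absent from the normal proteome


def toy_score(c: Candidate, max_ic50_nm: float = 500.0) -> float:
    """Hypothetical composite score for illustration; NOT Vaxrank's or pVACtools' formula."""
    if c.ic50_nm >= max_ic50_nm or c.expression_tpm <= 0 or not c.novel:
        return 0.0
    # Log-scaled binding term: 1.0 at <= 1 nM, falling to 0.0 at the threshold.
    binding = 1.0 - math.log(max(c.ic50_nm, 1.0)) / math.log(max_ic50_nm)
    return binding * math.log1p(c.expression_tpm)


def rank(cands: list[Candidate], top_n: int = 20) -> list[Candidate]:
    """Sort by descending score, drop zero-score peptides, keep the top_n."""
    scored = [c for c in cands if toy_score(c) > 0]
    return sorted(scored, key=toy_score, reverse=True)[:top_n]


# Made-up candidates.
pool = [
    Candidate("QRQISFVK", ic50_nm=20.0, expression_tpm=50.0, novel=True),
    Candidate("RQISFVKS", ic50_nm=300.0, expression_tpm=200.0, novel=True),
    Candidate("ISFVKSHF", ic50_nm=10.0, expression_tpm=0.0, novel=True),   # gene not expressed
    Candidate("SFVKSHFS", ic50_nm=800.0, expression_tpm=90.0, novel=True),  # weak binder
]
for c in rank(pool):
    print(c.peptide, round(toy_score(c), 3))
```

Multiplying the terms means a candidate must do reasonably well on both binding and expression; a strong binder from a silent gene scores zero.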

7.5: Step 4 — Protein Structure Prediction

With ranked candidates, the next question: what do these mutated proteins look like in three dimensions? AlphaFold enters the pipeline. By predicting the mutated protein's structure, you can assess whether the neoantigen peptide sits on the surface (accessible to the immune system) or is buried in the interior (where it might not be processed and displayed). You can also model how the peptide interacts with MHC molecules, adding structural evidence to the sequence-based binding predictions.

Conyngham submitted protein sequences to the AlphaFold Server, which returned predicted structures within minutes. He visualized the mutant proteins, saw where mutations fell in the structure, and assessed prediction confidence using pLDDT scores. This structural information refined neoantigen selection, adding three-dimensional biological insight to the sequence-based predictions. Five years ago, this step would have required months of lab work or been skipped entirely. Now it takes an afternoon.
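AlphaFold-family models write per-residue confidence (pLDDT, 0 to 100) into the B-factor field of their structure files. AlphaFold Server returns mmCIF files, so the sketch below assumes the model has been saved in PDB format first (PyMOL can do this) and averages pLDDT over a chosen residue range, such as the region around a neoantigen. The function names and demo atoms are hypothetical.

```python
def mean_plddt(pdb_text: str, chain: str, first_res: int, last_res: int) -> float:
    """Average pLDDT over residues first_res..last_res (inclusive) of one chain.

    Assumes PDB-format output from an AlphaFold-family model, where pLDDT is stored in
    the B-factor field (columns 61-66). One value per residue is taken from its CA atom.
    """
    scores = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[21] != chain:
            continue
        if first_res <= int(line[22:26]) <= last_res:
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found in the requested residue range")
    return sum(scores) / len(scores)


def atom_line(serial: int, name: str, res_name: str, chain: str, res_num: int, plddt: float) -> str:
    """Build a fixed-width PDB ATOM record with dummy coordinates (demo only)."""
    return (f"ATOM  {serial:5d} {name:<4} {res_name:>3} {chain}{res_num:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


demo_pdb = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1, 95.5),
    atom_line(2, " CA ", "MET", "A", 1, 95.5),
    atom_line(3, " CA ", "LYS", "A", 2, 91.0),
    atom_line(4, " CA ", "THR", "A", 3, 60.0),
])
print(mean_plddt(demo_pdb, "A", 1, 2))  # residues 1-2 only
print(mean_plddt(demo_pdb, "A", 1, 3) > 90)
```

A check such as `mean_plddt(...) > 90` mirrors the "pLDDT above 90" criterion described for Rosie's selected regions.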

7.6: Step 5 — mRNA Vaccine Design

With final targets selected, the next step is designing the mRNA construct. The standard approach is a poly-epitope design: a single mRNA molecule encoding multiple neoantigen peptides strung together, separated by linker sequences for efficient processing. One injection trains the immune system against ten or more tumor-specific targets simultaneously, reducing the chance that cancer escapes by losing a single antigen.
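The string-assembly step of a poly-epitope design can be shown as a minimal sketch. The GGGGS linker (a common flexible glycine-serine motif), the leading methionine, and the epitope strings are assumptions for illustration; the sketch covers only the amino-acid sequence and records where each epitope lands.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_polyepitope(epitopes: list[str], linker: str = "GGGGS",
                      add_start_met: bool = True) -> tuple[str, list[tuple[int, int]]]:
    """Join epitopes with a linker; return the protein and each epitope's [start, end) span."""
    protein = "M" if add_start_met else ""
    spans = []
    for i, epitope in enumerate(epitopes):
        epitope = epitope.upper()
        if not epitope or not set(epitope) <= VALID_AMINO_ACIDS:
            raise ValueError(f"not a valid peptide: {epitope!r}")
        if i > 0:
            protein += linker  # flexible glycine-serine spacer between epitopes
        start = len(protein)
        protein += epitope
        spans.append((start, len(protein)))
    return protein, spans


# Hypothetical epitopes; a real construct would use the final vaccine targets.
protein, spans = build_polyepitope(["QRQISFVK", "RQISFVKS", "LEERLGLI"])
print(protein)
for start, end in spans:
    print(start, end, protein[start:end])
```

Keeping the spans makes it easy to confirm later that every target survived assembly intact.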

The mRNA sequence is optimized using codon optimization algorithms, selecting efficient codons for each amino acid. Modified nucleosides, specifically N1-methylpseudouridine, replace standard uridine to reduce inflammation and increase stability. Grok helped Conyngham with aspects of construct design, including codon optimization and structural element selection. The vaccine design is a digital product, a sequence of letters in a text file, emailable, shareable, reproducible anywhere in the world. The design process, thanks to AI, can be performed by anyone with pipeline knowledge. What cannot be done from a laptop is the next step.
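The crudest form of codon optimization encodes every amino acid with a single preferred codon. The sketch below does exactly that. Its codon choices are an illustrative assumption rather than real canine codon usage, and practical optimization also weighs factors such as GC content and RNA secondary structure, which this sketch only reports.

```python
# One "preferred" codon per amino acid, in the DNA alphabet. These choices are an
# illustrative assumption, NOT a measured canine codon-usage table.
PREFERRED_CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",  # stop codon
}


def back_translate(protein: str) -> str:
    """Encode a protein as an mRNA coding sequence (A/C/G/U) ending in a stop codon."""
    dna = "".join(PREFERRED_CODON[aa] for aa in protein.upper()) + PREFERRED_CODON["*"]
    return dna.replace("T", "U")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases, one property that codon optimizers often tune."""
    return (seq.count("G") + seq.count("C")) / len(seq)


cds = back_translate("MQRQISFVKGGGGSLEERLGLI")
print(cds)
print(f"length {len(cds)} nt, GC {gc_fraction(cds):.2f}")
# Every U in this sequence is a position where N1-methylpseudouridine would be used.
print("uridine positions:", cds.count("U"))
```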

7.7: Step 6 — Manufacturing and Administration

Designing an mRNA vaccine on a computer is one thing. Turning it into an injectable treatment is another. Professor Pall Thordarson at the UNSW RNA Institute synthesized the mRNA and formulated it into lipid nanoparticles, requiring specialized equipment, clean-room conditions, and deep RNA chemistry expertise. This cannot be done in a kitchen or a garage, and it is important to say so. The gap between digital design and physical product remains significant, and crossing it requires trained professionals and institutional infrastructure.

The other major hurdle was ethics approval. Conyngham and his collaborators prepared approximately one hundred pages of documentation to justify using an experimental treatment on an animal. UNSW initially denied the application. Professor Rachel Allavena at the University of Queensland took on the ethics process, which took three months. In December 2025, Rosie received her first injection. A booster followed in February 2026. By mid-March 2026, her largest tumor had shrunk by seventy-five percent, and most other tumors had shrunk by fifty to seventy-five percent. One neoantigen did not elicit a response, and a second vaccine targeting additional neoantigens was in preparation. Sequencing to vaccine design in under two months. Ethics application to injection in three months. First dose to measurable shrinkage in roughly three months. The entire arc, from a desperate search query to a dog chasing rabbits, spanned less than a year.

Key Takeaways

The pipeline has 6 stages: sequencing → variant calling → neoantigen prediction → structure → mRNA design → manufacturing.
Step 1 still requires a sequencing facility; Steps 2–5 are computational and can be performed by anyone with pipeline knowledge, a laptop, and internet access.
Step 6 (manufacturing and ethics) still requires specialized labs and institutional infrastructure — this cannot be DIY.
For Rosie: sequencing to vaccine design took under 2 months; the first measurable tumor response came ~3 months after the first injection.
