
Published 1 July 2025 by Andrei Mihai

Folding Into Greatness: How a Physicist Helped Solve One of Biology’s Greatest Mysteries

#LINO25 started with several Lindau premieres, among them the Lecture of John M. Jumper

“I guess one of the challenges of receiving a Nobel Prize,” said John Jumper, in the introductory lecture on Monday at #LINO25, “is that you’re destined to give that lecture for the rest of your career.” It’s a small price to pay for the recognition that comes with solving one of biology’s greatest mysteries. Especially if your lecture is “a fun one,” as the Laureate said.

In 2024, John Jumper became the youngest Nobel Laureate in Chemistry in the last 70 years. He shared the prize with Demis Hassabis and David Baker. Jumper and Hassabis, both of whom work at Google DeepMind, were awarded the prize for their contributions to protein structure prediction with AlphaFold.

Yet, Jumper’s path to the Nobel was anything but conventional. He started out in theoretical physics, driven by a fascination with elegant models and the thrill of testing new ideas. “I still love physics. I still consider myself a physicist, despite evidence from my degree and the Nobel Committee to the opposite,” Jumper says.

But it was his leap into computational biology, and his bold decision to focus more on deep learning, that changed everything.

The Big Problem of Protein Folding

In the early days, Jumper’s work focused on simulation – recreating molecular interactions from physical laws. But even the most sophisticated classical models fell short. They layered approximations upon approximations: Schrödinger’s equation simplified, then fitted to force fields, then simulated over time. You can’t just stack three approximations one on top of the other and end up with an exact result, Jumper says.

This approach couldn’t work for studying proteins, and the stakes were high.

Proteins play a key role in many different biological processes and what they do depends on their 3D shape. To understand a protein’s shape, you need to understand how it folds. However, the protein folding problem proved notoriously difficult to crack.

What scientists knew about protein shapes came mostly from painstaking experiments, and progress was slow.

John M. Jumper
Immediate Q&A with Young Scientists after his Lecture

For decades, experimentalists worked to determine the structures of as many proteins as possible. However, by 2020, they had determined only about 200,000 protein structures. The human body alone contains up to 400,000 proteins, and it’s estimated that there are billions of proteins on Earth. Scaling the process was also extremely challenging.

Determining protein structures experimentally is far from easy. It requires coaxing these complex molecules into highly regular formations. This is often done through labor-intensive and expensive methods like X-ray crystallography or NMR spectroscopy.

“We have to do this incredibly hard step of convincing something weird and large and unruly to form a regular crystal,” Jumper says. He recalled reading one paper’s supplementary information that simply stated: “After a worthy year, crystals began to appear.” It was a moment that stuck with him. A year, just to kickstart the process.

There had to be a better way.

CASP, Meet AlphaFold

AlphaFold is not the first algorithmic attempt at figuring out protein folding. In fact, for over 20 years, some scientists believed (somewhat prematurely) that they had cracked the problem. Their confidence often stemmed from testing models on known structures or refining methods until they worked on specific proteins. But as John Jumper noted in his lecture, this led to a form of scientific self-deception. It worked for some proteins, but not for others.

Without a standardized, realistic test, it was impossible to tell whether these methods were truly predictive or just good at recognizing what they had already seen. That’s why CASP was set up.

Every two years, the global scientific community gathers to predict the 3D shape of a protein from its amino acid sequence. This event is known as CASP, or the Critical Assessment of protein Structure Prediction. It’s a blind challenge. Organizers collect protein structures that have been solved but not yet published, then distribute the raw sequences to research teams.
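To make the idea of a blind test concrete, here is a minimal sketch in Python. It is not how CASP actually scores entries (the assessors use their own, more elaborate measures). It only illustrates the principle: predictions are submitted first, and only then compared with an experimental structure the teams never saw. The team names, the coordinates, and the choice of a simple distance-based error are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist, sqrt


def distance_rmsd(predicted, experimental):
    """Root-mean-square difference between the two structures' pairwise distances.

    Comparing internal distances means the result does not change if one
    structure is shifted or rotated relative to the other.
    """
    if len(predicted) != len(experimental):
        raise ValueError("structures must have the same number of residues")
    pairs = list(combinations(range(len(predicted)), 2))
    if not pairs:
        raise ValueError("need at least two residues to compare")
    squared_error = sum(
        (dist(predicted[i], predicted[j]) - dist(experimental[i], experimental[j])) ** 2
        for i, j in pairs
    )
    return sqrt(squared_error / len(pairs))


def score_blind_round(submissions, hidden_structure):
    """Rank submitted predictions against a structure the teams never saw."""
    scored = [
        (team, distance_rmsd(coords, hidden_structure))
        for team, coords in submissions.items()
    ]
    return sorted(scored, key=lambda item: item[1])  # lowest error first


# Hypothetical four-residue "protein": one (x, y, z) point per residue.
hidden_structure = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]

# Hypothetical submissions, made before the structure is revealed.
submissions = {
    # Right shape, just placed elsewhere in space.
    "team_a": [(10.0, 0.0, 0.0), (13.8, 0.0, 0.0), (17.6, 0.0, 0.0), (17.6, 3.8, 0.0)],
    # Misses the bend at the end of the chain.
    "team_b": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
}

ranking = score_blind_round(submissions, hidden_structure)
for team, error in ranking:
    print(f"{team}: {error:.2f}")
```

Because the hidden structure is only used at scoring time, a method cannot do well here simply by recognizing something it has already seen.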

John M. Jumper in the Inselhalle
John M. Jumper delivered his first Lecture in Lindau on the occasion of #LINO25

CASP demonstrated that even the best attempts fell very short of truly predicting protein folding.

Then, in 2018, the first version of AlphaFold stepped in. AlphaFold was a novel artificial intelligence system that aimed to predict the 3D structure of proteins from their amino acid sequences. In its first iteration, it relied heavily on evolutionary data (comparing a protein sequence to similar ones from other organisms). This allowed it to infer which amino acids likely sit close together in 3D space.
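The intuition behind using evolutionary data can be shown with a toy example. If two positions in a protein tend to mutate together across related sequences, that hints they may be in contact in the folded structure. The sketch below measures this with mutual information between alignment columns. It is a deliberately simple stand-in, not AlphaFold's actual pipeline: the alignment is made up, and real systems use far larger alignments and more careful statistics before handing anything to a neural network.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_ab = count / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def coevolution_scores(msa):
    """Score every pair of positions in an alignment of equal-length sequences."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    columns = [[seq[i] for seq in msa] for i in range(length)]
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(length), 2)
    }


# Hypothetical alignment of one protein across six organisms.
# Positions 0 and 2 change together (A with L, G with V),
# and so do positions 1 and 3 (K with E, R with D).
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "GKVE",
    "GRVD",
]

scores = coevolution_scores(toy_msa)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for (i, j), score in ranked:
    print(f"positions {i} and {j}: {score:.3f} bits")
```

In this toy alignment, the two pairs of positions that mutate in lockstep get high scores while the unrelated pairs score zero. A table of such scores for every pair of positions is a 2D grid, which is part of why image-style networks were a natural next step.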

“We and others had been finding that if you take this kind of correlation version, the convolution version of this evolutionary information, and you run some very standard 2D machine learning on it [...] that’s kind of like image processing and computer vision. So I’ll just stick it in the standard computer vision network and we’ll train it really hard to go from evolutionary information to structural information.”

AlphaFold fared well, and its predictions were decent. But they were not trustworthy enough for most science. “We were far below what experimentalists needed to trust it.” Jumper and colleagues realized that they needed a different approach.

AlphaFold 2: Return of the AI

AlphaFold 2 represented a complete architectural overhaul. Instead of relying primarily on evolutionary correlation maps, the model introduced new ideas inspired by language models and geometric reasoning. Central to this redesign was a component called the Evoformer, a transformer-based architecture that allowed the system to jointly reason over the protein sequence and its pairwise relationships, learning both the pattern and the shape.

“We had kind of moved entirely to the machine learning side of the fence,” Jumper says.

AlphaFold 2 abandoned the idea that the network should try to simulate how proteins fold in nature. Instead, it focused purely on the end result: the final structure. Essentially, instead of trying to understand the physics behind it, AlphaFold 2 learned patterns from the data.

This strategy proved extremely effective. AlphaFold 2 stunned the scientific community at CASP14 in 2020. It predicted protein structures with a level of accuracy that was close to experimental methods, far exceeding other computational systems.

“We did really, really well,” says Jumper. “This took it to the realm of biological significance.” After decades of slow, incremental progress, protein structure prediction had finally become practical and reliable.

Beyond Just Proteins

After its landmark performance at CASP14, AlphaFold 2 didn’t remain a closed research project. DeepMind made the AlphaFold 2 model and its source code openly available. Alongside it came a database of predicted protein structures, which has since grown to over 200 million entries.

These predictions covered nearly every known protein across hundreds of organisms. The database instantly became one of the most widely used resources in biology. Researchers began using AlphaFold for everything from drug discovery to agriculture. As Jumper put it, the goal was “amplifying the work of experimentalists.” The science has come full circle: the experimental data helped build AlphaFold, and now AlphaFold is giving back to the experimental community.

“That created a really new type of resource. One of my favorite things was seeing experimentalists using this resource,” Jumper mentions.

But even this is not the end. Time was running short and Jumper only briefly mentioned what is happening now, after AlphaFold 2.

In 2024, DeepMind announced AlphaFold 3. This is a model that is not just for proteins anymore. It predicts how proteins interact with DNA, RNA, ions, and small molecules. It uses a diffusion model, similar to the tech behind AI image generation. This story is still very much new and unfolding, and the effects of AlphaFold are just starting to ripple.

Reflecting on the journey, Jumper noted that “there had yet to be a really beautiful science of protein machine learning.”

Now there is. And it’s only just beginning.

Andrei Mihai

Andrei is a science communicator and a PhD candidate in geophysics. He co-founded ZME Science, where he tries to make science accessible and interesting to everyone and has written over 2,000 pieces on various topics – though he generally prefers writing about physics and the environment. Andrei tries to blend two of the things he loves (science and good stories) to make the world a better place – one article at a time.