Published 5 December 2024 by Andrei Mihai
Nobel Prize in Chemistry 2024: Protein Folding and AI
Proteins are the molecular workhorses of life. They build and repair cells, transmit signals, and can even cause or cure disease. The human body harbours over 100,000 different proteins, each performing unique roles, making them challenging to study.
This year’s Nobel Prize in Chemistry recognized three scientists – Demis Hassabis, John Jumper, and David Baker – for leveraging artificial intelligence (AI) to decipher the secrets of proteins. Their groundbreaking contributions include solving the protein folding problem and pioneering computational protein design.
The Shape of Proteins
Proteins are chains of amino acids that fold into intricate three-dimensional shapes. These shapes are essential: because proteins interact with many other molecules and structures inside organisms, their shapes largely dictate their biological roles. If you know a protein’s structure, you can often predict its function. Conversely, knowing a protein’s function can sometimes help infer its shape, as certain functional roles are associated with specific structural motifs.
Decoding protein structure is critical to understanding diseases, designing drugs, and even developing new materials. Yet this is far from an easy task, and determining protein structures has historically been slow and painstaking.
Experimental methods such as X-ray crystallography and cryo-electron microscopy require extensive time and resources. For decades, scientists dreamed of predicting protein structures computationally from their amino acid sequences – a challenge known as the “protein folding problem.”
In the 1990s, this led to a competition called the “Critical Assessment of Protein Structure Prediction” (CASP). The goal of CASP was to stimulate computational approaches to predicting protein folding, essentially getting from sequence to structure. Every two years, researchers from around the world were given the amino acid sequences of a set of proteins. The structures of these proteins had already been determined experimentally, but they were kept secret; participants had to derive them computationally.
Most early efforts scored 40 points or less, on a 100-point scale. CASP attracted plenty of researchers, but the prediction problem was enormously difficult. This is when Baker entered the scene.
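To make the 100-point scale more concrete, here is a minimal sketch of how such a score can be computed. It assumes the scale refers to CASP's headline metric, GDT_TS, which averages the percentage of residues that land within 1, 2, 4 and 8 ångströms of their experimentally determined positions. The function name and the coordinates are invented for illustration, and the sketch assumes the two structures are already superimposed; the real metric also searches for the best superposition.

```python
import math

def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score (0-100) for two already-superimposed C-alpha traces."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Distance between each predicted atom and its experimental counterpart.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff (in angstroms), then the average.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical four-residue example: the prediction drifts further off at each residue.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
score = gdt_ts(predicted, reference)
print(score)  # 56.25
```

A perfect prediction scores 100, while a model that places only some residues close to their true positions lands somewhere in the middle of the scale.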
From Philosophy to Proteins
Baker initially studied philosophy and social science, but a textbook (Molecular Biology of the Cell) led him to change course. He became fascinated by protein structures and ultimately started devising experiments and algorithms to explore how proteins fold. He created an algorithm called Rosetta.
Compared to most participants, Baker and Rosetta did well. But his work led to a different realization: the software could also be used in reverse. That is, instead of entering an amino acid sequence to get a protein structure out, you can enter a desired protein structure and obtain candidate amino acid sequences that should fold into it. Essentially, this enables the creation of new proteins that serve desired functions.
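The idea of running the problem in reverse can be illustrated with a deliberately simple toy model. This is not how Rosetta works; it only borrows a well-known rule of thumb, namely that hydrophobic residues tend to pack into a protein's core while polar residues tend to sit on its surface. The "structure" here is reduced to a hypothetical string of labels (B for buried, E for exposed), and all names are invented for the example.

```python
from itertools import cycle

HYDROPHOBIC = "LIVFM"  # residues that tend to pack into a protein core
POLAR = "KESTQ"        # residues that tend to sit on the solvent-exposed surface

def design_sequence(burial_pattern):
    """Reverse direction: pick a residue for each site ('B' buried, 'E' exposed)."""
    core, surface = cycle(HYDROPHOBIC), cycle(POLAR)
    residues = []
    for site in burial_pattern:
        if site == "B":
            residues.append(next(core))
        elif site == "E":
            residues.append(next(surface))
        else:
            raise ValueError(f"unknown site label: {site!r}")
    return "".join(residues)

def burial_profile(sequence):
    """Forward direction of the toy model: guess which residues end up buried."""
    return "".join("B" if aa in HYDROPHOBIC else "E" for aa in sequence)

target = "BEEBBEEB"                 # hypothetical target "structure"
designed = design_sequence(target)  # a candidate sequence for that target
print(designed)                     # LKEIVSTF
print(burial_profile(designed) == target)  # True
```

Real design software scores candidate sequences against full three-dimensional structures with detailed energy functions, but the round trip is the same in spirit: propose a sequence for a target shape, then check that the forward prediction recovers that shape.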
In 2003, Baker’s team created Top7, one of the first synthetic proteins with a novel structure. Since then, his lab has engineered proteins with a range of applications. Sometimes, it was a tweak on an existing protein, so it could break down hazardous substances. Other times, it was a completely new protein that could be used in the chemical manufacturing industry.
The world of proteins was opening up, but progress was slow. Creating new proteins was very challenging, and the range of natural proteins to work with was also a limitation. Baker famously said, “If you want to build an airplane, you don’t start by modifying a bird; instead, you understand the first principles of aerodynamics and build flying machines from those principles.”
In this case, first principles meant going back to protein folding, which was still an unsolved challenge. Then, in 2018, a groundbreaking moment arrived.
From Chess to Proteins
AlphaFold is an artificial intelligence (AI) programme developed by DeepMind, a company acquired by Google in 2014. The effort was spearheaded by Demis Hassabis and John Jumper. It’s not the first “Alpha”. Before tackling protein folding, DeepMind created AlphaGo and AlphaZero, AI programmes that mastered Go (a strategy board game) and chess through self-learning and pattern recognition.
Chess and protein folding may not seem alike at first glance, but they share key similarities. Both can be addressed by a problem-solving algorithm, and in both cases you benefit from having vast datasets from which to derive patterns. Much like you can decode the strategies of chess and Go, you can identify underlying patterns in protein folding and use them for prediction, even if you don’t fully understand the rules.
By incorporating physical, geometric, and evolutionary principles, AlphaFold achieved accuracy levels comparable to laboratory methods for most proteins.
Before AlphaFold, most predicted protein structures at CASP scored around 40 on the 100-point scale. With AlphaFold, the score jumped to almost 60. This was unexpected progress and suggested a promising new avenue. But this was only the beginning.
In 2020, AlphaFold 2 entered CASP. For moderately difficult protein targets, it scored around 90 out of 100. This was a major breakthrough. In some instances, it performed essentially as well as X-ray crystallography. The protein folding problem finally had a workable solution. Just one year later, the source code of AlphaFold 2 was released to the public, along with a massive searchable database of proteins. The paper describing it has over 29,000 citations at the time of writing. DeepMind used AlphaFold 2 to predict the structures of over 200 million proteins listed in databases, covering virtually all known proteins. This monumental effort was then made freely accessible to the global scientific community, accelerating research in biology and medicine.
A New Age for Biology
The achievements of Demis Hassabis, John Jumper, and David Baker are remarkable because they solve two fundamental challenges in biology: accurately predicting protein structures and designing new proteins with specific functions. The marriage of biology and algorithms was not an obvious one, but the three pioneers exemplified the transformative potential of interdisciplinary research. By solving the protein folding problem and pioneering protein design, these Laureates have laid the foundation for a new era in biology – one where the mysteries of life’s molecular machinery are no longer out of reach.
The three were awarded the 2024 Nobel Prize in Chemistry: one half to Baker, “for computational protein design”, and the other half to Hassabis and Jumper, “for protein structure prediction”. Although the research is relatively new, particularly compared with the work recognized by most other Nobel Prizes, it’s already making a big impact in the world.
In medicine, AlphaFold accelerates drug discovery by revealing protein structures critical for targeting diseases like cancer and Alzheimer’s. It aids in vaccine development by identifying stable antigen structures. Protein design enables tailored enzymes for industrial processes, creating greener alternatives to traditional catalysts. Environmental applications include engineered proteins that break down plastics or detoxify pollutants. Additionally, synthetic proteins are advancing nanotechnology, enabling precise drug delivery and bioengineering innovations. These tools are reshaping biology, offering solutions to global health, environmental, and industrial challenges.
Yet work is still ongoing. While AlphaFold and Rosetta are revolutionary, they are not without limitations. Experimental validation remains essential for verifying computational predictions. Additionally, some proteins, particularly those that form complexes or interact with other molecules, pose challenges for current AI systems.
Despite these ongoing challenges, the Laureates’ achievements lay the foundation for a future where we can engineer life’s molecular machinery with precision, transforming how we understand and interact with the biological world.