Published 20 June 2011 by Ashutosh Jogalekar

The challenges and allure of protein design: A memo for this year’s young researchers

An inspiration from the birth of aviation

A few weeks ago I visited the small coastal town of Kitty Hawk in North Carolina. Kitty Hawk is where the Wright brothers made their epoch-making first powered flight. Big stones mark the start and end points of the flight. There is a huge monument on top of a hill where they took off and then there are three stones at varying distances at ground level. The three stones indicate the distances covered on every flight; the brothers clearly got better at flying on every attempt.
The Wright brothers’ story is inspiring not only because of the watershed in human history which they orchestrated but also because it shows the evolution of a technology at its best. The projects which the brothers undertook cost a few hundred dollars and should serve as a beacon of inspiration in this era of "big science" involving hundreds of millions of dollars. The brothers had a bicycle workshop in which they fashioned many of the components of their infant gliders. They drew inspiration from Otto Lilienthal, who had been the first aviation pioneer to make successful gliding flights; tragically, Lilienthal was killed on one of his flights, but not before saying "Kleine Opfer müssen gebracht werden!" ("Small sacrifices must be made!"). One of the most important lessons that the Wrights learned from Lilienthal’s adventures was the great value of building ‘toy’ models. Toy models start from the simplest possible systems which retain the essential features of a phenomenon and then work their way towards greater complexity. This philosophy has been used by many other pioneers of technology, including the scientists and engineers who made the moon landings possible.

Why am I relating the story of the Wright brothers? I relate it not only because it is extremely interesting in its own right but because it holds lessons and insights for a field that is likely to have a great impact on medicine. Perhaps some of the students attending this year’s Lindau meeting will make significant contributions to this field. This field is the science and art of protein design, and its roots go back to the classical discipline of organic synthesis. In organic synthesis humans found an opportunity to extend their reach into their environment and thus control matter at will. By synthesizing molecules of ever greater complexity, chemists first emulated and then tried to supersede nature. There have been many notable successes in this endeavor, most significantly in the construction of synthetic drugs, polymers and petrochemicals which have structures and properties better than those of any molecules found in nature. Over the last one hundred years, molecular design through organic synthesis has become a robust, creative and economically all-encompassing activity that has kept scores of academic and industrial chemists engaged. Organic synthesis has not only been an art of the highest degree in itself, but it has been a transforming and enabling science for medicine and biotechnology; indeed, it has been the bedrock of much of the modern way of life. Protein design holds as many possibilities for the future of medicine in the twenty-first century as organic chemistry did in the twentieth.

The protein design problem

So what is protein design? As the title indicates, it is the ability to change the sequence of known proteins to improve their function and generate novel structures. Ultimately protein design would involve the on-demand enumeration of the amino acid sequences corresponding to any arbitrary protein structure.

The basic question that the protein design problem poses can be thought of as the inverse of the protein folding problem. The protein folding problem, one of the most challenging problems in all of science, asks us to determine the unique three-dimensional structure corresponding to a given sequence of amino acids. The essential hurdle in this problem is captured by the so-called ‘Levinthal paradox’. A typical amino acid chain can potentially fold into an astronomical number of structures; just imagine a very long thread and the number of ways it can fold up. Levinthal estimated with a simple back-of-the-envelope calculation that if a typical protein were to actually try out all these folds before it found the right one, it would take many orders of magnitude longer than the age of the universe. Clearly this does not happen, or else life would not exist. In reality, as proteins are synthesized in our bodies at every moment of our existence, it takes milliseconds or seconds at most for them to find their right 3D structure. Thus Levinthal saw a paradox between what was expected and what was observed. But four decades of intensive research in physics, chemistry, biology and computer science have revealed that the ‘paradox’ only presents itself if we don’t know the precise nature of the folding process. In reality, protein folding is like evolution by natural selection. As a protein starts to fold, local structures form quickly, leading to a dramatic reduction of the available ‘space’ to be searched. These local structures then try out a more limited number of arrangements, with successive steps nailing down more and more of the correct structure. What seemed like a miracle is now quite well understood, although it’s still a challenge to actually predict the specific steps for a given protein.
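The arithmetic behind Levinthal’s estimate is easy to reproduce. The sketch below is only illustrative: the chain length, the number of conformations per residue and the sampling rate are assumptions chosen for the example, not figures taken from this essay, and other reasonable choices change the exponent but not the conclusion.

```python
import math

# Illustrative assumptions (not from the essay): chain length,
# conformations available to each residue, and sampling rate.
RESIDUES = 100
CONFORMATIONS_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 1e13
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * 365.25 * 24 * 3600  # roughly 4.4e17 s


def levinthal_search_time(residues, per_residue, rate):
    """Seconds needed to visit every chain conformation once at the given rate."""
    total_conformations = per_residue ** residues
    return total_conformations / rate


search_seconds = levinthal_search_time(
    RESIDUES, CONFORMATIONS_PER_RESIDUE, SAMPLES_PER_SECOND
)
ratio = search_seconds / AGE_OF_UNIVERSE_SECONDS

print(f"Exhaustive search: about 10^{math.log10(search_seconds):.0f} seconds")
print(f"About 10^{math.log10(ratio):.0f} times the age of the universe")
```

Even with only three options per residue, the exponential growth with chain length is what makes a blind search hopeless.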

The protein design problem asks the opposite question: given a 3D protein structure, what are the possible sequences that can fold up into this structure? Just like the folding problem, the essential difficulty in protein design is to trawl through the astronomical number of sequences that may be compatible with a given structure. Not surprisingly, computer algorithms have been very useful in doing this. In one way the design problem is simpler than the folding problem because in the folding problem there’s only one right answer (one 3D structure) while in the design problem there are many (several sequences). Yet the task is undoubtedly daunting. In fact computer scientists have classified the design problem as ‘NP-hard’, which plainly speaking means that no simple, fast general solution to such a problem is known (although a solution, once found, can be checked relatively easily).
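To get a feel for how large this sequence space is, here is a minimal sketch that simply counts the sequences of a given length built from the 20 standard amino acids. The function names and the evaluation rate of a billion sequences per second are hypothetical, chosen only for illustration.

```python
import math

AMINO_ACIDS = 20  # the standard genetically encoded amino acids


def sequence_space_size(length):
    """Number of distinct sequences of the given length."""
    return AMINO_ACIDS ** length


def seconds_to_enumerate(length, sequences_per_second=1e9):
    """Time to score every sequence once at an assumed, hypothetical rate."""
    return sequence_space_size(length) / sequences_per_second


for length in (5, 10, 50, 100):
    size = sequence_space_size(length)
    print(f"length {length:>3}: about 10^{math.log10(size):.0f} sequences")
```

A brute-force scan is feasible for a handful of positions and out of reach for anything resembling a full protein, which is why design algorithms have to prune or sample the space instead of enumerating it.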

To cut down on the complexity of the design problem, several solutions have been proposed. One common solution is to do what’s called ‘fixed-backbone design’. Recall that an amino acid chain consists of the backbone, the repeating chain of atoms joined by peptide bonds, and the side-chains; it’s the side-chains that define the identity of a particular amino acid. In fixed-backbone design, it is assumed that the orientation of the backbone of the protein is fixed, and only the side-chains are varied to find an optimum fit to the structure. By optimum fit we mean a new sequence that does not change the structure and does not lead to high energies. Varying only the side-chains significantly cuts down on the number of solutions to be searched. However, fixed-backbone design does not always work: if the given backbone orientation does not make the right contacts with the rest of the protein structure to begin with, no amount of side-chain tinkering can help (this is especially the case when you are designing a new protein from scratch). Another solution is to look in the existing database of thousands of known protein structures, called the Protein Data Bank (PDB), and find sequences that are known to fold into parts of the structure which you are interested in. You can then possibly mix and match these structure-specific sequences to build your desired superstructure. Both these solutions are of great utility in a number of applications. And that brings us to the potential applications of protein design in medicine.
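In the spirit of the Wrights’ toy models, fixed-backbone design can be sketched in a few lines. Everything below is a deliberately crude assumption made for illustration: the backbone is reduced to a string of buried (B) and exposed (E) positions, the reduced 12-letter alphabet and the energy function are invented, and the search is a plain Metropolis Monte Carlo. Real design software uses detailed atomic energy functions and side-chain rotamer libraries; this is not any particular program’s method.

```python
import math
import random

# Hypothetical toy model: every name and number here is an illustrative assumption.
HYDROPHOBIC = set("AVLIFM")
POLAR = set("STNQKE")
ALPHABET = sorted(HYDROPHOBIC | POLAR)

# The fixed backbone, reduced to one label per position: buried (B) or exposed (E).
BACKBONE = "BEEBBEBEEBBE"


def energy(sequence, backbone=BACKBONE):
    """Lower is better: reward hydrophobic-in-core and polar-on-surface."""
    total = 0.0
    for residue, site in zip(sequence, backbone):
        buried = site == "B"
        matches = (residue in HYDROPHOBIC) == buried
        total += -1.0 if matches else 1.0
    # Small penalty for identical neighbours so that positions are coupled.
    total += 0.5 * sum(a == b for a, b in zip(sequence, sequence[1:]))
    return total


def design(backbone=BACKBONE, steps=5000, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over side-chain identities; the backbone never changes."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in backbone]
    current_energy = energy(current, backbone)
    best, best_energy = list(current), current_energy
    for _ in range(steps):
        position = rng.randrange(len(backbone))
        old = current[position]
        current[position] = rng.choice(ALPHABET)  # propose a point mutation
        new_energy = energy(current, backbone)
        delta = new_energy - current_energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_energy = new_energy  # accept the mutation
            if current_energy < best_energy:
                best, best_energy = list(current), current_energy
        else:
            current[position] = old  # reject the mutation
    return "".join(best), best_energy


best_sequence, best_energy = design()
print(best_sequence, best_energy)
```

Because many different sequences reach the same low energy in this toy, running it with different seeds also illustrates the earlier point that the design problem has many acceptable answers.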

How could protein design contribute to medicine?

The possibilities are endless, and we can mention only a few here. From a basic scientific standpoint, the most important application of protein design is in understanding the intricate signaling pathways that underlie all of life’s important functions. Whether you are sensing a photon of light, using your immune system to fight off an infection or using a drug to ward off a disease, the workings of our marvelous bodies and of all living organisms depend on a precise and astonishingly complex communication network of small and large molecules. The communication network usually functions as a cascade; a small molecule can trigger a protein, which activates two proteins which dissociate from each other and activate three other proteins…and so on. The end product of such a process is often the activation or suppression of specific genes that can in turn bring about myriad physiological responses. During these processes, it is very important that every protein binds to its specific partners and no others. Considering the highly crowded environment of the cell, which consists of thousands of diverse chemical entities, that such a process of partner-finding happens at all is a marvel; think of trying to find your friend who is lost in the crowd during Oktoberfest.

Yet evolution has optimized every protein for this kind of specific interaction. In fact such precise interactions are very important in diseases like cancer. If they are disrupted, chaos will predictably ensue. This is where protein design holds promise. Initially, it can be used simply to understand the intricacies of these signaling pathways. Specific proteins can be modified and introduced into the cell to perturb their interactions. The effects of these perturbations can shed light on the signaling networks of the cell. Once these networks are understood, one can modify a specific protein (say, a protein overproduced in cancer cells) and then introduce that protein into the cancer cell to essentially trick it. The modified protein can interfere with the function of the normal protein, thus hindering the cancer cell from dividing and possibly causing its death.

Perhaps the most fascinating conceptual use of protein design is in what’s called ‘metabolic engineering’. Metabolic engineering is a branch of the new science of synthetic biology and entails mixing and matching genes from various organisms to produce certain important biomolecules on demand. For instance, metabolites from one organism can be shuttled into enzyme systems produced by another by connecting the relevant genes from the two organisms the way engineers connect different kinds of pipes. The enzymes will then act on the initial molecules and synthesize new products. Individually, organism A might produce molecule A and organism B might produce molecule B. But metabolic engineering can enable us to create a novel genetic system producing both A and B, which can then react to make C. The possibilities are endless and fascinating. Recently, scientists from the University of California, Berkeley have used such tinkering to produce the very important antimalarial drug artemisinin, a compound which is sorely needed around the world and whose extraction from natural sources is tedious, expensive and resource-depleting. Protein design can manifest itself in such metabolic engineering in the form of enzyme design. Enzyme design is one of the most challenging and cutting-edge aspects of protein design since it involves understanding and modifying the functions of enzymes in atomic detail. Recently, in a groundbreaking piece of work, enzyme design was used to produce an enzyme that carries out a reaction that is not catalyzed by any known natural protein. When perfected, enzyme design will make it possible to introduce enzymes catalyzing novel reactions into model organisms by way of genetic engineering. A pipeline of such designed enzymes will allow us to make bacteria do the hard work of making virtually any molecule that we want, including even jet fuel. The baton would pass from traditional synthetic organic chemists to protein designers.

Lastly, protein design is already used for producing one of the newest classes of drugs against life-threatening diseases: antibodies. Experimental scientists have already been ‘designing’ antibodies by using the process of directed evolution, in which random mutations are introduced in proteins and those leading to desired functions are retained. Millions of mutated antibodies can be screened against a specific antigen to identify those that bind to it most tightly. Directed evolution can likewise find variants of other proteins and even RNA that perform a particular function efficiently. But protein design can make this process much more rational. Using computational algorithms that ‘dock’ antibody and antigen against each other and calculate their binding energy, protein design can suggest mutations or amino acid changes that will improve the energy of this binding. While the large size of antibodies makes this process challenging, in principle protein design will be able to design not just antibodies but any protein that can bind to a chosen molecule. Again, the possibilities of such a process are limitless, from designing antibodies against the latest strain of flu viruses to designing enzymes that bind to and destroy chemical warfare agents.
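The mutate, screen and retain cycle of directed evolution can also be caricatured in code. The sketch below is a toy under loud assumptions: the ‘binding score’ just counts agreement with a hidden, made-up target sequence, standing in for a laboratory screen against an antigen, and the library sizes and mutation rate are arbitrary. It shows the shape of the loop and says nothing about real antibody behavior.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids
HIDDEN_BEST_BINDER = "WYDGSKRTLE"  # hypothetical stand-in for an unknown optimum


def binding_score(variant):
    """Toy screen: counts positions that agree with the hidden optimum."""
    return sum(a == b for a, b in zip(variant, HIDDEN_BEST_BINDER))


def mutate(sequence, rng, rate=0.1):
    """Replace each position with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else aa for aa in sequence
    )


def directed_evolution(rounds=30, library_size=200, keep=10, seed=1):
    """Repeat: build a mutant library, screen it, retain the tightest binders."""
    rng = random.Random(seed)
    parents = [
        "".join(rng.choice(ALPHABET) for _ in HIDDEN_BEST_BINDER)
        for _ in range(keep)
    ]
    history = []
    for _ in range(rounds):
        mutants = [mutate(rng.choice(parents), rng) for _ in range(library_size)]
        library = parents + mutants  # parents stay in the pool
        library.sort(key=binding_score, reverse=True)
        parents = library[:keep]
        history.append(binding_score(parents[0]))
    return parents[0], history


best_variant, score_history = directed_evolution()
print(best_variant, score_history[0], score_history[-1])
```

Note that the loop never ‘understands’ why a variant binds better; it only keeps what the screen rewards. Computational design aims to replace part of that blind screening with energy calculations that propose promising mutations in advance.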

From building simple and complex organic molecules to designing proteins, we have come a long way. But the field of protein design is only a decade old and is roughly at the stage that organic synthesis had reached in 1950 and aviation in 1903. Just like the Wright brothers, we have had success with simple systems and have occasionally achieved remarkable feats. Just like organic synthesists in 1950, we are starting to understand the general principles of protein origami but have yet to get to a stage where we can design arbitrary proteins of varying diversity and complexity. Yet this future is full of possibilities and is exactly what makes the field so exciting.

When Robert Burns Woodward, arguably the preeminent organic chemist of the twentieth century and a molecular manipulator without peer, received the Nobel Prize in Chemistry in 1965, the Nobel committee chairman had the following to say in tribute to Woodward’s extraordinary abilities: "When it comes to organic synthesis, Nature is the uncontested master, but I dare say that the prize-winner of this year, Professor Woodward, is a good second".

I think we can dare say and hope that among this year’s group of young talents at Lindau, there will be at least one or two who will prove themselves to be good seconds to Nature’s protein design abilities. The future beckons.
