LOCAL SCIENTISTS HELPED IN AI BREAKTHROUGH
Every other year, scientists hold a contest to see whose computer program can best predict a protein’s three-dimensional structure just by looking at the sequence of its amino acid building blocks. This year’s contest was won by AlphaFold, an artificial intelligence network developed by the Google offshoot DeepMind. Its predictions came so close to the actual protein structures that many scientists have declared AlphaFold a game-changer, going so far as to say “the protein folding problem is now solved.”
So why is knowing the 3D shape of a protein so important? Every function in our bodies, from digestion to growth, is carried out by proteins. We need to know how those proteins are shaped so we can better understand how they work normally, as well as what goes wrong with them in various diseases. If we can see a protein’s structure, that can also help us know how best to manipulate it with therapeutic drugs.
That brings us to the “protein folding problem”: It’s very difficult to predict a protein’s shape just by looking at its sequence of amino acids.
In 2001, the human genome, the entire sequence of our DNA, was published. This was a great advance because it gave us the “blueprint” of what we are made of. However, translating that blueprint into all the functions that go on in our bodies is far from a solved problem. Our cells first read that DNA blueprint as a sequence drawn from 20 different amino acids, then link those amino acids together into proteins.
Finally, proteins usually need to fold up into unique structures before they can carry out their specific functions.
Predicting protein structure is a challenge in part because any random string of amino acids likely will not fold into a unique shape. As an amino acid string collapses on itself, the side chains of those 20 different amino acids interact in ways that may be favorable or unfavorable. There may be many different favorable ways to fold and the protein becomes frustrated — it can’t figure out which way is best. Natural proteins have evolved their unique shapes because as they collapse, there’s really only one favorable way to go. Even knowing this, it’s still difficult to predict, just from the sequence, which amino acids will make the most favorable interactions.
Lots of work by hundreds of scientists contributed to AlphaFold’s success. A key insight came in the 1980s from researchers who realized that favorable interactions help proteins avoid frustration as they fold. Then, to train artificial intelligence systems like AlphaFold, we had to know all the possible amino acid interactions that can be made, and to do that it was necessary to solve a lot of protein structures experimentally. Scientists, including many at UC San Diego, do that in a laborious process that involves crystallizing the proteins and capturing their native structures by X-ray. Working backward, they can then see what interactions the amino acids are making.
After the human genome was sequenced, the National Institutes of Health, knowing that the DNA sequence didn’t really give us answers about all the functions in our bodies, funded the Structural Genomics Initiative. The goal was to increase the number of known protein structures in a central reference database, the Protein Data Bank, which is headquartered at UC San Diego and Rutgers University. Since 2000, many more protein structures have been solved experimentally, giving scientists the data they so badly needed for their predictions, and giving AlphaFold the missing pieces of data it needed to optimize its algorithm and improve upon past attempts to win the contest.
As is often the case in science, AlphaFold stands on the shoulders of hundreds of scientists who have experimentally determined protein structures and who have worked out the theoretical understanding of how to extract and balance the physical, geometrical and fragment information from known folded structures.
Komives,