San Diego Union-Tribune

LOCAL SCIENTISTS HELPED IN AI BREAKTHROUGH

BY ELIZABETH KOMIVES

Komives, Ph.D., is a distinguished professor of chemistry and biochemistry at UC San Diego. She lives in Clairemont.

Every other year, scientists hold a contest to see whose computer program can best predict a protein’s three-dimensional structure just by looking at the sequence of its amino acid building blocks. This year’s contest was won by AlphaFold, an artificial intelligence network developed by the Google offshoot DeepMind. Its predictions came so close to the actual protein structures that many scientists have declared AlphaFold a game-changer, going so far as to say “the protein folding problem is now solved.”

So why is knowing the 3D shape of a protein so important? Every function in our bodies, from digestion to growth, is carried out by proteins. We need to know how those proteins are shaped so we can better understand how they work normally, as well as what goes wrong with them in various diseases. If we can see a protein’s structure, that can also help us know how best to manipulate it with therapeutic drugs.

That brings us to the “protein folding problem”: It’s very difficult to predict a protein’s shape just by looking at its sequence of amino acids.

In 2001, the human genome, the entire sequence of our DNA, was published. This was a great advance because it gave us the “blueprint” of what we are made of. However, interpreting that blueprint into all the functions that go on in our bodies is not a solved problem at all. Our cells first translate that DNA blueprint into 20 different amino acids, then link them together into proteins. Finally, proteins usually need to fold up into unique structures before they can carry out their specific functions.

Predicting protein structure is a challenge in part because any random string of amino acids likely will not fold into a unique shape. As an amino acid string collapses on itself, the side chains of those 20 different amino acids interact in ways that may be favorable or unfavorable. There may be many different favorable ways to fold, and the protein becomes frustrated: it can’t figure out which way is best. Natural proteins have evolved their unique shapes because as they collapse, there’s really only one favorable way to go. Even knowing this, it’s still difficult to predict, just from the sequence, which amino acids will make the most favorable interactions.

Lots of work by hundreds of scientists contributed to AlphaFold’s success. A key insight was made in the 1980s by researchers who realized that favorable interactions help proteins avoid frustration as they fold. Then, to train artificial intelligence systems like AlphaFold’s, we had to know all the possible amino acid interactions that can be made, and to do that it was necessary to solve a lot of protein structures experimentally. Scientists, including many at UC San Diego, do that in a laborious process that involves crystallizing the proteins and capturing their native structures by X-ray. Working backward, they can then see what interactions the amino acids are making.

After the human genome was sequenced, the National Institutes of Health, knowing that the DNA sequence didn’t really give us answers about all the functions in our bodies, funded the Structural Genomics Initiative. The goal was to increase the number of known protein structures in a central reference database, the Protein Data Bank, which is headquartered at UC San Diego and Rutgers University. Since 2000, many more protein structures have been solved experimentally, giving scientists the data they so badly needed for their predictions and giving AlphaFold the missing pieces of data it needed to optimize its algorithm and improve upon past attempts to win the contest.

As is often the case in science, AlphaFold stands on the shoulders of hundreds of scientists who have experimentally determined protein structures and who have worked out the theoretical understanding of how to extract and balance the physical, geometrical and fragment information from known folded structures.

