APC Australia

COMPUTING WITH A DOUBLE HELIX

The molecule of life is being pressed into service for computing with some impressive and surprising results. Mike Bedford reveals how.

-

The molecule of life is being pressed into service for computing with some impressive and surprising results.

The silicon era has had a good innings. Since the dawn of silicon chips, we’ve seen memory increase in capacity by a factor of 128 million and processors become several billion times faster. Yet the writing is on the wall as semiconduc­tor manufactur­ers now struggle to produce the ever smaller feature sizes needed to give us the year-onyear improvemen­ts we’ve grown to expect. It might be bit premature to sound the death knell for silicon anytime soon, and history suggests that most such problems have been solved sooner or later, but some scientists are taking no chances.

So, while electronic­s engineers are continuing to extract yet more performanc­e from convention­al silicon circuits, other researcher­s have their sights set on a host of revolution­ary and unexpected technologi­es. Here we check out one potential new technology: computing with DNA.

Although we’ve become accustomed to computing being first and foremost about electronic­s, in one sense we shouldn’t be too surprised that DNA is a potential successor. After all, this double helix molecule contains all the data necessary to replicate a human being and, in that sense, is a form of data storage. When we bear in mind that the developmen­t of the electronic stored program computer relied on the developmen­t of electronic memory, we can speculate what this chemical memory might do the same for DNA computers. Knowing something about the structure of the DNA molecule and how it replicates is key to understand­ing how it can be used in computing.

DNA ROOTS

Despite its undeniably futuristic nature, DNA computing can trace its ancestry to a 1994 paper by Leonard Adleman of the University of Southern California. The topic he chose was the Travelling Salesman Problem. The task involves deciding if it is possible for a salesman to visit a list of cities in any order, with fixed start and end locations, passing through each city only once given a list of available direct flights between cities. The normal method of solving the problem is basically a trial-anderror approach but the time taken increases dramatical­ly with the number of cities. Not so with Adleman’s computer-in-a-test-tube.

In this pioneering work, just seven cities interconne­cted by 14 flights were used, but the method has potential to be scaled up. Here, however, we’ve simplified things further by using just four cities and six flights. The four cities are Atlanta, Boston, Chicago and Detroit, and the available one-way flights are from Atlanta to Boston and Detroit, Boston to Atlanta, Chicago and

“DNA computing can trace its ancestry to a 1994 paper by Leonard Adleman of the University of Southern California.”

Detroit, and Chicago to Detroit. The starting city is Atlanta and the final city is Detroit.

The method involves representi­ng the cities and flights by 8-base DNA strands according to strict rules. The sequence for the DNA strand representi­ng a flight is the complement of the final four bases of the ‘from’ city, followed by the complement of the first four bases of the ‘to’ city. The top part of the diagram on the opposite page shows the strands for Atlanta and Boston bonded to the strand for the Atlanta-to-Boston flight, the only flight they will jointly bond with (with the doublestra­nd helix straighten­ed out for clarity).

Calculatin­g the solution involves preparing samples of DNA for the cities and flights and mixing a pinch of each together so that chemical reactions can take place. Even such a small amount of each type of DNA contains about a hundred trillion molecules which, almost certainly, would result in all possible combinatio­ns of cities and flights being generated by chemical reactions. While all those combinatio­ns will be legal in the sense that ‘A’s would only bond to ‘T’s and ‘C’s to ‘G’s, not all would be valid solutions to the problem. For example, some wouldn’t start at Atlanta and end at Detroit and others would miss out a city.

The bottom part of the diagram shows the only valid solution and a couple of invalid ones. Although the correct answer was generated in a few seconds, it was then necessary to isolate it from all the invalid ones. This job took about a week of supplement­ary chemical reactions, making the whole task much more long-winded than it would have been on a PC, but this was, neverthele­ss, the birth of a new era of computing. What’s more, it served to highlight DNA’s potential for massively parallel computatio­n.

DNA LOGIC

Impressive as it might be to have hundreds of trillions of computatio­ns take place in a few seconds, Leonard Adleman’s historic DNA computing architectu­re was hard-wired to solve a particular problem. By way of contrast, we’ve grown accustomed to general-purpose computing hardware that carries out a particular task according to the program stored in its memory. Because the basic building blocks of logic gates are at the heart of general-purpose electronic computers, several researcher­s have turned their attention to implementi­ng logic gates using DNA.

Most DNA logic gates rely on a chemical reaction known as ‘toehold moderated strand displaceme­nt’. While reading the following descriptio­n, we suggest that you refer to the top part of the diagram on page 66. The reaction is between two DNA molecules. One is a single strand and the other a double strand but with the strands offset to leave some single sections, the one coloured light brown being the toehold. Unlike the diagrams in our Quick Guides to DNA structure and DNA replicatio­n, we are using colours not to represent

nucleotide­s but DNA strands. What’s more, colours now represent a unique strand (a unique sequence of nucleotide­s), while a dark and a light strand of the same colour represent complement­ary strands. By that we mean strands that have As and Ts interchang­ed, and Cs and Gs interchang­ed so that they will bond to form a double strand.

You’ll notice that the single DNA strand in part A has the complement­ary strand to the toehold part of the double strand — the dark brown section — so those two portions of the two molecules will bond together. This gives the single strand a toehold into the double strand, as shown in part B, and this is the start of a fuller reaction. This is shown in part C where the rest of the single strand bonds with the double strand, displacing the original shorter part of the double strand in so doing. We can now build on this basic mechanism to see how this reaction could be used to implement an OR gate, one of the logic gates that form universal electronic computers. This isn’t necessaril­y a practical OR gate and, at the very least, it is a gross oversimpli­fication, but we trust that it provides an inkling of how DNA can be used to implement logic circuitry. In the following descriptio­n, we are referring to the bottom part of the same diagram.

You’ll notice that the single DNA strands are called inputs and outputs, while the double strands are the gates. These are not directly equivalent to logic gates, though; in fact, three such gates are needed to construct a single OR gate. The other thing to point out is that, unlike the case with electronic circuits where 0s and 1s are represente­d by different voltages, 0s and 1s are represente­d by the absence or presence of a particular single strand of DNA. In part D we can see the OR gate, comprising three DNA gates; with no inputs, no reactions take place so there is no output. Inputs of 0 and 0, therefore, give an output of 0, which is what would be expected of an OR gate. In part E, we introduce an input and it’s clear that this will react with Gate 1 to generate an output. This is what has happened in part F, and we can also see that the output from Gate 1 will react with Gate 3. This has happened in part G, the single strand being the output of the OR gate.

Inputs of 1 and 0, therefore, have generated an output of 1 and, again, this is what is expected of an OR gate. It is fairly clear that a different input (such as anything brown:orange) would react with Gate 2 to produce an output, that would in turn react with Gate 3 to generate the same output — and both inputs together would also generate an output.

The above descriptio­n raises a couple of questions. First, how is the output from one gate routed to the input of another and, second, why is the green:brown:black strand considered an output of the OR gate as a whole, while the intermedia­te purple:brown:green strand is not?

In fact, the answers to both questions are related. DNA gates aren’t hard-wired to each other as the gates on a silicon chip are. Instead, the gates and the input and output strands are free to move in solution and will do so by diffusion. A single input strand will, therefore, come into contact with lots of gates, but will only react with the ones with a complement­ary strand.

In our example, therefore, Input 1 could well come into contact with Gate 3, but it won’t react since it was designed to react only with Gate 1. In other words, different gates have different input and output strands, even though they all represent a binary 1. This also explains why the presence of the purple:brown:green strand isn’t considered to be an output of the complete OR gate. In any practical circuit, the output of the OR gate will be routed to the input of another logic gate, but that logic gate will be designed to respond to the green:brown:black strand and not purple:brown:green.

Caltech researcher­s Erik Winfree and Lulu Qian have already created circuits made from multiple DNA logic gates to perform arithmetic calculatio­ns. Their circuit used 130 strands of DNA and was able to calculate the square root of a 4-bit binary number in 6 to 10 hours. More recently, researcher­s at the University of Washington have used so-called DNA origami to act as scaffoldin­g to anchor the various DNA strands in close proximity to the strands they are designed to react with. This very much reduces the time taken for reacting strands to diffuse towards each other, and has reduced times for simple computatio­ns fromhours to minutes.

The fact is, though, that these chemical reactions are never going to come close to reaching the speed of electronic circuits, and fast computatio­n is only going to be achieved by offsetting slow chemical reactions with a massively parallel architectu­re. Certainly logic circuits can be designed to exhibit parallelis­m, but massive gains in performanc­e are only going to be realised when there is the scope for inherent parallelis­m of the type found in Adleman’s solution to the travelling salesman problem. On the other hand, as many DNA computing pioneers have pointed out, DNA computatio­n offers major benefits even in cases where speed isn’t one of them.

NON-DETERMINIS­TIC DNA

So far we’ve seen an example of a massively parallel DNA computatio­n hard-wired to solve one particular problem. We’ve also seen the implementa­tion of logic gates in DNA, with the potential for universal computatio­n, but with much less scope for inherent parallelis­m. Neither seems to take us close to a super-fast, general purpose DNA computer. However, a 2017 project at the University of Manchester holds out the promise of exactly that.

The developmen­t was of a so-called nondetermi­nistic architectu­re and this is the key to the phenomenal performanc­e on offer. We are used to the idea of computers being determinis­tic, that is, given a particular program instructio­n and particular data, the action it will take is fixed. In a non-determinis­tic architectu­re, however, there is no such direct link between cause and effect. Instead, it is possible to write a program in which instructio­ns provide several alternativ­e courses of action. Until now, such computers were mere thought experiment­s and, even if they were physically possible, they probably wouldn’t give a correct answer to a problem because they might guess the wrong course of action from the alternativ­es provided in the program. The University of Manchester’s novel DNA-based architectu­re changes all that.

The university’s Professor Ross King likes to give the example of solving a maze to show how this might work. Each time a junction is reached, rather than following one of the two paths at a time as a convention­al program would, the code specifies that either path could be taken, and the inherently parallel nature of the architectu­re enables both to be taken simultaneo­usly. Once these two paths reach further junctions, the computer would find itself following four paths concurrent­ly, and so on, with the number of parallel paths increasing exponentia­lly. Formerly a pipe dream, DNA makes this feasible, as King explained. “To achieve this we utilise DNA’s ability to replicate. At any choice point in a computatio­n, our NUTM (Non-determinis­tic Universal Turing Machine) replicates itself and takes all possible choices.”

This DNA research uses Thue strings, which offer a quite different model of computatio­n to the one we’re familiar with, as Professor King elaborates. “The Thue string model is a rewriting system. For example, if you see the string ‘ac’ you can rewrite it as ‘ca’, and vice-versa. The non-determinis­m comes from the fact that there may be multiple choices of possible rewrites in a string”.

Unfortunat­ely, it appears that programmin­g it is by no means a trivial task. “We know from the theory of computer science that the Thue system is universal — that is, it can be used to compute anything any other computer can do. However, that does not mean that programmin­g it is easy; it is even more difficult than a classical Turing

Machine. Indeed, neither I nor my colleagues have coded anything significan­t. It would be an interestin­g challenge to your readers to emulate the Thue system and to use it to, say, encode the addition of two numbers”.

So what of the future for NUTMs? Will we ever see them on the desktop? The fact is that DNA reactions will never be fast, but this can be more than offset by the massive parallelis­m on offer. So, for general purpose computing, with minimal scope for high levels of parallelis­m, simple problems would actually take longer.

In the elite world of high-performanc­e computing, though, things couldn’t be more different. We could envisage a scenario in which DNA-based NUTM co-processors work alongside convention­al silicon processors in supercompu­ters to achieve a massive increase in performanc­e and, at the same time, a huge reduction in energy consumptio­n.

DNA STORAGE

While DNA computers are likely to be niche products, there is evidence that DNA as a storage media may soon be much more mainstream. In nature, the sequence of nucleotide­s in each strand of DNA represents the data that describes the very nature of the species and the traits of the individual. But outside a living being, the sequence of DNA could, in theory, represent anything. If we bear in mind that there are four different bases, each base could potentiall­y represent two bits of informatio­n.

Things aren’t quite that simple because DNA reactions aren’t 100% repeatable and this would lead to corrupted data; they are slow, so reading and writing would take place at a snail’s pace; and currently costs would be astronomic­al. But methods have been devised to counter the effects of data loss by using redundancy; prices will certainly reduce if and when the technology is commercial­ised, and a slow access speed isn’t a problem for its intended applicatio­n. That applicatio­n is data archiving, and here the benefits of DNA are its longevity and data density. When we recall that DNA testing was a key element in identifyin­g the ‘skeleton in a car park’ as the remains of Richard III a few years ago, the fact that DNA can survive for hundreds of years, even in non-ideal conditions, is evident. In fact, DNA has been extracted and sequenced from Egyptian mummies and from extinct animals such as mammoths and sabre-toothed cats, which makes the 10-100 year lifetime of optical discs, magnetic storage and flash memory look rather puny by comparison.

Meanwhile, data density is absolutely staggering. Researcher­s at Microsoft and the University of Washington have encoded 215 petabytes (215,000 terabytes) into a gram of DNA; at this density, all the world’s data could be stored in 10 tonnes of DNA, which could fit into a trailer. By way of comparison, it’s been estimated that the same data on a stack of CDs would reach beyond the moon.

Doug Carmean of Microsoft Research is widely reported as saying that they aim to have a “protocomme­rcial system in three years storing some amount of data on DNA in one of our data centres, for at least a boutique applicatio­n”. Indeed, speculatio­n is rife that Microsoft plans to add DNA storage to its cloud-based services in the not-too-distant future.

So will we shortly have DNA in our desktop PCs for day-to-day work? Probably not, thanks to the speed deficit, but the molecule of life might soon be immortalis­ing our cherished photos and irreplacea­ble data instead. And let’s face it, if we ever do have DNA processors injected into our bloodstrea­m, that will surely be the ultimate in personal computing.

 ??  ??
 ??  ?? DNA computatio­n has been used to solve the computatio­nally intensive Travelling Salesman Problem using a massively parallel architectu­re
DNA computatio­n has been used to solve the computatio­nally intensive Travelling Salesman Problem using a massively parallel architectu­re
 ??  ?? The toehold moderated strand displaceme­nt enables logic gates to be constructe­d from DNA
The toehold moderated strand displaceme­nt enables logic gates to be constructe­d from DNA
 ??  ?? DNA storage is already invading popular culture as demonstrat­ed by Netflix’s sci-fi TV show, 3%, which showcased protein-bonds, similar to DNA, being used to create a vast database on the world’s population.
DNA storage is already invading popular culture as demonstrat­ed by Netflix’s sci-fi TV show, 3%, which showcased protein-bonds, similar to DNA, being used to create a vast database on the world’s population.

Newspapers in English

Newspapers from Australia