COMPUTING WITH A DOUBLE HELIX
The molecule of life is being pressed into service for computing with some impressive and surprising results. Mike Bedford reveals how.
The molecule of life is being pressed into service for computing with some impressive and surprising results.
The silicon era has had a good innings. Since the dawn of silicon chips, we’ve seen memory increase in capacity by a factor of 128 million and processors become several billion times faster. Yet the writing is on the wall as semiconductor manufacturers now struggle to produce the ever smaller feature sizes needed to give us the year-onyear improvements we’ve grown to expect. It might be bit premature to sound the death knell for silicon anytime soon, and history suggests that most such problems have been solved sooner or later, but some scientists are taking no chances.
So, while electronics engineers are continuing to extract yet more performance from conventional silicon circuits, other researchers have their sights set on a host of revolutionary and unexpected technologies. Here we check out one potential new technology: computing with DNA.
Although we’ve become accustomed to computing being first and foremost about electronics, in one sense we shouldn’t be too surprised that DNA is a potential successor. After all, this double helix molecule contains all the data necessary to replicate a human being and, in that sense, is a form of data storage. When we bear in mind that the development of the electronic stored program computer relied on the development of electronic memory, we can speculate what this chemical memory might do the same for DNA computers. Knowing something about the structure of the DNA molecule and how it replicates is key to understanding how it can be used in computing.
DNA ROOTS
Despite its undeniably futuristic nature, DNA computing can trace its ancestry to a 1994 paper by Leonard Adleman of the University of Southern California. The topic he chose was the Travelling Salesman Problem. The task involves deciding if it is possible for a salesman to visit a list of cities in any order, with fixed start and end locations, passing through each city only once given a list of available direct flights between cities. The normal method of solving the problem is basically a trial-anderror approach but the time taken increases dramatically with the number of cities. Not so with Adleman’s computer-in-a-test-tube.
In this pioneering work, just seven cities interconnected by 14 flights were used, but the method has potential to be scaled up. Here, however, we’ve simplified things further by using just four cities and six flights. The four cities are Atlanta, Boston, Chicago and Detroit, and the available one-way flights are from Atlanta to Boston and Detroit, Boston to Atlanta, Chicago and
“DNA computing can trace its ancestry to a 1994 paper by Leonard Adleman of the University of Southern California.”
Detroit, and Chicago to Detroit. The starting city is Atlanta and the final city is Detroit.
The method involves representing the cities and flights by 8-base DNA strands according to strict rules. The sequence for the DNA strand representing a flight is the complement of the final four bases of the ‘from’ city, followed by the complement of the first four bases of the ‘to’ city. The top part of the diagram on the opposite page shows the strands for Atlanta and Boston bonded to the strand for the Atlanta-to-Boston flight, the only flight they will jointly bond with (with the doublestrand helix straightened out for clarity).
Calculating the solution involves preparing samples of DNA for the cities and flights and mixing a pinch of each together so that chemical reactions can take place. Even such a small amount of each type of DNA contains about a hundred trillion molecules which, almost certainly, would result in all possible combinations of cities and flights being generated by chemical reactions. While all those combinations will be legal in the sense that ‘A’s would only bond to ‘T’s and ‘C’s to ‘G’s, not all would be valid solutions to the problem. For example, some wouldn’t start at Atlanta and end at Detroit and others would miss out a city.
The bottom part of the diagram shows the only valid solution and a couple of invalid ones. Although the correct answer was generated in a few seconds, it was then necessary to isolate it from all the invalid ones. This job took about a week of supplementary chemical reactions, making the whole task much more long-winded than it would have been on a PC, but this was, nevertheless, the birth of a new era of computing. What’s more, it served to highlight DNA’s potential for massively parallel computation.
DNA LOGIC
Impressive as it might be to have hundreds of trillions of computations take place in a few seconds, Leonard Adleman’s historic DNA computing architecture was hard-wired to solve a particular problem. By way of contrast, we’ve grown accustomed to general-purpose computing hardware that carries out a particular task according to the program stored in its memory. Because the basic building blocks of logic gates are at the heart of general-purpose electronic computers, several researchers have turned their attention to implementing logic gates using DNA.
Most DNA logic gates rely on a chemical reaction known as ‘toehold moderated strand displacement’. While reading the following description, we suggest that you refer to the top part of the diagram on page 66. The reaction is between two DNA molecules. One is a single strand and the other a double strand but with the strands offset to leave some single sections, the one coloured light brown being the toehold. Unlike the diagrams in our Quick Guides to DNA structure and DNA replication, we are using colours not to represent
nucleotides but DNA strands. What’s more, colours now represent a unique strand (a unique sequence of nucleotides), while a dark and a light strand of the same colour represent complementary strands. By that we mean strands that have As and Ts interchanged, and Cs and Gs interchanged so that they will bond to form a double strand.
You’ll notice that the single DNA strand in part A has the complementary strand to the toehold part of the double strand — the dark brown section — so those two portions of the two molecules will bond together. This gives the single strand a toehold into the double strand, as shown in part B, and this is the start of a fuller reaction. This is shown in part C where the rest of the single strand bonds with the double strand, displacing the original shorter part of the double strand in so doing. We can now build on this basic mechanism to see how this reaction could be used to implement an OR gate, one of the logic gates that form universal electronic computers. This isn’t necessarily a practical OR gate and, at the very least, it is a gross oversimplification, but we trust that it provides an inkling of how DNA can be used to implement logic circuitry. In the following description, we are referring to the bottom part of the same diagram.
You’ll notice that the single DNA strands are called inputs and outputs, while the double strands are the gates. These are not directly equivalent to logic gates, though; in fact, three such gates are needed to construct a single OR gate. The other thing to point out is that, unlike the case with electronic circuits where 0s and 1s are represented by different voltages, 0s and 1s are represented by the absence or presence of a particular single strand of DNA. In part D we can see the OR gate, comprising three DNA gates; with no inputs, no reactions take place so there is no output. Inputs of 0 and 0, therefore, give an output of 0, which is what would be expected of an OR gate. In part E, we introduce an input and it’s clear that this will react with Gate 1 to generate an output. This is what has happened in part F, and we can also see that the output from Gate 1 will react with Gate 3. This has happened in part G, the single strand being the output of the OR gate.
Inputs of 1 and 0, therefore, have generated an output of 1 and, again, this is what is expected of an OR gate. It is fairly clear that a different input (such as anything brown:orange) would react with Gate 2 to produce an output, that would in turn react with Gate 3 to generate the same output — and both inputs together would also generate an output.
The above description raises a couple of questions. First, how is the output from one gate routed to the input of another and, second, why is the green:brown:black strand considered an output of the OR gate as a whole, while the intermediate purple:brown:green strand is not?
In fact, the answers to both questions are related. DNA gates aren’t hard-wired to each other as the gates on a silicon chip are. Instead, the gates and the input and output strands are free to move in solution and will do so by diffusion. A single input strand will, therefore, come into contact with lots of gates, but will only react with the ones with a complementary strand.
In our example, therefore, Input 1 could well come into contact with Gate 3, but it won’t react since it was designed to react only with Gate 1. In other words, different gates have different input and output strands, even though they all represent a binary 1. This also explains why the presence of the purple:brown:green strand isn’t considered to be an output of the complete OR gate. In any practical circuit, the output of the OR gate will be routed to the input of another logic gate, but that logic gate will be designed to respond to the green:brown:black strand and not purple:brown:green.
Caltech researchers Erik Winfree and Lulu Qian have already created circuits made from multiple DNA logic gates to perform arithmetic calculations. Their circuit used 130 strands of DNA and was able to calculate the square root of a 4-bit binary number in 6 to 10 hours. More recently, researchers at the University of Washington have used so-called DNA origami to act as scaffolding to anchor the various DNA strands in close proximity to the strands they are designed to react with. This very much reduces the time taken for reacting strands to diffuse towards each other, and has reduced times for simple computations fromhours to minutes.
The fact is, though, that these chemical reactions are never going to come close to reaching the speed of electronic circuits, and fast computation is only going to be achieved by offsetting slow chemical reactions with a massively parallel architecture. Certainly logic circuits can be designed to exhibit parallelism, but massive gains in performance are only going to be realised when there is the scope for inherent parallelism of the type found in Adleman’s solution to the travelling salesman problem. On the other hand, as many DNA computing pioneers have pointed out, DNA computation offers major benefits even in cases where speed isn’t one of them.
NON-DETERMINISTIC DNA
So far we’ve seen an example of a massively parallel DNA computation hard-wired to solve one particular problem. We’ve also seen the implementation of logic gates in DNA, with the potential for universal computation, but with much less scope for inherent parallelism. Neither seems to take us close to a super-fast, general purpose DNA computer. However, a 2017 project at the University of Manchester holds out the promise of exactly that.
The development was of a so-called nondeterministic architecture and this is the key to the phenomenal performance on offer. We are used to the idea of computers being deterministic, that is, given a particular program instruction and particular data, the action it will take is fixed. In a non-deterministic architecture, however, there is no such direct link between cause and effect. Instead, it is possible to write a program in which instructions provide several alternative courses of action. Until now, such computers were mere thought experiments and, even if they were physically possible, they probably wouldn’t give a correct answer to a problem because they might guess the wrong course of action from the alternatives provided in the program. The University of Manchester’s novel DNA-based architecture changes all that.
The university’s Professor Ross King likes to give the example of solving a maze to show how this might work. Each time a junction is reached, rather than following one of the two paths at a time as a conventional program would, the code specifies that either path could be taken, and the inherently parallel nature of the architecture enables both to be taken simultaneously. Once these two paths reach further junctions, the computer would find itself following four paths concurrently, and so on, with the number of parallel paths increasing exponentially. Formerly a pipe dream, DNA makes this feasible, as King explained. “To achieve this we utilise DNA’s ability to replicate. At any choice point in a computation, our NUTM (Non-deterministic Universal Turing Machine) replicates itself and takes all possible choices.”
This DNA research uses Thue strings, which offer a quite different model of computation to the one we’re familiar with, as Professor King elaborates. “The Thue string model is a rewriting system. For example, if you see the string ‘ac’ you can rewrite it as ‘ca’, and vice-versa. The non-determinism comes from the fact that there may be multiple choices of possible rewrites in a string”.
Unfortunately, it appears that programming it is by no means a trivial task. “We know from the theory of computer science that the Thue system is universal — that is, it can be used to compute anything any other computer can do. However, that does not mean that programming it is easy; it is even more difficult than a classical Turing
Machine. Indeed, neither I nor my colleagues have coded anything significant. It would be an interesting challenge to your readers to emulate the Thue system and to use it to, say, encode the addition of two numbers”.
So what of the future for NUTMs? Will we ever see them on the desktop? The fact is that DNA reactions will never be fast, but this can be more than offset by the massive parallelism on offer. So, for general purpose computing, with minimal scope for high levels of parallelism, simple problems would actually take longer.
In the elite world of high-performance computing, though, things couldn’t be more different. We could envisage a scenario in which DNA-based NUTM co-processors work alongside conventional silicon processors in supercomputers to achieve a massive increase in performance and, at the same time, a huge reduction in energy consumption.
DNA STORAGE
While DNA computers are likely to be niche products, there is evidence that DNA as a storage media may soon be much more mainstream. In nature, the sequence of nucleotides in each strand of DNA represents the data that describes the very nature of the species and the traits of the individual. But outside a living being, the sequence of DNA could, in theory, represent anything. If we bear in mind that there are four different bases, each base could potentially represent two bits of information.
Things aren’t quite that simple because DNA reactions aren’t 100% repeatable and this would lead to corrupted data; they are slow, so reading and writing would take place at a snail’s pace; and currently costs would be astronomical. But methods have been devised to counter the effects of data loss by using redundancy; prices will certainly reduce if and when the technology is commercialised, and a slow access speed isn’t a problem for its intended application. That application is data archiving, and here the benefits of DNA are its longevity and data density. When we recall that DNA testing was a key element in identifying the ‘skeleton in a car park’ as the remains of Richard III a few years ago, the fact that DNA can survive for hundreds of years, even in non-ideal conditions, is evident. In fact, DNA has been extracted and sequenced from Egyptian mummies and from extinct animals such as mammoths and sabre-toothed cats, which makes the 10-100 year lifetime of optical discs, magnetic storage and flash memory look rather puny by comparison.
Meanwhile, data density is absolutely staggering. Researchers at Microsoft and the University of Washington have encoded 215 petabytes (215,000 terabytes) into a gram of DNA; at this density, all the world’s data could be stored in 10 tonnes of DNA, which could fit into a trailer. By way of comparison, it’s been estimated that the same data on a stack of CDs would reach beyond the moon.
Doug Carmean of Microsoft Research is widely reported as saying that they aim to have a “protocommercial system in three years storing some amount of data on DNA in one of our data centres, for at least a boutique application”. Indeed, speculation is rife that Microsoft plans to add DNA storage to its cloud-based services in the not-too-distant future.
So will we shortly have DNA in our desktop PCs for day-to-day work? Probably not, thanks to the speed deficit, but the molecule of life might soon be immortalising our cherished photos and irreplaceable data instead. And let’s face it, if we ever do have DNA processors injected into our bloodstream, that will surely be the ultimate in personal computing.