Toronto Star

THE BRAIN MAKER

How a Toronto professor’s research took Silicon Valley by storm and shaped the future of artificial intelligence

- KATE ALLEN

MOUNTAIN VIEW, CALIF.— Three summers ago, at the age of 64, Geoffrey Hinton left his home in Toronto’s Annex neighbourhood to become an intern at Google. He received a propeller beanie stitched with the word “Noogler” and attended orientation sessions populated mostly by millennials, who seemed to regard him, in his words, as a “geriatric imbecile.”

In fact, Google wanted him because he is the godfather of a type of artificial intelligence currently shattering every ceiling in machine learning.

Machine learning is the field of computer science concerned with algorithms that learn as humans do. Those algorithms underpin the technology that would furnish a Jetsons-like future, from self-driving cars to virtual assistants. And if that future suddenly seems plausible — even imminent — Hinton, a University of Toronto computer science professor, has a lot to do with it. The younger Nooglers could be forgiven for not recognizing him, however. The systems Hinton works on, known as neural networks, are modelled on the human brain. After successes in the 1980s, neural nets stalled, and most of academia turned its back.

Hinton was one of the few who soldiered on, their research aided by a modest Canadian grant.

In 2006, Hinton made a breakthrough. In quick succession, neural networks, rebranded as “deep learning,” began beating traditional AI in every critical task: recognizing speech, characterizing images, generating natural, readable sentences. Google, Facebook, Microsoft and nearly every other technology giant have embarked on a deep learning gold rush, competing for the world’s tiny clutch of experts. Deep learning startups, seeded by hundreds of millions in venture capital, are mushrooming.

Hinton now spends three-quarters of his time at Google and the rest at U of T. Machine learning theories he always knew would work are not only being validated but are finding their way into applications used by millions. At 67, when he might be winding down a long and distinguished career, he is just now entering its most exciting phase.

“The stuff that people are doing here is the future,” he says. “I don’t think I’ll ever retire.”

Hinton was only technically an intern at Google. He arrived that summer for what he describes as a trial run — he was hesitant to leave Toronto, where he has lived with his family for most of the past quarter-century — and the short-term stint didn’t have any other obvious job title.

But in the manner of an intern, Hinton still seems chuffed to find himself quite where he is. On a recent morning at Google’s headquarters in Mountain View, the first thing he did was enumerate the riches of one of many well-stocked “micro-kitchens.” His job is to rove among one of the company’s most highly valued teams, seeding the work underway with ideas.

That team is nicknamed Google Brain, and is best known for inventing a computer program that learned to recognize cats.

Four years ago, in one of those snack-filled micro-kitchens, Jeff Dean, a longtime Google engineer, bumped into Andrew Ng, a Stanford University computer science professor and visiting researcher. They chatted about their work, and Dean was surprised to learn Ng was using neural networks — news of their rehabilitation had just begun to leak out of academia. They formed a team to see what these models could do with enormous computing power behind them. The project was initially based out of Google X, a semi-secret laboratory for ambitious, long-range projects.

The first thing the team did was design a massive neural net — some one billion connections distributed across 16,000 computer processors — and feed it a database of 10 million random frame grabs from YouTube. The images were unlabelled: in other words, the computer model was given an ocean of pixels and no other information. Could it learn to detect objects without human help?
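For readers who want to see the idea in miniature, the sketch below is a toy Python example (illustrative only, and nowhere near the scale of Google’s billion-connection system): an autoencoder that is handed unlabelled data and must learn to reconstruct it, which forces its hidden units to become feature detectors without any human labelling.

```python
# A minimal, hypothetical sketch of unsupervised feature learning,
# assuming the PyTorch library. The random vectors stand in for
# unlabelled image patches; no labels appear anywhere.
import torch
import torch.nn as nn

torch.manual_seed(0)
data = torch.rand(1000, 64)          # stand-in for 1,000 unlabelled 64-pixel patches

autoencoder = nn.Sequential(
    nn.Linear(64, 16), nn.ReLU(),    # encoder: squeeze 64 pixels into 16 features
    nn.Linear(16, 64), nn.Sigmoid()  # decoder: rebuild the 64 pixels from those features
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    reconstruction = autoencoder(data)
    loss = loss_fn(reconstruction, data)   # error is "how badly did you rebuild the input?"
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The 16 hidden units now act as learned feature detectors.
print(f"final reconstruction error: {loss.item():.4f}")
```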

After three days, the researchers came back and ran a series of visualizations on the neural net to see what its strongest impressions were. Three fuzzy images emerged: a human face, a human body and a cat.

“The model was actually picking up on patterns that you as a human would also say are important, without ever being told,” says Dean. (If you don’t think cat videos are important, go ahead and stop watching them.)

The team moved out of the Google X labs and quietly began absorbing the world’s tiny cadre of neural network specialists.

Navdeep Jaitly, a PhD student under Hinton, became a Google intern in the summer of 2011. He was asked to tinker with Google’s speech recognition algorithms, and he responded by suggesting they gut half their system and replace it with a neural net. Jaitly’s program outperformed systems that had been fine-tuned for years, and the results of his work found their way into Android, Google’s mobile operating system — a rare coup for an intern. Jaitly now works at Google Brain as a full-time research scientist.

In 2012, Hinton and two of his other U of T students, Alex Krizhevsky and Ilya Sutskever, entered an image recognition contest. Competing teams build computer vision algorithms that learn to identify objects in millions of pictures; the most accurate wins. The U of T model took the error rate of the best-performing algorithms to date, and the error rate of the average human, and snapped the difference in half like a dry twig. The trio created a company, and Google acquired it.

In December 2013, researchers from a small British startup called DeepMind Technologies posted a pre-print of a research paper that showed how it had taught a neural net to play, and beat, Atari games. By January, Google had paid a reported $400 million for DeepMind, a company with an impressive roster of deep learning talent and no products to speak of.

Facebook hired neural net pioneer Yann LeCun to head an AI research lab earlier that winter. Baidu, the Chinese search giant, picked up Andrew Ng after he left Google. IBM and Microsoft have teams, too.

But Google’s $400-million acquisition of DeepMind echoed across Silicon Valley like the firing of a starter pistol, and the race it began has yet to slow down. Undergrads are flocking to machine learning, venture capitalists call it the Next Big Thing and startups are multiplying. Deep learning became one of the hottest trends in tech practically overnight, and industry insiders estimate Google employs half of the world’s experts, if not more.

The only thing fiercer than the sudden fever for deep learning is the chill that lay over the entire field just a few years ago.

In July 1958, the New York Times published a wire story that described a new, experimental “thinking machine” called a Perceptron. The man who designed it claimed it would eventually be able to read and write, and the story said it would be the “first device to think as the human brain . . . Perceptron will make mistakes at first, but will grow wiser as it gains experience.”

The Perceptron was an early neural network. Its designer was Frank Rosenblatt, a young research scientist at Cornell. Rosenblatt was convinced that algorithms could learn as human brains do, and his machine made use of this architecture: like the brain’s web of neurons, information travels through interconnected layers of nodes.
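The learning rule itself is simple enough to sketch in a few lines of Python. This toy version (a stand-in, not Rosenblatt’s hardware) learns the logical OR function: weighted inputs feed a threshold unit, and the weights are nudged whenever the unit guesses wrong.

```python
# A toy perceptron: weighted inputs, a threshold, and a simple
# error-correction rule. It "grows wiser as it gains experience."
import random

random.seed(1)

examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # logical OR
weights = [random.uniform(-1, 1) for _ in range(2)]
bias = 0.0
rate = 0.1

def predict(x):
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

for _ in range(20):                       # repeated passes over the examples
    for x, target in examples:
        error = target - predict(x)       # 0 if right, +1 or -1 if wrong
        weights = [w + rate * error * xi for w, xi in zip(weights, x)]
        bias += rate * error

print([predict(x) for x, _ in examples])  # -> [0, 1, 1, 1]
```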

Rival computer scientists didn’t agree. In 1969, Marvin Minsky and Seymour Papert published a book called Perceptrons that summarily dismissed the potential of neural networks to succeed at complex tasks. The book is still controversial: most credit it with quashing interest in the nascent field. For the next 15 years, interest and funding were shunted elsewhere.

In the dead of this first “neural net winter,” Geoffrey Hinton began a PhD in neural networks at Edinburgh University.

“People said, ‘This is crazy. Why are you wasting your time on this stuff? It’s already shown to be nonsense,’ ” he remembers.

Born in the U.K. to a family with a deep scientific pedigree, Hinton never considered being anything other than a scientist. His paternal great-great-grandfather was the mathematician George Boole. His dad was an entomologist. He remembers his mother, a math teacher, telling him it was OK if he didn’t get a PhD in a tone that clearly suggested it was not.

His father, a Marxist, sent Hinton to a Christian school, a clash of ideologies fine-tuned to produce someone who would question received knowledge. The young Hinton knew “that some of the prevailing wisdom was hopelessly wrong — but not necessarily which bit” (he was pretty confident it was the religion bit).

Hinton was obsessed with understanding the mind, and felt that to do so, one must understand the brain. But as an undergraduate at Cambridge University, Hinton bopped from discipline to discipline, finding no satisfaction: not in physiology, not in philosophy and certainly not in psychology, though his degree was finally from that department.

Once he settled on a PhD program in artificial intelligence, an adviser tolerated his fascination with unpopular neural networks. But for Hinton, the brain-like systems were “obviously the only way it could possibly be.” There could be no intelligence that mimics our own without a neural substructure that does too.

After Hinton completed his doctorate in 1978, he moved around a lot, searching for a research haven. He spent some vital time at the University of California, San Diego, where the academic atmosphere, he says, was much more receptive, and where he collaborated with the cognitive neuroscience pioneer David Rumelhart.

In 1986, Hinton, Rumelhart and computer scientist Ronald J. Williams co-authored a paper that showed how a method called “backpropagation” could vastly improve the efficiency of neural networks. Backpropagation made neural nets substantially better at tasks such as recognizing simple shapes and predicting a third word after seeing two. A system based on the work of Yann LeCun was eventually used to read bank cheques. By the late ’80s, neural nets seemed poised to transform AI.
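In outline, backpropagation measures the network’s error at its output and passes the blame backwards through the layers, so every connection learns how much it contributed to the mistake. The Python sketch below (a toy example, not the experiments from the 1986 paper) uses it to teach a small two-layer network the XOR function, a task no single-layer perceptron can learn.

```python
# A toy two-layer network trained with backpropagation, assuming NumPy.
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Forward pass: compute the network's guesses.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass: the chain rule sends the output error back to every weight.
    d_output = (output - y) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

    W2 -= 0.5 * hidden.T @ d_output
    b2 -= 0.5 * d_output.sum(axis=0)
    W1 -= 0.5 * X.T @ d_hidden
    b1 -= 0.5 * d_hidden.sum(axis=0)

print(np.round(output.ravel(), 2))  # typically settles near [0, 1, 1, 0]
```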

But the systems faltered when researchers attempted more ambitious tasks. Frustrated, the machine learning community turned to rule-based systems that provided better results.

By the early 2000s, the number of researchers who specialized in neural networks dwindled to fewer than half a dozen.

Ask anyone in machine learning what kept neural network research alive and they will probably mention one or all of these three names: Geoffrey Hinton, fellow Canadian Yoshua Bengio and Yann LeCun, of Facebook and New York University.

But if you ask these three people what kept neural network research alive, they are likely to cite CIFAR, the Canadian Institute for Advanced Research. The organization creates research programs shaped around ambitious topics. Its funding, drawn from both public and private sources, frees scientists to spend more time tackling those questions, and draws experts from different disciplines together to collaborate.

CIFAR’s very first program, established in 1982, was in artificial intelligence and robotics. It lured Hinton from the U.S. to Canada in 1987: he was attracted by a position that offered the maximum amount of time to pursue basic research. (He was also turned off by Reagan-era America, when much of the funding for artificial intelligence research began to come from DARPA, the Department of Defense research arm.)

That program ended in the mid-1990s. But in 2004, Hinton asked to lead a new program on neural computation. The mainstream machine learning community could not have been less interested in neural nets.

“It was the worst possible time,” says Bengio, a professor at the Université de Montréal and co-director of the CIFAR program since it was renewed last year. “Everyone else was doing something different. Somehow, Geoff convinced them.

“We should give (CIFAR) a lot of credit for making that gamble.”

CIFAR “had a huge impact in forming a community around deep learning,” adds LeCun, the CIFAR program’s other co-director. “We were outcast a little bit in the broader machine learning community: we couldn’t get our papers published. This gave us a place where we could exchange ideas.”

In 2006, Hinton and a PhD student, Ruslan Salakhutdinov, published two papers that demonstrated how very large neural networks, once too slow to be effective, could work much more quickly than before. The new nets had more layers of computation: they were “deep,” hence the method’s rebranding as deep learning. And when researchers began throwing huge data sets at them, and combining them with new and powerful graphics processing units originally built for video games, the systems began beating traditional machine learning systems that had been tweaked for decades. Neural nets were back.
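In practice, “deep” simply means more stacked layers of computation, and the graphics processors built for video games turned out to be ideal for the arithmetic involved. A generic modern sketch (not the layer-by-layer pre-training Hinton and Salakhutdinov described in 2006) looks like this:

```python
# A hypothetical deep classifier, assuming PyTorch: several layers of
# computation, moved onto a GPU when one is available.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

deep_net = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),              # ten class scores at the end
).to(device)

fake_images = torch.rand(64, 784, device=device)  # stand-in for a batch of real images
scores = deep_net(fake_images)
print(scores.shape)                  # torch.Size([64, 10])
```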

Today, Hinton finds himself in a strange position: as he watches billions of dollars of industry investment pour into what was recently considered a hopelessly musty academic problem, government funding for science is moving in the opposite direction. Increasingly, academics must justify the payoff for industry before the basic research has begun.

“There is a lot of pressure to make things more applied; I think it’s a big mistake,” Hinton says. “In the long run, curiosity-driven research just works better . . . Real breakthroughs come from people focusing on what they’re excited about.”

He likes this quote, attributed to Einstein: “If we knew what it was we were doing, it would not be called research, would it?”

At tech giants and smaller startups, most deep learning applications revolve around three tasks: speech recognition, image recognition, and reading or generating natural written language.

Often they involve more than one. In December, Microsoft-owned Skype unveiled a demo version of a real-time translation service. As one caller speaks English or Spanish, the program renders it in the other language, in both spoken and written form.

The U of T computer science department website hosts a version of a tool that many industry players are racing to perfect: upload a picture, and it generates a written caption. At a CIFAR talk in March, Ruslan Salakhutdinov, now a U of T professor, showed that the model is eerily accurate — but not always. Its best guess for a badly lit rock concert was “Giant spider found in the Netherlands.”

The holy grail is a system that incorporates all these actions equally well: a generally intelligent algorithm. Such a system could understand what we are saying, what we mean by what we say, and then get what we want. That could take the form of a search engine that, rather than simply computing our words, understands the meaning of a request, or an autonomous vehicle that recognizes hazards as it delivers us to our destination, or a robotic assistant fetching beer.

Yet the irony of artificial intelligence, many point out, is that once it arrives, it no longer seems particularly intelligent. AI is a fuzzy category. If artificial intelligence is what humans can do but computers can’t, once computers can do it, AI disappears.

The machine learning label is more helpful, because it describes the development of computer programs that can behave in ways beyond their explicit programming.

Yet as neural nets improve, these successes may not be so obvious either. Siri and her cousins may just improve incrementally, until we forget we ever lived without them. (Apple, incidentally, is the one Silicon Valley giant that hasn’t trumpeted its deep learning hires: nobody knows whether the company — a notorious black box — uses neural networks, though it’s safe to assume it does.)

Hype is vicious: it has smothered and revived AI often enough that machine learning scientists are alive to the dangers of overselling their research. Yes, there is always the potential that the billions of dollars being funnelled into deep learning will look foolish in 2050. Some of the problems researchers are grappling with are surprisingly basic: neural networks can be tricked by tiny, carefully chosen changes to an image’s pixels; even without that, they mistake concerts for spiders and ducks for Barack Obama.
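The pixel trick can be sketched as a gradient-based perturbation (an illustration of the general idea, not any particular published attack): nudge every pixel a tiny amount in whichever direction most increases the network’s error, and an image that looks unchanged to a human can come back with a different label.

```python
# A hypothetical adversarial nudge, assuming PyTorch. The untrained model
# and random "image" are stand-ins; in practice the target is a trained classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
image = torch.rand(1, 784, requires_grad=True)   # stand-in for a real photo
true_label = torch.tensor([3])

loss = F.cross_entropy(model(image), true_label)
loss.backward()                                  # gradient of the error w.r.t. each pixel

# Shift every pixel slightly in the direction that increases the error...
adversarial = (image + 0.1 * image.grad.sign()).clamp(0, 1).detach()

# ...which can flip the model's answer even though the picture looks the same.
print(model(image).argmax().item(), model(adversarial).argmax().item())
```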

But the algorithms thrive on powerful computing and big data. Both are only growing.

“In the past, neural networks were academically interesting,” says Hinton. Today, they are embedded in applications most of us use every day. True to form, Hinton doesn’t forecast another neural net winter.

“They’re not just going to sort of fade away,” he says. “They really work.”

NOAH BERGER FOR THE TORONTO STAR University of Toronto professor Geoffrey Hinton spent years studying neural networks, a model of artificial intelligence that has enjoyed a comeback recently after decades of neglect. He now spends much of his time at Google’s campus in Mountain View,...

ILLUSTRATION BY NURI DUCASSI/TORONTO STAR

TONY AVELAR/BLOOMBERG VIA GETTY IMAGES

ERIC RISBERG/THE ASSOCIATED PRESS Google is developing a self-driving car — one of many possible applications for a “deep learning” algorithm that can respond to human needs and changing situations.
