Training computers to recognise actions
CAMBRIDGE, Massachusetts: A person watching videos that show things opening — a door, a book, curtains, a blooming flower, a yawning dog — easily understands that the same type of action is depicted in each clip.
“Computer models fail miserably to identify these things. How do humans do it so effortlessly?” asks Dan Gutfreund, a principal investigator at the MIT-IBM Watson AI (Artificial Intelligence) Laboratory and a staff member at IBM Research. “We process information as it happens in space and time. How can we teach computer models to do that?” Such are the big questions behind one of the new projects underway at the MIT-IBM Watson AI Laboratory, a collaboration for research on the frontiers of artificial intelligence. Launched last year, the lab brings MIT and IBM researchers together to work on AI algorithms, the application of AI to industries, the physics of AI, and ways to use AI to advance shared prosperity.
The Moments in Time dataset is one of the projects related to AI algorithms that is funded by the lab. It pairs Gutfreund with Aude Oliva, a principal research scientist at the MIT Computer Science and Artificial Intelligence Laboratory, as the project’s principal investigators.
Moments in Time is built on a collection of one million annotated videos of dynamic events unfolding within three seconds. Gutfreund and Oliva, who is also the MIT executive director at the MIT-IBM Watson AI Lab, are using these clips to address one of the next big steps for AI: teaching machines to recognise actions.
The goal is to provide deep-learning algorithms with broad coverage of an ecosystem of visual and auditory moments, which may enable models to learn information that isn’t necessarily taught in a supervised manner and to generalise to novel situations and tasks, say the researchers.
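For a concrete, if simplified, picture of what such training involves, here is a minimal sketch, assuming the PyTorch library; the tiny network, tensor shapes, class count, and random stand-in data are all illustrative, not the project’s actual implementation. A 3D convolutional network convolves over space and time together, the kind of joint space-time processing Gutfreund describes:

    # Hypothetical sketch of supervised action recognition on short clips:
    # video in, action label out. Not the Moments in Time reference code.
    import torch
    import torch.nn as nn

    NUM_ACTIONS = 339            # illustrative; the dataset covers hundreds of actions
    FRAMES, H, W = 16, 112, 112  # e.g. 16 frames sampled from a 3-second clip

    class TinyActionNet(nn.Module):
        """A toy 3D CNN: convolves over space *and* time, then classifies."""
        def __init__(self, num_classes: int):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(3, 16, kernel_size=3, padding=1),  # input: (C, T, H, W)
                nn.ReLU(),
                nn.MaxPool3d(2),
                nn.Conv3d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool3d(1),                     # global space-time pool
            )
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, clips: torch.Tensor) -> torch.Tensor:
            # clips: (batch, 3, FRAMES, H, W)
            x = self.features(clips).flatten(1)
            return self.classifier(x)

    model = TinyActionNet(NUM_ACTIONS)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    # Stand-in batch: 4 random "clips" with random action labels.
    clips = torch.randn(4, 3, FRAMES, H, W)
    labels = torch.randint(0, NUM_ACTIONS, (4,))

    optimizer.zero_grad()
    logits = model(clips)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    print(f"one training step done, loss = {loss.item():.3f}")

In practice, the random tensors would be replaced by batches of labelled three-second clips and the toy network by a far larger model, but the supervised training loop itself has this shape.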
“As we grow up, we look around, we see people and objects moving, we hear sounds that people and objects make. We have a lot of visual and auditory experiences. An AI system needs to learn the same way and be fed with videos and dynamic information,” Oliva said.
One key goal at the lab is the development of AI systems that move beyond specialised tasks to tackle more complex problems and benefit from robust and continuous learning. “We are seeking new algorithms that not only leverage big data when available, but also learn from limited data to augment human intelligence,” said Sophie V. Vandebroek, chief operating officer of IBM Research, about the collaboration. — MIT News