Fixing AI one database entry at a time
The AI Incident Database aims to track and record all incidents of artificial intelligence missteps, mistakes and more. Nicole Kobie reveals how it works
“It’s likely to grow quickly as the technology becomes more commonplace and there’s more opportunity for things to go wrong”
Workers killed by robots. A pedestrian run down by a driverless car. Algorithmically generated animated videos that terrify children. Machine-learning image analysis that means police arrest the wrong person, invariably a Black man. Horror stories abound of artificial intelligence (AI) gone wrong – and Sean McGregor wants to hear them all.
McGregor is the project lead at the AI Incident Database (AIID), a repository of missteps and mistakes made by supposedly smart systems that's part of the wider efforts of the industry group the Partnership on AI. "One of the project ideas was to build a taxonomy of AI failures to really understand how things can go wrong," McGregor said, referencing the kind of classification system that can help machines better understand language.
The problem was that there wasn't any data – and, as a trained machine-learning researcher, McGregor is accustomed to first gathering data, learning from it and then making decisions. "We didn't have a dataset," he said. "We had a few ad hoc lists of failures, but they weren't really systematised, or collected or brought together in one place." In a moment of what McGregor describes as "bravado", he offered to build it.
Collective memory
So far, the database has 1,200 submissions for 100 incidents, although there’s something of a backlog at the moment with more reports requiring approval. “I think, unfortunately, it’s likely to grow rather quickly as the technology becomes more commonplace and there’s more opportunity for things to go wrong,” McGregor said.
The database lets you sift through those cases by key terms in the report text, as well as incident number, media sources and more. The 100 incidents include everything from deaths by robot to facial recognition missteps, algorithmic hiring bias and discrimination in systems.
McGregor’s favourite is the security robot that fell into a fountain in a shopping centre (incident #68) and eventually got lost in a car park. “It was wandering around and its call button wasn’t connected and someone was just hitting that button to get the police,” he said. “It just told her to get out the way, played a little jingle, and then kept going – it’s a perfect symbol for the state of AI. It can do a lot, it’s quite inventive in a lot of ways, but it’s also bumbling its way through the world.”
While some of the incidents are less amusing than a bumbling bot, McGregor says the good news is that the various mishaps tend to be genuine mistakes rather than malicious. "I was happy to see how many of these instances resulted from a failure of imagination, rather than any kind of carelessness or recklessness," he said. "I expected there to be more of the 'just go to market, who cares if it'll go wrong' mentality.
"With each of these cases, we learn something new. And the next time it happens, we know it's a rush [to market]. But I think it's largely a well-meaning collection of people still working on intelligent systems."
Tracking not shaming
Indeed, the aim isn't to shame AI systems out of existence, but to track their faults so the same mistakes aren't repeated – letting researchers study problems and developers see the missteps others have made. "I go back to the George Santayana quote: 'those who cannot remember the past are condemned to repeat it'," he said, referencing the writer and philosopher. "We need that collective memory as a field."
McGregor points to the US Federal Aviation Administration, which tracks incidents with planes, and the Fatality Analysis Reporting System, which collects vehicle crash data, saying the aim is to do the same with the AIID. Indeed, that's why the database tracks "incidents" rather than "mistakes" or worse – as with those systems, the AIID wants to collect near-misses that were caught before any harm was done, not only outright failures.
"The term 'incident' as opposed to failure or accident is actually derived from the aviation industry, which has a distinction between incident and accident," he explained. "An accident is where a plane crashed or something of that nature, while an incident is where something nearly crashed but didn't." The aim is to cover both in the database, as missteps that are caught are also worth learning about.
Imagination gap
There are a few different ways for the database to be used, such as developing the taxonomy that McGregor started the project to build, or academic research tracking the challenges of AI in the real world. It can also be used by activists looking for examples of a specific technology; "facial recognition" is a popular search term on the database. Of course, it's also helpful for journalists looking for examples to start a story about AI dangers.
However, the aim is that the AIID is also used by practitioners – those developing AI systems and those who are implementing them, as well as lawyers looking for liability and public relations teams worried about negative headlines. “HR” is one of the top search terms, for example, with incidents such as Amazon’s hiring tool discriminating against women (incident #37). “One of the use cases we’re explicitly looking to support is product managers, someone who’s in a company and building a system to go to market,” said McGregor. “They have a lot of incentive to not have their product written about for the wrong reason.”
To help avoid those negative headlines, product managers and developers can search for application areas, the type of technology they’re using or challenges they’re trying to solve, bringing up examples from the database of what happens when it all goes wrong. That allows those people to address similar issues in their own work, hopefully creating better systems in the process. “It solves a problem of imagination of what can go wrong, and allows that person to layer on additional requirements, engineering effort and so on,” he said. “It allows them to say: we can’t go to market until we solve this. Because look how embarrassing it will be.”
And that should help inspire better AI systems. “It’s one thing to say, ‘it’s the right thing to do’,” said McGregor. “But what really cuts straight to action and making things better is anything related to the bottom line.”
Building better AI
The project still needs work. There’s that backlog of submissions to read through for quality control and correct categorisation, and there are efforts to translate the database into different languages in the hope of extending its reach globally – a task that will be done using AI translators, naturally. Plus, McGregor is working on a deduplication tool to ensure incidents aren’t accidentally listed twice.
To make it easier for people to use without digging into the database, the team is also planning to create a newsfeed of articles that have characteristics of AI incidents to help catch more cases. "That grows the dataset and makes it possible for natural language processing to automatically monitor for various incidents," McGregor said. "Being a machine learning researcher, I'm really eager to play with that element."
Then there's the taxonomy work that sparked this entire project. McGregor is working with an academic organisation, which he's not able to name, to develop a qualitative coding scheme to apply to the database. But his system won't be the only one allowed on the database – indeed, it's been designed to be open for anyone to use how they see fit. "The incident database is providing an infrastructure that other organisations and individuals can use to define and apply taxonomies," he said. "We can build many different applications on top of the data to have different viewpoints into the data."
That's ultimately what needs to happen with the AIID: more data and more people using that data. Because that's what will help AI become more useful and less dangerous. "The more we get in there and the more we make it directed towards people and the industry, the better able we will be to make the systems safer and really figure out where to put the mitigations and safety and fairness protocols around the system," said McGregor.