AI ‘already capable of deceiving humans’ and could go rogue
Artificial Intelligence (AI) is already deceiving us to get its own way, a study has found.
AI holds promise for revolutionising modern technology, but scientists are concerned that these powerful tools could have unforeseen consequences for human society.
MIT scientists reviewed data and studies on a range of AI models and found computers are adept at bluffing in poker, deceiving people and using underhanded methods to get the upper hand in financial negotiations.
The authors warn that regulation is needed to stop the burgeoning technologies from developing these skills, which are unintended consequences of many programmes.
If the deception is not addressed, AI could be used to commit fraud, interfere with elections and politics, and aid terrorist recruitment.
“AI systems are already capable of deceiving humans,” the authors wrote in their study, published in the journal Patterns.
“Large language models and other AI systems have already learned, from their training, the ability to deceive via techniques such as manipulation, sycophancy and cheating.”
The study found that Meta’s AI system Cicero, which ranks among the top 10pc of human players of the strategy game Diplomacy, was adept at deploying furtive tactics.
Study author Dr Peter S Park called the technology, built by Facebook’s parent company, “a master of deception”.
“While Meta succeeded in training its AI to win in the game of Diplomacy ... [it] failed to train it... to win honestly,” he added.
“Generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals,” said Dr Park, an AI existential safety postdoctoral fellow at MIT.
The scientists caution that humans do not yet have adequate protections in place against AI going rogue, and that the threat of deception from AI will only increase as the technology matures.
The study identifies four types of societal risk from AI: persistent false beliefs, political polarisation, enfeeblement in which humans cede ever more power and authority to AI, and nefarious management decisions if AI is given managerial roles within companies.
Meta was contacted for comment on these findings. (© Telegraph Media Group Ltd 2024)