AI could make the internet more accessible
Most of us take internet use for granted once we have access. The world wide web appears so intuitive that anyone can simply call it up and begin clicking. In truth, that applies only to the privileged: experienced users with full use of their physical faculties.
For people with disabilities, a simple website can be an obstacle course. The early standards included the “alt” attribute in the HTML code that produces a web page, allowing text-to-speech systems to read a description of an image to the visually challenged.
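To make the role of that image description concrete, here is a minimal sketch in Python of how assistive software might pull the text out of a page before reading it aloud. It uses the standard library's html.parser module. The class name, the helper function, the sample page and the fallback message are all illustrative assumptions, not part of any real screen reader.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects the alt text of every <img> element on a page."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        alt = dict(attrs).get("alt")
        # Without an alt attribute there is nothing for speech software to read.
        self.descriptions.append(
            alt if alt is not None else "(no description available)"
        )


def describe_images(html):
    """Return the image descriptions in the order they appear in the page."""
    collector = AltTextCollector()
    collector.feed(html)
    return collector.descriptions


# Hypothetical page: one image with a description, one without.
page = (
    "<p>Welcome</p>"
    '<img src="guide-dog.jpg" alt="A guide dog waiting at a crossing">'
    '<img src="banner.png">'
)

for description in describe_images(page):
    print(description)
```

The second image shows the problem: when the author leaves out the description, the software has only a filename to go on.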
But beyond that, little thought has gone into greater accessibility, despite the fact that one of the aims of the World Wide Web Consortium (W3C), which develops standards and guidelines, is an internet based on the principles of accessibility.
The good news is that dramatic advances in natural language processing (NLP) and artificial intelligence (AI) have the potential to transform accessibility to the internet in all its forms, from apps to the web.
Last month, at the 37th Conference on Neural Information Processing Systems (NeurIPS), a major AI and machine learning conference, researchers from Ohio State University presented a study on how an AI agent could complete complex tasks on any website using simple language commands.
According to Yu Su, co-author of the study and an assistant professor of computer science and engineering, in the three decades since the web was first released into the public domain it has become an incredibly intricate, dynamic system.
While there are billions of websites available to help access information or communicate with others, many tasks on the internet can take more than a dozen steps to complete. Su said the study, which uses information from live sites to create web agents — or online AI helpers — was a step towards making the digital world a less confusing place.
“For some people, especially those with disabilities, it’s not easy for them to browse the internet,” he said. “We rely more and more on the computing world in our daily life and work, but there are increasingly a lot of barriers to that access, which, to some degree, widens the disparity.”
Generative AI tools such as ChatGPT, Google Bard, Anthropic Claude and Microsoft Bing AI, all of which use large language models (LLMs), have the potential to close the gap. By taking advantage of the power of LLMs, said Su, the agent works much as humans do when browsing the web.
The team showed that their model was able to understand the layout and functionality of different websites using only its ability to process and predict language. According to Su, much of their success was due to their agent’s ability to handle the internet’s ever-evolving learning curve.
The team lifted more than 2,000 open-ended tasks from 137 different real-world websites, which they then used to train the agent. The exercises were fascinating tests of AI agents’ skills: “Some of the tasks included booking one-way and round-trip international flights, following celebrity accounts on Twitter, browsing comedy films from 1992 to 2017 streaming on Netflix, and even scheduling car knowledge tests at the DMV [department of motor vehicles].
“Many of the tasks were very complex — for example, booking one of the international flights used in the model would take 14 actions.”
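One way to picture such a task is as a plain-language instruction paired with an ordered list of small page actions. The Python sketch below is purely illustrative: the Action and Task classes, the action names and the four sample steps are assumptions made for this example, not the format or data used in the study.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str        # hypothetical action type, e.g. "click" or "type"
    target: str      # plain-language description of the page element
    value: str = ""  # text to enter, if the action needs any


@dataclass
class Task:
    instruction: str
    actions: list = field(default_factory=list)

    def step_count(self):
        """Number of individual page actions the task requires."""
        return len(self.actions)


# Illustrative steps only; a real booking would need many more.
flight = Task(
    instruction="Book a one-way flight from Columbus to London",
    actions=[
        Action("click", "one-way option"),
        Action("type", "origin field", "Columbus"),
        Action("type", "destination field", "London"),
        Action("click", "search button"),
    ],
)

print(flight.step_count())
```

Even this shortened version shows why a single sentence from the user can stand in for a long chain of clicks and keystrokes.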
Su said such effortless versatility allowed for diverse coverage on a number of websites, and opened up a new landscape for future models to explore and learn autonomously.
“Throughout my career, my goal has always been trying to bridge the gap between human users and the computing world. The real value of this tool is that it will really save people time and make the impossible possible.”