How an old tool will tell us if AI is ready to take your job

2023-03-05 - By Vishal Gupta

“AI passes U.S. medical licensing exam.” “ChatGPT passes law school exams despite ‘mediocre’ performance.” “Would ChatGPT get a Wharton MBA?”

Headlines such as these have recently touted (and often exaggerated) the successes of ChatGPT, an artificial intelligence tool capable of writing sophisticated text responses to human prompts. They follow a long tradition of comparing an AI’s ability to that of human experts, such as Deep Blue’s chess victory over Garry Kasparov in 1997, IBM Watson’s “Jeopardy!” victory over Ken Jennings and Brad Rutter in 2011, and AlphaGo’s victory in the game Go over Lee Sedol in 2016.

The implied subtext of these recent headlines is more alarmist: AI is coming for your job. It’s as smart as your doctor, your lawyer and that consultant you hired. It heralds an imminent, pervasive disruption to our lives.

But sensationalism aside, does comparing AI with human performance tell us anything practically useful? How should we effectively utilize an AI that passes the U.S. medical licensing exam? Could it reliably and safely collect medical histories during patient intake? What about offering a second opinion on a diagnosis? Knowing that an AI performs comparably to a human on the medical licensing exam can’t answer these kinds of questions.

The problem is most people have little AI literacy — an understanding of when and how to use AI tools effectively. What we need is a straightforward, general-purpose framework for assessing the strengths and weaknesses of AI tools that everyone can use. Only then can the public make informed decisions about incorporating those tools into our daily lives.

To meet this need, my research group turned to an old idea from education: Bloom’s Taxonomy. First published in 1956 and later revised in 2001, Bloom’s Taxonomy is a hierarchy describing levels of thinking in which higher levels represent more complex thought. Its six levels are: 1) Remember — recall basic facts, 2) Understand — explain concepts, 3) Apply — use information in new situations, 4) Analyze — draw connections between ideas, 5) Evaluate — critique or justify a decision or opinion, and 6) Create — produce original work.

These six levels are intuitive, even for non-experts, but specific enough to make meaningful assessments. Moreover, Bloom’s Taxonomy isn’t tied to a particular technology — it applies to cognition broadly. We can use it to assess the strengths and limitations of ChatGPT or other AI tools that manipulate images, create audio, or pilot drones.

My research group has begun assessing ChatGPT through the lens of Bloom’s Taxonomy by asking it to respond to variations on a prompt, each targeting a different level of cognition.

For example, we asked the AI: “Suppose demand for COVID vaccines this winter is forecasted to be 1 million doses plus or minus 300,000 doses. How much should we stock to meet 95% of demand?” — an Apply task. We then modified the question, asking it to “Discuss the pros and cons of ordering 1.8 million vaccines” — an Evaluate level task. Then we compared the quality of the two responses and repeated this exercise for all six levels of the taxonomy.

Preliminary results are instructive. ChatGPT generally does well with Remember, Understand and Apply tasks but struggles with the more complex Analyze and Evaluate tasks. With the first prompt, ChatGPT responded well by applying and explaining a formula to suggest a reasonable vaccine quantity (albeit making a small arithmetic mistake in the process).

With the second, however, ChatGPT waffled unconvincingly about having too much or too little vaccine. It made no quantitative assessment of these risks, did not account for the logistical challenges of cold storage for such an immense quantity and did not warn of the possibility that a vaccine-resistant variant might arise.
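To make the two prompts concrete, here is a minimal sketch of the kind of calculation they invite. It rests on assumptions the prompt itself leaves open: that demand is normally distributed, that “plus or minus 300,000” means one standard deviation, and that “meet 95% of demand” means covering demand with 95% probability. The function names are ours, for illustration only, and this is not necessarily the formula ChatGPT applied.

```python
from statistics import NormalDist

# Assumptions (not stated in the prompt): demand is normal, and
# "plus or minus 300,000" is read as one standard deviation.
MEAN_DEMAND = 1_000_000
STD_DEMAND = 300_000
demand = NormalDist(mu=MEAN_DEMAND, sigma=STD_DEMAND)


def stock_for_service_level(level: float) -> float:
    """Stock needed so that demand is fully covered with probability `level`."""
    return demand.inv_cdf(level)


def service_level_for_stock(stock: float) -> float:
    """Probability that demand does not exceed `stock`."""
    return demand.cdf(stock)


# Apply task: how much to stock to cover demand 95% of the time.
stock_95 = stock_for_service_level(0.95)

# A first quantitative step toward the Evaluate task: how much
# protection would an order of 1.8 million doses actually buy?
level_1_8m = service_level_for_stock(1_800_000)

print(f"Stock for 95% coverage: {stock_95:,.0f} doses")
print(f"Coverage from 1.8 million doses: {level_1_8m:.1%}")
```

Under these assumptions, 95% coverage takes roughly 1.49 million doses, while 1.8 million doses would cover demand about 99.6% of the time. The extra 300,000 or so doses buy fewer than five additional percentage points of protection, which is the sort of quantitative trade-off the Evaluate prompt calls for.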

We are seeing similar behavior for different prompts across these taxonomy levels. Thus, Bloom’s Taxonomy allows us to draw more nuanced assessments of the AI technology than a raw human-versus-AI comparison.

As for our doctor, lawyer and consultant, Bloom’s Taxonomy also provides a more nuanced view of how AI might someday reshape, not replace, these professions. Although AI may excel at Remember and Understand tasks, few people consult their doctor to inventory all possible symptoms of a disease, ask their lawyer to recite case law verbatim, or hire a consultant to explain the theory of Porter’s Five Forces.

But we turn to experts for higher-level cognitive tasks. We value our doctor’s clinical judgment in weighing the benefits and risks of a treatment plan, our lawyer’s ability to synthesize precedent and advocate on our behalf, and a consultant’s ability to identify an out-of-the-box solution no one else thought of. These skills are Analyze, Evaluate and Create tasks, levels of cognition where AI technology currently falls short.

Using Bloom’s Taxonomy, we can see that effective human-AI collaboration will largely mean delegating lower-level cognitive tasks to AI so that we can focus our energy on more complex cognitive tasks. Thus, instead of dwelling on whether an AI can compete with a human expert, we should be asking how well an AI’s capabilities can be used to help foster human critical thinking, judgment and creativity.

Of course, Bloom’s Taxonomy has its own limitations. Many complex tasks involve multiple levels of the taxonomy, frustrating attempts at categorization. And Bloom’s Taxonomy does not directly address issues of bias or racism, a major concern in large-scale AI applications. But while imperfect, Bloom’s Taxonomy remains useful. It is simple enough for everyone to grasp, general-purpose enough to apply to a broad range of AI tools, and structured enough to ensure we ask a consistent, thorough set of questions of those tools.

Much like the rise of social media and fake news requires us to develop better media literacy, tools such as ChatGPT demand that we develop our AI literacy. Bloom’s Taxonomy offers a way to think about what AI can do — and what it can’t — as this type of technology becomes embedded in more and more parts of our lives.

Vishal Gupta is an associate professor of data sciences and operations at the USC Marshall School of Business and holds a courtesy appointment in the department of industrial and systems engineering.
