San Antonio Express-News (Sunday)
STAAR takes one automated step backward
It shouldn’t be a shock that most written responses on the State of Texas Assessments of Academic Readiness, commonly known as STAAR, are now graded by some variant of artificial intelligence — not humans.
The STAAR, like its past iterations, has always prioritized shortcuts, stifling education in Texas since before I was in elementary school.
During eight years as a teacher, I had an up-close look at the fierce culture of STAAR testing. It looms over teachers and students all year, and when the test finally arrives, some students experience anxiety so strong it sends them running to the restroom to vomit.
Teachers are also anxious, fearful of inadvertently violating protocol during testing days. Or, perhaps worse, that their students won't perform well on the one-and-done snapshot that drains the joy from teaching and learning.
While assessment is necessary to gauge learning, a single test doesn’t convey the whole story about what and how students are learning. Not only is the STAAR a colossal waste of resources and time, it shapes how teachers teach, students learn, and schools and districts are judged. And now a computer program is grading the results. Why are we doing this?
Gov. Greg Abbott isn’t even all that attached to the STAAR. He proposed eliminating STAAR, although this was just a sweetener in his bitter push for taxpayer-funded vouchers, which would pull dollars out of the state’s underfunded public education system.
AI scoring will be a money saver, even if it comes with certain incalculable costs. After a stealthy December rollout, Texas Education Agency officials explained the change is necessary because the new STAAR, which launched last year, includes essays at every grade level that are “very time consuming and laborious to score,” according to the Dallas Morning News.
The Texas Tribune reported that in 2023, TEA hired about 6,000 temporary scorers, but this year it will need fewer than 2,000. The state will cut more than $15 million annually from the exam's $90 million-plus cost.
TEA officials defend computer scoring, saying programs emulate how humans would score an essay. But computers can’t replace humans. The STAAR’s move to online testing, saddled with major technology problems, proved that.
AI's record on grading isn't clean. In Ohio, some districts reported irregularities after computer scoring produced far more grades of zero than expected.
In Texas, if districts or parents have concerns about student scores on written responses, they can request that they be rescored by humans — for a fee. If the score is changed, the fee is waived, according to a December TEA scoring process guide. I wonder who will play those odds.
Aside from grading failures, I can only imagine how teachers will be pressured not only to teach to a test but to teach students how to write for computers.
If AI assessed this column, would I fail or would the machine fail to understand? It doesn’t follow the formulaic five-paragraph rules sprinkled with perfunctory transitions.
I’ve steered clear of playing with AI, but as I closed this column, I asked ChatGPT to write a column in my style about using AI to grade STAAR essays. Maybe one day, our state will just allow students to ask AI to write their essays and take their tests, which can then be assessed by AI.
Within seconds, the software spat out an essay titled “The Future of Grading STAAR Essays: A Chat with GPT.”
It was eight paragraphs of drivel laced with say-nothing quotes from me.
Here’s the final paragraph:
“In conclusion, the use of AI, such as ChatGPT, has the potential to revolutionize the grading of STAAR essays. By leveraging AI technology, educators can streamline the grading process, provide more consistent feedback, and ultimately enhance the learning experience for students. As we navigate the ever-changing landscape of education, embracing innovative solutions like AI grading could pave the way for a more efficient and effective educational system.”
I’m not surprised at AI’s conclusion, but I wonder if it would give this type of writing an A.