Los Angeles Times

The SAT answers we need

By Jay Rosner

Jay Rosner is the executive director of The Princeton Review Foundation. He took the June 6 and Jan. 23 SATs.

Several hundred thousand high school students will take the second administration of the new SAT on May 7. Students who took the first offering on March 5 are still waiting for their scores, which will not be sent to them until later this month.

While the long wait may bother students, there are more significant issues regarding the public’s access to critical SAT data. The College Board calls the new SAT “profoundly transparent,” but it won’t release so-called item-level data — information about how students nationwide fared on particular questions — to the public. In fact, it hasn’t released such statistics since 2000. That makes it difficult for the public to scrutinize why certain demographic groups perform so much better on the SAT than others.

On average, we know that boys outscore girls by a few points on the verbal section (recently renamed reading and writing) and by more than 30 points on the math section. We also know that whites outscore blacks and Latinos on verbal (by 98 points and 80 points, respectively) and on math (by 106 points and 76 points, respectively). These gaps have been constant for decades. In recent years, Asian American students have been outscoring white students considerably on math and have reduced their verbal shortfall to a few points — the only gaps to have changed significantly in recent memory.

There are well-known external factors that contribute to these imbalances. Our culture discourages girls from excelling in math, and black and Latino children often attend weaker schools. But what if test design is also to blame?

Educational Testing Service (ETS), which writes exams for the College Board, pretests all potential questions before finalizing a given SAT. It assumes that a “good” question is one that students who score well overall tend to answer correctly, and vice versa.

That’s problematic because, as mentioned, girls score lower than boys on math, and black students score lower than white students. So if, on a particular math question, girls outscore boys or blacks outscore whites, it has almost no chance of making the final cut. This process therefore perpetuates disparities, virtually guaranteeing a test that’s ultimately easier for some populations than others.

ETS does have a system, called Differential Item Functioning (DIF), for ensuring that, in its words, “examinees of comparable achievement levels respond similarly” to each question. Basically, ETS separates students according to performance — those who scored in the 200s in one group, those who scored in the 400s in another. If, within those groups, boys do far better on a given question than girls, or white students surpass black students, then ETS eliminates it.
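To make the mechanics concrete, here is a minimal sketch of that kind of within-band screening. To be clear, this is my illustration, not ETS’s actual procedure: the data fields, the 100-point score bands and the 10-percentage-point flagging threshold are all assumptions chosen for the example.

```python
# A hypothetical sketch of DIF-style screening: bucket test takers into
# score bands, then compare each question's percent-correct across
# demographic groups *within* each band, and flag questions where the
# groups diverge. Field names, bands and threshold are illustrative.
from collections import defaultdict

def flag_skewed_questions(records, threshold=0.10):
    """records: dicts with keys 'score' (total SAT score), 'group'
    (e.g. 'boys'/'girls'), 'item' (question id), 'correct' (bool)."""
    # Tally [correct, attempted] per (score band, question, group).
    tally = defaultdict(lambda: [0, 0])
    for r in records:
        band = r["score"] // 100  # scores in the 400s -> band 4, etc.
        key = (band, r["item"], r["group"])
        tally[key][0] += r["correct"]
        tally[key][1] += 1

    flagged = set()
    for band, item in {(b, i) for (b, i, _g) in tally}:
        # Percent correct for each group within this one score band.
        rates = [c / n for (b, i, g), (c, n) in tally.items()
                 if (b, i) == (band, item) and n > 0]
        # "Comparable achievement levels" should "respond similarly";
        # flag the question when they don't.
        if len(rates) > 1 and max(rates) - min(rates) > threshold:
            flagged.add(item)
    return flagged
```

Note that the score bands in this sketch, the stand-in for “achievement level,” are computed from SAT scores themselves, which is precisely the circularity at issue.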

See the shortcoming? “Achievement level” is a euphemism for performance on the SAT; there’s no external metric. In order to believe DIF results in a test that’s fair to everyone, you have to believe the test is fair to everyone.

But evidence from the last time the College Board released item-level data gives me reason to doubt that’s the case.

Below are two SAT math questions from the same October 2000 test — the most recent test for which item-level data are publicly available. The questions were equally difficult; each was answered correctly by only 45% of test takers. Only the first, however, produced dramatically inequitable results in terms of race and gender.

1) When a coin is tossed in an experiment, the result is either a head or a tail. A head is given a point value of 1 and a tail is given a point value of -1. If the sum of the point values after 50 tosses is 14, how many of the tosses must have resulted in heads?
(A) 14 (B) 18 (C) 32 (D) 36 (E) 39

2) The sum of five consecutive whole numbers is less than 25. One of the numbers is 6. Which of the following is the greatest of the consecutive numbers?
(A) 6 (B) 7 (C) 8 (D) 9 (E) 10

More than half of boys, 55%, answered the first question correctly (by choosing C), but only 37% of girls did. Similarly, 47% of whites answered correctly, but only 24% of blacks did. That’s what I call a question with two “large skews.”

On question 2, 49% of boys and 41% of girls answered correctly (by choosing A). That’s what I call a “medium skew.” Meanwhile, 45% of whites and 35% of blacks answered it correctly. Roughly 7,000 more girls and 4,000 more black students picked the correct answer compared to the coin-toss question. (Remember, the questions were of equal difficulty overall.)
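For readers who want to check the published answers, the arithmetic is quick (a worked derivation, using only what each question states):

```latex
% Question 1: let h be the number of heads among the 50 tosses.
\[
  h - (50 - h) = 14 \;\Longrightarrow\; 2h = 64 \;\Longrightarrow\; h = 32
  \quad \text{(choice C)}
\]
% Question 2: of the runs of five consecutive whole numbers containing 6,
% only the run ending at 6 sums to less than 25.
\[
  2+3+4+5+6 = 20 < 25, \qquad 3+4+5+6+7 = 25 \not< 25
  \;\Longrightarrow\; \text{the greatest number is } 6 \quad \text{(choice A)}
\]
```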

In all, 13 of the 60 math questions on the October 2000 test had large skews favoring boys over girls, and 22 of 60 had large skews favoring whites over blacks.

I can’t prove definitively that large-skew questions have appeared on the SAT in the years since the October 2000 test — because the data are not public. If ETS had eliminated them, however, we’d probably have seen at least a small change in girls’ and black students’ scores relative to boys’ and white students’ scores. We haven’t.

In my experience, most folks who are not psychometricians (those constructing bubble tests) consider large-skew questions unfair; that’s particularly true when they are shown other questions of similar overall difficulty that have smaller skews. Reasonable people can disagree on that point, or might object that a few large-skew questions on a long test don’t really matter. If there were no large-skew questions, disparities wouldn’t disappear entirely overnight. But why not release the data so we can have an open conversation?

And it’s not just item-level data that the College Board keeps from the public; it also rarely releases combined family income and race/ethnicity data that would allow researchers to make comparisons such as how high-income black students’ average math scores compare to those of low-income white students.

One might assume that affluent students of all races/ethnicities score higher than all low-income students. But the last time the College Board released such data, in a 2001 report, that wasn’t the case. On the math section, black students in the highest income group scored, on average, lower than white students in the lowest income group. Again, without transparency it’s difficult to identify the cause of this troubling disparity so that we can address the problem.

(The ACT, the other national college admissions test and the SAT’s competitor, has never publicly released either item-level data or combined income and race/ethnicity data; however, its overall group score gaps closely parallel those of the SAT, and it uses the same test construction methods.)

SAT scores matter: College admissions officers still weigh them heavily in making their decisions. Undoubtedly, there are many reasons behind persistent SAT achievement gaps, and the College Board insists that test design is not to blame. But what’s the harm in transparency so we don’t have to take the College Board at its word?

Wes Bausmith / Los Angeles Times
