Pittsburgh Post-Gazette

Why fear this RB? Check data

- Elizabeth Bloom: ebloom@post-gazette.com, 412-263-1750 and Twitter: @BloomPG.

was constantly looking for NFL data to analyze,” Horowitz, who now works for the NBA, wrote in an email. “Unfortunately, at that time almost all the NFL data I found was protected by a paywall or was not in a usable format.”

In contrast, nflscrapR (pronounced “NFL scraper”) cleans up messy play-by-play data from the NFL’s application programming interface and publishes it as is, along with expected points and win probability columns. (The data and other information are available on GitHub websites and the Carnegie Mellon Sports Analytics Club website.)

Nugent, associate department head and director of undergraduate studies in CMU’s statistics department, described nflscrapR as a “massive contribution” to the field, not only because it provides a huge amount of information but also because it encourages open and reproducible data more broadly.

The name nflscrapR refers to the programming language R and also gives a hat-tip to nhlscrapR, a comprehensive hockey-focused package designed by CMU grad Sam Ventura and former CMU faculty member Andrew Thomas. Ventura, director of hockey research for the Penguins, mentors Yurko and Horowitz and is an outside adviser for the Carnegie Mellon Sports Analytics Club.

“The power of nflscrapR is in its flexibility to analyze the game at any level,” Ventura said in an email. “Since the software provides data on every play from every game since 2009, it allows fans to break down individual player and team statistics by season, by game, by quarter, by drive, or by play.”

Some of Yurko’s findings from nflscrapR:

Bell provides Dez Bryant-like value as a receiver along with well-above-average rushing;

Rushing, on average, decreases a team’s expected points;

Last season, Colin Kaepernick was the second most efficient rushing quarterback, after Dak Prescott, and was an average passer;

Derek Carr, an MVP candidate in 2016, was in fact an average passer last year, but he was the No. 2 most clutch quarterback (after Matthew Stafford);

Jared Goff was the worst quarterback last season by every metric;

Roster stability is the most important predictor of a team’s success, a fact that could work in the Steelers’ favor in 2017;

Running backs have the least predictable performances from year to year, especially if they change teams.

(Maybe that last point can be a consolation to Steelers fans who are now worried about Burkhead and Gillislee.)

While nflscrapR made its public debut in 2016, it’s taken off in recent months. A Seattle website, for instance, has used the package to provide analyses of the Seahawks, and Yurko is presenting the work at analytics conferences throughout the country.

nflscrapR will also be a focus of the first Carnegie Mellon Sports Analytics Conference, which will be held Oct. 28-29 in conjunction with the Tartan Data Science Cup. The conference features talks by Ventura, Steelers analytics expert Karim Kassam, ESPN’s Brian Burke and others, Nugent said, and it’s open to the public. Additional information is on the Carnegie Mellon Sports Analytics Club website.
