Why fear this RB? Check data
was constantly looking for NFL data to analyze,” Horowitz, who now works for the NBA, wrote in an email. “Unfortunately, at that time almost all the NFL data I found was protected by a paywall or was not in a usable format.”
In contrast, nflscrapR (pronounced “NFL scraper”) cleans up messy play-by-play data from the NFL’s application programming interface and publishes it as is, along with expected points and win probability columns. (The data and other information are available on GitHub websites and the Carnegie Mellon Sports Analytics Club website.)
Nugent, associate department head and director of undergraduate studies in CMU’s statistics department, described nflscrapR as a “massive contribution” to the field, not only because it provides a huge amount of information but also because it encourages open and reproducible data more broadly.
The name nflscrapR refers to the programming language R and also gives a hat-tip to nhlscrapR, a comprehensive hockey-focused package designed by CMU grad Sam Ventura and former CMU faculty member Andrew Thomas. Ventura, director of hockey research for the Penguins, mentors Yurko and Horowitz and is an outside adviser for the Carnegie Mellon Sports Analytics Club.
“The power of nflscrapR is in its flexibility to analyze the game at any level,” Ventura said in an email. “Since the software provides data on every play from every game since 2009, it allows fans to break down individual player and team statistics by season, by game, by quarter, by drive, or by play.”
Some of Yurko’s findings from nflscrapR:
Bell provides Dez Bryant-like value as a receiver along with well-above-average rushing;
Rushing, on average, decreases a team’s expected points;
Last season, Colin Kaepernick was the second most efficient rushing quarterback, after Dak Prescott, and was an average passer;
Derek Carr, an MVP candidate in 2016, was in fact an average passer last year, but he was the second most clutch quarterback (after Matthew Stafford);
Jared Goff was the worst quarterback last season by every metric;
Roster stability is the most important predictor of a team’s success, a fact that could work in the Steelers’ favor in 2017;
Running backs have the least predictable performances from year to year, especially if they change teams.
(Maybe that last point can be a consolation to Steelers fans who are now worried about Burkhead and Gillislee.)
While nflscrapR made its public debut in 2016, it has taken off in recent months. A Seattle website, for instance, has used the package to analyze the Seahawks, and Yurko is presenting the work at analytics conferences throughout the country.
nflscrapR will also be a focus of the first Carnegie Mellon Sports Analytics Conference, which will be held Oct. 28-29 in conjunction with the Tartan Data Science Cup. The conference features talks by Ventura, Steelers analytics expert Karim Kassam, ESPN’s Brian Burke and others, Nugent said, and it’s open to the public. Additional information is on the Carnegie Mellon Sports Analytics Club website.