Statistical model hands Brazil the World Cup

Financial Chronicle - - FRONT PAGE -

When Lionel Messi observed that “in football... talent and elegance mean nothing without rigour and precision”, he was clearly thinking as much about the econometrics behind forecasting the score as the tactics behind winning the match. But while selecting the best starting eleven requires human judgment and experience, choosing the best variables to predict the outcome of a game is better left to a statistical model. Or, more precisely, 200,000 models: harnessing recent developments in “machine learning”, we mined data on team characteristics and individual players to work out which factors help to predict match scores. This gave a large number of forecasts, which were combined to produce an overall projection. We then simulated 1,000,000 possible evolutions of the tournament to gauge the probability of each team progressing through the rounds.
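As a rough illustration of the simulation step, the sketch below plays a small single-elimination bracket many times and counts how often each side wins. It is not the model described here: the team names, the strength numbers and the rule that a side wins with probability proportional to its strength are all invented for the example, and it runs far fewer than 1,000,000 tournaments.

```python
import random
from collections import Counter

def play_match(team_a, team_b, strength, rng):
    """Pick a winner; team_a wins with probability proportional to its strength."""
    p_a = strength[team_a] / (strength[team_a] + strength[team_b])
    return team_a if rng.random() < p_a else team_b

def simulate_knockout(bracket, strength, rng):
    """Play one single-elimination bracket (length must be a power of two)."""
    teams = list(bracket)
    while len(teams) > 1:
        # Adjacent entries meet each other; winners advance to the next round.
        teams = [play_match(teams[i], teams[i + 1], strength, rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]

def title_probabilities(bracket, strength, n_sims=10_000, seed=0):
    """Estimate each team's title probability as its share of simulated wins."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(bracket, strength, rng)
                   for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}

# Hypothetical four-team bracket with invented strengths.
bracket = ["Team A", "Team B", "Team C", "Team D"]
strength = {"Team A": 4.0, "Team B": 1.0, "Team C": 1.0, "Team D": 1.0}
probs = title_probabilities(bracket, strength)
print(probs)
```

Swapping two entries in `bracket` changes who meets whom, which is how a sketch like this could be used to see the effect of a kinder or harsher draw.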

The key predictions are:

Brazil will win its sixth World Cup title, defeating Germany in the final on July 15.

France has a higher probability than Germany of winning the World Cup. But its (bad) luck in the draw sees it meeting Brazil at the semi-final stage, and the team may not be strong enough to make it past the Seleção.

Those looking for a repudiation of Gary Lineker’s observation that “football is a simple game; 22 men chase a ball for 90 minutes and, at the end, the Germans win” will be disappointed: Germany is forecast to defeat England in the quarter-finals on July 7.

Spain and Argentina are expected to underperform, losing to France and Portugal, respectively, in the quarter-finals.

Despite the traditional boost that comes with hosting the competition, Russia just fails to make it through the group stage.

Football and Machine Learning: From Nottingham Forest to Random Forest

We are drawn to machine learning models because they can sift through a large number of possible explanatory variables to produce more accurate forecasts than conventional alternatives.

More specifically, we feed data on team characteristics, individual players and recent team performance into four different types of machine learning models to analyse the number of goals scored in each match. The models then learn the relationship between these characteristics and goals scored, using the scores of competitive World Cup and European Cup matches since 2005. By cycling through alternative combinations of variables, we get a sense of which characteristics matter for success and which stay on the bench. We then use the models to predict the number of goals scored in each possible encounter of the tournament and use the unrounded score to determine the winner. For example, Germany narrowly beats England in the quarter-finals with 1.47 vs 1.28 goals.
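To see how narrow a 1.47 vs 1.28 margin is, the sketch below first applies the rule described above (the higher unrounded score wins) and then, as a separate illustration, treats the two numbers as the means of independent Poisson goal counts. The Poisson step is an assumption made for this example only; the article does not say the models work that way, and the function names are hypothetical.

```python
from math import exp, factorial

def predicted_winner(team_a, goals_a, team_b, goals_b):
    """Winner under the 'higher unrounded predicted score' rule."""
    if goals_a == goals_b:
        raise ValueError("predicted scores are tied; no winner under this rule")
    return team_a if goals_a > goals_b else team_b

def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals follow a Poisson(mean) law."""
    return exp(-mean) * mean ** k / factorial(k)

def outcome_probabilities(mean_a, mean_b, max_goals=15):
    """Return (P(a scores more), P(level), P(b scores more)) over 90 minutes.

    Assumes independent Poisson goal counts, truncated at max_goals each.
    """
    p_a = p_level = p_b = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, mean_a) * poisson_pmf(j, mean_b)
            if i > j:
                p_a += p
            elif i == j:
                p_level += p
            else:
                p_b += p
    return p_a, p_level, p_b

winner = predicted_winner("Germany", 1.47, "England", 1.28)
p_ger, p_level, p_eng = outcome_probabilities(1.47, 1.28)
print(winner)
print(f"Germany ahead: {p_ger:.2f}, level: {p_level:.2f}, England ahead: {p_eng:.2f}")
```

Under these assumptions the side with the higher predicted score is the more likely winner, but the other two outcomes keep a substantial share of the probability.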

We group together several team-level and player-level variables for ease of exposition. Four characteristics stand out. Team-level results are the most important driver of success. Recent team performance, measured with the “Elo” rankings, accounts for about 40 per cent of overall explanatory power.
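An Elo rating can be read as an implied result: in the standard Elo system, the rating gap between two sides maps to an expected score through a logistic curve. The sketch below shows only that textbook formula. The function name and the example ratings are made up, and nothing here describes how the models above actually use the rating as an input.

```python
def elo_expected_score(rating_a, rating_b):
    """Standard Elo expected score for side A (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Hypothetical ratings, chosen only to show the shape of the curve.
even_match = elo_expected_score(1900, 1900)   # equal ratings
favourite = elo_expected_score(2100, 1900)    # 200-point advantage
underdog = elo_expected_score(1900, 2100)     # the same match from the other side
print(even_match, favourite, underdog)
```

The two sides' expected scores always add up to one, so a 200-point gap helps the favourite exactly as much as it hurts the underdog.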

But even after taking team performance into account, individual players make a difference. We find that player-level characteristics (including the average player rating on the team, as well as attacking and defending abilities) add another 25 per cent of explanatory power. Recent momentum, as measured by the ratio of wins to losses over the past ten matches, also matters.

Similarly, the number of goals scored in recent games and the number of goals conceded by the opposing team help gauge success in the next game.
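The momentum measure mentioned above, the ratio of wins to losses over the past ten matches, is simple enough to sketch. The results string is invented, and the handling of a run with no losses is an arbitrary choice for this example, since the article does not say how that case is treated.

```python
def win_loss_ratio(results, window=10):
    """Ratio of wins to losses over the last `window` results.

    `results` is a sequence of "W", "D" or "L", oldest first.
    Assumption for this sketch: with no losses in the window, return
    infinity if there is at least one win, and 1.0 (neutral) otherwise.
    """
    recent = list(results)[-window:]
    wins = recent.count("W")
    losses = recent.count("L")
    if losses == 0:
        return float("inf") if wins else 1.0
    return wins / losses

# Hypothetical run of twelve matches; only the last ten are counted.
form = "LLWWDLWWWLDW"
ratio = win_loss_ratio(form)
print(ratio)
```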

Why Brazil are favourites

Brazil is clearly the strongest team across these metrics, with the highest Elo rating, talented individual players and a good win/loss ratio in recent games. We also see why France and Germany run neck and neck for second: Germany has a higher Elo rating than France, but France has performed better in recent games. And France appears to have a more unfavourable draw than Germany: if France and Germany swapped groups, the most likely result would be a Brazil-France final (although the winner would remain Brazil). Spain’s chances are likewise diminished by a tough draw, facing Portugal in the group, and needing to get past France and Brazil in the knock-out stages. Finally, Argentina ranks higher on Elo than Portugal, but loses to Portugal in the quarter-finals due to poor performance in recent games.

How confident can we be?

It is difficult to assess how much faith one should have in these predictions. We capture the stochastic nature of the tournament carefully using state-of-the-art statistical methods and we consider a lot of information in doing so (including player-level data). But the forecasts remain highly uncertain, even with the fanciest statistical techniques, simply because football is quite an unpredictable game. This is, of course, precisely why the World Cup will be so exciting to watch.
