Poll-Vaulting All That’s Real
It’s better to have no data than ‘online poll data’, which, more often than not, is no reliable data at all
Up until 8:22 pm Eastern Time in the US on November 8, 2016, The New York Times’ online polling had given Hillary Clinton an 82% chance of becoming the next US president. The rest, of course, is history.
Online polls also failed to predict the outcomes of other recent elections, including the Brexit vote in Britain. Such verifiable evidence provides enough reason to doubt the results of any online poll, for they grossly violate the basic principles of statistical surveys.
In the 19th- and 20th-century US, newspapers and magazines featured clip-out coupons for ‘straw polls’ that readers sent in to cast ballots for their preferred candidates. Today’s online polls are descendants of those straw polls. Random sampling is the core of any statistical survey, and it happens to be impossible to ensure on the internet.
Forget about randomness, there is no way of even selecting samples on the internet. To compensate, online pollsters sometimes use extensive statistical modelling. However, statisticians doubt the usefulness of such modelling as a substitute for randomness.
There is a well-known ‘non-response bias’ in online surveys. If I am interested in cricket, the likelihood is high that I would find an online survey on, say, ‘Best T20 batsmen’ or ‘Best ODI bowlers’. Rather, these online polls would find me on the basis of my internet search history. But, for unbiasedness, every member of the population must have an equal probability of selection. However, the demographics of internet users grossly mismatch those of the population at large.
For example, in the US, while 97% of people in the age group between 18 and 29 use the internet, they made up just 13% of the 2014 electorate, according to the exit poll conducted by Edison Research. Some 40% of those 65 years old and older do not use the internet. But they made up 22% of those who voted.
In addition, internet users tend to be urban dwellers with above-average incomes. Therefore, they don’t necessarily represent the population as a whole. The demographic background and location of the respondents of an internet-based survey remain unverified. Also, online polling can be manipulated by enlisting friends or employees to produce ‘desirable results’ for vested interests.
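The distortion this causes can be sketched in a few lines of Python. The group sizes and internet-use rates loosely echo the US figures quoted above; the opinion rates are invented purely for illustration:

```python
import random

random.seed(0)

# Hypothetical electorate of three groups. The shares and internet-use
# rates loosely echo the US figures in the article; the "yes" rates
# are invented to show how differential internet access skews a poll.
POPULATION = [
    # (group, share of electorate, prob. of being online, prob. of "yes")
    ("young",  0.13, 0.97, 0.90),
    ("older",  0.22, 0.60, 0.10),
    ("middle", 0.65, 0.85, 0.50),
]

def true_support():
    # Population-level "yes" share: each group's opinion weighted by its size.
    return sum(share * yes for _, share, _, yes in POPULATION)

def online_poll_estimate(n=100_000):
    # An online poll only ever reaches people who are on the internet,
    # so each group's effective weight is distorted by its internet-use rate.
    yes = total = 0
    for _ in range(n):
        r, cum = random.random(), 0.0
        for _, share, online, p_yes in POPULATION:
            cum += share
            if r < cum:
                if random.random() < online:  # only internet users can respond
                    total += 1
                    yes += random.random() < p_yes
                break
    return yes / total

print(f"true support:         {true_support():.3f}")
print(f"online poll estimate: {online_poll_estimate():.3f}")
```

Because the internet-heavy group holds one view disproportionately, the unweighted online estimate drifts a few points away from the true figure, which is more than enough to call a close election wrongly.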
The Web Unravels
Usually, there is no way to prevent people from voting more than once. So, how are the data weighted? What is the sampling error and how is that measured? According to the National Council on Public Polls (NCPP) in the US, “many web-based surveys are completely unreliable. Indeed, to describe them as ‘polls’ is to misuse that term.”
In 1998, 52% of more than 100,000 respondents to an AOL online poll opined that US President Bill Clinton should resign because of his relationship with White House intern Monica Lewinsky. Telephone polls conducted at the same time with much smaller but representative samples showed far fewer people seeking Clinton’s resignation: 21% in a CBS poll, 23% in a Gallup poll, and 36% in an ABC poll.
Quite often, people holding a particular view respond in greater numbers to online polls. But online poll results usually do not mention that the individuals chose to participate, and that they are therefore unlikely to be representative of the general population. In the absence of this caveat, readers are left with the incorrect impression that the results apply to the general population.
In 2009, Responsive Management and the South Carolina Department of Natural Resources (SCDNR) in the US conducted a survey on saltwater fishing and shellfishing in the state of South Carolina. Both a scientific telephone survey and an internet survey were administered to a closed population: holders of a South Carolina Saltwater Recreational Fisheries Licence. Every licence-holder had an equal chance of being contacted by telephone.
However, the online survey used a sample consisting only of licensees who had provided an email address while purchasing their licences. Thus, the online survey eliminated approximately 88% of the possible sample in a systematic way, yielding a severe bias. In addition, only 20.5% of the email-address holders responded to the online survey. The online survey respondents were, in general, a more educated and affluent group, and also disproportionately male: 5.7% of the online sample was female, against 19.9% of the telephone sample.
However, in the licence-holder database, 18.5% were female. The online poll sample was a disastrous mismatch with the population demography. So, no data is certainly better than bad data.
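The arithmetic behind that mismatch is worth making explicit. Using only the figures quoted above, each selection step shrinks the sample in a non-random way:

```python
# Arithmetic behind the SCDNR example, using the article's figures.
# Each selection step removes licence-holders in a non-random way.

email_share = 0.12     # ~12% of licensees gave an email address (88% eliminated)
response_rate = 0.205  # 20.5% of those with email addresses responded

effective_coverage = email_share * response_rate
print(f"share of licence-holders the online survey heard from: "
      f"{effective_coverage:.1%}")  # roughly 2.5%

# The gender split shows how skewed that small slice was:
female_online, female_phone, female_database = 0.057, 0.199, 0.185
print(f"female share: online {female_online:.1%}, "
      f"telephone {female_phone:.1%}, database {female_database:.1%}")
```

Roughly 2.5% of licence-holders, self-selected twice over, stood in for the whole population, and the gender figures show just how unrepresentative that slice was.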
Net-Net, A Question Mark
Of late, there has been a huge surge in online polling, along with the increase in internet users. If the scope were kept within mere entertainment, that would not be much of an issue. However, it may be nearly impossible to ignore the easy, cheap, quick and ever-expanding scope of the internet to gauge public opinion on any issue.
But even then, there should be a combination of online and in-person surveys, where the online results can be validated, or adjusted, based on the in-person survey results. Of course, much theoretical work still needs to be done before putting this into practice.
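One standard form such an adjustment could take is post-stratification: reweight the online respondents so that each demographic group counts in proportion to its known population share, taken from a census or from the in-person survey. A minimal sketch, with all numbers invented for illustration:

```python
# Post-stratification: reweight online respondents so each group counts
# in proportion to its known population share, not its share of the
# self-selected sample. All numbers here are invented for illustration.

# group -> (population share, share of online sample, "yes" rate in sample)
STRATA = {
    "young":  (0.20, 0.45, 0.80),
    "middle": (0.50, 0.45, 0.50),
    "older":  (0.30, 0.10, 0.20),
}

def raw_online_result():
    # Unweighted: whatever the self-selected online sample says.
    return sum(s_share * yes for _, s_share, yes in STRATA.values())

def post_stratified_result():
    # Weighted: each group's opinion scaled by its true population share,
    # which could come from a census or a smaller in-person survey.
    return sum(p_share * yes for p_share, _, yes in STRATA.values())

print(f"raw online estimate:      {raw_online_result():.3f}")
print(f"post-stratified estimate: {post_stratified_result():.3f}")
```

Note that this only corrects for demographics the pollster can measure; it cannot fix self-selection within a group, which is one reason the theoretical work mentioned above is still needed.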
The writer is professor of statistics, Indian Statistical Institute, Kolkata