Indian data ecosystem needs an overhaul
Either there isn’t enough data available or the one that exists is sometimes unreliable but is used anyway because there is no alternative
NEW DELHI: In July, in front of a roomful of policy wonks, government officials and journalists, Union health secretary CK Mishra made an honest acknowledgement: there are serious problems with India’s public health statistics.
For one, he said, data from the latest round of the National Family Health Survey (NFHS-4), the major source for detailed health statistics in India, conducted under the aegis of the ministry of health and family welfare (MoHFW) itself, is unreliable for certain states.
On top of that, the Health Management Information System (HMIS), which Mishra called “a data mine”, is not effectively used. “We use very little of it in the planning process” due to lack of expertise to read and understand the data, he said.
The health secretary’s statement raises concerns: how can the country formulate evidence-based policy or plan wisely for the future without credible data? And Mishra, a 34-year veteran of the Indian Administrative Service who was appointed to head the MoHFW last year, is not alone. A recent paper by the Health Team of the National Institute of Public Finance and Policy, New Delhi, found that the country’s health data was unreliable, irregularly published, and failed to cover a broad enough population.
PROBLEMS GALORE
And such problems are not restricted to the health sector alone. The entire Indian data ecosystem needs improvement. Former RBI governor Duvvuri Subbarao has stated that monetary policy decisions often go astray because of erroneous data provided by the government. The debate on the reliability of India’s macroeconomic data, GDP and IIP numbers, for instance, remains unsettled. At a time when unemployment, or rather underemployment, is a key socio-economic concern, economists cannot measure the problem’s magnitude because they do not have credible figures and surveys. India’s agricultural statistics have also come under the scanner. Talk about crime, and all you have is aggregated data from FIRs; no official crime victimisation surveys have been instituted yet.
To be sure, every data set comes with caveats that must be considered when making interpretations. But some failings appear to be a standard characteristic of Indian data sets.
To begin with, there isn’t enough data. The data that does exist is sometimes unreliable but is used anyway because there is no alternative. Several important data sets are released with a huge time lag. Others are missing granular district-level estimates. If such estimates are present, they are not always used for policy making or governance. And even when data sets are good and people want to use them, there may be too few who understand how to work with them, as Mishra said about HMIS.
Taken together, these shortcomings amount to an Indian statistical ecosystem that falls short of the needs of the world’s largest democracy.
MODES OF DATA COLLECTION
There are two major modes of data collection: administrative, which refers to data collected as a result of an organisation’s daily operations (think of patient registrations at a hospital or new accounts opened at a bank); and surveys, which are based on how a part of a population (what statisticians call a ‘sample’) responds to a set of questions.
P.C. Mahalanobis, the statistician credited with laying the foundations of the data systems of independent India, “focused on creating credible data sets from representative sample surveys,” says a Mint essay which traced the history of the Indian statistical system.
But Mahalanobis’s preference for surveys came at the expense of data collection at the administrative level, the essay argued, and may have undermined the government’s ability to collect regular, reliable data.
“Instead of being sparingly used for purposes where there was no alternative to sampling, sampling became the first choice of technique for collecting data.”
Sometimes, surveys are the only way to capture data. Economic statistics, for example, cannot be collected at the administrative level because of the huge size of the Indian economy’s informal sector, which employs around 90% of the country’s workforce, says Pronab Sen, former chief statistician of India.
Yet India faces challenges to conducting good surveys (a population of more than a billion people, relatively high rates of illiteracy, and dependence on the informal economy) that simply do not exist in much of the rest of the world, says Sen.
VACANCY ISSUES
The government also employs too few people to carry out regular and robust surveys. In the National Sample Survey Office’s (NSSO) field operations division, which is responsible for collecting primary socio-economic data, around 24% of the posts for junior and senior statistical officers are vacant.
The NSSO’s critics do not realise how hard it is to undertake actual data collection on the ground, Sonalde Desai, professor of sociology at the University of Maryland who also conducts the India Human Development Survey (IHDS), said in an email. Without adequate internal staff, the agency must contract with outside agencies.
“This is what both IHDS and NFHS do, and only we know how difficult it is to maintain quality. Some of the agencies we work with are fantastic, and some are struggling themselves. This requires enormous supervision, and if one slips there, the data can be highly questionable,” Desai said.
“This hit-and-miss approach is not acceptable for data that form the core of our policy-building process.”
Experts say that technology can be leveraged to improve data collection systems. Private data collection agencies are already making use of apps and tools to conduct surveys electronically, rather than on paper. But that comes with its own challenges. Richa Verma, who leads the research and analysis team at SocialCops, a data intelligence company, says that better design is key to making it easier for people to adopt technology.
While working with the government and various non-profits, Verma found that many of its trainees have never used a smartphone. Data collection technology must be made simple, and appropriate training must be conducted, so that anyone can be trained to use it.