Gambling on the nation’s future?
How effective are surveys and the information collected from surveys for decision making?
It’s not uncommon to analyse survey data to turn the raw data collected into insights and answers to both simple and complex operational questions.
The aim is to inform executive decisions that improve things for business, for government and, most importantly, for our people.
Close enough is not good enough
How analyses of these surveys are used is an interesting question. Is the analysis merely enough to justify the decision you were going to make anyway? Or is it close enough, and therefore good enough, to base decisions on?
For context consider Fiji’s Household Income and Expenditure Survey (HIES), a nationally representative survey conducted by the Fiji Bureau of Statistics (FBoS) every five years.
HIES – Household Income and Expenditure Survey
The most recent HIES was done “before the onset of the COVID-19 pandemic … on a representative sample of 6000 households. The survey has provided a comprehensive view of the wellbeing of Fijian households between 2019 and 2020 by producing indepth information on a wide range of topics, including access to services, livelihoods, migration, consumption patterns, and exposure to shocks, among others.”
The then CEO of FBoS stated that “one of the primary objectives of the 2019-20 HIES was to collect data on household income and consumption that can be used to estimate poverty and inequality in the country. This survey provided the basis for a new benchmark and methodology for measuring poverty based on international best practices and has thus marked the beginning of a new series of poverty estimates in Fiji. 2021”.
Data collection was over a 12-month period. The World Bank and the University of Bristol in the UK provided technical data processing and analysis support to the FBoS data analysis team.
6000 households represent all of Fiji
Is a survey of 6000 households truly representative of income, expenditure, poverty and other issues and factors in Fiji? If any significant decisions were made on the basis of the HIES report, with what level of confidence were they made? You couldn’t say 100 per cent confidence, could you? 90 per cent? 80 per cent? 70 per cent? Less? And that is the challenge, if not the problem, with survey data.
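One way to put a rough number on that question is the textbook margin of error for a proportion. The sketch below is illustrative only: it assumes a simple random sample, whereas the actual HIES design is not described here, and the `design_effect` parameter and the subgroup size of 300 are hypothetical. It also measures sampling error alone and says nothing about response bias or how old the data is.

```python
import math

def margin_of_error(n, p=0.5, z=1.96, design_effect=1.0):
    """Approximate half-width of a 95% confidence interval for a proportion.

    n             -- number of households sampled
    p             -- assumed proportion (0.5 is the most cautious choice)
    z             -- z-score for the confidence level (1.96 for 95%)
    design_effect -- hypothetical inflation factor for a clustered or
                     stratified design; 1.0 means simple random sampling
    """
    return z * math.sqrt(design_effect * p * (1 - p) / n)

# Whole sample of 6000 households: about 1.3 percentage points either way.
national = margin_of_error(6000)

# A hypothetical subgroup of 300 households: about 5.7 percentage points.
subgroup = margin_of_error(300)

print(f"national: +/- {national:.1%}")
print(f"subgroup: +/- {subgroup:.1%}")
```

On these assumptions the national figure is fairly tight, but the margin widens quickly once the sample is sliced into smaller groups such as a single division or age band.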
There are a number of survey data analysis methods you could use. These range from simple crosstabulation, where survey data is arranged into a table of rows and columns that makes it easier to understand, to statistical methods that tell you things that would normally be near impossible to figure out, such as whether the results you’re seeing are statistically significant (that is, unlikely to be an accident of who happened to be sampled).
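As a minimal sketch of crosstabulation, the example below counts made-up responses by two categories using only the Python standard library. The responses, category names and the `crosstab` helper are invented for illustration and are not HIES data.

```python
from collections import Counter

# Made-up responses for illustration only: (area, water access).
responses = [
    ("urban", "piped water"),
    ("urban", "piped water"),
    ("urban", "no piped water"),
    ("rural", "piped water"),
    ("rural", "no piped water"),
    ("rural", "no piped water"),
]

def crosstab(pairs):
    """Arrange (row, column) pairs into a table of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = crosstab(responses)
for area, row in table.items():
    print(area, row)
```

Each row of the table is one area and each column one answer, which is all a crosstab is: counts laid out so that differences between groups are easy to see.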
HIES data combined with Census
And then there’s the problem of the recency and therefore the relevance of survey data you’re basing your decisions on.
HIES data could be combined with census data, but that would present new challenges, such as synchronising the survey dates with census data that is older than the HIES data. For analysis purposes census data could itself be considered a survey, as the data is often collected in predetermined ranges at a certain point in time, such as an age range of 25-34, a wage range of 30,000-40,000 and so on.
A 34-year-old back in 2021 would now, three years later, be 37 and in a different bracket, say 35-44, and their earnings may have changed significantly, so making decisions on that data could miss the point and be wasteful. And using 2017 census data for the same person would be seven years out of date, making them 41 years old today.
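The ageing arithmetic can be sketched in a few lines. The bracket boundaries below are hypothetical, chosen only to follow on from the 25-34 range mentioned earlier, and "today" is assumed to be 2024, as implied by "three years later" than 2021.

```python
# Hypothetical age brackets, for illustration only.
BRACKETS = [(15, 24), (25, 34), (35, 44), (45, 54)]

def bracket_for(age):
    """Return the label of the bracket that contains this age."""
    for low, high in BRACKETS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"

def age_now(age_at_collection, collection_year, current_year):
    """Age today of someone recorded at a given age in a given year."""
    return age_at_collection + (current_year - collection_year)

CURRENT_YEAR = 2024  # assumption: "three years later" than 2021

from_survey = age_now(34, 2021, CURRENT_YEAR)  # recorded at 34 in 2021
from_census = age_now(34, 2017, CURRENT_YEAR)  # recorded at 34 in 2017

print(from_survey, bracket_for(from_survey))
print(from_census, bracket_for(from_census))
```

Either way, the person recorded in the 25-34 bracket has since moved out of it, so any plan built on the recorded bracket is aimed at the wrong group.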
That the HIES took 12 months to complete in itself suggests the data is skewed.
The point is not that surveys are a waste of time, but that data recency is of high importance. To be accurate in decision making and spending, the data must be as recent, as close to real time, as possible.
Otherwise, consider a scenario of planning for childcare and primary school children using seven-year-old census data.
Are we just guessing?
We’d be guessing at best, estimating the needs and the funding required for childcare, kindy, and years one and two, whether it’s to do with facilities, teachers, or carers.
Would you want to take into account relocations and migrations and other factors that would impact planning and budgeting? We’d need to add immigration arrivals and departures data to the mix to get across all of that.
And while we’re doing that perhaps tourism, employment, and education could benefit from analysing the same data enriched with immigration data.
Detail data analyses are of superior value
Detail data analyses are of far more value than survey data alone. The data sources are there and they can be made accessible in real time, but it seems there’s a reluctance to contemplate an integrated data repository. Why?
Is it because we exist in silos? Is it because funding for our projects comes from a diverse range of donors and our budgeting is not integrated, particularly from a data capability standpoint? So we build our own data capabilities in our own silos, often using the same source data, at a far greater cost than if a central capability were developed.
But we have our own copy, in our own little patch, no matter that we’re only getting half the benefit. Wastage. Great! I hasten to add, though, that this is not the fault of any one business division or government ministry, department or agency. It’s to do with the lack of an organisation-wide data strategy and the silo-based funding and budget allocations.
In the above scenario your sources of data in the area of income, expenses and poverty would be, at minimum, FRCS, FNPF, Social Welfare and the VAT Monitoring System (VMS). In the census area they would be birth registrations, deaths, marriages, business registrations, education, employment, and perhaps correctional services.
National data roadmap
What is needed is a shared data repository that all stakeholders could access at shared cost, instead of each spending separately, at compounding cost, on environments that basically do the same thing. The reality is that when each silo tries on its own to achieve an acceptable level of sophistication and integrity of information, the separate costs become prohibitive, so we do our own thing and end up with less than mediocre capability. The shared data repository would hold detail data, to be combined with survey data that adds nuance to the detail data analyses. A prioritised roadmap that starts with the most commonly required data would deliver to several stakeholders while building out to a national capability.
Surveys and detailed data analytics each have their own set of advantages and disadvantages, depending on the specific goals, resources, and context of the analysis — provided surveys are done on a reasonably frequent basis. Here’s a breakdown of the pros and cons of each:
Survey Pros:
Surveys can be designed to gather a wide range of information, from demographics to preferences, behaviours, and satisfaction levels. Surveys allow for questions specific to the research objectives, providing insights into nuanced aspects.
They provide direct feedback from the population themselves, offering firsthand perspectives on their experiences. And open-ended survey questions can return qualitative results and experiences that may not be captured through quantitative data alone.
Survey Cons:
Response bias is a big one, where demographic groups can be over- or under-represented, thus skewing the results. This is a risk when running AI algorithms as well. Usually the sample size is limited, making it challenging to generalise findings accurately to the entire population.
Responses to survey questions can be subjective and influenced by various factors such as the mood, memory, and social desirability bias of individuals. Surveys are also time-consuming and expensive, especially when trying to reach a representative sample across different demographics or geographic locations.
Detailed Data Analytics Pros:
In favour of detail data analytics is that these are fact-based analyses, providing objective insights based on actual behaviour patterns and transactions, rather than self-reported and assumed-honesty information. Large, comprehensive and exhaustive datasets can be accessed, with analytics and insights provided at scale, covering close to the entire population rather than a sample of it. With the larger and more detailed data set, analytics techniques allow for predictive modelling and forecasting of trends and behaviours based on current and historical data.
With real-time data streams, analytics can provide up-to-date relevant insights, allowing for agile decision-making.
Detailed Data Analytics Cons:
The quality of data used in analytics depends on various factors such as collection methods, accuracy, and completeness. These can sometimes be challenging to ensure, but they are manageable through data governance tools and techniques.
Detailed data analyses raise privacy concerns, particularly when dealing with personally identifiable information (PII), necessitating careful handling and compliance with regulations. However, this is not an insurmountable issue with techniques such as de-identification and anonymisation of data.
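To make de-identification a little more concrete, the sketch below replaces a direct identifier with a keyed hash, drops the name, and reduces an exact age to a band. The field names, the record and the key are all hypothetical. Strictly speaking this is pseudonymisation rather than full anonymisation, because whoever holds the key could still link records, so it would be one layer of protection among several rather than a complete answer.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be stored securely, not in code.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymise(identifier, key=SECRET_KEY):
    """Replace an identifier with a keyed SHA-256 hash (HMAC)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def age_band(age, width=10):
    """Reduce an exact age to a band such as 30-39."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def de_identify(record):
    """Keep only what the analysis needs, with identifiers masked."""
    return {
        "person_key": pseudonymise(record["tax_id"]),
        "age_band": age_band(record["age"]),
        "district": record["district"],
    }

# Made-up record for illustration only.
record = {"tax_id": "000-111-222", "name": "Example Person",
          "age": 37, "district": "Suva"}
safe = de_identify(record)
print(safe)
```

Because the same identifier always produces the same key, records for one person can still be joined across sources without the analyst ever seeing who that person is.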
Data analytics can be complex, requiring specialised skills and data visualisation for effective interpretation and action.
A combination of both surveys and detailed data analytics can provide a more comprehensive understanding of a person’s or segment’s situation, status, behaviours, preferences, and trends.
Detail data analytics could provide the same benefit on its own with a high degree of confidence.
Surveys can provide that traditional warm and fuzzy feeling at least until the roadmap is fully rolled out.
■ is a data and digital strategy consultant. A Fijian citizen based in Sydney, he runs his own consulting practice, Data4Digital, and is managing partner for Australia, NZ, and the Pacific at AlphaZetta Data Science and Analytics Consulting. Send questions and feedback to: naleen@data4digital.com. The views are his and not of this newspaper.