When economists look to the sky
bined to obtain a cloud-free composite image. The machine learning algorithm helps categorize the composite image data (in the form of pixels, each of which is a vector of quantities in different bands) into a discrete set of land cover categories. How well such workarounds capture the real level of economic activity remains a matter for further research.
Even as satellite-based data on lights are being mined, other sources are also being harnessed to understand the dynamism of economies, especially urban economies. For instance, Google Street View offers a rich source of visual snapshots of cities across the world. Harvard University economist Edward Glaeser and others (hbs.me/2vcrzbl) have used Google Street View images of road quality and dwelling types to estimate income at a highly disaggregated level. They find that Google Street View data predict income and housing prices within New York fairly well.
Google Street View images can also help us understand gentrification in cities. In their 2014 American Journal of Sociology research paper, Harvard University sociologists Jackelyn Hwang and Robert Sampson scoured thousands of images for 23 cities in the US to show that gentrification raised inequality in American cities, with Black residents bearing the brunt of it.
Even after controlling for a number of factors, including crime rate, perception and access to amenities, race still explains why certain neighbourhoods tend to be poor and others tend to be rich.
As cell phones become ubiquitous in developing countries, mobile data are also being used to measure wealth and urban commuting. Using an anonymized database containing call records of billions of interactions in Rwanda, Joshua Blumenstock of the University of California, Berkeley, in a 2015 research paper published in Science (bit.ly/2es16mf), created a measure of wealth based on the length and duration of calls. He found that it closely tracked the socio-economic status of individuals and, at an aggregate level, the wealth level of regions.
Although these studies are quite innovative in their application of modern data mining techniques to get around the problem of irregular or patchy economic data, it is worth noting that they are meant to be workarounds for the most part. Like any other modelling exercise, there are implicit assumptions hidden in most economic estimations using satellite imagery.
One typical assumption is that the economic activity or luminosity of each distinct geographical unit is independent of that of every other unit (spatial independence, as economists term it). But this assumption can be violated for satellite images, given that the value of a variable at a particular location is affected by its value at neighbouring locations.
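One common way to check whether neighbouring units really are independent is Moran's I, a standard measure of spatial autocorrelation. The sketch below is only illustrative and is not taken from any of the studies cited here: the grid size, the luminosity values and the function names `rook_weights` and `morans_i` are assumptions made for the example. It treats cells that share an edge as neighbours.

```python
def rook_weights(rows, cols):
    """Binary neighbour matrix for a rows x cols grid (cells sharing an edge)."""
    n = rows * cols
    weights = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    weights[i][rr * cols + cc] = 1
    return weights


def morans_i(values, weights):
    """Moran's I: (n / sum of weights) * (weighted cross-products / total variation)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variation = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / variation)


# Hypothetical night-light readings on a 4 x 4 grid, listed row by row.
w = rook_weights(4, 4)
clustered = [10, 10, 1, 1] * 4  # bright cells sit next to bright cells
checkerboard = [10 if (r + c) % 2 == 0 else 1 for r in range(4) for c in range(4)]

clustered_i = morans_i(clustered, w)
checkerboard_i = morans_i(checkerboard, w)
print(round(clustered_i, 3), round(checkerboard_i, 3))
```

On these made-up grids the clustered layout gives a value of about 0.67 and the checkerboard about -1. A value near zero would be consistent with spatial independence; strongly positive values, which are typical when bright areas sit next to bright areas, suggest the assumption does not hold.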
Secondly, all satellite-based data depend on the orbits that satellites take around the earth, and the quality of images captured by a satellite varies over space and time. How this affects analysis is still not entirely clear.
Thirdly, as Dave Donaldson and Adam Storeygard emphasize in a recent review paper (bit.ly/2ezmxjm) on the use of satellite data, the use of machine learning techniques imposes additional costs in terms of resources and analysis that a researcher has to deploy on the ground to arrive at robust conclusions.
“A critical input to these (machine learning techniques) and other methods is the availability of training data on the variable of interest that assigns ground truth values to sample sites,” the duo point out. “For example, delineating imaged urban neighborhoods as residential, or even more specifically as slums, requires first providing a set of areas pre-defined as slums by other means. Doing so well requires a training dataset that reflects the full diversity of distinct neighborhoods within the category of slums. This is especially challenging when the object of interest is heterogeneous or imprecisely defined….one could imagine economists using remotely sensed information on buildings to estimate a region’s capital stock; in such a case, the ideal training data would concern building values instead of building types. Because these training datasets are used to define the classes underlying a classification algorithm, they must be produced outside the algorithm. Thus, they are typically a labor-intensive analog constraint on a technology that otherwise can operate with all the scale benefits of computer processing.”
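The dependence on training data can be made concrete with a toy classifier. The sketch below is not the method used in any of the studies mentioned; it is a minimal nearest-centroid classifier, and the two-band pixel values, the labels and the function names `fit_centroids` and `classify` are all invented for illustration. The point is that the algorithm can only assign the categories that someone has already labelled by hand.

```python
import math
from collections import defaultdict


def fit_centroids(samples):
    """Average the band vectors of the hand-labelled pixels in each category."""
    sums = {}
    counts = defaultdict(int)
    for bands, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(bands)
        for k, value in enumerate(bands):
            sums[label][k] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in band_sums]
        for label, band_sums in sums.items()
    }


def classify(pixel, centroids):
    """Assign the pixel to the category whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(pixel, centroids[label]))


# Hypothetical ground-truth sample: (band 1, band 2) reflectance and a label
# assigned by other means, such as a field survey.
training = [
    ((0.05, 0.02), "water"),
    ((0.07, 0.03), "water"),
    ((0.05, 0.50), "vegetation"),
    ((0.08, 0.45), "vegetation"),
    ((0.30, 0.30), "built-up"),
    ((0.35, 0.28), "built-up"),
]

centroids = fit_centroids(training)
label_a = classify((0.06, 0.48), centroids)
label_b = classify((0.32, 0.31), centroids)
print(label_a, label_b)
```

With these invented numbers the first pixel is labelled vegetation and the second built-up. A category that is absent from the training sample, such as slums, could never be returned, however many images are processed.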
Finally, and perhaps most importantly, most satellite-based data can potentially identify individuals and households. Cell phone data are the most problematic in this regard, with serious repercussions for privacy.
It is also worth noting that most early proponents of night-lights data advocated using them as a substitute for national accounts and household survey data where the latter are unreliable or irregular, and as a complement where they are available. The reason for the continued preference for old-fashioned data collection techniques is that they generate thick layers of information, which collectively can convey a richer sense of an economy than mere satellite images can. Traditional databases thus help us form inferences based on a wider variety of data. The flip side, of course, is that it may not be possible to disaggregate the traditional measures in the same manner as satellite data, which are available at a granular level.
To sum up, new and exciting datasets are helping us understand the world better. However, it is erroneous to believe that these new datasets can substitute for existing survey-based or national accounts data.
A satellite can hardly tell us anything about intra-household allocation of resources, for instance, or the level of discrimination in a rural labour market in a country such as India.
The use of satellite data in a heterogeneous country such as ours also requires intensive use of on-ground resources in several cases, as discussed above. Moreover, economists are still grappling with challenges in interpreting the information from these new and big datasets, which means inferences must be drawn with greater caution.
Undoubtedly, the understanding of such data will evolve over time. At this point, it is best to think of these new datasets as complements to the traditional data sources collected by the regular statistical machinery.
Sumit Mishra teaches economics at the Institute for Financial Management and Research, Sri City.
Economics Express runs every fortnight, and looks at the world through the lens of economics.