Business Standard

A sampling strategy to estimate Covid cases

We need a methodology that’s cost-effective and specifically suited for ‘rare’ events

- ATANU BISWAS
The writer is professor of statistics, Indian Statistical Institute, Kolkata

It is essential, yet never easy, to estimate the prevalence of Covid-19 among the Indian population, mostly because of the country’s large and diverse population and a high proportion of asymptomatic cases. Many have suggested simple random sampling (SRS) or one of its variants, such as stratified sampling, in this context.

A simple random sample gives each individual in the population the same probability of inclusion in the sample. This technique has been adopted in some European countries, Sweden and Austria for example, to estimate the prevalence of Covid-19. According to one such survey conducted by the Public Health Authority of Sweden in Stockholm during March-April (https://www.folkhalsomyndigheten.se/nyheter-och-press/nyhetsarkiv/2020/april/resultatfran-undersokning-av-forekomsten-av-covid-19-i-region-stockholm), about 2.5 per cent of Stockholmers had an ongoing Covid-19 infection. In another random sample-based study in Austria (https://www.sora.at/uploads/media/Austria_covid-19_PrevaleNCE_BMBWF_SORA_20200410_En_version), where testing was conducted between April 1 and 6, the proportion testing positive in the weighted sample was 0.33 per cent.

However, the population of Stockholm was 974,000 and that of Austria 8.86 million in 2019. Also, the estimated prevalence of the disease is quite high in these places. India, in contrast, might need to adopt a sampling scheme that suits its own conditions best. The country has an extremely large population and a high population density of 464 per square kilometre. Still, positive Covid-19 cases in India are fortunately fewer, relative to the population, than in many European countries or the United States. With nearly 60,000 positive cases detected as on May 9, and assuming another 60,000-1,200,000 asymptomatic cases (up to 20 times the detected number) in the country, the total should be between 120,000 and 1,260,000 at the moment. This is 0.0089 per cent to 0.0933 per cent of the total population, a meagre proportion of the 1.35 billion people.
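The arithmetic behind these bounds can be checked with a few lines of Python. This is only a sketch that reuses the figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the article (as on May 9, 2020).
detected = 60_000
population = 1_350_000_000

# Assumed range for undetected asymptomatic cases: 1x to 20x the detected count.
asymptomatic_low = 1 * detected
asymptomatic_high = 20 * detected

total_low = detected + asymptomatic_low
total_high = detected + asymptomatic_high

# Prevalence expressed as a percentage of the total population.
pct_low = 100 * total_low / population
pct_high = 100 * total_high / population

print(f"{total_low:,} to {total_high:,} cases")
print(f"{pct_low:.4f}% to {pct_high:.4f}% of the population")
```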

So far, we observe a similar feature in other South Asian countries, such as Pakistan, Afghanistan, Bangladesh, Nepal, Bhutan and Sri Lanka. On the other hand, European countries and the United States have a high proportion of positive incidences; population size in those countries is relatively small, and the infection rate is high. Simple random sampling or its variants may be useful for such countries. India needs to adopt a completely different sampling scheme, one specifically designed to estimate “rare” events.

When the incidences are “rare” compared to the population size, SRS needs an extremely large sample to provide a reasonable estimate. In this case, an extremely large sample involves a huge cost in terms of travel and kits, and we know that the availability of kits is a serious problem. Several variants of SRS, such as stratification, clustering, systematic sampling and multistage sampling, will have the same problem of precision and cost.
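A rough sense of the sample sizes involved can be had from the usual normal approximation for a proportion under SRS. The sketch below is illustrative only: the function name, the 20 per cent relative margin of error and the 95 per cent confidence level are assumptions made here, not figures from any survey, and the finite population correction is ignored.

```python
import math
from statistics import NormalDist

def srs_sample_size(prevalence, rel_margin=0.2, confidence=0.95):
    """Approximate SRS sample size so that the margin of error is
    rel_margin * prevalence (normal approximation, no finite
    population correction)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Solve z * sqrt(p * (1 - p) / n) = rel_margin * p for n.
    n = z**2 * (1 - prevalence) / (rel_margin**2 * prevalence)
    return math.ceil(n)

# Prevalence levels quoted in the article, as proportions.
for label, p in [("Stockholm, 2.5%", 0.025),
                 ("Austria, 0.33%", 0.0033),
                 ("India, upper bound 0.0933%", 0.000933),
                 ("India, lower bound 0.0089%", 0.000089)]:
    print(f"{label}: about {srs_sample_size(p):,} people to test")
```

Under these assumptions, the required sample is a few thousand at Stockholm's prevalence but on the order of a million at the lower Indian bound, because the required size grows roughly in inverse proportion to the prevalence.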

In contrast, the Adaptive Cluster Sampling (ACS) scheme is designed to estimate “rare” events. It is well known that, in such settings, its precision is much higher than that of SRS or its variants. It is also cost-effective. The idea of ACS was introduced by S K Thompson in a classic research article in 1990 (Thompson, S K, 1990, “Adaptive cluster sampling”, Journal of the American Statistical Association, volume 85, pp. 1050-1059).

Several variants of ACS were proposed subsequently (see Borkowski and Turk (2014), “Adaptive Cluster Sampling: an Introduction”, on ResearchGate). ACS has been applied successfully to several problems in ecology, environment and epidemiology. It may also be noted that ACS can be applied successfully when the incidences are abundant.

In this context, it is worth mentioning that, in order to get an idea of the rate of positive incidences, some are advocating strategies similar to “snowball sampling” (Goodman, L.A., 1961, “Snowball sampling”, Annals of Mathematical Statistics, volume 32 (1), pp. 148-170). It seems quite pragmatic. However, it is a non-probability sampling scheme. Such a scheme does not provide an unbiased estimate; nor does it provide an estimate of the standard error, and hence it gives no margin of error.

On the other hand, ACS is a probability sampling scheme, related to snowball sampling, that yields an unbiased estimate along with its standard error or margin of error. ACS involves unequal probabilities of selection, and the probability of inclusion of different individuals of the population can be defined in a state-of-the-art manner.

When applying ACS for Covid-19, we may start with a sample of reasonable size, selected by some predefined mechanism. If an observed sampling unit tests positive, then additional units in a defined neighbourhood are adaptively added to the sample. Again, if any of these additional units is found to be Covid-19-positive, units in their neighbourhoods are also added to the sample. This adaptive process should continue until no additional Covid-19-positive units are encountered. It is a typical example of “inverse sampling”, where the total sample size is not fixed a priori; rather, we may provide an expected sample size. Clearly, a variant of the standard ACS that suits the Indian context and uses information specific to the Covid-19 virus (for example, its incubation period of 2-14 days) might be more useful for designing a sampling scheme to estimate Covid-19 proportions.
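The adaptive procedure described above can be sketched as a small simulation. Everything about the set-up is an assumption made for illustration: units are laid out on a square grid, the neighbourhood of a unit is its four adjacent cells, positives are generated as short random-walk clusters, and the sample sizes are arbitrary. The estimator is the Horvitz-Thompson-type estimator based on network inclusion probabilities, one of the unbiased estimators given in Thompson (1990); it is not a design tailored to Covid-19.

```python
import math
import random

def neighbours(cell, side):
    """Yield the up/down/left/right neighbours of a grid cell (assumed neighbourhood)."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < side and 0 <= cc < side:
            yield (rr, cc)

def make_population(side, n_clusters, walk_length, rng):
    """Hypothetical clustered population: each cluster is a short random walk of positives."""
    positive = set()
    for _ in range(n_clusters):
        cell = (rng.randrange(side), rng.randrange(side))
        for _ in range(walk_length):
            positive.add(cell)
            cell = rng.choice(list(neighbours(cell, side)))
    return positive

def acs_estimate(positive, side, n_initial, rng):
    """Run one adaptive cluster sample; return (prevalence estimate, units tested)."""
    N = side * side
    # Initial simple random sample without replacement.
    initial = [divmod(i, side) for i in rng.sample(range(N), n_initial)]
    tested = set(initial)
    seen = set()   # positives already assigned to a discovered network
    total = 0.0    # Horvitz-Thompson estimate of the number of positives
    for start in initial:
        if start not in positive or start in seen:
            continue
        # Keep adding neighbours of positive units until no new positives appear.
        network, frontier = {start}, [start]
        while frontier:
            cell = frontier.pop()
            for nb in neighbours(cell, side):
                tested.add(nb)  # negative "edge" units are tested too
                if nb in positive and nb not in network:
                    network.add(nb)
                    frontier.append(nb)
        seen |= network
        m = len(network)
        # Probability that the initial sample hits this network at least once.
        alpha = 1 - math.comb(N - m, n_initial) / math.comb(N, n_initial)
        total += m / alpha
    return total / N, len(tested)

rng = random.Random(2020)
side = 100
positive = make_population(side, n_clusters=5, walk_length=30, rng=rng)
true_prev = len(positive) / side**2

runs = [acs_estimate(positive, side, 200, rng) for _ in range(1000)]
mean_est = sum(est for est, _ in runs) / len(runs)
mean_tested = sum(n for _, n in runs) / len(runs)
print(f"true prevalence {true_prev:.4f}, mean ACS estimate {mean_est:.4f}")
print(f"average units tested per run: {mean_tested:.0f}")
```

Averaged over many repetitions, the estimate should sit close to the true prevalence, which is what unbiasedness means here. The number of units tested varies from run to run, reflecting the point that the final sample size is not fixed in advance.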
