By Nandlal Mishra*
In a recent article published in the “Indian Express” on July 7, 2023, Shamika Ravi brings attention to the valid concern of assessing the data quality of large-scale national surveys such as National Sample Survey (NSS), Periodic Labour Force Survey (PLFS) and National Family Health Survey (NFHS). While her argument raises important points, the example she chooses to support her argument is flawed. It is essential to address this misconception and highlight the actual limitations of sample surveys.
Ravi states that major surveys conducted in India post-2011, which used the 2011 census as the sampling frame, have significantly overestimated the proportion of the rural population. However, this argument overlooks a fundamental constraint of sample surveys: to select representative rural and urban samples, they rely on the most recent census data available to identify areas as either rural or urban. Given resource and time limitations, it is not feasible for surveys to classify areas as rural or urban independently.
The classification of an area as rural or urban is the responsibility of the Office of the Registrar General of India (RGI), not the survey organizers. Therefore, if an area was defined as rural in the 2011 census, it is treated as rural in all subsequent surveys until the next census data become available, even if it has since come to meet the criteria for urban classification.
Considering that the proportion of rural population in the 2011 census was 69 per cent, the share of rural population in these sample surveys should ideally be around 69 per cent. In reality, the estimates of the rural population in the surveys conducted during the last decade are quite accurate, with only a small overestimation of 1 to 2 percentage points in exceptional cases. This overestimation can be attributed to lower response rates in urban areas across all these surveys. The slight overestimation is thus a result of urban non-response, which presents a new challenge for data producers.
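The arithmetic behind this point can be illustrated with a minimal sketch. It assumes the sampled units match the 69 per cent rural share of the 2011 census frame and that no non-response adjustment is applied to the weights. The response rates of 97 per cent (rural) and 90 per cent (urban) are hypothetical values chosen only for illustration; they are not figures reported by any of these surveys.

```python
def observed_rural_share(frame_rural_share, rural_response_rate, urban_response_rate):
    """Rural share among responding units, with no non-response adjustment."""
    rural_respondents = frame_rural_share * rural_response_rate
    urban_respondents = (1.0 - frame_rural_share) * urban_response_rate
    return rural_respondents / (rural_respondents + urban_respondents)


# Rural share in the sampling frame, as cited from the 2011 census.
frame_share = 0.69

# Hypothetical response rates (assumptions, not survey figures).
share = observed_rural_share(frame_share, rural_response_rate=0.97, urban_response_rate=0.90)
print(f"Rural share among respondents: {share:.1%}")
```

Under these assumed rates the rural share among respondents comes out between one and two percentage points above the frame share. When both response rates are equal, the function returns the frame share unchanged.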
Furthermore, Ravi compares these survey estimates with the projected rural population figures for different years provided by the RGI Expert Committee. However, surveys whose sampling frames are drawn from the 2011 census are expected to align with the 2011 census figures regardless of the survey year, so setting them against projections for later years is not a fair comparison. The example chosen by Ravi therefore reflects not a data quality problem but the inherent limitation of surveys that rely on census data to define rural and urban areas.
Now, let’s address the chart provided alongside the article. The first two surveys (NSS 68th round) mentioned in the chart were designed and carried out in 2011-12, prior to the release of the 2011 census data. These surveys relied on sampling frames obtained from the 2001 census, resulting in an estimated rural population share of over 71 per cent. Furthermore, the chart itself was distorted: equal time intervals were not represented by equal distances on the x-axis. It is crucial to present data accurately and meaningfully, as incorrect visualizations can lead to misleading interpretations.
Instead of focusing on examples prone to built-in errors, Ravi could have chosen indicators such as the dependency ratio, sex ratio, literacy rate, and the share of households with access to electricity, improved water, and improved sanitation facilities. These indicators are less susceptible to such errors and would have provided a more accurate picture of data quality concerns.
Moving forward, it is imperative for the government to expedite the conduct of the 2021 census. This will ensure the availability of an updated sampling frame for sample surveys, leading to more accurate estimates, including the proportions of urban and rural populations. An updated census will address the limitations associated with using outdated data, enabling survey organizers to accurately define areas as rural or urban based on current circumstances.
Ravi raises a valid concern regarding transparency in data. An example of this is the 78th round of NSS conducted in 2020-21, which collected data on food security and hunger but has not been released to the public. The report states that this data was collected for internal use, leaving readers perplexed as to why it remains undisclosed.
Releasing the hunger data would enable a better assessment of claims made by global hunger indices and other surveys regarding the hunger and malnutrition crisis in the country. Transparency in data release is crucial for fostering trust in the survey process and facilitating informed policy decisions.
---
*Doctoral fellow at International Institute for Population Sciences, Mumbai