Consumer Pyramids Household Survey: A response and a rejoinder

September 3, 2021

A Response to Pais and Rawal

Mahesh Vyas writes:

This is a response to CMIE’s Consumer Pyramids Household Survey: An Assessment by Jesim Pais and Vikas Rawal (hereafter Pais-Rawal).

We thank the authors for taking the trouble of articulating their various critical observations on the Consumer Pyramids Household Survey (CPHS). Our response consists of several clarifications and promises of some actions to fill the gaps pointed out.

This response is largely to Section 2 of Pais-Rawal, where the authors detail the problems with respect to the survey design and implementation. In Section 3 the authors make several observations based on their estimations using the CPHS database. These observations include comparisons with other survey results. We have largely refrained from responding to this section, for two reasons.

First, we hold that if the processes of a survey are correct then the outcomes hold merit. The differences, if any, with other credible surveys require an investigation into the sources of the differences and, as Pais-Rawal also mention, the differences are most likely in the definitions used. Second, it is not possible to test the correctness of the estimates made by Pais-Rawal. It is possible that some errors could have crept into their computations. For instance, they have erred in stating that the CPHS comprises 111,000 rural households and 63,400 urban households, when it is actually 110,975 urban and 63,430 rural, which add up to 174,405 households. Most likely this is an oversight, but there could be similar oversights in the computations.

It is more important to dispel doubts about the fundamental problems in the survey that the authors point out.

Documentation

CPHS is a large and continuous operation that has been functioning without a break since 2014. Such a survey cannot function without clearly documented processes. That the data are released within a day or two of the completion of a Wave is proof that the survey is run on strong processes that are well documented and implemented. Explanations of concepts, definitions and detailed instructions to field members are a part of this vast documentation. However, these are embedded in training modules, SOP documents, software use manuals, on-line resources, etc., which are all tightly integrated into the operational and, in some cases, budgeting processes.

It has been our endeavour to extract from this large set of internal documents the portions that are relevant to users of the CPHS database. This has led to the creation of the “How We Do It” section on the Consumer Pyramids DX service website. This is work in progress and we plan to complete the documentation that explains all the data fields by March 2022 or even earlier.

We therefore agree with Pais-Rawal that the documentation is incomplete and is ever changing. It is ever changing because it evolves based on the questions users ask us about the documentation that is released. But Pais-Rawal are wrong in stating “It seems likely that in the absence of full documentation, the survey investigators take their own decisions in the field or make do with ad hoc advice from their supervisors.” We have an elaborate and intensive formal system of training and certifying interviewers before the systems permit them to conduct interviews.

The documentation will explain the changes that have taken place over Waves since 2014. But we do not plan to have a set of documentation for each Wave. Such a Wave-wise documentation is an artefact of traditional surveys and not relevant for a continuous panel survey. We have already documented changes in the sample across Waves. We will provide documentation on concepts used, definitions and changes in them over the many Waves of the CPHS.

Pais-Rawal have misunderstood the CPHS execution system in stating that the smartphone application is not accompanied by any survey schedules, questionnaires or instruction manuals. Perhaps their misunderstanding arises because they see the CPHS execution from a National Sample Survey Organisation (NSSO) lens. The concepts of questionnaire and schedules apply differently to the CPHS. The questionnaire is an electronic data-entry form on the smartphone app that contains appropriate drop downs, check boxes and radio buttons, besides encoded navigation and online cross-validation. These replace the traditional survey schedules and a large part of the instruction manuals. Besides, all interviewers have access to copious amounts of documents.

The authors also misunderstand the meaning of “Question Construct”. As our document “The Questionnaire” explains, the CPHS interview is administered conversationally. The Question Construct is an English sentence or phrase for a data field on the app. It is backed by concepts, definitions, local language translations, examples, and finally by training that teaches interviewers how to elicit answers to questions through conversations. It is incorrect to draw conclusions from an isolated reading of one question construct on employment as is done by the authors.

The CMIE has produced a steady stream of detailed documentation since the current CPdx was launched in May 2020. All documentation is available to the public. The documents are followed up by public presentations, at least one each month, on how the CPHS is conducted. These presentations are advertised extensively to increase participation. We solicit and encourage questions, answer all questions on the webinar, and follow up on many of them after the webinar. Subscribers to the database also have a platform to engage with us privately. We are committed to being publicly transparent and publicly engaging and also committed to helping users deploy the data and spread their work. Given this outreach, Pais-Rawal are grossly misguided in their claim that the documentation is “poor”.

Households and Houses

The documentation provided by CMIE makes our unit of survey amply clear. We call it a household. Pais-Rawal say it should be called a house and not a household. With due respect, we do not see a compelling reason to change our nomenclature. There is no confusion as claimed by the authors.

Pais-Rawal merely repeat what is there in the documentation on the continuation of a household whose members have changed and on the non-inclusion of nomadic households. We see nothing wrong in the former and we see the inclusion of the latter as impossible in a panel survey.

Selection of Households

Pais-Rawal criticise the use of systematic random sampling in the selection of sample households from the randomly selected villages and urban enumeration blocks. The criticisms are that (1) villages do not lend themselves to a linear administration of systematic random sampling, (2) the probability of selection of all households in the village or census enumeration block (CEB) is not equal and (3) the method may bias the sample in favour of the well-off and against the marginalised.

The use of systematic random sampling is well documented. The first criticism does not hold because we do not do only linear administration of the sampling. It is circular where households are organised in concentric circles around a centre.

It is known that systematic random sampling does not render the same sampling probability for all households. As our documentation explains, the choice of systematic random sampling was under conditions where a listing and simple random selection was not possible. It was the best choice under the circumstances. Whether the method injects an element of bias in favour of the well-off is a conjecture. Such a concern has been raised earlier as well and we have already stated that this issue will be studied by us during the September-December 2021 Wave and corrections in the sampling wherever necessary will be done by an expansion of the sample in the January-April 2022 Wave.

Pais-Rawal are wrong in stating that a village is surveyed in a single day. Whether it is surveyed in a day or not depends upon the number of interviewers sent into the village. If a village cannot be covered in a day then it is re-visited to complete the survey unless the remaining households are just one or two. The authors are also wrong in stating that if a household is not available in three consecutive Waves, it is dropped from the sample. The rule is that if a household has not responded for three consecutive Waves, the supervisor tries to contact the household to determine the cause of the non-response. The household is dropped only if the supervisor is satisfied that the household will not respond in the future. This could be because of extreme conditions such as a household being destroyed by nature or for re-development. A household that repeatedly fails to respond because it is locked, owing to the mobility of its occupants, is not removed from the sample.

The town of Singrauli was dropped because the sample households consisted of mostly temporary occupants. These households were not found locked, but the occupants changed far too frequently. The behaviour of the sample households was more akin to temporary lodgings rather than households that could report an income, a corresponding expenditure or show other characteristics of a normal household.

Pais-Rawal are not entirely correct in stating that the CPHS documentation is silent about how new households are selected for addition to the sample. There is a separate note “Sample Survival and Response Rate” that deals with the additions and deletions from Waves 1 to 18. It describes the Wave to Wave changes in the sample. The method of selection is the same as when the sample was first created.

Weights

Pais-Rawal wrongly state that households are excluded at the time of sample selection, missed during survey execution or dropped because they are mobile. The choice of the words “excluded”, “missed” and “dropped” is incorrect. The term “non-response” is more appropriate than “missed”, which has a different connotation.

Pais-Rawal do not demonstrate how the non-response rates provided by CMIE to adjust weights “distort” them. The use of non-response rate adjustment factors is normal in any survey. Some non-response is inevitable. It is not advisable to not use adjustment rates to compensate for non-response. Pais-Rawal are also wrong in stating “a low response rate in some states was compensated by increasing the weight of the sample in other states.” We don’t do anything of this kind.

The solution of mapping un-surveyed regions like the islands to surveyed regions that are the closest in similarity is not arbitrary. It is a method to derive an all-India estimate when some remote regions are not surveyed. Note that users have the option to use vanilla weights that are provided separately if the extended weights do not make sense.

Survey Execution

Pais-Rawal incorrectly state that the survey is conducted by 200 interviewers. The total stock of interviewers is about 300. On average there are about 200 interviewers in the field conducting interviews every day, including on all weekends and all holidays. The rest could be resting, travelling, or in training/refresher courses.

It is not correct to state that one interviewer covers 8 households in a day. The number averages closer to 6 households in a day: of the 174,000 households, only 85 per cent (about 148,000) are actually interviewed. The rest are non-responses. The 148,000 are interviewed over 120 days. Therefore, about 1,232 households are interviewed every day by 200 interviewers, i.e., about 6 households per interviewer per day. Pais-Rawal therefore overestimate the number of households interviewed by about 33 per cent.
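The arithmetic in this paragraph can be reproduced with a short calculation. The sketch below uses only the rounded figures quoted above (148,000 responding households, a 120-day Wave, 200 interviewers in the field); the variable names are illustrative and are not part of any CMIE system.

```python
# Rounded figures quoted in the response; variable names are illustrative.
responding_households = 148_000   # about 85 per cent of roughly 174,000
wave_length_days = 120
interviewers_in_field = 200

households_per_day = responding_households / wave_length_days
per_interviewer_per_day = households_per_day / interviewers_in_field

# Ratio of the figure of 8 per day to the rounded figure of 6 per day.
overestimate_pct = (8 / 6 - 1) * 100

print(f"Households interviewed per day: {households_per_day:.0f}")
print(f"Per interviewer per day: {per_interviewer_per_day:.2f}")
print(f"Overestimate implied by 8 versus 6: {overestimate_pct:.0f} per cent")
```

With these rounded inputs the daily total comes to roughly 1,233 households, or a little over 6 per interviewer per day. The small gap from the 1,232 quoted above is consistent with rounding in the inputs.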

Capturing data on a smartphone is far more efficient than capturing data on paper. People are far defter in navigating a phone today than in dealing with pads of paper. Easy availability of multiple choices as drop-down options, conditional lists, binary options as radio buttons, automatic skips, etc. ushers in great productivity gains. The CMIE’s CPHS questionnaire is much simpler than the questionnaires fielded by the NSSO. It covers a number of subjects, but asks fewer questions on each subject and limits itself to the simpler and essential questions.

The panel nature of the sample ensures that it is not difficult to locate a sample household. It also ensures easy cooperation of the household to participate in the survey. We do not give any incentive to the households to participate in the survey. We do give a wall calendar every year (except in 2020) that can double as a utility for tracking recurring household expenses and as a source of education based on data collected from previous surveys. Distribution of the calendar is not conditional on a successful interview and is not limited to the responding household but is freely distributed in the neighbourhood. Households do not get exhausted since we are never in a hurry when conducting the survey. The interview is essentially a conversation and households like to engage with us.

Pais-Rawal are wrong in stating “Despite this [Covid conditions], data on all the variables are reported from each household.” In reality, the response rate, which is usually between 80 and 85 per cent, dropped to 64 per cent in the January-April 2020 Wave and then to 44 per cent in the May-August 2020 Wave. Strategically, CMIE decided to limit the interviewing to supervisors to ensure no deterioration in the quality of data collected. We decided to collect all the information but from fewer households with a smaller team of better quality interviewers. A separate document explaining this in detail is available in the “How We Do It” section.

The authors express surprise that a detailed telephonic survey could be conducted. CMIE did conduct the surveys, households did respond and the data did tell us what happened during that difficult time. The authors betray a mistrust of the private sector or expect households to not trust a private company. Households trust us and we share a relationship of mutual respect with them.

Observations on employment and occupation

Pais-Rawal state that the CPHS underestimates women’s participation in economic activities. This, they say, “… is very likely to be a result of poor probing by investigators who are required to survey about eight household every day…” As we have pointed out earlier, their estimate of eight households per day is not correct. Further, the difference in estimates of female labour force participation rate between the Periodic Labour Force Surveys (PLFS) and the CPHS arises because of definitional differences. The PLFS considers a person to be employed if the person was employed for even half a day in the seven days preceding the day of the survey. The CPHS requires the person to be employed on the day of the survey or the preceding day. The PLFS gives preference to the employment status over other statuses. So, if a person was employed for half a day out of seven but was desperately looking for employment in the remaining six days, the person is still classified as employed in the PLFS. The CPHS definition of employment (and similarly of labour force participation) is more stringent than the PLFS definition. It is therefore not about probing or being overworked. It is a conscious definitional difference.
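The definitional difference described here can be illustrated with a minimal sketch. It encodes only the two rules as they are worded in the paragraph above; the function names, the boolean representation of a day's work and the example person are all hypothetical, and neither survey's actual classification logic is reproduced.

```python
from typing import Sequence


def employed_weekly_rule(worked_last_7_days: Sequence[bool]) -> bool:
    """Rule attributed to the PLFS in the text: employed if the person worked
    (for at least half a day) on any of the seven days before the survey."""
    return any(worked_last_7_days)


def employed_recent_day_rule(worked_last_7_days: Sequence[bool],
                             worked_on_survey_day: bool) -> bool:
    """Rule attributed to the CPHS in the text: employed on the day of the
    survey or on the preceding day."""
    worked_preceding_day = worked_last_7_days[-1]
    return worked_on_survey_day or worked_preceding_day


# Hypothetical person: worked half a day six days ago and not since.
# The list runs from seven days ago (first) to the preceding day (last).
history = [False, True, False, False, False, False, False]

plfs_like = employed_weekly_rule(history)
cphs_like = employed_recent_day_rule(history, worked_on_survey_day=False)
print(plfs_like, cphs_like)
```

The same person is classified as employed under the seven-day rule and as not employed under the survey-day-or-preceding-day rule, which is the direction of difference the paragraph describes.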

A final word

The CPHS is a different survey compared to the traditional surveys. It has pushed the envelope on survey execution, quality control and supervision, frequency, timeliness, comprehensiveness, and support and engagement. It is important to understand it for what it is rather than see it through the lens of the traditional surveys. The traditional surveys gave us a great head start. We stand on the shoulders of the great work done by them. But, it would be a shame if we merely replicated them. It was more important to make progress on as many fronts as possible. That is what we have attempted and that is how we should be seen. Whether you criticise the work or use the data, please do so for what it is. We are always happy to engage with both, constructively.

Mahesh Vyas is the CEO of the Centre for Monitoring Indian Economy.

A Rejoinder to Mahesh Vyas

Jesim Pais and Vikas Rawal write:

We thank Mahesh Vyas for his response above of 23 August 2021 to our article on the CMIE’s Consumer Pyramids Household Survey (CPHS). It is, however, unfortunate that while he accepts several of our points, on others he makes misleading claims, provides superficial responses and does not address some of the more serious issues we raised. In this rejoinder we keep the focus on important methodological questions about CPHS that remain unanswered.

We would first like to apologise for the typographical error in the first sentence of Section 1 (“The CPHS”) of our article which should have read as: “The CPHS comprises surveys of households living in about 174,000 sample houses (about 111,000 urban and 63,400 rural) spread across most states in India.”

Houses or households?

Vyas admits that a house is called a household in the CPHS documentation but insists that this is irrelevant. Using this wrong terminology is a source of much confusion in the documentation. For example, when the documentation says that “if a household is found missing from where it was supposed to be then it is dropped from the panel”, it is meant that the house “has undergone a redevelopment”, “was demolished for some reason” or “was destroyed by nature” (Vyas 2020a: p 3). While the word household is mostly used to refer to houses, CMIE sometimes uses the term to refer to a group of people as, for example, when they use question constructs like “Does the household intend to buy the asset now?” or “Did the household buy the asset in past 120 days?” (Vyas 2020b: p 15).

The fact that the unit of sampling is a house and not a specific group of persons—people living in these houses can change—makes the CPHS very different from household surveys. Vyas’ refusal to acknowledge the error leaves the users thinking that the CPHS uses a sample of households, when it does not.

Sample design

Every elementary textbook of statistics starts by defining what is random. Random sampling, by definition, is one in which the sample is selected by ensuring that every unit in the sampling frame has the same probability of getting selected. CMIE seems to have invented a new definition of systematic random sampling.

Vyas admits that the methodology used for sampling by the CMIE does not give equal probability to every house in the primary sampling units (and in the homogeneous regions). Two points follow from this admission.

First, he asserts that this was the best choice under the circumstances. This is not the case. There can be many alternative ways of sampling that would ensure equal probabilities to all houses. If all houses in a homogeneous region have to have the same weight, the sample size would have to be proportional (say, 1 out of n houses) to the size of the primary sampling units. Such a sample can be randomised if, for example, you choose a random number between 1 and n to start with, and then pick every nth house. This would be a systematic random sample. Since the sampling proportion is fixed, it would also obviate the need for using crude population projections for computing weights for each homogeneous region. What Vyas is calling a systematic random sample is simply not a random sample.
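The procedure described in this paragraph (a random start between 1 and n, then every nth house) can be sketched as follows. This is a minimal illustration, assuming the houses in a primary sampling unit can be put in some fixed order; the function names and the 25-house example are hypothetical.

```python
import random
from collections import Counter


def sample_from_start(houses, n, start):
    """Return every n-th house, beginning at position `start` (1-indexed)."""
    return houses[start - 1::n]


def systematic_sample(houses, n, rng=None):
    """Draw a random start between 1 and n, then take every n-th house."""
    rng = rng or random.Random()
    start = rng.randint(1, n)  # inclusive on both ends
    return sample_from_start(houses, n, start)


# Hypothetical primary sampling unit with 25 ordered houses, sampling 1 in 5.
houses = [f"house_{i}" for i in range(1, 26)]
sample = systematic_sample(houses, n=5, rng=random.Random(2021))

# Each house belongs to exactly one of the n possible samples, so with a
# uniformly random start every house has selection probability 1/n.
appearances = Counter(
    h for s in range(1, 6) for h in sample_from_start(houses, 5, s)
)
all_equal = all(appearances[h] == 1 for h in houses)
print(sample, all_equal)
```

The check at the end enumerates all n possible starts and confirms that each house falls in exactly one of the resulting samples, which is why a uniformly random start gives every house the same probability of selection.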

Second, Vyas does not explain how, if the houses are selected using unequal probabilities, the estimators based on equal weights assigned to all houses in a homogeneous region can be treated as unbiased.

We had given illustrations of how non-coverage in some regions is compensated by increasing weights in other regions. These are all documented by CMIE. Vyas is playing with words when he says that CMIE does not do this in the case of non-responses. Coverage of the survey is lowered both because some households in a region do not respond and because the survey is not conducted in some regions. In cases where no households are surveyed in a region, the weights of households in other regions are increased to compensate.

Increasing the weights of surveyed houses within the same region might be acceptable if it can be demonstrated that the non-responses are randomly distributed (across categories of respondents and spatially). As we had discussed in our article, there seems to be a greater likelihood of attrition and non-response among certain kinds of respondents. This makes the adjustment of weights problematic.
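The concern in this paragraph can be made concrete with a small numerical sketch. All figures below are invented for illustration, and the adjustment shown (scaling every respondent's weight by the same region-wide factor) is an assumption about the simplest form such an adjustment could take, not a description of CMIE's procedure.

```python
# Hypothetical region with two kinds of sampled households. All numbers are
# invented: (label, sampled, responded, mean outcome among that group).
unequal_response = [
    ("settled", 80, 76, 100.0),
    ("mobile", 20, 8, 60.0),
]
equal_response = [
    ("settled", 80, 64, 100.0),
    ("mobile", 20, 16, 60.0),
]


def full_sample_mean(groups):
    """Mean outcome if every sampled household had responded."""
    sampled = sum(g[1] for g in groups)
    return sum(g[1] * g[3] for g in groups) / sampled


def adjusted_mean(groups):
    """Mean from respondents only, after scaling every respondent's weight
    by the same region-wide factor (sampled / responded)."""
    sampled = sum(g[1] for g in groups)
    responded = sum(g[2] for g in groups)
    factor = sampled / responded
    weighted_total = sum(g[2] * factor * g[3] for g in groups)
    return weighted_total / sampled


print(full_sample_mean(unequal_response), adjusted_mean(unequal_response))
print(full_sample_mean(equal_response), adjusted_mean(equal_response))
```

When both groups respond at the same rate, the adjusted mean matches the full-sample mean. When the lower-outcome group responds less often, the uniform adjustment leaves the estimate too high, because it restores the sample size but not the sample composition.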

Survey questions and the documentation

We had argued that the users have to be provided the questions used to obtain the information. Vyas responds to this saying that while interviewers have access to copious amounts of documentation and are made to go through intensive training, there is no specific set of questions and the “interview is administered conversationally”.

All interview-based surveys involve conversations. These conversations are typically structured using a specified set of questions to make sure that information is obtained efficiently and that no item is missed. Whatever the style of conversations used by the CMIE interviewers, it cannot obviate the need for posing questions to obtain information that is then recorded.

Vyas confuses a questionnaire (which is nothing but a list of questions to be covered in the interview) with a survey schedule/data entry application (which is used for recording answers). Even when you replace the paper-based schedule with a data entry application, you still need the list of questions to be used for eliciting information on each item. We are simply asking the CMIE to tell us what questions are used to obtain information for each item.

CMIE is obliged to make the entire documentation of a survey, including internal training material, available to the users. The CPHS has been conducted for over seven years now and it is inexcusable that subscribers have not been provided full documentation on how the surveys have been conducted.

Survey implementation

We had used the total size of the sample in computing the number of houses that an interviewer has to deal with in a day because we assumed that the cases of non-response require multiple visits and would therefore take time. However, even if one does not take that into account and assumes that each investigator conducts 6 interviews in a day to collect information on 300 items, an investigator doing interviews for 8 hours would need to obtain answers to 3.75 questions per minute. Questions have to be fired like bullets from a machine gun if an interviewer has to get information on so many items in a minute. During the lockdown in 2020, while the response rates fell by 40 per cent, the availability of investigators fell by 66 per cent (79 in place of 200) (Vyas 2021a: p5). So, the workload per investigator only increased.
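The figure of 3.75 questions per minute follows directly from the numbers in this paragraph. The short calculation below reproduces it, keeping the paragraph's own simplification that all 8 working hours are spent interviewing; the variable names are illustrative.

```python
# Figures taken from the paragraph above; variable names are illustrative.
interviews_per_day = 6
items_per_interview = 300
working_hours = 8          # assumes the whole day is spent interviewing

items_per_day = interviews_per_day * items_per_interview
minutes_per_day = working_hours * 60
items_per_minute = items_per_day / minutes_per_day
seconds_per_item = 60 / items_per_minute

print(f"Items per minute: {items_per_minute:.2f}")
print(f"Seconds available per item: {seconds_per_item:.0f}")
```

On these assumptions an investigator has about 16 seconds per item. Any time spent travelling between houses or locating respondents would reduce that further.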

On occupations

We want to clarify that we tried different ways of comparing the occupational structure captured by the Periodic Labour Force Survey (PLFS) and the CPHS. All these results could not have been included in our article. However, whether you take the Usual Status data from the PLFS, or the seven-day current weekly status data, or the data for just the previous day or a combination of the last few days, in all scenarios, and over multiple rounds of the two sets of surveys, the PLFS shows a higher participation of women in economic activities than the CPHS does.

We do not know what makes Vyas say that it is not possible to test the correctness of our estimates. All our computations can be replicated. We would be happy to share all our programs with anyone who wishes to verify our results.

We would like to end by saying that it is for the users to assess the value of the CPHS data. CMIE needs to listen to criticism and work to improve the survey rather than pat itself on the back.

References

Vyas, Mahesh (2020a), “Consumer Pyramids Household Survey - Sample Survival & Response Rate”, Centre for Monitoring Indian Economy, March 12, 2020, accessed on July 07, 2021, https://consumerpyramidsdx.cmie.com/kommon/bin/sr.php?kall=wdlkb&img=674355

Vyas, Mahesh (2020b), “Consumer Pyramid Household Surveys - The Questionnaire”, March 30, 2020, https://consumerpyramidsdx.cmie.com/kommon/bin/sr.php?kall=wdlkb&img=674358

Vyas, Mahesh (2021a), “Consumer Pyramids Household Survey - CPHS execution during the lockdown of 2020”, Centre for Monitoring Indian Economy, August 19, 2021, https://consumerpyramidsdx.cmie.com/kommon/bin/sr.php?kall=wdlkb&img=686689