A guide to quantitative impact evaluations: what, when, how, and who?

Our Principal Consultant, Michele Binci, outlines key questions to consider when contemplating this highly robust tool for gathering evidence and informing policy.

A well-designed evaluation can produce evidence crucial for understanding whether and how policies, programmes, or pilot interventions have worked satisfactorily and achieved their stated objectives. In turn, this evidence can be used to inform policy reforms, programme decisions and pilot scale-ups. Quantitative impact evaluations (QIEs) can play a particularly important role, as they generate a robust measure of the impact of a programme on its target beneficiaries by comparing a treatment group (those receiving the intervention) with a control group (those not receiving the intervention).

More specifically, through what is known as a counterfactual-based design, a QIE can estimate the magnitude of impact that is directly attributable to an intervention, thus telling programme implementers, international donors, and policymakers how much of the improvement measured on outcome indicators of interest (e.g. nutritional outcomes, poverty rates, or learning outcomes) was due to their intervention.
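
To make this concrete, here is a minimal, hypothetical sketch (not drawn from any specific evaluation) of the simplest form of impact estimate: the difference in mean outcomes between a treatment and a control group, with a significance test. The data, outcome, and effect size are all simulated for illustration.

```python
# Minimal illustration with simulated data: impact as the difference in mean
# outcomes between a treatment group and a control group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical outcome scores (e.g. a learning assessment) for 500 units per
# group; the treatment group is simulated with a true effect of +3 points.
control = rng.normal(loc=50.0, scale=10.0, size=500)
treatment = rng.normal(loc=53.0, scale=10.0, size=500)

impact = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control)

print(f"Estimated impact: {impact:.2f} points (p = {p_value:.3f})")
```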

Although QIE is a powerful tool for gathering evidence and informing policy, it is critical to determine, on a case-by-case basis, whether it is the most suitable evaluation approach, given both theoretical and practical considerations. In this blog, we share the key questions that we believe should be addressed when considering a QIE.

What is the question?

In QIE, as in all types of evaluation, the first point to consider is the Theory of Change (ToC) of the programme under evaluation, and the research (or evaluation) questions that derive from it. This entails adopting a theory-based approach that draws on the ToC to identify the key questions the evaluation should address, verify the assumptions underpinning the ToC's causal pathways, and draw conclusions on whether, and how, the programme contributed to the results.

Theory-based evaluation is not method-specific, and offers a flexible framework to either support an independent QIE or integrate QIE with complementary evaluation methods. Understanding an intervention’s ToC, and developing the resulting questions to be addressed, thus represents the starting point of any evaluation design.

When is QIE the answer?

Once the evaluation question is clear, the next step is to decide whether QIE is the best answer. As mentioned above, QIE is uniquely valuable when the focus is on producing a measure of impact that can be directly attributed to the intervention under evaluation, such as a policy reform, a long-established programme, or a new pilot. However, if the research and policy interest lies in investigating the implementation mechanisms, or the factors influencing implementation, then other evaluation approaches, such as process evaluation, are more likely to be appropriate.

Similarly, if what matters is capturing stakeholders’ views and lived experiences of the changes taking place (or failing to materialise), then a more qualitative inquiry would be preferable. A broad and comprehensive evaluation framework could encompass all these different lines of investigation, and should do so when possible and advisable. It is, however, also very important to ensure that the most suitable, relevant and efficient approach is chosen, depending on the focus of the evaluation questions. Given that a robust QIE requires a large budget and has stringent technical requirements, its added value in answering the evaluation questions should be carefully considered.

Which QIE?

If QIE is indeed the best answer for the core evaluation question(s), then it is time to decide what type of QIE is preferable and feasible to apply.

The challenge faced by any rigorous QIE is essentially a problem of missing information. The evaluation can collect data on the beneficiaries of an intervention, but cannot gather information on what would have happened to those beneficiaries had they not been targeted by the intervention. This is known as the problem of the counterfactual, and it can be tackled by constructing a control group that is as similar as possible to the beneficiaries targeted by the intervention (the treatment group), so as to be a valid comparison. A large range of experimental and quasi-experimental QIE designs exist that attempt to tackle the problem of the counterfactual, and we have extensive experience of implementing most of them.

  • Experimental design: Randomised Controlled Trials (RCTs) are typically considered the most robust QIE design as they provide a convincing estimate of the counterfactual through an experiment. By randomising which units (e.g. individuals, households, or schools) are covered by an intervention (e.g. receiving a cash transfer or a training course for teachers) and which are not, a control group is established that, by construction, is identical to the group receiving the intervention. RCTs are very popular in the field of impact evaluations, and were recently further boosted by the awarding of the 2019 Nobel Prize in Economics to Banerjee, Duflo and Kremer, whose research is largely based on RCTs. We have developed several successful RCTs over a number of years, including the evaluation of the first phase of the Hunger Safety Net Programme (HSNP) in Kenya and the evaluation of the Child Development Grant Programme (CDGP) in Nigeria. RCTs have great strengths, but they also have some clear limitations, as we discussed in detail in a previous blog. When RCTs are not feasible, the main QIE alternative is a quasi-experimental design.
  • Quasi-experimental design: This type of design is known as quasi-experimental since it attempts to approximate an experimental approach by building a comparison group through econometric techniques. The quasi-experimental designs most commonly used in the QIE literature are Regression Discontinuity (RD), Propensity Score Matching (PSM) or other matching algorithms, and Difference-in-Differences (DID). The RD approach works by comparing treatment and comparison units (e.g. households) that are very close to an intervention’s eligibility cut-off (e.g. a poverty score), as units just on either side of this cut-off are expected to be very similar before the intervention starts. A good example of RD design is the evaluation of the Benazir Income Support Programme (BISP) in Pakistan. PSM constructs a valid counterfactual by matching units in the treatment group with units in a comparison group that are as similar as possible to each other according to relevant observable characteristics (e.g. age, gender, level of education, or socioeconomic factors). We developed our own PSM approach for the evaluation of the Education Quality Improvement Programme (EQUIP-T) in Tanzania. Finally, DID is used when information from the treatment and comparison groups is collected at two points in time, and works by comparing differences in outcomes over time in the treatment group with differences in outcomes over time in the comparison group. This double-differencing approach produces a robust measure of impact and is very popular given that it is relatively easy to implement; one example is the evaluation of the Bihar Child Support Programme (BCSP) in India. A minimal sketch of the double-differencing logic follows this list.

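As an illustration of the double-differencing logic described above, the sketch below estimates a difference-in-differences impact on simulated two-period data. The dataset, variable names, and effect size are hypothetical; a real evaluation would also add controls, clustering, and robustness checks.

```python
# Illustrative difference-in-differences (DID) on simulated two-period data.
# All data and variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000  # households per group

df = pd.DataFrame({
    "treated": np.repeat([0, 1], n * 2),   # comparison vs treatment group
    "post":    np.tile([0, 1], n * 2),     # baseline vs follow-up round
})
# Outcome: a common baseline level, a common time trend, and a +2.0 impact
# that only treated households receive after the intervention starts.
df["outcome"] = (
    10
    + 1.5 * df["post"]
    + 0.5 * df["treated"]
    + 2.0 * df["treated"] * df["post"]
    + rng.normal(0, 3, size=len(df))
)

# The coefficient on the interaction term is the DID impact estimate.
model = smf.ols("outcome ~ treated * post", data=df).fit()
print(model.params["treated:post"])
```
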
Regardless of whether the QIE is based on an experimental or quasi-experimental design (or even a non-experimental design, such as panel fixed-effects models or instrumental variables techniques, which offer an additional alternative when the construction of a counterfactual is not possible), we always strive to design and implement our QIEs within a mixed-methods framework. This is because quantitative and qualitative methods both have specific strengths and weaknesses; deploying them jointly ensures that the approaches complement one another and build on each other’s strengths. Integrating quantitative and qualitative methods thus allows us to give a more comprehensive response to the evaluation questions.

How do you go about it? Do you have the right tools?

Now that the most appropriate QIE approach has been selected, it is critical to make sure that everything is in place for a robust and successful QIE implementation, in the field or remotely. First of all, the optimal sample size for the QIE needs to be determined. How many units (e.g. individuals, households, or schools) should the treatment and control groups contain to be able to attribute observed changes to the intervention under evaluation with statistical confidence, without wasting resources? This optimal sample (i.e. not too small, not too large) is normally determined through power calculations. Several design and sampling requirements need to be considered when determining the optimal sample size, but no existing power calculation tool allowed us to fully account for them. To bridge this gap, we developed our own power calculation command, which allows us to account for the different scenarios and parameters that characterise and define an experimental or a quasi-experimental design.
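
As a simplified illustration of what a power calculation does (this is not our own power calculation command), the sketch below computes the sample size per group needed to detect a given standardised effect in a simple two-group comparison of means. The effect size, significance level, and power are hypothetical choices, and real QIE designs also adjust for clustering, attrition, and multiple survey rounds.

```python
# Illustrative power calculation for a simple two-group comparison of means.
# Parameters are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.2,   # standardised minimum detectable effect (Cohen's d)
    alpha=0.05,        # significance level
    power=0.8,         # probability of detecting the effect if it exists
    ratio=1.0,         # equal-sized treatment and control groups
)
print(f"Required sample size per group: {n_per_group:.0f}")
```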

Once the sample is identified, data collection starts. In order to produce robust estimates of impact, high data quality is paramount. This is why data collection protocols and data quality assurance procedures are crucial. These include pre-testing and piloting of survey instruments, training of survey supervisors and enumerators, extensive fieldwork supervision activities, and a thorough data monitoring and quality assurance system. The latter should track data collection progress and interviewers’ performance, identify uncommon patterns in the data, and provide timely feedback to implement course corrections that improve data quality. Also key to both the quality and efficiency of data collection is the use of computer-assisted interviewing: in other words, collecting data with tablets instead of paper questionnaires. We routinely use Computer-Assisted Personal Interviewing (CAPI) in all our in-person QIEs, and have been building our expertise in Computer-Assisted Telephone Interviewing (CATI) too, given the need to adapt our QIEs during Covid-19.
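
To illustrate the kind of automated checks such a monitoring system can run (the data, field names, and thresholds below are hypothetical), interviews that are unusually short and enumerators with high rates of missing answers can be flagged for follow-up.

```python
# Illustrative data-quality checks on hypothetical survey metadata:
# flag interviews that are unusually short and enumerators with high
# rates of missing answers.
import pandas as pd

# Hypothetical interview-level data exported from the CAPI platform
interviews = pd.DataFrame({
    "enumerator":   ["E01", "E01", "E02", "E02", "E03", "E03"],
    "duration_min": [42, 38, 9, 11, 40, 45],
    "missing_rate": [0.02, 0.03, 0.20, 0.25, 0.01, 0.04],
})

# Flag interviews much shorter than the overall median duration
median_duration = interviews["duration_min"].median()
interviews["too_short"] = interviews["duration_min"] < 0.5 * median_duration

# Flag enumerators whose average missing rate exceeds a tolerance threshold
by_enum = interviews.groupby("enumerator")["missing_rate"].mean()
flagged_enumerators = by_enum[by_enum > 0.10].index.tolist()

print(interviews[interviews["too_short"]])
print("Enumerators to follow up:", flagged_enumerators)
```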

Who’s your audience?

When the impact estimation results are out, building an effective narrative on the impact (or lack thereof) observed is critical to ensuring that all evidence is used and no learning is wasted. Given that QIE findings are often shrouded in technical terminology and a long list of caveats and limitations, a well-thought-through dissemination strategy for the evaluation results is particularly important. The first element to consider is the number and type of audiences that the QIE results are intended for. These can include technical specialists, who may assess the robustness of the QIE approach adopted as well as its possible contribution to the literature; international organisations, aid agencies and NGOs, which will want to understand whether their intervention achieved its goals; and, last but not least, local stakeholders, including national and regional governments that can put the learning into practice by letting the evidence guide their policy and programmatic direction.

This means that an all-encompassing, lengthy evaluation report cannot be the only channel for presenting QIE results. We favour an audience-tailored strategy built around multiple dissemination outputs: a synthesis report, presenting findings in an accessible and reader-friendly manner; a technical report, including all methodological annexes and statistical tables for those who want to dig deep into the technicalities; a pamphlet, presenting headline findings and recommendations with the use of infographics; and a series of dissemination events and workshops, where key stakeholders get a chance to engage with the results, ask questions, and discuss the best ways to put the evidence into action. A successful QIE is one that achieves impact for the populations it set out to study, as much as one that measures impact on them robustly.

We place a great deal of emphasis on the importance of evidence and evidence-based decision-making, and we have long and successful experience of designing and implementing QIEs (e.g. the Kenya Hunger Safety Net Programme (HSNP), the Benazir Income Support Programme (BISP), and the Education Quality Improvement Programme (EQUIP-T)). Our QIEs are based on a wide spectrum of approaches, and they span several thematic areas and many different countries and contexts.

About the author:

Dr Michele Binci is the Quantitative Impact Evaluation (QIE) Lead at Oxford Policy Management (OPM). Michele has a PhD in Development Economics and has worked as a consultant for several international organisations, including UN FAO, The World Bank and UNICEF, before joining OPM. At OPM, Michele is Team Leader and Technical Lead in large impact evaluations across different thematic areas and countries, with a focus on experimental and quasi-experimental designs within mixed-methods frameworks. He also chairs OPM’s Technical Community of Practice on QIE, setting the standards for OPM’s QIE work and further building the capacity of OPM staff. To get in touch please email @email