Seven possible impact/outcome evaluation design types

An Outcomes Theory Knowledge Base Topic

There are seven possible outcome/impact evaluation design types which can be used to prove that changes in high-level outcomes are attributable to (proved to be caused by) a program or intervention. This topic describes each of these evaluation design types and accompanies each description with a diagram explaining the design visually. There can be a number of sub-designs within these design types. It is claimed that this list of seven design types is exhaustive. The list can therefore be used to assess the appropriateness, feasibility and affordability of undertaking an outcome/impact evaluation of a program or intervention. This is a topic within the Outcomes Theory Knowledge Base, which is an interlinked set of topic articles.


Introduction [1]

There are seven possible design types for doing high-level outcome/impact attribution evaluation [1]. This set of design types is used within the Outcomes Theory framework and the applied Easy Outcomes system to help users assess whether or not it is appropriate, feasible and/or affordable to undertake an outcome/impact evaluation of a program or intervention. The power of this approach lies in its claim that this is an exhaustive set of design types. If a user has looked at the appropriateness, feasibility and affordability of each of these design types in regard to a particular program, they can then say definitively that it is, or is not, possible to undertake an outcome/impact evaluation of that program (subject to appropriateness, feasibility and/or affordability). High-level outcome/impact design types are one of the five building-blocks of all outcomes systems (evaluation systems, performance management systems, results-based systems etc.) which are detailed within Outcomes Theory. (See: 'Selecting impact/outcome evaluation designs: a decision-making table and checklist approach' for an article on how to select amongst these different types of designs when designing an impact evaluation, and 'Impact/outcome evaluation designs and techniques illustrated with a simple example' to see how each of the designs could be used in regard to the same simple example.)

The seven possible designs are:

1. True experimental design
2. Regression-discontinuity design
3. Time-series design
4. Constructed matched comparison group design
5. Exhaustive alternative causal explanation elimination design
6. Expert opinion summary judgement design [4]
7. Key informants summary judgement design [4].
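
The checklist use of the seven design types described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Assessment` record, the `viable_designs` helper and the example judgments for a hypothetical program are assumptions made for this illustration and are not part of Outcomes Theory or the Easy Outcomes system.

```python
from dataclasses import dataclass

# The seven design types, named as in this article.
DESIGN_TYPES = [
    "True experimental design",
    "Regression-discontinuity design",
    "Time-series design",
    "Constructed matched comparison group design",
    "Exhaustive alternative causal explanation elimination design",
    "Expert opinion summary judgment design",
    "Key informants summary judgment design",
]


@dataclass
class Assessment:
    """Hypothetical record of how one design type was judged for one program."""
    appropriate: bool
    feasible: bool
    affordable: bool

    def viable(self) -> bool:
        # A design is only usable if it passes all three criteria.
        return self.appropriate and self.feasible and self.affordable


def viable_designs(assessments):
    """Return the viable design types, insisting that all seven were assessed."""
    missing = [d for d in DESIGN_TYPES if d not in assessments]
    if missing:
        raise ValueError(f"Not yet assessed: {missing}")
    return [d for d in DESIGN_TYPES if assessments[d].viable()]


# Made-up judgments for a hypothetical program (illustration only).
example = {d: Assessment(False, False, False) for d in DESIGN_TYPES}
example["Time-series design"] = Assessment(True, True, True)
example["Key informants summary judgment design"] = Assessment(True, True, False)

options = viable_designs(example)
print(options if options else "No outcome/impact evaluation is possible")
```

Because the list is claimed to be exhaustive, an empty result would mean that no outcome/impact evaluation of the program is possible. The helper refuses to answer until every design type has been considered.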

For each design type below, a diagram is used to illustrate the design so as to assist cross-discipline clarity regarding what a particular type of design actually entails. It is hoped that over time this set of designs (amended if necessary) can provide an exhaustive and mutually exclusive set of designs such that any evaluation can unambiguously be identified as using one or more of these designs [2]. At the moment these designs are specified using language used in evaluation and policy analysis. In future they may be specified in more generic language which includes the way in which such designs are described in econometrics, policy analysis or other disciplines. Figure 1 shows the conventions used in the high-level outcome/impact evaluation design diagrams which are used to describe each design type.


Figure 1: Conventions used in high-level outcome/impact evaluation design diagrams


True experimental design

In the typical simplest case of this design, a group of units (people, schools, hospitals) is identified which is the focus of the intervention being studied. A sample is taken from this group (if there are large numbers of the particular unit on which the intervention could be used). The sample is randomly divided. One half of the units have the intervention applied to them (the intervention group) and the other half do not (control group). Changes in measurements of the high-level outcomes are compared before and after the intervention has been run. It is presumed that any significant difference (beyond what is estimated as likely to have occurred by chance), is a result of the intervention. This is because there is no reason to believe that the units in the intervention and the control group differed in any systematic way which could have created the difference, apart from receiving, or not receiving, the intervention.
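
The logic of the simplest case described above can be sketched in Python. This is a minimal sketch under stated assumptions: the data are simulated (40 hypothetical units with a built-in effect of +5), the helper names are invented for this illustration, and a permutation test is used here as one possible way of estimating what could have occurred by chance.

```python
import random
from statistics import mean


def randomly_assign(units, seed=0):
    """Randomly divide the sampled units into intervention and control halves."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(changes_a, changes_b, n_perm=2000, seed=0):
    """Share of random relabelings whose gap is at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(changes_a) - mean(changes_b))
    pooled = list(changes_a) + list(changes_b)
    n_a = len(changes_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        hits += gap >= observed
    return hits / n_perm


# Simulated data (assumption): 40 units, true intervention effect of +5.
rng = random.Random(1)
units = list(range(40))
intervention, control = randomly_assign(units)
treated = set(intervention)
before = {u: rng.gauss(50, 5) for u in units}
after = {u: before[u] + rng.gauss(0, 1) + (5 if u in treated else 0) for u in units}

# Compare the before/after change in each group.
change_i = [after[u] - before[u] for u in intervention]
change_c = [after[u] - before[u] for u in control]
effect = mean(change_i) - mean(change_c)
p_value = permutation_p_value(change_i, change_c)
print(f"estimated effect: {effect:.2f}, p-value: {p_value:.3f}")
```

Random assignment is what justifies reading the gap between the two groups as the effect of the intervention: no other systematic difference between the groups is expected.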

A variation on the true experimental design is to use what is called a 'waiting list' or 'pipeline' design. This design uses the same approach as the true experimental design; however, the intervention is only withheld from those in the control group for a limited period of time (the time they spend on the 'waiting list'). This is in contrast to a true experiment, where the control group would never get the intervention. This design is often regarded as more appropriate (because it is more ethical in that the control group does not miss out on the intervention) and more feasible (because participants and stakeholders are more likely to accept it) than a true experiment. The problem with the design is that the effect of the intervention must be measurable in the time between the intervention group getting the intervention and the control group getting it. For interventions which take a relatively long time to improve outcomes, the waiting-list/pipeline experimental design is not appropriate.

Figure 2 illustrates the true experimental design:



Figure 2: True experimental design

Regression discontinuity design [2]

A regression discontinuity design can be used where units can be ranked in order based on measurement of a high-level outcome before any intervention takes place, for instance reading level for students or crime clearance rate for a police district. A sub-set of the units, those below a cut-off point on the outcome measurement, is then given the intervention. After the intervention has taken place, if it is successful, there should be a clear improvement in those subject to the intervention but no similar amount of improvement amongst those units above the cut-off point (which did not receive the intervention). This design is more ethically acceptable in a case where there are limited resources for piloting an intervention because (in contrast to a true experiment) the intervention resources are being allocated to those units with the greatest need.
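
One common way of analysing such a design is to fit a separate line on each side of the cut-off point and measure the gap between the two lines at the cut-off. The sketch below does this with simulated reading scores. The data, the cut-off of 40, the built-in effect of +8 and the helper names are all assumptions made for this illustration, and a straight-line fit is only one possible modelling choice.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def jump_at_cutoff(pre, post, cutoff):
    """Gap between the two fitted lines at the cut-off (below minus above)."""
    below = [(x, y) for x, y in zip(pre, post) if x < cutoff]
    above = [(x, y) for x, y in zip(pre, post) if x >= cutoff]
    slope_b, icpt_b = fit_line(*zip(*below))
    slope_a, icpt_a = fit_line(*zip(*above))
    return (slope_b * cutoff + icpt_b) - (slope_a * cutoff + icpt_a)


# Simulated data (assumption): 400 students, and only those scoring below
# the cut-off receive the intervention, which adds 8 points on average.
rng = random.Random(2)
CUTOFF = 40
pre = [rng.uniform(0, 100) for _ in range(400)]
post = [x + 2 + rng.gauss(0, 1) + (8 if x < CUTOFF else 0) for x in pre]

jump = jump_at_cutoff(pre, post, CUTOFF)
print(f"estimated jump at the cut-off: {jump:.2f}")
```

If the intervention had no effect, the two fitted lines would be expected to meet at the cut-off point and the estimated jump would be close to zero.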

Figure 3 illustrates this design:



Figure 3: Regression discontinuity design

Time series design

A time series design uses the fact that a sufficiently long series of measures has been taken on a high-level outcome. An intervention is then introduced (or has been introduced, in a retrospective analysis) and, if the intervention has had an effect, a clear shift in the level of the high-level outcome measurements should be observable at the point in time when the intervention occurred.
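
A simple way of looking for such a shift is to fit the trend in the measurements taken before the intervention, project that trend forward, and compare it with what was actually observed afterwards. The sketch below uses a simulated monthly series. The data, the built-in shift of -10 and the helper names are assumptions made for this illustration, and a linear trend is only one possible model of the pre-intervention series.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def level_shift(series, start):
    """Mean gap between observed values and the projected pre-intervention trend."""
    times = list(range(len(series)))
    slope, intercept = fit_line(times[:start], series[:start])
    gaps = [series[t] - (slope * t + intercept) for t in times[start:]]
    return mean(gaps)


# Simulated data (assumption): 24 monthly measures before the intervention
# and 12 after it, with a true shift of -10 when the intervention starts.
rng = random.Random(3)
START = 24
series = [
    100 + 0.5 * t + rng.gauss(0, 0.5) + (-10 if t >= START else 0)
    for t in range(36)
]

shift = level_shift(series, START)
print(f"estimated shift in level: {shift:.2f}")
```

The longer the series before the intervention, the more reliably the projected trend can stand in for what would have happened without the intervention.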

Figure 4 illustrates this design:

Figure 4: Interrupted time-series design

Constructed matched comparison group design

In this design, a naturally occurring group is located which is as similar as possible to the group receiving the intervention, apart from the fact that it is not receiving the intervention. For instance, this could be different administrative units, towns or countries which do not receive the intervention. In a somewhat different version of this design (which employs the same underlying logic), estimates are made of what happens on average to people with a certain set of characteristics (e.g. who have been on an unemployment benefit for four weeks). An intervention is then given to a group, and what happens to them (how long they remain on the unemployment benefit) is compared with the predicted amount of time they would have remained on the unemployment benefit had they not received the intervention. This type of design is called propensity matching. Problems arise for constructed matched comparison group designs because (in contrast to true experiments) the comparison group is more likely to differ systematically from the intervention group. There is a set of techniques for attempting to deal with this problem. They are set out in Techniques for improving constructed matched comparison group impact/outcome evaluation designs.
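
The underlying logic of matching can be sketched in Python. Note the assumptions: the sketch uses plain nearest-neighbour matching on a single characteristic (age) rather than an estimated propensity score, the data are simulated so that the intervention group is systematically younger than the comparison pool, the built-in effect is -4 weeks on the benefit, and all names are invented for this illustration.

```python
import random
from statistics import mean


def nearest_match(unit, pool, keys):
    """Return the pool member closest to `unit` on the listed characteristics."""
    return min(pool, key=lambda other: sum((unit[k] - other[k]) ** 2 for k in keys))


def matched_effect(treated, pool, keys, outcome):
    """Average outcome gap between each treated unit and its nearest match."""
    return mean(u[outcome] - nearest_match(u, pool, keys)[outcome] for u in treated)


rng = random.Random(4)


def make_unit(age, treated):
    # Simulated outcome (assumption): weeks on benefit rise with age, and the
    # intervention shortens the stay by 4 weeks on average.
    weeks = 0.5 * age + rng.gauss(0, 1) + (-4 if treated else 0)
    return {"age": age, "weeks": weeks}


# The intervention group is deliberately younger than the comparison pool.
treated = [make_unit(rng.uniform(20, 40), True) for _ in range(60)]
pool = [make_unit(rng.uniform(20, 60), False) for _ in range(300)]

naive = mean(u["weeks"] for u in treated) - mean(u["weeks"] for u in pool)
matched = matched_effect(treated, pool, ["age"], "weeks")
print(f"naive difference: {naive:.2f}, matched difference: {matched:.2f}")
```

The naive comparison mixes the effect of the intervention with the age difference between the groups. Matching removes only the differences on the characteristics that were measured and matched on, which is why this design remains more vulnerable than a true experiment.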

Figure 5 illustrates the general case of constructed matched comparison group designs. 



Figure 5: Constructed matched comparison group

Exhaustive alternative causal explanation elimination design

The exhaustive alternative causal explanation elimination design proceeds by examining all of the possible alternative hypothetical outcomes hierarchies that may lie behind the changes observed in high-level outcome measurement. It can use a range of techniques, all directed at identifying and excluding alternative explanations to the intervention. Sometimes this is described as a more “forensic-type” method, in contrast to the experimental approaches used above. Figure 6 illustrates this design.

 


Figure 6: Exhaustive alternative causal explanation elimination design


Expert opinion summary judgment design [3]

In this design, an expert is asked to give their summary judgment opinion regarding whether high-level outcomes are attributable to an intervention. They are expected to use whatever data gathering and analysis methods they normally use in their work in the area and to draw on their previous knowledge in dealing with similar instances.

Figure 7 illustrates this design.



Figure 7: Expert opinion judgment design

Key informants opinion summary judgment design [3]

In this design, key informants (people who have experience of the program or significant parts of the program) are asked to give their summary judgment opinion as to whether changes in high-level outcomes are attributable to the intervention. They are expected to use whatever data gathering and analysis methods they normally use in their day-to-day work and to draw on their previous knowledge in dealing with similar instances. These judgments are then summarized, analyzed and brought together as a set of findings about the outcomes of the program. [3][4]

Figure 8 illustrates this design.

Figure 8: Key informants' judgment design



Please comment on this article

This article is based on the developing area of outcomes theory, which is still at a relatively early stage of development. Please critique any of the arguments laid out in this article so they can be improved through critical examination and reflection.

Citing this article

Duignan, P. (2005-2009). Seven possible impact/outcome evaluation design types. Outcomes Theory Knowledge Base article No. 209. (http://knol.google.com/k/paul-duignan-phd/seven-possible-outcomeimpact-evaluation/2m7zd68aaz774/10).

[If you are reading this as a PDF or printed copy, the web page version may have been updated.]


[1] This set of outcome/impact design types is an interim set. It may be subject to change in the future if it is established that there is a design type which has not been included within the list. Comments on whether this list is exhaustive can be posted below.

[2] Regression discontinuity design could be regarded as being a type of Constructed Matched Comparison Group Design. However, it has been separately listed here because there are some stakeholders who separate true experiments and regression discontinuity out from other designs as providing a more robust estimate of effect. This framework does not take a position on this issue; it simply allows those wishing to make that claim to make it.

[3] The first five of these designs are based on the thinking of the international evaluation expert Michael Scriven on possible ways of establishing causality in evaluation. The author has added the final two as some stakeholders in some situations regard these as providing sufficient evidence of causality for them to act upon. Whether or not these designs are accepted by a particular community of users of an outcomes system is up to that community of users. In theory it would be possible for a community of users to reject the notion that there is a particular set of whole-intervention outcomes attribution evaluation designs which provide more robust outcome attribution than other types of evaluation (often known as formative or process evaluation). Some of those who adopt a post-modern, relativist, interpretivist, constructivist or some other theory of science may want to do so. Outcomes theory only seeks that such communities of users make an explicit decision about their rejection so that they can be clear about what is known, what is not known and what is feasible and affordable to know about a particular outcomes system.

[4] A number of stakeholders (called communities of users in outcomes theory) believe that the last two designs would not usually be expected to establish causality as robustly as the other listed designs. However these designs are frequently used by some communities of users and therefore deserve a place in a full typology of whole-intervention outcome attribution evaluation designs; in particular circumstances they are feasible, timely, affordable and accepted by stakeholders as better than having no whole-intervention high-level outcome attribution information.  Even though they are often more feasible, timely and affordable than the other five designs, decision-makers have to consider on a case by case basis whether these designs can actually provide any coherent information about attribution or whether they will just end up being examples of pseudo-outcomes studies. Pseudo-outcomes studies are ones which do not contribute any sound information about attribution to a particular intervention but merely record that outcomes improved over the time period that the intervention was running.

V1-2 2005-2009

[Outcomes Theory Article #209]

References

  1. Some of this work was developed when the author was the 2005 New Zealand Fulbright Senior Scholar working at the Urban Institute in Washington, D.C.

Comments

evaluation battles for supremacy

I don't think you need to be a relativist, interpretivist, constructivist or post-modern to be skeptical of blanket claims of superiority on outcomes attribution for any particular evaluation design. You would just need to be robustly post-Popperian and humble about what we think we know, let alone what we don't know, in any period of time.

I think it would be a mistake to attempt to rank order these outcome attribution techniques. I have big doubts about the blanket claims of 'robustness' of either quantitative or qualitative methodologies for determining causal attribution - particularly since causality is an unobservable. I think both (or all seven) methodologies are far too dependent on what has to be taken for granted for them to make sense and to seem credible. Unfortunately, in their advocate forms any method can be conducted in ways that are conceited and unquestioning about their taken-for-granteds. And to me, the most important or valuable aspect of 'scientificity' is being unconceited and genuinely seeking the truth* by being open to just this sort of questioning.
*even constructivists make ontological claims about the truth.

Last edited Aug 24, 2009 4:08 PM

fair comparisons

In the interests of academic fairness, I think you need to add to the last two designs the techniques that are used in qualitative designs to confirm or validate 'findings'. I don't think it is fair or adequate to say they amount to 'whatever data gathering and analysing methods they normally use in their day to day work...'.
I am no expert but I do know that one of the most commonly known techniques is triangulation, usually understood as assessing validity through convergence (but there is at least one constructivist alternative use of triangulation that rejects the singular truth presumption). This was a concept/practice first used by social and behavioural scientists back in the 1960s in the field of measurement (e.g. it's the principle of validation behind inter-rater reliability and behind multiple items being used in one survey to measure a concept/practice). But since the 1970s the concept has been adopted in qualitative research and was expanded by Denzin to include multiple methods, multiple evaluators, multiple theories, multiple subjects/perspectives.

Last edited Aug 27, 2009 2:34 PM
Paul Duignan, PhD
Outcomes and Evaluation Specialist
All Rights Reserved.
Version: 53
Last edited: Aug 24, 2009 12:42 PM.
