Practical issues in randomized experiments in education

Education has seen a huge growth in randomized trials over the past ten years, but what have we learned? Robert Slavin gives his personal experience.

Within just the past ten years, randomized experiments have blossomed in education in many parts of the world. In the U.S., the Institute of Education Sciences and Investing in Innovation, both in the Department of Education, as well as other funders, have supported more than 300 randomized experiments in preschool to grade 12 education (age 4–18). In the U.K., the Education Endowment Foundation has led the way in supporting dozens of experiments. The World Bank, Inter-American Development Bank, and other funders have supported many randomized experiments in low- and middle-income countries.

What we know
● Randomized experiments are expensive and difficult, but they can be done.
● Design the experiment to detect outcomes of importance.
● Recruit carefully.
● Implement thoroughly to give the intervention a chance to work.

Just a few years ago, skeptics dismissed randomized experiments in education on the basis that they were too expensive and too difficult to do. They are expensive and difficult, but it is now clear that they are possible. Also, much has been learned in the doing of these experiments about how to do them well in real-life schools. The purpose of this article is to share my personal experience from doing, reviewing, and reading many randomized experiments in education.

Get power

In designing educational experiments, Rule #1 is to design the experiment to detect outcomes of importance. For example, if you consider an effect size of +0.20 to be worthwhile and possible in the area you are studying, you need a sample size large enough to give you at least 80% power, that is, at least an 80% chance of detecting an effect size of +0.20 if it really exists.
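To make the +0.20 example concrete, here is a minimal sketch of the usual normal-approximation sample size calculation for the simplest case: students randomly assigned as individuals to two equal groups, with no covariates. The function name, the two-sided 5% significance level, and the equal-groups assumption are mine, not part of the article; dedicated tools such as Optimal Design handle more realistic designs.

```python
import math
from statistics import NormalDist


def students_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate students needed in EACH group for a two-group comparison.

    Uses the normal approximation for a two-sided test on a standardized
    mean difference, assuming individual random assignment, equal group
    sizes, and no covariates (all illustrative assumptions).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


print(students_per_group(0.20))  # 393
```

Under these assumptions, an effect size of +0.20 calls for roughly 400 students per group. Smaller target effects push the number up quickly, because the required sample grows with the inverse square of the effect size.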

I’m not going into the technicalities (see the Optimal Design software listed in Further Reading), but here are a few general issues:

You have to choose between a cluster randomized design where schools or teachers are assigned, or a student-level design. Student-level designs give you a lot more power per pound or dollar, and they can work when your treatment is addressed to individual pupils, as in one-to-one or small-group tutoring. However, when the unit of change is the school or department or teacher, it is not practical to divide within these units.
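One way to see why student-level designs buy more power for the same money is the standard "design effect" for equal-sized clusters, 1 + (m - 1) × ICC, where m is the number of students per cluster and ICC is the intraclass correlation (the share of outcome variance that lies between clusters). The sketch below is illustrative only: the class size and the ICC value are assumptions chosen for the example, not figures from any study, and the formula ignores covariates.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from assigning whole clusters instead of students."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_students, cluster_size, icc):
    """Number of individually assigned students giving similar precision."""
    return n_students / design_effect(cluster_size, icc)


# Hypothetical trial: 1,000 students in classes of 25, ICC assumed to be 0.20.
print(design_effect(25, 0.20))
print(effective_sample_size(1000, 25, 0.20))
```

With these assumed values the design effect is about 5.8, so 1,000 students assigned by class carry about as much information as 170 or so students assigned individually.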

If you have strong covariates, such as pretests that correlate well with post-tests, you may get by with 40 schools in a school-level trial, or 40 teachers/classes, to detect an effect size of +0.20. However, if you can afford it, you may wish to recruit 50 schools/teachers or more. This enables you to detect smaller effect sizes (e.g., +0.15), and to otherwise overcome the slings and arrows of outrageous fortune.
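The role of covariates and of the number of schools can be sketched with a commonly used approximation for the minimum detectable effect size (MDES) in a two-level cluster-randomized trial. This is not the Optimal Design software's own calculation. Every input below is an assumption picked for illustration: 60 students per school, an ICC of 0.15, a pretest explaining 80% of the between-school variance and 50% of the within-school variance, and a normal approximation that slightly understates the MDES when the number of schools is small.

```python
import math
from statistics import NormalDist


def mdes_cluster_trial(n_clusters, students_per_cluster, icc,
                       r2_cluster=0.0, r2_student=0.0,
                       alpha=0.05, power=0.80, share_treated=0.5):
    """Approximate MDES when whole clusters (e.g., schools) are randomized.

    r2_cluster and r2_student are the proportions of between-cluster and
    within-cluster variance explained by covariates such as a pretest.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)  # about 2.8
    denom = share_treated * (1 - share_treated) * n_clusters
    between = icc * (1 - r2_cluster) / denom
    within = (1 - icc) * (1 - r2_student) / (denom * students_per_cluster)
    return multiplier * math.sqrt(between + within)


# Illustrative inputs only; substitute values from your own setting.
for schools in (40, 50):
    mdes = mdes_cluster_trial(schools, 60, icc=0.15,
                              r2_cluster=0.80, r2_student=0.50)
    print(schools, round(mdes, 3))
```

With these assumed inputs, 40 schools give an MDES of roughly 0.17 and 50 schools roughly 0.15. Setting both R-squared values to zero roughly doubles the MDES, which is why a strong pretest matters so much in school-level trials.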

Of course, you need not recruit all the students in 50 schools. You may study just Year 4 (3rd Grade) students, or students taking biology, but whatever group you focus on, you will usually need to include the entire group, to avoid adding bias in selecting students within each school.


Recruiting schools and teachers to participate in a randomized study is a delicate art. In general, you are going to schools and giving them a 50/50 chance of receiving an attractive intervention. You need to make a compelling case for why the treatment might work, but you cannot guarantee it will work (if you could, you would not need the experiment). Also, half of the schools that agree to participate will be in the control group, so you can’t oversell the treatment.

People hate to be randomly assigned. Every time you try to recruit a school to a randomized trial, someone will object that there is a 50% chance they will end up in the control group. “That’s unfair,” they’ll say. My response is this: “You’re not using Program X now, so you’re already in the control group. If you like Program X, you can purchase it [assuming this is true]. However, we are offering you a 50% chance to get it for free!”

Incentives for the control group

One important element in recruitment is the possibility of offering incentives to the control group. The best incentive can be the opportunity to receive the program for free at the end of the study. Such a control group is called a “delayed treatment control group.” However, this may be too expensive, and it is not very appealing to the control group if the study is due to last two or three years or more. Sometimes you can deal with a long delay by starting the program a year later in grades no longer in the study. For example, if you were doing a two-year evaluation of a mathematics program following children from grades 4 to 5, you might let the control group start one year later in grade 4.

If delayed treatment is not feasible, you might offer cash payments or in-kind services to give the control group an incentive to continue providing access and data throughout the study. The payment might be given each year as schools turn over their data, to link it to this key activity.

Ensure quality implementation

Here’s bad news: Most randomized experiments find no significant differences on study outcomes. Usually, this is because implementation was not what it should have been.

Do not skimp on implementation support. I’m not suggesting that developers or researchers provide supports they could never provide in dissemination, such as putting a graduate student in every class every day (I’ve seen it happen). But do make sure there is enough training, on-site coaching, and feedback to school staff to give the program a chance to work. If effects are positive, the developer will need to insist on providing the same supports provided in the study, and this may make the program expensive. However, skimping on support and thereby failing to show positive impacts can kill the program.

In addition to increasing the chances of success, providing sufficient on-site coaching also gives the project excellent feedback on what is happening, and excellent insight at the end on what worked and what didn’t work, and why.


Randomized experiments are here to stay in education. The question is how to make sure they deliver on their great promise. With adequate power, skillful recruitment, and high-quality implementation, a promising program is likely to show its full impact, and whether its effects are positive or not, the field will learn from the experience.

About the author

Robert Slavin is Director of the Center for Research and Reform in Education at Johns Hopkins University, and a Professor at the Institute for Effective Education at the University of York. He is Chairman of the Success for All Foundation, which develops, researches, and disseminates educational programs to increase achievement, particularly for disadvantaged students.

Further reading

Baron J (2003), How to Assess Whether an Educational Intervention has been “Proven Effective” In Rigorous Research. Washington, DC: Coalition for Evidence-Based Policy.

Best Evidence Encyclopedia (2015).

Raudenbush SW et al (2011), Optimal Design Software.

Raudenbush SW and Bryk AS (2002), Hierarchical Linear Models (2nd ed.). Thousand Oaks, CA: Sage.

Slavin RE (2007), Educational Research in the Age of Accountability. Boston: Allyn & Bacon.

Slavin RE (2008), What Works? Issues in Synthesizing Educational Program Evaluations. Educational Researcher, 37 (1), 5–14.

Slavin RE (2008), Evidence-based Reform in Education: What Will it Take? European Educational Research Journal, 7 (1), 124–128.

Slavin RE (2013), Overcoming the Four Barriers to Evidence-based Education. Education Week, 32 (29), 24.

Song M and Herman R (2010), Critical Issues and Common Pitfalls in Designing and Conducting Impact Studies in Education: Lessons Learned from the What Works Clearinghouse. Educational Evaluation and Policy Analysis, 32 (3), 351–371.

What Works Clearinghouse (2013). Procedures and Standards Handbook (Version 3.0). Washington, DC.


November 2015