Trials: Minimising failure, maximising success

Pam Hanley and Sarah Blower discuss the lessons from two evaluations conducted in busy schools

Conducting randomised controlled trials (RCTs) to evaluate interventions in the real-life setting of busy schools presents a difficult challenge for researchers. Here we discuss some of the pitfalls and how they can be overcome, based on two case studies.

Promoting Alternative Thinking Strategies

Birmingham City Council commissioned the Dartington Social Research Unit to evaluate the effectiveness of a school-based programme called Promoting Alternative Thinking Strategies (PATHS). This is one of a growing number of interventions designed to promote children’s social and emotional learning. It is intended to be taught by teachers two to three times per week, ideally in 20-30 minute lessons. Data were collected on children three times over two years as they progressed through Reception and Year 1 (age 4-6).

What we know
● It is difficult to disseminate null or negative research findings.
● The results of rigorous evaluation carry more weight when more schools are willing to participate in the research.
● We should still challenge whether the funding decisions of central government are evidence-based.

Recruitment of schools to the trial went really well. Fifty schools were needed to detect the effects of PATHS, and all mainstream primary schools in the city were invited to take part. They were given information about the PATHS programme and the intended RCT. There were four “roadshows” for head teachers, comprising a free breakfast and a presentation about the project. “Champions” – four or five particularly enthusiastic heads – were identified to act as advocates among the school community. These strategies appeared to work well and 64 schools signed up.

As was anticipated, retaining schools in the trial was much more challenging. Several strategies aimed at minimising drop-out were implemented:

  • All 64 schools were invited to a lunch where the randomisation was conducted live. The goal was both to bring the schools together so that they felt part of the project and to make the randomisation process transparent.
  • Half a day of supply cover was provided for each teacher to complete the research measures at each data collection point.
  • Control group schools received £1,000 each for participating, to balance out the extra resources, materials, and training that intervention schools would get.
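A live draw like the one described above amounts to a simple random allocation of schools to two arms. A minimal sketch (the school names, the fixed seed, and the equal split between arms are all illustrative assumptions, not details from the trial):

```python
import random

# Hypothetical school names standing in for the 64 recruited schools.
schools = [f"School {i:02d}" for i in range(1, 65)]

# A fixed seed means the draw can be re-run and checked afterwards,
# which supports the transparency the live event was aiming for.
rng = random.Random(42)
shuffled = schools[:]
rng.shuffle(shuffled)

# Equal allocation to the two arms (a simplifying assumption).
intervention = sorted(shuffled[:32])
control = sorted(shuffled[32:])

print(len(intervention), len(control))  # 32 32
```

In a real cluster trial the allocation would usually be stratified (for example, by school size or prior attainment) rather than a single unrestricted shuffle, but the principle of a reproducible, auditable draw is the same.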

However, eight schools withdrew immediately after randomisation. Between the first and last wave of data collection, four more schools dropped out. There were several reasons: one PATHS school withdrew because it was low-performing and needed to concentrate on coming out of special measures, and another because it was very high-performing and felt it did not need PATHS. One control school got a new head teacher who did not want to continue with the trial, and another maintained that even with supply cover it was taking too much time to complete the measures.

Data collection was very burdensome for teachers. They had to submit two questionnaires on each child in their class, data about the school context, and so on. Anecdotally, some schools found the supply cover insufficient, and for the second and third data collection points a teacher-level incentive (a lottery draw for book vouchers) was introduced. The weekly implementation logs were radically shortened and simplified, increasing the response rate from 27% to 78%. In hindsight, the £1,000 to control schools would have been more effective if paid at the end of the project to keep them incentivised, rather than at the start as “compensation” for not being in the intervention group.

Thinking, Doing, Talking Science

The Education Endowment Foundation (EEF) funded the Institute for Effective Education to evaluate Thinking, Doing, Talking Science (TDTS). This programme, developed by Oxford Brookes University and Science Oxford, is designed to help generalist primary school teachers plan challenging science lessons that will enhance children’s engagement and cognitive skills. The study focused on Year 5 (9-10 years old) and at least two teachers per school received five day-long training sessions over one academic year.

The evaluation approach was a cluster RCT involving 42 English schools and more than 1,200 pupils. Pupils were pre-tested in Year 4 (8-9 years old), after which schools were randomly assigned to receive the intervention or business-as-usual control. The pupils were re-tested 18 months later. It was a wait-list trial, with all the control schools being offered TDTS in September 2014.

In the absence of a nationally recognised, standardised science test, the pretests and post-tests were specially constructed measures drawn from existing standardised assessments spanning a range of science topics. They were age-appropriate and curriculum-relevant, and covered both conceptual understanding and factual knowledge. The tests were 40 minutes long. After the intervention, pupils also completed a short, tick-box attitude survey.

Recruitment proved much more difficult than the developers had anticipated. The target had been set at 46 schools, to allow for 13% attrition, which was around the rate then experienced across EEF studies. However, only 42 schools signed up to the study. To achieve this, the developers spent “many, many hours” on the telephone to schools. The study was confined to the local administrative region (Oxfordshire), which had both positive points (personal contacts could be approached) and drawbacks (a limited population of schools was available to draw from). The developers had a sound understanding of the complexities of the RCT approach and knew it was essential to fully apprise schools of their responsibilities in signing up to the project, working closely with the evaluation team to achieve this. This may have helped retain schools for the duration of the study, with only one (a control school) dropping out.
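The attrition allowance works by inflating the sample you need at analysis to the sample you must recruit. A minimal sketch of the arithmetic (the 40-school analysable figure is our back-calculation from the stated 46-school target and 13% attrition rate, not a number given in the study):

```python
import math

def recruitment_target(schools_needed: int, expected_attrition: float) -> int:
    """Inflate the analysable sample to allow for expected school drop-out.

    schools_needed     -- clusters required at analysis for adequate power
    expected_attrition -- anticipated proportion of clusters lost (0-1)
    """
    return math.ceil(schools_needed / (1 - expected_attrition))

# Retaining roughly 40 schools for analysis, with ~13% attrition
# (the rate then seen across EEF studies), implies recruiting about 46:
print(recruitment_target(40, 0.13))  # 46
```

Note this only protects statistical power; it does nothing about the bias that differential drop-out between arms can introduce, which is why the retention strategies described below still matter.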

Attrition from the programme was also reduced by involving two teachers per school for mutual support and by holding a launch day in each school. Time lost to teaching (including extra planning time) was reimbursed, and schools were given an equipment grant. Strong relationships were built between developers and schools.

Reasons for minimal attrition from the evaluation include:

  • A recruitment conference attended by the evaluation team as well as the developers, to clearly explain the RCT design, requirements, and importance of retention. This helped schools realise that the control and intervention arms are equally valuable and both are essential to the trial;
  • Randomisation after the baseline test, ensuring schools were already involved in the study before they learned about their allocation, perhaps making them less likely to leave at that stage;
  • A wait-list design so all schools got the programme eventually, and communication with controls about implementation plans across the last few months of the evaluation;
  • Clear and timely communication, with adequate warning of timing and duration of tests; and
  • A low burden on the schools: the tests took only 40 minutes and the attitude questionnaire was very short.

A key factor, the importance of which should not be under-estimated in such studies, was the excellent relationship between developers and evaluators. This meant any issues were communicated quickly, allowing prompt action, and that confusion in the schools from being contacted by two agencies (the developer and the evaluator) was virtually eliminated.
These two examples of RCTs involving schools highlight some of the challenges that can threaten the validity of a trial. These include a failure to recruit, poor implementation fidelity, and low retention rate. In the PATHS trial, successful recruitment did not translate into high retention, whereas TDTS struggled to recruit, yet few participants were lost during the trial. What made the difference: was it something inherent to the programme or to the evaluation? Were TDTS schools more likely to stay engaged because teachers enjoyed the training, or because the evaluation was less burdensome? Conducting a trial is a complex business involving many entwined elements. Full disclosure of information and excellent communication are vital, but so is the ability to remain flexible enough to respond to unexpected challenges.

About the authors

Pam Hanley is a Research Fellow in the Institute for Effective Education at the University of York. Her interests include science education, mindfulness-based approaches in schools and the use of mixed methods in research.

Sarah Blower is a Research Associate in the Institute for Effective Education at the University of York. Her research interests are in the design and evaluation of complex interventions aiming to improve children’s social, emotional and behavioural difficulties, particularly in the areas of child stress, parenting and family support. Before joining the IEE, Sarah worked as a researcher at the Dartington Social Research Unit.



November 2015