New Long Covid Exercise-and-Therapy Study Claims Success Despite Clinically Insignificant Findings

By David Tuller, DrPH

A new study of an online group physical and psychological rehabilitation program for Long Covid confirms once again that people given an intervention purporting to help them are more likely to tell investigators that they feel better than those given nothing of the kind. People, this is not a surprising result! It certainly wasn’t necessary for the UK’s National Institute for Health Research to spend £1,200,000 to find out the answer. The real news here is that, even though the study was designed in a way guaranteed to generate an unknown amount of bias, the findings for the primary outcome were so minuscule that they apparently failed to reach the level believed to represent clinical significance.

In other words, don’t believe the abstract when it declares that the intervention was “clinically effective.” The data suggest otherwise–as the investigators themselves acknowledged in the bowels of the discussion section. No peer reviewer should have allowed a claim of clinical effectiveness to pass unchallenged.

The study–called “Clinical effectiveness of an online supervised group physical and mental health rehabilitation programme for adults with post-covid-19 condition (REGAIN study): multicentre randomised controlled trial”–was published by BMJ in early February. It was led by researchers from the University of Warwick in Coventry, England.

Dozens of media reports highlighted the findings as if they were meaningful. The study was also touted by some of the usual promoters of dodgy psychosocial research in this domain. That included Harvard pulmonologist Adam Gaffney, who has defended the PACE trial, promoted “psychosocial strain” as a leading cause of prolonged disability after COVID-19, and referred to this new study on X (formerly known as Twitter) as a “notable development on the Long Covid front.”

Professor Alan Carson, a leader in the field of functional neurological disorder (FND), also weighed in. He suggested that such findings could eventually force the UK’s National Institute for Health and Care Excellence to reconsider its 2021 ME/CFS guideline, which recommended against graded exercise therapy. “There comes a point where @NICEComms is going to have to recognise the harm their anti-scientific guidance is doing,” tweeted Professor Carson. (Given that Professor Carson, along with colleagues, has spent years misrepresenting FND prevalence findings from a key study, it is hard to take seriously anything he says about “anti-scientific” behavior.)

The study was unblinded and relied solely on subjective, self-reported outcomes—a design that inevitably generates an unknown amount of bias. It is, of course, hard if not impossible to blind therapeutic interventions, and there is no biomarker for Long Covid. Still, the investigators could have selected any number of objective measures of physical function—actigraphy, the six-minute walking test, a step-stool fitness test, etc. They chose not to.

Beyond that, the study participants had all been hospitalized during acute bouts of COVID-19, unlike most currently suffering from Long Covid. Patients hospitalized for COVID-19 often report a similar set of sequelae but do not necessarily experience post-exertional malaise (PEM)—the defining trait of ME/CFS and a key characteristic in many cases of Long Covid. Extrapolating from this study to non-hospitalized Long Covid patients should only be done with extreme caution, if at all.

Moreover, as I wrote in two posts–here and here–after the study was first announced two years ago, the authors appeared to have minimal understanding or awareness of the significance of PEM; it was not mentioned in the protocol or the patient information sheet. After pushback from the patient community, the investigators professed concern about PEM and promised to monitor participants for such setbacks. It’s anyone’s guess whether they did so effectively or even understood the difference between PEM and garden-variety fatigue.


Primary outcome findings were not clinically significant

The study included 585 participants, with 298 receiving the intervention and 287 assigned to “usual care.” Here’s how the study described the two conditions: “Best practice usual care was a single online session of advice and support with a trained practitioner. The REGAIN intervention was delivered online over eight weeks and consisted of weekly home based, live, supervised, group exercise and psychological support sessions.” The full intervention included more than a dozen sessions in total. It is self-evident, or should be, that this is a wildly unbalanced comparison. Participants who receive an eight-week intervention they are told might help them are more likely to offer positive responses when asked afterwards how they feel, for any number of reasons, than participants who have received virtually nothing. Absent any objective measurements, the reported results are essentially uninterpretable and meaningless.

Furthermore, the investigators made no effort to control for the time and attention and care received by participants in the intervention arm. Even beyond the issue of bias, therefore, it would be impossible to know whether any reported changes could be attributed to any of the active ingredients in the intervention rather than the fact of receiving time and attention and care. Perhaps an eight-week online Harry Potter book club or a French cooking class on making the perfect crepes Suzette would have produced the same results. Who knows?

In any event, adherence to the program was pretty poor. Fewer than half of participants in the intervention arm–47%–were assessed to have fully complied, even though the metric established by the investigators for full compliance did not actually require attendance at all sessions. And 20% of these participants were lost to follow-up; in other words, they did not provide data for the three-month primary outcome endpoint, compared to only 14% in the comparison group. Why the poor adherence and the higher drop-out rate in the intervention group? The study does not even attempt to provide an explanation.

The primary outcome was a health-related quality of life measure called PROPr, at three months. This measure is calculated from several subscales, such as depression, fatigue and cognitive function; the PROPr scale ranges from -0.02 to 1, with 1 being perfect health. Secondary outcomes were the PROPr scores at six and twelve months, plus outcomes for the PROPr subscales and other subjective measures at all time points.

At three months, both groups had shown improvement but their average PROPr scores remained at the far low end of the scale, representing continuing illness. At both three months and twelve months, the difference between the means of the PROPr scores in the two arms was 0.03. At the interim six-month time-point, the difference fell to only 0.02–a drop that suggests these findings are fragile and far from a robust endorsement of the intervention. (Not surprisingly, the secondary measures trended positive, like the PROPr score itself, but modest changes on subjective scales in an unblinded trial cannot be taken at face value.)

According to the paper’s description of the PROPr scale, “a difference of 0.03 to 0.05 is considered to be clinically important.” However, as a smart observer on the Science For ME forum noted, the investigators themselves acknowledged the questionable nature of this statement in the following passage from the discussion section:

“Research completed since we started this study suggests a minimally important difference of 0.04 on the PROPr score between groups. Our observed differences of 0.03 (95% confidence interval 0.01 to 0.05) at three months and 0.03 (0.01 to 0.06) at 12 months are smaller than this suggestion. However, the complier average causal effect analysis showed a larger effect of 0.05 (0.01 to 0.09) at three months and 0.06 (0.01 to 0.10) at 12 months, suggesting that the true effect, in those fully complying with the intervention, might exceed this threshold.”

Ok, then. The “complier average causal effect analysis” involves assessing the results for those who were most adherent to the treatment. So what they’re saying here is that it’s just fine to claim their findings demonstrated clinical effectiveness and/or met the threshold for clinical significance because the results for those who were most compliant with the intervention exceeded the threshold of 0.04. That’s a bogus argument. Unfortunately for the investigators, you can’t claim clinical effectiveness or significance as a global finding if that’s only true for a subset of your sample and not borne out by the main analysis.
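It is worth seeing why restricting attention to compliers mechanically inflates the effect. Under the simplest textbook version of this analysis (one-sided noncompliance, the so-called Bloom estimator), the complier average causal effect is just the intention-to-treat effect divided by the compliance rate–so any compliance rate below 100% produces a larger number by construction. The trial’s actual statistical model was surely more elaborate, so treat the following as an illustrative sketch using the paper’s reported figures, not a reconstruction of its analysis:

```python
def cace_one_sided(itt_effect: float, compliance_rate: float) -> float:
    """Complier average causal effect under one-sided noncompliance
    (Bloom estimator): the intention-to-treat effect divided by the
    proportion of the intervention arm that actually complied."""
    if not 0 < compliance_rate <= 1:
        raise ValueError("compliance_rate must be in (0, 1]")
    return itt_effect / compliance_rate

# Figures reported in the paper, used here purely for illustration:
itt = 0.03          # between-group PROPr difference at three months
compliance = 0.47   # share of the intervention arm judged fully compliant

print(round(cace_one_sided(itt, compliance), 3))  # prints 0.064
```

Dividing a fixed 0.03 difference by a 47% compliance rate yields roughly 0.06–in the same ballpark as the complier estimates the paper reports. The point is structural: scaling up a sub-threshold main result by a compliance fraction below one will always produce a bigger number, so crossing the 0.04 mark in the complier analysis does nothing to rescue the primary finding.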

In this domain, it is important to actually review study details–and not just take the word of pompous, self-important psychosocial and FND influencers like Dr Gaffney and Professor Carson. Better, in these instances, to pay attention to folks like David Putrino, a neuroscientist and physical therapist (trained in Australia but not licensed to practice in the US) who runs a rehabilitation center at Mt Sinai Health System in New York. Here’s his typically impassioned X thread about this latest mess.