By David Tuller, DrPH
A recent study from JAMA Network Open, called “Resistance Exercise Therapy After COVID-19 Infection: A Randomized Clinical Trial,” demonstrates some of the flaws that so often mar papers in this field of research. The trial’s reported results do not warrant the optimistic conclusion that the intervention “may be a generalizable therapy for individuals with persisting physical symptoms after COVID-19 infection.”
Not surprisingly, the study received positive media attention. An article disseminated on Yahoo, headlined “Long-COVID Patients Should Focus on This Training,” included the following quote from the study’s lead investigator and primary author: “Our study shows the benefits of strength training for recovery after COVID-19 and suggests that people suffering from persistent symptoms after a COVID-19 infection could benefit from this type of training.”
Not so fast! A core problem here is one we have seen before: Statistically significant results are being touted as evidence of benefits even when they fall below the threshold deemed to be clinically significant.
The trial was conducted in Scotland by a team of investigators led by Colin Berry, a professor of cardiology at the University of Glasgow. Participants included 233 patients with confirmed cases of COVID-19, falling into three categories: those who had been hospitalized and discharged but had continued to experience symptoms for at least four weeks; those who had not been hospitalized but continued to experience symptoms for at least four weeks; and hospitalized patients convalescing from their acute disease.
This mish-mash of patients seems rather odd. The cases in these three categories would likely differ significantly from each other. Tossing them all together in one bigger study does not seem like the best way to achieve the most interpretable results for each separate group. Whatever.
All participants received care as usual. Half were randomized to also receive a 12-week resistance exercise training program, tailored to their individual levels of disability. The designated primary outcome was the Incremental Shuttle Walk Test (ISWT), a commonly used measure for assessing cardiovascular fitness and exercise capacity, at three months after randomization—essentially, at the end of the training program. The trial also included a host of secondary indicators, some self-reported and others objectively measured.
It is important to note that no one reasonably objects to exercise interventions for Long COVID patients unless they suffer from post-exertional malaise (PEM), in which case these therapeutic approaches are contra-indicated. It is widely understood that exercise is generally good for people! Those recovering from acute illness are likely to gain muscle strength from resistance exercise, unless, again, PEM renders the approach contra-indicated.
In any event, the participants in this trial do not seem to have been assessed at baseline for PEM, although data are provided for its presence at the end point. It would have been interesting to see if there were any changes from start to finish.
**********
Reported benefits lower than the minimal clinically important difference
But here’s the main problem for the investigators and their primary outcome. Per the statistical analysis plan outlined in their protocol, the ISWT’s threshold for clinical significance—known as the minimal clinically important difference (MCID)—is 46 meters. The investigators cited an authoritative 2022 paper from the European Respiratory Society (ERS)—“Use of exercise testing in the evaluation of interventional efficacy: an official ERS statement”—for this MCID. They used it to inform their power calculations for determining the needed sample size. (MCID values for a given measure can vary per the methodology used, the population studied, and other factors.)
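To see how an MCID feeds into a power calculation, here is a minimal sketch of the standard two-sample formula, using the 46-meter MCID as the target difference. The standard deviation of 105 meters, the 5% significance level, and the 80% power are illustrative assumptions for this example, not figures taken from the trial’s protocol.

```python
import math
from statistics import NormalDist  # standard-library normal distribution

def sample_size_per_group(mcid, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample t-test,
    using the usual normal-approximation formula:
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / mcid)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. ~0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / mcid) ** 2
    return math.ceil(n)

# Hypothetical inputs: MCID of 46 m, assumed SD of 105 m
print(sample_size_per_group(mcid=46, sd=105))
```

The point of the sketch is simply that the MCID sits in the denominator: the smaller the difference a trial is powered to detect, the larger the required sample. A trial powered around a 46-meter difference is, by its own design, treating anything smaller as not clinically meaningful.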
In the trial, participants in the intervention arm performed better than the control group by an average of 36.5 meters. Let’s acknowledge the obvious: 36.5 meters is quite a bit lower than the MCID of 46 meters that the investigators themselves referenced in their statistical analysis plan. Oops!
Researchers with a robust sense of integrity would have reported such a salient detail, however disappointing or embarrassing or contrary to their expectations. Yet the JAMA Network Open paper made no mention of it, effectively disappearing a very revealing data point. This omission, in and of itself, serves to misrepresent the findings and is arguably a form of research misconduct. Readers deserve to know that the trial’s primary outcome did not reach its own threshold for clinical significance.
(A published comment to the article highlighted the omission and the fact that the results fell below the established MCID; in their rebuttal, the investigators simply ignored the point, in effect affirming the validity of the criticism. These issues have also been discussed on the Science for ME forum. Another published comment on the journal’s site highlighted some serious ethical concerns with the trial; the comment author posted an X thread about it.)
Interestingly, a pre-print of the article posted earlier this year did seek to navigate the MCID issue without acknowledging a problem. Rather than citing the MCID of 46 meters referenced in the protocol, the pre-print flatly declared that “the effect size exceeded the minimum clinically important difference of 35.0.” The reference for this lower MCID was a different, earlier study, from 2019, that calculated an estimate of 35 to 36.1 meters in patients with chronic obstructive pulmonary disease.
That section of the pre-print is not part of the published text. Perhaps peer reviewers raised questions about it. Perhaps the investigators themselves thought better of including it, for whatever reason, given that the 2022 paper was a much more comprehensive overview of walking tests; also, one of the two co-authors of the 2019 article was a co-author of the 2022 paper as well. In any event, as long as the investigators fail to confront the MCID issue or offer a reasonable account of their shifting positions, it is hard to take what they have written at face value.
And how about those many secondary outcomes? A few yielded positive results, but many did not. In particular, as the investigators acknowledged, “measures of physical activity, including accelerometry and patient-reported fatigue or perception of frailty, did not improve.”
Given that the trial was unblinded, responses to any self-reported or subjective measures would be infused with an unknown amount of bias. This tendency would likely be reflected in modestly positive responses from those who received the intervention–even in the absence of actual benefits.
Beyond this key factor, the investigators failed to include any attempt to adjust for multiple outcomes. This is a standard strategy to compensate for the fact that, given a great many different statistical tests, some are likely to come up as positive by chance alone. Considering the amount of testing conducted in this study, adjusting the results for multiple outcomes might have meant that some positive findings were no longer statistically significant. (Whether positive findings for these secondary outcomes were clinically significant is another question.)
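The standard adjustment the investigators skipped can be illustrated with the Holm–Bonferroni procedure, one common method for controlling false positives across many tests. The p-values below are invented for illustration; they are not from the trial.

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Sorts p-values ascending and compares the k-th smallest to
    alpha / (m - k), stopping at the first failure. Returns a list of
    booleans (True = still significant after adjustment), in the
    original order of the input.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break  # once one comparison fails, all larger p-values fail too
    return rejected

# Four hypothetical secondary-outcome p-values
pvals = [0.003, 0.04, 0.02, 0.30]
print(holm_bonferroni(pvals))  # only the smallest survives adjustment
```

With these made-up numbers, three of the four outcomes look “significant” at the conventional 0.05 threshold, but only one survives the adjustment. That is exactly the concern: without any correction, a battery of secondary tests will reliably produce some positive results by chance alone.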
In other words, don’t pay attention to any of the rosy assertions arising from this trial, or to the slanted way in which the results have been presented.