UPDATE, MAY 16: As I mentioned, the trial registration did not cite “recovery” as an outcome. However, the various study documents include a number of different statements about the status of physical activity, fatigue, and recovery as endpoints. Of four relevant documents besides the trial registration, one included the definition of recovery used in reporting the study results–three, like the trial registration, did not.
I will outline in more detail when time permits. Those too impatient to wait can access the documents here.
**********
By David Tuller, DrPH
After the debacle with the Lightning Process study, you would think that BMJ would have learned an important lesson—editors and peer-reviewers should scrutinize the background materials for the trials they publish. That’s the best way to prevent selective outcome reporting and ensure that findings are reported as described in the trial registration and/or protocol.
To recap: In 2017, Archives of Disease in Childhood, a BMJ journal, published a study of the Lightning Process as a pediatric treatment for what the investigators called chronic fatigue syndrome. The Bristol University investigators recruited more than half the participants before trial registration and swapped outcome measures midway through. Then they failed to disclose these salient details in the published paper. Although these actions appeared to meet standard definitions of research misconduct and violated BMJ’s own strict guidelines, the journal decided last year to let the reported findings stand–albeit with a 3,000-word correction notice.
Now BMJ has published another paper that significantly diverges from the metrics outlined in the study documentation—in this case, the trial registration. The study, posted last month by BMJ Paediatrics Open, is called “Cognitive-behavioural therapy combined with music therapy for chronic fatigue following Epstein-Barr virus infection in adolescents: a feasibility study.”
From March 2015, until November 2016, the investigators recruited participants in the three counties of Oslo, Akershus and Buskerud. The senior author, Vegard Bruun Wyller, is a professor at the University of Oslo’s Institute of Clinical Medicine and an adherent of the cognitive behavior therapy/graded exercise therapy (CBT/GET) treatment paradigm.
Here’s the short take. Scores for the predesignated primary outcome—mean steps per day at three months—dropped in both the intervention and comparison groups. Not only that, participants in the intervention group performed more poorly, taking on average more than 1,000 fewer steps per day. Moreover, two outcomes reported as demonstrating “tendencies” toward the positive—post-exertional malaise and recovery—were not even mentioned in the trial registration. How they came to be added as outcomes in the first place is not explained—an omission that undermines the credibility and integrity of the research.
(Nothing here should be construed as criticizing music therapy as a treatment modality. Our bodies evolved to respond to sound patterns in all sorts of ways that we don’t understand. The issue here is whether the research and the reported results are solid.)
**********
Is CBT plus music better than CBT alone?
The main goal of a feasibility study is to gather some preliminary data and decide whether it makes sense to conduct a full trial of whatever it is. (Of course, it’s important to remember that the Lightning Process study began as a feasibility trial.) Feasibility trials are by definition small. They aren’t designed to deliver authoritative and actionable findings about interventions. Oddly, the registration for the Norwegian trial does not appear to identify it as a feasibility study. Perhaps at some point the investigators decided to change gears; if so, it would be interesting to learn what prompted that shift.
The study’s premise, as I understand it, is something like this: CBT has been shown to work as a treatment for CF and CFS, but only modestly, and multidisciplinary approaches are a good thing, so why not add something extra like music and see if that amplifies the effects? This premise is obviously flawed. The investigators cite both the PACE trial and the Dutch FITNET study for the claim that CBT is effective. But they do not mention that these studies have been shown to feature serious flaws that undermine their claims. (I analyzed PACE here and FITNET here.) Nor do they mention that the CBT/GET approach has lost its status as the undisputed international standard-of-care—as manifested in the 2017 decision by the US Centers for Disease Control and Prevention to stop recommending it.
Beyond that issue, the investigators appear confused about whether they were studying adolescents with chronic fatigue or adolescents with chronic fatigue syndrome. The trial registration described the intervention as “mental training for CFS following EBV infection in adolescents.” The title of the published paper refers to chronic fatigue. In the paper, the investigators note that people with symptoms in addition to fatigue might meet case definitions of CFS, but they lump everyone together in the analysis.
They appear to believe that the two conditions are more or less the same, except for those extra symptoms. Or that they exist on a continuum. Or maybe that’s not what they believe. I actually found it hard to tell. Furthering the confusion, at the start of the study the participants averaged around 8,000 steps a day or more, while the paper cites previous research in which adolescents with CFS averaged much less–around 4,500 steps a day. In other words, the chronically fatigued adolescents in this sample appear to have been much more physically robust than would often be expected if someone had chronic fatigue syndrome.
The intervention consisted of ten sessions of a “mental training programme merging elements from music therapy with elements from CBT,” with homework assignments. The first session included the adolescent, parents or guardians, both therapists, and a researcher. On top of that, “personal experiences were also shared by a young adult voluntary patient who had himself recovered from CFS.” Interesting. It is not usually considered appropriate in a clinical trial to tell participants that the intervention they are receiving will lead to recovery, since that could bias the results–especially on subjective measures.
Much of the onus for success seems to have been on the patients. “The treatment programme assumes active participation from the patient between the sessions, and the therapists tried to communicate the necessity of individual effort,” the study noted. It makes sense to suggest that patients who want to get better should seek to engage in efforts to get better. However, this framing can also provide investigators with an easy way to blame participants if the intervention fails to deliver the hoped-for benefits.
Out of 91 eligible study candidates who remained fatigued six months after acute Epstein-Barr virus infection, only 43—less than half—agreed to participate. Of those, 21 were randomized to the intervention group and 22 to a group receiving treatment as usual (TAU)—which essentially meant no treatment at all. The intervention group suffered major attrition, with six drop-outs by the three-month assessment period compared to just one for the TAU group. According to the investigators, adolescents who declined to participate in the research or who dropped out reported that they were concerned about missing school.
Ok, but so what if adolescents gave investigators that reason? It would likely be easier for many adolescents to tell investigators they were concerned about missing school than to say they thought the intervention wasn’t worth their time. In any event, more than half declined the offer and almost a third dropped out near the start. Those facts would seem to raise questions about the acceptability of the intervention and the feasibility of the trial. (On the other hand, those who did not drop out had high compliance with their scheduled therapy sessions—a favorable indicator.)
**********
A zealous over-interpretation of the data
Despite their preliminary nature, feasibility trials are still subject to basic scientific standards. And that means presenting the outcomes as promised in trial documents—unless investigators can provide excellent reasons for making changes. Adherence to this principle is critical for preventing selective reporting of results, an unfortunately popular violation of methodological principles.
In the current case, the trial registration listed one primary outcome: the mean number of steps per day at 12 weeks, assessed by an accelerometer worn for a week. After that, the investigators listed 41 secondary measures assessed at 12 and 64 weeks, including “symptoms (fatigue, pain, insomnia), cognitive function (executive functions) and markers of disease mechanisms (autonomic, endocrine, and immune responses).” Mean number of steps per day at 64 weeks was also a secondary measure. (In the paper, these times were rendered in months rather than weeks.)
Given the investigators’ expectation that the intervention would boost activity levels, the results for the primary outcome were disappointing. At three months, both groups showed measurable declines in their activity levels—and the score for participants in the intervention group was even lower than for those who received TAU. In other words, the treatment not only failed to increase activity levels, it actually led to worse outcomes.
Here’s the study’s abstract on the findings: “Endpoints included physical activity (steps/day), symptom scores, recovery rate…In intention-to-treat analyses, number of steps/day tended to decrease (difference=−1158, 95% CI −2642 to 325), whereas post-exertional malaise tended to improve (difference=−0.4, 95% CI −0.9 to 0.1) in the intervention group at 3 months. At 15 months’ follow-up, there was a trend towards higher recovery rate in the intervention group (62% vs 37%)”
And here’s the conclusion: “An intervention study of combined CBT and music therapy in postinfectious CF is feasible, and appears acceptable to the participants. The tendencies towards positive effects on patients’ symptoms and recovery might justify a full-scale clinical trial.”
This is a zealous over-interpretation, even with the hedging language (“tendencies toward,” “might justify”). In any study, the predesignated primary outcome is the most important metric and is highlighted as such by honest and transparent investigators. In this case, a strength of the primary outcome was that it was an objective measure, not a subjective assessment easily influenced by multiple kinds of bias. In comparing the intervention to TAU, the study found no benefits for its primary outcome. Moreover, both groups took fewer steps at three months than at the start, and the intervention group did worse.
These unfortunate findings cannot be airbrushed away. Yet the abstract seems written to create the impression that the physical activity measure is one of multiple endpoints of equal status. The abstract’s conclusion doesn’t even mention the unfortunate results for the primary outcome. This omission is unacceptable. (In the full text, the investigators appropriately mention that the mean number of steps per day was the primary outcome–but they ignore the implications of that inconvenient detail.)
Moreover, the trial registration did not mention PEM and recovery among the 41 secondary outcomes. These two items were apparently post-designated as outcomes–that is, at some point after trial registration but before production of the final draft of the paper. The paper does not explain the reason for introducing these new outcome measures. However, it is worth noting that the abstract largely rests its argument for the “positive” possibilities of the intervention on the PEM and recovery results while ignoring the disappointing results for the predesignated primary outcome.
Hm. This isn’t how scientific research is supposed to be reported. First-year epidemiology students at Berkeley know better than to pull an amateurish stunt like this.
********
A closer look at PEM and recovery
Now let’s look at the outcomes for which the investigators claimed a “tendency” toward positive effects–PEM and recovery.
Here’s what they wrote about how they tracked PEM: “The symptom of postexertional malaise, often considered a hallmark of CFS, was charted with one single item (‘How often do you experience more fatigue the day after an exertion?’).” That single question is a very crude way to measure PEM. Even so, both groups reported a reduction of this symptom, with minimal differences between the two.
This is not surprising. Participants were taking many fewer steps, so it is understandable that they would report less PEM. Since the intervention was expected to increase activity levels, it seems questionable and unjustified in this context to interpret reduced PEM as an indication of potential success rather than as a marker of reduced activity levels.
In the paper, recovery was defined as a score of three or less on the Chalder Fatigue Scale, on which lower numbers represent less fatigue. A score of four or above on the scale was the threshold for trial entry. Since recovery as an outcome was not included in the trial registration, the acceptable way to present the results would have been as a reduction in reported fatigue without reference to recovery at all. Clearly, “recovery” sounds better than “reduced fatigue.”
The investigators could perhaps argue that the intervention was seeking to alleviate “chronic fatigue,” so it would be fair to consider a reduction to three on the fatigue scale as demonstrating recovery. In that case, they needed to make that point when they predesignated their outcomes—not at some unspecified later date when the choice could have been biased by emerging developments during the trial.
Moreover, it is important to note that the statistics provided for recovery–62% in the intervention group vs 37% in the TAU group–are from a per-protocol analysis, not the intention-to-treat analyses provided for the other outcomes. That is, the investigators have simply divided the number of people who met the recovery threshold by the number of participants remaining in that study arm, overlooking those who dropped out. In contrast, an intention-to-treat analysis takes into account the fact that some have dropped out of each arm and that their outcomes are unknown.
Intention-to-treat analyses are generally viewed as more conservative and a better reflection of real-world experience. The intention-to-treat analysis of the scores for the fatigue scale shows very little difference between the intervention and the TAU groups. Transforming the same scores into a recovery outcome and then providing a per-protocol analysis is a clever way of making the same findings look much better. It is fair to assume that the decision to conduct this recovery analysis was made after trial registration. (Also, Table #4, which includes the recovery data, seems to be missing information about one person in each of the study arms; the totals do not add up.)
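The difference between the two approaches is easy to see with a toy calculation. The sketch below uses invented recovery counts (only the arm sizes and dropout numbers are taken from the study; the "recovered" figures are hypothetical, chosen purely to illustrate the mechanism):

```python
# Hypothetical illustration (NOT the trial's actual recovery data):
# how a per-protocol analysis can inflate an apparent recovery rate
# relative to an intention-to-treat (ITT) analysis when one arm has
# more dropouts.

def per_protocol_rate(recovered, completed):
    """Recovered divided by those who completed the study arm."""
    return recovered / completed

def itt_rate(recovered, randomized):
    """Recovered divided by all randomized participants, counting
    dropouts (whose outcomes are unknown) as not recovered - a
    common conservative convention."""
    return recovered / randomized

# Arm sizes and dropouts are from the study (21 randomized / 6
# dropouts vs 22 randomized / 1 dropout); recovery counts invented.
intervention = {"randomized": 21, "dropouts": 6, "recovered": 9}
tau = {"randomized": 22, "dropouts": 1, "recovered": 8}

for name, arm in [("intervention", intervention), ("TAU", tau)]:
    completed = arm["randomized"] - arm["dropouts"]
    pp = per_protocol_rate(arm["recovered"], completed)
    itt = itt_rate(arm["recovered"], arm["randomized"])
    print(f"{name}: per-protocol {pp:.0%}, ITT {itt:.0%}")
```

With these made-up numbers, the intervention arm's per-protocol rate (9/15 = 60%) looks far better than TAU's (8/21 ≈ 38%), while the ITT rates (9/21 ≈ 43% vs 8/22 ≈ 36%) are much closer—precisely because the intervention arm's six dropouts vanish from the per-protocol denominator.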
And another thing…It is hard to understand a definition of recovery that ignores a trial’s primary outcome. This trial documented that people were taking fewer steps per day than before, not more—which undermines the argument for the effectiveness of the intervention. When people take fewer steps, it should not be surprising if they report less fatigue—or less PEM. It takes a certain kind of hubris to argue for a “tendency” toward recovery when patients who received the intervention performed worse on a predesignated–and objectively measured–primary outcome.
In summary, this published paper is deficient in multiple ways. BMJ Paediatrics Open should not have accepted it without insisting that the results be reported according to the predesignated measures in the trial registration. BMJ professes to maintain a rigid stance against selective outcome reporting—but its journals can’t seem to stop publishing papers that indulge in this unfortunate practice.
This post is already very long—and I haven’t even mentioned the study’s peer reviews and treatment manual, which make for interesting reading. Hopefully I’ll get to that soon.
Comments
7 responses to “More on that Norwegian CBT/Music Therapy Study”
I think this is a great example of how difficult it is to interpret the results of poorly designed studies.
A decline in steps per day seems to be a bad thing. However such a decline could plausibly explain the reduced fatigue and PEM in participants. The participants seem to be quite active despite having a fatiguing illness. Maybe they were too active and the therapy helped them pace themselves better?
Or maybe the therapy harmed patients and biased reporting affecting self-reported outcomes masked this, with steps per day being less affected?
Almost invariably (I am not aware of any counter-examples, but readers, do correct me if you are), research into behavioural and psychological interventions for the symptom of chronic fatigue (found as a symptom of many distinct and diverse medical conditions, and probably much more rarely as an idiopathic feature in total isolation) and for the condition of Chronic Fatigue Syndrome (of which chronic fatigue is only one of many symptoms, and by itself neither necessary nor sufficient under the most widely accepted definitions) displays serious methodological flaws, if not, in the worst cases, worrying research misconduct.
Is this failure to understand the basic principles of good experimental design, or a bizarre willingness to disregard them, widespread in researchers of behavioural/psychological interventions or just confined to the field of chronic fatigue and/or chronic fatigue syndrome? Are we seeing a general crisis in psychology, to misquote Brian Hughes, or a specific failure in a small group of researchers? Worryingly these are issues in experimental design that were widely discussed when I was an undergraduate over forty years ago, but seem, by this group of researchers at least, less understood now some four decades later.
Interestingly, in the patient forums there are some ME/CFS patients who report they very rarely experience fatigue or post exertional malaise in their daily lives. They are not saying they are recovered—indeed they would regard themselves as having a serious ongoing disability—but believe they are successfully managing these symptoms by strict activity management, invariably involving significantly restricting what they do. Ironically such behaviour could be a better explanation of this study’s results than that given by the researchers. By providing the subjects with time to reflect on their symptoms and providing them with a model of using music as time out from physical activity, are the researchers inadvertently doing the complete opposite of what they intended, encouraging participants to control their symptoms by reducing and/or pacing their activity?
Peter Trewhitt
I think one angle to consider is that medicine is deeply convinced that unexplained symptoms means the patient should be treated by psychiatrists and psychologists. This is how the healthcare system has been designed to operate.
The psychiatrists and psychologists are expected to fix patients or at least manage them. The pressure to deliver positive results may be too high to admit that what their studies are actually showing is that psychiatrists and psychologists have no special insight into medically unexplained symptoms and no effective treatment. The negative results of the studies are then distorted to conform to the way the system is designed to operate.
I think it would be very helpful if medicine admitted it is imperfect and that there are illnesses with unknown cause it doesn’t understand and cannot treat and that a better approach is needed.
It kind of got lost in this absurdity that Wyller admitted to blatant cherry-picking—but ethical and methodological incompetence are so baked into BPS research that such practices are expected and go unnoticed, as when he mentions the bit about a trend towards “recovery” (whatever that means). This is literally how climate change deniers have worked, cherry-picking short-term trends to distract from much larger sustained patterns.
Not only should the paper have never been published, it is yet another cheap psychological experiment that should never have been approved as it had no chance of providing anything useful. It came weak out of the gate and slithered and blobbed its way to the finish line as a puddle of confusion and rank incompetence.
The whole thing would be laughable, if it wasn’t for the very real consequences of enabling a literal delusional fantasy into medical practice. Laughable in the Pennywise-the-clown-dragging-you-through-the-sewer-grate sense.
And the fact that yet another fatally weak experiment was funded, approved and published shows enormous systemic problems with peer review and publishing, clearly unable to fulfill their duties. It’s clear that peer review in this field is entirely superficial and detached from the actual substance, limited to style and trivial features. It’s a field in which one of two reviewers literally states not having read beyond the abstract while the other is a pal of the author, and it is still considered valid peer review. Ridiculous.
This might explain the crisis of replicability, and it would justify a larger examination of this pervasive failure, which frankly reveals that the field of clinical psychology should take several steps back and recognize that the reliability of its knowledge base is essentially nil. This has to stop; it will fracture confidence in modern medicine, and rightfully so. Especially in a time of freaking pandemic. Medicine has to be based on science, and this here is yet another mockery of both the spirit and letter of the scientific method.
There is an easy way to get rid of the confusion caused by the use of the terms ‘chronic fatigue’ and ‘chronic fatigue syndrome’ – just get rid of the latter term altogether and use ‘ME’ instead.
COFFI*, anyone, while the music plays, with cake and cookies too? Now there’s an idea – they could always try cake + CBT next, that might aid recruitment. (*https://ammes.org/2018/02/09/the-international-collaborative-on-fatigue-following-infection-coffi/)
I am very curious how these CBT quackers (CBT psychiatrists) will deal with post-COVID-19 fatigue syndrome (the new ME/CFS group) in the coming years.