By David Tuller, DrPH
Last fall, Professor Sir Simon Wessely and Professor Trudie Chalder were among several co-authors of a study published in the Journal of the Royal Society of Medicine. The study purported to show that years of providing cognitive behavior therapy (CBT) to patients with “chronic fatigue” and “chronic fatigue syndrome” had proven the intervention a success. I previously pointed out myriad problems with it in this post last August.
The paper read as if it were meant to shore up support for CBT during the deliberations of the National Institute for Health and Care Excellence over the new ME/CFS guidelines. The draft guidance, which dropped the recommendation for CBT designed to cure the illness, was published in November, with a final version due in April. Because the paper made unwarranted causal statements and glossed over the huge number of drop-outs and questionnaire non-responders, among other methodological and ethical issues, Professor Brian Hughes and I co-wrote a response and submitted it formally to the journal two weeks ago through the online editorial manager.
As of this morning, the commentary was still awaiting administrative action, so Professor Hughes wrote directly to the journal editor. As he was in the process of posting the commentary to a pre-print site, he received a brief note from the editor stating the following: “Thank you for submitting your article to the JRSM. I read it with interest but I am afraid that I am unable to offer publication on this occasion. I am sorry to disappoint and wish you luck elsewhere.”
So I have posted it below. And here’s the version on the pre-print site. We will also send it to members of the NICE committee in case this flawed study gets raised in discussion by CBT proponents to support their position. It would be nice to find a journal willing to publish such a critique, but at least rejection no longer presents an obstacle to making something public.
**********
RESPONSE TO ADAMSON ET AL. (2020):
“COGNITIVE BEHAVIOURAL THERAPY FOR CHRONIC FATIGUE AND CHRONIC FATIGUE SYNDROME: OUTCOMES FROM A SPECIALIST CLINIC IN THE UK”
Brian M. Hughes
National University of Ireland, Galway
David Tuller
University of California, Berkeley
Abstract
In this review, we consider the paper by Adamson et al., published in the October 2020 issue of the Journal of the Royal Society of Medicine. The authors interpret their data as revealing significant improvements following cognitive behavioural therapy in a large sample of patients with chronic fatigue syndrome and chronic fatigue. Overall, the research is hampered by several fundamental methodological limitations that are not acknowledged sufficiently, or at all, by the authors. These include: (a) sampling ambiguity; (b) weak measurement; (c) survivor bias; (d) missing data; and (e) lack of a control group. In particular, the study is critically hampered by sample attrition, rendering the presentation of statements in the Abstract misleading with regard to points of fact, and, in our view, urgently requiring a formal published correction. In light of the fact that the paper was approved by multiple peer-reviewers and editors, we reflect on what its publication can teach us about the nature of contemporary scientific publication practices.
Response to Adamson et al. (2020): “Cognitive behavioural therapy for chronic fatigue and chronic fatigue syndrome: outcomes from a specialist clinic in the UK”
In the paper by Adamson et al.1, published in the October 2020 issue of the Journal of the Royal Society of Medicine, the authors interpret their data as revealing significant improvements following cognitive behavioural therapy (CBT) in a large sample of patients with chronic fatigue syndrome (CFS) and chronic fatigue (CF). In our view, their conclusions are misplaced and unwarranted. The paper and the research it describes are both problematic in several critical respects. For example, the Abstract – the section of the paper most likely to be read by clinicians – contains a crucial error in the way the data are described, and requires urgent correction.
In this review, we briefly survey the most compelling issues, and reflect on what the publication of such research implies about the nature and effectiveness of peer-review in clinical behavioural science.
Prelude: A conspicuous controversy overlooked
In explaining the rationale for the CFS-specific version of CBT used in their study, Adamson et al. write that the intervention is “based on a model which assumes that certain triggers such as a virus and/or stress trigger symptoms of fatigue. Subsequently symptoms are perpetuated inadvertently by unhelpful cognitive and behavioural responses” (p. 396). Treatment involves, among other elements, “addressing unhelpful beliefs which may be interfering with helpful changes” (p. 396).
This theory is essentially the one laid out more than thirty years ago, in a 1989 paper by a team that also included two of the current paper’s authors (namely, professors Wessely and Chalder).2 The main problem here is that, in light of recent re-evaluations of key research in the field as well as accelerating efforts to unlock underlying pathophysiological processes, this cognitive-behavioural theory of chronic fatigue syndrome is now very widely disputed.
The authors’ failure to acknowledge that their decades-old theory is currently embroiled in a highly contentious academic dispute represents an omission that verges on selective reporting of the research history. Since 1989, a considerable volume of empirical research has documented a wide range of organic dysfunctions in patients. In 2015, a report3 from the US Institute of Medicine (now the National Academy of Medicine) cited this body of research when describing the illness as “a serious, chronic, complex, and multisystem disease” (p. 209) and rebutting claims that it is psychiatric or psychological in nature. By not mentioning research that would appear to counter their narrative, Adamson et al. fail to anticipate – or to attempt to offset – some obvious criticisms that their CBT-focused approach is likely to attract.
Adamson et al. were similarly selective in their brief discussion of the literature on interventions. A thorough review of this research would almost certainly have culminated in, at best, a far more lukewarm conclusion regarding the potential utility of CBT as a treatment for CFS and related conditions. In late 2020 (admittedly, some months after the authors submitted their paper to the Journal of the Royal Society of Medicine), the UK’s National Institute for Health and Care Excellence (NICE) published its own assessment of this literature. Having scrutinised in detail studies covering 172 CBT outcomes – findings previously used to support claims that CBT is an effective treatment for CFS and related conditions – NICE classified all of the research as constituting evidence of either “low” or “very low” quality. Across the entire literature, as judged by NICE, not a single claim for CBT efficacy was supported by any evidence that exceeded the “low quality” threshold.4
However, the shortcomings of the new paper by Adamson et al. extend far beyond a questionable theoretical premise or a selective literature review. Overall, the research is hampered by several fundamental methodological limitations that are not acknowledged sufficiently, or at all, by the authors. These include: (a) sampling ambiguity; (b) weak measurement; (c) survivor bias; (d) missing data; and (e) lack of a control group. Given these issues, in our view, the findings reported by Adamson et al. are unreliable because they are very seriously inflated.
(a) Sampling ambiguity: Who were the participants?
The investigators seem confused about whether they are investigating patients with CF or patients with CFS. The title suggests the answer is both, but the paper itself generally refers to CFS throughout and to the participants as having met CFS criteria.
All 995 participants met the criteria outlined in the 2007 NICE guidance for what it called CFS/ME. These criteria require four months of fatigue. Yet, according to Adamson et al., only 76% met the Oxford case definition, which requires six months of fatigue and no other symptoms, and just 52% met the CDC criteria, which require six months of fatigue plus four of eight other symptoms. This raises a question as to whether 24% of the present sample had fatigue only for between four and six months. That seems hard to understand, given that participants were reported to have been ill for a mean duration of 6.64 years.
Nor is it clear if many or any of the included participants experienced post-exertional malaise, widely acknowledged as being a core symptom of the disease.5 Without more information, it is difficult to determine how many people in this study had CFS per se, as opposed to idiopathic CF or another illness in which fatigue was a symptom.
(b) Weak measurement: How valid were the outcomes?
The course of CBT described by Adamson et al. included up to 20 sessions on a twice-monthly basis. Patients completed several questionnaires at the start of treatment, at the fourth and seventh sessions, at discharge, and at three months after discharge. The measures included the SF-36 and the Chalder Fatigue Questionnaire (CFQ), along with more generic scales, such as those for work and social adjustment, depression and anxiety, and overall health.
It is important to note that all of these measures are subjective. The study included no objective indicators often used in research assessing outcomes in patients being treated for disabling conditions. For example, the authors report no data on improvements in physical endurance (such as a walk test), fitness (such as a step test), or occupational well-being (such as return-to-work rates or changes in disability-related benefit payments). We can note that in past intervention research with CFS, CBT-based therapies that were reported as having led to self-reported “improvements” were found to have had no effect whatsoever on physical endurance, fitness, or socio-economic outcomes.6
Moreover, as this study was not blinded, all participants knew that they were receiving an intervention that was designed to help them. It should therefore not be surprising that some people receiving such an intervention would self-report short-term ephemeral benefits, in line with their expectations. Without any objective outcome measures, the risk of confirmation bias in such a study design is extremely high.
(c) Survivor bias: How meaningful were the results?
For several reasons, the main results do not support the interpretation that treatment was effective. Scores on the SF-36 rose from a mean of 47.6 at baseline to 57.5 at discharge and 58.5 at three-month follow-up. In previous research on CFS, SF-36 scores of 65 or below have been used to identify serious disability and thus have been employed as inclusion criteria to determine whether participants are sick enough to be recruited for a treatment study. Notably, the present paper includes authors who have previously used the SF-36 in precisely this way.7 Therefore, these authors should be well aware that any treatment outcome in which SF-36 scores average 58.5 needs to be considered against the fact that patients overall remain seriously disabled despite undergoing therapy. CFQ scores at discharge and follow-up tell a similar story: while modestly improved from baseline, they nonetheless represent disablingly high levels of fatigue.
But even these results are likely to be misleading given the significant rate at which participants dropped out of treatment. Of the sample of 995 participants initially identified, some 31% were considered “lost-to-follow-up”—as defined by the investigators, that meant they provided no data either at the end of treatment or at the follow-up assessment three months later, despite providing some data at the earlier timepoints. Moreover, their attrition was non-random: those who were lost-to-follow-up had reported, at baseline, greater problems with depression, work and social adjustment, and physical function than those whose data were ultimately analysed.
Simply put, we have no idea what happened to almost a third of the participants, although we know that they were in relatively poor shape to begin with. Perhaps they were lost-to-follow-up because of further deteriorations in health, whether or not these were related to CBT, or perhaps because they just found CBT to be unhelpful.
The substantial attrition rate suggests an obvious problem of survivor bias. Any positive findings accruing from the whittled down dataset may simply be the inflationary result of a statistical artefact. Deep in the body of the text, the authors allude to this problem, stating that “there may have been some bias in the data, in that those who completed treatment may not represent all patients” (p. 401). This modest acknowledgement falls short of the appropriate scientific rigour. It would have been more accurate to have stated that “an unknown amount of bias in these data is inevitable, in that those who completed treatment will not represent all patients.”
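To illustrate the kind of distortion that non-random attrition can produce, the following minimal simulation sketch uses entirely hypothetical numbers (it does not draw on the Adamson et al. data): if patients who feel no better are more likely to disengage, the mean “improvement” among completers comes out positive even when the true average effect is zero.

```python
# Minimal sketch of survivor bias under non-random attrition.
# All numbers are hypothetical and purely illustrative.
import random

random.seed(1)

n = 1000
changes = [random.gauss(0, 8) for _ in range(n)]   # true mean change is ~0

completer_changes = []
for change in changes:
    # Hypothetical assumption: non-improvers are more likely to drop out.
    p_dropout = 0.45 if change < 0 else 0.10
    if random.random() > p_dropout:
        completer_changes.append(change)

true_mean = sum(changes) / len(changes)
completer_mean = sum(completer_changes) / len(completer_changes)
retained = len(completer_changes) / n

print(f"true mean change (all {n} patients): {true_mean:+.1f}")
print(f"mean change among completers only:   {completer_mean:+.1f}")
print(f"proportion retained:                 {retained:.0%}")
```

In this toy scenario, a completer-only analysis reports an apparent benefit generated solely by who dropped out, which is precisely the artefact that an unqualified summary of the retained cases cannot rule out.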
Adamson et al. make no mention of participants being lost-to-follow-up when summarising the findings in their Abstract. Instead, they state that data were available “for 995 patients” before then stating that “85% of patients” self-reported improvement after therapy. Their construction is extremely misleading. A crucial caveat – that the “85%” applies only to a non-random subset of participants who were not lost-to-follow-up – is omitted. The resulting abstract presents an opaque sequence of points that serves to greatly inflate the findings, and which constitutes factual error.
Also in the Abstract, the authors highlight that “90%” of patients “were satisfied with their treatment.” Presumably, again, that impressive-looking figure does not include responses from the 31% who were lost-to-follow-up. As the denominator is unknown, this high approval rate is difficult even to understand.
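A back-of-the-envelope calculation makes the denominator problem concrete. The exact denominator for the “85%” figure is not reported; the sketch below simply assumes, hypothetically, that it applies only to the roughly 69% of the 995 patients who were not lost to follow-up.

```python
# Hypothetical denominator check (the true denominator is not reported).
recruited = 995
not_lost = round(recruited * 0.69)          # ~69% retained, per the reported 31% attrition
reported_improved = round(not_lost * 0.85)  # "85%" applied to that subset only

print(f"patients not lost to follow-up:   {not_lost}")
print(f"85% of that subset:               {reported_improved}")
print(f"as a share of all 995 recruited:  {reported_improved / recruited:.0%}")
```

Under that assumption, at most roughly 59% of the originally recruited sample is known to have reported improvement; for the remainder, no such data exist.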
(d) Missing data: What should we make of non-responses?
In addition to those participants who were classified as lost-to-follow-up, a further problem arises from the fact that, for several key variables, large numbers of the participants who remained did not complete the required questionnaires or return relevant data. Only 581 participants (58% of the initial sample of 995) completed the CFQ at the end of treatment and only 503 (51%) did so at follow-up. Only 441 participants (44%) completed the SF-36 at discharge, and just 404 (41%) did so at follow-up. Despite this, both CFQ and SF-36 scores are used as measures of treatment outcomes. Once again, when citing these results in their Abstract, the authors do not mention that data were missing for up to six out of every ten of the 995 patients recruited as study participants.
In short, it is misleading for the authors to have set out positive findings without revealing that, for key outcome variables, conclusions were drawn from a substantially depleted dataset. This is especially true given that participant disengagement on such a scale is almost certainly suggestive of widespread treatment failure.
(e) Lack of a control group: What was the point?
It is an elementary principle of good study design that causality cannot be established without reference to a control group or control condition. The present study did not include a control group or control condition. Therefore, the study data cannot be used to support inferences about causality.
Nonetheless, in their discussion section, Adamson et al. write the following: “The cognitive behavioural therapy intervention led to significant improvements in patients’ self-reported fatigue, physical functioning and social adjustment” (p. 400; our emphasis). This is a straightforward statement of causality and so is clearly unwarranted; in the absence of a control group, any such inference is unsound.
When further discussing their conclusions, the authors then state: “the lack of a control condition limits us from drawing any causal inferences, as we cannot be certain that the improvements seen are due to cognitive behavioural therapy alone and not any other extraneous variables” (p. 401). Despite its implication of rigour, this statement includes another assertion of causality. Moreover, it is self-contradictory. To state that improvements might not be “due to CBT alone” is to posit, as fact, that they are due to CBT at least in part but that other factors might have contributed. In one sentence, therefore, the authors draw a causal inference while denying the possibility of being able to do just that given their study design.
The paper by Adamson et al. does not present evidence that CBT “led to” anything. The authors have provided a partial dataset suggesting that some of their participants self-reported modest increases in subjective assessments of well-being (while nonetheless remaining within a range of scores that indicate severe debilitation). These changes in scores might well have happened whether or not CBT had been administered.
Conclusion: Therapeutic loyalties and the challenge of scientific reviewing
In our view, the shortcomings in the paper by Adamson et al. are as obvious as they are inherent. In that regard, given that it was approved by multiple peer-reviewers and editors, we feel we must reflect on what its publication can teach us about the nature of contemporary scientific publication practices.
Even the most objective scientists will be strained by the problems of confirmation bias, especially if they have invested their professional reputations in a particular approach to therapy. This problem, sometimes referred to as “therapeutic allegiance”, has been shown to present a statistically significant source of investigator bias in psychotherapy trials. In short, researchers are inclined to report larger effect sizes in studies of therapies about which they hold strong professional beliefs.8 It is undoubtedly the case that the authors of the Adamson et al. paper include some who have published widely on the use of CBT for CFS and related conditions, and who have advocated for its use over several decades. Based on the findings of numerous empirical studies of this class of bias, we join with other scholars who have called for therapeutic allegiance to be recognised as an important risk to research integrity, the details of which should always be publicly acknowledged, akin to those of any other conflict of interest.
We understand that editors and peer-reviewers vary in their theoretical backgrounds, areas of technical expertise, and editorial philosophies. We also understand the increasing difficulties faced by editors in sourcing peer-reviewers who are willing and able to fulfil this important role. The peer-review system relies significantly on volunteer effort, and this feature alone affords it considerable esteem. The system, and those who participate in it, are deserving of our respect.
The Journal of the Royal Society of Medicine operates a policy of publishing the names of peer-reviewers for every article. Presumably, this is intended to support transparency, by allowing readers to evaluate for themselves the nature and scope of peer review in each given case. In that regard, we encourage readers to consider the named peer-reviewers of the paper by Adamson et al., and to reflect on the reviewers’ disciplinary and research backgrounds. We urge readers to make up their own minds whether the peer reviewers in this case had detailed experience with quantitative research or specialist knowledge in associated methodological issues (such as familiarity with the particular pitfalls arising from subtle demand characteristics that so often undermine research validity in behavioural medicine contexts).
Likewise, in intertwined professions such as academia and medicine, it can be extremely difficult to ensure or maintain author anonymity during the review process, or to completely eliminate the risk that conflicts of interest might skew the evaluations of reviewers.9 Every effort must be made, at all times, to avoid even the appearance of such a conflict. The Journal of the Royal Society of Medicine maintains editorial separation from the Royal Society of Medicine, a laudable principle of which it can be proud. That said, the fact that the Adamson et al. paper lists among its authors the outgoing president of the Royal Society of Medicine will likely confuse those readers who are unfamiliar with, or distrusting of, such safeguards. It is always preferable that, where possible, research authors submit their manuscripts to journals with which they have no association, even if tenuous. In our view, given widespread interest in the subject matter of the paper by Adamson et al., many such alternative options were available to these authors.
The Abstract of the Adamson et al. paper contains erroneous statements about the nature of its dataset, which inflate its findings in a highly misleading fashion. As published, the Abstract implies that at least three separate statistical findings were based on a sample of 995 cases. In reality, because of both significant participant drop-out and widespread missing data, the study sample was actually hundreds of cases smaller in each instance. For example, stating that “Data were available for 995 patients” and then that “85% of patients self-reported that they felt an improvement” is factually incorrect, because around a third of the 995 patients had dropped out of the study and up to half of the self-report data was missing. Sample attrition is a fatal shortcoming in any treatment study; the withdrawal of large numbers of cases distorts findings because statistical analyses end up disproportionately focussing on those patients for whom therapy was most beneficial. For it to have occurred on such a scale without being mentioned creates significant problems, and renders the presentation of statements in the Abstract critically misleading with regard to points of fact. In our view, the error is substantive and requires the publication of a formal correction to Adamson et al.’s Abstract.
We agree with Adamson et al. that clinics should routinely assess treatment outcomes and report on change in naturalistic settings. We also very much agree with them that future studies should aim to employ better research methodologies. In our view, all such research should meet robust and rigorous standards of reliability and validity, and should be evaluated against those standards. As such, we feel that the recently published paper by Adamson et al. is especially problematic: its methodology is hampered by a litany of grave shortcomings, and its conclusions presented with a level of confidence far exceeding what is warranted by its theoretical premise, dataset, or study design.
Declaration of Competing Interests: The Authors declare that there is no conflict of interest.
Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. DT is a senior fellow in public health and journalism at the Center for Global Public Health at the University of California, Berkeley; members of the ME/CFS patient and advocacy community have donated to crowdfunding campaigns in support of DT’s position at Berkeley.
Contributorship: Both authors contributed to the writing and editing of this paper, and agreed on the final version.
References
- Adamson J, Ali S, Santhouse A, Wessely S, Chalder T. Cognitive behavioural therapy for chronic fatigue and chronic fatigue syndrome: Outcomes from a specialist clinic in the UK. J R Soc Med 2020; 113; 394–402. DOI: 10.1177/0141076820951545.
- Wessely S, David A, Butler S, Chalder T. Management of chronic (post-viral) fatigue syndrome. J R Coll Gen Pract 1989; 39; 26–29.
- Institute of Medicine. Beyond Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: Redefining an Illness. National Academies Press. 2015.
- NICE. Myalgic encephalomyelitis (or encephalopathy)/chronic fatigue syndrome: Diagnosis and management – [G] Evidence reviews for the non-pharmacological management of ME/CFS. See https://www.nice.org.uk/guidance/gid-ng10091/documents/evidence-review-7 (last checked 26 January 2021).
- CDC. Myalgic encephalomyelitis/chronic fatigue syndrome: Symptoms—Primary symptoms. See https://www.cdc.gov/me-cfs/symptoms-diagnosis/symptoms.html (last checked 26 January 2021).
- Stouten B. PACE-GATE: An alternative view on a study with a poor trial protocol. J Health Psychol 2017; 22; 1192–1197. DOI: 10.1177/1359105317707531.
- White PD, Sharpe MC, Chalder T, et al. Protocol for the PACE trial: A randomised controlled trial of adaptive pacing, cognitive behaviour therapy, and graded exercise as supplements to standardised specialist medical care versus standardised specialist medical care alone for patients with the chronic fatigue syndrome/myalgic encephalomyelitis or encephalopathy. BMC Neurol 2007; 7; 6. DOI: 10.1186/1471-2377-7-6.
- Dragioti E, Dimoliatis I, Fountoulakis KN, Evangelou E. A systematic appraisal of allegiance effect in randomized controlled trials of psychotherapy. Ann Gen Psychiatry 2015; 14; 25. DOI: 10.1186/s12991-015-0063-1.
- Hughes BM. Psychology in Crisis. Palgrave. 2016.
Comments
8 responses to “Hughes-Tuller Comment on Wessely-Chalder CBT Study Rejected by Journal, Posted Here”
And yet the medical establishment wonders why large chunks of the UK public mistrust their so-called ‘medical experts’? Perhaps they’ve just got wise to their conjuring tricks?
It looks as though the editor couldn’t even scrape together an excuse for why the journal wouldn’t publish this excellent article.
What’s happened to UK medical science? Is this what we can look forward to in the Post-Brexit era – ‘Look here, don’t look over there’ misdirection the order of the day, with the secrets closely guarded by their own magic circle and scientific debate shut down?
Many thanks to Hughes and Tuller for enlightening us as to what is really going on with the Adamson et al study. Now who do they remind me of?
I mostly agree with what Dr. Tuller is saying here.
But – in the conflicts of interest section, I would have expected to see a declaration that Dr Tuller is in part at least being funded by donations from CFS patients.
Oh sorry – you did declare the funding source in the next section. Still, I think there’s a case for being up front that the funding source is technically a conflict of interest.
========
A slightly tangential thought.
There’s a conference where I am on the program committee. As is usual for these types of events, the committee is mostly made up of well-known professors in the field. These people mostly have lots of grad students, who – of course – submit their research to the conference.
Now, when submitting a paper to such a conference, you make it clear on the submission form (but not necessarily in the paper itself) that “Professor X is my adviser”, and Professor X doesn’t get to peer review your paper, and in fact isn’t allowed to hear the internal discussion about whether to accept or reject it.
Still, a slight worry does get articulated: will it look bad that the authors of accepted papers are mostly our own grad students? Some sort of stricter conflict-of-interest rule has been mooted. On the other hand, the very nature of peer review means that it is likely that your PhD advisor will be on the editorial board of the venue to which you are submitting your paper…
How many times has Sir Simon “retired” from “CFS” research? One wonders, do they throw a retirement party each time he “retires”?
Sometimes I think that paying the psychobabblers to go away and shut up might well be worth the money. On the other hand, their continuing promotion of harmful treatments won’t help their case when they are finally held to account in a court of law. Their own words are the best evidence against them at the moment.
Are there still efforts underway to obtain the rest of the PACE trial data? I remain convinced that a close examination of the raw data will reveal massive harms to study participants.
If the harms data sees the light of day, there will be more than enough evidence to drive a stake through the heart of Sir Simon’s little club. The Regius Professor might even qualify for free retirement at the Greybar Hotel.
In the meantime more patients are being harmed every day that Sir Simon and gang remain free to promote blatant discrimination and abuse against some of the sickest patients on the planet.
Yes, I wasn’t sure if the funding should be listed in the CoI section. But since they’re right next to each other, it seemed appropriate to put it in the funding section. But I can see where someone could point and say, hey, he didn’t declare his potential conflicts.
The purpose of the funding section is to identify conflicts of interest. I would regard the other section as a catchall, covering conflicts — such as board memberships — that are not otherwise disclosed. On the other hand, we should strive to err on the side of disclosure.
Thank you, thank you, thank you.
Your point (a) about sampling ambiguity is, I think, the most subtle and controversial. (The other points are things that are Obviously Wrong with the Adamson et al study).
As I understand it, there are two parts to point (a):
1) The data don’t seem to “add up”, raising doubts about whether the population that was sampled was really as described in the paper. For reasons of scientific reproducibility, it’s clearly important to be clear about how the sample was chosen. Notoriously difficult in psychology, of course: human beings might vary in experimentally uncontrolled ways that affect the outcome.
2) There’s what seems to me to be a disagreement about what hypothesis/population *should* be tested. Adamson et al appear to be taking a wide definition of fatigue, and that’s the population/hypothesis they’re testing.
On the other hand, it’s a plausible alternative hypothesis (not tested by Adamson et al) that the subpopulation with more narrowly defined symptoms respond differently to the treatment (e.g. don’t get better). It is statistically possible for a treatment to be beneficial on average but ineffective/harmful for an identifiable subpopulation. It’s less clear to me that it is a strong objection (in terms of academic publishability) to the Adamson paper that they’re testing what is arguably the wrong hypothesis. In practical terms, it is an objection: the hypothesis that CBT is ineffective for the subpopulation with post exertional malaise etc. is just not tested by this experiment, so might be true despite their results. This is the kind of thing where referees typically ask for more discussion of the issues in the conclusions section…