By David Tuller, DrPH
Trudie Chalder, a professor of cognitive behavior therapy (CBT) at King’s College London, has recently published yet another high-profile paper: the main results for “efficacy” from a trial of CBT for patients with so-called “persistent physical symptoms” (PPS) in secondary care. As usual with this group of investigators, things haven’t turned out well. But despite null results for the primary outcome, Professor Chalder and her like-minded colleagues have cast the findings in a positive light in their article, published in Psychological Medicine.
(Psychological Medicine also published the bogus 2013 “recovery” paper from the PACE team; Professor Chalder was one of the three lead investigators for this classic of likely research misconduct, in which participants could get worse on the primary outcomes and still be deemed to be “recovered.” When I complained to the editors about it a few years ago, I was advised to replicate the PACE trial; instead, I wrote a letter to the journal that demanded an immediate retraction and garnered more than 100 signatories.)
The new study, called the PRINCE Secondary Trial, is separate from another trial of people with persistent physical symptoms in primary care—the PRINCE Primary Trial. Both are part of the ongoing campaign by Professor Chalder and her colleagues to provide an evidentiary base to justify the expansion of psychological services to anyone suffering from PPS, a category also frequently referred to as “medically unexplained symptoms” (MUS). For Professor Chalder and her colleagues, PPS and MUS include chronic fatigue syndrome, irritable bowel syndrome, fibromyalgia, and pretty much anything else that resists easy clinical assessment and diagnosis and that experts predisposed toward such an interpretation could construe as psychologically driven and/or perpetuated.
First, let’s note that PRINCE Secondary is an unblinded trial relying on self-reported outcomes—a study design fraught with potential and actual bias. This is not just the opinion of people who dislike the PACE trial and believe that Psychological Medicine publishes a lot of crap research. As I’ve noted, the current editor-in-chief of the Journal of Psychosomatic Research, along with his two predecessors, published an editorial earlier this year in which they clearly indicated that subjective outcomes were subject to enormous bias in studies that were not rigorously blinded. (That hasn’t stopped the journal from continuing to publish such problematic research, such as recent output from Professor Chalder’s PACE colleague, Professor Peter White.)
On the basis of this laudable stance from the Journal of Psychosomatic Research, any positive findings from PRINCE Secondary would be suspect from the start. But that’s not even at issue here, given the null results for the primary outcome—the Work and Social Adjustment Scale (WSAS) at 52 weeks. (The invaluable blog CBT Watch has published a critique of the trial.)
PRINCE Secondary, according to the trial protocol, was “an RCT designed to evaluate the efficacy and cost-effectiveness of a transdiagnostic cognitive behavioural intervention for adults with PPS in secondary care.” The intervention—”therapist-delivered transdiagnostic” CBT, or TDT-CBT—was offered in addition to standard medical care (SMC). It was specifically developed to address the following concerns, as outlined in the published paper: “Patients with PPS can develop unhelpful cognitions and behaviour which can consequently lead to a reduction in daily functioning, reduced quality of life, and an increased susceptibility towards developing depression and anxiety.” The comparison arm received SMC alone.
The protocol touted the trial as a major initiative: “The PRINCE Secondary study will be the first trial worldwide to address the efficacy and cost-effectiveness of a manual-based, transdiagnostic, approach…If it proves to be efficacious, this treatment approach could significantly improve overall functioning in patients with PPS and may lead to substantial long-term economic benefits to the NHS.”
That last point is important. Professor Chalder and many of her colleagues have promoted the expansion of the National Health Service’s Improving Access to Psychological Therapies program to people with MUS. After this publication, it will be hard to cite PRINCE Secondary as proof that CBT for MUS is an “efficacious” treatment–on the contrary, the results undermine any such claim. But that probably won’t make Professor Chalder or any other members of the CBT ideological brigades question their own assertions about their favored interventions. [In research, “efficacy” and “efficacious” refer to how interventions perform in controlled studies like clinical trials; “effectiveness” and “effective” refer to how interventions perform in the real world.]
Let’s forget about “efficacy” and cite “helpfulness” instead
The protocol for the PRINCE Secondary Trial, published in BMC Psychiatry, clearly stated: “Efficacy will be assessed by examining the difference between arms in the primary outcome Work and Social Adjustment Scale (WSAS) at 52 weeks after randomisation.” The trial itself notes that the WSAS “was chosen as the primary outcome because the focus of therapy was on targeting processes which might result in a reduction of the impact of symptoms.” In other words, the TDT-CBT was aimed specifically at influencing the cognitive and behavioral factors that were presumed to be preventing PPS patients from full engagement in their work and social lives.
Ok, then. After conducting their due diligence and assessing all the earlier studies and possible outcome measures in the process of developing an authoritative protocol, the investigators determined how they wanted everyone to definitively measure the impact of their intervention. The WSAS is a 40-point scale. The investigators calculated that a difference of 3.6 points or more on the scale would be considered clinically significant. That is, any change of less than 3.6 points would be of no significant clinical benefit to an individual—it would be an essentially meaningless blip that would not translate into a noticeable improvement.
At 52 weeks, the mean WSAS score of those who received the intervention was only 1.48 points lower than that of those who did not. (Lower WSAS scores represent improvement.) The p-value was 0.139—far from the 0.05 threshold needed to be considered statistically significant. So the WSAS 52-week findings were both clinically and statistically insignificant. Moreover, the entire confidence interval range (-3.44 to 0.48) fell short of the designated 3.6-point threshold for clinical significance. These are pretty unequivocal results. They are definitely not useful to those seeking to promote the use of CBT as a treatment for PPS and MUS.
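The arithmetic of that claim is easy to verify. Here is a minimal sketch using the figures quoted above; the sign convention (negative difference = intervention group improved more than the comparison group) is an assumption for illustration:

```python
# Sketch checking the 52-week WSAS result against the trial's own thresholds.
# Numbers are those quoted in this post; the sign convention is assumed.

MCID = 3.6                     # pre-specified clinically significant difference
mean_diff = -1.48              # intervention minus control, WSAS at 52 weeks
ci_low, ci_high = -3.44, 0.48  # 95% confidence interval for the difference
p_value = 0.139

statistically_significant = p_value < 0.05

# Every value in the confidence interval is smaller in magnitude than the
# MCID, so a clinically significant effect is excluded at the 95% level.
clinically_significant_possible = abs(ci_low) >= MCID or abs(ci_high) >= MCID

print(statistically_significant)        # False
print(clinically_significant_possible)  # False
```

Even the most optimistic end of the confidence interval (-3.44) falls short of the 3.6-point bar the investigators themselves set.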
That’s why the conclusion of the paper’s abstract is so striking, and so bizarre. An abstract’s conclusion is what many people who scan a paper will likely remember the most. Here is the entire conclusion of this abstract: “We have preliminary evidence that TDT-CBT + SMC may be helpful for people with a range of PPS. However, further study is required to maximise or maintain effects seen at end of treatment.”
Anyone who takes the time to review the paper should be mystified by this conclusion. This full-scale trial was approved because a lot of earlier research, as outlined in the protocol, had produced ample “preliminary evidence” of the kind mentioned in the conclusion. The protocol, unless I misread it, did not propose to produce more “preliminary evidence” that the TDT-CBT intervention “may be helpful.” PRINCE Secondary was presented in the protocol and received funding based on the notion that it would produce hard data about “the efficacy and cost-effectiveness” of the intervention. (The Psychological Medicine paper did not include the “cost-effectiveness” data.)
It should be noted that “helpfulness” is not the same as “efficacy” and is not defined in the protocol or the trial itself. An intervention might be “helpful” in some way as a supportive strategy while having no “efficacy” as an actual treatment. In this trial, the method of assessing the “efficacy” of the treatment was clearly designated; the results did not achieve that metric, so the treatment cannot be described as “efficacious.” As a vague stand-in, “helpfulness” sounds positive but can mean more or less anything—as it seems to here.
In the paper, the investigators designate eight secondary outcomes. They tout marginal improvements in three of them as indicating possible “helpfulness.” But the results suggest at best the following: Giving people eight sessions of encouragement and attention could prompt them to upgrade their answers by one step or two on some—but not most—questionnaires, compared to those who do not receive eight weeks of such encouragement and attention. That’s it. Expansive interpretations of “helpfulness” are not justified.
Let’s examine these secondary results in a bit more detail. The first is the WSAS at 20 weeks, which reported a 2.41-point difference between the groups. This is still below the 3.6-point threshold for being a clinically significant difference. And as expected because of the bias inherent with self-reported outcomes in unblinded studies, even this minimal apparent effect was not maintained by the 52-week point. (The WSAS at 20 weeks was not in fact listed as a secondary outcome in the protocol.)
Other results are also unimpressive. Five of the eight listed secondary outcomes did not produce statistically significant findings. Two others did. The intervention group achieved a 1.51-point difference from the comparison group on the 30-point Patient Health Questionnaire 15 and a 0.55-point difference on the 9-point Clinical Global Impression scale. These minimal reported improvements do not provide convincing evidence in favor of the intervention, since they are well within the range of responses that one might expect from the kind of bias noted by the editors of the Journal of Psychosomatic Research in an unblinded study with subjective outcomes.
The problem of multiple comparisons
And there’s another issue here. When authors engage in multiple comparisons, they increase the likelihood of obtaining some results that reach statistical significance by chance. To compensate, it is common to correct or adjust the results with standard statistical methods—the Bonferroni correction being the best known. Yet the PRINCE investigators don’t like that approach. It is too stringent for their needs, so they decided not to use it. Here’s what they say:
“Throughout this paper, we present unadjusted p values. Methods for adjusting the family-wise error by methods such as the Bonferroni correction are known to be conservative. However, if one were to use a method that controlled the false-discovery rate such as the Benjamini–Hochberg procedure then the differences on PHQ-15, WSAS at 20 weeks and CGI remained statistically significant and would therefore be considered as discoveries after correction for all nine outcomes (eight secondary plus primary outcome).”
My creative interpretation of this statement: Our findings are weak, and with proper adjustment they would look even weaker than they appear as reported. That’s why we didn’t bother to calculate and present p-values that took into account and corrected for how many tests we ran to try to find statistically significant results. Also, the standard method to correct for multiple tests in a study like this is really, really tough to get through, so we’re not going to use it. But we can assure you that a correction method we like better still lets us call these results “discoveries”! (We’re not presenting those corrected results, but trust us on this one–the study is a success.)
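For readers unfamiliar with these corrections, a minimal sketch of how the two procedures differ may help. The p-values below are hypothetical, chosen only to show that Benjamini–Hochberg (which controls the false-discovery rate) can declare “discoveries” that the more conservative Bonferroni correction (which controls the family-wise error rate) would not:

```python
# Sketch of Bonferroni vs. Benjamini-Hochberg for nine tests (as in PRINCE:
# one primary plus eight secondary outcomes). The p-values are hypothetical.

def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if p < alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= k/m * alpha
    and reject the k smallest p-values (false-discovery-rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

# Nine hypothetical p-values (not the trial's actual data)
ps = [0.005, 0.009, 0.020, 0.139, 0.30, 0.50, 0.60, 0.70, 0.80]

# Bonferroni's per-test threshold is 0.05/9 ~= 0.0056, so only the smallest
# p-value survives; Benjamini-Hochberg also admits the second-smallest.
print(sum(bonferroni(ps)))          # 1
print(sum(benjamini_hochberg(ps)))  # 2
```

The trial’s reported p-values would need to be run through one of these procedures to know how many “discoveries” actually survive; the authors assert the outcome of the more lenient method without presenting the corrected values.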
An abstract’s conclusion should at least make an effort to incorporate the findings for the primary outcome—the main results of interest. The conclusion from Chalder and colleagues should have forthrightly noted that the intervention was not efficacious. In this context, to prioritize exceedingly modest unadjusted results from a minority of secondary outcomes over the null results of the primary outcome is an insult to readers. This tactic for hiding bad news reeks of desperation—like a bald man’s comb-over.
Furthermore, it is disingenuous for investigators to assert that these meager data represent “preliminary evidence” for the intervention when the primary outcome was a bust. The call for further research to study how to maintain or extend these effects is unwarranted. The intervention failed to produce the predicted and desired effects. That’s the only credible interpretation of these disastrous results.
Peer reviewers and journal editors are supposed to act as safeguards against misrepresentations. In this case, the system failed. A conclusion that does not mention the null results for the primary outcome is unacceptable—and it should have been unpublishable.
7 responses to “Null Outcomes Presented as Success in Yet Another CBT Trial from Prof Trudie Chalder”
They are so incompetent that they were not able to torture the data into statistical significance, no matter how hard they pushed on the scale. What a bunch of losers!
This would be a bit entertaining in a gossipy way, except the mountains of psycho-magic “research” are having devastating effects well beyond the Wessely School’s ivory tower, as rational thinking and logic are tossed aside and replaced with magical thinking.
Even now there is a person with a master’s degree telling me I need an energy healer. Adverse childhood events, toxins, and negative thoughts are keeping me ill. Or perhaps I am fearful of what life might be like if I were well.
Well I am made from fairly tough stuff, apparently, so I don’t take crazy talk personal, mostly. However I am very much alarmed that professional people have convinced themselves that magic exists, and they are selling it to others. This is a huge step backwards into the new dark ages.
I wonder if Sir Simon is proud of his accomplishments in the regression of Science…
Thanks for an excellent run-down on this paper. I wonder if ‘Cambridge Core’ is aware that ‘Psychological Medicine’ is publishing stuff like this.
Did this trial meet its required sample size? At 52 weeks it looks like there were only results for 115 patients in the CBT treatment arm? And it seems that the inclusion/exclusion criteria were changed to include gastroenterology clinics and IBS patients after the trial had started, so gaining 72 gastroenterology patients (36 per arm). A gastroenterology/IBS patient population wasn’t included in the original HRA approval (Version 1 – IRAS ID 156145 – https://www.hra.nhs.uk/planning-and-improving-research/application-summaries/research-summaries/prince-secondary-version-1/), but perhaps the approval was updated later or that doesn’t matter? (I haven’t been able to find an update on the HRA site, but I may not be looking in the right place.)
Another issue is the use of the “minimum clinically important difference” (MCID) as an indicator of effect at the group level. Although frequently done, this is not correct. As you stated, the MCID is the minimal change, in points, that would be of significant clinical benefit to an individual. It is not appropriate to apply this individual threshold to determine clinical effectiveness at the group level (see Dworkin 2009). It makes no sense for one individual with a huge effect to nearly cancel out the patients who had no effect, and then to conclude that the therapy is clinically effective at the group level because the average difference exceeds the MCID.
I would push back to the beginning and question whether “persistent physical symptoms” actually constitutes a valid entity worthy of study. It is probably a wide variety of highly disparate conditions, many of which have been misdiagnosed by their doctors anyway.
Thank you again, David, for your hard work.
I’m not an academic but I understood everything you wrote concerning this paper.
Yes indeed, the paper is an insult to the reader, and yet still it’s allowed.
Am I sad or angry that they can do this? It’s both, because my health, as someone with M.E., depends on the reporting of the results of these ‘trials’.
Their changing of the wording from ‘efficacious’ to ‘helpfulness’ is still not, as my mother would have said, ‘going to wash’!
And also, as my mother would have said, ‘they are trying to pull the wool over someone’s eyes’.
My mother was very wise lol.
Thank you for this.
I find your comments on “medically unexplained symptoms” (MUS) particularly current and relevant. Is MUS the new CFS/M.E.? After becoming severely ill for the second time across a 40-year span and many years of acute suffering, I find nothing has changed bar terminology. MUS seems to be the new route to dumping us at the psych doors. My GP is now using this to pass the buck, as science has sadly, in my 40 years of illness, failed to shed enough light on the actual cause of M.E. Let’s be honest, this is due to biased attitudes and gross underfunding of decent research. How many grants have found their way into Psychology departments? Instead of biomedical research, monies are channelled into what at times can only be described as ridiculous studies, e.g. where music has been found to reduce the risk of children developing M.E.??? You can probably show that for a million and one diseases!!! It’s high time this misappropriation of resources was ended. It feels almost criminal at times.
Oh, just for the record, I’m in Psychology; I’ve had to give up my PhD research due to becoming ill again. During my “better-ish years” I went and studied psychology. One reason was to try and understand where these academics were coming from: was it really “all in my head”? Were my beliefs and conceptions grossly misplaced? I was right. Psychology is damaging a group of people who need to be treated biomedically. Statistics can be skewed to show lots of things, lots of damaging things! I shall now retreat to my bed as the sweat is lashing out of me, my head is about to explode, and the dizziness will become so severe that I may pass out. Please call me a Psychiatrist – I need their help (NOT!)
PPS: Psychological Persecution of the Sick.