New Paper Seeks to Reframe Poor Findings in CODES Trial of CBT for Non-Epileptic Seizures

By David Tuller, DrPH

The CODES trial investigated cognitive behavior therapy (CBT) as a treatment for dissociative seizures (DS), a sub-category of what is now called functional neurological disorder (FND). The intervention was a course of CBT specifically designed to address the variety of factors presumed to be triggering the seizures. (I have previously critiqued CODES here, here, and here.)

However, the trial was a bust, with null findings for the self-reported primary outcome–seizure reduction 12 months after randomization. In fact, in what must have been a major embarrassment for the investigators, the group that did not receive the intervention reported a greater reduction of seizures than the group that did, although this difference was not statistically significant.

(In the past, seizures not believed to have been caused by abnormal electrical signals have generally been called “psychogenic non-epileptic seizures.” The new term is meant to be less insulting; patients often resent being told their conditions are psychologically driven.)

Since these null CODES findings were published in 2020, FND experts have tried to reframe them by, among other strategies, suggesting that seizure reduction wasn’t the most appropriate or relevant primary outcome after all. The investigators themselves raised this notion in the initial paper reporting the CODES results. In an accompanying commentary, a colleague of the investigators promoted a similar notion, suggesting that quality-of-life measures were perhaps a better primary outcome than seizure reduction.

Now they’re at it again. In a new paper called “Reflections on the CODES trial for adults with dissociative seizures: what we found and considerations for future studies,” published this month by BMJ Neurology Open, the key CODES investigators present additional analyses of the trial data and try to argue that the results weren’t really so bad. The paper includes the following sentence: “Overall, inspection of our data does not support others’ suggestions that our treatment did not sustainably reduce DS frequency.”

This is a bizarre remark. It obviously implies that the data from CODES support the idea that the specialized treatment did, in fact, “sustainably reduce DS frequency.” However, CODES provided no evidence that the treatment did any such thing. The primary goal of the trial was not even to investigate whether participants in the intervention arm had reduced seizure frequency but whether the intervention showed benefit—that is, whether those who received the intervention did better than those who did not. And that didn’t happen.

In CODES, both arms experienced some seizure reduction, but the intervention did not provide any advantages in that regard. The reduction in seizure frequency cannot be attributed to the intervention, even if the investigators now appear to be claiming otherwise.

(I could be misinterpreting the above sentence, but I don’t think so. I think the investigators truly believe, notwithstanding the evidence, that their trial documented some impact from the intervention.)

With 368 participants, CODES was the largest clinical trial to date of a treatment for FND. The senior author was the factually and mathematically challenged Trudie Chalder, a professor of cognitive behavior therapy at King’s College London (KCL). The press release from KCL scammed the public by burying the disastrous findings for the primary outcome and instead touting the trial as a big success—a claim based on some subjective secondary outcomes with modestly positive findings that really mean nothing at all.

The new paper explains that, in the CODES model, “DS are maintained by a vicious circle of behavioural, cognitive, affective, physiological and social factors of which fear and avoidance are particularly salient.” This framework, the paper notes, “lends itself to the application of CBT interventions, particularly graded exposure to feared (avoided) situations and seizure interruption and control techniques.”

In the new paper, the investigators provide, perhaps inadvertently, a clue as to why CODES was destined to be a failure. As they explain, seizure reduction six months after the end of treatment was the primary outcome in a pilot study of CBT for DS, published in 2010: “In the pilot RCT [randomized controlled trial], 6 months after treatment there was an observed post-randomisation difference in favour of the DS-CBT group, but it could not be shown to be statistically significant.”

Exactly–the pilot study had null results. And yet the investigators were able to convince funders that the evidence warranted a test of the intervention in a full-scale trial. Is there something wrong with this picture? Why is anyone surprised that the full trial also had null results for seizure reduction at follow-up?

In the new paper, the investigators again present creative reasons to re-interpret the null findings from CODES. They note that the CODES comparison arm provided more than the standard care patients would have received outside the trial context. The participants in the comparison arm received some of the explanatory information and coping guidance that was available to those in the intervention arm, even though they did not receive the intervention’s active CBT component. From the perspective of the investigators, then, the null results for the primary outcome seem to mean that both arms benefited from the approach embodied by the intervention—not that the intervention was ineffective.

(I think I’m understanding their point, although I can’t be sure.)


Primary and secondary outcomes

The new paper includes a lengthy discussion of the choice of primary outcome. The investigators first mention that funders required it—even though they themselves have a long history of defending seizure reduction as the primary outcome. In the pilot study, the investigators explicitly rejected the idea that other metrics might be more suitable. Presumably they took that step after careful consideration of other possibilities.

Here’s what they wrote in the pilot:

“Our CBT approach is predicated on the assumption that PNES represent dissociative responses to arousal, occurring when the person is faced with fearful or intolerable circumstances. Our treatment model emphasizes seizure reduction techniques especially in the early treatment sessions. While the usefulness of seizure remission as an outcome measure has been questioned, seizures are the reason for patients’ referral for treatment.”

That reasoning still makes sense. Since the investigators specifically designed the intervention to achieve seizure reduction based on their hypothetical understanding of the disorder’s etiology, it is not immediately clear why seizure reduction should not be the primary outcome. If they are now abandoning this metric as not so important after all, are they also questioning the biopsychosocial theories that informed the creation of the intervention? If not, why not?

Failure of an intervention should lead smart investigators to question their assumptions—but that doesn’t seem to have happened with CODES. The investigators still seem to believe the trial should be viewed as a success, making much of the fact that nine of their 16 secondary measures had findings that were statistically significant. But let’s be clear: This was an unblinded study relying on self-reported (or, in one case, physician-reported) outcomes—a trial design subject to an enormous amount of possible bias. It would be unexpected for the intervention group not to report modestly better outcomes from bias alone.

(The primary outcome and three of the secondary outcomes involved patients’ reports of the number of seizures. The self-reporting of seizures has the appearance as well as some aspects of objectivity, but it is still subjective and potentially influenced by bias.)

My colleague Philip Stark, a professor of statistics at UC Berkeley, made the following assessment of CODES:

“The trial did not support the primary clinical outcome, only secondary outcomes that involve subjective ratings by the subjects and their physicians, who knew their treatment status. This is a situation in which the placebo effect is especially likely to be confounded with treatment efficacy. The design of the trial evidently made no attempt to reduce confounding from the placebo effect. As a result, it is not clear whether CBT per se is responsible for any of the observed improvements in secondary outcomes.”

I highlighted Professor Stark’s assessment in a 2020 post, which also included my own observations about the secondary outcomes. Here’s the relevant passage:

“The investigators included 16 secondary outcomes in the study, measured either through questionnaires or the seizure diaries, and reported statistically significant findings for nine of them: seizure bothersomeness, longest period of seizure-free days in the last six months, health-related quality of life, psychological distress, work and social adjustment, number of somatic symptoms, self-rated overall improvement, clinician-rated overall improvement, and satisfaction with treatment. Although many of these findings were modest, the array appeared impressive.

Yet the seven outcomes that failed to achieve statistically significant effects also constituted an impressive array: seizure severity, freedom from seizures in the last three months, reduction in seizure frequency of more than 50% relative to baseline, anxiety, depression, and both mental and physical scales on a different instrument assessing health-related quality of life than the one that yielded positive results.

So parsing these findings, CBT participants reported that the seizures were less bothersome than in the SMC group, but not less severe. They reported benefits on one health-related quality-of-life instrument, but not on two separate scales on another health-related quality-of-life instrument. They reported less psychological distress, but not less anxiety and depression. When viewed from that perspective, the results seem somewhat arbitrary, with findings perhaps dependent on how a particular instrument framed this or that construct.

“If investigators throw 16 packages of spaghetti at the wall, some of them are likely to stick. The greater the number of secondary outcomes included in a study, the more likely it is that one or more will generate positive results, if only by chance. Given that, it would make sense for investigators to throw as many packages of spaghetti at the wall as feasible, unless they have to pay a statistical penalty for having boosted their odds of apparent success.

The standard statistical penalty involves accounting for the expanded number of outcomes with a procedure called correcting (or adjusting) for multiple comparisons (or analyses). In such circumstances, statistical formulae can be used to tighten the criteria for what should be considered statistically significant results–that is, results that are very unlikely to have occurred by chance.

The CODES protocol made no mention of correcting for this large number of analyses, or comparisons. The CODES statistical analysis plan included the following, under the heading of “method for handling multiple comparisons”: ‘There is only a single primary outcome, and no formal adjustment of p values for multiple testing will be applied. However, care should be taken when interpreting the numerous secondary outcomes.’

In other words, the investigators decided not to perform a routine statistical test despite their broad range of secondary outcomes. It is fair to call this a questionable choice, or at least one that departs from the approach advocated by many trial design experts and statisticians, such as Professor Stark, my Berkeley colleague. A self-admonition to take care “when interpreting the numerous secondary outcomes” is not an appropriate substitute for an acceptable statistical strategy to address the potpourri of included measures.

Despite this lapse, it appears that someone–perhaps a peer-reviewer?–questioned the decision to completely omit this statistical step. A paragraph buried deep in the paper mentions the results after correcting for multiple comparisons, with no further comment on the implications. Of the nine secondary outcomes initially found to be statistically significant, only five survived this more stringent analysis: longest period of seizure-free days in the last six months, work and social adjustment, self-rated overall improvement, clinician-rated overall improvement, and treatment satisfaction.

Let’s be clear: These are pretty meager findings, especially since they are self-reported measures in an open-label trial. For example, it is understandable and even expected that those who received CBT would report more “treatment satisfaction” than those who did not receive it. It is also understandable that a participant who received a treatment and the clinician who treated that participant would be more likely to rate the participant’s health as improved than when compared to the SMC group. And a course of CBT could well help individuals with medical problems adjust to their troubling condition in work and social situations.

“None of this means that the core condition itself has been treated–especially since those who did not receive CBT had better results for the primary outcome of seizure reduction at 12 months.”
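
The arithmetic behind the spaghetti analogy in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reanalysis of CODES: it treats the 16 secondary outcomes as independent tests at the conventional 0.05 level (in reality they are correlated, so the true figure would differ), and it uses the Bonferroni and Holm procedures as familiar examples of multiple-comparison corrections. I am not claiming these are the methods the CODES investigators applied, and the p-values in the usage lines are made up.

```python
def family_wise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests.

    Assumption: the tests are independent and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / num_tests


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, in the original order, marking which
    results remain statistically significant after correction.
    """
    m = len(p_values)
    # Visit the p-values from smallest to largest, remembering positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


if __name__ == "__main__":
    # With 16 outcomes and no correction, a chance "hit" is more likely than not.
    print(round(family_wise_error_rate(16), 2))  # 0.56
    print(bonferroni_threshold(16))
    # Hypothetical p-values, NOT taken from the CODES paper.
    print(holm_reject([0.001, 0.04, 0.03, 0.005]))
```

Under those assumptions, the chance of at least one spurious “positive” among 16 uncorrected outcomes is roughly 56 percent, which is why a tally of nominally significant secondary results carries little weight until a correction of this kind has been applied.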


Even as they present all of their additional analyses, the CODES investigators are ignoring the admonition they themselves included in their statistical analysis plan: that “care should be taken when interpreting the numerous secondary outcomes.” As this latest paper shows, they have taken zero such care. The new paper doesn’t even mention that only five of the secondary outcomes were statistically significant after adjustment for multiple comparisons, a telling omission. The whole thing reads like a desperate attempt to portray their intervention as having had some meaningful effect. The CODES data tell a different story.