By David Tuller, DrPH
Members of the CBT/GET ideological brigades produce a gusher of dreck, and I don’t bother commenting on most of their work. Life’s too short.
So it can be easy to lose sight of how flawed and truly awful each individual paper can be. But even among this flood of scientifically deficient research, a recent paper in the journal Occupational Medicine distinguishes itself. I’ve blogged about it here and here. Dr Mark Vink and Dr Keith Geraghty tweeted about it, respectively, here and here.
The article’s elementary errors about its own core findings reflect a startling degree of incompetence. I had to read the paper a number of times to convince myself that corresponding author Professor Trudie Chalder and her colleagues had mangled basic statistics so badly. It seems very unlikely that anyone would directly contradict the data in their own tables and make such self-evident misstatements intentionally. It’s not as if these specific errors enhanced the attractiveness of the reported findings.
It was also hard to understand why no one involved in producing and publishing this mess noticed that something was amiss.
Methodological and design lapses in papers, however, can also arise from the desire and intention to obscure problematic or unappealing results. This Occupational Medicine paper from Professor Chalder and her colleagues contained such defects as well. It should never have been submitted in this form, nor should it have been accepted for publication.
Professor Brian Hughes, a psychologist at the National University of Ireland, Galway, and I have written to Occupational Medicine outlining our major concerns and calling for the paper to be retracted. You can read our letter below or on a pre-print server here.
**********
Dear Editor,
Occupational Medicine recently published a paper from Stevelink et al. (2021) called “Chronic fatigue syndrome and occupational status: a retrospective longitudinal study.” Unfortunately, the paper features major technical and methodological errors that warrant urgent editorial attention.
To recap: The study started with 508 participants; the primary outcome was occupational status. Many participants had dropped out by follow-up, and only 316, or 62%, provided follow-up data. Of those 316, 88% reported no change in employment status. As a group, the participants experienced either no changes or only insignificant ones in a range of secondary outcomes, including fatigue and physical function. The poor follow-up scores on fatigue and physical function alone indicate that the group remained, collectively, severely disabled after treatment.
In several sections of the paper, the authors’ description of their own statistical findings is incorrect. They make a recurring elementary error in their presentation of percentages. The authors repeatedly use the construction “X% of patients who did Y at baseline” when they should have used the construction “X% of all 316 patients (i.e., those who provided follow-up data)”. This recurring error involving the core findings undermines the merit and integrity of the entire paper.
For example, in the Abstract, the authors state that “53% of patients who were working [at baseline] remained in employment [at follow-up].” This is not accurate. Their own data (Table 2) show that 185 patients (i.e., 167 + 18) were working at baseline, and that 167 patients were working at both time points. In other words, the proportion working continuously was in fact 90% (i.e., 167 out of 185). The “53%” that the authors refer to is the percentage of the sample who were employed at both time points (i.e., 167 out of 316), which is an entirely different subset. They have either misunderstood the percentage they were writing about, or they have misstated their own finding by linking it to the wrong percentage.
This error is carried over into the section on “Key Learning Lessons”, where the authors state that “Over half of the patients who were working at baseline were able to remain in work over the follow-up period…” While 90% is certainly “over half”, it seems clear that this phrasing is again incorrectly referring to the 53% subset.
The same error is made with the other key findings. For example, the Abstract states that “Of the patients who were not working at baseline, 9% had returned to work at follow-up”. But as above, this is incorrect. A total of 131 patients (i.e., 104 + 27) were recorded as “not employed” at baseline, and 27 of them were recorded as working at follow-up. This is 21%, not 9%. Once again, the authors appear to misunderstand their own findings. The “9%” they refer to is a percentage of the sample of 316; it is not, as they have it, a percentage of that subset of the sample who were initially unemployed. This erroneous “9%” conclusion appears as well in the “Key Learning Lessons” and in the Discussion.
And again, the authors state in the Abstract that “of those working at baseline, 6% were unable to continue to work at follow-up”, a claim they repeat in the section on “Key Learning Lessons” and in the Discussion. This statement too is wrong. Once more, the authors mistakenly interpret a percentage of the sample of 316 as if it were a percentage of a targeted subset. In this case, they think they are referring to a percentage of patients working at baseline, but they are actually referring to a percentage of the full group that provided follow-up data.
The authors present the raw frequency data in Table 2. Readers can see for themselves how their sample of 316 patients is cross-tabulated into four subsets of interest (i.e., “working at baseline and follow-up”; “not working at baseline and follow-up”; “dropped out of work at follow-up”; “returned to work at follow-up”). From Table 2, it is clear that the prose provided in the body of the paper is at odds with the actual data.
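For anyone who wants to check the arithmetic, the following minimal sketch (written in Python, with variable names of our own choosing) recomputes both sets of percentages from the four Table 2 cell counts cited above:

    # Table 2 cell counts as quoted above
    working_both = 167         # working at baseline and at follow-up
    stopped_working = 18       # working at baseline, not working at follow-up
    returned_to_work = 27      # not working at baseline, working at follow-up
    not_working_both = 104     # not working at baseline or at follow-up

    total = working_both + stopped_working + returned_to_work + not_working_both   # 316
    working_at_baseline = working_both + stopped_working                           # 185
    not_working_at_baseline = returned_to_work + not_working_both                  # 131

    # Percentages with the correct denominators (the relevant baseline subsets)
    print(f"Remained in work: {working_both / working_at_baseline:.0%}")            # 90%, not 53%
    print(f"Returned to work: {returned_to_work / not_working_at_baseline:.0%}")    # 21%, not 9%
    print(f"Stopped working:  {stopped_working / working_at_baseline:.0%}")         # 10%, not 6%

    # The figures reported in the paper instead use the full follow-up sample as the denominator
    print(f"{working_both / total:.0%}, {returned_to_work / total:.0%}, {stopped_working / total:.0%}")   # 53%, 9%, 6%

As the output makes plain, the paper’s 53%, 9% and 6% are proportions of the whole follow-up sample of 316, not of the baseline subsets the prose claims to describe.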
It is undeniable that the text of this paper is replete with elementary technical errors, as described. Inevitably, the narrative is distorted by the authors’ failure to understand and correctly explain their own findings. It is unclear to us how these basic and self-evident errors were not picked up during peer-review. Although we don’t know the identities of the peer-reviewers, we speculate that groupthink and confirmation bias will have played their part. After all, it is generally reasonable for peer-reviewers to presume that authors have understood their own computations.
There are several other features of this paper that cause concern. These include the following:
• The authors state that they evaluated participants using guidance from the UK’s National Institute for Health and Care Excellence (NICE). (Presumably they are referring to the 2007 NICE guidance, not the revision published in October 2021.) But the reference for this statement is a 1991 paper that outlines the so-called “Oxford criteria”, a case definition that differs significantly from the 2007 NICE guidance. Moreover, in a paper about the same participant cohort previously published by Occupational Medicine (“Factors associated with work status in chronic fatigue syndrome”), the authors state explicitly that these patients were diagnosed using the Oxford criteria. This inconsistency is non-trivial, because the differences between these two diagnostic approaches have substantive implications for how the findings should be interpreted. The authors’ confusion over the matter is hard to comprehend and raises fundamental questions about the validity of their research.
• According to Table 1, there were either no changes or no meaningful changes in average scores for fatigue, physical function, and multiple other secondary outcomes between the preliminary sample of 508 and the final follow-up sample of 316. The authors themselves acknowledge that the patients who dropped out before follow-up were likely to have had poorer health than those who remained. Therefore, the fact that Table 1 presents combined averages for the entire preliminary sample — i.e., combined averages for patients who dropped out and those who did not — muddies the waters. Presenting combined baseline scores for all patients will mask any declines that occurred for these variables in the subset who were followed up. It would have been far more appropriate to have isolated and presented the baseline data for the 316 followed-up patients alone. Doing so would have reflected the authors’ research question more correctly, as well as enabling readers to make their own like-with-like comparisons. (A purely illustrative sketch of this masking effect appears after this list.)
• Finally, the authors state that “Studies into CFS have placed little emphasis on occupational outcomes, including return to work after illness.” However, they conspicuously fail to mention the PACE trial, a high-profile large-scale British study of interventions for CFS. The PACE trial included employment status as one of four objective outcomes, with the data showing that the interventions used — the same ones as in the Occupational Medicine study — have no effect on occupational outcomes. This previous finding is so salient to the present paper that it is especially curious the authors have chosen to omit it. The omission is all the more disquieting given that the corresponding author of the paper was a lead investigator on the PACE trial itself.
Authors of research papers have an obligation to cite seminal findings from prior studies that have direct implications for the target research question. Not doing so — especially where there is overlapping authorship — falls far short of the common standards expected in scientific reporting.
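To illustrate the masking effect flagged in the second bullet point above, here is a deliberately simplified sketch using purely hypothetical numbers chosen only for arithmetic clarity; none of these scores come from the paper. If the dropouts had poorer baseline scores than the completers, the combined baseline mean for all 508 sits above the completers’ true baseline, so a genuine decline in the followed-up group can look trivial when measured against it:

    # Purely hypothetical illustration; none of these scores come from the paper.
    # Assume higher scores mean worse fatigue.
    n_completers, n_dropouts = 316, 192

    baseline_completers = 24.0   # hypothetical baseline mean for those followed up
    baseline_dropouts = 30.0     # hypothetical baseline mean for dropouts (poorer health)
    followup_completers = 27.0   # hypothetical follow-up mean: the completers have declined

    combined_baseline = (n_completers * baseline_completers
                         + n_dropouts * baseline_dropouts) / (n_completers + n_dropouts)

    print(f"Combined baseline mean (all 508): {combined_baseline:.1f}")                                # 26.3
    print(f"Apparent change vs combined baseline: {followup_completers - combined_baseline:+.1f}")     # +0.7, looks trivial
    print(f"Actual change for the 316 completers: {followup_completers - baseline_completers:+.1f}")   # +3.0, a real decline

The particular numbers are invented; the point is that a like-with-like comparison requires the baseline scores of the 316 followed-up patients, not combined baseline scores for all 508.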
Even putting these additional matters aside, the technical errors that undermine this paper’s reporting of percentages render its key conclusions meaningless. The sentences used to describe the findings are simply incorrect, and the entire thrust of the paper’s narrative is thereby contaminated. We believe that allowing the authors to publish a correction to these sentences would create only further confusion.
We therefore call on the journal to retract the paper.
Yours, etc.
Brian M. Hughes
School of Psychology
National University of Ireland, Galway
David Tuller
University of California, Berkeley
References
Stevelink, S. A. M., Mark, K. M., Fear, N. T., Hotopf, M., & Chalder, T. (2021). Chronic fatigue syndrome and occupational status: a retrospective longitudinal study. Occupational Medicine. Online ahead of print. doi: 10.1093/occmed/kqab170.
Conflicts of Interest
David Tuller is a senior fellow in public health and journalism at the Center for Global Public Health at the University of California, Berkeley; members of the ME/CFS patient and advocacy community have donated to crowdfunding campaigns in support of his position at Berkeley.
Comments
4 responses to “A Letter to Occupational Medicine From Brian Hughes & Me About Prof Chalder’s Latest Disaster”
It should be very hard for the cohort to fight you when you are so specific about exactly what they did wrong. If OM’s editors don’t have the statistical background (it’s kind of an odd journal for this kind of paper), surely they have access to people who do. You’ve provided the blueprint – let’s see what comes of it.
Glad to see the gloves coming off more. We all wondered whether the NICE Guidelines foofaraw meant we would be subjected to more decades of the same – and they finally decided, heavy-handedly, to come down on the side of their own process. And now, in the final convulsions of the end, they are showing a surprising clumsiness in talking back.
I wonder how long they can keep it up, and hope it isn’t for very long. The money and time and effort wasted can go for the purpose of helping us – and the victims of other post-viral syndromes, not excluding the huge number of long-covid sufferers.
Please keep it up.
Bravo David and Brian! Thank you for your brilliant examination of this shambolic, unprofessional paper.
Do my eyes deceive me, or am I right in saying that the authors of this paper include 3 professors and a senior lecturer? If I’m right then what does that say about the standards of the institution/s that they work for?
I wonder – would it be worth informing the relevant institution/s of the problems with this paper? I imagine that their academic reputation/s will be important to them, no?
Even in informal market research to be read by one person you’d be expected to use (N=316), (N=10) etc. for each percentage or number you report as a result. It blows my mind that these individuals submit papers where they pick out numbers for different ‘measures’ they cherry-pick across papers and then do not provide this basic etiquette next to them. The Sept 2021 paper, for example, had this data hidden in ‘figures’, which showed that even the N=71 out of the 1506 registered patients became N=53, N=41 etc. (I can’t remember the exact numbers but was astounded at how small they became) for the different questions and graphs (looking at things like mood), as clearly even those who completed questionnaires left those questions blank.
To any layperson who has to look at basic figures (as many jobs require) and has had this pointed out (their rhetoric is charming until you are jolted into asking ‘where’s the N?’, if you aren’t from a background where you write these reports), hiding this attrition of sample size for all the claims surely must look like deceit.
You don’t choose to write this out on separate A4 pages that one has to cross-compare back to the relevant bits, and then append as a figure somewhere in the paper, rather than adding four characters at the end of a one-line claim, just because you are slapdash, lazy or forgetful. You’d go back during proofing and insert (N=x) after the relevant statement.
How on earth are these papers getting through so many hands (and then readers), and past people who are apparently top of their tree and doing research daily (and who, if a low-paid admin person passed them an informal survey results summary that missed this out, would tell them to amend it), without this being enforced? The standards in some of these parts of academia seem shockingly low compared to what £12k interns in whatever business might be held to. Is it the REF vs RAE embedding this ‘quality doesn’t matter, friends (or citations) do’ attitude?
I watched the video of Chalder a week or two ago, where she was on a video call with docs about long covid, going through her presentation, and she just comes across as an utter blagger, and a one-dimensional one at that (when asked a serious question she just reverted to the clichés of telling them to ask people if they’ve got kids, or other sources of stress). It would be embarrassing in a real workplace and they’d never be invited back (and would have shown themselves up if they were an employee), but I couldn’t tell whether some of them were lapping it up simply because it was what they themselves wanted to hear (and no one pushed or asked any hard questions).
Is this the norm: does no one dare to ask the questions (a culture issue caused by hierarchy) that would be expected in any other context? Or did none of them care to check? If the latter, it indicates a shocking lack of self-awareness in the profession about how far they are from being led by, or focused on, facts first (and from calling something a ‘blag’ if the facts don’t add up), and from holding people accountable to substantiate their suggestions with facts.
Either would explain the parrot-fashion phrase ‘mind influencing body’ that too many medics and allied professionals now seem to be reciting despite not knowing a blind thing about whether it is tosh (just like a Chinese-whispers phrase they’ve been made to learn by heart). It’s almost as if they are being made to do this, and then come to embed it in their own beliefs simply through repeated recitation, which after all is basically a technique for indoctrination.