Dutch Team Offers “Dog-Ate-My-Data” Excuses for Not Reporting Null Objective Findings

By David Tuller, DrPH

Two months ago, Clinical Infectious Diseases (CID), a high-impact journal, published a study called “Efficacy of Cognitive-Behavioral Therapy Targeting Severe Fatigue Following Coronavirus Disease 2019: Results of a Randomized Controlled Trial.” The study, nicknamed ReCOVer and conducted in the Netherlands, purported to provide the “first evidence for the positive effect of CBT in patients with severe post–COVID-19 fatigue.” The study received widespread attention and has been highlighted credulously on social media and in news articles—including a recent article in Slate.

The 114 subjects were randomized to receive either a 17-week course of CBT, called Fit after COVID, or “care as usual” (CAU). When this study was announced a few years ago, I noted that it was an unblinded study relying on subjective outcomes—a recipe for generating a significant amount of bias. In other words, it was essentially designed to produce positive findings. And that is what happened.

The primary outcome was the mean difference in self-reported fatigue, at the end of treatment and six months later, on an instrument called the Checklist Individual Strength (CIS). The fatigue subscale of the CIS includes eight questions, each of which can be rated from 1 to 7; total scores therefore range from 8 to 56, with higher scores indicating greater fatigue. The difference in the means between the two groups was 9.3 points at the end of therapy and 8.4 points six months later. At that later point, the mean score was 31.5 in the CBT group and 39.9 in the CAU group.
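
For readers who want to check the scoring arithmetic, here is a minimal Python sketch using only the figures reported above (the variable names are mine, not the paper's):

```python
# CIS fatigue subscale: 8 items, each rated from 1 to 7.
N_ITEMS, MIN_RATING, MAX_RATING = 8, 1, 7

min_score = N_ITEMS * MIN_RATING  # 8
max_score = N_ITEMS * MAX_RATING  # 56

# Group means at six-month follow-up, as reported in the paper.
mean_cbt, mean_cau = 31.5, 39.9

print(f"score range: {min_score}-{max_score}")                  # 8-56
print(f"between-group difference: {mean_cau - mean_cbt:.1f}")   # 8.4 points
```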

As with so much of the research from the CBT ideological brigades, the results were less than meets the eye; it is impossible to take them at face value. It should not be hard to understand that if you offer one group supportive and compassionate attention from a sympathetic therapist over a period of several months, they are more likely to provide positive answers on questionnaires than members of a group not receiving the intervention. Even more so if you prime them by assuring them repeatedly that CBT has been shown to work for such conditions, as is frequently the case in this sort of research.

In other words, the findings are not a surprise. As I wrote in a recent post, research has documented that unblinded studies relying on subjective outcomes tend to yield positive results. These results have been over-hyped from Amsterdam to Harvard Medical School, even though the self-reported benefits were pretty modest—well within the range that might be expected from bias alone.

The authors noted that six points is considered a clinically significant difference on the CIS fatigue scale, so the between-group differences of 9.3 and 8.4 points only modestly exceeded that bar. Nonetheless, the authors were able to claim that more people from the CBT group no longer had “severe” fatigue. That’s because a score of 35 was designated as the threshold between severe and less severe fatigue. While it is true that more in the CBT group scored below that threshold, the 3.5-point difference between the CBT group’s mean of 31.5 and the 35-point threshold would not itself be considered clinically significant. Nor would the 4.9-point difference between that threshold and the CAU group’s mean of 39.9.
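
To make that arithmetic explicit, here is a short sketch, again using only the figures reported above (the constant names are mine):

```python
MCID = 6.0       # minimal clinically significant difference on the CIS fatigue scale
CUTOFF = 35.0    # designated threshold for "severe" fatigue
mean_cbt, mean_cau = 31.5, 39.9  # follow-up means reported in the paper

print(mean_cau - mean_cbt >= MCID)  # True: 8.4 points, modestly above the 6-point bar
print(CUTOFF - mean_cbt >= MCID)    # False: CBT mean sits only 3.5 points below the cutoff
print(mean_cau - CUTOFF >= MCID)    # False: CAU mean sits only 4.9 points above the cutoff
```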

So to repeat the obvious: These apparent benefits from CBT are exceedingly modest no matter how they are parsed.

Two cogent published responses—here and here—are well worth reading.

A curious omission from the published report raised some eyebrows. The trial protocol listed the primary outcome, several secondary outcomes, and a category called “other study outcomes.” Among the latter was the sole objective measure: actigraphy. Actigraphy involves wearing devices that precisely measure physical movement over a period of time. In this case, participants wore these devices for 14 days at baseline and for 14 days at the end of therapy. According to the protocol, “The actigraph has been shown to be a reliable and valid instrument for the assessment of physical activity.”
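
For readers unfamiliar with the measure, here is a deliberately simplified, hypothetical Python sketch of how two weeks of actigraph counts might be reduced to a per-participant activity score and a pre-post change. This illustrates the general approach only; it is not the trial's actual analysis pipeline, and the numbers are invented:

```python
import statistics

def mean_daily_activity(daily_counts: list[float]) -> float:
    """Average the actigraph's daily movement counts over the wear period."""
    return statistics.mean(daily_counts)

# Invented data for one hypothetical participant: 14 days of daily counts
# at baseline and again at the end of therapy.
baseline = [210, 195, 220, 240, 185, 205, 230, 200, 215, 190, 225, 210, 198, 207]
end_of_therapy = [212, 200, 218, 235, 190, 208, 228, 205, 210, 195, 220, 212, 201, 209]

change = mean_daily_activity(end_of_therapy) - mean_daily_activity(baseline)
print(f"change in mean daily activity: {change:+.1f} counts")
# Comparing such change scores across the CBT and CAU arms is what, per the
# authors' response, showed no significant difference between conditions.
```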

The lack of any mention of these data in the article suggested that the results for that measure were likely poor. In the authors’ response to the published comments, they acknowledged this to be the case. As they wrote:

“Proposed alternative outcomes, like physical activity assessed with actigraphy or physical fitness are no[t] reliable markers of fatigue, and are also influenced by the perception of patients and subjectively experienced symptoms. Research showed that a substantial number of patients with severe fatigue do not have deviant physical activity levels. This was also found in our sample, i.e. 81% of participants had a fluctuating active activity pattern, and only 19% had a low active activity pattern. A reduction of fatigue will not necessarily lead to increased levels of objective physical activity or vice versa. Also, reduced fatigue levels do not necessarily concur with improved aerobic capacity. In our study there was no significant difference between the conditions in the increase in physical activity assessed with actigraphy.”

For good measure, one of the co-authors, Dr Chantal Rovers, added another excuse in a Twitter thread: “You always have more data than fits within the word limit of medical journals. Primary and secondary outcome measures are included. The plan was to publish the other results in a separate article, as is very common.”

**********

Deconstructing bogus excuses for failing to provide objective data

Do any of these rationalizations hold water? No. Let’s look at each in turn.

*Previous studies have found that actigraphy results do not correspond with patients’ self-reports about their fatigue levels. Therefore, they’re irrelevant.

It is true that previous studies from these and other authors have found that positive reports on subjective measures of fatigue are not matched by corresponding increases in physical activity, as measured objectively by actometers worn over a period of time. The apparent conclusion drawn by these researchers—that the actigraphy results can be dismissed as not related to fatigue—is ridiculous and clearly self-serving.

The Dutch investigators have perfected this strategy of including objective measures and then not reporting them until long after they have already received attention for papers highlighting the positive findings on subjective measures. This happened in three trials of CBT, conducted in the 2000s, for what they called CFS. Years later, the investigators published the null actigraphy findings from all three trials and concluded—conveniently—that reduction in fatigue was not mediated by increases in physical activity. They pulled the same stunt more recently with a study on Q fever.

In other words, they enthusiastically accepted the accuracy of subjective reports of improvement and dismissed the significance of the indisputable fact that patients did not engage in more physical activity—even though they stated that the treatment’s goal is to reduce the disability associated with the condition. This phenomenon has now morphed into the blunt assertion—the dogma, really—that objective measures of physical activity bear no relationship to the construct of fatigue. All that matters is the self-report—whether people actually do more is irrelevant as long as they say that they are less fatigued. It is hard to know how to respond to such an absurd, self-serving argument except to point out that it is absurd and self-serving.

*Most patients had fluctuating activity levels rather than continuously low activity levels, so the actigraphy readings couldn’t really show any improvement in these patients.

First, let’s note that—to justify leaving out key findings—they are offering information on levels of physical activity at baseline that they did not present in the study itself. We have no idea what they mean by “fluctuating activity levels”—presumably these categories were based on the baseline actigraphy readings—and we are asked to take their word based on data that have not been peer-reviewed and that we haven’t seen.

Beyond that methodological complication, the argument is odd: Unless people are active 24 hours a day, it is nonsensical to argue that they couldn’t do more and that actigraphy would not provide salient information. And this point also overlooks the obvious converse—that patients might do much worse in one or both study arms. If patients are at a high level of activity and suffer relapses either with or without CBT, the actigraphy would likely document that. These authors are so convinced by their own theorizing that they appear not to grasp that measures are designed to capture declines in health status as well as improvements.

*The actigraphy was not a primary or secondary outcome, so we didn’t need to report all the findings in this first paper.

Come on! Really?? This is perhaps my favorite bogus justification. The null results on this objective outcome inevitably raise questions about the validity and reliability of the subjective primary and secondary measures. Investigators have an obligation, enshrined in research ethics codes, to provide all salient data and not to hide information that would raise questions about or alter interpretations of their findings. It is impossible to argue with a straight face—although the Dutch team has tried; perhaps they lack a sense of humor and irony—that an objective measure of function like actigraphy is meaningless even when the results contradict subjective reports.

If the actigraphy results had been terrific—if they’d shown that patients increased their activity levels—does anyone seriously believe that the investigators would have withheld the data from the first study report on the grounds that it was not a primary or secondary outcome?

In any event, it was the investigators themselves who decided not to make their one objective measure a primary or secondary outcome, presumably because past experience demonstrated that the findings would probably contradict the subjective claims. So to cite this questionable decision as the reason to intentionally withhold these findings is disingenuous in the extreme. It takes an enormous amount of chutzpah—as in the classic tale of someone who kills their parents and then pleads for mercy as an orphan.

Moreover, if activity levels are “fluctuating,” and the actigraphy data collected over 14 days are therefore questionable, why should we pay attention to subjective reports of fatigue from eight items on a single questionnaire? I don’t get it. Under their argument about fluctuating levels, all the study data should be considered irrelevant and meaningless.

What about the argument that they simply had too much data for one paper and so, understandably, had to set some aside for future publications? This position takes a self-evident fact of scientific research and twists it to justify not reporting important data. Investigators always want to squeeze more publications out of a single study—that’s fine. What’s not fine is choosing to leave out information that raises questions about your conclusions. The Dutch investigators don’t see it that way, of course. Since they believe the actigraphy findings have no value in assessing fatigue in the first place, they see no problem with waiting until some future date to publish them.

In short, these responses are not serious. They are self-serving deflections; whether or not the authors actually believe them, they have exposed themselves as unqualified and too ethically challenged to engage in any research at all. It is hard to grasp how legitimate investigators could engage in such specious reasoning.

Dr Daniel Griffin, an infectious disease specialist in the New York City area and a regular on the popular podcast This Week in Virology (hosted by Vincent Racaniello, a microbiology professor at Columbia who is also the host of Virology Blog), agreed that the decision not to report the objective findings from actigraphy cannot be justified. Here’s what he had to say:

“The criticism is warranted. My biggest thing is being open and honest and not ‘hiding’ or ignoring data that fails to support one’s agenda. As we are seeing, a person may report they feel improved. But when we see no increase in actual activity, that is important information to share.”