Replying to: D. Pasquier, Nature Communications https://doi.org/10.1038/s41467-025-60478-x (2025)

We appreciate the comments by Pasquier on our meta-analysis1. Pasquier states that the “significant increase of mortality associated with HCQ (OR, 1.11, 95%CI, 1.02–1.20, p = 0.02) … is in contrast with other meta-analyses based on similar sets of trials, which reported wider confidence intervals2,3,4,5” and that “The difference between Axfors et al. and other meta-analyses originates mostly from the meta-analytic model used, rather than the fact that we included more studies.” However, while we agree that different meta-analytic model choices may give non-identical results (a well-known fact that affects any meta-analysis)6, our conclusions are not materially altered by using the method Pasquier suggests.

As shown in Table 1, the four other meta-analyses cited by Pasquier vary substantially in the number of included trials, total sample size, and total number of deaths. Our meta-analysis includes more data than any of the other four meta-analyses, i.e., more trials, a larger total sample size, and more deaths (where reported). Therefore, it is not surprising that it can have somewhat tighter confidence intervals for the summary effect. The relative weight contribution of the additional trials depends on how weights are calculated. Regardless, the results of the four cited meta-analyses, as well as the main result and the sensitivity analysis results that we reported in our original paper, are remarkably similar. In all of them, the summary estimate shows a modest increase in mortality risk with hydroxychloroquine. If we exclude the HKSJ-SJ option, which may be problematic for these data1, the range of the odds ratio point estimates lies in the very narrow 1.08–1.11 window. The difference in p-values is also very modest (0.02–0.17) and, in part, reflects the fact that our meta-analysis includes more evidence from more trials.

Table 1 Comparison of 5 meta-analyses of randomized trials on hydroxychloroquine

In our paper, we provided extensive results with different meta-analytic models in the Supplement, and not all of them passed the traditional threshold of p < 0.05. It is very well documented that different models for random effects and different approaches to estimating the between-study variance can lead to different results6, but the variability in those results here is, in fact, small.
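To illustrate why different between-study variance estimators can shift the summary result, the following sketch computes the widely used DerSimonian-Laird (DL) moment estimator and the corresponding random-effects pooled estimate. The log odds ratios and variances below are hypothetical illustrative values, not the trial data from our meta-analysis.

```python
# Sketch: DerSimonian-Laird (DL) random-effects pooling.
# All numeric inputs here are hypothetical, for illustration only.
import math

yi = [0.10, 0.05, 0.25, -0.05, 0.15]   # hypothetical log odds ratios
vi = [0.004, 0.02, 0.09, 0.06, 0.01]   # hypothetical within-trial variances

def dl_tau2(yi, vi):
    """DerSimonian-Laird moment estimator of the between-study variance."""
    wi = [1.0 / v for v in vi]
    ybar = sum(w * y for w, y in zip(wi, yi)) / sum(wi)
    q = sum(w * (y - ybar) ** 2 for w, y in zip(wi, yi))  # Cochran's Q
    c = sum(wi) - sum(w ** 2 for w in wi) / sum(wi)
    return max(0.0, (q - (len(yi) - 1)) / c)  # truncate at zero

def pooled(yi, vi, tau2):
    """Inverse-variance pooled estimate under a random-effects model."""
    wi = [1.0 / (v + tau2) for v in vi]
    est = sum(w * y for w, y in zip(wi, yi)) / sum(wi)
    se = math.sqrt(1.0 / sum(wi))
    return est, se

tau2 = dl_tau2(yi, vi)
est, se = pooled(yi, vi, tau2)
print(round(math.exp(est), 3))  # pooled estimate on the odds-ratio scale
```

Substituting a different tau-squared value into `pooled` changes both the relative trial weights and the width of the resulting confidence interval, which is the mechanism behind the small differences across models noted above.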

As discussed in our original paper, we agree with Pasquier that the choice of an HKSJ model with an SJ estimator of the between-study variance may not work with these data. We chose to use the PM estimator of the between-study variance, which seems to have the best performance in most circumstances, but when many studies have low event counts or zero events, no method is perfect. As concluded in the most rigorous review of different estimators of between-study variance6, “according to the summarized findings of the simulation studies, the PM method appears to have a more favorable profile among other estimators of the between-study variance in meta-analysis, including DL. It is easy to calculate, does not require distributional assumptions (Bowden et al., 2011; DerSimonian and Kacker, 2007), and has been shown to be less biased and more efficient than many alternatives.” We acknowledge that with the HKSJ approach, the q estimation can have uncertainty and, indeed, with q = 1, the summary result is not nominally statistically significant. The example provided by Pasquier, where one small trial is added at a time to a large trial, shows the well-known fact that between-study variance estimation is very uncertain in these circumstances. Type I error may be underestimated in such cases7. However, uncertainty is lessened, and type I error becomes more appropriate, when many trials are considered, as when all trials are included in our meta-analysis. Both simulations and empirical data show that when there is a large difference in sample sizes, with only one very large trial included and a small number of total studies, HKSJ is in fact far more likely to give non-statistically significant results than DL, i.e., it is more conservative in inferences7. For the reasons outlined above, we believe that our prespecified analytical methods were reasonable.
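The PM (Paule-Mandel) estimator mentioned above can be sketched in a few lines: it solves for the between-study variance at which the generalized Q statistic equals its expectation under homogeneity, k − 1. The data and the bisection tolerance below are illustrative assumptions, not part of our analysis.

```python
# Sketch of the Paule-Mandel (PM) estimator: find tau^2 such that the
# generalized Q statistic equals k - 1. Q is monotone decreasing in tau^2,
# so simple bisection suffices. Inputs are hypothetical.
def gen_q(yi, vi, tau2):
    """Generalized Q statistic at a candidate between-study variance tau^2."""
    wi = [1.0 / (v + tau2) for v in vi]
    ybar = sum(w * y for w, y in zip(wi, yi)) / sum(wi)
    return sum(w * (y - ybar) ** 2 for w, y in zip(wi, yi))

def pm_tau2(yi, vi, hi=10.0, tol=1e-10):
    """Paule-Mandel tau^2 by bisection; returns 0 when Q(0) <= k - 1."""
    k = len(yi)
    if gen_q(yi, vi, 0.0) <= k - 1:
        return 0.0
    lo = 0.0
    while gen_q(yi, vi, hi) > k - 1:  # widen the bracket if needed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gen_q(yi, vi, mid) > k - 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

yi = [0.0, 1.0, -1.0, 2.0]            # hypothetical log odds ratios
vi = [0.01, 0.01, 0.01, 0.01]         # hypothetical within-trial variances
tau2_pm = pm_tau2(yi, vi)
```

As the review cited above notes, PM requires no distributional assumptions: it only equates an observed dispersion statistic with its expectation, which is why it truncates cleanly at zero for homogeneous data.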
Nonetheless, considering the unexpected under-dispersion observed in the data, we regard the truncated version as providing the more accurate variance estimate, although it does not change our substantive conclusions. However, no method is a gold standard, and results from different meta-analytic methods need to be seen side by side. Re-analyzing the data with the ad hoc truncated approach proposed by Pasquier, incorporating the variance estimate from the classic random-effects model with the Hartung-Knapp method, produces an identical point estimate (as the formula dictates) with slightly less precision, i.e., OR = 1.11 (95% CI: 0.97–1.26; p = 0.11) (Table 1).
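The truncation discussed above can be made concrete: the HKSJ variance rescales the classic random-effects variance by a factor q, and the truncated variant replaces q with max(1, q), so that under-dispersed data (q < 1) fall back to the classic variance while the point estimate is untouched. A minimal sketch, with hypothetical inputs:

```python
# Sketch of the HKSJ scaling factor q and the truncated variant q* = max(1, q).
# When q < 1 (under-dispersion), the truncated standard error equals the
# classic random-effects standard error. All inputs are hypothetical.
import math

def hksj(yi, vi, tau2):
    """Return (estimate, q, se_hksj, se_truncated) for given tau^2."""
    k = len(yi)
    wi = [1.0 / (v + tau2) for v in vi]
    sw = sum(wi)
    est = sum(w * y for w, y in zip(wi, yi)) / sw
    q = sum(w * (y - est) ** 2 for w, y in zip(wi, yi)) / (k - 1)
    se_hksj = math.sqrt(q / sw)              # standard HKSJ scaling
    se_trunc = math.sqrt(max(1.0, q) / sw)   # truncated: never below classic RE
    return est, q, se_hksj, se_trunc

yi = [0.10, 0.05, 0.25, -0.05, 0.15]   # hypothetical log odds ratios
vi = [0.004, 0.02, 0.09, 0.06, 0.01]   # hypothetical within-trial variances
est, q, se_std, se_trunc = hksj(yi, vi, 0.02)
print(round(q, 3), round(se_std, 4), round(se_trunc, 4))
```

Note that the point estimate comes out of the weighted mean alone, which is why the re-analysis above yields the identical OR; only the standard error, and hence the confidence interval computed with a t critical value on k − 1 degrees of freedom, changes.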

It is important to emphasize that our methods and analytical choices were pre-specified. In addition to the approaches proposed by Pasquier, there are a number of other post-hoc analytical choices that could be made. Post-hoc choices based on the data, particularly those that alter statistical significance, can introduce bias and reduce the credibility of any such analyses.

The other example examined by Pasquier (hydroxychloroquine as outpatient treatment) would indeed give a statistically significant summary effect if the same method were used as in our meta-analysis. We should note that a significant reduction in hospitalizations in unblinded trials is plausible and should not come as a surprise (as large-scale analyses of this methodological issue demonstrate, e.g., refs. 8,9). In unblinded trials, a treatment thought to be effective may lead to fewer decisions to hospitalize. So, it is possible that this is a bias rather than a genuine pharmacological benefit8,9. This bias would not affect mortality as an outcome in the same way or direction. The largest outpatient trial10 (the only one to have a substantial number of death events, n = 10) shows a hazard ratio of 1.56 (95% CI, 0.42–6.72) for mortality, in the direction of hydroxychloroquine harm.

Overall, we should not forget the main question that our meta-analysis aimed to answer: Does hydroxychloroquine treatment affect mortality risk in COVID-19 patients? While different meta-analytic models differ in the exact levels of statistical significance, they all provide modest evidence for a small increase in mortality and fairly strong evidence against a material mortality benefit.

Since we published our meta-analysis, additional data have been published by the REMAP-CAP trial11. While the unpublished data from this large trial that we had included in our meta-analysis showed an inconclusive result (relative risk 1.04 with very wide confidence intervals), the updated 180-day follow-up of the same trial (HR, 1.51; 95% CrI, 0.98–2.29) showed a 96.9% probability of harm from hydroxychloroquine using a Bayesian model11. Pasquier is cautious about the results of the REMAP-CAP study and fears confounding of the results, as it is a “small Bayesian trial” with “a non-concurrent control group and complex modeling”. REMAP-CAP is a textbook example of a platform trial12 with an extremely rigorous design. Pasquier criticizes the adaptive design and modeling but does not provide any further details or possible corrective actions. These updated results of REMAP-CAP, even if one may find limitations in them as in any trial, are consistent with our analyses. Pasquier argues that “[…] to estimate deaths caused by HCQ, one should ideally use the most recent and complete meta-analysis available, which however showed a weaker rather than stronger trend when more data was added.” However, it should be noted that the cited meta-analysis only included 7911 patients (1318 deaths), whereas our analysis encompasses more than 2000 additional patients with more than 200 additional deaths (see Table 1).

We agree with Pasquier that one has to be cautious about the uncertainty that surrounds the estimates of increased mortality, and we agree that the estimated uncertainty will vary depending on the meta-analysis method used. But the slight additional imprecision introduced in this case by using a different meta-analytic model does not materially change our conclusions, in line with the dictum of the American Statistical Association that “Scientific conclusions … should not be based only on whether a p-value passes a specific threshold”13. We think there is substantial evidence for a harm signal with hydroxychloroquine.

Reporting Summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.