Introduction

Misclassification of causes of death on death certificates is a well-documented issue1,2,3,4,5,6. In general, it is understood that if misclassification is non-differential, it will bias dose-response relationships toward the null7,8,9. This heuristic is commonly used to suggest that if epidemiological studies with significant dose-response associations had accounted for misclassification, the associations would have been stronger. However, several studies have described exceptions to the conventional assumption that outcome misclassification biases risk estimates toward the null10,11,12. Yland et al.12 provided several such exceptions. One of them demonstrated that even when the underlying misclassification rates are non-differential, the observed misclassification in a population may actually be differential due to the random nature of misclassification. Simulated datasets were used to demonstrate that this effect is more pronounced for smaller population sizes and smaller misclassification rates. Whitcomb & Naimi11 made a useful distinction between bias and error. They clarified that bias is “a tendency – the difference between the true value of a parameter and the expected value based on (usually hypothetical) repetitions of a study,” whereas an error is the “difference between a particular study result and the truth.” Both Yland et al.12 and Whitcomb & Naimi11 demonstrated that, regardless of the direction of the overall bias, the error observed for an individual study may not move the measure of the dose-response relationship toward the null. This means that there is some probability that the heuristic will not hold true for an individual study. A thought experiment illustrates this: if the true odds ratio in a study population is exactly 1 (the null) and the underlying misclassification is non-differential, any change in the observed odds ratio can only be away from the null, with equal probability (50%) in either direction (greater than or less than 1).

While Yland et al.12 and Whitcomb & Naimi11 simulated the effects of dose misclassification, a similar effect would be expected as a result of outcome misclassification. The current study focuses on the impact of non-differential outcome misclassification and its effect on dose-response estimates for individual studies. As such, two questions will be addressed: (1) what is the probability that misclassification of disease mortality moves measures of dose-response associations away from the null? and (2) what is the probability that misclassification moves measures of dose-response associations away from the null sufficiently to change the conclusion of a study from non-significant to significant? The first question represents the probability that the heuristic does not hold true for an individual study. The second question is of unique importance, because it explores the probability that misclassification would artificially enhance the measure of risk sufficiently to change the conclusion from non-significant to significant. By addressing these questions, the present analysis provides a quantitative framework for evaluating the extent of concern warranted in studies that rely on causes of death from death certificates. Specifically, it enables researchers to estimate the probability that an observed significant association may arise from outcome misclassification rather than from a true relationship between dose and response. Quantifying these probabilities is essential for critically interpreting findings and for assessing how outcome misclassification could influence the conclusions drawn from individual epidemiological studies.

Methods

A validation dataset

United States Transuranium and Uranium Registries (USTUR) Registrants are former nuclear workers who had documented exposures to external radiation and/or to internally incorporated actinide elements such as plutonium13,14. These individuals agreed to have an autopsy at the time of death and voluntarily donated tissues and organs, or their entire bodies, for posthumous research. The USTUR maintains copies of radiation exposure records from Registrant worksites as well as autopsy reports and death certificates. The combination of cumulative external dose (Sv) data, autopsy report cause of death, and death certificate cause of death needed for misclassification analyses was available for 229 Registrants. A study of misclassification of causes of death on death certificates6 showed an overall misclassification rate of 25.4% among USTUR Registrants. No association between misclassification and radiation dose was found, suggesting that misclassification of underlying causes of death among USTUR Registrants is non-differential with regard to dose.

This study was performed as part of the USTUR research program, which was reviewed and approved by the Central Department of Energy Institutional Review Board (USA) No. WASU-68–50181. Since the initiation of Registrant recruitment in 1968, the USTUR has routinely obtained authority for autopsy and/or informed consent, as well as a release of medical records, from participants, their next of kin, or holders of power of attorney, in accordance with the ethical standards in place at the time of data collection. In addition, a formal informed consent process has been in place for decades. For cases collected prior to the establishment of formal IRB requirements, consent procedures adhered to the prevailing ethical standards and institutional policies at the time.

Initial and misclassified datasets

Datasets were formed by pairing dose and outcome data, where outcomes were either cancer or non-cancer mortality. Two types of datasets were used in this study: initial datasets and misclassified datasets. Initial datasets represented the ‘true’ distribution of diseases in a studied population, as might be found on autopsy reports, and were used as the starting point for misclassification simulations. Misclassified datasets were the result of the misclassification simulations, and represented possible misclassified distributions of disease in a population, such as those typically found on death certificates used in epidemiological studies.

Source of dose data

Two sources of dose data were used: actual cumulative external doses and generated cumulative external doses. Actual cumulative external doses were taken directly from copies of worksite occupational exposure records from USTUR Registrants, and ranged from 0.001 to 0.714 Sv (geometric mean: 0.063 Sv, geometric standard deviation: 4.09, n = 229). To increase the sample size, 5,000 cumulative external doses were generated from a truncated lognormal distribution with the same geometric mean and geometric standard deviation as were observed in actual USTUR external dose data. An upper limit of 1 Sv was set to avoid extreme or outlier doses. After creating these dose values, the geometric mean and geometric standard deviation of the generated distribution were calculated and compared to those of the actual USTUR external dose data. This process was repeated 100 times using different random seeds. The dose distribution closest to the actual USTUR doses was selected by identifying the iteration that minimized the sum of the absolute differences between the generated and actual geometric means and between the generated and actual geometric standard deviations. This selection step ensured that the final dose distribution was as close as possible to the actual USTUR data. The cumulative external doses in the generated dataset ranged from 0.004 to 0.98 Sv (geometric mean: 0.057 Sv, geometric standard deviation: 3.94). A sample size of 5,000 was selected because that was the point where patterns in the results became stable.
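The dose-generation procedure described above can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' R script (which used the truncdist package). The rejection-sampling approach, the use of seeds 0 to 99, the absence of a lower truncation limit, and all function names are assumptions made for illustration.

```python
import math
import random
import statistics

def truncated_lognormal(n, gm, gsd, upper, rng):
    """Draw n lognormal doses, rejecting any value above `upper` (assumed method)."""
    mu, sigma = math.log(gm), math.log(gsd)
    doses = []
    while len(doses) < n:
        x = rng.lognormvariate(mu, sigma)
        if x <= upper:
            doses.append(x)
    return doses

def geometric_stats(values):
    """Geometric mean and geometric standard deviation (sample SD of the logs)."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.fmean(logs)), math.exp(statistics.stdev(logs))

TARGET_GM, TARGET_GSD = 0.063, 4.09   # values reported for the actual USTUR doses

best_score, best_seed, best_doses = None, None, None
for seed in range(100):               # 100 repetitions with different seeds
    rng = random.Random(seed)
    doses = truncated_lognormal(5000, TARGET_GM, TARGET_GSD, 1.0, rng)
    gm, gsd = geometric_stats(doses)
    # Sum of absolute differences in geometric mean and geometric SD
    score = abs(gm - TARGET_GM) + abs(gsd - TARGET_GSD)
    if best_score is None or score < best_score:
        best_score, best_seed, best_doses = score, seed, doses

best_gm, best_gsd = geometric_stats(best_doses)
```

Because the upper tail is cut off at 1 Sv, the geometric mean and geometric standard deviation of the selected sample are expected to fall somewhat below the target values, which is in line with the generated values reported above.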

Source of ‘true’ outcome data

The outcome of interest in this study was cancer mortality. Outcomes were binary and they were labeled as 1 if an individual’s underlying cause of death was cancer and 0 if the person died from other causes. Two sources of ‘true’ outcome data were used to form initial datasets: actual cancer deaths from USTUR Registrant autopsy reports and generated cancer deaths. Actual cancer deaths were identified by a medical doctor (MD) who reviewed each autopsy report6.

Generated ‘true’ outcomes were produced using the logistic probability function,

$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot x)}} \tag{1}$$

where \(x\) was the radiation dose in Sv,

\(p(x)\) was the probability of a cancer death,

\(\beta_0\) was a constant derived from a baseline cancer mortality rate of 20% using the formula \(\beta_0 = -\ln\left(\frac{1}{0.2} - 1\right)\), and

\(\beta_1\) was the log of a preset odds ratio.

The preset odds ratio used to calculate β1 was designed to either force the odds ratio to be close to 1 by using β1 = log(1.001), or to produce a non-significant dose-outcome dataset with a p-value sufficiently close to 0.05 (0.05 < p-value < 0.0501).

Equation 1 provided the probability of a cancer death as a function of radiation dose. It was used to calculate the probability of a cancer death associated with each value in the dose dataset. Those probabilities were then used to randomly generate outcomes corresponding to each dose value. The resulting paired dose-outcome dataset represented just one possible dataset that could have been generated from the \(p(x)\) dose probabilities. Therefore, using the same doses and probabilities, a total of 1,000 possible dose-outcome datasets were randomly generated, and odds ratios and p-values were calculated for each.
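As a minimal sketch of this step, the following standard-library Python code evaluates Equation 1 and draws 1,000 candidate outcome vectors for a fixed set of doses. The placeholder dose values, the seed, and the function names are hypothetical; the coefficient values follow the definitions given with Equation 1, using the near-null odds ratio of 1.001.

```python
import math
import random

BASELINE_RATE = 0.20
beta0 = -math.log(1 / BASELINE_RATE - 1)   # = -ln(4), about -1.386
beta1 = math.log(1.001)                    # log of the preset odds ratio

def p_cancer(dose):
    """Equation 1: probability of a cancer death at a given dose (Sv)."""
    return 1 / (1 + math.exp(-(beta0 + beta1 * dose)))

def generate_outcomes(doses, rng):
    """Draw one binary outcome per dose (1 = cancer death, 0 = other cause)."""
    return [1 if rng.random() < p_cancer(x) else 0 for x in doses]

rng = random.Random(1)                              # hypothetical seed
doses = [0.003 * (i + 1) for i in range(229)]       # placeholder doses, not USTUR data
candidate_datasets = [generate_outcomes(doses, rng) for _ in range(1000)]
```

At zero dose the function returns the 20% baseline rate, which is a quick check that the intercept was derived correctly.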

Afterward, a single initial dataset was selected from among the 1,000 dose-outcome datasets for use as the starting point in the misclassification analysis. This initial dataset was selected to force either the odds ratio to be close to the null value of 1, or the p-value to be slightly larger than 0.05. The scenario where the initial dataset’s odds ratio was close to 1 was designed to mimic a situation where the association between exposure and outcome is minimal or non-existent. The scenario that forced the initial p-value to be slightly larger than 0.05 was designed to create a borderline non-significant initial dataset, representing an extreme situation where the conclusion of a study is most vulnerable to changing from non-significant to significant as a result of death certificate misclassification errors.

Calculation methods

Odds ratios were used as a measure of the association between cancer mortality and dose, and p-values as a measure of the significance of that association. Two methods were used to calculate the odds ratios and p-values: a 2 × 2 contingency table and a logistic regression15. The 2 × 2 table method used a categorical dose variable to calculate the odds ratio and p-value for a dataset. To create the categorical dose variable, cases were divided into low- and high-dose groups using the median dose (0.076 Sv). The low- and high-dose groups were each further subdivided into cancer and non-cancer cases to make a 2 × 2 contingency table. The contingency table was then used to calculate the odds ratio, and the chi-squared test was used to calculate the p-value. The logistic regression method used the continuous dose variable to calculate the odds ratio and p-value for a dataset, such that each dose was treated as a unique value associated with a specific outcome in calculations.
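The two calculation methods can be sketched in standard-library Python as shown below. This is an illustration under stated assumptions rather than the authors' R code: the chi-squared test is computed without a continuity correction, doses equal to the cutoff are placed in the low-dose group, the logistic regression is fitted by Newton-Raphson, and its p-value is a Wald test. None of these choices is specified in the text, and the function names are hypothetical.

```python
import math

def odds_ratio_2x2(doses, outcomes, cutoff):
    """Odds ratio and Pearson chi-squared p-value (1 df, no continuity correction)."""
    a = b = c = d = 0
    for x, y in zip(doses, outcomes):
        if x > cutoff:          # high-dose group (assumed: strictly above cutoff)
            if y: a += 1        # high dose, cancer
            else: b += 1        # high dose, non-cancer
        else:
            if y: c += 1        # low dose, cancer
            else: d += 1        # low dose, non-cancer
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-squared, 1 df
    return (a * d) / (b * c), p_value

def odds_ratio_logistic(doses, outcomes, iterations=25):
    """Odds ratio per unit dose and Wald p-value from a one-covariate logistic fit."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(doses, outcomes):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)
            g0 += y - p             # score (gradient) terms
            g1 += (y - p) * x
            h00 += w                # information matrix terms
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton-Raphson update
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)            # from the inverse information matrix
    p_value = math.erfc(abs(b1 / se_b1) / math.sqrt(2))
    return math.exp(b1), p_value
```

With a continuous dose variable, the logistic odds ratio is expressed per unit of dose (here, per Sv), so its magnitude is not directly comparable to the high-versus-low odds ratio from the 2 × 2 table.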

Misclassification: real data from USTUR registrants

As a first step, the impact of death certificate misclassification among USTUR Registrants was calculated using data that was taken directly from Registrant files: actual external doses (Sv), actual underlying causes of death from autopsy reports, and actual underlying causes of death from death certificates. For the purposes of this work, underlying causes of death from autopsy reports represented the ‘true’ distribution of diseases in a population, and underlying causes of death from death certificates represented the misclassified distribution of diseases. The odds ratio for cancer mortality was calculated based on autopsy reports and compared to the odds ratio based on death certificates. Both of these odds ratios were calculated using two methods, a 2 × 2 table and a logistic regression, to explore how calculation methods influence the findings.

Misclassification: simulated outcomes

Figure 1 illustrates the general approach used to establish an initial dataset and simulate misclassified outcomes. Overall, simulation of misclassified outcomes consisted of three steps: (1) selection or generation of doses, (2) selection or generation of ‘true’ cancer outcomes, and (3) the misclassification simulation.

Fig. 1 General approach to selection of the initial dataset and the misclassification simulation.

Six separate scenarios were selected to represent different combinations of actual and generated data as the starting points for six separate misclassification simulations. Each of these scenarios had different initial datasets and/or different calculation methods as summarized in Table 1 and described below.

Scenario 1: The initial dataset consisted of actual doses and actual cancer mortality outcomes from USTUR Registrants (n = 229). The 2 × 2 table method was used for calculations of odds ratios and p-values.

Scenario 2: The initial dataset consisted of actual doses and actual cancer mortality outcomes from USTUR Registrants (n = 229). The logistic regression method was used for calculations of odds ratios and p-values.

Scenario 3: The initial dataset consisted of actual doses from USTUR Registrants (n = 229) and generated mortality outcomes that forced the initial dataset to have an odds ratio close to 1. The logistic regression method was used for calculations of odds ratios and p-values.

Scenario 4: The initial dataset consisted of actual doses from USTUR Registrants (n = 229) and generated mortality outcomes that forced the initial dataset to have a slightly non-significant p-value (p > 0.05). The logistic regression method was used for calculations of odds ratios and p-values.

Scenario 5: The initial dataset (n = 5,000) consisted of generated doses and generated mortality outcomes that forced the initial dataset to have an odds ratio close to 1. The logistic regression method was used for calculations of odds ratios and p-values.

Scenario 6: The initial dataset (n = 5,000) consisted of generated doses and generated mortality outcomes that forced the initial dataset to have a slightly non-significant p-value (p > 0.05). The logistic regression method was used for calculations of odds ratios and p-values.

Table 1 Misclassification simulation scenarios: initial datasets and calculation methods.

Misclassification was simulated for each scenario using over- and under-misclassification rates ranging from 0% to 30%, where over- and under-misclassification rates were defined as follows:

$$\text{Over-misclassification Rate} = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}} \tag{2}$$

$$\text{Under-misclassification Rate} = \frac{\text{False Negatives}}{\text{False Negatives} + \text{True Positives}} \tag{3}$$

Over-misclassified cancer death outcomes were simulated by randomly selecting non-cancer cases in the initial dataset and changing them to cancer cases in the misclassified dataset (i.e. the outcome was changed from 0 to 1). Similarly, under-misclassified outcomes were simulated by randomly selecting cancer cases in the initial dataset and changing them to non-cancer cases (i.e. the outcome was changed from 1 to 0). For example, to generate a combination of 5% over- and 15% under-misclassification of disease mortality, 5% of non-cancer cases in the initial dataset were selected at random and changed to cancer cases, and 15% of cancer cases from the initial dataset were selected at random and changed to non-cancer cases. This process was repeated 20,000 times for each combination of over- and under-misclassification rates, resulting in 20,000 misclassified datasets. These datasets represented the range of possible outcomes that misclassified death certificates could have if misclassification was random. For each scenario, the misclassification was simulated for all combinations of over- and under-misclassification rates of 0%, 5%, 10%, 15%, 20%, 25% and 30%.
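The flipping procedure described above can be sketched as follows in standard-library Python. The sketch assumes that a fixed number of cases is flipped for each rate (the rate multiplied by the number of eligible cases, rounded to the nearest integer); the text does not state how fractional counts were handled. The function name, seed, and example dataset are hypothetical.

```python
import random

def misclassify(outcomes, over_rate, under_rate, rng):
    """Return a copy of `outcomes` with simulated over- and under-misclassification."""
    non_cancer = [i for i, y in enumerate(outcomes) if y == 0]
    cancer = [i for i, y in enumerate(outcomes) if y == 1]
    # Assumption: a fixed count of cases is flipped, rounded to the nearest integer.
    to_cancer = rng.sample(non_cancer, round(over_rate * len(non_cancer)))
    to_non_cancer = rng.sample(cancer, round(under_rate * len(cancer)))
    result = list(outcomes)
    for i in to_cancer:
        result[i] = 1        # over-misclassification: 0 -> 1
    for i in to_non_cancer:
        result[i] = 0        # under-misclassification: 1 -> 0
    return result

rng = random.Random(42)                       # hypothetical seed
initial = [1] * 40 + [0] * 160                # hypothetical initial dataset
# 5% over- and 15% under-misclassification; 1,000 repetitions here for brevity
simulated = [misclassify(initial, 0.05, 0.15, rng) for _ in range(1000)]
```

Cases are selected without reference to dose, so the underlying misclassification is non-differential even though the realized flips in any one simulated dataset may fall unevenly across dose levels.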

Odds ratios and p-values were calculated for each of the 20,000 simulated datasets and were used to calculate three summary statistics for each combination of misclassification rates:

  • Statistic A: The geometric mean of the odds ratios for all simulated datasets, where the geometric mean is equivalent to the exponentiated mean of the log of the odds ratios from all simulated datasets: \(e^{\operatorname{mean}\left[\log(OR)\right]}\).

  • Statistic B: The percentage of simulated datasets where the odds ratio moved away from the null value of 1, indicating a strengthened association between dose and disease mortality. In Scenarios 1 and 2, this was calculated as the percentage of odds ratio values that decreased since the initial odds ratios were less than 1. In Scenarios 3–6, it was calculated as the percentage of odds ratio values that increased since the initial odds ratios were greater than 1.

  • Statistic C: The percentage of simulated datasets where the odds ratio moved away from the null value of 1 and the p-value changed from non-significant to significant, indicating that the odds ratio was strengthened sufficiently to change the dose-response relationship to significant. Movement away from the null was again defined as odds ratio values that decreased for Scenarios 1 and 2, and as odds ratio values that increased for Scenarios 3–6.
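A minimal sketch of how Statistics A, B, and C could be computed from simulation output is given below in standard-library Python. The function name and the input layout (a list of odds ratio and p-value pairs) are hypothetical, and the initial dataset is assumed to be non-significant, as in all six scenarios.

```python
import math

def summarize(initial_or, results, alpha=0.05):
    """Compute Statistics A, B, and C from a list of (odds_ratio, p_value) pairs."""
    n = len(results)

    def moved_away(odds_ratio):
        # A decrease counts when the initial odds ratio is below 1,
        # an increase counts when it is above 1.
        if initial_or < 1:
            return odds_ratio < initial_or
        return odds_ratio > initial_or

    stat_a = math.exp(sum(math.log(o) for o, _ in results) / n)
    stat_b = 100 * sum(1 for o, _ in results if moved_away(o)) / n
    stat_c = 100 * sum(1 for o, p in results if moved_away(o) and p < alpha) / n
    return stat_a, stat_b, stat_c

# Hypothetical output from four simulated datasets
example = [(1.5, 0.01), (1.3, 0.20), (1.0, 0.90), (0.9, 0.60)]
stat_a, stat_b, stat_c = summarize(1.2, example)
```

In this toy example, two of the four odds ratios exceed the initial value of 1.2 and one of those is also significant, so Statistic B is 50% and Statistic C is 25%.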

Visualization

For each scenario, Statistics A, B, and C were visualized as heatmap figures. In each heatmap, the x-axis represented the over-misclassification rate and the y-axis represented the under-misclassification rate, both ranging from 0% to 30% in increments of 5%. Each cell displayed the value of a given summary statistic for 20,000 simulations of one specific combination of over- and under-misclassification rates, with a color gradient applied across cells to indicate the magnitude of values. The cell corresponding to 0% over- and 0% under-misclassification was left blank for Statistics B and C, as it represented the initial dataset with no misclassification applied. This resulted in three heatmaps per scenario, one for each statistic.

Software and packages

All data processing, simulations, statistical analyses, and visualization were conducted in R (version 4.5.2)16 using RStudio (version 2026.01.0)17. Generated dose distributions were produced using a truncated lognormal distribution via the truncdist package (version 1.0.2)18. Heatmaps were created using the ggplot2 package (version 4.0.1)19. The full R script is provided as Supplementary Material 3.

Results

Observed misclassification impacts

The odds ratios and p-values associated with ‘true’ causes of death found in Registrant autopsy reports and misclassified causes of death from Registrant death certificates are presented in Table 2. With the 2 × 2 table approach, misclassification moved the odds ratio slightly away from the null, whereas with the logistic regression approach, it moved the odds ratio toward the null. However, the p-values for both the ‘true’ and misclassified odds ratios were non-significant under both calculation methods.

Table 2 Misclassification among USTUR Registrants.

Simulated misclassification impacts

The full results for all six scenarios and all three summary statistics are provided as heatmaps in Supplementary Material 1. Figure 2 provides an example of these heatmaps using Scenario 2 results. Figure 2(a), which displays Statistic A, indicates that the initial odds ratio was 0.360 and the geometric mean of misclassified odds ratios ranged from 0.366 to 0.696. The minimum geometric mean was observed when the over-misclassification rate was 0% and the under-misclassification rate was 5%. The maximum geometric mean was observed when the over-misclassification rate was 30% and the under-misclassification rate was 30%. Figure 2(b), which displays Statistic B, indicates that the percentage of datasets where the odds ratio moved away from the null value of 1 ranged from 22.9% to 43.8%. The minimum percentage was observed when the over- and under-misclassification rates were both 30%. The maximum percentage was observed when the over-misclassification rate was 0% and the under-misclassification rate was 5%. Figure 2(c), which displays Statistic C, indicates that the percentage of datasets where the odds ratio moved away from the null value of 1 and the p-value changed to significant ranged from 0% to 4.5%. The minimum percentage was observed with a 5% over-misclassification and no under-misclassification. The maximum percentage was observed when the over-misclassification rate was 30% and the under-misclassification rate was 15%.

Fig. 2 Heatmaps for Scenario 2 with an initial odds ratio of 0.360: (a) the geometric mean of the odds ratios for all misclassified datasets; (b) the percentage of misclassified datasets where the odds ratio moved away from the null value (OR = 1); (c) the percentage of misclassified datasets where the odds ratio moved away from the null value of 1 and the p-value changed to significant (p < 0.05). All heatmaps were generated using the ggplot2 package version 4.0.1 (https://ggplot2.tidyverse.org)19 within the R environment version 4.5.2 (https://www.R-project.org)16 and RStudio version 2026.01.0 (http://www.posit.co)17.

The ranges of values in each scenario’s heatmaps are summarized in Table 3, along with the odds ratio and p-value of the initial dataset used for each scenario.

Table 3 Misclassification simulation summary statistics.

Several trends in the odds ratios can be observed from the full results of all six simulation scenarios (Supplementary Material 1). When the odds ratio for the initial dataset was not close to 1 (Scenarios 1–2, 4, and 6), the probability of moving away from the null was highest at low misclassification rates: the percentage of datasets where the odds ratio moved away from the null tended to decrease as misclassification increased. Conversely, when the odds ratio for the initial dataset was approximately 1 (Scenarios 3 and 5), the percentage of datasets where the odds ratio moved away from the null was approximately 50% regardless of the misclassification rate. When the initial dataset had a p-value that was slightly larger than the level of significance (Scenarios 4 and 6), there was a wider range in the percentage of datasets where the misclassified odds ratios moved away from the null. In these scenarios, there was also a higher probability that the odds ratio would both move away from the null and change the conclusion of a study from non-significant to significant, given a significance level of 0.05.

Discussion

The conventional heuristic states that if misclassification is non-differential, it will bias dose-response relationships toward the null. Statistic A indicates that, on average, the odds ratios did move toward the null for scenarios where the odds ratio of the initial dataset was not 1. Thus, if a study could be repeated many times, the dose-response relationship would be expected to move toward the null on average. However, the heuristic is often used to make inferences about the error in a single study. The error in the odds ratio for a particular study may or may not move the measure of the dose-response relationship toward the null. Statistic B illustrates the probability that the odds ratio for a single study will move away from the null as a consequence of misclassification. It can be seen that for a non-trivial percentage of simulated studies, the odds ratio did not follow the conventional heuristic, but instead moved away from the null. For example, between 4% and 47% of misclassified datasets associated with Scenario 6 contradicted the conventional heuristic. Statistic C further indicates that not only can the odds ratio move away from the null, but it can also do so in such a way that a p-value shifts from non-significant in the absence of outcome misclassification to significant as a result of misclassification. Thus, caution should be used when assuming that slightly significant findings would have been more significant if misclassification could have been accounted for. Likewise, additional caution should be exercised when assuming that slightly non-significant findings would have been significant if misclassification had been accounted for.

As suggested by the thought experiment in the introduction, when the odds ratio of the initial dataset was set to the null value of 1 (Scenarios 3 and 5), the proportion of datasets with odds ratios that moved away from the null was approximately 50%. Consequently, the geometric means of the odds ratios were approximately 1. This trend became clearer when the size of the dataset was increased to 5,000. It is interesting to note that a small percentage (< 3%) of datasets still had odds ratios that moved far enough away from the null to change the p-value from \(p_{\text{initial}} = 0.99\) to a value < 0.05, indicating that these misclassified datasets had statistically significant, but erroneous, associations between dose and disease.
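The symmetry behind the thought experiment can be illustrated with a small simulation. The sketch below, in standard-library Python, is not one of the study's scenarios: it uses a hypothetical two-group dataset whose true odds ratio is exactly 1, applies 5% over- and 10% under-misclassification without regard to dose group, and counts how often the misclassified odds ratio lands above or below 1. All counts, rates, and names are assumptions chosen for illustration.

```python
import random

def simulate_null(n_reps=4000, seed=0):
    """Share of misclassified datasets with OR > 1 and with OR < 1."""
    rng = random.Random(seed)
    # Hypothetical 'true' data with an odds ratio of exactly 1:
    # 100 cancer and 400 non-cancer deaths in each of two dose groups.
    group = [1] * 500 + [0] * 500            # 1 = high dose, 0 = low dose
    truth = ([1] * 100 + [0] * 400) * 2
    cancer = [i for i, y in enumerate(truth) if y == 1]
    non_cancer = [i for i, y in enumerate(truth) if y == 0]
    up = down = 0
    for _ in range(n_reps):
        y = list(truth)
        for i in rng.sample(non_cancer, 40):   # 5% over-misclassification
            y[i] = 1
        for i in rng.sample(cancer, 20):       # 10% under-misclassification
            y[i] = 0
        a = sum(y[i] for i in range(1000) if group[i] == 1)  # high dose, cancer
        c = sum(y) - a                                       # low dose, cancer
        b, d = 500 - a, 500 - c
        odds_ratio = (a * d) / (b * c)
        if odds_ratio > 1:
            up += 1
        elif odds_ratio < 1:
            down += 1
    return up / n_reps, down / n_reps

share_up, share_down = simulate_null()
```

Under these assumptions the two shares should be roughly equal. With a categorical dose variable, some simulated datasets leave the odds ratio at exactly 1, so each share is expected to fall somewhat below 50%; with a continuous dose variable, such ties are far less likely.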

The heatmaps in Supplementary Material 1 indicated that the percentage of datasets where the odds ratio moved away from the null tended to decrease as misclassification increased. Thus, the probability of moving away from the null was greatest at low misclassification rates, which is consistent with the findings of Yland et al.12. This is concerning for epidemiological studies that are based on death certificate causes of death, because even very small misclassification rates can have a relatively large impact on the validity of the findings of a study, including for health outcomes with low misclassification rates such as certain cancers. Additionally, when the initial dataset had a p-value that was slightly larger than the level of significance, there was a higher probability that the odds ratio would both move away from the null and change the conclusion of a study from non-significant to significant. This occurred because the initial p-value was so close to the level of significance that even a small deviation in the odds ratio could tip it over to significance. This is also concerning for epidemiological studies, such as low-dose radiation epidemiological studies, where barely significant associations are often found and published.

Given the various factors that influence the probability that correcting for misclassification would change the conclusion of a study, one might ask when it is reasonable to apply the heuristic to the findings of a particular study. Certainly, if the probability of a conclusion change was 0.01%, the impact of misclassification would be trivial, and the heuristic could be used. However, if the probability of a conclusion change was 40%, there would be a noteworthy chance that the heuristic could be misleading.

Other studies have simulated the effect of outcome misclassification on estimates of risk11,12,20. Yland et al.12 investigated the impact of misclassification on risk ratios, and observed a decreasing probability that the risk ratio would move away from the null with increasing misclassification rates in simulated datasets. They simulated non-differential misclassification in hypothetical populations of increasing size, and demonstrated that even when the expected misclassification rate is non-differential, the observed errors may be differential due to chance. However, for a fixed risk ratio, as the sample size and/or the level of misclassification increased, random chance was less likely to move the results away from the null.

Efforts are underway to better understand additional factors that influence the trends in the heatmaps in Supplementary Material 1. A “cancellation effect” may play a role, especially for categorical exposures, where the effect of over-misclassified cases cancels out the effect of under-misclassified cases, such that the net balance of cancer deaths for high- and low-dose cases remains similar. Other factors that may influence the trends on the heatmaps include: the dose distribution, baseline disease rate, the strength of the dose-response association, confounding factors, sample size, etc. Future work is needed to explore the influence of these factors as well as the influence of interactions among the factors. Previous work21 indicates that when confounding factors are introduced, the effect of misclassification on measures of risk is more complex.

This simulation study was carried out using cancer mortality among USTUR Registrants as an example. However, the methods and equations involved are not specific to any particular disease, and the findings can be generalized to other outcomes. For example, the baseline rate of deaths due to heart disease (21%) among US residents is similar to that for cancer (19%); therefore, the results of this simulation study can be generalized to heart disease. The impact of misclassification of less common diseases could be explored by changing the coefficient associated with baseline disease rate in Equation 1. The methodology could also be extended to endpoints such as morbidity or specific types of cancer.

Recent methodological discussions have highlighted important limitations of odds ratios as effect-size measures, particularly their non-collapsibility and limited interpretability as population-level risks in multivariate logistic regression models22,23. In this study, odds ratios were not used for effect-size interpretation, causal inference, or comparison across models. Rather, they served as the statistical quantities underlying hypothesis testing in logistic regression. Because the model specification was held constant across simulations and included radiation dose as the only explanatory variable, variation in estimated odds ratios across simulations reflected only outcome misclassification and finite-sample variability, not non-collapsibility induced by conditioning on additional covariates. The purpose of the analysis was therefore inferential—to examine how outcome misclassification can influence apparent statistical significance in a single realized study—rather than interpretative. This distinction aligns with recent guidance emphasizing the importance of separating inferential and effect-size roles of odds ratios in applied research22.

A substantial methodological literature has developed approaches to mitigate the impact of outcome misclassification under various assumptions. Comprehensive overviews of these methods are provided by Fox, MacLehose, and Lash24. One class of methods relies on likelihood-based models that explicitly parameterize misclassification probabilities, allowing simultaneous estimation of disease risk and classification error when sensitivity and specificity are known or estimable. Related approaches incorporate external validation data, in which misclassification parameters are estimated from an independent or nested validation sample and subsequently integrated into the primary likelihood. Alternative strategies include validation-sample–based models that treat true outcome status as partially observed, as well as regression calibration techniques that use estimated misclassification probabilities to adjust regression coefficients. When sensitivity and specificity of outcome classification are available or can be estimated from validation data, simpler correction approaches may also be applied. One such approach is the Rogan–Gladen estimator25, which adjusts observed outcome prevalence to obtain an unbiased estimate of the true prevalence and has been widely used in epidemiologic studies involving screening and surveillance. These methods can reduce bias in effect estimation when model assumptions are satisfied, and adequate validation data are available.
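As a brief illustration of the simplest of these corrections, the sketch below applies the standard Rogan–Gladen formula, (observed prevalence + specificity − 1) / (sensitivity + specificity − 1), in standard-library Python. Under the definitions in Equations 2 and 3, the over-misclassification rate equals 1 − specificity and the under-misclassification rate equals 1 − sensitivity. The numerical rates, the clipping of the estimate to the interval from 0 to 1, and the function name are assumptions for illustration; this is not the authors' ongoing correction analysis.

```python
def rogan_gladen(observed_prevalence, sensitivity, specificity):
    """Rogan-Gladen corrected prevalence, clipped to the range [0, 1]."""
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_prevalence + specificity - 1) / denominator
    return min(1.0, max(0.0, estimate))

# Hypothetical rates: 5% over- and 15% under-misclassification
over_rate, under_rate = 0.05, 0.15
true_prevalence = 0.20

# Expected prevalence on death certificates under these rates
observed = true_prevalence * (1 - under_rate) + (1 - true_prevalence) * over_rate

# Correction recovers the true prevalence when the rates are known exactly
corrected = rogan_gladen(observed, 1 - under_rate, 1 - over_rate)
```

In practice the sensitivity and specificity are themselves estimated from validation data, so the corrected value carries additional uncertainty that this sketch does not address.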

The objective of the present study, however, was not to estimate corrected effect measures but rather to examine how outcome misclassification can affect statistical inference in individual realized studies prior to, or in the absence of, formal correction. Understanding the inferential behavior of uncorrected analyses remains important in practice, as correction methods are not always implemented, validated, or reported, and applied interpretations frequently rely on nominal statistical significance from standard regression models. Ongoing work by the authors applies Rogan–Gladen–type correction methods using validation data from the USTUR Registrants to directly examine how misclassification correction alters estimated associations and statistical inference.

Conclusions

It is generally understood that non-differential misclassification biases dose-response relationships toward the null, and this heuristic is often used to suggest that correcting for misclassification would only strengthen these relationships. While this is often the case, the findings of this study indicate that this general belief is not always correct for individual studies. Nominally non-differential misclassification can move the odds ratio away from the null. There is a non-trivial probability that correcting for misclassification would change a dose-response relationship from significant in a misclassified dataset to non-significant in the correctly classified dataset. Consequently, researchers should use caution when assuming that accounting for outcome misclassification in a particular study would have strengthened the dose-response association. Future work will focus on the application of appropriate methods for correcting this type of misclassification error.