Introduction

Across a range of conditions (e.g., osteoporosis, left-sided colon cancer, and systemic lupus erythematosus [SLE]), patients who represent the less commonly affected demographic have poorer outcomes and more severe manifestations1,2,3,4. For example, multiple sclerosis (MS) is generally considered a female-predominant disease, but male patients tend to have worse prognoses. Some have proposed that these disparities may be due, in part, to diagnostic delays or misdiagnosis. Interrogating variation in clinical decision-making can identify actionable points of intervention that improve disease management and outcomes, and can strengthen the evidence base for clinical practice.

Experimental vignette studies offer a methodological approach to understanding variation in clinical decision-making5,6. In these factorial studies, clinical case vignettes are identical except for the experimental factors under study. This approach has been used to study variation in the delivery of care for common yet complex diseases and conditions in areas such as cancer, cardiovascular disease, pain management, and mental health5. Using this approach, we previously showed that physicians were less likely to correctly diagnose SLE in case vignettes presented as male, particularly White males compared with Black females7.

To date, there have been few or no data on variation in clinical decision-making among neurologists for common yet complex neurologic diseases and conditions. This study aimed to use an experimental vignette design to measure the effects of patient race (Black, White) and gender (female, male) on clinical decision-making for relapsing-remitting MS (RRMS), primary progressive MS (PPMS), stroke, Parkinson disease (PD), and epilepsy.

Methods

Standard protocol approvals, registrations, and patient consents

This study was approved by the Institutional Review Board of Stanford University, Stanford, California, and followed American Association for Public Opinion Research (AAPOR) guidelines8. Participants provided informed consent to participate in the study survey before providing any data or responses.

Survey design

Five vignettes were drafted by a neurologist (NB), then reviewed and pilot-tested by 30 Stanford neurologists, each of whom received a $20 gift card through Tango Rewards. The pilot and final surveys were distributed via Qualtrics (Qualtrics, Provo, UT), and block randomization determined which race-gender version of each vignette was shown. After consent, participants answered background and practice-related questions before being presented with the case vignettes. The vignettes were presented in the following order: stroke, MS, PD, epilepsy, MS. The sequence of the two MS subtypes (RRMS and PPMS) was randomized to account for potential ordering effects (Fig. 1). Each vignette and its associated questions appeared on the same survey page. Participants were asked “What is the most likely diagnosis?” (free-text response), “Is your next step diagnostic or therapeutic?” (dichotomous), and “Please briefly tell us what this next step would be:” (free-text response; see eMethods 3 in Supplement 1).
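The assignment scheme described above can be sketched in Python. This is a hypothetical illustration, not the Qualtrics configuration used in the study. It assumes block randomization with a separate shuffled block of the four race-gender versions for each case, plus a block for the two MS orders. The names `BlockRandomizer` and `make_assigner` are ours.

```python
import random

VERSIONS = ["Black female", "Black male", "White female", "White male"]
CASES = ["stroke", "RRMS", "PD", "epilepsy", "PPMS"]
# The two MS vignettes occupy positions 2 and 5; their order is randomized.
MS_ORDERS = [("RRMS", "PPMS"), ("PPMS", "RRMS")]


class BlockRandomizer:
    """Hands out items from shuffled blocks so each item appears once per block."""

    def __init__(self, items, rng):
        self.items = list(items)
        self.rng = rng
        self.block = []

    def draw(self):
        if not self.block:  # block exhausted: start a freshly shuffled one
            self.block = self.items[:]
            self.rng.shuffle(self.block)
        return self.block.pop()


def make_assigner(seed=None):
    """Return a function that builds one participant's randomized survey."""
    rng = random.Random(seed)
    version_blocks = {case: BlockRandomizer(VERSIONS, rng) for case in CASES}
    order_block = BlockRandomizer(MS_ORDERS, rng)

    def assign():
        first_ms, second_ms = order_block.draw()
        sequence = ["stroke", first_ms, "PD", "epilepsy", second_ms]
        # Each case's race-gender version is drawn independently of the others.
        return [(case, version_blocks[case].draw()) for case in sequence]

    return assign


assign = make_assigner(seed=2022)
print(assign())
```

Under this scheme, every consecutive group of four participants sees each race-gender version of each case exactly once. That keeps the arms balanced even if recruitment stops early.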

Fig. 1

Schematic for pool of case vignettes and randomization mechanism used in a survey-based study of how patient race and gender influence clinical decision-making among neurologists, United States, August–November 2022. Abbreviations: RRMS, relapsing-remitting multiple sclerosis; PD, Parkinson disease; PPMS, primary progressive multiple sclerosis. (A) Pool of case vignettes by race and gender; (B) an example of 6 possible sets of randomized surveys. Each participant received the same 5 clinical cases, randomized by race and gender. The scenarios were presented in the same order, with the exception of RRMS and PPMS, as shown in S1–S3 compared with S4–S6.

Survey distribution

A mailing list of neurologists was provided by SPAN Global Services/LakeMedia Group, a medical marketing company. We received 22,085 neurologist email addresses on 6/1/2022 and removed 332 duplicates. On 8/16/2022, survey invitations were sent to the remaining 21,753 neurologists, with randomized incentives of $0, $10, $20, $50, or $75 gift cards (J. Simard, ScD, unpublished data, February 2024). Four follow-up emails were sent, the last on 9/9/2022. On 10/12/2022, those who had neither opted out nor completed the survey were offered a $75 gift card to participate. Two further follow-up emails were sent, and the survey closed on 11/15/2022.
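The list preparation and incentive assignment can be sketched as follows. This is a hypothetical illustration. It assumes that duplicates were identified by case-insensitive address matching and that incentive arms were allocated in near-equal sizes; neither detail is specified here. The function names are ours.

```python
import random

INCENTIVES_USD = [0, 10, 20, 50, 75]


def deduplicate(addresses):
    """Keep the first occurrence of each address, ignoring case and whitespace."""
    seen, unique = set(), []
    for address in addresses:
        key = address.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(address.strip())
    return unique


def assign_incentives(addresses, seed=None):
    """Randomly allocate one incentive arm per address in near-equal arm sizes."""
    rng = random.Random(seed)
    arms = [INCENTIVES_USD[i % len(INCENTIVES_USD)] for i in range(len(addresses))]
    rng.shuffle(arms)
    return dict(zip(addresses, arms))


# Consistency check on the reported counts: 22,085 received minus 332 duplicates.
assert 22_085 - 332 == 21_753
```

With 21,753 invitations and five arms, near-equal allocation would give arms of 4,350 or 4,351 addresses.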

Study population

We received 949 survey responses. We first excluded 257 respondents who answered “No” to the first question (“Are you a Neurologist?”) and 9 who did not answer it. Qualtrics flagged 8 of the 949 responses as possible spam; 7 of these had already been excluded based on the first question. We further excluded 1 survey because the participant reported living outside the US, and 13 surveys with missing location, one of which was the remaining spam-flagged survey. Of the remaining surveys, we excluded those with no (n = 27) or only some (n = 21) vignettes completed, yielding 621 US neurologists participating in the present study (Fig. 2).
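The exclusions can be chained as a check on Fig. 2. The counts below are those reported above. The 8 spam-flagged responses are not subtracted separately: 7 were already removed at the first-question step, and the eighth is among the 13 surveys with missing location.

```python
# Exclusion steps and counts exactly as reported in the text.
EXCLUSIONS = [
    ("answered 'No' to 'Are you a Neurologist?'", 257),
    ("did not answer the neurologist question", 9),
    ("reported living outside the US", 1),
    ("missing location (incl. 1 spam-flagged survey)", 13),
    ("completed no vignettes", 27),
    ("completed only some vignettes", 21),
]


def apply_exclusions(total, steps):
    """Subtract each exclusion in turn, printing the running total."""
    remaining = total
    for reason, n in steps:
        remaining -= n
        print(f"{reason:<48} -{n:>3} -> {remaining}")
    return remaining


final_n = apply_exclusions(949, EXCLUSIONS)  # -> 621
```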

Fig. 2

Study Population Flowchart.

Outcomes

First, we identified the correct diagnosis for each vignette while blinded to participants’ data and to which race-gender version they saw (details in eMethods 1 in Supplement 1). As outlined in the supplementary material, we used string searches to parse free-text responses. These were categorized and then compared with the correct diagnosis. Among correct diagnoses, the planned next step (“therapeutic” or “diagnostic”) served as a proxy for providers’ certainty in their diagnoses. We assumed that participants who opted to treat the condition (therapeutic next step) were more certain of the diagnosis than those who opted to seek more information through laboratory and other instrumental findings (diagnostic next step). For each vignette, a secondary outcome, response time (i.e., estimated time spent on the vignette), was calculated as the difference between the time stamp for first opening the page and the time stamp for submitting the final answer (finishing the vignette).
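The string-search step and the certainty proxy can be sketched as below. The patterns are hypothetical placeholders, not the search strings used in the study; those are given in eMethods 1 in Supplement 1. The function names are ours. A real classifier would also need rules for misspellings and for responses that list several diagnoses.

```python
import re

# Hypothetical search patterns per vignette (not the study's actual strings).
CORRECT_PATTERNS = {
    "stroke": [r"\bstroke\b", r"\bcva\b", r"\binfarct"],
    "PD": [r"\bparkinson"],
    "epilepsy": [r"\bepilep"],
    "RRMS": [r"\brrms\b", r"relapsing[- ]remitting"],
    "PPMS": [r"\bppms\b", r"primary[- ]progressive"],
}


def is_correct(case, free_text):
    """True if the free-text diagnosis matches any pattern for the case."""
    text = free_text.lower()
    return any(re.search(pattern, text) for pattern in CORRECT_PATTERNS[case])


def certainty_proxy(correct, next_step):
    """Among correct diagnoses, read a 'therapeutic' next step as more certain.

    Returns None for incorrect diagnoses, which are not scored for certainty.
    """
    if not correct:
        return None
    return next_step == "therapeutic"
```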

Survey demographics

Demographic data included self-reported gender (Female, Male, or Other) and year of birth. Age was calculated by subtracting year of birth from 2022. Participants could specify “Hispanic” or “Not Hispanic” ethnicity and were separately asked about race: “What is your race? Please select all that apply.” Participants were provided with multiple-choice check boxes, along with an “Other” option and an accompanying free-text box. Classification of “Other” responses, along with US geographic information, is described in eMethods 2 in Supplement 1. The complete survey is included in eMethods 3 in Supplement 1.

Analysis

Data were exported from Qualtrics, then cleaned and analyzed in SAS version 9.4 (SAS Institute Inc., Cary, NC). After all diagnosis responses were classified, we merged the correct/incorrect codes with participants’ demographic data, time stamps, and an indicator of which race-gender vignette versions each participant was shown. Descriptive statistics, presented as frequencies and proportions, were calculated to characterize the study population. Our primary comparisons were between the two MS subtypes, and between each MS subtype and the common neurologic conditions of stroke, PD, and epilepsy. For each vignette, we calculated the proportion of correct responses by race-gender version and assessed heterogeneity by race-gender version with a chi-square test on an R×C contingency table to generate P values. eFigures 1–3 in Supplement 1 were produced using R (version 4.3.1)9. We then estimated the distribution of participants’ next steps (diagnostic vs. therapeutic) by vignette race-gender version, separately for correct and incorrect responses. To estimate the theoretical proportion of patients successfully diagnosed and treated without delay, we calculated, for each race-gender version, the number of responses that were both correct and recommended a “therapeutic” next step, divided by all vignettes randomized to that version (eTable 3 in Supplement 1). As a secondary analysis, we estimated the median time taken to complete vignettes by next step, race-gender version, and vignette, among correct responses only.
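The heterogeneity test can be sketched in plain Python as a Pearson chi-square test of homogeneity. The sketch assumes the R×C table has one row per race-gender version and one column per outcome (correct, incorrect), which gives (4 − 1)(2 − 1) = 3 degrees of freedom. The analyses themselves were run in SAS, and the counts in the example are made up for illustration; they are not study data. The P value uses the closed-form chi-square upper tail for integer degrees of freedom.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square distribution with integer dof."""
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(dof // 2))
    # Odd dof: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(dof-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (dof + 1) // 2)
    )
    return tail


def chi_square_test(table):
    """Pearson chi-square test of homogeneity for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Illustrative counts only (rows: race-gender versions; columns: correct, incorrect).
example = [[140, 14], [143, 12], [145, 11], [141, 15]]
stat, dof, p = chi_square_test(example)
print(f"chi2 = {stat:.2f}, df = {dof}, P = {p:.2f}")
```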

Results

The demographics of the 621 neurologist participants are described in eTable 2 in Supplement 1. Our study population is representative of the US neurologist population reported in the American Medical Association (AMA) Physician Masterfile with respect to the distribution of gender, ethnicity, and race, and the focus on patient care10. However, our sample is younger (age < 55 years: sample 62% vs. AMA 40.7%), includes fewer international medical graduates (sample 25% vs. AMA 31.5%), and more often reports research as a type of work (sample 13% vs. AMA 4.5%). By US geographic division, our sample has a distribution similar to that of the 2012 US neurologist population11.

Comparing the proportion of correct diagnoses across race-gender versions, stroke was correctly diagnosed consistently across the four versions (minimum 91%, maximum 93%; P = 0.96; Table 1). The other clinical vignettes were less uniform. For PD, vignettes describing Black persons (Black females, 92%; Black males, 94%) were diagnosed correctly more often than those describing White persons (White females, 87%; White males, 91%). Male vignettes were also diagnosed correctly more often than female vignettes, although these differences were not statistically significant (P = 0.18). For epilepsy, correct diagnoses were similar among vignettes describing Black males (91%), White females (93%), and White males (89%), but somewhat lower among Black females (85%) (P = 0.13).

Table 1 Proportion of correct diagnoses identified among disease case vignettes with race and gender jointly randomized.

In contrast, results for the two MS vignettes stood apart from the other conditions and differed from each other. For RRMS, correct diagnoses were similar among vignettes describing Black females (82%), Black males (81%), and White males (81%), but higher among White females (89%) (P = 0.13). Overall, PPMS was correctly diagnosed less often than RRMS. For PPMS, female vignettes (Black females, 76%; White females, 81%) were more likely to be correctly diagnosed than male vignettes (Black males, 70%; White males, 75%), and White vignettes were more likely to be correctly diagnosed than Black vignettes (P = 0.17).

When we examined whether race and/or gender influenced the next step among correct responses, we found some variability across vignettes and race-gender versions (Table 2 and eFigure 1 in Supplement 1). For the PD vignette, the proportion planning a “therapeutic” next step was consistently high across all race-gender versions (minimum 80%, maximum 84%). For the stroke vignette, a “therapeutic” next step was planned at comparable rates for Black females (18%), White females (18%), and White males (18%), but more often for Black males (25%). For the epilepsy vignette, a “therapeutic” next step was planned less often for Black females (41%) than for each of the other three versions (48%).

Table 2 Distribution of choices for “next step” among correctly and incorrectly identified diagnoses for case vignettes with race and gender jointly randomized.

Planned next steps varied more across race-gender versions for the two MS subtypes (Table 2 and eFigure 1 in Supplement 1). Overall, a “therapeutic” next step was recommended more often for RRMS (range 58–69%) than for PPMS (range 25–36%). Among correct RRMS diagnoses, White female vignettes were the least likely to be recommended a “therapeutic” next step (58%), despite being the most likely to be correctly diagnosed. White male vignettes were the most likely to be recommended a “therapeutic” next step (69%), followed by Black females (65%) and Black males (61%). For PPMS, by contrast, the proportions recommended a “therapeutic” next step were, in decreasing order: White females, 36%; Black females, 32%; White males, 27%; Black males, 25%.

For each race-gender version, the proportion of responses that were both correct and recommended a “therapeutic” next step, with all randomized vignettes as the denominator, represents the theoretical proportion of patients who would be successfully diagnosed and promptly treated (eTable 3 in Supplement 1). The White female RRMS vignettes were the most likely to receive a correct diagnosis. However, they were the least likely to be recommended a therapeutic next step compared with male and Black female vignettes, so in absolute terms they yielded a probability of treatment similar to that of the other race-gender versions. For the epilepsy vignette, correctly diagnosed Black female vignettes were recommended therapy at a lower rate than the other three race-gender versions. As a result, 35% of Black female vignettes would be recommended for therapy, compared with 43–44% of the other versions. Among the stroke vignettes, 23% of Black male vignettes were diagnosed correctly and recommended therapy, compared with 16–17% of each of the other race-gender versions. Among PD vignettes, 79% of Black male vignettes were diagnosed correctly and recommended therapy, compared with 73–74% of each of the other versions. Finally, the proportions for the PPMS vignette followed the same relative pattern as for correct diagnoses: White female 29%, Black female 24%, White male 20%, and Black male 18%.
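The quantity in eTable 3 is the product of the two proportions reported above. For example, 85% of the Black female epilepsy vignettes were correctly diagnosed, and 41% of those correct responses planned a therapeutic next step. Recomputing other versions from the rounded percentages in Tables 1 and 2 can differ slightly from eTable 3.

```latex
\Pr(\text{correct} \wedge \text{therapeutic})
  = \Pr(\text{correct}) \times \Pr(\text{therapeutic} \mid \text{correct})
  \approx 0.85 \times 0.41 \approx 0.35
```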

We observed some variability in the secondary outcome of median response time among correct responses by race-gender version (eTable 4 and eFigure 2 in Supplement 1) as well as when stratified by next step (eTable 5 and eFigure 3 in Supplement 1).
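The response-time outcome and its group medians can be sketched as follows. The sketch assumes each record carries the page-open and final-submit time stamps as strings. The field names and the time-stamp format are hypothetical and are not the Qualtrics export format. The interval spans all three questions on the vignette page.

```python
from datetime import datetime
from statistics import median

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # hypothetical export format


def response_seconds(page_opened, final_submit):
    """Seconds from first opening the vignette page to submitting its final answer."""
    opened = datetime.strptime(page_opened, TIMESTAMP_FORMAT)
    submitted = datetime.strptime(final_submit, TIMESTAMP_FORMAT)
    return (submitted - opened).total_seconds()


def median_seconds_by_group(records):
    """Median response time per (vignette, version, next step), correct responses only."""
    groups = {}
    for rec in records:
        if not rec["correct"]:
            continue
        key = (rec["vignette"], rec["version"], rec["next_step"])
        seconds = response_seconds(rec["opened"], rec["submitted"])
        groups.setdefault(key, []).append(seconds)
    return {key: median(values) for key, values in groups.items()}
```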

Discussion

When physicians experience uncertainty, they may rely on prior probabilities from population-based epidemiologic studies to inform the diagnosis of a particular case12,13,14,15. In this study, 621 participating neurologists evaluated five clinical vignettes with randomly assigned race and gender. Given the randomization and the sizable sample drawn from across the US, participants would be expected to perform equally well across the race-gender versions of the same vignette if race and gender did not influence clinical decision-making.
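One way to formalize this reliance on prior probabilities is Bayes’ theorem in odds form. This is our illustration, not a model estimated in the study. Here D denotes the target diagnosis and S the findings described in a vignette.

```latex
\underbrace{\frac{\Pr(D \mid S)}{\Pr(\neg D \mid S)}}_{\text{posterior odds}}
  = \underbrace{\frac{\Pr(D)}{\Pr(\neg D)}}_{\text{prior odds}}
  \times \underbrace{\frac{\Pr(S \mid D)}{\Pr(S \mid \neg D)}}_{\text{likelihood ratio}}
```

The findings S are identical across the four race-gender versions of a vignette. Under the simplifying assumption that the likelihood ratio does not depend on race or gender, any systematic difference in posterior odds across versions would have to come from the prior odds a physician assigns to a patient’s race and gender.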

Stroke, PD, and epilepsy had the highest overall proportions of correct diagnoses. This is consistent with each condition having classic symptoms associated with the affected brain anatomy. Both MS vignettes, RRMS and PPMS, were correctly diagnosed less often. This is consistent with the heterogeneity of MS presentations and the multiple criteria that must be met for an MS diagnosis. RRMS was diagnosed correctly more often than PPMS, consistent with PPMS being more difficult to diagnose because of its insidious, progressive course.

We observed more variation in clinical decision-making for the RRMS vignette than for any other condition. Specifically, a higher proportion of White female RRMS vignettes were diagnosed correctly than of the three other versions, although these differences were not statistically significant. Until recently, the prevailing belief about MS has been that it is rare among Black persons and more common among females than males16,17,18,19. Recent US studies have also described an overall female predominance but reported comparable prevalence among White and Black females, especially at the median age at onset20,21. Our results suggest reliance on a well-accepted but less well-documented prior, i.e., that MS occurs more frequently among White females16,17,18,19. This is consistent with our prior findings among rheumatologists evaluating SLE vignettes7. PPMS, on the other hand, reportedly occurs with approximately equal frequency in males and females22,23,24, and it did not follow the same pattern as RRMS. Instead, PPMS results followed the pattern of overall US MS incidence recently reported for the corresponding race-gender groups20.

As a post hoc comparison, we considered results according to whether each condition has acute or insidious onset. PD and PPMS are both progressive, neurodegenerative conditions with insidious onset. For these two conditions, the proportion correct across race-gender versions followed a pattern consistent with the currently accepted relationship between gender and risk of each condition. Specifically, PD is diagnosed more frequently in men than in women25,26,27, and MS generally is diagnosed more frequently in women than in men20,23. Note, however, that PPMS comprises only about 15% of all MS cases, and its ratio of women to men is estimated to be more even22,23,24. In contrast, three vignettes (stroke, RRMS, and epilepsy) represented conditions with acute onset. For two of these (RRMS and epilepsy), one race-gender version had a somewhat different proportion of correct diagnoses from the other three. For RRMS, the White female vignettes had a higher proportion of correct diagnoses than the other versions. For epilepsy, the Black female vignettes had a lower proportion.

We also sought to understand how certainty or uncertainty might affect the management of these neurologic conditions. Although the White female vignettes were the most likely to receive a correct RRMS diagnosis, they were the least likely to be recommended a therapeutic next step. All versions of the RRMS vignette yielded a similar absolute probability of treatment, ranging from 50 to 56%. RRMS may be easier to recognize in the “classical” patient, i.e., one who is White and female, without this translating into a higher likelihood of treatment. In contrast, we observed differences in both diagnosis and initiation of treatment for the epilepsy vignettes. The Black female epilepsy vignettes were less likely to be diagnosed correctly and less likely to be recommended for therapy than the other race-gender versions. In absolute terms, 35% of the Black female vignettes would be recommended for therapy, compared with 43% of the Black male, 44% of the White female, and 43% of the White male epilepsy vignettes. These results, at both diagnosis and next step, suggest persistent uncertainty about epilepsy in Black women that warrants further investigation.

Next steps for the stroke vignette were predominantly diagnostic, as might be expected because the vignette stated that MRI results were pending. Black male vignettes were recommended therapy more often than the others. This may reflect greater certainty in the diagnosis, consistent with the overlapping cardiovascular risk profile of the US Black male population28. The recommendation of treatment as the next step for PD was uniformly high, reflecting high certainty in the diagnosis and a standard treatment. Finally, treatment was generally recommended infrequently as the next step for PPMS, reflecting low certainty in the diagnosis. Both female versions of the PPMS vignette had higher proportions of recommended treatment, suggesting higher certainty.

Despite the incentives offered, our response rate was low. This limited statistical power for some analyses, such as exploring gender or race concordance between participants and vignettes, or the influence of international training. These data are consistent with results from our prior rheumatology study. For transparency, we report the time taken to answer the group of three questions for each vignette. However, we refrain from drawing conclusions from it because we cannot disentangle hesitation or “thinking time” from the time required to enter a lengthy response. In a prior study of rheumatology clinical case vignettes, we were more confident using response time because we asked a single question regarding diagnosis. In the present study, response time includes the time required to answer all three questions for each vignette. Future studies employing this randomized survey strategy may benefit from recording the time taken for each question separately. In addition, because respondents were simply asked for “the most likely diagnosis” for each vignette, we have no insight into which diagnostic criteria they applied. Although response time has previously been used as a proxy for implicit bias, our survey design did not allow its use in this way. We were therefore unable to estimate implicit bias among the neurologist participants.

This novel study evaluated variation in clinical decision-making among neurologists from across the US. We used randomization to assign the race and gender of each case vignette, holding everything constant except these factors. We also randomized the order of the two MS vignettes to minimize ordering effects. Our findings are consistent with prior work suggesting that similar biases may influence providers when applying diagnostic criteria to a condition with heterogeneous presentation. This variation is pronounced when there is evidence for a race- or gender-based predominance in prevalence. In summary, using clinical vignettes in a randomized survey experiment, we found that diagnosis and diagnostic certainty may vary by the gender and race of a hypothetical neurology patient. The extent to which the frequency of correct diagnosis varied in the present study suggests uncertainty and reliance on the epidemiologic knowledge base, particularly because the distribution of correct diagnoses corresponds with the evidence base.