Optimizing the interaction between humans and machines is a central challenge when deploying artificial intelligence (AI) at the bedside. The goal of this randomized clinical vignette study is to determine whether presenting AI model outputs via continuous Bayesian updates and/or uncertainty quantification improves diagnostic accuracy and trust in the AI among healthcare professionals (physicians, residents, fellows, physician assistants (PAs), and nurse practitioners (NPs)) from US academic institutions evaluating patients with chest pain or dyspnea.
The main questions it aims to answer are:
- Does presenting AI predictions as Bayesian-updated post-test probabilities improve diagnostic accuracy compared to standard predicted probabilities?
- Does the addition of uncertainty quantification (95% confidence intervals) to AI predictions improve diagnostic accuracy?
- Do these interventions (Bayesian updating and/or uncertainty quantification) help clinicians recover from the negative effects of intentionally misleading AI predictions?
Comparison: Researchers will compare standard AI predicted probabilities (presented without uncertainty) with Bayesian-updated post-test probabilities and/or outputs that include 95% confidence intervals, to determine whether these interventions improve diagnostic accuracy, clinician confidence, and resilience against misleading AI.
Participants will:
- Review 8 clinical vignettes (simulated patient cases) focusing on chest pain or dyspnea.
- Provide an initial "pre-test" diagnostic probability for 5 possible diagnoses based on the clinical history alone.
- View AI model outputs that vary by experimental condition (standard probability vs. Bayesian update, with or without uncertainty intervals, and accurate vs. misleading).
- Provide an updated "post-test" diagnostic probability for the diagnoses after viewing the AI output.
- Select and rank diagnostic tests and therapeutic steps for each vignette.
- Complete a post-survey regarding their trust in the AI, comfort with the data presentation, and demographics.
Transforming Clinical Decision Support Systems: Using Continuous Bayesian Updates to Integrate AI Predictions With Clinician Expertise
Study Design: This is a 2x2 factorial within-subjects design. The two factors are (1) Bayesian updating via continuous likelihood ratios (CLR) vs. standard predicted probability, and (2) uncertainty quantification (95% confidence intervals) vs. point estimate only. AI prediction accuracy (accurate vs. intentionally misleading) is varied as a within-subjects stratification factor balanced across all 4 conditions, with half of each participant's vignettes receiving accurate predictions and half receiving misleading predictions. AI predictions are simulated (pre-programmed) for experimental control. Vignette order and condition assignment are independently randomized per participant.
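The likelihood-ratio arm of the design rests on the standard odds-form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the likelihood ratio. A minimal sketch of that arithmetic, with a hypothetical function name and illustrative numbers (not values from the study), is:

```python
def bayesian_update(pretest_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via
    the odds form of Bayes' theorem:
        post-test odds = pre-test odds * likelihood ratio.
    """
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

# Illustrative numbers only: a clinician's pre-test probability of 30%
# combined with an AI-derived likelihood ratio of 4 yields a post-test
# probability of about 63%.
print(round(bayesian_update(0.30, 4.0), 3))  # -> 0.632
```

A likelihood ratio of 1 leaves the probability unchanged, which is why this format lets the clinician's own pre-test estimate carry through when the AI is uninformative.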
Primary Analysis: Diagnostic accuracy is analyzed using a generalized linear mixed model (GLMM) with fixed effects for CLR, Uncertainty, Misleading, and vignette, and a participant random intercept. Pre-specified secondary analyses examine interactions of presentation format with misleading AI.
Sample Size: Simulation-based power analysis (1,000 Monte Carlo iterations per scenario) was conducted using the planned GLMM. Assuming 70% baseline diagnostic accuracy and within-participant ICC of 0.25, the study achieves 85.8% power for the CLR main effect and 85.7% for the Uncertainty main effect with N=100 at alpha=0.05 (two-tailed).
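The simulation-based approach above can be sketched in simplified form. The snippet below is illustrative, not the registered analysis code: it induces the within-participant ICC through a random intercept on the latent logistic scale, but tests the format effect with a paired comparison of participant-level accuracy (a normal-approximation t-test) rather than fitting the full GLMM, and the effect size, function name, and defaults are hypothetical.

```python
import numpy as np

def simulate_power(n_participants=100, n_vignettes=8, baseline=0.70,
                   icc=0.25, effect_logodds=0.5, n_sims=1000, seed=0):
    """Monte Carlo power for a within-subjects effect on binary accuracy.

    Simplified sketch: participant random intercepts generate the ICC;
    the effect is tested via a paired t-test on each participant's
    accuracy in intervention vs. control vignettes (approximating the
    planned GLMM).
    """
    rng = np.random.default_rng(seed)
    # Random-intercept SD implied by the ICC on the latent logistic
    # scale, where the residual variance is pi^2 / 3.
    sigma_u = np.sqrt(icc / (1 - icc) * np.pi**2 / 3)
    base_logodds = np.log(baseline / (1 - baseline))
    rejections = 0
    for _ in range(n_sims):
        u = rng.normal(0.0, sigma_u, n_participants)       # participant intercepts
        # Half of each participant's vignettes use the intervention format.
        cond = np.tile([0, 1], (n_participants, n_vignettes // 2))
        logit = base_logodds + u[:, None] + effect_logodds * cond
        correct = rng.random((n_participants, n_vignettes)) < 1 / (1 + np.exp(-logit))
        diff = correct[:, cond[0] == 1].mean(1) - correct[:, cond[0] == 0].mean(1)
        t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n_participants))
        if abs(t) > 1.984:  # two-tailed t critical value, df = 99
            rejections += 1
    return rejections / n_sims
```

With no true effect (`effect_logodds=0`) the rejection rate should hover near the nominal 5% alpha; increasing the effect or the sample size drives the estimated power toward 1.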