Understanding Sources of Uncertainty in Machine Learning
As data scientists deploy machine learning models in increasingly high-stakes applications, it is important to understand how uncertain model forecasts are. Instead of supplying only a point forecast, we may also want a prediction interval (e.g., “this child will be between 170cm and 184cm tall when they are an adult, with 90% probability”). Conformal prediction has emerged as a promising paradigm for creating prediction intervals even when using black-box machine learning models on data with an unknown distribution. One remaining challenge is constructing prediction intervals that achieve coverage for rare inputs, not just typical ones. To this end, researchers have recently started employing quantile regression with success. Our research builds on the quantile regression approaches by addressing two forms of uncertainty. The first, aleatoric uncertainty, is due to inherent noise: perhaps unknowable environmental factors will influence a child’s height. Quantile regression is intended to estimate this type of uncertainty. The second, epistemic uncertainty, is signal that exists but that the model cannot recover without more data. For example, if the model could train on more patients from rarer demographics, it might better understand the range of outcomes, which would reduce epistemic uncertainty.
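To make the baseline concrete, here is a minimal sketch of the standard split-conformal calibration step that conformalized quantile regression builds on. It assumes lower and upper quantile predictions have already been produced by some fitted quantile regressor and that a held-out calibration set is available. The function names `cqr_calibrate` and `cqr_interval` are hypothetical and chosen only for illustration; this is not the authors' code.

```python
import numpy as np

def cqr_calibrate(q_lo_cal, q_hi_cal, y_cal, alpha=0.1):
    """Return the additive offset that conformalizes quantile predictions.

    q_lo_cal, q_hi_cal: predicted lower/upper quantiles on the calibration set.
    y_cal: observed responses on the calibration set.
    alpha: target miscoverage rate (0.1 targets 90% coverage).
    """
    q_lo_cal = np.asarray(q_lo_cal, dtype=float)
    q_hi_cal = np.asarray(q_hi_cal, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    # Score is positive when y falls outside the predicted band,
    # negative when it falls inside (distance to the nearer endpoint).
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)
    n = len(y_cal)
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: only an infinite
        # interval can carry the guarantee.
        return np.inf
    return np.sort(scores)[k - 1]

def cqr_interval(q_lo, q_hi, offset):
    """Widen (or shrink, if offset < 0) the quantile band by the same amount everywhere."""
    return np.asarray(q_lo, dtype=float) - offset, np.asarray(q_hi, dtype=float) + offset
```

Note that the offset is the same for every input. A correction that varies with the input is what an epistemic uncertainty estimate can supply.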
Team Members
Raphael Rossellini
Rina Foygel Barber
Rebecca Willett
Our proposed method, Uncertainty-Aware Conformalized Quantile Regression (UACQR), uses estimates of epistemic uncertainty to provide better prediction intervals. We propose estimating epistemic uncertainty by leveraging the structure of popular machine learning methods, such as Random Forests and Neural Networks. For example, with Neural Networks, we estimate epistemic uncertainty by tracking how the predictions for each input vary during training. Our method outperforms existing approaches on a range of real-world data sets, demonstrating the power of incorporating estimates of epistemic uncertainty into conformal prediction.
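One way to picture how an epistemic uncertainty estimate can enter the conformal step is sketched below. It is an illustrative simplification under stated assumptions, not the published UACQR procedure: the spread of quantile predictions across ensemble members (for example, trees in a forest or checkpoints saved during neural network training) is used as a per-input scale, and a single multiplier `t` is calibrated on held-out data. All function names are hypothetical.

```python
import numpy as np

def ensemble_mean_and_spread(member_preds, eps=1e-8):
    """Summarize predictions of shape (n_members, n_points).

    Returns the per-point mean and the per-point sample standard deviation,
    used here as a rough proxy for epistemic uncertainty. eps keeps the
    spread strictly positive so it can be used as a divisor.
    """
    member_preds = np.asarray(member_preds, dtype=float)
    center = member_preds.mean(axis=0)
    spread = member_preds.std(axis=0, ddof=1) + eps
    return center, spread

def calibrate_scale(lo, hi, g_lo, g_hi, y, alpha=0.1):
    """Find the multiplier t for intervals [lo - t * g_lo, hi + t * g_hi].

    lo, hi: lower/upper quantile predictions on the calibration set.
    g_lo, g_hi: positive per-point spreads for each endpoint.
    """
    lo, hi, y = (np.asarray(a, dtype=float) for a in (lo, hi, y))
    g_lo, g_hi = np.asarray(g_lo, dtype=float), np.asarray(g_hi, dtype=float)
    # Smallest t for which each calibration point is covered.
    scores = np.maximum((lo - y) / g_lo, (y - hi) / g_hi)
    n = len(y)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]

def scaled_interval(lo, hi, g_lo, g_hi, t):
    """Adjust each endpoint in proportion to its estimated epistemic spread."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    return lo - t * np.asarray(g_lo, dtype=float), hi + t * np.asarray(g_hi, dtype=float)
```

Because the adjustment is multiplied by the spread, inputs where the ensemble members disagree receive a larger correction than inputs where they agree, while the calibration of `t` follows the usual split-conformal recipe.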
Left: A toy example showing how aleatoric uncertainty is the width of the oracle prediction interval for each X value and epistemic uncertainty depends on the shape of the oracle prediction interval and the X distribution. Right: Our proposal has the same coverage guarantees as the baseline yet provides smaller intervals, a desirable property.
NSF Award DMS-2235451
Rossellini, Raphael, Rina Foygel Barber, and Rebecca Willett. "Integrating uncertainty awareness into conformalized quantile regression." In International Conference on Artificial Intelligence and Statistics, pp. 1540-1548. PMLR, 2024. https://arxiv.org/abs/2306.08693