Calibrating Retrieval-Based Quantile Predictions with Conformal Prediction
The article discusses how the authors addressed the problem of miscalibrated quantile predictions in their product, which uses nearest-neighbor retrieval to generate return distributions. They explain the root cause and introduce a conformal prediction-based solution to calibrate the output.
Why it matters
Properly calibrated quantile predictions are critical for AI-assisted trading tools to be trusted by users. The authors' work demonstrates a practical solution to this problem.
Key Points
1. The authors' product was returning quantile predictions that were not well calibrated: actual returns fell outside the predicted ranges more often than expected.
2. The issue was caused by a systematic bias in nearest-neighbor retrieval: retrieved samples are more similar to one another than to the anchor, which leads to underestimated variance.
3. The authors implemented a split conformal prediction approach to calibrate the quantile predictions, fitting an additive offset on a held-out calibration set.
4. The calibrated predictions showed much better coverage on validation data, hitting the target 80% and 50% confidence levels.
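The split conformal recipe in point 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `conformal_offset` and the argument names are hypothetical, and the nonconformity score used here is the standard "signed distance outside the band" from conformalized quantile regression.

```python
import numpy as np

def conformal_offset(y_cal, lo_cal, hi_cal, alpha=0.2):
    """Additive widening offset for a predicted interval via split conformal.

    y_cal          -- realized forward returns on the held-out calibration set
    lo_cal, hi_cal -- the raw (miscalibrated) lower/upper quantile predictions
                      for those same anchors
    alpha          -- miscoverage level (0.2 -> target 80% interval)
    """
    y_cal, lo_cal, hi_cal = map(np.asarray, (y_cal, lo_cal, hi_cal))
    # Nonconformity score: how far the outcome lands outside the band
    # (negative when it falls inside).
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = len(scores)
    # Finite-sample-corrected empirical quantile of the scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(scores, level, method="higher"))
```

At prediction time the offset is subtracted from the lower quantile and added to the upper one, widening (or, if the offset is negative, tightening) the band until it achieves the target coverage on the calibration set.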
Details
The article describes a problem the authors encountered in their product: the quantile predictions (e.g., p10, p25, median, p75, p90) for forward returns on chart patterns were not well calibrated. Actual returns fell outside the predicted ranges more often than expected, with the 80% confidence interval covering only 68.2% of outcomes at 5 days and 64.2% at 10 days.

The issue was traced to the nearest-neighbor retrieval approach used to generate the quantile predictions: the retrieved samples are more similar to each other than to the anchor, leading to an underestimation of the true variance.

To address this, the authors implemented a split conformal prediction approach, holding out a calibration set of anchors with known forward returns, computing nonconformity scores, and using the empirical quantile of those scores to derive a calibration offset. This band correction (rather than a median shift) significantly improved coverage on the held-out validation data, hitting the target 80% and 50% confidence levels. The authors note that this is a minimum viable solution, and they plan to explore more advanced calibration models that account for cohort features and regime-specific biases.
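The coverage check described above, comparing realized coverage against the nominal 80% and 50% targets on validation data, can be expressed as a small helper. This is an illustrative sketch; the name `coverage_report` and the input layout are assumptions, not the authors' code.

```python
import numpy as np

def coverage_report(y_val, bands):
    """Realized coverage of each predicted band on held-out validation data.

    y_val -- realized forward returns
    bands -- dict mapping nominal coverage (e.g. 0.80) to (lo, hi) arrays
    Returns {nominal: realized} so the two can be compared directly.
    """
    y_val = np.asarray(y_val)
    return {
        nominal: float(np.mean((y_val >= np.asarray(lo)) & (y_val <= np.asarray(hi))))
        for nominal, (lo, hi) in bands.items()
    }
```

Running this on the raw bands versus the conformally widened ones makes the miscalibration (and its fix) directly visible, e.g. a nominal 0.80 band reporting 0.68 before calibration and ~0.80 after.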