
When in Doubt, Abstain: Why Machine Learning Models Need to Know Their Limits
Machine learning models have a drawback that, at first glance, looks like an advantage:
Machine learning models always give an answer.*
*As long as the input data has the right dimensions and no missing values.
But always answering is a bad habit, because the model doesn’t always produce the right prediction.
Imagine a person who will always answer your question - no matter how sure they are of the answer. Even if the person doesn't know the right answer, they will answer with confidence.
I’d prefer this person to say “I don’t know” when they don’t know. The same goes for computers.
Machine learning models should say “No” more often
If you have been following the development of large language models (LLMs), then the idea of abstaining might be familiar. LLMs have a tendency to make up references and facts, a consequence of how they were trained: to produce plausible-sounding language. At least that’s how LLMs were trained initially.
LLMs like ChatGPT were further trained with a layer of reinforcement learning to align them with our intentions and values (or those of OpenAI). And that alignment includes sometimes saying “No” and not giving an answer. To abstain from answering.
But LLMs are not the only machine learning models. There are many other tasks like classification, survival analysis, clustering, … The question of when to abstain is a more general one.
Here are some reasons why we might want the machine learning model to abstain from making predictions.
Uncertain decision
Out-of-distribution data
Adversarial attacks
Conflicting input
Insufficient training data
Answer not aligned with certain values
Biased outputs
The list is probably much longer; maybe you have a few more examples in mind.
Making the model abstain
But how can you make the model abstain? There are multiple options that can either be baked into the model or added externally.
Check the input data with an outlier detector before the model gives a prediction.
Add input checks that are run before the model gives a prediction. Think of range checks, consistency checks, checking for conflicting inputs, and so on. The first sketch after this list combines such checks with an outlier detector.
Include adversarial data in the training data to make the model more robust.
Use reinforcement learning from human feedback (RLHF) for large language models.
Measure the uncertainty of a prediction with methods like conformal prediction. Then either have a threshold for abstaining or at least use the uncertainty information for downstream decision-making. The second sketch after this list shows a simplified version of this.
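To make the first two ideas concrete, here is a minimal sketch (not from this post) of a wrapper that runs range checks and an outlier detector before letting the model predict. The class name AbstainingClassifier, the feature_ranges argument, and the ABSTAIN sentinel are made up for illustration; scikit-learn’s IsolationForest plays the role of the outlier detector.

```python
from sklearn.ensemble import IsolationForest

ABSTAIN = None  # sentinel value returned instead of a prediction


class AbstainingClassifier:
    """Wraps a scikit-learn-style classifier and abstains when the input
    fails a range check or looks like an outlier."""

    def __init__(self, model, feature_ranges):
        self.model = model
        self.feature_ranges = feature_ranges  # list of (min, max), one per feature
        self.outlier_detector = IsolationForest(random_state=0)

    def fit(self, X, y):
        self.outlier_detector.fit(X)  # learn what "typical" inputs look like
        self.model.fit(X, y)
        return self

    def predict(self, X):
        predictions = []
        inlier_flags = self.outlier_detector.predict(X)  # +1 = inlier, -1 = outlier
        for row, flag in zip(X, inlier_flags):
            out_of_range = any(
                not (low <= value <= high)
                for value, (low, high) in zip(row, self.feature_ranges)
            )
            if out_of_range or flag == -1:
                predictions.append(ABSTAIN)  # refuse to guess
            else:
                predictions.append(self.model.predict([row])[0])
        return predictions
```

You could wrap any classifier this way, for example AbstainingClassifier(RandomForestClassifier(), feature_ranges=[(0, 120), (0, 1)]), and decide downstream what should happen whenever the wrapper returns ABSTAIN.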
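For the uncertainty route, here is a minimal sketch of split conformal classification with an abstain option. The function conformal_abstain is hypothetical and assumes a fitted classifier with predict_proba, integer labels 0 to K-1 matching the probability columns, and a held-out calibration set; the threshold follows the standard split-conformal recipe.

```python
import numpy as np


def conformal_abstain(model, X_cal, y_cal, X_new, alpha=0.1):
    """Split conformal classification with an abstain option.

    Returns a class label only if the conformal prediction set contains
    exactly one class; otherwise returns None (abstain).
    """
    # Nonconformity score on the calibration set: 1 - probability of the true class
    cal_probs = model.predict_proba(X_cal)
    scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

    # Conformal quantile with the finite-sample correction (clipped at 1)
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    threshold = np.quantile(scores, level, method="higher")

    predictions = []
    for probs in model.predict_proba(X_new):
        # Prediction set: every class whose nonconformity score stays below the threshold
        prediction_set = [c for c, p in enumerate(probs) if 1.0 - p <= threshold]
        predictions.append(prediction_set[0] if len(prediction_set) == 1 else None)
    return predictions
```

With this setup the true class lands in the prediction set roughly 1 - alpha of the time, so the rate of abstentions tells you how often the model is genuinely uncertain at that confidence level.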
No single method will be able to safeguard against all the cases where the model should abstain from making a prediction. A patchwork of multiple methods is needed. And even then it’s likely that some cases are not covered. But it is better to take some safety precautions than to allow the model to confidently talk nonsense.