Machine learning never cheats but it may play flawed games
How to become better at designing prediction games
Streamer Rainbolt has become famous for his exceptional skill in GeoGuessr, a game where players guess locations from Google Street View images. He has even become a meme for his ability to identify "secret" places. When asked how he identifies countries from images, he mentions features like poles, bollards, and license plates. But there are also Street-View-specific clues, like camera artifacts from the Street View car that help identify Albania. Such game-specific clues obviously wouldn’t apply to the real world. But since it all happens in the context of GeoGuessr, every clue Rainbolt uses is fair game.
Now let’s switch perspective and train a machine learning model to geotag photos, using images from Google Street View as training data. When the model relies on the same tricks, like identifying Albania from camera quirks, it seems like the model is cheating or taking shortcuts.
But here’s the thing: it’s not only Rainbolt who is playing a game; the ML algorithm is playing one too. While it may seem like a model cheats by exploiting unintended quirks in the data, I find it much more useful to think of ML as always playing by the rules. If the algorithm finds shortcuts, it’s because the game allows them.
However, machine learning doesn’t feel like designing a game, and that’s because most of the rules are implicitly encoded in how the data were generated. The overarching rule of GeoGuessr is that you may use any information in the image to guess the location; whatever information the images contain dictates how easy or hard the task is. The trouble is that identifying these “rules” is difficult. It’s a bit of a chicken-and-egg problem: if we knew all the rules of the prediction game, we wouldn’t need machine learning in the first place. Still, we can try to detect many flaws in the data.
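To make this concrete, here is a minimal, purely simulated sketch of how a shortcut baked into the data-generating process wins the game. All features and numbers are invented: a "camera artifact" flag that is almost perfectly aligned with the label makes the genuine but noisy visual clues nearly irrelevant.

```python
# Hypothetical sketch: a shortcut feature in the data-generating process.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1_000
is_albania = rng.integers(0, 2, n)              # label: is the photo from Albania?
genuine = is_albania + rng.normal(0, 2, n)      # noisy real clues (poles, plates, ...)
artifact = is_albania ^ (rng.random(n) < 0.01)  # camera artifact, ~99% aligned with label

X = np.column_stack([genuine, artifact])
model = LogisticRegression().fit(X, is_albania)
print(model.coef_)  # the artifact weight dominates: the game allows the shortcut
```

The model isn’t cheating here; the artifact is simply the most reliable information the game offers.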
Detecting flaws in prediction games
Caruana et al. (2015) developed an interpretable model to predict the probability of death for patients who showed up at the ER with pneumonia. The goal: identify high-risk patients who need to be admitted to the hospital. One of the rules the model learned was that asthma patients had a reduced risk of death. Odd, since the opposite is true: patients with asthma are at higher risk of dying from pneumonia. There is a simple explanation for why the model learned that asthma means a lower risk of death: patients with asthma tend to receive more aggressive treatment for pneumonia, a treatment that is highly effective against death from infection. Hence, asthma is a good predictor of a lower probability of death for this particular prediction task.
Here’s how Caruana et al. describe a possible solution to the problem:
We can “repair” the model by eliminating this term (effectively setting the weight on this graph to zero), or by using human expertise to redraw the graph so that the risk score for asthma=1 is positive, not negative. Because asthma is boolean, it is not necessary to use a graph, and we could present a weight and offset (RiskScore = w * hasAsthma + b) instead.
(Note: “graph” here refers to the effect plot that shows the feature on the x-axis and its contribution to the prediction on the y-axis.)
The framing is that the model is broken and needs fixing. My point is that the model isn’t broken; rather, the game has implicit rules that conflict with the intended use case of the model. Their fix carries the risk that features correlated with asthma remain in the model and still produce lower predicted probabilities for asthma patients. In 2017, Amazon scrapped their ML hiring algorithm because it was biased against women and they couldn’t fix it by removing the gender attribute: the model used other means to identify women, such as all-women colleges or mentions of women’s sports clubs.
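Here is a hedged sketch of that risk, with entirely made-up feature names and numbers: we simulate the flawed game, drop the asthma feature (the analogue of removing the gender attribute), and watch a correlated proxy inherit its effect.

```python
# Purely simulated sketch: removing a feature does not remove its signal
# if a correlated proxy (here, an inhaler prescription) stays in the model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 20_000
asthma = rng.random(n) < 0.1
# Proxy: asthma patients almost always have an inhaler prescription
inhaler = np.where(asthma, rng.random(n) < 0.9, rng.random(n) < 0.02)
severity = rng.normal(0, 1, n)

# The flawed game: asthma triggers aggressive treatment, so asthma
# patients die less often in the observational data
logit = -2 + 1.5 * severity - 1.5 * asthma
death = rng.random(n) < 1 / (1 + np.exp(-logit))

# "Repair" by removing the asthma feature entirely
X = np.column_stack([severity, inhaler])
model = LogisticRegression().fit(X, death)
print(model.coef_)  # the inhaler weight turns negative: the proxy leaks the signal
```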
The pneumonia model’s problems run deeper than the asthma feature. The general game-design flaw is that the variables used for prediction also influence the treatment, while the outcome depends on both the patient’s initial health status and that treatment. Any information about the treatment that is implicit in the patient’s health status gets mixed into the features. But the goal was to use the model to decide who gets treated, and for that purpose this entanglement makes the model an extremely bad guide. It’s fair to assume that many more variables influence the treatment, and each of them makes the model worse for its intended use in treatment decisions.
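To see how this flaw produces the asthma paradox, here is a purely simulated sketch (all coefficients are invented): asthma raises the true risk, but it also triggers aggressive treatment, and since the treatment is not a feature, the model absorbs its effect into the asthma weight.

```python
# Simulated sketch of the game-design flaw: a predictor (asthma) influences
# the treatment, and the treatment influences the outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 50_000
asthma = rng.random(n) < 0.1
severity = rng.normal(0, 1, n)

# Hospital policy: asthma patients (and severe cases) get aggressive treatment
aggressive = asthma | (severity > 1.5)
# Asthma raises the true risk (+0.5), but aggressive treatment lowers it a lot (-2.0)
logit = -2 + 1.5 * severity + 0.5 * asthma - 2.0 * aggressive
death = rng.random(n) < 1 / (1 + np.exp(-logit))

# The treatment is not a feature; the model only sees severity and asthma
X = np.column_stack([severity, asthma])
model = LogisticRegression().fit(X, death)
print(model.coef_)  # asthma gets a negative weight: "asthma lowers risk"
```

Nothing in the fitted model is wrong given the data it saw; the rules of this game simply entangle asthma with the treatment it triggers.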
Prediction games are a function of the data-generating process
What we typically think of when designing a machine learning task is the choice of task type (e.g., survival analysis, regression, classification), the loss and evaluation functions, how to split the data, which features to use, and so on.
But these tools are focused on training, which allows only limited control over the prediction game. One profession uniquely equipped to design prediction games is statistics. I may be biased here, since my background is in statistics, but understanding and shaping the data-generating process is one of the foundations of the field, and the data-generating process ultimately shapes what the prediction game is all about.
Here are just a few tools and lenses that can help you understand and shape a prediction game (a small sketch of the sampling lens follows the list):
Sampling theory
Design of experiments
Missing data analysis
Causal inference
Interpretable models
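As one example of the sampling lens, here is a hedged, purely simulated sketch of how selection into the dataset can manufacture a rule of the game: in the full population, asthma and severity are independent, but among admitted patients they become negatively correlated, because admission depends on both.

```python
# Simulated sketch of a selection effect: conditioning on admission
# induces a correlation that does not exist in the population.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
asthma = rng.random(n) < 0.1
severity = rng.normal(0, 1, n)
# Admission depends on BOTH asthma and severity
admitted = (severity + 2 * asthma + rng.normal(0, 1, n)) > 1.0

print(np.corrcoef(asthma, severity)[0, 1])                      # ~0 in the population
print(np.corrcoef(asthma[admitted], severity[admitted])[0, 1])  # negative among admitted
```

A model trained only on admitted patients plays a different game than one trained on the whole population, even if the features and labels look identical.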
In my book Mindful Modeler, I advocate drawing from multiple modeling approaches. For example, learning to think like a statistician and being able to build causality into your models can greatly enhance prediction models that are built in the spirit of supervised machine learning.