3 Comments
Junaid Butt:

Do you have good recommendations for where to find code implementations (Python and/or R) of cost-sensitive ML and threshold tuning on validation data? Especially for the latter, I haven't found an end-to-end walkthrough, although my search may just be ineffectual. For weighting data points, I know that tidymodels in R now supports case weights, which seems to be a good implementation.

Great post as usual.

Tom Cal:

Junaid, re: "code implementations where cost-sensitive ML and threshold tuning on validation data are implemented"

These resources at scikit-learn.org might be helpful:

"Post-tuning the decision threshold for cost-sensitive learning",

webpage, https://scikit-learn.org/stable/auto_examples/model_selection/plot_cost_sensitive_learning.html

notebook, https://scikit-learn.org/stable/_downloads/133f2198d3ab792c75b39a63b0a99872/plot_cost_sensitive_learning.ipynb

Here are related videos:

"Get the best from your scikit-learn classifier: trusted probabilities and optimal binary decision", https://pretalx.com/euroscipy-2023/talk/GYYTCH/

https://www.youtube.com/watch?v=6YnhoCfArQo

“Imbalanced-learn: regrets and onwards - with Guillaume Lemaitre, core-maintainer [of scikit-learn and imbalanced-learn]”, https://youtu.be/npSkuNcm-Og

Jesuino Vieira Filho:

Is there any reason not to always use sample weights for classification when they're available?

At first glance, it seems that either results will improve for imbalanced data, or, in the ideal scenario where the data is perfectly balanced, nothing would change. So I'm curious why this isn't the default choice in machine learning libraries.
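One way to see why it isn't the default: weighting is not a free win, because it shifts the model's predicted probabilities away from the true base rate. A minimal sketch, assuming scikit-learn and using `class_weight="balanced"` (equivalent to per-sample weights inversely proportional to class frequency) on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (~5% positives).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# Balanced weights inflate the predicted probability of the rare class, so the
# weighted model flags many more positives at the default 0.5 threshold --
# recall goes up, but the probabilities are no longer calibrated to the
# actual 5% base rate.
print("positives flagged (plain):   ", plain.predict(X_te).sum())
print("positives flagged (weighted):", weighted.predict(X_te).sum())
```

So weighting quietly changes the operating point and breaks probability calibration, which is exactly the kind of trade-off the post argues should be an explicit modeling decision rather than a library default.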
