Do you have any recommendations for code implementations (Python and/or R) of cost-sensitive ML and of threshold tuning on validation data? For the latter in particular, I haven't found an end-to-end walkthrough, although my search may simply have been ineffective. For weighting data points, I know that tidymodels in R now supports case weights, which seems to be a good implementation.
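For what it's worth, here is a minimal sketch of what threshold tuning on validation data could look like in plain Python. It is not taken from the post or from any particular library. The function names (`expected_cost`, `tune_threshold`), the cost values, and the toy validation data are all assumptions made up for illustration. It assumes you already have predicted probabilities from a fitted model on a held-out validation set.

```python
def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total misclassification cost when predicting positive for score >= threshold."""
    cost = 0.0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp  # false positive
        elif pred == 0 and y == 1:
            cost += cost_fn  # false negative
    return cost


def tune_threshold(y_val, val_scores, cost_fp=1.0, cost_fn=5.0):
    """Return the candidate threshold with the lowest total cost on validation data."""
    # Only the observed scores (plus "predict nothing positive") can change the predictions.
    candidates = sorted(set(val_scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: expected_cost(y_val, val_scores, t, cost_fp, cost_fn),
    )


# Hypothetical validation labels and predicted probabilities (illustrative only).
y_val = [0, 0, 1, 0, 1, 1, 0, 1]
val_scores = [0.10, 0.35, 0.40, 0.45, 0.60, 0.70, 0.20, 0.90]

# Assumed costs: a false negative is five times as costly as a false positive.
best = tune_threshold(y_val, val_scores, cost_fp=1.0, cost_fn=5.0)
print(best, expected_cost(y_val, val_scores, best, 1.0, 5.0))
```

The chosen threshold would then be frozen and applied to the test set, so that the test data plays no part in selecting it.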
Great post as usual.
Is there any reason not to always use sample weights for classification when they're available?
At first glance, it seems that results will either improve for imbalanced data or, in the ideal scenario where the data is perfectly balanced, stay the same. So I'm curious why this isn't the default in machine learning libraries.