We published our new book “Supervised Machine Learning for Science”. 🎉 🎉 🎉
tl;dr: You can buy the book as a paperback, a hardcover, for Kindle, and as an eBook (with a 30% early bird discount).
The 2024 Nobel Prizes in Chemistry and Physics went to machine learning researchers. Make of that what you will, but it’s a symptom of machine learning taking over science.
But is machine learning a good fit for science? Is ML just a tool like Excel, or is it a more fundamental change in the way science works?
I started thinking about these questions during my PhD, especially about interpretability (surprise, surprise). Can you train a machine learning model to predict the outcome you are studying, print out the permutation feature importance, and call it science? Justified or not, the fact is that so many scientists are already using machine learning to varying degrees in their research.
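To make that "train, print importances, done" workflow concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the synthetic data, the tiny two-feature least-squares model, and the helper names (`fit_linear`, `permutation_importance`) are made up for this post and are not taken from the book or from any library.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_linear(X, y):
    """Least squares for two features, no intercept (2x2 normal equations)."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    u = sum(r[0] * t for r, t in zip(X, y))
    v = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    return (d * u - b * v) / det, (a * v - b * u) / det

def predict(weights, X):
    return [weights[0] * r[0] + weights[1] * r[1] for r in X]

def permutation_importance(weights, X, y, feature, rng, n_repeats=20):
    """Average increase in MSE when one feature column is shuffled."""
    baseline = mse(y, predict(weights, X))
    increases = []
    for _ in range(n_repeats):
        column = [r[feature] for r in X]
        rng.shuffle(column)
        X_perm = [list(r) for r in X]
        for row, value in zip(X_perm, column):
            row[feature] = value
        increases.append(mse(y, predict(weights, X_perm)) - baseline)
    return sum(increases) / n_repeats

# Synthetic data (hypothetical): the outcome depends only on the first feature.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
y = [3.0 * r[0] + rng.gauss(0, 0.1) for r in X]

# Fit on the first 200 rows, compute importances on the held-out 100.
weights = fit_linear(X[:200], y[:200])
importances = [
    permutation_importance(weights, X[200:], y[200:], j, rng) for j in range(2)
]
print(importances)
```

The first feature should come out far more important than the second. Whether a number like that tells you anything about the phenomenon, rather than only about the model, is exactly the open question.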
Timo, my co-author, and I didn’t find that very satisfying. Machine learning already plays such a big role in science, but we felt the justification for using ML in science was lacking. We both like machine learning and science, and we already had a gut feeling that machine learning could be a tool for science. But clearly, science should be more than a Kaggle competition. Science has goals that are not captured by mean squared error or log loss. As a scientist, you want to know how certain the predictions are, you want to understand how actions affect the world, you want to create explanations, and you want to be able to reason about the phenomenon that you are studying. And a “bare bones” ML approach that focuses only on predictive performance won’t cut it.
This motivated us to write Supervised ML for Science. I pitched the book idea to Timo two years ago at my wife’s office opening party. While everyone was partying, we sat in the corner and talked about machine learning and science 😂. He was immediately on board. What a win for the project: Timo brings a rare combination of philosophy and machine learning, a perfect match for this book. While we started the book project two years ago, we had already met during our PhD studies, where we collaborated on papers on interpretability.
Interpretability is often the first step that takes people beyond predictive performance to questions of causality, generalization, and more. That’s why interpretability is just one of the chapters in Supervised Machine Learning for Science. We also cover generalization, domain knowledge, causality, robustness, uncertainty, reproducibility, and reporting. All of these topics are puzzle pieces needed to make ML work for science. Even non-scientists can benefit immensely from reading the book, for example, to make their models more robust and to think about uncertainty.
Accidental overviews
Almost as a by-product, some chapters are the best overviews I’ve seen. I don’t mean this as self-praise, because I’m referring to chapters written by my co-author Timo, especially the chapters on causality and robustness. I find causality in particular to be such a complex topic. You can use machine learning to estimate causal effects, but you can also use ML to identify the causal graph or even to learn high-level causal representations in images. Timo has managed to write a comprehensive overview that I think both scientists and non-scientists will find useful.
We also brought our unique perspectives into the book: For example, the interpretability chapter includes our research on linking model interpretation to the actual underlying phenomenon. In the domain knowledge chapter, we develop the view that domain knowledge is a two-way street: Not only can you infuse domain knowledge into the model, for example by designing a custom loss function, but you can also evaluate domain knowledge in terms of predictive power. That’s a rather under-discussed topic, but IMHO a great way to test and challenge core beliefs of your domain.
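As a toy illustration of the "infuse domain knowledge via a custom loss" direction, here is a minimal sketch. It assumes a made-up piece of domain knowledge (the quantity being predicted can never be negative), and the function name `domain_informed_loss` and the `penalty_weight` parameter are hypothetical, not taken from the book.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def domain_informed_loss(y_true, y_pred, penalty_weight=10.0):
    """MSE plus a penalty for predictions that violate a known constraint.

    Assumed domain knowledge: the target is non-negative, so any
    negative prediction is penalized by its squared size.
    """
    violation = sum(min(p, 0.0) ** 2 for p in y_pred) / len(y_pred)
    return mse(y_true, y_pred) + penalty_weight * violation

y_true = [0.0, 1.0, 2.0]
plausible = [0.5, 1.0, 2.0]     # off by 0.5, but physically possible
implausible = [-0.5, 1.0, 2.0]  # off by 0.5, but negative

loss_plausible = domain_informed_loss(y_true, plausible)
loss_implausible = domain_informed_loss(y_true, implausible)
print(loss_plausible, loss_implausible)
```

Both sets of predictions have the same mean squared error, but the loss prefers the one that respects the constraint. A model trained on such a loss is nudged toward predictions that agree with what the domain already knows.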
Free to Read, Yours to Own
Science is important to both Timo and me. So for Supervised Machine Learning for Science, we both felt that it’s best to have an open and free version — just like science should be. We want this resource to be available to everyone who might benefit.
However, if you’d like to support our work, or simply prefer to read it on paper or an eReader, you can purchase the book here. The eBook is 30% off until the end of the week.
Read the free version here: https://ml-science-book.com/
Supervised Machine Learning for Science is also my first hardcover release. It’s always a special feeling to hold a physical book I’ve written in my hands — somehow it feels real for the first time. And this time with the hardcover, it feels even more special.
I've had the privilege of publishing over 10 scientific papers on machine learning applications in mining, and I owe much of this success to Christoph Molnar's influential work. As someone who began publishing with just a bachelor's degree, I found his clear, methodical approach to complex topics invaluable. I've cultivated a personal collection of all his publications, and I'm particularly excited about his latest book release. His exceptional ability to make complex concepts accessible continues to set the standard for technical writing in our field.
For anyone looking to deepen their understanding of machine learning, I highly recommend adding his new book to your reading list - his work has consistently proven to be an outstanding resource for both practitioners and researchers.
Thank you Christoph🙏