Just a quick note.
Being a classically trained statistician, I am always surprised that people claim ML is newish. Most statistical methods for prediction were well known by the time of Leo's text (it was assigned reading when it came out in 2001, during my first year of PhD work in mathematical statistics, and I do not remember anyone finding it controversial or even that novel. After I graduated in 2006, though, it did seem to have a much bigger impact on the applied stats/biostats communities at my next department, causing some rather big debates about who was a 'real' statistician vs. just a computer scientist. I was rather confused, to say the least.)
McLachlan's wonderful book Discriminant Analysis and Statistical Pattern Recognition is my personal go-to for all the methods pre-1992, with The Elements of Statistical Learning filling in the gap up to 2010-ish. Now I tend to use the two new ProbML books by Murphy. Of particular importance are the classical works of Aitchison and Dunsmore on statistical prediction from the '70s and the related work on density estimation by Silverman. (Oh shoot, I almost forgot Brian Ripley's texts, which I was actually trained out of in grad school in the early 2000s!!! We just tended to call these subjects multivariate statistics for some reason...)
I guess in general I do not think that statisticians ignored the problems in ML; at the time there was just more money in inference for clinical trials, basic science, and economics, all focused on understanding why something happened (vs. being able to predict the future, as in ML or pattern recognition). Also, biggish data sets removed much of the need for theoretical foundations in the mid-2000s, which opened the door to ignoring the efficient use of observations and considerably lowered the theoretical threshold for entering the pattern recognition game.
Part of the problem is that there has been a radical renaming of algorithms in ML... As an example, if I am talking to someone in math stats I will typically use the language of empirical processes, whereas for ML I will use the language of PAC learning. Or maximum likelihood estimates of discriminant functions (stats) versus Bayesian classification methods (ML).
Larry Wasserman's All of Statistics has a good breakdown of the breakdown in communication between stats and CS people.
Anyway enjoyed your article! Thank you.
Thanks a lot for sharing these insights and book recommendations.
That's what I like about the Two Cultures paper: It's about culture ultimately, and it can already differ between institutes. In my case, the culture really was hardcore "classic" statistics.
Great piece, matches my experience with statisticians very closely (and by background I am one ;-)
- I totally agree with them on the need for interpretability, but their go-to example is logistic regression: how exactly is fixing all variables except for one realistic?
- my personal rabbit hole: I started checking how many papers in stats / econometrics report performance on the training set ONLY. Maybe I have a sample selection problem, but I walked away horrified.
Oh yeah, there are still many papers with performance measured in training data. If you only have ~100 data points, splitting the data is challenging, but often it's a "cultural" thing to not evaluate the generalization error using fresh data.
Yeah, not sure about challenging - with 100 points, you actually *can* do leave-one-out. I think you are right about this being "cultural".
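To make the leave-one-out point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the synthetic two-class data, the choice of a 1-nearest-neighbour classifier, and the helper names (`make_data`, `nearest_label`, `training_accuracy`, `loo_accuracy`) are all made up for this example. A 1-NN classifier is used because it shows the problem in its most extreme form: every training point is its own nearest neighbour, so accuracy measured on the training set is perfect no matter how much the classes overlap.

```python
import random


def make_data(n=100, seed=0):
    """Synthetic data (an assumption for this sketch): two overlapping 2-D Gaussian classes."""
    rng = random.Random(seed)
    points, labels = [], []
    for i in range(n):
        y = i % 2  # alternate classes so both have n/2 points
        points.append((rng.gauss(y, 1.0), rng.gauss(y, 1.0)))
        labels.append(y)
    return points, labels


def nearest_label(x, points, labels):
    """1-nearest-neighbour prediction: label of the closest point (squared Euclidean distance)."""
    best = min(
        range(len(points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, points[i])),
    )
    return labels[best]


def training_accuracy(points, labels):
    """Accuracy when each point is predicted from a model that has already seen it."""
    hits = sum(nearest_label(x, points, labels) == y for x, y in zip(points, labels))
    return hits / len(points)


def loo_accuracy(points, labels):
    """Leave-one-out accuracy: each point is predicted from the other n - 1 points."""
    hits = 0
    for i, (x, y) in enumerate(zip(points, labels)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += nearest_label(x, rest_points, rest_labels) == y
    return hits / len(points)


points, labels = make_data()
train_acc = training_accuracy(points, labels)
loo_acc = loo_accuracy(points, labels)
print(f"training accuracy:      {train_acc:.2f}")
print(f"leave-one-out accuracy: {loo_acc:.2f}")
```

With only 100 points the leave-one-out loop needs just 100 refits, so cost is not really an argument against it at this sample size. The gap between the two printed numbers is the optimism you would be reporting if you only evaluated on the training set.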