Machine learning performance evaluation: tips and pitfalls — Jose Hernandez Orallo

Beginners in machine learning often assume that properly assessing a predictive model only requires following the golden rule of evaluation (split the data into training and test sets) and then choosing the most accurate model, which will hopefully behave well once deployed in production. In the real world, things are more complicated. The context in which a model is evaluated can differ significantly from the context in which it is deployed, and the model may not cope well with the change, especially if it was evaluated with a performance metric that is insensitive to such changes in context. A more comprehensive and reliable view of machine learning evaluation is illustrated through several common pitfalls and the tips that address them, including the use of probabilistic models, calibration techniques, imbalanced costs, and visualisation tools such as ROC analysis.
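As a rough illustration of why accuracy alone can mislead when misclassification costs are imbalanced, the sketch below compares two decision thresholds on the same scores. Everything in it is an assumption made for illustration and not taken from the abstract: the toy labels and scores, the helper names, the cost values, and the rule threshold = cost_fp / (cost_fp + cost_fn). That rule only holds if the scores are calibrated probabilities and correct decisions cost nothing.

```python
# Toy data (hypothetical): true labels and predicted probabilities of the positive class.
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.3, 0.35, 0.2, 0.15, 0.1, 0.05]


def confusion(labels, scores, threshold):
    """Return (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(labels, scores, threshold):
    """Fraction of correct decisions at the given threshold."""
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    return (tp + tn) / len(labels)


def total_cost(labels, scores, threshold, cost_fp, cost_fn):
    """Total misclassification cost; correct decisions are assumed to cost nothing."""
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    return fp * cost_fp + fn * cost_fn


def cost_threshold(cost_fp, cost_fn):
    """Cost-minimising threshold, assuming the scores are calibrated probabilities."""
    return cost_fp / (cost_fp + cost_fn)


# Assumed deployment context: a missed positive is ten times worse than a false alarm.
COST_FP, COST_FN = 1, 10

for name, t in [("default 0.5", 0.5), ("cost-based", cost_threshold(COST_FP, COST_FN))]:
    print(name, "accuracy:", accuracy(labels, scores, t),
          "cost:", total_cost(labels, scores, t, COST_FP, COST_FN))
```

On this toy data the default threshold is the more accurate of the two, yet it incurs the higher total cost under the assumed cost ratio. That is the kind of mismatch between an evaluation metric and a deployment context that the abstract warns about.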

Jose Hernandez Orallo, Ph.D., is a senior lecturer at Universitat Politecnica de Valencia. His research areas include Data Mining and Machine Learning, Model re-framing, Inductive Programming and Data-Mining, and Intelligence Measurement and Artificial General Intelligence.