Evaluation Metrics for an ML Regression Model

A guide to understanding evaluation for Machine Learning Regression Models

Introduction

There are many easy ways to get a machine learning model up and running. Something that is not quite as easy is evaluating these models. This article aims to answer questions such as “What is evaluation?”, “Why do you need it?”, “What are evaluation metrics?”, “Which evaluation metrics should be used for a regression model?”, and “In which situations should I use each metric?”

At the end of this article, you should understand the basics and most important parts of evaluating a created regression model.

Quick Recap

Machine Learning is a subset of Artificial Intelligence focused on building systems (models) that learn and improve from the data they consume. Machine Learning is a fundamental part of how the modern world operates, with its use cases and applications visible across many sectors.

A machine learning model takes in independent variables (as input) and aims to predict a dependent variable (output) based on the data it has seen and the relationships that exist within that data. More specifically, a linear regression model is based on the formula y = mx + c to model the relationship between the dependent variable and the independent variable(s).
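As a minimal sketch of the y = mx + c idea, here is an ordinary least squares fit of a line in plain Python. The dataset is made up purely for illustration and is not from the article:

```python
def fit_line(xs, ys):
    """Return slope m and intercept c that minimise the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least squares for a single independent variable
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    c = mean_y - m * mean_x
    return m, c

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 2x + 1
m, c = fit_line(xs, ys)
print(m, c)                # → 2.0 1.0
```

On noisy real data the fitted line will not pass through every point, which is exactly why the residual-based metrics below are needed.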

Why Evaluate?

After a model is created, it is necessary to gauge how well it performs on data. Evaluation is the process of testing the performance of the model. It is one of the most important aspects of Machine Learning because it measures how accurate a model is at predicting outcomes; without it, the risk of a bad model being put into use is greatly increased. An evaluation metric is a mathematical quantifier of the quality of the model that has been created. Examples of evaluation metrics include accuracy, precision, the Inlier Ratio Metric, Mean Squared Error, and Mean Absolute Error.

Residual Errors and Evaluation Metrics

There are different evaluation metrics for different types of models, such as classification models, regression models, etc. A question frequently asked is “How can I calculate the accuracy of a regression model?” The simple answer is that you cannot, because the output of a regression model is a numeric value such as a height or a dollar amount. The goal is not to predict the exact value but to measure how close the predicted value is to the actual value. That is where residual errors for regression models come in.

For the evaluation of a regression model, it is important to understand the concept of residual errors. A residual error is the difference between the actual and predicted values; there is one residual per output, and residuals can be either positive or negative. Technically, it is possible to manually check each residual to see how a model performed, but in datasets with thousands or millions of points that is not feasible. That is why there are evaluation metrics, calculated from these residual errors, that simplify the evaluation process.
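The residuals are simply actual minus predicted, one per data point. A quick sketch with made-up numbers (not from the article):

```python
actual    = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 11.5, 16.0, 18.0]

# One residual per data point; the sign shows over- vs under-prediction
residuals = [a - p for a, p in zip(actual, predicted)]
print(residuals)   # → [-1.0, 0.5, -1.0, 2.0]
```

All of the metrics below are different ways of summarising this list into a single number.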

There are a lot of metrics but the most common ones used are:

Mean Absolute Error (MAE)

The MAE is calculated by taking the absolute value of each residual error and then averaging them. How high or low this figure is determines how well the model is performing: the lower the MAE, the better the model fits the dataset.

The MAE's advantage is that all the errors are weighted on the same linear scale, since the absolute value is used. Outliers in the data therefore receive no extra attention, and the metric provides a fairly even measure of how well the model performs. The disadvantage is that if outliers are important in your application, the MAE is not very effective: a model with a good MAE is usually fine on average, yet it may still make a few disappointingly poor predictions.
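A minimal MAE implementation, using the same illustrative numbers as before:

```python
def mae(actual, predicted):
    """Mean Absolute Error: the average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Residuals here are -1, 0.5, -1 and 2, so MAE = 4.5 / 4
print(mae([10, 12, 15, 20], [11, 11.5, 16, 18]))   # → 1.125
```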

Mean Squared Error (MSE)

The MSE is calculated by squaring each residual error and then averaging. Because the errors are squared, the MSE is expressed in squared units and is typically much larger than the MAE whenever the errors are greater than one. The closer the MSE gets to 0, the better the model; when comparing multiple models, the one with the lowest MSE is the best fit.

The advantage of the MSE is that it penalises outlier predictions that produce huge errors, since squaring places greater influence on those errors. The downside is the same property: if the model makes one bad prediction, the squaring greatly magnifies that error, and it can skew the total.
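A minimal MSE implementation on the same illustrative numbers:

```python
def mse(actual, predicted):
    """Mean Squared Error: the average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Squared residuals are 1, 0.25, 1 and 4, so MSE = 6.25 / 4
print(mse([10, 12, 15, 20], [11, 11.5, 16, 18]))   # → 1.5625
```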

Root Mean Squared Error (RMSE)

The RMSE is simply the square root of the mean squared error, which brings the metric back into the same units as the target variable. The value of the RMSE should be as low as possible: the lower the RMSE, the better the predictions. A high RMSE indicates a large deviation between predicted and actual values, which suggests the model could be bad.
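The RMSE is one extra step on top of the MSE, sketched here with the same illustrative numbers:

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error: the square root of the MSE."""
    n = len(actual)
    mean_sq = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mean_sq)

# MSE was 1.5625, so RMSE = sqrt(1.5625)
print(rmse([10, 12, 15, 20], [11, 11.5, 16, 18]))   # → 1.25
```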

Huber Loss Function

The Huber Loss is something of a mid-point between the Mean Absolute Error and the Mean Squared Error: the MSE is highly sensitive to outliers, while the MAE largely ignores them. The calculation of the Huber loss function is somewhat involved. Simplified, it says that for residuals smaller than a chosen threshold delta, a squared (MSE-like) term is used, and for residuals larger than delta, a linear (MAE-like) term is used. That is a combination of the best of both error terms.

The advantage of using the Huber loss is that the linear (MAE-like) term for larger residuals lowers the weight given to outliers, while the squared (MSE-like) term for smaller residuals keeps the loss smooth and sensitive to ordinary errors, adding up to a well-rounded metric.
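The piecewise rule above can be sketched as follows. The delta of 1.0 and the data are illustrative choices, not values from the article:

```python
def huber(actual, predicted, delta=1.0):
    """Mean Huber loss: quadratic for small residuals, linear for large ones."""
    total = 0.0
    for a, p in zip(actual, predicted):
        r = abs(a - p)
        if r <= delta:
            total += 0.5 * r ** 2               # MSE-like region
        else:
            total += delta * (r - 0.5 * delta)  # MAE-like region
    return total / len(actual)

# Residual magnitudes are 1, 0.5, 1 and 2; only the last exceeds delta
print(huber([10, 12, 15, 20], [11, 11.5, 16, 18]))   # → 0.65625
```

The `0.5 * delta` offset in the linear branch makes the two pieces join smoothly at `r = delta`.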

Other evaluation metrics not covered here include R-squared, Adjusted R-squared, Max Error, etc.

Considerations and Recommendations

For your model, the Huber Loss function should be used when you need a balance: giving some weight to outliers without letting the model be entirely dependent on their presence. It is useful in regression cases such as estimating location. Where outliers are very important, the MSE is advisable, and where outliers are not cared about at all, the MAE functions very well. For models where any slight variation in the error means a lot, such as clinical trials, the RMSE is a good choice because it reports error in the same units as the target.

Evaluation metrics as we have seen are quite useful and very helpful in reducing the stress of manual inspection of each point in the data.

It is recommended to use more than one metric when evaluating a model, as some models perform very well on one metric while failing on another. Reporting only the favourable metric could give a false impression.
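A small sketch of why a single metric can mislead: on this made-up data, two sets of predictions have the same MAE, but the RMSE exposes the one that hides a single large miss.

```python
import math

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

actual  = [10, 10, 10, 10]
steady  = [11, 9, 11, 9]    # consistent small errors of 1
spiky   = [10, 10, 10, 14]  # perfect except one large miss of 4

print(mae(actual, steady), rmse(actual, steady))   # → 1.0 1.0
print(mae(actual, spiky),  rmse(actual, spiky))    # → 1.0 2.0
```

Both models score an MAE of 1.0, yet the RMSE doubles for the spiky model because squaring magnifies its one large error.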

The majority of models created do not care much about outliers and are simply built to provide well-rounded predictions that perform well on the majority of the data.

Final Thoughts and Further Reading Resources

It is important to understand how to evaluate a model because evaluation is the bedrock for judging how good the model is. For further reading on other, less-used evaluation metrics, and for greater depth on evaluation metrics for regression models, check out:

  1. Keras Documentation

  2. (qualdo.ai/blog/complete-list-of-performance..)