If you are working on a regression-based machine learning model such as linear regression, one of the most important tasks is selecting an appropriate evaluation metric.
In fact, if you are working on machine learning projects in general or preparing to become a data scientist, you need to know the most common evaluation metrics.
Many of them also serve as loss functions during training, so the two terms are used interchangeably in this post.
There are two main kinds of machine learning problems covered here: classification and regression.
Each has its own kinds of loss functions.
In this post, I am going to talk about loss functions for regression.
Since every project or data set is different, we must select appropriate evaluation metrics. Usually, more than one metric is required to evaluate a machine learning model.
Instead of covering every loss function or evaluation metric for regression models, I will focus on the most widely used ones.
Before we start with loss functions, you need to understand what we are trying to do here. In a typical regression-based machine learning model, the model produces continuous values (predicted values).
Our primary objective is to keep these predicted values as close as possible to the actual values.
Predicted values are denoted by y hat (ŷ).
Actual values are denoted by y.
Error = y - ŷ
So, whenever we talk about error in this post, we mean this error. The ideal (hypothetical) condition is that this error is 0 for every observation, which would mean our model predicts all values correctly. That is not going to happen in practice.
Let’s start with mean absolute error.
Mean absolute error (MAE)
In simple terms, mean absolute error is the mean of the absolute (positive) errors of all values. So, if there are 5 values in our data set, we find the difference between the actual value and the predicted value for all 5 values and take the positive value of each. Even if the difference between the actual and predicted value is negative, we use its positive value in the calculation.
So we take the positive value of all errors, add them up, and divide by the number of values to get their mean.
Mean absolute error illustration:
| Actual Value (y) | Predicted Value (y hat) | Error (difference) | Absolute Error |
| --- | --- | --- | --- |
| 100 | 130 | -30 | 30 |
| 150 | 170 | -20 | 20 |
| 200 | 220 | -20 | 20 |
| 250 | 260 | -10 | 10 |
| 300 | 325 | -25 | 25 |
| Mean | | | 21 |

Note: The absolute value of an error is its positive value, so -30 becomes 30.
MAE is the mean of the absolute differences between actual and predicted values. It doesn’t consider the direction of the error, that is, whether it is positive or negative.
When we keep the direction as well, the metric is called Mean Bias Error (MBE), which is the mean of the signed errors (differences).
The formula for mean absolute error (MAE), where n is the number of values, is:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
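The MAE and MBE calculations can be sketched in a few lines of plain Python. This is only a minimal illustration using the five values from the table above; the function names `mean_absolute_error` and `mean_bias_error` are my own choices for this example, not part of any library.

```python
# Values from the illustration table above.
actual = [100, 150, 200, 250, 300]
predicted = [130, 170, 220, 260, 325]


def mean_absolute_error(y_true, y_pred):
    """Mean of |y - y_hat|; the direction of each error is ignored."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def mean_bias_error(y_true, y_pred):
    """Mean of the signed errors (y - y_hat); the direction is kept."""
    return sum(y - y_hat for y, y_hat in zip(y_true, y_pred)) / len(y_true)


mae = mean_absolute_error(actual, predicted)
mbe = mean_bias_error(actual, predicted)

print(mae)  # 21.0
print(mbe)  # -21.0
```

Here MBE is negative because every prediction is higher than the actual value, while MAE reports only the size of the errors.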
Mean Square Error (MSE)
Mean square error is the mean of the squared errors. It is never negative, and a lower value (closer to 0) is better. It is calculated as:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
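Let’s use the last illustration to understand it better.

| Actual Value (y) | Predicted Value (y hat) | Error (difference) | Squared Error |
| --- | --- | --- | --- |
| 100 | 130 | -30 | 900 |
| 150 | 170 | -20 | 400 |
| 200 | 220 | -20 | 400 |
| 250 | 260 | -10 | 100 |
| 300 | 325 | -25 | 625 |
| Mean | | | 485 |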
So if we were to run a model with different parameters or independent variables, the model with the lower MSE would be deemed better.
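As a rough sketch, the MSE calculation and the "lower MSE wins" comparison could look like this in plain Python. The first set of predictions comes from the illustration; the second set (`predicted_b`) is hypothetical and made up purely to show the comparison, and the function name is my own.

```python
actual = [100, 150, 200, 250, 300]

# Predictions from the illustration table.
predicted_a = [130, 170, 220, 260, 325]
# Hypothetical predictions from a second model, invented for this example.
predicted_b = [110, 155, 190, 255, 310]


def mean_squared_error(y_true, y_pred):
    """Mean of the squared errors (y - y_hat)^2."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


mse_a = mean_squared_error(actual, predicted_a)
mse_b = mean_squared_error(actual, predicted_b)

print(mse_a)  # 485.0
print(mse_b)  # 70.0

# The model with the lower MSE is preferred on this data.
better = "model B" if mse_b < mse_a else "model A"
print(better)  # model B
```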
We will compare it with other loss functions later in this post. First, let’s quickly cover RMSE.
Root mean square error (RMSE)
The square root of MSE yields the root mean square error (RMSE). So its formula is quite similar to what you have seen with mean square error; we just add a square root sign to it:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

It can be read as the standard deviation of the errors (residuals), which holds exactly when the residuals average to zero.
It indicates the spread of the residual errors. It is never negative, and a lower value indicates better performance. The ideal value would be 0, but that is practically never achieved.
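| Actual Value (y) | Predicted Value (y hat) | Error (difference) | Squared Error |
| --- | --- | --- | --- |
| 100 | 130 | -30 | 900 |
| 150 | 170 | -20 | 400 |
| 200 | 220 | -20 | 400 |
| 250 | 260 | -10 | 100 |
| 300 | 325 | -25 | 625 |
| Mean | | | 485 |
| Square root of mean | | | 22.02271555 |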
Each error contributes to RMSE in proportion to its square. As a result, RMSE is sensitive to outliers and can exaggerate the overall error if there are outliers in the data set.
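A minimal sketch of the RMSE calculation for the same five values, using only Python's standard `math` module. The function name `root_mean_squared_error` is my own for this example.

```python
import math

actual = [100, 150, 200, 250, 300]
predicted = [130, 170, 220, 260, 325]


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean of the squared errors."""
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


rmse = root_mean_squared_error(actual, predicted)
print(round(rmse, 4))  # 22.0227
```

Because of the square root, RMSE is expressed in the same units as the values being predicted, which makes it easier to read than MSE.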
Before moving to their comparison, I want to mention one more evaluation metric: root mean squared log error (RMSLE).
Root mean squared log error (RMSLE)
Root mean squared log error is basically RMSE calculated on a logarithmic scale. So, if you understand the three evaluation metrics above, you won’t have any problem understanding RMSLE or most other evaluation metrics or loss functions used in regression-based machine learning models.
While calculating RMSLE, 1 is added as a constant to the actual and predicted values because they can be 0 and the log of 0 is undefined. The overall formula remains the same. The standard notation for RMSLE is:

$$\text{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(y_i + 1) - \log(\hat{y}_i + 1) \right)^2}$$

In this illustration, I have used the base-10 logarithm for the calculation. An implementation that uses the natural logarithm will give a different number for the same data.
| Actual Value (y) | Predicted Value (y hat) | Actual + 1 | Predicted + 1 | log(Actual + 1) | log(Predicted + 1) | Error (difference) | Squared Error |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 100 | 130 | 101 | 131 | 2.004321374 | 2.117271296 | -0.112949922 | 0.012757685 |
| 150 | 170 | 151 | 171 | 2.178976947 | 2.23299611 | -0.054019163 | 0.00291807 |
| 200 | 220 | 201 | 221 | 2.303196057 | 2.344392274 | -0.041196216 | 0.001697128 |
| 250 | 260 | 251 | 261 | 2.399673721 | 2.416640507 | -0.016966786 | 0.000287872 |
| 300 | 325 | 301 | 326 | 2.478566496 | 2.5132176 | -0.034651104 | 0.001200699 |
| Mean | | | | | | | 0.003772291 |
| Square root of mean | | | | | | | 0.061418977 |
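The RMSLE steps in the table can be sketched as follows. This is an illustration under one assumption worth stating: the `log` parameter is my own addition so that the same function can reproduce the base-10 figures in the table and also show the natural-log variant.

```python
import math

actual = [100, 150, 200, 250, 300]
predicted = [130, 170, 220, 260, 325]


def rmsle(y_true, y_pred, log=math.log10):
    """RMSE of log(y + 1) - log(y_hat + 1); `log` selects the log base."""
    squared = [
        (log(y + 1) - log(y_hat + 1)) ** 2
        for y, y_hat in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


rmsle_base10 = rmsle(actual, predicted)                  # matches the table
rmsle_natural = rmsle(actual, predicted, log=math.log)   # natural logarithm

print(round(rmsle_base10, 6))  # 0.061419
print(rmsle_natural)
```

The natural-log result is the base-10 result multiplied by ln(10), so the two versions rank models the same way. Just make sure you never mix the two when comparing numbers.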
Let’s look at their differences now.
MAE vs MSE vs RMSE vs RMSLE
In terms of comparison, the primary differences are between MAE and MSE because they are calculated in different ways. RMSE and RMSLE are extensions of MSE, so they share many properties with MSE.
| Mean absolute error (MAE) | Mean square error (MSE) | Root mean square error (RMSE) | Root mean square log error (RMSLE) |
| --- | --- | --- | --- |
| It doesn’t account for the direction of the error. Even if the error is negative, its positive (absolute) value is used for the calculation. | It doesn’t keep the direction either: squaring makes every error positive. | Same as MSE: it is built from squared errors, so the sign is lost. | Same as MSE: the log differences are squared, so the sign is lost. |
| | MSE and RMSE share many properties because RMSE is simply the square root of MSE. | MSE and RMSE share many properties because RMSE is simply the square root of MSE. | |
| MAE is less sensitive to large errors. It may not adequately reflect performance when a few errors are very large. | MSE is heavily influenced by large errors. | RMSE is better at reflecting performance when dealing with large error values. | |
| | | RMSE is more useful when large residuals are especially undesirable. | |
| MAE is never larger than RMSE, and the gap tends to widen as the sample size goes up. | | RMSE is at least as large as MAE and tends to be higher as the sample size goes up. | |
| MAE weights errors in proportion to their size, so large errors get no extra penalty. | MSE penalizes large errors. | RMSE penalizes large errors. | RMSLE doesn’t heavily penalize large errors because it works on a log scale. It is usually used when you don’t want large errors to dominate the result. It penalizes predictions that are too low more heavily than predictions that are too high. |
| MAE is more useful when the overall impact is proportionate to the increase in error. For example, if the error goes from 3 to 6, the impact on the result doubles. This is common in the financial industry, where a loss of 6 is twice a loss of 3. | | RMSE is more useful when the overall impact is disproportionate to the increase in error. For example, if the error goes from 3 to 6, the impact on the result more than doubles. This could be common in clinical trials, where the overall impact grows disproportionately as the error goes up. | |
| | | When actual and predicted values are low, RMSE and RMSLE are usually similar. | When actual and predicted values are low, RMSE and RMSLE are usually similar. |
| | | When either the actual or the predicted values are high, RMSE > RMSLE. | When either the actual or the predicted values are high, RMSE > RMSLE. |
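To make the "penalizes large errors" rows concrete, here is a small sketch comparing MAE and RMSE on two sets of errors that have the same total absolute error. Both error lists are hypothetical and were made up for this example, as were the function names.

```python
import math


def mae_from_errors(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse_from_errors(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical errors: same total absolute error (50), spread differently.
even_errors = [10, 10, 10, 10, 10]
one_large_error = [2, 2, 2, 2, 42]

print(mae_from_errors(even_errors), rmse_from_errors(even_errors))
# 10.0 10.0
print(mae_from_errors(one_large_error), round(rmse_from_errors(one_large_error), 2))
# 10.0 18.87
```

MAE is identical in both cases, while RMSE almost doubles when the same total error is concentrated in one observation.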
MAE vs MSE vs RMSE vs RMSLE Conclusion
I have mentioned only the important differences. Where a metric has no relevant point, I have left it out of the table above, which is why the table has empty cells.
A few important points to remember when using loss functions for regression:
- Never compare apples with oranges, that is, never compare different metrics with each other. For example, don’t compare values of MSE with MAE or others. They are on different scales.
- Try to use more than one loss function.
- Always calculate evaluation metrics (loss functions) for both the testing and the training data set.
- Compare evaluation metrics between the test and training data sets. There shouldn’t be a huge difference between them. If there is, there is a problem with your model. For example, if you are using RMSE, calculate RMSE for both the testing and the training data set. There should not be a huge difference between these values.
- If you have outliers in the data and you want to ignore them, MAE is a better option, but if you want to account for them in your loss function, go for MSE/RMSE.
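The train versus test check from the list above could be sketched like this. The training values are the illustration values used throughout this post; the test values are hypothetical and made up for this example only. What counts as a "huge" difference depends on your problem, so no threshold is hard-coded here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error of two equal-length sequences."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Training data: the illustration values from this post.
train_actual = [100, 150, 200, 250, 300]
train_predicted = [130, 170, 220, 260, 325]

# Test data: hypothetical values, invented for this example.
test_actual = [120, 180, 240]
test_predicted = [150, 230, 300]

train_rmse = rmse(train_actual, train_predicted)
test_rmse = rmse(test_actual, test_predicted)

print(f"train RMSE: {train_rmse:.2f}")
print(f"test RMSE:  {test_rmse:.2f}")
print(f"test / train ratio: {test_rmse / train_rmse:.2f}")
```

In this made-up case the test RMSE is more than twice the training RMSE, which is the kind of gap that suggests the model fits the training data much better than unseen data.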
Questions or feedback? Please leave your comments.
Akhilendra:
Thank you for sharing your article, and I enjoy reading it. I have a question on the statement of “[MSE] does account for a positive or negative value.”
How can it be when the square function makes the residual error to be positive?
Regards,
Duc Haba