Several metrics are used to evaluate model performance in regression problems. Common ones are the root mean squared error (RMSE), the mean absolute error (MAE), the mean squared error (MSE) and R-Squared. A separate introduction covers the definitions and formulas of these metrics.
This section shows how to code these metrics in Python from scratch, while the section below shows how to use the sklearn package to obtain the same results. The only library needed to create them from scratch is numpy, which performs the mathematical operations. These metrics operate on the residuals, which are the differences between the true values and the predicted values, so some values need to be defined. For example, the true values are $\boldsymbol{y}=[2,0.8,5.4,3.2,1.6,8.3,7.2,9.5]$ while the predicted values are $\boldsymbol{y_{pred}}=[2.3,0.85,5.2,3.5,1.2,8,7.1,9.8]$. The code below shows how the metrics are coded from scratch.

    import numpy as np

    y = np.array([2, 0.8, 5.4, 3.2, 1.6, 8.3, 7.2, 9.5])
    y_pred = np.array([2.3, 0.85, 5.2, 3.5, 1.2, 8, 7.1, 9.8])

    # Calculate residuals
    e = y - y_pred

    # RMSE
    rmse = np.sqrt(np.mean(np.power(e, 2)))
    print("RMSE: ", rmse)

    # MSE
    mse = np.mean(np.power(e, 2))
    print("MSE: ", mse)

    # MAE
    mae = np.mean(np.abs(e))
    print("MAE: ", mae)

    # R2
    RSS = np.sum(np.power(e, 2))
    y_mean = np.mean(y)
    TSS = np.sum(np.power(y - y_mean, 2))
    R2 = 1 - RSS / TSS
    print("R2: ", R2)

The values obtained are:

    RMSE: 0.2675116819879089
    MSE: 0.07156250000000011
    MAE: 0.24375000000000024
    R2: 0.9925726517903477

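For reference, the quantities computed above correspond to the usual definitions of these metrics. The block below is a short sketch of them applied to the example data; the notation ($e_i$ for the $i$-th residual, $\bar{y}$ for the mean of the true values, $n=8$ for the number of observations) is assumed here and may differ from the one used in the introduction to the metrics.

$$
\begin{aligned}
e_i &= y_i - y_{pred,i}, \qquad i = 1, \dots, n \\
MSE &= \frac{1}{n}\sum_{i=1}^{n} e_i^2 = \frac{0.5725}{8} = 0.0715625 \\
RMSE &= \sqrt{MSE} \approx 0.2675 \\
MAE &= \frac{1}{n}\sum_{i=1}^{n} |e_i| = \frac{1.95}{8} = 0.24375 \\
R^2 &= 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{0.5725}{77.08} \approx 0.9926
\end{aligned}
$$

Here $\bar{y} = 38/8 = 4.75$. The small differences in the last digits of the printed values come from floating-point arithmetic.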
This section shows how to compute common regression metrics in Python with sklearn (scikit-learn), a very well-known Python package used in Machine Learning. The module that needs to be imported is sklearn.metrics, which contains a series of metric functions. Using the same values as in the previous section, the true values are $\boldsymbol{y}=[2,0.8,5.4,3.2,1.6,8.3,7.2,9.5]$ while the predicted values are $\boldsymbol{y_{pred}}=[2.3,0.85,5.2,3.5,1.2,8,7.1,9.8]$. The code below shows how the metrics can be called with sklearn. It is straightforward because there is a function for each of them: mean_squared_error() returns the MSE by default (squared=True) and the RMSE when squared=False, mean_absolute_error() returns the MAE and r2_score() returns R2 (i.e. R-Squared). Note that the squared argument was deprecated in scikit-learn 1.4 and removed in 1.6, where root_mean_squared_error() is the replacement for RMSE.

    import numpy as np
    import sklearn.metrics as mts

    y = np.array([2, 0.8, 5.4, 3.2, 1.6, 8.3, 7.2, 9.5])
    y_pred = np.array([2.3, 0.85, 5.2, 3.5, 1.2, 8, 7.1, 9.8])

    print('sklearn')

    # RMSE (squared=False; this argument is only accepted by older scikit-learn releases)
    sk_rmse = mts.mean_squared_error(y, y_pred, squared=False)
    print("RMSE: ", sk_rmse)

    # MSE (squared=True is the default)
    sk_mse = mts.mean_squared_error(y, y_pred, squared=True)
    print("MSE: ", sk_mse)

    # MAE
    sk_mae = mts.mean_absolute_error(y, y_pred)
    print("MAE: ", sk_mae)

    # R2
    sk_r2 = mts.r2_score(y, y_pred)
    print("R2: ", sk_r2)

Again, the same values as above are obtained:

    sklearn
    RMSE: 0.2675116819879089
    MSE: 0.07156250000000011
    MAE: 0.24375000000000024
    R2: 0.9925726517903477

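The agreement between the two approaches can also be checked programmatically. The sketch below is one possible way to do it, not the only one: it takes the square root of the default MSE to obtain the RMSE, which should work regardless of whether the installed scikit-learn release still accepts the squared argument, and compares each sklearn result with its numpy counterpart using np.isclose. The variable names and the checks dictionary are illustrative choices.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y = np.array([2, 0.8, 5.4, 3.2, 1.6, 8.3, 7.2, 9.5])
y_pred = np.array([2.3, 0.85, 5.2, 3.5, 1.2, 8, 7.1, 9.8])

# sklearn values; the default call to mean_squared_error returns the MSE
sk_mse = mean_squared_error(y, y_pred)
sk_rmse = np.sqrt(sk_mse)  # RMSE without relying on the `squared` argument
sk_mae = mean_absolute_error(y, y_pred)
sk_r2 = r2_score(y, y_pred)

# From-scratch values computed on the residuals
e = y - y_pred
np_mse = np.mean(e ** 2)
np_rmse = np.sqrt(np_mse)
np_mae = np.mean(np.abs(e))
np_r2 = 1 - np.sum(e ** 2) / np.sum((y - np.mean(y)) ** 2)

# Compare each pair up to floating-point tolerance
checks = {
    "RMSE": np.isclose(sk_rmse, np_rmse),
    "MSE": np.isclose(sk_mse, np_mse),
    "MAE": np.isclose(sk_mae, np_mae),
    "R2": np.isclose(sk_r2, np_r2),
}
for name, matches in checks.items():
    print(name, "matches:", matches)
```

Comparing with a tolerance rather than with == is the safer choice here, since the two implementations may sum the terms in a different order.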
The sklearn package conveniently provides many functions commonly used in Machine Learning, which saves time, although it is good to know what lies behind these functions and how they actually work.
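As a way of keeping the from-scratch logic reusable, the calculations can be collected in a single helper. The sketch below is a minimal example under a few assumptions: the function name regression_metrics and its dictionary return value are hypothetical choices, the inputs are expected to be one-dimensional and of equal length, and the case of constant true values (where TSS is zero and R2 is undefined) is not handled.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MSE, MAE and R2 as a dict (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    e = y_true - y_pred                           # residuals
    mse = np.mean(e ** 2)
    rss = np.sum(e ** 2)                          # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "RMSE": float(np.sqrt(mse)),
        "MSE": float(mse),
        "MAE": float(np.mean(np.abs(e))),
        "R2": float(1 - rss / tss),
    }

metrics = regression_metrics(
    [2, 0.8, 5.4, 3.2, 1.6, 8.3, 7.2, 9.5],
    [2.3, 0.85, 5.2, 3.5, 1.2, 8, 7.1, 9.8],
)
for name, value in metrics.items():
    print(name + ":", value)
```

Called on the example data, the helper should reproduce the values listed in the sections above.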