Python Code of Regression Metrics from scratch

Several metrics are used to evaluate the performance of methods in regression problems. Some of the most common are RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), MSE (Mean Squared Error) and R-Squared.
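
For reference, the formulas below are the standard definitions that the code in this article implements. They are written in terms of the residuals $e_i = y_i - y_{pred,i}$ for $n$ observations, with $\bar{y}$ the mean of the true values:

$$
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n} e_i^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n} \lvert e_i \rvert \\
R^2 &= 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
$$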

This section shows how to code these metrics in Python from scratch, while the next section shows how to use the sklearn package to obtain the same results. The only library needed to create them from scratch is numpy, which performs the mathematical operations. These metrics operate on the residuals, which are the differences between the true values and the predicted values, so both sets of values need to be defined first. For example, the true values are $\boldsymbol{y}=[2,0.8,5.4,3.2,1.6,8.3,7.2,9.5]$ while the predicted values are $\boldsymbol{y_{pred}}=[2.3,0.85,5.2,3.5,1.2,8,7.1,9.8]$. The code below shows how the metrics are coded from scratch.

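```python
import numpy as np

# True (observed) values and predicted values
y = np.array([2, 0.8, 5.4, 3.2, 1.6, 8.3, 7.2, 9.5])
y_pred = np.array([2.3, 0.85, 5.2, 3.5, 1.2, 8, 7.1, 9.8])

# Residuals: difference between true and predicted values
e = y - y_pred

# RMSE: square root of the mean of the squared residuals
rmse = np.sqrt(np.mean(np.power(e, 2)))
print("RMSE: ", rmse)
# MSE: mean of the squared residuals
mse = np.mean(np.power(e, 2))
print("MSE: ", mse)
# MAE: mean of the absolute residuals
mae = np.mean(np.abs(e))
print("MAE: ", mae)
# R2: 1 minus the residual sum of squares over the total sum of squares
RSS = np.sum(np.power(e, 2))
y_mean = np.mean(y)
TSS = np.sum(np.power(y - y_mean, 2))
R2 = 1 - RSS / TSS
print("R2: ", R2)
```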

The values obtained are:

```text
RMSE:  0.2675116819879089
MSE:  0.07156250000000011
MAE:  0.24375000000000024
R2:  0.9925726517903477
```
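
As a hand check of these numbers, the residuals for the example values and the sums they feed into work out as follows. The small trailing digits in the printed output (such as 0.07156250000000011 instead of 0.0715625) come from floating-point rounding:

$$
\begin{aligned}
\boldsymbol{e} &= [-0.3,\,-0.05,\,0.2,\,-0.3,\,0.4,\,0.3,\,0.1,\,-0.3] \\
\textstyle\sum e_i^2 &= 0.5725, \quad \mathrm{MSE} = 0.5725/8 = 0.0715625, \quad \mathrm{RMSE} = \sqrt{0.0715625} \approx 0.26751 \\
\textstyle\sum \lvert e_i \rvert &= 1.95, \quad \mathrm{MAE} = 1.95/8 = 0.24375 \\
\bar{y} &= 38/8 = 4.75, \quad \mathrm{TSS} = 77.08, \quad R^2 = 1 - 0.5725/77.08 \approx 0.99257
\end{aligned}
$$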

Python Code of Regression Metrics by using sklearn

This section shows how to compute the same regression metrics in Python with sklearn (scikit-learn), a well-known Python package for Machine Learning. The module that needs to be imported is sklearn.metrics, which contains a series of metric functions that can be called directly. Using the same values as in the previous section, the true values are $\boldsymbol{y}=[2,0.8,5.4,3.2,1.6,8.3,7.2,9.5]$ while the predicted values are $\boldsymbol{y_{pred}}=[2.3,0.85,5.2,3.5,1.2,8,7.1,9.8]$. The code below shows how the metrics can be called with sklearn. This is straightforward because there is a function for each of them: mean_squared_error() returns the MSE, mean_absolute_error() returns the MAE and r2_score() returns R2 (i.e. R-Squared). Older sklearn versions also returned the RMSE from mean_squared_error() when called with squared=False, but that parameter was deprecated in version 1.4, so the code below takes the square root of the MSE with np.sqrt() instead.

```python
import numpy as np
import sklearn.metrics as mts

# Same true and predicted values as in the previous section
y = np.array([2, 0.8, 5.4, 3.2, 1.6, 8.3, 7.2, 9.5])
y_pred = np.array([2.3, 0.85, 5.2, 3.5, 1.2, 8, 7.1, 9.8])

print('sklearn')
# RMSE: square root of the MSE (avoids the deprecated squared=False option)
sk_rmse = np.sqrt(mts.mean_squared_error(y, y_pred))
print("RMSE: ", sk_rmse)
# MSE
sk_mse = mts.mean_squared_error(y, y_pred)
print("MSE: ", sk_mse)
# MAE
sk_mae = mts.mean_absolute_error(y, y_pred)
print("MAE: ", sk_mae)
# R2 (R-Squared)
sk_r2 = mts.r2_score(y, y_pred)
print("R2: ", sk_r2)
```

Again, we obtain the same values shown above, which are:

```text
sklearn
RMSE:  0.2675116819879089
MSE:  0.07156250000000011
MAE:  0.24375000000000024
R2:  0.9925726517903477
```
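
Recent sklearn releases (1.4 and later) also provide a dedicated root_mean_squared_error() function. A minimal sketch, assuming the same script might run on either an older or a newer sklearn, checks which function is available before calling it; the helper name rmse_any_version is hypothetical and not part of sklearn.

```python
import numpy as np
import sklearn.metrics as mts


def rmse_any_version(y_true, y_pred):
    """Hypothetical helper: RMSE that works on older and newer sklearn versions."""
    if hasattr(mts, "root_mean_squared_error"):
        # Dedicated RMSE function, present in sklearn 1.4 and later
        return mts.root_mean_squared_error(y_true, y_pred)
    # Fallback for older releases: square root of the MSE
    return np.sqrt(mts.mean_squared_error(y_true, y_pred))


y = np.array([2, 0.8, 5.4, 3.2, 1.6, 8.3, 7.2, 9.5])
y_pred = np.array([2.3, 0.85, 5.2, 3.5, 1.2, 8, 7.1, 9.8])
print("RMSE: ", rmse_any_version(y, y_pred))
```

Using hasattr() keeps the sketch free of explicit version parsing: on newer releases it calls the dedicated function, and otherwise it falls back to the square root of the MSE, which gives the same value.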

The sklearn package conveniently provides many functions commonly used in Machine Learning, which saves time, although it is good to know what lies behind these functions and how they actually work.
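
Along those lines, a minimal sketch wraps the from-scratch formulas into one reusable function that also checks that the two inputs have the same shape. The name regression_metrics is hypothetical and not part of numpy or sklearn; with the example values it should reproduce the numbers printed in both sections.

```python
import numpy as np


def regression_metrics(y, y_pred):
    """Hypothetical helper: RMSE, MSE, MAE and R2 computed from scratch."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y.shape != y_pred.shape:
        raise ValueError("y and y_pred must have the same shape")
    e = y - y_pred                       # residuals
    mse = np.mean(e ** 2)                # mean squared error
    rss = np.sum(e ** 2)                 # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return {
        "RMSE": np.sqrt(mse),
        "MSE": mse,
        "MAE": np.mean(np.abs(e)),
        "R2": 1 - rss / tss,
    }


metrics = regression_metrics([2, 0.8, 5.4, 3.2, 1.6, 8.3, 7.2, 9.5],
                             [2.3, 0.85, 5.2, 3.5, 1.2, 8, 7.1, 9.8])
for name, value in metrics.items():
    print(name + ": ", value)
```

One difference worth keeping in mind: if all true values are identical, TSS is zero and this sketch divides by zero, whereas sklearn's r2_score() handles that case explicitly.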