We will understand the following concepts today¶
1.) Why is model performance evaluation required?
2.) Classification model evaluation metrics
3.) Regression model evaluation metrics
Why is model performance evaluation required?¶
Following are the reasons to use model evaluation metrics:
a) We need a way to choose between different model types, tuning parameters, and features
b) They are used to estimate how well a model will generalize to out-of-sample data
c) They help to quantify model performance
Classification model evaluation metrics¶
1. Confusion Matrix¶
A confusion matrix is a summary of prediction results on a classification problem. This matrix contains the following:
TP(True positive): correct positive prediction
TN(True Negative): correct negative prediction
FP(False Positive): incorrect positive prediction
FN(False Negative): incorrect negative prediction
| | Predicted 1 | Predicted 0 |
|---|---|---|
| Actual 1 | TP | FN |
| Actual 0 | FP | TN |
Below is the code for calculating the confusion matrix for given actual and predicted values.
from itertools import chain

import numpy as np
from scipy.sparse import coo_matrix


def unique_labels(*ys):
    """get the unique labels in the dependent variable"""
    ys_labels = set(chain.from_iterable(y for y in ys))
    return np.array(sorted(ys_labels))

unique_labels([1, 1, 1, 0, 0, 0, 2, 2], [0, 1, 1, 1, 0, 3])
def calconfusion_matrix(y_true, y_pred):
    """
    Compute the confusion matrix to evaluate the accuracy of a classification
    """
    labels = unique_labels(y_true, y_pred)
    sample_weight = np.ones(y_true.shape[0], dtype=np.int64)
    n_labels = labels.size
    label_to_ind = dict((y, x) for x, y in enumerate(labels))
    # convert y_true, y_pred into label indices
    y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred])
    y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])
    # intersect y_pred, y_true with labels, eliminate items not in labels
    ind = np.logical_and(y_pred < n_labels, y_true < n_labels)
    y_pred = y_pred[ind]
    y_true = y_true[ind]
    # also eliminate weights of eliminated items
    sample_weight = sample_weight[ind]
    # Choose the accumulator dtype to always have high precision
    if sample_weight.dtype.kind in {'i', 'u', 'b'}:
        dtype = np.int64
    else:
        dtype = np.float64
    CM = coo_matrix((sample_weight, (y_true, y_pred)),
                    shape=(n_labels, n_labels), dtype=dtype,
                    ).toarray()
    return CM
y_true = np.array([1, 1, 1, 0])
y_pred = np.array([1, 1, 0, 0])
tn, fp, fn, tp = calconfusion_matrix(y_true, y_pred).ravel()
p = tp + fn  # total actual positives
n = tn + fp  # total actual negatives
(tn, fp, fn, tp)
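As an optional sanity check (assuming scikit-learn is available; it is not required anywhere else in this notebook), the counts above can be compared with sklearn.metrics.confusion_matrix, which uses the same actual-by-predicted layout:
# Optional cross-check against scikit-learn (assumes scikit-learn is installed)
from sklearn.metrics import confusion_matrix

tn_sk, fp_sk, fn_sk, tp_sk = confusion_matrix(y_true, y_pred).ravel()
print((tn_sk, fp_sk, fn_sk, tp_sk) == (tn, fp, fn, tp))  # expected: True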
The confusion matrix helps to derive the following metrics:
Accuracy: The proportion of the total number of predictions that were correct.
$ ACC={\displaystyle\frac{TP+TN}{P+N}} $
where $P = TP + FN$ (total actual positives) and $N = TN + FP$ (total actual negatives).
Best is 1.0 and worst is 0.0
Sensitivity / Recall / True Positive Rate: Intuitively it is the ability of the classifier to find all the positive samples
$ Sensitivity={\displaystyle\frac{TP}{TP+FN}=\frac{TP}{P}} $
Best is 1.0 and worst is 0.0
Specificity / True Negative Rate: The proportion of actual negative cases which are correctly identified.
$ Specificity={\displaystyle\frac{TN}{TN+FP}=\frac{TN}{N}} $
Best is 1.0 and worst is 0.0
Precision / Positive Predictive value: Intuitively it is the ability of the classifier not to label as positive a sample that is negative.
$ Precision={\displaystyle\frac{TP}{TP+FP}} $
Best is 1.0 and worst is 0.0
False Positive Rate: The number of incorrect positive predictions divided by the total number of negatives.
$ FPR={\displaystyle\frac{FP}{TN+FP}=1-Specificity} $
Best is 0.0 and worst is 1.0
F-Score¶
It is the harmonic mean of precision and recall.
F1 score¶
$ F_{1}={\displaystyle 2 \frac{(PREC)(RECALL)}{PREC+RECALL}} $
The score lies in the range [0,1] with 1 being ideal and 0 being the worst. Unlike the arithmetic mean, the harmonic mean tends toward the smaller of the two elements. Hence the F1 score will be small if either precision or recall is small.
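For example, with precision $= 0.9$ and recall $= 0.1$, the arithmetic mean is $0.5$, but
$ F_{1}={\displaystyle 2 \frac{(0.9)(0.1)}{0.9+0.1}}=0.18 $
so a single weak component is enough to pull the score down.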
F-beta score¶
$ F_{\beta}={\displaystyle (1+\beta^2)\frac{(PREC)(RECALL)}{\beta^2 (PREC)+RECALL}} $
The F-beta score is the weighted harmonic mean of precision and recall, reaching its optimal value at 1 and its worst value at 0. The beta parameter determines the weight of recall in the combined score: it measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as to precision.
MCC (Matthews correlation coefficient)¶
The Matthews correlation coefficient is considered to be the most informative single score for establishing the quality of a binary classifier's predictions in a confusion matrix context. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
$ MCC={\displaystyle\frac{TP \cdot TN-FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FN)(TN+FP)}}} $
The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 no better than random prediction and −1 indicates total disagreement between prediction and observation.
def accuracy(tp, tn, p, n):
    """
    calculate the accuracy of the classification model
    """
    acc = np.float64(tp + tn) / np.float64(p + n)
    return round(acc, 3)

accuracy(tp, tn, p, n)
Since 3 of the 4 samples were correctly classified, the accuracy is 75%.
def sensitivity_recall(tp, fn):
    """
    calculate the sensitivity (recall) of the classification model
    """
    return round(np.float64(tp) / np.float64(tp + fn), 3)

sensitivity_recall(tp, fn)
def specificity(tn, n):
    """
    calculate the specificity of the classification model
    """
    return round(np.float64(tn) / np.float64(n), 3)

specificity(tn, n)
def precision(tp, fp):
    """
    calculate the precision of the model
    """
    return np.float64(tp) / np.float64(tp + fp)

precision(tp, fp)
def FalsePositiveRate(fp, tn):
    """calculate the false positive rate of the classification model"""
    return np.float64(fp) / np.float64(fp + tn)

FalsePositiveRate(fp, tn)
def MCC(tp, tn, fp, fn):
    """
    calculate the Matthews correlation coefficient
    """
    return np.round(np.float64(tp * tn - fp * fn) /
                    np.sqrt(np.float64(tp + fp) * np.float64(tp + fn) *
                            np.float64(tn + fn) * np.float64(tn + fp)), 2)

MCC(tp, tn, fp, fn)
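To illustrate the point about class imbalance, here is a small hypothetical example (the counts below are made up for illustration): on 95 actual negatives and 5 actual positives, a classifier that recovers only one of the positives still reaches 95% accuracy, while the MCC stays low.
# Hypothetical imbalanced example: 95 actual negatives, 5 actual positives,
# and the classifier recovers only one of the positives.
tp_i, tn_i, fp_i, fn_i = 1, 94, 1, 4
print(accuracy(tp_i, tn_i, tp_i + fn_i, tn_i + fp_i))  # 0.95 -- looks excellent
print(MCC(tp_i, tn_i, fp_i, fn_i))                     # 0.29 -- exposes the weak positive-class performance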
def f1_score(y_true, y_pred):
    """
    calculate the F1 score for the given actual and predicted values
    """
    # calculate the confusion matrix to get precision and recall
    tn, fp, fn, tp = calconfusion_matrix(y_true, y_pred).ravel()
    # calculate the precision and recall scores
    prec_score = precision(tp, fp)
    recall_score = sensitivity_recall(tp, fn)
    f1 = 2 * ((prec_score * recall_score) / (prec_score + recall_score))
    return f1
def fbeta_score(y_true, y_pred, beta):
    """
    y_true: actual y values
    y_pred: predicted y values
    beta: weight of recall relative to precision; if beta == 1 then fbeta_score == f1_score
    """
    # calculate the confusion matrix to get precision and recall
    tn, fp, fn, tp = calconfusion_matrix(y_true, y_pred).ravel()
    # calculate the precision and recall scores
    prec_score = precision(tp, fp)
    recall_score = sensitivity_recall(tp, fn)
    beta2 = beta ** 2
    fbeta = (1 + beta2) * ((prec_score * recall_score) / (beta2 * prec_score + recall_score))
    return fbeta
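As a quick usage check on the same y_true and y_pred defined earlier: β = 1 reproduces the F1 score, β = 2 pulls the score toward the (lower) recall, and β = 0.5 pulls it toward the (higher) precision.
print(f1_score(y_true, y_pred))          # precision = 1.0, recall = 0.667 -> F1 ≈ 0.8
print(fbeta_score(y_true, y_pred, 1))    # ≈ 0.8, identical to the F1 score
print(fbeta_score(y_true, y_pred, 2))    # ≈ 0.71, weighted toward the lower recall
print(fbeta_score(y_true, y_pred, 0.5))  # ≈ 0.91, weighted toward the higher precision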
Regression model evaluation metrics¶
Root Mean Square Error: RMSE is the square root of the average of the squared errors.
$ RMSE={\displaystyle \sqrt{\frac{\sum_{i=1}^{n}{(Actual_{i}-Predicted_{i})^2}}{n}}} $
Mean Absolute Error: MAE is the average absolute difference between $\large y_{i}$ (actual) and $ \large \hat{y_{i}}$ (predicted values)
$ MAE={\displaystyle{\frac{1}{n}}{\sum_{i=1}^{n}{\left|{y}_i-\hat{y_i}\right|}}} $
Mean Absolute Percentage Error: MAPE measures the size of the error in percentage terms. It is calculated as the average of the unsigned percentage errors.
$ MAPE={\displaystyle \frac{1}{n}{\sum_{i=1}^{n}} \left| \frac{Actual_{i}-Predicted_{i}}{Actual_{i}} \right| *100 } $
def rmse(y, y_pred):
    """
    y: vector of actual values
    y_pred: vector of predicted values
    """
    return np.sqrt(np.mean((y - y_pred) ** 2))

def mae(y, y_pred):
    """
    y: vector of actual values
    y_pred: vector of predicted values
    """
    return np.mean(np.abs(y - y_pred))

def mape(y, y_pred):
    """
    y: vector of actual values
    y_pred: vector of predicted values
    """
    return np.mean(np.abs((y - y_pred) / y)) * 100
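A small usage sketch on made-up actual and predicted values (the numbers below are purely illustrative):
# Hypothetical regression example (values made up for illustration)
y_actual = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])
print(rmse(y_actual, y_hat))  # ≈ 0.935
print(mae(y_actual, y_hat))   # 0.75
print(mape(y_actual, y_hat))  # ≈ 22.7 (percent)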