We will understand the following concepts today¶
1.) Why is model performance evaluation required?
2.) Classification model evaluation metrics
3.) Regression model evaluation metrics
Why is model performance evaluation required?¶
Following are the reasons to use model evaluation metrics:
a) We need a way to choose between different model types, tuning parameters, and features
b) They are used to estimate how well a model will generalize to out-of-sample data
c) They help to quantify model performance
Classification model evaluation metrics¶
1. Confusion Matrix¶
A confusion matrix is a summary of prediction results on a classification problem. This matrix contains the following:
TP(True positive): correct positive prediction
TN(True Negative): correct negative prediction
FP(False Positive): incorrect positive prediction
FN(False Negative): incorrect negative prediction
| | Predicted 1 | Predicted 0 |
|---|---|---|
| Actual 1 | TP | FN |
| Actual 0 | FP | TN |
Below is the code for calculating the confusion matrix for given actual and predicted values.
from itertools import chain

import numpy as np
from scipy.sparse import coo_matrix


def unique_labels(*ys):
    """get the unique labels in the dependent variable"""
    ys_labels = set(chain.from_iterable(y for y in ys))
    return np.array(sorted(ys_labels))

unique_labels([1, 1, 1, 0, 0, 0, 2, 2], [0, 1, 1, 1, 0, 3])
def calconfusion_matrix(y_true, y_pred):
    """
    Compute the confusion matrix to evaluate the accuracy of a classification
    """
    labels = unique_labels(y_true, y_pred)
    sample_weight = np.ones(y_true.shape[0], dtype=np.int64)
    n_labels = labels.size
    label_to_ind = dict((y, x) for x, y in enumerate(labels))
    # convert y_true, y_pred into label indices
    y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred])
    y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])
    # intersect y_pred, y_true with labels, eliminate items not in labels
    ind = np.logical_and(y_pred < n_labels, y_true < n_labels)
    y_pred = y_pred[ind]
    y_true = y_true[ind]
    # also eliminate weights of eliminated items
    sample_weight = sample_weight[ind]
    # Choose the accumulator dtype to always have high precision
    if sample_weight.dtype.kind in {'i', 'u', 'b'}:
        dtype = np.int64
    else:
        dtype = np.float64
    CM = coo_matrix((sample_weight, (y_true, y_pred)),
                    shape=(n_labels, n_labels), dtype=dtype,
                    ).toarray()
    return CM
y_true = np.array([1, 1, 1, 0])
y_pred = np.array([1, 1, 0, 0])
tn, fp, fn, tp = calconfusion_matrix(y_true, y_pred).ravel()
p = tp + fn  # total actual positives
n = tn + fp  # total actual negatives
(tn, fp, fn, tp)
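As an optional sanity check (assuming scikit-learn is available; it is not required anywhere else in this notebook), the counts above can be compared with sklearn.metrics.confusion_matrix, which uses the same actual-by-predicted layout:
# Optional cross-check against scikit-learn (assumes scikit-learn is installed)
from sklearn.metrics import confusion_matrix

tn_sk, fp_sk, fn_sk, tp_sk = confusion_matrix(y_true, y_pred).ravel()
print((tn_sk, fp_sk, fn_sk, tp_sk) == (tn, fp, fn, tp))  # expected: True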
The confusion matrix helps to derive the following metrics:
Accuracy: The proportion of the total number of predictions that were correct.
$ ACC={\displaystyle\frac{TP+TN}{P+N}} $
where $P = TP + FN$ (total actual positives) and $N = TN + FP$ (total actual negatives).
Best is 1.0 and worst is 0.0
Sensitivity / Recall / True Positive Rate: Intuitively it is the ability of the classifier to find all the positive samples
$ Sensitivity={\displaystyle\frac{TP}{TP+FN}=\frac{TP}{P}} $
Best is 1.0 and worst is 0.0
Specificity / True Negative Rate: The proportion of actual negative cases which are correctly identified.
$ Specificity={\displaystyle\frac{TN}{TN+FP}=\frac{TN}{N}} $
Best is 1.0 and worst is 0.0
Precision / Positive Predictive value: Intuitively it is the ability of the classifier not to label as positive a sample that is negative.
$ Precision={\displaystyle\frac{TP}{TP+FP}} $
Best is 1.0 and worst is 0.0
False Positive Rate: The number of incorrect positive predictions divided by the total number of negatives.
$ FPR={\displaystyle\frac{FP}{TN+FP}=1-Specificity} $
Best is 0.0 and worst is 1.0
F-Score¶
It is the harmonic mean of precision and recall.
F1 score¶
$ F_{1}={\displaystyle 2 \frac{(PREC)(RECALL)}{PREC+RECALL}} $
The score lies in the range [0,1] with 1 being ideal and 0 being the worst. Unlike the arithmetic mean, the harmonic mean tends toward the smaller of the two elements. Hence the F1 score will be small if either precision or recall is small.
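For example, with precision $= 0.9$ and recall $= 0.1$, the arithmetic mean is $0.5$, but
$ F_{1}={\displaystyle 2 \frac{(0.9)(0.1)}{0.9+0.1}}=0.18 $
so a single weak component is enough to pull the score down.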
F-beta score¶
$ F_{\beta}={\displaystyle (1+\beta^2)\frac{(PREC)(RECALL)}{\beta^2 (PREC)+RECALL}} $
The F-beta score is the weighted harmonic mean of precision and recall, reaching its optimal value at 1 and its worst value at 0. The beta parameter determines the weight of recall in the combined score: it measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as to precision.
MCC (Matthews correlation coefficient)¶
The Matthews correlation coefficient is considered to be the most informative single score for establishing the quality of a binary classifier's predictions in a confusion matrix context. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
$ MCC={\displaystyle\frac{TP \cdot TN-FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FN)(TN+FP)}}} $
The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 no better than random prediction and −1 indicates total disagreement between prediction and observation.
def accuracy(tp, tn, p, n):
    """
    calculate the accuracy of the classification model
    """
    acc = np.float64(tp + tn) / np.float64(p + n)
    return round(acc, 3)

accuracy(tp, tn, p, n)
Since 3 of the 4 samples were correctly classified, the accuracy is 75%.
def sensitivity_recall(tp, fn):
    """
    calculate the sensitivity (recall) of the classification model
    """
    return round(np.float64(tp) / np.float64(tp + fn), 3)

sensitivity_recall(tp, fn)
def specificity(tn, n):
    """
    calculate the specificity of the classification model
    """
    return round(np.float64(tn) / np.float64(n), 3)

specificity(tn, n)
def precision(tp, fp):
    """
    calculate the precision of the model
    """
    return np.float64(tp) / np.float64(tp + fp)

precision(tp, fp)
def FalsePositiveRate(fp, tn):
    """calculate the false positive rate of the classification model"""
    return np.float64(fp) / np.float64(fp + tn)

FalsePositiveRate(fp, tn)
def MCC(tp, tn, fp, fn):
    """
    calculate the Matthews correlation coefficient
    """
    return np.round(np.float64(tp * tn - fp * fn) /
                    np.sqrt(np.float64(tp + fp) * np.float64(tp + fn) *
                            np.float64(tn + fn) * np.float64(tn + fp)), 2)

MCC(tp, tn, fp, fn)
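To illustrate the point about class imbalance, here is a small hypothetical example (the counts below are made up for illustration): on 95 actual negatives and 5 actual positives, a classifier that recovers only one of the positives still reaches 95% accuracy, while the MCC stays low.
# Hypothetical imbalanced example: 95 actual negatives, 5 actual positives,
# and the classifier recovers only one of the positives.
tp_i, tn_i, fp_i, fn_i = 1, 94, 1, 4
print(accuracy(tp_i, tn_i, tp_i + fn_i, tn_i + fp_i))  # 0.95 -- looks excellent
print(MCC(tp_i, tn_i, fp_i, fn_i))                     # 0.29 -- exposes the weak positive-class performance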
def f1_score(y_true, y_pred):
    """
    calculate the F1 score for the given actual and predicted values
    """
    # calculate the confusion matrix to get precision and recall
    tn, fp, fn, tp = calconfusion_matrix(y_true, y_pred).ravel()
    # calculate the precision and recall scores
    prec_score = precision(tp, fp)
    recall_score = sensitivity_recall(tp, fn)
    f1 = 2 * ((prec_score * recall_score) / (prec_score + recall_score))
    return f1
def fbeta_score(y_true, y_pred, beta):
    """
    y_true: actual y values
    y_pred: predicted y values
    beta: weight of recall relative to precision; if beta == 1 then fbeta_score == f1_score
    """
    # calculate the confusion matrix to get precision and recall
    tn, fp, fn, tp = calconfusion_matrix(y_true, y_pred).ravel()
    # calculate the precision and recall scores
    prec_score = precision(tp, fp)
    recall_score = sensitivity_recall(tp, fn)
    beta2 = beta ** 2
    fbeta = (1 + beta2) * ((prec_score * recall_score) / (beta2 * prec_score + recall_score))
    return fbeta
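As a quick usage check on the same y_true and y_pred defined earlier: β = 1 reproduces the F1 score, β = 2 pulls the score toward the (lower) recall, and β = 0.5 pulls it toward the (higher) precision.
print(f1_score(y_true, y_pred))          # precision = 1.0, recall = 0.667 -> F1 ≈ 0.8
print(fbeta_score(y_true, y_pred, 1))    # ≈ 0.8, identical to the F1 score
print(fbeta_score(y_true, y_pred, 2))    # ≈ 0.71, weighted toward the lower recall
print(fbeta_score(y_true, y_pred, 0.5))  # ≈ 0.91, weighted toward the higher precision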
Regression model evaluation metrics¶
Root Mean Square Error: RMSE is the square root of the average of the squared errors.
$ RMSE={\displaystyle \sqrt{\frac{\sum_{i=1}^{n}{(Actual_{i}-Predicted_{i})^2}}{n}}} $
Mean Absolute Error: MAE is the average absolute difference between $\large y_{i}$ (actual) and $ \large \hat{y_{i}}$ (predicted values)
$ MAE={\displaystyle{\frac{1}{n}}{\sum_{i=1}^{n}{\left|{y}_i-\hat{y_i}\right|}}} $
Mean Absolute Percentage Error: MAPE measures the size of the error in percentage terms. It is calculated as the average of the unsigned percentage errors.
$ MAPE={\displaystyle \frac{1}{n}{\sum_{i=1}^{n}} \left| \frac{Actual_{i}-Predicted_{i}}{Actual_{i}} \right| *100 } $
def rmse(y, y_pred):
    """
    y: vector of actual values
    y_pred: vector of predicted values
    """
    return np.sqrt(np.mean((y - y_pred) ** 2))

def mae(y, y_pred):
    """
    y: vector of actual values
    y_pred: vector of predicted values
    """
    return np.mean(np.abs(y - y_pred))

def mape(y, y_pred):
    """
    y: vector of actual values
    y_pred: vector of predicted values
    """
    return np.mean(np.abs((y - y_pred) / y)) * 100
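A small usage sketch on made-up actual and predicted values (the numbers below are purely illustrative):
# Hypothetical regression example (values made up for illustration)
y_actual = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])
print(rmse(y_actual, y_hat))  # ≈ 0.935
print(mae(y_actual, y_hat))   # 0.75
print(mape(y_actual, y_hat))  # ≈ 22.7 (percent)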