I have been talking about performance metrics recently, and I wanted to post a review of something from chapter 8 of Data Science For Business.

Profit curves are a way of assigning a benefit to correct predictions and a cost to incorrect predictions.  I really like them because they give a way to evaluate the business impact of a model.   We can often gain epsilon increases in performance that make no meaningful difference to the bottom line.   Profit curves allow us to see this.

Profit curves also allow us to estimate the optimal threshold, that is, what fraction of the strongest predictions to label positive in order to maximize the business impact.

Profit curves are also constructed in a similar way to ROC curves.  ROC curves plot the True Positive Rate against the False Positive Rate as we step through the predictions in order of strength.  Profit curves show expected profit in order of prediction strength.

Confusion Matrix

The confusion matrix shows the counts of target classes versus predicted classes.   The True Positive Rate and False Positive Rate are target-based metrics.  This means that they are normalized by the count of either total target positives or total target negatives.

\Large{\left( \begin{array}{cc} TP & FP \\ FN & TN \end{array} \right) \rightarrow \left( \begin{array}{cc} \frac{TP}{TP+FN} & \frac{FP}{FP+TN} \\ \frac{FN}{TP+FN} & \frac{TN}{FP+TN} \end{array} \right)}

An ROC curve would plot the top row of the right-hand matrix in order of prediction strength.
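
As a quick check on the normalization above, here is a minimal sketch with made-up counts (the numbers are hypothetical and only for illustration). The layout follows the matrix above: rows are predictions, columns are targets, so each column is divided by its target total.

```python
import numpy as np

# Hypothetical counts, laid out as in the matrix above:
# rows are predictions (+, -), columns are targets (+, -).
counts = np.array([[30.0, 10.0],    # TP, FP
                   [20.0, 40.0]])   # FN, TN

# Divide each column by its target total (TP+FN and FP+TN).
rates = counts / counts.sum(axis=0)

# The top row holds the two quantities an ROC curve plots.
tpr, fpr = rates[0]
print(rates)
print("TPR = %.2f, FPR = %.2f" % (tpr, fpr))
```

Each column of the result sums to 1, because every target positive is either a true positive or a false negative, and likewise for the negatives.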

Normalizing the confusion matrix by the target totals, as above, estimates the rate at which each target class is classified correctly or incorrectly.  The next step in constructing a profit curve is to weight these rates by the population proportions of positive and negative targets, P_+ and P_-.  If we know these proportions, this gives a feel for the actual rate of misclassification and correct classification in the population we are concerned with.

\Large{\left( \begin{array}{cc} \frac{TP}{TP+FN} \ P_+ & \frac{FP}{FP+TN} \ P_- \\ \frac{FN}{TP+FN} \ P_+ & \frac{TN}{FP+TN} \ P_- \end{array} \right)}

Profit Matrix

Profit curves are created by assigning each term in the weighted confusion matrix above a numerical value: a cost or a benefit.

\Large{\mbox{Profit Matrix} = \left( \begin{array}{cc} B_{P_+} & C_{P_+} \\ C_{P_-} & B_{P_-} \end{array} \right)}

For illustrations of the calculations, and to develop some intuition, I will use the following example:

\Large{\mbox{Profit Matrix} = \left( \begin{array}{cc} 10 & -2 \\ -2 & 6 \end{array} \right)}

This matrix says a correct positive prediction is worth $10, a correct negative prediction is worth $6, and a misclassification costs $2.

Profit Calculations

I want to do two toy calculations using the example matrix.  One case is where the model predicts everyone positive.  N_+ and N_- are the count of positive and negative targets in the population.

\Large{ \left( \begin{array}{cc} N_+ & N_- \\ 0 & 0 \end{array} \right) \rightarrow \left( \begin{array}{cc} 1 & 1 \\ 0 & 0 \end{array} \right) }

The (1, 1) in the top row corresponds to the upper right corner of an ROC curve, where the true positive rate and the false positive rate are both 1.

If we have a population with proportions of positive and negative examples of P_+ = 0.33 and P_- = 0.67, the weighted rate matrix looks like:

\Large{\left( \begin{array}{cc} .33 & .67 \\ 0 & 0 \end{array} \right) }

The element-wise multiplication with our profit matrix gives:

\Large{ \left( \begin{array}{cc} 3.33 & -1.33 \\ 0 & 0 \end{array} \right) }

We sum all the elements of this matrix together to get the expected profit per prediction:

\Large{ E[\mbox{Profit}] = 3.33 - 1.33 + 0 + 0 = 2 }

This strategy is profitable. If we look at the other extreme and predict everything is negative we get the following results:

\Large{ \left( \begin{array}{cc} 0 & 0 \\ N_+ & N_- \end{array} \right) \rightarrow \left( \begin{array}{cc} 0 & 0 \\ 1 & 1 \end{array} \right) }

\Large{\left( \begin{array}{cc} 0 & 0 \\ .33 & .67 \end{array} \right) }

The element-wise multiplication with our profit matrix produces:

\Large{\left( \begin{array}{cc} 0 & 0 \\ -.67 & 4.00 \end{array} \right) }

\Large{E[\mbox{Profit}] = 0 + 0 - .67 + 4.00 = 3.33}

I have two extreme predictions.  In one case I predict everything to be positive, and make $2 per prediction.  The other case is I predict everything to be negative, and make $3.33 per prediction.
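
The two toy calculations can be reproduced in a few lines. This is only a sketch of the arithmetic above: the function name `expected_profit` is my own, and I use the exact proportions 1/3 and 2/3 in place of the rounded 0.33 and 0.67.

```python
import numpy as np

# Example profit matrix in the layout used above:
# rows are predictions (+, -), columns are targets (+, -).
profit = np.array([[10.0, -2.0],
                   [-2.0,  6.0]])

# Population proportions (0.33 and 0.67 in the text, kept exact here).
p_pos, p_neg = 1.0 / 3.0, 2.0 / 3.0

def expected_profit(rates, p_pos, p_neg, profit):
    """Weight each target column by its proportion, multiply by profit, sum."""
    weighted = rates * np.array([p_pos, p_neg])
    return float((weighted * profit).sum())

all_positive = np.array([[1.0, 1.0], [0.0, 0.0]])   # everything predicted positive
all_negative = np.array([[0.0, 0.0], [1.0, 1.0]])   # everything predicted negative

profit_all_positive = expected_profit(all_positive, p_pos, p_neg, profit)
profit_all_negative = expected_profit(all_negative, p_pos, p_neg, profit)
print("all positive: %.2f" % profit_all_positive)
print("all negative: %.2f" % profit_all_negative)
```

Any model worth deploying for this problem should beat the better of these two baselines.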

Models can help improve our predictions, and we would like to do two things.

  1. Forecast increase in profits
  2. Choose the model parameters that optimize profits

Flower Shop

For this post I am going to make a profit curve for a flower shop that sells at two locations and is trying to automate its shipping. If we get our best flowers to the right location, we make 10 dollars, but if the prediction is incorrect we have to pay 2 dollars to ship them to the correct place. If we get the other flowers to the other location, we make 6 dollars, but again we have to pay 2 dollars in shipping if this prediction is incorrect. The cost-benefit matrix is the one I have used above.

\Large{ \mbox{Profit Matrix} = \left( \begin{array}{cc} 10 & -2 \\ -2 & 6 \end{array} \right) }

We already know the two extremes in profit, but let's fit some models on the iris data set and make some profit curves.


I am using the Iris data set fit with two models:  Logistic Regression and Support Vector Machines.   I have hamstrung both models: I am not performing a train/test split (in the interest of focusing on making profit curves), and I have set the parameters to artificially give both models the same Area Under the ROC Curve (AUC).


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score
# Jupyter magic; drop this line when running as a plain script
%matplotlib inline
iris = datasets.load_iris()
X = iris.data  # all four features
Y = (iris.target == 1).astype(int)  # positive class: target 1 (versicolor)
# Fit both models on the full data set (no train/test split)
svm = SVC(C=.1, probability=True, kernel='linear').fit(X, Y)
log = LogisticRegression(C=1).fit(X, Y)
probs = []  # predicted probabilities per model, reused for the profit curves
p = log.predict_proba(X)[:, 1]
probs.append(p)
fpr, tpr, thresholds = roc_curve(Y, p)
plt.plot(fpr, tpr, label='Log ROC Curve - AUC: %.2f' % roc_auc_score(Y, p), color='steelblue', lw=2)
p = svm.predict_proba(X)[:, 1]
probs.append(p)
fpr, tpr, thresholds = roc_curve(Y, p)
plt.plot(fpr, tpr, label='SVM ROC Curve - AUC: %.2f' % roc_auc_score(Y, p), color='firebrick', lw=2)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend(loc='lower right')

[Figure: ROC curves for the logistic regression and SVM models]

The SVM has a higher true positive rate for lower false positive rates, while the logistic regression model has a higher true positive rate for larger false positive rates.   Everything is predicted positive in the upper right, and everything is predicted negative in the lower left.

The AUC is the same for both models, and even if it were not, AUC does not measure business impact, so we need another way to evaluate it.

Profit Curve Calculations

Now that I have models and have scored all the p values, we can look at the confusion matrix.  Because I am using sklearn’s method, I want to point out that the confusion matrix has the following format:

\Large{ \left( \begin{array}{cc} TN & FP \\ FN & TP \end{array} \right) }

The rates then become:

\Large{ \left( \begin{array}{cc} TN & FP \\ FN & TP \end{array} \right) \rightarrow \left( \begin{array}{cc} \frac{TN}{TN+FP} \ P_- & \frac{FP}{TN+FP} \ P_- \\ \frac{FN}{TP+FN} \ P_+ & \frac{TP}{TP+FN} \ P_+ \end{array} \right) }

The profit matrix is multiplied element wise with the above matrix.  The results are:

\Large{ \left( \begin{array}{cc} \frac{TN}{TN+FP} \ P_- \ B_- & \frac{FP}{TN+FP} \ P_- \ C_- \ \\ \frac{FN}{TP+FN} \ P_+ \ C_+ & \frac{TP}{TP+FN} \ P_+ \ B_+ \end{array} \right) }

We now sum all the elements of the matrix to get the expected profit.
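
Here is a minimal sketch of that calculation for a single confusion matrix in sklearn's layout. The function name `expected_profit_sklearn` and the counts are hypothetical, and I am assuming the population proportions are estimated from the same counts. The profit matrix is the flower shop example rearranged so that the benefit of a true negative sits in the upper left.

```python
import numpy as np

def expected_profit_sklearn(cm, profit):
    """Expected profit per prediction for a confusion matrix laid out as
    [[TN, FP], [FN, TP]], with rows as targets and columns as predictions."""
    cm = np.asarray(cm, dtype=float)
    target_totals = cm.sum(axis=1, keepdims=True)   # TN+FP and FN+TP
    rates = cm / target_totals                      # normalize by target counts
    proportions = target_totals / cm.sum()          # P_- and P_+ from the sample
    return float((rates * proportions * profit).sum())

# Flower shop profit matrix in sklearn's layout: [[B_-, C_-], [C_+, B_+]].
profit_sk = np.array([[6.0, -2.0],
                      [-2.0, 10.0]])

# Hypothetical counts: 90 TN, 10 FP, 20 FN, 30 TP.
cm_example = np.array([[90, 10],
                       [20, 30]])

value = expected_profit_sklearn(cm_example, profit_sk)
print("expected profit per prediction: %.2f" % value)
```

When the proportions come from the same sample, the rate times the proportion reduces to the count divided by the total, so this is the same as weighting the raw counts by the profit matrix and dividing by the number of predictions.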

Making Profit Curves

To make the profit curve we now just have to go through each prediction, ordered from our strongest predictions to our weakest.   For each prediction, we will say everything as strong or stronger than that prediction is positive, and everything weaker than that prediction is negative.  We then calculate the expected profit as shown above.

names = ['Log','SVM']
colors = ['steelblue','firebrick']
# Profit matrix in sklearn's confusion matrix layout: [[TN, FP], [FN, TP]]
cost = np.array([[6, -2], [-2, 10]])
for i, p in enumerate(probs):
    order = np.argsort(p)
    cost_vals = []
    for pp in p[order]:
        # Predict positive for everything as strong or stronger than pp
        cm = confusion_matrix(Y, (p >= pp).astype(int))
        # Make rates
        cmr = cm / cm.sum(axis=1).astype(float).reshape(2, 1)
        # Multiply by target proportions
        acmr = cmr * (cm.sum(axis=1).astype(float) / cm.sum()).reshape(2, 1)
        # Elementwise multiplication with cost matrix, summed to get expected profit
        cost_vals.append((acmr * cost).sum())
    plt.plot(np.array(range(len(p), 0, -1)).astype(float) / len(p), cost_vals, label='%s Curve - AUC: %.2f - Max Profit %.2f' % (names[i], roc_auc_score(Y, p), max(cost_vals)), color=colors[i], lw=2)
plt.xlabel("Percent Predicted Positive")
plt.ylabel("Expected Profit Per Prediction")
plt.legend(loc='lower right')

[Figure: profit curves for the logistic regression and SVM models]

The profit curve shows that logistic regression has a slightly higher profit than the SVM model, but in reality we should be using train/test splits and bootstrapped error estimates.   As a first-order estimate, this gives a feel for the business value a model can have.  In this case, it allows us to increase profits significantly.
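
To read the profit-maximizing threshold off a curve like this, one option is to compute the whole curve with cumulative sums and take the argmax. This is a sketch rather than the code used for the plot: the function name `profit_curve` and the toy labels and scores are hypothetical, it assumes there are no tied scores, and it uses the sample proportions, so the weighted rates reduce to counts divided by the number of predictions.

```python
import numpy as np

def profit_curve(y_true, scores, profit):
    """Expected profit per prediction when the k strongest scores are
    predicted positive, for k = 1..n.

    profit uses sklearn's layout: [[B_TN, C_FP], [C_FN, B_TP]].
    Returns (fraction predicted positive, expected profit, thresholds).
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)            # strongest predictions first
    y_sorted = y_true[order]
    n = len(y_true)
    n_pos = y_true.sum()
    k = np.arange(1, n + 1)                # number predicted positive
    tp = np.cumsum(y_sorted)               # target positives among the top k
    fp = k - tp
    fn = n_pos - tp
    tn = (n - n_pos) - fp
    total = (tn * profit[0, 0] + fp * profit[0, 1]
             + fn * profit[1, 0] + tp * profit[1, 1])
    return k / n, total / n, scores[order]

profit_sk = np.array([[6.0, -2.0], [-2.0, 10.0]])

# Toy data: two positives out of six, perfectly ranked by the scores.
y_demo = np.array([1, 1, 0, 0, 0, 0])
scores_demo = np.array([0.9, 0.8, 0.3, 0.2, 0.1, 0.05])

frac, expected, thresholds = profit_curve(y_demo, scores_demo, profit_sk)
best = int(np.argmax(expected))
best_threshold = thresholds[best]
print("best fraction predicted positive: %.2f" % frac[best])
print("threshold: %.2f, expected profit: %.2f" % (best_threshold, expected[best]))
```

With one third of the toy targets positive, the last point of the curve (everything predicted positive) recovers the $2 per prediction from the toy calculation earlier in the post.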

Final Thoughts

There are other business considerations that have to be considered in model choice, such as technical debt, model debt, and data debt.   Models do have to be maintained in practice.

While at Galvanize we did a churn analysis similar to this, with a costly intervention strategy for a cell phone provider.   The best performing models are also the most costly to implement, train, and maintain.  Using the wrong thresholds can also cost money.

[Figure: screenshot from the churn analysis]

There are significant gains to be had in this case, but the human and infrastructure costs have to be considered as well.

Thank You

As always, thanks for reading.  I hope you enjoyed it or gained something.
