Confusion Matrix – An Overview with Python and R


To develop a machine learning classification model, we first collect data, then perform data exploration, pre-processing, and cleaning. After completing these steps, we apply a classification technique to obtain predictions from the model. That is, briefly, how a machine learning model is developed. Before finalising the classifier, we have to be sure it is performing well. The confusion matrix measures the performance of a classifier, letting us check its efficiency and precision in predicting results. In this article, we will study the confusion matrix in detail.

Confusion Matrix Definition

A confusion matrix is used to judge the performance of a classifier on a test dataset for which the actual values are already known. It is also termed the error matrix. It consists of counts of correct and incorrect predictions broken down by class, so it tells us not only how many errors the classifier made but also what type of errors they were. In short, a confusion matrix is a performance-measurement technique for a classifier model whose output can be two or more classes. For the two-class case, it is a table with four different groups of true and predicted values.

Terminologies in Confusion Matrix

The confusion matrix shows us where our classifier gets confused while predicting. It involves four important terms:

  1. True Positive (TP)
  2. True Negative (TN)
  3. False Positive (FP)
  4. False Negative (FN)

We will explain these terms with the help of visualisation of the confusion matrix:

This is what a confusion matrix looks like, here for the 2-class case. One axis of the table holds the predicted values and the other holds the actual values.

Let’s discuss the above terms in detail:

True Positive (TP)

Both actual and predicted values are Positive.

True Negative (TN)

Both actual and predicted values are Negative.

False Positive (FP)

The actual value is negative but we predicted it as positive. 

False Negative (FN)

The actual value is positive but we predicted it as negative.
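These four counts can be tallied directly from a pair of label vectors. A minimal sketch in plain Python (the example labels are illustrative):

```python
# Tally TP, TN, FP, FN from actual and predicted binary labels (1 = positive)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(tp, tn, fp, fn)  # 3 3 1 1
```

Every sample falls into exactly one of the four cells, so the counts always sum to the number of samples.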

Performance Metrics 

The confusion matrix is not only used for finding errors in prediction but is also useful for deriving important performance metrics such as accuracy, recall, precision, and F-measure. We will discuss these one by one.


Accuracy

As the name suggests, this metric measures how accurately our classifier predicts results.

It is defined as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

A 99% accuracy can be good, average, poor or dreadful depending upon the problem.
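For example, on a heavily imbalanced dataset even a degenerate classifier can score 99% accuracy while missing every positive case (a hypothetical sketch):

```python
# 1000 samples with only 10 positives; a classifier that always predicts 0
actual = [1] * 10 + [0] * 990
predicted = [0] * 1000

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy)  # 0.99, yet every positive case was missed
```

This is why accuracy alone is not enough and we also need precision and recall.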


Precision

Precision is the fraction of predicted positive values that are actually positive.

It is defined as:

Precision = TP / (TP + FP)


Recall

Recall is the fraction of actual positive values that are predicted correctly.

It is defined as:

Recall = TP / (TP + FN)

A high value of recall indicates that the positive class is identified correctly (because the number of false negatives is small).


F-measure

It is hard to compare classification models where one has low precision and high recall, or vice versa. So, for comparing two classifier models we use the F-measure. The F-score combines precision and recall into a single value on the same interval, using the harmonic mean instead of the arithmetic mean.

F-measure is defined as:

F-measure = 2 * Recall * Precision / (Recall + Precision)

The F-measure is always closer to the smaller of precision and recall.
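The four metrics, and the harmonic-mean behaviour just described, can be checked with a small helper (the counts are chosen purely for illustration):

```python
def metrics_from_counts(tp, tn, fp, fn):
    # Compute the four performance metrics directly from the confusion-matrix cells
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

acc, prec, rec, f1 = metrics_from_counts(tp=80, tn=90, fp=20, fn=10)
print(round(prec, 3), round(rec, 3), round(f1, 3))
# F-measure (~0.842) lies between precision (0.8) and recall (~0.889), nearer the smaller one
```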

Calculation of 2-class confusion matrix

Let us derive a confusion matrix and interpret the result using simple mathematics.

Let us consider the actual and predicted values of y as given below:

Actual y    Predicted probability    Predicted y (threshold 0.5)
1           0.7                      1
0           0.1                      0
0           0.6                      1
1           0.4                      0
0           0.2                      0
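The third column comes from thresholding the predicted probabilities at 0.5, which can be reproduced in one line with NumPy:

```python
import numpy as np

y_prob = np.array([0.7, 0.1, 0.6, 0.4, 0.2])   # predicted probabilities from the table
y_pred = (y_prob >= 0.5).astype(int)           # label 1 when probability >= 0.5
print(y_pred)  # [1 0 1 0 0]
```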

Now, if we make a confusion matrix from this, it would look like:

N = 5       Predicted 1    Predicted 0
Actual 1    1 (TP)         1 (FN)
Actual 0    1 (FP)         2 (TN)

This is our derived confusion matrix, in which all four terms defined above appear. Now we will compute all the performance metrics from it.


Accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
         = (1 + 2) / (1 + 2 + 1 + 1)
         = 3/5 = 60%

So, the accuracy from the above confusion matrix is 60%.


Precision:

Precision = TP / (TP + FP)
          = 1 / (1 + 1)
          = 1/2 = 50%

So, the precision is 50%.


Recall:

Recall = TP / (TP + FN)
       = 1 / (1 + 1)
       = 1/2 = 50%

So, the Recall is 50%.


F-measure:

F-measure = 2 * Recall * Precision / (Recall + Precision)
          = 2 * 0.5 * 0.5 / (0.5 + 0.5)
          = 0.5 = 50%

So, the F-measure is 50%.
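The hand calculation above can be cross-checked with scikit-learn. Note that sklearn orders the matrix by sorted label, so class 0 comes first (rows = actual, columns = predicted):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 0, 1, 0]   # actual y from the table above
y_pred = [1, 0, 1, 0, 0]   # predicted y with threshold 0.5

print(confusion_matrix(y_true, y_pred))  # [[2 1]   TN FP
                                         #  [1 1]]  FN TP
print(accuracy_score(y_true, y_pred),    # 0.6
      precision_score(y_true, y_pred),   # 0.5
      recall_score(y_true, y_pred),      # 0.5
      f1_score(y_true, y_pred))          # 0.5
```

All four values agree with the manual derivation.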

Confusion Matrix in Python

In this section, we will derive all the performance metrics for a confusion matrix using Python.

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt                      # Importing the required libraries
import seaborn as sns
%matplotlib inline
df=pd.read_csv("bank.csv", delimiter=";",header='infer')
df.columns   # Columns in the dataset
df.shape           # There are 4521 rows and 17 columns in the data           # Checking the info of the data
df.dtypes        # Checking the data types of variables in data
df.describe()              # Summary statistics of numerical columns in data
df.isnull().sum()          # Checking the missing value in data. We can see that there is no missing value in data.
df.corr()                    # Correlation matrix
sns.heatmap(df.corr())         # Visualization of Correlation matrix Using heatmap

As we can see, no single feature is strongly correlated with the class on its own, so a combination of features is required.

sns.countplot(y='job', data= df)
sns.countplot(x='marital', data= df)
sns.countplot(x='y', data= df)
from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn import metrics                                       
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

Sklearn offers an effective way to encode the classes of a categorical variable into numeric format: LabelEncoder encodes classes with values between 0 and n_classes - 1.
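For instance (a standalone sketch with illustrative values; the sorted-class behaviour is how LabelEncoder actually works):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
encoded = le.fit_transform(['no', 'yes', 'unknown', 'no'])
print(list(le.classes_))  # classes are sorted alphabetically: ['no', 'unknown', 'yes']
print(list(encoded))      # [0, 2, 1, 0]
```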

le = preprocessing.LabelEncoder()
df.job = le.fit_transform(df.job)
df.marital = le.fit_transform(df.marital)
df.education = le.fit_transform(
df.default = le.fit_transform(df.default)
df.housing = le.fit_transform(df.housing)
df.loan = le.fit_transform(df.loan)
df.contact = le.fit_transform(
df.month = le.fit_transform(df.month)
df.poutcome = le.fit_transform(df.poutcome)
df.y = le.fit_transform(df.y)
X = df.drop(["y"], axis=1)
y = df["y"]        # X consists of all independent variables and y holds the dependent variable

Train and Test split

Now, we will split the data into training and testing sets. We will train the model on the training data and test its performance on the test data, which is unseen by the model.

Here, we split the data into train and test in a 70:30 ratio.

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
model_log = LogisticRegression(max_iter=1000, random_state=42)
model_log.fit(X_train, y_train)
pred = model_log.predict(X_test)
accuracy_score(y_test, pred)
confusion_matrix(y_test, pred)
[[1175   30]
 [ 121   31]]
print(classification_report(y_test, pred))
              precision    recall  f1-score   support

           0       0.91      0.98      0.94      1205
           1       0.51      0.20      0.29       152

    accuracy                           0.89      1357
   macro avg       0.71      0.59      0.62      1357
weighted avg       0.86      0.89      0.87      1357
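The class-1 row of the report can be recomputed by hand from the printed confusion matrix, which is a good way to internalise what the report means:

```python
# Cells of the matrix printed above: rows = actual (0, 1), columns = predicted (0, 1)
tn, fp = 1175, 30
fn, tp = 121, 31

precision_1 = tp / (tp + fp)                    # 31/61  ~ 0.51
recall_1 = tp / (tp + fn)                       # 31/152 ~ 0.20
accuracy = (tp + tn) / (tp + tn + fp + fn)      # ~ 0.89
print(round(precision_1, 2), round(recall_1, 2), round(accuracy, 2))
```

The low recall for class 1 shows the model misses most positive cases even though overall accuracy looks high.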

Confusion Matrix in R

In this section, we build a logistic regression model in R on the adult income dataset (predicting whether income exceeds 50K); the data is assumed to be already loaded into df.

library(dplyr)
library(ggplot2)
library(DataExplorer)
colSums(    # Checking column-wise whether there is any missing value

Changing '?' into a new category 'Missing':

df$workclass = ifelse(df$workclass == '?', 'Missing', as.character(df$workclass))
df$workclass = as.factor(df$workclass)
df$occupation = ifelse(df$occupation == '?', 'Missing', as.character(df$occupation))
df$occupation = as.factor(df$occupation)
df$ = ifelse(df$ == '?', 'Missing', as.character(df$
df$ = as.factor(df$

Creating a new column target based on the income column:

df$target = ifelse(df$income == '>50K', 1, 0)
df$target = as.factor(df$target)

For checking outliers:

boxplot(df$capital.gain)
head(sort(df$capital.gain, decreasing = T), 10)
boxplot(df$capital.loss)
boxplot(df$hours.per.week)

Changing the age column into 3 categories:

# 'Young' and the cutoff 30 survive from the original; the other labels and the upper cutoff are assumed
df$age = ifelse(df$age < 30, 'Young', ifelse(df$age >= 30 & df$age <= 60, 'Middle', 'Old'))
df$age = as.factor(df$age)

Splitting data into test and train:

set.seed(1000)
index = sample(nrow(df), 0.70*nrow(df), replace = F)
train = df[index,]
test = df[-index,]

Applying logistic regression:

# Initial full model (assumed; the original line was lost); income is excluded since target is derived from it
mod = glm(target ~ . - income, family = 'binomial', data = train)
step(mod, direction = 'both')

2nd iteration, based on the function call given by the step function:

mod1 = glm(formula = target ~ age + workclass + fnlwgt + education +
             marital.status + occupation + relationship + race + sex +
             capital.gain + capital.loss + hours.per.week +,
           family = "binomial", data = train)

Changing significant categorical variable levels into dummies:

train$age_Young_d = ifelse (train$age== 'Young', 1, 0)
test$age_Young_d = ifelse (test$age== 'Young', 1, 0)

train$workclassLocalgov_d = ifelse (train$workclass== 'Local-gov', 1, 0)
test$workclassLocalgov_d = ifelse (test$workclass== 'Local-gov', 1, 0)

train$workclassMissing_d = ifelse (train$workclass== 'Missing', 1, 0)
test$workclassMissing_d = ifelse (test$workclass== 'Missing', 1, 0)

test$workclassPrivate_d = ifelse (test$workclass== 'Private', 1, 0)
train$workclassPrivate_d = ifelse (train$workclass== 'Private', 1, 0)

train$workclassSelfempnotinc_d = ifelse (train$workclass== 'Self-emp-not-inc', 1, 0)
test$workclassSelfempnotinc_d = ifelse (test$workclass== 'Self-emp-not-inc', 1, 0)

test$workclassSelfempinc_d = ifelse (test$workclass== 'Self-emp-inc', 1, 0)
train$workclassSelfempinc_d = ifelse (train$workclass== 'Self-emp-inc', 1, 0)

train$workclassStategov_d = ifelse (train$workclass== 'State-gov', 1, 0)
test$workclassStategov_d = ifelse (test$workclass== 'State-gov', 1, 0)

train$education1st_4th_d = ifelse (train$education== '1st-4th', 1, 0)
test$education1st_4th_d = ifelse (test$education== '1st-4th', 1, 0)
train$educationAssocacdm_d = ifelse (train$education== 'Assoc-acdm', 1, 0)
test$educationAssocacdm_d = ifelse (test$education== 'Assoc-acdm', 1, 0)

train$educationAssocvoc_d = ifelse (train$education== 'Assoc-voc', 1, 0)
test$educationAssocvoc_d = ifelse (test$education== 'Assoc-voc',1, 0)

train$educationBachelors_d = ifelse (train$education== 'Bachelors', 1, 0)
test$educationBachelors_d = ifelse (test$education== 'Bachelors', 1, 0)

train$educationDoctorate_d = ifelse (train$education== 'Doctorate', 1, 0)
test$educationDoctorate_d = ifelse (test$education== 'Doctorate', 1, 0)

train$educationHSgrad_d = ifelse (train$education== 'HS-grad', 1, 0)
test$educationHSgrad_d = ifelse (test$education== 'HS-grad', 1, 0)

train$educationMasters_d = ifelse (train$education== 'Masters', 1, 0)
test$educationMasters_d = ifelse (test$education=='Masters', 1, 0)

train$educationProfschool_d = ifelse (train$education== 'Prof-school', 1, 0)
test$educationProfschool_d = ifelse (test$education== 'Prof-school', 1, 0)
train$educationSomecollege_d = ifelse (train$education== 'Some-college', 1, 0)
test$educationSomecollege_d = ifelse (test$education== 'Some-college', 1, 0)
train$marital.statusMarriedAFspouse_d = ifelse (train$marital.status== 'Married-AF-spouse',1,0)
test$marital.statusMarriedAFspouse_d = ifelse (test$marital.status== 'Married-AF-spouse',1,0)
train$marital.statusMarriedcivspouse_d = ifelse (train$marital.status== 'Married-civ-spouse',1,0)
test$marital.statusMarriedcivspouse_d = ifelse (test$marital.status== 'Married-civ-spouse',1,0)
train$marital.statusNevermarried_d = ifelse (train$marital.status== 'Never-married', 1, 0)
test$marital.statusNevermarried_d = ifelse (test$marital.status== 'Never-married', 1, 0)
train$marital.statusWidowed_d = ifelse (train$marital.status== 'Widowed', 1, 0)
test$marital.statusWidowed_d = ifelse (test$marital.status== 'Widowed', 1, 0)
train$occupationExecmanagerial_d = ifelse (train$occupation== 'Exec-managerial', 1, 0)
test$occupationExecmanagerial_d = ifelse (test$occupation== 'Exec-managerial', 1,0)
train$occupationFarmingfishing_d = ifelse (train$occupation== 'Farming-fishing', 1, 0)
test$occupationFarmingfishing_d = ifelse (test$occupation== 'Farming-fishing', 1, 0)
train$occupationHandlerscleaners_d = ifelse (train$occupation== 'Handlers-cleaners', 1, 0)
test$occupationHandlerscleaners_d = ifelse (test$occupation== 'Handlers-cleaners', 1, 0)
train$occupationMachineopinspct_d = ifelse (train$occupation== 'Machine-op-inspct', 1, 0)
test$occupationMachineopinspct_d = ifelse (test$occupation== 'Machine-op-inspct', 1, 0)

train$occupationOtherservice_d = ifelse (train$occupation== 'Other-service', 1, 0)
test$occupationOtherservice_d = ifelse (test$occupation== 'Other-service', 1, 0)
train$occupationProfspecialty_d = ifelse (train$occupation== 'Prof-specialty', 1, 0)
test$occupationProfspecialty_d = ifelse (test$occupation== 'Prof-specialty', 1, 0)
train$occupationProtectiveserv_d = ifelse (train$occupation== 'Protective-serv', 1, 0)
test$occupationProtectiveserv_d = ifelse (test$occupation== 'Protective-serv', 1, 0)
train$occupationSales_d = ifelse (train$occupation== 'Sales', 1, 0)
test$occupationSales_d = ifelse (test$occupation== 'Sales', 1, 0)
train$occupationTechsupport_d = ifelse (train$occupation== 'Tech-support', 1, 0)
test$occupationTechsupport_d = ifelse (test$occupation== 'Tech-support', 1, 0)
train$relationshipOwnchild_d = ifelse (train$relationship== 'Own-child', 1, 0)
test$relationshipOwnchild_d = ifelse (test$relationship== 'Own-child', 1, 0)
train$relationshipWife_d = ifelse (train$relationship== 'Wife', 1, 0)
test$relationshipWife_d = ifelse (test$relationship== 'Wife', 1, 0)
train$raceAsianPacIslander_d=ifelse (train$race=='Asian-Pac-Islander', 1, 0)
test$raceAsianPacIslander_d=ifelse (test$race=='Asian-Pac-Islander', 1, 0)
train$raceWhite_d=ifelse (train$race== 'White', 1, 0)
test$raceWhite_d=ifelse (test$race=='White',1, 0)
train$ = ifelse(train$ == 'Columbia', 1, 0)
test$ = ifelse(test$ == 'Columbia', 1, 0)

train$ = ifelse(train$ == 'South', 1, 0)
test$ = ifelse(test$ == 'South', 1, 0)

3rd iteration, using the significant dummy variables:

mod2 = glm(formula = target ~ age_Young_d + workclassLocalgov_d + workclassMissing_d + workclassPrivate_d +
             educationHSgrad_d + educationMasters_d + educationProfschool_d + educationSomecollege_d +
             marital.statusWidowed_d + marital.statusMarriedAFspouse_d + marital.statusNevermarried_d +
             marital.statusMarriedcivspouse_d + sex + capital.gain + capital.loss + hours.per.week +
    + +,
           data = train, family = 'binomial')

Again, we get some insignificant variables; to remove those:

mod3 = glm(formula = target ~ age_Young_d + workclassLocalgov_d + workclassMissing_d + workclassPrivate_d +
             workclassSelfempinc_d + workclassSelfempnotinc_d + workclassStategov_d + fnlwgt +
             education1st_4th_d + educationAssocacdm_d + educationAssocvoc_d + educationBachelors_d +
             educationDoctorate_d + educationHSgrad_d + educationMasters_d + educationProfschool_d +
             educationSomecollege_d + marital.statusWidowed_d + marital.statusMarriedAFspouse_d +
             marital.statusNevermarried_d + marital.statusMarriedcivspouse_d + occupationExecmanagerial_d +
             occupationFarmingfishing_d + occupationHandlerscleaners_d + occupationMachineopinspct_d +
             occupationOtherservice_d + occupationProfspecialty_d + occupationProtectiveserv_d +
             occupationSales_d + occupationTechsupport_d + relationshipWife_d + relationshipOwnchild_d +
             raceWhite_d + sex + capital.gain + capital.loss + hours.per.week +
    + +,
           data = train, family = 'binomial')
# Checking the VIF value for this model to test for multicollinearity, e.g. car::vif(mod3)
# Now all variables are significant and the VIF values are acceptable, so model mod3 is finalised
# Taking top 5 factors most influencing the target variable
head(sort(abs(mod3$coefficients), decreasing = T),6)

Model Validation

# Predicting probabilities on the test set (the predict() line is assumed; the original was lost)
pred.prob = predict(mod3, newdata = test, type = 'response')
pred = ifelse(pred.prob >= 0.24, 1, 0)
pred = as.factor(pred)

The confusion matrix (from the caret package) is used for checking the model's accuracy:

library(caret)
confusionMatrix(pred, test$target, positive = "1")
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 5934  374
         1 1477 1984

               Accuracy : 0.8105
                 95% CI : (0.8026, 0.8183)
    No Information Rate : 0.7586
    P-Value [Acc > NIR] :
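The Accuracy and No Information Rate reported by caret follow directly from the four cell counts; a quick Python cross-check of the output above:

```python
# caret's table: rows = predicted, columns = actual (reference)
pred0_act0, pred0_act1 = 5934, 374
pred1_act0, pred1_act1 = 1477, 1984

total = pred0_act0 + pred0_act1 + pred1_act0 + pred1_act1
accuracy = (pred0_act0 + pred1_act1) / total                         # correct predictions on the diagonal
nir = max(pred0_act0 + pred1_act0, pred0_act1 + pred1_act1) / total  # majority-class share
print(round(accuracy, 4), round(nir, 4))  # 0.8105 0.7586
```

Since accuracy (0.8105) only modestly beats the no-information rate (0.7586), the model is doing better than always guessing the majority class, but not by a wide margin.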

In this article, we covered what a confusion matrix is, why it is needed, and how to derive it in Python and R. If you wish to learn more about the confusion matrix and other concepts of machine learning, upskill with Great Learning's PG program in Machine Learning.
