Project - Credit Card Fraud Detection using Machine Learning

Get Confusion matrix and Recall

Let us predict the labels for train and test data, get the confusion matrix, and calculate the recall values.

Note:

  • confusion_matrix (from sklearn.metrics): computes the confusion matrix, which is used to evaluate the accuracy of a classification.

    • By definition, a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i and predicted to be in group j.

    • Thus, in binary classification, the count of true negatives is C[0, 0], false negatives is C[1, 0], true positives is C[1, 1], and false positives is C[0, 1].

  • recall is calculated as (true positives)/(true positives + false negatives). We calculate recall because the goal is to detect fraudulent credit card transactions: it might be tolerable to classify some valid transactions as fraudulent, but it is not tolerable to misclassify fraudulent transactions as valid.
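
To make the definitions above concrete, here is a minimal sketch in plain Python that builds a 2x2 confusion matrix by hand and derives recall from it. The helper names (confusion_counts, recall_from_matrix) and the toy labels are assumptions made for illustration; they are not part of the project code, which uses sklearn.metrics.confusion_matrix instead.

```python
def confusion_counts(y_true, y_pred, n_classes=2):
    """Build C so that C[i][j] counts samples with true label i predicted as j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def recall_from_matrix(matrix):
    """Recall = TP / (TP + FN), read from row 1 (the actual positives)."""
    true_positives = matrix[1][1]
    false_negatives = matrix[1][0]
    total_positives = true_positives + false_negatives
    return true_positives / total_positives if total_positives else 0.0


# Hypothetical toy labels: 1 = fraudulent, 0 = valid.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1]

C = confusion_counts(y_true, y_pred)
recall = recall_from_matrix(C)

print("Confusion matrix:", C)
print("Recall:", recall)
```

In this toy data, one of the four fraudulent transactions is predicted as valid, so it lands in C[1][0] and lowers the recall; the one valid transaction flagged as fraudulent lands in C[0][1] and does not affect recall at all.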

INSTRUCTIONS
  • Store the best estimator found by GridSearchCV in lr_gridcv_best.

    lr_gridcv_best = clf.best_estimator_
    
  • Use the predict method of lr_gridcv_best on X_test and store the predictions in y_test_pre.

    y_test_pre = lr_gridcv_best.<< your code comes here >>(X_test)
    
  • Call the confusion_matrix function imported from sklearn.metrics. Pass y_test, y_test_pre as arguments.

    cnf_matrix_test = << your code comes here >>(y_test, y_test_pre)
    
  • Calculate the recall for test data predictions by the best model.

    print("Recall metric in the test dataset:", (cnf_matrix_test[1,1]/(cnf_matrix_test[1,0]+cnf_matrix_test[1,1] )))
    
  • Use the predict method of lr_gridcv_best on X_train_res and store the predictions in y_train_pre.

    y_train_pre = lr_gridcv_best.<< your code comes here >>(X_train_res)
    
  • Call the confusion_matrix function imported from sklearn.metrics. Pass y_train_res, y_train_pre as arguments.

    cnf_matrix_train = << your code comes here >>(y_train_res, y_train_pre)
    
  • Calculate the recall for resampled train data predictions by the best model.

    print("Recall metric in the train dataset:", (cnf_matrix_train[1,1]/(cnf_matrix_train[1,0]+cnf_matrix_train[1,1] )))
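
As a sanity check on the manual recall formula used in the steps above, the following sketch compares it against sklearn.metrics.recall_score. The toy arrays are assumptions standing in for y_test and y_test_pre; this is an optional cross-check under those assumptions, not a step of the exercise.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical stand-ins for y_test and y_test_pre (1 = fraudulent, 0 = valid).
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Same expression as in the exercise: C[1,1] / (C[1,0] + C[1,1]).
manual_recall = cm[1, 1] / (cm[1, 0] + cm[1, 1])

# recall_score computes recall for the positive class (label 1) by default.
library_recall = recall_score(y_true, y_pred)

print("Manual recall: ", manual_recall)
print("Library recall:", library_recall)
```

If the two values agree, the indices used to read the matrix match scikit-learn's layout.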
    