Project - Credit Card Fraud Detection using Machine Learning


Exploring the Class Column

Let us now explore the data in a little more depth.

  1. Let us first divide the data into features and labels.
  2. Then we shall calculate the percentage of fraudulent and valid transactions in the dataset and represent it graphically.


  • df.loc of pandas is used to access a group of rows and columns of data frame df by label(s) or a boolean array. .loc[] is primarily label based, but may also be used with a boolean array.

  • value_counts() of a pandas Series (for example, data['Class'].value_counts()) returns a Series containing the counts of unique values. The result is sorted in descending order, so the first element is the most frequently occurring value. NA values are excluded by default.

  • round() function of Python returns a floating-point number that is a rounded version of the specified number, with the specified number of decimals.
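
Before applying these to the real dataset, let us try the three calls on a tiny example. This is only a sketch: the DataFrame `toy` and its `Amount` values are made up for illustration and are not part of the project data.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "Amount": [12.5, 80.0, 3.2, 45.9, 150.0],
    "Class": [0, 0, 1, 0, 0],
})

# .loc with a boolean mask over the column labels:
# keep every row, and every column except 'Class'
features = toy.loc[:, toy.columns != "Class"]

# Count each distinct label; the most frequent label comes first
counts = toy["Class"].value_counts()

# Share of each class in percent, rounded to two decimals
valid_pct = round(counts.loc[0] / len(toy) * 100, 2)
fraud_pct = round(counts.loc[1] / len(toy) * 100, 2)

print(list(features.columns))
print(counts.to_dict())
print(valid_pct, fraud_pct)
```

Here `counts.loc[0]` looks up the count by the label 0 (the class value), not by position.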

  • Store all the rows with all columns except the "Class" column into X, the feature set.

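    X = data.loc[:, data.columns != 'Class']

  • Store the "Class" values into y, the label set. Because the columns are selected with a boolean mask, y is a single-column DataFrame rather than a Series.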

    y = data.loc[:, data.columns == 'Class']

  • Print the value counts of frauds and non-frauds in the data using value_counts() on data['Class'].

    print(data['Class'].<< your code comes here >>)

    Observe that non-fraudulent transactions outnumber fraudulent ones. The value_counts() method returned them in decreasing order of counts.

  • Calculate the percentage of Fraud and Non-fraud transactions.

    print('Valid Transactions: ', round(data['Class'].value_counts()[0]/len(data) * 100,2), '% of the dataset')
    print('Fraudulent Transactions: ', round(data['Class'].value_counts()[1]/len(data) * 100,2), '% of the dataset')

    We observe that the classes are very highly imbalanced.
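
The steps above mention representing the percentages graphically. One possible way to do this is a simple bar chart with matplotlib; this is a sketch, not necessarily the plot the project expects. The small `data` frame built here is a hypothetical stand-in so that the snippet runs on its own; with the real dataset loaded, skip that line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the project's `data` frame
data = pd.DataFrame({"Class": [0] * 98 + [1] * 2})

# normalize=True returns fractions that sum to 1; scale them to percentages
percentages = (data["Class"].value_counts(normalize=True) * 100).round(2)

# Map the 0/1 class values to readable names for the x-axis
names = {0: "Valid", 1: "Fraudulent"}
labels = [names[c] for c in percentages.index]

fig, ax = plt.subplots()
ax.bar(labels, percentages.values)
ax.set_ylabel("Share of transactions (%)")
ax.set_title("Class distribution")
# In a notebook, drop the matplotlib.use("Agg") line and the figure is displayed inline;
# in a script, call fig.savefig(...) to write it to a file.
```

Using `value_counts(normalize=True)` gives the same percentages as dividing each count by `len(data)`, in a single call.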
