Let us now explore the data in a bit more depth.
Note:
df.loc of pandas is used to access a group of rows and columns of the DataFrame df by label(s) or a boolean array. .loc[] is primarily label based, but may also be used with a boolean array.
df.value_counts() of pandas returns a Series containing counts of unique values. The resulting object is in descending order, so the first element is the most frequently occurring element. NA values are excluded by default.
The round() function of Python returns a floating-point number that is a rounded version of the specified number, with the specified number of decimals.
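For concreteness, here is a minimal, self-contained sketch on a toy DataFrame (not the credit-card dataset; the values and column names are made up) showing how these three calls behave:

import pandas as pd

# Toy DataFrame for illustration only; values and column names are hypothetical.
df = pd.DataFrame({'Amount': [10.0, 250.5, 3.2, 99.9],
                   'Class':  [0, 1, 0, 0]})

# df.loc with a boolean array over the columns: every column except 'Class'.
features = df.loc[:, df.columns != 'Class']
print(features.columns.tolist())        # ['Amount']

# value_counts(): counts of unique values, most frequent first, NA excluded.
counts = df['Class'].value_counts()
print(counts)                           # class 0 -> 3, class 1 -> 1

# round(): percentage of the majority class, rounded to 2 decimals.
print(round(counts[0] / len(df) * 100, 2), '% of the toy data')   # 75.0 %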
Store all the rows with all columns except the "Class" column into X, the feature set.
X = data.loc[:, data.columns != 'Class']
Store "Class" values into Y
, the label set.
y = data.loc[:, data.columns == 'Class']
Print the value counts of frauds and non-frauds in the data using value_counts() on data['Class'].
print(data['Class'].<< your code comes here >>)
Observe that there are far more non-fraud transactions than fraudulent ones. The value_counts() method returned them in decreasing order of counts.
Calculate the percentage of Fraud and Non-fraud transactions.
print('Valid Transactions: ', round(data['Class'].value_counts()[0]/len(data) * 100,2), '% of the dataset')
print('Fraudulent Transactions: ', round(data['Class'].value_counts()[1]/len(data) * 100,2), '% of the dataset')
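As a side note, the same proportions can be obtained a bit more directly with value_counts(normalize=True); this is just an equivalent sketch, assuming the same data DataFrame:

# Equivalent sketch: normalize=True returns relative frequencies instead of counts.
pct = data['Class'].value_counts(normalize=True) * 100
print('Valid Transactions: ', round(pct[0], 2), '% of the dataset')
print('Fraudulent Transactions: ', round(pct[1], 2), '% of the dataset')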
We observe that there is a very high class imbalance in the data.
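To put a number on the imbalance, one could compute the ratio of the majority to the minority class (a small sketch, again assuming the same data DataFrame):

# Sketch: ratio of non-fraud (class 0) to fraud (class 1) transactions.
counts = data['Class'].value_counts()
print('Imbalance ratio (non-fraud : fraud) = {:.0f} : 1'.format(counts[0] / counts[1]))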