Let us now explore the data a bit more deeply.
df.loc of pandas is used to access a group of rows and columns of a data frame df by label(s) or a boolean array.
.loc is primarily label based, but it may also be used with a boolean array.
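As a rough illustration of both access styles, here is a minimal sketch on a small made-up frame. The name `toy`, its index labels, and its values are hypothetical and are not part of the course dataset.

```python
import pandas as pd

# Hypothetical toy frame (not the course dataset).
toy = pd.DataFrame(
    {"Amount": [10.0, 250.0, 3.5], "Class": [0, 1, 0]},
    index=["a", "b", "c"],
)

# Label-based selection: rows "a" through "b" (both ends included), column "Amount".
by_label = toy.loc["a":"b", "Amount"]

# Boolean-array selection on rows: keep only the rows where Class equals 1.
by_mask = toy.loc[toy["Class"] == 1]

# A boolean array can select columns too: every column except "Class".
features = toy.loc[:, toy.columns != "Class"]

print(by_label)
print(by_mask)
print(features)
```

Note that label slices with .loc include the end label, unlike ordinary Python slicing.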
value_counts() of pandas returns a Series containing counts of unique values.
The resulting object is in descending order, so the first element is the most frequently occurring one. NA values are excluded by default.
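The behaviour described above can be sketched on a small made-up Series. The name `labels` and its values are hypothetical, chosen only to show the ordering and the handling of missing values.

```python
import pandas as pd

# Hypothetical labels with one missing value (not the course dataset).
labels = pd.Series([0, 0, 1, 0, None, 1, 0])

# Default call: counts in descending order, missing value excluded.
counts = labels.value_counts()

# dropna=False also counts the missing value.
counts_with_na = labels.value_counts(dropna=False)

print(counts)
print(counts_with_na)
```

Because of the missing value, pandas stores these labels as floats, so the index of the result shows 0.0 and 1.0.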
The round() function of Python returns a rounded version of the specified number, with the specified number of decimals. If the number of decimals is omitted, it returns the nearest integer.
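A quick illustration of round() with and without the number of decimals follows. The input values are arbitrary examples.

```python
# Two decimals requested: the result is a float.
two_decimals = round(99.8273, 2)

# No decimals requested: the result is an int.
nearest_int = round(7.6)

print(two_decimals)  # 99.83
print(nearest_int)   # 8
```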
Store all the rows with all columns except the "Class" column into X, the feature set.
X = data.loc[:, data.columns != 'Class']
Store "Class" values into y, the label set.
y = data.loc[:, data.columns == 'Class']
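To see what the two selections produce, here is a minimal sketch on a made-up stand-in for `data`. The column names `V1` and `Amount` and all values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the course's `data` frame.
data = pd.DataFrame(
    {"V1": [0.1, -1.2, 0.7], "Amount": [10.0, 250.0, 3.5], "Class": [0, 1, 0]}
)

# Boolean mask over the column labels: True for every column except "Class".
X = data.loc[:, data.columns != "Class"]

# The opposite mask keeps only "Class"; the result is a one-column DataFrame.
y = data.loc[:, data.columns == "Class"]

# For comparison, plain column access returns a one-dimensional Series.
y_series = data["Class"]

print(X.shape)         # (3, 2)
print(y.shape)         # (3, 1)
print(y_series.shape)  # (3,)
```

Selecting with a boolean mask keeps y two-dimensional. Whether a DataFrame or a Series is more convenient depends on what later steps expect.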
Print the value counts of frauds and non-frauds in the data.
print(data['Class'].<< your code comes here >>)
Observe that there are more non-fraud transactions than fraudulent transactions.
The value_counts() method returned them in decreasing order of counts.
Calculate the percentage of Fraud and Non-fraud transactions.
print('Valid Transactions: ', round(data['Class'].value_counts()[0]/len(data) * 100,2), '% of the dataset')
print('Fraudulent Transactions: ', round(data['Class'].value_counts()[1]/len(data) * 100,2), '% of the dataset')
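The same percentages can also be obtained in one step with the normalize=True option of value_counts(), which returns fractions instead of raw counts. The sketch below uses a made-up stand-in for `data` with 8 valid rows and 2 frauds; the real dataset will give different numbers.

```python
import pandas as pd

# Hypothetical stand-in for the course's `data` frame: 8 valid rows, 2 frauds.
data = pd.DataFrame({"Class": [0] * 8 + [1] * 2})

# Fractions per class, scaled to percentages and rounded to 2 decimals.
percentages = (data["Class"].value_counts(normalize=True) * 100).round(2)

# Look up each class by its label (0 = valid, 1 = fraud).
print("Valid Transactions: ", percentages.loc[0], "% of the dataset")
print("Fraudulent Transactions: ", percentages.loc[1], "% of the dataset")
```

Looking up by class label with .loc avoids relying on the order in which value_counts() happens to return the classes.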
We observe that there is a very high class-imbalance.