If the outliers are few in number, or were caused by data-entry or data-processing errors, then we can delete the outlier values. We can also trim both ends of the distribution to remove outliers. However, we must always remember that deleting observations is not a good idea when we have a small dataset.
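Trimming at both ends can be sketched with pandas quantiles. The helper name `trim_both_ends` and the 10th/90th percentile cut-offs are illustrative assumptions, not part of this exercise. Trimming drops the lowest and highest values whether or not they are outliers: on the ages used below, the ordinary age 18 is removed along with 99.

```python
import pandas as pd

def trim_both_ends(df, field_name, lower_q=0.10, upper_q=0.90):
    """Hypothetical helper: keep only rows whose value lies between two quantiles."""
    low = df[field_name].quantile(lower_q)    # 18.5 for the ages below
    high = df[field_name].quantile(upper_q)   # 61.0 for the ages below
    # Series.between is inclusive at both ends by default
    return df[df[field_name].between(low, high)]

ages = pd.DataFrame({'Age': [20, 21, 19, 99, 23, 18]})
trimmed = trim_both_ends(ages, 'Age')
print(trimmed['Age'].tolist())  # [20, 21, 19, 23]
```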
Here, we will first observe a dataset with outliers. Next, we will delete those outliers and then visualize the dataset once again.
We will start by importing Pandas as pd, NumPy as np, Seaborn as sns, and Matplotlib's Pyplot as plt.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Next, we will define a DataFrame containing the names and ages of six different people.
data = {'Name':['Tom', 'Dick', 'Harry', 'Jack', 'Alex', 'Mike'],
'Age':[20, 21, 19, 99, 23, 18]}
train = pd.DataFrame(data)
Now let's draw a box plot of the Age column and observe the outliers.
sns.boxplot(x=train['Age'])
plt.title("Box plot before outlier removal")
plt.show()
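The point the box plot draws for 99 comes from a 1.5 × IQR rule: Seaborn's `boxplot` defaults to `whis=1.5`, and Matplotlib's `cbook.boxplot_stats` returns the numbers behind a box plot. This minimal sketch inspects them for the Age values; it is an optional check, not part of the original exercise.

```python
import numpy as np
from matplotlib import cbook

ages = np.array([20, 21, 19, 99, 23, 18])

# boxplot_stats returns one dict of statistics per dataset; we pass a single one
stats = cbook.boxplot_stats(ages, whis=1.5)[0]

# Whiskers stop at the most extreme values inside the 1.5 * IQR fences (18 and 23)
print(stats['whislo'], stats['whishi'])
# Values beyond the whiskers are drawn as individual points (fliers)
print(stats['fliers'])  # [99]
```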
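Next, we will define a function, drop_outliers, which takes a DataFrame and a column name, checks that column for outliers using the IQR (interquartile range) method, that is, values more than 1.5 × IQR below the first quartile or above the third quartile, and drops those rows from the DataFrame in place.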
def drop_outliers(df, field_name):
    # Compute both quartiles once, before any rows are dropped
    q1 = np.percentile(df[field_name], 25)
    q3 = np.percentile(df[field_name], 75)
    fence = 1.5 * (q3 - q1)
    # Drop rows above the upper fence or below the lower fence, modifying df in place
    outliers = df[(df[field_name] > q3 + fence) | (df[field_name] < q1 - fence)].index
    df.drop(outliers, inplace=True)
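For this dataset, the quartiles and fences of the full Age column can be worked out explicitly. The sketch assumes NumPy's default linear interpolation in `np.percentile`; other interpolation methods give slightly different quartiles.

```python
import numpy as np

ages = np.array([20, 21, 19, 99, 23, 18])

# Quartiles with NumPy's default linear interpolation
q1 = np.percentile(ages, 25)   # 19.25
q3 = np.percentile(ages, 75)   # 22.5
iqr = q3 - q1                  # 3.25

# Fences 1.5 * IQR beyond each quartile
lower_fence = q1 - 1.5 * iqr   # 14.375
upper_fence = q3 + 1.5 * iqr   # 27.375

outliers = ages[(ages < lower_fence) | (ages > upper_fence)]
print(outliers)  # [99]
```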
Now let's call this function on our dataset.
drop_outliers(train, 'Age')
We have dropped the outliers from our dataset. Now let's visualize it once again.
sns.boxplot(x=train['Age'])
plt.title("Box plot after outlier removal")
plt.show()
Observe that the outlier (Jack's age of 99) has been dropped from the dataset.
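Because `drop_outliers` passes `inplace=True`, it modifies the DataFrame it is given. If you would rather keep the original data, a variant can return a filtered copy instead. `without_outliers` and its `k` parameter are hypothetical names for this sketch; it applies the same 1.5 × IQR rule with `Series.between`, which is inclusive at both ends, so a value lying exactly on a fence is kept, as in `drop_outliers`.

```python
import numpy as np
import pandas as pd

def without_outliers(df, field_name, k=1.5):
    """Hypothetical variant: return a copy of df without IQR outliers in field_name."""
    q1 = np.percentile(df[field_name], 25)
    q3 = np.percentile(df[field_name], 75)
    fence = k * (q3 - q1)
    # Keep rows inside [q1 - fence, q3 + fence]; the input DataFrame is untouched
    return df[df[field_name].between(q1 - fence, q3 + fence)].copy()

data = {'Name': ['Tom', 'Dick', 'Harry', 'Jack', 'Alex', 'Mike'],
        'Age': [20, 21, 19, 99, 23, 18]}
train = pd.DataFrame(data)
clean = without_outliers(train, 'Age')
print(clean['Name'].tolist())  # ['Tom', 'Dick', 'Harry', 'Alex', 'Mike']
print(len(train))              # 6, the original DataFrame keeps every row
```

Returning a copy makes the cleaning step easy to compare against the raw data, at the cost of holding both DataFrames in memory.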