 # Deleting observations

If the outliers are few in number, or were caused by data entry or data processing errors, we can delete them. We can also trim both ends of the distribution to remove outliers. However, deleting observations is not a good idea when the dataset is small, because every dropped row removes a large share of the available data.
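As a rough illustration of trimming at both ends, the sketch below keeps only the values that lie between a lower and an upper quantile. The sample ages, the helper name `trim_both_ends`, and the 10% / 90% cut-offs are all assumptions made for this example; they are not part of the exercise.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
ages = pd.Series([18, 19, 20, 21, 23, 99, 2, 22, 20, 21])

def trim_both_ends(values, lower_q=0.1, upper_q=0.9):
    """Keep only the values between the two quantile cut-offs (inclusive)."""
    low = values.quantile(lower_q)
    high = values.quantile(upper_q)
    return values[(values >= low) & (values <= high)]

# The cut-offs are assumed defaults; choose them to suit the data at hand
trimmed = trim_both_ends(ages)
print(trimmed.tolist())
```

Note that trimming always discards a fixed share of the data at each end, whether or not those values are genuine outliers, so the cut-offs should be chosen with care.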

Here, we will first observe a dataset with outliers. Next, we will delete those outliers and then visualize the dataset again.

INSTRUCTIONS
• We will start by importing `Pandas` as `pd`, `Numpy` as `np`, `Seaborn` as `sns`, and `Pyplot` as `plt`

``````
import pandas as <<your code goes here>>
import numpy as <<your code goes here>>
import seaborn as <<your code goes here>>
import matplotlib.pyplot as <<your code goes here>>
``````
• Next, we will define a dataframe containing the names and ages of 6 different people

``````
data = {'Name': ['Tom', 'Dick', 'Harry', 'Jack', 'Alex', 'Mike'],
        'Age': [20, 21, 19, 99, 23, 18]}
train = pd.DataFrame(data)
``````
• Now let's plot the dataframe and observe the outliers

``````
sns.boxplot(train['Age'])
plt.title("Box plot before outlier removal")
plt.show()
``````
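• Next, we will define a function `drop_outliers` which takes a dataframe and a corresponding column name, checks for outliers using the IQR method (values more than 1.5 times the interquartile range above the 75th percentile or below the 25th percentile), and finally drops those outliers from the dataframe in place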

``````
def <<your code goes here>>(df, field_name):
    # Compute both quartiles once, before any rows are dropped
    q1 = np.percentile(df[field_name], 25)
    q3 = np.percentile(df[field_name], 75)
    whisker = 1.5 * (q3 - q1)  # 1.5 times the interquartile range
    # Drop rows above the upper fence, then rows below the lower fence
    df.drop(df[df[field_name] > (q3 + whisker)].index, inplace=True)
    df.drop(df[df[field_name] < (q1 - whisker)].index, inplace=True)
``````
• Now let's call this function with our dataset

``````
<<your code goes here>>(train, 'Age')
``````
• We have dropped the outliers from our dataset. Now let's visualize it once again

``````
sns.boxplot(train['Age'])
plt.title("Box plot after outlier removal")
plt.show()
``````

Observe that the outliers have been dropped from the dataset.
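
To see why only one value is removed, it can help to compute the IQR fences by hand. The following is a minimal sketch on the same six ages used above; unlike the exercise function, it filters with a boolean mask and leaves the original data untouched. The variable names (`whisker`, `lower_fence`, `upper_fence`, `kept`) are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

# Same ages as in the exercise dataframe
ages = pd.Series([20, 21, 19, 99, 23, 18])

# 25th and 75th percentiles (NumPy's default linear interpolation)
q1, q3 = np.percentile(ages, [25, 75])
whisker = 1.5 * (q3 - q1)

lower_fence = q1 - whisker
upper_fence = q3 + whisker

# Keep the values inside the fences; the original Series is not modified
kept = ages[ages.between(lower_fence, upper_fence)]

print(lower_fence, upper_fence)
print(kept.tolist())
```

For these six ages the quartiles are 19.25 and 22.5, so the fences fall at 14.375 and 27.375, and 99 is the only value outside them.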
