Transforming variables can also reduce the effect of outliers. The transformed values reduce the variation caused by extreme values. There are various transformation methods, including scaling, log transformation, cube root transformation, and Box-Cox transformation.
These techniques convert the values in the dataset to smaller values. If the data has too many extreme values or is skewed, these methods help to bring it closer to a normal distribution. Note that no data is lost with these methods, because every row is kept. However, these techniques do not always give the best results.
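As a rough illustration of the two claims above (values shrink, nothing is lost), the sketch below applies the log and cube root transformations to the same ages used in this exercise. The helper `value_range` is a hypothetical name introduced here for illustration only; it is not part of the exercise.

```python
import numpy as np

# Same ages as the exercise dataset.
ages = np.array([20, 21, 19, 99, 23, 18, 98], dtype=float)

log_ages = np.log(ages)
cbrt_ages = np.cbrt(ages)


def value_range(values):
    """Distance between the largest and smallest value (illustrative helper)."""
    return float(values.max() - values.min())


ranges = {
    "original": value_range(ages),
    "log": value_range(log_ages),
    "cube_root": value_range(cbrt_ages),
}
print(ranges)

# Both transforms are monotonic, so the ordering of the rows is unchanged ...
same_order = bool(
    np.array_equal(np.argsort(ages), np.argsort(log_ages))
    and np.array_equal(np.argsort(ages), np.argsort(cbrt_ages))
)
# ... and invertible, so the original values can be recovered.
recovered = np.exp(log_ages)
print(same_order, np.allclose(recovered, ages))
```

The range shrinks a lot, but the two large ages still sit apart from the rest, which is one reason these methods do not always give the best results.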
First, we import Pandas as pd, NumPy as np, Seaborn as sns, Pyplot as plt, preprocessing from scikit-learn, and scipy.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import preprocessing
import scipy.stats  # import the stats submodule explicitly so scipy.stats.boxcox is available
Next, we create a dataset
data = {'Name':['Tom', 'Dick', 'Harry', 'Jack', 'Alex', 'Mike', 'John'],
'Age':[20, 21, 19, 99, 23, 18, 98]}
orig = pd.DataFrame(data)
Now, let's plot the dataset and observe the outliers
sns.boxplot(x=orig['Age'])  # plot the original data; train is not defined yet
plt.title("Box Plot before outlier removing")
plt.show()
First, we use the scaling method for the outliers
train = orig.copy()
scaler = preprocessing.StandardScaler()
train['Age'] = scaler.fit_transform(train['Age'].values.reshape(-1,1))
sns.boxplot(train['Age'])
plt.title("Box Plot after outlier removing")
plt.show()
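To see what the scaling step does to the numbers, here is a minimal NumPy-only sketch of the formula that StandardScaler applies, z = (x - mean) / std, using the population standard deviation. It is an illustration under that assumption, not a replacement for the scikit-learn call above.

```python
import numpy as np

# Same ages as the exercise dataset.
ages = np.array([20, 21, 19, 99, 23, 18, 98], dtype=float)

# np.std defaults to ddof=0 (population standard deviation),
# which is the convention StandardScaler uses.
z_scores = (ages - ages.mean()) / ages.std()
print(z_scores.round(2))

# Scaling is a linear change of units: the mean becomes 0 and the
# standard deviation 1, but relative gaps between points are unchanged.
gap_ratio_before = (ages[3] - ages[0]) / (ages[6] - ages[0])
gap_ratio_after = (z_scores[3] - z_scores[0]) / (z_scores[6] - z_scores[0])
print(np.isclose(gap_ratio_before, gap_ratio_after))
```

Because the rescaling is linear, the largest ages remain the largest z-scores; scaling changes the units of the axis rather than the shape of the box plot.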
Next, we use the log transformation method for the outliers
train = orig.copy()
train['Age'] = np.log(train['Age'])
sns.boxplot(train['Age'])
plt.title("Box Plot after outlier removing")
plt.show()
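The log transformation works here because every age is positive. As a hedged aside, the sketch below uses a small made-up array (not the exercise data) to show what happens when a zero is present, and one common workaround, `np.log1p`, which computes log(1 + x).

```python
import numpy as np

# Hypothetical values containing a zero, for illustration only.
values = np.array([0.0, 1.0, 20.0, 99.0])

# log(0) is -inf; silence the divide-by-zero warning for the demo.
with np.errstate(divide="ignore"):
    plain_log = np.log(values)

# log1p(x) = log(1 + x) stays finite for all x >= 0.
shifted_log = np.log1p(values)

print(plain_log)
print(shifted_log)
```

Whether a shift such as log1p is appropriate depends on the data; it is shown here as one option, not as part of the exercise.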
Now, we will use the cube root transformation method
train = orig.copy()
train['Age'] = (train['Age']**(1/3))
sns.boxplot(train['Age'])
plt.title("Box Plot after outlier removing")
plt.show()
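The expression `**(1/3)` is fine for ages, which are never negative. For data that can be negative, a fractional power returns NaN in NumPy, whereas `np.cbrt` returns the real cube root. The array below is made up for illustration and is not part of the exercise.

```python
import numpy as np

# Hypothetical values including a negative number, for illustration only.
values = np.array([-8.0, 27.0, 99.0])

# A fractional power of a negative float is NaN; silence the warning for the demo.
with np.errstate(invalid="ignore"):
    power_root = values ** (1 / 3)

# np.cbrt computes the real cube root and handles negative inputs.
real_root = np.cbrt(values)

print(power_root)
print(real_root)
```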
Finally, we will use the Box-Cox transformation method to remove the outliers
train = orig.copy()
train['Age'], fitted_lambda = scipy.stats.boxcox(train['Age'], lmbda=None)
sns.boxplot(train['Age'])
plt.title("Box Plot after outlier removing")
plt.show()
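When `lmbda=None`, `boxcox` estimates the power parameter from the data and returns it alongside the transformed values. The sketch below, which assumes the same ages as the exercise, prints that fitted value and uses `scipy.special.inv_boxcox` to map the transformed values back, as a check that the transformation is reversible.

```python
import numpy as np
from scipy import special, stats

# Same ages as the exercise dataset; Box-Cox requires strictly positive data.
ages = np.array([20, 21, 19, 99, 23, 18, 98], dtype=float)

# With no lmbda given, boxcox returns (transformed values, fitted lambda).
transformed, fitted_lambda = stats.boxcox(ages)
print(fitted_lambda)

# Undo the transformation with the same lambda.
restored = special.inv_boxcox(transformed, fitted_lambda)
print(np.allclose(restored, ages))
```

Keeping `fitted_lambda` around is what makes it possible to convert results back to the original units later.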