# Transforming values

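Transforming variables is another way to deal with outliers. Instead of deleting extreme values, a transformation either rescales the variable (scaling) or compresses large values more than small ones (log, cube root, Box-Cox), so that extreme values cause less variation. Common transformation methods include: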

1. Scaling
2. Log transformation
3. Cube root transformation
4. Box-Cox transformation

Scaling only changes the units of a variable. The log, cube root, and Box-Cox transformations pull large values closer to the rest of the data. If the data has many extreme values or is strongly right-skewed, these nonlinear transformations can make its distribution closer to normal. No data is lost with these methods: every value is kept, and each transformation can be inverted. However, these techniques do not always give the best results.
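One way to put a number on "closer to normal" is to compare sample skewness before and after each nonlinear transformation. The sketch below does this with `scipy.stats.skew`. It uses a synthetic log-normal sample, which is an assumption made only for illustration because a seven-row dataset is too small for skewness to mean much. Log-normal data is also a favorable case, since its logarithm is exactly normal.

``````python
import numpy as np
from scipy import stats

# Synthetic right-skewed sample (log-normal). This is illustrative data,
# not part of the exercise dataset.
rng = np.random.default_rng(seed=0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Box-Cox with lambda estimated from the data (lmbda=None by default)
boxcox_values, fitted_lambda = stats.boxcox(x)

# Sample skewness: about 0 for a symmetric distribution, > 0 for a long right tail
skewness = {
    "original": stats.skew(x),
    "log": stats.skew(np.log(x)),
    "cube root": stats.skew(np.cbrt(x)),
    "box-cox": stats.skew(boxcox_values),
}
for name, value in skewness.items():
    print(f"{name:>10}: skew = {value:.3f}")
print(f"fitted Box-Cox lambda = {fitted_lambda:.3f}")
``````

On data like this, the cube root usually reduces skewness less than the log or Box-Cox, because it is a milder transformation.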

INSTRUCTIONS
• First, import `pandas` as `pd`, `numpy` as `np`, `seaborn` as `sns`, `matplotlib.pyplot` as `plt`, `preprocessing` from `scikit-learn`, and `scipy.stats`.

``````python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import preprocessing
import scipy.stats  # makes scipy.stats.boxcox available
``````
• Next, create a small dataset in which two ages (99 and 98) are far larger than the rest

``````python
data = {'Name': ['Tom', 'Dick', 'Harry', 'Jack', 'Alex', 'Mike', 'John'],
        'Age': [20, 21, 19, 99, 23, 18, 98]}
orig = pd.DataFrame(data)
``````
• Now, plot the original data and observe how the two extreme ages stretch the box plot

``````python
# Box plot of the untransformed ages
sns.boxplot(x=orig['Age'])
plt.title("Box plot of the original Age values")
plt.show()
``````
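• Next, apply standard scaling with `StandardScaler`, which subtracts the mean and divides by the standard deviation. Because this is a linear transformation, the box plot keeps the same shape and only the axis units change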

``````python
train = orig.copy()
scaler = preprocessing.StandardScaler()
# fit_transform expects a 2-D array, so reshape the column to (n_samples, 1)
train['Age'] = scaler.fit_transform(train['Age'].values.reshape(-1, 1)).ravel()
sns.boxplot(x=train['Age'])
plt.title("Box plot after standard scaling")
plt.show()
``````
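To check what `StandardScaler` does, the sketch below recomputes the z-scores by hand and confirms three points. First, the scaler uses the population standard deviation (`ddof=0`). Second, the skewness is unchanged, so the outliers are just as far from the rest of the data as before. Third, `inverse_transform` recovers the original ages. It works on a plain NumPy copy of the exercise's `Age` values rather than the `orig` DataFrame, so it runs on its own.

``````python
import numpy as np
from scipy import stats
from sklearn import preprocessing

# The exercise's Age values as a float array
ages = np.array([20, 21, 19, 99, 23, 18, 98], dtype=float)

scaler = preprocessing.StandardScaler()
scaled = scaler.fit_transform(ages.reshape(-1, 1)).ravel()

# Manual z-scores: (x - mean) / std, where ndarray.std() defaults to ddof=0
manual = (ages - ages.mean()) / ages.std()
print("matches manual z-scores:", np.allclose(scaled, manual))

# A linear rescaling leaves the shape of the distribution (and its skewness) unchanged
print("skew before:", round(stats.skew(ages), 3), "after:", round(stats.skew(scaled), 3))

# inverse_transform maps the scaled values back to the original ages
restored = scaler.inverse_transform(scaled.reshape(-1, 1)).ravel()
print("round trip ok:", np.allclose(restored, ages))
``````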
• Next, apply the log transformation. `np.log` requires every value to be strictly positive, which holds for these ages

``````python
train = orig.copy()
# Natural logarithm; valid here because every age is greater than 0
train['Age'] = np.log(train['Age'])
sns.boxplot(x=train['Age'])
plt.title("Box plot after log transformation")
plt.show()
``````
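The ages here are all positive, but a column containing zeros would break `np.log`, because `log(0)` is `-inf`. A common workaround, sketched below on a hypothetical `counts` column (not part of the exercise), is `np.log1p`. It computes `log(1 + x)` and is finite for any `x > -1`. Its exact inverse is `np.expm1`, so no values are lost.

``````python
import numpy as np
import pandas as pd

# Hypothetical column that contains a zero (illustrative data only)
counts = pd.Series([0, 1, 3, 10, 250], dtype=float)

with np.errstate(divide="ignore"):
    plain_log = np.log(counts)  # the zero becomes -inf

shifted_log = np.log1p(counts)   # log(1 + x): finite for every value here
restored = np.expm1(shifted_log)  # inverse of log1p

print(plain_log.tolist())
print(shifted_log.round(3).tolist())
print("round trip ok:", np.allclose(restored, counts))
``````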
• Now, apply the cube root transformation, a milder transformation than the log

``````python
train = orig.copy()
# Cube root via a fractional power; fine here because every age is positive
train['Age'] = train['Age'] ** (1 / 3)
sns.boxplot(x=train['Age'])
plt.title("Box plot after cube root transformation")
plt.show()
``````
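Unlike the log, a cube root is defined for negative numbers. Raising a negative float to the power `1/3` still returns `NaN` in NumPy and pandas, though, because a fractional power of a negative base has no real result in floating point. The sketch below uses a hypothetical column with negative values (not part of the exercise) to show that `np.cbrt` returns the real cube root and keeps the sign.

``````python
import numpy as np
import pandas as pd

# Hypothetical column with negative values (illustrative data only)
values = pd.Series([-27.0, -8.0, 0.0, 8.0, 1000.0])

with np.errstate(invalid="ignore"):
    power_root = values ** (1 / 3)  # negative bases give NaN

real_root = np.cbrt(values)  # real cube root, sign preserved

print(power_root.tolist())
print(real_root.tolist())
``````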
• Finally, apply the Box-Cox transformation, which estimates a power parameter (lambda) from the data and requires strictly positive values

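``````python
train = orig.copy()
# With lmbda=None, boxcox estimates lambda and returns (transformed values, lambda)
train['Age'], fitted_lambda = scipy.stats.boxcox(train['Age'], lmbda=None)
sns.boxplot(x=train['Age'])
plt.title("Box plot after Box-Cox transformation")
plt.show()
``````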
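The Box-Cox transformation is `y = (x**lmbda - 1) / lmbda` when `lmbda != 0` and `y = log(x)` when `lmbda == 0`. The sketch below applies that formula by hand with the fitted lambda and checks it against `scipy.stats.boxcox`. It then uses `scipy.special.boxcox` and `scipy.special.inv_boxcox` to reapply and undo the transform with a fixed lambda, which shows the original ages can be recovered. Like the other examples, it rebuilds the `Age` values locally instead of reusing `orig`.

``````python
import numpy as np
import pandas as pd
from scipy import special, stats

# The exercise's Age values
ages = pd.Series([20, 21, 19, 99, 23, 18, 98], dtype=float)

# With lmbda=None (the default), stats.boxcox estimates lambda by maximum likelihood
transformed, fitted_lambda = stats.boxcox(ages)

# The Box-Cox formula applied by hand with the fitted lambda
if fitted_lambda == 0:
    manual = np.log(ages)
else:
    manual = (ages ** fitted_lambda - 1) / fitted_lambda

print(f"fitted lambda: {fitted_lambda:.4f}")
print("formula matches:", np.allclose(transformed, manual))

# special.boxcox applies the transform for a given lambda; inv_boxcox undoes it
reapplied = special.boxcox(ages, fitted_lambda)
restored = special.inv_boxcox(transformed, fitted_lambda)
print("reapplied matches:", np.allclose(reapplied, transformed))
print("round trip ok:", np.allclose(restored, ages))
``````

Keeping `fitted_lambda` is what makes the transformation reversible. New data should also be transformed with this same lambda rather than a re-estimated one.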
