# Isolation Forest

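Isolation Forest is an unsupervised anomaly detection algorithm, not a clustering algorithm. It belongs to the family of tree-based ensemble methods and is similar in principle to Random Forest: it builds many randomized trees and combines their results.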

1. It classifies each data point as an outlier or a non-outlier and works well with high-dimensional data.
2. It is built on decision trees and isolates outliers: each tree splits the data at random, and points that can be separated from the rest in only a few splits are likely to be outliers.
3. A result of -1 means that the data point is an outlier. A result of 1 means that it is not an outlier.
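
The isolation idea in point 2 can be illustrated with a small sketch that uses only the Python standard library. It is a simplified, one-dimensional illustration and not scikit-learn's implementation (which also subsamples the data and picks random features). The helper names `isolation_depth` and `mean_depth` and the fare values are made up for this example.

```python
import random

def isolation_depth(values, target, rng, max_depth=50):
    """Count the random splits needed until `target` is alone in its partition."""
    current = list(values)
    depth = 0
    while len(current) > 1 and depth < max_depth:
        lo, hi = min(current), max(current)
        if lo == hi:
            break  # identical values cannot be separated any further
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the target
        if target < split:
            current = [v for v in current if v < split]
        else:
            current = [v for v in current if v >= split]
        depth += 1
    return depth

def mean_depth(values, target, n_trees=200, seed=0):
    """Average the isolation depth over many independent random 'trees'."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target, rng) for _ in range(n_trees))
    return total / n_trees

# Hypothetical fares: ten ordinary values and one extreme value
fares = [7.25, 7.9, 8.05, 8.5, 10.5, 13.0, 15.5, 26.0, 26.55, 30.0, 512.33]
typical_depth = mean_depth(fares, 8.5)
extreme_depth = mean_depth(fares, 512.33)
print(typical_depth, extreme_depth)
```

The extreme value is usually cut off by the very first split, so its average depth is much smaller than that of a typical value. Isolation Forest turns this kind of average depth into an anomaly score.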

In this example, we will use the Titanic dataset to find outliers in the `Fare` column.

INSTRUCTIONS
• First, we will import the `IsolationForest` class from `sklearn.ensemble`, `numpy` as `np`, and `pandas` as `pd`

``````
from sklearn.ensemble import <<your code goes here>>
import numpy as <<your code goes here>>
import <<your code goes here>> as pd
``````
• Next, we will load the dataset using `read_csv` from `pandas`

``````
train = pd.<<your code goes here>>('/cxldata/datasets/project/titanic/train.csv')
``````
• Now, we will define a function named `iso_forest` that flags outliers using Isolation Forest

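``````
def <<your code goes here>>(df):
    # `behaviour='new'` was deprecated in scikit-learn 0.22 and removed in 0.24,
    # so it is omitted here; on versions older than 0.22, pass it explicitly
    iso = IsolationForest(random_state=1, contamination='auto')
    # IsolationForest expects a 2-D array, so reshape the column to (n_samples, 1)
    preds = iso.fit_predict(df.values.reshape(-1, 1))
    data = pd.DataFrame()
    data['cluster'] = preds  # -1 = outlier, 1 = not an outlier
    # Print how many data points received each label
    print(data['cluster'].value_counts().sort_values(ascending=False))
``````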
• Finally, we will call this function on the `Fare` column of our dataset

``````
iso_forest(train['Fare'])
``````

From the result, we can see that `182` data points in the `Fare` column are flagged as outliers (label `-1`).
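
The function above only prints label counts. To see which values were flagged, the labels can be placed next to the fares, as in the sketch below. It assumes a scikit-learn version without the `behaviour` argument (0.24 or later) and uses made-up fares rather than the Titanic file, so its counts will not match the `182` reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Made-up fares for illustration only; these are NOT taken from the Titanic data:
# 50 ordinary values between 5 and 40, plus one extreme value of 500
rng = np.random.default_rng(0)
fares = pd.Series(np.append(rng.uniform(5, 40, size=50), 500.0), name='Fare')

iso = IsolationForest(random_state=1, contamination='auto')
labels = iso.fit_predict(fares.values.reshape(-1, 1))

# Keep each fare next to its label so that flagged rows can be inspected
result = pd.DataFrame({'Fare': fares, 'label': labels})
outliers = result[result['label'] == -1]
print(outliers.sort_values('Fare', ascending=False).head())
```

With the real data, the same pattern would apply to `train['Fare']`: filtering on `label == -1` returns the rows that Isolation Forest flagged as outliers.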
