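Isolation Forest is an unsupervised outlier detection algorithm that belongs to the ensemble decision trees family and is similar in principle to Random Forest. It isolates each observation with random splits, and points that need fewer splits to be isolated are treated as more anomalous.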
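As a rough illustration of the ensemble idea, the sketch below fits the model on made-up one-dimensional data (not the Titanic set) and checks two things: the fitted model holds a collection of trees, and the most extreme value receives the lowest score. The data and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Made-up 1-D data: 200 typical values plus one extreme value at the end
rng = np.random.RandomState(0)
values = np.append(rng.normal(loc=30.0, scale=5.0, size=200), 500.0)
X = values.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)

iso = IsolationForest(random_state=1).fit(X)

# The fitted model is an ensemble of randomized trees
n_trees = len(iso.estimators_)

# score_samples returns lower (more negative) scores for more anomalous points
scores = iso.score_samples(X)
most_anomalous = values[np.argmin(scores)]
print(n_trees, most_anomalous)
```

The extreme value is separated from the rest after very few random splits, which is why it ends up with the lowest score.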
In this exercise, we will use the Titanic dataset to detect outliers in the 'Fare' column.
First, we will import the IsolationForest class from sklearn, NumPy as np, and Pandas as pd
from sklearn.ensemble import <<your code goes here>>
import numpy as <<your code goes here>>
import <<your code goes here>> as pd
Next, we will load the dataset using the read_csv function from Pandas
train = pd.<<your code goes here>>('/cxldata/datasets/project/titanic/train.csv')
Now, we will define a function named iso_forest to calculate the outliers using this method
def <<your code goes here>>(df):
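    # behaviour='new' is omitted: it was deprecated in scikit-learn 0.22 and removed in later releases
    iso = IsolationForest(random_state=1, contamination='auto')
    # fit_predict expects a 2-D array, so reshape the single column to (n_samples, 1)
    preds = iso.fit_predict(df.values.reshape(-1, 1))
    data = pd.DataFrame()
    data['cluster'] = preds
    # -1 marks outliers, 1 marks inliers
    print(data['cluster'].value_counts().sort_values(ascending=False))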
Finally, we will call this function using our dataset
iso_forest(train['Fare'])
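The contamination argument used in the function controls how many points get flagged. A minimal sketch on made-up skewed data (not the Titanic fares) compares 'auto', which uses the algorithm's default score threshold, with a fixed value of 0.1, which flags roughly 10% of the training points. The data and variable names here are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Made-up skewed values standing in for a fare-like column
rng = np.random.RandomState(0)
X = rng.exponential(scale=30.0, size=(200, 1))

# 'auto' uses a fixed score threshold; a float sets the expected outlier fraction
auto = IsolationForest(random_state=1, contamination='auto').fit_predict(X)
fixed = IsolationForest(random_state=1, contamination=0.1).fit_predict(X)

# Count the points labelled -1 (outliers) under each setting
n_auto = int((auto == -1).sum())
n_fixed = int((fixed == -1).sum())
print("auto:", n_auto, "fixed 10%:", n_fixed)
```

With a fixed fraction the number of flagged points is set in advance, so it says little about how unusual those points actually are.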
From the result, we can see that there are 182 outliers (rows labelled -1) in the Fare column of the dataset.
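Counting the labels does not show which values were flagged. One possible way to inspect them, sketched here on made-up fares because the dataset path may not exist outside the exercise environment, is to use the predicted labels as a boolean mask. The series contents and the names is_outlier and outlier_fares are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Made-up fares: 200 typical values plus two extreme ones
rng = np.random.RandomState(0)
fares = pd.Series(
    np.append(rng.normal(30.0, 5.0, size=200), [300.0, 500.0]), name='Fare'
)

iso = IsolationForest(random_state=1, contamination='auto')
labels = iso.fit_predict(fares.values.reshape(-1, 1))

# Keep only the values the model labelled as outliers (-1)
is_outlier = labels == -1
outlier_fares = fares[is_outlier]
print(is_outlier.sum(), "outliers; largest flagged value:", outlier_fares.max())
```

The same mask could be applied to the full DataFrame to see the passengers behind the flagged fares.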