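As the name (Density-Based Spatial Clustering of Applications with Noise) suggests, DBSCAN is a density-based, unsupervised machine learning algorithm. It takes multi-dimensional data as input and groups the points into clusters according to its two main parameters: epsilon (eps), the radius of the neighbourhood around each point, and the minimum number of samples (min_samples) that must lie within that radius for a point to count as a core point. Points that are neither core points nor within eps of a core point are labelled as noise, which is how the algorithm decides whether a value in the dataset is an outlier.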
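To make the role of eps and min_samples concrete, here is a minimal pure-Python sketch of the rule DBSCAN uses to classify each point as core, border or noise. It is an illustration under simplifying assumptions (Euclidean distance, a brute-force neighbour search, a hypothetical helper `dbscan_point_types` and made-up points), not scikit-learn's implementation, and it only classifies points rather than assembling them into clusters.

```python
import math


def dbscan_point_types(points, eps, min_samples):
    """Hypothetical helper: label each point 'core', 'border' or 'noise'.

    A point is core if at least min_samples points (itself included, as in
    scikit-learn) lie within distance eps. A non-core point within eps of a
    core point is border; everything else is noise (label -1 in scikit-learn).
    """
    n = len(points)
    # Indices of all points within eps of point i (Euclidean distance, self included)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]

    types = []
    for i in range(n):
        if is_core[i]:
            types.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            types.append("border")
        else:
            types.append("noise")
    return types


# Made-up 2D points: a tight group of four, one point at its edge, one far away
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0)]
print(dbscan_point_types(points, eps=0.15, min_samples=4))
# → ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point at (0.2, 0.0) has only three neighbours, so it is not core, but it lies within eps of a core point and is kept as a border point. The point at (5.0, 5.0) has no neighbours at all and is reported as noise, i.e. an outlier.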
Scikit-learn provides DBSCAN in its `sklearn.cluster` module, alongside its other clustering algorithms. The algorithm has many real-life applications in outlier detection; for example, it can be used in fraud detection to flag unusual credit card transactions. Here, we demonstrate how to detect outliers in the Iris dataset.
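```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

# Load the Iris dataset from a public CSV file
df = pd.read_csv("https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv")
print(df.head())

# Cluster on two features only, so the result can be plotted in 2D
data = df[["sepal_length", "sepal_width"]]
model = DBSCAN(eps=0.4, min_samples=10).fit(data)

# Colour each point by its cluster label; outliers get the label -1
colors = model.labels_
plt.scatter(data["sepal_length"], data["sepal_width"], c=colors)
plt.show()

# Rows labelled -1 are the points DBSCAN treats as outliers (noise)
outliers = data[model.labels_ == -1]
print(outliers)
```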
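Because which points end up as outliers depends heavily on eps, it can help to see how the number of noise points changes as eps grows. The sketch below is a variant under stated assumptions: it loads scikit-learn's bundled copy of Iris with `datasets.load_iris` (so no download is needed, and the column names include units, unlike the CSV), keeps `min_samples=10`, and tries a few illustrative eps values that were not part of the original exercise.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import DBSCAN

# Assumption: use scikit-learn's bundled copy of Iris instead of the CSV download
iris = datasets.load_iris(as_frame=True)
# Bundled column names include units, e.g. "sepal length (cm)"
data = iris.data[["sepal length (cm)", "sepal width (cm)"]]

# Count noise points (label -1) for a few illustrative eps values
noise_counts = {}
for eps in (0.2, 0.3, 0.4, 0.5, 0.6):
    labels = DBSCAN(eps=eps, min_samples=10).fit(data).labels_
    noise_counts[eps] = int(np.sum(labels == -1))
    print(f"eps={eps}: {noise_counts[eps]} outliers")
```

With min_samples fixed, a larger eps can only add neighbours to each point, so the set of core points grows and the outlier count should never increase as eps grows. Picking eps is therefore a trade-off between flagging too many ordinary points and missing genuine outliers.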