Statistics of IRIS dataset

The Iris dataset is a well-known dataset that is often used to teach the basics of Machine Learning. You can find out more about the Iris dataset on its Wikipedia page:

https://en.wikipedia.org/wiki/Iris_flower_data_set

The Iris dataset can be found in the dataset collection of sklearn. It can be loaded as follows:

from sklearn import datasets
datasets.load_iris()

Note, however, that this does not give you a Pandas DataFrame. It returns a Bunch, a dictionary-like object, several of whose components are NumPy ndarrays. The data component holds the actual measurements. The target component contains the targets, encoded as integers. The target_names component contains the names of the species of Iris flowers. The DESCR component contains a description of the dataset.
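To make these components concrete, the short sketch below loads the dataset and prints what each one holds. The variable name iris_bunch is only an illustrative choice and is not part of the exercise.

```python
from sklearn import datasets

# iris_bunch is a hypothetical name used only for this illustration.
iris_bunch = datasets.load_iris()

# The returned object supports dictionary-style access, so its components can be listed.
print(sorted(iris_bunch.keys()))

# data: one row per flower, one column per measurement.
print(iris_bunch.data.shape)           # (150, 4)

# target: one integer class label per flower.
print(iris_bunch.target.shape)         # (150,)

# target_names: species names, indexed by the integer labels in target.
print(list(iris_bunch.target_names))

# DESCR: a plain-text description of the dataset.
print(iris_bunch.DESCR[:200])
```

Each component can be read either as an attribute (iris_bunch.data) or by key (iris_bunch["data"]).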

Here, you will calculate summary statistics, such as the mean, median, and standard deviation, for individual columns of the Iris dataset.

INSTRUCTIONS

Load the Iris dataset from sklearn and save it in a variable named iris_df.
Save the data component in a Pandas DataFrame called data.
Name the first, second, third, and fourth columns sepal_length, sepal_width, petal_length, and petal_width respectively.
Save the target component in a Pandas DataFrame called target.
Name the column in the target DataFrame species.
Merge data and target into a single dataset by mapping each row of data to its respective target, and save the result in the iris variable.
Assign the mean of the sepal_length column to the sepal_len_mean variable.
Assign the median of the sepal_width column to the sepal_width_median variable.
Assign the standard deviation of the petal_length column to the petal_len_std variable.
Assign the minimum value of the petal_width column to the petal_width_min variable.
Assign the number of distinct values of the species column to the num_of_species variable.
Once done, click on the Submit Answer button given above.
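The steps above can be sketched as follows. This is one possible approach, not necessarily the reference answer: it assumes that joining data and target side by side with pd.concat is an acceptable way to "merge" them, which works because both DataFrames share the same default row index.

```python
import pandas as pd
from sklearn import datasets

# Load the dataset (the exercise names this variable iris_df, although it is not a DataFrame).
iris_df = datasets.load_iris()

# Wrap the measurements in a DataFrame with the requested column names.
data = pd.DataFrame(
    iris_df.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)

# Wrap the integer class labels in a single-column DataFrame.
target = pd.DataFrame(iris_df.target, columns=["species"])

# Assumption: column-wise concatenation pairs row i of data with row i of target,
# because both frames use the same default 0..149 index.
iris = pd.concat([data, target], axis=1)

sepal_len_mean = iris["sepal_length"].mean()
sepal_width_median = iris["sepal_width"].median()
petal_len_std = iris["petal_length"].std()      # sample standard deviation (ddof=1)
petal_width_min = iris["petal_width"].min()
num_of_species = iris["species"].nunique()      # number of distinct labels

print(sepal_len_mean, sepal_width_median, petal_len_std, petal_width_min, num_of_species)
```

Note that the Pandas std method computes the sample standard deviation (ddof=1) by default, whereas NumPy's std defaults to the population version (ddof=0), so the two can give slightly different values.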
