Problems on Data Cleaning and Processing For Machine Learning

Statistics of IRIS dataset

The Iris dataset is a well-known dataset that is often used to teach the basics of machine learning. You can find out more about it on its Wikipedia page.

The Iris dataset can be found in the dataset collection of sklearn. It can be loaded as follows:

from sklearn import datasets
iris_df = datasets.load_iris()

However, this gives you a dictionary-like Bunch object rather than a Pandas DataFrame. Its data component is an ndarray that holds the actual measurements. The target component is an ndarray that contains the targets as integer class labels. The target_names component contains the names of the species of Iris flowers. The DESCR component contains a description of the dataset.
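To make these components concrete, here is a minimal sketch that loads the dataset and inspects each one. It assumes scikit-learn is installed; the variable name iris_df is the one the exercise asks for, and the print calls are only for illustration.

```python
from sklearn import datasets

# load_iris() returns a dictionary-like Bunch, not a DataFrame
iris_df = datasets.load_iris()

print(type(iris_df).__name__)   # class name of the returned object
print(iris_df.data.shape)       # (rows, columns) of the measurements
print(iris_df.target.shape)     # one integer label per row
print(iris_df.target_names)     # species names, indexed by label
print(iris_df.DESCR[:200])      # first part of the text description
```

Because the components are plain arrays, they have to be wrapped in DataFrames before column names and methods such as mean() become available.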

Here, you will calculate summary statistics (mean, median, standard deviation, minimum, and number of distinct values) for individual columns of this Iris dataset.

  • Load the Iris dataset from sklearn and save it in a variable named iris_df
  • Save the data component in a Pandas DataFrame called data
  • Name the first, second, third, and fourth columns sepal_length, sepal_width, petal_length, and petal_width respectively.
  • Save the target component in a Pandas DataFrame called target
  • Name the column in the target DataFrame species.
  • Merge data and target to form a single dataset by mapping each row of data to its respective target, and save the result in the iris variable
  • Assign the mean of the sepal_length column to the sepal_len_mean variable.
  • Assign the median of the sepal_width column to the sepal_width_median variable.
  • Assign the standard deviation of the petal_length column to the petal_len_std variable.
  • Assign the minimum value of the petal_width column to the petal_width_min variable.
  • Assign the number of distinct values of the species column to the num_of_species variable.
  • Once done, click on the Submit Answer button given above
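The steps above can be sketched as follows. This is one possible approach, not necessarily the reference answer. It assumes pandas and scikit-learn are installed, that joining the two frames column-wise with pd.concat is an acceptable way to "merge" them (both share the same default row index), and that the standard deviation wanted is the pandas default, the sample standard deviation with ddof=1.

```python
import pandas as pd
from sklearn import datasets

# Load the Bunch object under the name the exercise asks for
iris_df = datasets.load_iris()

# Wrap the measurements in a DataFrame with the requested column names
data = pd.DataFrame(
    iris_df.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)

# Wrap the integer class labels in a one-column DataFrame
target = pd.DataFrame(iris_df.target, columns=["species"])

# Assumption: both frames have the default 0..n-1 index, so a
# column-wise concat pairs each row of data with its own target
iris = pd.concat([data, target], axis=1)

sepal_len_mean = iris["sepal_length"].mean()
sepal_width_median = iris["sepal_width"].median()
petal_len_std = iris["petal_length"].std()    # pandas default: ddof=1
petal_width_min = iris["petal_width"].min()
num_of_species = iris["species"].nunique()    # count of distinct labels
```

If the grader expects the population standard deviation instead, iris["petal_length"].std(ddof=0) would be the call to use.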