End-to-End ML Project - Beginner friendly

Plotting histograms

To get a better understanding of the data, we plot a histogram for each numerical attribute. A histogram shows the number of instances that fall within each range of values (each "bin").
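To make "the number of instances within each range" concrete, here is a minimal sketch that counts values into bins by hand and checks the result against NumPy's np.histogram. The helper count_per_bin, the sample values and the bin edges are hypothetical, chosen only for illustration. Like np.histogram (and matplotlib's hist()), the helper treats each bin as including its left edge and excluding its right edge, except the last bin, which also includes its right edge.

```python
import numpy as np

def count_per_bin(values, edges):
    """Hypothetical helper: count how many values fall into each bin.

    Bin i covers [edges[i], edges[i + 1]); the last bin is also closed on the
    right, matching np.histogram. Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            lo, hi = edges[i], edges[i + 1]
            is_last = i == len(edges) - 2
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break
    return counts

# Hypothetical sample values and bin edges, for illustration only
values = [0.2, 0.5, 1.5, 3.1, 3.9, 4.0, 4.7]
edges = [0, 1, 2, 3, 4, 5]

manual_counts = count_per_bin(values, edges)
numpy_counts, _ = np.histogram(values, bins=edges)

print(manual_counts)          # [2, 1, 0, 2, 2]
print(numpy_counts.tolist())  # [2, 1, 0, 2, 2]
```

Note that 4.0 is counted in the 4-5 bin, not the 3-4 bin, because each bin excludes its right edge.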

Consider the following histogram of an arbitrarily chosen dataset:

[Figure: histogram of an arbitrarily chosen dataset]

From the histogram above, we can conclude that:

  • There are 100 instances in the dataset whose values lie between 0 and 1.
  • There are 40 instances in the dataset whose values lie between 1 and 2.
  • There are 20 instances in the dataset whose values lie between 2 and 3.
  • There are 60 instances in the dataset whose values lie between 3 and 4.
  • There are 80 instances in the dataset whose values lie between 4 and 5.
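The counts listed above can be checked in code. The sketch below builds a hypothetical dataset with exactly those counts (100, 40, 20, 60 and 80 values in the ranges 0-1, 1-2, 2-3, 3-4 and 4-5), plots it with pandas, and reads the bar heights back from the plot. Only the counts come from the text; the individual values are synthetic random draws within each range, and the name data is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Counts read off the example histogram: (range start, range end) -> number of instances
bin_counts = {(0, 1): 100, (1, 2): 40, (2, 3): 20, (3, 4): 60, (4, 5): 80}

# Synthetic dataset: uniform random draws inside each range, in the stated amounts
rng = np.random.default_rng(seed=42)
values = np.concatenate(
    [rng.uniform(low, high, size=n) for (low, high), n in bin_counts.items()]
)
data = pd.Series(values, name="value")

# Use the same bin edges as the figure
edges = [0, 1, 2, 3, 4, 5]
ax = data.hist(bins=edges)
ax.set_xlabel("value")
ax.set_ylabel("number of instances")

# Each bar is a Rectangle patch whose height is the count for its bin
heights = [int(p.get_height()) for p in ax.patches]
print(heights)  # [100, 40, 20, 60, 80]
plt.show()
```

In a Jupyter notebook you can drop the matplotlib.use("Agg") line so that the plot is actually displayed.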

We generally do this only for numerical attributes. For a categorical attribute, the value_counts() method (called on a single column of the DataFrame, as we did earlier) already gives the exact count of instances in each category, so a histogram adds little.
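A minimal sketch contrasting the two cases on a small hypothetical DataFrame (the column names category and price and their values are invented). value_counts() gives exact per-category counts for the categorical column. For the numerical column, almost every value is distinct, so counting unique values tells us little; grouping the values into ranges first, here with pandas' pd.cut, gives the same kind of counts a histogram displays.

```python
import pandas as pd

# Hypothetical data: one categorical attribute and one numerical attribute
df = pd.DataFrame(
    {
        "category": ["A", "B", "A", "C", "A", "B"],
        "price": [1.2, 3.4, 2.2, 4.9, 0.7, 3.1],
    }
)

# Categorical attribute: exact number of instances per category
category_counts = df["category"].value_counts()
print(category_counts)  # A: 3, B: 2, C: 1

# Numerical attribute: group values into ranges [0, 1), [1, 2), ..., [4, 5)
# and count the instances in each range, which is what a histogram shows
price_bins = pd.cut(df["price"], bins=[0, 1, 2, 3, 4, 5], right=False)
print(price_bins.value_counts().sort_index())
```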

We plot the histograms by calling the hist() method of the DataFrame object. Internally, it calls matplotlib.pyplot.hist() on each numerical column of the DataFrame, resulting in one histogram per column. Because the actual drawing is done by matplotlib, matplotlib must be installed, and we usually import matplotlib.pyplot ourselves so that we can display the figure with plt.show().
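A minimal sketch of DataFrame.hist() on a hypothetical DataFrame with two numerical columns (age, income) and one categorical column (city); all names and values are invented. Non-numerical columns such as city are skipped, so the call produces one titled histogram per numerical column.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical DataFrame: two numerical columns and one categorical column
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(
    {
        "age": rng.integers(18, 70, size=200),
        "income": rng.normal(50_000, 12_000, size=200),
        "city": rng.choice(["X", "Y", "Z"], size=200),
    }
)

# One call draws one histogram per numerical column
axes = df.hist(bins=20, figsize=(10, 4))

# axes is a 2-D NumPy array of matplotlib Axes; each used subplot is titled
# with the name of the column it shows
titles = [ax.get_title() for ax in axes.ravel() if ax.get_title()]
print(titles)  # ['age', 'income']

plt.show()
```

bins and figsize are optional; with the default bins=10, each column is split into 10 equal-width ranges.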

Here, matplotlib is a package and pyplot is a module within it. pyplot provides the convenient plotting functions (such as hist() and show()) that most matplotlib code relies on, and it is conventionally imported under the alias plt.
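A minimal sketch of the conventional import. It also confirms that pyplot lives inside the matplotlib package and calls plt.hist() directly on a single hypothetical column (ages), which is roughly what DataFrame.hist() does for each column (pandas additionally handles titles, grids and the subplot layout).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt  # the conventional alias
import pandas as pd

print(plt.__name__)  # matplotlib.pyplot

# Hypothetical numerical column
ages = pd.Series([23, 35, 31, 47, 52, 29, 61, 38], name="age")

# plt.hist returns the count per bin, the bin edges and the drawn bar patches
counts, edges, patches = plt.hist(ages, bins=4)
plt.title(ages.name)
plt.xlabel("age")
plt.ylabel("number of instances")
print(edges.tolist())   # [23.0, 32.5, 42.0, 51.5, 61.0]
print(counts.tolist())  # [3.0, 2.0, 1.0, 2.0]
plt.show()
```

With bins=4, the range from the smallest value (23) to the largest (61) is split into four equal-width bins of 9.5 each.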

