To get a better understanding of the data, we plot a histogram for each numerical attribute. A histogram shows the number of instances whose values fall within each range (bin).

Let's plot a histogram for an arbitrarily chosen dataset:

From the histogram above, we can conclude that:

- There are 100 instances in the dataset whose values lie between 0 and 1.
- There are 40 instances in the dataset whose values lie between 1 and 2.
- There are 20 instances in the dataset whose values lie between 2 and 3.
- There are 60 instances in the dataset whose values lie between 3 and 4.
- There are 80 instances in the dataset whose values lie between 4 and 5.
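Since the original figure is not reproduced here, the following is a minimal sketch that rebuilds a histogram with exactly the counts listed above. The data is hypothetical: each value is simply the midpoint of its unit-wide bin, repeated as many times as that bin's count.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: bin midpoints repeated so that each unit-wide bin
# holds the counts listed above (100, 40, 20, 60, 80).
bin_edges = [0, 1, 2, 3, 4, 5]
counts = [100, 40, 20, 60, 80]
values = np.repeat([0.5, 1.5, 2.5, 3.5, 4.5], counts)
df = pd.DataFrame({"value": values})

# Verify the per-bin counts numerically before plotting.
hist_counts, _ = np.histogram(df["value"], bins=bin_edges)
print(hist_counts)  # [100  40  20  60  80]

# Draw the histogram with the same bin edges; edgecolor separates the bars.
df.hist(column="value", bins=bin_edges, edgecolor="black")
plt.show()
```

Reading the bar heights of this plot gives the same five conclusions as the list above.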

We generally do this only for numerical attributes. For a categorical attribute, we can instead see the count of instances belonging to each category with the `value_counts()` method (called on that column of the DataFrame), which we have used before, because it gives us the exact counts.
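To contrast the two approaches, here is a small sketch on a hypothetical DataFrame (the column names `color` and `price` are made up for illustration). `value_counts()` gives exact counts per category; for a numerical column, exact counts per range can be obtained by binning with `pd.cut` first.

```python
import pandas as pd

# Hypothetical DataFrame: "color" is categorical, "price" is numerical.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "red", "blue"],
    "price": [0.5, 1.2, 3.7, 4.1, 4.9, 0.3],
})

# Exact number of instances in each category of the categorical attribute.
color_counts = df["color"].value_counts()
print(color_counts)

# For the numerical attribute: bin the values, then count per bin.
# sort=False keeps the bins in their natural order, including empty ones.
price_counts = pd.cut(df["price"], bins=[0, 1, 2, 3, 4, 5]).value_counts(sort=False)
print(price_counts)
```

Note that `pd.cut` uses right-closed intervals such as (0, 1] by default, whereas a histogram's bins are left-closed, so values lying exactly on a bin edge can be counted differently.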

We plot a histogram by calling the `hist()` method of the DataFrame object. Internally, it calls `matplotlib.pyplot.hist()` on each numerical attribute in the DataFrame, resulting in one histogram per column. Hence, matplotlib must be available, and we first import `matplotlib.pyplot` so that we can display the resulting figures (for example, with `plt.show()`).
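A minimal sketch of this behavior, assuming a small hypothetical DataFrame with two numerical columns (`age`, `income`) and one text column (`name`). `DataFrame.hist()` returns the matplotlib Axes it drew on, one per numerical column; the text column is skipped.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: two numerical attributes and one non-numerical attribute.
df = pd.DataFrame({
    "age": [23, 35, 31, 47, 52, 29, 41],
    "income": [2.5, 4.1, 3.8, 6.2, 7.0, 3.1, 5.5],
    "name": ["a", "b", "c", "d", "e", "f", "g"],
})

# One histogram per numerical column; "name" is ignored.
axes = df.hist(bins=5, figsize=(8, 3))

# Each subplot is titled with the column it shows.
titles = [ax.get_title() for ax in axes.ravel()]
print(titles)  # ['age', 'income']

plt.show()
```

With two numerical columns, pandas lays the subplots out in a single row, so `axes` has shape `(1, 2)`.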

Here, `matplotlib` is a package and `pyplot` is a module within it. `pyplot` provides the interface through which most everyday plotting functions of `matplotlib` are used. It is conventionally imported under the alias `plt`.
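As a brief sketch of the convention, the snippet below imports `pyplot` as `plt` and calls `plt.hist()` directly on a plain list of hypothetical values. This is the same function that `DataFrame.hist()` relies on; it returns the bin counts, the bin edges, and the drawn bars.

```python
import matplotlib.pyplot as plt  # conventional alias for pyplot

# Hypothetical values to illustrate a direct call to pyplot's hist().
values = [0.2, 0.7, 1.4, 3.1, 3.8, 4.6]

# n: count per bin, bins: bin edges, patches: the drawn bar objects.
n, bins, patches = plt.hist(values, bins=[0, 1, 2, 3, 4, 5], edgecolor="black")
print(n)  # [2. 1. 0. 2. 1.]

plt.show()
```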

