End-to-End ML Project - Beginner friendly


Visualizing the target variable

In the scatter plot we created, we can easily see the high-density areas (areas with a large population) and the low-density areas. However, no clear pattern stands out yet. So, let's experiment further with the parameters of the DataFrame's plot() method to bring out more distinguishable patterns.

Here comes the best thing about the DataFrame.plot() method. Because it uses the matplotlib backend by default, any extra keyword arguments we give it are passed on to the underlying matplotlib plotting method. We pass them exactly as we pass the parameters of DataFrame.plot() itself. So, we'll also experiment with the parameters of matplotlib.pyplot.scatter to improve the visualization.
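As a minimal sketch of that pass-through, the snippet below builds a small made-up DataFrame (the name `demo` and its values are hypothetical stand-ins for `train_copy`) and passes `alpha`, a keyword that belongs to matplotlib's scatter method rather than to the documented signature of `DataFrame.plot()`, straight into the call. It assumes pandas and matplotlib are installed; the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for train_copy: a few made-up block coordinates.
demo = pd.DataFrame({
    "longitude": [-122.2, -122.3, -118.4, -118.3, -117.2],
    "latitude": [37.8, 37.9, 34.1, 34.0, 32.7],
})

# kind, x and y are DataFrame.plot() parameters; alpha is not, so pandas
# forwards it to matplotlib's Axes.scatter together with the data.
ax = demo.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4)

# The scatter markers are stored as the first collection of the Axes.
points = ax.collections[0]
print(points.get_alpha())  # 0.4
```

Reading the value back from the drawn markers is a quick way to confirm that a matplotlib-only keyword actually reached the plot.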

We have already experimented with many parameter values and got the best result with the following settings:

  1. s- It specifies the marker size. Markers are the circles that represent data points in the plot. We make the size directly proportional to the population attribute, so blocks with a higher population get larger markers and blocks with a lower population get smaller ones. (matplotlib treats s as the marker area in points², so the marker radius grows with the square root of the population.) We then divide by 100 to keep the markers from becoming too large. So, we set the value to train_copy["population"]/100.

  2. c- It specifies the marker color. We set its value to the attribute median_house_value, so the plot shows different ranges of house prices in different colors.

  3. cmap- It specifies the colormap used to translate the values of the attribute passed to the c parameter into colors, so that different ranges of values get different colors. You can think of a colormap as a lookup table that maps numeric values to colors. We set its value to plt.get_cmap("jet"). jet is a commonly used colormap that runs from blue for low values to red for high values.

  4. colorbar- If True, a colorbar is drawn next to the plot. The colorbar visualizes the mapping and tells us which color represents which range of values. We set it to True.

  5. figsize- It specifies the size of the figure in inches, as (width, height). We set its value to (10,7).

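  6. alpha- It sets the marker transparency, from 0 (fully transparent) to 1 (fully opaque). We set alpha to 0.4 so that overlapping markers stay visible and crowded regions appear darker.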
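To make the s, c and cmap settings more concrete, here is a small sketch of the arithmetic behind them, assuming matplotlib's default linear normalization (`matplotlib.colors.Normalize`): each median_house_value is rescaled to the 0 to 1 range between the column's minimum and maximum, and the colormap turns that fraction into an RGBA color, while s is simply population / 100. The three sample blocks are made up for illustration and are not taken from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Made-up sample values for three blocks (not from the real dataset).
population = np.array([800.0, 3200.0, 1500.0])
median_house_value = np.array([100000.0, 500000.0, 300000.0])

# s: marker area in points^2, scaled down by 100 as in the lesson.
sizes = population / 100
print(sizes)  # [ 8. 32. 15.]

# c + cmap, step 1: values are normalized linearly to [0, 1]
# using the minimum and maximum of the column.
norm = Normalize(vmin=median_house_value.min(), vmax=median_house_value.max())
fractions = np.asarray(norm(median_house_value))
print(fractions)  # [0.  1.  0.5]

# c + cmap, step 2: the "jet" colormap turns each fraction into an RGBA color.
jet = plt.get_cmap("jet")
colors = jet(fractions)
print(colors.shape)  # (3, 4): one RGBA row per block
```

Because s is an area, the second block (four times the population of the first) gets a marker with roughly twice the radius, not four times.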


Plot a scatter plot of the attributes latitude (on the x-axis) and longitude (on the y-axis) of the DataFrame train_copy.

Set each of the highlighted parameters to the value specified above.
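A minimal sketch of how all six settings combine in one DataFrame.plot() call, run here on a small made-up DataFrame named `demo` (hypothetical; it stands in for train_copy and has only the columns the plot needs) so that it executes on its own. The axes follow the exercise text, with latitude on x and longitude on y. It assumes pandas and matplotlib are installed; the `Agg` backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for train_copy with made-up values.
demo = pd.DataFrame({
    "latitude": [37.8, 37.9, 34.1, 34.0, 32.7, 36.7],
    "longitude": [-122.2, -122.3, -118.4, -118.3, -117.2, -119.8],
    "population": [800.0, 3200.0, 1500.0, 2500.0, 1200.0, 900.0],
    "median_house_value": [450000.0, 500000.0, 350000.0,
                           300000.0, 250000.0, 90000.0],
})

ax = demo.plot(
    kind="scatter",
    x="latitude",                  # x-axis, as the exercise asks
    y="longitude",                 # y-axis
    s=demo["population"] / 100,    # marker area grows with population
    c="median_house_value",        # marker color encodes the house price
    cmap=plt.get_cmap("jet"),      # colormap that turns prices into colors
    colorbar=True,                 # draw the price-to-color legend
    figsize=(10, 7),               # figure size in inches (width, height)
    alpha=0.4,                     # semi-transparent markers reveal overlaps
)
```

With the real train_copy, the same keyword arguments should apply unchanged; in a script, call plt.show() afterwards to open the figure (a notebook displays it automatically).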

