End-to-End ML Project - California Housing

End-to-End ML Project - Visualize the geographic distribution of the data

In this step we will visualize how housing prices and population are distributed geographically. The plot will show that housing prices are closely related to location (e.g., proximity to the ocean) and to population density. We will do this by creating a scatter plot of latitude against longitude.

INSTRUCTIONS
  • First, create a copy of the strat_train_set dataset and save it in the housing variable using the copy method

    housing = strat_train_set.<<your code goes here>>()
    
  • Now let's plot the scatter plot using Matplotlib as shown below. Please copy the code as is.

    import matplotlib.image as mpimg
    import matplotlib.pyplot as plt
    import numpy as np
    california_img=mpimg.imread('/cxldata/datasets/project/housing/california.png')
    ax = housing.plot(kind="scatter", x="longitude", y="latitude", figsize=(10,7),
                           s=housing['population']/100, label="Population",
                           c="median_house_value", cmap=plt.get_cmap("jet"),
                           colorbar=False, alpha=0.4,
                          )
    plt.imshow(california_img, extent=[-124.55, -113.80, 32.45, 42.05], alpha=0.5,
               cmap=plt.get_cmap("jet"))
    plt.ylabel("Latitude", fontsize=14)
    plt.xlabel("Longitude", fontsize=14)
    
    prices = housing["median_house_value"]
    tick_values = np.linspace(prices.min(), prices.max(), 11)
    cbar = plt.colorbar(ticks=tick_values/prices.max())
    cbar.ax.set_yticklabels(["$%dk"%(round(v/1000)) for v in tick_values], fontsize=14)
    cbar.set_label('Median House Value', fontsize=16)
    
    plt.legend(fontsize=16)
    plt.show()
    

    Here we use the imread function to load the PNG image of California, and the imshow function to draw it as the background of the scatter plot. The scatter plot itself is created by housing.plot, where the cmap parameter selects the color map that maps scalar data (here median_house_value) to colors, and the s parameter sizes each marker by population. The xlabel and ylabel functions set the labels for the x- and y-axes. The linspace function returns evenly spaced numbers over a specified interval; here it produces the 11 tick values for the color bar.
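To see what the color bar tick code does in isolation, here is a minimal sketch. The price range below is a made-up assumption chosen for illustration, not values taken from the dataset; in the exercise these come from prices.min() and prices.max().

```python
import numpy as np

# Hypothetical price range, for illustration only (not read from the dataset).
price_min, price_max = 15_000.0, 500_000.0

# 11 evenly spaced values from price_min to price_max, both endpoints included.
tick_values = np.linspace(price_min, price_max, 11)

# Dividing by the maximum rescales the tick positions so the largest is 1.0,
# mirroring ticks=tick_values/prices.max() in the plotting code.
tick_positions = tick_values / price_max

# Same label formatting as in the plotting code: dollars rounded to thousands.
tick_labels = ["$%dk" % round(v / 1000) for v in tick_values]

print(tick_labels[0], tick_labels[-1])
```

With these assumed endpoints, the first label is $15k and the last is $500k, with nine evenly spaced labels in between.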
