# End-to-End ML Project - Visualize the geographic distribution of the data

In this step we will visualize how the data is distributed geographically. This will show how strongly housing prices are related to location (e.g., proximity to the ocean) and to population density. We will do this by creating a scatter plot of the districts' coordinates, drawn over a map of California.

INSTRUCTIONS
• First, create a copy of the `strat_train_set` dataset and save it in the `housing` variable using the `copy` method

``````
housing = strat_train_set.<<your code goes here>>()
``````
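
Working on a copy keeps the training set itself untouched while you explore. The sketch below illustrates this on a tiny made-up DataFrame; the name `original` and its values are hypothetical stand-ins, not the real `strat_train_set`.

```python
import pandas as pd

# Toy stand-in for strat_train_set (hypothetical values, not the real data)
original = pd.DataFrame({"longitude": [-122.2, -118.3], "population": [322, 2401]})

duplicate = original.copy()           # copy() makes an independent (deep) copy by default
duplicate.loc[0, "population"] = 0    # change the copy only

print(original.loc[0, "population"])  # the original value is unchanged
print(duplicate.loc[0, "population"])
```

Because `copy` returns an independent DataFrame, the edit made to `duplicate` does not show up in `original`.
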
• Now let's plot the scatter plot using Matplotlib as shown below. Please copy the code as is.

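``````
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

# Load the California map used as the plot background.
# Assumption: "california.png" is a placeholder; use the path to your copy of the image.
california_img = mpimg.imread("california.png")

# Point size shows population, point color shows median house value
ax = housing.plot(kind="scatter", x="longitude", y="latitude", figsize=(10,7),
                  s=housing['population']/100, label="Population",
                  c="median_house_value", cmap=plt.get_cmap("jet"),
                  colorbar=False, alpha=0.4,
                  )
# Draw the map behind the points, stretched to the given longitude/latitude extent
plt.imshow(california_img, extent=[-124.55, -113.80, 32.45, 42.05], alpha=0.5,
           cmap=plt.get_cmap("jet"))
plt.ylabel("Latitude", fontsize=14)
plt.xlabel("Longitude", fontsize=14)

# Build a color bar with 11 evenly spaced price labels
prices = housing["median_house_value"]
tick_values = np.linspace(prices.min(), prices.max(), 11)
cbar = plt.colorbar(ticks=tick_values/prices.max())
cbar.ax.set_yticklabels(["$%dk"%(round(v/1000)) for v in tick_values], fontsize=14)
cbar.set_label('Median House Value', fontsize=16)

plt.legend(fontsize=16)
plt.show()
``````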

Here we use the `imread` function to load the PNG image of California, and `imshow` to draw it as the background of the scatter plot. The scatter plot itself is produced by `housing.plot`: the size of each point (`s`) reflects the population, and its color (`c`) reflects the median house value. The `cmap` parameter selects the color map, which is used to map scalar data to colors. The `xlabel` and `ylabel` functions set the labels for the x- and y-axes. The `linspace` function returns evenly spaced numbers over a specified interval; here it produces the 11 tick values for the color bar.
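
To make the color bar labels more concrete, here is a minimal sketch of the `linspace` and label-formatting steps on their own. The price range (`lowest`, `highest`) is a made-up assumption chosen to keep the arithmetic easy to follow; the real range comes from `housing["median_house_value"]`.

```python
import numpy as np

# Hypothetical price range, not taken from the real dataset
lowest, highest = 50_000, 500_000

# 11 evenly spaced values from lowest to highest, both endpoints included
tick_values = np.linspace(lowest, highest, 11)

# Same formatting as in the plot code: dollars in thousands, e.g. "$95k"
labels = ["$%dk" % round(v / 1000) for v in tick_values]
print(labels)
```

With 11 points over this range, consecutive ticks are 45,000 apart, so the labels run from `$50k` to `$500k`.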

