Now we will continue to load the dataset that we cloned in the previous step.
Please follow the below steps:
import numpy as np
import os
Now we will use pandas to load data from a large csv file (the California housing dataset, housing.csv) and create a small csv file of housing data by extracting only a few rows from it.
We are creating a smaller csv file of data, just for our convenience, to make it easy for us to load it using loadtxt() function.
Don't worry if you don't know pandas yet, just copy and use the below pandas code as it is.
import pandas as pd
# defining the path of the directory that contains housing.csv
HOUSING_PATH = '/cxldata/datasets/project/housing'
# reading the large housing.csv file using pandas
housing_raw = pd.read_csv(os.path.join(HOUSING_PATH, "housing.csv"))
# extracting only a few rows (the first 5 rows) from the pandas dataframe 'housing_raw' into 'my_df'
my_df = housing_raw.iloc[ : 5]
# creating a new small csv file - 'housing_short.csv' - containing the above extracted 5 rows of data
my_df.to_csv('housing_short.csv', index=False)
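The slicing and saving steps above can be tried on a tiny stand-in table. This is only a sketch: big_df and its values are made up for illustration, and the csv is written to an in-memory buffer instead of a file so that nothing is created on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for the large housing DataFrame (values are made up).
big_df = pd.DataFrame({
    "longitude": [-122.0, -121.5, -121.0, -120.5, -120.0, -119.5, -119.0],
    "median_house_value": [1e5, 2e5, 3e5, 4e5, 5e5, 6e5, 7e5],
})

# iloc[:5] keeps the first five rows by position.
small_df = big_df.iloc[:5]

# index=False leaves the row index out, so the csv holds only the
# header line and the data columns.
buffer = io.StringIO()
small_df.to_csv(buffer, index=False)
lines = buffer.getvalue().splitlines()

print(lines[0])    # the header line
print(len(lines))  # one header line plus five data rows
```

Without index=False, pandas would write the row index as an extra unnamed first column, which would no longer match the ten column names passed to loadtxt() later.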
Now, let us load the csv file housing_short.csv using NumPy's loadtxt() function.
Please define a variable called FILE and assign to it the string value housing_short.csv.
FILE = '<<your code comes here>>'
Please define a function called load_housing_data(), as shown below, which takes a filename (FILE) as input and loads this file using NumPy's loadtxt() function. Just copy the below code as it is.
def load_housing_data(file=FILE):
    # loads the csv file and returns one NumPy array per column (the header row is skipped)
    return np.loadtxt(file, dtype={'names': ('longitude','latitude','housing_median_age','total_rooms','total_bedrooms','population','households','median_income','median_house_value','ocean_proximity'),'formats': ('f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', '|S15')}, delimiter=',', skiprows=1, unpack=True)
first parameter - file. It is the name of the file from which the data is to be loaded.
second parameter - dtype. It gives the data types of the columns of the loaded csv file housing_short.csv. It is a Python dictionary with two keys: 'names', which holds the names of the columns, and 'formats', which holds the data types of the respective columns, e.g. f8, |S15, etc.
'f8' means a 64-bit floating-point number
'|S15' means a byte string with a fixed width of 15 bytes, so the values are returned as bytes (e.g. b'NEAR BAY')
third parameter - delimiter. It is the character by which values in a row of our csv file are separated. In our case, the values in a row of housing_short.csv are separated by ',' (comma).
fourth parameter - skiprows. It specifies how many initial rows of the csv file to skip while loading. Here we skip the first row, because it contains the header (column names) and not data.
fifth parameter - unpack. When unpack is True, the returned array is transposed, so that the result can be unpacked using x, y, z = loadtxt(...). When used with a structured data type, as in our case, a separate array is returned for each field. The default value of unpack is False, but we want the individual column arrays, so we have set it to True.
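The five parameters described above can be seen working together in a small self-contained sketch. It reads from an in-memory string instead of housing_short.csv, uses only three columns, and its rows are made-up sample values chosen for illustration.

```python
import io

import numpy as np

# Illustrative csv text: a header row followed by two made-up data rows.
csv_text = (
    "longitude,median_house_value,ocean_proximity\n"
    "-122.25,450000.0,NEAR BAY\n"
    "-121.50,350000.0,INLAND\n"
)

# Structured dtype: one name and one format per column.
column_types = {
    "names": ("longitude", "median_house_value", "ocean_proximity"),
    "formats": ("f8", "f8", "|S15"),
}

# skiprows=1 skips the header; unpack=True returns one array per field.
longitude, value, proximity = np.loadtxt(
    io.StringIO(csv_text),
    dtype=column_types,
    delimiter=",",
    skiprows=1,
    unpack=True,
)

print(longitude.dtype)  # float column
print(value.shape)      # one entry per data row
print(proximity[0])     # a bytes value, because of the '|S15' format
```

With unpack=False, the same call would instead return a single structured array, and each column would be read as result['longitude'], result['ocean_proximity'], and so on.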
Please call the above defined load_housing_data() function, which returns the values of each column as a separate NumPy array.
longitude_arr,latitude_arr,housing_median_age_arr,total_rooms_arr,total_bedrooms_arr,population_arr,households_arr,median_income_arr,median_house_value_arr,ocean_proximity_arr = load_housing_data()
You can check and confirm the values of one of the NumPy arrays that you got above (say median_house_value_arr) by printing it using the print() function.
print(<<your code comes here>>)
median_house_value_arr contains the values of the median_house_value column of the csv file housing_short.csv.
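Unlike the numeric arrays, ocean_proximity_arr holds byte strings because that column was loaded with the '|S15' format. The sketch below shows what that means and one way to convert such an array to ordinary text. The array here is built by hand with made-up category values instead of being loaded from the file.

```python
import numpy as np

# Hand-built stand-in for ocean_proximity_arr (illustrative values only).
proximity = np.array([b"NEAR BAY", b"INLAND"], dtype="S15")

# Each element is a bytes object stored in a fixed 15-byte slot.
print(proximity.dtype.itemsize)

# np.char.decode converts every element from bytes to str.
decoded = np.char.decode(proximity, "utf-8")
print(decoded.tolist())

# A value longer than 15 bytes is cut to fit when stored in an 'S15' array.
clipped = np.array([b"A VERY LONG CATEGORY NAME"], dtype="S15")
print(clipped[0])
```

The fixed width is worth keeping in mind when choosing the number after S: it should be at least as large as the longest value expected in the column.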