Machine Learning Prerequisites (Numpy)

NumPy - Arrays - Loading text file data using NumPy's loadtxt() function - Step 2

Now we will continue to load the dataset that we cloned in the previous step.


Please follow the below steps:

(1) Import the required libraries

import numpy as np
import os

(2) Load using pandas

Now we will use pandas to load data from a large csv file (the California housing dataset) and create a small csv file of housing data by extracting only a few rows from this large housing.csv file.

We create this smaller csv file purely for convenience, so that it is easy to load with the loadtxt() function.

Don't worry if you don't know pandas yet. Just copy the pandas code below and use it as it is.

import pandas as pd
# defining housing.csv file path
HOUSING_PATH =  '/cxldata/datasets/project/housing'
# reading the large housing.csv file using pandas
housing_raw = pd.read_csv(os.path.join(HOUSING_PATH, "housing.csv"))
# extracting only the first 5 rows of the pandas dataframe 'housing_raw' into a new dataframe 'my_df'
my_df = housing_raw.iloc[ : 5]
# creating a new small csv file - 'housing_short.csv' - containing the above extracted 5 rows of data
my_df.to_csv('housing_short.csv', index=False)
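To see what this step produces without access to the housing dataset, here is a minimal sketch of the same slice-and-save pattern. The dataframe, its column names `a` and `b`, and the file name `small.csv` are all made up for illustration; only `iloc` and `to_csv` are the calls used above.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for housing_raw: 10 rows, 2 columns
df = pd.DataFrame({'a': range(10), 'b': [x * 0.5 for x in range(10)]})

# Keep only the first 5 rows, as in the step above
small = df.iloc[:5]

# Write to a temporary folder so no existing file is overwritten
out_path = os.path.join(tempfile.mkdtemp(), 'small.csv')
small.to_csv(out_path, index=False)

# Read the raw text back: one header line followed by 5 data lines
with open(out_path) as f:
    lines = f.read().splitlines()

print(lines[0])    # the header line with the column names
print(len(lines))  # header + 5 data rows
```

Because of index=False, the row labels of the dataframe are not written, so the file contains only the column values. The header line is still present, which is why it has to be skipped later when loading with loadtxt().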

(3) Load using NumPy

Now, let us load the csv file - housing_short.csv - using NumPy's loadtxt() function.

Please define a variable called FILE and assign to it the string value housing_short.csv.

FILE = '<<your code comes here>>'

(4) Create Function

Please define a function called load_housing_data(), as shown below, which takes filename (FILE) as input and loads this file using NumPy's loadtxt() function. Just copy the below code as it is.

def load_housing_data(file=FILE):
    names = ('longitude','latitude','housing_median_age','total_rooms','total_bedrooms','population','households','median_income','median_house_value','ocean_proximity')
    formats = ('f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', '|S15')
    return np.loadtxt(file, dtype={'names': names, 'formats': formats}, delimiter=',', skiprows=1, unpack=True)

loadtxt() function parameters

first parameter - file. It is the name of the file from which the data is to be loaded. (loadtxt() itself names this parameter fname; here it is passed by position.)

second parameter - dtype. It is the data type of the columns of the loaded csv file housing_short.csv. Here it is a Python dictionary with two keys: 'names', which holds the names of the columns, and 'formats', which holds the data types of the respective columns, e.g. f8, |S15, etc.

'f8' means a 64-bit (8-byte) floating-point number

'|S15' means a byte string of up to 15 bytes (characters)
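As a rough illustration of what such a dtype dictionary describes, the sketch below builds a two-field version of it and inspects it. The choice of only two fields and the sample record are assumptions made for brevity; the full dictionary in load_housing_data() works the same way.

```python
import numpy as np

# Hypothetical two-column dtype, a cut-down version of the one used above
dt = np.dtype({'names': ('median_income', 'ocean_proximity'),
               'formats': ('f8', '|S15')})

print(dt.names)     # the field (column) names
print(dt.itemsize)  # bytes per record: 8 for 'f8' plus 15 for '|S15'

# One illustrative record; the 'S' field stores bytes, not str
row = np.array([(8.3252, b'NEAR BAY')], dtype=dt)
print(row['ocean_proximity'][0])           # a bytes value
print(row['ocean_proximity'][0].decode())  # converted to a regular str
```

Note that values in an 'S' column come back as bytes, so they print with a b prefix until decoded.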

third parameter - delimiter. It is the character that separates the values in a row of the csv file. In our case, the values in each row of housing_short.csv are separated by ',' (comma).

fourth parameter - skiprows. It specifies how many rows at the start of the csv file to skip. Here we skip the first row because it contains the header (column names), which cannot be loaded as numeric data.

fifth parameter - unpack. When unpack is True, the returned array is transposed, so that the columns can be unpacked using x, y, z = loadtxt(...). When used with a structured data type, one array is returned for each field. The default value of unpack is False. Here we want each column as a separate array, so we set it to True.
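The effect of delimiter, skiprows and unpack can be tried on a tiny in-memory file. This is only a sketch: the three-column text and the field names x, y and label are made up, and io.StringIO stands in for a real file on disk.

```python
import io
import numpy as np

# Made-up three-column text standing in for housing_short.csv
text = "x,y,label\n1.0,2.0,NEAR BAY\n3.0,4.0,INLAND\n"
dt = {'names': ('x', 'y', 'label'), 'formats': ('f8', 'f8', '|S15')}

# skiprows=1 drops the header line; delimiter=',' splits each row on commas.
# With the default unpack=False we get one structured record per row.
packed = np.loadtxt(io.StringIO(text), dtype=dt, delimiter=',', skiprows=1)
print(packed.shape)  # number of records
print(packed['x'])   # a column is selected by its field name

# With unpack=True we get one array per field instead
x, y, label = np.loadtxt(io.StringIO(text), dtype=dt, delimiter=',',
                         skiprows=1, unpack=True)
print(x)
print(label)
```

With unpack=True, the number of variables on the left-hand side has to match the number of fields, which is why the call in the next step lists ten variable names.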

(5) Call the Function

Please call the load_housing_data() function defined above. It returns the values of each column as a separate NumPy array.

longitude_arr,latitude_arr,housing_median_age_arr,total_rooms_arr,total_bedrooms_arr,population_arr,households_arr,median_income_arr,median_house_value_arr,ocean_proximity_arr = load_housing_data()

(6) Print

You can check the values of one of the NumPy arrays you got above (say median_house_value_arr) by printing it with the print() function.

print(<<your code comes here>>)

median_house_value_arr contains the values of the median_house_value column of the csv file housing_short.csv.
