As we discussed earlier, there are two ways (constructs) in NumPy to load data from a text file:
(1) using the loadtxt() function
(2) using the genfromtxt() function
Below is an example of using the genfromtxt() function.
genfromtxt()
The genfromtxt() function is very helpful when you expect some values in the dataset to be missing. Below is some sample code:
import numpy as np
# Skip the first 2 lines of the file and replace every missing value with 9999999
my_arr = np.genfromtxt('my_file.txt', skip_header=2, filling_values=9999999)
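To see what skip_header does on its own, here is a small sketch. It assumes hypothetical file contents with two title lines; io.StringIO stands in for 'my_file.txt' so the example runs without a file on disk.

```python
import io

import numpy as np

# Hypothetical contents of a text file: two title lines, then numeric rows.
# io.StringIO stands in for 'my_file.txt' so the example runs anywhere.
text = io.StringIO(
    "Sensor readings\n"
    "recorded daily\n"
    "1 2 3\n"
    "4 5 6\n"
)

# skip_header=2 drops the first two lines before parsing starts;
# the remaining rows are split on white space (the default delimiter).
arr = np.genfromtxt(text, skip_header=2)
print(arr.shape)  # (2, 3)
```

Without skip_header=2, genfromtxt would try to parse the title lines as data and fail, because they do not have three columns.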
Here, with the default dtype (float), any value that cannot be converted to a number (for example, a string) and any empty field are treated as missing values, and the genfromtxt() function replaces these missing values with nan.
If you want missing values to be replaced with a value other than nan, specify that value in the filling_values parameter. For example, the code above says that any missing value found should be replaced with 9999999.
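The following sketch compares the default nan replacement with a custom filling_values. The comma-separated data is hypothetical: "n/a" is not a number and the last field of the second row is empty, so both count as missing.

```python
import io

import numpy as np

# Hypothetical data: "n/a" cannot be converted to float and the last
# field of row 2 is empty, so both entries are treated as missing.
data = "1,2,3\n4,n/a,\n"

# Default: missing entries in a float column become nan.
default_arr = np.genfromtxt(io.StringIO(data), delimiter=",")

# filling_values replaces every missing entry with the given value instead.
filled_arr = np.genfromtxt(io.StringIO(data), delimiter=",", filling_values=9999999)

print(default_arr)  # second row: 4, nan, nan
print(filled_arr)   # second row: 4, 9999999, 9999999
```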
The genfromtxt() function can also strip any white space around the values being loaded; pass autostrip=True to turn this on (it is off by default).
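Here is a minimal sketch of the autostrip option, assuming hypothetical two-column data where the second column contains names padded with spaces. dtype=None asks genfromtxt to infer each column's type, so the columns get the default field names f0 and f1.

```python
import io

import numpy as np

# Hypothetical data: an integer id, then a name padded with leading spaces.
data = "1,  apple\n2,   banana\n"

# dtype=None infers each column's type; encoding makes text columns str.
raw = np.genfromtxt(io.StringIO(data), delimiter=",", dtype=None, encoding="utf-8")
stripped = np.genfromtxt(io.StringIO(data), delimiter=",", dtype=None,
                         encoding="utf-8", autostrip=True)

print(repr(raw["f1"][0]))       # the spaces around the name are kept by default
print(repr(stripped["f1"][0]))  # autostrip=True removes them
```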
You can also limit how many rows are loaded with the max_rows parameter; in that case, at most the specified number of rows will be loaded.
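A short sketch of max_rows, assuming a hypothetical file with one header line and five data rows. Note that skip_header is applied first, and max_rows then counts data rows from that point.

```python
import io

import numpy as np

# Hypothetical file: one header line followed by five data rows.
text = io.StringIO("a b\n1 2\n3 4\n5 6\n7 8\n9 10\n")

# Skip the header line, then read at most two data rows.
first_two = np.genfromtxt(text, skip_header=1, max_rows=2)
print(first_two.shape)  # (2, 2)
```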
Please follow the steps below:
(1) Please import the required libraries
import numpy as np
import os
(2) Please create a variable HOUSING_PATH and assign to it, as a string, the path of the directory that contains the housing.csv file ('/cxldata/datasets/project/housing').
HOUSING_PATH = <<your code comes here>>
(3) Please define the complete path of the CSV file housing.csv by passing HOUSING_PATH and the file name housing.csv to the os.path.join() function, and save this complete path in a variable FILE.
FILE = os.path.join(HOUSING_PATH, <<your code comes here>>)
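The sketch below shows what os.path.join() produces. The directory and file names are hypothetical and chosen only for illustration, so they do not give away the exercise answer.

```python
import os

# Hypothetical directory and file names, used only for illustration.
DATA_DIR = "/tmp/example_data"
FILE_PATH = os.path.join(DATA_DIR, "sample.csv")

# os.path.join inserts the platform's path separator between the parts,
# so this prints /tmp/example_data/sample.csv on Linux and macOS.
print(FILE_PATH)
```

Building the path with os.path.join() rather than string concatenation means you do not have to remember whether the directory string already ends with a separator.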
(4) Please define a function load_housing_dataset() that takes the complete path of the CSV file as its parameter, using FILE (defined above) as the default value. This function loads the housing.csv file using the genfromtxt() function.
def <<your code comes here>>(file=FILE):
    # Nine 64-bit float columns, then ocean_proximity as a byte string of up to 15 characters
    return np.genfromtxt(file, dtype={'names': ('longitude','latitude','housing_median_age','total_rooms','total_bedrooms','population','households','median_income','median_house_value','ocean_proximity'),'formats': ('f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', '|S15')},
                         delimiter=',', skip_header=1, filling_values=99999999, unpack=False)
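To see how this kind of call behaves without access to the dataset, here is a sketch on a hypothetical three-row extract that keeps only four of the housing columns. The rows are made up, and the second one has an empty total_bedrooms field to show filling_values at work.

```python
import io

import numpy as np

# Hypothetical extract with only four of the housing columns; the
# second data row has an empty total_bedrooms field.
csv_text = (
    "longitude,latitude,total_bedrooms,ocean_proximity\n"
    "-122.23,37.88,129,NEAR BAY\n"
    "-122.22,37.86,,NEAR BAY\n"
    "-121.00,36.50,300,INLAND\n"
)

# Same dictionary form as in the exercise: column names plus formats.
dtype = {
    "names": ("longitude", "latitude", "total_bedrooms", "ocean_proximity"),
    "formats": ("f8", "f8", "f8", "|S15"),
}

arr = np.genfromtxt(io.StringIO(csv_text), dtype=dtype, delimiter=",",
                    skip_header=1, filling_values=99999999)

print(arr["total_bedrooms"])       # the empty field holds 99999999.0
print(arr["ocean_proximity"][2])   # b'INLAND' (a byte string, because of |S15)
```

Because the dtype has named fields, each column is reached by its name (arr["total_bedrooms"]) instead of by a column index.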
genfromtxt() function parameters:
first parameter - name of the file from which the data is to be loaded.
second parameter - dtype, the data types of the columns of the loaded CSV file housing.csv. It is a Python dictionary with two keys: 'names', whose value is a tuple of the column names, and 'formats', whose value is a tuple of the corresponding data types, e.g. f8, |S15, etc.
'f8' means a 64-bit floating-point number; '|S15' means a byte string of up to 15 characters.
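A quick sketch to check these two type codes directly in NumPy. The category strings are hypothetical; the second one is deliberately longer than 15 characters to show what happens to values that do not fit.

```python
import numpy as np

f8 = np.dtype("f8")
s15 = np.dtype("|S15")

print(f8, f8.itemsize)    # float64 8 -> 8 bytes = 64 bits
print(s15, s15.itemsize)  # |S15 15 -> 15 bytes per value

# Values longer than 15 bytes are silently cut off when stored as |S15.
names = np.array([b"NEAR BAY", b"A VERY LONG CATEGORY NAME"], dtype="|S15")
print(names[1])  # b'A VERY LONG CAT'
```

The silent truncation is worth keeping in mind when choosing a string length for a column.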
third parameter - delimiter. The character that separates the values in a row of the CSV file. In our case, the values in each row of housing.csv are separated by ',' (comma).
fourth parameter - skip_header. The number of lines at the beginning of the file to skip. Here we skip the first line of housing.csv because it contains the column names (header information) rather than data.
fifth parameter - filling_values. The value that replaces any missing entries, as explained above (99999999 here).
sixth parameter - unpack. Same meaning as explained in the loadtxt() function chapter.
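As a reminder of what unpack changes, here is a sketch on hypothetical two-column numeric data, comparing unpack=False (the default) with unpack=True.

```python
import io

import numpy as np

# Hypothetical two-column numeric data.
data = "1,10\n2,20\n3,30\n"

# unpack=False (the default): one 2-D array with one row per line.
table = np.genfromtxt(io.StringIO(data), delimiter=",", unpack=False)

# unpack=True: the result is transposed, so each column can be
# assigned to its own variable.
x, y = np.genfromtxt(io.StringIO(data), delimiter=",", unpack=True)

print(table.shape)  # (3, 2)
print(x, y)         # [1. 2. 3.] [10. 20. 30.]
```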
(5) Call the load_housing_dataset() function defined above, and store the output in a variable called result_arr.
result_arr = <<your code comes here>>()
(6) Print the length (number of records) of result_arr
print(<<your code comes here>>)
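Since the function returns a structured array, len() counts records rather than columns. The sketch below illustrates this on a small hypothetical array with two of the housing fields; it does not load the real dataset.

```python
import numpy as np

# A hypothetical structured array with two records and two named fields,
# laid out like the records returned by load_housing_dataset().
records = np.array([(1.5, b"INLAND"), (2.5, b"NEAR BAY")],
                   dtype=[("median_income", "f8"), ("ocean_proximity", "S15")])

print(len(records))         # 2 -> one entry per record (row)
print(records.shape)        # (2,) -> a 1-D array, not rows x columns
print(records.dtype.names)  # ('median_income', 'ocean_proximity')
```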
(7) Print the array result_arr to see its values.
print(<<your code comes here>>)