Pandas for Machine Learning


Pandas - DataFrame - Loading the dataset from various data sources

A dataset can be loaded from various data sources using relevant Pandas constructs (functions) as mentioned below:

  • CSV file - read_csv() function
  • JSON file - read_json() function
  • Excel file - read_excel() function
  • Database table - read_sql() function

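By default, all of the above functions return a DataFrame object. Several of them (read_csv(), read_json() and read_sql()) also accept a parameter called 'chunksize'; read_excel() does not.

For example, to load a JSON data file (myfile.json), you can use the code below (assuming pandas has been imported as pd):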

my_df = pd.read_json("myfile.json")

Here, my_df is a pandas dataframe object.
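The other reader functions follow the same pattern. The sketch below is only an illustration: it uses small made-up in-memory data (the column names 'id' and 'price' and the table name 'houses' are hypothetical) so that it runs without any files on disk. read_excel() is left out because it needs an extra Excel engine such as openpyxl to be installed.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data, used here instead of real files.
csv_text = "id,price\n1,100\n2,250\n"
json_text = '[{"id": 1, "price": 100}, {"id": 2, "price": 250}]'

# CSV source: read_csv() accepts a path or any file-like object.
csv_df = pd.read_csv(io.StringIO(csv_text))

# JSON source: read_json() likewise accepts a path or a file-like object.
json_df = pd.read_json(io.StringIO(json_text))

# Database source: write the rows to an in-memory SQLite table,
# then read them back with read_sql().
conn = sqlite3.connect(":memory:")
csv_df.to_sql("houses", conn, index=False)
sql_df = pd.read_sql("SELECT * FROM houses", conn)
conn.close()

# Each call returned a DataFrame with the same two rows and two columns.
print(csv_df.shape, json_df.shape, sql_df.shape)
```

With real data you would pass a file path (or a database connection and query) in place of the in-memory objects used here.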

chunksize - The number of rows (records) of the dataset (CSV file, JSON file, database table, etc.) that you want returned in each chunk. Note that read_json() accepts chunksize only when lines=True is also passed, i.e. for line-delimited JSON.

When you pass the chunksize parameter, these functions (read_csv(), read_sql(), etc.) return an iterator instead of a single dataframe. The iterator lets you traverse the data chunk by chunk, where each chunk is a dataframe with at most chunksize rows (the last chunk may be smaller).

The 'chunksize' parameter is very useful when you are loading a large dataset and have limited memory (RAM) available on your machine. When 'chunksize' is specified, only one chunk of data is read into a dataframe at a time. Hence, as long as a chunk of the specified size fits within your memory (RAM) limits, you can process datasets that are too large to load all at once.
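As a minimal sketch of chunked reading, the example below iterates over a small made-up CSV (the columns 'id' and 'price' are hypothetical) in chunks of 4 rows and keeps only running totals, so the full dataset never has to be held in memory at once. A real large file would be handled the same way, just with a file path and a much larger chunksize.

```python
import io

import pandas as pd

# Hypothetical 10-row dataset, built in memory for illustration.
csv_text = "id,price\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))

chunk_sizes = []
total_rows = 0
price_sum = 0

# With chunksize set, read_csv() returns an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))       # rows in this chunk
    total_rows += len(chunk)             # running row count
    price_sum += chunk["price"].sum()    # running aggregate

print(chunk_sizes, total_rows, price_sum)
```

Each chunk is an ordinary dataframe, so any per-chunk filtering or aggregation can be done inside the loop before the next chunk is read.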

INSTRUCTIONS

Note: These instructions assume that you have completed the chapter on Numpy and have the necessary file, housing_short.csv, in the correct directory. If you don't have it, please go to the Numpy chapter and complete the section on loading text file data first.

Please follow the below steps:

Please import pandas as pd

Loading dataset from a CSV file

(1) Please load the data from /cxldata/datasets/project/housing_short.csv file by passing it to the read_csv() function of Pandas library and store the returned dataframe in a variable called 'mydf'

<<your code comes here>> = pd.read_csv("<<your csv file name comes here>>", index_col=0)

(2) Use the describe() function of the pandas dataframe to see summary statistics (count, mean, standard deviation, min, quartiles, max) for the numeric columns of the 'mydf' dataframe.

mydf.<<your code comes here>>
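To illustrate what index_col=0 and describe() do, here is a small sketch on made-up data (the columns 'rooms' and 'price' are hypothetical and are not taken from housing_short.csv). index_col=0 tells read_csv() to use the first column of the file as the row index rather than as a data column.

```python
import io

import pandas as pd

# Hypothetical CSV whose first (unnamed) column holds row labels.
csv_text = ",rooms,price\n0,3,100.0\n1,4,150.0\n2,5,200.0\n"

# index_col=0 makes the first column the index, leaving two data columns.
demo_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# describe() returns a dataframe of summary statistics per numeric column.
summary = demo_df.describe()
print(summary)
```

The result of describe() is itself a dataframe, so individual statistics can be looked up with, for example, summary.loc["mean", "price"].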
