End-to-End ML Project- Beginner friendly

Prepare the Data for Machine Learning Algorithms

Now that we have gained several insights about the data, let's use them to prepare the data so it is ready to be fed to machine learning algorithms. This step is also called preprocessing the data.

It is always advisable to write functions for this instead of performing the task manually, because:

  1. We can use these functions in our live system to transform the new data before feeding it to our algorithms.

  2. We can use these functions to easily transform fresh data (whenever we get it) so that we can train the model further.

  3. We can also use these functions in our future projects.

  4. We can try various transformations easily and see which combination of transformations works best.

But before preprocessing the data, let's separate the target variable from the features. We can do that by using the drop() method of the DataFrame object. Its syntax is:

DataFrame.drop(attribute)

where DataFrame is the DataFrame object and attribute is the label of the attribute we want to drop. Remember, an attribute name given as a string must be enclosed in single or double quotes. To drop multiple attributes, pass them as a list.

The drop() method also has a very important parameter, axis. Set it to 1 to drop a column, or to 0 to drop a row; 0 is the default value.

By default, it returns a new DataFrame without the specified rows or columns and leaves the original DataFrame unchanged.
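
As a minimal sketch of the behaviour described above, the snippet below calls drop() on a small toy DataFrame. The column names (rooms, age, price) are made up for illustration and are not part of the housing dataset.

```python
import pandas as pd

# Toy data; the column names are illustrative assumptions only.
df = pd.DataFrame({
    "rooms": [3, 5, 4],
    "age": [10, 25, 40],
    "price": [100, 250, 180],
})

# Drop a single column: axis=1 targets columns.
no_price = df.drop("price", axis=1)

# Drop several columns by passing a list of names.
only_rooms = df.drop(["age", "price"], axis=1)

# Drop a row by its index label: axis=0 is the default.
no_first_row = df.drop(0)

print(list(no_price.columns))      # ['rooms', 'age']
print(list(only_rooms.columns))    # ['rooms']
print(list(no_first_row.index))    # [1, 2]

# drop() returned new DataFrames; the original still has all columns.
print(list(df.columns))            # ['rooms', 'age', 'price']
```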

Refer to the drop() documentation for further details about the method.

Note: As we have completed the exploration step, we will go back to working with our clean training set, i.e., strat_train_set.

INSTRUCTIONS

Drop the target attribute (median_house_value) from the DataFrame strat_train_set, and store the result in a variable named train_data.

Also, store a copy of strat_train_set['median_house_value'] in a variable named housing_labels using the copy() method.
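
One way these two steps could look is sketched below. The small strat_train_set built here is only a hypothetical stand-in so the snippet runs on its own; in the project, strat_train_set already exists from the earlier steps, and the feature columns shown are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for strat_train_set (assumed columns and values);
# in the project this DataFrame comes from the earlier splitting step.
strat_train_set = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6],
    "housing_median_age": [41, 21, 52],
    "median_house_value": [452600.0, 358500.0, 341300.0],
})

# Features: everything except the target column (axis=1 drops a column).
train_data = strat_train_set.drop("median_house_value", axis=1)

# Labels: an independent copy of the target column.
housing_labels = strat_train_set["median_house_value"].copy()

print(list(train_data.columns))
print(housing_labels.tolist())
```

Because copy() produces an independent Series, later changes to housing_labels do not affect strat_train_set.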


