Machine Learning Prerequisites (Numpy)

32 / 32

Numpy and Pandas - What is NumPy and Pandas ?

NumPy and Pandas are the Python libraries that are used to manipulate, process and analyze the data. They don't have constructs that can be used to visualize the data, for that we can use another library from Python called matplotlib.

NumPy

  • NumPy stands for 'Numeric Python' or 'Numerical Python'.

  • It is designed for scientific computations.

  • It has efficiently implemented multi-dimensional arrays and it also provides fast mathematical functions.

  • It is mostly used for array-oriented computing.

  • NumPy's main object is the homogeneous multidimensional array called "ndarray".

Pandas

  • Pandas provide high-performance, easy-to-use data structures, and data analysis tools.

  • The DataFrame is the main and widely used data structure of the Pandas library.

  • DataFrame is a kind of in-memory 2-D table (similar to Excel sheet) with rows and columns.

  • Using DataFrames, we can create pivot tables (just like Excel sheet), compute one column value using values of other columns, etc.

When do we use NumPy and when we use Pandas?

Suppose, your data is currently lying in a text file or a database table. You can load this dataset either using Pandas or Numpy. Pandas library is preferred as it is more efficient.

You can load the dataset using Pandas into a Pandas Dataframe. After loading the dataset, you can use Pandas library functions along with Matplotlib library functions to analyze, visualize and perform statistical analysis on the data in the dataset.

You can use NumPy's functions to create random indices and use these random indices with Pandas dataframe to create a shuffled dataset.

Many functions of the Scikit Learn (sklearn) library (like Imputer, OneHotEncoder, predict()) return a NumPy array, which we may need to process using NumPy. Also, we may need to create a Pandas dataframe from an existing NumPy array.

Imputer - It is used to handle missing values in a dataframe
OneHotEncoder - It is used to perform one hot encoding on categorical values
predict() - It is used to predict values using a trained ML model

So, we see that Pandas and NumPy are used very much in conjunction with each other throughout the Machine Learning process.


No hints are availble for this assesment

Answer is not availble for this assesment

Loading comments...