{"id":1000,"date":"2017-12-13T14:06:49","date_gmt":"2017-12-13T14:06:49","guid":{"rendered":"http:\/\/blog.cloudxlab.com\/?p=1000"},"modified":"2021-06-21T08:26:27","modified_gmt":"2021-06-21T08:26:27","slug":"numpy-pandas-introduction","status":"publish","type":"post","link":"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/","title":{"rendered":"NumPy and Pandas Tutorial &#8211; Data Analysis with Python"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Python is increasingly being used as a scientific language. Matrix and vector manipulations are extremely important for scientific computations. Both NumPy and Pandas have emerged as essential libraries for any scientific computation, including machine learning, in Python due to their intuitive syntax and high-performance matrix computation capabilities.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this post, we will provide an overview of the common functionalities of NumPy and Pandas. We will also see how similar these libraries are to existing toolboxes in R and MATLAB. This similarity and added flexibility have resulted in wide acceptance of Python in the scientific community lately. The topics covered in this post are:<\/span><\/p>\n<ol class=\"ili-indent\">\n<li><span style=\"font-weight: 400;\">Overview of NumPy<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Overview of Pandas<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Using Matplotlib<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">This post is an excerpt from a live hands-on training conducted by CloudxLab on 25th Nov\u00a02017. It was attended by more than 100 learners around the globe. 
The participants came from the United States, Canada, Australia, Indonesia, India, Thailand, Philippines, Malaysia, Macao, Japan, Hong Kong, Singapore, United Kingdom, Saudi Arabia, Nepal, and New Zealand.<\/span><\/p>\n<p><!--more--><\/p>\n<h2>What is NumPy?<\/h2>\n<p><span style=\"font-weight: 400;\">NumPy stands for \u2018Numerical Python\u2019 or \u2018Numeric Python\u2019. It is an open source module of Python which provides fast mathematical computation on arrays and matrices. Since arrays and matrices are an essential part of the Machine Learning ecosystem, NumPy, together with Machine Learning modules like Scikit-learn, Pandas, Matplotlib, TensorFlow, etc., completes the Python Machine Learning ecosystem.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">NumPy provides the essential multi-dimensional array-oriented computing functionalities designed for high-level mathematical functions and scientific computation. NumPy can be imported into a notebook using:<\/span><\/p>\n<pre class=\"lang:python decode:true \">&gt;&gt;&gt; import numpy as np<\/pre>\n<p><span style=\"font-weight: 400;\">NumPy\u2019s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers) that all have the same type, for example all integers, all floats or all strings. In NumPy, dimensions are called axes. The number of axes is called the rank.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are several ways to create an array in NumPy, such as np.array, np.zeros, np.ones, etc. 
Each of them offers a different kind of flexibility.<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><span style=\"font-weight: 400;\">Command to create an array<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Example<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">np.array<\/span><\/td>\n<td>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; a = np.array([1, 2, 3])\n&gt;&gt;&gt; type(a)\n&lt;class 'numpy.ndarray'&gt;\n\n&gt;&gt;&gt; b = np.array((3, 4, 5))\n&gt;&gt;&gt; type(b)\n&lt;class 'numpy.ndarray'&gt;<\/pre>\n<\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">np.ones<\/span><\/td>\n<td>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; np.ones((3, 4), dtype=np.int16)\narray([[1, 1, 1, 1],\n       [1, 1, 1, 1],\n       [1, 1, 1, 1]], dtype=int16)<\/pre>\n<\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">np.full<\/span><\/td>\n<td>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; np.full((3, 4), 0.11)\narray([[0.11, 0.11, 0.11, 0.11],\n       [0.11, 0.11, 0.11, 0.11],\n       [0.11, 0.11, 0.11, 0.11]])<\/pre>\n<\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">np.arange<\/span><\/td>\n<td>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; np.arange(10, 30, 5)\narray([10, 15, 20, 25])\n\n&gt;&gt;&gt; np.arange(0, 2, 0.3)  # it accepts float arguments\narray([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])<\/pre>\n<\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">np.linspace<\/span><\/td>\n<td>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; np.linspace(0, 5\/3, 6)\narray([0. , 0.33333333, 0.66666667, 1. 
, 1.33333333, 1.66666667])<\/pre>\n<\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">np.random.rand<\/span><\/td>\n<td>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; np.random.rand(2, 3)  # random values, different on every run\narray([[0.55365951, 0.60150511, 0.36113117],\n       [0.5388662 , 0.06929014, 0.07908068]])<\/pre>\n<\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">np.empty<\/span><\/td>\n<td>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; np.empty((2, 3))  # uninitialized, so the values are arbitrary\narray([[0.21288689, 0.20662218, 0.78018623],\n       [0.35294004, 0.07347101, 0.54552084]])<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400;\">Some of the important attributes and methods of a NumPy array are:<\/span><\/p>\n<ol class=\"ili-indent\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>ndim:<\/strong> the number of dimensions (axes) of the array<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>shape:<\/strong> a tuple of integers giving the size of the array along each axis<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>size:<\/strong> the total number of elements in the NumPy array<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>dtype<\/strong>: the type of the elements in the array, e.g., int64 or float64<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>itemsize:<\/strong> the size in bytes of each item<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>reshape()<\/strong>: a method that returns the array's data arranged in a new shape<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">NumPy array elements can be accessed using indexing. 
Below are some useful examples:<\/span><\/p>\n<ul class=\"ili-indent\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">A[2:5] selects the items at indices 2 to 4 (the end index is excluded). Indexing in NumPy arrays starts from 0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">A[2::2] selects every second item from index 2 to the end<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">A[::-1] returns the array in reverse order<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">A[1:] selects everything from index 1 (row 1 of a 2D array) to the end<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The session covers these and some important attributes of the NumPy array object in detail. <\/span><\/p>\n<h2>Vectors and Machine Learning<\/h2>\n<p><span style=\"font-weight: 400;\">Machine learning uses vectors. Vectors are one-dimensional arrays. A vector can be represented either as a row or as a column array. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">What are vectors? A vector quantity is one that is defined by a magnitude and a direction. For example, force is a vector quantity. It is defined by the magnitude of the force as well as its direction. It can be represented as an array [a, b] of two numbers, for example [2, 180], where \u2018a\u2019 is the magnitude (2 newtons) and \u2018b\u2019 is the angle (180 degrees). <\/span><\/p>\n<p><span style=\"font-weight: 400;\">As another example, say a rocket is going up at a slight angle: it has a vertical speed of 5,000 m\/s, a slight speed towards the East of 10 m\/s, and a slight speed towards the North of 50 m\/s. The rocket&#8217;s velocity may be represented by the vector [10, 50, 5000], which gives the speed in each of the x, y, and z directions.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Similarly, vectors have several uses in Machine Learning, most notably to represent observations and predictions. 
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For example, say we built a Machine Learning system to classify videos into 3 categories (good, spam, clickbait) based on what we know about them. For each video, we would have a vector representing what we know about it, such as: [10.5, 5.2, 3.25, 7.0]. This vector could represent a video that lasts 10.5 minutes, for which only 5.2% of viewers watch for more than a minute, that gets 3.25 views per day on average, and that was flagged 7 times as spam. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">As you can see, each element may have a different meaning. Based on this vector, our Machine Learning system may predict that there is an 80% probability that it is a spam video, 18% that it is clickbait, and 2% that it is a good video. This could be represented as the following vector: class_probabilities = [0.8, 0.18, 0.02].<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As can be observed, vectors can be used in Machine Learning to represent observations and predictions. The properties describing the video, e.g., its duration or the percentage of viewers watching for more than a minute, are called features.\u00a0<\/span><\/p>\n<p>Since most of the time spent building machine learning models goes into data processing, it is important to be familiar with the libraries that can help in processing such data.<\/p>\n<h2>Why NumPy and Pandas over regular Python arrays?<\/h2>\n<p><span style=\"font-weight: 400;\">In Python, a vector can be represented in many ways, the simplest being a regular Python list of numbers. Since Machine Learning requires lots of scientific calculations, it is much better to use NumPy&#8217;s ndarray, which provides a lot of convenient and optimized implementations of essential mathematical operations on vectors. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Vectorized operations perform faster than matrix manipulation operations performed using loops in Python. 
For example, for a 100 x 100 matrix multiplication, vectorized operations using NumPy can be roughly two orders of magnitude faster than pure-Python loops, though the exact speed-up depends on the hardware and the NumPy build.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Some ways in which NumPy arrays differ from regular Python lists are:<\/span><\/p>\n<ol class=\"ili-indent\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">If you assign a single value to an ndarray slice, it is copied across the whole slice<\/span><\/li>\n<\/ol>\n<table style=\"margin-left: 30px;\">\n<tbody style=\"padding-left: 30px;\">\n<tr style=\"padding-left: 30px;\">\n<td style=\"padding-left: 30px;\"><span style=\"font-weight: 400;\">NumPy array<\/span><\/td>\n<td style=\"padding-left: 30px;\"><span style=\"font-weight: 400;\">Regular Python list<\/span><\/td>\n<\/tr>\n<tr style=\"padding-left: 30px;\">\n<td style=\"padding-left: 30px;\">\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; a = np.array([1, 2, 5, 7, 8])\n&gt;&gt;&gt; a[1:3] = -1\n&gt;&gt;&gt; a\narray([ 1, -1, -1,  7,  8])<\/pre>\n<\/td>\n<td style=\"padding-left: 30px;\">\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; b = [1, 2, 5, 7, 8]\n&gt;&gt;&gt; b[1:3] = -1\nTypeError: can only assign an iterable<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"padding-left: 30px;\"><span style=\"font-weight: 400;\">So, it is easier to assign a value to a slice of a NumPy array than to a slice of a regular list, where it has to be done with a loop or by building a list of the right length.<\/span><\/p>\n<ol class=\"ili-indent\" start=\"2\">\n<li><span style=\"font-weight: 400;\"> ndarray slices are actually views on the same data buffer. 
If you modify a slice, the original ndarray is modified as well.<\/span><\/li>\n<\/ol>\n<table style=\"height: 416px; margin-left: 30px;\" width=\"574\">\n<tbody style=\"padding-left: 30px;\">\n<tr style=\"padding-left: 30px;\">\n<td style=\"padding-left: 60px; text-align: left;\"><span style=\"font-weight: 400;\">NumPy array slice<\/span><\/td>\n<td style=\"padding-left: 60px; text-align: left;\"><span style=\"font-weight: 400;\">Regular Python list slice<\/span><\/td>\n<\/tr>\n<tr style=\"padding-left: 60px;\">\n<td style=\"padding-left: 60px; text-align: left;\">\n<pre class=\"lang:default decode:true\" style=\"padding-left: 30px;\">&gt;&gt;&gt; a = np.array([1, 2, 5, 7, 8])\n&gt;&gt;&gt; a_slice = a[1:5]\n&gt;&gt;&gt; a_slice[1] = 1000\n&gt;&gt;&gt; a\narray([   1,    2, 1000,    7,    8])\n# Original array was modified<\/pre>\n<\/td>\n<td style=\"padding-left: 60px;\">\n<pre class=\"lang:default decode:true\" style=\"padding-left: 30px;\">&gt;&gt;&gt; a = [1, 2, 5, 7, 8]\n&gt;&gt;&gt; b = a[1:5]\n&gt;&gt;&gt; b[1] = 3\n&gt;&gt;&gt; print(a)\n[1, 2, 5, 7, 8]\n&gt;&gt;&gt; print(b)\n[2, 3, 7, 8]\n# Original list is unchanged<\/pre>\n<p style=\"padding-left: 30px; text-align: left;\">\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"padding-left: 30px;\"><span style=\"font-weight: 400;\">If we need a copy of part of a NumPy array, we use the copy method: another_slice = a[2:6].copy(). If we then modify another_slice, a remains unchanged.<\/span><\/p>\n<ol class=\"ili-indent\" start=\"3\">\n<li><span style=\"font-weight: 400;\"> The way multidimensional arrays are accessed using NumPy is different from how they are accessed in regular Python lists. 
The generic format for indexing a two-dimensional NumPy array is:<\/span><\/li>\n<\/ol>\n<p style=\"padding-left: 30px;\"><span style=\"font-weight: 400;\">Array[row_start_index:row_end_index, column_start_index:column_end_index]<\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"font-weight: 400;\">NumPy arrays can also be accessed using boolean indexing. For example,<\/span><\/p>\n<pre class=\"lang:default decode:true\" style=\"padding-left: 420px;\">&gt;&gt;&gt; a = np.arange(12).reshape(3, 4)\n&gt;&gt;&gt; a\narray([[ 0,  1,  2,  3],\n       [ 4,  5,  6,  7],\n       [ 8,  9, 10, 11]])\n&gt;&gt;&gt; rows_on = np.array([True, False, True])\n&gt;&gt;&gt; a[rows_on, :]  # Rows 0 and 2, all columns\narray([[ 0,  1,  2,  3],\n       [ 8,  9, 10, 11]])<\/pre>\n<p><span style=\"font-weight: 400;\">NumPy arrays are capable of performing all basic operations such as addition, subtraction, element-wise product, matrix dot product, element-wise division, element-wise modulo, element-wise exponents and conditional operations.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">An important feature of NumPy arrays is broadcasting.<\/span><\/p>\n<p><img class=\"aligncenter wp-image-1014 size-full\" title=\"Broadcasting - pandas for machine learning\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2017\/12\/Screen-Shot-2017-12-13-at-5.57.21-PM.png\" alt=\"broadcasting - pandas for machine learning\" width=\"597\" height=\"414\" srcset=\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2017\/12\/Screen-Shot-2017-12-13-at-5.57.21-PM.png 597w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2017\/12\/Screen-Shot-2017-12-13-at-5.57.21-PM-300x208.png 300w\" sizes=\"(max-width: 597px) 85vw, 597px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">In general, when NumPy expects arrays of the same shape but finds that this is not the case, it applies the so-called broadcasting rules.<\/span><\/p>\n<p><span 
style=\"font-weight: 400;\">Basically, there are 2 rules of broadcasting to remember:<\/span><\/p>\n<ol class=\"ili-indent\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">If the arrays do not have the same rank, a 1 is prepended to the shape of the lower-rank array until the ranks match. For example, when adding arrays A and B of shapes (3, 3) and (3,) [rank 2 and rank 1], a 1 is prepended to the shape of array B to make it (1, 3) [rank 2]. Two dimensions are compatible when they are equal or when one of them is 1.\u00a0<\/span><\/li>\n<li><span style=\"font-weight: 400;\">When one of the two dimensions being compared is 1, the other is used. In other words, dimensions of size 1 are stretched or \u201ccopied\u201d to match the other. For example, when adding a 2D array A of shape (3, 3) to a 2D ndarray B of shape (1, 3), NumPy applies this rule: it stretches array B by replicating its single row 3 times, giving it shape (3, 3), and then performs the operation.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">NumPy provides basic mathematical and statistical functions such as mean, min, max, sum, prod, std and var, aggregation along different axes, matrix transposition, etc. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">A NumPy feature of particular interest is solving a system of linear equations, for which it provides a dedicated function. 
For example,<\/span><\/p>\n<pre class=\"lang:default decode:true \" title=\"Linear equations\">2x + 6y = 6\n5x + 3y = -9<\/pre>\n<p><span style=\"font-weight: 400;\">can be solved in NumPy using:<\/span><\/p>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; coeffs  = np.array([[2, 6], [5, 3]])\n&gt;&gt;&gt; depvars = np.array([6, -9])\n&gt;&gt;&gt; solution = np.linalg.solve(coeffs, depvars)\n&gt;&gt;&gt; solution\narray([-3.,  2.])<\/pre>\n<h2>What is Pandas?<\/h2>\n<p><span style=\"font-weight: 400;\">Like NumPy, Pandas is one of the most widely used Python libraries in data science. It provides high-performance, easy-to-use data structures and data analysis tools. Unlike the NumPy library, which provides objects for multi-dimensional arrays, Pandas provides an\u00a0in-memory 2D table object called DataFrame. It is like a spreadsheet with column names and row labels.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Hence, with 2D tables, Pandas is capable of providing many additional functionalities like creating pivot tables, computing columns based on other columns and plotting graphs. Pandas can be imported into Python using:<\/span><\/p>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; import pandas as pd<\/pre>\n<p><span style=\"font-weight: 400;\">Some commonly used data structures in Pandas are:<\/span><\/p>\n<ol class=\"ili-indent\">\n<li><b>Series objects<\/b><span style=\"font-weight: 400;\">: 1D array, similar to a column in a spreadsheet <\/span><\/li>\n<li><b>DataFrame objects:<\/b><span style=\"font-weight: 400;\"> 2D table, similar to a spreadsheet<\/span><\/li>\n<li><b>Panel objects:<\/b><span style=\"font-weight: 400;\"> Dictionary of DataFrames, similar to a workbook of sheets in MS Excel (Panel was deprecated and has been removed from recent Pandas releases)<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">A Pandas Series object is created using the pd.Series function. Each element has an index label, which by default is an integer starting from 0. 
Like NumPy, Pandas also provides basic mathematical functionality such as addition, subtraction, conditional operations and broadcasting.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A Pandas DataFrame object represents a spreadsheet with cell values, column names, and row index labels. A DataFrame can be thought of as a dictionary of Series. DataFrame rows and columns are simple and intuitive to access. Pandas also provides SQL-like functionality to filter and sort rows based on conditions. For example,<\/span><\/p>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; people_dict = {\n...     \"weight\": pd.Series([68, 83, 112], index=[\"alice\", \"bob\", \"charles\"]),\n...     \"birthyear\": pd.Series([1984, 1985, 1992], index=[\"bob\", \"alice\", \"charles\"], name=\"year\"),\n...     \"children\": pd.Series([0, 3], index=[\"charles\", \"bob\"]),\n...     \"hobby\": pd.Series([\"Biking\", \"Dancing\"], index=[\"alice\", \"bob\"]),\n... }<\/pre>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; people = pd.DataFrame(people_dict)\n&gt;&gt;&gt; people<\/pre>\n<p><img class=\"size-full wp-image-1015 alignleft\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2017\/12\/table-1.png\" alt=\"\" width=\"283\" height=\"106\" \/><\/p>\n<pre class=\"lang:default decode:true \">&gt;&gt;&gt; people[people[\"birthyear\"] &lt; 1990]<\/pre>\n<p><img class=\"size-full wp-image-1016 alignleft\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2017\/12\/table-2.png\" alt=\"\" width=\"263\" height=\"74\" \/><\/p>\n<p><span style=\"font-weight: 400;\">New columns and rows can be easily added to the DataFrame. Beyond these basics, a DataFrame can also be sorted by a particular column.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">DataFrames can also be easily exported to and imported from CSV, Excel, JSON, HTML and SQL databases. 
Some other essential DataFrame methods are:<\/span><\/p>\n<ol class=\"ili-indent\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>head():<\/strong> returns the first 5 rows of the DataFrame by default<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>tail():<\/strong> returns the last 5 rows of the DataFrame by default<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>info():<\/strong> prints a summary of the DataFrame<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><strong>describe():<\/strong> gives a quick overview of the main aggregated values for each column<\/span><\/li>\n<\/ol>\n<h2>What is Matplotlib?<\/h2>\n<p><span style=\"font-weight: 400;\">Matplotlib is a 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments. Matplotlib can be used in Python scripts, the Python and IPython shells, Jupyter Notebook, web application servers and GUI toolkits.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">matplotlib.pyplot is a collection of functions that make Matplotlib work like MATLAB. The majority of plotting commands in pyplot have MATLAB analogs with similar arguments. 
Let us take a couple of examples:<\/span><\/p>\n<table style=\"height: 438px;\" width=\"829\">\n<tbody>\n<tr>\n<td><span style=\"font-weight: 400;\">Example 1: Plotting a line graph<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Example 2: Plotting a histogram<\/span><\/td>\n<\/tr>\n<tr>\n<td>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; import matplotlib.pyplot as plt\n&gt;&gt;&gt; plt.plot([1,2,3,4])\n&gt;&gt;&gt; plt.ylabel('some numbers')\n&gt;&gt;&gt; plt.show()<\/pre>\n<p><img class=\"aligncenter wp-image-1017\" title=\"line graph - matplotlib - numpy and pandas for machine learning\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2017\/12\/pastedImage0-2.png\" alt=\"line graph - matplotlib - numpy and pandas for machine learning\" width=\"400\" height=\"246\" srcset=\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2017\/12\/pastedImage0-2.png 620w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2017\/12\/pastedImage0-2-300x185.png 300w\" sizes=\"(max-width: 400px) 85vw, 400px\" \/><\/td>\n<td>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; import matplotlib.pyplot as plt\n&gt;&gt;&gt; x = [21,22,23,4,5,6,77,8,9,10,31,32,33,34,35,36,37,18,49,50,100]\n&gt;&gt;&gt; num_bins = 5\n&gt;&gt;&gt; plt.hist(x, num_bins, facecolor='blue')\n&gt;&gt;&gt; plt.show()<\/pre>\n<p><img class=\"aligncenter wp-image-1020 size-full\" title=\"matplotlib - histograms - pandas for machine learning\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2017\/12\/pastedImage0-3.png\" alt=\"matplotlib - histograms - pandas for machine learning\" width=\"372\" height=\"252\" srcset=\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2017\/12\/pastedImage0-3.png 372w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2017\/12\/pastedImage0-3-300x203.png 300w\" sizes=\"(max-width: 372px) 85vw, 372px\" \/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Summary<\/h2>\n<div id=\":82.ma\" class=\"Mu SP\" title=\"December 13, 2017 at 7:20:19 PM 
UTC+5:30\"><span id=\":82.co\" class=\"tL8wMe EMoHub\" dir=\"ltr\">Hence, we observe that NumPy and Pandas make matrix manipulation easy. This flexibility makes them very useful in Machine Learning model development. <\/span><\/div>\n<div title=\"December 13, 2017 at 7:20:19 PM UTC+5:30\"><\/div>\n<div class=\"Mu SP\" title=\"December 13, 2017 at 7:20:19 PM UTC+5:30\"><span id=\":82.co\" class=\"tL8wMe EMoHub\" dir=\"ltr\">Check out the course on <a href=\"https:\/\/cloudxlab.com\/course\/92\/python-for-machine-learning\">Python for Machine Learning<\/a> by CloudxLab. You can find in-depth video tutorials on NumPy, Pandas, and Matplotlib in the course.<\/span><\/div>\n<div title=\"December 13, 2017 at 7:20:19 PM UTC+5:30\"><\/div>\n<div id=\":84.ma\" class=\"Mu SP\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Python is increasingly being used as a scientific language. Matrix and vector manipulations are extremely important for scientific computations. Both NumPy and Pandas have emerged to be essential libraries for any scientific computation, including machine learning, in python due to their intuitive syntax and high-performance matrix computation capabilities. In this post, we will provide an &hellip; <a href=\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;NumPy and Pandas Tutorial &#8211; Data Analysis with Python&#8221;<\/span><\/a><\/p>\n","protected":false},"author":11,"featured_media":1099,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[14],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v16.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>NumPy and Pandas Tutorial - Data Analysis with Python | CloudxLab Blog<\/title>\n<meta name=\"description\" content=\"In this free guide, we will learn basics of NumPy and Pandas. 
NumPy and Pandas are essential for building machine learning models in python.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"NumPy and Pandas Tutorial - Data Analysis with Python | CloudxLab Blog\" \/>\n<meta property=\"og:description\" content=\"In this free guide, we will learn basics of NumPy and Pandas. NumPy and Pandas are essential for building machine learning models in python.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/\" \/>\n<meta property=\"og:site_name\" content=\"CloudxLab Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cloudxlab\" \/>\n<meta property=\"article:published_time\" content=\"2017-12-13T14:06:49+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-06-21T08:26:27+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2017\/12\/numpypandasfeatureimage-3.png\" \/>\n\t<meta property=\"og:image:width\" content=\"869\" \/>\n\t<meta property=\"og:image:height\" content=\"400\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:site\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\">\n\t<meta name=\"twitter:data1\" content=\"11 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"CloudxLab Blog\",\"description\":\"Learn AI, Machine Learning, Deep Learning, Devops &amp; Big Data\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/cloudxlab.com\/blog\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2017\/12\/numpypandasfeatureimage-3.png\",\"contentUrl\":\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2017\/12\/numpypandasfeatureimage-3.png\",\"width\":869,\"height\":400},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/#webpage\",\"url\":\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/\",\"name\":\"NumPy and Pandas Tutorial - Data Analysis with Python | CloudxLab Blog\",\"isPartOf\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/#primaryimage\"},\"datePublished\":\"2017-12-13T14:06:49+00:00\",\"dateModified\":\"2021-06-21T08:26:27+00:00\",\"author\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/c9dca0fc0717ada7ba3e1c7900af14c6\"},\"description\":\"In this free guide, we will learn basics of NumPy and Pandas. 
NumPy and Pandas are essential for building machine learning models in python.\",\"breadcrumb\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"Home\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/numpy-pandas-introduction\/#webpage\"}}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/c9dca0fc0717ada7ba3e1c7900af14c6\",\"name\":\"Pratik\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f326165c14a3c2a5313ac1b2b60886cb?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f326165c14a3c2a5313ac1b2b60886cb?s=96&d=mm&r=g\",\"caption\":\"Pratik\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/1000"}],"collection":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/comments?post=1000"}],"version-history":[{"count":16,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/1000\/revisions"}],"predecessor-version":[{"id":3593,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/1000\/revisions\/3593"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media\/1099"}],"wp:attachment":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media?parent=1000"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/categories?post=1000"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/tags?post=1000"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}