Machine Learning Prerequisites (Numpy)

NumPy, Pandas, Matplotlib in Python

Overview

As part of this session, we will learn the following:

  1. What is NumPy?
  2. Indexing and accessing NumPy arrays
  3. Linear Algebra with NumPy
  4. Basic Operations on NumPy arrays
  5. Broadcasting in NumPy arrays
  6. Mathematical and statistical functions on NumPy arrays
  7. What is Pandas?
  8. Pandas - Series Objects
  9. Pandas - DataFrame Objects
  10. Matplotlib - Overview
  11. Matplotlib - pyplot Module

Recording of Session


Slides

Code Repository for the course on GitHub



114 Comments

Is this video for all the lessons in this NumPy topic, or do I still need to follow them one by one through all 32 lessons?

  Upvote    Share

Hi,

This is a video lecture, and others are MCQs and coding exercises.

  Upvote    Share

What are MCQs?

And what is the difference between this video and the other 32 lessons in this topic?

  Upvote    Share

Hi Dezuk,

Why don't you go further and explore it on your own.

  Upvote    Share

import numpy as np
def multiply_loops(A, B):
    C = np.zeros((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
         for j in range(B.shape[0]):
                print(A)


A = np.array([1, 2, 5, 7, 8])
B = np.array([1, 2, 5, 7, 8])
multiply_loops(A,B)

In the above code, A is printed 25 times. Is this because the nested loops run A.shape[0] x B.shape[0] times? Kindly clarify.

  Upvote    Share

Yes, you are correct. In the code you provided, the nested loops iterate over the dimensions of A and B, respectively. Since both A and B have a shape of (5,), the loops will iterate 5 times each. As a result, the print statement inside the nested loops will execute a total of 5 * 5 = 25 times.

 1  Upvote    Share

Could you tell me where 'housing.csv' is located? for me to load the csv in jupyter notebook?

  Upvote    Share

It is located at /cxldata/datasets/project/housing/housing.csv

  Upvote    Share

%matplotlib inline
import matplotlib,pyplot as plt
temp=[1.2,2.3,3.4,5.5,6.7]
s7=pd.Series(temp,name="temp")
s7.plot()
plt.show()

I am unable to plot a graph; it says module not found. Kindly help. Also, please explain why we wrote name="temp" while creating the Series.

Thanks

  Upvote    Share

Hi Smriti,

There is a typo in "import matplotlib,pyplot as plt". It should be "import matplotlib.pyplot as plt". You have used ',' (comma) instead of '.' (dot).

We use the 'name' parameter to give the Series a name.
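
A minimal sketch of what the name does, assuming pandas is installed and imported as pd (the original post omits that import). The name is stored on the Series and is reused as the column header when the Series is turned into a DataFrame:

```python
import pandas as pd

temp = [1.2, 2.3, 3.4, 5.5, 6.7]
s7 = pd.Series(temp, name="temp")

print(s7.name)              # the label given through name=
frame = s7.to_frame()       # the name becomes the column header
print(list(frame.columns))

# Plotting needs the corrected import (dot, not comma):
# import matplotlib.pyplot as plt
# s7.plot(); plt.show()
```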

  Upvote    Share

Got the answer ..Thanks

  Upvote    Share

Hi Team,

While trying to add two Series, s2 and s3, I am getting NaN in the integer column.

I created two Series: s2 with even numbers [2,4,6,8] and s3 with odd numbers [1,3,5,7].

I added them with s2+s3.

Please suggest.

  Upvote    Share

Hi,

My question is about solving linear algebra equations. I ran the steps below one by one.

coeffs  = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])
solution = linalg.solve(coeffs, depvars)

Out[13]:

array([-3.,  2.])

But when I try to validate the solution, both dot products run without error.

 

np.dot(solution,coeffs)   --- 1*2 . 2*2

result = 

array([  4., -12.]).   -- which is wrong 

np.dot(coeffs,solution).   -- 2*2 . 1*2

result =

array([ 6., -9.]) -- correct result 

 

But how is this dot product possible?

Here the number of columns in the first matrix and the number of rows in the second matrix are not equal.

  Upvote    Share

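That is because solution is a 1-D array, which has no row or column orientation. In a product with a 2-D array, NumPy treats a 1-D array as a row vector when it is on the left and as a column vector when it is on the right, so both products are defined.

Also, the dot product is calculated between two vectors. When considering matrices, it is called a matrix product.

np.matrix is no longer recommended by the NumPy documentation; to avoid such confusion, keep using np.array and give the vector an explicit 2-D shape, for example solution.reshape(-1, 1).

Also, array([ 4., -12.]) is the correct result for np.dot(solution, coeffs).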

  Upvote    Share

Thanks for the clarification. The dependent variables are 6, -9, so the correct result is [ 6., -9.], right?

  Upvote    Share

Yeah,

I was talking about the dot product of solution and coefficients.

array([  4., -12.]).   -- which is wrong 

This is the right one.
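
A short sketch of the check discussed in this thread, using np.linalg.solve and the @ operator. The names check and other are illustrative only:

```python
import numpy as np

coeffs = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])
solution = np.linalg.solve(coeffs, depvars)

# Plugging the solution back into the equations: this should give depvars.
check = coeffs @ solution

# A different product: solution is used as a row vector on the left.
other = solution @ coeffs

print(solution)
print(check)
print(other)
```

Only coeffs @ solution validates the solution; solution @ coeffs is a valid product too, but it answers a different question.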

 1  Upvote    Share

Hi, 

I saved a notebook under my name. When I refreshed the page, the notebook was gone. How can I open it in the Playground beside the video?

  Upvote    Share

Hi,

You can go to My Lab and click on Jupyter notebook. You will find all your files, folders and notebooks there.

Thanks.

  Upvote    Share

But it is opening in the new tab. I want to open in the playground. 

  Upvote    Share

Hi,

Only the default notebook for this topic would open in the playground.

Thanks.

  Upvote    Share

hi can i get the link to find the GYB project mentioned in the video at the end?

  Upvote    Share

Hi,

This is the same project as Churn Emails, with a few changes. We have saved the dataset locally instead of the user having to download it, you will find all the instructions on how to access that dataset in the project itself. Also, we are using a different dataset. Let me know once you take a look at that.

Thanks.

  Upvote    Share

Referring to the file Python - Numpy.ipynb, I know its path on GitHub. Please let me know its path in the Linux console.

  Upvote    Share

Hi,

You could clone the repo into Linux. Then you could find the notebook.

Thanks.

 1  Upvote    Share

Sometimes we use a list and sometimes we use a tuple for creating an array.

What is the difference?

  Upvote    Share

Hi,

Lists are mutable meaning the elements of a list can be changed. In contrast, Tuples are immutable meaning items can't be replaced or changed. Tuples are mostly used in cases where the information is known and not changed, while lists are used when information is unknown or we need to manipulate the items. For example, we could use tuples as keys in a dictionary because the keys are expected to be immutable. Also,  tuples can be used instead of lists for faster execution, provided the data is to remain constant without any changes. More on tuples and lists.
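
A small sketch of the points above. For np.array itself, a list and a tuple with the same values give the same array; the difference lies in the Python objects. The variable names are illustrative only:

```python
import numpy as np

# The same values in a list or a tuple produce equal arrays.
from_list = np.array([1, 2, 3])
from_tuple = np.array((1, 2, 3))
same = np.array_equal(from_list, from_tuple)

# Lists can be changed in place.
items = [1, 2, 3]
items[0] = 10

# Tuples cannot: item assignment raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Being immutable, a tuple can serve as a dictionary key.
shape_names = {(2, 3): "2 rows, 3 columns"}

print(same, items, tuple_changed, shape_names[(2, 3)])
```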

Hope this helps.

Thanks.

  Upvote    Share

Sir,

I get an error when executing the program as suggested in the PDF. Please help.

  Upvote    Share

Hi,

linalg is from numpy. It should be np.linalg.

Thanks.

  Upvote    Share

Hi Team,

Please assist for given below issue: 

row_condtn = np.array([True,False,True,False])
z = np.arange(6)
z

out[86] = array([0, 1, 2, 3, 4, 5])

When I run the given script "z[row_condtn,:]", I get this error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-87-efdf230bbdd9> in <module>
      1 row_condtn = np.array([True,False,True,False])
      2 z = np.arange(6)
----> 3 z[row_condtn,:]

IndexError: too many indices for array
  Upvote    Share

Hi,

The error is self-explanatory. "Too many indices" means that you have supplied more indices than the array has dimensions. Here z is one-dimensional, so it accepts a single index, but z[row_condtn,:] supplies two.
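
A minimal sketch of two ways the indexing can work. The array z2 and the mask mask6 are hypothetical and not part of the original post: a 1-D array takes one boolean mask of its own length, while the row-and-column form needs a 2-D array whose number of rows matches the mask.

```python
import numpy as np

row_condtn = np.array([True, False, True, False])

# 1-D case: a single index, and the mask must have 6 entries like z1.
z1 = np.arange(6)
mask6 = np.array([True, False, True, False, False, True])
picked_1d = z1[mask6]

# 2-D case: 4 rows, so the 4-entry mask can select rows, and ':' keeps all columns.
z2 = np.arange(12).reshape(4, 3)
picked_rows = z2[row_condtn, :]

print(picked_1d)
print(picked_rows)
```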

Thanks.

  Upvote    Share

Sir,

I do not understand the dimension (the .ndim attribute in Python; how does it work?). I read a few descriptions on the internet and tried to apply the concept of dimension in a CloudxLab notebook. The code I used is:

np.random.rand(2,2,3).ndim

It gave the result '3', whereas from what I understood of different texts it should have given '2'. Please help me understand, or guide me to a text with proper context.

Thank You

  Upvote    Share

I am attaching the screenshot as well. Need Help!

  Upvote    Share

Hi,

The ndim attribute returns the number of dimensions of an array. Try the following and observe the output:

import numpy as np
  
arr1 = np.array([1, 2, 3, 4]) 
dim1 = arr1.ndim 
print(dim1)

arr2 = np.array([1, 2], [3, 4]) 
dim2 = arr2.ndim 
print(dim2)

Thanks.

 1  Upvote    Share

Sir,

Thank you for the reply. I had tried these types of commands before, while learning from those texts, and the results matched the texts exactly. But not when I tried with random.

I have tried your code too. It ran perfectly for the first one but threw an error for the second one ({TypeError: data type not understood}).

I would request you to look at the attached picture and please tell me why it is showing '3' and not '2'. Is it because of the use of random?

  Upvote    Share

 

  Upvote    Share

My actual question is still: what is a dimension?

E.g.:

import numpy as np

u = np.arange((2, 3, 4) , 1)

So the number of rows of the above matrix would be 3 and the number of columns would be 4. Subsequently the dimension would be 2, right? I ran the first code you provided and it gave a dimension of 1. Then why is the random matrix I generated not giving the desired dimension of '2'?

  Upvote    Share

I think I found the answer I was looking for by trial and error. My previous understanding of dimension was wrong, and I think I now mostly understand what dimension means in this case.

Please just tell me what went wrong with the execution of the second code that you provided.

Thank You.

  Upvote    Share

Hi,

My bad. It should be as follows:

arr2 = np.array([[1, 2], [3, 4]]) 
dim2 = arr2.ndim 
print(dim2)
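
To connect this with the np.random.rand(2,2,3) question earlier in the thread, here is a brief sketch: ndim counts the axes, which is the number of entries in shape, not the number of rows or columns. The names arr and mat are illustrative only.

```python
import numpy as np

arr = np.random.rand(2, 2, 3)   # three sizes given, so three axes
print(arr.shape)                # (2, 2, 3)
print(arr.ndim)                 # 3

mat = np.random.rand(3, 4)      # two sizes given: 3 rows, 4 columns
print(mat.shape)                # (3, 4)
print(mat.ndim)                 # 2
```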

Thanks.

 1  Upvote    Share

How do I add column names to this dataframe?

  Upvote    Share

Hi,

You can try the following code:

df.columns = ['name1', 'name2', 'name3']
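
A minimal sketch, assuming a small hypothetical DataFrame with three columns. The list assigned to df.columns must have exactly one name per column, otherwise pandas raises a "Length mismatch" ValueError; rename() is an alternative when only some names need to change.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])   # hypothetical data, 3 columns
df.columns = ["name1", "name2", "name3"]     # one name per column

# Too few (or too many) names raises ValueError.
try:
    df.columns = ["a", "b"]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Renaming a single column returns a new DataFrame.
renamed = df.rename(columns={"name1": "first"})

print(list(df.columns), mismatch_raised, list(renamed.columns))
```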

Thanks.

  Upvote    Share
ValueError: Length mismatch: Expected axis has 5 elements, new values have 6 elements
  Upvote    Share

what is %matplotlib inline for?

 1  Upvote    Share

Hi,

%matplotlib inline sets the backend of matplotlib to the 'inline' backend: With this backend, the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. 

Thanks.

  Upvote    Share
coeffs.dot(solution), depvars

or

np.dot(solution,depvars)

Are both the code same ? What's the difference ?

  Upvote    Share

Hi,

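The two are not the same. coeffs.dot(solution) is the method form of np.dot(coeffs, solution) and reproduces depvars, which is why the two are shown side by side. np.dot(solution, depvars) computes the inner product of the two vectors, which is a single number. You can find more about it from the below link: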

https://numpy.org/doc/stable/reference/generated/numpy.dot.html

Thanks.

  Upvote    Share

Is there a way to generate random integers ?

np.random.rand(2,3) -> this one generates only decimals.

  Upvote    Share

Hi,

You can try randint().
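
For instance, a short sketch with np.random.randint, whose upper bound is exclusive. The range 0 to 9 and the 2 x 3 shape are arbitrary choices for illustration:

```python
import numpy as np

# Random integers from 0 up to (but not including) 10, in a 2 x 3 array.
ints = np.random.randint(0, 10, size=(2, 3))
print(ints)

# The newer Generator API offers the same through integers().
rng = np.random.default_rng()
more_ints = rng.integers(0, 10, size=(2, 3))
print(more_ints)
```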

Thanks.

 1  Upvote    Share
a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])
for func in (a.min, a.max, a.sum, a.prod, a.std, a.var):
    print(func.__name__, "=", func())

Hi,

Can you help me understand the above piece of code, starting from the for loop?

I would appreciate a little detail. Thank you.

  Upvote    Share

Hi,

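The loop goes over a tuple of bound methods of a (a.min, a.max and so on). On each pass, func.__name__ gives the method's name (e.g. max, min) and func() calls it, so we print each name next to the value it returns for the given array.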

Thanks.

  Upvote    Share

Hi sir, in the NumPy, Pandas, Matplotlib in Python video at 48:54, selected rows are chosen from an array. How can we select specific columns from an array?

Thank you.

 

 1  Upvote    Share

Hi,

Try this:

>>> import numpy as np
>>> A = np.array([[1,2,3,4],[5,6,7,8]])

>>> A
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

>>> A[:,2] # returns the third column
array([3, 7])
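
Building on the same array, a short sketch of selecting several columns at once. The variable names are illustrative only:

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

third = A[:, 2]                  # a single column, returned as a 1-D array
first_and_third = A[:, [0, 2]]   # a list of column positions
middle = A[:, 1:3]               # a slice of adjacent columns

# A boolean mask with one entry per column also works.
col_mask = np.array([False, True, False, True])
masked = A[:, col_mask]

print(third, first_and_third, middle, masked, sep="\n")
```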

Thanks.

  Upvote    Share

Q 1 - Why is weight the 1st column? How did birthyear become the 1st column?

Q 2 - None of the operations (and, or, ',', ':', etc.) is working. How do I perform these operations? Please suggest.

  Upvote    Share

Hi,

For your queries, without looking at the code it would not be possible for me to understand why is it throwing this error. If you look at the first query for example, I can see that you have defined people_dict, however in the next cell you are trying to print people, which is not viewable. I would suggest you to post your query in the forum or the WhatsApp group with the complete code for peer review.

However, you can also refer to the below notebook for reference:

https://github.com/cloudxlab/ml/blob/master/python/Python%20-%20Pandas.ipynb

Thanks.

  Upvote    Share

What does this line do?

%matplotlib inline;

Is it necessary to include it? When I include this line, it shows me "KeyError-inline".

Without it, the plot is created.

Please help me understand the code and the error better.

  Upvote    Share

Hi,

%matplotlib inline sets the backend of matplotlib to the 'inline' backend: With this backend, the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. The resulting plots will then also be stored in the notebook document.

Thanks.

  Upvote    Share

At 2:28:33 in the above-recorded session, the following code has been mentioned:

def myfuction(x):
    print(x)
    return x["Weight"]*2
people.assign(square = myfuction)

Here, the function is passed to assign() without any argument. Could you please explain?

 1  Upvote    Share

Hi,

Good question!

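When a function is passed to assign(), pandas calls it for you with the DataFrame itself as the argument, so inside the function x is the whole DataFrame and the returned values become the new column. That is why no argument is passed to the function explicitly.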
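
A minimal sketch of this behaviour, assuming a small hypothetical people DataFrame with a Weight column. The list seen is only there to record what assign() hands to the function:

```python
import pandas as pd

# Hypothetical stand-in for the lecture's people DataFrame.
people = pd.DataFrame({"Weight": [68, 83, 112]},
                      index=["alice", "bob", "charles"])

seen = []

def myfunction(x):
    seen.append(type(x).__name__)   # record what assign() passed in
    return x["Weight"] * 2

# The function is passed by name, without parentheses; assign() calls it.
result = people.assign(square=myfunction)

print(seen)                      # what the function received
print(result)
print("square" in people.columns)   # assign() returned a new DataFrame
```

Because assign() returns a new DataFrame, the result has to be stored (as in result above) for the new column to be kept.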

Thanks.

  Upvote    Share

Please tell me why this is not broadcasting and gives an error instead.

import numpy as np

x=np.array([[[1,2,3],[4,5,6],[3,2,1]]])
y=np.array([[2,3,4],[5,6,7]])
x,y

print(x+y)
print(x*y)
print(x@y)

Error is below

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-8415b99d14f1> in <module>
----> 1 print(x+y)
      2 print(x*y)
      3 print(x@y)
      4 
      5 

ValueError: operands could not be broadcast together with shapes (1,3,3) (2,3) 
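
The error above follows from the broadcasting rule: shapes are compared from the last axis backwards, and each pair of sizes must be equal or contain a 1. For (1, 3, 3) and (2, 3) the last axes match (3 and 3), but the next pair (3 and 2) does not. A minimal sketch, where y3 is a hypothetical three-row replacement for y:

```python
import numpy as np

x = np.array([[[1, 2, 3], [4, 5, 6], [3, 2, 1]]])   # shape (1, 3, 3)
y = np.array([[2, 3, 4], [5, 6, 7]])                 # shape (2, 3)

# Shapes (1, 3, 3) and (2, 3) are incompatible for element-wise operations.
try:
    x + y
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# With three rows, (3, 3) broadcasts against (1, 3, 3).
y3 = np.array([[2, 3, 4], [5, 6, 7], [8, 9, 10]])
total = x + y3

# For the matrix product, the inner sizes must match: (3, 3) @ (3, 2).
prod = x @ y.T

print(broadcast_ok, total.shape, prod.shape)
```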
  Upvote    Share

This comment has been removed.

Hi,

It does! Please go through the below link for more details:

https://stackoverflow.com/questions/20924085/python-conversion-between-coordinates

Thanks.

  Upvote    Share

In the discussion/PPT of Numpy "Summing across different axes" , we can use c.sum(axis = 0).

 

What does 0 indicate here ? 

In the following website , they are indicating axis =1 as Column and 0 as Row. 

https://www.sharpsightlabs.com/blog/numpy-axes-explained/

Please confirm if my understanding is correct.

  Upvote    Share

Hi,

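Yes, you are correct! axis=0 refers to the rows and axis=1 to the columns. Summing along an axis collapses it, so c.sum(axis=0) adds down the rows and returns one total per column, while c.sum(axis=1) returns one total per row.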
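
A small sketch with a hypothetical 2 x 3 array c, showing which axis disappears in each case:

```python
import numpy as np

c = np.array([[1, 2, 3],
              [4, 5, 6]])       # 2 rows, 3 columns

col_sums = c.sum(axis=0)        # collapses the rows: one total per column
row_sums = c.sum(axis=1)        # collapses the columns: one total per row

print(col_sums)
print(row_sums)
```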

Thanks.

  Upvote    Share

What is the use of % in the cell like %matplotlib inline?

  Upvote    Share

Hi,

These are called magic commands. Magic commands are enhancements added on top of normal Python code, and they are provided by the IPython kernel. They were added to solve common problems and to provide a few shortcuts in your code.

Thanks.

  Upvote    Share

In the play area, how can I open my own file instead of the default one? If I try to open it using File -> Open in the play area, it opens in a new window altogether.

  Upvote    Share

Hi,

You won't be able to open a new file here. These are default files identified by the assessment engine, you need to complete your assignments here so that your codes can be automatically detected.

Thanks.

  Upvote    Share

 Issue with assign method

#Add a column using Assign method
def myFunc(x):
    print(x);
    return x['Height']**2;
people.assign(square = myFunc)
print(people)  #Doesn't show the column 'square'

Please tell me why is it so?

  Upvote    Share

Hi,

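assign() is used with DataFrames. Is "people" a DataFrame object? If it is, note that assign() returns a new DataFrame and leaves the original unchanged, so the result has to be stored, for example people = people.assign(square = myFunc). Also, could you please tell me which assignment this is part of?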

Thanks.

  Upvote    Share

Dear Cloud X Team,

While taking the lecture , it has been mentioned that ---

# Dataframe Objects can be created out of the Series Objects either by ---

# a) Concatenating the Series Objects or by
# b) Creating & Passing a Dictionary of Series Objects.

The 2nd method of Creating & Passing SO (Series Object) is explicitly mentioned & clearly explained in the given lecture.

However, am not very clear regarding the 1st point---as this point has not been dwelt upon or whether I have missed, am not very clear about it.

Kindly request you to share some illustration codes with 1 or 2 examples, so that my understanding about this sub- topic doubt becomes clearer.. Kindly email me at ss7dec@gmail.com.

Looking forward to your response.

  Upvote    Share

Hi, Sameer.

Good questions.

1) Dictionary to series object.
import pandas as pd
weights = {"alice": 57, "bob": 78, "colin": 86, "darwin": 68}
a1 = pd.Series(weights)
a2 = pd.Series(weights, index=["colin", "alice"])

a3 = a1 + a2 --> Adding two Series aligns them by index label and returns a Series over the union of the labels; any entry that is present in one Series and missing in the other gives "NaN".

2) Using Multiple Series.

people_dict = {
    "weight": pd.Series([68, 83, 112], index=["alice", "bob", "charles"]),
    "birthday": pd.Series([1984, 1991, 1982], index=["bob", "alice", "charles"], name="year"),
}

Passing this dictionary to pd.DataFrame(people_dict) will create a DataFrame.
The same example is already given in the tutorial; you can practise it from there, or you can take the code from here: https://github.com/cloudxla...
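
For the first method asked about (concatenating Series objects), one possible sketch uses pd.concat with axis=1. This is an assumption about what the lecture meant rather than the lecture's own code, and the Series names weight and birthyear are illustrative:

```python
import pandas as pd

weight = pd.Series([68, 83, 112],
                   index=["alice", "bob", "charles"], name="weight")
birthyear = pd.Series([1984, 1991, 1982],
                      index=["bob", "alice", "charles"], name="birthyear")

# axis=1 places the Series side by side as columns, aligned on index labels;
# each Series name becomes the column name.
people = pd.concat([weight, birthyear], axis=1)
print(people)
```

The rows are matched by label, not by position, so alice gets 1991 even though it is the second value in birthyear.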

All the best!

-- Satyajit Das

  Upvote    Share

In this course, are we going to learn linear regression, multiple linear regression and many more?

  Upvote    Share

Hi, Deep.

Yes, you will learn all ML and DL algorithms along with projects.
You will be able to find the linear regression playlist below: https://cloudxlab.com/asses...
All the best!

-- Satyajit Das

  Upvote    Share

Could you please suggest some good resources (books or other documentations) for NumPy and Pandas?

  Upvote    Share

Dear Cloud X Team,

I was looking at various ways & methods of importing the dataset -housing. It is present in R Studio as a free resource.

I tried doing it....But failed to import it in Python.

Rather I switched over to Iris.csv dataset from R Studio which is generating favourable results.

Kindly have a look at my screenshots.

The 3rd screenshot that I'm sharing showcases the codes that I attempted.But generating errors wrt to Housing.csv dataset . 4 th screenshot am unable to upload here will upload subsequently.

I tried even using seaborn & sckitplot libraries to Import the Housing dataset after referring to certain websites.

Q1. I would like to know whether the Housing dataset is taken from any specific library in Python.

Q2. Could you check whether any of my code could be modified so as to import the Housing dataset from the desired Python library? (Screenshot given in the next message following this.)

  Upvote    Share

Continuation from my 1st reply in this series regarding Housing.csv. 4th image showcasing other attempted options...

  Upvote    Share

Dear Team,

I didn't get any update regarding Housing dataset.

Q From which Python library has this dataset been taken? I would like to import in Jupyter notebooks from R tool by using/invoking statsmodel.api in Jupyter Notebook or in Spyder IDE?

Awaiting your favourable response.

  Upvote    Share

Hi Cloud X Lab Team,

I am getting an error in this code. It says index has not been defined. This from the topic Pandas video.. It is NOT an Assignment but a Video recording, Following is the screenshot :

:

Can you point out where is my mistake? What is needed to rectify my code? Recording discussion is at 01:48:14 duration of the recording.

  Upvote    Share

Yes, sure.

you have written pd.Series(weights, index["colins","Alice] --> IT should be index=["colins","Alice"].

All the best!

-- Satyajit Das

  Upvote    Share

Satyajit appreciate your prompt response. Thanks for your reply.

However, upon again looking at the screenshot, it has been noted that both Colin & Alice are in " ". I have not missed out at Alice. Again I have cross-verified the same in my screen-shot.

Again Colin & Alice have been intentionally defined in Caps as in the earlier defined Series Object. Kindly have a look regarding s2 & s3 below: Still error is being generated for s4 (Visual /Screenshot - 1st attachment).

Kindly let me know where is the mistake made.

.

  Upvote    Share

Satyajit thanks a lot. I did go through my code once again. My earlier focus was on the brackets {[ ]) & inverted commas " ".

Upon close observation realized that = sign after index was missing and therefore the error.
= sign after index was overlooked by me. The code worked beautifully for me. Thanks a lot and Appreciate your valuable inputs Satyajit!!!

  Upvote    Share

Hello sir,
What does num_bins=5 mean here in the program, and what does it denote? I am sharing a snapshot; please explain it to me.

  Upvote    Share

Hi, Shivam.

The number of bins splits the range of the data into 5 equal-width intervals along the X-axis, and the Y-axis shows the number of data points (the count) that fall in each interval.
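
The binning can be sketched without plotting by using np.histogram, which returns the counts and the interval edges. The data values here are made up for illustration:

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 10])   # hypothetical values

# 5 bins means 5 equal-width intervals between data.min() and data.max().
counts, edges = np.histogram(data, bins=5)

print(edges)    # 6 edges delimit the 5 intervals
print(counts)   # how many values fall into each interval
```

Each count corresponds to the height of one bar in the histogram.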

All the best!

-- Satyajit Das

  Upvote    Share

Hello sir,
Why do we use %matplotlib inline? When I left that statement out, it showed an error. Please explain its use.
Thanks

  Upvote    Share

I am unable to access the Jupyter Notebook. I am getting an I/O error: no space on disk. Can you check this urgently?

  Upvote    Share

Hi,

Can you please share a screenshot of the same.

Thanks.

-- Mayank Sharma

  Upvote    Share

No concrete explanation of concepts.

  Upvote    Share

Hi,

How can I access the Python code shown in the video?

  Upvote    Share

hi,
i tried creating panda series object but it is throwing an error saying

AttributeError: module 'pandas' has no attribute 'series'

  Upvote    Share

try pd.Series (Capital S)

  Upvote    Share

thank you bro sorted out the issue

  Upvote    Share

Hi,
I am getting this message in the slides window: "You need permission". Please check the attached screenshot as well. I need the slides of this session.

  Upvote    Share

Hi,
Can you please check now.
Thanks.

  Upvote    Share

Thanks Sir, Now it is working.

  Upvote    Share

Hi,
If I create a pandas Series object using a dictionary, is the order guaranteed? We know that dictionaries do not preserve the index order, but in the case of a Series the order is important because later I may use the Series and plot it on a graph. If the order is not preserved, I will get two different graphs depending on how the dictionary was created in memory.

  Upvote    Share

Hi,

Yes, the order is kept.
Since Python 3.7, dictionaries preserve insertion order, and pandas builds the Series index in that same order. A graph plotted from the Series will therefore show the points in the same order every time.
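
A quick sketch to check this, assuming Python 3.7 or later and a reasonably recent pandas. The dictionary contents are illustrative:

```python
import pandas as pd

weights = {"colin": 86, "alice": 57, "darwin": 68, "bob": 78}
s = pd.Series(weights)

# The index follows the insertion order of the dictionary keys.
order_kept = list(s.index) == list(weights)
print(order_kept)

# If a different order is wanted for plotting, sort explicitly.
sorted_s = s.sort_index()
print(list(sorted_s.index))
```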

All the best!

-- Satyajit Das

  Upvote    Share

how can i download the slides

  Upvote    Share

Hi, Pranav.

Kindly refer to the GITHUB repo for codes on Python. The slides on Python are not available presently for downloading but Big-Data slides are available.

https://github.com/cloudxla...

All the best!

-- Satyajit Das

  Upvote    Share

i am not able to delete file using !rn

/usr/bin/sh: rn: command not found

  Upvote    Share

what is the meaning of %matplotlib inline and why it is used?

  Upvote    Share

Hi,

Where can I find the got your back project being talked about in this video ?

  Upvote    Share

Hi Ashutosh,

Are you referring to the codes in this video? If yes, then you would find it in our GitHub repository, the link to which is given below the slides.

Thanks.

  Upvote    Share

Hi Rajtilak,

I am referring to the 2 got your back(GYB) projects being talked about in the video timeline 2:57:30 onwards

  Upvote    Share

Hi Asutosh,

These projects were updated and replaced by the Email Churn project.

Thanks.

  Upvote    Share

Hi Cloudxlab support,

Can you please see why the video is unavailable now

Regards
Pruthvi

  Upvote    Share

Hi, Pruthvi.

I have rechecked, and all the videos are working fine now.
Kindly send screenshots if you are still getting any error.

All the best!

-- Satyajit Das

  Upvote    Share

You can use the youtube link. https://youtu.be/LxVzfncBcng

  Upvote    Share

Hi CloudXLab Team,

Seems that video has been deleted.

Last month I had completed section " Machine Learning Prerequisites" but for last few days this video has reduced my completion from 100% to 96% and today suddenly this video has been deleted.
Not sure what is going on here with course content.

Thanks,
Yagyesh

  Upvote    Share

Hi CloudXlab,

I am unable to play recording of session - "NumPy, Pandas, Matplotlib in Python"
Its saying Video unavailable.

How can I study this topic - Machine Learning Prerequisites --> NumPy, Pandas, Matplotlib in Python (2nd session)
I only see slides.
Please help.

  Upvote    Share
