Foundations of Python

Network Programming in Python

In this session, we introduce you to network programming in Python.

Slides

Code Repository for the course on GitHub

Viral Naik

a year ago

not able to understand. Can you pls elaborate?

Shubh Tripathi

a year ago

This code is using the BeautifulSoup library. Assuming you have a web page HTML content loaded into soup, this code is specifically extracting and printing the values of the href attribute from all the <a> (anchor) tags in the HTML.

Here's a brief breakdown of the code:

tags = soup('a'): This line finds all the <a> tags in the HTML content parsed by BeautifulSoup and stores them in the tags variable.

for tag in tags:: This line starts a loop that iterates over each <a> tag found in the HTML.

print(tag.get('href', None)): Within the loop, this line prints the value of the href attribute of each <a> tag. The tag.get('href', None) part retrieves the value of the href attribute. If the attribute is not present, it returns None.
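For reference, here is the same extraction sketched with only the standard library's html.parser module, so it runs even where BeautifulSoup is not installed. This is a stand-in, not BeautifulSoup's API: the LinkCollector class and the sample HTML are made up for illustration. It mirrors what soup('a') plus tag.get('href', None) does: it visits every <a> tag and records its href, or None when the tag has no href attribute.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (None if the tag has no href)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; dict() gives a .get() like tag.get()
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href", None))


html = """
<p>See <a href="https://www.python.org/">Python</a>,
<a href="/docs">the docs</a> and <a name="top">an anchor with no href</a>.</p>
"""

collector = LinkCollector()
collector.feed(html)
for href in collector.hrefs:
    print(href)
```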

5C0

3 years ago

utilizes HTML and xPath to scrape websites

Nirav Raj

3 years ago

What is flow control??

Abhinav Singh

3 years ago

Can you please let us know which slide number are you referring to? It will help us in explaining flow control in the appropriate context

Mayank Chaubey

4 years ago

Facing the above error

Vagdevi K

4 years ago

Hi,

Please go through: https://cloudxlab.com/blog/install-python-packages-cloudxlab/

Thanks.

Venkat Akhil

4 years ago

What is the difference between AutoScraper and BeautifulSoup?? Both are used for web scraping..

Rajtilak Bhattacharjee

4 years ago

Hi,

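BeautifulSoup parses a page's HTML into a tree that you search with methods such as find() and find_all() or with CSS selectors (it does not support XPath; for XPath you would use a library such as lxml), whereas AutoScraper automates that manual search: you give it sample values from a page and it learns simple matching rules to find similar data.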

Thanks.

Chitra Bhatia

4 years ago

Any solutions to this issue?

thanks

chitra

Rajtilak Bhattacharjee

4 years ago

Hi,

I checked from my end, the notebook is working fine. Are you facing this issue on any particular notebook? This could also happen if you have exceeded your disk space quota. Please go through the below link for more details on this:

https://discuss.cloudxlab.com/t/my-user-disk-space-quota-in-the-lab-has-exceeded-how-can-i-clean-the-unnecessary-files/5370

Thanks.

Chitra Bhatia

4 years ago

I had a lot of code in my Jupyter notebook, and now I'm unable to retrieve it or run any command in the notebook. Any idea how we can retrieve it? It seems to have hung.

Vagdevi K

4 years ago

Hi,

Does the problem still persist?

Thanks.

Deepak P Nair

4 years ago

In this video tutorial, up to the 30-minute mark most of the code doesn't work, or it runs but returns 404 Not Found, which shows no prior preparation or checking by the team. For a beginner like me, this is all the more confusing when there is no proper output to understand the program. The explanation is also not up to the mark!

Abhinav Singh

4 years ago

Hi Deepak,

Can you please let us know which URLs are giving 404 so that we can fix the content.

Gaurav Sharma

4 years ago

Contd.... in BeautifulSoup

Why is it showing this error?

Gaurav Sharma

4 years ago

Why am I getting this error when installing BeautifulSoup?

Rajtilak Bhattacharjee

4 years ago

Hi,

Please go through the below link to understand how you can install packages in our lab:

https://cloudxlab.com/blog/install-python-packages-cloudxlab/

Thanks.

Gaurav Sharma

4 years ago

it is showing same problem again

Rajtilak Bhattacharjee

4 years ago

Hi,

Can you share a screenshot please.

Thanks.

Gaurav Sharma

4 years ago

It is showing the same error again. I followed all the steps in the console with the given command, but Jupyter still shows the same error.

Rajtilak Bhattacharjee

4 years ago

Hi,

We already have BeautifulSoup installed in our lab. Please use the following command to access it:

from bs4 import BeautifulSoup

Thanks.

Gaurav Sharma

4 years ago

I got this error in the console. Please resolve this.

Rajtilak Bhattacharjee

4 years ago

Hi,

This is because the version of BeautifulSoup you are trying to install requires a different version of pip. You do not need to install BeautifulSoup, it is already installed in our lab.

Thanks.

Gaurav Sharma

4 years ago

I have already used this command, but the same error shows again and again.

Gaurav Sharma

4 years ago

But it is not working, sir. What should I do? I have already sent you all the screenshots of the errors. Please check them once.

Rajtilak Bhattacharjee

4 years ago

Hi Gaurav,

The command for which you had attached a screenshot is the command we use to install a library. You do not need to install the library as it is already installed in our lab. Please use the import command I provided above in the notebook on the right side of the split screen and not the console. If after that still you are facing any issue, please attach a new screenshot with the command I provided and the error that is giving.

Thanks.

Gaurav Sharma

4 years ago

Look at this, sir: I have used this command in Jupyter, but it is showing an error again.

Rajtilak Bhattacharjee

4 years ago

Hi,

The 's' in BeautifulSoup should be in upper case. Please remember that Python is a case sensitive language.

Thanks.

Gaurav Sharma

4 years ago

Thank you sir working now.

Prasun Banerjee

5 years ago

What does the code !pip install beautifulsoup4 do?

Vagdevi K

5 years ago

Hi,

It installs beautifulsoup4. Generally we run such commands in the console, but prefixing them with ! lets us run them from a Jupyter notebook.

Thanks.

Prasun Banerjee

5 years ago

What is the difference between the following codes?

1.

import urllib

2.

from urllib import *

Rajtilak Bhattacharjee

5 years ago

Hi,

This is a very good question!

Please go through the below link to understand the difference:

https://stackoverflow.com/questions/710551/use-import-module-or-from-module-import

Thanks.
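A short illustration of the difference, using urllib.parse because it needs no network access. With a plain import you keep the module name as a prefix; with from ... import you bind a name directly in your own namespace. Both end up referring to the same object. PEP 8 advises against wildcard imports like from urllib import * because they make it unclear where names come from.

```python
# Style 1: import the submodule itself and keep the prefix
import urllib.parse

print(urllib.parse.quote("hello world"))

# Style 2: pull one specific name into the current namespace
from urllib.parse import quote

print(quote("hello world"))

# Both names refer to the very same function object
print(quote is urllib.parse.quote)
```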

Manjari Singh

5 years ago

how does the host name relate to python?

which servers do allow hosting of a python program?

or how do we give a URL(www.somename.com) to a python program?

can you please give a practical example? I thought Python is mainly used for the backend.

so does it connect over http or socket?

Rajtilak Bhattacharjee

5 years ago

Hi,

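1. The host name is not related to Python; it is related to the network. A host name identifies a machine on a network, and we can write Python code that performs tasks over that network, such as connecting to a host by its name.

2. A URL points to a server, not directly to a Python program. You create an application with Python (for example, a web application), run it on a server connected to the internet or a network, and point a domain name such as www.somename.com at that server so that others can use it. Such applications typically communicate over HTTP, which itself runs on top of sockets.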

3. You can read more about networking and inter process communication in Python from the below link:

https://docs.python.org/3/library/ipc.html

Thanks.
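To make the HTTP versus socket question concrete: HTTP is a protocol that runs on top of TCP sockets, and a Python program can be the server that answers on a host and port. The sketch below, assuming only the standard library and the local address 127.0.0.1 with a port chosen by the OS, starts a tiny HTTP server in a background thread and requests a page from it. HelloHandler and its message are made up for illustration, and http.server is meant for demos and testing, not production. On the internet, a domain name such as www.somename.com is mapped through DNS to the address of the machine running such a server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"Hello from a Python program"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


# Server side: listen on a TCP socket; port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
host, port = server.server_address[:2]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: open a TCP connection to host:port and speak HTTP over it
conn = http.client.HTTPConnection(host, port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
text = response.read().decode("utf-8")
conn.close()

url = f"http://{host}:{port}/"
print(url, response.status, text)

server.shutdown()
server.server_close()
```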

kranti sanglam

5 years ago

There are 3 questions mentioned below, please share the answer and explanation for each:

Q1 - 07:06 - Are port numbers universal, or can they change depending on the service provider? Suppose an Amazon server uses one port number and a Google server uses a different one; is that possible?

Q2 - 19:54 and page 31 in slides

How can we do this in our terminal?

I’m unable to do it.

Q3 – 45:43

What do we do if it is not done automatically, i.e. if the package is not already installed? And what do the two lines below mean, apart from the comment?

I understood pip: it lets you download and install a package in Python. Packages in Python are generally located on central repositories, and using pip one can download those packages from the central repository and install them in one's workspace.

But why is there an exclamation mark before pip, and is the last line the address of BS4?

# Already installed on CloudxLab

!pip install beautifulsoup4

Requirement already satisfied: beautifulsoup4 in /usr/local/anaconda/envs/py36/lib/python3.6/site-packages

Rajtilak Bhattacharjee

5 years ago

Hi,

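Well-known services have standard default port numbers (for example, 80 for HTTP, 443 for HTTPS, 22 for SSH and 23 for Telnet), and these are the same whether the server belongs to Amazon or Google. They are defaults rather than fixed rules, though: an administrator can run a service on a different port, in which case clients have to specify that port explicitly.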

Telnet has not been installed in our labs because of security issues. We use SSH instead. You can find more about SSH from the below link:

https://cloudxlab.com/faq/28/how-do-i-connect-to-cloudxlab-from-my-local-machine

The exclamation mark before pip tells Jupyter (IPython) to run the line as a shell command rather than as Python code. For example, when you run git clone from a Jupyter notebook, you also put an exclamation mark in front of it.

The last line, "/usr/local/anaconda/envs/py36/lib/python3.6/site-packages" is the folder in which beautifulsoup4 is installed.

Thanks.
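To illustrate the default-versus-explicit distinction: when a URL does not name a port, the client falls back to the standard port for its scheme; when it does, that port is used instead. The effective_port helper and the DEFAULT_PORTS table below are made up for this sketch, while urlsplit and its .port attribute come from the standard library's urllib.parse.

```python
from urllib.parse import urlsplit

# Standard (well-known) default ports for a few common schemes
DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22, "telnet": 23}


def effective_port(url):
    """Return the TCP port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:  # the URL names a port explicitly
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # otherwise use the scheme's default


for url in ["http://www.google.com/", "https://aws.amazon.com/", "http://example.com:8080/"]:
    print(url, "->", effective_port(url))
```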

kranti sanglam

5 years ago

Hi, please look into the matter.

Rajtilak Bhattacharjee

5 years ago

Hi,

This happens at times when you try to view a notebook on GitHub. Would suggest you to clone the repository and then view it in the lab.

Thanks.

apratimkumar pandey

5 years ago

Hi

Can we import all the library using "from"?

Rajtilak Bhattacharjee

5 years ago

Hi,

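Both forms are import statements. import module imports the whole module, and you access its contents with the module prefix (for example, urllib.parse.quote). from module import name imports specific names, and from module import * imports all of a module's public names, but this wildcard form is usually discouraged because it makes it unclear where names come from.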

Thanks.

r r

5 years ago

Sir, I am not able to understand the usage and meaning of these two statements:

tags=soup('a')

print(tag.get('href',None)) -what is href in the print statement?

Ashwani Gupta

5 years ago

Hi r r,

Nice name :)
So the first statement

tags = soup('a')

This selects all <a> (anchor) tags from your HTML. It returns a list of those tags (a BeautifulSoup ResultSet, which behaves like a Python list), which is stored in the tags variable.

Now the second statement

print(tag.get('href', None))

Since you are iterating over tags, tag is one of those anchor tag objects. In HTML, an anchor tag is a hyperlink, and a hyperlink points to a link or URL; the href attribute holds that URL. tag.get('href', None) returns that value, or None if the tag has no href.

Megha

5 years ago

How can I download the jupyter notebook associated with this topic?

Aditya Kumar

5 years ago

i am trying to run the urllib module in vscode but it is saying urllib has no attribute 'request'.
pls give a solution.
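No reply was posted to this question, so for reference: import urllib on its own does not load the urllib.request submodule, and accessing urllib.request afterwards typically raises exactly this AttributeError (in Jupyter it can appear to work if something else already imported the submodule). Importing the submodule explicitly fixes it. Another common cause is a file of your own named urllib.py in the same folder, which shadows the standard library module. The sketch uses a data: URL, which urllib handles itself, so it runs without network access; replace it with a real http(s) URL in practice.

```python
import urllib.request  # import the submodule, not just the urllib package

# A data: URL is decoded by urllib itself, so this example runs offline
url = "data:text/plain,Hello%2C%20urllib"

with urllib.request.urlopen(url) as response:
    content = response.read().decode("utf-8")

print(content)
```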

Yogesh Pandey

5 years ago

I am not able to understand the purpose of the statement "tags = soup_data('a')" in the code. Could you please help me out? PFA the screenshot of the concerned code.

CloudxLab

5 years ago

Hi,

'a' is the HTML tag for a link (the anchor tag). Here the code captures all the links on the page by finding every <a> tag.

Thanks.

-- Rajtilak Bhattacharjee

Souvik Biswas

5 years ago

Hello,
How do I fix this error?
Please guide me.

CloudxLab

5 years ago

Hi,

Are you still facing this issue? If yes then would request you to share your email id with us.

Thanks.

-- Rajtilak Bhattacharjee

Souvik Biswas

5 years ago

Hello,
This is my email id- biswas.souvik.1989@gmail.com

Souvik Biswas

5 years ago

Hello,
I'm still facing this issue

CloudxLab

5 years ago

Hi,

Could you please check once again if it is working fine now? Also, would request you to delete some of the files that you are not working on anymore.
Thanks.

-- Rajtilak Bhattacharjee

Rajeshwar Jagtap

5 years ago

urllib request issue:
I have installed urllib3 and it's not working.
Code:
html = request.RequestMethods.urlopen(method='GET', url=weburl)
print(html)
Output:
TypeError: urlopen() missing 1 required positional argument: 'self'

CloudxLab

5 years ago

Hi,

Would request you to share a screenshot of your code and the error that you are getting.

Thanks.

-- Rajtilak Bhattacharjee
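For reference on the code above: in urllib3, RequestMethods is a base class and urlopen() is an instance method, so calling it on the class itself leaves nothing to fill the self parameter, which is what the TypeError reports. The usual urllib3 pattern is to create an instance first, e.g. http = urllib3.PoolManager() followed by http.request("GET", weburl); for simple fetches, the standard library's urllib.request works without installing anything. The sketch reproduces the error with a made-up class, RequestMethodsLike, so it runs without urllib3.

```python
class RequestMethodsLike:
    """Made-up stand-in for a class whose urlopen() is an instance method."""

    def urlopen(self, method, url):
        return f"{method} {url}"


weburl = "http://example.com/"

# Called on an instance: Python passes the instance as self automatically
result = RequestMethodsLike().urlopen(method="GET", url=weburl)
print(result)

# Called on the class: nothing fills self, reproducing the error from the post
error_message = ""
try:
    RequestMethodsLike.urlopen(method="GET", url=weburl)
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)
```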

Krishna

5 years ago

Hi,
Can you please help me to troubleshoot below error.
Code -
# Enter http://en.wikipedia.org/wik...
from urllib import *
from bs4 import BeautifulSoup
url = input('Enter url - ')
html = request.urlopen(url).read()

Response -
Enter url - http://en.wikipedia.org/wik...
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-6-4ca7babbf941> in <module>
3 from bs4 import BeautifulSoup
4 url = input('Enter url - ')
----> 5 html = request.urlopen(url).read()
6

/usr/local/anaconda/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 else:
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224
225 def install_opener(opener):

/usr/local/anaconda/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
530 for processor in self.process_response.get(protocol, []):
531 meth = getattr(processor, meth_name)
--> 532 response = meth(req, response)
533
534 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_response(self, request, response)
640 if not (200 <= code < 300):
641 response = self.parent.error(
--> 642 'http', request, response, code, msg, hdrs)
643
644 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in error(self, proto, *args)
562 http_err = 0
563 args = (dict, proto, meth_name) + args
--> 564 result = self._call_chain(*args)
565 if result:
566 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_error_302(self, req, fp, code, msg, headers)
754 fp.close()
755
--> 756 return self.parent.open(new, timeout=req.timeout)
757
758 http_error_301 = http_error_303 = http_error_307 = http_error_302

/usr/local/anaconda/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
530 for processor in self.process_response.get(protocol, []):
531 meth = getattr(processor, meth_name)
--> 532 response = meth(req, response)
533
534 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_response(self, request, response)
640 if not (200 <= code < 300):
641 response = self.parent.error(
--> 642 'http', request, response, code, msg, hdrs)
643
644 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in error(self, proto, *args)
562 http_err = 0
563 args = (dict, proto, meth_name) + args
--> 564 result = self._call_chain(*args)
565 if result:
566 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_error_302(self, req, fp, code, msg, headers)
754 fp.close()
755
--> 756 return self.parent.open(new, timeout=req.timeout)
757
758 http_error_301 = http_error_303 = http_error_307 = http_error_302

/usr/local/anaconda/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
530 for processor in self.process_response.get(protocol, []):
531 meth = getattr(processor, meth_name)
--> 532 response = meth(req, response)
533
534 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_response(self, request, response)
640 if not (200 <= code < 300):
641 response = self.parent.error(
--> 642 'http', request, response, code, msg, hdrs)
643
644 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in error(self, proto, *args)
568 if http_err:
569 args = (dict, 'default', 'http_error_default') + orig_args
--> 570 return self._call_chain(*args)
571
572 # XXX probably also want an abstract factory that knows when it makes

/usr/local/anaconda/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
648 class HTTPDefaultErrorHandler(BaseHandler):
649 def http_error_default(self, req, fp, code, msg, hdrs):
--> 650 raise HTTPError(req.full_url, code, msg, hdrs, fp)
651
652 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found

CloudxLab

5 years ago

Hi,

Would request you to share a screenshot of your code, and the error that you are getting. Also, please let us know about the assessment you are trying to attempt here.

Thanks.

-- Rajtilak Bhattacharjee
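For reference: the traceback ends in HTTP Error 404: Not Found, which means a server was reached (after following redirects) but reported that the page does not exist. The URL in the post is truncated, so the exact cause can't be confirmed from here. A minimal sketch of handling this instead of crashing, with a made-up fetch_html helper: urllib.error.HTTPError is raised when the server answers with an error status, and its parent class URLError covers failures where no HTTP answer arrives at all. It also imports urlopen directly from urllib.request rather than relying on from urllib import *.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url):
    """Return the page body as text, or None if the request fails."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # The server answered, but with an error status such as 404
        print(f"HTTP error {err.code} for {url}: {err.reason}")
    except URLError as err:
        # No HTTP answer at all: unknown host, refused connection, missing file, ...
        print(f"Could not open {url}: {err.reason}")
    return None


# Usage:
# url = input('Enter url - ')
# html = fetch_html(url)
# if html is not None:
#     print(html[:500])
```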

Shivam Srivastava

5 years ago

Shivam Srivastava

5 years ago

Hello sir,
what does 'href' represent here in the print() statement? What does it mean, and what is its purpose?
Thanks

CloudxLab

5 years ago

Hi,

href is an HTML attribute used to define a URL: the href attribute specifies the URL of the page the link goes to. Tip: you can use href="#top" or href="#" to link to the top of the current page. If the href attribute is not present, the tag is not a hyperlink.

Thanks.

-- Rajtilak Bhattacharjee

CloudxLab

5 years ago

Hi,

Are you facing any challenges with this code? Please let us know.

Thanks.

-- Rajtilak Bhattacharjee

Shruthi Gopi

5 years ago

In BeautifulSoup - Why do we assign tags = soup('a')? when i tried giving soup('b') it returned None

CloudxLab

5 years ago

Hi Shruthi,

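The 'a' stands for the anchor tag, which looks like <a href="https://www.example.com/">link text</a>. soup('b') finds <b> (bold) tags instead. Bold tags do not have an href attribute, so tag.get('href', None) returns None for each of them; and if the page has no <b> tags at all, soup('b') simply returns an empty list.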

Thanks.

-- Rajtilak Bhattacharjee

Mohini Singhal

5 years ago

i am not able to import beautifulsoup4

CloudxLab

5 years ago

Hi, Mohini.

You just need to import the BeautifulSoup module with the following command:
from bs4 import BeautifulSoup
You should then be able to use the module.
If it still doesn't work, kindly send screenshots.

All the best!

-- Satyajit Das

Himanshu Malhotra

5 years ago

re.findall("aaa* ","this is a aa and that is a aaa")

result - ['aa ']

result should be ['aaa ']

please explain
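No reply was posted here, so for reference: the pattern ends with a literal space, so it can only match a run of a's that is followed by a space. The final aaa is at the very end of the string with nothing after it, so only 'aa ' qualifies. Dropping the trailing space, or using word boundaries, returns both runs:

```python
import re

text = "this is a aa and that is a aaa"

# Original pattern: "aa", then zero or more "a", then a literal space
with_space = re.findall("aaa* ", text)
print(with_space)

# Without the trailing space, every run of two or more a's matches
without_space = re.findall("aaa*", text)
print(without_space)

# \b (word boundary) also matches at the very end of the string
bounded = re.findall(r"\baaa*\b", text)
print(bounded)
```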

Punit Bhilota

5 years ago

Hi There,

The following code is provided in the chp. 12 'Networked Program' ppt, but it is not working.
I have tried 2-3 options (adding 'b' before the GET string, converting to UTF-8), but they are also not working. I am able to access the file (romeo.txt) in the browser.

import socket, sys
mysock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com',80))
mysock.send('GET http://www.py4inf.com/code/... HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)
    if ( len(data) < 1 ) :
        break
    print(data)
mysock.close()

Error:

TypeError Traceback (most recent call last)
<ipython-input-1-17a43c34c636> in <module>
2 mysock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
3 mysock.connect(('www.py4inf.com',80))
----> 4 mysock.send('GET http://www.py4inf.com/code/... HTTP/1.0\n\n')
5 #try:
6 # mysock.connect(('www.py4inf.com',10))

TypeError: a bytes-like object is required, not 'str'
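In Python 3, socket.send() expects bytes, not str, which is what the TypeError says. A corrected sketch encodes the request before sending and joins the raw bytes that come back. Assumptions: the request path in the post is truncated, so it is kept as-is and must be replaced with the real path; build_request and fetch are made-up helper names; and the old www.py4inf.com host may no longer serve that file, so even with the TypeError fixed you may get a redirect or error response rather than the text of romeo.txt. The header block is terminated with \r\n\r\n, as HTTP specifies.

```python
import socket

HOST = "www.py4inf.com"
# Path truncated in the original post: replace "..." with the real file path
REQUEST_LINE = "GET http://www.py4inf.com/code/... HTTP/1.0"


def build_request(request_line):
    """Return the request as bytes, ending the header block with a blank line."""
    return (request_line + "\r\n\r\n").encode("ascii")


def fetch(host, request_line, port=80):
    """Send the request over a TCP socket and return the raw response bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as mysock:
        mysock.sendall(build_request(request_line))  # bytes, not str
        while True:
            data = mysock.recv(512)
            if not data:  # b"" means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Usage (needs network access):
# print(fetch(HOST, REQUEST_LINE).decode("utf-8", errors="replace"))
```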

Sodhi Sneaks Jot

5 years ago

Hi sir, suppose we want to match a string which contains all the special characters, will the procedure be same for it as well. For instance what if the string is: " @@@ %%% $$$$".

Pavan Kumar Akula

5 years ago

Hi Sodhi.

is this what you are looking for?

import re
sStr = "@@@ %%% $$$$"
print(re.findall("[@%$]",sStr))

Sodhi Sneaks Jot

5 years ago

Yes sir

Vinaymsgbox

5 years ago

You can use a negated alphanumeric character class, for example:

import re
re.findall('[^a-z0-9A-Z]+ ','@@@ %%% $$$$ wer23$23432 #R#TW#')
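Note that [^a-z0-9A-Z] also matches spaces, and the pattern above ends with a literal space, so on that string it returns a single long match, ['@@@ %%% $$$$ ']. If the goal is each run of special characters separately, a sketch that also excludes whitespace and drops the trailing space:

```python
import re

text = "@@@ %%% $$$$ wer23$23432 #R#TW#"

# The pattern from the comment above, for comparison
original = re.findall('[^a-z0-9A-Z]+ ', text)

# Runs of characters that are neither letters, digits nor whitespace
special_runs = re.findall(r"[^A-Za-z0-9\s]+", text)
print(special_runs)

# Listing the wanted characters explicitly works too ($ is literal inside [])
listed = re.findall(r"[@%$#]+", text)
print(listed)
```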

Atul

5 years ago

How can I download the slides?

Abhinav Singh

5 years ago

Just pop out the slides in new window and you will see the option to download it.