Foundations of Python


Network Programming in Python

As part of this session, we will introduce you to network programming in Python.




76 Comments

I am not able to understand this. Can you please elaborate?


This code is using the BeautifulSoup library. Assuming you have a web page HTML content loaded into soup, this code is specifically extracting and printing the values of the href attribute from all the <a> (anchor) tags in the HTML.

Here's a brief breakdown of the code:

tags = soup('a'): This line finds all the <a> tags in the HTML content parsed by BeautifulSoup and stores them in the tags variable.

for tag in tags:: This line starts a loop that iterates over each <a> tag found in the HTML.

print(tag.get('href', None)): Within the loop, this line prints the value of the href attribute of each <a> tag. The tag.get('href', None) part retrieves the value of the href attribute. If the attribute is not present, it returns None.
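
A minimal sketch of the loop described above, run against a small inline HTML string instead of a downloaded page. The HTML snippet and the example.com address are made up for illustration, and the sketch assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration.
html = """
<html><body>
  <a href="https://example.com/first">First</a>
  <a href="/second">Second</a>
  <a name="no-link">Anchor without href</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra install

tags = soup("a")  # shorthand for soup.find_all("a")
hrefs = [tag.get("href", None) for tag in tags]

for href in hrefs:
    print(href)  # the third tag has no href, so None is printed for it
```

The third anchor shows why the second argument to get() matters: without an href attribute the lookup falls back to None instead of raising an error.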



What is flow control??


Can you please let us know which slide number you are referring to? That will help us explain flow control in the appropriate context.

Facing the above error


What is the difference between AutoScraper and BeautifulSoup?? Both are used for web scraping..


Hi,

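BeautifulSoup parses the HTML of a page into a tree that you search by tag name, attribute, or CSS selector (it does not support XPath itself), whereas AutoScraper learns simple matching rules from sample values that you provide and then applies them automatically.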

Thanks.


Any solutions to this issue?

 

thanks 

chitra


Hi,

I checked from my end, the notebook is working fine. Are you facing this issue on any particular notebook? This could also happen if you have exceeded your disk space quota. Please go through the below link for more details on this:

https://discuss.cloudxlab.com/t/my-user-disk-space-quota-in-the-lab-has-exceeded-how-can-i-clean-the-unnecessary-files/5370

Thanks.


I had a lot of code in my Jupyter notebook, and now I am unable to retrieve it or run any command in the notebook. Any idea how to recover it? It seems to have hung.

Hi,

Does the problem still persist?

Thanks.


In this video tutorial, for the first 30 minutes most of the code either does not work or runs but returns 404 Not Found, which suggests the team did not prepare or check it beforehand. For a beginner like me this is all the more confusing when there is no proper output to help understand the program. The explanation is also not up to the mark!

Hi Deepak,

Can you please let us know which URLs are giving 404 so that we can fix the content?

Contt.... in beautifulsoup

Why is it showing this error?

Why am I getting this error when installing BeautifulSoup?

Hi,

Please go through the below link to understand how you can install packages in our lab:

https://cloudxlab.com/blog/install-python-packages-cloudxlab/

Thanks.


It is showing the same problem again.

Hi,

Can you share a screenshot, please?

Thanks.


It is showing the same error again. I have followed all the steps in the console with the given command, but the same error appears in Jupyter.

Hi,

We already have BeautifulSoup installed in our lab. Please use the following command to access it:

from bs4 import BeautifulSoup

Thanks.


I got this error in the console. Please resolve this.

Hi,

This is because the version of BeautifulSoup you are trying to install requires a different version of pip. You do not need to install BeautifulSoup; it is already installed in our lab.

Thanks.


I have used this command already, but the same error is showing again and again.

But it is not working, sir. What should I do? I have already sent you all the screenshots of the errors. Please check them once.

Hi Gaurav,

The command in your screenshot is the one used to install a library. You do not need to install the library, as it is already installed in our lab. Please run the import command I provided above in the notebook on the right side of the split screen, not in the console. If you still face any issue after that, please attach a new screenshot showing the command I provided and the error it gives.

Thanks.


Look at this, sir: I have used this command in Jupyter, but it is showing an error again.

Hi,

The 'S' in BeautifulSoup should be in upper case. Please remember that Python is a case-sensitive language.

Thanks.


Thank you sir working now.


What does the code !pip install beautifulsoup4 do?

Hi,

It installs beautifulsoup4. Generally we run such commands in the console, but with the ! prefix we can run them from a Jupyter notebook cell.

Thanks.
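
As a rough illustration of what the ! prefix does: Jupyter hands the rest of the line to the system shell rather than to Python. A comparable effect from plain Python is to start a child process. The helper name run_command below is hypothetical, and the command shown only asks the current interpreter for its version, to keep the example harmless.

```python
import subprocess
import sys


def run_command(args):
    """Run a command as a child process and return its exit code and output."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout + completed.stderr


# In a notebook cell you might type:  !python --version
# From Python, roughly the same thing is:
code, output = run_command([sys.executable, "--version"])
print(code, output.strip())
```

Passing [sys.executable, "-m", "pip", "install", "beautifulsoup4"] to the same helper would be the counterpart of the !pip line, installing into the interpreter that is currently running.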


What is the difference between the following codes?

1. 

import urllib

2.

from urllib import *

 


Hi,

This is a very good question!

Please go through the below link to understand the difference:

https://stackoverflow.com/questions/710551/use-import-module-or-from-module-import

Thanks.
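
A short sketch of the difference, using the math module as a stand-in because it needs no network access. The behaviour shown applies to importing names from any module; the variable names are only for illustration.

```python
import math                 # 1. binds only the name "math"
from math import sqrt       # 2. binds the name "sqrt" directly

print(math.sqrt(16))  # access through the module name
print(sqrt(16))       # access without the prefix

# "from math import *" copies every public name (sqrt, pi, sin, ...)
# into the current namespace, where they can silently overwrite your own names.
namespace = {}
exec("from math import *", namespace)
star_names = sorted(name for name in namespace if not name.startswith("_"))
print(len(star_names), "names imported, for example:", star_names[:3])
```

With the first form it is always clear where a name comes from, which is why the star form is usually avoided outside of quick experiments.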


How does the host name relate to Python?

Which servers allow hosting of a Python program?

Or how do we give a URL (www.somename.com) to a Python program?

Can you please give a practical example? I thought Python was mainly used for the backend.

So does it connect over HTTP or a socket?


Hi,

1. The host name is not related to Python; it is related to the network. We can write code in Python that performs tasks over a network.

2. You cannot assign a URL directly to a Python program. Instead, you build an app in Python and host it on a server on the internet, or on a local network, so that others can reach it at that server's address.

3. You can read more about networking and inter process communication in Python from the below link:

https://docs.python.org/3/library/ipc.html

Thanks.
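
For a practical picture, here is a minimal sketch, assuming everything runs on one machine: a tiny Python web server listens on the local address 127.0.0.1, and a client in the same program fetches a page from it over HTTP (which itself travels over a socket). The class name HelloHandler and the function name fetch_from_local_server are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short text body."""

    def do_GET(self):
        body = b"Hello from Python\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_from_local_server():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        status, text = response.status, response.read().decode("utf-8")
        conn.close()
        return status, text
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_from_local_server() returns the status code and the body. On a real deployment the same kind of server would run on a hosted machine, and a domain name would be pointed at that machine's address.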


There are 3 questions mentioned below, please share the answer and explanation for each:

 

Q1 - 07:06 - Are port numbers universal, or can they change depending on the service provider? For example, could an Amazon server use one port number and a Google server a different one?

 

Q2 - 19:54 and page 31 in slides

How can we do this in our terminal?

I’m unable to do it.

 

Q3 – 45:43

What do we do if it is not done automatically, meaning the package is not already installed? And what is the meaning of the two lines below, other than the comment?

My understanding of pip: it lets you download and install a package in Python. Packages are generally hosted on central repositories, and with pip one can download a package from the central repository and install it in one's workspace.

But why is there an exclamation mark before pip, and is the last line the address of BS4?

 

# Already installed on CloudxLab

 

!pip install beautifulsoup4

 

Requirement already satisfied: beautifulsoup4 in /usr/local/anaconda/envs/py36/lib/python3.6/site-packages


Hi,

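Port numbers are conventions rather than universal constants: well-known services have standard default ports (for example 80 for HTTP and 22 for SSH), but an administrator can configure a server to listen on a different port.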

Telnet has not been installed in our labs because of security issues. We use SSH instead. You can find more about SSH from the below link:

https://cloudxlab.com/faq/28/how-do-i-connect-to-cloudxlab-from-my-local-machine

The exclamation mark before pip indicates that it is not a Python statement but a shell (Linux) command. For example, when you use git clone in a Jupyter notebook, you add an exclamation mark in front of it.

The last line, "/usr/local/anaconda/envs/py36/lib/python3.6/site-packages" is the folder in which beautifulsoup4 is installed.

Thanks.
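
To make the port answer concrete, here is a small sketch that works out which port a client would use for a given URL. The DEFAULT_PORTS table and the function name effective_port are assumptions made for the example, and example.com is a placeholder host.

```python
from urllib.parse import urlsplit

# Conventional defaults for two common schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:      # an explicit ":8080" in the URL wins
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://example.com/index.html"))
print(effective_port("https://example.com/"))
print(effective_port("http://example.com:8080/app"))
```

The last call shows a server that was configured to listen on a non-default port, which is why the port has to be written into the URL.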


Hi, please look into the matter.

Hi,

This happens at times when you try to view a notebook on GitHub. I would suggest cloning the repository and then viewing the notebook in the lab.

Thanks.


Hi

Can we import all the library using "from"?

 


Hi,

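You can, by writing from module import *, but this is discouraged because it copies every public name into your namespace. Usually we import libraries using the import command.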

Thanks.


Sir, I am not able to understand the usage and meaning of these two statements:

 tags=soup('a')

 print(tag.get('href',None)) - what is href in the print statement?


Hi r r,

Nice name :)
So the first statement

tags = soup('a')

will select all <a> tags, or anchor tags, from your HTML. It returns a list-like collection of <a> tags, which is stored in the tags variable.

Now the second statement

print(tag.get('href', None))

Since you are iterating over tags, tag is one of those anchor tag objects. In HTML, an anchor tag with an href attribute is a hyperlink, and the href is the URL that the hyperlink points to.


How can I download the jupyter notebook associated with this topic?


I am trying to use the urllib module in VS Code, but it says urllib has no attribute 'request'.
Please give a solution.
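
One likely cause, offered as an assumption since the failing code is not shown: a plain import urllib loads the package but does not necessarily load its submodules, so urllib.request may be missing unless something else has already imported it. Importing the submodule explicitly avoids the problem, as this small sketch shows.

```python
import urllib.request   # imports the package and its "request" submodule

print(hasattr(urllib, "request"))        # the attribute now exists
print(callable(urllib.request.urlopen))  # and urlopen can be called
```

The same code can appear to work in a notebook and fail in a plain script, because the notebook environment may have imported urllib.request behind the scenes.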


I am not able to understand the purpose of the statement "tags = soup_data('a')" in the code. Could you please help me out? PFA the screenshot of the concerned code.


Hi,

a is the tag used for links in HTML. Here the code collects the data for all links by reading every a tag.

Thanks.

-- Rajtilak Bhattacharjee


Hello,
How do I fix this error?
Please guide me.


Hi,

Are you still facing this issue? If yes then would request you to share your email id with us.

Thanks.

-- Rajtilak Bhattacharjee


Hello,
This is my email id- biswas.souvik.1989@gmail.com


Hello,
I'm still facing this issue


Hi,

Could you please check once again if it is working fine now? Also, would request you to delete some of the files that you are not working on anymore.
Thanks.

-- Rajtilak Bhattacharjee


urllib request issue:
I have installed urllib3 and it's not working.
Code:
html = request.RequestMethods.urlopen(method='GET', url=weburl)
print(html)
Output:
TypeError: urlopen() missing 1 required positional argument: 'self'
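
The message about a missing 'self' generally means a method was called on the class itself rather than on an object created from it. The sketch below reproduces the same kind of error with a made-up class (Greeter is hypothetical and has nothing to do with urllib3) and shows the working form. Note also that urllib3 is a separate third-party package from the standard library's urllib.

```python
class Greeter:
    """Hypothetical class standing in for any class with instance methods."""

    def greet(self, name):
        return f"Hello, {name}"


def call_on_class():
    try:
        Greeter.greet(name="Ada")   # no instance, so nothing is bound to self
    except TypeError as err:
        return str(err)


def call_on_instance():
    return Greeter().greet(name="Ada")  # create an object first, then call


print(call_on_class())
print(call_on_instance())
```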


Hi,

Would request you to share a screenshot of your code and the error that you are getting.

Thanks.

-- Rajtilak Bhattacharjee


Hi,
Can you please help me troubleshoot the error below?
Code -
# Enter http://en.wikipedia.org/wik...
from urllib import *
from bs4 import BeautifulSoup
url = input('Enter url - ')
html = request.urlopen(url).read()

Response -
Enter url - http://en.wikipedia.org/wik...
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-6-4ca7babbf941> in <module>
3 from bs4 import BeautifulSoup
4 url = input('Enter url - ')
----> 5 html = request.urlopen(url).read()
6

/usr/local/anaconda/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 else:
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224
225 def install_opener(opener):

/usr/local/anaconda/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
530 for processor in self.process_response.get(protocol, []):
531 meth = getattr(processor, meth_name)
--> 532 response = meth(req, response)
533
534 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_response(self, request, response)
640 if not (200 <= code < 300):
641 response = self.parent.error(
--> 642 'http', request, response, code, msg, hdrs)
643
644 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in error(self, proto, *args)
562 http_err = 0
563 args = (dict, proto, meth_name) + args
--> 564 result = self._call_chain(*args)
565 if result:
566 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_error_302(self, req, fp, code, msg, headers)
754 fp.close()
755
--> 756 return self.parent.open(new, timeout=req.timeout)
757
758 http_error_301 = http_error_303 = http_error_307 = http_error_302

/usr/local/anaconda/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
530 for processor in self.process_response.get(protocol, []):
531 meth = getattr(processor, meth_name)
--> 532 response = meth(req, response)
533
534 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_response(self, request, response)
640 if not (200 <= code < 300):
641 response = self.parent.error(
--> 642 'http', request, response, code, msg, hdrs)
643
644 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in error(self, proto, *args)
562 http_err = 0
563 args = (dict, proto, meth_name) + args
--> 564 result = self._call_chain(*args)
565 if result:
566 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_error_302(self, req, fp, code, msg, headers)
754 fp.close()
755
--> 756 return self.parent.open(new, timeout=req.timeout)
757
758 http_error_301 = http_error_303 = http_error_307 = http_error_302

/usr/local/anaconda/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
530 for processor in self.process_response.get(protocol, []):
531 meth = getattr(processor, meth_name)
--> 532 response = meth(req, response)
533
534 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_response(self, request, response)
640 if not (200 <= code < 300):
641 response = self.parent.error(
--> 642 'http', request, response, code, msg, hdrs)
643
644 return response

/usr/local/anaconda/lib/python3.6/urllib/request.py in error(self, proto, *args)
568 if http_err:
569 args = (dict, 'default', 'http_error_default') + orig_args
--> 570 return self._call_chain(*args)
571
572 # XXX probably also want an abstract factory that knows when it makes

/usr/local/anaconda/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result

/usr/local/anaconda/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
648 class HTTPDefaultErrorHandler(BaseHandler):
649 def http_error_default(self, req, fp, code, msg, hdrs):
--> 650 raise HTTPError(req.full_url, code, msg, hdrs, fp)
651
652 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found
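
The traceback above ends in HTTP Error 404, which means the server was reached (after following redirects, judging by the http_error_302 frames) but has no page at the requested address. A minimal sketch of catching that error so the program prints a short message instead of a long traceback; the function name fetch_or_explain and its opener parameter are hypothetical additions for illustration.

```python
import urllib.error
import urllib.request


def fetch_or_explain(url, opener=urllib.request.urlopen):
    """Return the page bytes, or a short message if the request fails."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return f"Server answered {err.code} ({err.reason}) for {url}"
    except urllib.error.URLError as err:
        # The server could not be reached at all.
        return f"Could not reach the server: {err.reason}"
```

HTTPError is caught before URLError because it is a subclass of it. If the message reports 404, the usual fix is to check the URL itself, for example by opening it in a browser.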


Hi,

Would request you to share a screenshot of your code, and the error that you are getting. Also, please let us know about the assessment you are trying to attempt here.

Thanks.

-- Rajtilak Bhattacharjee


Hello sir,
What does 'href' represent here? What does it mean, and what is its purpose in the print() statement?
Thanks

Hi,

href is an HTML attribute used to define a URL. The href attribute specifies the URL of the page the link goes to. Tip: You can use href="#top" or href="#" to link to the top of the current page! If the href attribute is not present, the tag is not a hyperlink.

Thanks.

-- Rajtilak Bhattacharjee


Hi,

Are you facing any challenges with this code? Please let us know.

Thanks.

-- Rajtilak Bhattacharjee


In BeautifulSoup, why do we assign tags = soup('a')? When I tried soup('b'), it returned None.
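
A possible explanation, sketched under the assumption that the page simply contained no <b> tags: soup('a') asks for every <a> tag, and soup('b') asks for every <b> tag, so a page without bold tags gives an empty result. The HTML below is made up for illustration, and the sketch assumes beautifulsoup4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two links and no <b> tags.
html = '<p><a href="/one">One</a> and <a href="/two">Two</a></p>'
soup = BeautifulSoup(html, "html.parser")

links = soup("a")            # two matches
bolds = soup("b")            # no <b> tags, so an empty result
first_bold = soup.find("b")  # find() returns None when nothing matches

print(len(links), len(bolds), first_bold)
```

Calling the soup object returns an empty collection when nothing matches, while find() returns None, so seeing None suggests that find() or attribute-style access was used.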


I am not able to import beautifulsoup4


Hi, Mohini.

You just need to import the BeautifulSoup module with the following command:
from bs4 import BeautifulSoup
You will then be able to use the module.
If it still does not work, kindly send screenshots.

All the best!

-- Satyajit Das


re.findall("aaa* ","this is a aa and that is a aaa")

result - ['aa ']

result should be ['aaa ']

please explain
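
A short sketch of why the result is what it is: the pattern ends with a space, and the final aaa in the text is followed by the end of the string rather than a space, so it cannot match. The variable names are only for illustration.

```python
import re

text = "this is a aa and that is a aaa"

# "aaa* " means: "aa", then zero or more extra "a", then a SPACE.
with_space = re.findall("aaa* ", text)
# Dropping the space lets the run at the very end of the string match too.
without_space = re.findall("aaa*", text)

print(with_space)
print(without_space)
```

The single letters a in the text never match either pattern, because both require at least two consecutive a characters.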


Hi There,

The following code is provided in the chapter 12 "Networked Program" presentation. It is not working.
I have tried 2-3 options (adding 'b' before GET, converting to UTF-8), but they are also not working. I am able to access the file (romeo.txt) in the browser.

import socket, sys
mysock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com',80))
mysock.send('GET http://www.py4inf.com/code/... HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)
    if ( len(data) < 1 ) :
        break
    print(data)
mysock.close()

Error:

TypeError Traceback (most recent call last)
<ipython-input-1-17a43c34c636> in <module>
2 mysock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
3 mysock.connect(('www.py4inf.com',80))
----> 4 mysock.send('GET http://www.py4inf.com/code/... HTTP/1.0\n\n')
5 #try:
6 # mysock.connect(('www.py4inf.com',10))

TypeError: a bytes-like object is required, not 'str'
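
In Python 3, sockets send and receive bytes rather than text, which is what this TypeError is pointing at. Below is a minimal sketch of one way to adapt the code: encode the request before sending and decode what comes back. The function names are hypothetical, and the URL in the last line is a placeholder, because the original path is truncated in the post.

```python
import socket


def build_get_request(url):
    """Build an HTTP/1.0 GET request as bytes, which is what send() expects."""
    return f"GET {url} HTTP/1.0\r\n\r\n".encode("utf-8")


def fetch(host, url, port=80):
    """Send the request and return the decoded response text."""
    chunks = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as mysock:
        mysock.connect((host, port))
        mysock.sendall(build_get_request(url))
        while True:
            data = mysock.recv(512)
            if len(data) < 1:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Only builds the request; calling fetch() would need network access.
print(build_get_request("http://example.com/file.txt"))
```

The sketch ends the request with \r\n\r\n, the line ending that HTTP specifies, in place of the \n\n used in the original.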


Hi sir, suppose we want to match a string which contains all the special characters. Will the procedure be the same for it as well? For instance, what if the string is " @@@ %%% $$$$"?

Hi Sodhi.

Is this what you are looking for?

import re
sStr = "@@@ %%% $$$$"
print(re.findall("[@%$]",sStr))
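
A small sketch of what the character class above returns, and of a variant with + that keeps each run of symbols together. The variable names are only for illustration.

```python
import re

s_str = "@@@ %%% $$$$"

singles = re.findall("[@%$]", s_str)    # one list item per character
groups = re.findall("[@%$]+", s_str)    # "+" keeps each run together

print(singles)
print(groups)
```

Inside square brackets the $ sign is treated as an ordinary character, so it does not need to be escaped here.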


Yes sir


You can also negate the alphanumeric character class. For example:

import re
re.findall('[^a-z0-9A-Z]+ ','@@@ %%% $$$$ wer23$23432 #R#TW#')


How can I download the slides?

Abhinav Singh

Just pop out the slides in a new window and you will see the option to download them.
