
Scraping Wikipedia Page - Get URL content

Now we will use the get function from the requests module to make a request to a web page. The get function sends a GET request to the specified URL and returns a requests.Response object. You can read more about requests at the link below:

https://requests.readthedocs.io/en/master/user/quickstart/
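As a minimal sketch of what such a request looks like (the Wikipedia URL and User-Agent header here are illustrative, not part of the assessment), you can inspect the request requests would send without actually hitting the network by preparing it first:

```python
import requests

# Hypothetical target URL and headers for illustration.
url = "https://en.wikipedia.org/wiki/Web_scraping"
headers = {"User-Agent": "Mozilla/5.0"}  # some sites reject requests with no User-Agent

# Build and prepare the GET request to see exactly what would be sent.
req = requests.Request("GET", url, headers=headers).prepare()
print(req.method)                  # GET
print(req.url)                     # the URL the request targets
print(req.headers["User-Agent"])   # the header we attached

# To actually fetch the page (requires network access):
# page = requests.get(url, headers=headers)
# print(page.status_code)  # 200 on success
```

Note that headers must be passed as a keyword argument to requests.get; the second positional argument is params, not headers.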

Next, we will import the Beautiful Soup object. Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open file handle. First, the document is converted to Unicode, and HTML entities are converted to Unicode characters. Beautiful Soup then parses the document using the best available parser. It will use an HTML parser unless you specifically tell it to use an XML parser.
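For example, here is a minimal sketch of parsing a small HTML string (the markup is made up for illustration; the same BeautifulSoup call also accepts an open file handle):

```python
from bs4 import BeautifulSoup

# A tiny hypothetical document to parse.
html = "<html><body><h1>Web scraping</h1><p>First paragraph.</p></body></html>"

# html.parser is Python's built-in HTML parser, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

# Navigate the parse tree by tag name and extract text.
print(soup.h1.get_text())         # Web scraping
print(soup.find("p").get_text())  # First paragraph.
```

Once the tree is built, tags can be reached as attributes (soup.h1) or searched with find and find_all.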

INSTRUCTIONS
  • Use the get function to get the URL content:

    page = requests.<<your code goes here>>(url, headers=headers)
    
  • Import BeautifulSoup from bs4 and set the soup variable to get the page content:

    from bs4 import <<your code goes here>>
    soup = BeautifulSoup(page.content, 'html.parser')
    
  • Let us have a look at what the response looks like:

    print(page.content)
    

No hints are available for this assessment

Answer is not available for this assessment


