20_BeautifulSoup Library for Web Scraping
• Web scraping can be done using various libraries and tools available in Python.
Libraries and platforms available
2. Fetch the Web Page: The Requests library allows you to send HTTP
requests and handle the responses.
import requests
response = requests.get('https://example.com')
print(response.text)
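The fetch step above can be made a little more defensive. The following is a minimal sketch, not the only way to do it: the helper name fetch_html and the 10-second default timeout are assumptions chosen for illustration, while requests.get, its timeout argument, and raise_for_status are part of the Requests library.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its HTML text.

    fetch_html is a hypothetical helper name; the timeout value is an
    arbitrary choice for this sketch.
    """
    # Without a timeout, a request can wait indefinitely on a slow server.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text


# Example usage (requires network access):
# html = fetch_html('https://example.com')
# print(html[:200])
```

Raising on error responses keeps the later parsing step from silently working on an error page instead of the content you wanted.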
Response numbers
• 200: The request was successful, and the content can be scraped.
• 201: Request successful, new resource is created.
• 204: No content available
• 301: Resource moved permanently
• 302: Resource located at a different URL
• 400: Bad request, invalid syntax
• 401: Unauthorized, authentication is required
• 403: Server understood the request but is refusing to authorize it.
• 404: Resource not found on the server
• And many more….
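To make the codes listed above easier to work with, here is a small sketch that maps a numeric status code to its standard reason phrase and a broad category. It uses only Python's standard library (http.HTTPStatus); the function name describe_status and the category labels are assumptions made for this illustration.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a label such as '404 Not Found (client error)'.

    describe_status is a hypothetical helper, not part of Requests.
    """
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not a registered HTTP status.
        phrase = "Unknown"

    if 100 <= code < 200:
        category = "informational"
    elif 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirection"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"


for code in (200, 301, 403, 404):
    print(describe_status(code))
```

In a scraper, the value to pass in would typically be response.status_code. Note that Requests follows redirects by default for GET requests, so a 301 or 302 is usually resolved before you see the final status code.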
Steps: Web Scraping
3. Use BeautifulSoup to parse the HTML content and extract
specific elements.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
element = soup.find(id='element_id')
elements = soup.find_all(class_='element_class')
tags = soup.find_all('tag_name')
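The three lookups above (by id, by class, by tag name) can be tried without any network access by parsing a small HTML string. This is an illustrative sketch: the HTML snippet and the id, class, and variable names in it are made up for the example and stand in for response.text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for response.text
html = """
<html><body>
  <h1 id="title">Sample Page</h1>
  <p class="note">First note</p>
  <p class="note">Second note</p>
  <a href="/page/2">Next</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match, or None if nothing matches.
title = soup.find(id="title")
title_text = title.get_text(strip=True) if title is not None else None

# find_all() returns a list of every match (possibly empty).
notes = [p.get_text(strip=True) for p in soup.find_all(class_="note")]

# Attributes such as href are read with .get() on the tag.
links = [a.get("href") for a in soup.find_all("a")]

print(title_text)
print(notes)
print(links)
```

Checking for None after find() matters in practice, because calling get_text() on a missing element raises an AttributeError.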
Demonstration
Web scraping using the BeautifulSoup and Requests libraries in Python
https://www.scrapethissite.com/pages/forms/
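As a rough sketch of what such a demonstration might look like, the code below extracts rows from a table of team statistics. It parses an inline sample rather than the live page, and the markup is an assumption: the tr class "team" and the td classes "name", "year", and "wins" are guesses at the page structure that should be confirmed by inspecting the real HTML in a browser. The sample rows are invented placeholders, not data from the site.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the class names are assumptions about the page.
SAMPLE_HTML = """
<table class="table">
  <tr><th>Team Name</th><th>Year</th><th>Wins</th></tr>
  <tr class="team">
    <td class="name"> Example Team A </td>
    <td class="year"> 2000 </td>
    <td class="wins"> 30 </td>
  </tr>
  <tr class="team">
    <td class="name"> Example Team B </td>
    <td class="year"> 2001 </td>
    <td class="wins"> 25 </td>
  </tr>
</table>
"""


def parse_teams(html: str) -> list:
    """Return one dict per table row that has the assumed 'team' class."""
    soup = BeautifulSoup(html, "html.parser")
    teams = []
    for row in soup.find_all("tr", class_="team"):
        name = row.find("td", class_="name")
        year = row.find("td", class_="year")
        wins = row.find("td", class_="wins")
        # Skip rows that do not match the assumed structure.
        if name is None or year is None or wins is None:
            continue
        teams.append({
            "name": name.get_text(strip=True),
            "year": int(year.get_text(strip=True)),
            "wins": int(wins.get_text(strip=True)),
        })
    return teams


teams = parse_teams(SAMPLE_HTML)
for team in teams:
    print(team)
```

To run this against the real page, pass the text of a Requests response to parse_teams instead of SAMPLE_HTML, and adjust the class names if the actual markup differs.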
Learning objective
• What is web scraping?
• Libraries and platforms available
• Purpose of web scraping
• Is it ethical to do web scraping?
• Steps: Web Scraping
• Response numbers