

20_BeautifulSoup Library for Web Scraping

The document provides an overview of web scraping, including its definition, purpose, and ethical considerations. It highlights popular Python libraries such as BeautifulSoup and Requests, and outlines the steps involved in web scraping, including fetching web pages and parsing HTML content. Additionally, it explains various HTTP response codes relevant to web scraping activities.

Uploaded by Arif Ahmad
Copyright © All Rights Reserved

Web Scraping

BeautifulSoup and Requests Libraries


Learning objectives
• What is web scraping?
• Libraries and platforms available
• Purpose of web scraping
• Is it ethical to do web scraping?
• Steps: Web Scraping
• HTTP response codes
• Demo: https://www.scrapethissite.com/pages/forms/
What is web scraping?
• An automated process of extracting data from websites.

• Also known as web harvesting or web data extraction. It is closely related to web crawling, which discovers pages by following links.

• It involves fetching the content of a web page and then parsing it to retrieve the desired information.

• This can be done using various libraries and tools available in Python.
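As a minimal sketch of the two stages described above (fetch, then parse), the function below takes HTML that has already been fetched and retrieves one kind of information from it: every link on the page, with relative URLs resolved against the page address. The function name `extract_links` and the sample markup are illustrative assumptions, not part of any library.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html: str, base_url: str) -> list[str]:
    """Parse already-fetched HTML and return absolute URLs for every <a href>."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):  # only <a> tags that have an href
        links.append(urljoin(base_url, anchor["href"]))  # resolve relative paths
    return links


# Hypothetical page content standing in for a fetched response body.
sample_html = """
<html><body>
  <a href="/pages/forms/">Hockey teams</a>
  <a href="https://example.com/about">About</a>
  <a name="no-href">Anchor without a link</a>
</body></html>
"""

print(extract_links(sample_html, "https://www.scrapethissite.com/"))
```

Keeping parsing separate from fetching makes the parsing logic easy to test on saved HTML without sending any requests.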
Libraries and platforms available

Python libraries:
• BeautifulSoup
• Selenium
• Pandas
• Scrapy
• PySpider

Online platforms:
• ParseHub
• Octoparse
Purpose of web scraping
• Extracting data for statistical or trend analysis in academic research.

• Gathering customer reviews or competitor data for research purposes.

• Tracking product prices across e-commerce platforms.

• Collecting content from news articles or blogs.


Is it ethical to do web scraping?
• Not by default, because you are collecting information from an organization's website without its involvement.

• You must ask for permission before you scrape a website.

• You must review the website's terms of use to ensure compliance.

• You must abstain from scraping personal or sensitive information.

• Once permitted, do not overload servers with excessive requests.
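One common way to act on the last two points is to honour the site's robots.txt rules and pause between requests. The sketch below uses only Python's standard `urllib.robotparser`; the user-agent name, the sample robots.txt text, the 1-second fallback delay, and the injected `fetch` callable are all assumptions made for illustration. Checking robots.txt does not replace reading the terms of use or asking permission.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "course-demo-scraper"  # hypothetical user-agent name

# Hypothetical robots.txt content; for a real site you would call
# parser.set_url("https://<site>/robots.txt") followed by parser.read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_crawl(urls, fetch, parser, agent=USER_AGENT, sleep=time.sleep):
    """Fetch only allowed URLs, pausing between requests.

    `fetch` is any callable that takes a URL (e.g. a wrapper around requests.get).
    """
    delay = parser.crawl_delay(agent)
    if delay is None:
        delay = 1  # assumed default pause (seconds) when robots.txt sets none
    results = {}
    for url in urls:
        if not parser.can_fetch(agent, url):
            continue  # skip paths the site has disallowed
        results[url] = fetch(url)
        sleep(delay)  # wait so the server is not flooded with requests
    return results
```

Passing `sleep` as a parameter is only a convenience for testing; in normal use the default `time.sleep` applies the crawl delay between requests.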


Steps: Web Scraping
• The BeautifulSoup library is a popular Python library for parsing HTML and XML documents. It works well with the Requests library, which fetches the web pages.

1. Install BeautifulSoup and Requests:

pip install beautifulsoup4 requests

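2. Fetch the web page: the Requests library lets you send HTTP requests and handle the responses.
import requests

response = requests.get('https://example.com', timeout=10)  # fetch the page; give up after 10 s
print(response.status_code)  # HTTP status code, e.g. 200 on success
print(response.text)  # the page's raw HTML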
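A slightly more defensive version of this step might wrap the request in a function that sends an identifying header, sets a timeout, and raises an error for 4xx/5xx responses instead of silently parsing an error page. The header value, the function name `fetch_html`, and the `get` parameter (a hook so the function can be tested without network access) are assumptions for this sketch; `timeout`, `headers`, and `raise_for_status()` are standard Requests features.

```python
import requests

# Hypothetical identifying header; many sites expect a descriptive User-Agent.
HEADERS = {"User-Agent": "course-demo-scraper/0.1 (contact: you@example.com)"}


def fetch_html(url: str, get=requests.get, timeout: float = 10.0) -> str:
    """Download a page and return its HTML, raising on 4xx/5xx responses."""
    response = get(url, headers=HEADERS, timeout=timeout)  # never wait forever
    response.raise_for_status()  # turns e.g. 403 or 404 into requests.HTTPError
    return response.text
```

Calling `fetch_html('https://example.com')` returns the page's HTML as a string, or raises `requests.HTTPError` if the server answers with an error code.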
HTTP response codes
• 200 OK: the request succeeded, and the content can be scraped.
• 201 Created: the request succeeded and a new resource was created.
• 204 No Content: the request succeeded, but there is no content to return.
• 301 Moved Permanently: the resource has a new permanent URL.
• 302 Found: the resource is temporarily located at a different URL.
• 400 Bad Request: the request is malformed (invalid syntax).
• 401 Unauthorized: authentication is required.
• 403 Forbidden: the server understood the request but refuses to authorize it.
• 404 Not Found: the resource does not exist on the server.
• …and many more.
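The first digit of a status code gives its class (2xx success, 3xx redirect, 4xx client error, 5xx server error). As a small sketch, Python's standard `http.HTTPStatus` enum can supply the official reason phrase; the function name `describe_status` and the category labels are this example's own choices.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return 'category: reason phrase' for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. "Not Found"
    except ValueError:
        phrase = "Unknown status"  # code not defined in http.HTTPStatus
    categories = {1: "informational", 2: "success", 3: "redirect",
                  4: "client error", 5: "server error"}
    category = categories.get(code // 100, "invalid")  # first digit = class
    return f"{category}: {phrase}"


for code in (200, 301, 403, 404):
    print(code, describe_status(code))
```

With Requests you rarely need this by hand: `response.ok` is `True` for any status code below 400, and `response.raise_for_status()` raises for 4xx and 5xx codes.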
Steps: Web Scraping
3. Use BeautifulSoup to parse the HTML content and extract specific elements.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')  # parse with Python's built-in HTML parser

4. Use BeautifulSoup methods to find and extract the data you need.
title = soup.title.text
print(title)
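`soup.title` is `None` when a page has no `<title>` tag, so `soup.title.text` raises `AttributeError` on such pages. A minimal sketch that guards against this and trims surrounding whitespace, run on an inline HTML string instead of a live response (the function name `page_title` and the sample markup are assumptions):

```python
from bs4 import BeautifulSoup


def page_title(html: str) -> str:
    """Return the stripped <title> text, or "" if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when there is no <title> tag, so check before .get_text()
    return soup.title.get_text(strip=True) if soup.title else ""


sample = "<html><head><title>  Example Page  </title></head><body></body></html>"
print(page_title(sample))
```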
Steps: Web Scraping
5. You can find elements by their ID, class, or tag.

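element = soup.find(id='element_id')  # first element with this id, or None
elements = soup.find_all(class_='element_class')  # list of all elements with this CSS class
tags = soup.find_all('tag_name')  # list of all elements with this tag name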
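To see these lookups on concrete markup, here is a small sketch; the HTML, the id `main`, and the class `note` are made up for illustration. Note the trailing underscore in `class_`, needed because `class` is a reserved word in Python, and that `find` returns a single element (or `None`) while `find_all` returns a list. The CSS-selector method `select` is shown as an alternative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the id and class names are invented for this example.
html = """
<div id="main">
  <p class="note">First note</p>
  <p class="note highlight">Second note</p>
  <p>Plain paragraph</p>
  <a href="/next">Next page</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

main = soup.find(id="main")                 # first element whose id is "main"
notes = soup.find_all(class_="note")        # every element whose classes include "note"
paragraphs = soup.find_all("p")             # every <p> tag
link = soup.find("a")                       # first <a> tag, or None

note_texts = [n.get_text(strip=True) for n in notes]
href = link["href"] if link else None       # read an attribute like a dict key
css_notes = soup.select("div#main p.note")  # same notes via a CSS selector

print(main.name, len(paragraphs), note_texts, href, len(css_notes))
```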
Demonstration
Web scraping using the BeautifulSoup and Requests libraries in Python

https://www.scrapethissite.com/pages/forms/
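A possible starting point for the demo combines the steps above: fetch the page, then turn each table row into a dictionary. The class names used here (`team`, `name`, `year`, `wins`) are assumptions about the demo page's markup, not something this document confirms; inspect the page with your browser's developer tools and adjust them if they differ. `parse_teams` works on any HTML string, so it can be tried on saved markup before sending live requests.

```python
import requests
from bs4 import BeautifulSoup

DEMO_URL = "https://www.scrapethissite.com/pages/forms/"


def parse_teams(html: str) -> list[dict]:
    """Turn each assumed <tr class="team"> row into a dict of column values."""
    soup = BeautifulSoup(html, "html.parser")
    teams = []
    for row in soup.find_all("tr", class_="team"):
        name = row.find("td", class_="name")   # assumed cell class names
        year = row.find("td", class_="year")
        wins = row.find("td", class_="wins")
        if not (name and year and wins):
            continue  # skip rows that do not match the assumed layout
        teams.append({
            "name": name.get_text(strip=True),
            "year": int(year.get_text(strip=True)),
            "wins": int(wins.get_text(strip=True)),
        })
    return teams


def fetch_teams(url: str = DEMO_URL) -> list[dict]:
    """Download the demo page and parse its team rows."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return parse_teams(response.text)
```

Calling `fetch_teams()` sends one request to the demo page; if you extend it to several pages, keep the pauses between requests recommended in the ethics section.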