20_BeautifulSoup Library for Web Scraping

The document provides an overview of web scraping, including its definition, purpose, and ethical considerations. It highlights popular Python libraries such as BeautifulSoup and Requests, and outlines the steps involved in web scraping, including fetching web pages and parsing HTML content. Additionally, it explains various HTTP response codes relevant to web scraping activities.

Uploaded by Arif Ahmad


Web Scraping

BeautifulSoup and Requests Libraries

Learning objectives
• What is web scraping?
• Libraries and platforms available
• Purpose of web scraping
• Is it ethical to do web scraping?
• Steps: Web Scraping
• Response numbers
• Demo: https://www.scrapethissite.com/pages/forms/
What is web scraping?
• An automated process of extracting data from websites.

• Also known as web harvesting or web data extraction. Web crawling, which follows links to discover pages, is a closely related technique.

• It involves fetching the content of a web page and then parsing it to retrieve the desired information.

• This can be done with various libraries and tools available in Python.
Libraries and platforms available

Python libraries:
• BeautifulSoup
• Selenium
• Pandas
• Scrapy
• PySpider

Online platforms:
• ParseHub
• Octoparse
Purpose of web scraping
• Extracting data for statistical or trend analysis in academic research.

• Gathering customer reviews or competitor data for research purposes.

• Tracking product prices across e-commerce platforms.

• Collecting content from news articles or blogs.

Is it ethical to do web scraping?
• Not without permission, because you are collecting information from an organization's website.

• You must ask permission before you scrape a website.

• You must review the website's terms of use to ensure compliance.

• You must abstain from scraping personal or sensitive information.

• Once permitted, do not overload servers with excessive requests.
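Part of respecting a site's rules can be automated: many sites publish a robots.txt file listing paths that automated clients should avoid, and pausing between requests keeps the load on the server low. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt rules, the user-agent name `my-course-scraper`, the one-second fallback delay and the helper names are all hypothetical, and checking robots.txt does not replace reading the terms of use or asking permission.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. For a real site you would instead call
# parser.set_url("https://<site>/robots.txt") and then parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "my-course-scraper"  # hypothetical user-agent name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(urls, parser, default_delay=1.0, sleep=time.sleep):
    """Yield only the URLs the rules allow, pausing between them."""
    # Use the site's Crawl-delay if it declares one, else a fallback delay
    delay = parser.crawl_delay(USER_AGENT) or default_delay
    first = True
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # skip paths the site asks robots not to visit
        if not first:
            sleep(delay)  # wait before every request except the first
        first = False
        yield url
```

In a real scraper you would loop `for url in polite_urls(urls, build_parser(text)):` and call `requests.get(url, timeout=10)` inside the loop. The `sleep` parameter is injectable only so the waiting can be tested without actually pausing.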

Steps: Web Scraping
• The BeautifulSoup library is a popular Python library for parsing HTML and XML documents. It works well with the Requests library, which fetches the web pages.

1. Install BeautifulSoup and Requests:

pip install beautifulsoup4 requests

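2. Fetch the web page: the Requests library sends HTTP requests and handles the responses.
import requests
response = requests.get('https://example.com', timeout=10)  # give up after 10 seconds
print(response.status_code)  # 200 means the request succeeded
print(response.text)         # the page's HTML as a string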
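A fetch for the demo page usually also identifies the client with a User-Agent header, sets a timeout, and lets Requests encode any query parameters. The sketch below assumes the demo page accepts a `q` search parameter (an assumption about scrapethissite.com, not something stated in these slides); the User-Agent string and the helper names `build_request` and `fetch` are hypothetical. `requests.Request(...).prepare()` builds the request without sending it, so you can inspect the final URL offline.

```python
import requests

BASE_URL = "https://www.scrapethissite.com/pages/forms/"
# Hypothetical User-Agent string; use one that identifies you
HEADERS = {"User-Agent": "my-course-scraper/0.1"}


def build_request(query: str) -> requests.PreparedRequest:
    """Prepare (but do not send) a GET request; 'q' is an assumed search parameter."""
    return requests.Request("GET", BASE_URL, headers=HEADERS, params={"q": query}).prepare()


def fetch(query: str, timeout: float = 10.0) -> str:
    """Send the prepared request and return the HTML, raising on 4xx/5xx statuses."""
    with requests.Session() as session:
        response = session.send(build_request(query), timeout=timeout)
        response.raise_for_status()  # e.g. a 404 or 403 becomes requests.HTTPError
        return response.text


# Inspect the exact URL that would be requested, without any network traffic
print(build_request("Boston Bruins").url)
```

Calling `fetch("Boston Bruins")` needs network access and returns the page's HTML as a string, ready to pass to BeautifulSoup.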
Response numbers
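• 200 OK: The request was successful, and the content can be scraped.
• 201 Created: The request was successful, and a new resource was created.
• 204 No Content: The request was successful, but there is no content to return.
• 301 Moved Permanently: The resource has permanently moved to a new URL.
• 302 Found: The resource is temporarily located at a different URL.
• 400 Bad Request: The server cannot process the request, e.g. because of invalid syntax.
• 401 Unauthorized: Authentication is required.
• 403 Forbidden: The server understood the request but refuses to authorize it.
• 404 Not Found: The resource was not found on the server.
• And many more.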
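Python's standard library already knows the official names of these codes, so a scraper rarely needs to hard-code them. The sketch below looks codes up with `http.HTTPStatus` and, following the rule of thumb above, treats only 200 as a page worth parsing; the helper names `describe` and `should_parse` are hypothetical. Note that `requests.get` follows redirects such as 301 and 302 by default, so `response.status_code` normally reports the final page, while `response.history` holds the redirect responses.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the code with its standard reason phrase, e.g. '404 Not Found'."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:  # HTTPStatus raises ValueError for unregistered codes
        return f"{code} (unknown status code)"


def should_parse(code: int) -> bool:
    """Only a 200 OK response carries a page worth parsing here."""
    return code == HTTPStatus.OK


for code in (200, 201, 204, 301, 302, 400, 401, 403, 404):
    print(describe(code), "-> parse" if should_parse(code) else "-> skip")
```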
Steps: Web Scraping
3. Use BeautifulSoup to parse the HTML content so you can extract specific elements.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')  # 'html.parser' is Python's built-in parser

4. Use BeautifulSoup methods to find and extract the data you need.
title = soup.title.text  # text inside the page's <title> tag
print(title)
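Steps 3 and 4 can be tried without any network access by parsing an HTML string directly. The sample HTML and the `page_title` helper below are hypothetical. The helper guards against pages without a `<title>` tag, where `soup.title` is `None` and `soup.title.text` would raise an `AttributeError`; `get_text(strip=True)` also trims surrounding whitespace.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for response.text
SAMPLE_HTML = """
<html>
  <head><title>   Example Page   </title></head>
  <body><h1>Hello</h1></body>
</html>
"""


def page_title(html: str) -> Optional[str]:
    """Return the stripped <title> text, or None if the page has no title."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


print(page_title(SAMPLE_HTML))
print(page_title("<p>No title here</p>"))
```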
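5. You can find elements by their ID, class, or tag name.

element = soup.find(id='element_id')              # first element with this id, or None
elements = soup.find_all(class_='element_class')  # list of all elements with this class
tags = soup.find_all('tag_name')                  # list of all elements with this tag name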
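These lookups behave differently when nothing matches: `find` returns the first match or `None`, while `find_all` always returns a list, possibly empty. The keyword is `class_` with a trailing underscore because `class` is a reserved word in Python. The sketch below runs the lookups against a small hypothetical snippet (the id, classes and links are made up), reads an attribute with `.get()`, and shows `select`, which expresses the same kind of query as a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only
HTML = """
<div id="main">
  <p class="note">First note</p>
  <p class="note">Second note</p>
  <a href="/page-one/">One</a>
  <a href="/page-two/">Two</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

main = soup.find(id="main")               # a single Tag, or None if absent
notes = soup.find_all(class_="note")      # list of matching Tags
links = soup.find_all("a")                # every <a> tag
missing = soup.find(id="does-not-exist")  # None, not an exception

note_texts = [p.get_text(strip=True) for p in notes]
hrefs = [a.get("href") for a in links]    # .get() returns None if the attribute is absent
same_notes = soup.select("div#main p.note")  # equivalent CSS-selector lookup

print(main.name, note_texts, hrefs, missing)
```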
Demonstration
Web scraping using the BeautifulSoup and Requests libraries in Python

https://www.scrapethissite.com/pages/forms/
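The demo page appears to list hockey team statistics in an HTML table. The sketch below assumes each data row is a `<tr class="team">` whose cells carry classes such as `name`, `year`, `wins` and `losses`; this is an assumption about the page's markup, so confirm it with your browser's developer tools first. To stay runnable offline, it parses a hypothetical snippet with placeholder values; against the live page you would pass `response.text` instead. The helper names `parse_teams` and `to_csv` are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional

from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text: the row/cell classes are assumed,
# and the team names and numbers are placeholders, not real data.
SAMPLE_HTML = """
<table class="table">
  <tr><th>Team Name</th><th>Year</th><th>Wins</th><th>Losses</th></tr>
  <tr class="team">
    <td class="name"> Team A </td><td class="year">2000</td>
    <td class="wins">10</td><td class="losses">5</td>
  </tr>
  <tr class="team">
    <td class="name"> Team B </td><td class="year">2000</td>
    <td class="wins">7</td><td class="losses">8</td>
  </tr>
</table>
"""

FIELDS = ["name", "year", "wins", "losses"]


def parse_teams(html: str) -> List[Dict[str, Optional[str]]]:
    """Turn each <tr class="team"> row into a dict of stripped cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    teams = []
    for row in soup.find_all("tr", class_="team"):  # header row has no class, so it is skipped
        record = {}
        for field in FIELDS:
            cell = row.find("td", class_=field)
            record[field] = cell.get_text(strip=True) if cell else None
        teams.append(record)
    return teams


def to_csv(rows: List[Dict[str, Optional[str]]]) -> str:
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


teams = parse_teams(SAMPLE_HTML)
print(teams)
print(to_csv(teams))
```

Keeping the values as strings leaves type conversion (for example `int(record["wins"])`) as a separate, explicit step, so a malformed cell fails loudly instead of being silently mis-parsed.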
