Beautiful Soup & Selenium Web Scraping Guide

Beautiful Soup vs Selenium

Beautiful Soup is a Python library that turns HTML and XML documents into a
tree of Python objects. It’s widely used in web scraping to extract information from
web pages.

Installation: pip install beautifulsoup4 (the library is imported as bs4)
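As a quick sanity check, here is a minimal sketch (the markup below is made up for illustration) showing how Beautiful Soup turns HTML into a tree of Python objects:

from bs4 import BeautifulSoup

html = "<html><body><h1>Hello</h1><p class='intro'>Welcome!</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.h1.get_text())   # Hello
print(soup.p["class"])      # ['intro']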

Selenium is primarily a tool for automating web browsers, most often used for testing web applications.


Besides Python, Selenium is also available for .NET/C#, Ruby, Java, and JavaScript. It requires a
web driver to run.

Installation: pip install selenium
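A minimal sketch to confirm the setup works (this assumes Chrome; recent Selenium releases can download a matching driver for you, otherwise install chromedriver manually):

from selenium import webdriver

driver = webdriver.Chrome()               # launches Chrome via the web driver
driver.get("https://books.toscrape.com")
print(driver.title)                       # prints the page title if everything is wired up
driver.quit()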

Which one should you use?


If you are scraping a website with static content, use Beautiful Soup. If you are
scraping a website with dynamically loaded content (like infinite scrolling), use
Selenium.
Resources
Web Drivers

●​ Chrome: https://developer.chrome.com/docs/chromedriver
●​ Edge: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver
●​ Firefox: https://github.com/mozilla/geckodriver
●​ Safari: https://webkit.org/blog/6900/webdriver-support-in-safari-10

Sites that allow scraping

●​ Static: https://books.toscrape.com/
●​ Dynamic: https://webscraper.io/test-sites/e-commerce/scroll/computers/laptops

Scraping Static Web Pages with Beautiful Soup


1. Import the Beautiful Soup and requests libraries:

from bs4 import BeautifulSoup
import requests

2. Fetch the page HTML with requests:

url = "https://books.toscrape.com"
response = requests.get(url)

3. Making the Soup:

soup = BeautifulSoup(response.text, 'html.parser')

The Beautiful Soup constructor takes two arguments: the HTML you want to parse
(here response.text, the raw HTML string returned by requests) and the parser you
want to use ('html.parser' is Python's built-in parser). A parser takes raw text
(like HTML or JSON) and breaks it down into a structured format that your program
can understand and work with.
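For reference, other parsers can be plugged into the same constructor. This is a sketch assuming the optional lxml and html5lib packages are installed (they are not used elsewhere in this guide):

# 'html.parser' ships with Python; 'lxml' and 'html5lib' need: pip install lxml html5lib
soup_builtin = BeautifulSoup(response.text, "html.parser")
soup_lxml = BeautifulSoup(response.text, "lxml")          # generally faster, lenient with messy HTML
soup_html5 = BeautifulSoup(response.text, "html5lib")     # parses the same way a browser does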
Commonly Used Beautiful Soup Methods

Method | What it does | Example
find(tag, attrs) | Finds the first matching element | soup.find("h1")
find_all(tag, attrs) | Finds all matching elements (list) | soup.find_all("p")
select(css_selector) | Finds elements using CSS selectors | soup.select("div.quote span.text")
select_one(css_selector) | Finds the first element matching a CSS selector | soup.select_one("p.intro")
.get_text() | Gets the inner text of an element | soup.select_one("p.intro").get_text()
.attrs | Returns all attributes of an element (dict) | soup.find("a").attrs
tag['attr'] | Gets a specific attribute (like href, src) | soup.find("a")["href"]
prettify() | Returns nicely formatted HTML | print(soup.prettify())

Example

Find the first book title
first_title = soup.find("h3").a["title"]

Find all prices
prices = [p.get_text() for p in soup.find_all("p", class_="price_color")]

Using CSS selectors
title_links = soup.select("h3 a")

Get all visible text (shortened)
all_text = soup.get_text("\n")[:200]
Handling multiple pages
The catalogue URLs follow a simple pattern, so build the URL for each page from a template and loop over the page numbers.

base_url = "https://books.toscrape.com/catalogue/page-{}.html"

all_titles = []

# Loop through first 5 pages
for page in range(1, 6):
    url = base_url.format(page)
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")

    # Collect all book titles on this page
    for book in soup.find_all("h3"):
        all_titles.append(book.a["title"])
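If you don't know how many pages there are, a variant of the loop (a sketch, not part of the original example) can follow the site's "next" pagination link until it disappears; this assumes the li.next markup that books.toscrape.com currently uses:

from urllib.parse import urljoin

url = "https://books.toscrape.com/catalogue/page-1.html"
all_titles = []

while url:
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    for book in soup.find_all("h3"):
        all_titles.append(book.a["title"])

    # The pagination link sits inside <li class="next"> on this site
    next_link = soup.select_one("li.next a")
    url = urljoin(url, next_link["href"]) if next_link else None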

Scraping dynamic page content with Selenium


This Selenium script loads an infinite-scroll product page, waits for .product-wrapper cards to
appear, then repeatedly scrolls to the bottom and waits until the number of cards increases. When
the count stops growing (no more items load), it exits the loop, collects each product’s title and price,
prints a summary, and quits the browser.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = "https://webscraper.io/test-sites/e-commerce/scroll/computers/laptops"

driver = webdriver.Chrome()
driver.get(URL)
wait = WebDriverWait(driver, 12)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product-wrapper")))

last = 0
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # wait until more product cards exist than before
        wait.until(lambda d: len(d.find_elements(By.CSS_SELECTOR, ".product-wrapper")) > last)
        count = len(driver.find_elements(By.CSS_SELECTOR, ".product-wrapper"))
        if count == last:
            break
        last = count
    except Exception:
        break  # no more items loaded within timeout

# collect product titles & prices
items = driver.find_elements(By.CSS_SELECTOR, ".product-wrapper")
data = [{
    "title": i.find_element(By.CSS_SELECTOR, "a.title").get_attribute("title"),
    "price": i.find_element(By.CSS_SELECTOR, "h4.price").text
} for i in items]

print(f"Collected {len(data)} products")
print(data[:5])
driver.quit()
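Optionally, the browser can run headless so no window opens while scraping. A small sketch (not part of the original script; the flag assumes a recent Chrome/Selenium version):

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")    # run Chrome without a visible window
driver = webdriver.Chrome(options=options)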

Export results as CSV


import csv

with open("products.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["title", "price"])
    w.writeheader()
    w.writerows(data)
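Alternatively, if pandas is installed (an assumption; it is not used elsewhere in this guide), the same export is a one-liner:

import pandas as pd

pd.DataFrame(data).to_csv("products.csv", index=False, encoding="utf-8")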
