
9/28/2016 Web Scraping with BeautifulSoup

Mar. 09, 2016

Web & Internet

Web Scraping with BeautifulSoup

Web Scraping
"Web scraping (web harvesting or web data extraction) is a computer software
technique of extracting information from websites."

HTML parsing is easy in Python, especially with the help of the Beautiful Soup library.
In this post we will scrape a website (our own) to extract all of its URLs.

Getting Started
To begin with, make sure that you have the necessary modules installed.

In the example below, we are using Beautiful Soup 4 and Requests on a system with
Python 2.7 installed.

Installing Beautiful Soup and Requests can be done with pip:

$ pip install requests

$ pip install beautifulsoup4
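A quick way to confirm that both installs worked is to import the packages and print their version strings. This is a small sketch, not part of the original post; it assumes the interpreter you run it with is the same one pip installed into.

```python
import bs4
import requests

# Both packages expose a __version__ string. An ImportError here usually means
# the package is missing, or pip installed it for a different Python interpreter.
print("beautifulsoup4", bs4.__version__)
print("requests", requests.__version__)
```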

What is Beautiful Soup?

http://www.pythonforbeginners.com/pythonontheweb/webscrapingwithbeautifulsoup/ 1/8

On the top of their website, you can read: "You didn't write that awful page.
You're just trying to get some data out of it. Beautiful Soup is here to help.
Since 2004, it's been saving programmers hours or days of work on quick-turnaround
screen scraping projects."

Beautiful Soup Features:

Beautiful Soup provides a few simple methods and Pythonic idioms for navigating,
searching, and modifying a parse tree: a toolkit for dissecting a document and
extracting what you need. It doesn't take much code to write an application.
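Here is a minimal sketch of the three kinds of operation named above (searching, navigating and modifying), run against a small inline document so no network access is needed. The HTML is made up for illustration, and `html.parser` is chosen only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Illustrative document; the ids, classes and links are invented for this example
html = """
<html><body>
  <div id="nav">
    <a href="/lists/">Lists</a>
    <a href="/dictionary/">Dictionary</a>
  </div>
  <p class="note">Old text</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching: find() returns the first match, find_all() returns every match
nav = soup.find("div", id="nav")
links = nav.find_all("a")

# Navigating: step from a tag up to its parent and read an attribute
first_link = links[0]
parent_id = first_link.parent["id"]

# Modifying: change an attribute, replace a tag's text, and delete a tag
first_link["href"] = "/lists/index.html"
soup.find("p", class_="note").string = "New text"
links[1].decompose()

print(parent_id)
print(soup.find("div", id="nav"))
```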

Beautiful Soup automatically converts incoming documents to Unicode and outgoing
documents to UTF-8. You don't have to think about encodings, unless the document
doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just
have to specify the original encoding.
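The case described above, where you specify the original encoding yourself, can be sketched with the `from_encoding` argument. The Latin-1 sample bytes below are an assumption chosen for illustration: they contain no `<meta charset>` declaration, so telling Beautiful Soup the encoding removes any guesswork.

```python
from bs4 import BeautifulSoup

# Latin-1 bytes with no declared encoding; byte 0xE9 is "é" in Latin-1
raw = b"<p>Caf\xe9 menu</p>"

# from_encoding tells Beautiful Soup how to decode the incoming bytes
soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")

text = soup.p.get_text()    # a Unicode str after parsing
utf8_bytes = soup.encode()  # encode() produces UTF-8 bytes by default

print(soup.original_encoding)
print(text)
```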

Beautiful Soup sits on top of popular Python parsers like lxml and html5lib,
allowing you to try out different parsing strategies or trade speed for
flexibility.
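The parsers differ most on broken markup, so one way to compare them is to feed the same bad fragment to each. This is a sketch: the helper name `parse_with_each` is hypothetical, and because lxml and html5lib are separate pip installs, the function reports `None` for any parser that isn't available instead of failing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately broken markup: a closing </p> with no opening <p>
BROKEN = "<a></p>"


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Hypothetical helper: map each parser name to its output, or None if missing."""
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # lxml and html5lib are optional; html.parser ships with Python
            results[name] = None
    return results


for parser, output in parse_with_each(BROKEN).items():
    print(parser, "->", output)
```

With `html.parser`, the stray `</p>` is simply dropped and the result is `<a></a>`; lxml and html5lib repair the same fragment differently, which is the "different parsing strategies" trade-off mentioned above.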

Extracting URLs from any website
Now that we know what BS4 is and have installed it on our machine,
let's see what we can do with it.

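from bs4 import BeautifulSoup
import requests

# Python 2.7: raw_input() returns the typed text as a string (use input() on Python 3)
url = raw_input("Enter a website to extract the URL's from: ")

# The user types a bare host name, so the http:// scheme is added here
r = requests.get("http://" + url)
data = r.text

# Name the parser explicitly; otherwise bs4 picks one itself and prints a warning
soup = BeautifulSoup(data, "html.parser")

# Print the href attribute of every <a> tag (None when a tag has no href)
for link in soup.find_all('a'):
    print(link.get('href'))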

When we run this program, it asks us for a website to extract the URLs from:

Enter a website to extract the URL's from: www.pythonforbeginners.com
http://www.pythonforbeginners.com
http://www.pythonforbeginners.com/pythonoverviewstarthere/
http://www.pythonforbeginners.com/dictionary/
http://www.pythonforbeginners.com/pythonfunctionscheatsheet/
http://www.pythonforbeginners.com/lists/pythonlistscheatsheet/
http://www.pythonforbeginners.com/loops/
http://www.pythonforbeginners.com/pythonmodules/
http://www.pythonforbeginners.com/strings/
http://www.pythonforbeginners.com/sitemap/
http://www.pythonforbeginners.com/feed/
http://www.pythonforbeginners.com
....
....
....
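The sample output lists the home page twice, and the program prints whatever each `href` contains, relative paths included. Below is a hedged Python 3 sketch of one way to tighten that: it adds the scheme only when it is missing, sets a request timeout, checks the HTTP status, resolves relative links with `urllib.parse.urljoin`, and drops duplicates. The helper names (`normalize_url`, `extract_links`, `fetch_links`), the 10-second timeout and the sample markup are all assumptions for illustration, not part of the original post.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def normalize_url(user_input):
    """Hypothetical helper: add http:// only when the user typed a bare host."""
    user_input = user_input.strip()
    if user_input.startswith(("http://", "https://")):
        return user_input
    return "http://" + user_input


def extract_links(html, base_url):
    """Hypothetical helper: absolute, de-duplicated hrefs in document order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    links = []
    # href=True skips <a> tags without an href (link.get('href') would return None)
    for tag in soup.find_all("a", href=True):
        absolute = urljoin(base_url, tag["href"])  # resolve paths like /loops/
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def fetch_links(user_input, timeout=10):
    """Hypothetical helper: download one page and return its links."""
    url = normalize_url(user_input)
    # timeout is in seconds; without one, requests can wait indefinitely
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of scraping an error page
    response.raise_for_status()
    # response.url is the final address after any redirects
    return extract_links(response.text, response.url)


# Invented sample markup so the example runs offline
sample = """
<a href="http://www.pythonforbeginners.com">Home</a>
<a href="/loops/">Loops</a>
<a name="top">anchor without href</a>
<a href="http://www.pythonforbeginners.com">Home again</a>
"""
print(extract_links(sample, "http://www.pythonforbeginners.com/"))
```

Against a live site you would call `fetch_links("www.pythonforbeginners.com")` instead of passing sample markup. Keeping parsing in `extract_links`, separate from the download, is what lets it be tried without a network connection.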

I recommend that you read our introduction article, "Beautiful Soup 4 Python",
to get a better understanding of Beautiful Soup.

More Reading

http://www.crummy.com/software/BeautifulSoup/
http://docs.python-requests.org/en/latest/index.html
