Chapter1 PDF
Hugo Bowne-Anderson
Data Scientist at DataCamp
You’re already great at importing!
Flat files such as .txt and .csv
URL
Uniform/Universal Resource Locator
Ingredients:
Protocol identifier - http:
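As a rough illustration of these ingredients, Python's standard-library urllib.parse module can split a URL into its parts. This is a sketch rather than part of the original slides; the variable name `parts` is only illustrative.

```python
from urllib.parse import urlsplit

# The same URL that is requested later in this chapter.
url = "https://www.wikipedia.org/"

# urlsplit breaks the URL into named components.
parts = urlsplit(url)

print(parts.scheme)  # protocol identifier, without the trailing colon
print(parts.netloc)  # host part of the resource name
print(parts.path)    # path on that host
```

Here the scheme corresponds to the protocol identifier, and the remaining components together name the resource.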
import requests
url = "https://www.wikipedia.org/"
r = requests.get(url)  # send a GET request and keep the response object
text = r.text  # response body decoded as a string
HTML
Mix of unstructured and structured data
Structured data:
Has pre-defined data model, or
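from bs4 import BeautifulSoup
soup = BeautifulSoup(text, "html.parser")  # assumes `text` holds the HTML fetched with requests
print(soup.title)
print(soup.get_text())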
bs4/download/
#Download
bs4/doc/
#HallOfFame
https://code.launchpad.net/beautifulsoup
https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup
http://www.candlemarkandgleam.com/shop/constellation-games/
http://constellation.crummy.com/Constellation%20Games%20excerpt.html
https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup
https://bugs.launchpad.net/beautifulsoup/
http://lxml.de/
http://code.google.com/p/html5lib/
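The lines above look like the href values of links found on a page. A minimal sketch of how such a list could be produced with BeautifulSoup follows. The HTML string is a made-up stand-in for a downloaded page, and the names `html` and `hrefs` are illustrative rather than taken from the slides.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the text of a downloaded page.
html = """
<html><head><title>Example page</title></head>
<body>
<a href="bs4/download/">Download</a>
<a href="#Download">Jump to download</a>
<a href="http://lxml.de/">lxml</a>
<a>No href here</a>
</body></html>
"""

# Parse with the parser that ships with Python.
soup = BeautifulSoup(html, "html.parser")

# find_all("a") returns every <a> tag in document order;
# .get("href") returns None when the attribute is missing.
hrefs = [a.get("href") for a in soup.find_all("a")]

for href in hrefs:
    print(href)
```

Relative links such as bs4/download/ and fragment links such as #Download come out exactly as written in the HTML, which would explain the mix of short and absolute entries in a list like this.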