Report
On
PYTHON PROJECT
VI Semester
Academic Year: 2018-2019
Title: WEB SCRAPING USING BEAUTIFUL SOUP
USN Name Signature
1GA14CS010 Akash Kumar S
1GA15CS053 G Janany
1GA16CS191 Vishal Kumar
Guide
[Mr. Shyam Sundar]
Dept. of CSE, GAT 2018-19 1
Objective of the Project
To build a system that can extract large amounts of data from websites and either save
the extracted data to a local file or display it. Such a system is either custom built for a
specific website or configurable to work with any website. With the click of a button, the
data available on a website can be saved to a file on our computer.
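The "save to a local file" step of the objective can be sketched as follows. This is a minimal illustration and not the project's own code: the function name save_lines, the file name headings.txt, and the sample strings (which stand in for scraped text) are all assumptions made for the example.

```python
import os
import tempfile


def save_lines(lines, path):
    """Write each non-empty string in `lines` to `path`, one per line.

    Returns the number of lines written. (Hypothetical helper, not part of
    the original project.)
    """
    # Trim surrounding whitespace and drop entries that are empty afterwards.
    cleaned = [line.strip() for line in lines if line.strip()]
    with open(path, "w", encoding="utf-8") as f:
        for line in cleaned:
            f.write(line + "\n")
    return len(cleaned)


# Sample strings standing in for text extracted from a web page (made-up data).
sample = ["History", "  Design philosophy and features ", "", "Syntax and semantics"]

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "headings.txt")
    written = save_lines(sample, out_path)
    with open(out_path, encoding="utf-8") as f:
        saved_text = f.read()

print(written)
print(saved_text)
```

In real use, the list passed to save_lines would come from the scraper, and the path would point to a permanent location chosen by the user.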
System Requirement Specification
Software Requirements Specification
➢ Language used : Python Programming Language
➢ IDE/Compiler used : PyCharm
➢ OS used : Windows 10
Hardware Requirements Specification
o Processor : i7 8th generation
o Hard Disk : 1 TB
o Monitor : HD LED Antiglare
o Keyboard : Island Style
Source Code
# Requires Python 3.5 or higher.
# Install the dependencies from the command prompt (not the Python shell):
#   1> pip install requests
#   2> pip install beautifulsoup4
#   3> pip install lxml
import requests  # HTTP library used to download the page
import bs4       # Beautiful Soup, used to parse the HTML

res = requests.get('https://en.wikipedia.org/wiki/Python_(programming_language)')
res.raise_for_status()  # stop with an error if the download failed
# res.text now holds the entire HTML source of the page

soup = bs4.BeautifulSoup(res.text, 'lxml')  # 'lxml' names the parser to use

# Any CSS selector can be given here to choose what to scrape
result = soup.select('.mw-body-content h2')

# select() expects a CSS selector, not a URL, so loop over the result list
for i in result:
    print(i.text)

# In the interactive shell:
#   result              displays the matching tags as HTML
#   result[0]           displays the first element of the list
#   result[0].getText() displays that element's text as a string
if result:
    print(result[0].getText())
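Because the listing above depends on a live web page, the same selection logic can also be tried offline on a small HTML string. The sketch below is an illustration under assumptions: the HTML snippet and the helper name extract_headings are invented for the example, and Python's built-in 'html.parser' is used in place of lxml so that no extra parser has to be installed.

```python
import bs4

# Made-up HTML standing in for a downloaded page (assumption for this example).
SAMPLE_HTML = """
<div class="mw-body-content">
  <h2>History</h2>
  <p>Some text.</p>
  <h2> Syntax and semantics </h2>
</div>
<h2>Navigation menu</h2>
"""


def extract_headings(html, selector=".mw-body-content h2"):
    """Return the stripped text of every tag matching the CSS selector.

    Hypothetical helper; the default selector is the one used in the listing.
    """
    soup = bs4.BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


headings = extract_headings(SAMPLE_HTML)
print(headings)
```

Only the two headings inside the element with class mw-body-content are returned; the last h2 lies outside it and is skipped, which shows how the selector narrows what is scraped.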
Snapshots
1. Snapshot of Source Code
2. Snapshot of Result
3. Snapshot of Webpage