Unit 5
Python in Visual Studio supports developing web projects in Bottle, Flask, and Django
frameworks through project templates and a debug launcher that can be configured to handle
various frameworks. These templates include a requirements.txt file to declare the necessary
dependencies. When creating a project from one of these templates, Visual Studio prompts you
to install those packages (see Install project requirements later in this article).
You can also use the generic Web Project template for other frameworks such as Pyramid. In
this case, no frameworks are installed with the template. Instead, install the necessary packages
into the environment you're using for the project (see Python environments window - Package
tab).
You create a project from a template using File > New > Project. To see templates for web
projects, select Python > Web on the left side of the dialog box. Then select a template of your
choice, provide names for the project and solution, set options for a solution directory and Git
repository, and select OK.
The generic Web Project template, mentioned earlier, provides only an empty Visual Studio
project with no code and no assumptions other than being a Python project.
All the other templates are based on the Bottle, Flask, or Django web frameworks, and fall into
three general groups as described in the following sections. The apps created by any of these
templates contain sufficient code to run and debug the app locally. Each one also provides the
necessary WSGI app object for use with production web servers.
Blank group
All Blank <framework> Web Project templates create a project with more or less minimal
boilerplate code and the necessary dependencies declared in a requirements.txt file.
Blank Bottle Web Project: Generates a minimal app in app.py with a home page for / and a
/hello/<name> page that echoes <name> using a very short inline page template.
Blank Django Web Project: Generates a Django project with the core Django site structure but no
Django apps. For more information, see Django templates and Learn Django Step 1.
Blank Flask Web Project: Generates a minimal app with a single "Hello World!" page for /. This
app is similar to the result of following the detailed steps in Quickstart: Use Visual Studio to
create your first Python web app. Also see Learn Flask Step 1.
Web group
All <Framework> Web Project templates create a starter web app with an identical design
regardless of the chosen framework. The app has Home, About, and Contact pages, along with a
nav bar and responsive design using Bootstrap. Each app is appropriately configured to serve
static files (CSS, JavaScript, and fonts), and uses a page template mechanism appropriate for the
framework.
Bottle Web Project: Generates an app whose static files are contained in the static folder and
handled through code in app.py. Routing for the individual pages is contained in routes.py, and
the views folder contains the page templates.
Django Web Project: Generates a Django project and a Django app with three pages, authentication
support, and a SQLite database (but no data models). For more information, see Django templates
and Learn Django Step 4.
Flask Web Project: Generates an app whose static files are contained in the static folder. Code in
views.py handles routing, with page templates using the Jinja engine contained in the templates
folder. The runserver.py file provides startup code. See
When creating a project from a framework-specific template, a dialog appears to help you install
the necessary packages using pip. We also recommend using a virtual environment for web
projects so that the correct dependencies are included when you publish your web site.
If you're using source control, you typically omit the virtual environment folder, as that
environment can be recreated using only requirements.txt. The best way to exclude the folder is
to first select the I will install them myself option in the prompt, then disable auto-commit
before creating the virtual environment. For details, see Learn Django Tutorial - Steps 1-2 and
1-3 and Learn Flask Tutorial - Steps 1-2 and 1-3.
When deploying to Microsoft Azure App Service, select a version of Python as a site
extension and manually install packages. Also, because Azure App Service
does not automatically install packages from a requirements.txt file when deployed from Visual
Studio, follow the configuration details on aka.ms/PythonOnAppService.
Debugging
When a web project is started for debugging, Visual Studio starts a local web server on a random
port and opens your default browser to that address and port. To specify additional options,
right-click the project, select Properties, and select the Web Launcher tab.
In the Debug group:
Search Paths, Script Arguments, Interpreter Arguments, and Interpreter Path: these
options are the same as for normal debugging.
Launch URL: specifies the URL that is opened in your browser. It defaults to localhost.
Port Number: the port to use if none is specified in the URL (Visual Studio selects one
automatically by default). This setting allows you to override the default value of
the SERVER_PORT environment variable, which is used by the templates to configure the
port the local debug server listens on.
The properties in the Run Server Command and Debug Server Command groups (the latter is
below what's shown in the image) determine how the web server is launched. Because many
frameworks require the use of a script outside of the current project, the script can be configured
here and the name of the startup module can be passed as a parameter.
Command: can be a Python script (*.py file), a module name (as in, python.exe -m
module_name), or a single line of code (as in, python.exe -c "code"). The value in the
drop-down indicates which of these types is intended.
Arguments: these arguments are passed on the command line following the command.
Environment: a newline-separated list of <NAME>=<VALUE> pairs specifying
environment variables. These variables are set after all properties that may modify the
environment, such as the port number and search paths, and so may overwrite these values.
Any project property or environment variable can be specified with MSBuild syntax, for
example: $(StartupFile) --port $(SERVER_PORT). $(StartupFile) is the relative path to the
startup file and {StartupModule} is the importable name of the startup
file. $(SERVER_HOST) and $(SERVER_PORT) are normal environment variables that are set
by the Launch URL and Port Number properties, automatically, or by
the Environment property.
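The Flask template's startup code reads these environment variables to decide where the local debug server listens. A minimal sketch of that pattern follows; the fallback values localhost and 5555, and the helper name get_server_address(), are assumptions made for illustration.

```python
import os


def get_server_address(environ=None):
    """Return (host, port) for the local debug server.

    Reads SERVER_HOST and SERVER_PORT, the variables Visual Studio sets from
    the Launch URL and Port Number properties. The fallbacks are assumptions.
    """
    if environ is None:
        environ = os.environ
    host = environ.get('SERVER_HOST', 'localhost')
    try:
        port = int(environ.get('SERVER_PORT', '5555'))
    except ValueError:
        # a non-numeric value falls back to the default port
        port = 5555
    return host, port


if __name__ == '__main__':
    host, port = get_server_address()
    print(f'Debug server would listen on {host}:{port}')
    # with a Flask app object named app, you would then call: app.run(host, port)
```

Passing a dictionary instead of os.environ makes the function easy to try out without changing your real environment.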
Web Forms
Flask has an extension that makes it easy to create web forms.
WTForms is “a flexible forms validation and rendering library for Python Web development.”
With Flask-WTF, we get WTForms in Flask.
We will install the Flask-WTF extension to help us work with forms in Flask. There are many
extensions for Flask, and each one adds a different set of functions and capabilities. See the list
of Flask extensions for more.
In Terminal, change into your Flask projects folder and activate your virtual
environment there. Then, at the command prompt (where you see $ on Mac
or C:\Users\yourname> on Windows), type:
pip install Flask-WTF
This installation is done only once in any virtualenv. It is assumed you already have Flask
installed there.
Flask-WTF docs
More details in WTForms docs
Bootstrap-Flask docs
The Bootstrap-Flask GitHub repository has good examples for forms; look at the README.
Imports for forms with Flask-WTF and Bootstrap-Flask
You will have a long list of imports at the top of your Flask app file:
from flask import Flask, render_template, redirect, url_for

from flask_wtf import FlaskForm, CSRFProtect
from wtforms import StringField, SubmitField
from wtforms.validators import DataRequired, Length
Note as always that Python is case-sensitive, so upper- and lowercase must be used exactly as
shown. The wtforms import will change depending on your form’s contents. For example, if
you have a SELECT element, you’ll need to import that. See a simplified list of WTForms form
field types or further explanation in the WTForms documentation.
app = Flask(__name__)
app.secret_key = 'tO$&!|0wkamvVia0?n$NqIRVWOG'
# Flask-WTF: turn on CSRF protection for the whole app
csrf = CSRFProtect(app)
Flask allows us to set a “secret key” value. This value is used to prevent malicious hijacking of
your form from an outside submission. A better way to do it:
import secrets

foo = secrets.token_urlsafe(16)
app.secret_key = foo
Flask-WTF’s FlaskForm will automatically create a secure session with CSRF (cross-site
request forgery) protection if this key-value is set and the csrf variable is set. Don’t publish an
actual key on GitHub!
You can read more about the secret key in this StackOverflow post.
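One common way to keep a real key out of your repository is to read it from an environment variable and fall back to a random key only during local development. A minimal sketch; the variable name FLASK_SECRET_KEY and the helper load_secret_key() are hypothetical, not something Flask requires.

```python
import os
import secrets


def load_secret_key(environ=None):
    """Return a value for app.secret_key.

    Uses FLASK_SECRET_KEY (a hypothetical variable name) when it is set;
    otherwise generates a random URL-safe key. A random key changes on every
    restart, so existing sessions and half-filled forms stop validating.
    """
    if environ is None:
        environ = os.environ
    key = environ.get('FLASK_SECRET_KEY')
    if key:
        return key
    return secrets.token_urlsafe(16)


# usage in the Flask app file: app.secret_key = load_secret_key()
```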
Next, we configure a form that inherits from Flask-WTF’s class FlaskForm. Python style (PEP 8)
dictates that class names start with an uppercase letter and use CapWords (also called
PascalCase), so here our new class is named NameForm (we will use the form to search for a name).
In the class, we assign each form control to a unique variable. This form has only one text input
field and one submit button.
class NameForm(FlaskForm):
    name = StringField('Which actor is your favorite?', validators=[DataRequired(), Length(10, 40)])
    submit = SubmitField('Submit')
If you had more than one form in the app, you would define more than one new class in this
manner.
Note that StringField and SubmitField were imported at the top of the file. If we needed other
form-control types in this form, we would need to import those also. See a simplified list of
WTForms form field types or further explanation in the WTForms documentation.
Note that several field types (such as RadioField and SelectField) must have an
option choices=[] specified. Within the list, each choice is a pair in this format: ('value',
'label text'). The first string is the value sent with the form data; the second is the label the user sees.
# RadioField (from wtforms) and InputRequired (from wtforms.validators) must also be imported
category = RadioField('Choose a detail to search:', validators=[InputRequired(message=None)],
                      choices=[('President', 'President\'s Name, e.g. John'),
                               ('Home-state', 'Home State, e.g. Virginia'),
                               ('Occupation', 'Occupation, e.g. Lawyer'),
                               ('College', 'College, e.g. Harvard')])
Here is a live form page shown beside the rendered source code for choices.
For more help with the FlaskForm class, see this Bootstrap-Flask page. It shows great examples
with the exact code needed.
WTForms also has a long list of validators we can use. The DataRequired() validator prevents
the form from being submitted if that field is empty. Note that these validators must also
be imported at the top of the file. Validators and custom validators are discussed further in the
WTForms documentation.
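To see what the two validators on the name field check, here is a plain-Python approximation (not the WTForms source): DataRequired() rejects empty or whitespace-only input, and Length(10, 40) rejects input shorter than 10 or longer than 40 characters. The function check_name() is made up for this sketch, and the messages only imitate the WTForms defaults.

```python
def check_name(value, min_len=10, max_len=40):
    """Approximate DataRequired() plus Length(min_len, max_len).

    Returns a list of error messages; an empty list means the value passes.
    """
    errors = []
    # DataRequired(): the field must contain something besides whitespace
    if not value or not value.strip():
        errors.append('This field is required.')
        return errors
    # Length(10, 40): counts the characters exactly as typed
    if not min_len <= len(value) <= max_len:
        errors.append(f'Field must be between {min_len} and {max_len} characters long.')
    return errors


print(check_name(''))              # the "required" message
print(check_name('Tom Hanks'))     # 9 characters: too short for Length(10, 40)
print(check_name('Meryl Streep'))  # 12 characters: passes, prints []
```

Notice that a nine-character name such as Tom Hanks would fail Length(10, 40). If shorter names should be accepted, lower the minimum.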
../python_code_examples/flask/actors_app/actors.py
@app.route('/', methods=['GET', 'POST'])
def index():
    names = get_names(ACTORS)
    # you must tell the variable 'form' what you named the class, above
    # 'form' is the variable name used in this template: index.html
    form = NameForm()
    message = ""
    if form.validate_on_submit():
        name = form.name.data
        if name.lower() in names:
            # empty the form field
            form.name.data = ""
            id = get_id(ACTORS, name)
            # redirect the browser to another route and template
            return redirect( url_for('actor', id=id) )
        else:
            message = "That actor is not in our database."
    return render_template('index.html', names=names, form=form, message=message)
A crucial line is where we assign our configured form object to a new variable:
form = NameForm()
We must also pass that variable to the template, as seen in the final line above.
Be aware that if we had created more than one form class, each of those would need to be
assigned to a unique variable.
Before we break all that down and explain it, let’s look at the code in the template index.html:
../python_code_examples/flask/actors_app/templates/index.html
1 {% extends 'base.html' %}
2 {% from 'bootstrap5/form.html' import render_form %}
3
4 {% block title %}
5 Best Movie Actors
6 {% endblock %}
7
8
9 {% block content %}
10
11 <!--
12 TIPS about using Bootstrap-Flask:
13 https://github.com/helloflask/bootstrap-flask
14 https://bootstrap-flask.readthedocs.io/
15 -->
16
17 <div class="container">
18 <div class="row">
19 <div class="col-md-10 col-lg-8 mx-lg-auto mx-md-auto">
20
21 <h1 class="pt-5 pb-2">Welcome to the best movie actors Flask example!</h1>
22
23 <p class="lead">This is the index page for an example Flask app using Bootstrap and
24 WTForms. Note that <em>only 100 actors</em> are in the data source. Partial names are not
25 valid.</p>
26
27 {{ render_form(form) }}
28
29 <p class="pt-5"><strong>{{ message }}</strong></p>
30
31 </div>
32 </div>
33 </div>
34 {% endblock %}
Where is the form? This is the amazing thing about Flask-WTF combined with Bootstrap-Flask: by
configuring the form as we did in the Flask app, we can generate a form with Bootstrap styles in
HTML using nothing more than the template you see above. Line 27, {{ render_form(form) }}, is the form.
Note that in the Flask route function, we created the variable form and passed it to the template index.html:
form = NameForm()
When you use {{ render_form(form) }}, the argument inside the parentheses must be
the variable that represents the form you created in the app.
Note that it is possible to use Bootstrap-Flask without any forms! The actors app demonstrates
how the usual Bootstrap classes such as container and row can be used in Flask templates.
Before reading further, try out a working version of this app. The complete code for the app is in
the folder named actors_app.
1. You type an actor’s name into the form and submit it.
2. If the actor’s name is in the data source (ACTORS), the app loads a detail page for that actor.
(Photos of bears stand in for real photos of the actors.)
3. Otherwise, you stay on the same page, the form is cleared, and a message tells you that actor
is not in the database.
../python_code_examples/flask/actors_app/actors.py
@app.route('/', methods=['GET', 'POST'])
def index():
    names = get_names(ACTORS)
    # you must tell the variable 'form' what you named the class, above
    # 'form' is the variable name used in this template: index.html
    form = NameForm()
    message = ""
    if form.validate_on_submit():
        name = form.name.data
        if name.lower() in names:
            # empty the form field
            form.name.data = ""
            id = get_id(ACTORS, name)
            # redirect the browser to another route and template
            return redirect( url_for('actor', id=id) )
        else:
            message = "That actor is not in our database."
    return render_template('index.html', names=names, form=form, message=message)
First we have the route, as usual, but with a new addition for handling form data: methods.
Every HTML form has two possible methods, GET and POST. GET simply requests a
response from the server. POST, however, sends a request with data attached in the body of
the request; this is the way most web forms are submitted.
This route needs to use both methods because when we simply open the page, no form was
submitted, and we’re opening it with GET . When we submit the form, this same page is opened
with POST if the actor’s name (the form data) was not found. Thus we cannot use only one of
the two options here.
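The practical difference is where the form data travels. This small sketch uses the standard library's urllib.parse to show both encodings for a hypothetical name field; the localhost URL is a placeholder.

```python
from urllib.parse import urlencode, parse_qs

form_data = {'name': 'Meryl Streep'}
encoded = urlencode(form_data)  # spaces become '+'

# GET: the encoded data is appended to the URL as a query string
get_url = 'http://localhost:5000/?' + encoded

# POST: the same encoded string travels in the request body instead,
# so the URL stays http://localhost:5000/
post_body = encoded.encode('ascii')

print(get_url)    # http://localhost:5000/?name=Meryl+Streep
print(post_body)  # b'name=Meryl+Streep'

# the server decodes either form back into a dictionary of lists
print(parse_qs(post_body.decode('ascii')))  # {'name': ['Meryl Streep']}
```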
def index():
    names = get_names(ACTORS)
At the start of the route function, we get the data source for this app. It happens to be in a list
named ACTORS, and we get just the names by running a function, get_names(). The function
was imported from the file named modules.py.
form = NameForm()
message = ""
We assign the previously configured form object, NameForm(), to a new variable, form. This
has been discussed above.
if form.validate_on_submit():
    name = form.name.data
form.name.data is the contents of the text input field represented by name. Perhaps we should
review how we configured the form:
class NameForm(FlaskForm):
    name = StringField('Which actor is your favorite?', validators=[DataRequired(), Length(10, 40)])
    submit = SubmitField('Submit')
That name is the name in form.name.data — the contents of which we will now store in a
new variable, name . To put it another way: The variable name in the app now contains
whatever the user typed into the text input field on the web page — that is, the actor’s name.
if name.lower() in names:
    # empty the form field
    form.name.data = ""
    id = get_id(ACTORS, name)
    # redirect the browser to another route and template
    return redirect( url_for('actor', id=id) )
else:
    message = "That actor is not in our database."
This if-statement is specific to this app. It checks whether the name (that was typed into the
form) matches any name in the list names. If not, we jump down to else and text is put into the
variable message. If name DOES match, we clear out the form, run a function
called get_id() (from modules.py), and (important!) open a different route in this app.
Here, url_for('actor', id=id) builds the URL for the route function named actor, and redirect()
sends the browser to that URL in the same Flask app script. (See actors.py, lines 48-57.) The
redirect() function is specifically for this use, and we imported it from the flask module at the
top of the app. We also imported url_for(), which you have seen previously used within templates.
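The code for modules.py is not shown here. A minimal sketch of what get_names() and get_id() might look like, assuming (hypothetically) that ACTORS is a list of dictionaries with 'id' and 'name' keys; the two sample actors are placeholders, not the real data source.

```python
# hypothetical stand-in for ACTORS; the real data source is not shown here
ACTORS = [
    {'id': 1, 'name': 'Meryl Streep'},
    {'id': 2, 'name': 'Denzel Washington'},
]


def get_names(source):
    """Return every name in lowercase, so the route can compare name.lower()."""
    return [actor['name'].lower() for actor in source]


def get_id(source, name):
    """Return the id of the actor whose name matches, ignoring case, or None."""
    for actor in source:
        if actor['name'].lower() == name.lower():
            return actor['id']
    return None


names = get_names(ACTORS)
print('denzel washington' in names)         # True
print(get_id(ACTORS, 'DENZEL WASHINGTON'))  # 2
```

Lowercasing on both sides is what lets a user type a name in any mix of upper- and lowercase and still find a match.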
As far as using forms with Flask is concerned, you don’t need to worry about the actors and
their IDs, etc. What is important is that the route function can be used to evaluate the data sent
from the form. We check to see whether it matched any of the actors in a list, and a different
response will be sent based on match or no match.
Any kind of form data can be handled in a Flask route function.
You can do any of the things that are typically done with HTML forms — handle usernames and
passwords, write new data to a database, create a quiz, etc.
The final line in the route function calls the template index.html and passes three variables to it: names, form, and message.
Flask-WTF provides convenient methods for working with forms in Flask. Forms can be built
easily and also processed easily, with a minimum of code.
Adding Bootstrap-Flask ensures that we can build mobile-friendly forms with a minimum
amount of effort.
Note that it is possible to build a customized form layout using Bootstrap styles in a Flask
template, or to build a custom form with no Bootstrap styles. In either of those two cases,
you cannot use {{ render_form(form) }} but would instead write out all the form code in your
Flask template as you would in a normal HTML file. To take advantage of WTForms, you would
still create the form class with FlaskForm in the same way as shown above.
An example is the demo Flask app Books Hopper, which includes four separate Bootstrap forms:
a login form
a registration form
a search form
a form for writing a book review and selecting a rating
Bootstrap 4 was used in all templates in the Books Hopper app, but Bootstrap-Flask was not.
Bootstrap styles were all coded in the usual ways.
Templates
Folder structure for a Flask app
A proper Flask app is going to use multiple files — some of which will be template files. The
organization of these files has to follow rules so the app will work. Here is a diagram of the
typical structure:
my-flask-app
├── static/
│ └── css/
│ └── main.css
├── templates/
│ ├── index.html
│ └── student.html
├── data.py
└── students.py
1. Everything the app needs is in one folder, here named my-flask-app.
2. That folder contains two folders, specifically named static and templates.
The static folder contains assets used by the templates, including CSS files, JavaScript
files, and images. In the example, we have only one asset file, main.css. Note that it’s
inside a css folder that’s inside the static folder.
The templates folder contains only templates. These have an .html extension. As we will
see, they contain more than just regular HTML.
3. In addition to the static and templates folders, this app also contains .py files. Note that
these must be outside the two folders named static and templates.
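If you want to set up this layout quickly, a short standard-library script can create the skeleton. This is a convenience sketch, not something Flask provides; the helper name make_flask_skeleton() is hypothetical, and the folder and file names are the ones in the diagram.

```python
import tempfile
from pathlib import Path


def make_flask_skeleton(base):
    """Create the folders and empty files from the diagram under base.

    Returns the path of the new my-flask-app folder.
    """
    root = Path(base) / 'my-flask-app'
    # static/ holds CSS, JavaScript and images; templates/ holds only templates
    (root / 'static' / 'css').mkdir(parents=True, exist_ok=True)
    (root / 'templates').mkdir(parents=True, exist_ok=True)
    for name in ('static/css/main.css', 'templates/index.html',
                 'templates/student.html',
                 'data.py', 'students.py'):  # .py files stay outside both folders
        (root / name).touch()
    return root


if __name__ == '__main__':
    # build the skeleton in a temporary folder and list the top level
    with tempfile.TemporaryDirectory() as tmp:
        root = make_flask_skeleton(tmp)
        print(sorted(p.name for p in root.iterdir()))
        # ['data.py', 'static', 'students.py', 'templates']
```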
Get started with templates
Let’s first imagine a situation where we are going to need a lot of pages that all have the same
layout.
For example, we might want to build an app that includes all the U.S. presidents. Each president
will have their own page, like this:
We do not want to build 45 pages by hand. We happen to have all the presidential data in a
spreadsheet. Can we make one HTML template and be done?
Yes!
That might look slightly intimidating, but notice that inside each double-pair of curly braces is
a key (in square brackets) that tells you exactly which information item will appear there. So, for
Lincoln:
Abraham Lincoln, the 16th president of the United States, was born on 2/12/1809, in LaRue
County, Kentucky. He was 52 when he took office on 3/4/1861. Member: Republican/National
Union Party.
The spreadsheet was converted to a list of Python dictionaries — one dictionary per president.
The dictionary for Lincoln looks like this:
In the previous chapter, Flask, part 2, you saw this route function:
@app.route('/user/<name>')
def user(name):
    personal = f'<h1>Hello, {name}!</h1>'
    # adjacent string literals are joined; the trailing space keeps the words apart
    instruc = ('<p>Change the name in the <em>browser address bar</em> '
               'and reload the page.</p>')
    return personal + instruc
1. Put the HTML into a template (note the double curly braces {{ }} around name). The route in step 2 calls this template hello.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Hello HTML Templates</title>
<link rel="stylesheet" href="{{ url_for('static', filename='main.css') }}">
</head>
<body>
<div>
<h1>Hello, {{ name }}!</h1>
<p>Change the name in the <em>browser address bar</em> and reload the page.</p>
</div>
</body>
</html>
2. Replace the body of the route function with a call to render_template():
@app.route('/user/<name>')
def user(name):
    return render_template('hello.html', name=name)
3. We must import the render_template() function, so add it to the line at the top of the Flask app
script:
from flask import Flask, render_template
When you import more than one name from flask, you can put them on one line, separated by
commas.
The templates in Flask are handled by the Jinja template engine, which comes with Flask when
you first install it.
The render_template() function both selects the template file to be used and passes to it any
values or variables it needs.
return render_template('example.html',
    name=name, phone=phone_number, state='FL')
We can pass as many variables as necessary to the template. In the example above,
the template must contain variables for name, phone, and state. That is, these expressions must
be somewhere in the template named example.html:
{{name}}
{{phone}}
{{state}}
The value (on the right side of the equals sign) must come from the code in the Flask app file. In
the example above, both name and phone_number must already have a value before
the return render_template() line. Note that 'FL' is a string — we can pass in a string, integer,
float, or Boolean.
It’s also possible to pass lists or dictionaries as values to a template. Here’s a line from the
presidents app:
return render_template('president.html',
    pres=pres_dict, ord=ord, the_title=pres_dict['President'])
The template file is named president.html, and the values depend on a Python
dictionary, pres_dict, that was defined earlier in the Flask app script. We can see the following
expressions in the template’s HTML:
{{ord}}
{{pres['President']}}
{{pres['Birth-date']}}
{{pres['Birthplace']}}
Why pres and not pres_dict? Because pres=pres_dict. In the render_template() function call,
the dictionary pres_dict is assigned to the template’s variable pres. It is shorter and simpler for
use in the template file.
If you need help understanding Python dictionaries, see the Dictionaries chapter. Dictionaries are
incredibly useful for Flask templates!
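To see how the names line up, here is a rough plain-Python stand-in for part of president.html. It is an analogy only (Flask really uses Jinja), the function render_president() is hypothetical, and the dictionary holds only the Lincoln values quoted earlier in this chapter, not the full CSV row.

```python
def render_president(pres, ordinal):
    """Build the first sentence of a president page from one dictionary.

    In president.html the same lookups are written as {{pres['President']}},
    {{ord}}, {{pres['Birth-date']}} and {{pres['Birthplace']}}.
    """
    return (f"{pres['President']}, the {ordinal} president of the United States, "
            f"was born on {pres['Birth-date']}, in {pres['Birthplace']}.")


# a subset of Lincoln's dictionary, using only values quoted in this chapter
pres_dict = {
    'Presidency': '16',
    'President': 'Abraham Lincoln',
    'Birth-date': '2/12/1809',
    'Birthplace': 'LaRue County, Kentucky',
}

# comparable to render_template('president.html', pres=pres_dict, ord='16th')
print(render_president(pres=pres_dict, ordinal='16th'))
```

The keyword argument pres=pres_dict plays the same role here as in render_template(): inside the function (or template), the dictionary is known only by the shorter name.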
In another template in the presidents app, base.html, we can see this expression in the HEAD
element:
<title>{{the_title}}</title>
Because of the_title=pres_dict['President'], the HTML TITLE element will be filled in with the
value of the dictionary item that has the key 'President'.
Converting a CSV to a list of dictionaries
All the data we need about the U.S. presidents is in this CSV file.
Early in the Flask app in presidents.py, we create a list of dictionaries
named presidents_list from that CSV. Each item in the list is a dictionary. Each dictionary
contains all the data about ONE president. That is, each dictionary is equivalent to one
row from the original CSV.
4. Create a list of pairs in which each pair consists of the unique ID and the name of the
president it belongs to. This will be used to create an HTML list of all the presidents’ names,
which are links to the Flask route.
Here is the top of the presidents.py script:
../python_code_examples/flask/presidents/presidents.py
1 from flask import Flask, render_template
2 from modules import convert_to_dict, make_ordinal
3
4 app = Flask(__name__)
5 application = app
6
7 # create a list of dicts from a CSV
8 presidents_list = convert_to_dict("presidents.csv")
The list is created on line 8, using a function in an external file named modules.py. The
function convert_to_dict() was imported from that file on line 2.
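modules.py is not reproduced in this chapter. A minimal sketch of how convert_to_dict() could be written with the standard library's csv.DictReader, assuming the CSV's first row holds the column names (such as Presidency and President); the real function may differ.

```python
import csv
import os
import tempfile


def convert_to_dict(filename):
    """Read a CSV file and return a list of dictionaries, one per row.

    The header row supplies the keys; every value is read as a string.
    """
    with open(filename, newline='', encoding='utf-8') as datafile:
        return list(csv.DictReader(datafile))


if __name__ == '__main__':
    # a tiny two-row stand-in for presidents.csv
    sample = 'Presidency,President\n1,George Washington\n16,Abraham Lincoln\n'
    with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                     newline='', encoding='utf-8') as tmp:
        tmp.write(sample)
    rows = convert_to_dict(tmp.name)
    os.remove(tmp.name)
    print(rows[1]['President'])  # Abraham Lincoln
```

Because every value comes back as a string (for example, '16' rather than 16), the detail route has to call int(num) before doing arithmetic.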
We will skip to the second route in the app and come back to the first one later.
@app.route('/president/<num>')
def detail(num):
    try:
        pres_dict = presidents_list[int(num) - 1]
    except (ValueError, IndexError):
        # num was not a whole number, or was too large for the list
        return f"<h1>Invalid value for Presidency: {num}</h1>"
    # a little bonus function, imported on line 2 above
    ord = make_ordinal( int(num) )
    return render_template('president.html', pres=pres_dict, ord=ord, the_title=pres_dict['President'])
@app.route('/president/<num>')
Earlier, we saw that a URL for Abraham Lincoln ends with /president/16.
Recall that each president’s details are in a dictionary, and all the dictionaries are in
a list. Recall that any list item can be accessed by its index (see Working with Lists).
pres_dict = presidents_list[int(num) - 1]
That line (line 27) assigns the dictionary for the one selected president to the variable pres_dict.
For the list index, int(num) - 1 changes num from a string to an integer and subtracts 1, so that
the dictionary for George Washington, first president, comes from list item 0. (And the
dictionary for Abraham Lincoln, 16th president, comes from list item 15, etc.)
Line 27 is in a try clause because it’s possible someone would manually change the URL to
something invalid, such as /president/100. In that case, the except clause would run, and the
screen would show the text “Invalid value for Presidency: 100” styled as an H1 heading.
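The try/except guards against bad input, but one case slips through: /president/0 computes index -1, and Python quietly returns the last president in the list (negative numbers behave similarly). A sketch of a stricter lookup, written as a hypothetical helper get_president(), rejects those values too.

```python
def get_president(presidents_list, num):
    """Return the dictionary for presidency number num, or None if invalid.

    num arrives from the URL as a string, for example '16'.
    """
    try:
        n = int(num)
    except ValueError:
        # not a whole number, for example /president/abc
        return None
    # reject 0 and negatives, which would otherwise index from the end
    if not 1 <= n <= len(presidents_list):
        return None
    return presidents_list[n - 1]


# possible use inside detail():
#     pres_dict = get_president(presidents_list, num)
#     if pres_dict is None:
#         return f"<h1>Invalid value for Presidency: {num}</h1>"
```

Flask's <int:num> route converter is another option: it only matches non-negative whole numbers, although 0 and numbers larger than the list still need a check.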
Line 31 is a bonus because it takes the value of num (e.g. 1 for Washington or 16 for Lincoln)
and converts it to an ordinal (e.g. 1st for Washington or 16th for Lincoln). The
function make_ordinal() is in modules.py.
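Since modules.py is not shown, here is one possible make_ordinal(), written as a sketch; the real function may be implemented differently. The only tricky part is that 11, 12, and 13 (and 111, 112, 113, and so on) take "th" even though they end in 1, 2, or 3.

```python
def make_ordinal(num):
    """Return num as an ordinal string, for example 1 -> '1st', 16 -> '16th'."""
    # 11, 12 and 13 (and 111, 112, ...) always take 'th'
    if 11 <= num % 100 <= 13:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(num % 10, 'th')
    return f'{num}{suffix}'


print([make_ordinal(n) for n in (1, 2, 3, 4, 11, 16, 22, 43, 113)])
# ['1st', '2nd', '3rd', '4th', '11th', '16th', '22nd', '43rd', '113th']
```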
Finally, line 32 is the render_template() function we’ve seen before. The Flask template here
is president.html. We know that pres_dict is the single dictionary for the selected president, and
we pass it to the template as the variable pres because that’s shorter and simpler. We
pass ord to the template as ord. For the HTML TITLE, we pass the value
of pres_dict['President'] to the template as the_title.
Summary: The route tells Flask, “When this URL is received, run the following function.” Then
everything up to the final return in the function is preparing the data that will be in
the render_template() function. We also have an except clause, in case the route’s variable
value is unusable.
Now we turn to the first route. This is the equivalent of an index, where the user selects a
president. Then a request is sent to the server, and the page for that single president appears in
the browser.
The web page needs to provide a list of links, one for each president.
To work, the partial URL for a president must contain the Presidency number. This was
covered in the previous section — the URL for Abraham Lincoln, for example, ends
with /president/16.
The link text is the president’s name. In the CSV, it is in the column named President.
pairs_list = []
for p in presidents_list:
    pairs_list.append( (p['Presidency'], p['President']) )

# first route
@app.route('/')
def index():
    return render_template('index.html', pairs=pairs_list, the_title="Presidents Index")
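The loop that builds pairs_list can also be written as a list comprehension. This sketch uses a two-president stand-in for presidents_list and prints the partial URL each link would point to; in the template itself you would normally build that URL with url_for('detail', num=...), since detail is the name of the route function.

```python
# stand-in for presidents_list (the real list comes from presidents.csv)
presidents_list = [
    {'Presidency': '1', 'President': 'George Washington'},
    {'Presidency': '16', 'President': 'Abraham Lincoln'},
]

# same result as the for loop: one (Presidency, President) tuple per row
pairs_list = [(p['Presidency'], p['President']) for p in presidents_list]

for num, name in pairs_list:
    # each pair supplies the link target and the link text
    print(f'/president/{num} -> {name}')
```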
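import sqlite3

conn = sqlite3.connect('test.db')
print("Opened database successfully")
conn.execute('''CREATE TABLE COMPANY (ID INT PRIMARY KEY NOT NULL,
                NAME TEXT NOT NULL, ADDRESS CHAR(50), SALARY REAL)''')
print("Table created successfully")
conn.close()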
When the above program is executed, it will create the COMPANY table in your test.db and it
will display the following messages:
Opened database successfully
Table created successfully
Insert Operation
The following Python program shows how to create records in the COMPANY table created in the
above example.
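#!/usr/bin/env python3
import sqlite3

conn = sqlite3.connect('test.db')
print("Opened database successfully")
# each INSERT adds one row to the COMPANY table
conn.execute("INSERT INTO COMPANY (ID, NAME, ADDRESS, SALARY) VALUES (2, 'Allen', 'Texas', 15000.00)")
conn.execute("INSERT INTO COMPANY (ID, NAME, ADDRESS, SALARY) VALUES (3, 'Teddy', 'Norway', 20000.00)")
conn.execute("INSERT INTO COMPANY (ID, NAME, ADDRESS, SALARY) VALUES (4, 'Mark', 'Rich-Mond', 65000.00)")
conn.commit()
print("Records created successfully")
conn.close()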
When the above program is executed, it will create the given records in the COMPANY table
and it will display the following two lines:
Opened database successfully
Records created successfully
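Writing values straight into the SQL string becomes risky once they come from users (for example, from a Flask form). This sketch runs the same insert-and-select flow with ? placeholders, executemany(), and an in-memory database so nothing is written to disk; the helper high_earners() is hypothetical, and the table layout mirrors the columns used in this section.

```python
import sqlite3


def high_earners(rows, minimum):
    """Load rows into a throwaway COMPANY table and return (ID, NAME)
    for everyone whose SALARY is greater than minimum."""
    # ':memory:' creates a temporary database that disappears when closed
    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('''CREATE TABLE COMPANY
                        (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL,
                         ADDRESS CHAR(50), SALARY REAL)''')
        # each ? placeholder is filled from the tuple; sqlite3 handles quoting
        conn.executemany(
            'INSERT INTO COMPANY (ID, NAME, ADDRESS, SALARY) VALUES (?, ?, ?, ?)',
            rows)
        conn.commit()
        cursor = conn.execute(
            'SELECT ID, NAME FROM COMPANY WHERE SALARY > ? ORDER BY ID', (minimum,))
        return cursor.fetchall()
    finally:
        conn.close()


rows = [
    (2, 'Allen', 'Texas', 15000.0),
    (3, 'Teddy', 'Norway', 20000.0),
    (4, 'Mark', 'Rich-Mond', 65000.0),
]
print(high_earners(rows, 18000))  # [(3, 'Teddy'), (4, 'Mark')]
```

Passing values as parameters, rather than pasting them into the SQL text, is what protects the query against SQL injection.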
Select Operation
The following Python program shows how to fetch and display records from the COMPANY table
created in the above example.
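#!/usr/bin/env python3
import sqlite3

conn = sqlite3.connect('test.db')
print("Opened database successfully")
for row in conn.execute("SELECT ID, NAME, ADDRESS, SALARY FROM COMPANY"):
    print(f"ID = {row[0]}\nNAME = {row[1]}\nADDRESS = {row[2]}\nSALARY = {row[3]}")
conn.close()
When the above program is executed, it will produce the following result:
Opened database successfully
ID = 2
NAME = Allen
ADDRESS = Texas
SALARY = 15000.0
ID = 3
NAME = Teddy
ADDRESS = Norway
SALARY = 20000.0
ID = 4
NAME = Mark
ADDRESS = Rich-Mond
SALARY = 65000.0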
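Delete Operation
The following Python program shows how to delete a record (here, the one with ID 2) and then
fetch and display the remaining records from the COMPANY table.
#!/usr/bin/env python3
import sqlite3

conn = sqlite3.connect('test.db')
print("Opened database successfully")
conn.execute("DELETE FROM COMPANY WHERE ID = 2")
conn.commit()
for row in conn.execute("SELECT ID, NAME, ADDRESS, SALARY FROM COMPANY"):
    print(f"ID = {row[0]}\nNAME = {row[1]}\nADDRESS = {row[2]}\nSALARY = {row[3]}")
conn.close()
When the above program is executed, it will produce the following result:
Opened database successfully
ID = 3
NAME = Teddy
ADDRESS = Norway
SALARY = 20000.0
ID = 4
NAME = Mark
ADDRESS = Rich-Mond
SALARY = 65000.0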
Installation
How you install Requests depends on your operating system, but the basic command anywhere
is to open a command terminal and run:
pip install requests
Making a Request
The Python requests module has several built-in methods to make HTTP requests to a specified URI
using GET, POST, PUT, PATCH, or HEAD requests. An HTTP request is meant either to
retrieve data from a specified URI or to push data to a server. HTTP works as a request-response
protocol between a client and a server. Here we will be using the GET request.
The GET method is used to retrieve information from the given server using a given URI. The GET
method sends the encoded user information appended to the page request.
Program
import requests

r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
print(r)          # the Response object, for example <Response [200]>
print(r.content)  # the raw bytes of the page
Response object
When one makes a request to a URI, it returns a response. In Python terms, this Response object
is returned by requests.method(), where method is get, post, put, etc. Response is a
powerful object with many functions and attributes that help you inspect and work with the data
that came back. For example, response.status_code returns the HTTP status code of the response,
so one can check whether the request was processed successfully.
Response objects offer many other features, methods, and attributes.
Program:
import requests
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
print(r.url)
print(r.status_code)
Output:
https://www.geeksforgeeks.org/python-programming-language/
200
BeautifulSoup Library
BeautifulSoup is used extract information from the HTML and XML files. It provides a parse
tree and the functions to navigate, search or modify this parse tree.
Beautiful Soup is a Python library used to pull the data out of HTML and XML files for
web scraping purposes. It produces a parse tree from page source code that can be utilized
to drag data hierarchically and more legibly.
It was created by Leonard Richardson, who still contributes to the project, and the
project is also supported by Tidelift (a paid subscription service for open-source maintenance).
Beautiful Soup 3 was officially released in May 2006. The latest version at the time of writing
is 4.9.2, which supports Python 3 as well as Python 2.7.
Beautiful Soup is a Python library designed for quick-turnaround projects like screen-scraping.
Three features make it powerful:
1. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching,
and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It
doesn't take much code to write an application.
2. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to
UTF-8. You don't have to think about encodings unless the document doesn't specify an
encoding and Beautiful Soup can't detect one. Then you just have to specify the original
encoding.
3. Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to
try different parsing strategies or trade speed for flexibility.
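A minimal sketch of these three features, assuming beautifulsoup4 is installed. The Latin-1 encoded page below is a made-up sample passed in as bytes, and it uses Python's built-in "html.parser" (lxml or html5lib would have to be installed separately to be used instead).

```python
from bs4 import BeautifulSoup

# Made-up Latin-1 encoded bytes that declare their encoding in a <meta> tag.
raw = (b'<html><head><meta charset="iso-8859-1"><title>Caf\xe9</title></head>'
       b'<body><p class="note">First</p><p class="note">Second</p></body></html>')

# Feature 3: the parser is pluggable; "html.parser" ships with Python.
soup = BeautifulSoup(raw, "html.parser")

# Feature 2: the input bytes are decoded to Unicode (str) automatically.
print(soup.title.string)  # Café

# Feature 1: navigate, search and modify the parse tree with a few methods.
notes = soup.find_all("p", class_="note")
print([p.get_text() for p in notes])  # ['First', 'Second']
notes[1].string = "Changed"           # modify the tree in place
print(notes[1])                       # <p class="note">Changed</p>

# Output is encoded as UTF-8 by default.
print(soup.title.encode())  # b'<title>Caf\xc3\xa9</title>'
```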
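Installation
Beautiful Soup is published on PyPI as beautifulsoup4, and the requests module used in this unit
is installed the same way:
pip install beautifulsoup4 requests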
Inspecting Website
Before extracting any information from the HTML of the page, we must understand the
structure of the page, so that we can select the desired data from the entire page. We can do
this by right-clicking on the page we want to scrape and selecting Inspect.
Clicking Inspect opens the browser's Developer Tools. Almost all modern browsers come with
developer tools built in, and we will be using Chrome for this tutorial.
The developer tools let us see the site's Document Object Model (DOM). If you don't
know about the DOM, don't worry; just consider the text displayed as the HTML structure of
the page.
After getting the HTML of the page, let's see how to parse this raw HTML code into
useful information. First, we create a BeautifulSoup object, specifying the parser
we want to use.
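Before applying this to the live page, here is a minimal offline sketch of creating a BeautifulSoup object, assuming beautifulsoup4 is installed. The HTML string is a made-up stand-in for a downloaded page, and "html.parser" is the parser built into Python.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML snippet standing in for a downloaded page.
html = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"

# The second argument names the parser; "html.parser" is built into Python.
soup = BeautifulSoup(html, "html.parser")

# prettify() returns the parse tree as an indented string,
# with each tag and piece of text on its own line.
print(soup.prettify())
```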
Program
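import requests
from bs4 import BeautifulSoup

# Download the page and show the Response object
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
print(r)

# Parse the raw HTML with Python's built-in parser and pretty-print it
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())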
Output:
This output is still not very useful to us, so let's look at another example to get a clearer
picture. Let's try to extract the title of the page.
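Because the live page changes over time, the same navigation can be tried on a small made-up HTML string whose output is predictable (assuming beautifulsoup4 is installed). soup.title is the first title tag, .name is the tag's name, .string is its text, and .parent moves one level up the tree.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only to make the output predictable.
html = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.title)              # <title>Sample page</title>
print(soup.title.name)         # title
print(soup.title.string)       # Sample page
print(soup.title.parent.name)  # head
```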
Program
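import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
soup = BeautifulSoup(r.content, 'html.parser')

# The <title> tag, its name, and the name of its parent tag
print(soup.title)
print(soup.title.name)
print(soup.title.parent.name)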
Output:
Finding Elements
Now we would like to extract some useful data from the HTML content. The soup object
contains all the data in a nested structure that can be extracted programmatically. The
website we want to scrape contains a lot of text, so let's scrape all of that content. First, let's
inspect the webpage we want to scrape.
In the above image, we can see that all the content of the page is under the div with class
entry-content. We will use the find() method, which finds the first tag with the given name and
attributes; in our case, the div whose class is entry-content. This gives us all the content of the
article, but you can see that the images and links are also scraped. So our next task is to find
only the text content within the parsed HTML. On inspecting the HTML of our website again,
we can see that the content of the page is under <p> tags. Now we have to find all the p tags
present inside this div. We can use the find_all() method of BeautifulSoup.
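The difference between find() and find_all() can be seen on a small made-up page shaped like the one described above (text in p tags inside a div with class entry-content, mixed with a link and an image). This assumes beautifulsoup4 is installed. Note the keyword is class_ with a trailing underscore, because class is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Made-up markup shaped like the page described above.
html = """
<div class="sidebar"><p>Not wanted</p></div>
<div class="entry-content">
  <p>First paragraph.</p>
  <a href="/more">More</a>
  <img src="/logo.png" alt="Logo">
  <p>Second paragraph.</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the FIRST matching tag (or None if nothing matches).
s = soup.find("div", class_="entry-content")

# find_all() returns a list of every matching tag inside s.
content = s.find_all("p")
print(len(content))                     # 2
print([p.get_text() for p in content])  # ['First paragraph.', 'Second paragraph.']
```

Because the search starts from s rather than soup, the paragraph in the sidebar div is not included.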
Program
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
soup = BeautifulSoup(r.content, 'html.parser')

# Find the first <div> whose class is entry-content, then every <p> inside it
s = soup.find('div', class_='entry-content')
content = s.find_all('p')
print(content)
Output:
Finding Elements by ID
In the above example, we found elements by class name; now let's see how to find
elements by id. For this task, let's scrape the content of the leftbar of the page. The first
step is to inspect the page and see which tag the leftbar falls under.
The above image shows that the leftbar falls under the <div> tag with id main. Now let's
get the HTML content under this tag, and inspect more of the page to get the content of the
leftbar.
We can see that the list in the leftbar is under the <ul> tag with the class leftBarList, and our
task is to find all the li tags under this ul.
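A small offline sketch of the same steps, assuming beautifulsoup4 is installed. The markup is made up to mimic the layout described above (a div with id main containing a ul with class leftBarList). Any keyword argument other than class_ is treated as an attribute filter, so id="main" matches the id attribute.

```python
from bs4 import BeautifulSoup

# Made-up markup mimicking the layout described above.
html = """
<div id="main">
  <ul class="leftBarList">
    <li>Python</li>
    <li>Java</li>
  </ul>
</div>
<ul class="other"><li>Ignored</li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Finding by id: the <div> whose id attribute is "main".
s = soup.find("div", id="main")

# Searching from s (not soup) restricts the search to tags nested inside that <div>.
leftbar = s.find("ul", class_="leftBarList")
items = leftbar.find_all("li")
print([li.get_text() for li in items])  # ['Python', 'Java']
```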
Program
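import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
soup = BeautifulSoup(r.content, 'html.parser')

# Finding by id: the <div> whose id is main
s = soup.find('div', id='main')
# The leftbar list, then all the <li> items under it
leftbar = s.find('ul', class_='leftBarList')
content = leftbar.find_all('li')
print(content)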
Output:
In the above examples, you may have noticed that the tags get scraped along with the data. What
if we want only the text, without any tags? We will use the text property, which returns only
the text inside a tag. We will reuse the above example and remove all the tags from its output.
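A short offline illustration of the text property, assuming beautifulsoup4 is installed and using made-up markup. .text joins the text of the tag and all its nested tags; the related get_text(strip=True) call additionally trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Made-up markup: one paragraph contains a nested <b> tag, the other extra spaces.
html = ('<div class="entry-content"><p>Python is <b>easy</b> to learn.</p>'
        '<p>  It is popular.  </p></div>')
soup = BeautifulSoup(html, "html.parser")

for p in soup.find("div", class_="entry-content").find_all("p"):
    # .text returns all the text inside the tag, nested tags included, without markup.
    print(repr(p.text))
# 'Python is easy to learn.'
# '  It is popular.  '

# get_text(strip=True) also trims whitespace around each piece of text.
print(soup.find_all("p")[1].get_text(strip=True))  # It is popular.
```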
Program
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
soup = BeautifulSoup(r.content, 'html.parser')

s = soup.find('div', class_='entry-content')
lines = s.find_all('p')

# .text returns only the text inside each tag, without the markup
for line in lines:
    print(line.text)
Output:
Extracting Links
Program
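import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
soup = BeautifulSoup(r.content, 'html.parser')
# Print the href attribute of every <a> tag
for link in soup.find_all('a'):
    print(link.get('href'))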
Output:
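On a real page, some a tags have no href attribute, so link.get('href') prints None for them. The offline sketch below (assuming beautifulsoup4 is installed, with made-up markup) shows this and one way to skip such tags by passing href=True to find_all().

```python
from bs4 import BeautifulSoup

# Made-up markup: the middle <a> tag has no href attribute.
html = """
<p>
  <a href="https://www.python.org/">Python</a>
  <a name="top">No link here</a>
  <a href="/docs/">Docs</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# .get() returns None when the attribute is missing (tag["href"] would raise KeyError).
print([a.get("href") for a in soup.find_all("a")])
# ['https://www.python.org/', None, '/docs/']

# href=True keeps only <a> tags that actually have an href attribute.
print([a["href"] for a in soup.find_all("a", href=True)])
# ['https://www.python.org/', '/docs/']
```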
Extracting Image Information
Inspecting the page again, we can see that images are inside img tags and the link to each
image is in the src attribute. See the image below.
Program:
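import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
soup = BeautifulSoup(r.content, 'html.parser')

images_list = []

# Collect the src and alt attributes of every <img> tag
images = soup.select('img')
for image in images:
    src = image.get('src')
    alt = image.get('alt')
    images_list.append({"src": src, "alt": alt})

for image in images_list:
    print(image)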
Output:
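Since the output of the live page depends on its current images, here is the same collection step on a small made-up snippet, assuming beautifulsoup4 is installed (select() relies on the soupsieve package, which is installed along with it). select() takes a CSS selector, and "img" matches every img tag at any depth. An image without an alt attribute gets None.

```python
from bs4 import BeautifulSoup

# Made-up markup: the second image has no alt text.
html = """
<img src="/images/logo.png" alt="Site logo">
<div><img src="/images/chart.png"></div>
"""
soup = BeautifulSoup(html, "html.parser")

images_list = []
# select() takes a CSS selector; "img" matches every <img> tag at any depth.
for image in soup.select("img"):
    images_list.append({"src": image.get("src"), "alt": image.get("alt")})

for image in images_list:
    print(image)
# {'src': '/images/logo.png', 'alt': 'Site logo'}
# {'src': '/images/chart.png', 'alt': None}
```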