HAProxy The Guide To Multi Layer Security
Multi-Layer Security
Defense in Depth Using the Building Blocks of HAProxy
Chad Lavoie
Formatting an ACL
There are two ways of specifying an ACL—a named ACL and
an anonymous or in-line ACL. The first form is a named ACL:
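For example, using the name and condition described below, a named ACL looks like this:
acl is_static path -i -m beg /static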
We begin with the acl keyword, followed by a name, followed
by the condition. Here we have an ACL named is_static. This
ACL name can then be used with if and unless statements
such as use_backend be_static if is_static. This form
is recommended when you are going to use a given condition
for multiple actions.
The condition, path -i -m beg /static, checks to see if
the URL starts with /static. You’ll see how that works along
with other types of conditions later in this chapter.
The second form is an anonymous or in-line ACL:
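Continuing the example, the same rule written as an in-line ACL:
use_backend be_static if { path -i -m beg /static }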
This does the same thing that the above two lines would do,
just in one line. For in-line ACLs, the condition is contained
inside curly braces.
In both cases, you can chain multiple conditions together.
ACLs listed one after another without anything in between
will be considered to be joined with an and. The condition
overall is only true if both ACLs are true. (Note: ↪ means
continue on the same line.)
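For example, a rule along these lines denies requests that match both conditions:
http-request deny if { src 10.0.0.0/16 } { path_beg /api }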
This will prevent any client in the 10.0.0.0/16 subnet from
accessing anything starting with /api, while still being able to
access other paths.
Adding an exclamation mark inverts a condition:
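For example:
http-request deny if !{ src 10.0.0.0/16 } { path_beg /api }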
Now only clients in the 10.0.0.0/16 subnet are allowed to
access paths starting with /api while all others will be
forbidden.
The IP addresses could also be imported from a file:
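For example (the file path is an example; adjust it to your environment):
http-request deny if { src -f /etc/hapee-1.9/blacklist.acl }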
Within blacklist.acl you would then list individual IP addresses,
or ranges in CIDR notation, to block, as follows:
192.168.122.3
192.168.122.0/24
You can also define an ACL where either condition can be
true by using ||:
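For example:
http-request deny if { path_beg /evil } || { path_end /evil }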
With this, each request whose path starts with /evil (e.g.
/evil/foo) or ends with /evil (e.g. /foo/evil) will be denied.
You can also do the same to combine named ACLs:
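A sketch using two named ACLs:
acl starts_evil path_beg /evil
acl ends_evil path_end /evil
http-request deny if starts_evil || ends_evil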
With named ACLs, specifying the same ACL name multiple
times will cause a logical OR of the conditions, so the last
block can also be expressed as:
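For example:
acl evil path_beg /evil
acl evil path_end /evil
http-request deny if evil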
This allows you to combine ANDs and ORs (as well as named
and in-line ACLs) to build more complicated conditions, for
example:
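Reusing the evil ACL from above:
http-request deny if evil !{ src 10.0.0.0/16 }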
This will block the request if the path starts or ends with /evil,
but only for clients that are not in the 10.0.0.0/16 subnet.
Now that you understand the basic way to format an ACL,
you might want to learn what sources of information you can
base decisions on. A source of information in HAProxy
is known as a fetch. These allow ACLs to get a piece of
information to work with.
You can see the full list of fetches available in the
documentation. The documentation is quite extensive and
that is one of the benefits of having HAProxy Enterprise
Support: it saves you the time of reading through
hundreds of pages of documentation.
Here are some of the more commonly used fetches:
src: Returns the client IP address that made the request
Converters
Once you have a piece of information via a fetch, you might
want to transform it. Converters are separated from fetches
(or from other converters, if you use more than one) by
commas and can be chained together multiple times.
Some converters (such as lower and upper) are specified by
themselves while others have arguments passed to them. If
an argument is required it is specified in parentheses. For
example, to get the value of the path with /static removed
from the start of it, you can use the regsub converter with a
regex and replacement as arguments:
path,regsub(^/static,/)
As with fetches, there are a wide variety of converters, but
below are some of the more popular ones:
lower: Changes the case of a sample to lowercase
Flags
You can put multiple flags in a single ACL, for example:
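For instance (the file name here is just an example):
acl is_static path -i -m beg -f /etc/hapee-1.9/static_paths.acl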
This will perform a case-insensitive match based on the
beginning of the path, matching against patterns stored in a file.
Matching Methods
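For example, a rule that redirects clients to the www version of your site might look like this (a sketch; the exact redirect target is an assumption):
http-request redirect prefix http://www.%[hdr(host)]
↪ unless { hdr_beg(host) -i www }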
In this case, our ACL, hdr_beg(host) -i www, ensures that
the client is redirected unless their Host HTTP header already
begins with www.
The command http-request redirect scheme changes
the scheme of the request while leaving the rest alone. This
allows for trivial HTTP-to-HTTPS redirect lines:
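For example:
http-request redirect scheme https if !{ ssl_fc }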
Here, our ACL !{ ssl_fc } checks whether the request did
not come in over HTTPS.
The command http-request redirect prefix allows you
to specify a prefix to redirect the request to. For example, the
following line causes all requests that don’t have a URL path
beginning with /foo to be redirected to /foo/{original URI
here}:
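http-request redirect prefix /foo if !{ path_beg /foo }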
For each of these a code argument can be added to specify a
response code. If not specified it defaults to 302. Supported
response codes are 301, 302, 303, 307, and 308. For
example:
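http-request redirect scheme https code 301 if !{ ssl_fc }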
This will redirect HTTP requests to HTTPS and tell clients
that they shouldn’t keep trying HTTP. Or for a more secure
version of this, you could inject the Strict-Transport-Security
header via http-response set-header.
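A sketch of that approach (the max-age value is only an example):
http-response set-header Strict-Transport-Security max-age=31536000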
Selecting a Backend
In HTTP Mode
The use_backend line allows you to specify conditions for
using another backend. For example, to send traffic
requesting the HAProxy Stats webpage to a dedicated
backend, you can combine use_backend with an ACL that
checks whether the URL path begins with /stats:
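For example (the backend name is an assumption):
use_backend be_stats if { path_beg /stats }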
use_backend
↪ be_%[path,map_beg(/etc/hapee-1.9/paths.map)]
If the file paths.map contains /api api as a key-value pair,
then traffic will be sent to be_api, combining the prefix be_
with the string api. If none of the map entries match and
you’ve specified the optional second parameter to the map
function, which is the default argument, then that default will
be used.
use_backend
↪ be_%[path,map_beg(/etc/hapee-1.9/paths.map,
↪ mydefault)]
In this case, if there isn’t a match in the map file, then the
backend be_mydefault will be used. Otherwise, without a
default, traffic will automatically fall through this rule in
search of another use_backend rule that matches or the
default_backend line.
In TCP Mode
We can also make routing decisions for TCP mode traffic, for
example directing traffic to a special backend if the traffic is
SSL:
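A sketch of such a rule, assuming a backend named be_ssl:
tcp-request inspect-delay 10s
use_backend be_ssl if { req.ssl_hello_type 1 }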
Note that for TCP-level routing decisions that require data
from the client, such as inspecting the request, the
inspect-delay statement is required so that HAProxy doesn't
move past this phase before it has received any data from the
client. It won't wait the full 10 seconds unless the client stays
silent for 10 seconds; it will move ahead as soon as it can
decide whether the buffer contains an SSL hello message.
Setting an HTTP
Header
There are a variety of options for adding an HTTP header to
the request (transparently to the client). Combining these
with an ACL lets you only set the header if a given condition
is true.
add-header: Adds a new header. If a header of the same
name was sent by the client, this will ignore it,
adding a second header with the same name.
There is also set-query, which changes the query string
instead of the path, and set-uri, which sets the path and
query string together.
http-request set-var(txn.session_id)
↪ cook(sessionid)
use_backend
↪ be_%[var(txn.session_id),
↪ map(/etc/hapee-1.9/sessionid.map)]
↪ if { var(txn.session_id),
↪ map(/etc/hapee-1.9/sessionid.map) -m found }
http-response
↪ set-map(/etc/hapee-1.9/sessionid.map)
↪ %[var(txn.session_id)]
↪ %[res.hdr(x-new-backend)]
↪ if { res.hdr(x-new-backend) -m found }
default_backend be_login
Now if a backend sets the x-new-backend header in a
response, HAProxy will send subsequent requests with the
client's sessionid cookie to the specified backend. Variables
are used because, otherwise, the request's cookies would be
inaccessible to HAProxy during the response phase; it's a
solution you may want to keep in mind for other, similar
problems that HAProxy will warn about during startup.
There is also the related del-map to delete a map entry based
on an ACL condition.
Caching
New to HAProxy 1.8 is small object caching, allowing the
caching of resources based on ACLs. This, along with
http-response cache-store, allows you to store select
requests in HAProxy’s cache system. For example, given that
we’ve defined a cache named icons, the following will store
responses from paths beginning with /icons and reuse them
in future requests:
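A sketch of those rules, using the icons cache name from above:
http-request cache-use icons if { path_beg /icons }
http-response cache-store icons

The next example turns to blocking requests based on the User-Agent header. The rule that the following paragraph describes would look something like this:
http-request deny if { req.hdr(user-agent) -m sub evil }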
This line will deny the request if the user-agent request
header contains the string evil anywhere in it; the -m sub flag
makes it a substring match. Remove the -m sub, leaving
req.hdr(user-agent) evil as the condition, and it becomes
an exact match instead of a substring match.
Attackers can vary their attacks, so you can also rely on the
fact that legitimate user agents tend to be longer and enforce
a minimum length:
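A sketch of such a rule, using the 32-character threshold mentioned below:
http-request deny if { req.hdr(user-agent) -m len lt 32 }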
This will then block any requests which have a user-agent
header shorter than 32 characters.
Path
If an attacker is abusing a specific URL that legitimate clients
don't use, you can block based on path:
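For example (the path here is hypothetical):
http-request deny if { path /some-abused-url }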
Or you can prevent an attacker from accessing hidden files or
folders:
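One way to do that is to deny any path containing /. (a sketch):
http-request deny if { path -m sub /. }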
dynamic-update
update id /etc/hapee-1.9/whitelist.acl
↪ url http://192.168.122.1/whitelist.acl
↪ delay 60s
HAPEE will now update the ACL contents every 60 seconds
by requesting the specified URL. Support also exists for
retrieving the URL via HTTPS and using client certificate
authentication.
Conclusion
That's all, folks! We have provided you with some examples
to show the power of the HAProxy ACL system. The list above
isn't exhaustive, but it should give you the building blocks
needed to quickly and easily solve a vast array of problems
you may encounter. Use your
imagination and experiment with ACLs.
A stick table collects and stores data about requests that are
flowing through your HAProxy load balancer. Think of it like a
machine that color codes cars as they enter a race track. The
first step then is setting up the amount of storage a stick
table should be allowed to use, how long data should be kept,
and what data you want to observe. This is done via the
stick-table directive in a frontend or backend.
Here is a simple stick table definition:
backend webfarm
stick-table type ip size 1m expire 10s
↪ store http_req_rate(10s)
In this line we specify a few arguments: type, size, expire,
and store. The type, which is ip in this case, decides the data
type of the key the table will hold; here it is client IP addresses.
Did you know? If just storing rates, then the expire argument
should match the longest rate period; that way the counters
will be reset to 0 at the same time that the period ends.
Each frontend or backend section can only have one
stick-table defined in it. The downside is apparent when you
want to share that storage with other frontends and
backends. The good news is that you can define a frontend or
backend whose sole purpose is holding a stick table. Then
you can use that stick table elsewhere using the table
parameter. Here’s an example (we’ll explain the
http-request track-sc0 line in the next section):
backend st_src_global
stick-table type ip size 1m expire 10s
↪ store http_req_rate(10s)
frontend fe_main
bind *:80
http-request track-sc0 src table st_src_global
Two other stick table arguments that you'll want to know
about are nopurge and peers. The former tells HAProxy not to
remove entries if the table is full; the latter names a peers
section so that entries can be synchronized to other HAProxy
instances.
Tracking Data
backend st_src_login
stick-table type ip size 1m expire 10m
↪ store http_req_rate(10m)
backend st_src_api
stick-table type ip size 1m expire 10m
↪ store http_req_rate(10m)
frontend fe_main
bind *:80
http-request track-sc0 src table st_src_global
http-request track-sc1 src table st_src_login
↪ if { path_beg /login }
http-request track-sc1 src table st_src_api
↪ if { path_beg /api }
In this example, the line http-request track-sc0 doesn't
have an if statement to filter out any paths, so sc0 is tracking
all traffic. Querying the st_src_global stick table with the
Runtime API will show the HTTP request rate per client IP.
Easy enough.
Sticky counter 1, sc1, is being used twice: once to track
requests beginning with /login and again to track requests
beginning with /api. This is okay because no request passing
through this block is going to start with both /login and /api,
so one sticky counter can be used for both tables.
You can see three total requests in the st_src_global table,
two requests in the st_src_api table, and one in the
st_src_login table. Even though the last two used the same
sticky counter, the data was segregated. If I had made a
mistake and tracked both st_src_global and st_src_login
using sc0, then I’d find that the st_src_login table was empty
because when HAProxy went to track it, sc0 was already
used for this connection.
In addition, this data can be viewed using HAProxy
Enterprise’s Real-Time Dashboard.
Types of Keys
A stick table tracks counters for a particular key, such as a
client IP address. The key must be in an expected type, which
is set with the type argument. Each type is useful for
different things, so let’s take a look at them:
Type Size (b) Description
Types of Values
After the store keyword comes a comma-delimited list of the
values that should be associated with a given key. While
some types can be set using ACLs or via the Runtime API,
most are calculated automatically by built-in fetches in
HAProxy.
http_req_rate
This is likely the most frequently stored/used value in stick
tables. As its name may imply, it stores the number of HTTP
requests, regardless of whether they were accepted or not,
that the tracked key (e.g. source IP address) has made over
the specified time period. Using this can be as simple as the
following:
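A minimal sketch of those two lines inside a frontend:
stick-table type ip size 1m expire 10s store http_req_rate(10s)
http-request track-sc0 src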
The first line defines a stick table for tracking IP addresses
and their HTTP request rates over the last ten seconds. This
is done by storing the http_req_rate value, which accepts
the period as a parameter. Note that we've set the expire
parameter to match the period of 10 seconds.
The second line is what inserts or updates a key in the table
and updates its counters. Using the sticky counter sc0, it sets
the key to the source IP using the src fetch method. You
might wonder when to use tcp-request content track-sc0
instead; that form is needed for TCP-mode traffic, or whenever
you want to start tracking before the HTTP request has been
parsed.
One way to use this is to detect when a client has opened too
many connections so you can deny any more connections
from them. In this case, the connection will be rejected and
closed if the source IP currently has more than 10
connections open.
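A sketch of that, assuming a table named per_ip_connections that stores conn_cur:
tcp-request connection track-sc0 src table per_ip_connections
tcp-request connection reject if { sc_conn_cur(0) gt 10 }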
These counters are primarily used to protect against attacks
that involve a lot of new connections that originate from the
same IP address. In the next section, you’ll see HTTP
counters, which are more effective at protecting against
HTTP request floods. The HTTP counters track requests
independently of whether HTTP keep-alive or multiplexing
are used.
However, in the case of floods of new connections, these
counters are the best way to stop them.
http_err_rate
This tracks the rate of HTTP requests that end in an error
code (4xx) response. This has a few useful applications:
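For example, you can track error rates per URL path; a sketch, with the table name and sizes chosen arbitrarily:
backend st_err_by_path
stick-table type string len 128 size 1m expire 10m
↪ store http_err_rate(10m)
frontend fe_main
bind *:80
http-request track-sc0 path table st_err_by_path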
This will make a table that can be retrieved with the Runtime
API, showing the error rate of various paths.
● You can detect login brute force attacks or scanners.
If your login page produces an HTTP error code when
a login fails, then this can be used to detect brute
force attacks. For this you would track on src rather
than on path as in the previous example.
bytes_out_rate
The bytes_out_rate counter measures the rate of traffic
being sent from your server for a given key, such as a path. It
can help you spot which resources, or which clients, are
consuming the most bandwidth.
gpc0 / gpc1
The general purpose counters (gpc0 and gpc1) are
special—along with gpt0 (general purpose tag)—for
defaulting to 0 when created and for not automatically
updating. ACLs can be used to increment this counter via the
sc_inc_gpc0 fetch method so that you can track custom
statistics with it.
If you track gpc0_rate, it will automatically give you a view of
how quickly gpc0 is being incremented. This can tell you how
frequently this event is happening.
Now that you’ve seen how to create stick table storage and
track data with it, you’ll want to be able to configure HAProxy
to take action based on that captured information. Going back
to a common use case for stick tables, let’s see how to use
the data to persist a client to a particular server. This is done
with the stick on directive and is usually found in a backend
section, looking like the following:
backend mysql
mode tcp
stick-table type integer size 1 expire 1d
stick on int(1)
on-marked-down shutdown-sessions
server primary 192.168.122.60:3306 check
server backup 192.168.122.61:3306 check backup
With this configuration, we store only a single entry in the
stick table, where the key is 1 and the value is the server_id of
the active server. Now if the primary server goes down, the
backup server's server_id will overwrite the value in the stick
table, so traffic will continue to go to the backup even after
the primary comes back up.
backend st_ssl_stats
stick-table type string len 32 size 200
↪ expire 24d store http_req_rate(24d)
frontend fe_main
tcp-request inspect-delay 10s
tcp-request content track-sc0
↪ ssl_fc_protocol table st_ssl_stats
Now you can query the server and see which TLS protocols
have been used:
Or you could turn it around and track clients who have used
TLSv1.1 by IP address:
backend st_ssl_stats
stick-table type ip size 200 expire 1h
↪ store http_req_rate(1d)
frontend fe_main
tcp-request inspect-delay 10s
tcp-request content track-sc0 src
↪ table st_ssl_stats if
↪ { ssl_fc_protocol TLSv1.1 }
Now your stick table is a list of IPs that have used TLSv1.1.
To learn more about the Runtime API, take a look at our blog
post Dynamic Configuration with the HAProxy Runtime API
(bit.ly/2SpOPDX).
If you aren’t tracking the key that you want to look up, you
can use the table_http_req_rate and similar fetches to
retrieve a value without updating it. Using track-sc* will
update http_req_rate and similar counters, while looking up
a value like this will not. These work like converters: they
take the key as input, the table name as an argument,
and output the value. For example, we could do:
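For example (the threshold is arbitrary):
http-request deny if { src,table_http_req_rate(st_src_global) gt 100 }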
Other Considerations
inspect-delay
Let’s talk about a line that is sometimes needed and ends up
causing confusion:
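That line, using the 10-second delay referenced below, is:
tcp-request inspect-delay 10s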
You only need to use this in a frontend or backend when you
have an ACL on a statement that would be processed in an
earlier phase than HAProxy would normally have the
information. For example, tcp-request content reject if
{ path_beg /foo } needs a tcp-request inspect-delay
because HAProxy won't wait in the TCP phase for the HTTP
URL path data. In contrast, http-request deny if
{ path_beg /foo } doesn't need a tcp-request
inspect-delay line because HAProxy won't process
http-request rules until it has an HTTP request.
When tcp-request inspect-delay is present, it will hold
the request until the rules in that block have the data they
need to make a decision or until the specified delay is
reached, whichever comes first.
nbproc
If you are using the nbproc directive in the global section of
your configuration, then each HAProxy process has its own
set of stick tables. The net effect is that you’re not sharing
stick table information among those processes. Also note that
the peers protocol, discussed next, can’t sync between
processes on the same machine.
There are two ways to solve this. The first is to use the newer
nbthread directive instead. This is a feature introduced in
HAProxy Enterprise 1.8r1 and HAProxy 1.8 that enables
multithreading instead of multiple processes and shares
memory, thus sharing stick tables between threads running in
the same process.
listen fe_main
bind *:443 ssl crt /path/to/cert.pem
bind *:80
server local
↪ unix:/var/run/hapee-1.9/ssl_handoff.sock
↪ send-proxy-v2
frontend fe_secondary
bind unix:/var/run/hapee-1.9/ssl_handoff.sock
↪ accept-proxy process 1
# Stick tables, use_backend, default_backend...
The first proxy terminates TLS and passes traffic to a single
server listed as server local
unix:/var/run/hapee-1.9/ssl_handoff.sock
send-proxy-v2. Then you add another frontend with bind
unix:/var/run/hapee-1.9/ssl_handoff.sock
accept-proxy process 1 in it. Inside this frontend you can
have all of your stick table and statistics collection without
issue. Since TLS termination usually takes most of the CPU
time, it’s highly unusual to need more than one process for
the backend work.
peers mypeers
peer centos7vert 192.168.122.64:10000
peer shorepoint 192.168.122.1:10000
Then change your stick table definition to include a peers
argument:
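For example, reusing the mypeers section defined above:
stick-table type ip size 1m expire 10s
↪ store http_req_rate(10s) peers mypeers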
At least one of the peers needs to have a name that matches
the server’s host name or you must include a setenv
hostname line in the global section of the configuration to
inform HAProxy what it should see the host name as.
Now the two servers will exchange stick table entries; but
there is a downside: they won't sum their individual counters.
Instead, an update from one peer overwrites the other's value
for that key.
Conclusion
In this chapter, you learned about stick tables, HAProxy's
in-memory storage mechanism that allows you to track client
activities across requests, enable server persistence, and
collect real-time metrics.
A few things to note about the structure of this file:
● It’s plain text
● A key begins each line (e.g. static.example.com)
● A value comes after a key, separated by at least one
space (e.g. be_static)
● Empty lines and extra whitespace between words are
ignored
● Comments must begin with a hash sign and must be
on their own line
Did you know? Map files are loaded into an Elastic Binary
Tree format so you can look up a value from a map file
containing millions of items without a noticeable performance
impact.
Map Converters
To give you an idea of what you can do with map files, let’s
look at using one to find the correct backend pool of servers
where users should be sent. You will use the hosts.map file
that you created previously to look up which backend should
be used based on a given domain name. Begin by editing
your haproxy.cfg file. As you will see, you will add a map
converter that reads the map file and returns a backend
name.
frontend fe_main
bind :80
use_backend %[str(example.com),
↪ map(/etc/hapee-1.9/maps/hosts.map)]
The first row in hosts.map that has example.com as a key will
have its value returned. Notice how the input,
str(example.com), begins the expression and is separated
from the converter with a comma.
When this expression is evaluated at runtime, it will be
converted to the line use_backend be_static, which directs
requests to the be_static pool of servers. Of course, rather
than passing in a hardcoded string like example.com, you can
send in the value of an HTTP header or a URL parameter. The
next example uses the value of the Host header as the input.
use_backend
%[req.hdr(host),lower,
↪ map(/etc/hapee-1.9/maps/hosts.map,
↪ be_static)]
map_sub: Looks for entries in the map file that make up
a substring of the sample (e.g. an input of
“abcd” would match “ab” or “c” in the file).
Unlike the other match modes, this doesn't
perform ebtree lookups and instead checks
each line.
Much of the value of map files comes from your ability to
modify them dynamically. This allows you to, for example,
change the flow of traffic from one backend to another, such
as for maintenance.
There are four ways to change the value that we get back
from a map file. First, you can change the values by editing
the file directly. This is a simple way to accomplish the task,
but does require a reload of HAProxy. This is a good choice if
your changes are infrequent.
Did you know? The lb-update module can also be used to
synchronize TLS ticket keys.
dynamic-update
update id /etc/hapee-1.9/maps/sample.map
↪ url http://10.0.0.1/sample.map delay 300s
See the HAProxy Enterprise documentation for detailed
usage instructions or contact us to learn more.
The first column is the location of the entry and is typically
ignored. The second column is the key to be matched and the
third is the value. We can easily add and remove entries via
the Runtime API. To remove an entry from the map file, use
del map. Note that this only removes it from memory and not
from the actual file.
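For example (the Runtime API socket path here is an assumption; use your own):
echo "del map /etc/hapee-1.9/maps/hosts.map static.example.com" | socat stdio /var/run/hapee-1.9/hapee-lb.sock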
You can also delete all entries with clear map:
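Using the same socket as before:
echo "clear map /etc/hapee-1.9/maps/hosts.map" | socat stdio /var/run/hapee-1.9/hapee-lb.sock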
Add a new key and value with add map:
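For example (the key and value here are hypothetical):
echo "add map /etc/hapee-1.9/maps/hosts.map api.example.com be_api" | socat stdio /var/run/hapee-1.9/hapee-lb.sock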
Change an existing entry with set map:
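For example (the new value is hypothetical):
echo "set map /etc/hapee-1.9/maps/hosts.map example.com be_new" | socat stdio /var/run/hapee-1.9/hapee-lb.sock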
Using show map, we can get the contents of the file, filter it to
only the second and third columns with awk, and then save
the in-memory representation back to disk:
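A sketch of that, again assuming the same socket path:
echo "show map /etc/hapee-1.9/maps/hosts.map" | socat stdio /var/run/hapee-1.9/hapee-lb.sock | awk '{print $2" "$3}' > /etc/hapee-1.9/maps/hosts.map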
frontend fe_main
bind :80
acl in_network src 192.168.122.0/24
acl is_map_add path_beg /map/add
http-request
↪ set-map(/etc/hapee-1.9/maps/hosts.map)
↪ %[url_param(domain)] %[url_param(backend)]
↪ if is_map_add in_network
This will allow you to make web requests such as
http://192.168.122.64/map/add?domain=example.com&backend=be_static
for a quick and easy way to update your maps. If the entry
already exists, it will be updated. Notice
that you can use http-request deny deny_status 200 to
prevent the request from going to your backend servers.
Using the show map technique you saw earlier, you might
schedule a cron job to save your map files every few minutes.
However, if you need to replicate these changes across
multiple instances of HAProxy, using one of the other
approaches will be a better bet.
A Blue-Green Deployment
Suppose you wanted to implement a blue-green deployment
wherein you're able to deploy a new release of your web
application onto a set of staging servers and then swap them
into production all at once. One way to do this is with a map
file, bluegreen.map, that records which backend is currently
active:
active be_blue
In this scenario, the be_blue backend contains your set of
currently active, production servers. Here is your HAProxy
configuration file:
frontend fe_main
bind :80
use_backend %[str(active),
↪ map(/etc/hapee-1.9/maps/bluegreen.map)]
backend be_blue
server server1 10.0.0.3:80 check
server server2 10.0.0.4:80 check
backend be_green
server server1 10.0.0.5:80 check
server server2 10.0.0.6:80 check
After you deploy a new version of your application to the
be_green servers and test it, you can use the Runtime API to
swap the active be_blue servers with the be_green servers,
causing your be_green servers to become active in
production.
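A sketch of that Runtime API call (socket path assumed):
echo "set map /etc/hapee-1.9/maps/bluegreen.map active be_green" | socat stdio /var/run/hapee-1.9/hapee-lb.sock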
Now your traffic will be directed away from your be_blue
servers and to your be_green servers. This, unlike a rolling
deployment, ensures that all of your users are migrated to the
new version of your application at the same time.
/api/routeA 40
/api/routeB 20
Consider the following frontend, wherein the current
request rate for each client is measured over 10 seconds. A
URL path like /api/routeA/some_function would allow up to
four requests per second (40 requests / 10 seconds = 4 rps).
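A sketch of such a frontend follows. The map file name, the frontend and backend names, and the default limit of 20 are assumptions, and the arithmetic relies on the sub converter accepting a variable name:
frontend api_gateway
bind :80
stick-table type binary len 8 size 1m expire 10s
↪ store http_req_rate(10s)
http-request track-sc0 base32+src
http-request set-var(req.rate_limit)
↪ path,map_beg(/etc/hapee-1.9/maps/rates.map,20)
http-request set-var(req.request_rate)
↪ base32+src,table_http_req_rate(api_gateway)
acl rate_abuse var(req.rate_limit),sub(req.request_rate) lt 0
http-request deny deny_status 429 if rate_abuse
default_backend be_api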
Here, the stick-table definition records client request rates
over ten seconds. Note that we are tracking clients using the
base32+src fetch method, which is a combination of the
Host header, URL path, and source IP address. This allows us
to apply a different rate limit to each path that a client
requests.
Conclusion
Now that you’ve seen a few of the possibilities, consider
reaching for your trusty tool, maps, the next time you run into
a problem where it can help.
HTTP Flood
The danger of HTTP flood attacks is that they can be carried
out by just about anyone. They don’t require a large botnet
and tools for orchestrating the attack are plentiful. This
accessibility makes it especially important that you have
defenses in place to repel these assaults.
These attacks can come in a few different forms, but the most
commonly seen pattern consists of attackers requesting one
or more of your website’s URLs with the highest frequency
they are able to achieve. A shotgun approach is to
request random URLs, whereas more sophisticated attackers
will profile your site first, looking for slow and uncached
resources to target.
backend per_ip_rates
stick-table type ip size 1m expire 10m
↪ store http_req_rate(10s)
This sets up the storage that will keep track of your clients by
their IP addresses. It initializes a counter that tracks each
user’s request rate. Begin tracking a client by adding an
http-request track-sc0 directive to a frontend section,
as shown:
frontend fe_mywebsite
bind *:80
http-request track-sc0 src table per_ip_rates
With this configuration in place, all clients visiting your
website through HAProxy via the fe_mywebsite frontend will
be tracked in the per_ip_rates stick table.
frontend fe_mywebsite
bind *:80
http-request track-sc0 src table per_ip_rates
http-request deny deny_status 429
↪ if { sc_http_req_rate(0) gt 100 }
This rule instructs HAProxy to deny all requests coming from
IP addresses whose stick table counters are showing a
request rate of over 10 per second. When any IP address
exceeds that limit, it will receive an HTTP 429 Too Many
Requests response and the request won’t be passed to any
HAProxy backend server.
These requests will be easy to spot in the HAProxy access
log, as they will have a termination state of PR--, which means
that the proxy blocked the request.
If you’d like to define rate limit thresholds on a per URI basis,
you can do so by adding a map file that pairs each rate limit
with a URL path. See the previous chapter, Introduction to
HAProxy Maps for an example.
Maybe you’d like to rate limit POST requests only? It’s simple
to do by adding a statement that checks the built-in ACL,
METH_POST.
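For example:
http-request deny deny_status 429
↪ if { sc_http_req_rate(0) gt 100 } METH_POST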
You can also tarpit abusers so that their requests are rejected
with an HTTP 500 status code with a configurable delay. The
duration of the delay is set with the timeout tarpit
directive. Here, you’re delaying any response for five seconds:
timeout tarpit 5s
http-request tarpit if { sc_http_req_rate(0)
↪ gt 100 }
Slowloris Attacks
Before getting into our second point about DDoS detection,
identifying odd patterns among users, let’s take a quick look
at another type of application-layer attack: Slowloris.
Slowloris involves an attacker making requests very slowly to
tie up your connection slots. Contrary to other types of DDoS,
the volume of requests needed to make this attack successful
is fairly low. However, as each request only sends one byte
every few seconds, they can tie up many request slots for
several minutes.
An HAProxy load balancer can hold a greater number of
connections open without slowing down than most web
servers. As such, the first step towards defending against
Slowloris attacks is setting maxconn values. First, set a
maxconn in the global section that leaves enough headroom
so that your server won’t run out of memory even if all the
connections are filled. Then inside the frontend or a
defaults section, set a maxconn value slightly under that so
that if an attack saturates one frontend, the others can still
operate.
Next, add two lines to your defaults section:
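Those two lines look like this (the five-second value matches the description below):
timeout http-request 5s
option http-buffer-request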
The first line causes HAProxy to respond to any clients that
spend more than five seconds from the first byte of the
request to the last with an HTTP 408 Request Timeout error.
Normally, this only applies to the HTTP request and its
headers and doesn’t include the body of the request.
However, with option http-buffer-request, HAProxy will
store the request body in a buffer and apply the
http-request timeout to it.
You can also reject requests that have non-browser
User-Agent headers, such as curl.
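For example:
http-request deny if { req.hdr(user-agent) -i -m sub curl }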
This line will deny the request if the User-Agent request
header contains the string curl anywhere in it; the -m sub flag
makes it a substring match and the -i makes it
case-insensitive. You might also check
for other strings such as phantomjs and slimerjs, which are
two scriptable, headless browsers that could be used to
automate an attack.
If you have many strings that you’re checking, consider
saving them to a file—one string per line—and referencing it
like this:
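A sketch, with a hypothetical file name:
http-request deny
↪ if { req.hdr(user-agent) -i -m sub -f /etc/hapee-1.9/bad_agents.acl }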
At other times, an attacker who is using an automated tool
will send requests that don’t contain a User-Agent header at
all. These can be denied too, as in the following example:
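http-request deny unless { req.hdr(user-agent) -m found }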
Even more common is for attackers to randomize the
User-Agent strings that they send in order to evade detection
for longer. Oftentimes, these come from a list of genuine
values that a true browser would use and make it harder to
identify malicious users.
This is where the HAProxy Enterprise Fingerprint Module
comes in handy. It uniquely identifies clients across requests,
even when they change their User-Agent string. It works by
triangulating many data points about a client to form a
signature specific to them. Using this information, you can
then ID and dynamically block the abusers.
1.0.1.0/24
1.0.2.0/23
1.0.8.0/21
1.0.32.0/19
1.1.0.0/24
1.1.2.0/23
1.1.4.0/22
# etc.
To streamline this, you can use a GeoIP database like
MaxMind or Digital Element. Read our blog post, Using GeoIP
Database within HAProxy (http://bit.ly/2D5oqBU) to see
how to set this up. Alternatively, these lookups can happen
directly from within HAProxy Enterprise using a native
module that allows for live updates of the data and doesn’t
require extra scripts to translate to map files. The native
modules also result in less memory consumption in cases
where lookups need to be granular, for example, on a city
basis.
If you don’t like the idea of banning entire ranges of IP
addresses, you might take a more lenient approach and only
greylist them. Greylisting allows those clients to access your
website, but enforces stricter rate limits for them. The
following example sets a stricter rate limit for clients that
have IP addresses listed in greylist.acl:
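A sketch of such a rule; the threshold is an example, and it assumes clients are already being tracked in a table that stores http_req_rate:
http-request deny deny_status 429
↪ if { src -f /etc/hapee-1.9/greylist.acl }
↪ { sc_http_req_rate(0) gt 10 }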
dynamic-update
update id /etc/hapee-1.9/blacklist.acl
↪ url https://192.168.122.1/blacklist.acl
↪ delay 60s
Protecting TCP
(non-HTTP) Services
So far, we’ve primarily covered protecting web servers.
However, HAProxy can also help in protecting other
TCP-based services such as SSH, SMTP, and FTP. The first
step is to set up a stick-table that tracks conn_cur and
conn_rate:
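A sketch of that table (the sizes and rate period are assumptions):
backend per_ip_connections
stick-table type ip size 1m expire 1m
↪ store conn_cur,conn_rate(1m)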
frontend fe_smtp
mode tcp
bind :25
option tcplog
timeout client 1m
tcp-request content track-sc0 src
↪ table per_ip_connections
tcp-request content reject
↪ if { sc_conn_cur(0) gt 1 } ||
↪ { sc_conn_rate(0) gt 5 }
default_backend be_smtp
With the usual backend:
backend be_smtp
mode tcp
timeout server 1m
option tcp-check
server smtp1 162.216.18.221:25 maxconn 50 check
Now, each client can establish one SMTP connection at a
time. If they try to open a second one while the first is still
open, the connection will be immediately closed again.
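You can go further and reject clients that open connections too quickly or that send data before the server's banner; a sketch of such rules, with the thresholds described below, might look like this:
tcp-request inspect-delay 10s
tcp-request content accept if { sc_conn_rate(0) lt 2 }
tcp-request content reject if { req_len gt 0 }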
This will immediately connect any client that has made only
one connection within the last minute. A threshold of less
than two is used so that we’re able to accept one connection,
but it also makes it easy to scale that threshold up. Other
connections from this client will be held in limbo for 10
seconds, unless the client sends data down that second pipe,
which we check with req_len. In that case, HAProxy will
close the connection immediately without bothering the
backend.
This type of trick is useful against spam bots or SSH
bruteforce bots, which will often launch right into their attack
without waiting for the banner. With this, if they do launch
right in, they get denied, and if they don’t, they had to hold
the connection in memory for an additional 10 seconds. If
they open more connections to get around that rate limit, the
conn_cur limits from the previous section will stop them.
use_backend be_website_bots
↪ if { sc_http_req_rate(0) gt 100 }
This will typically go after the http-request deny rules,
which would have a higher threshold like 200, so that an
overly abusive bot will still get direct error responses, while
ones with a lower request rate can get the be_website_bots
backend instead. If returning errors even at the higher rates
concerns you, you can add { be_conn(be_website) gt
backend per_ip_and_url_rates
stick-table type binary len 8 size 1m expire 24h
↪ store http_req_rate(24h)
backend per_ip_rates
stick-table type ip size 1m expire 24h
↪ store gpc0,gpc0_rate(30s)
The first table, which is defined within your
per_ip_and_url_rates backend, will track the number of times
that a client has requested the current webpage during the
last 24 hours. Clients are tracked by a unique key. In this case,
the key is a combination of the client’s IP address and a hash
of the path they’re requesting. Notice how the stick table’s
type is binary so that the key can be this combination of data.
The second table, which is within a backend labelled
per_ip_rates, stores a general-purpose counter called gpc0.
You can increment a general-purpose counter when a
custom-defined event occurs. We’re going to increment it
whenever a client visits a page for the first time within the
past 24 hours.
The gpc0_rate counter is going to tell us how fast the client
is visiting new pages. The idea is that bots will visit more
pages in less time than a normal user would. We’ve arbitrarily
set the rate period to thirty seconds. Most of the time, bots
are going to be fast. For example, the popular Scrapy bot is
able to crawl pages far faster than any human visitor could.
frontend fe_main
bind :80
default_backend web_servers
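The tracking and blocking rules that belong in this frontend aren't shown above; a sketch that is consistent with the rules discussed below (the 15-page threshold comes from the next paragraph) might be:
http-request track-sc0 src table per_ip_rates
http-request track-sc1 base32+src table per_ip_and_url_rates
acl exceeds_limit sc_gpc0_rate(0) gt 15
http-request sc-inc-gpc0(0)
↪ if { sc_http_req_rate(1) eq 1 } !exceeds_limit
http-request deny if exceeds_limit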
With this, any user who requests more than 15 unique pages
within the last thirty seconds will get a 403 Forbidden
response. Optionally, you can use deny_status to pass an
alternate code such as 429 Too Many Requests. Note that the
user will only be banned for the duration of the rate period, or
thirty seconds in this case, after which it will reset to zero.
That's because we've added !exceeds_limit to the end of the
http-request sc-inc-gpc0(0) line so that if the user is
already over the limit, the counter stops incrementing and the
block can expire.
backend per_ip_rates
stick-table type ip size 1m expire 24h
↪ store gpc0,gpc0_rate(30s),gpt0
Then, add http-request sc-set-gpt0(0) to your
frontend to set the tag to 1, using the same condition as
before. We’ll also add a line that denies all clients that have
this flag set.
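A sketch of those lines, reusing the exceeds_limit ACL:
http-request sc-set-gpt0(0) 1 if exceeds_limit
http-request deny if { sc_get_gpt0(0) eq 1 }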
Alternatively, you can send any tagged IP addresses to a
special backend by using the use_backend directive, as
shown:
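For example (the backend name is hypothetical):
use_backend be_bot_management if { sc_get_gpt0(0) eq 1 }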
0x10afd7c:
key=3CBC49B17F000001000000000000000000000000
use=0 exp=596584 http_req_rate(86400000)=5
I've made one request to /foo and five requests to /bar; all
from a source IP of 127.0.0.1. Although the key is in binary
format, you can see that the first four bytes are different. Each
key is a hash of the path I was requesting and my IP address,
so it’s easy to see that I’ve requested different pages. The
http_req_rate tells you how many times I’ve accessed these
pages.
Did you know? You can key off of IPv6 addresses with this
configuration as well, by using the same url32+src fetch
method.
Use the Runtime API to inspect the per_ip_rates table too.
You'll see the gpc0 and gpc0_rate values:
Here, the two requests for unique pages over the past 24
hours are reported as gpc0=2. The number of those that
happened during the last thirty seconds was also two, as
indicated by the gpc0_rate(30000) value.
If you’re operating more than one instance of HAProxy,
combining the counters that each collects will be crucial to
getting an accurate picture of user activity. HAProxy
Enterprise provides cluster-wide tracking with a feature
called the Stick Table Aggregator that does just that. This
feature shares stick table data between instances using the
peers protocol, adds the values together, and then returns the
combined results back to each instance of HAProxy. In this
way, you can detect patterns using a fuller set of data.
Now, once an IP is marked as a bot, the client will just get
reCAPTCHA challenges until such time as they solve one, at
which point they can go back to browsing normally.
HAProxy Enterprise has another great feature: the Antibot
module. When a client behaves suspiciously by requesting
too many unique pages, HAProxy will send them a JavaScript
challenge. Many bots aren’t able to parse JavaScript at all, so
this will stop them dead in their tracks. The nice thing about
this is that it isn’t disruptive to normal users, so customer
experience remains good.
Beyond Scrapers
So far, we’ve talked about detecting and blocking clients that
access a large number of unique pages very quickly. This
method is especially useful against scrapers, but similar rules
can also be applied to detecting bots attempting to
brute-force logins and scan for vulnerabilities. It requires only
a few modifications.
Brute-force Bots
Bots attempting to brute force a login page have a couple of
unique characteristics: They make POST requests and they
hit the same URL (a login URL), repeatedly testing numerous
username and password combinations. In the previous
sections, we've been using http-request sc-inc-gpc0(0) to
increment a general-purpose counter, gpc0, on the
per_ip_rates stick table when the client is visiting a page for
the first time.
http-request sc-inc-gpc0(0) if
↪ { sc_http_req_rate(1) eq 1 } !exceeds_limit
You can use this same technique to block repeated hits on the
same URL. The reasoning is that a bot that is targeting a login
page will send an anomalous amount of POST requests to
that page. You will want to watch for POST requests only.
First, because the per_ip_and_url_rates stick table is
watching over a period of 24 hours and is collecting both GET
and POST requests, let’s make a third stick table to detect
brute-force activity. Add the following stick-table
definition:
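A sketch of that table; the name, size, and rate period are assumptions:
backend per_ip_and_url_post_rates
stick-table type binary len 8 size 1m expire 5m
↪ store http_req_rate(5m)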
Then add an http-request track-sc2 and an
http-request deny line to the frontend:
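A sketch of those two lines, with an arbitrary threshold:
http-request track-sc2 base32+src
↪ table per_ip_and_url_post_rates
↪ if METH_POST { path /login }
http-request deny if { sc_http_req_rate(2) gt 10 }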
You now have a stick table and rules that will detect repeated
POST requests to the /login URL, as would be seen when an
attacker attempts to find valid logins. Note how the ACL {
path /login } restricts this to a specific URL. This is optional, as
you could rate limit all paths that clients POST to by omitting
it.
In addition to denying the request, you can also use any of
the responses discussed in the Unblocking Real Users section
above in order to give valid users who happen to get caught
in this net another chance.
Vulnerability Scanners
Vulnerability scanners are a threat you face as soon as you
expose your site or application to the Internet. Generic
vulnerability scanners will probe your site for many different
types of vulnerabilities.
backend per_ip_rates
stick-table type ip size 1m expire 24h
↪ store gpc0,gpc0_rate(30s),http_err_rate(5m)
Now, with that additional counter, and the http-request
track-sc0 already in place, you have, and can view via the
Runtime API, the HTTP error rate of each client.
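To block clients that generate too many errors, a rule like the following can be used (the threshold matches the honeypot example below):
http-request deny if { sc_http_err_rate(0) gt 10 }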
You can also use the gpc0 counter that we are using for the
scrapers to block them for a longer period of time:
http-request sc-inc-gpc0(0) if
↪ { sc_http_err_rate(0) eq 1 } !exceeds_limit
Now the same limits that apply to scrapers will apply to
vulnerability scanners, blocking them quickly before they
succeed in finding vulnerabilities. Alternatively, you can
shadowban these clients and send their requests to a
honeypot backend, which will not give the attacker any
reason to believe that they have been blocked. Therefore,
they will not attempt to evade the block. To do this, add the
following in place of the http-request deny above. Be sure
to define the backend be_honeypot:
use_backend be_honeypot if
↪ { sc_http_err_rate(0) gt 10 }
Now, search engines won’t get their page views counted as
scraping. If you have multiple files, such as another for
whitelisting admin users, you can order them like this:
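A sketch of that ordering, with hypothetical file names; the allow rules must come before your deny rules:
http-request allow if { src -f /etc/hapee-1.9/whitelist.acl }
http-request allow if { src -f /etc/hapee-1.9/admins.acl }
http-request deny if exceeds_limit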
When using whitelist files, it’s a good idea to ensure that they
are distributed to all of your HAProxy servers and that each
server is updated during runtime. An easy way to accomplish
this is to purchase HAProxy Enterprise and use its lb-update
module. This lets you host your ACL files at a URL and have
each load balancer fetch updates at a defined interval. In this
way, all instances are kept in sync from a central location.
module-load hapee-lb-maxmind.so
maxmind-load COUNTRY
↪
/etc/hapee-1.9/geolocation/GeoLite2-Country.mmdb
maxmind-cache-size 10000
Within your frontend, use http-request set-header to
add a new HTTP header to all requests, which captures the
client’s country:
http-request set-header
↪ x-geoip-country
↪
%[src,maxmind-lookup(COUNTRY,country,iso_code)]
Now, requests to the backend will include a new header that
looks like this:
looks like this:
x-geoip-country: US
You can also add the line maxmind-update url
https://example.com/maxmind.mmdb to have HAProxy
automatically update the database from a URL during
runtime.
module-load hapee-lb-netacuity.so
netacuity-load 26
↪ /etc/hapee-1.9/geolocation/netacuity/
netacuity-cache-size 10000
Then, inside of your frontend, add an http-request
set-header line:
http-request set-header
↪ x-geoip-country %[src,netacuity-lookup-ipv4
↪ ("pulse-two-letter-country")]
This adds a header to all requests, which contains the client’s
country:
x-geoip-country: US
Since this information is stored in an HTTP header, your
backend server will also have access to it, which gives you
the ability to take further action from there. We won’t get into
it here, but HAProxy also supports device detection and other
types of application intelligence databases.
Conclusion
In this chapter, you learned how to identify and ban bad bots
from your website by using the powerful configuration
language within the HAProxy load balancer. Placing this type
of bot protection in front of your servers will protect you from
these crawlers as they attempt content scraping, brute
forcing and mining for security vulnerabilities.
A Specific
Countermeasure
We've enjoyed the benefits of network firewalls since the
1980s. They allow IT admins to filter traffic between
networks based on any of the information in the TCP
protocol: source IP, source port, destination IP, and
destination port. Don't want someone directly accessing your
database from the Internet? Put a firewall in front of it and
close off access to the outside world. In fact, common
practice is to block everything by default and only punch a
hole through for specific applications.
Next-generation firewalls (NGFW) took this to the next level.
They often include deep packet inspection (DPI) and intrusion
detection systems (IDS) that allow the firewall to open up IP
packets and look at their contents, even up to the application
layer. For instance, an IDS might analyze packets to discover
signs of malicious payloads, such as an attempted SQL
injection.
Routine Scanning
First things first. You need a way to assess the security of
your application. There are a number of web security
scanners out there including Acunetix, Nessus, and Burp
Suite. We'll use one called OWASP Zed Attack Proxy (ZAP),
which can be downloaded and installed onto Windows,
Linux, and Mac. I've found ZAP to be one of the easier
scanners to use and it's able to detect an impressive range of
vulnerabilities.
Log in with the credentials admin and password. Once in,
click the Create / Reset Database button to initialize the site's
MySQL database. At this point, there is no WAF protecting
the site. It's wide open to security exploits.
Let's run sqlmap and see what it finds. When you log into
DVWA, it places a cookie in your browser called PHPSESSID
that tells the site that you're a logged-in user. So that sqlmap
can bypass the login screen and scan the site, it needs the
value of this cookie. Open your browser's Developer Tools
and view the site's cookies on the Network tab. Then, find the
PHPSESSID cookie and copy its value.
In the following command, the --cookie parameter is passed
to sqlmap with the value of the PHPSESSID cookie. You
should also give it the value of a cookie called security, which
is set to low. This tells DVWA to not use its own built-in,
practice WAF. Replace the session ID and IP address with
your own values:
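A sketch of that command; the IP address and session ID are placeholders, and the exact flags may differ:
sqlmap -u "http://192.168.122.64/vulnerabilities/sqli/?id=1&Submit=Submit" --cookie="PHPSESSID=<your session id>; security=low" --dbs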
This command probes the /vulnerabilities/sqli page for SQL
injection flaws, substituting various strings for the id
parameter in the URL. When it's successful, it will gain access
to the backend MySQL instance and enumerate the
databases it finds:
As you can see, sqlmap was able to find information about
the website's databases and list out sensitive information.
That's certainly a security weakness! You'll see in the next
section how the HAProxy Enterprise WAF stops this from
happening.
Next, let's use the ZAP scanner to find pages susceptible to
cross-site scripting. You can use ZAP to scan for other sorts
of vulnerabilities, too, if you like. Open ZAP and, from the
right-hand panel, choose Launch Browser.
In the browser that opens, go to the site and log in. Using
Launch Browser helps ZAP to learn the layout of the website.
You can also have it crawl the site on its own, but that isn't as
effective. To demonstrate a vulnerability, we'll focus on
cross-site scripting (XSS) by going to the XSS (Reflected)
page and typing a value into the What's your name? field.
Then click Submit. After that, you can close the browser
window.
We need to beef up our defenses so that sqlmap and ZAP
don't find these vulnerabilities. In the next section, you'll see
how to set up the WAF module in HAProxy Enterprise.
ssh -i ./haproxy_demo.pem
↪ ubuntu@[HAPROXY_IP_ADDRESS]
You need to download the CRS. There's a script that will take
care of this for you. Simply run the following command and
the files will be downloaded to the
/etc/hapee-1.9/modsec.rules.d directory:
sudo
/opt/hapee-1.9/bin/hapee-lb-modsecurity-getcrs
Next, go to /etc/hapee-1.9 and edit the hapee-lb.cfg file with
your favorite editor for these situations (vi, nano, etc.). Add
the following m odule-load directive to the global section:
module-load hapee-lb-modsecurity.so
Also add a filter directive to your HAProxy frontend to
enable protection for that proxy. Here's what it looks like:
frontend fe_main
filter modsecurity owasp_crs rules-file
↪
/etc/hapee-1.9/modsec.rules.d/lb-modsecurity.conf
Then save the file and restart the load balancer services with
the hapee-1.9 command:
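The exact service name can vary by installation; on a typical HAProxy Enterprise 1.9 setup it would be something like:
sudo systemctl restart hapee-1.9-lb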
At this point, the WAF is in detection-only mode. That means
that it will classify attacks as it sees them and write warnings
to the file /var/log/modsec_audit.log. However, it will not
block any requests. To turn on blocking, edit the file
/etc/hapee-1.9/modsec.rules.d/modsecurity.conf. Near the
beginning, change SecRuleEngine DetectionOnly to
SecRuleEngine On. Then restart the load balancer services
again.
Here, even though we gave it a page that we know is
susceptible to SQL injection, it wasn't able to find it. That's
because the WAF is blocking requests that seem malicious
with 403 Forbidden responses.
For example, you might show the client a Javascript challenge
by using the Antibot module if they're flagged as potentially
malicious. Subscribe to our blog to be alerted when this
functionality becomes available!