-
-
Notifications
You must be signed in to change notification settings - Fork 32k
email: invalid RFC 2047 address header after refolding with email.policy.default #121284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Really interesting issue Mike. Thanks to you and Andre for the example - really cunning bit of Base 64 there :). Perhaps I'm being naive, but it might save time or at least give a work around to try to solve this at a high level first. So may I ask if you're sure the policies are consistent? I don't know if the bottom line of the following is a valid RFC 2047 header. But there's no unquoted comma in it, and the base 64 is unchanged (not decoded):
Nonetheless, the default policy to Also consistently specifying
I'm sure there's code there that parses and decodes the To: header correctly, as curiously the string representation of the
|
Thanks for the analysis, James.
Sorry, yes, I used "default" confusingly in a code comment. I was trying to convey that the policy of an email.message.Message defaults to email.policy.compat32. (I've edited my original report to clarify.) And I originally thought the root of this problem was using compat32 to build the message, then email.policy.default in the generator. But your analysis seems to show that compat32 doesn't need to be involved, and this is actually a bug with email.policy.default header refolding in the generator. Here's an example using another problematic encoded-word that Andrés identified and sticking entirely to email.policy.default: em2 = email.message.EmailMessage() # (uses policy=email.policy.default)
em2["To"] = '=?utf-8?q?A_v=C3=A9ry_long_name_with_non-ASCII_char_and=2C_comma?= <to@example.com>'
em2["To"]
# '"A véry long name with non-ASCII char and, comma" <to@example.com>'
em2.as_string() # (uses em2.policy, which is email.policy.default)
# 'To: A =?utf-8?q?v=C3=A9ry?= long name with non-ASCII char and, comma\n <to@example.com>\n\n' The unquoted comma is still invalid. (Pretty sure the wrapping between display-name and addr-spec is allowed, though it might be discouraged in a "should not" sense.) The bug shows up whether the message is constructed or parsed: from email import message_from_string
em3 = message_from_string(
"To: =?utf-8?q?A_v=C3=A9ry_long_name_with_non-ASCII_char_and=2C_comma?= <to@example.com>\r\n\r\n",
policy=email.policy.default,
)
em3.as_string()
# 'To: A =?utf-8?q?v=C3=A9ry?= long name with non-ASCII char and, comma\n <to@example.com>\n\n' But if the header is already wrapped before parsing (so that it doesn't require refolding in the generator), the problem goes away: em4 = message_from_string(
"To: =?utf-8?q?A_v=C3=A9ry_long_name_with_non-ASCII_char_and=2C_comma?="
+ "\r\n <to@example.com>\r\n\r\n", # note header wrapping
policy=email.policy.default,
)
em4.as_string()
# 'To: =?utf-8?q?A_v=C3=A9ry_long_name_with_non-ASCII_char_and=2C_comma?=\n <to@example.com>\n\n'
I believe that is valid. Pretty sure you can freely mix encoded-words, atoms, and quoted-strings. (Whether all email clients handle it properly is a different question.) As far as a high-level workaround, the original report is actually a case isolated from Django, which has a lot of code built on the legacy email API combined with its own workarounds for historic issues in that API. Some of those workarounds tend to generate these problematic (but technically valid) RFC 2047 encoded-word headers. The best workaround seems to be not pre-encoding the headers, and then sticking entirely to the modern policy=default API (to avoid other legacy issues that necessitated pre-encoding). I'm working on updating Django for that. But bottom line, it seems like there is a bug here in email.policy.default header refolding. |
Also, this seems to be a rare case where compat32 works better than email.policy.default: import email.message
import email.policy
import email.utils
display_name = "A véry long name with non-ASCII char and, com
8000
ma"
to = email.utils.formataddr((display_name, "to@example.com"))
# '=?utf-8?q?A_v=C3=A9ry_long_name_with_non-ASCII_char_and=2C_comma?= <to@example.com>'
msg = email.message.EmailMessage(policy=email.policy.default)
msg["To"] = to
msg.as_string(policy=email.policy.default) # invalid header:
# 'To: A =?utf-8?q?v=C3=A9ry?= long name with non-ASCII char and, comma\n <to@example.com>\n\n'
msg.as_string(policy=email.policy.compat32) # valid header:
# 'To: =?utf-8?q?A_v=C3=A9ry_long_name_with_non-ASCII_char_and=2C_comma?=\n <to@example.com>\n\n' And I'm now convinced this is the same underlying problem as #80222 (which focuses on email addresses in the display-name, and has an abandoned PR). |
Thanks Mike. I agree this is definitely a real bug, and it potentially affects even more code (not just bytes) than your example - thanks for the report. Unfortunately from my dip in to the code base briefly today, I think this is better addressed by someone with more knowledge than me, of all the things the email library has already fixed. Considerable effort has gone into it beforehand, and it does a lot of things very well already for a lot of people. I don't want to break any of that. |
…-word Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded.
While this is related to #80222, it turns out to be a different issue. But they both have the same security implications. The problem here is that _refold_parse_tree() (in email._header_value_parser) can remove RFC 2047 encoding from parts of a parsed encoded-word as it refolds a header. If that encoded-word was in a structured header (like an address field), removing the encoding can result in leaking 'specials' (like commas) into the header without proper quoting. Since _refold_parse_tree() doesn't know whether it's working in a structured or unstructured header, the easiest fix seems to be retaining any existing RFC 2047 encoding on text that contains a 'specials' character. I've opened PR #122754 with that change. [Incidentally, I reported this to the PSRT about a month ago, together with pointing out #80222 was still around. They agreed both problems seem to be security issues, and instructed me to handle the fixes publicly. I'm guessing since #80222 has already been public for five years, there's not much point in keeping the discussion private. Someone may want to add the #topic-email and #type-security labels to both issues.] |
Until a fix lands, I did find a (somewhat reasonable) workaround. |
Actually |
How does that knowledge get passed down through recursive part.fold() calls? (E.g., refolding an EncodedWord inside an UnstructuredTokenList vs one in a DisplayName.) |
Ah, your are right of course. It has been too long since I wrote this code ;) Now I remember that I'd been thinking there was more information that fold needed passed along, but you are right that that it is too complex for a security PR. |
… encoded-word [Better fix from @bitdancer.] Co-authored-by: R David Murray <rdmurray@bitdance.com>
…H-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…encoded-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…encoded-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…ncoded-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…ncoded-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…ncoded-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…d-word (GH-122754) (#131403) gh-121284: Fix email address header folding with parsed encoded-word (GH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…d-word (GH-122754) (#131404) gh-121284: Fix email address header folding with parsed encoded-word (GH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…d-word (GH-122754) (GH-131405) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
…d-word (GH-122754) (GH-131411) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com>
…-word (GH-122754) (GH-131412) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com>
…ncoded-word (pythonGH-122754) (pythonGH-131412) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] (cherry picked from commit 295b53d) Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: R David Murray <rdmurray@bitdance.com>
…-word (pythonGH-122754) Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded. [Better fix from @bitdancer.] --------- Co-authored-by: R David Murray <rdmurray@bitdance.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
Bug report
Bug description:
If an email message (modern or legacy) is assigned an address header that is pre-encoded with RFC 2047, calling as_bytes(policy=default) can generate an invalid address header. The resulting header may include unquoted RFC 5322 special characters in a way that can alter its meaning.
Here is a minimal example to demonstrate the problem, isolated from much larger code (including Django's django.core.mail.message). Although this example starts with a legacy Message, an example using only the modern email API is in a later comment:
(The unquoted comma in the resulting display-name is not valid.)
For a real-world case where this can occur, see Django ticket 35378 and anymail/django-anymail#369. (Thanks to @andresmrm for noticing the problem and isolating a test case.)
Oddly, as_string(policy=default) doesn't exhibit the problem:
Also, the problem does not occur when assigning the non-encoded equivalent to the header:
(Possibly related to #80222)
CPython versions tested on:
3.9, 3.12
Operating systems tested on:
macOS
[edits: removed ambiguous use of "default" in example comment; clarified this is not a real-world example, but a minimal test case]
Linked PRs
The text was updated successfully, but these errors were encountered: