Comparing v2.1.2...v2.2.0 · JoshData/python-email-validator · GitHub

base repository: JoshData/python-email-validator
base: v2.1.2
head repository: JoshData/python-email-validator
compare: v2.2.0
  • 19 commits
  • 16 files changed
  • 2 contributors

Commits on Apr 19, 2024

  1. Parse display name <addr> syntax

    Per the request in #116, also parse display name syntax, but reject it unless the new allow_display_name option is set. Strict parsing according to the MIME specification probably isn't what's generally wanted, since the likely use case is parsing input from email-composition-style user interfaces. So the parsing follows the spirit of a MIME message but not the letter.
    
    If display name syntax is permitted, return the unquoted/unescaped display name in the returned object.
    JoshData committed Apr 19, 2024
    4691a62
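
As a rough illustration of the display name handling described in this commit, the sketch below splits `Display Name <addr>` input using only the standard library. The helper `split_display_name`, its error message, and its `rpartition`-based parsing are hypothetical simplifications and not the library's parser; only the `allow_display_name` option name comes from the commit message.

```python
import re
from typing import Optional, Tuple


def split_display_name(
    text: str, allow_display_name: bool = False
) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split 'Display Name <addr>' into (name, addr).

    Returns (None, text) when the input has no angle-bracket syntax.
    """
    text = text.strip()
    if not text.endswith(">") or "<" not in text:
        return None, text  # plain address, nothing to split

    # Simplification: assume the last "<" opens the bracketed address.
    name_part, _, rest = text.rpartition("<")
    addr = rest[:-1]  # drop the closing ">"

    if not allow_display_name:
        # Illustrative message only, not the library's wording.
        raise ValueError("Display name syntax is not permitted here.")

    name = name_part.strip()
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Unquote, then unescape backslash-escaped characters.
        name = re.sub(r"\\(.)", r"\1", name[1:-1])
    return (name or None), addr


print(split_display_name('"Doe, Jane" <jane@example.com>', allow_display_name=True))
print(split_display_name("jane@example.com"))
```

The sketch rejects the bracketed form by default, mirroring the opt-in behaviour the commit describes, and returns the display name with quotes and escapes removed.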

Commits on May 9, 2024

  1. Ratchet up mypy settings

    tamird committed May 9, 2024
    8d91a45
  2. Fix typo

    tamird committed May 9, 2024
    68019d7
  3. mypy: disallow_untyped_defs

    tamird committed May 9, 2024
    5734e5e
  4. mypy: disallow_untyped_calls

    tamird committed May 9, 2024
    9da5071
  5. Run test_and_build on PR

    tamird committed May 9, 2024
    be42a70

Commits on May 10, 2024

  1. 380e44e
  2. a9a8a62
  3. Move README section on unsafe Unicode to a later section since it applies to both the local part and the domain part

    JoshData committed May 10, 2024
    5cf49cf

Commits on Jun 17, 2024

  1. 0b22c13

Commits on Jun 19, 2024

  1. Several fixes for parsing display names

    * Fix error message text for input addresses without @-signs. The incorrect message was "There must be something after the @-sign.". This was broken by the changes to parse display names. Prior to that, the message was "The email address is not valid. It must have exactly one @-sign.".
    * Move the allow_display_name check to the end of the syntax checks. The optional checks should be the last to occur so that fatal syntax errors are raised first.
    * Check that display name email addresses have a closing angle bracket and nothing after.
    * Don't treat < + U+0338 (Combining Long Solidus Overlay) as the start of a bracketed email address. This would already be rejected because the combining character would be reported as an unsafe character at the start of the address, but it may be confusing since the caller won't see the address that way. When splitting the address into parts, skip the other special characters (@, quote, backslash) that have meaningful combining characters after them (i.e. they change under NFC normalization), although I don't think there are any such cases.
    JoshData committed Jun 19, 2024
    3426885
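
The interaction between `<` and U+0338 mentioned in this commit can be reproduced with the standard `unicodedata` module. The sketch below is not the library's code: `composing_marks` is a hypothetical helper that lists the combining characters that fuse with a given base character under NFC, which is one way to look for the "meaningful combining characters" the commit message refers to.

```python
import unicodedata
from typing import List

# Every code point with a non-zero canonical combining class.
COMBINING_MARKS = [
    chr(cp) for cp in range(0x110000) if unicodedata.combining(chr(cp)) != 0
]


def composing_marks(base: str) -> List[str]:
    """Hypothetical helper: marks that compose with `base` under NFC."""
    return [
        mark
        for mark in COMBINING_MARKS
        if len(unicodedata.normalize("NFC", base + mark)) == 1
    ]


# "<" followed by COMBINING LONG SOLIDUS OVERLAY becomes NOT LESS-THAN.
print(unicodedata.normalize("NFC", "<\u0338") == "\u226e")

# Report which of the special characters have any composing mark at all.
for special in ["<", "@", '"', "\\"]:
    marks = composing_marks(special)
    print(repr(special), [f"U+{ord(m):04X}" for m in marks])
```

A parser that sees `<` immediately followed by such a mark can then decline to treat it as the start of a bracketed address, which is the behaviour the last bullet describes.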
  2. Add a test that shows that the local part is returned with Unicode NFC normalization
    
    s + U+0323 + U+0307 normalizes under NFC to U+1E69 (Latin Small Letter S With Dot Below And Dot Above) (https://www.unicode.org/reports/tr15/). We normalize when creating the returned email address info.
    JoshData committed Jun 19, 2024
    1fb55d4
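
The normalization named in this commit can be checked directly with Python's standard `unicodedata` module. This is only a sketch of the Unicode behaviour, not the test added in the commit; the `@example.com` address is a made-up illustration.

```python
import unicodedata

# s + COMBINING DOT BELOW (U+0323) + COMBINING DOT ABOVE (U+0307)
decomposed = "s\u0323\u0307"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(f"U+{ord(composed):04X}", unicodedata.name(composed))

# The marks are canonically reordered first, so the other order
# of the two dots normalizes to the same single code point.
swapped = unicodedata.normalize("NFC", "s\u0307\u0323")
print(swapped == composed)

# Illustrative address: normalizing the local part shortens it.
local, domain = ("s\u0323\u0307", "example.com")
print(unicodedata.normalize("NFC", local) + "@" + domain)
```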
  3. Check that the local part is valid after Unicode NFC normalization to prevent injection of invalid characters
    
    We encourage callers to use the normalized email address returned by validate_email (in the `normalized` attribute). This form has had Unicode NFC normalization applied to the local part. However, all of the syntactic validation on the local part was performed before the normalization. Consequently, the normalization could change the local part to become invalid by the replacement of valid characters with invalid characters or by changing the length of the local part to exceed the maximum length. Callers who use the normalized form may then unexpectedly be using an invalid address. To ensure that callers do not get an invalid address, local part syntax checks are now repeated after Unicode normalization has been applied.
    
    A user submitted one case where NFC normalization changes a local part from valid to invalid: U+037E (Greek Question Mark)'s NFC normalization is the ASCII semicolon. The former is otherwise a permitted character, but ASCII semicolons are not permitted in local parts. The user noted that the semicolon could cause the address to be reinterpreted as a list and change the recipient of a message.
    
    No other Unicode character on its own is valid (in a local part) before normalization and invalid after --- I checked every character. I am not sure if there are character sequences that are valid before but not after normalization, but I can't yet find any: I checked that no Unicode character's NFD decomposition, when valid in a local part, normalizes under NFC to a sequence that is not valid. I also could not find any examples where NFC normalization changes something to or from a period, which could also change the validity of a local part.
    
    (The string '<' or '>' plus U+0338 (Combining Long Solidus Overlay) normalizes under NFC to ≮ U+226E (Not Less-Than) and ≯ U+226F (Not Greater-Than), respectively. The two-character sequences are not valid in a local part because < and > are not valid, although the single characters they normalize to are valid. These addresses were rejected before and continue to be rejected. Although < could be the start of a bracketed email address if display names are permitted, the two-character sequence is now (as of an earlier commit) ignored for the purposes of parsing display names.)
    
    There are a small number of characters whose NFC normalization increases the string length, including U+FB2C (Hebrew Letter Shin With Dagesh And Shin Dot). This could also cause the local part to become invalid after normalization where it is valid before. This is now also caught by performing the syntax check again after normalization. (The whole-address length check is similarly fixed in a later commit.)
    
    Some checks that were previously only applied after normalization, for checking safe Unicode characters, are now also applied to the un-normalized form, which also may protect callers that ignore the normalized form and use the original email address string. However, I could not find an example where normalization turns an unsafe string into a safe string.
    
    See #142.
    JoshData committed Jun 19, 2024
    9ef1f82
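
The U+037E case from this commit message can be sketched with the standard library. The validity rule below is a deliberately simplified assumption (any non-ASCII character is accepted, ASCII is limited to a fixed set, length is capped at 64 code points, dot placement is ignored); `is_valid_local_part` and `check_local_part` are hypothetical names, not functions of the library.

```python
import string
import unicodedata

# Assumed set of ASCII characters permitted in a local part.
ALLOWED_ASCII = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~.")


def is_valid_local_part(local: str, max_length: int = 64) -> bool:
    """Simplified, hypothetical syntax check for a local part."""
    if not 0 < len(local) <= max_length:
        return False
    return all(ord(ch) > 127 or ch in ALLOWED_ASCII for ch in local)


def check_local_part(local: str) -> str:
    """Validate before AND after NFC normalization, return the normalized form."""
    if not is_valid_local_part(local):
        raise ValueError("local part is invalid as given")
    normalized = unicodedata.normalize("NFC", local)
    if not is_valid_local_part(normalized):
        raise ValueError("local part is invalid after NFC normalization")
    return normalized


tricky = "a\u037eb"  # contains GREEK QUESTION MARK
print(is_valid_local_part(tricky))                      # checked as typed
print(unicodedata.normalize("NFC", tricky) == "a;b")    # becomes a semicolon
try:
    check_local_part(tricky)
except ValueError as exc:
    print("rejected:", exc)
```

Repeating the same check on the normalized string is what catches the semicolon here; a check run only before normalization would let it through.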
  4. Check that email address length is valid on the original email address string since callers may continue to use that string
    
    Previously, we checked that the ASCII email address (with IDNA ASCII) and the normalized email address satisfied the whole-address length limit. However, callers may use the original input string. Since Unicode NFC normalization typically reduces string length (if it changes the string), this can cause the post-normalization check to pass when the pre-normalization length is not valid. So we should additionally check that the original input also meets the maximum length requirement. Callers might also construct an address that has an internationalized local part and ASCII domain, maybe? So that's now checked too.
    
    The whole-address length test is revised to test each possible address format. The original email address string (with any display name removed) is checked first so that exception messages correspond to the input string where possible. Then the normalized address is checked, since we encourage callers to use it. Finally the ASCII address is checked, since callers who send email without an SMTPUTF8-enabled stack will use it; if there is no ASCII form of the local part, the normalized internationalized local part combined with the ASCII domain is checked instead.
    
    Some length tests are added with a Unicode character whose NFC normalization is actually a decomposition: U+FB2C (Hebrew Letter Shin With Dagesh And Shin Dot) is unusual in that NFC expands it to multiple code points (https://www.unicode.org/faq/normalization.html). In these cases, an address can be within the length limit before normalization but exceed it after.
    
    See #142.
    JoshData committed Jun 19, 2024
    f8709e8
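
The effect of U+FB2C on length checks can be sketched as follows. The helper `first_overlong_form` and the default limit of 254 characters are assumptions made for illustration; the commit message does not state the limit or show the library's implementation, and the ASCII (IDNA) form is left out to keep the sketch standard-library only.

```python
import unicodedata
from typing import Optional

SHIN = "\ufb2c"  # HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT

# NFC expands this single code point into three.
print(len(SHIN), len(unicodedata.normalize("NFC", SHIN)))


def first_overlong_form(address: str, max_length: int = 254) -> Optional[str]:
    """Hypothetical check: name the first address form that is too long.

    The original string is tested first so that an error can refer to
    what the caller typed, then the NFC-normalized string.
    """
    forms = [
        ("original", address),
        ("normalized", unicodedata.normalize("NFC", address)),
    ]
    for name, value in forms:
        if len(value) > max_length:
            return name
    return None


print(first_overlong_form(SHIN * 100 + "@example.com"))  # grows when normalized
print(first_overlong_form("a" * 300 + "@example.com"))   # already too long
print(first_overlong_form("a@example.com"))
```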
  5. 452e0ca
  6. Improve the error message for IDNA domains being too long by handling the length check ourselves rather than in idna.encode

    JoshData committed Jun 19, 2024
    c23c0d6
  7. Check domain syntax after normalization to internationalized characters as a precaution
    
    Out of caution that normalization of the domain part to internationalized characters could turn a valid domain string into an invalid one, it is re-parsed at the end to ensure that it still is validated by the idna package. I could not find any examples where that was not already caught, however, since it seems like the existing IDNA calls already prevent it.
    
    Some tests are added for characters in the domain part that become invalid after Unicode NFC normalization. These were already handled. (The new code never raises an exception.)
    
    See #142.
    JoshData committed Jun 19, 2024
    7f1f281
  8. Improve the error message for invalid characters in domain names after Unicode NFC normalization
    
    These cases were previously handled by the call to idna.encode or idna.alabel, but the error message wasn't consistent with similar checks we do for the local part.
    
    See #142.
    JoshData committed Jun 19, 2024
    8051347

Commits on Jun 20, 2024

  1. Version 2.2.0

    JoshData committed Jun 20, 2024
    6589b1e