Comparing changes
base repository: JoshData/python-email-validator
base: v2.1.2
head repository: JoshData/python-email-validator
compare: v2.2.0
- 19 commits
- 16 files changed
- 2 contributors
Commits on Apr 19, 2024
- Parse `display name <addr>` syntax
Per request in #116, also parse display name syntax, but don't allow it unless a new allow_display_name option is set. Parsing according to the MIME specification probably isn't what's generally wanted, since the use case is probably parsing inputs in email composition-like user interfaces. So it's in the spirit of a MIME message but not the letter. If display name syntax is permitted, return the unquoted/unescaped display name in the returned object.
Commit 4691a62
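To make the behavior described in this commit message concrete, here is a minimal sketch. It is not the library's parser: the function names `split_display_name` and `parse_input` are hypothetical, and the quoting rules are heavily simplified. It only illustrates the idea of splitting `display name <addr>` input, unquoting and unescaping the name, and rejecting the syntax unless an opt-in flag is set.

```python
import re


def split_display_name(text):
    """Return (display_name, address); display_name is None when absent.

    Hypothetical helper with simplified rules, not the library's parser.
    """
    text = text.strip()
    if not text.endswith(">") or "<" not in text:
        return None, text
    # Everything before the last "<" is treated as the display name.
    name, _, addr = text[:-1].rpartition("<")
    name = name.strip()
    # Unquote and unescape a quoted display name.
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        name = re.sub(r"\\(.)", r"\1", name[1:-1])
    return (name or None), addr


def parse_input(text, allow_display_name=False):
    """Reject a display name unless the caller opts in."""
    name, addr = split_display_name(text)
    if name is not None and not allow_display_name:
        raise ValueError("A display name is not permitted here.")
    return name, addr
```

With the flag left at its default, `parse_input("Jane Doe <jane@example.com>")` raises `ValueError` in this sketch, while a bare address passes through unchanged.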
Commits on May 9, 2024
- Commit 8d91a45
- Commit 68019d7
- Commit 5734e5e
- Commit 9da5071
- Commit be42a70
Commits on May 10, 2024
- Commit 380e44e
- Commit a9a8a62
- Move README section on unsafe Unicode to a later section since it applies to both the local part and the domain part
Commit 5cf49cf
Commits on Jun 17, 2024
- Commit 0b22c13
Commits on Jun 19, 2024
- Several fixes for parsing display names
* Fix the error message text for input addresses without @-signs. The incorrect message was "There must be something after the @-sign." This was broken by the changes to parse display names. Prior to that, the message was "The email address is not valid. It must have exactly one @-sign."
* Move the allow_display_name check to the end of the syntax checks. The optional checks should be the last to occur so that fatal syntax errors are raised first.
* Check that display name email addresses have a closing angle bracket and nothing after it.
* Don't treat < + U+0338 (Combining Long Solidus Overlay) as the start of a bracketed email address. This would already be rejected because the combining character would be reported as an unsafe character at the start of the address, but it may be confusing since the caller won't see the address that way. When splitting the address into parts, also skip the other special characters (@, quote, backslash) when they are followed by a meaningful combining character (i.e. the sequence changes under NFC normalization), although I don't think there are any such cases.
Commit 3426885
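The last fix above can be illustrated with the standard library alone. This is a sketch under assumptions: `merges_with_next` and `find_opening_bracket` are hypothetical names, not the library's code. It shows how a `<` that would fuse with a following combining character under NFC can be skipped when looking for the start of a bracketed address.

```python
import unicodedata


def merges_with_next(text, index):
    """True if text[index] plus a following combining mark changes under NFC."""
    pair = text[index:index + 2]
    if len(pair) < 2 or unicodedata.combining(pair[1]) == 0:
        return False
    return unicodedata.normalize("NFC", pair) != pair


def find_opening_bracket(text):
    """Index of the first '<' that stays a '<' after NFC, or -1 if none."""
    for i, ch in enumerate(text):
        if ch == "<" and not merges_with_next(text, i):
            return i
    return -1
```

In this sketch, `"<"` followed by U+0338 is skipped because the pair composes to U+226E (Not Less-Than), so it is never taken as the start of a bracketed address.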
- Add a test that shows that the local part is returned with Unicode NFC normalization
s + U+0323 + U+0307 normalizes under NFC to U+1E69 (Latin Small Letter S With Dot Below And Dot Above) (https://www.unicode.org/reports/tr15/). We normalize when creating the returned email address info.
Commit 1fb55d4
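The normalization named in this commit message can be reproduced with Python's standard `unicodedata` module. The variable names are illustrative only; this is not the test added by the commit.

```python
import unicodedata

# "s" followed by Combining Dot Below (U+0323) and Combining Dot Above (U+0307).
decomposed = "s\u0323\u0307"

# NFC composes the three code points into a single precomposed character.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(unicodedata.name(composed))
```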
- Check that the local part is valid after Unicode NFC normalization to prevent injection of invalid characters
We encourage callers to use the normalized email address returned by validate_email (in the `normalized` attribute). This form has had Unicode NFC normalization applied to the local part. However, all of the syntactic validation on the local part was performed before the normalization. Consequently, the normalization could make the local part invalid, either by replacing valid characters with invalid characters or by changing the length of the local part so that it exceeds the maximum length. Callers who use the normalized form may then unexpectedly be using an invalid address.
To ensure that callers do not get an invalid address, local part syntax checks are now repeated after Unicode normalization has been applied.
A user submitted one case where NFC normalization changes a local part from valid to invalid: U+037E (Greek Question Mark)'s NFC normalization is the ASCII semicolon. The former is otherwise a permitted character, but ASCII semicolons are not permitted in local parts. The user noted that the semicolon could cause the address to be reinterpreted as a list and change the recipient of a message.
No other Unicode character on its own is valid (in a local part) before normalization and invalid after. I checked every character. I am not sure if there are character sequences that are valid before but not after normalization, but I can't yet find any: I checked that no Unicode character's NFD decomposition, when valid in a local part, normalizes under NFC to a sequence that is not valid. I also could not find any examples where NFC normalization changes something to or from a period, which could also change the validity of a local part.
(The strings '<' and '>' plus U+0338 (Combining Long Solidus Overlay) normalize under NFC to ≮ U+226E (Not Less-Than) and ≯ U+226F (Not Greater-Than), respectively. The two-character sequences are not valid in a local part because < and > are not valid, although the characters they normalize to are valid. These addresses were rejected before and continue to be rejected. Although < could be the start of a bracketed email address if display names are permitted, the two-character sequence is now (as of an earlier commit) ignored for the purposes of parsing display names.)
There are a small number of characters whose NFC normalization increases the string length, including U+FB2C (Hebrew Letter Shin With Dagesh And Shin Dot). This could also cause the local part to become invalid after normalization where it is valid before. This is now also caught by performing the syntax check again after normalization. (The whole-address length check is similarly fixed in a later commit.)
Some checks that were previously only applied after normalization, for checking safe Unicode characters, are now also applied to the un-normalized form, which also may protect callers that ignore the normalized form and use the original email address string. However, I could not find an example where normalization turns an unsafe string into a safe string.
See #142.
Commit 9ef1f82
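The two failure modes described in this commit message (a valid character normalizing to an invalid one, and normalization lengthening the local part) can be sketched as follows. The character rule and the 64-character limit are simplified assumptions for illustration, and `local_part_ok` and `check_local_part` are hypothetical names, not the library's functions.

```python
import re
import unicodedata

# Assumed, simplified rule: ASCII "atext" characters, periods, and any
# non-ASCII character. The real library applies more checks than this.
ALLOWED = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~.\-\u0080-\U0010FFFF]+")
MAX_LOCAL_PART_LENGTH = 64  # assumed limit for this sketch


def local_part_ok(local):
    """Syntax and length check on a single form of the local part."""
    return (len(local) <= MAX_LOCAL_PART_LENGTH
            and ALLOWED.fullmatch(local) is not None)


def check_local_part(local):
    """Check the input form, then repeat the check on the NFC form."""
    normalized = unicodedata.normalize("NFC", local)
    return local_part_ok(local) and local_part_ok(normalized)
```

In this sketch, a local part containing U+037E passes the first check but fails the second because it normalizes to an ASCII semicolon, and a run of U+FB2C characters can pass the length limit before normalization and exceed it afterward.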
- Check that email address length is valid on the original email address string since callers may continue to use that string
Previously, we checked that the ASCII email address (with IDNA ASCII) and the normalized email address satisfied the whole-address length limit. However, callers may use the original input string. Since Unicode NFC normalization typically reduces string length (if it changes the string), this can cause the post-normalization check to pass when the pre-normalization length is not valid. So we should additionally check that the original input also meets the maximum length requirement.
Callers might also construct an address that has an internationalized local part and an ASCII domain, so that combination is now checked too.
The whole-address length test is revised to test each possible address format:
* First, the original email address string (with any display name removed), so that exception messages correspond to the input string where possible.
* Then the normalized address, since we encourage callers to use it.
* Then the ASCII address, since callers who send email without an SMTPUTF8-enabled stack will use this, or the normalized internationalized local part (there won't be an ASCII local part in this case) combined with the ASCII domain.
Some length tests are added with a Unicode character whose NFC normalization is actually a decomposition: U+FB2C (Hebrew Letter Shin With Dagesh And Shin Dot) is unusual in that its NFC normalization expands it to multiple code points (https://www.unicode.org/faq/normalization.html). In these cases, the address will be valid before normalization but not valid after.
See #142.
Commit f8709e8
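The ordering of length checks described above can be sketched like this. The 254-character limit, the function name `too_long_forms`, and the form labels are assumptions made for illustration; the library's own implementation and messages may differ.

```python
import unicodedata

MAX_ADDRESS_LENGTH = 254  # assumed whole-address limit for this sketch


def too_long_forms(original, ascii_domain=None):
    """Return the names of the address forms that exceed the length limit.

    Forms are checked in the order given in the commit message: the
    original string first, then the normalized form, then (if an ASCII
    domain is supplied) the normalized local part plus the ASCII domain.
    """
    local, _, _domain = original.rpartition("@")
    forms = [
        ("original", original),
        ("normalized", unicodedata.normalize("NFC", original)),
    ]
    if ascii_domain is not None:
        norm_local = unicodedata.normalize("NFC", local)
        forms.append(("normalized local part + ASCII domain",
                      norm_local + "@" + ascii_domain))
    return [name for name, value in forms if len(value) > MAX_ADDRESS_LENGTH]
```

Checking every form matters in both directions: a string made of decomposed sequences can be too long as typed yet short enough once normalized, while U+FB2C triples in length under NFC, so an input within the limit can exceed it after normalization.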
- Commit 452e0ca
- Improve the error message for IDNA domains being too long by handling the length check ourselves rather than in idna.encode
Commit c23c0d6
- Check domain syntax after normalization to internationalized characters as a precaution
Out of caution that normalization of the domain part to internationalized characters could turn a valid domain string into an invalid one, it is re-parsed at the end to ensure that it is still validated by the idna package. I could not find any examples where that was not already caught, however, since it seems like the existing IDNA calls already prevent it.
Some tests are added for characters in the domain part that become invalid after Unicode NFC normalization. These were already handled. (The new code never raises an exception.)
See #142.
Commit 7f1f281
- Improve the error message for invalid characters in domain names after Unicode NFC normalization
These cases were previously handled by the call to idna.encode or idna.alabel, but the error message wasn't consistent with similar checks we do for the local part.
See #142.
Commit 8051347
Commits on Jun 20, 2024
- Commit 6589b1e
You can try running this command locally to see the comparison on your machine:
git diff v2.1.2...v2.2.0