8000 [Routing] Support UTF-8 in paths and parameters by c960657 · Pull Request #19562 · symfony/symfony · GitHub
[go: up one dir, main page]

Skip to content

[Routing] Support UTF-8 in paths and parameters #19562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

c960657
Copy link
Contributor
@c960657 c960657 commented Aug 7, 2016
Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? yes
Deprecations? no
Tests pass? yes
Fixed tickets #5236
License MIT
Doc PR

This PR is a new take on a previous proposal in #19054.

@c960657
Copy link
Contributor Author
c960657 commented Aug 7, 2016

Some of the tests for FrameworkBundle are failing on PHP 7, because the tests assumes the changes made to symfony/routing by this PR, but Travis installs latest released version of symfony/routing. I can bump the requirement in composer.json, or I can make the code and tests compatible with both versions. How do we usually deal with this?

@@ -82,11 +89,12 @@ class UrlGenerator implements UrlGeneratorInterface, ConfigurableRequirementsInt
* @param RequestContext $context The context
* @param LoggerInterface|null $logger A logger instance
*/
public function __construct(RouteCollection $routes, RequestContext $context, LoggerInterface $logger = null)
public function __construct(RouteCollection $routes, RequestContext $context, LoggerInterface $logger = null, $charset = 'UTF-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to upgrade the docblock with new arg

@c960657
Copy link
Contributor Author
c960657 commented Aug 8, 2016

@MacDada Thanks for the review. I have addresses all your comments.

I still need some advice on the FrameworkBundle issue mentioned above.

@nicolas-grekas
Copy link
Member
nicolas-grekas commented Aug 9, 2016

I think there is a mismatch is the definition of the $charset params:
URLs have no encoding. HTML5 says that they should be encoded in UTF-8, and that when an non-valid UTF-8 URL is found, it should be converted from CP1252 by default.

-> we need a parameter to configure this fallback input charset. This can't be kernel.charset by default, but CP1252.

Then there is the output encoding of the URLs we generate. These should match the output encoding the app is using.

-> this matches the definition of the kernel.charset parameter so it can be injected here.

  • Let's say I build an UTF-8 app in Russia => kernel.charset is UTF-8 and fallback input enc is some cyrillic enc.
  • Let's now say I build a cyrillic app in Russia => kernel.charset is this cyrillic enc, and the fallback enc should also be kernel.charset (and we should still handle UTF-8 URLs as input of course).

@c960657
Copy link
Contributor Author
c960657 commented Aug 9, 2016

URLs have no encoding.

True, a final URL has no encoding. The charset is used when we want to represent a non-ASCII character. For example, the Wikipedia article about the letter É has the URL https://en.wikipedia.org/wiki/%C3%89. %C3%89 is the percent-encoding of the two bytes representing É in UTF-8.

If ISO-8859-1 was used note (that was very common years ago), the URL would have been https://en.wikipedia.org/wiki/%C9 (that URL currently redirects to the UTF-8-encoded one). Nowadays UTF-8 is the standard, and when accessing a UTF-8-encoded URL, the browser displays https://en.wikipedia.org/wiki/É in the URL field.

HTML5 says that they should be encoded in UTF-8, and that when an non-valid UTF-8 URL is found, it should be converted from CP1252 by default.

That rule only comes into play if the HTML contains URLs containing characters that are not properly percent-encoded. Such URLs are not valid and should never occur in HTML generated by Symfony functions.

However, since browsers prefer UTF-8 in the URL, I guess you would probably prefer to use UTF-8 as $router->charset even when kernel.charset is cyrillic. So I guess that default the router charset to kernel.charset is a bad idea.

@nicolas-grekas
Copy link
Member

That rule only comes into play if the HTML contains URLs containing characters that are not properly percent-encoded.

Not at all in my understanding, and in fact you cited what I meant: type https://en.wikipedia.org/wiki/%C9 in the browser, and get redirected to UTF-8, in the same way than HTML5 says (not the redirection but the charset conversion).

I guess that default the router charset to kernel.charset is a bad idea.

Yep, we need a fallbak input charset

@c960657
Copy link
Contributor Author
c960657 commented Aug 9, 2016

Which part of the HTML5 spec are you referring to?

I guess we could implement a fallback encoding that automatically redirects like Wikipedia does, but I think that is a new feature that is outside the scope of this PR.

@nicolas-grekas
Copy link
Member
nicolas-grekas commented Aug 10, 2016

Which part of the HTML5 spec are you referring to?

an older draft one looks like, I can't find the reference :)

@nicolas-grekas
Copy link
Member
nicolas-grekas commented Aug 12, 2016

I tried an alternative approach in #19604
If #19604 looks good, it means we will have to work on another PR for charset reconciliation between incoming URLs and the charset used in the matcher. To me, this other PR doesn't necessarily belong to the router.We could just have a listener that comes before the router, checks the charset of the URL and either redirects, 404 or converts (the 3 strategies are valid to me, choice should belong to the dev). Unicode normalization could be also done in this listener (if we want it: it's costly, and we lived without for years, could be opt-in only).

fabpot added a commit that referenced this pull request Aug 16, 2016
This PR was merged into the 2.7 branch.

Discussion
----------

[Routing] Add missing options in docblock

| Q             | A
| ------------- | ---
| Branch?       | 2.7
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

Courtesy of @c960657 in #19562

Commits
-------

f45da32 [Routing] Add missing options in docblock
@nicolas-grekas
Copy link
Member
nicolas-grekas commented Aug 23, 2016

@c960657 OK to close this one in favor of #19604?

@c960657
Copy link
Contributor Author
c960657 commented Aug 23, 2016

Yes, let’s continue in #19604.

@c960657 c960657 closed this Aug 23, 2016
fabpot added a commit that referenced this pull request Aug 25, 2016
…s (nicolas-grekas)

This PR was merged into the 3.2-dev branch.

Discussion
----------

[Routing] Add seamless support for unicode requirements

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | yes
| Tests pass?   | yes
| Fixed tickets | #3629, #5236, #19562
| License       | MIT
| Doc PR        | symfony/symfony-docs#6890

This PR adds unicode support to route matching and generation by automatically adding the `u` modifier to regexps that use either unicode characters or unicode enabled character classes (e.g. `\p...` `\x{...}` `\X`).

As a side note, if one wants to match a single unicode character (vs a single byte), one should use `\PM` or `\X` instead of `.` *or* set the `unicode` parameter to true.

Commits
-------

a829d34 [Routing] Add seamless support for unicode requirements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0