8000 [Routing] Replace previously added preg_match with preg_replace_callback by arjenm · Pull Request #18379 · symfony/symfony · GitHub
[go: up one dir, main page]

Skip to content

[Routing] Replace previously added preg_match with preg_replace_callback #18379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
[Routing] Replace previously added preg_match with preg_replace_callback
  • Loading branch information
arjenm committed Mar 31, 2016
commit 052ec95af96981e54695d2f23c0cee4d9cf6858a
65 changes: 34 additions & 31 deletions src/Symfony/Component/Routing/Generator/UrlGenerator.php
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,30 @@
*/
class UrlGenerator implements UrlGeneratorInterface, ConfigurableRequirementsInterface
{
/**
* This regexp matches all characters that should be percent encoded in paths for url's generated by this class.
*
* PHP's rawurlencode() encodes all chars except "a-zA-Z0-9-._~" according to RFC 3986. But we want to allow some chars
* to be used in their literal form (reasons below). Other chars inside the path must of course be encoded, e.g.
* "?" and "#" (would be interpreted wrongly as query and fragment identifier),
* "'" and """ (are used as delimiters in HTML).
*
* These characters are besides rawurlencode()'s list:
* - The slash '/' can be used to designate a hierarchical structure and we allow using it with this meaning,
* some webservers don't allow the slash in encoded form in the path for security reasons anyway
* see http://stackoverflow.com/questions/4069002/http-400-if-2f-part-of-get-url-in-jboss
* - The '@' and ':' are general delimiters in the URI specification but have only special meaning in the authority component,
* so they can safely be used in the path in unencoded form.
* - The ';', ',', '=', '+', '!', '*', '|' are only sub-delimiters that have no predefined meaning and can therefore be used literally
* so URI producing applications can use these chars to delimit subcomponents in a path segment without being encoded for better readability.
*
* The regexp is an inverse of the characters above. It allows to only call rawurlencode() for specific charaters,
* thus optimizing scenario's where there are no or very few such characters in urls.
*
* @internal
*/
const PATH_UNSAFE_CHARACTER_REGEXP = '#[^-.~a-zA-Z0-9_/@:;,=+!*|]+#';

/**
* @var RouteCollection
*/
Expand All @@ -48,37 +72,11 @@ class UrlGenerator implements UrlGeneratorInterface, ConfigurableRequirementsInt
protected $logger;

/**
* This array defines the characters (besides alphanumeric ones) that will not be percent-encoded in the path segment of the generated URL.
* This array used to define the characters (besides alphanumeric ones) that will not be percent-encoded in the path segment of the generated URL.
*
* PHP's rawurlencode() encodes all chars except "a-zA-Z0-9-._~" according to RFC 3986. But we want to allow some chars
* to be used in their literal form (reasons below). Other chars inside the path must of course be encoded, e.g.
* "?" and "#" (would be interpreted wrongly as query and fragment identifier),
* "'" and """ (are used as delimiters in HTML).
*/
protected $decodedChars = array(
// the slash can be used to designate a hierarchical structure and we want allow using it with this meaning
// some webservers don't allow the slash in encoded form in the path for security reasons anyway
// see http://stackoverflow.com/questions/4069002/http-400-if-2f-part-of-get-url-in-jboss
'%2F' => '/',
// the following chars are general delimiters in the URI specification but have only special meaning in the authority component
// so they can safely be used in the path in unencoded form
'%40' => '@',
'%3A' => ':',
// these chars are only sub-delimiters that have no predefined meaning and can therefore be used literally
// so URI producing applications can use these chars to delimit subcomponents in a path segment without being encoded for better readability
'%3B' => ';',
'%2C' => ',',
'%3D' => '=',
'%2B' => '+',
'%21' => '!',
'%2A' => '*',
'%7C' => '|',
);

/**
* @var string This regexp matches all characters that are not or should not be encoded by rawurlencode (see list in array above).
* @deprecated This array should not be used anymore. Changing it can result in url's that are incompatible with Symfony's url-matching, browsers and/or server software.
*/
private $urlEncodingSkipRegexp = '#[^-.~a-zA-Z0-9_/@:;,=+!*|]#';
protected $decodedChars = array();

/**
* Constructor.
Expand Down Expand Up @@ -187,9 +185,14 @@ protected function doGenerate($variables, $defaults, $requirements, $tokens, $pa

if ('' === $url) {
$url = '/';
} elseif (preg_match($this->urlEncodingSkipRegexp, $url)) {
// the context base URL is already encoded (see Symfony\Component\HttpFoundation\Request)
} elseif (!empty($this->decodedChars)) {
@trigger_error('The class variable '.__CLASS__.'::$decodedChars is deprecated since version 2.7 and will be removed in 3.1.', E_USER_DEPRECATED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks suspicious: 2.7 can't get more deprecations. Only master can.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be removed in 3.1 too, only in 4.0 otherwise it will not be BC with 3.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be more clear in what I'm supposed to do. This is basically my first PR.
I am not an experienced Symfony developer, and have no clear idea what I'm supposed to do or where to find the appropriate documentation. Only telling me what I'm doing wrong, won't improve this PR :/

I'd gladly donate this change/PR to someone who actually knows what to do. I just want the route-generation to be faster, since it takes a fairly large part of the symfony 'overhead'* of our application.

* Its not really overhead, but due to its flexibility it is much more complex than simply concatenating the right parts of a url.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arjenm deprecations cannot be made in existing minor releases. A deprecation implies that there's a new way of doing something, also known as a feature. In order to add a feature, it has to be in the newest minor release (3.1, 3.2 etc). This PR is requested against 2.7, which doesn't accept new features anymore (same goes for any other lower than 3.1 at the moment as 3.1 is the next).

So what I think you should do, is reopen this PR against the master if you want it as a feature. This means you will have to update your message to something like "deprecated in 3.1, removed in 4.0". Currently 2.7 and 2.8 only accept bug fixes (will be explained a bit better in the docs: symfony/symfony-docs#6412).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arjenm you should just remove this trigger_error and add it later in another PR on master

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolas-grekas while I appreciate your suggestion and it would allow early inclusion of this change. Its also not in the spirit of the intended prevention of BC-breakage. If someone would rely on the contents of the decodedChars-array, that would still break (i.e. it would be an empty array, rather than one containing several key/value pairs) and without any warning. Its already a bug waiting to happen in its current state; changing the array without changing the url matcher could result in mismatched routes...

Adding more code to test if the array was non-default, could destroy most of the performance gain.

So I'll rebase it against master, where such a change is much less unexpected. So I'll follow the suggestion of @iltar

$url = strtr(rawurlencode($url), $this->decodedChars);
} else {
// the context base URL is already encoded (see Symfony\Component\HttpFoundation\Request)
$url = preg_replace_callback(self::PATH_UNSAFE_CHARACTER_REGEXP, function(array $matches){
return rawurlencode($matches[0]);
}, $url);
}

// the path segments "." and ".." are interpreted as relative reference when resolving a URI; see http://tools.ietf.org/html/rfc3986#section-3.3
Expand Down
0