8000 [RFC][Routing] Improve the current matching process by vicb · Pull Request #3678 · symfony/symfony · GitHub
[go: up one dir, main page]

Skip to content

[RFC][Routing] Improve the current matching process #3678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

vicb
Copy link
Contributor
@vicb vicb commented Mar 22, 2012

This still needs more thinking & discussions.

See what it adds by looking at the former 2.0 PR.

@vicb
Copy link
Contributor Author
vicb commented Mar 30, 2012

The proposed spec:

When you have some variables in your pattern i.e. /base/{foo}-{bar}, the variable will match anything until the expected separator (the separator being the char immediately following the variable):

  • {foo} will match anything until a -
  • {bar}will mach anything as no separator is specified (i.e. it will match ab-cd/ef)

What this PR solves:

Before this change the character immediately preceding the variable (/ for foo and - for bar) was also considered as a separator and the presence of this character would prevent the match (foo could not have been ab/cd before).

  • The described behavior is almost true (there was a small bug in the logic that should not need to be considered here),
  • There is also a micro optim in the regexp (removing an unrequired trailing ?),
  • Of course it was possible to change the default behavior described just before by specifying some requirements on the variables.

What you should pay attention to:

Before this change /foo/{bar} would not have been a valid pattern to match /foo/abc/def. It is now and the value of bar would be set to abc/def (The same goes with foo/abc/ with bar = abc/).

If you need to stick with the former behavior, add a requirement to the bar variable: [^/]+.

Conclusion

The new behavior is consistent and predictable but is different from the former one.

@stof, @Seldaek, @denderello, @fabpot please give me your feedback (you have reacted on the initial PR).

@Seldaek
Copy link
Member
Seldaek commented Mar 30, 2012

Sounds like a great change to me.

@stof
Copy link
Member
stof commented Mar 30, 2012

I'm not convinced byt the separator change. /foo/{bar} matching any url starting with /foo/ could confuse users. and this BC break is likely to break all existing apps with hard-to-debug issues (for people not aware of the internals of the routing).

@stof
Copy link
Member
stof commented Mar 30, 2012

Thus, the Routing component was marked as stable so we should avoid BC breaks in it

@Seldaek
Copy link
Member
Seldaek commented Mar 30, 2012

Why would it break apps? It just makes what the URLs accept more flexible, but I doubt a slash will break most things. If people want to restrict what gets in I hope they don't use blacklists.

@stof
Copy link
Member
stof commented Mar 30, 2012

@Seldaek this routing file would be broken after the change as going to /foo/2/edit would be matched by the show route with the id 2/edit whereas 2.0.x matches the edit route

foo_show:
    pattern; /foo/{id}

foo_edit:
    pattern: /foo/{id}/edit

@Seldaek
Copy link
Member
Seldaek commented Mar 30, 2012

Ah right, that is a good point, and indeed kind of a blocker.

@vicb
Copy link
Contributor Author
vicb commented Mar 30, 2012

Two things to keep in mind:

  • There is a bug in the current implementation (see the original PR for more info),
  • This PR brings consistency as:
foo_show:
    pattern; /{foo}-{id}

foo_edit:
    pattern: /{foo}-{id}/edit

was not working before (due to the same problem as what I and @stof have described). You have also probably noticed the curly brackets around foo: this is due to the first bug - the problem would not show up on the first variable with this specific separator config (but can occur with different separators).

@vicb
Copy link
Contributor Author
vicb commented Mar 30, 2012

And the example given by @stof can easily be fixed by adding a '\d+' requirement to the id.

One of the thing I would have liked to add is an automatic check to see if a regexp masks a consecutive one. However this is not a trivial task, I have to google a little bit for that (This would raise a warning in the router panel).

@stof
Copy link
Member
stof commented Mar 30, 2012

@vicb I know that adding the requirement fixes it. But I guess many existing apps don't have it because people write the simpler route definition working for them, and it is obviously the code above for 2.0.x, not the code adding a requirement.

@stof
Copy link
Member
stof commented Mar 30, 2012

Btw, the check in the profiler (or in a command) is a great idea

@Seldaek
Copy link
Member
Seldaek commented Mar 30, 2012

I guess a command (or ideally standalone tool, or backport the command to 2.0) that checks for all sorts of potential issues with 2.1 upgrade and then lets you know about solutions or offers to fix it for you would be good too.

@Tobion
Copy link
Contributor
Tobion commented Mar 30, 2012

The tiniest BC breaks (like replacing gif images by png in the profiler) have always been rejected. So I'm quite surprised this PR is even considered as it is a huge BC break that affects many users.

Don't understand me wrong. I'm still in favor of this PR as the proposed spec is easy to understand and allows a flexible default matching configuration. But i'm really surprised.

One thing that should be included in the spec is if the matching is eager or not. This is not clear.
Consider route /{foo}-{bar}- that allows - in both placeholders. Now matching /text1-text2-text3-text4-, what would the params foo and bar consist of? Is foo = text1-text2-text3 and bar = text4 or the other way round?
This should be stated in the routing spec.

@vicb
Copy link
Contributor Author
vicb commented Mar 30, 2012

@Tobion isn't the reply to your question covered by:

the variable will match anything until the expected separator (the separator being the char immediately following the variable)

There would be no match in your example: foo would match text1, bar would match text2, the extra text3-text4- would prevent matching.

8000
@Tobion
Copy link
Contributor
Tobion commented Mar 30, 2012

@vicb you overread

Consider route /{foo}-{bar}- that allows - in both placeholders.

So my question, that should be convered in the documentation/spec, is not directly related to your PR. Just a thing that should also be considered in the documentation (and some tests if it isn't).

@vicb
Copy link
Contributor Author
vicb commented Mar 30, 2012

yes I did misread and yes it is not related to this PR, let's not discuss it here.

@Tobion
Copy link
Contributor
Tobion commented Mar 31, 2012

One idea to reduce the BC break is to follow your proposal with one exception: The last parameter that has no following character will default to [^/]. It would at least solve the example given by @stof that for sure many people have been using.
Given that there are still bc breaks, it might just be an ugly workaround.

@vicb
Copy link
Contributor Author
vicb commented Apr 2, 2012

@Tobion I had the same idea in order to reduce the BC break (see the 2.0 PR as I have done a force push in this branch). Now, I think it was not a great idea:

  • It makes different separators (i.e. - vs /) works differently,
  • It makes the same separtator (/) works differently according to the variable position inside the pattern.

I think that the current code of this PR is easily understandable. The only drawback I can see is that it will be easy to have overlaps when you are used to the former behavior.

I would really like to solve this by having an algo able to find regex intersections.

@Tobion
Copy link
Contributor
Tobion commented Apr 3, 2012

Yeah a tool would be great that checks if there is a route that cannot be matched at all (because another route with higher priority already covers it). I think it is possible (because we do not use non-regular features like back references) but is very complex.

@schmittjoh
Copy link
Contributor

I would rather consider that as something that can be optimized when
dumping, but not as a bug.

JMSI18nRoutingBundle is doing this regularly when expanding routes that
have the same pattern for different locales. It then generates one route
for all locales (used for matching), and additionally one route for each
locale (used for generation).

en_de_homepage:
pattern: /
defaults: { _locales: [en, de] }

en_homepage:
pattern: /
defaults: { _locale: en }

de_homepage:
pattern: /
defaults: { _locale: de }

On Mon, Apr 2, 2012 at 7:26 PM, Tobias Schultze <
reply@reply.github.com

wrote:

Yeah a tool would be great that checks if there is a route that cannot be
matched at all (because another route with higher priority already covers
it).


Reply to this email directly or view it on GitHub:
#3678 (comment)

@vicb
Copy link
Contributor Author
vicb commented Apr 3, 2012

@schmittjoh not sure to get what bug you are referring to ?

@schmittjoh
Copy link
Contributor

What I meant is there shouldn't be an error if a route is not reachable
during the matching process, or if you want to introduce such an error then
it should be possible to be turned off via an option on the route.

On Tue, Apr 3, 2012 at 12:43 AM, Victor Berchet <
reply@reply.github.com

wrote:

@schmittjoh not sure to get what bug you are referring to ?


Reply to this email directly or view it on GitHub:
#3678 (comment)

@vicb
Copy link
Contributor Author
vicb commented Apr 3, 2012

@schmittjoh Yep, there should be an option to turn the warning/error off. The route used for generation only should not specify a controller, right ? (if this is true we can filter them out by default).

For a variable in the last position (what has been discussed by @stof and @Tobion above) we could use the preceding char as a separator by default, that would reduce BC break and still be consistent.

I am working on the algo to detect the intersections.

I have discussed thoses points w/ @fabpot.

@schmittjoh
Copy link
Contributor

They are exact clones of the original route (thus have a controller), but
the option would just be fine for me and also be more explicit.

On Tue, Apr 3, 2012 at 8:53 AM, Victor Berchet <
reply@reply.github.com

wrote:

@schmittjoh Yep, there should be an option to turn the warning/error off.
The route used for generation only should not specify a controller, right ?
(if this is true we can filter them out by default).

For a variable in the last position (what has been discussed by @stof and
@Tobion above) we could use the preceding char as a separator by default,
that would reduce BC break and still be consistent.

I am working


Reply to this email directly or view it on GitHub:
#3678 (comment)

fabpot added a commit that referenced this pull request Apr 11, 2012
Commits
-------

9307f5b [Routing] Implement bug fixes and enhancements

Discussion
----------

[Routing] Implement bug fixes and enhancements (from @Tobion)

This is mainly #3754 with some minor formatting changes.

Original PR message from @Tobion:

Here a list of what is fixed. Tests pass.

1. The `RouteCollection` states

    > it overrides existing routes with the same name defined in the instance or its children and parents.

    But this is not true for `->addCollection()` but only for `->add()`. addCollection does not remove routes with the same name in its parents (only in its children). I don't think this is on purpose.
    So I fixed it by making sure there can only be one route per name in all connected collections. This way we can also simplify `->get()` and `->remove()` and improve performance since we can stop iterating recursively when we found the first and only route with a given name.
    See `testUniqueRouteWithGivenName` that fails in the old code but works now.

2. There was an bug with `$collection->addPrefix('0');` that didn't apply the starting slash. Fixed and test case added.

3. There is an issue with `->get()` that I also think is not intended. Currently it allows to access a sub-RouteCollection by specifing $name as array integer index. But according to the PHPdoc you should only be allowed to receive a Route instance and not a RouteCollection.
    See `testGet` that has a failing test case. I fixed this behavior.

4. Then I recognized that `->addCollection` depended on the order of applying them. So

        $collection1->addCollection($collection2, '/b');
        $collection2->addCollection($collection3, '/c');
        $rootCollection->addCollection($collection1, '/a');

    had a different pattern result from

        $collection2->addCollection($collection3, '/c');
        $collection1->addCollection($collection2, '/b');
        $rootCollection->addCollection($collection1, '/a');

    Fixed and test case added. See `testPatternDoesNotChangeWhenDefinitionOrderChanges`.

5. PHP could have ended in an infinite loop when one tried to add an existing RouteCollection to the tree. Fixed by throwing an exception when this situation is detected. See tests `testCannotSelfJoinCollection` and `testCannotAddExistingCollectionToTree`.

6. I made `setParent()` private because its not useful outside the class itself. And `remove()` also removes the route from its parents. Added public `getRoot()` method.

7. The `Route` class throwed a PHP warning when trying to set an empty requirement.

8. Fixed issue #3777. See discussion there for more info. I fixed it by removing the over-optimization that was introduced in 91f4097 but didn't work properly. One cannot reorder the route definitions, as is was done, because then the wrong route might me matched before the correct one. If one really wanted to do that, it would require to calculate the intersection of two regular expressions to determine if they can be grouped together (a tool that would also be useful to check whether a route is unreachable, see discussion in #3678) We can only safely optimize routes with a static prefix within a RouteCollection, not across multiple RouteCollections with different parents.  This is especially true when using variables and regular expressions requirements.
I could however apply an optimization that was missing yet: Collections with a single route were missing the static prefix optimization with `0 === strpos()`.

9. Fixed an issue where the `PhpMatcherDumper` would not apply the optimization if the root collection to be dumped has a prefix itself. For this I had to rewrite `compileRoutes`. It is also much easier to understand now. Addionally I added many comments and PHPdoc because complex recursive methods like this are still hard to grasp.
I added a test case for this (`url_matcher3.php`).

10. Fix that `Route::compile` needs to recompile a route once it is modified. Otherwise we have a wrong result. Test case added.
@Tobion
Copy link
Contributor
Tobion commented Apr 13, 2012

@vicb you're working on the regex intersections? I would be interested to see some implementation ideas.

@Tobion
Copy link
Contributor
Tobion commented Apr 13, 2012

I think there is another bug in the routing compiler: Two variables side-by-side do not work: /{var1}{var2}
var2 simply gets ignored as far as I interpret the compiler regex.
Also the returned token array includes redundant information that could be simplified (which would make the generated class by RouteGeneratorDumper more lightweight).

@vicb
Copy link
Contributor Author
vicb commented Apr 13, 2012

@Tobion yes I am working on it, I should have a proto soon. I think we should throw an exc if there is no separator between two variables.

@fabpot
Copy link
Member
fabpot commented Apr 13, 2012

/{var1}{var2} worked before. You just need to be sure to configure proper requirements for it to work reliably. If it's broken now, this is a regression.

@Seldaek
Copy link
Member
Seldaek commented Apr 20, 2012

I was thinking about this and about the initial plan to support URI templates, and it occured to me maybe these changes (or the current behavior) are going in the wrong direction. The spec says as far as I understand it that {foo} is anything except reserved chars (:, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, =). Then {+foo} allows for matching of those reserved chars too. Shouldn't we simply match that? It sounds sensible to me to have {foo} parse to [^:/?#[]@!$&'()*+;=,]+? which should let {foo}-{bar} work fine for example given the non-greediness of the regex.

@asm89
Copy link
Contributor
asm89 commented Apr 20, 2012

This is probably related: #3227.

@vicb
Copy link
Contributor Author
vicb commented Apr 23, 2012

@Tobion could you create a separate issue for /{var1}{var2} ?
@Seldaek could you create a separate issue with your last comment - and link 3227 ?
Thanks.

@Seldaek
Copy link
Member
Seldaek commented Apr 23, 2012

@vicb: Well.. if 3227 already exists. I don't see why I should create an additional one. The question IMO is whether it's relevant to this PR or not, and if yes what we do about it.

@vicb
Copy link
Contributor Author
vicb commented Apr 23, 2012

@Seldaek right, it should be covered by 3227 and it is out of the scope of this PR (which is stated in the first comment).

@Tobion
Copy link
Contributor
Tobion commented Apr 26, 2012

I wrote my analysis of URI templates in 3227 and as you can read there, I think we should not follow that draft.
I think the direction that @vicb proposed here for the matching chars is better and more flexible.

@vicb
Copy link
Contributor Author
vicb commented Apr 27, 2012

I will update this PR very soon. I'll also push a first proto for the route collision detection in a few days.

@vicb
Copy link
Contributor Author
vicb commented Apr 27, 2012

see the linked PR

@vicb vicb closed this Apr 27, 2012
fabpot added a commit that referenced this pull request May 7, 2012
Commits
-------

a196ca0 [Routing] Compiler: remove lazy quantifiers with no effect
8232aa1 [Routing] Compiler: fix in the computing of the segment separators

Discussion
----------

[Routing] Fix the matching process

This PR is based on the PR #3678, #4139.

[![Build Status](https://secure.travis-ci.org/vicb/symfony.png?branch=routingmatcher)](http://travis-ci.org/vicb/symfony)

**The spec**

A pattern is composed of both text and variable segments: `/{variable}-test/{other_variable}`.

A variable segment will match anything until a separator is encountered. The separator is the character following the variable segment when available or preceding the variable otherwise (i.e. at the end of the pattern).

That is:

* the separator is `-` for the `variable`,
* the separator is `/` for the `other_variable`.

*Note: This default matching behavior can be overridden if a requirement is specified for a variable)*

**Fixes**

* The current behavior is to consider booth the preceding and following characters as separators (considering availability),
* The "preceding" separator of the first variable is always set to `/` whatever the preceding character is (due to `$pos = 0` for the first iteration).

**Todo**

Update the doc once this is merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
0