-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Yaml] Base the parser on a StringReader #19750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6c187fc
to
8947a4e
Compare
I think I won't adapt more things in this PR as it become very quickly hard to review and I would like to do that step by step in case it is accepted :) |
A perf comparison would be great. |
@staabm the tests are ~50ms slower on my computer (they took 250ms) but i think they should be faster or the same as currently once the parser will be entirely converted (the StringReader is instantiated several times for now and some variables are instantiated only because other parts aren't converted yet). BTW I think flexibility is the most important part for this component as its result is cached most of the time. Edit: I think we could improve performances with a We could as well generate ranges and hardcode them |
Maybe you would prefer a real feature directly ? (such as custom tags) |
fc7dd89
to
fe6901b
Compare
@Ener-Getick after a quick review of your proposal there is one (minor) thing I don't like: the name of the |
@javiereguiluz how would you call |
@Ener-Getick yes, exactly like that :) |
* | ||
* @return string A YAML string | ||
* | ||
* @throws ParseException When malformed inline YAML string is parsed | ||
* | ||
* @internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this was marked as internal in 2.x then you can even make this method private, no?
If it was marked as internal in 3.x it may not be possible to change it (just yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the entire class is @internal
so I removed this tag ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is only internal since Symfony 3.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xabbuh if someone looks at the 3.1
code he will see that this method is @internal
because its class is. What's the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you're right, if it should be removed, it should be done in 3.1
.
read(Any, Span, CSpan, String) and consumeWhiteSpace (that name is used in PHP-cs-fixer, could even leave out the White part). |
// quoted scalar | ||
$output = self::parseQuotedScalar($scalar, $i); | ||
if ($quote = $reader->eatSpan('"\'', 1)) { | ||
$unescaper = new Unescaper(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be injected / defined as a property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its methods are static, it could even not be instantiated.
I was looking at the StringReader implementations in .NET and in Java. Some comments in case they're useful:
|
|
3e3cbb7
to
51b87f7
Compare
Do you think it will be possible to have this in 3.2 or is it too late ? |
BTW this PR fixes several limitations:
|
I am not against this change per se especially if it greatly increases readability. However, we should be sure not to decrease performance as this component is used by many packages (of which many probably do not use a caching system like Symfony is using it when building the DI container). |
@xabbuh i don't think people should use this component for its performance (there is the yaml extension for that) but instead for its extensibility and its integration with symfony. Tests are ~10% slower on my computer. Maybe we can improve this more (blackfire attributes a big cost to functions not doing anything themselves, such as readSpan, maybe caused by blackfire itself or because of something else?). |
I guess we now have to wait for 3.3? I don't know if i should continue to add changes here as it would only make the diff bigger but i still think we can really improve the yaml component by changing its internals. |
e1a948d
to
e607582
Compare
I don't have a really strong opinion either way. Though it would be good to get some more opinion of the other @symfony/deciders before investing more time here IMO. |
Having a true parser instead of relying on regex is a good idea. However @xabbuh is right, I saw a lot of PHP applications using this component to parse YAML files at runtime and without cache. Using the C extension is often not an option because it is not broadly available and requires to be able to install packages on the server. |
@GuilhemN can you share a blackfire comparison of this PR vs master (and maybe the profiles themselves too, at least for the PR) ? Btw, most usage of |
Thus, most places consuming whitespaces don't care about them, so you could have a |
I used this benchmark based on the The first try was 34% slower, then with the |
@GuilhemN I'm getting a 404 for your link (latest profile) :/ |
@theofidry sorry, I forgot to make them public... it should work now. |
|
||
// bc support of [ foo, , ] | ||
while ($reader->readChar(',')) { | ||
$reader->consumeWhiteSpace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section looks like one of those rare cases where a do {} while()
makes sense.
@GuilhemN please use public links for the comparisons too |
It looks like the links were broken this time... |
@GuilhemN are you still using the benchmark that you mentioned in this comment? for ($i = 0; $i < 10000; ++$i) {
$parser->parse('[foo, bar, quz: foo, {bar: foo, quz: mou,}]');
} I don't like that benchmark because it parses a small YAML file thousands of times. I prefer something more realistic, like parsing a medium-sized YAML file a few times. For example, you could use the Symfony Demo app/config/config.yml file and parse 5 times or so. |
@javiereguiluz i did not like it too, i changed it to this one, based on the standard edition. |
@GuilhemN OK thanks! I still think that parsing the YAML file 1,000 times is very unrealistic. |
@javiereguiluz the results when executed 5 times: before / after / comparison. |
@GuilhemN to be more fair in your benchmark you could profile:
You could do so separately or within the same process. You could also use Blackfire prob to exclude the autoloading from the profile which can always be kinda falsy cf. Blackfire doc. |
@theofidry I don't think that's worth it, a benchmark will never be exact anyway. The goal is to see the slowest parts of the code and see the global trend when changing a big part of the code; it's good enough as is imo :) For the autoloader, I'll run more samples at the same time, the results should be more accurate. |
06d0a98
to
c00d8e2
Compare
c00d8e2
to
f25f7f2
Compare
I'm closing this as it needs a lot of work to be rebased and most of the things it fixed we're fixed otherwise. |
This PR proposes the introduction of a
StringReader
in theYaml
component (could be in its own component later if we are satisfied).It's a huge change (no more regex and offset management in the parser) so I only updated some simple part of the parser for now to show how it could be used.
This would simplify the
Yaml
parser maintenance by removing regexs but also by using a clearer syntax (no more obvious function calls), the first feature I imagine is allowing custom tags (what about!service foo
instead of@foo
for dependency injection ?).WDYT?