-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Finder] Handle filtering of recursive iterators and use it to skip looping over excluded directories #15802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Would it fix #5951 completely? |
So, as a bug fix, it's going to be merged into 2.3 instead. |
Work great! 50% faster for the following code (from the Symfony root dir): require_once __DIR__.'/vendor/autoload.php';
use Symfony\Component\Finder\Finder;
$finder = Finder::create();
$ct = 1;
foreach ($finder->in(__DIR__) as $file) {
++$ct;
}
print "$ct\n"; And here is the result from Blackfire:
Visually, it confirms that we are doing a better job at limiting the number of calls: https://blackfire.io/profiles/compare/f45a1e98-0ab2-4f31-99bf-621da8a329fb/graph Note: For the record, the speed improvement for such a simple Finder command comes from the fact that we are excluding the VCS directories by default (which adds about 9 exclude directory constraints to the Finder). |
It would be great to run benchmarks of the PHP adapter vs find-based adapters again with this change applied, to see how much better they behave ( @jfsimon had such benchmark when working on the component). @nicolas-grekas another possible improvement could be to split the DepthRangeFilterIterator in 2:
|
@@ -18,9 +18,24 @@ | |||
* | |||
* @author Alex Bogomazov | |||
*/ | |||
abstract class FilterIterator extends \FilterIterator | |||
abstract class FilterIterator extends \FilterIterator implements \RecursiveIterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure implementing RecursiveIterator on all out filters is a good idea. Most of our filters are required to be applied on a flattened iterator to work properly.
What about using a child class RecursiveFilterIterator which would become the parent of the filters applied in a recursive way ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would also avoid the need to memoize the return value in all other filters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary: implementing inner recursive iterator is just a new feature of the class, but it's opt-in. If you don't use it as a recursive iterator (which nobody does since it didn't implement the interface) then its the same as before. Now, if one uses it as a recursive iterator, then it works. I see no need for more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return value is not memoized in all other filters, but in filters where having a recursive inner iterator could make sense. The filter on the contents of a file for example doesn't memoize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nicolas-grekas the issue is that this makes it harder to distinguish filters which can be applied in a recursive way and filters which simply cannot be applied this way because it would change their behavior.
For instance, the PathFilterIterator should never be applied as a recursive iterator: it would mean that rejecting a folder would also reject all files in it (because of the behavior of RecursiveIteratorIterator which does not look for children of excluded elements).
Implementing RecursiveIterator on these filter iterators would be misleading for people looking at them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memoization removed, this should fix your concern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memoization was not the culprit (it was simply useless). The issue is marking filter iterators as being RecursiveIterator while using them in a recursive way provides a broken behavior (it will exclude too much things)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not broken at all, it's just a new behavior that didn't exist before and can be changed easily by wrapping the injected recursive iterator in a RecursiveIteratorIterator. I see no issue here. To get the new behavior is really opt-in: inject a recursive iterator, and use this one as a recursive iterator also. I wouldn't say anyone will do this by mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nicolas-grekas but having an iterator configured to exclude /Tests$/
and having it exclude .../Tests/foo.php
would be confusing.
The filtering iterators are meant to implement the filters provided in the finder component, and implementing RecursiveIterator should be done only on filters which are actually providing their expected behavior in a recursive filtering IMO. Otherwise it would be too easy for someone to break the behavior in the future by moving filters around and getting one before the RecursiveIteratorIterator by mistake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what if that's exactly what I want, excluding /Tests$/
and having it exclude .../Tests/foo.php
?
This gives more power to the user, and we wouldn't force what can or can't be done with the iterators.
More power means more way to get things wrong of course... and I see no issue with that.
@stof I'm checking-out the benchmark suite. |
Benched with https://github.com/jfsimon/symfony-finder-benchmark: before:
after:
|
84b2512
to
07e101f
Compare
right. I forgot about the way the DepthRangeFilterIterator is implemented internally |
$path = $this->isDir() ? $this->current()->getRelativePathname() : $this->current()->getRelativePath(); | ||
$path = $this->current(); | ||
|
||
if (isset($this->patterns[$path->getFilename()])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks wrong for files rather than folders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right! fixed, memoization removed also!
0b2996e
to
8ee45b2
Compare
…ooping over excluded directories
8ee45b2
to
0763167
Compare
Thank you @nicolas-grekas. |
…t to skip looping over excluded directories (nicolas-grekas) This PR was submitted for the 2.8 branch but it was merged into the 2.3 branch instead (closes #15802). Discussion ---------- [Finder] Handle filtering of recursive iterators and use it to skip looping over excluded directories | Q | A | ------------- | --- | Bug fix? | yes | New feature? | yes | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | #5951, #8685 | License | MIT | Doc PR | - By implementing RecursiveIterator in FilterIterator, we can make it able to skip children of excluded branches of recursive inner iterators. We use it immediately for our main target: skip over children of excluded directories in the Finder. This is a significant performance boost when iterating over big directories, thus the "bugfix" status. Commits ------- 8c691bd [Finder] Handle filtering of recursive iterators and use it to skip looping over excluded directories
This PR was merged into the 2.3 branch. Discussion ---------- [Finder] Fix recursive filter iterator | Q | A | ------------- | --- | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | #15802 | License | MIT | Doc PR | - #15802 was broken because the filters where not propagated to children iterators. This fixes the issue and adds a test case for this situation. The RecursiveIterator implementation is moved from FilterIterator to ExcludeDirectoryFilterIterator. Doing so on other filters is possible but would be a new feature that is not required for fixing the performance issue we had previously. Commits ------- cf3019b [Finder] Fix recursive filter iterator
This PR was merged into the 2.3 branch. Discussion ---------- [Finder] Optimize the hot-path | Q | A | ------------- | --- | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | #15824 | License | MIT | Doc PR | - A significant part of the perf gain in #15802 was related to filters not being applied recursively... #15824 fixing this, performance dropped again. This PR optimizes the hot path by replacing a regexp test by a simple `isset` when possible. Blackfire diff after #15824 is the following: https://blackfire.io/profiles/compare/9e489018-998d-4acb-92a0-46011828e83b/graph `preg_match` is not called anymore, and `Symfony\Component\Finder\Iterator\RecursiveDirectoryIterator::current()` is also cut by two. When this `isset` optimization is disabled and replaced by a concatenation of all the regexps patterns in a single bigger one, the gain is still significant but lower: https://blackfire.io/profiles/compare/db86b80e-b63e-4fc9-9ff3-9ed32baeb948/graph This makes me think that an other and last round of optimization is possible by merging all regexps in one in MultiplePcreFilterIterator filters. If someone wants to work on this, please do it :) Commits ------- f156de6 [Finder] Optimize the hot-path
By implementing RecursiveIterator in FilterIterator, we can make it able to skip children of excluded branches of recursive inner iterators.
We use it immediately for our main target: skip over children of excluded directories in the Finder.
This is a significant performance boost when iterating over big directories, thus the "bugfix" status.