[DomCrawler] Add a way to filter only direct children nodes · Issue #28171 · symfony/symfony · GitHub

[DomCrawler] Add a way to filter only direct children nodes #28171

Closed
Einenlum opened this issue Aug 9, 2018 · 8 comments

@Einenlum
Contributor
Einenlum commented Aug 9, 2018

Description

jQuery has both a .find() method and a .children([selector]) method. Its documentation states:

The .children() method differs from .find() in that .children() only travels a single level down the DOM tree while .find() can traverse down multiple levels to select descendant elements (grandchildren, etc.) as well.

The DomCrawler component only has a filter() method (which matches the current node and all of its descendants) and a children() method (which returns the direct children). There is no easy way to filter the direct children of a node with a selector.

It would be nice to add an optional $selector parameter to the children() method, so that direct children can be filtered as jQuery allows.

Example

Consider this example:

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<html>
    <body>
        <div id="foo">
            <p class="lorem" id="p1">ipsum</p>
            <p class="lorem" id="p2">era</p>
            <div id="nested">
                <p class="lorem" id="p3">amenos</p>
            </div>
        </div>
    </body>
</html>
HTML;

$crawler = new Crawler($html);
$foo = $crawler->filter('#foo');

Currently, there is no way (or am I missing something?) to return only #p1 and #p2 starting from the $foo variable.

$foo->filter('.lorem') returns #p1, #p2 and #p3.
$foo->children() returns #p1, #p2 and #nested.

With this feature, $foo->children('.lorem') would return #p1 and #p2.
Since the parameter would be optional, it would not break BC.
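To see the two behaviours side by side without Symfony installed, here is a minimal sketch that uses only PHP's built-in DOM extension (`DOMDocument`, `DOMNode::$childNodes`, `getElementsByTagName()`). It is not DomCrawler code: `hasClass()`, `directChildIds()` and `descendantIdsWithClass()` are hypothetical helpers made up for this illustration, and class matching is done by splitting the `class` attribute on whitespace.

```php
<?php
// Plain-DOM sketch (not DomCrawler): all descendants vs. direct children only.

$html = <<<'HTML'
<html>
    <body>
        <div id="foo">
            <p class="lorem" id="p1">ipsum</p>
            <p class="lorem" id="p2">era</p>
            <div id="nested">
                <p class="lorem" id="p3">amenos</p>
            </div>
        </div>
    </body>
</html>
HTML;

// Hypothetical helper: true if $class is one of the whitespace-separated
// tokens of the element's class attribute.
function hasClass(DOMElement $el, string $class): bool
{
    $tokens = preg_split('/\s+/', trim($el->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);

    return in_array($class, $tokens, true);
}

// Hypothetical helper: ids of the direct element children of $parent,
// optionally restricted to those carrying $class.
function directChildIds(DOMElement $parent, ?string $class = null): array
{
    $ids = [];
    foreach ($parent->childNodes as $child) {
        // childNodes may also hold whitespace text nodes; keep elements only.
        if ($child instanceof DOMElement && ($class === null || hasClass($child, $class))) {
            $ids[] = $child->getAttribute('id');
        }
    }

    return $ids;
}

// Hypothetical helper: ids of every descendant element of $parent carrying $class.
function descendantIdsWithClass(DOMElement $parent, string $class): array
{
    $ids = [];
    foreach ($parent->getElementsByTagName('*') as $el) {
        if (hasClass($el, $class)) {
            $ids[] = $el->getAttribute('id');
        }
    }

    return $ids;
}

$doc = new DOMDocument();
$doc->loadHTML($html);
$foo = (new DOMXPath($doc))->query("//*[@id='foo']")->item(0);

echo implode(', ', descendantIdsWithClass($foo, 'lorem')), "\n"; // p1, p2, p3
echo implode(', ', directChildIds($foo)), "\n";                   // p1, p2, nested
echo implode(', ', directChildIds($foo, 'lorem')), "\n";          // p1, p2
```

The three lines correspond to `$foo->filter('.lorem')`, `$foo->children()` and the proposed `$foo->children('.lorem')`. The first and last calls differ only in which nodes are visited (every descendant vs. `childNodes`), which is the distinction the proposed `$selector` parameter would expose.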

@xabbuh
Member
xabbuh commented Aug 9, 2018

filter() uses CSS selectors. So doesn't something like div#foo > p.lorem work?

@Einenlum
Contributor Author
Einenlum commented Aug 9, 2018

@xabbuh, thanks for your answer.

Sorry, maybe I should have emphasized this part: there is currently no way to return only #p1 and #p2 from the $foo variable.

I'm working on a project where one crawler is composed of several smaller crawlers, each responsible for a small part of the DOM. Each of them only receives its own element (the $foo element in this example).
These small crawlers could go up to the parent and build a selector that targets only the $foo element from there, but that would be fragile (here it's an id, but what if my $foo element is one of several .some-class siblings?) and would blur each crawler's responsibility.

To stick with the example above, I have a class with a method like:

public function getLorems(Crawler $fooElement): array
{
    // … ?
}

Adding this way of filtering direct children would save a lot of effort and avoid risky workarounds.

@javiereguiluz
Member
javiereguiluz commented Aug 9, 2018

@Einenlum have you tried what @xabbuh suggested, and does it not work?

$foo->filter('> .lorem')

@Einenlum
Contributor Author
Einenlum commented Aug 9, 2018

@javiereguiluz Yes. According to the CssSelector component, it's not a valid selector:

Symfony\Component\CssSelector\Exception\SyntaxErrorException: Expected selector, but <delimiter ">" at 0> found.

@juanmiguelbesada
Contributor

Hi,

Well, technically > .lorem is not a valid CSS selector. But maybe we can find a way to handle this use case.

BTW, as a workaround, maybe you could use $foo->filterXpath("/*[@class='lorem']").

@Einenlum
Contributor Author

@juanmiguelbesada It doesn't seem to work either:

$foo->filter('.lorem')->count(); // 3
$foo->children()->filter('.lorem')->count(); // 3
$foo->filterXpath("/*[@class='lorem']")->count(); // 0
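The zero count is what plain XPath semantics would predict: an expression that starts with `/` is absolute, so it is evaluated from the document root and the context node is ignored. DomCrawler also rewrites XPath expressions internally before running them, so the sketch below does not reproduce its exact behaviour. It only uses PHP's built-in `DOMXPath::query()` with a context node (the HTML is a one-line copy of the example above) to contrast the absolute form with relative child-axis and descendant-axis forms.

```php
<?php
// Plain DOM/XPath sketch (not DomCrawler): absolute vs. relative expressions.

$html = '<html><body><div id="foo">'
    . '<p class="lorem" id="p1">ipsum</p>'
    . '<p class="lorem" id="p2">era</p>'
    . '<div id="nested"><p class="lorem" id="p3">amenos</p></div>'
    . '</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$foo = $xpath->query("//*[@id='foo']")->item(0);

// Absolute: starts at the document root, so $foo is ignored.
// The root element is <html>, which has no class attribute.
$absolute = $xpath->query("/*[@class='lorem']", $foo);

// Relative, explicit child axis: direct children of #foo only.
$childAxis = $xpath->query("child::*[@class='lorem']", $foo);

// Relative, abbreviated: child:: is the default axis, so this is equivalent.
$abbreviated = $xpath->query("*[@class='lorem']", $foo);

// Relative descendant axis: children, grandchildren, and so on.
$descendants = $xpath->query(".//*[@class='lorem']", $foo);

echo $absolute->length, ' ', $childAxis->length, ' ', $abbreviated->length, ' ', $descendants->length, "\n"; // 0 2 2 3
```

The descendant-axis count also hints at why `$foo->children()->filter('.lorem')` still returns 3: `filter()` searches below each child as well, so it reaches `#p3` inside `#nested`.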

@xabbuh
Member
xabbuh commented Aug 13, 2018

@Einenlum $foo->filterXPath('child::*/p[@class="lorem"]') would work, for example.

@Einenlum
Contributor Author
Einenlum commented Aug 13, 2018

@xabbuh Thanks! Your example seems to work :).

So we could actually do something as simple as:

    /**
     * Returns the children nodes of the current selection.
     *
     * @param string|null $selector An optional CSS selector to filter children
     *
     * @return self
     *
     * @throws \InvalidArgumentException When current node is empty
     * @throws \RuntimeException if the CssSelector Component is not available and $selector is provided
     */
    public function children($selector = null)
    {
        if (!$this->nodes) {
            throw new \InvalidArgumentException('The current node list is empty.');
        }

        if (null !== $selector) {
            if (!class_exists(CssSelectorConverter::class)) {
                throw new \RuntimeException('To filter with a CSS selector, install the CssSelector component ("composer require symfony/css-selector").');
            }

            $converter = new CssSelectorConverter($this->isHtml);
            $xpath = $converter->toXPath($selector, 'child::*/');

            return $this->filterXPath($xpath);
        }

        $node = $this->getNode(0)->firstChild;

        return $this->createSubCrawler($node ? $this->sibling($node) : array());
    }

// Usage:
$foo->children()->count(); // 3
$foo->children('.lorem')->count(); // 2

Obviously this needs testing, since I'm not great at XPath, but if it works, I think it would be really nice for end users.
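Both XPath workarounds in this thread compare `@class` with `=`, which only matches when the attribute is exactly `lorem`. A CSS class selector such as `.lorem` matches any element whose class list contains that token, and CSS-to-XPath converters commonly express this with a `concat`/`normalize-space` idiom. The sketch below uses only PHP's built-in DOM extension; the `#p4` element with `class="lorem extra"` and the `ids()` helper are hypothetical additions for illustration, and the token expression is the general idiom, not necessarily the exact string the CssSelector component generates.

```php
<?php
// Plain DOM/XPath sketch: exact @class comparison vs. class-token matching.
// #p4 (class="lorem extra") is added here for illustration only.

$html = '<html><body><div id="foo">'
    . '<p class="lorem" id="p1"></p>'
    . '<p class="lorem extra" id="p4"></p>'
    . '<div id="nested"><p class="lorem" id="p3"></p></div>'
    . '</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$foo = $xpath->query("//*[@id='foo']")->item(0);

// Hypothetical helper: id attributes of the nodes matched by $expr relative to $context.
function ids(DOMXPath $xpath, string $expr, DOMNode $context): array
{
    $ids = [];
    foreach ($xpath->query($expr, $context) as $node) {
        $ids[] = $node->getAttribute('id');
    }

    return $ids;
}

// Exact comparison: misses #p4 because its attribute value is "lorem extra".
$exact = ids($xpath, "child::*[@class='lorem']", $foo);

// Token match: pad the normalized class list with spaces and look for " lorem ".
$token = ids($xpath, "child::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' lorem ')]", $foo);

echo implode(', ', $exact), "\n"; // p1
echo implode(', ', $token), "\n"; // p1, p4
```

Padding both the normalized attribute and the searched token with spaces keeps `lorem` from also matching a class such as `loremipsum`, while the `child::` axis keeps `#p3` out of both results.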

nicolas-grekas added a commit that referenced this issue Aug 24, 2018
…nlum)

This PR was squashed before being merged into the 4.2-dev branch (closes #28221).

Discussion
----------

[DomCrawler] Add a way to filter direct children

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #28171
| License       | MIT
| Doc PR        | -

The DomCrawler component only has a `filter()` method (to filter the node and all its descendants) and a `children()` method to return direct children.
**There is currently no easy way to filter the direct children of a node with a selector, as jQuery allows (with a selector passed to the `.children([selector])` method).**

**This PR adds a way to optionally filter direct children thanks to a CSS selector**. Here is an example of the usage:

```php
use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<html>
    <body>
        <div id="foo">
            <p class="lorem" id="p1"></p>
            <p class="lorem" id="p2"></p>
            <div id="nested">
                <p class="lorem" id="p3"></p>
            </div>
        </div>
    </body>
</html>
HTML;

$crawler = new Crawler($html);
$foo = $crawler->filter('#foo');

$foo->children(); // will select `#p1`, `#p2` and `#nested`
$foo->children('p'); // will select `#p1` and `#p2`
$foo->children('.lorem'); // will select `#p1` and `#p2`
```
This PR only adds an optional parameter, so it introduces no BC break.

Commits
-------

f634afd [DomCrawler] Add a way to filter direct children
5 participants