Get direct descendent text only when calling text() on a node. · Issue #42294 · symfony/symfony · GitHub
Closed
Bilge opened this issue Jul 27, 2021 · 6 comments · Fixed by #42338
Comments

@Bilge
Contributor
Bilge commented Jul 27, 2021

Description
Get direct descendent text only when calling text() on a node.

Example
When an HTML node contains mixed text and element nodes, it is difficult to get just the text of the current node, since the text() call will also recursively parse all descendents.

<foo>
    foo
    <bar>bar</bar>
</foo>

When we have a Crawler instance pointing at <foo>, calling text() will return foo bar. There is no way to get just the direct descendent text node, foo.

@xabbuh
Member
xabbuh commented Jul 27, 2021

$crawler->filterXPath('//foo/text()')->text() does what you need, doesn't it?
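A minimal sketch of what the `text()` node test selects, using PHP's built-in DOM extension rather than the Crawler itself (the variable names are illustrative and the markup is the example from the description):

```php
<?php
// Sketch with plain DOMDocument/DOMXPath, not Symfony's Crawler.
$doc = new DOMDocument();
$doc->loadXML("<foo>\n    foo\n    <bar>bar</bar>\n</foo>");

$xpath = new DOMXPath($doc);

// textContent concatenates the text of every descendant node.
$allText = trim(preg_replace('/\s+/', ' ', $doc->documentElement->textContent));

// The XPath text() node test matches only text nodes that are
// direct children of <foo>; the text inside <bar> is not selected.
$ownText = '';
foreach ($xpath->query('/foo/text()') as $textNode) {
    $ownText .= $textNode->nodeValue;
}
$ownText = trim($ownText);

echo $allText, "\n"; // foo bar
echo $ownText, "\n"; // foo
```

Note that `<foo>` has two direct text nodes here (the whitespace after `<bar>` is one of them), so the sketch concatenates all matches before trimming.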

@Bilge
Contributor Author
Bilge commented Jul 27, 2021

And what if you don't know XPath and rely on filter() to navigate deep and complex structures? Is there a CSS equivalent?

@xabbuh
Member
xabbuh commented Jul 30, 2021

Not that I am aware of. In any case, there's nothing we can do here in the codebase about it. So I am closing here. Thank you for understanding.

@xabbuh xabbuh closed this as completed Jul 30, 2021
@Bilge
Contributor Author
Bilge commented Jul 30, 2021

But I don't understand. Why is there nothing that can be done?

@xabbuh
Member
xabbuh commented Jul 30, 2021

What solution do you see if CSS doesn't allow filtering for that particular node exclusively?

@Bilge
Contributor Author
Bilge commented Jul 30, 2021

Either a parameter or a method that returns only the direct descendant text node, e.g. text($ownTextOnly = false) (I believe this is not possible because text() already takes a parameter for some other purpose), or ownText(), for example. The implementation of ownText() would be free to use XPath or CSS as it wishes since it would be internal.
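A rough sketch of how such a helper might behave, written against PHP's built-in DOM classes. The function name `ownText()` is hypothetical, taken from the suggestion above, and this is not the Crawler's actual implementation; the whitespace normalization is also an assumption:

```php
<?php
/**
 * Hypothetical helper: returns only the text held directly by $node,
 * skipping any text that belongs to child elements.
 */
function ownText(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        // Keep direct text nodes only; child elements are ignored.
        if ($child instanceof DOMText) {
            $text .= $child->nodeValue;
        }
    }

    // Assumption: collapse runs of whitespace and trim the result.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$doc = new DOMDocument();
$doc->loadXML('<foo> foo <bar>bar</bar> baz </foo>');

$result = ownText($doc->documentElement);
echo $result, "\n"; // foo baz
```

Iterating over child nodes avoids exposing XPath to the caller, which is the point of keeping the mechanism internal.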

fabpot added a commit that referenced this issue Sep 21, 2021
This PR was squashed before being merged into the 5.4 branch.

Discussion
----------

[DomCrawler] Added Crawler::innerText() method

| Q             | A
| ------------- | ---
| Branch?       | 5.4
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | Fix #42294
| License       | MIT
| Doc PR        | symfony/symfony-docs#... <!-- required for new features -->

Adds a method to get the inner text that is directly descended from the current node only, ignoring text nodes in any child nodes.

Commits
-------

4767694 [DomCrawler] Added Crawler::innerText() method