Able to load big xml files with DomCrawler by zorn-v · Pull Request #16873 · symfony/symfony · GitHub

Able to load big xml files with DomCrawler #16873

Closed
Able to load big xml files with DomCrawler
zorn committed Dec 7, 2015
commit 2fa81184ab5250419276167ce8044d94e7fbe3a1
2 changes: 1 addition & 1 deletion src/Symfony/Component/DomCrawler/Crawler.php
@@ -230,7 +230,7 @@ public function addXmlContent($content, $charset = 'UTF-8')
$dom->validateOnParse = true;

if ('' !== trim($content)) {
- @$dom->loadXML($content, LIBXML_NONET);
+ @$dom->loadXML($content, LIBXML_NONET | LIBXML_PARSEHUGE);
Member

Does this option have any drawbacks when parsing non-huge documents?

Author

The manual says that it only relaxes the parser's hardcoded limits.
https://secure.php.net/manual/en/libxml.constants.php

It is only available with libxml >= 2.7.0, but I don't know whether older versions are still widespread.
For example, CentOS 6 ships 2.7.6.

Contributor

Maybe we need something like:
LIBXML_NONET | (defined('LIBXML_PARSEHUGE') ? LIBXML_PARSEHUGE : 0)
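
For illustration, the guarded flag could be wrapped as follows. This is only a sketch: `xmlLoadOptions()` is a hypothetical helper name and is not part of DomCrawler, and the sample document is made up.

```php
<?php
// Hypothetical helper (not part of DomCrawler): builds the flags passed to loadXML().
function xmlLoadOptions()
{
    // LIBXML_NONET disables network access while loading the document.
    $options = LIBXML_NONET;

    // LIBXML_PARSEHUGE may be missing when PHP is built against an old libxml,
    // so only add it when the constant is actually registered.
    if (defined('LIBXML_PARSEHUGE')) {
        $options |= LIBXML_PARSEHUGE;
    }

    return $options;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$loaded = $dom->loadXML('<root><item>1</item></root>', xmlLoadOptions());
```

On builds without the constant this falls back to `LIBXML_NONET` alone, so the behavior there would stay the same as before the change.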

Author

This constant is defined in the PHP extension and has been available since PHP 5.3.2 and PHP 5.2.12, both of which are below the minimum requirement for DomCrawler.

Member

@zorn-v Is it always defined, or does it depend on the libxml version being used? Distributions generally compile PHP against the system libxml rather than the version bundled with PHP, meaning that it may change.

Author

You are right. PHP 5.3.9 has this in ext\libxml\libxml.c:

#if LIBXML_VERSION >= 20703
    REGISTER_LONG_CONSTANT("LIBXML_PARSEHUGE",  XML_PARSE_HUGE,         CONST_CS | CONST_PERSISTENT);
#endif 

So the minimum libxml version is actually 2.7.3, not 2.7.0.
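
A quick way to see what a given PHP build reports is sketched below. It assumes that the `LIBXML_VERSION` constant reflects the libxml version PHP was compiled against, which is what the `#if` above suggests but which has not been verified on every distribution.

```php
<?php
// LIBXML_VERSION is an integer such as 20703 for libxml 2.7.3;
// LIBXML_DOTTED_VERSION is the same version as a string.
$hasParseHuge = defined('LIBXML_PARSEHUGE');
$meetsMinimum = LIBXML_VERSION >= 20703;

printf(
    "libxml %s: LIBXML_PARSEHUGE is %s\n",
    LIBXML_DOTTED_VERSION,
    $hasParseHuge ? 'available' : 'missing'
);
```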

Author

I could find only one distribution with libxml2 < 2.7.3: CentOS 5. But it ships PHP 5.1.6 (in the standard repo).
Even Debian 6 has 2.7.8.

I think there is no sense in that check, but I added it just in case.

}

libxml_use_internal_errors($internalErrors);