-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[DomCrawler] fixed handling of schemes by Link::getUri() #7214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DomCrawler] fixed handling of schemes by Link::getUri() #7214
Conversation
array(' | ||
/foo', 'http://localhost/bar/foo/', 'http://localhost/foo'), | ||
array('/foo | ||
', 'http://localhost/bar/foo', 'http://localhost/foo'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thos two tests should be kept and must work. If not, this is a BC break. And indeed, I think your change does not change these.
Ok, I will update the pull request |
@@ -89,7 +89,7 @@ public function getUri() | |||
$uri = trim($this->getRawUri()); | |||
|
|||
// absolute URL? | |||
if (0 === strpos($uri, 'http')) { | |||
if (!is_null(parse_url($uri, PHP_URL_SCHEME))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use is_null
in Symfony but null !==
instead.
A link (anchor tag with an href attr) in crawled by the Crawler can contain any valid URI, including mailto: links. Currently this is not correctly supported by Link::getUri. Schemes that do not start with 'http' are treated as relative URIs and appenden to the base URI. This leads to strange URIs like this: http://foo.com/mailto:foo@bar.com Fixed Link::getUri to treat any URI with a schema part as an absolute URL. Updated the unit tests to test for this.
Fixed |
…(PR #7214) This PR was merged into the 2.1 branch. Commits ------- 8f8ba38 [DomCrawler] fix handling of schemes by Link::getUri() Discussion ---------- [DomCrawler] fixed handling of schemes by Link::getUri() A link (anchor tag with an href attr) in pages crawled by the Crawler can contain any valid URI, including mailto: links. Currently this is not correctly supported by `Link::getUri`. Schemes that do not start with 'http' are treated as relative URIs and appenden to the base URI. This leads to strange URIs like this: http://foo.com/mailto:foo@bar.com Fixed `Link::getUri` to treat any URI with a schema part as an absolute URL. Updated the unit tests to test for this. --------------------------------------------------------------------------- by matthijsvandenbos at 2013-02-28T11:55:18Z Ok, I will update the pull request --------------------------------------------------------------------------- by matthijsvandenbos at 2013-02-28T12:25:45Z Fixed
It's still wrong for URLs starting with // (double slash). |
@tomasz-ulanowski the support of scheme-relative uris has been added by a separate PR |
A link (anchor tag with an href attr) in pages crawled by the Crawler
can contain any valid URI, including mailto: links.
Currently this is not correctly supported by
Link::getUri
.Schemes that do not start with 'http' are treated as relative URIs
and appenden to the base URI. This leads to strange URIs like this:
http://foo.com/mailto:foo@bar.com
Fixed
Link::getUri
to treat any URI with a schema part as anabsolute URL. Updated the unit tests to test for this.