Fixed language detection to support parsing of HTML fragments #121

jkphl · 2017-05-25T16:46:54Z

The current language detection breaks when parsing only an HTML fragment instead of a complete HTML document. The parser expects an <html> element to be present; otherwise it fails with:

Argument 1 passed to Mf2\Parser::language() must be an instance of DOMElement, instance of DOMDocument given, called in L:\micrometa\vendor\mf2\mf2\Mf2\Parser.php on line 513 and defined

Making sure the parent node is a DOMElement before recursing fixes the problem.
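Here is a minimal sketch of the guarded recursion. It is written in Python, with xml.dom.minidom standing in for PHP's DOM extension. Both follow the W3C DOM model, where the parentNode of the topmost element is the document node. The `language()` function below is a hypothetical illustration, not php-mf2's actual Mf2\Parser::language(), and it only checks a plain `lang` attribute.

```python
from xml.dom import Node, minidom
from xml.dom.minidom import Element


def language(el: Element) -> str:
    """Return the nearest lang value for el, searching el and its ancestor elements."""
    if el.hasAttribute("lang"):
        return el.getAttribute("lang").strip()
    parent = el.parentNode
    # The guard: recurse only while the parent is an element. The parent of the
    # topmost element is the document node, which carries no attributes.
    if parent is not None and parent.nodeType == Node.ELEMENT_NODE:
        return language(parent)
    return ""


# A bare fragment: <div> is the document element, and its parent is the document node.
doc = minidom.parseString('<div lang="en"><p class="h-card">Jane</p></div>')
print(language(doc.getElementsByTagName("p")[0]))  # en
```

Without the guard, a fragment with no `lang` anywhere would hand the document node to `language()`. That is the Python equivalent of the DOMDocument-instead-of-DOMElement error above.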

gRegorLove · 2017-05-25T17:04:50Z

This looks good (simple enough of an update), but can you add a test case? Specifically, I'm wondering whether this still parses language from HTML fragments or skips it.

jkphl · 2017-05-25T17:13:22Z

@gRegorLove I tried running the existing tests, but 2 of them fail for me (both involve fetching remote documents from Barnaby's and Aaron's sites, so I guess it's a problem on my side). I don't know whether any test checks the language, but if one does, it should still pass. The new release Aaron just pushed out broke my code (micrometa) because of the mandatory <html> element; adding the check for DOMElement fixes it again. I'll see if I can add a test proving that language parsing still works and that HTML fragments are supported (it will take me a while though, so please bear with me).

jkphl · 2017-05-25T17:22:21Z

@gRegorLove I re-read your comment. There should be no difference in language handling after adding the check for DOMElement, except that the recursion stops before hitting the DOMDocument. For fragments without an <html> element there is no <meta http-equiv=""> check, but xml:lang attributes are still detected.
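The following hypothetical Python sketch illustrates the behavior difference described above. It again uses xml.dom.minidom rather than php-mf2's code. In this sketch, the <meta http-equiv="Content-Language"> fallback is consulted only once the walk reaches an <html> element. In a fragment, the walk ends at the topmost element. As a result, a stray <meta> is ignored, while `lang` attributes on ancestors still count. For simplicity, only the plain `lang` attribute is checked here.

```python
from xml.dom import Node, minidom
from xml.dom.minidom import Element


def language(el: Element) -> str:
    # 1. An explicit lang attribute on the element wins.
    if el.hasAttribute("lang"):
        return el.getAttribute("lang").strip()
    # 2. Only the <html> root gets the <meta http-equiv> fallback.
    if el.tagName.lower() == "html":
        for meta in el.getElementsByTagName("meta"):
            if meta.getAttribute("http-equiv").lower() == "content-language":
                return meta.getAttribute("content").strip()
        return ""
    # 3. Otherwise keep climbing, but only through element nodes.
    parent = el.parentNode
    if parent is not None and parent.nodeType == Node.ELEMENT_NODE:
        return language(parent)
    return ""


full = minidom.parseString(
    '<html><head><meta http-equiv="Content-Language" content="de"/></head>'
    "<body><p>Hallo</p></body></html>"
)
fragment = minidom.parseString(
    '<div><meta http-equiv="Content-Language" content="de"/><p>Hallo</p></div>'
)
print(language(full.getElementsByTagName("p")[0]))      # de
print(language(fragment.getElementsByTagName("p")[0]))  # (empty string)
```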

jkphl · 2017-05-26T15:04:05Z

@gRegorLove I have now added a simple test for parsing the language of an HTML fragment without an enclosing <html> element. All the tests in ParseLanguageTest.php pass for both complete HTML documents and HTML fragments.

gRegorLove · 2017-05-26T17:41:07Z

Awesome, thanks! I think I misunderstood the update initially. Having another test is good though. :)

aaronpk · 2017-05-27T14:45:58Z

I tried running the test case without your change, but the test doesn't fail. Can you provide example HTML that causes this error before the fix?


aaronpk · 2017-05-27T15:00:08Z

Got it, thanks!

jkphl · 2017-05-27T15:00:49Z

Interesting! I just realized that the problem only hits you in an edge case: the parser lets you pass in a DOMDocument as $input. When this document has been constructed via ->loadXML() (instead of ->loadHTML()), an HTML fragment won't be wrapped in the base HTML scaffold, hence the error.
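The edge case can be reproduced outside PHP. DOMDocument::loadHTML() uses libxml2's HTML parser, which wraps a bare fragment in implied <html> and <body> elements unless flags such as LIBXML_HTML_NOIMPLIED are passed. DOMDocument::loadXML() keeps the fragment's own root as the document element. The Python sketch below assumes that xml.dom.minidom's XML parser behaves like loadXML() in this respect. `language_unguarded()` is a hypothetical function that, like the pre-fix code described in this thread, stops at <html> but otherwise recurses to parentNode unconditionally. It raises TypeError to stand in for PHP's DOMElement type-hint error.

```python
from xml.dom import Node, minidom


def language_unguarded(el) -> str:
    # Mimics a DOMElement type hint: anything that is not an element is rejected.
    if el.nodeType != Node.ELEMENT_NODE:
        raise TypeError(f"expected an element, got node type {el.nodeType}")
    if el.hasAttribute("lang"):
        return el.getAttribute("lang")
    if el.tagName == "html":
        return ""
    # Bug: always recurses, even when the parent is the document node.
    return language_unguarded(el.parentNode)


# XML parsing adds no <html> scaffold, so the fragment root sits directly under the document.
fragment = minidom.parseString('<p class="h-card">Jane</p>')
root = fragment.documentElement
print(root.tagName, root.parentNode.nodeType == Node.DOCUMENT_NODE)  # p True
```

Calling `language_unguarded(root)` raises TypeError. With a full document, the implied or explicit <html> element ends the recursion before it can reach the document node. This explains why the bug only surfaced for documents built with loadXML().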

Fixed language detection to support parsing of HTML fragments

23aabcb

Added test for HTML fragment language testing (microformats#121)

acda009

jkphl mentioned this pull request May 27, 2017

Fix dependency on microformats/tests #122

Closed

Added XML loading test for HTML fragment language testing (microformats#121)

f101447

aaronpk merged commit 84bd6ef into microformats:master May 27, 2017
