[DomCrawler] \Symfony\Component\DomCrawler\Crawler::addContent guess charset from html? · Issue #9061 · symfony/symfony · GitHub

[DomCrawler] \Symfony\Component\DomCrawler\Crawler::addContent guess charset from html? #9061

Closed
bronze1man opened this issue Sep 17, 2013 · 1 comment
Labels: DomCrawler, Feature, Good first issue

Comments

@bronze1man
Contributor

I just downloaded this page:
http://search.jd.com/Search?keyword=%E6%96%87%E5%AD%A6&enc=utf-8&area=22&book=y
The server does not send a charset in the HTTP response headers. When I pass the content to Crawler::addContent, the Crawler uses the wrong charset, ISO-8859-1.
It works fine if I use the following code:

        $crawler = new Crawler();
        $crawler->addHtmlContent($content,'GBK');

Is there a way to guess the charset from the HTML meta tags?

@stof
Member
stof commented Sep 17, 2013

For reference, here is how Behat's MinkGoutteDriver extends Goutte to guess the charset from the content when there is no Content-Type header: https://github.com/Behat/MinkGoutteDriver/blob/master/src/Behat/Mink/Driver/Goutte/Client.php#L32

This could be reused to implement it directly in DomCrawler (the Mink code is MIT-licensed).
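
As a rough illustration of the idea discussed above, the charset can be pulled out of a `<meta>` tag with a regular expression and then passed to `addHtmlContent`. This is only a minimal sketch: `guessCharsetFromHtml` is a hypothetical helper name, the pattern is an assumption rather than the code from Mink or from the eventual DomCrawler change, and the UTF-8 fallback is chosen here just for the example.

```php
<?php

/**
 * Hypothetical helper (not part of Symfony): returns the charset declared
 * in an HTML <meta> tag, upper-cased, or null when no declaration is found.
 */
function guessCharsetFromHtml(string $html): ?string
{
    // Covers both <meta charset="gbk"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
    $pattern = '/<meta[^>]+charset\s*=\s*["\']?([a-zA-Z0-9_:.\-]+)/i';

    if (preg_match($pattern, $html, $matches)) {
        return strtoupper($matches[1]);
    }

    return null;
}

// Sample document standing in for a page served without a charset header.
$content = '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=gbk">'
    . '</head><body></body></html>';

// Fall back to UTF-8 (an assumption for this sketch) when nothing is declared.
$charset = guessCharsetFromHtml($content) ?? 'UTF-8';
```

With the DomCrawler component installed, the guessed value could then be passed along in the same way as the workaround in the issue: `$crawler->addHtmlContent($content, $charset);`. A regular expression like this can be fooled by a `charset=` string inside a comment or script, so it is best treated as a heuristic.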

fabpot added a commit that referenced this issue Sep 19, 2013
This PR was squashed before being merged into the 2.2 branch (closes #9074).

Discussion
----------

[DomCrawler]Crawler guess charset from html

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets |  #9061
| License       | MIT
| Doc PR        | n/a

Commits
-------

e5282e8 [DomCrawler]Crawler guess charset from html