Yaml parser · Issue #15674 · symfony/symfony · GitHub

Yaml parser #15674


Closed
igormukhingmailcom opened this issue Sep 3, 2015 · 6 comments

@igormukhingmailcom

Hi.

I've run into the following situation.
I got a YAML file from a colleague, and it is not parsed correctly:

# fixture.yml
table.inventory:
    inventory1:
        name: First Inventory
        description: Information about first inventory
        filename: first_inventory.txt

Parsed data:

# bash output from var_dump
array(2) {
  'table.inventory' =>
  NULL # <----------------------------- THIS IS NOT RIGHT
  '  inventory1' =>
  array(3) {
    'name' =>
    string(15) "First Inventory"
    'description' =>
    string(33) "Information about first inventory"
    'filename' =>
    string(19) "first_inventory.txt"
  }
}

So I had a look at this file and found that some of the characters that look like spaces are not actually spaces.
When I converted one of these look-alike characters to its HTML entity, I got &ensp; (an en space, U+2002).

(When I replaced them with regular space characters, everything worked fine.)

How can we prevent situations like this? When the parser sees a tab character, it throws an error. Maybe it could also throw an exception for other space-like characters?

Thank you.
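
The idea above, rejecting look-alike spaces in the same way tabs are rejected, could be approximated with a check that runs before the file reaches the parser. The following is a minimal sketch in Python rather than anything from the Symfony Yaml component; the function name `find_suspicious_indentation` is hypothetical, and the check only looks at each line's leading whitespace.

```python
import unicodedata


def find_suspicious_indentation(text):
    """Report characters in a line's leading whitespace that are not U+0020.

    Returns a list of (line_number, column, unicode_name) tuples.
    Tabs are skipped here on the assumption that the parser reports them itself.
    """
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if char in (" ", "\t"):
                continue  # regular space, or a tab left for the parser to reject
            if not char.isspace():
                break  # first real content character: indentation has ended
            # Whitespace-like, but neither a space nor a tab (for example U+2002)
            problems.append((line_number, column, unicodedata.name(char, "UNKNOWN")))
    return problems


if __name__ == "__main__":
    fixture = "table.inventory:\n\u2002\u2002inventory1:\n    name: First Inventory\n"
    for line_number, column, name in find_suspicious_indentation(fixture):
        print(f"line {line_number}, column {column}: {name}")
```

A caller could raise an error whenever the returned list is non-empty, which would give the same kind of early failure that tabs already produce.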

@stof
Member
stof commented Sep 3, 2015

It throws an error because tabs are forbidden for indentation in the YAML spec. Unicode space-like characters, on the other hand, are not forbidden.

@igormukhingmailcom
Author

@stof So this is the expected behavior? Maybe the parser should treat space-like characters as spaces?

@javiereguiluz
Member

@igormukhingmailcom the problem is that Unicode contains a lot of space-like characters. The four common ones that have HTML entities are nbsp, ensp, emsp and thinsp, but there are more:

[Image: unicode-spaces]
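
To get a feel for how many such characters exist, one can list every code point in the Unicode "Space Separator" (Zs) general category. This is a small Python sketch using the standard `unicodedata` module; the exact list depends on the Unicode version bundled with the interpreter, and Zs is only one reasonable definition of "space-like" (it excludes zero-width characters, for instance).

```python
import sys
import unicodedata

# Collect every code point whose general category is Zs (Space Separator).
space_separators = [
    (f"U+{code_point:04X}", unicodedata.name(chr(code_point), "UNNAMED"))
    for code_point in range(sys.maxunicode + 1)
    if unicodedata.category(chr(code_point)) == "Zs"
]

for label, name in space_separators:
    print(label, name)
```

The regular space (U+0020) is in this list alongside the no-break space, en space, em space and thin space mentioned above.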

For this reason, I think we should close this issue as "won't fix" and always expect that the input data is well formatted with regular white spaces. However, maybe someone can think of an easy solution to this problem.

@igormukhingmailcom
Author

@stof @javiereguiluz Maybe we could simply replace all space-like characters at the start of each line with regular spaces before parsing?
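
As a rough illustration of that pre-processing step, here is a Python sketch that rewrites only the leading whitespace of each line. The function name `normalize_indentation` is hypothetical, and the one-for-one replacement is an assumption: a wide character such as an em space becomes a single regular space, which may not match the nesting depth the author of the file intended.

```python
def normalize_indentation(text):
    """Replace space-like characters at the start of each line with U+0020.

    Tabs are left untouched so that a YAML parser can still reject them,
    and whitespace after the first content character is never modified.
    """
    fixed_lines = []
    for line in text.split("\n"):
        width = 0
        # Count the leading run of whitespace, stopping at tabs and carriage returns.
        while width < len(line) and line[width].isspace() and line[width] not in "\t\r":
            width += 1
        fixed_lines.append(" " * width + line[width:])
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    broken = "table.inventory:\n\u2002\u2002inventory1:\n    name: First Inventory\n"
    print(normalize_indentation(broken))
```

Because only indentation is touched, a no-break space or en space inside a scalar value survives unchanged, which keeps the clean-up from silently altering data.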

@derrabus
Member
derrabus commented Sep 4, 2015

If I read http://www.yaml.org/spec/1.2/spec.html#space/indentation/ correctly, 0x20 is the only space character that may be used for indentation. imho, the current behavior is correct.

@igormukhingmailcom If you have to deal with broken YAML files, you might want to fix them before passing them to the YAML parser.

@javiereguiluz
Member

@derrabus indeed you are right! The YAML spec is clear about this: only spaces (0x20) and tabs (0x9) are recognized as white space characters:

[Image: yaml_spec_spaces]

I guess it's clear now that we have to close this issue as "won't fix" for not being an actual error. @igormukhingmailcom I'm afraid that you'll have to "clean" the files before parsing them as Yaml.
