[RFC][Intl] Exclude legacy languages? #33165
I am from Norway, and I hold the strong opinion that the deprecation of "no" and/or "nor" as a macrolanguage is a bug.
We can also wait a bit to see what happens in the next ICU update, though I'm not aware of any release dates. Getting the data consistent upstream (assuming ISO is right) would also be ideal; however, I'm not sure where it actually comes from, since ICU in turn merges CLDR data.
I think the only issue here is that … Since this causes a problem for us, with alpha2 codes not having the same list of languages as alpha3 codes, I think we should follow the official source and just add our own patch to the data.
This is consistent with ISO, but another issue is that ICU points …
So this would still cause different lists (alpha2=nb vs alpha3=nor,nob), which seems weird, I agree 😅
Yes, so the solution is clear to me then:
I think we will have to wait a very long time for this to get sorted out upstream, so I think we should patch the data after we read it from the ICU source.
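A minimal sketch of what such a post-read patch could look like, written in Python for illustration only (Symfony's own tooling is PHP). It assumes the ICU language names have already been parsed into a plain code → name dict and that the codes marked legacy in the ICU metadata have been collected into a set; `patch_language_names` and its `overrides` parameter are hypothetical names, and the sample names are illustrative.

```python
from typing import Dict, Iterable, Mapping, Optional


def patch_language_names(
    icu_names: Mapping[str, str],
    legacy_codes: Iterable[str],
    overrides: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the ICU language names with legacy codes removed.

    Hypothetical helper: `overrides` lets the project re-add or rename
    entries after filtering (e.g. a code ISO still considers valid).
    """
    legacy = set(legacy_codes)
    # Drop every code that the (assumed) legacy set marks as deprecated.
    patched = {code: name for code, name in icu_names.items() if code not in legacy}
    # Apply project-specific patches last, so they win over the ICU data.
    if overrides:
        patched.update(overrides)
    # Sort by code for a stable, diff-friendly result.
    return dict(sorted(patched.items()))


names = {"nb": "Norwegian Bokmål", "no": "Norwegian", "sh": "Serbo-Croatian", "sr": "Serbian"}
print(patch_language_names(names, {"no", "sh"}, overrides={"no": "Norwegian"}))
# → {'nb': 'Norwegian Bokmål', 'no': 'Norwegian', 'sr': 'Serbian'}
```

Applying overrides after the filter is one possible design: it lets the project keep an entry such as "no", which ISO still lists as valid, even when ICU marks it legacy.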
Any reference to …
I'd like to see the impacted data if we skip all … Here's another interesting case 😓 https://github.com/unicode-org/icu/blob/master/icu4c/source/data/lang/en.txt#L77 which is also not ISO :)
sr_Latn is already included by itself, and during lookup we resolve aliases. So yes, I agree: if we decide to exclude legacy languages, we should also exclude the locales bound to those languages from the list.
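Excluding the locales bound to a legacy language could be done by looking at each locale's language subtag. This is an illustrative Python sketch that assumes ICU-style locale IDs with underscores (as in sr_Latn) and an already known set of legacy language codes; `language_subtag` and `exclude_legacy_locales` are hypothetical helpers. It only filters the list; the alias resolution during lookup mentioned above is a separate step.

```python
from typing import Iterable, List


def language_subtag(locale_id: str) -> str:
    """Return the language part of a locale ID such as "sr_Latn_BA"."""
    # ICU locale IDs use "_" as separator; accept "-" too for BCP 47 input.
    return locale_id.replace("-", "_").split("_", 1)[0].lower()


def exclude_legacy_locales(locale_ids: Iterable[str], legacy_languages: Iterable[str]) -> List[str]:
    """Drop every locale whose language subtag is a legacy language code."""
    legacy = {code.lower() for code in legacy_languages}
    return [loc for loc in locale_ids if language_subtag(loc) not in legacy]


locales = ["sh", "sh_BA", "sr", "sr_Latn", "sr_Latn_BA", "no", "nb_NO"]
print(exclude_legacy_locales(locales, {"sh"}))
# → ['sr', 'sr_Latn', 'sr_Latn_BA', 'no', 'nb_NO']
```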
I think "Serbo-Croatian" is very much a live language and language name. It is just that it should be referenced as …
My point was that if an ISO code is marked as legacy (like …
Yes, I agree that "American Sign Language" should not be there.
Not sure I understand that, or the actual concern raised. My point is that the locale list currently includes aliases (I'd qualify that as duplication):
symfony/src/Symfony/Component/Intl/Resources/data/locales/en.json Lines 489 to 490 in 0bdf10a
whereas ICU tells us to use:
symfony/src/Symfony/Component/Intl/Resources/data/locales/en.json Lines 507 to 508 in 0bdf10a
symfony/src/Symfony/Component/Intl/Resources/data/locales/en.json Lines 513 to 514 in 0bdf10a
According to https://en.wikipedia.org/wiki/Serbo-Croatian
If we want to keep it as a macrolanguage, the alpha3 code can be taken from ISO 639-3.
After reading https://en.wikipedia.org/wiki/Serbo-Croatian, I guess you are right: "Serbo-Croatian" as a language itself is what has been marked as legacy.
Symfony version(s) affected: 3.4
When we generate the list of language names for the Intl component, we rely on https://github.com/unicode-org/icu/blob/master/icu4c/source/data/lang/en.txt
This provides many translations, but it is not the authoritative code list as-is.
See e.g. "no" and "sh", which are marked as legacy in the metadata file.
The problem comes with ISO vs. ICU. ISO qualifies e.g. "no" as a macrolanguage (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes), and it seems valid today: https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no, but "sh" is not: https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sh
Not sure what to do :) Opening this issue for now, to move forward in #33140.
For now, it seems this causes the lists (alpha2 vs. alpha3) to vary.
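One way to surface the alpha2 vs. alpha3 discrepancy is to map each alpha-2 code to its alpha-3 code and diff the two lists. This is an illustrative Python sketch: the dict-based inputs and the function name `compare_code_lists` are assumptions, and the sample data only mirrors the nb vs. nor/nob case discussed in this thread (with "no" already dropped from the alpha-2 list).

```python
from typing import Mapping, Set, Tuple


def compare_code_lists(
    alpha2_names: Mapping[str, str],
    alpha3_names: Mapping[str, str],
    alpha2_to_alpha3: Mapping[str, str],
) -> Tuple[Set[str], Set[str]]:
    """Report where the alpha-2 and alpha-3 language lists disagree.

    Returns (alpha-2 codes with no matching alpha-3 entry,
             alpha-3 codes that no listed alpha-2 code maps to).
    """
    # Alpha-3 codes reachable from the alpha-2 list.
    mapped = {alpha2_to_alpha3[c] for c in alpha2_names if c in alpha2_to_alpha3}
    # Alpha-2 codes that are unmapped or map to a code missing from the alpha-3 list.
    missing_alpha3 = {c for c in alpha2_names if alpha2_to_alpha3.get(c) not in alpha3_names}
    # Alpha-3 codes with no alpha-2 counterpart in the current list.
    unmapped_alpha3 = set(alpha3_names) - mapped
    return missing_alpha3, unmapped_alpha3


alpha2 = {"nb": "Norwegian Bokmål", "sr": "Serbian"}
alpha3 = {"nob": "Norwegian Bokmål", "nor": "Norwegian", "srp": "Serbian"}
iso_map = {"nb": "nob", "no": "nor", "sr": "srp"}
print(compare_code_lists(alpha2, alpha3, iso_map))
# → (set(), {'nor'})
```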