8000 [Intl] Add option to regenerate Intl data-set with Kosovo by llupa · Pull Request #59693 · symfony/symfony · GitHub
[go: up one dir, main page]

Skip to content

[Intl] Add option to regenerate Intl data-set with Kosovo #59693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

llupa
Copy link
Contributor
@llupa llupa commented Feb 4, 2025
Q A
Branch? 7.3
Bug fix? yes/no
New feature? yes
Deprecations? no
Issues Fix #54711
License MIT

I am starting this PR because the linked issue has had me debate with my code more than I'd like to. The core principle of the int-component, as far as I can understand, is too be a more slim data-dump of the ICU data-set and offer easy access to most requested information, like country codes, names, ISO alpha codes, for languages and countries, timezones, scripts and such.

I tried to make a dynamic data-dump as much as I could, but that would require heavy rewrite of the component's logic (see below). So, I decided for this PR as a starting point to let users regenerate all data themselves including Kosovo for projects that need it.

ToDo-s

  • If left simply as is the data regeneration might take when constantly downloading ICU data
  • Docs need to be updated to explain how to run and how to ideally cache ICU data

There are other touch points that are ignored such as validation constraints because I am looking for comments from the community.

Alternative larger code change

I tried a proof-of-concept only changing the PhpBundleWriter class and RegionDataGenerator class to always return an array of 2 values when calling any of the generate* functions, one for the default data-set and one for the optional user-assigned (only XK in this case to keep data light). Then, I changed the template format to array merge recursively if ALLOW_OPTIONAL_USER_ASSIGNED was true.

This had the issue that it only works for php data format, and not for Json. So I have been playing around with making a new format of data that always has the default data (as is now) and an extra array / object with XK and changed the related PhpBundleReader and JsonBundleReader to decide weather XK needs to be merged depending on the env var value. This looks promising but, again, I think it is such a big change, that I feel compelled to hear from core members opinion.

PS. This component is definitely interesting.

@llupa
Copy link
Contributor Author
llupa commented Feb 4, 2025

cc @javiereguiluz @wouterj @ro0NL @stof

@stof
Copy link
Member
stof commented Feb 4, 2025

I don't think the proposal that @javiereguiluz agreed on was to make projects regenerate the intl data in a different way in their deployment process (effectively changing the code of the component during their deployment process without any review of the actual effect of the script)

@nicolas-grekas
Copy link
Member

What about putting the denylist in the generated src/Symfony/Component/Intl/Resources/data/regions/meta.php instead, and dump all countries (including Kosovo and its translations) in localized files? On runtime, we could then use the meta to filter by default, and provide a way to allow some extra countries on demand?

@llupa
Copy link
Contributor Author
llupa commented Feb 4, 2025

What about putting the denylist in the generated src/Symfony/Component/Intl/Resources/data/regions/meta.php instead, and dump all countries (including Kosovo and its translations) in localized files? On runtime, we could then use the meta to filter by default, and provide a way to allow some extra countries on demand?

I have many different branches around the ways I initially thought that I might have tunneled vision at this point. I will definitely try this too 👍

@OskarStark OskarStark added the Intl label Feb 4, 2025
@carsonbot carsonbot changed the title Add option to regenerate Intl data-set with Kosovo [Intl] Add option to regenerate Intl data-set with Kosovo Feb 4, 2025
@nicolas-grekas
Copy link
Member

Friendly up @llupa

@connorhu
Copy link
Contributor
connorhu commented Apr 2, 2025

Unlike the readme, you don't just execute a command, you also need a composer install inside the directory of the component. However, the biggest problem is that the update command takes a long time to run. Doing this every deployment is not the right direction.
However, nicolas' solution is sympatic to add / filter out user assigned codes at runtime. I have to deal with this issue for one of my customer so I will do something similar.

# time php vendor/symfony/intl/Resources/bin/update-data.php
---------------------------------------------------------------------------
                      ICU Resource Bundle Compilation
---------------------------------------------------------------------------
Starting git clone. This may take a while...
Git clone to /tmp/icu-data complete.
Checking out `release-77-1` for version `77`...
Building genrb.
Running configure...
Running make...
[1/6] libicudata.so... ok.
[2/6] libicuuc.so... ok.
[3/6] libicui18n.so... ok.
[4/6] libicutu.so... ok.
[5/6] libicuio.so... ok.
[6/6] genrb... ok.
Using /tmp/icu-data/77/build/bin/genrb.
Preparing resource bundle compilation (version 77.1)...
Starting resource bundle compilation. This may take a while...
Generating language data...
Generating script data...
Generating region data...
Generating currency data...
Generating locale data...

ErrorException: file_put_contents(...): Failed to open stream: No such file or directory

real	8m55.920s
user	13m25.107s
sys	1m15.706s

@connorhu
Copy link
Contributor
connorhu commented Apr 2, 2025

Something like this does the trick. What do you think?

Obviously, we need to do the addition for all the relevant methods and the generator needs to be updated.

Subject: [PATCH] NamesUserAssigned
---
Index: Countries.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/Countries.php b/Countries.php
--- a/Countries.php	(revision 0bd68a922ebf783861fda60815575849c3afb0c4)
+++ b/Countries.php	(date 1743607734798)
@@ -147,9 +147,13 @@
      *
      * @return array<string, string>
      */
-    public static function getNames(?string $displayLocale = null): array
+    public static function getNames(?string $displayLocale = null, bool $includeUserAssigned = false): array
     {
-        return self::asort(self::readEntry(['Names'], $displayLocale), $displayLocale);
+        $names = self::readEntry(['Names'], $displayLocale);
+        if ($includeUserAssigned === true) {
+            $names = [...$names, ...self::readEntry(['NamesUserAssigned'], $displayLocale)];
+        }
+        return self::asort($names, $displayLocale);
     }
 
     /**

@connorhu
Copy link
Contributor
connorhu commented Apr 3, 2025

I went through the idea I posted and it doesn't work (see #54711 (comment)).
My other problem with the Intl data part at the moment is that you can't easily customize. You can't create your own list (historical data, legacy records kept for backwards compatibility, unused locale cleanup). My current opinion is that the data should be separate from the code, so that it can be replaced and extended.

@llupa llupa 9E84 closed this by deleting the head repository Apr 6, 2025
@OskarStark
Copy link
Contributor
OskarStark commented Apr 28, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intl has Kosovo in DENYLIST
6 participants
0