IANA timezone db reference in the spec : should backzone be taken into account? #272

Open
jungshik opened this issue Sep 21, 2018 · 16 comments
Assignees
Labels
c: datetime Component: dates, times, timezones question s: in progress Status: the issue has an active proposal Small Smaller change solvable in a Pull Request
Milestone

Comments

@jungshik
jungshik commented Sep 21, 2018

The current spec keeps referring to 'Zone and Link names', but that's not sufficient and leads to a divergence between implementations.

The main question is whether or not to take the 'backzone' file in the IANA timezone database into account.

The ECMAScript 2018 Internationalization API Specification identifies time zones using the Zone and Link names of the IANA Time Zone Database. Their canonical form is the corresponding Zone name in the casing used in the IANA Time Zone Database.

All registered Zone and Link names are allowed. Implementations must recognize all such names

Firefox uses zone and link names in the 'backzone' file of the IANA tz db, but some links in the 'backzone' file contradict what's in other files.

backward file has the following:

Link    Asia/Shanghai           Asia/Chongqing
Link    Asia/Shanghai           Asia/Chungking

backzone file has the following:

Link Asia/Chongqing Asia/Chungking

Note that backzone file has the following comment at the top:

# This file contains data outside the normal scope of the tz database,
# in that its zones do not differ from normal tz zones after 1970.
# Links in this file point to zones in this file, superseding links in
# the file 'backward'.

Because Firefox takes into account 'backzone' file, 'Asia/Chungking' is canonicalized to 'Asia/Chongqing' instead of 'Asia/Shanghai'.

ICU/CLDR (as used by v8) ignores the 'backzone' file, and both 'Asia/Chungking' and 'Asia/Chongqing' are canonicalized to 'Asia/Shanghai' per the 'backward' file.

CLDR/ICU, however, do not canonicalize 'Asia/Phnom_Penh' and 'Asia/Vientiane' to 'Asia/Bangkok' despite the following in 'asia' file:

Link Asia/Bangkok Asia/Phnom_Penh       # Cambodia
Link Asia/Bangkok Asia/Vientiane        # Laos

That's because IANA timezone DB relegated the two zone names to links rather recently (2014-2015) and CLDR/ICU do not want to destabilize the tz ID space. So, they kept them as canonical zone IDs.

@anba
Contributor
anba commented Sep 24, 2018

We went with using the 'backzone' ids in Firefox to avoid the risk of upsetting users in the affected time zones. For example, canonicalizing 'Europe/Ljubljana', 'Europe/Podgorica', 'Europe/Sarajevo', 'Europe/Skopje', and 'Europe/Zagreb' to 'Europe/Belgrade' (which would be the case when not applying 'backzone') may have negative cultural/political effects.

Relevant comments in the Firefox bug tracker: https://bugzilla.mozilla.org/show_bug.cgi?id=1303091#c3, https://bugzilla.mozilla.org/show_bug.cgi?id=1303091#c9, https://bugzilla.mozilla.org/show_bug.cgi?id=1303091#c11

Unfortunately just using CLDR instead of IANA data can also lead to wrong canonicalizations, cf. https://unicode-org.atlassian.net/browse/ICU-12044 and http://unicode.org/cldr/trac/ticket/9892. In the CLDR bug, jungshik has also given an example where CLDR didn't update the mapping despite being outdated since 1993.

And the related (unresolved) bugs.ecmascript.org bug which also mentions the complications between selecting which tz links are safe to apply and which ones are more contentious: https://tc39.github.io/archives/bugzilla/1892/

And more related threads from the tzdata mailing list (mostly from 2013-2014 when many zones were moved to the backzone file): http://mm.icann.org/pipermail/tz/2014-July/021170.html, https://mm.icann.org/pipermail/tz/2013-September/019821.html, https://mm.icann.org/pipermail/tz/2014-November/021888.html.

@jungshik
Author

'Europe/Ljubljana', 'Europe/Podgorica', 'Europe/Sarajevo', 'Europe/Skopje', and 'Europe/Zagreb' to 'Europe/Belgrade'

Thanks a lot for alerting me about those entries and references to TZ mailing list threads on the topic. A similar sentiment may exist about canonicalizing Asia/Phnom_Penh and Asia/Vientiane to Asia/Bangkok.

Unfortunately just using CLDR instead of IANA data can also lead to wrong canonicalizations,

Yup, you're right. I'm aware of the issue because CLDR sticks to pretty old IDs that had been deprecated well before the CLDR project started (Calcutta vs Kolkata, Saigon vs Ho_Chi_Minh, Katmandu vs Kathmandu and many others).

@yumaoka

@sffc sffc added c: datetime Component: dates, times, timezones s: help wanted Status: help wanted; needs proposal champion Small Smaller change solvable in a Pull Request labels Mar 19, 2019
@sffc sffc added this to the ES 2021 milestone Jun 5, 2020
@sffc sffc added the question label Jun 5, 2020
@ryzokuken
Member

@sffc I'm unsure what needs to be done here. Could you tell me what the web reality is? IIUC, Firefox now uses ICU too, but did ICU ever end up taking this into account and start using the backzone file?

@sffc
Contributor
sffc commented Jun 23, 2020

@anba What is Firefox doing these days? Is it still necessary to put in the exception to allow backzone to be used for time zone names?

@sffc
Contributor
sffc commented Jun 23, 2020

Is there a snippet of code that can reproduce the Firefox/Chrome discrepancy? It appears that Asia/Chongqing and Asia/Shanghai are equivalent in modern times, but may have differed at some time in the past, perhaps before China decided to unify under one time zone.

I wrote the following code in my best attempt to reproduce the difference, but was unsuccessful in finding a difference:

new Date(1945, 0, 1).toLocaleString("en", { timeZone: "Asia/Chongqing", timeZoneName: "long" })
// "1/1/1945, 2:00:00 PM GMT+09:00"
new Date(1945, 0, 1).toLocaleString("en", { timeZone: "Asia/Shanghai", timeZoneName: "long" })
// "1/1/1945, 2:00:00 PM GMT+09:00"

@jungshik
Author

IIUC, Firefox now uses ICU too, but did ICU ever end up taking this into account and start using the backzone file?

I think Firefox still uses a rather large override map (to take care of cases mentioned in this issue) on top of ICU. Firefox already used ICU when this issue was filed, btw. :-)

CLDR has a policy on ID stability and it's a bit hard to change that, I'm afraid. Given this, I was thinking of doing in v8 what Firefox does to handle 'Saigon => Ho_Chi_Minh', 'Calcutta => Kolkata', etc., but held off because I wanted it to be resolved in CLDR so that v8 does not need a local override map [1]. My (dim) hope for a possible CLDR change was based on my 'findings' that turned out to be false. See below.

As for using 'backzone' (this issue), it's related but a bit different.

Unfortunately just using CLDR instead of IANA data can also lead to wrong canonicalizations, cf. https://unicode-org.atlassian.net/browse/ICU-12044 and http://unicode.org/cldr/trac/ticket/9892. In the CLDR bug, jungshik has also given an example where CLDR didn't update the mapping despite being outdated since 1993.

And, unfortunately, my claim turned out to be false. I thought 'Asia/Calcutta' had been changed to 'Asia/Kolkata' well before the CLDR project started. In https://unicode-org.atlassian.net/browse/CLDR-9892, @yumaoka dug up the historic IANA timezone files and found that as late as 2008 (well after the CLDR project started) they still had 'Asia/Calcutta' instead of 'Asia/Kolkata'. He suspected that the same was true of 'Saigon vs Ho_Chi_Minh' and 'Katmandu vs Kathmandu'.

[1] To make things complicated, there's a possibility that the override map needs to be duplicated for Chrome OS, which was yet another reason I wanted it to be resolved in CLDR. An alternative of changing the ICU data locally for Chromium was not desirable either, because that'd make the TZ db update process more complicated (although it may not be that bad).
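
To make the idea of a local override map concrete, here is a minimal sketch. It is not Firefox's or V8's actual table: the map name, the helper name, and the choice of entries are assumptions for illustration, limited to the three pairs named in this thread. It resolves an ID through the engine first and then rewrites the result if the engine returned the older spelling.

```javascript
// Hypothetical override table: legacy spelling -> current IANA name.
// Only the three pairs mentioned in this thread are listed.
const IANA_OVERRIDES = new Map([
  ["Asia/Calcutta", "Asia/Kolkata"],
  ["Asia/Saigon", "Asia/Ho_Chi_Minh"],
  ["Asia/Katmandu", "Asia/Kathmandu"],
]);

// Resolve an ID through the engine, then apply the override to the result.
function canonicalizeWithOverrides(timeZone) {
  const resolved = new Intl.DateTimeFormat("en", { timeZone }).resolvedOptions().timeZone;
  return IANA_OVERRIDES.get(resolved) ?? resolved;
}

// Whichever spelling the engine reports, the override yields the newer name.
console.log(canonicalizeWithOverrides("Asia/Calcutta")); // "Asia/Kolkata"
console.log(canonicalizeWithOverrides("Asia/Saigon"));   // "Asia/Ho_Chi_Minh"
```

Applying the map to the engine's output rather than to the input keeps the table small, since the engine already handles case normalization and other aliases.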

@jungshik
Author
jungshik commented Jun 23, 2020

The repro step is as follows:

new Intl.DateTimeFormat("en", {timeZone:"Asia/Chongqing"}).resolvedOptions().timeZone

"Asia/Chongqing" : Firefox
"Asia/Shanghai" : Chrome
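
The same check can be run over the other IDs raised in this thread. The sketch below is only a convenience wrapper around the one-liner above; the helper names are hypothetical, and the resolved values are deliberately not annotated because they depend on the engine and its ICU/CLDR version.

```javascript
// IDs taken from the discussion above.
const ids = [
  "Asia/Chongqing",
  "Asia/Chungking",
  "Asia/Phnom_Penh",
  "Asia/Vientiane",
  "Europe/Sarajevo",
  "Asia/Calcutta",
];

// Return the name the engine reports for a given time zone ID.
function resolveTimeZone(id) {
  return new Intl.DateTimeFormat("en", { timeZone: id }).resolvedOptions().timeZone;
}

// Pair every input ID with the name the engine resolves it to.
function survey(list) {
  return list.map((id) => ({ input: id, resolved: resolveTimeZone(id) }));
}

for (const { input, resolved } of survey(ids)) {
  console.log(`${input} -> ${resolved}`);
}
```

Running this in two browsers and diffing the output gives a quick picture of where their canonicalization diverges.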

@jungshik
Author
jungshik commented Jun 23, 2020

Without underlying zoneinfo files supporting the historical difference between Asia/Chongqing and Asia/Shanghai, I think it's all but pointless to treat them as separate zones.

Below is what Firefox does with my computer timezone set to America/Los_Angeles. Note that Asia/Chongqing and Asia/Shanghai had different local mean time (they have different longitudes), but the result is the same. The same holds for Asia/Bangkok vs Asia/Phnom_Penh.

new Date(1850,0,1).getTimezoneOffset()
472.96666666666664   // In 1850, LMT was used everywhere, including America/Los_Angeles
new Date(1850,0,1).toLocaleString("en")
"1/1/1850, 12:00:00 AM"
new Date(1850,0,1).toLocaleString("en", {timeZone: "UTC"})
"1/1/1850, 7:52:58 AM"


new Date(1850,0,1).toLocaleString("en", {timeZone: "Asia/Shanghai"})
"1/1/1850, 3:58:41 PM"
new Date(1850,0,1).toLocaleString("en", {timeZone: "Asia/Chongqing"})
"1/1/1850, 3:58:41 PM"

new Date(1850,0,1).toLocaleString("en", {timeZone: "Asia/Phnom_Penh"})
"1/1/1850, 2:35:02 PM"
new Date(1850,0,1).toLocaleString("en", {timeZone: "Asia/Bangkok"})
"1/1/1850, 2:35:02 PM"
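
The fractional `getTimezoneOffset()` value above is easier to compare with the formatted times once it is converted to hours, minutes, and seconds. A small sketch of that arithmetic follows; the helper name is hypothetical.

```javascript
// Convert a getTimezoneOffset() result (minutes, possibly fractional)
// into a signed UTC offset string of the form +HH:MM:SS or -HH:MM:SS.
function offsetMinutesToHms(minutes) {
  const totalSeconds = Math.round(Math.abs(minutes) * 60);
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  const pad = (n) => String(n).padStart(2, "0");
  // getTimezoneOffset() is positive when local time is behind UTC.
  const sign = minutes > 0 ? "-" : "+";
  return `${sign}${pad(h)}:${pad(m)}:${pad(s)}`;
}

console.log(offsetMinutesToHms(472.96666666666664)); // "-07:52:58"
```

An offset of -07:52:58 matches the transcript: local midnight is shown as 7:52:58 AM in UTC.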

@anba
Contributor
anba commented Jun 23, 2020

There are multiple issues, some overlapping, which lead to differences between browsers when handling time zones:

  1. Let's start with accepted time zone strings, because any difference here may have side-effects later on.

    • SM: Uses ICU to validate time zone names, but has an extra mapping to reject non-IANA names (e.g. ICU legacy time zones like ACT or previous IANA names like Canada/East-Saskatchewan). Also disallows SystemV time zones, which are disabled by default in tzdata.
    • V8: Uses a simple parser to validate time zone names before passing them to ICU. The parser rejects legacy ICU time zones, but still allows Canada/East-Saskatchewan, even though that one is no longer valid per IANA (but still valid in CLDR!). Recently an extra mapping was added to handle more cases. The parser also rejects SystemV time zones, but it's not clear to me if that's intentional or just a happy coincidence.
    • JSC: Directly calls into ICU to validate time zone names. That leads to accepting legacy ICU names and SystemV time zones.
  2. Canonicalisation differences between IANA and CLDR for same time zones:

    • SM: Contains extra mappings to override ICU to make sure IANA mappings are applied.
    • V8/JSC: Directly return whatever ICU returns.
    • Examples:
      • "America/Buenos_Aires" links to "America/Argentina/Buenos_Aires" in IANA, but it's the other way around in CLDR.
      • "EST" is its own zone in IANA, but a link to "Etc/GMT+5" in CLDR.
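
The acceptance differences in the first item can be probed from script, since an engine that rejects a time zone name throws a RangeError from the `Intl.DateTimeFormat` constructor. The sketch below uses a hypothetical helper name; "SystemV/EST5EDT" is picked here as one example of a SystemV name and does not appear in the comment above. Results are not annotated because they vary by engine.

```javascript
// Report whether this engine accepts a time zone ID.
function isAcceptedTimeZone(id) {
  try {
    new Intl.DateTimeFormat("en", { timeZone: id });
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false; // rejected as an invalid time zone
    throw e; // anything else is unexpected
  }
}

// A legacy ICU name, a former IANA name, a SystemV name, and a current IANA zone.
for (const id of ["ACT", "Canada/East-Saskatchewan", "SystemV/EST5EDT", "Asia/Shanghai"]) {
  console.log(`${id}: ${isAcceptedTimeZone(id) ? "accepted" : "rejected"}`);
}
```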

Now let's go over to the backzone file. First, as a quick reminder, ICU doesn't contain any data for backzone time zones!

  1. backzone time zones which ICU claims to support, because they're CLDR time zones:
    • SM/V8/JSC: Accepted in all browsers, but give the wrong results.
    • Example:
js> var date = new Date("1800-01-01T00:00:00Z")
js> var dtf = new Intl.DateTimeFormat("en", {timeZone:"Europe/Belgrade", hour:"2-digit", minute:"2-digit"})
js> dtf.format(date)                                                                                        
"1:22 AM"
js> dtf.resolvedOptions().timeZone
"Europe/Belgrade"
js> var dtf = new Intl.DateTimeFormat("en", {timeZone:"Europe/Sarajevo", hour:"2-digit", minute:"2-digit"}) 
js> dtf.format(date)                                                                                        
"1:22 AM"
js> dtf.resolvedOptions().timeZone                                                                          
"Europe/Sarajevo"

Rules for "Europe/Belgrade" and "Europe/Sarajevo"

# Zone	NAME		STDOFF	RULES	FORMAT	[UNTIL]
Zone	Europe/Belgrade	1:22:00	-	LMT	1884
Zone	Europe/Sarajevo	1:13:40	-	LMT	1884

CLDR lists "Europe/Sarajevo" as a time zone, not a link:

<type name="basjj" description="Sarajevo, Bosnia and Herzegovina" alias="Europe/Sarajevo"/>

  1. backzone zones which are links in CLDR.

    • SM: Reported as its own zone.
    • V8/JSC: Canonicalised to a different zone.
    • Example: "Asia/Chongqing".
      • SM: "Asia/Chongqing" reported as the canonical name, but returns the data for "Asia/Shanghai".
      • V8/JSC: Canonicalised to "Asia/Shanghai".
  2. backzone zones which are different links in CLDR.

    • SM: Links to the zone in backzone
    • V8/JSC: Links to zone outside of backzone.
    • Example: "Asia/Chungking".
      • SM: Canonicalised to "Asia/Chongqing", but returns the data for "Asia/Shanghai".
      • V8/JSC: Canonicalised to "Asia/Shanghai".
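
The cases above separate two questions: which name an engine reports for an ID, and which data it uses. A rough sketch for checking both at once is shown below. The helper names are hypothetical, and the comparison samples a single instant, so equal output only shows that the two IDs agree at that instant, not that their data is identical.

```javascript
// Format a single instant as wall-clock time in the given zone.
function wallClock(instant, timeZone) {
  return new Intl.DateTimeFormat("en", {
    timeZone,
    hourCycle: "h23",
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
  }).format(instant);
}

// Name the engine reports for an ID.
function resolvedName(timeZone) {
  return new Intl.DateTimeFormat("en", { timeZone }).resolvedOptions().timeZone;
}

// Compare two IDs by reported name and by wall clock at one instant.
function compareZones(instant, zoneA, zoneB) {
  return {
    sameName: resolvedName(zoneA) === resolvedName(zoneB),
    sameWallClock: wallClock(instant, zoneA) === wallClock(instant, zoneB),
  };
}

// The trailing "Z" pins the instant to UTC, independent of the system time zone.
const instant = new Date("1800-01-01T00:00:00Z");
console.log(compareZones(instant, "Europe/Belgrade", "Europe/Sarajevo"));
console.log(compareZones(instant, "Asia/Shanghai", "Asia/Chongqing"));
```

A result with `sameName: false` and `sameWallClock: true` for a pre-1970 instant corresponds to the situation described above, where an ID is reported as its own zone but is backed by another zone's data.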

@anba
Contributor
anba commented Jun 23, 2020

@anba What is Firefox doing these days? Is it still necessary to put in the exception to allow backzone to be used for time zone names?

We're basically still in the same position as when we originally implemented these overrides for backzone. Objectively speaking, we're returning the wrong data for pre-1970 time stamps for backzone zones, but we are reluctant to canonicalise to the zones whose data is used, for the reasons outlined in #272 (comment).

@jungshik
Author

@anba, thank you for the summary as well as the reminder about 'Z' in Date ctor that I forgot.

I also forgot what I wrote about {Asia/Phnom_Penh and Asia/Vientiane} vs Asia/Bangkok. They have the same issue as Europe/Sarajevo and Europe/Belgrade. That is, the third issue in @anba's comment.

@jungshik
Author

The parser also rejects SystemV time zones, but it's not clear to me if that's intentional or just a happy coincidence.

That's an intentional omission. The V8 tzname parser rejects everything that is not explicitly handled. SystemV zone name handling is omitted on purpose because it's disallowed.

FYI, @FrankYFTang

@sffc
Contributor
sffc commented Sep 18, 2023

@justingrant Thoughts on this issue?

@justingrant
Contributor
justingrant commented Sep 20, 2023

AFAIK, the current plan is for CLDR and ICU to resolve the issues discussed in this thread:

  1. A new iana attribute has been added to https://github.com/unicode-org/cldr/blob/main/common/bcp47/timezone.xml, which (after https://unicode-org.atlassian.net/browse/ICU-22452 is implemented soon) will allow ICU clients to fetch the latest canonical ID for IDs like Asia/Calcutta (canonical ID is Asia/Kolkata) and Europe/Kiev (canonical ID is Europe/Kyiv).
  2. The CLDR data has been cleaned up so that the list of canonical IDs is similar to what's found by using backzone, and which should be very close (modulo a few corner cases and judgement calls) to the results you'd get when building TZDB with PACKRATDATA=backzone PACKRATLIST=zone.tab.
  3. There are a few IANA IDs that are missing from timezone.xml, like "EST" and "PST8PDT". These will be added to timezone.xml in a later PR.

Once this CLDR and ICU work is completed and released, we have a choice to make:

  • A) ECMA-402 defers to CLDR in the spec.
  • B) ECMA-402 specifies the specific criteria used to determine what's canonical in the IANA Time Zone Database, which happens to match what CLDR does.
  • C) Some combination of (A) and (B), where ECMA-402 explains the criteria used like (B), but lets implementers know that by using CLDR they'll be following the spec.

A while ago I filed #825 to encourage a decision on the choice of (A), (B), or (C).

Unless there are objections, I think that this issue can be closed as a dupe of that one?

@sffc
Contributor
sffc commented May 2, 2024

Does #877 fix this issue?

@sffc sffc added s: in progress Status: the issue has an active proposal and removed s: help wanted Status: help wanted; needs proposal champion labels May 2, 2024
@justingrant
Contributor

Does #877 fix this issue?

Yes, it resolves the questions raised by this issue.


5 participants