
Showing posts with label Unicode. Show all posts

Friday, December 19, 2025

Opening CLDR Survey Tool early for DDL locales

[image]

We are announcing an early submission window for the CLDR Survey Tool, exclusively for Digitally Disadvantaged Languages (DDLs). These include languages across the world that lack full digital support, such as Qʼeqchiʼ with about 1.3M speakers, and many more.


The early submission window will allow more time for individuals and organizations that make DDL contributions, providing crucial data to close the digital support gap. The data will go into the CLDR v50 release, targeted at October 2026. Languages maintained by the CLDR Technical Committee are not available during this special window. They will be available for submission in Q2 2026.


See DDL: Help Center for more information on how to contribute to a DDL language.


If your language is not yet in CLDR, organizations can submit a formal request to add it; see adding a new language.


CLDR Organizations are needed for approval of CLDR data, so that it can be picked up by libraries, applications, programming languages, and operating systems. To register a new CLDR Organization, see adding an organization to CLDR. Individuals can also request languages and submit/approve data; however, the data cannot reach even Basic coverage without at least one CLDR Organization supporting it.



What is CLDR?


CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). All major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)


Contributors supply data for their languages via the online Survey Tool. This data is widely used to support much of the world’s software and is also a factor in determining which languages are supported on mobile phones and computer operating systems.


The Survey Tool opened on December 18, 2025 for DDL languages. The tool will remain open for data submission and correction until July 2026. A public alpha will make the draft data available in early August 2026. Data contributed at this time will be scheduled for publication and available for use in October 2026.


Each additional CLDR language starts with a small set of Core Data, such as a list of characters used in the language. Submitters of new languages commit to bringing the coverage up to a minimum of Basic coverage (very basic formats for dates, times, numbers, and endonyms). 


Once a language reaches Basic coverage, it will have the minimum support for use in language selection, such as on mobile devices. That is the first step; for broader support the Moderate level is typically required. 


If you would like to contribute missing data for your language, see Survey Tool Accounts.

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Tuesday, September 10, 2024

Announcing The Unicode® Standard, Version 16.0

[image] Version 16.0 of the Unicode Standard is now available. This is a major version update that includes new characters and code charts, new data files and annexes, an updated core specification, and updated annexes and synchronized standards.

This version adds 5,185 new characters, including 3,995 additional Egyptian Hieroglyph characters plus seven new scripts, seven new emoji characters, and over 700 symbols from legacy computing environments, for a total of 154,998 characters. See the delta code charts for details on all the new scripts and characters. For additional details regarding new emoji, see Emoji Recently Added, v16.0.
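As a quick sanity check, the running totals quoted in the release announcements on this page can be added up. The sketch below uses only the figures stated in those announcements; the constant and function names are illustrative, not part of any Unicode tooling. It also reads the Unicode version that the local Python runtime ships with, which may well be older than 16.0.

```python
import unicodedata

# Figures as quoted in the 15.0, 15.1, and 16.0 announcements on this page.
TOTAL_15_0 = 149_186
ADDED_15_1 = 627      # 622 CJK ideographs + 5 ideographic description characters
TOTAL_15_1 = 149_813
ADDED_16_0 = 5_185
TOTAL_16_0 = 154_998


def totals_are_consistent() -> bool:
    """Check that each quoted total equals the previous total plus the additions."""
    return (
        622 + 5 == ADDED_15_1
        and TOTAL_15_0 + ADDED_15_1 == TOTAL_15_1
        and TOTAL_15_1 + ADDED_16_0 == TOTAL_16_0
    )


def runtime_unicode_version() -> tuple:
    """Return the Unicode version of this Python's character database as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


if __name__ == "__main__":
    print("Quoted totals consistent:", totals_are_consistent())
    print("Runtime Unicode version:", runtime_unicode_version())
```

If the runtime reports a version below 16.0, the new characters will simply have no names or properties in `unicodedata` until the runtime is updated.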

In addition to new characters, new “Moji Jōhō Kiban” (文字情報基盤) Japanese source references have been added for over 36,000 CJK unified ideographs. This is reflected in the code charts for virtually all CJK unified ideograph blocks by additional representative glyphs in the “J” column.

The core specification for Version 16.0 is now available for browsing online as per-chapter web pages with “breadcrumb” and other links for easy navigation.

Two new annexes have been added to this version:
  • UAX #53, Unicode Arabic Mark Rendering: This annex, which was previously published as a Technical Report, specifies an algorithm for handling combining marks when rendering to ensure correct and consistent display of Arabic script text.
  • UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet): This annex documents the format of the Unikemet.txt data file, which provides information clarifying the identity of Egyptian Hieroglyph characters and properties useful for implementations.
For complete details on Unicode Version 16.0, see https://www.unicode.org/versions/Unicode16.0.0/.




As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.

Monday, May 20, 2024

Unicode CLDR Version 46 Submission Open

[image] The Unicode CLDR Survey Tool is open for submission for version 46. CLDR provides key building blocks for software to support the world’s languages (dates, times, numbers, sort-order, etc.). All major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

Version 46 is focusing on:
  • Unicode 16 additions: new emoji, script names, collation data (Chinese & Japanese), …
  • Emoji search keywords: Expanding keyword coverage to make it easier for users to find the right emoji
  • New Languages targeting Basic:
    • Ewe (ee)
    • Ga (gaa)
    • Kinyarwanda (rw)
    • Northern Sotho (nso)
    • Oromo (om)
    • Sesotho (st)
    • Setswana (tn)
  • Up-leveling: Akan (ak)
Submission of new data opened recently, and is slated to finish on June 11. The new data then enters a vetting phase, where contributors work out which of the supplied data for each field is best. That vetting phase is slated to finish on July 1. A public alpha makes the draft data available around August 28, and the final release targets October 16.

Each new locale starts with a small set of Core data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic formats for dates, times, numbers, and endonyms) during the next submission cycle.

Once a language reaches Basic coverage, it has the minimum support for use in language selection, such as on mobile devices. In the next submission cycle, the name for that language is also added for translation for all languages at Modern coverage.

If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.



Friday, March 8, 2024

Breaking the Cycle 🔗💥

by Jennifer Daniel

(This article was originally published on Jennifer’s Substack, January 17, 2023. Republished here with minor revision.)

Phoenix image
In the fall of 2022, the Unicode Technical Committee announced that the 2023 release of the Unicode Standard would be a “dot” release with limited character additions, with the next major release in 2024. This wasn’t without precedent — COVID slowed down the release of Unicode 14.0 in 2020 and the world seemed to survive 😉. Subcommittees were well prepared and adjusted accordingly, discussing what this meant for their respective areas of expertise.

For the Emoji Subcommittee (ESC) — the group responsible for defining the rules, algorithms, and properties necessary to achieve interoperability between different platforms for those smiley faces that appear on your keyboard (Shout out 😁🥰🥹🤔🫣🫡😵‍💫!) — this delay presented an opportunity. Sure, we were so close to exhaling a sigh of relief (the intake period for Emoji 16.0 proposals had just completed). But upon learning we couldn’t ship any new codepoints until 2024 we turned our energy towards recommending new emoji based on existing ones. (These are called emoji ZWJ sequences. That's when a combination of multiple emoji display as a single emoji … like 👩 🏽 +🏭 = 🧑🏽‍🏭).
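To make the idea concrete, here is a minimal sketch of how such a sequence is put together at the code point level: a skin tone modifier follows the person it applies to, and a zero width joiner (U+200D) glues the pieces into one sequence. The helper names are made up for illustration. Whether a given sequence renders as a single glyph depends on the platform's fonts and on whether the sequence is one of the recommended ones.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def zwj_sequence(*parts: str) -> str:
    """Join emoji (each optionally followed by a modifier) with ZWJ."""
    return ZWJ.join(parts)


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


PERSON = "\U0001F9D1"       # person (adult)
MEDIUM_TONE = "\U0001F3FD"  # medium skin tone modifier
FACTORY = "\U0001F3ED"      # factory

# Person with medium skin tone + ZWJ + factory = factory worker, medium skin tone.
factory_worker = zwj_sequence(PERSON + MEDIUM_TONE, FACTORY)

if __name__ == "__main__":
    print(factory_worker, code_points(factory_worker))
```

What looks like one emoji on the keyboard is four code points in the string, which is why no new codepoints are needed to ship it.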

When Less is More

An incredibly powerful aspect of written language is that it consists of a finite number of characters that can "do it all". And yet, as the emoji ecosystem has matured over time our keyboards have ballooned and emoji categories are about to hit or have hit a level of saturation. Upon reflecting on how emoji are used, the ESC has entered a new era where the primary way for emoji to move forward is not merely to add more of them to the Unicode Standard. Instead, the ESC approves fewer and fewer emoji proposals every year.

But our work is not done. Not by a long shot. Language is fluid and doesn’t stand still. There is more to do! This “off-cycle” gives us a chance to address some long-standing major pain points using emoji. The first one that came to mind: skin tone.

What is a family?

The encoding of multi-person multi-tone support has matured over the years; however, the implementation can seem random to the average person. While it’s true that all people emoji have toned options (with the exception of characters where you can’t see skin, like 🤺), there are … misfits. Some two-person emoji offer tone support (🧑🏻‍❤️‍🧑🏿); others do not (👯). A few non-RGI emoji render with tone but with no affordance to change one of the two characters. (For example, 🤼🏾‍♂ renders with skin tone on Android but as gold on iOS. WHY. This is why we standardize these things, people.)

And then ... There is the suite of family emoji (👨‍👦👨‍👦‍👦👨‍👧👨‍👧‍👦👨‍👧‍👧👩‍👦👩‍👦‍👦👩‍👧👩‍👧‍👦👩‍👧‍👧 👨‍👨‍👦👨‍👨‍👦‍👦👨‍👨‍👧👨‍👨‍👧‍👦👨‍👨‍👧‍👧👩‍👩‍👦👩‍👩‍👦‍👦👩‍👩‍👧👩‍👩‍👧‍👦👩‍👩‍👧‍👧👨‍👩‍👦👨‍👩‍👦‍👦👨‍👩‍👧👨‍👩‍👧‍👦👨‍👩‍👧‍👧👪). These characters include two people, three people, sometimes four and none of them have any tone support (!). We seem to have a lot of family emoji and yet simultaneously not enough.

The 26 “family” emoji can be broken down into four groups:

Families image

Despite the Unicode Standard containing 26 “family” emoji, each one of these glyphs is overly prescriptive with regard to delivering on a visual representation of a family. The inclusion of many permutations of families was well intentioned. But we can’t list them all, and by listing some of the combinations, it calls attention to the ones that are excluded.

What even is a family? For some, family is the people you were raised with. Others have embraced friends as their chosen family. Some families have children, other families have pets. There are multi-generational families, multiracial families, and of course many families are any combination of all of these characteristics and more.

Fortunately, we don’t need to add 7000 variants to your keyboards (even this would fall short of capturing the breadth of "family" as a concept). Instead we can juxtapose individual emoji together to capture a concept with some reasonable level of specificity — not too unlike arranging letters together to create words to convey concepts 😉

Different families image
For emoji keyboards to advance in creating more intuitive and personalized experiences, the Emoji Subcommittee is recommending a visual deprecation of the family emoji. This small set of emoji will be redesigned as part of a multi-phase effort to “complete the set” of toned variants for the remaining multi-person emoji. This of course raises the question: when there are as many families as there are people in the world, is there an effective way of conveying the concept of “family” without being overly prescriptive in defining what is and is not a family? Well, thankfully icons can do a lot of heavy lifting without requiring very much detail.

Family symbol image

When is an emoji running for the police or getting chased by them?

Another area the ESC is actively exploring is how the semantics of emoji sequences can differ when writing directionality changes. Some emoji characters have semantics that encode implicit directionality, but when the string is mirrored, their meaning may be unintentionally lost or changed.

Left to Right emoji image
Left to Right Emoji Sequence: Quickly running towards an “exciting” police chase

Right to Left emoji image
Right to Left Emoji Sequence: Running away from the coppers

What, if anything, can we do to aid in ensuring that messages are meaningfully translated, be they tiny pictures or tiny letters? As part of 15.1 we’re proposing a small set of emoji with strong directionality, with an initial focus on people, to face the opposite direction. Soon you too can run towards or away from ... excitement.
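For the curious, here is a rough sketch of how a direction change can be expressed without a new code point. It assumes the scheme used in the published Emoji 15.1 data, where a right-facing variant is the base emoji followed by a zero width joiner, the rightwards arrow (U+27A1), and variation selector-16 (U+FE0F). The function name is hypothetical, and only sequences listed as recommended in the emoji data files should be expected to render as one glyph.

```python
ZWJ = "\u200D"          # ZERO WIDTH JOINER
RIGHT_ARROW = "\u27A1"  # BLACK RIGHTWARDS ARROW
VS16 = "\uFE0F"         # VARIATION SELECTOR-16 (emoji presentation)


def facing_right(base: str) -> str:
    """Append the direction suffix to a base emoji (assumed 15.1-style sequence)."""
    return base + ZWJ + RIGHT_ARROW + VS16


PERSON_WALKING = "\U0001F6B6"
walking_right = facing_right(PERSON_WALKING)

if __name__ == "__main__":
    print(walking_right, [f"U+{ord(ch):04X}" for ch in walking_right])
```

On a platform without support for the sequence, the fallback is the original walker followed by an arrow, so the intent still comes across.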

Emoji 15.1

Given that the intake cycle of emoji proposals for Unicode 16.0 ended last July, the Emoji Subcommittee has also decided to temporarily delay the intake of Unicode Version 17.0 proposals until April 2024. Fortunately, you won’t have to wait until then to get new emoji. (Note: I know it sounds like I’m talking about the past and future simultaneously ... the emoji lifecycle is looooong and as a result overlaps with multiple releases. Expect a future blog post about the Emoji 15.0 candidates landing early this year (Shout out goose, pink heart, and pushing hands). I’ve been holding off writing about this set until you can actually see them on your phones but given that we’re already talking about 2024 maybe it’s time I dust that blog post off).

Emoji 2023 timeline image

Anyway, the list of Emoji 15.1 recommendations for 2024 includes 578 characters (most of them the candidates described above to support directionality). The list also includes a few humble additions: a broken chain, a lime, a non-poisonous mushroom, a nodding and shaking face, and a phoenix bird. Each one of these leverages a unique valid ZWJ sequence of emoji, so while they look like atomic characters made of a single codepoint, they are composed of two or more codepoints.

Broken chain and other emoji image

Broken chain is the result of a 🔗💥 ZWJ sequence and carries a variety of meanings, such as freedom, breaking a cycle, or perhaps a broken URL ;-). Shaking face and nodding face are composed with arrows to imply movement in a still image (🙂↔️) and (🙂↕️). Oh, and of course there is a phoenix rising from the ashes (🐦🔥), an ancient metaphor that captures the zeitgeist of today.
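If you want to peek inside one of these sequences yourself, a few lines of Python will do. This is a small sketch using the standard `unicodedata` module. The code points for phoenix and the nodding face follow the published Emoji 15.1 sequences as I understand them (in that data the broken chain is built on the chains character U+26D3 rather than the link emoji), so treat the constants as assumptions to check against the emoji data files.

```python
import unicodedata

ZWJ = "\u200D"

# Assumed Emoji 15.1 ZWJ sequences (verify against the emoji data files).
PHOENIX = "\U0001F426" + ZWJ + "\U0001F525"        # bird + fire
NODDING = "\U0001F642" + ZWJ + "\u2195" + "\uFE0F"  # smiling face + up down arrow


def describe(sequence: str) -> list:
    """Return 'U+XXXX NAME' for every code point in the sequence."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in sequence
    ]


if __name__ == "__main__":
    for label, seq in (("phoenix", PHOENIX), ("nodding face", NODDING)):
        print(label, seq)
        for line in describe(seq):
            print("   ", line)
```

The phoenix turns out to be exactly what it sounds like: a bird, a joiner, and fire.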

The Unicode Technical Committee (UTC) will review the required documents at its first meeting of 2023 in January – and if these candidates move forward, you can expect an update from the UTC later this Spring and Summer.



Wednesday, November 15, 2023

Looking to give back differently for this #GivingTuesday?

[image of 3 badges]
Adopt a Character or Emoji to give it the attention it deserves!

Now you can adopt a character and show off your hobby or business, favorite sport, or love. For that special someone who seems to have everything, you can also give a unique gift.

Allergies? 🤧 Traveling? ✈️ No worries, the cat emoji 😺 has no fur and requires no feeding! The dog emoji 🐶? No need to go out for a 3 am walk! Looking to be a Scrabble champ? The strong and fast letter Z is right for you!

Your good friend is studying to be a doctor. How about the stethoscope emoji as a gift? 🩺 Or even an emoji to support your favorite college football team this season! 🏈

With nearly 150,000 characters there's something for everyone! The possibilities are endless! It's also a tax-deductible donation in the United States, to the extent allowed by law. Your company may also provide matching funds.

☯🏏 🏈 ⚽ 🔥🎁💍爱戀🥳 🙌 🎂💗💟₨ ₪ € ₭ ₱🥰 😍♕Ωπ

About Adopt-a-Character

The Adopt-a-Character program was launched in 2015 to support Unicode's mission to ensure everyone can communicate in their own languages. Adopt-a-Character funds have supported work on historic scripts, including Old Uyghur, Old Sogdian, Sogdian, Seal Script (China), Mayan Hieroglyphs, and Egyptian Hieroglyphs. Additional support has been provided to encode the modern scripts Hanifi Rohingya, Tolong Siki, and Sunuwar, among others.

Characters can be adopted at three levels:

Gold - $5,000
For any particular character there can only be one Gold adoption! Be the only!

Silver - $1,000
For any particular character there can only be five Silver adoptions! Be one of the five to adopt your favorite characters as a Silver adopter!

Bronze - $100
For any character, there are an unlimited number of Bronze-level adoptions! Also a wonderful option!

Each adoption is recognized with a digital badge that you (or your recipient!) can proudly share via your social channels and via websites. Adoptions also come with a digital certificate that you can print to display or email to your giftee!

About the Unicode Consortium

The Unicode Consortium is the premier 501(c)3 non-profit, open source, open standards body for the internationalization of software and services. Unicode is arguably the most widely deployed software in the world, available across 20 billion devices and counting! At its core, Unicode enables people around the world to communicate in any language.

And - if you want to simply make a donation to support Unicode’s work, you can do that, too!

This Giving Tuesday, let's come together to continue to celebrate and preserve linguistic diversity. Adopt a character and make a difference!

Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.

[badge]

Wednesday, November 1, 2023

What do a leafless tree, a fingerprint, and a harp have in common?

This is not a set up to a riddle. This is Emoji 16.0.

By Jennifer Daniel, Chair of the ESC


This week, the Unicode Technical Committee gathered for our last meeting of 2023 to discuss the encoding, data files, and list of characters related to digitizing the world’s languages. Amongst the topics discussed were emoji and as a result seven new characters are on their way for inclusion into the Unicode Standard, into your keyboards, and into your hearts ;-)

emoji table image
The final recommendations culminated in seven emoji: one emoji per major category.

An incredibly powerful aspect of written language is that it consists of a finite number of characters that can “do it all”. And yet, as the emoji ecosystem has matured over time our keyboards have ballooned and emoji categories are about to hit — or have hit — a level of saturation. Upon reflecting on how emoji are used, the Unicode Emoji Subcommittee (ESC) has entered a new era where the primary way for emoji to move forward is not merely to add more of them to the Unicode Standard, but to consider how the ones added provide the most linguistic flexibility. As a result, the ESC approves fewer and fewer emoji proposals every year.

The few that are added this year have demonstrated their adaptability in different contexts. Take, for example, fingerprint: it is commonly used to represent multiple concepts. Fingerprints are a symbol of identity (unique as you), security (as a passkey), and forensics (what crime show logo is complete without a fingerprint?). While we think of fingerprints as a relatively modern phenomenon, according to Forensics Digest the earliest use of fingerprints dates back to 1000 B.C.

In fact, all of this year’s emoji candidates have deep roots in history. Harps have been known since antiquity in Asia, Africa, and Europe, dating back at least as early as 3000 BCE. Today the harp has political, sporting, corporate, and religious symbolism 👼. Leafless trees have been around as long as ... well, trees (and poetry!), I suppose. Leafless trees literally represent droughts or winter and metaphorically indicate a state of barrenness and death.

Shovel isn’t just another noun — sure, yes, it’s a tool commonly found in your shed — in our keyboards, however, it’s also a verb. Digging yourself out of a hole, digging yourself into a hole, shoveling 💩, it does it all. But wait, there’s more. Splatter is one of those stealth emoji that when you look at you might be thinking, “really, another sex emoji?” (To be honest, show me someone who doesn’t think an emoji is a sex emoji and I’ll show you someone who lacks imagination). Splatter is a spill. Splatter is expressive. Splatter is soft —  a perfect counterpoint to collision 💥 — the bouba to 💥’s kiki.

When can you get these new emoji?

A simple question that deserves a simple answer. Alas, you’re dealing with Unicode so the answer is complex. Did you know it can take up to two years to encode an emoji? It’s true. If we want the symbols we digitize to truly “just work” across the entirety of not just the Internet but all digital surfaces … it takes time. So, don’t expect to see these characters anytime soon. In fact, despite the previous batch of emoji (phoenix, lime, broken chain, etc.) getting approved last year they still haven’t landed on your device of choice yet but are well on their way to pop up in the first half of 2024.

emoji at a glance
Emoji 16.0 has a long road ahead and will appear on most devices in May-June 2025.




Tuesday, September 12, 2023

Announcing The Unicode® Standard, Version 15.1


Version 15.1 of the Unicode Standard is now available. This minor version update includes updated code charts, data files and annexes. The core specification is unchanged from Unicode Version 15.0.

This version adds 627 characters, bringing the total number of characters to 149,813. The additions include 622 CJK unified ideographs in a new block, CJK Unified Ideographs Extension I. These new ideographs are urgently needed in China for use in public service databases, and are expected to be included in a forthcoming amendment to China’s GB 18030-2022 standard. The other new characters are five ideographic description characters that enhance the ability to describe rare or not-yet-encoded CJK ideographs.

There are six completely new emoji, such as for phoenix and lime and (finally) an edible mushroom. For 108 people emoji, you can now switch the direction that they are facing (for example, person walking facing right versus facing left).

Security-related updates have been made to UAX #9, Unicode Bidirectional Algorithm and UAX #31, Unicode Identifiers and Syntax along with updates to UTS #39, Unicode Security Mechanisms. These updates complement the release of a new Unicode Technical Standard, UTS #55, Unicode Source Code Handling.
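To illustrate the kind of problem these security updates address, the sketch below scans source text for explicit bidirectional formatting characters, which can make code display in a different order than a compiler reads it. This is only a simple detector written for illustration; it is not the procedure specified in UTS #55, and the function name is hypothetical.

```python
# Explicit bidirectional formatting characters and their usual abbreviations.
BIDI_CONTROLS = {
    "\u202A": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202D": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066": "LRI",  # LEFT-TO-RIGHT ISOLATE
    "\u2067": "RLI",  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "FSI",  # FIRST STRONG ISOLATE
    "\u2069": "PDI",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(source: str) -> list:
    """Return (line, column, abbreviation) for each bidi control, 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, BIDI_CONTROLS[ch]))
    return hits


if __name__ == "__main__":
    sample = "a = 1\nb = '\u202Ex'"
    print(find_bidi_controls(sample))
```

A real review tool would also need to decide which occurrences are legitimate, for example inside right-to-left string literals or comments, which is where the guidance in the specifications comes in.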

The new characters are limited to three blocks, and the code charts for several other blocks have changed. The most significant change to charts is for the CJK Unified Ideographs, CJK Unified Ideographs Extension A and CJK Unified Ideographs Extension B blocks with the addition of representative glyphs and source references for over 24,000 KP-source (North Korea) ideographs. There are also many other glyph corrections and improvements—see the 15.1 delta code charts for details.

Significant updates have been made to UAX #14, Unicode Line Breaking Algorithm and UAX #29, Unicode Text Segmentation adding better support for scripts of South and Southeast Asia, including grapheme cluster support for aksaras and consonant conjuncts, and line breaking at orthographic syllable boundaries.

For complete details on Unicode Version 15.1, see https://www.unicode.org/versions/Unicode15.1.0/.




Tuesday, May 23, 2023

Unicode 15.1 Beta Review Open

[image] The beta review period for Unicode 15.1 has started, and is open until July 4, 2023. The beta is intended primarily for review of character property data and changes to algorithm specifications (Unicode Standard Annexes).

Normally at this phase of a release, the character repertoire is considered stable and very unlikely to change. Also, the plan for Unicode 15.1 had been for a minor release with only a very limited set of new characters.

Recent developments have led to a tentative change in those plans, however.

China has a very urgent need for encoding of certain CJK ideographs used in public services databases. To accommodate this urgent need, the Unicode Technical Committee (UTC) decided at its April 2023 meeting to encode 603 new characters in Unicode 15.1 as CJK Unified Ideographs Extension I. This new block is included in the delta charts for the Unicode 15.1 beta. However, inclusion of these characters in Unicode 15.1 is contingent on support for this addition from China, and on support for this addition in the corresponding ISO/IEC 10646 standard from ISO/IEC JTC 1/SC 2 at their upcoming meeting in June. While support for the new block is anticipated, there is a small chance that minor changes to this repertoire will be made after the beta, or that UTC will pull this block entirely from the 15.1 release.

Several of the Unicode Standard Annexes have significant modifications and associated data changes for version 15.1. For example, UAX #14, Unicode Line Breaking Algorithm has significant enhancements to support line breaking at orthographic syllable boundaries in several South and Southeast Asian scripts. Also, in conjunction with the parallel development of a new standard, UTS #55, Unicode Source Code Handling (see Public Review Issue #474), there are significant revisions to UAX #31, Unicode Identifiers and Syntax that will provide better specifications and guidance related to security, and also improved guidance for applications that define identifier systems using Unicode.

While draft content for the beta has been published as of May 23rd, the work groups preparing updates to the content could continue to make changes to data or specs during the Beta review period. Any substantive changes for the beta will be frozen by June 5th.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by July 4, 2023. The review period will only be for six weeks, so prompt feedback is appreciated. Feedback instructions are on the beta page.

See https://www.unicode.org/versions/beta-15.1.0.html for more information about testing and providing feedback on the 15.1.0 beta.

See https://www.unicode.org/versions/Unicode15.1.0/ for the current draft summary of Unicode Version 15.1.0.




Monday, November 14, 2022

The Unicode® Standard – 2023 Release Planning

By Peter Constable, Chair of the Unicode Technical Committee

[image] At the Q4 Unicode Technical Committee (UTC) meeting held from November 1-3, our member representatives unanimously agreed to a release plan for 2023 and tentative plan for 2024. Along with some tooling updates, our plans aim to ensure that we are more agile to meet the evolving internationalization landscape and better able to meet the needs of Unicode members and other consumers of the Standard.

More information can be found in the Release Management Group’s Recommendations for 2023-2024.

BACKGROUND

For several years now, the UTC has worked on an annual cycle for new versions of The Unicode Standard and related specifications. New versions used to be released in March of each year, but in 2021, due to COVID-19, the release was delayed until September. 

MOVING FORWARD

Going forward, our plan is to continue with a new release each year in September. That annual, predictable cycle works well for Unicode's other major projects—CLDR and ICU—and helps implementers in their planning. 

In 2023, we will keep up that cadence with a September release, but we also need to take some time to evaluate and update our processes for developing each new version of the Standard.

Therefore, the 2023 release will be a “dot” release: Unicode 15.1. It will include important updates to Unicode Standard Annexes and to the Unicode Character Database, and have a limited set of new characters — but new scripts and most other character additions will be held until the 2024 release. A major new area is the planned release of a Unicode Technical Standard for avoiding source-code spoofing, along with associated changes in other specifications.

Regarding emoji, if there are any new emoji in the 15.1 release, they would leverage existing code points, as was done for the 13.1 release, rather than the addition of entirely new characters.

2024 AND BEYOND

For 2024, we anticipate returning to our regular cadence, with a major release in September 2024. Unicode 16.0 will include additional new scripts, emoji and other characters, as well as other updates.



Learn more about how you can support the Unicode Consortium and our mission, including information on our Adopt-a-Character program, here!
[badge]

Tuesday, September 13, 2022

Announcing The Unicode® Standard, Version 15.0

[Nag Mundari image] Version 15.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 4,489 characters, bringing the total to 149,186 characters. These additions include two new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs. The new scripts and characters in Version 15.0 add support for modern language groups including:
  • Nag Mundari, a modern script used to write Mundari, a language spoken in India
  • A Kannada character used to write Konkani, Awadhi, and Havyaka Kannada in India
  • Kaktovik numerals, devised by speakers of Iñupiaq in Kaktovik, Alaska for the counting systems of the Inuit and Yupik languages
Among the popular symbol additions are 20 new emoji, including hair pick, maracas, jellyfish, khanda, and pink heart. For the full list of new emoji characters, see emoji additions for Unicode 15.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.

[Image credit Noto Emoji]

Other symbol and notational additions include:
Support for other languages and scholarly work includes:
  • Kawi, a historical script found in Southeast Asia, used to write Old Javanese and other languages
  • Three additional characters for the Arabic script to support Quranic marks used in Turkey
  • Three Khojki characters found in handwritten and printed documents
  • Ten Devanagari characters used to represent auspicious signs found in inscriptions and manuscripts
  • Six Latin letters used in Malayalam transliteration
  • Sixty-three Cyrillic modifier letters used in phonetic transcription
Important chart font updates include:
  • A set of updated glyphs for Egyptian hieroglyphs, in addition to standardized variation sequences to support rotated glyphs found in texts
  • Improved glyphs for Unified Canadian Aboriginal Syllabics, which provide better support for Carrier and other languages
  • A new Wancho font, with improved and simplified shapes
Updates to the CJK blocks add:
  • 4,192 ideographs in the new CJK Unified Ideographs Extension H block
  • One ideograph in the CJK Unified Ideographs Extension C block
Unicode properties and specifications determine the behavior of text on computers and phones. The following six Unicode Standard Annexes and Technical Standards have noteworthy updates for Version 15.0:
  • UAX #9, Unicode Bidirectional Algorithm, amends the note in UAX9-C2 to emphasize the use of higher-level protocols to mitigate potential source code spoofing attacks.
  • UAX #31, Unicode Identifier and Pattern Syntax, provides more guidance on profiles for default identifiers, clarifies the use of default ignorable code points in identifiers, and discusses the relationship between Pattern_White_Space and bidirectional ordering issues in programming languages.
  • UAX #38, Unicode Han Database, adds the kAlternateTotalStrokes property. The kCihaiT property’s category was changed to Dictionary Indices, the kKangXi property was expanded, and Sections 3.0, 3.10, and 4.5 were added.
  • UTS #39, Unicode Security Mechanisms, changes the zero width joiner (ZWJ) and zero width non-joiner (ZWNJ) characters from Identifier_Status=Allowed to Identifier_Status=Restricted; they are therefore no longer allowed by the General Security Profile by default.
  • UAX #45, U-Source Ideographs, has records for new ideographs in its data file. “ExtH” was added as a new status, the status identifiers for the existing CJK Unified Ideographs blocks were improved, and Section 2.5 was added.
  • UTS #46, Unicode IDNA Compatibility Processing, clarified the edge case of the empty label in ToASCII and added documentation regarding the new IDNA derived property data files.
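To make the UAX #9 and UTS #39 items above more concrete, here is a minimal Python sketch of the kind of check a higher-level protocol (an editor, linter, or code review tool) might run. It is an illustration only, not an implementation of either specification: the function name `find_suspicious` and the choice to treat bidirectional controls and joiners together are assumptions made for this example.

```python
import unicodedata

# Explicit bidirectional formatting characters (embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}
# Joiners that UTS #39 marks Identifier_Status=Restricted as of Version 15.0.
JOINERS = {"\u200C", "\u200D"}  # ZWNJ, ZWJ


def find_suspicious(text):
    """Return (index, code point label, character name) for each flagged character."""
    hits = []
    for i, ch in enumerate(text):
        if ch in BIDI_CONTROLS or ch in JOINERS:
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


if __name__ == "__main__":
    # An identifier with an invisible ZWJ hidden in the middle.
    for hit in find_suspicious("ab\u200dcd"):
        print(hit)
```

A real tool would likely apply different policies to identifiers, comments, and string literals; this sketch only reports where such characters occur.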

About the Unicode Standard

The Unicode Standard provides the basis for processing, storage and seamless data interchange of text data in any language in all modern software and information technology protocols. It provides a uniform, universal architecture and encoding for all languages of the world, with over 140,000 characters currently encoded.

Unicode is required by modern standards such as XML, Java, C#, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is a fundamental component of all modern software.

For additional information on the Unicode Standard, please visit https://home.unicode.org/.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. For a complete member list go to https://home.unicode.org/membership/members/.
For more information, please contact the Unicode Consortium https://home.unicode.org/connect/contact-unicode/.


Over 144,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Wednesday, June 8, 2022

Unicode CLDR Version 42 Submission Open

[ballot box image] The Unicode CLDR Survey Tool is open for submission for version 42. CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

Version 42 is focusing on:
  • Additional Coverage
    • Unicode 15.0 additions: new emoji, script names, collation data (Chinese & Japanese), …
    • New Languages: Adding Haryanvi, Bhojpuri, Rajasthani at a Basic level.
    • Up-leveling: Xhosa, Hinglish (Hindi-Latin), Nigerian Pidgin, Hausa, Igbo, Yoruba, and Norwegian Nynorsk.
  • Person Name Formatting: for handling the wide variety in the way that people’s names work in different languages.
    • People may have a different number of names, depending on their culture--they might have only one name (“Zendaya”), two (“Albert Einstein”), or three or more.
    • People may have multiple words in a particular name field, e.g., “Mary Beth” as a given name, or “van Berg” as a surname.
    • Some languages, such as Spanish, have two surnames (where each can be composed of multiple words).
    • The ordering of name fields can be different across languages, as well as the spacing (or lack thereof) and punctuation.
    • Name formatting needs to be adapted to different circumstances, such as shorter or longer presentation, formal or informal context, talking about someone versus talking to someone, or use as a monogram (JFK).
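The considerations above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not CLDR's data model or formatting algorithm: the `PersonName` fields, the `format_name` function, and the `style` and `surname_first` parameters are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class PersonName:
    # Every field is optional, and each field may hold several words.
    given: str = ""
    given2: str = ""     # additional given or middle name
    surname: str = ""
    surname2: str = ""   # second surname, as in many Spanish names


def format_name(name, style="long", surname_first=False):
    """Format a name; style is 'long', 'short', or 'monogram' (hypothetical options)."""
    if style == "monogram":
        # One initial per populated field, not per word.
        fields = (name.given, name.given2, name.surname)
        return "".join(f[0].upper() for f in fields if f)
    surnames = " ".join(p for p in (name.surname, name.surname2) if p)
    if style == "short":
        # Prefer the given name alone; fall back to the surname(s).
        return name.given or surnames
    givens = " ".join(p for p in (name.given, name.given2) if p)
    parts = [surnames, givens] if surname_first else [givens, surnames]
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    print(format_name(PersonName(given="Zendaya")))
    print(format_name(PersonName(given="Mary Beth", surname="van Berg")))
    print(format_name(PersonName("John", "Fitzgerald", "Kennedy"), style="monogram"))
```

Even this small sketch has to handle missing fields, multi-word fields, and ordering, which is why locale-specific data from contributors is needed rather than a single hard-coded pattern.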
Submission of new data opened recently, and is slated to finish on June 22. The new data then enters a vetting phase, where contributors work out which of the supplied data for each field is best. That vetting phase is slated to finish on July 6. A public alpha makes the draft data available around August 17, and the final release targets October 19.

Each new locale starts with a small set of Core data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the next submission cycle. In version 41, the following levels were reached:

Level Languages Locales* Notes
Modern 89 361 Suitable for full UI internationalization
Afrikaans‎, ‎… Čeština‎, ‎… Dansk‎, ‎… Eesti‎, ‎… Filipino‎, ‎… Gaeilge‎, ‎… Hrvatski‎, ‎Indonesia‎, ‎… Jawa‎, ‎Kiswahili‎, ‎Latviešu‎, ‎… Magyar‎, ‎…Nederlands‎, ‎… O‘zbek‎, ‎Polski‎, ‎… Română‎, ‎Slovenčina‎, ‎… Tiếng Việt‎, ‎… Ελληνικά‎, ‎Беларуская‎, ‎… ‎ᏣᎳᎩ‎, ‎ Ქართული‎, ‎Հայերեն‎, ‎עברית‎, ‎اردو‎, … አማርኛ‎, ‎नेपाली‎, … ‎অসমীয়া‎, ‎বাংলা‎, ‎ਪੰਜਾਬੀ‎, ‎ગુજરાતી‎, ‎ଓଡ଼ିଆ‎, ‎தமிழ்‎, ‎తెలుగు‎, ‎ಕನ್ನಡ‎, ‎മലയാളം‎, ‎සිංහල‎, ‎ไทย‎, ‎ລາວ‎, ‎မြန်မာ‎, ‎ខ្មែរ‎, ‎한국어‎, ‎… 日本語‎, ‎…
Moderate 13 32 Suitable for full “document content” internationalization, such as formats in a spreadsheet.
Binisaya, … ‎Èdè Yorùbá, ‎Føroyskt, ‎Igbo, ‎IsiZulu, ‎Kanhgág, ‎Nheẽgatu, ‎Runasimi, ‎Sardu, ‎Shqip, ‎سنڌي, …
Basic 22 21 Suitable for locale selection, such as choice of language in mobile phone settings.
Asturianu, ‎Basa Sunda, ‎Interlingua, ‎Kabuverdianu, ‎Lea Fakatonga, ‎Rumantsch, ‎Te reo Māori, ‎Wolof, ‎Босански (Ћирилица), ‎Татар, ‎Тоҷикӣ, ‎Ўзбекча (Кирил), ‎کٲشُر, ‎कॉशुर (देवनागरी), ‎…, ‎মৈতৈলোন্, ‎ᱥᱟᱱᱛᱟᱲᱤ, ‎粤语 (简体)‎
* Locales are variants for different countries or scripts.

If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.




Wednesday, May 4, 2022

Out of this World: New Astronomy Symbols Approved for the Unicode Standard

Five Trans-Neptunian Objects to Join Character Set

By Deborah Anderson, Chair of Unicode Script Ad Hoc Committee

In January 2022, the Unicode Technical Committee approved five new symbols to be published in Unicode 15.0, which has a projected release date of September 2022. These symbols are based on newly discovered trans-Neptunian objects (TNOs) in the Solar System. The discoveries resulted from research efforts such as those led by astronomer and professor Dr. Michael Brown at the California Institute of Technology (CalTech).

These five objects orbit the Sun at a distance far larger than the major planets. They are currently believed to be large enough to be round, planetary worlds, in a category of objects called “dwarf planets” that also includes Ceres, Pluto, Eris and probably Sedna. The most famous trans-Neptunian object is Pluto, which historically had been considered to be the ninth planet from the Sun, but was reclassified as a dwarf planet in 2006 by the International Astronomical Union (IAU).[1]

[Pluto image]

How did this happen?

Individuals or organizations who want to propose new characters have to check existing characters to avoid duplicates, find out if there are equivalent forms already in existence, and most critically, determine the need for a digital interchange of them, such as symbols that have been encoded for use by NASA and other agencies. The proposal authors then must submit a proposal that articulates how their request meets the criteria.

Once a proposal is submitted, the Unicode Technical Committee determines whether to review the proposal and accept or decline it. This process can take a couple of years or more. In the case of these five characters, the proposers demonstrated the need, clearing the path for approval. 

Tell me more about these new characters. What are their names?

The International Astronomical Union (IAU) has standard conventions for naming objects both within and outside of the solar system. Objects orbiting the Sun outside the orbit of Neptune are named after mythological figures, particularly those associated with creation. But the subset that orbit in a two-to-three resonance with Neptune — the so-called “plutinos”, such as Pluto and Orcus — are named after figures associated with the underworld. In this case, the five TNOs, ordered by distance from the sun, are named:
  • Orcus: the Etruscan and Roman god of the underworld.
  • Haumea: the Hawaiian goddess of fertility; the telescope used to discover this object is located on Hawaiʻi.
  • Quaoar: an important mythological figure of the Tongva, the indigenous people who originally occupied the land where CalTech is located.
  • Makemake: the creator god of the Rapanui of Easter Island.
  • Gonggong: a destructive Chinese water god.
What information is there on the actual symbols that will be available?

All five symbols were designed by Denis Moskowitz, a software engineer in Massachusetts who had previously designed the Unicode symbol for Sedna. He drew inspiration from existing symbols and the “native name or culture” of the objects’ namesakes [2] to create the characters.

[TNO glyphs image]

Denis explains his inspiration for each symbol below:
  • Orcus: The symbol for Orcus is a combination of the Latin letters “O” and “R”, stylized to resemble a skull and an orca’s grin.
  • Haumea: The symbol created for Haumea was a combination and simplification of Hawaiian petroglyphs for “childbirth” and “woman”.
  • Quaoar: The symbol is the Latin letter “Q” with the tail fashioned into the shape of a canoe. The angular shape is intended to reflect Tongva rock art.
  • Makemake: The Makemake symbol is a traditional petroglyph of the face of the creator god Makemake, stylized to suggest an “M”. The design was a collaboration with John T. Whelan.
  • Gonggong: Gonggong’s symbol was based on the first Chinese character in the god’s name, 共 gòng, with a snaky tail replacing the lower section.
What else should we know?

The five symbols supplement a set of other characters for planetary objects that were published in 2018 (Unicode 11.0) and earlier. Two of the newly approved characters appear in a NASA poster. Other people have used the symbols in various media, including tattoos and art. Ultimately, these five new characters will join the 149,181 other characters in the Unicode Standard Version 15.0 and be accessible to anyone, anywhere in the world, who is using a computer or mobile device.

Where can I learn more?
Acknowledgments

Special thanks to Sarah Rivera and Kirk Miller for their contributions to this blog.



Monday, March 28, 2022

The Past and Future of Flag Emoji

Emoji Flags are dead, long live Emoji Flags 🏁 🏁 🏁

By Jennifer Daniel, Unicode Emoji Subcommittee Chair

With Emoji 16.0 submissions open from April 4, 2022 through July 31, 2022, the Unicode Emoji Subcommittee members stand with open arms for your future hair pick, khanda, and pink heart emoji proposals (BTW, if you were planning to prepare proposals for those concepts, we have some good news for you: they are already Emoji 15.0 draft candidates!).

That being said, there is one particular type of emoji for which the Unicode Consortium will no longer accept proposals. Flag emoji of any category.

Flag emoji have always been subject to special criteria due to their open-ended nature, infrequent use, and burden on implementations. Today nine of the ten original flag emoji (described below) are among the twenty most frequently shared flags. (The only outlier is Russia.) The addition of other flags and thousands of valid sequences into the Unicode Standard has not resulted in wider adoption. Flags don’t stand still: they are constantly evolving, and because of their open-ended nature, the addition of one creates exclusivity at the expense of others.

Why do flag emoji exist in the first place?

Well, the shorter, more technical answer is: The country flags use a generative mechanism, and were encoded early on for compatibility reasons.

The longer answer requires a flashback to the 1990’s. KDDI and SoftBank — two Japanese mobile phone carriers — had early emoji sets which included 10 country flags: 🇨🇳 🇩🇪 🇪🇸 🇫🇷 🇬🇧 🇮🇹 🇯🇵 🇰🇷 🇷🇺 🇺🇸¹. A possibly apocryphal explanation is that they were used to denote what to grab for dinner: "American 🇺🇸 or Italian 🇮🇹?" (Such an innocent time in emoji history, pre-hamburger 🍔 emoji). Alas, as Unicode stepped in to create meaningful interoperability between these carrier-specific encodings, they were presented with a problem: why should these 10 countries have flag emoji when others do not?

The original emoji set included ten flags (shown above).
¹ Interestingly, Windows has never supported flag emoji 🔮. So, if you are reading this on a Windows device and flags aren't displaying, simply refer to the image above of the ten original flag emoji.

Various ideas were considered. The Unicode Consortium isn’t in the business of determining what is a country and what isn’t. That’s when the Consortium chose ISO 3166-1 alpha 2 as the source for valid country designations. ISO 3166 is a widely-accepted standard, and this particular mechanism represents each country with 2 letters, such as “US” (for the United States), “FR” (France), or “CN” (China).

It wasn’t a perfect solution, but by allowing the 10 flag emoji (and the rest of the country flags) to be accurately interchanged between DoCoMo, KDDI, SoftBank, Google, Apple, and others, it worked just fine.
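For the technically curious, the generative mechanism for country flags can be sketched in a few lines of Python. Each letter of the two-letter region code is mapped to one of the 26 regional indicator symbols (U+1F1E6 through U+1F1FF), and the resulting pair is displayed as a flag by fonts that support it. The function name `flag_from_region` is made up for this example, and the sketch does not check whether the code is actually an assigned region.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_from_region(code):
    """Map a two-letter region code such as 'US' to a pair of regional indicators."""
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    # Offset each letter from 'A' into the regional indicator range.
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code.upper()
    )


if __name__ == "__main__":
    for region in ("US", "FR", "CN"):
        print(region, flag_from_region(region))
```

Because the mapping is purely mechanical, no new characters are needed when a new region code is assigned; only fonts need to add artwork.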

Why this flag emoji but not that one?

Today, the largest emoji category is flags (Out of only ~3600 emoji, there are over 200 flags!). But, did you know that there are over 5,000 geographically-recognized regions that are also “valid”? These are known as subdivision regions and are based on ISO 3166-2. (These include states in the US, regions in Italy, provinces in Argentina, and so on.)

First, what does “valid” mean to the Unicode Standard? Well, think of it this way. Today, anyone could make a font of 5,000 emoji flags using these sequences. They are valid sequences. They are legit sequences. They won’t break. Any platform, application, or font can implement them. The significant difference here is that valid doesn’t mean they are recommended for implementation.

Back to ISO. ISO groups countries in a more formal way than say FIFA or The Olympics. For example, the four regions of the UK are regularly used in sport but not recognized in ISO 3166-1. In 2016, the Unicode Consortium started looking into solutions to support their inclusion (with the technical feasibility of adding more if needed in the future). This was the impetus for adding a general mechanism to make all ISO 3166-2 codes be valid for flags. However, only three of the 5,000 ISO 3166-2 codes have widely adopted emoji— England, Scotland, and Wales. (Northern Ireland remains in limbo until an “official flag” is formalized).

Flags for England, Scotland, and Wales were included in Emoji 5.0
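Subdivision flags use a different mechanism from country flags: an emoji tag sequence. As a rough Python sketch, the sequence starts with the black flag character (U+1F3F4), spells out the lowercase subdivision code using tag characters (which mirror ASCII at an offset of U+E0000), and ends with the cancel tag (U+E007F). The function name `flag_from_subdivision` is an assumption for this example, and the sketch does not check the code against the actual list of subdivisions.

```python
BLACK_FLAG = "\U0001F3F4"   # WAVING BLACK FLAG, the base of the sequence
TAG_BASE = 0xE0000          # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"   # CANCEL TAG terminates the sequence


def flag_from_subdivision(code):
    """Map a subdivision code such as 'gbeng' or 'GB-ENG' to an emoji tag sequence."""
    code = code.replace("-", "").lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"expected ASCII letters and digits, got {code!r}")
    tags = "".join(chr(TAG_BASE + ord(c)) for c in code)
    return BLACK_FLAG + tags + CANCEL_TAG


if __name__ == "__main__":
    for sub in ("gbeng", "gbsct", "gbwls"):
        seq = flag_from_subdivision(sub)
        print(sub, seq, [f"U+{ord(ch):04X}" for ch in seq])
```

On a platform without artwork for a given sequence, the result typically falls back to a plain black flag, which is part of why valid does not mean recommended.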

So, with so many “valid sequences” why hasn’t anyone taken advantage of this sweet sweet rich flag opportunity?

At the time, in 2016, adding a few flags seemed reasonable but in retrospect was short-sighted. If the Emoji Subcommittee recommends the addition of a Catalonia flag emoji, then it looks like favoritism unless all the other subdivisions of Spain are added. And if those are added, what about the subdivisions of Japan or Namibia, or the Cantons of Switzerland? The inclusion of new flags will always continue to emphasize the exclusion of others. And there isn’t much room for the fluid nature of politics: countries change but Unicode additions are forever, because once a character is added it can never be removed. (That being said, font designers can always update the designs as regimes change).

What happens when a country changes the design of its flag? 

Once Unicode designates a codepoint for a flag, Unicode is not part of the process for how the final codepoints look.
 
However, sometimes flags get redesigned and people have questions about what happens next. Whenever there is a change that requires an update to an existing flag emoji, the update can happen once a new design is officially recognized. This was the case in 2023, for example, when the Martinique assembly voted for and adopted the official flag.
 
The implementation of the new design takes some time and doesn’t happen immediately. Given the complexity of flag designs, artwork provided by an official representative is the safest way to ensure codepoints are accurately representing the country. Then, it can be deployed across various platforms.

How are flag emoji used?

Flags are very specific in what they mean, and they don’t represent concepts used multiple times a day or even multiple times a year. You could say flag emoji have transcended the messaging experience and are primarily found in more auto-biographical contexts. (Like your TikTok bio. Or, maybe you add a flag to your username on Twitter.) But, even then flags are not as commonly found in biographical spaces as you may expect. (The top five emoji found in Twitter bios? ❤️✨💙💜💛.)

Despite being the largest emoji category with a strong association tied to identity, flags are by far the least used. (There are exceptions: usage of the rainbow flag is above median!) That begs the question, “So, why not encode more identity flags?” Well, we have seen the same results for flags as we have seen for other emoji: a very long tail of rarely used options. They also tend to change over time! In the six years since a Pride Flag was added to the Unicode Standard (2016), it has already been redesigned. Many times. Identities are fluid and unstoppable, which makes them incompatible with a formal, unchanging, universal character set.


Why does usage matter in selecting emoji?

Any emoji additions have to take into consideration usage frequency, trade-offs with other choices, font file size, and the burden on developers (and users!) to make it easier to send and receive emoji. That’s why the Emoji Subcommittee set out to reduce the number of emoji we encode in any given year. Flags are also super hard to discern at emoji sizes: it’s quite easy to send a different flag than you intended (and with each additional flag the problem gets worse). The simple truth is that if more people used flags then there would be more of an argument to encode them. The Unicode Standard subset is just not a viable solution here for implementers or users. Fortunately, there are seemingly infinite other ways to exchange images of flags that are more flexible and decentralized, such as stickers, gifs, and image attachments.

What is Unicode doing about it?

We realize closing this door may come as a disappointment; after all, flags often serve as a rallying cry to be seen, heard, recognized, and understood.

The Internet is a different place now than it was in the 90’s — the distribution of imagery online is unstoppable! Given how flags are commonly used this is a reasonable path forward: If you care to denote your affiliation with a region be it geographic, political, or identity (or all three) you can add a flag to your avatar image, share videos, or send a gif or sticker to razz your friend during a sports game (and of course there is always ⚽ ⚽ ⚽ ⚽ ⚽).


The more emoji can operate as building blocks, the more versatile, fluid, and useful they become! Rather than relying on Unicode to add new emoji for every concept under the Sun (this is simply not attainable) the citizens of the world have proven to be infinitely creative and fluid: often using existing emoji like the colored hearts (❤️️ 🧡 💛 💚 💙 💜 🤎 🖤 🤍) to express themselves. Hearts are among the most frequently used type of emoji and the nine colored hearts are often juxtaposed next to each other to denote markers of emotion (“I’m sorry 💙” or “love you ❤️”) and identity or affiliation that are not represented with atomic emoji in the Unicode Standard (ex. “Pan African pride ❤️️💚🖤”, “Hi I’m bi 💖💙💜”, and yes even sports teams “Go Mets! 💙🧡” ).

With this in mind, the Emoji Subcommittee has put forth a strategy to add a pink heart, a light blue heart, and a gray heart to the Unicode Standard. These are colors commonly found in gender flags (gender fluid pride flag), sexuality flags (bisexual pride flag), in sports team colors (Go Spurs!) and even some regional flags (Brussels). As of this year, these three heart emoji advanced as draft candidates, and you can expect them to land on your device of choice sometime next year.

In some ways we have returned to where we first started: Adding three new emoji to support a seemingly infinite number of concepts. This time if it fails, at least we’ll be left with lots of heart emoji that have multiple uses. ❤️🧡💛💚💙💜🤎🖤🤍



In light of this change, we’d like to clarify a few additional frequently asked questions with regard to emoji flags:

Wait, if a country gains independence and is recognised by ISO, does that mean no flag emoji for them?
Flags for countries with Unicode region codes are automatically recommended, with no proposals necessary! First their codes and translated names are added to Unicode’s Common Locale Data Repository [CLDR], and then the emoji become valid in the next version of Unicode. These emoji are also automatically recommended for general interchange and wide deployment.

What about flags that change designs for geopolitical reasons?
Unicode does not specify the appearance of flag emoji. It is the responsibility of font designers to update their fonts as politics change. For example, no Unicode changes were required for the flag of Mauritania: https://emojipedia.org/flag-mauritania/

My region was assigned a 3166-2 code. Do we have to submit a proposal?
No, the Emoji Subcommittee is no longer taking in any proposals for flags of any kind.

As a recent example, Kurdistan (a subdivision of Iraq) became an official subdivision in ISO 3166-2 (IQ-KR) on May 3, 2021. The corresponding Unicode subdivision code (iqkr) is slated for release in CLDR v41 on Apr 6, 2022. At that point the flag for Kurdistan will officially be valid — any platform, app, or font could support it. But that doesn’t mean it automatically gets in the queue for everyone’s phone. Only countries with ISO 3166-1 region codes are automatically recommended and require no proposal to move forward.
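The difference between a flag sequence that merely exists and one that is recommended can be illustrated with a short Python sketch. It decodes an emoji tag sequence back into its subdivision code and then looks the code up in a small set. Everything named here is an assumption for illustration: the `RECOMMENDED` set simply lists the three subdivision flags this article describes as widely adopted, and a real implementation would consult the published Unicode emoji and CLDR data files instead, including a check that the code is a known subdivision.

```python
BLACK_FLAG = "\U0001F3F4"   # base character of a subdivision flag sequence
TAG_BASE = 0xE0000          # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"   # terminates the sequence

# Illustrative only: England, Scotland, and Wales.
RECOMMENDED = {"gbeng", "gbsct", "gbwls"}


def subdivision_of(sequence):
    """Return the subdivision code spelled by a flag tag sequence, or None."""
    if not (sequence.startswith(BLACK_FLAG) and sequence.endswith(CANCEL_TAG)):
        return None
    tags = sequence[1:-1]
    # Every character between the flag and the cancel tag must be a tag character.
    if not tags or any(not (0xE0020 <= ord(t) <= 0xE007E) for t in tags):
        return None
    code = "".join(chr(ord(t) - TAG_BASE) for t in tags)
    return code if code.isalnum() and code == code.lower() else None


def classify(sequence):
    """Label a sequence as recommended, well-formed only, or not a flag sequence."""
    code = subdivision_of(sequence)
    if code is None:
        return "not a flag tag sequence"
    return "recommended" if code in RECOMMENDED else "well-formed only"


if __name__ == "__main__":
    kurdistan = BLACK_FLAG + "".join(chr(TAG_BASE + ord(c)) for c in "iqkr") + CANCEL_TAG
    print(subdivision_of(kurdistan), "->", classify(kurdistan))
```

In this sketch the Kurdistan sequence decodes cleanly to `iqkr` but is not in the recommended set, which mirrors the situation described above: platforms may support it, but nothing obliges them to.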

So what warrants an ISO 3166-1 assignment vs ISO 3166-2?
ISO 3166-1 is for countries recognized by the United Nations and ISO 3166-2 is for parts of countries.

Why is Antarctica part of ISO 3166-1 but Africa isn’t? There seems to be no rational explanation with regard to why islands with no inhabitants have a flag while regions with millions of people have no emoji flag.
It’s true, there are "Exceptional reservations." Antarctica has an ISO 3166-1 alpha 2 code: AQ. But WHY does it have an ISO 3166-1 code? Because ISO 3166 decided to (ages ago) include it, probably since the whole continent is "shared."

For historical reasons, you may see other exceptions like 🇦🇨 AC Ascension Island, 🇨🇵 CP Clipperton Island, or 🇩🇬 DG Diego Garcia.

Why don’t we have asexual, bisexual, pansexual, and non-binary pride flags? And if 🏴󠁧󠁢󠁷󠁬󠁳󠁿 and 🏴󠁧󠁢󠁳󠁣󠁴󠁿 get Unicode flags, surely there’s room for the Aboriginal and Torres Strait Islander flags?
Before diving into the facts of why these flags are not part of the universal character set, we want to first take a moment to consider what people mean when they ask these questions and what Unicode means when they decline these flag proposals. Because this question is not one we take lightly. In the course of world history, groups have used flags as a rallying cry to be seen, heard, recognized, and understood. In the Unicode Consortium’s mission to digitize the world’s languages, improve communication online, and achieve meaningful interoperability between platforms, the requests for flags have become a lightning rod for these rallying cries.

When people ask for a new flag emoji, we recognize that the underlying request is about more than simply a new emoji. And when we say, “We aren’t adding more flags,” we are only saying changing the Unicode Standard is not an effective mechanism for this recognition.

What if I submit a proposal for a flag despite this policy?
Your proposal will not be processed.

Relevant docs/Further Reading
https://www.unicode.org/L2/L2021/21128-esc-recs.pdf
https://www.unicode.org/L2/L2021/21167.htm
https://www.unicode.org/L2/L2021/21172-esc-recs.pdf
https://www.unicode.org/emoji/proposals.html#Flags
http://www.unicode.org/L2/L2019/19084-trans-flag.pdf

___________________________________________

This article was updated on Dec 12, 2024 to clarify how, when, and why flag emoji design can change.

Wednesday, November 10, 2021

ICU4X 0.4 Released

[ICU logo] Unicode® ICU4X 0.4 has just been released. This revision brings an implementation of Unicode Properties, major performance and memory improvements for DateTimeFormat, and extends the data provider data loading models with BlobDataProvider.

ICU4X 0.4 also adds initial time zone support in DateTimeFormat, week of month/year, iteration APIs in Segmenter, and an experimental ListFormatter.

The ICU4X team is shifting to work on the 0.5 release in accordance with the roadmap and a product requirements document setting sights on a stable 1.0 release in Q2 2022.

ICU4X aims to develop a highly modular set of internationalization components for resource-constrained environments, portable across programming languages.

Multiple early adopters use ICU4X in pre-release software in Rust, C, C++, and WebAssembly. The team is ready to onboard additional early adopters to refine the APIs, build processes, and feature sets before the 1.0 release. The team is also looking for contributors to write code generation for additional target programming languages. For more information, please open a discussion on the ICU4X GitHub.

For details, please see the changelog.

