Unicode Frequently Asked Questions

Display of Unsupported Characters

Q: How should characters be displayed if the rendering system doesn’t fully support them?

There are three main options, depending on the type of character involved. Some should not display at all (zero-width invisible characters); some should display as a visible (but blank) space; and some should be displayed with one or more generic glyphs, often referred to as “missing glyphs” or a “.notdef glyph”.

Two implementations of generic glyphs are worth calling out. One is the Last Resort Font, in which the generic glyphs are specific to the script, making it easier to identify which font resources might need to be installed to support the characters. The other common implementation displays the character’s hex code instead of a non-specific glyph.
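The hex-code approach can be sketched as follows. This is a minimal illustration, not a description of any particular renderer: the function names and the two-row box layout are assumptions made for the example.

```python
def hex_label(ch: str) -> str:
    """Return the code point of ch as upper-case hex, at least four digits."""
    return f"{ord(ch):04X}"


def hex_box_rows(ch: str) -> list[str]:
    """Split the hex digits into two equal rows for a boxed fallback glyph.

    The two-row layout is an assumption for this sketch; an odd digit
    count is padded with a leading zero so that both rows match.
    """
    digits = hex_label(ch)
    digits = digits.zfill(len(digits) + len(digits) % 2)
    half = len(digits) // 2
    return [digits[:half], digits[half:]]


if __name__ == "__main__":
    for ch in ("A", "\u0915", "\U0001F600"):
        print(hex_label(ch), hex_box_rows(ch))
```

A renderer following this approach would draw the rows inside a box sized to the font's line height; the drawing step is left out here.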

Q: Which characters should be displayed as a visible but blank space?

This is the easy one: all the characters that have the White_Space property, also generically known as “whitespace characters”. This set includes SPACE, of course, but also such characters as the tab control character, NO-BREAK SPACE, LINE SEPARATOR, and so on. For the full list, see the White_Space values in PropList.txt.
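As a rough sketch, the White_Space test can be written as a range lookup. The table below was transcribed by hand from PropList.txt and may lag the current version of that file, which remains the authoritative source; the names WHITE_SPACE_RANGES and is_white_space are made up for this example.

```python
# White_Space code points as inclusive (first, last) ranges.
# Hand-transcribed from PropList.txt; check the data file before relying on it.
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D),  # tab, line feed, vertical tab, form feed, carriage return
    (0x0020, 0x0020),  # SPACE
    (0x0085, 0x0085),  # NEXT LINE control
    (0x00A0, 0x00A0),  # NO-BREAK SPACE
    (0x1680, 0x1680),  # OGHAM SPACE MARK
    (0x2000, 0x200A),  # EN QUAD .. HAIR SPACE
    (0x2028, 0x2029),  # LINE SEPARATOR, PARAGRAPH SEPARATOR
    (0x202F, 0x202F),  # NARROW NO-BREAK SPACE
    (0x205F, 0x205F),  # MEDIUM MATHEMATICAL SPACE
    (0x3000, 0x3000),  # IDEOGRAPHIC SPACE
]


def is_white_space(ch: str) -> bool:
    """Return True if ch falls in one of the White_Space ranges above."""
    cp = ord(ch)
    return any(first <= cp <= last for first, last in WHITE_SPACE_RANGES)


if __name__ == "__main__":
    for ch in (" ", "\t", "\u00a0", "\u2028", "\u200b", "A"):
        print(f"U+{ord(ch):04X}", is_white_space(ch))
```

Python's built-in str.isspace() follows a different definition (for example, it also accepts U+001C..U+001F), so it is not a drop-in substitute for this property.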

Q: Which characters should be displayed as invisible, if not supported?

All default ignorable characters should be rendered as completely invisible (and non-advancing, i.e. “zero width”) if not explicitly supported in rendering.

For the full list, see the Default_Ignorable_Code_Point values in DerivedCoreProperties.txt. Note that no White_Space characters have the Default_Ignorable_Code_Point property.
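A membership test for default ignorable code points might look like the sketch below. The ranges were transcribed by hand from DerivedCoreProperties.txt, with adjacent ranges merged, and may not match the current version of the file; treat the table and the names DEFAULT_IGNORABLE_RANGES and is_default_ignorable as illustrative assumptions.

```python
# Default_Ignorable_Code_Point as inclusive (first, last) ranges.
# Hand-transcribed and merged; DerivedCoreProperties.txt is authoritative.
DEFAULT_IGNORABLE_RANGES = [
    (0x00AD, 0x00AD),    # SOFT HYPHEN
    (0x034F, 0x034F),    # COMBINING GRAPHEME JOINER
    (0x061C, 0x061C),    # ARABIC LETTER MARK
    (0x115F, 0x1160),    # Hangul choseong and jungseong fillers
    (0x17B4, 0x17B5),    # Khmer inherent vowels
    (0x180B, 0x180F),    # Mongolian variation selectors, vowel separator
    (0x200B, 0x200F),    # zero width space, joiners, directional marks
    (0x202A, 0x202E),    # bidirectional embedding and override controls
    (0x2060, 0x206F),    # word joiner, invisible operators, isolates
    (0x3164, 0x3164),    # HANGUL FILLER
    (0xFE00, 0xFE0F),    # variation selectors 1 to 16
    (0xFEFF, 0xFEFF),    # ZERO WIDTH NO-BREAK SPACE
    (0xFFA0, 0xFFA0),    # HALFWIDTH HANGUL FILLER
    (0xFFF0, 0xFFF8),    # reserved
    (0x1BCA0, 0x1BCA3),  # shorthand format controls
    (0x1D173, 0x1D17A),  # musical symbol format controls
    (0xE0000, 0xE0FFF),  # tags, variation selectors 17 to 256, reserved
]


def is_default_ignorable(ch: str) -> bool:
    """Return True if ch falls in one of the ranges above."""
    cp = ord(ch)
    return any(first <= cp <= last for first, last in DEFAULT_IGNORABLE_RANGES)


if __name__ == "__main__":
    for ch in ("\u00ad", "\u200d", "\ufe0f", "\U000E0100", " ", "A"):
        print(f"U+{ord(ch):04X}", is_default_ignorable(ch))
```

Note that some of these ranges include unassigned code points, which are reserved so that future invisible characters also default to invisible display.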

Q: What is the intended display for variation selector sequences (including unsupported ones)?

The expected rendering behavior for the sequence of character plus a variation selector (C+VS) is specified as follows:

If C is unsupported, see the question “How should characters be displayed if the rendering system doesn’t fully support them?”

A VS sequence may also be part of a grapheme cluster, such as an emoji sequence. See UTS #51 Unicode Emoji for more details about emoji display.

Q: Which characters should be displayed with a missing glyph, if not supported?

All characters other than whitespace and default ignorable characters.
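Taken together, the three options amount to a short decision procedure. The sketch below is one way to express it, not a prescribed algorithm: the Fallback enum, choose_fallback, and the demo predicates are all hypothetical names. The property tests are passed in as callables so that a real implementation can back them with the font's character map and the Unicode data files.

```python
from enum import Enum
from typing import Callable

Predicate = Callable[[str], bool]


class Fallback(Enum):
    NORMAL = "use the font's own glyph"
    BLANK_SPACE = "visible but blank space"
    INVISIBLE = "zero width, nothing drawn"
    MISSING_GLYPH = "generic missing glyph"


def choose_fallback(ch: str, supported: Predicate,
                    is_white_space: Predicate,
                    is_default_ignorable: Predicate) -> Fallback:
    """Pick a display strategy for ch using the three options in this FAQ."""
    if supported(ch):
        return Fallback.NORMAL
    if is_white_space(ch):
        return Fallback.BLANK_SPACE
    if is_default_ignorable(ch):
        return Fallback.INVISIBLE
    return Fallback.MISSING_GLYPH


# Stand-in predicates for the demo only; the tiny sets are NOT the real
# property values, and "supported" pretends the font covers only ASCII.
def demo_supported(ch: str) -> bool:
    return ord(ch) < 0x80


def demo_white_space(ch: str) -> bool:
    return ch in " \t\u00a0\u3000"


def demo_default_ignorable(ch: str) -> bool:
    return ch in "\u00ad\u200b\ufe0f"


if __name__ == "__main__":
    for ch in ("A", "\u3000", "\ufe0f", "\u4e2d"):
        kind = choose_fallback(ch, demo_supported, demo_white_space,
                               demo_default_ignorable)
        print(f"U+{ord(ch):04X}", kind.name)
```

Because no White_Space character is also default ignorable, the order of the two middle checks does not change the result.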

Note that recommended practice is to provide different missing glyphs for different types of characters, so that the user gets some indication of what kind of character is missing a glyph. For more information, see the text under “Interpretable but Unrenderable Characters” in Section 5.3, Unknown and Missing Characters, of the Unicode Standard, and see the Last Resort Font.

Q: How does the recommendation not to give any visible display for a subset of default ignorable code points affect font design?

Fonts are best viewed in the context of a whole rendering system, since other parts of that system may handle various aspects of rendering. When a font is designed for a rendering system that does not itself handle invisible characters (such as variation selectors), the best glyph for them, in the absence of other support, is a zero-width invisible glyph.

Q: When would a font ever contain glyphs for invisible characters?

Rendering systems may support special display modes such as “Display Hidden”, which are intended to reveal characters that would not otherwise display. Fonts intended for this purpose would contain visible glyphs for default ignorable code points, which would otherwise be rendered invisibly when not supported.