Talk:Unicode block

Computing Mid-importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Mid-importance on the project's importance scale.

Typography Mid-importance

This article is within the scope of WikiProject Typography, a collaborative effort to improve the coverage of articles related to Typography on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Mid-importance on the importance scale.

Writing systems Mid-importance

This article falls within the scope of WikiProject Writing systems, a WikiProject interested in improving the encyclopaedic coverage and content of articles relating to writing systems on Wikipedia. If you would like to help out, you are welcome to drop by the project page and/or leave a query at the project’s talk page.
This article has been rated as Mid-importance on the project's importance scale.

Proposed moves of unicode block articles

The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.

The result of the move request was: Not moved. No consensus to move any of these because it's unnecessary disambiguation. There may be consensus to move the most generic sounding ones, but probably only if we have a better destination, either a dab page or as primary redirect to another article, for the generic name. Those can be proposed separately, as I don't see consensus for anything specific like that here. (non-admin closure) В²C ☎ 01:08, 10 May 2019 (UTC)

Jorge Stolfi (talk) 22:01, 1 May 2019 (UTC)

– A little more than half of the articles about Unicode character blocks have the qualifier "(Unicode block)"; the others (listed above) do not. I propose renaming the latter so that all of them do. Here are some reasons:

The current names are not descriptive of the subjects. A name like "Control Pictures", "Geometric Shapes", "Mathematical Operators", "Greek and Coptic" or even "Latin Extended-B" gives no clue that the topic of the article is a section of a specific computer character set.
The current names are not specific to the subject of the article. Even names like "Greek Extended" or "Miscellaneous Symbols", that one could infer are about character encodings, could apply to other character sets, besides Unicode, that were in use in the rather recent past, and would still deserve articles of their own.
The naming of the Unicode blocks is not consistent. The fact that only half of the articles have the "(Unicode block)" qualifier causes difficulties for editors and potential confusion for readers. For example, an editor intending to link to an article on the history and use of the Braille system may link to Braille Patterns instead by mistake.
The articles violate the Wikipedia standards for titles. Many of the unqualified Unicode block articles are in the plural, have unnecessary capitalizations, or violate the standards in other ways. For example, to satisfy the standards the article Mathematical Operators should be named Mathematical operator. But of course that is not the name of the Unicode block; and it is the name of an article with a very different subject. Adding the qualifier "(Unicode block)" would satisfy the naming standards, besides avoiding confusion.
Those topics are not very notable and are only of specialized and ephemeral technical interest. The division of the Unicode character space into blocks is mostly an artifact of the way the Unicode Consortium discusses, approves, and documents proposals to include characters. It has only tenuous (and often very questionable) connections to the history, usage, or semantics of those characters.
The division is relevant only to those who are interested in the history of Unicode, or who intend to propose new symbols for it.
The division is not relevant to users of Unicode. On the contrary, to find the Unicode for a desired glyph, like a special math symbol or a letter with a certain modifier, one should ignore the block division and use Google or some other generic search tool -- because one cannot tell which block that symbol has been put into.
The division is not even useful to font designers. While at some point one would find computer fonts that were limited to one or two specific blocks, that has never been a rule, and fonts are increasingly cutting across the Unicode block boundaries.

Apparently the names above were assigned without the "(Unicode block)" qualifier because it was felt to be unnecessary, since there was no other page in Wikipedia with that name. But that is not what "unnecessary" means. Most of the names above have a common-sense meaning that has nothing to do with Unicode, so a qualifier is necessary to differentiate them from those common meanings. If you say "Geometric Patterns", "Number Forms", or "Greek and Coptic" to someone, even to a computer expert, the last thing she will think of is the Unicode block of that name.

Initially the moves will create a redirect from each unqualified name to the corresponding qualified name. I will try to replace all uses of the former with the latter. In some cases, like "Tai Viet" or "Mathematical Operators", the redirect is inappropriate or pointless, in which case it will be deleted or redirected to a more appropriate article. Note that if one types "Tai Viet" into the search window, the "(Unicode block)" article will be listed anyway as one of the suggested alternatives.

There are also half a dozen cases where the Unicode block article was merged into an article about a language, script, or typography topic:

Phoenician (Unicode block) • Taixuanjing • Kangxi radical • Dingbat • Halfwidth and fullwidth forms

These merges should be undone, since the article about the Unicode block is supposed to have a lengthy section documenting the history of the block and the relevant Unicode Consortium publications, and that material does not belong in the articles above.

For reference, the following articles are already named with the qualifier:

Adlam (Unicode block) • Aegean Numbers (Unicode block) • Ahom (Unicode block) • Alchemical Symbols (Unicode block) • Anatolian Hieroglyphs (Unicode block) • Ancient Greek Numbers (Unicode block) • Arabic (Unicode block) • Armenian (Unicode block) • Arrows (Unicode block) • Avestan (Unicode block) • Balinese (Unicode block) • Bamum (Unicode block) • Basic Latin (Unicode block) • Bassa Vah (Unicode block) • Batak (Unicode block) • Bengali (Unicode block) • Bhaiksuki (Unicode block) • Bopomofo (Unicode block) • Brahmi (Unicode block) • Buginese (Unicode block) • Buhid (Unicode block) • CJK Strokes (Unicode block) • CJK Unified Ideographs (Unicode block) • Carian (Unicode block) • Caucasian Albanian (Unicode block) • Chakma (Unicode block) • Cham (Unicode block) • Cherokee (Unicode block) • Chess Symbols (Unicode block) • Coptic (Unicode block) • Cuneiform (Unicode block) • Currency Symbols (Unicode block) • Cypriot Syllabary (Unicode block) • Cyrillic (Unicode block) • Deseret (Unicode block) • Devanagari (Unicode block) • Dogra (Unicode block) • Duployan (Unicode block) • Egyptian Hieroglyphs (Unicode block) • Elbasan (Unicode block) • Elymaic (Unicode block) • Emoticons (Unicode block) • Ethiopic (Unicode block) • Georgian (Unicode block) • Glagolitic (Unicode block) • Gothic (Unicode block) • Grantha (Unicode block) • Gujarati (Unicode block) • Gunjala Gondi (Unicode block) • Gurmukhi (Unicode block) • Hangul Jamo (Unicode block) • Hanifi Rohingya (Unicode block) • Hanunoo (Unicode block) • Hatran (Unicode block) • Hebrew (Unicode block) • Hiragana (Unicode block) • Ideographic Description Characters (Unicode block) • Imperial Aramaic (Unicode block) • Indic Siyaq Numbers (Unicode block) • Inscriptional Pahlavi (Unicode block) • Inscriptional Parthian (Unicode block) • Javanese (Unicode block) • Kaithi (Unicode block) • Kanbun (Unicode block) • Kannada (Unicode block) • Katakana (Unicode block) • Kayah Li (Unicode block) • Kharoshthi (Unicode block) • Khmer 
(Unicode block) • Khojki (Unicode block) • Khudawadi (Unicode block) • Lao (Unicode block) • Latin-1 Supplement (Unicode block) • Lepcha (Unicode block) • Limbu (Unicode block) • Linear A (Unicode block) • Lisu (Unicode block) • Lycian (Unicode block) • Lydian (Unicode block) • Mahajani (Unicode block) • Mahjong Tiles (Unicode block) • Makasar (Unicode block) • Malayalam (Unicode block) • Mandaic (Unicode block) • Manichaean (Unicode block) • Marchen (Unicode block) • Masaram Gondi (Unicode block) • Mayan Numerals (Unicode block) • Medefaidrin (Unicode block) • Meetei Mayek (Unicode block) • Mende Kikakui (Unicode block) • Meroitic Cursive (Unicode block) • Meroitic Hieroglyphs (Unicode block) • Miao (Unicode block) • Modi (Unicode block) • Mongolian (Unicode block) • Mro (Unicode block) • Multani (Unicode block) • Musical Symbols (Unicode block) • Myanmar (Unicode block) • NKo (Unicode block) • Nabataean (Unicode block) • Nandinagari (Unicode block) • New Tai Lue (Unicode block) • Newa (Unicode block) • Nushu (Unicode block) • Nyiakeng Puachue Hmong (Unicode block) • Ogham (Unicode block) • Ol Chiki (Unicode block) • Old Hungarian (Unicode block) • Old Italic (Unicode block) • Old North Arabian (Unicode block) • Old Permic (Unicode block) • Old Persian (Unicode block) • Old Sogdian (Unicode block) • Old South Arabian (Unicode block) • Old Turkic (Unicode block) • Optical Character Recognition (Unicode block) • Oriya (Unicode block) • Osage (Unicode block) • Osmanya (Unicode block) • Ottoman Siyaq Numbers (Unicode block) • Pahawh Hmong (Unicode block) • Palmyrene (Unicode block) • Pau Cin Hau (Unicode block) • Phags-pa (Unicode block) • Phaistos Disc (Unicode block) • Psalter Pahlavi (Unicode block) • Rejang (Unicode block) • Runic (Unicode block) • Samaritan (Unicode block) • Saurashtra (Unicode block) • Sharada (Unicode block) • Shavian (Unicode block) • Siddham (Unicode block) • Sogdian (Unicode block) • Sora Sompeng (Unicode block) • Soyombo (Unicode block) • 
Specials (Unicode block) • Sundanese (Unicode block) • Superscripts and Subscripts (Unicode block) • Sutton SignWriting (Unicode block) • Syloti Nagri (Unicode block) • Syriac (Unicode block) • Tagalog (Unicode block) • Tagbanwa (Unicode block) • Tags (Unicode block) • Tai Le (Unicode block) • Tai Tham (Unicode block) • Takri (Unicode block) • Tamil (Unicode block) • Tangut (Unicode block) • Telugu (Unicode block) • Thaana (Unicode block) • Thai (Unicode block) • Tibetan (Unicode block) • Tifinagh (Unicode block) • Tirhuta (Unicode block) • Ugaritic (Unicode block) • Vai (Unicode block) • Variation Selectors (Unicode block) • Wancho (Unicode block) • Warang Citi (Unicode block) • Zanabazar Square (Unicode block)

Jorge Stolfi (talk) 22:01, 1 May 2019 (UTC)

Comment. I think you're on the right track, but taken individually, I would say that while some clearly make sense (e.g. Mathematical Operators), others are a bit questionable (e.g. CJK Unified Ideographs Extension A), and the only justification for adding the parenthetical disambiguator in that case would be for WP:CONSISTENCY. I normally don't favor unnecessary disambiguation, especially not unnecessary parenthetical disambiguation, but I recognize in rare circumstances it can be a valid solution and a general rule like this is not without precedent, cf. Wikipedia:Naming conventions (UK Parliament constituencies). I will wait to see what others have to say about this proposal. -- King of ♥ ♦ ♣ ♠ 02:58, 2 May 2019 (UTC)

Agree. Makes sense, and I think this qualifies as a circumstance for which we can have a specific naming convention. Rajanala Samyak (talk) 01:52, 3 May 2019 (UTC)

Oppose moves per User:King of Hearts except for a few that could legitimately be ambiguous and/or that should direct to another article (Yi Radicals?, Ancient Greek Musical Notation?). Mass adding of parenthetical disambiguators where there is no ambiguity is against current Wikipedia practice (see WP:PRECISION). Agnostic on the unmerging proposal. — AjaxSmack 01:22, 4 May 2019 (UTC)

In general, Support: a lot of the titles make more sense as redirects to more general coverage, Combining Diacritical Marks to combining character for example, given that combining diacritical marks as a concept are neither limited to that Unicode block nor unique to Unicode (see for example Windows-1258, ANSEL…). Control Pictures as a general concept would make most sense as a section in control character. And so forth. This is similar to the mentioned UK parliament constituencies, which take their name from a region, and said region would in general be the primary topic for the constituency name.

The mentioned example of CJK Unified Ideographs Extension A would potentially be an exception, given that CJK Unified Ideographs is a specifically Unicode concept already: while JIS X 0208 does unify Kanji variants, it's only on the scale of one language; GB18030 (the current GB(K) version) is now based on Unicode; KS C 5601 didn't even unify certain different uses of identical written characters within the one orthography; Unicode's main competitors in pan-CJK encoding such as TRON (encoding) or CCCII eschew glyph form unification altogether. (I'm remaining agnostic on whether "Extension A" or "Supplement" names need the qualifier in general.)

Halfwidth and fullwidth forms would logically take a broad view of the subject both in legacy encodings and in Unicode, possibly merging in (edit: or even moving into place and expanding) half-width kana to be honest. (As a speculative sidenote, the current article looks like it's expanded off from a Unicode block article as a place to provide more well-rounded coverage than just katakana, but that's a guess, I haven't dug through their histories.)

But in general, the mere existence of a Unicode block named with a given exact form of a term does not logically make it the primary topic for that term. {{redirect}} can be used: {{redirect|Combining Diacritical Marks|the Unicode block|Combining Diacritical Marks (Unicode block)}} in the above example. -- HarJIT (talk) 16:49, 4 May 2019 (UTC)

Comment I can agree about "CJK" since the only meaning of the word "CJK" is the Unicode plane. --Jorge Stolfi (talk) 22:46, 5 May 2019 (UTC)

Oppose with the possible exception of the generic-sounding names like “Mathematical Operators”. Here is a point-by-point rebuttal.
1. The current names are not descriptive of the subjects. – That does not matter. Names of things need not be descriptive. For example, Joseph Andrews sounds like a person, but it is about a novel; Abbey Road sounds like a road, but it is about an album; Tarantula hawk sounds like a kind of hawk, but it is about a kind of wasp. These Unicode blocks are no different.
2. The current names are not specific to the subject of the article. – Names like “Mathematical Operators” are indeed ambiguous, if you ignore the capitalization. Specific individual arguments should be made for disambiguating them, not as part of a mass renaming. Names like “Greek Extended” could theoretically have been used for other character encodings, but you would need to demonstrate that each such name actually is used for something else, which you haven’t. Names like “Arabic Presentation Forms-B” could not plausibly refer to anything other than Unicode blocks.
3. The naming of the Unicode blocks is not consistent. – The rule is that the article title includes “(Unicode block)” if and only if the bare title is ambiguous. This rule has been applied consistently. Adding “(Unicode block)” indiscriminately would be inconsistent with the rest of Wikipedia.
4. The articles violate the Wikipedia standards for titles. Many of the unqualified Unicode block articles are in the plural, have unnecessary capitalizations, or violate the standards in other ways. – The rules about plurals and capital letters do not apply to proper nouns. For example, the article about the novel Vile Bodies is at “Vile Bodies”, not “Vile body”. Similarly, Unicode block names are proper nouns: they are names, not mere descriptions. That they are names is clear because of the idiosyncratic style of names like “Arabic Presentation Forms-B”, which is not standard English prose.
5. Those topics are not very notable and are only of specialized and ephemeral technical interest. – In that case, nominate them for deletion. As long as they are considered notable enough to have articles, and as long as there are no other notable topics with the same names, there is no need to disambiguate them, so they should not be disambiguated. Gorobay (talk) 15:32, 5 May 2019 (UTC)

Comment: The examples are not quite convincing. The name of an article about a book like Joseph Andrews must obviously be the title of the book, no matter whether it is descriptive or not. The article about the album is named Abbey Road without the "(album)" only because editors decided that (for now!) 99% of the searches for that name are for the album, not the actual road. However, titles of books, albums, and songs are usually put in italics in English texts precisely because they would otherwise be confusing. They must somehow be marked to make it clear that they are not to be parsed for their English meaning, but taken unparsed as a whole, as the proper name of something.
As for "Tarantula hawk", that is the common English name of the insect (not a proper name), so Wikipedia is not to be blamed if someone somehow thinks that it may be a spider or bird.
But, otherwise, names of articles must be as descriptive of their concepts as possible, especially to readers who are not familiar with the subject but may want to read about it. The name of the article about X should not be how the X-ologists call X, but how a sufficiently broader community refers to X.
Now, the names of the Unicode blocks are neither titles of novels nor common names for the concepts. They are merely the names of sections in the Unicode standard document, not in the character encoding itself. Again, not even a computer expert will think of the Unicode block when reading "IPA Extensions" or "Ornamental Dingbats" or "Mongolian Supplement". In common English texts, those references would have to be qualified as "the Mongolian Supplement block of Unicode".
The rule that "a qualifier should be omitted when there is no other article with that name" should not be followed strictly and blindly. It is only a "testable" approximation to the ideal rule "avoid unnecessary qualifiers". One should take into account also the existence of articles or redirects whose names differ only in subtle details like capitalization or plurals. And it makes sense to consider also articles that don't exist, but should -- like "Rumi Numeral Symbols" -- or redirects that should point to other articles -- like "Linear B Ideograms". The wholesale renaming proposal above would clear the way for plugging such holes in the future, without having to go through a renaming discussion for each individual case. (Note that the plugging of many of those holes depends on editors who are not computer experts and don't have the time or know-how to start such discussions.)
Your suggestion that all Unicode block articles should be eliminated is not that absurd. Unicode is obviously a notable concept, but its division into blocks is merely an internal administrative choice by the Unicode Consortium, that has no relevance whatsoever for the users of the character set. (You may have noticed that the only reason why the block start and length must be multiples of 16 is that the Consortium document typesets the reference forms in tables with 16 columns. They say so themselves.)
Indeed, those Unicode block articles are only another instance of an unfortunate trend, whereby some well-meaning editors decide to create inside Wikipedia a mirror of some arbitrary classification, index, or database that was created by some external agency. That is always a bad idea. Wikipedia should note the existence of such database, and describe it generically -- but then group, split, organize, and rename its contents in whatever way is most appropriate for Wikipedia readers.
In the case of Unicode, logically there should be just one master article that describes the whole set, and sub-articles on logically defined subsets (like "Numerals in Unicode", "Latin-based letters in Unicode", etc.) even if they cut across the Unicode blocks.
But those "Unicode block" articles are here, so let them stand. Still, since they are rather obscure subjects that are not what their names seem to say, it is fair that they get ungainly names -- just as Barrackpore_(Vidhan_Sabha_constituency) should have a qualifier by default, whether the article on the city of Barrackpore exists or not.
--Jorge Stolfi (talk) 23:48, 5 May 2019 (UTC)
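
To illustrate the alignment point raised in the comment above (block starts and lengths being multiples of 16), here is a minimal Python sketch. The ranges are a small hand-copied sample of Unicode block ranges, not the full list, and the names `SAMPLE_BLOCKS`, `is_aligned_to_16`, and `block_of` are hypothetical, introduced only for this illustration. The sketch checks the alignment itself; it says nothing about the reason for it.

```python
# A small hand-copied sample of Unicode block ranges (start, end), inclusive.
# This is an illustrative subset, not the complete block list.
SAMPLE_BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Greek and Coptic": (0x0370, 0x03FF),
    "Mathematical Operators": (0x2200, 0x22FF),
    "Control Pictures": (0x2400, 0x243F),
    "Geometric Shapes": (0x25A0, 0x25FF),
    "Braille Patterns": (0x2800, 0x28FF),
}


def is_aligned_to_16(start, end):
    """Return True if the block starts on a multiple of 16 and its
    length (end - start + 1) is also a multiple of 16."""
    length = end - start + 1
    return start % 16 == 0 and length % 16 == 0


def block_of(code_point, blocks=SAMPLE_BLOCKS):
    """Return the name of the sample block containing code_point,
    or None if it falls outside every block in the sample."""
    for name, (start, end) in blocks.items():
        if start <= code_point <= end:
            return name
    return None


if __name__ == "__main__":
    for name, (start, end) in SAMPLE_BLOCKS.items():
        print(f"{name}: U+{start:04X}..U+{end:04X} aligned={is_aligned_to_16(start, end)}")
    # U+2200 FOR ALL lies in Mathematical Operators.
    print(block_of(0x2200))
```

A lookup such as `block_of(0x03B1)` returns the block for Greek small letter alpha; code points outside the sample simply return `None`.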

Comment: This requested move has been listed at WP:CENT for wider community input. --qedk (t 桜 c) 18:21, 9 May 2019 (UTC)
Oppose unnecessary disambiguation. The article is for explaining what the title is about, it's not the title's job to give that explanation. -- Tavix ^(talk) 18:32, 9 May 2019 (UTC)
Support per nom, WP:NAMINGCRITERIA points 1, 3, 5, and WP:IAR. While I understand we should avoid unnecessary disambiguation, especially parenthetical, the proposal makes good sense on its face. I don't think we should be bending over backwards to satisfy WP:ATDAB when it's not even clear it is against this. There is a very reasonable argument that most of these names are either not precise enough (e.g., Mathematical Operators) or not natural enough (e.g. CJK Unified Ideographs Extension A), and that rather than going through these one by one, we can make the entire naming scheme consistent. The benefits seem to outweigh the small aesthetic cost of a parenthetical dab. Wugapodes [t^hɑk] [ˈkan.ˌʧɹɪbz] 20:32, 9 May 2019 (UTC)
Support per nom and Wugapodes. The proposal is simple and consistent, and Gorobay's rebuttal is not strong enough to convince me that this shouldn't be done. I particularly disagree with their response to point 1. —⁠烏⁠Γ ^(kaw) │ 23:50, 09 May 2019 (UTC)
Oppose It goes against Wikipedia naming practice to add unnecessary parenthetical disambiguation. The block names already have a consistent and easily-understood naming system, which is to use the exact Unicode name (spelling and capitalization), and only add "(Unicode block)" after the name if there is ambiguity. Adding "(Unicode block)" in all cases is unnecessary and in many cases would be so awkward that in future other editors will certainly attempt to remove the unnecessary modifier. BabelStone (talk) 00:11, 10 May 2019 (UTC)

The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.

Change the font stack used in the inline CSS of some Unicode block templates

The following templates have inline CSS written into their HTML using a style attribute:
{{Unicode chart CJK Unified Ideographs Extension E}} and {{Unicode chart CJK Unified Ideographs Extension F}}

The CSS in question is as follows: style="border-collapse:collapse;background:#FFFFFF;font-size:large; text-align:center;font-family: sans-serif, 'Unicode内码天珩输入法配套字体', '方正宋体S-超大字符集', '方正宋体S-超大字符集(SIP)', '文泉驿等宽正黑', 'HanaMinB', 'HanaMinC', 'HanaMinExC', 'BabelStone Han Plain', 'BabelStone Han', 'FZSong-Extended', 'Arial Unicode MS', Code2002, DFSongStd, 'STHeiti SC', unifont, LastResort;"

I request that the following font families be added to the font stack:
TH-Tshyn-P0,TH-Tshyn-P1,TH-Tshyn-P2,TH-Tshyn-P16

TH-Tshyn is a font that supports all the characters in Unicode 13.0; you can read more about it at http://cheonhyeong.com/English.html

P.S. Why do these templates use inline CSS, as opposed to having their own class?

2806:264:4408:8E19:81B0:5A7:6C9F:5933 (talk) 07:30, 14 May 2021 (UTC)

Assuming these are reusing the font list, this would be a good use for template styles (which can all share one template style) - not seeing a need for a site-wide class here though. — xaosflux ^Talk 22:26, 14 May 2021 (UTC)
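
As a rough illustration of the requested change, the following Python sketch adds font families to an existing `font-family` value without duplicating any that are already present. The function names (`split_families`, `normalize`, `extend_stack`) and the `before` parameter are hypothetical, and the shortened stack in the example is an excerpt of the one quoted above, not the full value. Where the new families should sit in the stack is not specified in the request, so the position is left to the caller here.

```python
def split_families(value):
    """Split a CSS font-family value into its comma-separated entries.

    Assumption: no family name contains a comma, which holds for the
    stack quoted in this thread but not for CSS in general.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


def normalize(name):
    """Compare family names without surrounding quotes or case differences."""
    return name.strip().strip("'\"").lower()


def extend_stack(stack, extra, before=None):
    """Return a font-family value with the families in `extra` added.

    Families already present are skipped. If `before` names a family in
    the stack, the new entries go immediately ahead of it; otherwise
    they are appended at the end.
    """
    families = split_families(stack)
    seen = {normalize(f) for f in families}
    new = []
    for name in extra:
        key = normalize(name)
        if key not in seen:
            new.append(name)
            seen.add(key)
    index = len(families)
    if before is not None:
        for i, family in enumerate(families):
            if normalize(family) == normalize(before):
                index = i
                break
    return ", ".join(families[:index] + new + families[index:])


if __name__ == "__main__":
    # Shortened excerpt of the stack quoted above, for readability.
    current = "sans-serif, 'HanaMinB', 'BabelStone Han', unifont, LastResort"
    requested = ["TH-Tshyn-P0", "TH-Tshyn-P1", "TH-Tshyn-P2", "TH-Tshyn-P16"]
    print(extend_stack(current, requested, before="unifont"))
```

If the list ends up in a shared template style as suggested above, a helper of this kind would only need to be run once against the single shared declaration.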

Image previews beside the glyphs

Most/all Unicode block pages and List of Unicode characters have the template:Contains_special_characters, which is good (it can reduce confusion and save time looking for errors), but not everyone can install full fonts on every device. I propose that we add images of the characters to the mentioned articles, either

inside the table cell, next to the glyph,
in a column (for List of Unicode characters),
in a separate table or
one image of a table containing the characters and their U+numbers.

Is there a simple method to get the server to make the images? MediaWiki has a way of converting e.g. <math> to images (probably using LaTeX), but that uses a fancy font, which I'm afraid will be illegible for many symbols.

Once we have a solution and agreement, can someone write a bot to make the edits? --Ziom 2.0 (talk) 12:38, 10 February 2022 (UTC)

Here are my thoughts:

I don't really have an opinion about adding images to List of Unicode characters although I wonder how the page would load with tens of thousands of images on it and how it would be kept up-to-date.
I oppose adding images inside the Unicode block tables/templates for several reasons:

It would be visually confusing.
It would make cut-and-paste of text more difficult.
Usually when new characters are added there isn't an available font to use to create an image.
How would you decide which font to use? Taking CJK Unified Ideographs as an example, would you choose a font for Japan or China or Taiwan for the image? Another example is U+1F52B PISTOL: Are you going to use a font that matches the Unicode chart of a gun or one that matches most phone implementations, which look like a toy water gun?
New Unicode releases sometimes update the example character shapes, making it difficult to keep the images up-to-date.

I also oppose adding an image of the entire block to the Unicode block articles for the same reasons as above, but mainly because every block table starts with a link to the "Official Unicode Consortium code chart (PDF)", which already has a chart that is independent of font support. BTW: I don't think it's fair use to just screenshot Unicode PDF charts to make the Wikipedia images.
I do agree that the lack of good font support is frustrating. DRMcCreedy (talk) 23:01, 10 February 2022 (UTC)
