Talk:Unicode block
This article is rated List-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Proposed moves of unicode block articles
- The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.
The result of the move request was: Not moved. No consensus to move any of these because it's unnecessary disambiguation. There may be consensus to move the most generic sounding ones, but probably only if we have a better destination, either a dab page or as primary redirect to another article, for the generic name. Those can be proposed separately, as I don't see consensus for anything specific like that here. (non-admin closure) В²C ☎ 01:08, 10 May 2019 (UTC)
- Alphabetic Presentation Forms → Alphabetic Presentation Forms (Unicode block)
- Ancient Greek Musical Notation → Ancient Greek Musical Notation (Unicode block)
- Arabic Extended-A → Arabic Extended-A (Unicode block)
- Arabic Mathematical Alphabetic Symbols → Arabic Mathematical Alphabetic Symbols (Unicode block)
- Arabic Presentation Forms-A → Arabic Presentation Forms-A (Unicode block)
- Arabic Supplement → Arabic Supplement (Unicode block)
- Bamum Supplement → Bamum Supplement (Unicode block)
- Block Elements → Block Elements (Unicode block)
- Bopomofo Extended → Bopomofo Extended (Unicode block)
- Braille Patterns → Braille Patterns (Unicode block)
- Byzantine Musical Symbols → Byzantine Musical Symbols (Unicode block)
- CJK Compatibility Forms → CJK Compatibility Forms (Unicode block)
- CJK Compatibility Ideographs Supplement → CJK Compatibility Ideographs Supplement (Unicode block)
- CJK Compatibility Ideographs → CJK Compatibility Ideographs (Unicode block)
- CJK Compatibility → CJK Compatibility (Unicode block)
- CJK Symbols and Punctuation → CJK Symbols and Punctuation (Unicode block)
- CJK Unified Ideographs Extension A → CJK Unified Ideographs Extension A (Unicode block)
- CJK Unified Ideographs Extension B → CJK Unified Ideographs Extension B (Unicode block)
- CJK Unified Ideographs Extension C → CJK Unified Ideographs Extension C (Unicode block)
- CJK Unified Ideographs Extension D → CJK Unified Ideographs Extension D (Unicode block)
- CJK Unified Ideographs Extension E → CJK Unified Ideographs Extension E (Unicode block)
- CJK Unified Ideographs Extension F → CJK Unified Ideographs Extension F (Unicode block)
- Cherokee Supplement → Cherokee Supplement (Unicode block)
- Combining Diacritical Marks Extended → Combining Diacritical Marks Extended (Unicode block)
- Combining Diacritical Marks Supplement → Combining Diacritical Marks Supplement (Unicode block)
- Combining Diacritical Marks for Symbols → Combining Diacritical Marks for Symbols (Unicode block)
- Combining Diacritical Marks → Combining Diacritical Marks (Unicode block)
- Combining Half Marks → Combining Half Marks (Unicode block)
- Common Indic Number Forms → Common Indic Number Forms (Unicode block)
- Control Pictures → Control Pictures (Unicode block)
- Coptic Epact Numbers → Coptic Epact Numbers (Unicode block)
- Cuneiform Numbers and Punctuation → Cuneiform Numbers and Punctuation (Unicode block)
- Cyrillic Extended-A → Cyrillic Extended-A (Unicode block)
- Cyrillic Extended-B → Cyrillic Extended-B (Unicode block)
- Cyrillic Extended-C → Cyrillic Extended-C (Unicode block)
- Cyrillic Supplement → Cyrillic Supplement (Unicode block)
- Devanagari Extended → Devanagari Extended (Unicode block)
- Domino Tiles → Domino Tiles (Unicode block)
- Early Dynastic Cuneiform → Early Dynastic Cuneiform (Unicode block)
- Egyptian Hieroglyph Format Controls → Egyptian Hieroglyph Format Controls (Unicode block)
- Enclosed Alphanumeric Supplement → Enclosed Alphanumeric Supplement (Unicode block)
- Enclosed Alphanumerics → Enclosed Alphanumerics (Unicode block)
- Enclosed CJK Letters and Months → Enclosed CJK Letters and Months (Unicode block)
- Enclosed Ideographic Supplement → Enclosed Ideographic Supplement (Unicode block)
- Ethiopic Extended-A → Ethiopic Extended-A (Unicode block)
- Ethiopic Extended → Ethiopic Extended (Unicode block)
- Ethiopic Supplement → Ethiopic Supplement (Unicode block)
- General Punctuation → General Punctuation (Unicode block)
- Geometric Shapes Extended → Geometric Shapes Extended (Unicode block)
- Geometric Shapes → Geometric Shapes (Unicode block)
- Georgian Extended → Georgian Extended (Unicode block)
- Georgian Supplement → Georgian Supplement (Unicode block)
- Glagolitic Supplement → Glagolitic Supplement (Unicode block)
- Greek Extended → Greek Extended (Unicode block)
- Greek and Coptic → Greek and Coptic (Unicode block)
- Hangul Jamo Extended-A → Hangul Jamo Extended-A (Unicode block)
- Hangul Jamo Extended-B → Hangul Jamo Extended-B (Unicode block)
- Hangul Syllables → Hangul Syllables (Unicode block)
- IPA Extensions → IPA Extensions (Unicode block)
- Ideographic Symbols and Punctuation → Ideographic Symbols and Punctuation (Unicode block)
- Kana Extended-A → Kana Extended-A (Unicode block)
- Kana Supplement → Kana Supplement (Unicode block)
- Katakana Phonetic Extensions → Katakana Phonetic Extensions (Unicode block)
- Khmer Symbols → Khmer Symbols (Unicode block)
- Latin Extended Additional → Latin Extended Additional (Unicode block)
- Latin Extended-A → Latin Extended-A (Unicode block)
- Latin Extended-B → Latin Extended-B (Unicode block)
- Latin Extended-C → Latin Extended-C (Unicode block)
- Latin Extended-D → Latin Extended-D (Unicode block)
- Latin Extended-E → Latin Extended-E (Unicode block)
- Letterlike Symbols → Letterlike Symbols (Unicode block)
- Linear B Ideograms → Linear B Ideograms (Unicode block)
- Linear B Syllabary → Linear B Syllabary (Unicode block)
- Mathematical Alphanumeric Symbols → Mathematical Alphanumeric Symbols (Unicode block)
- Mathematical Operators → Mathematical Operators (Unicode block)
- Meetei Mayek Extensions → Meetei Mayek Extensions (Unicode block)
- Miscellaneous Symbols and Pictographs → Miscellaneous Symbols and Pictographs (Unicode block)
- Miscellaneous Symbols → Miscellaneous Symbols (Unicode block)
- Miscellaneous Technical → Miscellaneous Technical (Unicode block)
- Modifier Tone Letters → Modifier Tone Letters (Unicode block)
- Mongolian Supplement → Mongolian Supplement (Unicode block)
- Myanmar Extended-A → Myanmar Extended-A (Unicode block)
- Myanmar Extended-B → Myanmar Extended-B (Unicode block)
- Number Forms → Number Forms (Unicode block)
- Ornamental Dingbats → Ornamental Dingbats (Unicode block)
- Phonetic Extensions Supplement → Phonetic Extensions Supplement (Unicode block)
- Phonetic Extensions → Phonetic Extensions (Unicode block)
- Rumi Numeral Symbols → Rumi Numeral Symbols (Unicode block)
- Shorthand Format Controls → Shorthand Format Controls (Unicode block)
- Sinhala Archaic Numbers → Sinhala Archaic Numbers (Unicode block)
- Small Kana Extension → Small Kana Extension (Unicode block)
- Spacing Modifier Letters → Spacing Modifier Letters (Unicode block)
- Sundanese Supplement → Sundanese Supplement (Unicode block)
- Supplemental Arrows-C → Supplemental Arrows-C (Unicode block)
- Supplemental Punctuation → Supplemental Punctuation (Unicode block)
- Supplemental Symbols and Pictographs → Supplemental Symbols and Pictographs (Unicode block)
- Symbols and Pictographs Extended-A → Symbols and Pictographs Extended-A (Unicode block)
- Syriac Supplement → Syriac Supplement (Unicode block)
- Tai Viet → Tai Viet (Unicode block)
- Tamil Supplement → Tamil Supplement (Unicode block)
- Tangut Components → Tangut Components (Unicode block)
- Transport and Map Symbols → Transport and Map Symbols (Unicode block)
- Unified Canadian Aboriginal Syllabics Extended → Unified Canadian Aboriginal Syllabics Extended (Unicode block)
- Variation Selectors Supplement → Variation Selectors Supplement (Unicode block)
- Vedic Extensions → Vedic Extensions (Unicode block)
- Vertical Forms → Vertical Forms (Unicode block)
- Yi Radicals → Yi Radicals (Unicode block)
- Yi Syllables → Yi Syllables (Unicode block)
Jorge Stolfi (talk) 22:01, 1 May 2019 (UTC)
– A little more than half of the articles about unicode character blocks have the qualifier "(Unicode block)", the others (listed above) don't. I propose to rename the latter so that all of them do. Here are some reasons:
- The current names are not descriptive of the subjects. A name like "Control Pictures", "Geometric Shapes", "Mathematical Operators", "Greek and Coptic" or even "Latin Extended-B" gives no clue that the topic of the article is a section of a specific computer character set.
- The current names are not specific to the subject of the article. Even names like "Greek Extended" or "Miscellaneous Symbols", that one could infer are about character encodings, could apply to other character sets, besides Unicode, that were in use in the rather recent past, and would still deserve articles of their own.
- The naming of the Unicode blocks is not consistent. The fact that only half of the articles have the "(Unicode block)" qualifier causes difficulties for editors and potential confusion for readers. For example, an editor intending to link to an article on the history and use of the Braille system may link to Braille Patterns instead by mistake.
- The articles violate the Wikipedia standards for titles. Many of the unqualified Unicode block articles are in the plural, have unnecessary capitalizations, or violate the standards in other ways. For example, to satisfy the standards the article Mathematical Operators should be named Mathematical operator. But of course that is not the name of the Unicode block; and it is the name of an article with a very different subject. Adding the qualifier "(Unicode block)" would satisfy the naming standards, besides avoiding confusion.
- Those topics are not very notable and are only of specialized and ephemeral technical interest. The division of the Unicode character space into blocks is mostly an artifact of the way the Unicode Consortium discusses, approves, and documents proposals to include characters. It has only tenuous (and often very questionable) connections to the history, usage, or semantics of those characters.
The division is relevant only to those who are interested in the history of Unicode, or who intend to propose new symbols for it.
The division is not relevant to users of Unicode. On the contrary, to find the Unicode for a desired glyph, like a special math symbol or a letter with a certain modifier, one should ignore the block division and use Google or some other generic search tool -- because one cannot tell which block that symbol has been put into.
The division is not even useful to font designers. While at some point one would find computer fonts that were limited to one or two specific blocks, that has never been a rule, and fonts are increasingly cutting across the Unicode block boundaries.
Apparently the names above were assigned without the "(Unicode block)" because it was felt that the qualifier was unnecessary, since there was no other page in Wikipedia with that name. But that is not what "unnecessary" means. Most of the names above have a common-sense meaning that has nothing to do with Unicode; so a qualifier is necessary to differentiate them from those common meanings. If you say "Geometric Patterns", "Number Forms", or "Greek and Coptic" to someone, even to a computer expert, the last thing she will think of is the Unicode block of that name. Initially the moves will create a redirect from each unqualified name to the corresponding qualified name. I will try to replace all uses of the former by the latter. In some cases, like "Tai Viet" or "Mathematical Operators", the redirect is inappropriate or pointless, in which case it will be deleted or redirected to a more appropriate article. Note that, if one types "Tai Viet" into the search window, the "(Unicode block)" article will be listed anyway as one of the suggested alternatives. There are also half a dozen cases where the Unicode block article was merged into an article about a language, script, or typography article:
These merges should be undone, since the article about the Unicode block is supposed to have a lengthy section that documents the history of the block and the relevant Unicode Consortium publications, which do not belong in the articles above. For reference, the following articles are already named with the qualifier:
- Adlam (Unicode block) • Aegean Numbers (Unicode block) • Ahom (Unicode block) • Alchemical Symbols (Unicode block) • Anatolian Hieroglyphs (Unicode block) • Ancient Greek Numbers (Unicode block) • Arabic (Unicode block) • Armenian (Unicode block) • Arrows (Unicode block) • Avestan (Unicode block) • Balinese (Unicode block) • Bamum (Unicode block) • Basic Latin (Unicode block) • Bassa Vah (Unicode block) • Batak (Unicode block) • Bengali (Unicode block) • Bhaiksuki (Unicode block) • Bopomofo (Unicode block) • Brahmi (Unicode block) • Buginese (Unicode block) • Buhid (Unicode block) • CJK Strokes (Unicode block) • CJK Unified Ideographs (Unicode block) • Carian (Unicode block) • Caucasian Albanian (Unicode block) • Chakma (Unicode block) • Cham (Unicode block) • Cherokee (Unicode block) • Chess Symbols (Unicode block) • Coptic (Unicode block) • Cuneiform (Unicode block) • Currency Symbols (Unicode block) • Cypriot Syllabary (Unicode block) • Cyrillic (Unicode block) • Deseret (Unicode block) • Devanagari (Unicode block) • Dogra (Unicode block) • Duployan (Unicode block) • Egyptian Hieroglyphs (Unicode block) • Elbasan (Unicode block) • Elymaic (Unicode block) • Emoticons (Unicode block) • Ethiopic (Unicode block) • Georgian (Unicode block) • Glagolitic (Unicode block) • Gothic (Unicode block) • Grantha (Unicode block) • Gujarati (Unicode block) • Gunjala Gondi (Unicode block) • Gurmukhi (Unicode block) • Hangul Jamo (Unicode block) • Hanifi Rohingya (Unicode block) • Hanunoo (Unicode block) • Hatran (Unicode block) • Hebrew (Unicode block) • Hiragana (Unicode block) • Ideographic Description Characters (Unicode block) • Imperial Aramaic (Unicode block) • Indic Siyaq Numbers (Unicode block) • Inscriptional Pahlavi (Unicode block) • Inscriptional Parthian (Unicode block) • Javanese (Unicode block) • Kaithi (Unicode block) • Kanbun (Unicode block) • Kannada (Unicode block) • Katakana (Unicode block) • Kayah Li (Unicode block) • Kharoshthi (Unicode block) • Khmer 
(Unicode block) • Khojki (Unicode block) • Khudawadi (Unicode block) • Lao (Unicode block) • Latin-1 Supplement (Unicode block) • Lepcha (Unicode block) • Limbu (Unicode block) • Linear A (Unicode block) • Lisu (Unicode block) • Lycian (Unicode block) • Lydian (Unicode block) • Mahajani (Unicode block) • Mahjong Tiles (Unicode block) • Makasar (Unicode block) • Malayalam (Unicode block) • Mandaic (Unicode block) • Manichaean (Unicode block) • Marchen (Unicode block) • Masaram Gondi (Unicode block) • Mayan Numerals (Unicode block) • Medefaidrin (Unicode block) • Meetei Mayek (Unicode block) • Mende Kikakui (Unicode block) • Meroitic Cursive (Unicode block) • Meroitic Hieroglyphs (Unicode block) • Miao (Unicode block) • Modi (Unicode block) • Mongolian (Unicode block) • Mro (Unicode block) • Multani (Unicode block) • Musical Symbols (Unicode block) • Myanmar (Unicode block) • NKo (Unicode block) • Nabataean (Unicode block) • Nandinagari (Unicode block) • New Tai Lue (Unicode block) • Newa (Unicode block) • Nushu (Unicode block) • Nyiakeng Puachue Hmong (Unicode block) • Ogham (Unicode block) • Ol Chiki (Unicode block) • Old Hungarian (Unicode block) • Old Italic (Unicode block) • Old North Arabian (Unicode block) • Old Permic (Unicode block) • Old Persian (Unicode block) • Old Sogdian (Unicode block) • Old South Arabian (Unicode block) • Old Turkic (Unicode block) • Optical Character Recognition (Unicode block) • Oriya (Unicode block) • Osage (Unicode block) • Osmanya (Unicode block) • Ottoman Siyaq Numbers (Unicode block) • Pahawh Hmong (Unicode block) • Palmyrene (Unicode block) • Pau Cin Hau (Unicode block) • Phags-pa (Unicode block) • Phaistos Disc (Unicode block) • Psalter Pahlavi (Unicode block) • Rejang (Unicode block) • Runic (Unicode block) • Samaritan (Unicode block) • Saurashtra (Unicode block) • Sharada (Unicode block) • Shavian (Unicode block) • Siddham (Unicode block) • Sogdian (Unicode block) • Sora Sompeng (Unicode block) • Soyombo (Unicode block) • 
Specials (Unicode block) • Sundanese (Unicode block) • Superscripts and Subscripts (Unicode block) • Sutton SignWriting (Unicode block) • Syloti Nagri (Unicode block) • Syriac (Unicode block) • Tagalog (Unicode block) • Tagbanwa (Unicode block) • Tags (Unicode block) • Tai Le (Unicode block) • Tai Tham (Unicode block) • Takri (Unicode block) • Tamil (Unicode block) • Tangut (Unicode block) • Telugu (Unicode block) • Thaana (Unicode block) • Thai (Unicode block) • Tibetan (Unicode block) • Tifinagh (Unicode block) • Tirhuta (Unicode block) • Ugaritic (Unicode block) • Vai (Unicode block) • Variation Selectors (Unicode block) • Wancho (Unicode block) • Warang Citi (Unicode block) • Zanabazar Square (Unicode block)
Jorge Stolfi (talk) 22:01, 1 May 2019 (UTC)
- Comment. I think you're on the right track, but taken individually, I would say that while some clearly make sense (e.g. Mathematical Operators), others are a bit questionable (e.g. CJK Unified Ideographs Extension A), and the only justification for adding the parenthetical disambiguator in that case would be for WP:CONSISTENCY. I normally don't favor unnecessary disambiguation, especially not unnecessary parenthetical disambiguation, but I recognize in rare circumstances it can be a valid solution and a general rule like this is not without precedent, cf. Wikipedia:Naming conventions (UK Parliament constituencies). I will wait to see what others have to say about this proposal. -- King of ♥ ♦ ♣ ♠ 02:58, 2 May 2019 (UTC)
- Agree. Makes sense, and I think this qualifies as a circumstance for which we can have a specific naming convention. Rajanala Samyak (talk) 01:52, 3 May 2019 (UTC)
- Oppose moves per User:King of Hearts except for a few that could legitimately be ambiguous and/or that should direct to another article (Yi Radicals?, Ancient Greek Musical Notation?). Mass adding of parenthetical disambiguators where there is no ambiguity is against current Wikipedia practice (see WP:PRECISION). Agnostic on the unmerging proposal. — AjaxSmack 01:22, 4 May 2019 (UTC)
- In general, Support: a lot of the titles make more sense as redirects to more general coverage, Combining Diacritical Marks to combining character for example, given that combining diacritical marks as a concept are neither limited to that Unicode block nor unique to Unicode (see for example Windows-1258, ANSEL…). Control Pictures as a general concept would make most sense as a section in control character. And so forth. This is similar to the mentioned UK parliament constituencies, which take their name from a region, and said region would in general be the primary topic for the constituency name.
The mentioned example of CJK Unified Ideographs Extension A would potentially be an exception, given that CJK Unified Ideographs is a specifically Unicode concept already: while JIS X 0208 does unify Kanji variants, it's only on the scale of one language; GB18030 (the current GB(K) version) is now based on Unicode; KS C 5601 didn't even unify certain different uses of identical written characters within the one orthography; Unicode's main competitors in pan-CJK encoding such as TRON (encoding) or CCCII eschew glyph form unification altogether. (I'm remaining agnostic on whether "Extension A" or "Supplement" names need the qualifier in general.)
Halfwidth and fullwidth forms would logically take a broad view of the subject both in legacy encodings and in Unicode, possibly merging in (edit: or even moving into place and expanding) half-width kana to be honest. (As a speculative sidenote, the current article looks like it's expanded off from a Unicode block article as a place to provide more well-rounded coverage than just katakana, but that's a guess, I haven't dug through their histories.)
But in general, just because there's a Unicode block named with a given exact form of a term, does not always logically make it the primary topic for that term. {{redirect}} can be used, e.g. {{redirect|Combining Diacritical Marks|the Unicode block|Combining Diacritical Marks (Unicode block)}} in the above example. -- HarJIT (talk) 16:49, 4 May 2019 (UTC)
- Comment I can agree about "CJK" since the only meaning of the word "CJK" is the Unicode plane. --Jorge Stolfi (talk) 22:46, 5 May 2019 (UTC)
- Oppose with the possible exception of the generic-sounding names like “Mathematical Operators”. Here is a point-by-point rebuttal.
The current names are not descriptive of the subjects.
– That does not matter. Names of things need not be descriptive. For example, Joseph Andrews sounds like a person, but it is about a novel; Abbey Road sounds like a road, but it is about an album; Tarantula hawk sounds like a kind of hawk, but it is about a kind of wasp. These Unicode blocks are no different.
The current names are not specific to the subject of the article.
– Names like “Mathematical Operators” are indeed ambiguous, if you ignore the capitalization. Specific individual arguments should be made for disambiguating them, not as part of a mass renaming. Names like “Greek Extended” could theoretically have been used for other character encodings, but you would need to demonstrate that each such name actually is used for something else, which you haven’t. Names like “Arabic Presentation Forms-B” could not plausibly refer to anything other than Unicode blocks.
The naming of the Unicode blocks is not consistent.
– The rule is that the article title includes “(Unicode block)” if and only if the bare title is ambiguous. This rule has been applied consistently. Adding “(Unicode block)” indiscriminately would be inconsistent with the rest of Wikipedia.
The articles violate the Wikipedia standards for titles. Many of the unqualified Unicode block articles are in the plural, have unnecessary capitalizations, or violate the standards in other ways.
– The rules about plurals and capital letters do not apply to proper nouns. For example, the article about the novel Vile Bodies is at “Vile Bodies”, not “Vile body”. Similarly, Unicode block names are proper nouns: they are names, not mere descriptions. That they are names is clear because of the idiosyncratic style of names like “Arabic Presentation Forms-B”, which is not standard English prose.
Those topics are not very notable and are only of specialized and ephemeral technical interest.
– In that case, nominate them for deletion. As long as they are considered notable enough to have articles, and as long as there are no other notable topics with the same names, there is no need to disambiguate them, so they should not be disambiguated. Gorobay (talk) 15:32, 5 May 2019 (UTC)
- Comment: The examples are not quite convincing. The name of an article about a book like Joseph Andrews must obviously be the title of the book, no matter whether it is descriptive or not. The article about the album is named Abbey Road without the "(album)" only because editors decided that (for now!) 99% of the searches for that name are for the album, not the actual road. However, titles of books, albums, and songs are usually put in italics in English texts precisely because they would otherwise be confusing. They must somehow be marked to make it clear that they are not to be parsed for their English meaning, but taken unparsed as a whole, as the proper name of something.
As for "Tarantula hawk", that is the common English name of the insect (not a proper name), so Wikipedia is not to be blamed if someone somehow thinks that it may be a spider or bird.
But, otherwise, names of articles must be as descriptive of their concepts as possible, especially to readers who are not familiar with the subject but may want to read about it. The name of the article about X should not be how the X-ologists call X, but how a sufficiently broader community refers to X.
Now, the names of the Unicode blocks are neither titles of novels nor common names for the concepts. They are merely the names of sections in the Unicode standard document, not in the character encoding itself. Again, not even a computer expert will think of the Unicode block when reading "IPA Extensions" or "Ornamental Dingbats" or "Mongolian Supplement". In common English texts, those references would have to be qualified as "the Mongolian Supplement block of Unicode".
The rule that "a qualifier should be omitted when there is no other article with that name" should not be followed strictly and blindly. It is only a "testable" approximation to the ideal rule "avoid unnecessary qualifiers". One should take into account also the existence of articles or redirects whose names differ only in subtle details like capitalization or plurals. And it makes sense to consider also articles that don't exist, but should -- like "Rumi Numeral Symbols" -- or redirects that should point to other articles -- like "Linear B Ideograms". The wholesale renaming proposal above would clear the way for plugging such holes in the future, without having to go through a renaming discussion for each individual case. (Note that the plugging of many of those holes depends on editors who are not computer experts and don't have the time or knowhow to start such discussions.)
Your suggestion that all Unicode block articles should be eliminated is not that absurd. Unicode is obviously a notable concept, but its division into blocks is merely an internal administrative choice by the Unicode Consortium, that has no relevance whatsoever for the users of the character set. (You may have noticed that the only reason why the block start and length must be multiples of 16 is that the Consortium document typesets the reference forms in tables with 16 columns. They say so themselves.)
Indeed, those Unicode block articles are only another instance of an unfortunate trend, whereby some well-meaning editors decide to create inside Wikipedia a mirror of some arbitrary classification, index, or database that was created by some external agency. That is always a bad idea. Wikipedia should note the existence of such database, and describe it generically -- but then group, split, organize, and rename its contents in whatever way is most appropriate for Wikipedia readers.
In the case of Unicode, logically there should be just one master article that describes the whole set, and sub-articles on logically defined subsets (like "Numerals in Unicode", "Latin-based letters in Unicode", etc.) even if they cut across the Unicode blocks.
But those "Unicode block" articles are here, so let them stand. Still, since they are rather obscure subjects that are not what their names seem to say, it is fair that they get ungainly names -- just as Barrackpore_(Vidhan_Sabha_constituency) should have a qualifier by default, whether the article on the city of Barrackpore exists or not.
--Jorge Stolfi (talk) 23:48, 5 May 2019 (UTC)
- Comment: This requested move has been listed at WP:CENT for wider community input. --qedk (t 桜 c) 18:21, 9 May 2019 (UTC)
- Oppose unnecessary disambiguation. The article is for explaining what the title is about, it's not the title's job to give that explanation. -- Tavix (talk) 18:32, 9 May 2019 (UTC)
- Support per nom, WP:NAMINGCRITERIA points 1, 3, 5, and WP:IAR. While I understand we should avoid unnecessary disambiguation, especially parenthetical, the proposal makes good sense on its face. I don't think we should be bending over backwards to satisfy WP:ATDAB when it's not even clear it is against this. There is a very reasonable argument that most of these names are either not precise enough (e.g., Mathematical Operators) or not natural enough (e.g. CJK Unified Ideographs Extension A), and that rather than going through these one by one, we can make the entire naming scheme consistent. The benefits seem to outweigh the small aesthetic cost of a parenthetical dab. Wugapodes [thɑk] [ˈkan.ˌʧɹɪbz] 20:32, 9 May 2019 (UTC)
- Support per nom and Wugapodes. The proposal is simple and consistent, and Gorobay's rebuttal is not strong enough to convince me that this shouldn't be done. I particularly disagree with their response to point 1. —烏Γ (kaw) │ 23:50, 09 May 2019 (UTC)
- Oppose It goes against Wikipedia naming practice to add unnecessary parenthetical disambiguation. The block names already have a consistent and easily-understood naming system, which is to use the exact Unicode name (spelling and capitalization), and only add "(Unicode block)" after the name if there is ambiguity. Adding "(Unicode block)" in all cases is unnecessary and in many cases would be so awkward that in future other editors will certainly attempt to remove the unnecessary modifier. BabelStone (talk) 00:11, 10 May 2019 (UTC)
- The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.
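Two technical points raised in the move discussion above (that one cannot tell from a character alone which block it was put into, and that block starts and lengths are multiples of 16) can be illustrated with a minimal Python sketch. The `BLOCKS` table below is a small hand-copied subset of block ranges chosen for illustration, and the helper names `block_of` and `aligned_to_16` are hypothetical; a complete lookup would need the full block list published by the Unicode Consortium.

```python
from bisect import bisect_right

# Illustrative subset of Unicode blocks as (first code point, last code point, name).
# This is NOT the full list; it only covers a few blocks named in the discussion.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2580, 0x259F, "Block Elements"),
    (0x25A0, 0x25FF, "Geometric Shapes"),
    (0x2800, 0x28FF, "Braille Patterns"),
]
STARTS = [start for start, _end, _name in BLOCKS]  # sorted, for binary search


def block_of(char):
    """Return the block name for a single character, or None if it is
    outside the illustrative subset above (hypothetical helper)."""
    cp = ord(char)
    i = bisect_right(STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0:
        start, end, name = BLOCKS[i]
        if start <= cp <= end:
            return name
    return None


def aligned_to_16(start, end):
    """True if the block starts on a multiple of 16 and its length is a multiple of 16."""
    return start % 16 == 0 and (end - start + 1) % 16 == 0


if __name__ == "__main__":
    for ch in "A\u03b1\u2200\u2801":
        print(f"U+{ord(ch):04X}", block_of(ch))
    print(all(aligned_to_16(s, e) for s, e, _ in BLOCKS))
```

The lookup is only as good as its table: a character from a block that is not listed (for example a combining diacritic) returns `None` here rather than a block name.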
change the font stack used in the inline css of some unicode block templates
The following templates have inline css written into their HTML using a style attribute:
{{Unicode chart CJK Unified Ideographs Extension E}} and {{Unicode chart CJK Unified Ideographs Extension F}}
The css in question is as follows
style="border-collapse:collapse;background:#FFFFFF;font-size:large; text-align:center;font-family: sans-serif, 'Unicode内码天珩输入法配套字体', '方正宋体S-超大字符集', '方正宋体S-超大字符集(SIP)', '文泉驿等宽正黑', 'HanaMinB', 'HanaMinC', 'HanaMinExC', 'BabelStone Han Plain', 'BabelStone Han', 'FZSong-Extended', 'Arial Unicode MS', Code2002, DFSongStd, 'STHeiti SC', unifont, LastResort;"
I request that the following font families get added into the font stack:
TH-Tshyn-P0,TH-Tshyn-P1,TH-Tshyn-P2,TH-Tshyn-P16
TH-Tshyn is a font that supports all the characters in Unicode 13.0; you can read more about it at http://cheonhyeong.com/English.html
P.S: why do these templates have inline css, as opposed to having their own class?
2806:264:4408:8E19:81B0:5A7:6C9F:5933 (talk) 07:30, 14 May 2021 (UTC)
- Assuming these are reusing the font list, this would be a good use for template styles (which can all share one template style) - not seeing a need for a site-wide class here though. — xaosflux Talk 22:26, 14 May 2021 (UTC)
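As a rough sketch of what the requested change would produce, the following Python snippet builds the extended `font-family` value from the stack quoted above. The request does not say where in the stack the TH-Tshyn families should go; placing them just before the `unifont` fallback is an assumption made here, and `extend_stack` is a hypothetical helper, not part of any template or MediaWiki API.

```python
# Font stack as quoted in the request, one CSS token per entry (quotes preserved).
CURRENT_STACK = [
    "sans-serif",
    "'Unicode内码天珩输入法配套字体'",
    "'方正宋体S-超大字符集'",
    "'方正宋体S-超大字符集(SIP)'",
    "'文泉驿等宽正黑'",
    "'HanaMinB'",
    "'HanaMinC'",
    "'HanaMinExC'",
    "'BabelStone Han Plain'",
    "'BabelStone Han'",
    "'FZSong-Extended'",
    "'Arial Unicode MS'",
    "Code2002",
    "DFSongStd",
    "'STHeiti SC'",
    "unifont",
    "LastResort",
]

# Families named in the request.
REQUESTED = ["TH-Tshyn-P0", "TH-Tshyn-P1", "TH-Tshyn-P2", "TH-Tshyn-P16"]


def extend_stack(stack, additions, before=None):
    """Return a new list with any additions not already present inserted
    before the entry `before`, or appended if `before` is not in the stack."""
    fresh = [family for family in additions if family not in stack]
    if before in stack:
        i = stack.index(before)
        return stack[:i] + fresh + stack[i:]
    return stack + fresh


# Assumption: the new families go ahead of the last-resort fallbacks.
new_stack = extend_stack(CURRENT_STACK, REQUESTED, before="unifont")
font_family = ", ".join(new_stack)

if __name__ == "__main__":
    print("font-family: " + font_family + ";")
```

Keeping the list in one place like this mirrors the suggestion above that the templates share a single template style instead of each repeating the inline value.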
Image previews beside the glyphs
Most/all Unicode block pages and List of Unicode characters have the template:Contains_special_characters, which is good (it can reduce confusion and save time looking for errors), but not everyone can install full fonts on every device. I propose we can add images of the characters in the mentioned articles, either
- inside the table cell, next to the glyph,
- in a column (for List of Unicode characters),
- in a separate table or
- one image of a table containing the characters and their U+numbers.
Is there a simple method to get the server to make the images? MediaWiki has a way of converting e.g. <math> to images (probably using LaTeX), but that uses a fancy font, which I'm afraid will be illegible for many symbols.
Once we have a solution and agreement, can someone write a bot to make the edits? --Ziom 2.0 (talk) 12:38, 10 February 2022 (UTC)
- Here are my thoughts:
- I don't really have an opinion about adding images to List of Unicode characters although I wonder how the page would load with tens of thousands of images on it and how it would be kept up-to-date.
- I oppose adding images inside the Unicode block tables/templates for several reasons:
- It would be visually confusing.
- It would make cut-and-paste of text more difficult.
- Usually when new characters are added there isn't an available font to use to create an image.
- How would you decide which font to use? Taking CJK Unified Ideographs as an example, would you choose a font for Japan or China or Taiwan for the image? Another example is U+1F52B PISTOL: Are you going to use a font that matches the Unicode chart of a gun or one that matches most phone implementations which look like a toy water gun?
- New Unicode releases sometimes update the example character shapes, making it difficult to keep the images up-to-date.
- I also oppose adding an image of the entire block to the Unicode block articles for the same reasons as above, but mainly because every block table starts with a link to the "Official Unicode Consortium code chart (PDF)" which already has a chart that is independent of font support. BTW: I don't think it's fair use to just screenshot Unicode PDF charts to make the Wikipedia images.
I do agree that the lack of good font support is frustrating. DRMcCreedy (talk) 23:01, 10 February 2022 (UTC)