Wikidata:Property proposal/has reading
has kanji reading
Originally proposed at Wikidata:Property proposal/Lexemes
Description | phonetic reading or pronunciation of the kanji |
---|---|
Data type | String |
Domain | instances of sinogram (Q17300291) |
Example 1 | 四 (Q3594955)→よん |
Example 2 | 四 (Q3594955)→シ |
Example 3 | 海 (Q3594998)→うみ |
Example 4 | 海 (Q3594998)→カイ |
See also | sinogram reading pattern (P5244) |
Motivation
In Japanese, Chinese characters can have several different readings. With lexemes, we currently cover only those readings that form actual words. See the examples 四/よん (L625228) and 四/し (L641752), where forms that use the kanji have a sinogram reading pattern (P5244) statement.
Sometimes, however, readings don't form real words but are merely affixes used in compounds. We currently lump these readings under a lexeme that happens to have the same kanji representation. But such lexemes usually have a different etymology, and external IDs that don't apply to the reading. These readings also sometimes don't share the same senses.
I want to split all these lexemes so that every lexeme represents only a single reading. Readings that do not constitute words would be deleted in the process, but I'd like to preserve them, and I think the sinogram entity is the right place for that. –Shisma (talk) 10:00, 27 August 2024 (UTC)
I'm merely interested in Japanese; I'm not a speaker of it. If I said something horribly wrong here, please correct me. –Shisma (talk) 10:18, 27 August 2024 (UTC)
Should we transliterate on'yomi readings into katakana? – Shisma (talk) 11:45, 27 August 2024 (UTC)
- Indeed, in kanji dictionaries published in Japan, on'yomi (Q718498) readings are usually written in katakana (Q82946). --Okkn (talk) 01:37, 28 August 2024 (UTC)
- updated –Shisma (talk) 14:14, 28 August 2024 (UTC)
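Following that convention, the examples above write kun'yomi in hiragana (よん, うみ) and on'yomi in katakana (シ, カイ). When an on'yomi value is only available in hiragana, it can be converted mechanically, because each hiragana letter in U+3041–U+3096 has its katakana counterpart exactly 0x60 code points higher. The helper below is a minimal sketch under that assumption; the name `hiragana_to_katakana` is hypothetical and not part of any Wikidata tool.

```python
def hiragana_to_katakana(text: str) -> str:
    """Convert hiragana to katakana (hypothetical helper, not a Wikidata API).

    Hiragana letters U+3041..U+3096 map to katakana by adding 0x60 to the
    code point. Every other character (kanji, katakana, the long-vowel
    mark ー, punctuation, iteration marks) is returned unchanged.
    """
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


# On'yomi from the examples, converted from hiragana input
print(hiragana_to_katakana("し"))    # シ
print(hiragana_to_katakana("かい"))  # カイ
```

Only on'yomi values would be passed through such a conversion; kun'yomi values such as よん or うみ stay in hiragana.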
Discussion
@Duesentrieb, Afaz, Was a bee, Deryck Chan, NMaia, Okkn: pinging everybody involved with the proposal of sinogram reading pattern (P5244) –Shisma (talk) 10:16, 27 August 2024 (UTC)
Notified participants of WikiProject Japan
- Support Some users and I had previously tried to do something similar using name in kana (P1814) (Query), but I think this proposed property is much better. Since the proposed property is currently limited to Japanese kanji, my only concern is the confusion that the generic name "has reading" might cause. --Okkn (talk) 01:28, 28 August 2024 (UTC)
- I assumed it could be used for other languages that use sinograms, such as Korean and Vietnamese (?). I just didn't mention it because I know nothing about them. I agree that the name is too vague. Let's change it to "sinogram has reading"? – Shisma (talk) 07:22, 28 August 2024 (UTC)
- I think it would make sense to limit it to Japanese for now, until there's been some discussion about whether/how other languages should use this. We already have Vietnamese reading (P5625) for Vietnamese and Hangul pronunciation (P5537) for Korean. For Chinese, we don't have the language code cmn for Mandarin yet and we would need to decide whether we should be using properties like Hanyu Pinyin transliteration (P1721) and Jyutping transliteration (P9311) instead. - Nikki (talk) 14:09, 3 September 2024 (UTC)
- I wasn't aware of these. I suggest changing the label to "Japanese reading"; the value can then be of the string type rather than monolingual text. – Shisma (talk) 14:34, 3 September 2024 (UTC)
- Sinograms used in Japan are called kanji (kanji (Q82772)) in Japanese. How about the label "has kanji reading"? --Okkn (talk) 14:44, 3 September 2024 (UTC)
- Incidentally, sinogram reading pattern (P5244) was also initially intended to apply only to Japanese kanji, so the original label was "reading pattern of kanji". https://www.wikidata.org/w/index.php?title=Property:P5244&oldid=690306551 --Okkn (talk) 14:58, 3 September 2024 (UTC)
- I'm also fine with reading pattern of kanji 😅 – Shisma (talk) 15:30, 3 September 2024 (UTC)
- I've added a link to this proposal on Wikidata talk:WikiProject CJKV character since this seems relevant to that wikiproject too.
I don't think we should use subject lexeme (P6254) as a qualifier. We already have Han character in this lexeme (P5425) which links in the other direction (which is used on compounds too, but it is easy to determine whether a lemma only contains one character) and we try to avoid modelling things in ways that require linking in both directions, because it creates redundant data that's difficult to maintain.
It would make sense to allow it as a qualifier of Han character in this lexeme (P5425) on lexemes too, to replace transliteration or transcription (P2440) (e.g. on 姉妹/しまい (L406337)).
- Nikki (talk) 14:09, 3 September 2024 (UTC)
- Since one sinogram item can have multiple "has reading" values, I wonder whether it would be difficult to identify the matching value from the opposite direction unless the lexeme corresponding to the value is explicitly indicated in some way. Also, a sinogram reading pattern (P5244) qualifier is likewise redundant with the information on the corresponding lexeme, but without the qualifier Wikidata cannot hold this information unless the lexeme exists (not all sinogram readings warrant a lexeme). So the method proposed by Shisma seems better after all. --Okkn (talk) 14:38, 3 September 2024 (UTC)
- Also, there are cases where the same word can be written in different ways (like 綺麗/きれい/キレイ (L1234276)), so it is not a 1:1 relationship. The subject lexeme (P6254) qualifier only makes sense if the reading by itself is a lexeme. – Shisma (talk) 15:20, 3 September 2024 (UTC)
- I updated the type and description in accordance with this discussion –Shisma (talk) 09:17, 11 September 2024 (UTC)
- @Nikki and @Okkn, would you like to give your opinions? Regards, ZI Jony (Talk) 18:55, 16 September 2024 (UTC)
- I agree with the proposal as is. --Okkn (talk) 00:17, 17 September 2024 (UTC)
- Support --Afaz (talk) 06:36, 25 September 2024 (UTC)
- @Shisma, Nikki, Okkn, Afaz: Done as has kanji reading (P13045) Regards, ZI Jony (Talk) 21:04, 3 October 2024 (UTC)