Japanese language: Supplements for Digits, Counters, Popular readings, etc.
I am a native speaker of Japanese.
I felt uncomfortable with this collection, so I would like to share my opinion.
How many native speakers have verified this collection and in what ways?
It would be better to have multiple native speakers verify the collection, including the contents of this post.
2020-11-14: I also mentioned it in Post #41 on Help create Common Voice's first target segment.
Requests & Questions
- Please indicate that a word is a number.
- For example, annotate the sentence card with: "This is part of the first target segment [digits 0-9 / Yes / No / Hey / Firefox]"
- Japanese has so many homophones that simply showing "さん" or "ご" does not tell the speaker that the word is a number.
- A speaker may notice it from the words shown before and after, but even so, it is unfriendly. The speaker will not necessarily re-record it.
- Yes, there are links to Help create Common Voice's first target segment, but many Japanese people are not very good at English (and neither am I). They probably have not read it.
- Plus, it is hard to tell what exactly the label あなたは私たちの最初の目標セグメントに貢献しています (You’re contributing to our first target segment) means, either. Why not indicate that the word is a "number"?
- If we want speakers to read the words accurately, we should use kanji or indicate that the word is a number.
- Even if we use kanji, it may not be recognizable as a number. For example, "一" is just a horizontal bar and is indistinguishable from a symbol.
- So it would be best to indicate that it is a number.
- There are multiple ways to read Japanese numbers, but only one type of text is currently available (ref. common-voice/singleword-benchmark.txt). Is there any rationale for this selection?
- Are Hey and Firefox not included because they are foreign words?
- Couldn't we just use "ファイアフォックス"?
- Hey is certainly not a word used in Japan, but, for example, "Hey Siri" on the iPhone is recognized when spoken as "へいしり".
Digits
- 0 is indeed read as both れい and まる, but ぜろ is also a common reading.
- 4 is also read as よん.
- 9 is also read as きゅう. This is the most popular way to read the number on its own.
Also, the readings in the Informal column of the current Single-digit numbers + yes + no table are the ones used when a counter is added. For example: ひとつ, ふたつ, and よっつ.
If Common Voice intends to collect these readings without the counter, that is not a problem in itself, but some of them are not used for the numbers on their own. For example, よ and や. (These are not incorrect readings, but today they are usually read with a counter added.)
Example
The intended use-case is talking to an automated system over the phone. In this case, how would these numbers be read if you were talking to a voice-bot, counting out loud, or reading a long number out loud digit-by-digit?
Yes. For this use case, I would read them like this:
Example of Reading
| 言葉 (Word) | 漢数字 (Kanji) | 読み (Reading) | | |
|---|---|---|---|---|
| 0 | 零 | ぜろ | れい | まる |
| 1 | 一 | いち | | |
| 2 | 二 | に | | |
| 3 | 三 | さん | | |
| 4 | 四 | よん | し | |
| 5 | 五 | ご | | |
| 6 | 六 | ろく | | |
| 7 | 七 | なな | しち | |
| 8 | 八 | はち | | |
| 9 | 九 | きゅう | | |
| Hey | | ヘイ | | |
| Firefox | | ファイアフォックス | | |
- 0 is most clearly identified as the number 0 when read as ぜろ.
- 4 is probably clearer when read as よん.
- 7 is often pronounced as なな on the phone, for example, because of the similarity in pronunciation between しち and いち.
- Reading 9 as く is not incorrect, but it is usually read as きゅう, which is clearer.
- Some numbers have more than one kanji, but I have written the most popular one here.
- Hey and Firefox are written in katakana. They could be written in hiragana, but foreign words are easier to read in katakana.
- The katakana spelling of foreign words is not standardized. For example, on Wikipedia it is "ファイアーフォックス"; it would be better to ask the translators at Mozilla for their opinion.
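For anyone building the phone or voice-bot use case, the readings in the Example of Reading table can be sketched as a small lookup. This is only an illustration: the names `PREFERRED`, `ALTERNATES`, `read_digits`, and `all_readings` are hypothetical and not part of Common Voice, and treating the first reading column as the preferred reading is my assumption based on the notes above.

```python
# Preferred reading for each digit when it is read aloud on its own
# (first reading column of the table; treating it as "preferred" is an assumption).
PREFERRED = {
    "0": "ぜろ", "1": "いち", "2": "に", "3": "さん", "4": "よん",
    "5": "ご", "6": "ろく", "7": "なな", "8": "はち", "9": "きゅう",
}

# Other readings listed in the table for the same digit.
ALTERNATES = {
    "0": ["れい", "まる"],
    "4": ["し"],
    "7": ["しち"],
}


def read_digits(number):
    """Return the preferred reading of each digit in a string of digits 0-9."""
    readings = []
    for ch in number:
        if ch not in PREFERRED:
            raise ValueError(f"not a digit 0-9: {ch!r}")
        readings.append(PREFERRED[ch])
    return readings


def all_readings(digit):
    """Return the preferred reading followed by any alternates from the table."""
    return [PREFERRED[digit]] + ALTERNATES.get(digit, [])


print(" ".join(read_digits("0479")))  # ぜろ よん なな きゅう
print(all_readings("0"))              # ['ぜろ', 'れい', 'まる']
```

Keeping the alternates separate from the preferred reading makes it possible to prompt speakers with one form while still accepting the others as valid.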
With a counter
In Japanese, we count いち, に, and so on, but adding a counter is also common. The counter changes according to the object being counted, but when there is no specific object (i.e., the universal counter), it is つ or 個.
| 数字 (Digit) | つ (tsu) | 個 (ko) |
|---|---|---|
| 1 | 一つ | 一個 |
| 2 | 二つ | 二個 |
| 3 | 三つ | 三個 |
| 4 | 四つ | 四個 |
| 5 | 五つ | 五個 |
| 6 | 六つ | 六個 |
| 7 | 七つ | 七個 |
| 8 | 八つ | 八個 |
| 9 | 九つ | 九個 |
Of course, these can be written in kanji. For example, 一つ, 二つ, 三個, 九個. As a Japanese person, I find it more natural and easier to understand when the kanji are used.
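As a rough sketch, the written forms in the counter table are simply the kanji numeral followed by the counter. The helper name `with_counter` is hypothetical, and the sketch covers only the written form for 1 through 9, the range the table covers. It does not produce readings, because those change with the counter (for example ひとつ, ふたつ, よっつ).

```python
# Kanji numerals for 1-9, as used in the counter table.
KANJI = {
    1: "一", 2: "二", 3: "三", 4: "四", 5: "五",
    6: "六", 7: "七", 8: "八", 9: "九",
}

# The two universal counters discussed in this post.
COUNTERS = ("つ", "個")


def with_counter(n, counter="つ"):
    """Return the written form of n (1-9) followed by a universal counter."""
    if n not in KANJI:
        raise ValueError(f"only 1-9 are covered: {n!r}")
    if counter not in COUNTERS:
        raise ValueError(f"unsupported counter: {counter!r}")
    return KANJI[n] + counter


print(with_counter(3))        # 三つ
print(with_counter(9, "個"))  # 九個
```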
Ancient reading
Even today, there are people who count this way, but it is rare.
I do not know the exact readings, either; I searched the web, and while there is a general pattern, it is not uniform.
| 数字 (Digit) | 読み (Reading) | |
|---|---|---|
| 1 | ひ | ひー |
| 2 | ふ | ふー |
| 3 | み | みー |
| 4 | よ | よー |
| 5 | いつ | いー |
| 6 | む | むー |
| 7 | なな | なー |
| 8 | や | やー |
| 9 | ここ | こー |
Reference