2 December 2007 - 17:571 Potato, 2 Potato, 3 Potato, 4
I’ve been working on a bug to better display the download time remaining [bugzilla.mozilla.org] because right now it only shows either seconds or minutes. Not too useful for really long downloads (but friendlier than the hh:mm:ss of Firefox 2).
It’s gone from 1) the current seconds/minutes display to 2) displaying fractional times for seconds/minutes/hours/days to 3) whole times with pairs of sub-units to 4) figuring out how to implement features from the upcoming l10n v2 (l20n) [wiki.mozilla.org] because we don’t have them now.
In particular.. how to correctly display plural forms of words for various numbers. In this case, 1 day vs 2 days.
English is fairly simple with just 2 plural forms: a singular one for 1 and a plural one for everything else. (Asian languages are even easier! Everything is plural or singular.. however you want to look at it. )
Turns out some other languages aren’t so easy.
Russian has 3 plural forms: a singular form for values ending in 1 like 1, 21, 31, 41.. except 11; a special plural form for values ending in 2, 3 or 4 but skipping 12, 13, 14; and a common plural form for everything else (5-20, 25-30..).
Eventually Zbigniew (gandalf) provided a useful link to GNU’s gettext utility [gnu.org] that lists 11 rules of shared plural forms such as a single plural form for the Asian and Turkic family; and two forms exactly like English’s for the Germanic, Latin/Greek, Semitic, etc. families.
The list seemed pretty comprehensive, so I figured why not just implement those 11 rules and have localizers pick the matching rule number and then fill in the appropriate plural forms of the words.
At first I also provided a way to allow custom rules, but now I’m thinking about backing away from that. This would keep things much simpler — localizers wouldn’t worry about the special procedure to use a custom rule and wouldn’t need to create a rule that perfectly matches the language.
For example, if a language that’s almost-like-Russian where the plural forms are the same except 11 is singular, that localizer would just use the same rule for Russian as that’s the closest match. It’s not ideal, but the users in that language would still understand what the download was trying to inform them. And for the common case of all the other values, the plural forms match up perfectly.
The last task to be done is informing the localizers on how to make use of this system. This will probably involve linking to the l10n wiki with some simple instructions for picking out the appropriate rule and filling in the plural forms of the words.
A rule entry on the wiki would let the localizer know how many plural forms there are, which languages use it, and what each form means similar to the gettext page. Additionally it would have a list of sample values that match each form to be totally explicit.
Plural rule #1 (2 forms)
Families: Germanic (Danish, Dutch, English, Faroese), Finno-Ugric (Estonian, Finnish), …
is 1: 1
everything else: 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
Plural rule #7 (3 forms)
Families: Slavic (Croatian, Serbian, Russian, Ukrainian)
ends in 1; not 11: 1, 21, 31, 41, 51, 61, 71, 81, 91, 101, 121, 131, …
ends in 2,3,4; not 12,13,14: 2, 3, 4, 22, 23, 24, 32, 33, 34, 42, 43, 44, …
everything else: 0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, …
So to localize the strings for the download manager, someone doing English would scan the list and find Rule #1 with the appropriate plural forms and localize the strings for “seconds”, “minutes”, etc. matching the order of the plural forms listed under the rules.