8000 Properly parse Plural-Forms header of PO/MO files (2.6) by jiru · Pull Request #5348 · cakephp/cakephp · GitHub
[go: up one dir, main page]

Skip to content

Properly parse Plural-Forms header of PO/MO files (2.6) #5348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

jiru
Copy link
Contributor
@jiru jiru commented Dec 7, 2014

So this follows #5347, this time targeting the 2.6 branch, and hopefully compatible with PHP 5.2. I hope it’s fine even if it includes a few other commits not from me.

markstory and others added 7 commits December 2, 2014 11:01
…cached-engine

SASL with Memcached is only supported with binary protocol
This just contains the bare minimum to interpret common plural formulas,
but new operators can be easily added.
Now, pretty much every language plural case is covered
and the implementation follows to the standard.
Removed usages of create_function() as well.
@markstory markstory modified the milestones: 2.5.7, 2.6.0 Dec 7, 2014
return (int)($a < $b);
}

protected function _greaterThanOperator($a, $b) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally include doc blocks for every function.

@AD7six
Copy link
Member
AD7six commented Dec 7, 2014

I don't know about this; plural rules don't change (at all? much? At least not with any significant frequency) - is it really necessary to parse plural rules at run time? Is a plural-rule-parsing "framework" really necessary?

For reference I used the following logic to parse plural rules and convert them to php code.

i.e.:

<?php

$rule = "nplurals=6; plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 && n%100<=99 ? 4 : 5;";

$split = strpos($rule, 'plural=');
$rule = rtrim(substr($rule, $split + 7), ';');
while ($rule[0] === '(') {
    $rule = substr($rule, 1, -1);
}
$docRule = str_replace(' ', '', $rule);
$rule = str_replace('n', '$n', $rule);
$subrules = substr_count($rule, ':');
if ($subrules) {
    $rule = str_replace(':', ': (', $rule) . str_repeat(')', $subrules);
}

echo $rule;

Output:

$n==0 ? 0 : ( $n==1 ? 1 : ( $n==2 ? 2 : ( $n%100>=3 && $n%100<=10 ? 3 : ( $n%100>=11 && $n%100<=99 ? 4 : ( 5)))))

Which after injecting some whitespace and reformatting I simply used to generate the plural rule function, applied here that could be:

protected static function _checkRule<PickAName>($n) {
    if ($n === 0) {
        return 0;
    }
    if ($n === 1) {
        return 1;
    }
    if ($n === 2) {
        return 2;
    }
    if ($n % 100 >= 3 && $n % 100 <= 10) {
        return 3;
    }
    if ($n % 100 >= 11) {
        return 4;
    }
    return 5;
}

That's a lot easier to read and maintain than a plural rule parsing class.

@jiru
Copy link
Contributor Author
jiru commented Dec 7, 2014

@AD7six You’re right that plurals do not change very frequently. Actually they don’t change, they just start being used when you start translating to some “new” language CakePHP don’t know yet. But translators probably don’t want to have to wait for CakePHP each time this happens. I don’t know to what extend the current naive implementation covers languages, but only to think that a language as widely used as Arabic isn’t supported don’t make me trust it at all. That’s the reason I implemented plural formula parsing.

That being said, I admit your suggested approach is smart and could make both translators and CakePHP developers happy. But when exactly such code generation would take place?

@AD7six
Copy link
Member
AD7six commented Dec 7, 2014

to think that a language as widely used as Arabic isn’t supported don’t make me trust it at all

I completely agree.

when exactly such code generation would take place?

Only as a developer aide - and in fact it wouldn't be present in any resultant class.

Some years ago plural rules declarations looked to me like an alien language - and deriving what logic they represented was some confusing complex task. I'm sure for many developers/users it still is quite confusing, however for me (and I hope/assume for you) the mystery is all gone since it's just a simple mathematical formula of nested ternary expressions. We could parse these rules but to a certain extent that's propagating the confusion _pluralGuess caused (to anyone trying to modify/understand it) - which I'm quite happy to see disappear.

There's a known finite list of plural rule forms (16 of them). That list of rules might be updated - but at a glacial pace, so I don't think it's necessary to consider rules not expressed there.

Given that I suggest:

  • No run time plural-rules parsing
  • Implement 16 methods to handle the 16 plural-rule-forms
  • EITHER
    • map a normalised plural rule string (or the hash of it) to a plural rule method name
    • map a language (ignore plural rules in po files entirely) to a plural rule method name
  • Throw an exception for any case whereby the plural rule is not known
  • Call the php method which represents the applicable plural form as required

What do you think about that @jiru (or anyone else)?

@markstory
Copy link
Member

I too am surprised that Arabic did not work out of the box as we've had effectively the same code for the last 5 years. I suspect that there are probably a few more omissions but not many.

Given that the spoken/written languages of the world evolve pretty slowly a finite list/map of rules might be simpler to maintain in the long term and run more efficiently.

@AD7six
Copy link
Member
AD7six commented Dec 7, 2014

I wasn't really aware of 3.0's plural rules class, it would be relatively low effort to back port it to 2.6 and preferable compared to having/maintaining 2 different implementations.

@ADmad
Copy link
Member
ADmad commented Dec 7, 2014

Backporting 3.0's plural rules class does sound better.

@markstory
Copy link
Member

@ADmad Backporting and adding Arabic support sounds like a good idea to me.

@ADmad
Copy link
Member
ADmad commented Dec 8, 2014

@markstory PluralRules class already has support for Arabic.

@markstory
Copy link
Member

I must have misread one of the earlier comments then, I thought it had 16 forms - my bad.

@ADmad
Copy link
Member
ADmad commented Dec 8, 2014

6 forms not 16 😄 16 is the total number of plural rules covering all languages. Though the rule numbering isn't exactly standard.

@AD7six
Copy link
Member
AD7six commented Dec 8, 2014

If it helps, there are 17 listed plural rules, since I can't 0-base count =).

There's a reference also on unicode.org:

http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html

Which could probably be considered canonical

@ADmad
Copy link
Member
ADmad commented Dec 8, 2014

Righto, it's 17. Working with I18n messes with one's brains 😛

I think I can patch the existing 2.x code to correctly handle Arabic. Later on whenever someone has the time the 3.0 PluralRule class can be backported for 2.x.

@markstory
Copy link
Member

So does the 3.x PluralRule class need to be updated? If not I can backport it to 2.x.

I'm confused as to which list on unicode.org has 17 rules, I counted more than that for Cardinal-Integer, and Ordinal-Integer. The MDN plural rule list has 17 rules of which PluralRules only implements 16.

@AD7six
Copy link
Member
AD7six commented Dec 8, 2014

@markstory unicode lists 26 rulesets, whereas MDN has 17.

ADmad already added one correction to the 3.0 plural rules class, but it already catered for Arabic for example.

With some tedious cross referencing, we could identify which languages/rulesets are missing - I don't know if that's really worth it. If it is I could do that and update the switch of doom.

ADmad added a commit to ADmad/cakephp that referenced this pull request Dec 8, 2014
@markstory
Copy link
Member

Closing in favor of #5360

@markstory markstory closed this Dec 8, 2014
jiru referenced this pull request in Tatoeba/tatoeba2 May 7, 2015
jiru added a commit to Tatoeba/tatoeba2 that referenced this pull request May 9, 2015
This is a backport of cakephp/cakephp#5348
to CakePHP 1.3.

The CakePHP implementation assumes a static well-known finite list of plural
rules (in _pluralGuess()). However, not only Arabic wasn’t initially supported
(and we fixed it the hard way in 0e20cd0), but now Russian and Belarusian rules
have been updated recently on Transifex in 3b6e63b (for good, they now follows
the Unicode reference [1]), which totally breaks the original implementation.

The algorithm first converts the expression to postfix using a shunting-yard
algorithm, and then evaluates it each time a pluralized string is translated.

[1] http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html#rules
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0