8000 Use custom symbols based on the language's defaultNumberingSystem by alexandrukis · Pull Request #470 · python-babel/babel · GitHub
[go: up one dir, main page]

Skip to content

Use custom symbols based on the language's defaultNumberingSystem #470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

alexandrukis
Copy link
@alexandrukis alexandrukis commented Feb 6, 2017

This PR tackles two of the problems mentioned here:

  1. Digits are not translated to the locale's default numbering system, and
  2. The default numbering system is not taken into consideration when parsing the number symbols.

@alexandrukis alexandrukis force-pushed the 446-use-custom-symbols-based-on-the-defaultNumberingSystem branch from 0861e18 to c2ba4f3 Compare February 6, 2017 08:47
@alexandrukis alexandrukis force-pushed the 446-use-custom-symbols-based-on-the-defaultNumberingSystem branch from c2ba4f3 to 5c36cd2 Compare February 6, 2017 09:02
@alexandrukis alexandrukis force-pushed the 446-use-custom-symbols-based-on-the-defaultNumberingSystem branch from 5c36cd2 to 2f00832 Compare February 6, 2017 11:21
@alexandrukis alexandrukis force-pushed the 446-use-custom-symbols-based-on-the-defaultNumberingSystem branch 2 times, most recently from 465f40c to 52c4ab5 Compare February 6, 2017 12:37
on the defaultNumberingSystem. Translated the numbers and dates to the
locale’s defaultNumberingSystem.
@alexandrukis alexandrukis force-pushed the 446-use-custom-symbols-based-on-the-defaultNumberingSystem branch from 52c4ab5 to 9981cb0 Compare February 6, 2017 12:44
Copy link
Member
@akx akx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @alexandrukis, and thanks for the patch! Sorry it's taken a while for anyone to look at it.

Anyway, there's a couple comments in-code, and in addition I'm thinking numbering system selection should probably be an option – probably so it defaults to the locale's numbering system, but could be changed to 'latn', etc? What do you think?

@@ -286,6 +286,28 @@ def dst(self, dt):
return ZERO


def translate(string, keys, values):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably just want to use str.translate (and its friend str.maketrans).

Copy link
Author
@alexandrukis alexandrukis Jan 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial instinct, but there are certain problems with it, such as:

  • for python2, translate (and its friend maketrans) are actually in the string module
  • codec errors explained here

So I decided I'll just roll my own.

@@ -415,11 +416,11 @@ def _set_mime_headers(self, headers):
self._num_plurals = int(params.get('nplurals', 2))
self._plural_expr = params.get('plural', '(n != 1)')
elif name == 'pot-creation-date':
self.creation_date = _parse_datetime_header(value)
self.creation_date = _parse_datetime_header(to_latn_numbering_system(value, locale=self.locale))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think pofiles can have headers in non-latin numbering?

< 8000 /tr>
elif name == 'po-revision-date':
# Keep the value if it's not the default one
if 'YEAR' not in value:
self.revision_date = _parse_datetime_header(value)
self.revision_date = _parse_datetime_header(to_latn_numbering_system(value, locale=self.locale))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think pofiles can have headers in non-latin numbering?


# If the default_number_system_node is non-existent, the package will inherit it from its parent
if default_number_system_node is not None:
number_symbols['defaultNumberingSystem'] = numbering_systems[default_number_system_node.text]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this data doesn't belong in the number_symbols tree.
It would probably be a good idea to parse <defaultNumberingSystem> and <otherNumberingSystems> into, say, numbering_systems – something like (example from el.xml):

		<defaultNumberingSystem>latn</defaultNumberingSystem>
		<otherNumberingSystems>
			<native>latn</native>
			<traditional>grek</traditional>
		</otherNumberingSystems>

->

"numbering_systems": {
  "default": "latn",
  "native": ["latn"],
  "traditional": ["grek"]
}

(Do note how there can be multiple instances of the same otherNumberingSystems tag according to the DTD, hence a list.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, this copies the content of the numbering system into the locale, instead of the ID? That doesn't sound DRY.

if default_number_system_node is not None:
number_symbols['defaultNumberingSystem'] = numbering_systems[default_number_system_node.text]

for elem in tree.findall('.//numbers/symbols'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we start to have support for non-latn numbering systems, I think we should also import <symbols> for all the numbering systems declared in the XML – for instance, ur.xml has <symbols> for both latn and arabext, zh.xml has four of them... This code seems like it will erroneously merge the various numbering systems?

The resultant shape of number_symbols should then be something like

"number_symbols": {
  "latn": {"decimal": ...},
  "arabext": {"decimal": ...},
}

@akx akx self-requested a review August 18, 2017 12:07
@akx akx self-assigned this Aug 18, 2017
@akx
Copy link
Member
akx commented Jan 16, 2018

@alexandrukis Are you still around? :)

@alexandrukis
Copy link
Author

Hey, I am, but I'm experiencing some really busy months. If anyone needs this, I can make an effort and try to get it merged. Is that the case?

…46-use-custom-symbols-based-on-the-defaultNumberingSystem

# Conflicts:
#	babel/dates.py
#	babel/numbers.py
#	tests/test_numbers.py
#	tests/test_util.py
@alexandrukis alexandrukis force-pushed the 446-use-custom-symbols-based-on-the-defaultNumberingSystem branch 2 times, most recently from 79b4453 to f1225bb Compare January 18, 2018 07:35
Kept only the ids of the used numberingsystems in the locale.
Moved the actual data into a global.
Imported symbols for every numbering system.
@alexandrukis alexandrukis force-pushed the 446-use-custom-symbols-based-on-the-defaultNumberingSystem branch from f1225bb to 3300559 Compare January 18, 2018 07:40
8000
@codecov-io
Copy link
codecov-io commented Jan 18, 2018

Codecov Report

Merging #470 into master will increase coverage by 0.1%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #470     +/-   ##
=========================================
+ Coverage   90.22%   90.33%   +0.1%     
=========================================
  Files          24       24             
  Lines        4042     4076     +34     
=========================================
+ Hits         3647     3682     +35     
+ Misses        395      394      -1
Impacted Files Coverage Δ
babel/util.py 92.54% <100%> (+0.44%) ⬆️
babel/numbers.py 98.14% <100%> (+0.13%) ⬆️
babel/dates.py 91.54% <100%> (+0.01%) ⬆️
babel/core.py 95.89% <100%> (+0.37%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d72eb13...3300559. Read the comment docs.

@alexandrukis
Copy link
Author
alexandrukis commented Jan 18, 2018

Hey @akx !

I have made some time to implement most of the changes you've requested such as moving the actual number systems outside the locale (and keep only the ids) and loading all the symbols.

Regarding the usage of other number systems, that's a little more complicated.

For example, let's assume a user wants to format a date in arabic.

The date format is dd‏/MM‏/y G but the ‏/ characters are actually RTL. When combining them with other RTL characters, you'll get ٠٧‏/٠٦‏/٢٠١٦ which is perfectly fine, but when you'll combine them with, the latn numbering system, for example, you'll get 07‏/06‏/2016 which is not that great.

Now, I don't really have much experience with i18n or with python-babel for that matter, but, if the user wants to use the latn numbering system, why not simply use a locale which defaults to that?

Please let me know what you think.

@alexandrukis
Copy link
Author

@akx Are you still around? :)

akx added a commit that referenced this pull request May 28, 2018
Previously the script could have inadvertently merged formatting rules between numbering systems due to the XML selectors used.  This makes sure only Latin rules are used for the time being.  When support for other numbering systems is properly added (see #470), these checks can be changed.
akx added a commit that referenced this pull request May 28, 2018
Previously the script could have inadvertently merged formatting rules between numbering systems due to the XML selectors used.  This makes sure only Latin rules are used for the time being.  When support for other numbering systems is properly added (see #470), these checks can be changed.
@jace
Copy link
jace commented Jan 8, 2021

This is not being implemented?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0