iso639-lang

iso639-lang handles ISO 639 codes for individual languages and language groups.

>>> from iso639 import Lang
>>> Lang("French")
Lang(name='French', pt1='fr', pt2b='fre', pt2t='fra', pt3='fra', pt5='')

Installation

$ pip install iso639-lang

iso639-lang supports Python 3.8+.

Usage

Begin by importing the Lang class.

>>> from iso639 import Lang

Let's try with the identifier of an individual language.

>>> lg = Lang("deu")
>>> lg.name # 639-3 reference name
'German'
>>> lg.pt1 # 639-1 identifier
'de'
>>> lg.pt2b # 639-2/B bibliographic identifier
'ger'
>>> lg.pt2t # 639-2/T terminological identifier
'deu'
>>> lg.pt3 # 639-3 identifier
'deu'

And now with the identifier of a group of languages.

>>> lg = Lang("cel")
>>> lg.name # 639-5 English name
'Celtic languages'
>>> lg.pt2b # 639-2/B bibliographic identifier
'cel'
>>> lg.pt2t # 639-2/T terminological identifier
'cel'
>>> lg.pt5 # 639-5 identifier
'cel'

Lang is instantiable with any ISO 639 identifier or reference name.

>>> Lang("German") == Lang("de") == Lang("deu") == Lang("ger")
True

Lang also recognizes all non-reference English names associated with a language identifier in ISO 639.

>>> Lang("Chinese, Mandarin") # 639-3 inverted name
Lang(name='Mandarin Chinese', pt1='', pt2b='', pt2t='', pt3='cmn', pt5='')
>>> Lang("Uyghur") # other 639-3 printed name
Lang(name='Uighur', pt1='ug', pt2b='uig', pt2t='uig', pt3='uig', pt5='')
>>> Lang("Valencian") # other 639-2 English name
Lang(name='Catalan', pt1='ca', pt2b='cat', pt2t='cat', pt3='cat', pt5='')

Please note that Lang is case-sensitive.

>>> Lang("ak")
Lang(name='Akan', pt1='ak', pt2b='aka', pt2t='aka', pt3='aka', pt5='')
>>> Lang("Ak")
Lang(name='Ak', pt1='', pt2b='', pt2t='', pt3='akq', pt5='')

You can use the asdict method to return ISO 639 values as a Python dictionary.

>>> Lang("fra").asdict()
{'name': 'French', 'pt1': 'fr', 'pt2b': 'fre', 'pt2t': 'fra', 'pt3': 'fra', 'pt5': ''}

Other Language Names

In addition to their reference name, some language identifiers may be associated with other names. You can list them using the other_names method.

>>> lg = Lang("ast")
>>> lg.name
'Asturian'
>>> lg.other_names()
['Asturleonese', 'Bable', 'Leonese']

Language Types

The type method returns the type of a language.

>>> lg = Lang("Latin")
>>> lg.type()
'Historical'

Macrolanguages

Use the scope method to determine whether a language is a macrolanguage or an individual language.

>>> lg = Lang("Arabic")
>>> lg.scope()
'Macrolanguage'

Use the macro method to get the macrolanguage of an individual language.

>>> lg = Lang("Wu Chinese")
>>> lg.macro()
Lang(name='Chinese', pt1='zh', pt2b='chi', pt2t='zho', pt3='zho', pt5='')

Conversely, you can also list all the individual languages that share a common macrolanguage.

>>> lg = Lang("Persian")
>>> lg.individuals()
[Lang(name='Iranian Persian', pt1='', pt2b='', pt2t='', pt3='pes', pt5=''), 
Lang(name='Dari', pt1='', pt2b='', pt2t='', pt3='prs', pt5='')]

In Data Structures

As Lang is hashable, Lang instances can be added to a set or used as dictionary keys.

>>> {Lang("de"): "foo", Lang("fr"): "bar"}
{Lang(name='German', pt1='de', pt2b='ger', pt2t='deu', pt3='deu', pt5=''): 'foo', Lang(name='French', pt1='fr', pt2b='fre', pt2t='fra', pt3='fra', pt5=''): 'bar'}

Lists of Lang instances are sortable by name.

>>> [lg.name for lg in sorted([Lang("deu"), Lang("rus"), Lang("eng")])]
['English', 'German', 'Russian']

Iterator

iter_langs() iterates through all possible Lang instances, ordered alphabetically by name.

>>> from iso639 import iter_langs
>>> [lg.name for lg in iter_langs()]
["'Are'are", "'Auhelawa", "A'ou", ... , 'ǂHua', 'ǂUngkue', 'ǃXóõ']

Exceptions

When an invalid language value is passed to Lang, an InvalidLanguageValue exception is raised.

>>> from iso639.exceptions import InvalidLanguageValue
>>> try:
...     Lang("foobar")
... except InvalidLanguageValue as e:
...     e.msg
... 
"'foobar' is not a valid Lang argument."

When a deprecated language value is passed to Lang, a DeprecatedLanguageValue exception is raised.

>>> from iso639.exceptions import DeprecatedLanguageValue
>>> try:
...     Lang("gsc")
... except DeprecatedLanguageValue as e:
...     lg = Lang(e.change_to)
...     f"{e.name} replaced by {lg.name}."
...
'Gascon replaced by Occitan (post 1500).'

Note that you can use the is_language checker, described below, if you would rather not handle exceptions.

Checker

The is_language function checks if a language value is valid according to ISO 639.

>>> from iso639 import is_language
>>> is_language("fr")
True
>>> is_language("French")
True

You can restrict the check to certain identifiers or names by passing an additional argument.

>>> is_language("fr", "pt3") # only 639-3
False
>>> is_language("fre", ("pt2b", "pt2t")) # only 639-2/B or 639-2/T
True

Speed

iso639-lang loads its mappings into memory, which lets it process calls much faster than Python libraries that rely on an embedded database.

Sources

As of October 23, 2024, iso639-lang is based on the latest tables provided by the ISO 639 registration authorities. Please open a new issue if you find that this library uses out-of-date data files.

| Set | Description | Registration Authority | Last Modified |
|-----|-------------|------------------------|---------------|
| Set 1 | two-letter language identifiers for major, mostly national individual languages | Infoterm | 2009-09-01 |
| Set 2 | three-letter language identifiers for a larger number of widely known individual languages and a number of language groups | Library of Congress | 2017-12-21 |
| Set 3 | three-letter language identifiers covering all individual languages, including living, extinct and ancient languages | SIL International | 2024-04-15 |
| Set 5 | three-letter language identifiers covering a larger set of language groups, living and extinct | Library of Congress | 2013-02-11 |

To learn more about how the source tables are processed by the iso639-lang library, read the generate.py script.