Add initial CONTRIBUTING.md file · gigapay/python-stdnum@b1dc313 · GitHub

Commit b1dc313

Add initial CONTRIBUTING.md file
Initial description of the information needed for adding new number formats and some coding and testing guidelines.
1 parent df894c3 commit b1dc313

3 files changed

+170
-0
lines changed

CONTRIBUTING.md

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
1+
Contributing to python-stdnum
2+
=============================
3+
4+
This document describes general guidelines for contributing new formats or
5+
other enhancements to python-stdnum.
6+
7+
8+
Adding number formats
9+
---------------------
10+
11+
Basically any number or code that has some validation mechanism available or
12+
some common formatting is eligible for inclusion into this library. If the
13+
only specification of the number is "it consists of 6 digits", implementing
14+
validation may not be that useful.
15+
16+
Contributions of new formats or requests to implement validation for a format
17+
should include the following:
18+
19+
* The format name and short description.
20+
* References to (official) sources that describe the format.
21+
* A one or two paragraph description containing more details of the number
22+
(e.g. purpose and issuer and possibly format information that might be
23+
useful to end users).
24+
* If available, a link to an (official) validation service for the number,
25+
reference implementations or similar sources that allow validating the
26+
correctness of the implementation.
27+
* A set of around 20 to 100 "real" valid numbers for testing (more is better
28+
during development but only around 100 will be retained for regression
29+
testing).
30+
* If the validation depends on some (online) list of formats, structures or
31+
parts of the identifier (e.g. a list of region codes that are part of the
32+
number) a way to easily update the registry information should be
33+
available.
34+
35+
36+
Code contributions
37+
------------------
38+
39+
Improvements to python-stdnum are most welcome. Integrating contributions
40+
will be done on a best-effort basis and can be made easier if the following
41+
are considered:
42+
43+
* Ideally contributions are made as GitHub pull requests, but contributions
44+
by email (privately or through the python-stdnum-users mailing list) can
45+
also be considered.
46+
* Submitted contributions will often be reformatted and sometimes
47+
restructured for consistency with other parts.
48+
* Contributions will be acknowledged in the release notes.
49+
* Contributions should add or update a copyright statement if you feel the
50+
contribution is significant.
51+
* All contributions should be made with compatible applicable copyright.
52+
* It is not necessary to modify the NEWS, README.md or files under docs for new
53+
formats; these files will be updated on release.
54+
* Marking valid numbers as invalid should be avoided and is much worse than
55+
marking invalid numbers as valid. Since the primary use case for
56+
python-stdnum is to validate entered data, having an implementation that
57+
results in "computer says no" should be avoided.
58+
* Number format implementations should include links to sources of
59+
information: generally useful links (e.g. more details about the number
60+
itself) should be in the module docstring; if a link relates more to the
61+
implementation (e.g. pointer to reference implementation, online API
62+
documentation or similar), a comment in the code is better.
63+
* Country-specific numbers and codes go in a country or region package (e.g.
64+
stdnum.eu.vat or stdnum.nl.bsn) while global numbers go in the top-level
65+
namespace (e.g. stdnum.isbn).
66+
* All code should be well tested and achieve 100% code coverage.
67+
* Existing code structure conventions (e.g. see README for interface) should
68+
be followed.
69+
* Git commit messages should follow the usual 7 rules.
70+
* Declarative or functional constructs are preferred over an iterative
71+
approach, e.g.::
72+
73+
s = sum(int(c) for c in number)
74+
75+
over::
76+
77+
s = 0
78+
for c in number:
79+
s += int(c)
80+
81+
82+
Testing
83+
-------
84+
85+
Tests can be run with `tox`. Some basic code style tests can be run with `tox
86+
-e flake8` and most other targets run the test suite with various supported
87+
Python interpreters.
88+
89+
Module implementations have a couple of smaller test cases that also serve as
90+
basic documentation of the happy flow.
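
As a rough illustration of such a module, the following is a minimal sketch for a made-up format (five digits plus one check digit that is the digit sum modulo 10). The format, the check digit rule and the local `ValidationError` class are assumptions for this example only; a real module would follow the interface described in the README and use the library's own exceptions. The doctests in the module docstring show the happy flow.

```python
"""Example number (a hypothetical format, for illustration only).

>>> validate('12345-5')
'123455'
>>> is_valid('123456')
False
"""


class ValidationError(ValueError):
    """Stand-in for the library's own validation exception."""


def compact(number):
    """Strip separators and surrounding whitespace from the number."""
    return number.replace('-', '').replace(' ', '').strip()


def calc_check_digit(number):
    """Calculate the check digit (digit sum modulo 10, made up here)."""
    return str(sum(int(c) for c in number) % 10)


def validate(number):
    """Return the compact number or raise ValidationError."""
    number = compact(number)
    if any(c not in '0123456789' for c in number):
        raise ValidationError('invalid format')
    if len(number) != 6:
        raise ValidationError('invalid length')
    if calc_check_digit(number[:-1]) != number[-1]:
        raise ValidationError('invalid checksum')
    return number


def is_valid(number):
    """Return True if the number passes validation, False otherwise."""
    try:
        return bool(validate(number))
    except ValidationError:
        return False
```

The doctests can be run with `python -m doctest` on a file containing the sketch.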
91+
92+
More extensive tests are available, per module, in the tests directory. These
93+
tests (also doctests) cover more corner cases and should include a set of
94+
valid numbers that demonstrate that the module works correctly for real
95+
numbers.
96+
97+
The normal tests should never require online sources for execution. All
98+
functions that deal with online lookups (e.g. the EU VIES service for VAT
99+
validation) should only be tested using conditional unit tests.
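
One possible way to make such a test conditional is to skip it unless an environment variable is set. The sketch below assumes a variable named `ONLINE_TESTS` and uses a placeholder `check_online()` function in place of a real remote lookup; both names are illustrative and not taken from this document.

```python
import os
import unittest


def check_online(number):
    """Placeholder for a function that would query a remote service."""
    return {'number': number, 'valid': True}


# The whole class is skipped unless ONLINE_TESTS is set in the environment
# (the variable name is an assumption made for this sketch).
@unittest.skipIf(
    not os.environ.get('ONLINE_TESTS'),
    'online tests are disabled')
class TestOnlineLookup(unittest.TestCase):

    def test_lookup(self):
        result = check_online('123455')
        self.assertTrue(result['valid'])
```

With this approach a plain test run reports the test as skipped, and the lookup is only exercised when the variable is set explicitly.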
100+
101+
102+
Finding test numbers
103+
--------------------
104+
105+
Some company numbers are commonly published on a company's website contact
106+
page (e.g. VAT or other registration numbers, bank account numbers). Doing a
107+
web search limited to a country and some key words generally turns up a lot of
108+
pages with this information.
109+
110+
Another approach is to search for spreadsheet-type documents with some
111+
keywords that match the number. This sometimes turns up lists of companies
112+
(also occasionally works for personal identifiers).
113+
114+
For information that is displayed on ID cards or passports it is sometimes
115+
useful to do an image search.
116+
117+
For dealing with numbers that point to individuals it is important to:
118+
119+
* Only keep the data that is needed to test the implementation.
120+
* Ensure that no other data relating to a person or other personal
121+
information is kept or can be inferred from the kept data.
122+
* Ensure that the presence of a number in the test set does not provide any information
123+
about the person (other than that there is a person with the number or
124+
information that is present in the number itself).
125+
126+
Sometimes numbers are part of a data leak. If this data is used to pick a few
127+
sample numbers from, the selection should be random and the leak should not be
128+
identifiable from the picked numbers. For example, if the leaked numbers
129+
pertain only to people with a certain medical condition, membership of some
130+
organisation or other specific property, the leaked data should not be used.
131+
132+
133+
Reverse engineering
134+
-------------------
135+
136+
Sometimes a number format clearly has a check digit but the algorithm is not
137+
publicly documented. It is sometimes possible to reverse engineer the used
138+
check digit algorithm from a large set of numbers.
139+
140+
For example, comparing numbers that, apart from the check digit, only differ in
141+
one digit will often expose the weights used. This works reasonably well if
142+
the algorithm uses modulo 11 over a weighted sum of the digits.
143+
144+
See https://github.com/arthurdejong/python-stdnum/pull/203#issuecomment-623188812
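
The idea can be sketched as follows, under the assumption that the check digit is the last character and equals the weighted sum of the other digits modulo 11 (real schemes often differ, e.g. by subtracting from 11 or by representing the value 10 with a letter). The function name `infer_weight` is made up for this example. Because 11 is prime, the weight follows from the difference in check digits divided by the difference in the changed digit.

```python
def infer_weight(number_a, number_b):
    """Return (position, weight modulo 11) for the single differing digit.

    Assumes check digit == sum(weight * digit) % 11, stored as last digit.
    """
    body_a, check_a = number_a[:-1], int(number_a[-1])
    body_b, check_b = number_b[:-1], int(number_b[-1])
    if len(body_a) != len(body_b):
        raise ValueError('numbers must have the same length')
    differing = [
        i for i, (a, b) in enumerate(zip(body_a, body_b)) if a != b]
    if len(differing) != 1:
        raise ValueError('numbers must differ in exactly one digit')
    position = differing[0]
    delta_digit = (int(body_a[position]) - int(body_b[position])) % 11
    delta_check = (check_a - check_b) % 11
    # 11 is prime, so the non-zero digit difference has a modular inverse
    # (three-argument pow with exponent -1 requires Python 3.8 or later)
    return position, delta_check * pow(delta_digit, -1, 11) % 11
```

For instance, with weights 2, 3, 4, 5 the numbers `12347` and `12648` satisfy the assumed scheme, and `infer_weight('12347', '12648')` returns `(2, 4)`. Weights found this way are only known modulo 11 and should be confirmed against many more numbers.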
145+
146+
147+
Registries
148+
----------
149+
150+
Some numbers or parts of numbers use validation based on a registry of known
151+
good prefixes, ranges or formats. It is only useful to fully base validation
152+
on these registries if the update frequency to these registries is very low.
153+
154+
If there is a registry that is used (a list of known values, ranges or
155+
otherwise) the downloaded information should be stored in a data file (see
156+
the stdnum.numdb module). Only the minimal amount of data should be kept (for
157+
validation or identification).
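
To illustrate keeping only minimal registry data, here is a small sketch of a prefix lookup. It does not use or reproduce the stdnum.numdb file format; the data layout (one prefix and description per line), the sample values and the function names are all invented for this example.

```python
# Made-up registry data: one "prefix description" pair per line.
REGISTRY_DATA = """
1 Region North
12 Region North, capital district
2 Region South
"""


def load_registry(text):
    """Parse the data into a dict that maps prefix to description."""
    return dict(
        line.split(None, 1) for line in text.splitlines() if line.strip())


def lookup(registry, number):
    """Return the description of the longest known prefix, or None."""
    prefixes = (number[:n] for n in range(len(number), 0, -1))
    return next((registry[p] for p in prefixes if p in registry), None)
```

With `registry = load_registry(REGISTRY_DATA)`, `lookup(registry, '1299')` returns `'Region North, capital district'` and `lookup(registry, '3999')` returns `None`.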
158+
159+
The data files should be able to be created and updated using a script in the
160+
`update` directory.

docs/contributing.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
.. include:: ../CONTRIBUTING.md

docs/index.rst

Lines changed: 9 additions & 0 deletions
@@ -332,3 +332,12 @@ Changes in python-stdnum
332332
:maxdepth: 2
333333

334334
changes
335+
336+
337+
Contributing to python-stdnum
338+
-----------------------------
339+
340+
.. toctree::
341+
:maxdepth: 2
342+
343+
contributing
