Put it in contributing.rst · NicolasHug/scikit-learn@d1c6e7b · GitHub

Commit d1c6e7b

Put it in contributing.rst
1 parent 43d2351 commit d1c6e7b

2 files changed

+53
-53
lines changed

doc/developers/contributing.rst

Lines changed: 53 additions & 0 deletions
@@ -1395,3 +1395,56 @@ that implement common linear model patterns.
The :mod:`sklearn.utils.multiclass` module contains useful functions
for working with multiclass and multilabel problems.

Reading the existing code base
==============================

Reading and digesting an existing code base is always a difficult exercise that
takes time and experience to master. Even though we try to write simple code in
scikit-learn, understanding the code can seem overwhelming at first, given the
sheer size of the project. Here is a list of tips that may help make this task
easier and faster (in no particular order).

- Get acquainted with the :ref:`api_overview`: understand what :term:`fit`,
  :term:`predict`, :term:`transform`, etc. are used for.
- Before diving into reading the code of a function / class, go through the
  docstrings first and try to get an idea of what each parameter / attribute
  is doing. It may also help to stop for a minute and think *how would I do
  this myself if I had to?*
- The trickiest thing is often to identify which portions of the code are
  relevant, and which are not. In scikit-learn **a lot** of input checking
  is performed, especially at the beginning of the :term:`fit` methods.
  Sometimes, only a very small portion of the code is doing the actual job.
  For example, looking at the ``fit()`` method of
  :class:`sklearn.linear_model.LinearRegression`, what you're looking for
  might just be the call to ``scipy.linalg.lstsq``, but it is buried in
  multiple lines of input checking and the handling of different kinds of
  parameters.
- Sometimes, reading the tests for a given function will give you an idea of
  what its intended purpose is. You can use ``git grep`` (see below) to find
  all the tests written for a function.
- You'll often see code looking like this:
  ``out = Parallel(...)(delayed(some_function)(param) for param in
  some_iterable)``. This runs ``some_function`` in parallel using `Joblib
  <https://joblib.readthedocs.io/>`_. ``out`` is then an iterable containing
  the values returned by ``some_function`` for each call.
- We use `Cython <https://cython.org/>`_ to write fast code. Cython code is
  located in ``.pyx`` and ``.pxd`` files. Cython code has a more C-like
  flavor: we use pointers, perform manual memory allocation, use OUT
  variables (variables whose value is changed by a function call, which
  is frowned upon in pure Python but extremely common in C), etc. Having
  some minimal experience in C / C++ is pretty much mandatory here.
- Master your tools.

  - With such a big project, being efficient with your favorite editor or
    IDE goes a long way towards digesting the code base. Being able to quickly
    jump (or *peek*) to a function/class/attribute definition helps a lot.
    So does being able to quickly see where a given name is used in a file.
  - `git <https://git-scm.com/book/en>`_ also has some built-in killer
    features. It is often useful to understand how a file changed over time,
    using e.g. ``git blame`` (`manual
    <https://git-scm.com/docs/git-blame>`_). This can also be done directly
    on GitHub. ``git grep`` (`examples
    <https://git-scm.com/docs/git-grep#_examples>`_) is also extremely
    useful to see every occurrence of a pattern (e.g. a function call or a
    variable) in the code base.
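
The ``Parallel`` / ``delayed`` pattern mentioned in the tips above can be
tried in isolation. The following is a minimal sketch, not code taken from
scikit-learn: ``some_function`` and ``some_iterable`` are hypothetical
stand-ins, and ``n_jobs=2`` with ``prefer="threads"`` are arbitrary choices
made only to keep the example small.

```python
from joblib import Parallel, delayed


def some_function(param):
    """Hypothetical stand-in for the per-item work (here: squaring a number)."""
    return param ** 2


# Hypothetical inputs; in scikit-learn this is often a set of folds,
# estimators or parameter settings.
some_iterable = range(5)

# delayed(some_function)(param) does not call some_function right away: it
# records the function and its arguments so that Parallel can dispatch the
# call to a worker. n_jobs=2 and prefer="threads" are assumptions of this
# sketch, not values prescribed by scikit-learn.
out = Parallel(n_jobs=2, prefer="threads")(
    delayed(some_function)(param) for param in some_iterable
)

# With joblib's default settings the results are collected in a list, in the
# same order as the inputs.
print(out)  # [0, 1, 4, 9, 16]
```

When reading such a line in the code base, it usually helps to first read
the function wrapped by ``delayed`` on its own, since that is where the actual
work happens.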

doc/developers/tips.rst

Lines changed: 0 additions & 53 deletions
@@ -256,56 +256,3 @@ give you clues as to the source of your memory error.

For more information on valgrind and the array of options it has, see the
tutorials and documentation on the `valgrind web site <http://valgrind.org>`_.
