DOC Added tips for reading the code base (#12874) · xhluca/scikit-learn@a3748d2

Commit a3748d2

NicolasHug authored and Xing committed
DOC Added tips for reading the code base (scikit-learn#12874)
* Added tips for reading the code base
* Put it in contributing.rst
* Added bullet point about inheritance
1 parent 0bc8a94 commit a3748d2

2 files changed (+62, -0 lines)

CONTRIBUTING.md

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ Quick links to:
* [Submitting a bug report or feature request](http://scikit-learn.org/dev/developers/contributing.html#submitting-a-bug-report-or-a-feature-request)
* [Contributing code](http://scikit-learn.org/dev/developers/contributing.html#contributing-code)
* [Coding guidelines](http://scikit-learn.org/dev/developers/contributing.html#coding-guidelines)
* [Tips to read current code](http://scikit-learn.org/dev/developers/contributing.html#reading-code)

Code of Conduct
---------------

doc/developers/contributing.rst

Lines changed: 61 additions & 0 deletions
@@ -1397,3 +1397,64 @@ that implement common linear model patterns.

The :mod:`sklearn.utils.multiclass` module contains useful functions
for working with multiclass and multilabel problems.

.. _reading-code:

Reading the existing code base
==============================

Reading and digesting an existing code base is always a difficult exercise
that takes time and experience to master. Even though we try to write simple
code in general, understanding the code can seem overwhelming at first,
given the sheer size of the project. Here is a list of tips that may help
make this task easier and faster (in no particular order).

- Get acquainted with the :ref:`api_overview`: understand what :term:`fit`,
  :term:`predict`, :term:`transform`, etc. are used for.
- Before diving into reading the code of a function / class, go through the
  docstrings first and try to get an idea of what each parameter / attribute
  is doing. It may also help to stop for a minute and think *how would I do
  this myself if I had to?*
- The trickiest thing is often to identify which portions of the code are
  relevant, and which are not. In scikit-learn **a lot** of input checking
  is performed, especially at the beginning of the :term:`fit` methods.
  Sometimes, only a very small portion of the code is doing the actual job.
  For example, when looking at the ``fit()`` method of
  :class:`sklearn.linear_model.LinearRegression`, what you're looking for
  might just be the call to ``scipy.linalg.lstsq``, but it is buried in
  multiple lines of input checking and the handling of different kinds of
  parameters.
- Due to the use of `inheritance
  <https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)>`_,
  some methods may be implemented in parent classes. All estimators inherit
  at least from :class:`BaseEstimator <sklearn.base.BaseEstimator>`, and
  from a ``Mixin`` class (e.g. :class:`ClassifierMixin
  <sklearn.base.ClassifierMixin>`) that enables default behaviour depending
  on the nature of the estimator (classifier, regressor, transformer, etc.).
- Sometimes, reading the tests for a given function will give you an idea of
  what its intended purpose is. You can use ``git grep`` (see below) to find
  all the tests written for a function. Most tests for a specific
  function/class are placed under the ``tests/`` folder of the module.
- You'll often see code looking like this:
  ``out = Parallel(...)(delayed(some_function)(param) for param in
  some_iterable)``. This runs ``some_function`` in parallel using `Joblib
  <https://joblib.readthedocs.io/>`_. ``out`` is then an iterable containing
  the values returned by ``some_function`` for each call.
- We use `Cython <https://cython.org/>`_ to write fast code. Cython code is
  located in ``.pyx`` and ``.pxd`` files. Cython code has a more C-like
  flavor: we use pointers, perform manual memory allocation, etc. Having
  some minimal experience in C / C++ is pretty much mandatory here.
- Master your tools.

  - With such a big project, being efficient with your favorite editor or
    IDE goes a long way towards digesting the code base. Being able to quickly
    jump (or *peek*) to a function/class/attribute definition helps a lot.
    So does being able to quickly see where a given name is used in a file.
  - `git <https://git-scm.com/book/en>`_ also has some built-in killer
    features. It is often useful to understand how a file changed over time,
    using e.g. ``git blame`` (`manual
    <https://git-scm.com/docs/git-blame>`_). This can also be done directly
    on GitHub. ``git grep`` (`examples
    <https://git-scm.com/docs/git-grep#_examples>`_) is also extremely
    useful to see every occurrence of a pattern (e.g. a function call or a
    variable) in the code base.
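
To illustrate the tip about telling input checking apart from the actual work, here is a minimal sketch of what the numerical core of an ordinary least squares fit with an intercept can boil down to. It is not the code of :class:`sklearn.linear_model.LinearRegression`: it assumes dense, already validated float arrays, skips every check, uses ``numpy.linalg.lstsq`` in place of ``scipy.linalg.lstsq``, and the helper name ``fit_ols_core`` is hypothetical.

```python
import numpy as np


def fit_ols_core(X, y):
    """Hypothetical helper: least squares fit with an intercept, no checks.

    Assumes X is a 2D float array (n_samples, n_features) and y is a
    1D float array (n_samples,), both already validated.
    """
    # Center the data so the intercept can be recovered separately.
    X_offset = X.mean(axis=0)
    y_offset = y.mean()
    # The "actual job": a single least squares solve on the centered data.
    coef, _, _, _ = np.linalg.lstsq(X - X_offset, y - y_offset, rcond=None)
    # Choose the intercept so the fitted plane passes through the means.
    intercept = y_offset - X_offset @ coef
    return coef, intercept


# Noise-free data generated from y = 2 * x0 - 3 * x1 + 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([2.0, -3.0]) + 5.0
coef, intercept = fit_ols_core(X, y)
print(coef, intercept)  # coef close to [2, -3], intercept close to 5
```

Much of the surrounding code in a real ``fit()`` method exists to validate and prepare the inputs before reaching a solve of this kind.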
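
Related to the inheritance tip: when a method such as ``score`` is not defined in the class you are reading, walking the class's method resolution order (``__mro__``) tells you which parent provides it. The self-contained sketch below uses hypothetical stand-in classes (``ToyBaseEstimator``, ``ToyClassifierMixin``, ``MajorityClassifier``) that only mimic the layout described above; they are not scikit-learn's classes, and the helper ``defining_class`` is also hypothetical.

```python
class ToyBaseEstimator:
    """Hypothetical stand-in for a base class shared by all estimators."""

    def get_params(self):
        # Treat attributes without a trailing underscore as parameters
        # (fitted attributes end with "_").
        return {k: v for k, v in vars(self).items() if not k.endswith("_")}


class ToyClassifierMixin:
    """Hypothetical stand-in for a mixin giving classifiers a default score."""

    def score(self, X, y):
        # Default behaviour: mean accuracy of self.predict(X) against y.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


# The mixin is listed before the base class, so its methods take
# precedence in the MRO.
class MajorityClassifier(ToyClassifierMixin, ToyBaseEstimator):
    """Hypothetical estimator that always predicts the most frequent label."""

    def fit(self, X, y):
        labels = list(y)
        self.majority_ = max(labels, key=labels.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def defining_class(cls, name):
    """Return the first class in cls.__mro__ whose own body defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None


clf = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "a"])
print(defining_class(MajorityClassifier, "score").__name__)       # ToyClassifierMixin
print(defining_class(MajorityClassifier, "get_params").__name__)  # ToyBaseEstimator
print(clf.score([[3], [4]], ["a", "b"]))                          # 0.5
```

With scikit-learn installed, the same loop can be applied to a real estimator class to locate where an inherited method lives, and ``inspect.getsource`` can then display its code.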
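
The ``Parallel(...)(delayed(...)(...) for ...)`` idiom is easier to read once you know that, in joblib, ``delayed`` does not call the function: it captures the function together with its arguments so that ``Parallel`` can dispatch the calls later. The toy version below is a simplified stand-in, not joblib's code: ``toy_delayed`` and ``toy_parallel`` are hypothetical names, and ``concurrent.futures.ThreadPoolExecutor`` replaces joblib's backends. It shows that the result holds the return values in the same order as the inputs.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_delayed(function):
    """Hypothetical stand-in for joblib.delayed: capture a call, don't run it."""
    def capture(*args, **kwargs):
        return function, args, kwargs
    return capture


def toy_parallel(n_jobs):
    """Hypothetical stand-in for joblib.Parallel(n_jobs=...)."""
    def run(tasks):
        tasks = list(tasks)  # consume the generator of captured calls
        with ThreadPoolExecutor(max_workers=n_jobs) as executor:
            futures = [executor.submit(f, *args, **kwargs)
                       for f, args, kwargs in tasks]
            # Collect results in submission order, i.e. input order.
            return [future.result() for future in futures]
    return run


def some_function(param):
    return param ** 2


some_iterable = range(5)
out = toy_parallel(n_jobs=2)(toy_delayed(some_function)(param)
                             for param in some_iterable)
print(out)  # [0, 1, 4, 9, 16]
```

Replacing ``toy_parallel`` and ``toy_delayed`` with ``Parallel`` and ``delayed`` from ``joblib`` gives the form of the idiom you will meet in the code base.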
