@@ -1397,3 +1397,64 @@ that implement common linear model patterns.

The :mod:`sklearn.utils.multiclass` module contains useful functions
for working with multiclass and multilabel problems.
+
+ .. _reading-code:
+
+ Reading the existing code base
+ ==============================
+
+ Reading and digesting an existing code base is always a difficult exercise
+ that takes time and experience to master. Even though we try to write simple
+ code in general, understanding the code can seem overwhelming at first,
+ given the sheer size of the project. Here is a list of tips that may help
+ make this task easier and faster (in no particular order).
+
+ - Get acquainted with the :ref:`api_overview`: understand what :term:`fit`,
+   :term:`predict`, :term:`transform`, etc. are used for.
+ - Before diving into the code of a function or class, go through its
+   docstring first and try to get an idea of what each parameter and
+   attribute does. It may also help to stop for a minute and think: *how
+   would I do this myself if I had to?*
+ - The trickiest thing is often to identify which portions of the code are
+   relevant, and which are not. In scikit-learn **a lot** of input checking
+   is performed, especially at the beginning of the :term:`fit` methods.
+   Sometimes, only a very small portion of the code is doing the actual job.
+   For example, looking at the ``fit()`` method of
+   :class:`sklearn.linear_model.LinearRegression`, what you're looking for
+   might just be the call to ``scipy.linalg.lstsq``, but it is buried in
+   multiple lines of input checking and the handling of different kinds of
+   parameters.
+ - Due to the use of `Inheritance
+   <https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)>`_,
+   some methods may be implemented in parent classes. All estimators inherit
+   at least from :class:`BaseEstimator <sklearn.base.BaseEstimator>`, and
+   from a ``Mixin`` class (e.g. :class:`ClassifierMixin
+   <sklearn.base.ClassifierMixin>`) that enables default behaviour depending
+   on the nature of the estimator (classifier, regressor, transformer, etc.).
+ - Sometimes, reading the tests for a given function will give you an idea of
+   what its intended purpose is. You can use ``git grep`` (see below) to find
+   all the tests written for a function. Most tests for a specific
+   function/class are placed under the ``tests/`` folder of the module.
+ - You'll often see code looking like this:
+   ``out = Parallel(...)(delayed(some_function)(param) for param in
+   some_iterable)``. This runs ``some_function`` in parallel using `Joblib
+   <https://joblib.readthedocs.io/>`_. ``out`` is then an iterable containing
+   the values returned by ``some_function`` for each call.
+ - We use `Cython <https://cython.org/>`_ to write fast code. Cython code is
+   located in ``.pyx`` and ``.pxd`` files. Cython code has a more C-like
+   flavor: we use pointers, perform manual memory allocation, etc. Having
+   some minimal experience in C / C++ is pretty much mandatory here.
+ - Master your tools.
+
+   - With such a big project, being efficient with your favorite editor or
+     IDE goes a long way towards digesting the code base. Being able to quickly
+     jump (or *peek*) to a function/class/attribute definition helps a lot.
+     So does being able to quickly see where a given name is used in a file.
+   - `git <https://git-scm.com/book/en>`_ also has some built-in killer
+     features. It is often useful to understand how a file changed over time,
+     using e.g. ``git blame`` (`manual
+     <https://git-scm.com/docs/git-blame>`_). This can also be done directly
+     on GitHub. ``git grep`` (`examples
+     <https://git-scm.com/docs/git-grep#_examples>`_) is also extremely
+     useful to see every occurrence of a pattern (e.g. a function call or a
+     variable) in the code base.
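
To illustrate the tip about separating input checking from the actual work: the core of an ordinary least-squares fit can come down to a single call to ``scipy.linalg.lstsq``. The sketch below is not scikit-learn's implementation. It assumes dense inputs and recovers the intercept by centering the data, and the helper name ``tiny_linear_fit`` is hypothetical.

```python
import numpy as np
from scipy import linalg


def tiny_linear_fit(X, y):
    """Hypothetical helper: least-squares fit with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the data so that the intercept can be recovered afterwards.
    X_mean = X.mean(axis=0)
    y_mean = y.mean()
    # The "actual job": a single least-squares solve.
    coef, _, _, _ = linalg.lstsq(X - X_mean, y - y_mean)
    intercept = y_mean - X_mean @ coef
    return coef, intercept


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([2.0, -3.0]) + 1.0  # noiseless linear target
coef, intercept = tiny_linear_fit(X, y)
```

When reading a real ``fit()`` method, it can help to look for the equivalent of the ``linalg.lstsq`` line first and only then work outwards to the validation code around it.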
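
Because methods may live in parent classes, it can be handy to ask Python which class actually defines a given method. The following sketch uses only the standard library. The ``Toy*`` classes are hypothetical stand-ins for a base class and a mixin, not scikit-learn classes, and ``defining_class`` is an illustrative helper.

```python
import inspect


class ToyBase:
    """Hypothetical stand-in for a common base class."""

    def get_params(self):
        return dict(vars(self))


class ToyClassifierMixin:
    """Hypothetical stand-in for a mixin providing default behaviour."""

    def score(self, X, y):
        # Relies on ``predict`` being implemented by the concrete class.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


class ToyClassifier(ToyClassifierMixin, ToyBase):
    def __init__(self, constant=0):
        self.constant = constant

    def predict(self, X):
        return [self.constant for _ in X]


def defining_class(cls, method_name):
    """Return the first class in the MRO that defines ``method_name``."""
    for klass in inspect.getmro(cls):
        if method_name in vars(klass):
            return klass
    raise AttributeError(f"{cls.__name__} has no attribute {method_name!r}")


# Map each method name to the class where its code can be found.
where = {
    name: defining_class(ToyClassifier, name).__name__
    for name in ("predict", "score", "get_params")
}
```

The same helper can be pointed at a real estimator class to see whether a method is implemented in the estimator itself or in one of its parents.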
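
The ``Parallel(...)(delayed(...)(...) for ...)`` idiom may be easier to read next to a concrete case. This is a minimal sketch, assuming Joblib is installed. ``square`` is a hypothetical stand-in for ``some_function``, and ``prefer="threads"`` is chosen here only to keep the example lightweight.

```python
from joblib import Parallel, delayed


def square(x):
    # Stand-in for ``some_function``; any callable works.
    return x * x


# Sequential equivalent: out = [square(x) for x in range(4)]
# ``delayed(square)(x)`` records the call without running it;
# ``Parallel`` then dispatches the recorded calls to its workers.
out = Parallel(n_jobs=2, prefer="threads")(
    delayed(square)(x) for x in range(4)
)
```

With the default settings, the results come back as a list in the same order as the inputs, so the call can be read as a parallel list comprehension.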