@@ -1397,64 +1397,3 @@ that implement common linear model patterns.
The :mod:`sklearn.utils.multiclass` module contains useful functions
for working with multiclass and multilabel problems.
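
As a minimal sketch of what this module offers, the snippet below calls two of its public helpers, ``type_of_target`` and ``unique_labels``. The toy label arrays are invented for illustration and are not taken from the guide.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target, unique_labels

# Toy targets, made up for this illustration.
y_binary = np.array([0, 1, 1, 0])
y_multiclass = np.array([0, 1, 2, 1])
y_multilabel = np.array([[1, 0, 1], [0, 1, 1]])  # one column per label

# type_of_target infers the kind of problem from the shape and values of y.
target_types = {
    "y_binary": type_of_target(y_binary),
    "y_multiclass": type_of_target(y_multiclass),
    "y_multilabel": type_of_target(y_multilabel),
}

# unique_labels returns the sorted set of labels seen across several targets.
labels = unique_labels(y_binary, y_multiclass)

print(target_types)
print(labels)
```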
-
- .. _reading-code:
-
- Reading the existing code base
- ==============================
-
- Reading and digesting an existing code base is always a difficult exercise
- that takes time and experience to master. Even though we try to write simple
- code in general, understanding the code can seem overwhelming at first,
- given the sheer size of the project. Here is a list of tips that may help
- make this task easier and faster (in no particular order).
-
- - Get acquainted with the :ref:`api_overview`: understand what :term:`fit`,
-   :term:`predict`, :term:`transform`, etc. are used for.
- - Before diving into reading the code of a function / class, go through the
-   docstrings first and try to get an idea of what each parameter / attribute
-   is doing. It may also help to stop a minute and think *how would I do this
-   myself if I had to?*
- - The trickiest thing is often to identify which portions of the code are
-   relevant, and which are not. In scikit-learn **a lot** of input checking
-   is performed, especially at the beginning of the :term:`fit` methods.
-   Sometimes, only a very small portion of the code is doing the actual job.
-   For example, looking at the ``fit()`` method of
-   :class:`sklearn.linear_model.LinearRegression`, what you're looking for
-   might just be the call to ``scipy.linalg.lstsq``, but it is buried in
-   multiple lines of input checking and the handling of different kinds of
-   parameters.
- - Due to the use of `Inheritance
-   <https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)>`_,
-   some methods may be implemented in parent classes. All estimators inherit
-   at least from :class:`BaseEstimator <sklearn.base.BaseEstimator>`, and
-   from a ``Mixin`` class (e.g. :class:`ClassifierMixin
-   <sklearn.base.ClassifierMixin>`) that enables default behaviour depending
-   on the nature of the estimator (classifier, regressor, transformer, etc.).
- - Sometimes, reading the tests for a given function will give you an idea of
-   what its intended purpose is. You can use ``git grep`` (see below) to find
-   all the tests written for a function. Most tests for a specific
-   function/class are placed under the ``tests/`` folder of the module.
- - You'll often see code looking like this:
-   ``out = Parallel(...)(delayed(some_function)(param) for param in
-   some_iterable)``. This runs ``some_function`` in parallel using `Joblib
-   <https://joblib.readthedocs.io/>`_. ``out`` is then an iterable containing
-   the values returned by ``some_function`` for each call.
- - We use `Cython <https://cython.org/>`_ to write fast code. Cython code is
-   located in ``.pyx`` and ``.pxd`` files. Cython code has a more C-like
-   flavor: we use pointers, perform manual memory allocation, etc. Having
-   some minimal experience in C / C++ is pretty much mandatory here.
- - Master your tools.
-
-   - With such a big project, being efficient with your favorite editor or
-     IDE goes a long way towards digesting the code base. Being able to quickly
-     jump (or *peek*) to a function/class/attribute definition helps a lot.
-     So does being able to quickly see where a given name is used in a file.
-   - `git <https://git-scm.com/book/en>`_ also has some built-in killer
-     features. It is often useful to understand how a file changed over time,
-     using e.g. ``git blame`` (`manual
-     <https://git-scm.com/docs/git-blame>`_). This can also be done directly
-     on GitHub. ``git grep`` (`examples
-     <https://git-scm.com/docs/git-grep#_examples>`_) is also extremely
-     useful to see every occurrence of a pattern (e.g. a function call or a
-     variable) in the code base.
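
The ``Parallel(...)(delayed(...)(...) for ...)`` idiom mentioned in the list above can be tried in isolation. The following is a minimal sketch, assuming Joblib is installed; ``some_function`` and ``some_iterable`` are the placeholder names from the text, given arbitrary toy definitions here, and ``n_jobs=2`` is an arbitrary choice.

```python
from math import sqrt

from joblib import Parallel, delayed


def some_function(param):
    # Placeholder for real work; here it only takes a square root.
    return sqrt(param)


# Toy inputs, chosen only for illustration.
some_iterable = [0, 1, 4, 9]

# delayed(some_function)(param) records the call without running it;
# Parallel then dispatches the recorded calls to 2 workers and collects
# the return values in the same order as the inputs.
out = Parallel(n_jobs=2)(
    delayed(some_function)(param) for param in some_iterable
)

print(out)
```

With the default settings, ``out`` is a plain list with one entry per call, so it can be indexed or passed to ``numpy.asarray`` like any other sequence.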