@@ -256,56 +256,3 @@ give you clues as to the source of your memory error.
For more information on valgrind and the array of options it has, see the
tutorials and documentation on the `valgrind web site <http://valgrind.org>`_.
-
- Reading the existing code base
- ==============================
-
- Reading and digesting an existing code base is always a difficult exercise that
- takes time and experience to master. Even though we try to write simple code in
- scikit-learn, understanding the code can seem overwhelming at first, given the
- sheer size of the project. Here is a list of tips that may help make this task
- easier and faster (in no particular order).
-
- - Get acquainted with the :ref:`api_overview`: understand what :term:`fit`,
-   :term:`predict`, :term:`transform`, etc. are used for.
- - Before diving into reading the code of a function / class, go through the
-   docstrings first and try to get an idea of what each parameter / attribute
-   is doing. It may also help to stop a minute and think *how would I do this
-   myself if I had to?*.
- - The trickiest thing is often to identify which portions of the code are
-   relevant, and which are not. In scikit-learn **a lot** of input checking
-   is performed, especially at the beginning of the :term:`fit` methods.
-   Sometimes, only a very small portion of the code is doing the actual job. For
-   example, looking at the ``fit()`` method of
-   :class:`sklearn.linear_model.LinearRegression`, what you're looking for
-   might just be the call to ``scipy.linalg.lstsq``, but it is buried in
-   multiple lines of input checking and the handling of different kinds of
-   parameters.
- - Sometimes, reading the tests for a given function will give you an idea of
-   what its intended purpose is. You can use ``git grep`` (see below) to find
-   all the tests written for a function.
- - You'll often see code looking like this:
-   ``out = Parallel(...)(delayed(some_function)(param) for param in
-   some_iterable)``. This runs ``some_function`` in parallel using `Joblib
-   <https://joblib.readthedocs.io/>`_. ``out`` is then an iterable containing
-   the values returned by ``some_function`` for each call.
- - We use `Cython <https://cython.org/>`_ to write fast code. Cython code is
-   located in ``.pyx`` and ``.pxd`` files. Cython code has a more C-like
-   flavor: we use pointers, perform manual memory allocation, use OUT
-   variables (variables whose value is changed by a function call, which
-   is frowned upon in pure Python but extremely common in C), etc. Having
-   some minimal experience in C / C++ is pretty much mandatory here.
298
- - Master your tools.
299
-
300
- - With such a big project, being efficient with your favorite editor or
301
- IDE goes a long way towards digesting the code base. Being able to quickly
302
- jump (or *peek *) to a function/class/attribute definition helps a lot.
303
- So does being able to quickly see where a given name is used in a file.
-   - `git <https://git-scm.com/book/en>`_ also has some built-in killer
-     features. It is often useful to understand how a file changed over time,
-     using e.g. ``git blame`` (`manual
-     <https://git-scm.com/docs/git-blame>`_). This can also be done directly
-     on GitHub. ``git grep`` (`examples
-     <https://git-scm.com/docs/git-grep#_examples>`_) is also extremely
-     useful to see every occurrence of a pattern (e.g. a function call or a
-     variable) in the code base.
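
As a rough illustration of the ``Parallel(...)(delayed(...)(...) for ...)`` idiom described in the tips above, here is a minimal sketch. The function ``some_function``, its body, and the input range are placeholders invented for this example, and ``n_jobs=2`` with ``prefer="threads"`` is just one possible configuration chosen to keep the sketch lightweight; it is not meant to reflect how scikit-learn itself configures Joblib.

```python
from joblib import Parallel, delayed


def some_function(param):
    # Hypothetical stand-in for real work; not taken from scikit-learn.
    return param ** 2


some_iterable = range(5)  # placeholder input

# delayed(some_function)(param) does not call some_function; it only records
# the function and its arguments so that Parallel can dispatch the call.
# prefer="threads" is an assumption made to keep this sketch lightweight.
out = Parallel(n_jobs=2, prefer="threads")(
    delayed(some_function)(param) for param in some_iterable
)

print(out)  # results are returned in the order of some_iterable
```

With these settings, ``out`` is a plain list holding one return value per element of ``some_iterable``, in the same order as the inputs.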