8000 [MRG+2] DOC add links to github sourcecode in API reference by jnothman · Pull Request #2777 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+2] DOC add links to github sourcecode in API reference #2777

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 21, 2014

Conversation

jnothman
Copy link
Member

I enjoy using the API reference, but think a link to the source code on github would often help.

Because sphinx.ext.linkcode does not provide the full path or line number of the class/function, I'm getting this stuff using grep, which relies on a unix shell and certain path structures, which is a bit not-nice. Since github needs a branch/commit reference, I'm using git rev-parse --abbrev-ref HEAD which can return a branch/tag name, but a more static commit reference (i.e. rev-parse --short) might be more sensible.

WDYT? ping @jaquesgrobler

@jaquesgrobler
Copy link
Member

I'm a fan of this.. There was some discussion on this idea a while back on #1680 but no consensus was reached..
I scanned through your code and looks fine .. Don't have time for an in detail look this second but will try later tonight or tomorrow.. that asside i'm +1 on this as a few people have requested this

👍

@jnothman
Copy link
Member Author

What do you think of making the compilation specific to a git checkout, and
more-or-less to a shell? I can obviously extract the grepped content
through equivalent Python (rather than shell) code, or by importing the
objects themselves (I think we already assume that the compiled Python
package is available at doc build time?) and using inspect. I don't know if
I can get the commit ref another nice way. Do you have a preference for how
this is done?

And I wouldn't mind if this was turned into a generic sphinx extension, but
don't think I'll have time to do so in the short term.

On 22 January 2014 03:17, Jaques Grobler notifications@github.com wrote:

I'm a fan of this.. There was some discussion on this idea a while back
on #1680 #1680 but
no consensus was reached..
I scanned through your code and looks good .. Don't have time for an in
detail look this second but will try later tonight or tomorrow.. that
asside i'm +1 on this

[image: 👍]


Reply to this email directly or view it on GitHubhttps://github.com//pull/2777#issuecomment-32900529
.

@jnothman
Copy link
Member Author

Advantages of using inspect over grep

  • not shell-specific (although calling system git rev-parse, or employing GitPython, is still needed)
  • can easily produce source links for methods as well as module-level functions
  • doesn't rely on relative path structures, instead using Python's path
    Disadvantages:
  • copy installed on path may not be in sync (but should be for the rest of doc compilation to work)
  • may be a litle slower than grep: not a single pass through files
  • may fail or get incorrect source for wrapped objects such as abstractmethods
  • requires an alternative solution (open and grep) for pyx objects (currently just sklearn.svm.libsvm.* of http://scikit-learn.org/stable/modules/classes.html#low-level-methods) that may require using relative path knowledge

@GaelVaroquaux
8000 Copy link
Member

I'd be much more favorable to using inspect rather than grep.

I'd also say that in a first time we could punt on the *.pyx.

@jnothman
Copy link
Member Author

I've updated with links for classes, functions and methods in Python (but not Cython), using inspect.

I've also changed from using rev-parse --abbrev-ref to rev-parse --short to give a fixed commit reference, rather than a name. Note that I still use the shell for this, but could change it to a file operation (find closest dir containing .git, read .git/HEAD, and dereference the ref from the refs dir), but I think it's safer to rely on the official tools.

1 similar comment
@jnothman
Copy link
Member Author

I've updated with links for classes, functions and methods in Python (but not Cython), using inspect.

I've also changed from using rev-parse --abbrev-ref to rev-parse --short to give a fixed commit reference, rather than a name. Note that I still use the shell for this, but could change it to a file operation (find closest dir containing .git, read .git/HEAD, and dereference the ref from the refs dir), but I think it's safer to rely on the official tools.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling cf91053 on jnothman:doc_linkcode into 8c3efd4 on scikit-learn:master.

@jnothman
Copy link
Member Author

I've now noticed scipy has similar, at https://github.com/scipy/scipy/blob/6a4460f68315f0669604054be91ceeacd606f0b6/doc/source/conf.py#L314. I need to look closer to see what cases they handle that we don't, etc. Since they explicitly version their dev builds with git revisions, they are able to use different commit references in the URL when for dev or release (has or tag name respectively).

@GaelVaroquaux
Copy link
Member

Hey Joel, I think I'll let you have z look at scipy before I review this PR.

@jnothman
Copy link
Member Author
jnothman commented Feb 1, 2014

Diffs:

  • scipy uses master or version tags. I'm using commit hashes
  • scipy handled __init__.py where I failed. Now fixed.

@jnothman
Copy link
Member Author

@GaelVaroquaux, I think we should pull this in, and see if there are any problems with the current commit refs.

@GaelVaroquaux
Copy link
Member

I'll have a look.

I however have a hard time feeling excited about this feature. For me it is additional code that will at some point fail and require debugging or evolution (all code does) for little feature.

I satisfy my need to look at source code by using "symbol??" in IPython.

Sorry, I should be thankful for the feature and it's not much code actually. It's just that I have the feeling that we are quite heavily loaded in terms of maintenance and I personally never have time to push features that I really need. So feel like focusing on the high-value code, where of course high value varies from one person to another. :)

return

try:
revision = linkcode_resolve.revision
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit dense, but where is this set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a previous 8000 version of the code, apparently :s

@jnothman
Copy link
Member Author

Sure, I understand that. I prefer symbol?? too... except when I'm
following a link from the narrative doc to the API reference, and then want
to clarify the code.

On 13 February 2014 08:54, Gael Varoquaux notifications@github.com wrote:

I'll have a look.

I however have a hard time feeling excited about this feature. For me it
is additional code that will at some point fail and require debugging or
evolution (all code does) for little feature.

I satisfy my need to look at source code by using "symbol??" in IPython.

Sorry, I should be thankful for the feature and it's not much code
actually. It's just that I have the feeling that we are quite heavily
loaded in terms of maintenance and I personally never have time to push
features that I really need. So feel like focusing on the high-value code,
where of course high value varies from one person to another. :)

Reply to this email directly or view it on GitHubhttps://github.com//pull/2777#issuecomment-34922636
.

'blob/{revision}/{pre_path}{path}#L{lineno}')


def linkcode_resolve(domain, info):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you should indicate (eg in a comment, or in the docstring) that this is used by the sphinx linkcode extension.

@GaelVaroquaux
Copy link
Member

Lightweight and elegant code.

I made a couple of comment. My only additional and non trivial comments would be:

  • Could you please move the corresponding code in another file in the sphinx_ext. It would make the conf.py easier to read, and all you need is to import the right symbol (linkcode_resolve).
  • Can you check whether it is possible to have nosetest run tests in the sphinx_ext folder (it might require a bit of work on the setup.cfg). I am getting more and more nervous about or sphinx_ext folder and the fact that it is untested.

Thanks a lot for your work, as usual.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 4b7973c on jnothman:doc_linkcode into b895586 on scikit-learn:master.

@jnothman
Copy link
Member Author

I've recently moved to a new computer and it seems I can't even run make html-noplot.

I don't think there'll be a problem testing sphinxext if we want to.

On 13 February 2014 09:16, Coveralls notifications@github.com wrote:

[image: Coverage Status] https://coveralls.io/builds/514068

Coverage remained the same when pulling 4b7973c
4b7973c
on jnothman:doc_linkcode
into b895586
b895586
on scikit-learn:master
.

Reply to this email directly or view it on GitHubhttps://github.com//pull/2777#issuecomment-34924867
.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 8b87b78 on jnothman:doc_linkcode into b895586 on scikit-learn:master.

@jnothman
Copy link
Member Author

The refactored version can now be more easily tested, e.g. using standard library or sphinx modules for independence:

>>> from github_link import _linkcode_resolve
>>> _linkcode_resolve('py', {'module': 'distutils', 'fullname': 'config.PyPIRCCommand.initialize_options'},
...     package='distutils',
...     url_fmt='http://hg.python.org/cpython/file/{revision}/Lib/{package}/{path}#L{lineno}',
...     revision='xxxx')
'http://hg.python.org/cpython/file/xxxx/Lib/distutils/config.py#L105'

@kastnerkyle
Copy link
Member

Getting an error building the docs with make html after rebasing this branch with master:

Exception occurred:
  File "/volatile/accounts/kkastner/src/scikit-learn/doc/sphinxext/github_link.py", line 31, in _linkcode_resolve
    class_name = info['fullname'].encode('utf-8').split('.')[0]
TypeError: Type str doesn't support the buffer API

I am also getting an error with nosetests, in the doctest part

ERROR: Failure: ImportError (No module named 'github_link')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/volatile/accounts/kkastner/anaconda3/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/volatile/accounts/kkastner/anaconda3/lib/python3.4/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/volatile/accounts/kkastner/anaconda3/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/volatile/accounts/kkastner/anaconda3/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/volatile/accounts/kkastner/anaconda3/lib/python3.4/imp.py", line 235, in load_module
    return load_source(name, filename, file)
  File "/volatile/accounts/kkastner/anaconda3/lib/python3.4/imp.py", line 171, in load_source
    module = methods.load()
  File "<frozen importlib._bootstrap>", line 1220, in load
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1471, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "/volatile/accounts/kkastner/src/scikit-learn/doc/conf.py", line 26, in <module>
    from github_link import make_linkcode_resolve
ImportError: No module named 'github_link'

I am using Python 3.4, other packages are from Anaconda 2.0 - looks like the first might be a Py3 strings issue, specifically buffer vs. memoryview? As for the second, @ogrisel mentioned a possible issue with local imports in python 3?

@kastnerkyle
Copy link
Member

In Python 2.7 I only get the github_link error - looks like the first is specific to Python 3.

Additionally, the docs that are built work fine locally (well, as good as master, which still has a few issues it seems). But, if you build from a git branch and click the link, the corresponding github text does not work. IE I get https://github.com/scikit-learn/scikit-learn/blob/03d48dd/sklearn/cluster/dbscan_.py#L178 . Changing 03d48dd to master, so the link becomes https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/dbscan_.py#L178 , everything works as expected.

I don't know if this is critical or not - a local doc with git already has the source directly, and on the website it should work as expected (I think). Thoughts?

After talking with @ogrisel, it seems this is happening because of the rebase with master - since rebase changed the commit hash the link doesn't show up correctly. It should work fine once merged, beyond the github_link thing, which is not happening on Travis apparently.

@jnothman
Copy link
Member Author

@fabianp I'll revert it and see what happens.

@jnothman
Copy link
Member Author

I can't understand the AppVeyor interface - it seems to be missing the consoles that I usually go to for explanation of the failure.

@ogrisel
Copy link
Member
ogrisel commented Jul 15, 2014

@ogrisel
Copy link
Member
ogrisel commented Jul 15, 2014

I will disable appveyor in the mean time.

@ogrisel
Copy link
Member
ogrisel commented Jul 17, 2014

@fabianp I'll revert it and see what happens.

@jnothman what do you want to revert? Is this PR ready for merge or not?

@jnothman
Copy link
Member Author

I've done with revert, forcing a Travis check by rebasing.

@jnothman
Copy link
Member Author

Failing with

======================================================================
ERROR: Failure: ImportError (No module named github_link)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/nose/loader.py", line 411, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/travis/build/scikit-learn/scikit-learn/doc/conf.py", line 26, in <module>
    from github_link import make_linkcode_resolve
ImportError: No module named github_link

from functools import partial


def _get_git_revision(CMD='git rev-parse --short HEAD'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not crazy about this function definition. I can't think of any context where you'd want to call this function with an argument, and argument names shouldn't be all caps.

I'd make CMD=... a variable defined within the function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goes to current head, so everything is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just commenting about the style.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes default args are used as a not-quite-global way to store constants (which was more relevant when github_link wasn't its own module). However, I may have also done this because one may choose to use --abrev-ref (which shows the branch name) instead of --short (which shows the SHA). I'll change it to a global constant.

@jnothman
Copy link
Member Author

Ubuntu 14.04 LTS, Python3.4, NumPy 1.8.1: I get the same error, @kastnerkyle , on this branch, but not on master

@jnothman
Copy link
Member Author

Found and fixed, @kastnerkyle.

10000
@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 79afc76 on jnothman:doc_linkcode into 0d57c23 on scikit-learn:master.

@kastnerkyle
Copy link
Member

It builds fine for me now - the only thing I still see is

Warning: Embedding documentation hyperlinks requires internet access

because I am offline right now on my docbuild machine.

This seems fine to me but just wanted to report it. Other than that, this is 👍 for me

@jnothman
Copy link
Member Author

That's actually a different matter. It's referring to links to numpy and
scipy documentation.

On 21 July 2014 21:21, Kyle Kastner notifications@github.com wrote:

It builds fine for me now - the only thing I still see is

Warning: Embedding documentation hyperlinks requires internet access

because I am offline.

This seems fine to me but just wanted to report it. Other than that, this
is [image: 👍] for me


Reply to this email directly or view it on GitHub
#2777 (comment)
.

@kastnerkyle
Copy link
Member

OK great. I am usually online all the time - my connection bumped sometime during the doc build I guess. This seems like a really nice feature for usability and browsing!

@jnothman jnothman changed the title [MRG+1] DOC add links to github sourcecode in API reference [MRG+2] DOC add links to github sourcecode in API reference Jul 21, 2014
@ogrisel
Copy link
Member
ogrisel commented Jul 21, 2014

I am building the doc from this PR on my box as well. Will let you know when it's done.

@ogrisel
Copy link
Member
ogrisel commented Jul 21, 2014

Under Python 3.4 I get URL such as:

https://github.com/scikit-learn/scikit-learn/blob/b%2779afc76%27/sklearn/ensemble/forest.py#L1116

The git hash should be decode with the .decode('ascii') method prior to building the URL.

@kastnerkyle
Copy link
Member

Good catch - I didn't notice cause I have no internet :)

On Mon, Jul 21, 2014 at 2:01 PM, Olivier Grisel notifications@github.com
wrote:

Under Python 3.4 I get URL such as:

https://github.com/scikit-learn/scikit-learn/blob/b%2779afc76%27/sklearn/ensemble/forest.py#L1116

The git hash should be decode with the .decode('ascii') method prior to
building the URL.


Reply to this email directly or view it on GitHub
#2777 (comment)
.

@ogrisel
Copy link
Member
ogrisel commented Jul 21, 2014

If I fix the URL manually to remobe the leading b' and the trailing ' in my browser, it I get the right page / line in my browser.

'blob/{revision}/{package}/'
'{path}#L{lineno}')
"""
revision = _get_git_revision()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revision = _get_git_revision().decode('ascii') for instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's probably better to fold the decode call into the _get_git_revision function itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ogrisel

@jnothman
Copy link
Member Author

I've squashed the commit history.

@ogrisel
Copy link
Member
ogrisel commented Jul 21, 2014

Before your pushed 060c248, I launched a full make clean html under Python 3 with the following change:

diff --git a/doc/sphinxext/github_link.py b/doc/sphinxext/github_link.py
index 7e606a6..4f28477 100644
--- a/doc/sphinxext/github_link.py
+++ b/doc/sphinxext/github_link.py
@@ -14,7 +14,7 @@ def _get_git_revision():
     except subprocess.CalledProcessError:
         print('Failed to execute git to get revision')
         return None
-    return revision
+    return revision.decode('ascii')

and I confirm that it works as expected. I have not checked with Python 2 though.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 060c248 on jnothman:doc_linkcode into e79d2bb on scikit-learn:master.

@jnothman
Copy link
Member Author

I'm now generating the URL as a str in both versions; it works in both. If you'd rather an explicit unicode string in Py2, I'll try that now.

@jnothman
Copy link
Member Author

Changed to produce a unicode string.

@ogrisel
Copy link
Member
ogrisel commented Jul 21, 2014
  • bytes is the type for byte strings in Python 3
  • str is the type for byte strings in Python 2 and the type for unicode strings in Python 3
  • unicode is the type for unicode string in Python 2

So I think we should always decode the bytes to generate URL that is a unicode string, that is str in Python 3 and unicode in Python 2.

@ogrisel
Copy link
Member
ogrisel commented Jul 21, 2014

Thanks for the fix. merging!

ogrisel added a commit that referenced this pull request Jul 21, 2014
[MRG+2] DOC add links to github sourcecode in API reference
@ogrisel ogrisel merged commit 7a7fca8 into scikit-learn:master Jul 21, 2014
@jnothman
Copy link
Member Author

Great! I look forward to using this when introducing people to particular scikit-learn components.

@jnothman jnothman deleted the doc_linkcode branch July 21, 2014 21:18
@arjoly
Copy link
Member
arjoly commented Jul 22, 2014

Does it point on master or a particular branch?

@jnothman
Copy link
Member Author

A commit ref, to ensure line number match. It's live, e.g.
http://scikit-learn.org/dev/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator

On 22 July 2014 17:45, Arnaud Joly notifications@github.com wrote:

Does it point on master or a particular branch?


Reply to this email directly or view it on GitHub
#2777 (comment)
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants
0