8000 DOC Reorganize general installation instructions by cmarmo · Pull Request #15011 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

DOC Reorganize general installation instructions #15011

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Oct 4, 2019

Conversation

cmarmo
Copy link
Contributor
@cmarmo cmarmo commented Sep 18, 2019

Reference Issues/PRs

See also this discussion.

What does this implement/fix? Explain your changes.

This PR

  • update installation instructions from distributions
  • merge installation.rst and other_distributions.rst
  • add python3 virtualenv references in pip standard installation

Any other comments?

  • python virtualenv is unavoidable in practice because depending on the system pip --user or pip3 --user give different interactions with installed dependencies (tried on Fedora and Debian).
  • it is not clear to me if OpenMP needs to be added to the requirements or not (in latest release instructions), even if I will probably say yes ... could you please precise? Thanks!

Copy link
Member
@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few first comments. I'll take a deeper look tomorrow

@ogrisel
Copy link
Member
ogrisel commented Sep 19, 2019

it is not clear to me if OpenMP needs to be added to the requirements or not (in latest release instructions), even if I will probably say yes ... could you please precise?

TL;DNR: this is only necessary in the section on "building scikit-learn from source"

The necessary OpenMP runtimes are automatically installed when installing a binary release of scikit-learn (wheels, conda packages or Linux distribution packages), so no need to mention anything in this case.

When building scikit-learn from source, it's necessary to use a compiler configured with the proper OpenMP runtime. gcc/g++ always come with their libgomp runtime on linux so nothing special is required. This is also the case if gcc/g++ are installed with homebrew on macOS.
On Windows, the Microsoft compilers also come with their own OpenMP implementation by default.

The only tricky case is the clang compiler shipped by default on macOS. On this system it is necessary to manually install llvm-openmp (also known as libomp) either via homebrow or (condat tricky to do) and configure additional environment variable to make clang use it when building scikit-learn.

Copy link
Member
@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been discussed many times before whether we really want to document how to create a virtual env... We always decided against it. There are many different ways to create a virtual env (I personally use virtualenvwrapper, other will use something else). The problem is that documenting this properly requires a lot of work, not just giving an example of how it may be done.

I'd rather say something like "in practice, you most likely want to use a virtual env" and link to an external resource that we trust.


(Writing this here because github doesn't let me comment on the code since it wasn't changed)

In the Installing the latest release section, the first part about "scikit-learn requires" could be removed since these dependencies are now installed directly.

Also I would move the note about matplotlib and the warning at the end of the section. The installation instruction should be a the very top of the section

@cmarmo
Copy link
Contributor Author
cmarmo commented Oct 1, 2019

It's been discussed many times before whether we really want to document how to create a virtual env... We always decided against it. There are many different ways to create a virtual env (I personally use virtualenvwrapper, other will use something else). The problem is that documenting this properl 8000 y requires a lot of work, not just giving an example of how it may be done.

But without virtual env the user needs to improvise: I have installed scikit-learn from scratch , not beeing a scikit-learn developer (ie without previous isolated environment) using the spacy documentation https://spacy.io/usage ... So I tried to reproduce it.
I can try to reproduce the css scheme too but maybe this is for another PR...

In the Installing the latest release section, the first part about "scikit-learn requires" could be removed since these dependencies are now installed directly.

Also I would move the note about matplotlib and the warning at the end of the section. The installation instruction should be a the very top of the section

Fixed in fcb1341

@NicolasHug
Copy link
Member

what do you mean improvise?

@cmarmo
Copy link
Contributor Author
cmarmo commented Oct 2, 2019

what do you mean improvise?

Sorry, my English is far from perfect... :(
I mean that, as a beginner, I've found the scikit-learn documentation really amazing with respect to the machine learning stuff, but not really useful about the installation steps... so I've looked around on other packages instructions and made something up: the only way the installation went through without errors (executables not found, python2 incompatibility, no admin rights,...) was isolating the install with virtual env, even for the non-development version (note that I'm on Linux).
Hope that I made myself clearer...

@ogrisel
Copy link
Member
ogrisel commented Oct 2, 2019

Indeed it's true that on Linux, it's highly recommended never to install anything with pip on the system Python and for beginner Python/scikit-learn users under Linux it can be confusing to get started, as just typing "pip install scikit-learn" will first lead to "pip command not found error" then if the user chooses to use the system "pip3".

However if the system version of pip3 is recent enough (>=9.1? which is the version on Ubuntu 18.04) pip will automatically install the home folder "$HOME/.local/lib/python3.6/site-packages". So while not perfect at least it's not incentivizing the user to use sudo to install anything in the system site-packages anymore.

On Windows and macOS (~88% or people browsing the documentation), creating a venv probably not as necessary as on Linux because there is no conflicting package manager for Python packages: Windows has no package so it's natural to use pip once Python is installed. Both the python.org installer and the anaconda installer will install Python in a folder that does not require administrative permissions by default. On macOS, homebrew only installs python and pip and then the user uses pip to install python packages).

I think vast majority of scikit-learn developers would rather use either a conda environment or a virtualenv (created python -m venv nowadays) so we should at least link to instructions on how to use environment isolation either with virtual environments or conda environments.

@ogrisel
Copy link
Member
ogrisel commented Oct 2, 2019

+1 for setting something similar to https://spacy.io/usage .

@NicolasHug
Copy link
Member
NicolasHug commented Oct 2, 2019

I'd be happy with an interactive selector like in spacy, as long as we advertize conda, venv and virtualenvwrapper (+ any other standard that I don't know about?), and link to something that explain why venvs are needed

@cmarmo
Copy link
Contributor Author
cmarmo commented Oct 2, 2019

I'd be happy with an interactive selector like in spacy, as long as we advertize conda, venv and virtualenvwrapper (+ any other standard that I don't know about?), and link to something that explain why venvs are needed

Ok, I'm in. May I just ask to merge this one before? The reorganization of this page was meant to solve also the sphinx warning about other_distributions not inserted in any toctree (see #15002 (comment)).

Copy link
Member
@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a few comments but mostly LGTM

@cmarmo
Copy link
Contributor Author
cmarmo commented Oct 3, 2019

@ogrisel , @NicolasHug command lines about python3 venv have been removed, this reduces changes with respect to the on-line version, waiting for a complete "restyling" of the paragraph.

Copy link
Member
@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @cmarmo !

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A final suggestion and then LGTM!

@ogrisel ogrisel merged commit 3046990 into scikit-learn:master Oct 4, 2019
@ogrisel
Copy link
Member
ogrisel commented Oct 4, 2019

Merged! Thanks again @cmarmo!

@cmarmo cmarmo deleted the install_instructions branch October 7, 2019 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0