RFC On the relative harm of cosmetic changes #11336

jnothman · 2018-06-21T12:24:07Z

I can't remember where, but there was a recent PR offering cosmetic (e.g. PEP8) fixes. We usually reject such PRs because they cause merge conflicts with existing open PRs. @amueller wanted to know exactly how many PRs would be affected by each change.

If we knew that a cosmetic change would affect few or no open/active PRs, we might be keen to make it, e.g. to fix flake8 issues or to modernise tests (pytest style).

Thus, using https://gist.github.com/jnothman/41a5e05c82c4508afa7bee3b493752dd, I have determined, for each open PR, which lines it modifies (including lines adjacent to modified lines).

https://www.dropbox.com/s/jsf07i40npq5o5g/lines-modified-by-open-prs.zip?dl=0 contains the results in two files: one for PRs that can currently be merged into master, and one for PRs that cannot. Each file has three columns: PR#, file name, line#.
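
For anyone who would rather poke at the results in Python than in the shell, here is a minimal sketch of how the three-column rows could be grouped by file. It assumes the columns are tab-separated (the default delimiter for `cut`) and treats the PR number as an opaque string. The function name `prs_by_file` and the sample rows are made up for illustration; they are not taken from the gist or from the real results.

```python
from collections import defaultdict


def prs_by_file(rows):
    """Map each file name to the set of PRs that touch it.

    `rows` is any iterable of text lines with three tab-separated
    fields: PR number, file name, line number (assumed layout).
    """
    touched = defaultdict(set)
    for row in rows:
        row = row.rstrip("\n")
        if not row:
            continue  # skip blank lines
        pr, filename, _lineno = row.split("\t")
        touched[filename].add(pr)
    return dict(touched)


# Hypothetical rows, invented for illustration only.
sample = [
    "101\tsklearn/foo.py\t10",
    "101\tsklearn/foo.py\t11",
    "102\tsklearn/foo.py\t40",
    "102\tdoc/bar.rst\t3",
]

# Number of distinct PRs touching each file.
counts = {name: len(prs) for name, prs in prs_by_file(sample).items()}
print(counts)
```

With the real data, `rows` could be the lines of an opened results file; counting distinct PRs per file in this way gives a rough measure of how risky a cosmetic change to that file would be.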

Looking at just the file names, the following files in master are never modified by open PRs:

diff <(git ls-tree -r master --name-only | sort) <(cat lines-modified-merge* | cut -f2 | sort -u) | grep '<'

.coveragerc
.landscape.yml
AUTHORS.rst
MANIFEST.in
benchmarks/bench_20newsgroups.py
benchmarks/bench_glm.py
benchmarks/bench_glmnet.py
benchmarks/bench_isotonic.py
benchmarks/bench_multilabel_metrics.py
benchmarks/bench_plot_neighbors.py
benchmarks/bench_plot_parallel_pairwise.py
benchmarks/bench_random_projections.py
benchmarks/bench_tree.py
build_tools/appveyor/install.ps1
build_tools/appveyor/run_with_env.cmd
doc/images/cds-logo.png
doc/images/dysco.png
doc/images/inria-logo.jpg
doc/images/iris.pdf
doc/images/iris.svg
doc/images/last_digit.png
doc/images/lda_model_graph.png
doc/images/ml_map.png
doc/images/multilayerperceptron_network.png
doc/images/no_image.png
doc/images/nyu_short_color.png
doc/images/plot_digits_classification.png
doc/images/plot_face_recognition_1.png
doc/images/plot_face_recognition_2.png
doc/images/rbm_graph.png
doc/images/scikit-learn-logo-notext.png
doc/images/sloan_banner.png
doc/includes/big_toc_css.rst
doc/includes/bigger_toc_css.rst
doc/logos/favicon.ico
doc/logos/identity.pdf
doc/logos/scikit-learn-logo-notext.png
doc/logos/scikit-learn-logo-small.png
doc/logos/scikit-learn-logo-thumb.png
doc/logos/scikit-learn-logo.bmp
doc/logos/scikit-learn-logo.png
doc/logos/scikit-learn-logo.svg
doc/model_selection.rst
doc/modules/glm_data/lasso_enet_coordinate_descent.png
doc/modules/isotonic.rst
doc/preface.rst
doc/sphinxext/MANIFEST.in
doc/sphinxext/github_link.py
doc/templates/class.rst
doc/templates/class_with_call.rst
doc/templates/class_without_init.rst
doc/templates/deprecated_class.rst
doc/templates/deprecated_class_with_call.rst
doc/templates/deprecated_class_without_init.rst
doc/templates/deprecated_function.rst
doc/templates/function.rst
doc/templates/generate_deprecated.sh
doc/testimonials/README.txt
doc/testimonials/images/Makefile
doc/testimonials/images/aweber.png
doc/testimonials/images/bestofmedia-logo.png
doc/testimonials/images/betaworks.png
doc/testimonials/images/birchbox.jpg
doc/testimonials/images/booking.png
doc/testimonials/images/change-logo.png
doc/testimonials/images/dataiku_logo.png
doc/testimonials/images/datapublica.png
doc/testimonials/images/datarobot.png
doc/testimonials/images/evernote.png
doc/testimonials/images/howaboutwe.png
doc/testimonials/images/huggingface.png
doc/testimonials/images/infonea.jpg
doc/testimonials/images/inria.png
doc/testimonials/images/lovely.png
doc/testimonials/images/machinalis.png
doc/testimonials/images/okcupid.png
doc/testimonials/images/ottogroup_logo.png
doc/testimonials/images/peerindex.png
doc/testimonials/images/phimeca.png
doc/testimonials/images/rangespan.png
doc/testimonials/images/solido_logo.png
doc/testimonials/images/spotify.png
doc/testimonials/images/telecomparistech.jpg
doc/testimonials/images/yhat.png
doc/testimonials/images/zopa.png
doc/themes/scikit-learn/static/css/bootstrap-responsive.css
doc/themes/scikit-learn/static/css/bootstrap-responsive.min.css
doc/themes/scikit-learn/static/css/bootstrap.css
doc/themes/scikit-learn/static/css/bootstrap.min.css
doc/themes/scikit-learn/static/css/examples.css
doc/themes/scikit-learn/static/img/FNRS-logo.png
doc/themes/scikit-learn/static/img/columbia.png
doc/themes/scikit-learn/static/img/forkme.png
doc/themes/scikit-learn/static/img/glyphicons-halflings-white.png
doc/themes/scikit-learn/static/img/glyphicons-halflings.png
doc/themes/scikit-learn/static/img/google.png
doc/themes/scikit-learn/static/img/inria-small.jpg
doc/themes/scikit-learn/static/img/inria-small.png
doc/themes/scikit-learn/static/img/nyu_short_color.png
doc/themes/scikit-learn/static/img/plot_classifier_comparison_1.png
doc/themes/scikit-learn/static/img/plot_manifold_sphere_1.png
doc/themes/scikit-learn/static/img/scikit-learn-logo-notext.png
doc/themes/scikit-learn/static/img/scikit-learn-logo-small.png
doc/themes/scikit-learn/static/img/scikit-learn-logo.png
doc/themes/scikit-learn/static/img/scikit-learn-logo.svg
doc/themes/scikit-learn/static/img/sloan_logo.jpg
doc/themes/scikit-learn/static/img/sydney-primary.jpeg
doc/themes/scikit-learn/static/img/sydney-stacked.jpeg
doc/themes/scikit-learn/static/img/telecom.png
doc/themes/scikit-learn/static/jquery.maphilight.js
doc/themes/scikit-learn/static/jquery.maphilight.min.js
doc/themes/scikit-learn/static/js/bootstrap.js
doc/themes/scikit-learn/static/js/bootstrap.min.js
doc/themes/scikit-learn/static/js/copybutton.js
doc/themes/scikit-learn/theme.conf
doc/tune_toc.rst
doc/tutorial/common_includes/info.txt
doc/tutorial/statistical_inference/finding_help.rst
doc/tutorial/statistical_inference/index.rst
doc/tutorial/text_analytics/.gitignore
doc/tutorial/text_analytics/data/movie_reviews/fetch_data.py
doc/tutorial/text_analytics/data/twenty_newsgroups/fetch_data.py
doc/tutorial/text_analytics/solutions/exercise_02_sentiment.py
doc/tutorial/text_analytics/solutions/generate_skeletons.py
doc/unsupervised_learning.rst
examples/applications/README.txt
examples/applications/svm_gui.py
examples/bicluster/README.txt
examples/bicluster/plot_spectral_biclustering.py
examples/bicluster/plot_spectral_coclustering.py
examples/classification/README.txt
examples/classification/plot_classification_probability.py
examples/classification/plot_lda.py
examples/cluster/README.txt
examples/covariance/README.txt
examples/covariance/plot_lw_vs_oas.py
examples/cross_decomposition/README.txt
examples/datasets/README.txt
examples/datasets/plot_random_multilabel_dataset.py
examples/decomposition/README.txt
examples/decomposition/plot_beta_divergence.py
examples/decomposition/plot_incremental_pca.py
examples/decomposition/plot_pca_vs_lda.py
examples/ensemble/README.txt
examples/ensemble/plot_forest_importances.py
examples/ensemble/plot_forest_importances_faces.py
examples/ensemble/plot_gradient_boosting_oob.py
examples/ensemble/plot_gradient_boosting_quantile.py
examples/exercises/README.txt
examples/exercises/plot_cv_digits.py
examples/feature_selection/README.txt
examples/feature_selection/plot_rfe_digits.py
examples/feature_selection/plot_rfe_with_cross_validation.py
examples/feature_selection/plot_select_from_model_boston.py
examples/gaussian_process/README.txt
examples/gaussian_process/plot_compare_gpr_krr.py
examples/gaussian_process/plot_gpr_co2.py
examples/linear_model/README.txt
examples/linear_model/plot_huber_vs_ridge.py
examples/linear_model/plot_iris_logistic.py
examples/linear_model/plot_logistic.py
examples/linear_model/plot_ols_ridge_variance.py
examples/linear_model/plot_omp.py
examples/linear_model/plot_polynomial_interpolation.py
examples/linear_model/plot_ridge_coeffs.py
examples/linear_model/plot_robust_fit.py
examples/manifold/README.txt
examples/manifold/plot_mds.py
examples/mixture/README.txt
examples/mixture/plot_gmm.py
examples/mixture/plot_gmm_covariances.py
examples/mixture/plot_gmm_pdf.py
examples/mixture/plot_gmm_selection.py
examples/mixture/plot_gmm_sin.py
examples/model_selection/README.txt
examples/neighbors/README.txt
examples/neural_networks/README.txt
examples/neural_networks/plot_mnist_filters.py
examples/preprocessing/README.txt
examples/preprocessing/plot_function_transformer.py
examples/semi_supervised/README.txt
examples/svm/README.txt
examples/text/README.txt
examples/tree/README.txt
site.cfg
sklearn/__check_build/_check_build.pyx
sklearn/__check_build/setup.py
sklearn/cluster/tests/__init__.py
sklearn/cluster/tests/common.py
sklearn/compose/tests/__init__.py
sklearn/covariance/tests/__init__.py
sklearn/cross_decomposition/tests/__init__.py
sklearn/datasets/data/diabetes_data.csv.gz
sklearn/datasets/data/diabetes_target.csv.gz
sklearn/datasets/data/digits.csv.gz
sklearn/datasets/data/linnerud_exercise.csv
sklearn/datasets/data/linnerud_physiological.csv
sklearn/datasets/images/README.txt
sklearn/datasets/images/china.jpg
sklearn/datasets/images/flower.jpg
sklearn/datasets/tests/__init__.py
sklearn/datasets/tests/data/svmlight_classification.txt
sklearn/datasets/tests/data/svmlight_invalid.txt
sklearn/datasets/tests/data/svmlight_invalid_order.txt
sklearn/datasets/tests/data/svmlight_multilabel.txt
sklearn/decomposition/_online_lda.pyx
sklearn/decomposition/tests/__init__.py
sklearn/ensemble/setup.py
sklearn/ensemble/tests/__init__.py
sklearn/externals/__init__.py
sklearn/externals/joblib/_multiprocessing_helpers.py
sklearn/externals/six.py
sklearn/feature_extraction/__init__.py
sklearn/feature_extraction/tests/__init__.py
sklearn/feature_selection/tests/__init__.py
sklearn/feature_selection/tests/test_variance_threshold.py
sklearn/feature_selection/variance_threshold.py
sklearn/gaussian_process/tests/__init__.py
sklearn/linear_model/tests/__init__.py
sklearn/manifold/tests/__init__.py
sklearn/metrics/cluster/tests/__init__.py
sklearn/metrics/pairwise_fast.pyx
sklearn/metrics/tests/__init__.py
sklearn/mixture/tests/__init__.py
sklearn/model_selection/tests/__init__.py
sklearn/model_selection/tests/common.py
sklearn/neighbors/tests/__init__.py
sklearn/neural_network/_stochastic_optimizers.py
sklearn/neural_network/tests/__init__.py
sklearn/neural_network/tests/test_stochastic_optimizers.py
sklearn/preprocessing/tests/__init__.py
sklearn/semi_supervised/tests/__init__.py
sklearn/src/cblas/ATL_drefasum.c
sklearn/src/cblas/ATL_drefcopy.c
sklearn/src/cblas/ATL_drefgemv.c
sklearn/src/cblas/ATL_drefgemvN.c
sklearn/src/cblas/ATL_drefgemvT.c
sklearn/src/cblas/ATL_drefger.c
sklearn/src/cblas/ATL_drefrot.c
sklearn/src/cblas/ATL_drefrotg.c
sklearn/src/cblas/ATL_dsrefdot.c
sklearn/src/cblas/ATL_srefasum.c
sklearn/src/cblas/ATL_srefcopy.c
sklearn/src/cblas/ATL_srefgemv.c
sklearn/src/cblas/ATL_srefgemvN.c
sklearn/src/cblas/ATL_srefgemvT.c
sklearn/src/cblas/ATL_srefger.c
sklearn/src/cblas/ATL_srefnrm2.c
sklearn/src/cblas/ATL_srefrot.c
sklearn/src/cblas/ATL_srefrotg.c
sklearn/src/cblas/README.txt
sklearn/src/cblas/atlas_aux.h
sklearn/src/cblas/atlas_dsysinfo.h
sklearn/src/cblas/atlas_enum.h
sklearn/src/cblas/atlas_level1.h
sklearn/src/cblas/atlas_level2.h
sklearn/src/cblas/atlas_ptalias1.h
sklearn/src/cblas/atlas_ptalias2.h
sklearn/src/cblas/atlas_refalias1.h
sklearn/src/cblas/atlas_refalias2.h
sklearn/src/cblas/atlas_reflevel2.h
sklearn/src/cblas/atlas_reflvl2.h
sklearn/src/cblas/atlas_refmisc.h
sklearn/src/cblas/atlas_ssysinfo.h
sklearn/src/cblas/atlas_type.h
sklearn/src/cblas/cblas_dasum.c
sklearn/src/cblas/cblas_daxpy.c
sklearn/src/cblas/cblas_ddot.c
sklearn/src/cblas/cblas_dgemv.c
sklearn/src/cblas/cblas_dger.c
sklearn/src/cblas/cblas_dnrm2.c
sklearn/src/cblas/cblas_drot.c
sklearn/src/cblas/cblas_drotg.c
sklearn/src/cblas/cblas_dscal.c
sklearn/src/cblas/cblas_errprn.c
sklearn/src/cblas/cblas_sasum.c
sklearn/src/cblas/cblas_saxpy.c
sklearn/src/cblas/cblas_sdot.c
sklearn/src/cblas/cblas_sgemv.c
sklearn/src/cblas/cblas_sger.c
sklearn/src/cblas/cblas_snrm2.c
sklearn/src/cblas/cblas_srot.c
sklearn/src/cblas/cblas_srotg.c
sklearn/src/cblas/cblas_sscal.c
sklearn/svm/src/liblinear/COPYRIGHT
sklearn/svm/src/liblinear/tron.cpp
sklearn/svm/src/liblinear/tron.h
sklearn/svm/src/libsvm/LIBSVM_CHANGES
sklearn/svm/src/libsvm/libsvm_template.cpp
sklearn/svm/tests/__init__.py
sklearn/tests/__init__.py
sklearn/tests/test_check_build.py
sklearn/tree/tests/__init__.py
sklearn/utils/_logistic_sigmoid.pyx
sklearn/utils/_scipy_sparse_lsqr_backport.py
sklearn/utils/arrayfuncs.pyx
sklearn/utils/fast_dict.pyx
sklearn/utils/lgamma.pxd
sklearn/utils/lgamma.pyx
sklearn/utils/murmurhash.pxd
sklearn/utils/murmurhash.pyx
sklearn/utils/optimize.py
sklearn/utils/sparsetools/tests/__init__.py
sklearn/utils/src/MurmurHash3.cpp
sklearn/utils/src/MurmurHash3.h
sklearn/utils/src/cholesky_delete.h
sklearn/utils/src/gamma.c
sklearn/utils/src/gamma.h
sklearn/utils/tests/__init__.py
sklearn/utils/tests/test_bench.py
sklearn/utils/tests/test_fast_dict.py
sklearn/utils/tests/test_linear_assignment.py
sklearn/utils/tests/test_murmurhash.py
sklearn/utils/tests/test_optimize.py
sklearn/utils/tests/test_shortest_path.py

with the following flake8 errors:

benchmarks/bench_plot_neighbors.py:39:5: E265 block comment should start with '# '
benchmarks/bench_plot_neighbors.py:62:5: E265 block comment should start with '# '
benchmarks/bench_plot_neighbors.py:85:5: E265 block comment should start with '# '
benchmarks/bench_plot_parallel_pairwise.py:11:1: E302 expected 2 blank lines, found 1
benchmarks/bench_random_projections.py:77:9: E128 continuation line under-indented for visual indent
benchmarks/bench_random_projections.py:85:28: E128 continuation line under-indented for visual indent
benchmarks/bench_random_projections.py:86:28: E128 continuation line under-indented for visual indent
benchmarks/bench_random_projections.py:96:80: E501 line too long (80 > 79 characters)
benchmarks/bench_random_projections.py:218:15: E128 continuation line under-indented for visual indent
examples/applications/svm_gui.py:24:1: E402 module level import not at top of file
examples/applications/svm_gui.py:27:1: E402 module level import not at top of file
examples/applications/svm_gui.py:28:1: E402 module level import not at top of file
examples/applications/svm_gui.py:29:1: E402 module level import not at top of file
examples/applications/svm_gui.py:30:1: E402 module level import not at top of file
examples/applications/svm_gui.py:38:1: E402 module level import not at top of file
examples/applications/svm_gui.py:39:1: E402 module level import not at top of file
examples/applications/svm_gui.py:41:1: E402 module level import not at top of file
examples/applications/svm_gui.py:42:1: E402 module level import not at top of file
examples/applications/svm_gui.py:43:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:23:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:24:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:26:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:27:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:28:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:29:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:22:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:23:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:25:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:26:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:27:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:28:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:19:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:20:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:22:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:23:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:24:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:25:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:26:1: E402 module level import not at top of file
examples/classification/plot_lda.py:48:80: E501 line too long (84 > 79 characters)
examples/classification/plot_lda.py:49:80: E501 line too long (82 > 79 characters)
examples/covariance/plot_lw_vs_oas.py:25:1: E402 module level import not at top of file
examples/covariance/plot_lw_vs_oas.py:26:1: E402 module level import not at top of file
examples/covariance/plot_lw_vs_oas.py:27:1: E402 module level import not at top of file
examples/covariance/plot_lw_vs_oas.py:29:1: E402 module level import not at top of file
examples/decomposition/plot_incremental_pca.py:26:1: E402 module level import not at top of file
examples/decomposition/plot_incremental_pca.py:27:1: E402 module level import not at top of file
examples/decomposition/plot_incremental_pca.py:29:1: E402 module level import not at top of file
examples/decomposition/plot_incremental_pca.py:30:1: E402 module level import not at top of file
examples/decomposition/plot_pca_vs_lda.py:21:1: E402 module level import not at top of file
examples/decomposition/plot_pca_vs_lda.py:23:1: E402 module level import not at top of file
examples/decomposition/plot_pca_vs_lda.py:24:1: E402 module level import not at top of file
examples/decomposition/plot_pca_vs_lda.py:25:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:15:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:16:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:18:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:19:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:51:8: E128 continuation line under-indented for visual indent
examples/ensemble/plot_forest_importances_faces.py:15:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances_faces.py:16:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances_faces.py:18:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances_faces.py:19:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:32:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:33:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:35:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:36:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:37:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_quantile.py:22:1: E265 block comment should start with '# '
examples/exercises/plot_cv_digits.py:14:1: E402 module level import not at top of file
examples/exercises/plot_cv_digits.py:15:1: E402 module level import not at top of file
examples/exercises/plot_cv_digits.py:16:1: E402 module level import not at top of file
examples/exercises/plot_cv_digits.py:34:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_digits.py:11:80: E501 line too long (94 > 79 characters)
examples/feature_selection/plot_rfe_digits.py:16:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_digits.py:17:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_digits.py:18:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_digits.py:19:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:11:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:12:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:13:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:14:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:15:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:14:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:15:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:17:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:18:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:19:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:25:80: E501 line too long (84 > 79 characters)
examples/feature_selection/plot_select_from_model_boston.py:46:29: W291 trailing whitespace
examples/gaussian_process/plot_compare_gpr_krr.py:53:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:55:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:57:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:59:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:60:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:61:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:62:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:113:80: E501 line too long (80 > 79 characters)
examples/gaussian_process/plot_gpr_co2.py:66:1: E402 module level import not at top of file
examples/gaussian_process/plot_gpr_co2.py:68:1: E402 module level import not at top of file
examples/gaussian_process/plot_gpr_co2.py:70:1: E402 module level import not at top of file
examples/gaussian_process/plot_gpr_co2.py:71:1: E402 module level import not at top of file
examples/gaussian_process/plot_gpr_co2.py:73:1: E402 module level import not at top of file
examples/linear_model/plot_huber_vs_ridge.py:20:1: E402 module level import not at top of file
examples/linear_model/plot_huber_vs_ridge.py:21:1: E402 module level import not at top of file
examples/linear_model/plot_huber_vs_ridge.py:23:1: E402 module level import not at top of file
examples/linear_model/plot_huber_vs_ridge.py:24:1: E402 module level import not at top of file
examples/linear_model/plot_iris_logistic.py:21:1: E402 module level import not at top of file
examples/linear_model/plot_iris_logistic.py:22:1: E402 module level import not at top of file
examples/linear_model/plot_iris_logistic.py:23:1: E402 module level import not at top of file
examples/linear_model/plot_logistic.py:21:1: E402 module level import not at top of file
examples/linear_model/plot_logistic.py:22:1: E402 module level import not at top of file
examples/linear_model/plot_logistic.py:24:1: E402 module level import not at top of file
examples/linear_model/plot_ols_ridge_variance.py:31:1: E402 module level import not at top of file
examples/linear_model/plot_ols_ridge_variance.py:32:1: E402 module level import not at top of file
examples/linear_model/plot_ols_ridge_variance.py:34:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:11:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:12:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:13:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:14:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:15:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:30:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:31:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:33:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:34:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:35:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:44:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:45:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:47:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:48:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:49:1: E402 module level import not at top of file
examples/linear_model/plot_robust_fit.py:69:80: E501 line too long (101 > 79 characters)
examples/linear_model/plot_robust_fit.py:70:80: E501 line too long (83 > 79 characters)
examples/manifold/plot_mds.py:16:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:18:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:19:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:21:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:22:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:23:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:68:80: E501 line too long (80 > 79 characters)
examples/neural_networks/plot_mnist_filters.py:25:1: E402 module level import not at top of file
examples/neural_networks/plot_mnist_filters.py:26:1: E402 module level import not at top of file
examples/neural_networks/plot_mnist_filters.py:27:1: E402 module level import not at top of file
sklearn/cluster/tests/common.py:22:22: E124 closing bracket does not match visual indentation
sklearn/externals/six.py:12:80: E501 line too long (80 > 79 characters)
sklearn/externals/six.py:44:20: F821 undefined name 'basestring'
sklearn/externals/six.py:45:27: F821 undefined name 'long'
sklearn/externals/six.py:47:17: F821 undefined name 'unicode'
sklearn/externals/six.py:134:1: E303 too many blank lines (3)
sklearn/externals/six.py:141:80: E501 line too long (91 > 79 characters)
sklearn/externals/six.py:151:80: E501 line too long (91 > 79 characters)
sklearn/externals/six.py:161:80: E501 line too long (87 > 79 characters)
sklearn/externals/six.py:174:80: E501 line too long (80 > 79 characters)
sklearn/externals/six.py:175:80: E501 line too long (80 > 79 characters)
sklearn/externals/six.py:188:80: E501 line too long (82 > 79 characters)
sklearn/externals/six.py:189:80: E501 line too long (82 > 79 characters)
sklearn/externals/six.py:190:80: E501 line too long (82 > 79 characters)
sklearn/externals/six.py:202:1: E303 too many blank lines (3)
sklearn/externals/six.py:226:80: E501 line too long (111 > 79 characters)
sklearn/externals/six.py:227:80: E501 line too long (111 > 79 characters)
sklearn/externals/six.py:243:80: E501 line too long (111 > 79 characters)
sklearn/externals/six.py:244:80: E501 line too long (111 > 79 characters)
sklearn/externals/six.py:266:80: E501 line too long (83 > 79 characters)
sklearn/externals/six.py:289:80: E501 line too long (117 > 79 characters)
sklearn/externals/six.py:290:80: E501 line too long (117 > 79 characters)
sklearn/externals/six.py:307:80: E501 line too long (120 > 79 characters)
sklearn/externals/six.py:308:80: E501 line too long (120 > 79 characters)
sklearn/externals/six.py:322:80: E501 line too long (129 > 79 characters)
sklearn/externals/six.py:323:80: E501 line too long (129 > 79 characters)
sklearn/externals/six.py:327:80: E501 line too long (83 > 79 characters)
sklearn/externals/six.py:335:80: E501 line too long (93 > 79 characters)
sklearn/externals/six.py:433:1: E302 expected 2 blank lines, found 1
sklearn/externals/six.py:437:1: E302 expected 2 blank lines, found 1
sklearn/externals/six.py:441:1: E302 expected 2 blank lines, found 1
sklearn/externals/six.py:449:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:467:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:468:16: F821 undefined name 'unicode'
sklearn/externals/six.py:471:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:473:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:475:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:488:5: E303 too many blank lines (2)
sklearn/externals/six.py:494:5: E303 too many blank lines (2)
sklearn/externals/six.py:511:5: E303 too many blank lines (2)
sklearn/externals/six.py:516:5: E303 too many blank lines (2)
sklearn/externals/six.py:521:9: E301 expected 1 blank line, found 0
sklearn/externals/six.py:522:37: F821 undefined name 'basestring'
sklearn/externals/six.py:528:32: F821 undefined name 'unicode'
sklearn/externals/six.py:534:32: F821 undefined name 'unicode'
sklearn/externals/six.py:542:36: F821 undefined name 'unicode'
sklearn/externals/six.py:546:23: F821 undefined name 'unicode'
sklearn/externals/six.py:547:21: F821 undefined name 'unicode'
sklearn/externals/six.py:568:1: E302 expected 2 blank lines, found 1
sklearn/utils/_scipy_sparse_lsqr_backport.py:56:1: E402 module level import not at top of file
sklearn/utils/_scipy_sparse_lsqr_backport.py:57:1: E402 module level import not at top of file
sklearn/utils/_scipy_sparse_lsqr_backport.py:58:1: E402 module level import not at top of file
sklearn/utils/_scipy_sparse_lsqr_backport.py:262:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:263:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:264:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:265:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:266:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:267:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:268:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:284:5: F841 local variable 'nstop' is assigned to but never used
sklearn/utils/_scipy_sparse_lsqr_backport.py:303:5: F841 local variable '__xm' is assigned to but never used
sklearn/utils/_scipy_sparse_lsqr_backport.py:304:5: F841 local variable '__xn' is assigned to but never used
sklearn/utils/tests/test_shortest_path.py:12:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:15:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:32:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:36:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:39:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:43:5: E265 block comment should start with '# '
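
To get a feel for which kinds of error dominate a report like the one above, the lines can be tallied by error code. This is only a sketch: it assumes every report line has the `path:line:col: CODE message` shape seen in the output above, and the helper name `tally_codes` is invented here. The sample lines are copied from the report.

```python
import re
from collections import Counter

# Assumed report line shape: path:line:col: CODE message
FLAKE8_LINE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]\d+) "
)


def tally_codes(report_lines):
    """Count how often each error code occurs in a flake8 report."""
    counts = Counter()
    for line in report_lines:
        match = FLAKE8_LINE.match(line)
        if match:  # silently ignore lines that do not look like flake8 output
            counts[match.group("code")] += 1
    return counts


# A few lines copied from the report above.
sample = [
    "benchmarks/bench_plot_neighbors.py:39:5: E265 block comment should start with '# '",
    "benchmarks/bench_plot_parallel_pairwise.py:11:1: E302 expected 2 blank lines, found 1",
    "sklearn/externals/six.py:44:20: F821 undefined name 'basestring'",
    "sklearn/externals/six.py:45:27: F821 undefined name 'long'",
]

codes = tally_codes(sample)
for code, n in codes.most_common():
    print(code, n)
```

Run over the full report, a tally like this would make it easier to see whether most of the noise comes from a single rule, such as E402 in the examples.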

The following files are modified by a single open PR:

cat lines-modified-merge* | cut -f1-2 | sort -u | cut -f2 | sort | uniq -c | sort -n | awk '$1 == 2 {print $2}'

.codecov.yml
COPYING
ISSUE_TEMPLATE.md
PULL_REQUEST_TEMPLATE.md
benchmarks/.gitignore
benchmarks/bench_isolation_forest.py
benchmarks/bench_lof.py
benchmarks/bench_plot_fastkmeans.py
benchmarks/bench_plot_incremental_pca.py
benchmarks/bench_plot_omp_lars.py
benchmarks/bench_plot_svd.py
benchmarks/bench_plot_ward.py
benchmarks/bench_rcv1_logreg_convergence.py
benchmarks/bench_sample_without_replacement.py
benchmarks/bench_sparsify.py
benchmarks/bench_text_vectorizers.py
benchmarks/bench_tsne_mnist.py
benchmarks/plot_tsne_mnist.py
build_tools/appveyor/requirements.txt
build_tools/circle/checkout_merge_commit.sh
build_tools/travis/after_success.sh
build_tools/windows/windows_testing_downloader.ps1
doc/datasets/labeled_faces.rst
doc/datasets/olivetti_faces.rst
doc/datasets/rcv1.rst
doc/datasets/twenty_newsgroups.rst
doc/developers/index.rst
doc/make.bat
doc/modules/cross_decomposition.rst
doc/modules/density.rst
doc/modules/kernel_ridge.rst
doc/modules/label_propagation.rst
doc/modules/lda_qda.rst
doc/modules/random_projection.rst
doc/presentations.rst
doc/templates/numpydoc_docstring.rst
doc/themes/scikit-learn/static/ML_MAPS_README.rst
doc/themes/scikit-learn/static/jquery.js
doc/themes/scikit-learn/static/js/extra.js
doc/tutorial/index.rst
doc/tutorial/machine_learning_map/ML_MAPS_README.txt
doc/tutorial/machine_learning_map/index.rst
doc/tutorial/machine_learning_map/pyparsing.py
doc/tutorial/machine_learning_map/svg2imagemap.py
doc/tutorial/statistical_inference/putting_together.rst
doc/tutorial/statistical_inference/settings.rst
doc/tutorial/text_analytics/data/languages/fetch_data.py
doc/tutorial/text_analytics/skeletons/exercise_01_language_train_model.py
doc/tutorial/text_analytics/skeletons/exercise_02_sentiment.py
doc/tutorial/text_analytics/solutions/exercise_01_language_train_model.py
doc/whats_new/v0.13.rst
doc/whats_new/v0.14.rst
doc/whats_new/v0.15.rst
doc/whats_new/v0.16.rst
doc/whats_new/v0.17.rst
doc/whats_new/v0.18.rst
doc/whats_new/v0.19.rst
examples/.flake8
examples/applications/plot_face_recognition.py
examples/applications/plot_model_complexity_influence.py
examples/applications/plot_out_of_core_classification.py
examples/applications/plot_outlier_detection_housing.py
examples/applications/plot_prediction_latency.py
examples/applications/plot_species_distribution_modeling.py
examples/calibration/README.txt
examples/calibration/plot_calibration_multiclass.py
examples/classification/plot_digits_classification.py
examples/cluster/plot_agglomerative_clustering.py
examples/cluster/plot_agglomerative_clustering_metrics.py
examples/cluster/plot_birch_vs_minibatchkmeans.py
examples/cluster/plot_cluster_iris.py
examples/cluster/plot_coin_segmentation.py
examples/cluster/plot_coin_ward_segmentation.py
examples/cluster/plot_color_quantization.py
examples/cluster/plot_dict_face_patches.py
examples/cluster/plot_digits_agglomeration.py
examples/cluster/plot_digits_linkage.py
examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py
examples/cluster/plot_kmeans_assumptions.py
examples/cluster/plot_kmeans_digits.py
examples/cluster/plot_kmeans_silhouette_analysis.py
examples/cluster/plot_linkage_comparison.py
examples/cluster/plot_mini_batch_kmeans.py
examples/cluster/plot_segmentation_toy.py
examples/cluster/plot_ward_structured_vs_unstructured.py
examples/compose/README.txt
examples/compose/plot_column_transformer.py
examples/compose/plot_column_transformer_mixed_types.py
examples/compose/plot_compare_reduction.py
examples/compose/plot_digits_pipe.py
examples/compose/plot_feature_union.py
examples/compose/plot_transformed_target.py
examples/covariance/plot_covariance_estimation.py
examples/covariance/plot_mahalanobis_distances.py
examples/covariance/plot_robust_vs_empirical_covariance.py
examples/covariance/plot_sparse_cov.py
examples/cross_decomposition/plot_compare_cross_decomposition.py
examples/datasets/plot_digits_last_image.py
examples/datasets/plot_iris_dataset.py
examples/datasets/plot_random_dataset.py
examples/decomposition/plot_ica_blind_source_separation.py
examples/decomposition/plot_ica_vs_pca.py
examples/decomposition/plot_kernel_pca.py
examples/decomposition/plot_pca_3d.py
examples/decomposition/plot_pca_vs_fa_model_selection.py
examples/decomposition/plot_sparse_coding.py
examples/ensemble/plot_adaboost_hastie_10_2.py
examples/ensemble/plot_adaboost_multiclass.py
examples/ensemble/plot_adaboost_regression.py
examples/ensemble/plot_adaboost_twoclass.py
examples/ensemble/plot_bias_variance.py
examples/ensemble/plot_forest_iris.py
examples/ensemble/plot_gradient_boosting_regression.py
examples/ensemble/plot_gradient_boosting_regularization.py
examples/ensemble/plot_partial_dependence.py
examples/ensemble/plot_random_forest_regression_multioutput.py
examples/ensemble/plot_voting_decision_regions.py
examples/ensemble/plot_voting_probas.py
examples/exercises/plot_cv_diabetes.py
examples/exercises/plot_digits_classification_exercise.py
examples/exercises/plot_iris_exercise.py
examples/feature_selection/plot_f_test_vs_mi.py
examples/feature_selection/plot_feature_selection.py
examples/feature_selection/plot_permutation_test_for_classification.py
examples/gaussian_process/plot_gpc.py
examples/gaussian_process/plot_gpc_iris.py
examples/gaussian_process/plot_gpc_isoprobability.py
examples/gaussian_process/plot_gpc_xor.py
examples/gaussian_process/plot_gpr_noisy.py
examples/gaussian_process/plot_gpr_noisy_targets.py
examples/gaussian_process/plot_gpr_prior_posterior.py
examples/linear_model/plot_ard.py
examples/linear_model/plot_bayesian_ridge.py
examples/linear_model/plot_lasso_dense_vs_sparse_data.py
examples/linear_model/plot_lasso_lars.py
examples/linear_model/plot_lasso_model_selection.py
examples/linear_model/plot_logistic_l1_l2_sparsity.py
examples/linear_model/plot_logistic_multinomial.py
examples/linear_model/plot_logistic_path.py
examples/linear_model/plot_multi_task_lasso_support.py
examples/linear_model/plot_ols.py
examples/linear_model/plot_ols_3d.py
examples/linear_model/plot_ransac.py
examples/linear_model/plot_ridge_path.py
examples/linear_model/plot_sgd_comparison.py
examples/linear_model/plot_sgd_iris.py
examples/linear_model/plot_sgd_loss_functions.py
examples/linear_model/plot_sgd_penalties.py
examples/linear_model/plot_sgd_separating_hyperplane.py
examples/linear_model/plot_sgd_weighted_samples.py
examples/linear_model/plot_sparse_logistic_regression_20newsgroups.py
examples/linear_model/plot_sparse_logistic_regression_mnist.py
examples/linear_model/plot_theilsen.py
examples/manifold/plot_swissroll.py
examples/manifold/plot_t_sne_perplexity.py
examples/mixture/plot_concentration_prior.py
examples/model_selection/grid_search_text_feature_extraction.py
examples/model_selection/plot_cv_predict.py
examples/model_selection/plot_grid_search_digits.py
examples/model_selection/plot_learning_curve.py
examples/model_selection/plot_multi_metric_evaluation.py
examples/model_selection/plot_roc.py
examples/model_selection/plot_roc_crossval.py
examples/model_selection/plot_train_error_vs_test_error.py
examples/model_selection/plot_underfitting_overfitting.py
examples/model_selection/plot_validation_curve.py
examples/multioutput/README.txt
examples/multioutput/plot_classifier_chain_yeast.py
examples/neighbors/plot_digits_kde_sampling.py
examples/neighbors/plot_species_kde.py
examples/neural_networks/plot_mlp_alpha.py
examples/neural_networks/plot_mlp_training_curves.py
examples/plot_isotonic_regression.py
examples/plot_johnson_lindenstrauss_bound.py
examples/plot_kernel_ridge_regression.py
examples/plot_multilabel.py
examples/plot_multioutput_face_completion.py
examples/preprocessing/plot_all_scaling.py
examples/semi_supervised/plot_label_propagation_digits.py
examples/semi_supervised/plot_label_propagation_digits_active_learning.py
examples/semi_supervised/plot_label_propagation_versus_svm_iris.py
examples/svm/plot_custom_kernel.py
examples/svm/plot_separating_hyperplane.py
examples/svm/plot_svm_anova.py
examples/svm/plot_svm_kernels.py
examples/svm/plot_svm_margin.py
examples/svm/plot_svm_nonlinear.py
examples/svm/plot_svm_scale_c.py
examples/svm/plot_weighted_samples.py
examples/text/plot_hashing_vs_dict_vectorizer.py
examples/tree/plot_tree_regression.py
examples/tree/plot_tree_regression_multioutput.py
examples/tree/plot_unveil_tree_structure.py
sklearn/__check_build/__init__.py
sklearn/_config.py
sklearn/cluster/_dbscan_inner.pyx
sklearn/cluster/_hierarchical.pyx
sklearn/cluster/_k_means_elkan.pyx
sklearn/cluster/setup.py
sklearn/compose/tests/test_target.py
sklearn/covariance/__init__.py
sklearn/covariance/robust_covariance.py
sklearn/covariance/shrunk_covariance_.py
sklearn/covariance/tests/test_covariance.py
sklearn/covariance/tests/test_elliptic_envelope.py
sklearn/datasets/_svmlight_format.pyx
sklearn/datasets/data/boston_house_prices.csv
sklearn/datasets/data/breast_cancer.csv
sklearn/datasets/data/iris.csv
sklearn/datasets/data/wine_data.csv
sklearn/datasets/descr/boston_house_prices.rst
sklearn/datasets/descr/diabetes.rst
sklearn/datasets/descr/iris.rst
sklearn/datasets/descr/linnerud.rst
sklearn/datasets/descr/wine_data.rst
sklearn/datasets/setup.py
sklearn/datasets/tests/test_california_housing.py
sklearn/datasets/tests/test_common.py
sklearn/datasets/tests/test_covtype.py
sklearn/datasets/tests/test_kddcup99.py
sklearn/datasets/tests/test_lfw.py
sklearn/datasets/tests/test_mldata.py
sklearn/datasets/tests/test_rcv1.py
sklearn/decomposition/cdnmf_fast.pyx
sklearn/decomposition/tests/test_factor_analysis.py
sklearn/externals/README
sklearn/externals/conftest.py
sklearn/externals/joblib/disk.py
sklearn/feature_extraction/setup.py
sklearn/feature_selection/mutual_info_.py
sklearn/feature_selection/tests/test_base.py
sklearn/feature_selection/tests/test_chi2.py
sklearn/gaussian_process/__init__.py
sklearn/gaussian_process/correlation_models.py
sklearn/gaussian_process/regression_models.py
sklearn/linear_model/sgd_fast.pxd
sklearn/linear_model/sgd_fast_helpers.h
sklearn/linear_model/tests/test_omp.py
sklearn/manifold/setup.py
sklearn/manifold/tests/test_locally_linear.py
sklearn/metrics/cluster/expected_mutual_info_fast.pyx
sklearn/metrics/cluster/setup.py
sklearn/mixture/__init__.py
sklearn/neighbors/nearest_centroid.py
sklearn/neighbors/quad_tree.pxd
sklearn/neighbors/quad_tree.pyx
sklearn/neighbors/tests/test_quad_tree.py
sklearn/neighbors/typedefs.pyx
sklearn/preprocessing/_encoders.py
sklearn/preprocessing/tests/test_encoders.py
sklearn/semi_supervised/__init__.py
sklearn/src/cblas/atlas_misc.h
sklearn/src/cblas/atlas_reflevel1.h
sklearn/src/cblas/cblas.h
sklearn/src/cblas/cblas_dcopy.c
sklearn/src/cblas/cblas_scopy.c
sklearn/src/cblas/cblas_xerbla.c
sklearn/svm/bounds.py
sklearn/svm/liblinear.pxd
sklearn/svm/liblinear.pyx
sklearn/svm/libsvm.pxd
sklearn/svm/setup.py
sklearn/svm/src/liblinear/linear.h
sklearn/svm/src/libsvm/libsvm_sparse_helper.c
sklearn/tests/test_init.py
sklearn/tests/test_random_projection.py
sklearn/tree/setup.py
sklearn/utils/bench.py
sklearn/utils/fast_dict.pxd
sklearn/utils/graph_shortest_path.pyx
sklearn/utils/sparsetools/__init__.py
sklearn/utils/tests/test_deprecation.py
sklearn/utils/tests/test_metaestimators.py
sklearn/utils/tests/test_stats.py
sklearn/utils/weight_vector.pxd

The following are modified by 2 open PRs:

.circleci/config.yml
.mailmap
CONTRIBUTING.md
benchmarks/bench_covertype.py
benchmarks/bench_mnist.py
benchmarks/bench_plot_lasso_path.py
benchmarks/bench_plot_randomized_svd.py
benchmarks/bench_saga.py
benchmarks/bench_sgd_regression.py
build_tools/circle/list_versions.py
build_tools/travis/flake8_diff.sh
conftest.py
doc/README.md
doc/conftest.py
doc/data_transforms.rst
doc/datasets/covtype.rst
doc/datasets/kddcup99.rst
doc/developers/maintainer.rst
doc/modules/computational_performance.rst
doc/modules/covariance.rst
doc/modules/learning_curve.rst
doc/modules/unsupervised_reduction.rst
doc/sphinxext/sphinx_issues.py
doc/supervised_learning.rst
doc/tutorial/machine_learning_map/parse_path.py
doc/user_guide.rst
doc/whats_new/older_versions.rst
examples/README.txt
examples/applications/plot_tomography_l1_reconstruction.py
examples/applications/wikipedia_principal_eigenvector.py
examples/classification/plot_classifier_comparison.py
examples/classification/plot_lda_qda.py
examples/cluster/plot_adjusted_for_chance_measures.py
examples/cluster/plot_dbscan.py
examples/cluster/plot_face_compress.py
examples/cluster/plot_kmeans_stability_low_dim_dense.py
examples/decomposition/plot_pca_iris.py
examples/ensemble/plot_ensemble_oob.py
examples/ensemble/plot_feature_transformation.py
examples/ensemble/plot_gradient_boosting_early_stopping.py
examples/ensemble/plot_isolation_forest.py
examples/ensemble/plot_random_forest_embedding.py
examples/feature_selection/plot_feature_selection_pipeline.py
examples/linear_model/plot_lasso_and_elasticnet.py
examples/manifold/plot_compare_methods.py
examples/manifold/plot_lle_digits.py
examples/manifold/plot_manifold_sphere.py
examples/model_selection/plot_confusion_matrix.py
examples/model_selection/plot_randomized_search.py
examples/neighbors/plot_kde_1d.py
examples/neighbors/plot_lof.py
examples/neighbors/plot_nearest_centroid.py
examples/neural_networks/plot_rbm_logistic_classification.py
examples/preprocessing/plot_power_transformer.py
examples/preprocessing/plot_scaling_importance.py
examples/semi_supervised/plot_label_propagation_structure.py
examples/svm/plot_iris.py
examples/svm/plot_oneclass.py
examples/svm/plot_rbf_parameters.py
examples/svm/plot_separating_hyperplane_unbalanced.py
examples/svm/plot_svm_regression.py
examples/tree/plot_iris.py
sklearn/_build_utils/__init__.py
sklearn/_isotonic.pyx
sklearn/cluster/_feature_agglomeration.py
sklearn/cluster/bicluster.py
sklearn/cluster/tests/test_affinity_propagation.py
sklearn/cluster/tests/test_feature_agglomeration.py
sklearn/compose/__init__.py
sklearn/covariance/elliptic_envelope.py
sklearn/covariance/empirical_covariance_.py
sklearn/covariance/tests/test_graphical_lasso.py
sklearn/covariance/tests/test_robust_covariance.py
sklearn/cross_decomposition/__init__.py
sklearn/datasets/descr/breast_cancer.rst
sklearn/datasets/descr/digits.rst
sklearn/datasets/mlcomp.py
sklearn/datasets/mldata.py
sklearn/datasets/tests/test_20news.py
sklearn/decomposition/base.py
sklearn/decomposition/tests/test_incremental_pca.py
sklearn/externals/_pilutil.py
sklearn/externals/copy_joblib.sh
sklearn/externals/funcsigs.py
sklearn/externals/joblib/_compat.py
sklearn/externals/joblib/_memory_helpers.py
sklearn/externals/joblib/_parallel_backends.py
sklearn/externals/joblib/backports.py
sklearn/externals/joblib/format_stack.py
sklearn/externals/joblib/logger.py
sklearn/externals/joblib/memory.py
sklearn/externals/joblib/numpy_pickle_compat.py
sklearn/externals/joblib/numpy_pickle_utils.py
sklearn/externals/setup.py
sklearn/feature_extraction/stop_words.py
sklearn/feature_extraction/tests/test_dict_vectorizer.py
sklearn/feature_selection/tests/test_mutual_info.py
sklearn/gaussian_process/tests/test_gaussian_process.py
sklearn/gaussian_process/tests/test_gpc.py
sklearn/linear_model/randomized_l1.py
sklearn/linear_model/sag_fast.pyx
sklearn/linear_model/tests/test_passive_aggressive.py
sklearn/linear_model/tests/test_perceptron.py
sklearn/linear_model/tests/test_theil_sen.py
sklearn/linear_model/theil_sen.py
sklearn/manifold/_barnes_hut_tsne.pyx
sklearn/manifold/_utils.pyx
sklearn/metrics/cluster/bicluster.py
sklearn/metrics/cluster/tests/test_bicluster.py
sklearn/metrics/cluster/tests/test_common.py
sklearn/metrics/setup.py
sklearn/mixture/dpgmm.py
sklearn/mixture/tests/test_bayesian_mixture.py
sklearn/mixture/tests/test_dpgmm.py
sklearn/mixture/tests/test_mixture.py
sklearn/neighbors/graph.py
sklearn/neighbors/tests/test_nearest_centroid.py
sklearn/neighbors/typedefs.pxd
sklearn/neural_network/_base.py
sklearn/svm/__init__.py
sklearn/svm/libsvm_sparse.pyx
sklearn/svm/src/liblinear/liblinear_helper.c
sklearn/svm/src/liblinear/linear.cpp
sklearn/svm/src/libsvm/libsvm_helper.c
sklearn/tests/test_config.py
sklearn/tests/test_docstring_parameters.py
sklearn/tests/test_isotonic.py
sklearn/tree/_criterion.pxd
sklearn/utils/_random.pxd
sklearn/utils/_unittest_backport.py
sklearn/utils/arpack.py
sklearn/utils/deprecation.py
sklearn/utils/linear_assignment_.py
sklearn/utils/sparsetools/setup.py
sklearn/utils/tests/test_graph.py
sklearn/utils/tests/test_random.py
sklearn/utils/tests/test_seq_dataset.py
sklearn/utils/tests/test_utils.py
sklearn/utils/weight_vector.pyx
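
The grouping behind the two lists above can also be sketched in Python. This is an illustration rather than the script actually used: it assumes the results files hold tab-separated rows of PR#, file name and line#, and the function name `files_by_pr_count` and the sample rows are hypothetical.

```python
from collections import defaultdict


def files_by_pr_count(rows):
    """Map n -> sorted list of files touched by exactly n distinct PRs.

    rows: iterable of tab-separated "PR#<TAB>file<TAB>line#" strings.
    """
    prs_per_file = defaultdict(set)
    for row in rows:
        pr, path, _line = row.rstrip("\n").split("\t")
        prs_per_file[path].add(pr)  # a set, so repeated lines count once
    grouped = defaultdict(list)
    for path, prs in prs_per_file.items():
        grouped[len(prs)].append(path)
    return {n: sorted(paths) for n, paths in grouped.items()}


# Made-up rows in the assumed three-column format.
rows = [
    "101\tsklearn/_config.py\t10",
    "101\tsklearn/_config.py\t11",
    "101\tconftest.py\t3",
    "102\tconftest.py\t40",
]
grouped = files_by_pr_count(rows)
print(grouped[1])
print(grouped[2])
```

Collecting PR numbers in a set per file plays the same role as `sort -u` on the first two columns in the shell pipeline.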

What do we think of offering contributors to do cleanups of flake8 or removing assert_{true,false,equal,dict_equal} in these files?

jnothman · 2018-06-21T13:43:30Z

I should summarise: of 1122 files in the repo at 73b7d07:

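type	n PRs	n files
c	0	59
c	1	9
c	2	3
c	3	1
c	4-10	1
data	0	7
data	1	4
examples	0	59
examples	1	136
examples	2	35
examples	3	12
examples	4-10	10
externals	0	3
externals	1	3
externals	2	13
externals	3	3
externals	4-10	4
other	0	112
other	1	32
other	2	16
other	3	7
other	4-10	10
py	0	17
py	1	39
py	2	33
py	3	24
py	4-10	97
py	11-20	46
py	20+	10
rst	0	18
rst	1	30
rst	2	13
rst	3	13
rst	4-10	32
rst	11-20	8
rst	20+	4
tests	0	40
tests	1	22
tests	2	26
tests	3	25
tests	4-10	63
tests	11-20	22
tests	20+	1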

(Sorry: being lazy about visualisation)

Definition of type:

def path_type(path):
    if path.startswith('examples'):
        return 'examples'
    if '/externals/' in path:
        return 'externals'
    if '/tests/' in path:
        return 'tests'
    if path.endswith('.rst'):
        return 'rst'
    if not path.startswith('sklearn/'):
        return 'other'
    if any(path.endswith(ext) for ext in ['.c', '.h', '.cpp']):
        return 'c'
    if any(path.endswith(ext) for ext in ['.py', '.pxd', '.pyx', '.pxi']):
        return 'py'
    if any(path.endswith(ext) for ext in ['.jpg', '.csv', '.gz']):
        return 'data'
    return 'other'
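
For illustration, a summary table of this shape could be produced by binning each file's PR count and counting (type, bin) pairs. This is a sketch under assumptions: the bin edges are inferred from the table labels ("20+" is taken to mean more than 20), and `bucket`, `summarise` and `toy_classify` are hypothetical names. The `path_type` function above could be passed in place of the stand-in classifier.

```python
from collections import Counter


def bucket(n_prs):
    """Bin a PR count like the summary table (bin edges are an assumption)."""
    if n_prs <= 3:
        return str(n_prs)
    if n_prs <= 10:
        return "4-10"
    if n_prs <= 20:
        return "11-20"
    return "20+"


def summarise(pr_counts, classify):
    """Count files per (type, bin); pr_counts maps path -> number of open PRs."""
    return Counter((classify(path), bucket(n)) for path, n in pr_counts.items())


def toy_classify(path):
    """Stand-in classifier so the sketch runs on its own."""
    return "tests" if "/tests/" in path else "py"


# Made-up counts, for illustration only.
counts = summarise(
    {"sklearn/base.py": 25, "sklearn/tests/test_base.py": 7, "sklearn/_config.py": 1},
    toy_classify,
)
for (kind, label), n_files in sorted(counts.items()):
    print(kind, label, n_files, sep="\t")
```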

lesteve · 2018-06-29T06:46:56Z

Wow this is quite an impressive endeavour! I am guessing that there is a bit of inaccuracy because the line numbers of the flake8 warnings change with history (or maybe this is just on a per-file level). I would still take that as a useful first-order estimate.

Personally, I have to admit I am biased towards flake8ing all of the code (or at least as much as the consensus deems acceptable), sacrificing some old outstanding PRs, and helping the people who need it on the PRs that are still alive.

Ideally it would be very nice to have some anecdotal evidence that conflicts (caused by flake8 or by some similar automated change) are sometimes the thing that prevents resuscitating a PR (or makes a PR die). Speaking for myself, I have had some cases where resolving the conflicts was too painful (#10663 reviving #4807 springs to mind). I ended up rewriting the code, with a lot of copying and pasting from the PR.

One of the arguments I have heard against flake8ing the code is that flake8 changes with time (PEP8 changes with time and is open to interpretation too) and that it's not something you do once and forget about for the rest of your life. I have definitely seen cases where different versions of flake8 gave different warnings, but I would be (maybe too naively) optimistic that we can control this effect by ignoring flake8 warnings in setup.cfg or on a line level through comments.

Side-comment: if we decide to flake8 some parts of the code (e.g. tests may be a good candidate), there is no guarantee that flake8_diff.sh prevents flake8 errors from being reintroduced (mainly because the diff does not have enough context), so we will need to adapt our flake8 testing script.

jnothman · 2018-06-30T08:55:51Z

The line numbers are now out of date, but they reflect the "left hand side of the diff", i.e. master.

I might also note that lgtm.com checks for many key pyflakes issues (unused imports, etc.), so the benefits here would be some uncontroversial PEP8 fixes and pytest updates. The benefit is that contributors won't try fixing PEP8 in feature PRs, and will have a good example to follow for testing.

rth · 2018-07-03T09:22:57Z

Very impressive analysis! This will be very useful for PRs introducing cosmetic changes.

What do we think of offering contributors to do cleanups of flake8 or removing assert_{true,false,equal,dict_equal} in these files?

I'm still biased toward "yes", particularly for running nose2pytest on the test files; I don't really have a strong opinion about flake8.

Also, I have been following with interest projects that started to use black (e.g. dask/dask-ml#237), but that's a whole other level of potential merge conflicts, so probably not even worth considering here (even assuming the advantages were unanimously accepted, which is not the case I think).

rth · 2019-05-14T19:10:10Z

On a related topic Django recently accepted an Enhancement Proposal to re-format the whole code base with black (https://github.com/django/deps/blob/master/accepted/0008-black.rst) if I understood that right. They "only" have 230 open PRs but still.

thomasjpfan · 2019-05-14T22:04:25Z

The way dask-ml tries to enforce style is through its documentation and using pre-commit to manage pre-commit hooks.

Even without pre-commit hooks, if we can get contributors to run black on their PRs, all the merge conflicts should be resolved by themselves. Ideally, there would be no need to manually resolve merge conflicts.

jnothman · 2019-05-14T23:44:07Z

But black would be a process we apply to the whole codebase beforehand, not just to contributions. I have no real objection to this.

jnothman · 2019-05-14T23:46:12Z

Although black's approach to indentation is very different to ours.

thomasjpfan · 2019-05-15T02:42:36Z

Although black's approach to indentation is very different to ours.

That is the blocker for using black. scikit-learn prefers:

def hello_world(var_1, var_2, var_3,
                var_4, ...):
    pass

while black prefers:

def hello_world(
    var_1,
    var_2,
    var_3,
    ...
):
    pass

When not working in scikit-learn, I tend to prefer the black indentation, because it allows space for type annotations.
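
To make the point about type annotations concrete, here is a small sketch of an annotated signature laid out one parameter per line, in the style black produces for long signatures. The function, its parameter types and its defaults are hypothetical and exist only for illustration.

```python
from typing import Optional, Sequence


def hello_world(
    var_1: Sequence[float],
    var_2: int = 10,
    var_3: Optional[str] = None,
    var_4: bool = False,
) -> str:
    """Hypothetical function; names and types are for illustration only."""
    return f"{len(var_1)} values, var_2={var_2}, var_3={var_3}, var_4={var_4}"


print(hello_world([1.0, 2.0], var_3="x"))
```

With the visual-indent style, the same annotated parameters would have to share lines and would quickly run into the line-length limit.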

NicolasHug · 2019-05-15T15:34:38Z

I'd be OK with using black; the benefits seem to outweigh the downsides.

Merge conflicts with existing PRs would be trivial to solve, contributors would just have to run black.

rth · 2019-05-15T18:55:17Z

Merge conflicts with existing PRs would be trivial to solve, contributors would just have to run black.

black would help some, but there are still cases where conflicts would need to be resolved (e.g. if the same line was changed in 2 different ways).

rth · 2020-04-14T23:08:02Z

The issue is that, for instance, applying flake8 on the diff does not, after a certain point, make the code more PEP8 compatible: it does not converge.

flake8 --exclude=sklearn/externals sklearn/ | wc -l

On,

master - 227 errors
v0.22.1 - 282 errors
v0.21.0 - 209 errors
v0.20.0 - 268 errors
v0.19.0 - 482 errors

So honestly I think we should just fix these 200 LoC (of which many are removed/added newlines), be done with it, and remove ~160 lines of bash hacks in build_tools/circle/linting.sh in favor of flake8 sklearn/

rth · 2020-04-14T23:09:33Z

Or maybe we should just discuss using black in the next dev meeting..

jnothman · 2020-04-19T12:15:20Z

I can't say I like black's style all the time... but I agree that it's an easy way out. Happy to discuss.

rth · 2020-04-19T15:18:52Z

For code style, as far as I can tell, we have 3 possibilities:
1) Keep the current situation, i.e. apply flake8 on the diff, and a few selected flake8 rules on the full code base (e.g. avoid unused imports). The limitation is that we have to maintain a significant amount of code to check this, and new PRs may introduce PEP8-incompatible changes, particularly at the edges of the diff.
2) Fix flake8 issues, to address the limitation from 1, but otherwise keep the current situation.
3) Use automatic code formatting such as black. I agree that using 1 parameter per line for longer functions #11336 (comment) is not necessarily ideal, though it could make more sense if we start to add some code annotations. Also, for functions that have 20+ input parameters (e.g. TfidfVectorizer), we have to consider whether there is an API issue to start with.

@jorisvandenbossche I see pandas applied black last year in pandas-dev/pandas#27076. Could you share your experience of it? Particularly with respect to managing the transition, whether it had an impact on merge conflicts for existing PRs, and what impact it had on PR review and interaction with contributors.

chkoar · 2020-05-22T11:46:54Z

I am +1 for the black adoption.

amueller · 2020-05-25T11:55:28Z

I think I'd vote 2, though I'd also be fine with 3. Completely agree with @jnothman :)

NicolasHug · 2020-05-26T22:53:32Z

Just an anecdotal remark in favor of black: I've written a bunch of PRs recently where I needed to make some global changes to the codebase (e.g. adding a strict_mode argument to all the checks, or replacing check_array by _validate_data).

Having black would have made my life much easier. Right now, I have to fix dozens (sometimes hundreds) of linting issues before I can push, otherwise I know the CI won't run most tests. It's a bit of a pain especially when you're just trying potential ideas.

jnothman · 2020-06-26T01:50:20Z

I think transitioning could look like:

1) Make a PR to blacken master.
2) Update documentation and potentially adopt pre-commit.com's black runner.
3) Add a commit to all open PRs, automatically applying black.

I'm not yet sure about applying black to examples, where we'll often want to make the example visually clear. I'm thinking about the layout of a 2d matrix input specifically.

chkoar · 2020-06-26T01:56:37Z

I'm not yet sure about applying black to examples, where we'll often want to make the example visually clear.

That's an understandable point, but it is probably just something that we are used to.
IMHO having just black . removes the mental complexity on how to format the code.

thomasjpfan · 2020-06-26T02:39:45Z

If we are doing this, I would try to push for a slightly higher line-length (maybe 100). I recall that @adrinjalali was not happy with changing the line length.

rth · 2020-06-26T06:25:56Z

If we are doing this, I would try to push for a slightly higher line-length (maybe 100). I recall that @adrinjalali was not happy with changing the line length.

Also Gaël I think. 100 would be a bit too large even on my laptop I think. I would prefer either the default (88) or 79.

IMHO having just black . removes the mental complexity on how to format the code.

Yeah, asking contributors to rely on pre-commit for code changes except for examples where they would need to manually do it can be confusing.

I think transitioning could look like:

Sounds like a plan.

Add a commit to all open PRs, automatically applying black.

Yes, that would indeed be ideal. Something like,

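import subprocess

for pr_id in get_open_prs():  # get_open_prs: helper still to be written
    # documented usage is: hub pr checkout <PR-NUMBER>
    subprocess.call(['hub', 'pr', 'checkout', str(pr_id)])
    subprocess.call(['black', '.'])
    # -m avoids opening an editor for every PR
    subprocess.call(['git', 'commit', '-a', '-m', 'Apply black'])
    subprocess.call(['git', 'push'])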

We would need to think about how to a) maybe include manual validation at the beginning, b) store the state of what was and wasn't migrated, c) make it more fault tolerant, and d) handle the fact that we probably won't be able to do it all in one run due to GitHub API limitations. The list of open PRs can be obtained with,

hub pr list -s open
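
Points b) and c) above could be approached roughly as follows. This is a sketch under stated assumptions, not a tested migration tool: `list_open_prs`, `migrate`, `fake_apply` and the JSON state file are hypothetical, and the `%I` format code (bare PR number) is taken from hub's `pr list` documentation and should be double-checked. The per-PR work is passed in as a callable so that the bookkeeping can be tried without touching any repository.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def list_open_prs():
    """Open PR numbers via `hub pr list` (assumes %I prints the bare number)."""
    out = subprocess.run(
        ["hub", "pr", "list", "-s", "open", "-f", "%I%n"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [int(token) for token in out.split()]


def migrate(pr_ids, apply_black, state_path):
    """Run apply_black(pr_id) once per PR, recording outcomes so reruns resume."""
    state_file = Path(state_path)
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    for pr_id in pr_ids:
        key = str(pr_id)
        if state.get(key) == "done":
            continue  # already migrated in an earlier run
        try:
            apply_black(pr_id)
            state[key] = "done"
        except Exception as exc:  # keep going; note the failure for follow-up
            state[key] = f"failed: {exc}"
        state_file.write_text(json.dumps(state, indent=2))  # save after each PR
    return state


def fake_apply(pr_id):
    """Stand-in for checkout + black + commit + push; PR 2 fails on purpose."""
    if pr_id == 2:
        raise RuntimeError("merge conflict")


with tempfile.TemporaryDirectory() as tmp:
    result = migrate([1, 2, 3], fake_apply, Path(tmp) / "state.json")
print(result)
```

Writing the state file after every PR means an interrupted run (for instance after hitting an API rate limit) can be restarted without redoing finished PRs, while failed ones are retried.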

mitar · 2020-07-03T20:33:00Z

Just my 2 cents here: I was a heavy user of flake, pycodestyle, and other tools on my Python projects with many students working on them. I wanted students to learn good code style, and the code to be consistent, as everyone was working on just parts of it. The downside was that students spent so much time trying to fix those style issues, because even when those tools detect issues, authors do not necessarily know how to fix them and resort to trial and error. Even worse, some students never installed those tools locally and just used CI to check, so there was a crazy number of commits trying to fix those and waiting for CI.

With black, we can just focus on code and ignore the style. Students are happier, and they also learn good code style by seeing what black fixes, which eventually takes hold in them.

Since then I have also used Go, which also has a standard formatter, and it really makes life much easier. Even if you disagree with some style choices, in the end consistency is the most important thing, and there is really no reason to spend time manually fixing those consistency issues.

I also believe sklearn should get typing information, so black formatting of attributes makes sense in that context.

jnothman · 2020-07-04T23:10:40Z

Thanks. I think the general consensus at the last dev meeting was that, while several of us would grieve our loss of aesthetic freedom, this would be a valuable way to lower the bar to entry, and none of us would be strongly against. I think what needs clarification is the path to transition.

…

cmarmo · 2022-01-15T09:18:47Z

Now that black has been adopted, am I wrong, or can this issue be closed?

adrinjalali · 2022-01-15T11:43:29Z

Joel's script is a really nice way to measure impact for future potential changes. But I think the main issue is resolved.

rth mentioned this issue Jul 6, 2018

Doc consistency #11450

Closed

This was referenced Jun 4, 2019

Using black for code style scikit-learn-contrib/scikit-learn-extra#14

Closed

MNT Format code with black scikit-learn-contrib/scikit-learn-extra#15

Merged

rth mentioned this issue Jan 29, 2020

TST Fix unreachable code in tests #16110

Merged

rth mentioned this issue Mar 30, 2020

[MRG+1] MNT fix E731 #16786

Merged

rth mentioned this issue Apr 18, 2020

MNT Add pre-comit configuration #16957

Merged

rth mentioned this issue May 10, 2020

MNT properly activate the env in the linting CI #17177

Merged

NicolasHug mentioned this issue May 12, 2020

better code formatting #17190

Closed

rth mentioned this issue May 24, 2020

[DISCUSSION] Replace old string formatting syntax with f-strings #17327

Closed

thomasjpfan mentioned this issue Jul 3, 2020

ENH Adds typing support to LogisticRegression #17799

Open

rth mentioned this issue Jul 8, 2020

Optimize import #17840

Closed

lorentzenchr mentioned this issue Oct 26, 2020

CLN some code cleansing in preprocessing #18686

Merged

thomasjpfan mentioned this issue Dec 1, 2020

MNT Applies black formatting to most of the code base #18948

Merged

rth mentioned this issue Jun 18, 2021

Fix flake8 in examples #20296

Closed

lorentzenchr mentioned this issue Aug 23, 2021

RFC Consistency for meta infos in files (license, encoding, authors, ..) #20813

Closed

cmarmo added the Needs Decision - Close Requires decision for closing label Jan 15, 2022

adrinjalali closed this as completed Jan 15, 2022
