refactor code v1.0.0 (#128) · feature-engine/feature_engine@e40457c

Commit e40457c

Authored by solegalli, NicoGalli, Tejash-Shah, and Okroshiashvili
refactor code v1.0.0 (#128)
* sort modules in subfolders, separate classes in individual modules, rename encoding classes, rename few other classes * fix WoE name in doc * sort selection in subfolder, separate class into modules * fix selection imports * update test_selection * update VERSION to 1.0.0 * match doc name with submodule name, fix imports and class names (#129) * reorganise test submodules as per package submodules, create individual test files for each class (#130) * rename class init params and defo values (#131) * reorganise base transformers and functions, clean imports (#133) * fix style check (#141) * improve code style throughout (#142) * remove iterable from init MathematicalCombination * replace listcomp by genexp in arbitrary discretiser * create abstraction decision tree discretiser * create abstraction in base cat encoder * replace listcomp by genexp * create abstraction base numerical imputer * create baseOutlier * final cleanup of code * replace listcomp by genexp in randomsampleimputer * split tests into individual functions (#147) * split imputation tests * fix typo, unify test names in math combination test * reformat line space in sklearwrapper test * remove comment in drop constant tests * split discretisation tests * split encoding tests * split test outliers * split transformer tests * separate woe and ratio encoder closes issue #143 (#149) * Issue 143 * doc issue 143 documentation issue 143 * Update PRatioEncoder.rst 'C' Removed from RareLabelCEncoder whch was causing and error. Also enconder_dict_ ratio results added * Changes in docstrings #143 Changes requested by Sole in docstrings, after first pull request related to this issue. 
* More docstring changes related to #143 minor updates in docstrings * separate woe and ratio tests into functions #147 separate woe and ratio tests into functions #147 * move PRatioEncoder under WoEencoder in list as these are related transformers * add ' to log_ratio in docstring * fix docstring with intro to transformer funcion * add more detail about the encoding in docstrings * renamed test functions * reword tests Co-authored-by: Soledad Galli <solegalli@protonmail.com> * reorganised folders with jupyter notebooks (#155) * Drop duplicate features #114 (#144) * add DropDuplicateFeatures in init * add fixture for duplicate features * add DropDuplicateFeatures functionality * add test for DropDuplicateFeatures * add DropDuplicateFeatures in init * add fixture for duplicate features * add DropDuplicateFeatures functionality * add test for DropDuplicateFeatures * create drop duplicate transformer * delete extra fixture Co-authored-by: Soledad Galli <solegalli@protonmail.com> * reformat code style with black (#153) * style formatting base scripts * reformat style creation modules * reformat style discretisers * reformat style imputers * rewords strings, minor changes * reformat codestyle outliers * reformat code style selection * reformat style transformers * reformat style wrappers * separate woe and ratio encoder closes issue #143 (#149) * Issue 143 * doc issue 143 documentation issue 143 * Update PRatioEncoder.rst 'C' Removed from RareLabelCEncoder whch was causing and error. Also enconder_dict_ ratio results added * Changes in docstrings #143 Changes requested by Sole in docstrings, after first pull request related to this issue. 
* More docstring changes related to #143 minor updates in docstrings * separate woe and ratio tests into functions #147 separate woe and ratio tests into functions #147 * move PRatioEncoder under WoEencoder in list as these are related transformers * add ' to log_ratio in docstring * fix docstring with intro to transformer funcion * add more detail about the encoding in docstrings * renamed test functions * reword tests Co-authored-by: Soledad Galli <solegalli@protonmail.com> * reorganised folders with jupyter notebooks (#155) * Drop duplicate features #114 (#144) * add DropDuplicateFeatures in init * add fixture for duplicate features * add DropDuplicateFeatures functionality * add test for DropDuplicateFeatures * add DropDuplicateFeatures in init * add fixture for duplicate features * add DropDuplicateFeatures functionality * add test for DropDuplicateFeatures * create drop duplicate transformer * delete extra fixture Co-authored-by: Soledad Galli <solegalli@protonmail.com> * reformat code style encoders * shorten dosctrings with flake8 Co-authored-by: NicoGalli <72278140+NicoGalli@users.noreply.github.com> Co-authored-by: Tejash Shah <stejash15@gmail.com> * reformat test code style (#157) * expand style check in tox, add black to test_req (#154) * update docs, shorten lines, test code (#158) * update docs, shorten lines, test code * include v1.0.0 changes in changelog * fix linebreaks in changelog * Add type hints, docstrings, and expand test. 
Introduce bug fix in _define_variables (#159) * Add .vscode in gitignore * add type hints and docstrings * Add type hints, docstrings, and introduce bug fix * add type hints and docstrings * add type hints, docstrings, and stylistic modifications * add type hints, docstrings, and stylistic modifications * some stylistic modifications * add test to check null values in dataframe * remove redundant docstring * stylistic modification * add new test and expanded existing one * fix flake8 suggestions * fix some indentation errors * fix indention error * add type hints and docstrings in boxcox transformer * add type hints and docstrings in log transformation * add type hints, docstrings, and recaftor constructor in power transformation * add type hints and docstrings in reciprocal transformer * add type hints * add type hints and docstrings * add test case to test the type of math operation argument * remove extra blank line * add test cases for parameter_checks * black it files * update type hints and docstring * black them and update type hints and docstrings * update type hints and docstrings * black it * update type hints * tidy imports Co-authored-by: NicoGalli <72278140+NicoGalli@users.noreply.github.com> Co-authored-by: Tejash Shah <stejash15@gmail.com> Co-authored-by: Nodar Okroshiashvili <n.okroshiashvili@gmail.com>
1 parent 6670181 commit e40457c

167 files changed, with 9415 additions and 6750 deletions


.gitignore

Lines changed: 3 additions & 1 deletion
@@ -107,6 +107,8 @@ venv.bak/
107107

108108
# Miscelaneous
109109
.idea
110+
.vscode
110111
*.csv
111112

112-
*.DS_Store
113+
*.DS_Store
114+
untitled9.py

docs/changelog.rst

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,63 @@
33
Changelog
44
=========
55

6+
Version 1.0.0
7+
-------------
8+
Deployed: TBD
9+
10+
Contributors:
11+
- Nodar Okroshiashvili
12+
- Nicolas Galli
13+
- Tejash Shah
14+
- Soledad Galli
15+
16+
17+
**Renaming of Modules within Feature-engine**:
18+
19+
Feature-engine transformers have been sorted into submodules to smooth the development
20+
of the package and shorten import syntax for users.
21+
22+
- **Module imputation**: missing data imputers are now imported from ``feature_engine.imputation`` instead of ``feature_engine.missing_data_imputation``.
23+
- **Module encoding**: categorical variable encoders are now imported from ``feature_engine.encoding`` instead of ``feature_engine.categorical_encoders``.
24+
- **Module discretisation**: discretisation transformers are now imported from ``feature_engine.discretisation`` instead of ``feature_engine.discretisers``.
25+
- **Module transformation**: transformers are now imported from ``feature_engine.transformation`` instead of ``feature_engine.variable_transformers``.
26+
- **Module outliers**: transformers to remove or censor outliers are now imported from ``feature_engine.outliers`` instead of ``feature_engine.outlier_removers``.
27+
- **Module selection**: new module hosts transformers to select or remove variables from a dataset.
28+
- **Module creation**: new module hosts transformers that combine variables into new features using mathematical or other operations.
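
The module renames listed above could be applied to existing code with a small helper like the one below. This is only a sketch, not part of Feature-engine: ``MODULE_RENAMES`` and ``migrate_imports`` are hypothetical names, and the old encoder module is assumed to be ``feature_engine.categorical_encoders`` (dotted path).

```python
import re

# Old -> new module paths, as listed in the changelog entries above.
# The dotted form of the old encoder module is an assumption.
MODULE_RENAMES = {
    "feature_engine.missing_data_imputation": "feature_engine.imputation",
    "feature_engine.categorical_encoders": "feature_engine.encoding",
    "feature_engine.discretisers": "feature_engine.discretisation",
    "feature_engine.variable_transformers": "feature_engine.transformation",
    "feature_engine.outlier_removers": "feature_engine.outliers",
}

# One alternation over all old paths; \b stops partial-name matches.
_OLD_PATHS = re.compile(
    r"\b(" + "|".join(re.escape(old) for old in MODULE_RENAMES) + r")\b"
)


def migrate_imports(source: str) -> str:
    """Replace pre-1.0.0 module paths in `source` with their 1.0.0 names."""
    return _OLD_PATHS.sub(lambda match: MODULE_RENAMES[match.group(1)], source)


old_line = "from feature_engine.missing_data_imputation import EndTailImputer"
print(migrate_imports(old_line))
# -> from feature_engine.imputation import EndTailImputer
```

The helper only rewrites module paths. Class and parameter renames are separate changes and are not handled here.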
29+
30+
**Renaming of Classes**:
31+
32+
In this release, we have shortened the name of categorical encoders, and also renamed
33+
other classes of Feature-engine to simplify import syntax.
34+
35+
- **Encoders**: the word ``Categorical`` was removed from the class names. Now, instead of ``MeanCategoricalEncoder``, the class is called ``MeanEncoder``. Instead of ``RareLabelCategoricalEncoder`` it is ``RareLabelEncoder``, and so on. Please check the encoders documentation for more details.
36+
- **Imputers**: the ``CategoricalVariableImputer`` is now called ``CategoricalImputer``.
37+
- **Discretisers**: the ``UserInputDiscretiser`` is now called ``ArbitraryDiscretiser``.
38+
- **Creation**: the ``MathematicalCombinator`` is now called ``MathematicalCombination``.
39+
- **WoEEncoder and PRatioEncoder**: the ``WoEEncoder`` now applies only encoding with the weight of evidence. To apply encoding by probability ratios, use a different transformer: the ``PRatioEncoder`` (**by Nicolas Galli**).
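
As a rough illustration of the class renames above, a lookup table can translate old class names to new ones. ``CLASS_RENAMES`` and ``new_class_name`` are hypothetical names used for this sketch only. The table covers just the renames named explicitly in this changelog; other encoders should be checked against the encoders documentation.

```python
# Old -> new class names, limited to the renames named in the changelog.
CLASS_RENAMES = {
    "MeanCategoricalEncoder": "MeanEncoder",
    "RareLabelCategoricalEncoder": "RareLabelEncoder",
    "CategoricalVariableImputer": "CategoricalImputer",
    "UserInputDiscretiser": "ArbitraryDiscretiser",
    "MathematicalCombinator": "MathematicalCombination",
}


def new_class_name(old_name: str) -> str:
    """Return the 1.0.0 class name, or the input unchanged if not listed."""
    return CLASS_RENAMES.get(old_name, old_name)


print(new_class_name("UserInputDiscretiser"))
# -> ArbitraryDiscretiser
```

Names missing from the table are returned unchanged, so the helper can be run over classes that were not renamed.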
40+
41+
**Renaming of class init Parameters**:
42+
43+
We renamed a few parameters to unify the nomenclature across the Package.
44+
45+
- **EndTailImputer**: the parameter ``distribution`` is now called ``imputation_method`` to unify convention among imputers. To impute using the IQR, we now need to pass ``imputation_method="iqr"`` instead of ``distribution="skewed"``.
46+
- **AddMissingIndicator**: the parameter ``missing_only`` now takes the boolean values ``True`` or ``False``.
47+
- **Winsorizer and OutlierTrimmer**: the parameter ``distribution`` is now called ``capping_method`` to unify names across Feature-engine transformers.
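
The parameter renames above can be sketched as plain dictionary rewrites of keyword arguments. Both helper functions below are hypothetical and not part of Feature-engine. The sketch assumes the old IQR option was passed as ``"skewed"`` through the old ``distribution`` parameter, and it renames only the keyword for the outlier transformers, because the changelog mentions no value change there.

```python
def migrate_end_tail_imputer_kwargs(kwargs: dict) -> dict:
    """Translate pre-1.0.0 EndTailImputer keyword arguments (hypothetical)."""
    new_kwargs = dict(kwargs)  # copy, so the caller's dict is untouched
    if "distribution" in new_kwargs:
        value = new_kwargs.pop("distribution")
        # Assumption: the old "skewed" option corresponds to the new "iqr".
        new_kwargs["imputation_method"] = "iqr" if value == "skewed" else value
    return new_kwargs


def migrate_capping_kwargs(kwargs: dict) -> dict:
    """Translate pre-1.0.0 outlier transformer keyword arguments (hypothetical)."""
    new_kwargs = dict(kwargs)
    if "distribution" in new_kwargs:
        # Only the keyword is renamed; the value is passed through as-is.
        new_kwargs["capping_method"] = new_kwargs.pop("distribution")
    return new_kwargs


print(migrate_end_tail_imputer_kwargs({"distribution": "skewed", "tail": "right"}))
# -> {'tail': 'right', 'imputation_method': 'iqr'}
```

The translated dictionary can then be unpacked into the new constructor call, once the values have been checked against the 1.0.0 documentation.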
48+
49+
**New transformers and classes**:
50+
- **DropConstantFeatures**: DropConstantFeatures finds and removes constant and quasi-constant features from a dataframe (**by Tejash Shah**)
51+
- **DropDuplicateFeatures**: DropDuplicateFeatures finds and removes duplicated features from a dataset (**by Tejash Shah and Soledad Galli**)
52+
53+
**Code Architecture - Important for Contributors and Developers**:
54+
- **Submodules**: transformers have been grouped within relevant submodules and modules.
55+
- **Individual tests**: testing classes have been subdivided into individual tests
56+
- **Code Style**: we adopted the use of flake8 for linting and PEP8 style checks, and black for automatic re-styling of code.
57+
- **Type hints**: we are slowly rolling out the use of type hints throughout Feature-engine classes and functions (**by Nodar Okroshiashvili**)
58+
59+
**Other Changes**:
60+
- **Updated documentation**: documentation reflects the current use of Feature-engine transformers
61+
- **Typo fixes**: Thank you to all who contributed to typo fixes (Tim Vink, GitHub user @piecot)
62+
663
Version 0.6.1
764
-------------
865
Deployed: Friday, September 18, 2020

docs/code_of_conduct.rst

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,14 @@
11
Code of Conduct
22
===============
33

4-
Feature-engine is an open source Python project. We follow the `Python Software Foundation Code of Conduct <http://www.python.org/psf/codeofconduct/>`_. All interactions among members of the Feature-engine community must meet those guidelines. This includes (but is not limited to) interactions through the mailing list, GitHub and StackOverflow.
4+
Feature-engine is an open source Python project. We follow the
5+
`Python Software Foundation Code of Conduct <http://www.python.org/psf/codeofconduct/>`_.
6+
All interactions among members of the Feature-engine community must meet those
7+
guidelines. This includes (but is not limited to) interactions through the mailing
8+
list, GitHub and StackOverflow.
59

6-
Everyone is expected to be open, considerate, and respectful of others no matter what their position is within the project. We show gratitude for any contribution, big or small. We welcome feedback and participation. We want to make Feature-engine a nice, welcoming and safe place for you to do your first contribution to open source, and why not the second, the third and so on :).
10+
Everyone is expected to be open, considerate, and respectful of others no matter what
11+
their position is within the project. We show gratitude for any contribution, big or
12+
small. We welcome feedback and participation. We want to make Feature-engine a nice,
13+
welcoming and safe place for you to do your first contribution to open source, and why
14+
not the second, the third and so on :).

docs/conf.py

Lines changed: 48 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -20,9 +20,9 @@
2020
import os
2121
import sys
2222

23-
sys.path.insert(0, os.path.abspath('.'))
24-
sys.path.insert(0, os.path.abspath('../'))
25-
sys.path.insert(1, os.path.dirname(os.path.abspath('../')) + os.sep + 'feature_engine')
23+
sys.path.insert(0, os.path.abspath("."))
24+
sys.path.insert(0, os.path.abspath("../"))
25+
sys.path.insert(1, os.path.dirname(os.path.abspath("../")) + os.sep + "feature_engine")
2626

2727
# -- General configuration ------------------------------------------------
2828

@@ -33,19 +33,20 @@
3333
# Add any Sphinx extension module names here, as strings. They can be
3434
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
3535
# ones.
36-
extensions = ['sphinx.ext.autodoc',
37-
# 'sphinx.ext.doctest',
38-
'sphinx.ext.intersphinx',
39-
'sphinx.ext.todo',
40-
'sphinx.ext.coverage',
41-
# 'sphinx.ext.mathjax',
42-
# 'sphinx.ext.ifconfig',
43-
'sphinx.ext.viewcode',
44-
'sphinx.ext.githubpages',
45-
# 'sphinx.ext.autosummary',
46-
'sphinx.ext.napoleon',
47-
'numpydoc'
48-
]
36+
extensions = [
37+
"sphinx.ext.autodoc",
38+
# 'sphinx.ext.doctest',
39+
"sphinx.ext.intersphinx",
40+
"sphinx.ext.todo",
41+
"sphinx.ext.coverage",
42+
# 'sphinx.ext.mathjax',
43+
# 'sphinx.ext.ifconfig',
44+
"sphinx.ext.viewcode",
45+
"sphinx.ext.githubpages",
46+
# 'sphinx.ext.autosummary',
47+
"sphinx.ext.napoleon",
48+
"numpydoc",
49+
]
4950

5051
numpydoc_show_class_members = False
5152

@@ -54,28 +55,28 @@
5455
napoleon_use_ivar = False
5556

5657
# Add any paths that contain templates here, relative to this directory.
57-
templates_path = ['_templates']
58+
templates_path = ["_templates"]
5859

5960
# The suffix(es) of source filenames.
6061
# You can specify multiple suffix as a list of string:
6162
#
6263
# source_suffix = ['.rst', '.md']
63-
source_suffix = '.rst'
64+
source_suffix = ".rst"
6465

6566
# The master toctree document.
66-
master_doc = 'index'
67+
master_doc = "index"
6768

6869
# General information about the project.
69-
project = 'feature-engine'
70-
copyright = '2018-2020, Soledad Galli'
71-
author = 'Soledad Galli'
70+
project = "feature-engine"
71+
copyright = "2018-2020, Soledad Galli"
72+
author = "Soledad Galli"
7273

7374
# The version info for the project you're documenting, acts as replacement for
7475
# |version| and |release|, also used in various other places throughout the
7576
# built documents.
7677

77-
VERSION_PATH = '../feature_engine/VERSION'
78-
with open(VERSION_PATH, 'r') as version_file:
78+
VERSION_PATH = "../feature_engine/VERSION"
79+
with open(VERSION_PATH, "r") as version_file:
7980
v = version_file.read().strip()
8081
#
8182
# The short X.Y version.
@@ -107,7 +108,7 @@
107108
show_authors = False
108109

109110
# The name of the Pygments (syntax highlighting) style to use.
110-
pygments_style = 'sphinx'
111+
pygments_style = "sphinx"
111112

112113
# If true, `todo` and `todoList` produce output, else they produce nothing.
113114
todo_include_todos = False
@@ -117,7 +118,7 @@
117118
# The theme to use for HTML and HTML Help pages. See the documentation for
118119
# a list of builtin themes.
119120
#
120-
html_theme = 'sphinx_rtd_theme'
121+
html_theme = "sphinx_rtd_theme"
121122

122123
# Theme options are theme-specific and customize the look and feel of a theme
123124
# further. For a list of options available for each theme, see the
@@ -129,7 +130,7 @@
129130
# relative to this directory. They are copied after the builtin static files,
130131
# so a file named "default.css" will overwrite the builtin "default.css".
131132
html_static_path = []
132-
autoclass_content = 'both'
133+
autoclass_content = "both"
133134

134135
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
135136
html_show_sphinx = True
@@ -140,23 +141,20 @@
140141
# -- Options for HTMLHelp output ------------------------------------------
141142

142143
# Output file base name for HTML help builder.
143-
htmlhelp_basename = 'feature_enginedoc'
144+
htmlhelp_basename = "feature_enginedoc"
144145

145146
# -- Options for LaTeX output ---------------------------------------------
146147

147148
latex_elements = {
148149
# The paper size ('letterpaper' or 'a4paper').
149150
#
150151
# 'papersize': 'letterpaper',
151-
152152
# The font size ('10pt', '11pt' or '12pt').
153153
#
154154
# 'pointsize': '10pt',
155-
156155
# Additional stuff for the LaTeX preamble.
157156
#
158157
# 'preamble': '',
159-
160158
# Latex figure (float) alignment
161159
#
162160
# 'figure_align': 'htbp',
@@ -166,17 +164,21 @@
166164
# (source start file, target name, title,
167165
# author, documentclass [howto, manual, or own class]).
168166
latex_documents = [
169-
(master_doc, 'feature_engine.tex', 'feature\\_engine Documentation',
170-
'Soledad Galli', 'manual'),
167+
(
168+
master_doc,
169+
"feature_engine.tex",
170+
"feature\\_engine Documentation",
171+
"Soledad Galli",
172+
"manual",
173+
),
171174
]
172175

173176
# -- Options for manual page output ---------------------------------------
174177

175178
# One entry per manual page. List of tuples
176179
# (source start file, name, description, authors, manual section).
177180
man_pages = [
178-
(master_doc, 'feature_engine', 'feature_engine Documentation',
179-
[author], 1)
181+
(master_doc, "feature_engine", "feature_engine Documentation", [author], 1)
180182
]
181183

182184
# -- Options for Texinfo output -------------------------------------------
@@ -185,9 +187,15 @@
185187
# (source start file, target name, title, author,
186188
# dir menu entry, description, category)
187189
texinfo_documents = [
188-
(master_doc, 'feature_engine', 'feature_engine Documentation',
189-
author, 'feature_engine', 'One line description of project.',
190-
'Miscellaneous'),
190+
(
191+
master_doc,
192+
"feature_engine",
193+
"feature_engine Documentation",
194+
author,
195+
"feature_engine",
196+
"One line description of project.",
197+
"Miscellaneous",
198+
),
191199
]
192200

193201
# -- Options for Epub output ----------------------------------------------
@@ -208,7 +216,7 @@
208216
# epub_uid = ''
209217

210218
# A list of files that should not be packed into the epub file.
211-
epub_exclude_files = ['search.html']
219+
epub_exclude_files = ["search.html"]
212220

213221
# Example configuration for intersphinx: refer to the Python standard library.
214-
intersphinx_mapping = {'https://docs.python.org/': None}
222+
intersphinx_mapping = {"https://docs.python.org/": None}

docs/contributing/building_the_docs.rst

Lines changed: 10 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -3,28 +3,31 @@
33
Getting started with Feature-engine documentation
44
=================================================
55

6-
Feature-engine documentation is built using `Sphinx <https://www.sphinx-doc.org>`_ and is hosted on `Read the Docs <https://readthedocs.org/>`_.
6+
Feature-engine documentation is built using `Sphinx <https://www.sphinx-doc.org>`_ and
7+
is hosted on `Read the Docs <https://readthedocs.org/>`_.
78

8-
To learn more about Sphinx follow the `Sphinx Quickstart documentation <https://www.sphinx-doc.org/en/master/usage/quickstart.html>`_.
9+
To learn more about Sphinx follow the
10+
`Sphinx Quickstart documentation <https://www.sphinx-doc.org/en/master/usage/quickstart.html>`_.
911

1012

1113
Building the documentation
1214
--------------------------
1315

1416
First, make sure you have properly installed Sphinx and the required dependencies.
1517

16-
1. If you haven't done so, in your virtual environment, from the root folder of the repository, install the requirements for the documentation::
18+
1. If you haven't done so, in your virtual environment, from the root folder of the
19+
repository, install the requirements for the documentation::
1720

1821
$ pip install -r docs/requirements.txt
1922

2023
2. To build the documentation (and test if it is working properly)::
2124

2225
$ sphinx-build -b html docs build
2326

24-
This command tells sphinx that the documentation files are within the docs folder, and the html files should be placed in the
25-
build folder.
27+
This command tells sphinx that the documentation files are within the docs folder, and
28+
the html files should be placed in the build folder.
2629

27-
If everything worked fine, you can navigate the html files located in build. Alternatively, you need to troubleshoot through
28-
the error messages returned by sphinx.
30+
If everything worked fine, you can navigate the html files located in build.
31+
Alternatively, you need to troubleshoot through the error messages returned by sphinx.
2932

3033
Good luck and get in touch if stuck!
