reformat code style with black (#153) · feature-engine/feature_engine@48bdf41 · GitHub

Commit 48bdf41

solegalli, NicoGalli and Tejash-Shah committed
reformat code style with black (#153)
* style formatting base scripts
* reformat style creation modules
* reformat style discretisers
* reformat style imputers
* rewords strings, minor changes
* reformat codestyle outliers
* reformat code style selection
* reformat style transformers
* reformat style wrappers
* separate woe and ratio encoder closes issue #143 (#149)
* Issue 143
* doc issue 143 documentation issue 143
* Update PRatioEncoder.rst 'C' Removed from RareLabelCEncoder whch was causing and error. Also enconder_dict_ ratio results added
* Changes in docstrings #143 Changes requested by Sole in docstrings, after first pull request related to this issue.
* More docstring changes related to #143 minor updates in docstrings
* separate woe and ratio tests into functions #147 separate woe and ratio tests into functions #147
* move PRatioEncoder under WoEencoder in list as these are related transformers
* add ' to log_ratio in docstring
* fix docstring with intro to transformer funcion
* add more detail about the encoding in docstrings
* renamed test functions
* reword tests

Co-authored-by: Soledad Galli <solegalli@protonmail.com>

* reorganised folders with jupyter notebooks (#155)
* Drop duplicate features #114 (#144)
* add DropDuplicateFeatures in init
* add fixture for duplicate features
* add DropDuplicateFeatures functionality
* add test for DropDuplicateFeatures
* add DropDuplicateFeatures in init
* add fixture for duplicate features
* add DropDuplicateFeatures functionality
* add test for DropDuplicateFeatures
* create drop duplicate transformer
* delete extra fixture

Co-authored-by: Soledad Galli <solegalli@protonmail.com>

* reformat code style encoders
* shorten dosctrings with flake8

Co-authored-by: NicoGalli <72278140+NicoGalli@users.noreply.github.com>
Co-authored-by: Tejash Shah <stejash15@gmail.com>
1 parent 4af3a5b commit 48bdf41


45 files changed: +924 -685 lines changed

feature_engine/__init__.py

Lines changed: 3 additions & 3 deletions
@@ -2,9 +2,9 @@
 import feature_engine

 PACKAGE_ROOT = pathlib.Path(feature_engine.__file__).resolve().parent
-VERSION_PATH = PACKAGE_ROOT / 'VERSION'
+VERSION_PATH = PACKAGE_ROOT / "VERSION"

 name = "feature_engine"

-with open(VERSION_PATH, 'r') as version_file:
-    __version__ = version_file.read().strip()
+with open(VERSION_PATH, "r") as version_file:
+    __version__ = version_file.read().strip()

feature_engine/creation/__init__.py

Lines changed: 3 additions & 5 deletions
@@ -1,10 +1,8 @@
 """
-The module creation includes classes to create new variables by combination of existing variables in the
-dataframe.
+The module creation includes classes to create new variables by combination of existing
+variables in the dataframe.
 """

 from .mathematical_combination import MathematicalCombination

-__all__ = [
-    'MathematicalCombination'
-]
+__all__ = ["MathematicalCombination"]

feature_engine/creation/mathematical_combination.py

Lines changed: 40 additions & 26 deletions
@@ -6,9 +6,10 @@ class MathematicalCombination(BaseNumericalTransformer):
     MathematicalCombination() applies basic mathematical operations across features,
     returning 1 or more additional features as a result.

-    For example, if we have the variables number_payments_first_quarter, number_payments_second_quarter,
-    number_payments_third_quarter and number_payments_fourth_quarter, we can use MathematicalCombination()
-    to calculate the total number of payments and mean number of payments as follows:
+    For example, if we have the variables number_payments_first_quarter,
+    number_payments_second_quarter, number_payments_third_quarter and
+    number_payments_fourth_quarter, we can use MathematicalCombination() to calculate
+    the total number of payments and mean number of payments as follows:

     .. code-block:: python

@@ -31,8 +32,8 @@ class MathematicalCombination(BaseNumericalTransformer):

         transformer.fit_transform(X)

-    The transformed X will contain the additional features total_number_payments and mean_number_payments,
-    plus the original set of variables.
+    The transformed X will contain the additional features total_number_payments and
+    mean_number_payments, plus the original set of variables.

     Parameters
     ----------
@@ -51,7 +52,8 @@ class MathematicalCombination(BaseNumericalTransformer):
         Each operation should be a string and must be one of the elements
         from the list: ['sum', 'prod', 'mean', 'std', 'max', 'min']

-        Each operation will result in a new variable that will be added to the transformed dataset.
+        Each operation will result in a new variable that will be added to the
+        transformed dataset.

     new_variables_names: list, default=None
         Names of the newly created variables. The user can enter a name or a list
@@ -64,47 +66,57 @@ class MathematicalCombination(BaseNumericalTransformer):

         The name of the variables indicated by the user should coincide with the order
         in which the mathematical operations are initialised in the transformer.
-        That is, if you set math_operations = ['mean', 'prod'], the first new variable name
-        will be assigned to the mean of the variables and the second variable name
+        That is, if you set math_operations = ['mean', 'prod'], the first new variable
+        name will be assigned to the mean of the variables and the second variable name
         to the product of the variables.

         If new_variable_names=None, the transformer will assign an arbitrary name
-        to the newly created features starting by the name of the mathematical operation,
-        followed by the variables combined separated by -.
+        to the newly created features starting by the name of the mathematical
+        operation, followed by the variables combined separated by -.

     """

     def __init__(self, variables=None, math_operations=None, new_variables_names=None):

         if math_operations is None:
-            math_operations = ['sum', 'prod', 'mean', 'std', 'max', 'min']
+            math_operations = ["sum", "prod", "mean", "std", "max", "min"]

         self.variables = variables
         self.new_variables_names = new_variables_names
-        self._math_operations_permitted = ['sum', 'prod', 'mean', 'std', 'max', 'min']
+        self._math_operations_permitted = ["sum", "prod", "mean", "std", "max", "min"]

         if not isinstance(math_operations, list):
             raise KeyError("math_operations parameter must be a list or None")

-        if any(operation not in self._math_operations_permitted for operation in math_operations):
-            raise KeyError("At least one of math_operations is not found in permitted operations set. "
-                           "Choose one of ['sum', 'prod', 'mean', 'std', 'max', 'min']")
+        if any(
+            operation not in self._math_operations_permitted
+            for operation in math_operations
+        ):
+            raise KeyError(
+                "At least one of math_operations is not permitted operation. "
+                "Choose one of ['sum', 'prod', 'mean', 'std', 'max', 'min']"
+            )
         else:
             self.math_operations = math_operations

         if self.variables and len(self.variables) <= 1:
             raise KeyError(
-                "MathematicalCombination requires two or more features to make proper transformations.")
+                "MathematicalCombination requires two or more features to make proper "
+                "transformations."
+            )

-        if self.new_variables_names and len(self.new_variables_names) != len(self.math_operations):
+        if self.new_variables_names and len(self.new_variables_names) != len(
+            self.math_operations
+        ):
             raise KeyError(
-                "Number of items in New_variables_names must be equal to number of items in math_operations."
+                "Number of items in New_variables_names must be equal to number of "
+                "items in math_operations."
             )

     def fit(self, X, y=None):
         """
-        Performs dataframe checks. Selects variables to transform if None were indicated by the user.
-        Creates dictionary of column to transformation mappings
+        Performs dataframe checks. Selects variables to transform if None were indicated
+        by the user. Creates dictionary of column to transformation mappings.

         X : pandas dataframe of shape = [n_samples, n_features]
             The training input samples.
@@ -118,12 +130,13 @@ def fit(self, X, y=None):
         self.input_shape_ = X.shape

         if self.new_variables_names:
-            self.combination_dict_ = dict(zip(self.new_variables_names, self.math_operations))
+            self.combination_dict_ = dict(
+                zip(self.new_variables_names, self.math_operations)
+            )
         else:
             self.combination_dict_ = {
                 f"{operation}({'-'.join(self.variables)})": operation
-                for operation
-                in self.math_operations
+                for operation in self.math_operations
             }

         return self
@@ -132,7 +145,8 @@ def transform(self, X):
         """
         Transforms source dataset.

-        Adds column for each operation with calculation based on variables and operation.
+        Adds a column for each operation with the calculation based on the variables
+        and operations indicated when setting up the transformer.

         Parameters
         ----------
@@ -143,8 +157,8 @@ def transform(self, X):
         Returns
         -------

-        X_transformed : pandas dataframe of shape = [n_samples, n_features + n_operations]
-            The dataframe with operations results added.
+        X_transformed : pandas dataframe, shape = [n_samples, n_features + n_operations]
+            The dataframe with the operations results added as columns.
         """
         X = super().transform(X)
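The hunks above show how `combination_dict_` maps each new column name to an operation, but not the body of `transform`. As a rough illustration only, the mapping could be applied with plain pandas as sketched below. The helper `combine` and the column names `q1` and `q2` are hypothetical and not part of feature_engine; the row-wise `DataFrame.agg` call is an assumption about how the operations might be computed, not the library's actual code.

```python
import pandas as pd

# Operations permitted by the transformer, as listed in the diff above.
PERMITTED = ["sum", "prod", "mean", "std", "max", "min"]


def combine(X, variables, math_operations, new_variables_names=None):
    """Hypothetical helper: add one column per operation, computed row-wise."""
    if any(op not in PERMITTED for op in math_operations):
        raise KeyError(f"Choose operations from {PERMITTED}")

    # Same naming scheme as combination_dict_ in the diff: user-supplied
    # names if given, otherwise "<operation>(<var1>-<var2>-...)".
    if new_variables_names:
        combination_dict = dict(zip(new_variables_names, math_operations))
    else:
        combination_dict = {
            f"{op}({'-'.join(variables)})": op for op in math_operations
        }

    X = X.copy()
    for name, op in combination_dict.items():
        # Assumption: aggregate across the selected columns for each row.
        X[name] = X[variables].agg(op, axis=1)
    return X


X = pd.DataFrame({"q1": [1, 2], "q2": [3, 4]})
out = combine(X, ["q1", "q2"], ["sum", "mean"])
print(out)
```

With no names supplied, the new columns are called `sum(q1-q2)` and `mean(q1-q2)`, matching the default naming described in the docstring.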

feature_engine/dataframe_checks.py

Lines changed: 8 additions & 4 deletions
@@ -16,12 +16,16 @@ def _check_input_matches_training_df(X, reference):
     # check that dataframe to transform has the same number of columns
     # that the dataframe used during fit method
     if X.shape[1] != reference:
-        raise ValueError('The number of columns in this data set is different from the one used to fit this '
-                         'transformer (when using the fit method)')
+        raise ValueError(
+            "The number of columns in this data set is different from the one used to "
+            "fit this transformer (when using the fit method)"
+        )
     return None


 def _check_contains_na(X, variables):
     if X[variables].isnull().values.any():
-        raise ValueError('Some of the variables to transform contain missing values. Check and remove those '
-                         'before using this transformer.')
+        raise ValueError(
+            "Some of the variables to transform contain missing values. Check and "
+            "remove those before using this transformer."
+        )
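For readers who want to see the two checks in action, here is a small standalone sketch. The function bodies follow the hunk above (with shortened messages), but the public names without the leading underscore and the toy dataframes are made up for illustration; how feature_engine wires these helpers into its transformers is not shown in this diff.

```python
import pandas as pd


def check_input_matches_training_df(X, reference):
    # reference is the number of columns seen during fit
    if X.shape[1] != reference:
        raise ValueError(
            "The number of columns in this data set is different from the one "
            "used to fit this transformer (when using the fit method)"
        )


def check_contains_na(X, variables):
    # Fail early if any of the selected variables has missing values.
    if X[variables].isnull().values.any():
        raise ValueError(
            "Some of the variables to transform contain missing values. Check and "
            "remove those before using this transformer."
        )


train = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
check_input_matches_training_df(train, reference=train.shape[1])  # passes silently

with_na = pd.DataFrame({"a": [1.0, None]})
try:
    check_contains_na(with_na, ["a"])
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```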
Lines changed: 8 additions & 7 deletions
@@ -1,15 +1,16 @@
 """
-The module discretisation includes classes to sort continuous variables into bins / intervals.
+The module discretisation includes classes to sort continuous variables into bins or
+intervals.
 """

 from .decision_tree import DecisionTreeDiscretiser
-from . equal_frequency import EqualFrequencyDiscretiser
-from .equal_width import EqualWidthDiscretiser
+from .equal_frequency import EqualFrequencyDiscretiser
+from .equal_width import EqualWidthDiscretiser
 from .arbitrary import ArbitraryDiscretiser

 __all__ = [
-    'DecisionTreeDiscretiser',
-    'EqualFrequencyDiscretiser',
-    'EqualWidthDiscretiser',
-    'ArbitraryDiscretiser'
+    "DecisionTreeDiscretiser",
+    "EqualFrequencyDiscretiser",
+    "EqualWidthDiscretiser",
+    "ArbitraryDiscretiser",
 ]

feature_engine/discretisation/arbitrary.py

Lines changed: 22 additions & 12 deletions
@@ -15,8 +15,9 @@ class ArbitraryDiscretiser(BaseNumericalTransformer):
     'var2':[5, 10, 15, 20]}.

     The UserInputDiscretiser() works only with numerical variables. The discretiser will
-    check if the dictionary entered by the user contains variables present in the training
-    set, and if these variables are cast as numerical, before doing any transformation.
+    check if the dictionary entered by the user contains variables present in the
+    training set, and if these variables are cast as numerical, before doing any
+    transformation.

     Then it transforms the variables, that is, it sorts the values into the intervals,
     transform.
@@ -25,8 +26,10 @@ class ArbitraryDiscretiser(BaseNumericalTransformer):
     ----------

     binning_dict : dict
-        The dictionary with the variable : interval limits pairs, provided by the user. A
-        valid dictionary looks like this: {'var1':[0, 10, 100, 1000], 'var2':[5, 10, 15, 20]}.
+        The dictionary with the variable : interval limits pairs, provided by the user.
+        A valid dictionary looks like this:
+
+        binning_dict = {'var1':[0, 10, 100, 1000], 'var2':[5, 10, 15, 20]}.

     return_object : bool, default=False
         Whether the numbers in the discrete variable should be returned as
@@ -42,10 +45,12 @@ class ArbitraryDiscretiser(BaseNumericalTransformer):
     def __init__(self, binning_dict, return_object=False, return_boundaries=False):

         if not isinstance(binning_dict, dict):
-            raise ValueError("Please provide at a dictionary with the interval limits per variable")
+            raise ValueError(
+                "Please provide at a dictionary with the interval limits per variable"
+            )

         if not isinstance(return_object, bool):
-            raise ValueError('return_object must be True or False')
+            raise ValueError("return_object must be True or False")

         self.binning_dict = binning_dict
         self.variables = [x for x in binning_dict.keys()]
@@ -54,7 +59,8 @@ def __init__(self, binning_dict, return_object=False, return_boundaries=False):

     def fit(self, X, y=None):
         """
-        Checks that the user entered variables are in the train set and cast as numerical.
+        Checks that the user entered variables are in the train set and cast as
+        numerical.

         Parameters
         ----------
@@ -80,15 +86,17 @@ def fit(self, X, y=None):
         if all(variable in X.columns for variable in self.variables):
             self.binner_dict_ = self.binning_dict
         else:
-            raise ValueError('There are variables in the provided dictionary which are not present in the train set '
-                             'or not cast as numerical')
+            raise ValueError(
+                "There are variables in the provided dictionary which are not present "
+                "in the train set or not cast as numerical"
+            )

         self.input_shape_ = X.shape

         return self

     def transform(self, X):
-        """ Sorts the variable values into the intervals.
+        """Sorts the variable values into the intervals.

         Parameters
         ----------
@@ -112,10 +120,12 @@ def transform(self, X):

         else:
             for feature in self.variables:
-                X[feature] = pd.cut(X[feature], self.binner_dict_[feature], labels=False)
+                X[feature] = pd.cut(
+                    X[feature], self.binner_dict_[feature], labels=False
+                )

         # return object
         if self.return_object:
-            X[self.variables] = X[self.variables].astype("O")
+            X[self.variables] = X[self.variables].astype("O")

         return X
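The last hunk relies on `pd.cut(..., labels=False)` to turn each value into the integer index of its interval, followed by an optional cast to object. The snippet below is a minimal sketch of that behaviour outside the class. The dataframe values are invented for illustration and the dictionary reuses the example limits from the docstring.

```python
import pandas as pd

# Interval limits per variable, in the format shown in the docstring above.
binning_dict = {"var1": [0, 10, 100, 1000], "var2": [5, 10, 15, 20]}

# Toy data (hypothetical values chosen to fall strictly inside the limits).
X = pd.DataFrame({"var1": [5, 50, 500], "var2": [6, 12, 18]})

out = X.copy()
for feature, limits in binning_dict.items():
    # labels=False returns the integer position of the interval
    # instead of an Interval label.
    out[feature] = pd.cut(out[feature], limits, labels=False)

# Equivalent of return_object=True: treat the bin indices as categories.
as_object = out.astype("O")
print(out)
```

Note that `pd.cut` uses right-closed intervals by default, so a value equal to the lowest limit (for example 0 for `var1`) falls outside the first interval and becomes NaN unless `include_lowest=True` is passed.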
