Add Filter and Pre Selection Components #65

samborba · 2020-05-29T14:26:47Z

As a temporary measure for custom transformers, the transformer code is wrapped so that it can be imported in both the Training and Inference notebooks.

- Filter Selection: removes selected features from the dataset.
- Pre Selection: removes features with low variance and high correlation.
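
A minimal sketch of what the two components do, assuming plain pandas. The function names `filter_selection` and `pre_selection`, the default thresholds, and the use of Pearson correlation are illustrative assumptions, not the code in the sample notebooks.

```python
import numpy as np
import pandas as pd


def filter_selection(df: pd.DataFrame, features_to_filter: list) -> pd.DataFrame:
    """Drop the explicitly selected columns."""
    return df.drop(columns=features_to_filter)


def pre_selection(df: pd.DataFrame, variance_threshold: float = 0.0,
                  correlation_threshold: float = 0.9) -> pd.DataFrame:
    """Drop numeric columns with low variance or high pairwise correlation."""
    numeric = df.select_dtypes(include="number")

    # Step 1: drop numeric columns whose variance does not exceed the threshold.
    variances = numeric.var()
    low_variance = variances[variances <= variance_threshold].index.tolist()
    numeric = numeric.drop(columns=low_variance)

    # Step 2: inspect only the upper triangle of the absolute correlation matrix,
    # so that for each highly correlated pair the later column is dropped.
    corr = numeric.corr().abs()  # Pearson by default (assumption)
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    high_corr = [col for col in upper.columns
                 if (upper[col] > correlation_threshold).any()]

    # Non-numeric columns are left untouched in this sketch.
    return df.drop(columns=low_variance + high_corr)


df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],      # perfectly correlated with "a"
    "c": [5.0, 5.0, 5.0, 5.0],      # zero variance
    "d": [1.0, -1.0, 1.0, -1.0],
    "label": ["x", "y", "x", "y"],  # categorical column
})

pre_selected = pre_selection(df)
filtered = filter_selection(df, ["d"])
print(list(pre_selected.columns))
print(list(filtered.columns))
```

Dropping the later column of each correlated pair is only one possible tie-break; the PR does not state which rule the component uses.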

samples/filter-selection/Training.ipynb

samples/filter-selection/Inference.ipynb

samples/filter-selection/Training.ipynb

samples/pre-selection/Inference.ipynb

samples/pre-selection/Training.ipynb

samples/filter-selection/Training.ipynb

samples/pre-selection/Training.ipynb

Pre Selection:
- Change correlation method
- Implementation of the fit method so that we can call the transform method in the Inference file
- Import features_after_pipeline

Filter:
- Changes in contract.json
- Implementation of the fit method so that we can call the transform method in the Inference file
- Import features_after_pipeline

fberanizo

@samborba you need to copy the cell containing the CustomTransformer.py class into Inference.ipynb.
In the platform flow, Training.ipynb and Inference.ipynb run at different times and in different containers.
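
To illustrate why the class definition has to exist in both notebooks, here is a small sketch using only the standard library. It assumes the model is serialized with a pickle-style mechanism (the PR does not say how `save_model` stores objects), and the `CustomTransformer` below is a hypothetical stand-in for the real class. A fresh interpreter plays the role of the separate inference container.

```python
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path


class CustomTransformer:
    """Hypothetical stand-in for the class defined in a Training.ipynb cell."""

    def __init__(self, columns_to_drop):
        self.columns_to_drop = columns_to_drop

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [{k: v for k, v in row.items() if k not in self.columns_to_drop}
                for row in X]


with tempfile.TemporaryDirectory() as tmp:
    # "Training": pickle stores a reference to the class, not its source code.
    model_path = Path(tmp) / "model.pkl"
    model_path.write_bytes(pickle.dumps(CustomTransformer(["id"]).fit([])))

    # "Inference" in a fresh process that never defined the class:
    # unpickling cannot find CustomTransformer and the process exits with an error.
    loader = "import pickle, sys; pickle.load(open(sys.argv[1], 'rb'))"
    missing = subprocess.run([sys.executable, "-c", loader, str(model_path)],
                             capture_output=True, text=True)

    # In a process where the class is defined, loading works.
    restored = pickle.loads(model_path.read_bytes())

load_failed_without_class = missing.returncode != 0
print(load_failed_without_class)
print(restored.transform([{"id": 1, "age": 30}]))
```

Copying the class cell into Inference.ipynb makes the name resolvable in the inference process, which is what the comment above asks for.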

samples/pre-selection/Inference.ipynb

samples/pre-selection/Training.ipynb

fberanizo · 2020-06-02T00:52:16Z

It raises an error when I use a dataset with categorical features, e.g. titanic.

- Add Wrapping Custom Transformer step in Inference.ipynb
- Remove target param from Pre Selection
- Handle categorical features in Pre Selection

- Bring back the target variable for Pre Selection

samples/filter-selection/Training.ipynb

- Change features_to_filter to feature type

- Add target variable in Filter Selection
- Refactor Pre Selection: improve code documentation; remove unused imports and variables; remove target column from features_after_pipeline
- Refactor attribute engineering components: the numerical_indexes list no longer needs to be saved with the save_model method, as the model will be able to remember the type of each column
- Change in Inference targets

samples/normalizer/Inference.ipynb

- Clear Normalizer output

samples/filter-selection/Training.ipynb

samples/robust-scaler/Training.ipynb

samples/normalizer/Training.ipynb

samples/imputer/Training.ipynb

- Add parameter tag in Filter
- Get new numerical features indexes after make_column_transformer
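
The index bookkeeping mentioned in this commit can be sketched as follows. scikit-learn's ColumnTransformer places the columns of the listed transformers first, in the order given, and appends the remainder columns on the right, so the numerical columns end up at new positions. The DataFrame, the choice of `StandardScaler`, and the variable names other than `numerical_indexes` and `features_after_pipeline` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": [20.0, 30.0, 40.0],
    "sex": ["m", "f", "f"],
    "fare": [7.0, 8.0, 9.0],
})
numerical_indexes = [1, 3]  # positions of "age" and "fare" in the input

transformer = make_column_transformer(
    (StandardScaler(), numerical_indexes),
    remainder="passthrough",  # keep the columns that are not listed
)
transformed = transformer.fit_transform(df)

# Transformed columns come first, so the numerical features now occupy
# positions 0..n-1; the untouched columns follow in their original order.
new_numerical_indexes = list(range(len(numerical_indexes)))
remainder_indexes = [i for i in range(df.shape[1]) if i not in numerical_indexes]
features_after_pipeline = [df.columns[i] for i in numerical_indexes + remainder_indexes]

print(new_numerical_indexes)
print(features_after_pipeline)
```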

samples/imputer/Training.ipynb

samples/normalizer/Training.ipynb

samples/robust-scaler/Training.ipynb

fberanizo · 2020-06-06T13:52:04Z

@lucaslzl @lborro
@samborba apparently scikit-learn has for some time now accepted pandas.DataFrame in calls to .fit, .transform, .predict ...

Does anyone want to run some tests and replace ndarray with DataFrame in the places that use it?
This column-ordering part seems unnecessarily complicated, with all these saves and reorderings, etc.
It seems that with pandas.DataFrame scikit-learn itself takes care of this.

Edit: I looked more closely and it is not quite what I thought. It is still an open problem in scikit-learn:
scikit-learn/scikit-learn#7242
scikit-learn/scikit-learn#12627

The suggestion above does not help. I tested it and it did not work.

The way to go seems to be to use something like:

save_model(
     ...,
     feature_names_in=feature_names_in,
     feature_names_out=feature_names_out,
)
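
As a rough sketch of how such saved names could be used at inference time, the snippet below realigns incoming columns and relabels an ndarray result with plain pandas. It does not call `save_model`, whose other arguments are elided above; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical names recorded at training time (values are made up).
feature_names_in = ["age", "fare", "sex"]
feature_names_out = ["age", "fare", "sex_f", "sex_m"]

# At inference time the incoming columns may arrive in a different order.
incoming = pd.DataFrame({"sex": ["m"], "fare": [7.0], "age": [20.0]})

# Selecting by the saved names restores the training order
# and raises KeyError if an expected column is missing.
aligned = incoming[feature_names_in]

# A pipeline working on ndarray loses column labels; the saved output
# names put them back. The array is a stand-in for a transform result.
transformed = np.array([[20.0, 7.0, 0.0, 1.0]])
labeled = pd.DataFrame(transformed, columns=feature_names_out)

print(list(aligned.columns))
print(list(labeled.columns))
```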

- Save features_after_pipeline and original column

fberanizo

Only the drop in the normalizer is left. The rest worked.

samples/normalizer/Training.ipynb

- Change the make_column_transformer remainder: columns that are not specified should not be dropped after transformation
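
The difference this commit refers to can be seen in a short sketch. By default `make_column_transformer` uses `remainder="drop"`, which discards every column not listed; `remainder="passthrough"` keeps them. The array and the column choice are illustrative assumptions, not the data from the normalizer sample.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import Normalizer

X = np.array([[1.0, 10.0, 7.0],
              [2.0, 20.0, 8.0],
              [3.0, 30.0, 9.0]])

# Default behaviour: the unlisted third column is dropped.
dropping = make_column_transformer((Normalizer(), [0, 1]))
dropped = dropping.fit_transform(X)

# With passthrough, the unlisted column is appended unchanged on the right.
keeping = make_column_transformer((Normalizer(), [0, 1]), remainder="passthrough")
kept = keeping.fit_transform(X)

print(dropped.shape)
print(kept.shape)
```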

fberanizo · 2020-06-07T13:12:57Z

LGTM

samborba requested review from fberanizo, lucaslzl and lborro May 29, 2020 14:26

fberanizo requested changes May 30, 2020

lborro suggested changes May 30, 2020

fberanizo requested changes Jun 1, 2020

Request Changes

ae8446e

- Add Wrapping Custom Transformer step in Inference.ipynb
- Remove target param from Pre Selection
- Handle categorical features in Pre Selection

samborba requested review from lborro and fberanizo June 2, 2020 04:12

Requested Changes

faf8370

- Bring back the target variable for Pre Selection

fberanizo requested changes Jun 2, 2020

Requested Changes

19ec121

- Change features_to_filter to feature type

samborba requested a review from fberanizo June 4, 2020 00:26

fberanizo requested changes Jun 5, 2020

Requested Changes

b6d6410

- Clear Normalizer output

fberanizo requested changes Jun 5, 2020

fberanizo requested changes Jun 5, 2020

Requested Changes

e10107f

- Add parameter tag in Filter
- Get new numerical features indexes after make_column_transformer

samborba requested a review from fberanizo June 6, 2020 08:04

fberanizo requested changes Jun 6, 2020

This was referenced Jun 6, 2020

Recursive feature elimination (RFE) #74

Merged

Added auto featuring componentes to the platform. #77

Merged

Requested Changes

ed7d0c0

- Save features_after_pipeline and original column

samborba requested a review from fberanizo June 6, 2020 17:21

fberanizo requested changes Jun 6, 2020

Requested Changes

c00fd0f

- Change the make_column_transformer remainder: columns that are not specified should not be dropped after transformation

samborba requested a review from fberanizo June 6, 2020 23:35

fberanizo approved these changes Jun 7, 2020

fberanizo merged commit df40935 into platiagro:master Jun 7, 2020

samborba deleted the feature/add-new-components branch June 8, 2020 12:21
