Add Filter and Pre Selection Components #65
As a temporary measure to deal with custom transformers, their code is wrapped so that it can be imported in both the Training and Inference notebooks.
- Filter Selection: removes selected features from the dataset.
- Pre Selection: removes features with low variance and high correlation.
Pre Selection:
- Change correlation method
- Implement the fit method so that the transform method can be called in the Inference file
- Import features_after_pipeline
Filter:
- Changes in contract.json
- Implement the fit method so that the transform method can be called in the Inference file
- Import features_after_pipeline
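For reference, the fit/transform split described above could look roughly like the sketch below. This is not the code from this PR. The class layout, the `variance_cutoff`, `correlation_cutoff` and `method` parameters, and the `columns_to_drop_` attribute are assumptions made for illustration. The idea is that `fit` decides once, on the training data, which columns to drop, so that `transform` can apply the same decision in the Inference notebook.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class PreSelection(TransformerMixin, BaseEstimator):
    """Illustrative only: drop low-variance and highly correlated numeric columns."""

    def __init__(self, variance_cutoff=0.0, correlation_cutoff=0.9, method="pearson"):
        # All three parameters are hypothetical names, not the component's real contract.
        self.variance_cutoff = variance_cutoff
        self.correlation_cutoff = correlation_cutoff
        self.method = method

    def fit(self, X, y=None):
        # Categorical columns are left alone; only numeric columns are screened.
        numeric = X.select_dtypes(include="number")

        # Columns whose variance does not exceed the cutoff.
        variances = numeric.var()
        low_variance = variances[variances <= self.variance_cutoff].index.tolist()

        # Among the remaining columns, drop the later column of each highly correlated pair.
        remaining = numeric.drop(columns=low_variance)
        corr = remaining.corr(method=self.method).abs()
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        high_corr = [c for c in upper.columns if (upper[c] > self.correlation_cutoff).any()]

        self.columns_to_drop_ = low_variance + high_corr
        self.features_after_pipeline_ = [c for c in X.columns if c not in self.columns_to_drop_]
        return self

    def transform(self, X):
        # Reuse the decision taken in fit; nothing is recomputed at inference time.
        return X.drop(columns=self.columns_to_drop_)


train = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],   # almost a multiple of "a"
    "c": [1.0, 1.0, 1.0, 1.0],   # constant
    "d": [4.0, 1.0, 3.0, 2.0],
    "kind": ["x", "y", "x", "y"],
})
selector = PreSelection().fit(train)

inference_batch = pd.DataFrame({"a": [5.0], "b": [10.0], "c": [1.0], "d": [0.0], "kind": ["x"]})
selected = selector.transform(inference_batch)
```

Keeping the learned column list on the fitted object is what lets the Inference notebook call `transform` without seeing the training data again.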
@samborba you need to copy the cell containing the CustomTransformer.py class into Inference.ipynb.
In the platform flow, Training.ipynb and Inference.ipynb run at different times and in different containers.
- Add Wrapping Custom Transformer step in Inference.ipynb
- Remove target param from Pre Selection
- Handle categorical features in Pre Selection
- Bring back the target variable for Pre Selection
- Change features_to_filter to feature type
- Add target variable in Filter Selection
- Refactoring Pre Selection: improve code documentation; remove unused imports and variables; remove target column from features_after_pipeline
- Refactoring in attribute engineering components: the numerical_indexes list no longer needs to be saved with the save_model method, as the model will be able to remember the type of each column
- Change in Inference targets
- Clear Normalizer output
- Add parameter tag in Filter
- Get new numerical features indexes after make_column_transformer
@lucaslzl @lborro Does anyone want to run some tests and replace the places that use ndarray with DataFrame?
Edit: I looked into it further and it does not work the way I thought. It is still an open problem in scikit-learn. The suggestion above does not help: I tested it and it did not work. The way forward seems to be to use something like:
save_model(
    ...,
    feature_names_in=feature_names_in,
    feature_names_out=feature_names_out,
)
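A small sketch of why the names have to be tracked by hand, assuming scikit-learn's default output configuration. The DataFrame contents and the choice of `StandardScaler` are made up for illustration, and the platform's `save_model` helper is deliberately not called here because its full signature is not shown in this thread.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"age": [20.0, 30.0, 40.0], "income": [1000.0, 2000.0, 3000.0]})

# Record the names before the data is turned into an ndarray.
feature_names_in = list(df.columns)

scaler = StandardScaler()
X = scaler.fit_transform(df)  # a plain numpy.ndarray by default, so the column names are gone

# StandardScaler maps each input column to exactly one output column,
# so in this simple case the output names equal the input names.
feature_names_out = list(feature_names_in)

# With the saved names, a labelled DataFrame can be rebuilt later, e.g. at inference time.
restored = pd.DataFrame(X, columns=feature_names_out)
```

For transformers that add or remove columns, `feature_names_out` would have to be built according to what the transformer does rather than copied from the input.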
- Save features_after_pipeline and original column
Only the drop in the normalizer is left. Everything else worked.
- Change the make_column_transformer remainder: a column that is not specified should not be dropped after the transformation
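To illustrate the remainder change, here is a minimal sketch using scikit-learn's `make_column_transformer`. The example DataFrame, the column names and the use of `StandardScaler` are assumptions, not the component's actual code. It also shows why the numerical feature indexes have to be recomputed after the transformation: the transformed columns are placed first and the passthrough columns are appended on the right.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "city": ["A", "B", "A"],
    "age": [20.0, 30.0, 40.0],
    "income": [1000.0, 2000.0, 3000.0],
})
numerical_columns = ["age", "income"]

# Default behaviour (remainder="drop"): the unspecified "city" column disappears.
dropping = make_column_transformer((StandardScaler(), numerical_columns))
dropped = dropping.fit_transform(df)

# remainder="passthrough": "city" is kept and appended after the scaled columns.
keeping = make_column_transformer(
    (StandardScaler(), numerical_columns),
    remainder="passthrough",
)
kept = keeping.fit_transform(df)

# In the input, the numerical columns sat at positions 1 and 2.
# In the output they come first, so their indexes become 0 and 1.
new_numerical_indexes = list(range(len(numerical_columns)))
```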
LGTM