InteractionTransformer · Issue #25412 · scikit-learn/scikit-learn
mayer79 opened this issue Jan 16, 2023 · 3 comments
mayer79 commented Jan 16, 2023

Describe the workflow you want to enable

The latest 1.2 release is full of great features, e.g., full column name support. This brings me to one of my most desired features regarding building strong and realistic linear models: Interactions!

It is currently very hard to add interaction terms between two feature groups, especially if they involve one-to-many (1-to-m) transforms such as OneHotEncoder (OHE) or SplineTransformer.

Describe your proposed solution

An idea of @lorentzenchr, which I will try to summarize: create the transforms as in a ColumnTransformer, but also add interaction terms between the columns generated by each transformer. The resulting columns could be combined with the output of another ColumnTransformer using FeatureUnion.

Sketch of the API

InteractionTransformer(
    transformers,
    interaction_only=False,
    include_bias=True,
    verbose_feature_names_out=True,
)

# transformers: list of tuples
#     List of (name, transformer, columns) tuples specifying the transformer objects to be applied to subsets of the data.
#     Each transformer would specify a feature (group) that would interact with other transformers.
# interaction_only: If False, also return main effects
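
To make the intended output concrete, here is a minimal NumPy sketch of what such a transformer might compute once each transformer has produced its block of columns. The helper name `interaction_features` is hypothetical and not part of scikit-learn, and treating `include_bias` as a leading column of ones (as PolynomialFeatures does) is an assumption, since the sketch above does not define it.

```python
from itertools import combinations

import numpy as np


def interaction_features(blocks, interaction_only=False, include_bias=True):
    """Combine per-transformer output blocks into one design matrix.

    blocks: list of (name, 2-D array) pairs, one per transformer, all with
    the same number of rows. Hypothetical helper, not scikit-learn API.
    """
    n_rows = blocks[0][1].shape[0]
    parts = []
    if include_bias:
        # Assumed meaning of include_bias: a constant column of ones.
        parts.append(np.ones((n_rows, 1)))
    if not interaction_only:
        # Main effects: the transformed columns themselves.
        parts.extend(block for _, block in blocks)
    for (_, left), (_, right) in combinations(blocks, 2):
        # Every column of `left` times every column of `right`, row by row.
        products = left[:, :, None] * right[:, None, :]
        parts.append(products.reshape(n_rows, -1))
    return np.hstack(parts)


# Stand-ins for transformer outputs: 2 dummy columns and 3 basis columns.
f1_dummies = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
f2_basis = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

X_out = interaction_features([("f1", f1_dummies), ("f2", f2_basis)])
X_only = interaction_features(
    [("f1", f1_dummies), ("f2", f2_basis)],
    interaction_only=True,
    include_bias=False,
)
print(X_out.shape, X_only.shape)
```

With these inputs the full output has 1 bias + 2 + 3 main-effect + 2 * 3 product columns, while `interaction_only=True` without bias keeps only the 6 products.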

Example 1: Interactions between two 1-m transforms

Here, we would let each dummy variable of "f1" interact with each spline basis of "f2":

InteractionTransformer(
    transformers=[
        ("f1_ohe", OneHotEncoder(drop="first"), ["f1"]),
        ("f2_spline", SplineTransformer(), ["f2"]),
    ]
)

Example 2: Interaction between OHE and other features

Each column generated by OHE "f1" would interact with numeric feature "f2" and also with numeric feature "f3".

InteractionTransformer(
    transformers=[
        ("f1_ohe", OneHotEncoder(drop="first"), ["f1"]),
        ("f2_others", "passthrough", ["f2", "f3"]),
    ]
)
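
For comparison, a rough sketch of how Example 2 could be approximated with tools that exist today, and why that falls short. The toy data and integer column indices are assumptions made for illustration. PolynomialFeatures with `interaction_only=True` multiplies every pair of input columns, so it also produces within-group products (dummy times dummy, "f2" times "f3") that the proposal would leave out.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures

# Toy data (assumed): column 0 is the categorical "f1" coded 0/1/2,
# columns 1 and 2 play the role of the numeric "f2" and "f3".
X = np.array([
    [0.0, 1.0, 10.0],
    [1.0, 2.0, 20.0],
    [2.0, 3.0, 30.0],
    [1.0, 4.0, 40.0],
])

ct = ColumnTransformer(
    [
        ("f1_ohe", OneHotEncoder(drop="first"), [0]),
        ("f2_others", "passthrough", [1, 2]),
    ],
    sparse_threshold=0,  # always return a dense array
)
Xt = ct.fit_transform(X)  # 2 dummy columns followed by f2 and f3

# Existing route: all pairwise products among the 4 transformed columns.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_poly = poly.fit_transform(Xt)
n_products_poly = X_poly.shape[1] - Xt.shape[1]

# Proposed behaviour, computed by hand: dummy columns times numeric columns only.
dummies, numerics = Xt[:, :2], Xt[:, 2:]
X_cross = (dummies[:, :, None] * numerics[:, None, :]).reshape(len(X), -1)

print(n_products_poly, X_cross.shape[1])
```

Here the generic route yields 6 product columns, whereas only the 4 cross-group products are wanted.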

Example 3: Interaction between two OHE, further features linear

interactor = InteractionTransformer(
    transformers=[
        ("f1_ohe", OneHotEncoder(drop="first"), ["f1"]),
        ("f2_ohe", OneHotEncoder(drop="first"), ["f2"]),
    ]
)

preprocessor = FeatureUnion(
    [
        ("interactions", interactor),
        ("other", ColumnTransformer([("other", "passthrough", ["f3", "f4"])])),
    ]
)

Describe alternatives you've considered, if relevant

No response

Additional context

No response

mayer79 added the Needs Triage and New Feature labels Jan 16, 2023
ogrisel removed the Needs Triage label Jan 27, 2023

ogrisel commented Jan 27, 2023

Let's see a pull request then :)

It sounds like a pragmatic answer to a need that many people expressed over the years.

ogrisel commented Jan 27, 2023

How would this design interact with hyper-parameter tuning? We should try to craft a design that will lead to a nice UX assuming we move forward with #21784.

mayer79 commented Jan 27, 2023

@ogrisel Thanks for your consideration!

Regarding the tuning logic: We could use an argument kind="pairwise" that would specify which feature groups would interact:

  • "pairwise": All pairwise interactions
  • None: Only main effects (?)
  • A tuple of pairs of names. In Example 3 above, "pairwise" would be equivalent to (("f1_ohe", "f2_ohe"),).
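
As a rough illustration of the three options, the sketch below resolves such an argument into an explicit list of name pairs. The function name `resolve_interaction_pairs` and the validation rules are hypothetical, chosen only to show how the options could map onto one another.

```python
from itertools import combinations


def resolve_interaction_pairs(names, kind="pairwise"):
    """Turn the proposed `kind` argument into a list of (name, name) pairs.

    names: transformer names in the order they were given.
    Hypothetical helper, not scikit-learn API.
    """
    if kind is None:
        # Only main effects: no interactions at all.
        return []
    if kind == "pairwise":
        # All unordered pairs of transformer names.
        return list(combinations(names, 2))
    # Otherwise expect an iterable of explicit name pairs.
    pairs = [tuple(pair) for pair in kind]
    known = set(names)
    for pair in pairs:
        if len(pair) != 2 or not set(pair) <= known:
            raise ValueError(f"Invalid interaction pair: {pair!r}")
    return pairs


names = ["f1_ohe", "f2_ohe"]
pairwise = resolve_interaction_pairs(names, "pairwise")
explicit = resolve_interaction_pairs(names, (("f1_ohe", "f2_ohe"),))
main_only = resolve_interaction_pairs(names, None)
print(pairwise == explicit, main_only)
```

An explicit tuple of pairs is a plain Python value, so candidate sets of interactions could in principle be listed as alternatives in a parameter grid.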

Alternative API

We could also pack the interaction functionality directly into the existing ColumnTransformer:

ColumnTransformer(
    transformers=[
        ("f1_ohe", OneHotEncoder(drop="first"), ["f1"]),
        ("f2_others", "passthrough", ["f2", "f3"]),
    ],
    interactions="pairwise"  # or (("f1_ohe", "f2_others"),)
)

Maybe you have an even bolder idea?
