MRG Feature stacker #1173

amueller · 2012-09-20T16:43:43Z

This estimator provides a Y piece for the pipeline.
I used it to combine word ngrams and char ngrams into a single transformer.
Basically it just concatenates the output of several transformers into one large feature.

If you think this is helpful, I'll add some docs and an example.
With this, together with Pipeline, one can build arbitrarily complex graphs (with one source and one sink) of estimators in sklearn :)
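The core idea ("fit several transformers on the same input, then concatenate their outputs side by side") can be sketched in a few lines of NumPy. This is a minimal illustration, not the PR's implementation: `SimpleUnion`, `Scale` and `Square` are hypothetical names, and only dense arrays are handled.

```python
import numpy as np


class Scale:
    """Toy transformer (hypothetical): divides by the per-column max seen in fit."""

    def fit(self, X, y=None):
        self.max_ = np.abs(X).max(axis=0)
        return self

    def transform(self, X):
        return X / self.max_


class Square:
    """Toy stateless transformer (hypothetical): element-wise square."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X ** 2


class SimpleUnion:
    """Fit every transformer on the same input and concatenate the outputs
    column-wise: one source, several parallel branches, one sink."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list  # list of (name, transformer)

    def fit(self, X, y=None):
        for _, trans in self.transformer_list:
            trans.fit(X, y)
        return self

    def transform(self, X):
        # Each branch sees the full input; results are stacked side by side.
        return np.hstack([trans.transform(X) for _, trans in self.transformer_list])

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X = np.array([[1.0, 2.0], [3.0, 4.0]])
union = SimpleUnion([("scale", Scale()), ("square", Square())])
Z = union.fit_transform(X)
print(Z.shape)  # (2, 4)
```

Because the union itself exposes fit/transform, it can be dropped into a Pipeline step like any other transformer, which is what makes the "Y piece" composable.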

TODO

~~tests~~
~~narrative documentation~~
~~example~~

Thanks to the awesome implementation of the BaseEstimator, grid search simply works - though with complicated graphs you get parameter names like feature_stacker__first_feature__feature_selection__percentile (more or less from my code ^^).
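Those long names come from the `<component>__<parameter>` convention, where each `__` descends one level into a nested estimator. A minimal sketch of how such a key could be routed, assuming hypothetical `Node` objects that keep their sub-estimators in a dict (this only mimics the idea; it is not the BaseEstimator code):

```python
class Node:
    """Hypothetical estimator-like object with plain params and named children."""

    def __init__(self, children=None, **params):
        self.children = children or {}
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split off the first path component: "a__b__c" -> ("a", "b__c").
            head, sep, rest = key.partition("__")
            if sep:
                # Delegate the remainder of the path to the named child.
                self.children[head].set_params(**{rest: value})
            else:
                self.params[key] = value
        return self


selection = Node(percentile=10)
first_feature = Node(children={"feature_selection": selection})
stacker = Node(children={"first_feature": first_feature})
model = Node(children={"feature_stacker": stacker})

model.set_params(feature_stacker__first_feature__feature_selection__percentile=50)
print(selection.params["percentile"])  # 50
```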

ogrisel · 2012-09-20T16:55:58Z

sklearn/linear_model/tests/test_randomized_l1.py

+    clf = RandomizedLogisticRegression(verbose=False, C=1., random_state=42,
+                                scaling=scaling, n_resampling=50, tol=1e-3)
+    feature_scores_sp = clf.fit(X_sp, y).scores_
+    assert_equal(feature_scores, feature_scores_sp)


This hunk seems to be unrelated to this PR.

Whoops, sorry, I forked from the wrong branch. Just a sec.

ogrisel · 2012-09-20T16:56:37Z

Very interesting. I want an example first! (then documentation and tests :)

amueller · 2012-09-20T16:58:32Z

on it :)

ogrisel · 2012-09-20T17:07:15Z

@amueller to avoid forking from non-master branches you should use something such as http://volnitsky.com/project/git-prompt/

ogrisel · 2012-09-20T17:19:23Z

sklearn/pipeline.py

+            features.append(trans.transform(X))
+        issparse = [sparse.issparse(f) for f in features]
+        if np.any(issparse):
+            features = sparse.hstack(features).tocsr()


Maybe the tocsr() can be avoided. For instance, the downstream model might prefer CSC, as ElasticNet does.

Then again, bugs crop up every now and then where estimators that are supposed to handle any sparse format turn out to only handle CSR. It's a good defensive strategy to produce CSR by default (and it's unfortunate that sparse.hstack doesn't do this already).

I wrote this thing in the heat of the battle and I don't remember if there was a reason or if it was just a precaution. I'm inclined to think that I put it there because something, somewhere, broke.
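For reference, `scipy.sparse.hstack` does not return CSR by default, but it accepts a `format` argument, so the conversion can be requested in the same call. A minimal sketch of the dense/sparse dispatch from the hunk above; the helper name `concat_features` and its `sparse_format` parameter are hypothetical, chosen so that a CSC-preferring consumer could ask for `"csc"` while CSR stays the defensive default:

```python
import numpy as np
from scipy import sparse


def concat_features(features, sparse_format="csr"):
    """Concatenate per-transformer outputs column-wise.

    If any block is sparse, the result is sparse in ``sparse_format``
    (CSR by default); otherwise a dense ndarray is returned.
    """
    if any(sparse.issparse(f) for f in features):
        # Convert dense blocks first so hstack only ever sees sparse inputs.
        blocks = [f if sparse.issparse(f) else sparse.csr_matrix(f) for f in features]
        return sparse.hstack(blocks, format=sparse_format)
    return np.hstack(features)


dense_block = np.ones((3, 2))
sparse_block = sparse.identity(3, format="csr")

mixed = concat_features([dense_block, sparse_block])
print(mixed.format, mixed.shape)  # csr (3, 5)

as_csc = concat_features([dense_block, sparse_block], sparse_format="csc")
print(as_csc.format)  # csc
```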

amueller · 2012-09-20T17:28:25Z

Yes, it should derive from TransformerMixin.
@larsmans can I interpret your comments such that you think this is a good thing to have?

amueller · 2012-09-20T17:28:56Z

Added a toy example.

ogrisel · 2012-09-20T17:33:32Z

I think such a feature stack should provide some way to do feature group normalization, in one way or another. But this probably requires some experiments to know which normalization pattern is useful on such a beast in practice.

Does anybody have practical experience or insight to share on this?

larsmans · 2012-09-20T17:34:30Z

GREAT idea! However, I don't like the name FeatureStacker much, as stacking implies putting things on top of each other, while this class concatenates things side-by-side.

I tried to find a "plumbing equivalent" of this class to keep with the pipeline metaphor, but I can't seem to find it. It's not quite a tee as it connects the various streams back together in the end. Maybe one of the other devs is more experienced with plumbing? :)

ogrisel · 2012-09-20T17:35:32Z

BTW I think the example could be improved by using a less trivial example (e.g. the digits dataset) and showing that the cross-validated score of the best grid-searched parameter set for the pipeline with stacked features is better than that of the pipelines with the individual feature transformers.

ogrisel · 2012-09-20T17:36:28Z

@larsmans maybe FeatureConcatenator?

ogrisel · 2012-09-20T17:37:09Z

FeatureUnion?

larsmans · 2012-09-20T17:48:52Z

MultiTransformer?

amueller · 2012-09-20T19:47:00Z

Glad you like it. The estimator, and even more so the example, are in a very rough state. I wasn't sure if there was interest, and I had to leave my desk without really testing the example. I'll try to polish it asap. Thanks for your suggestions. I don't think this exists in plumbing, btw. It's a T followed by a Y...
Andy

GaelVaroquaux · 2012-09-21T09:02:16Z

FeatureUnion?

My favorite so far

amueller · 2012-09-21T10:15:05Z

I also like FeatureUnion.
Other possibilities: FeatureBinder, FeatureAssembler, FeatureCombiner.
Or maybe go away from feature? TransformerUnion, TransformBinder, TransformerBundle?

Hm, I think I like TransformerBundle.

ogrisel · 2012-09-21T10:20:17Z

+1 for FeatureAssembler or FeatureUnion or TransformerBundle

larsmans · 2012-09-21T10:30:28Z

+1 for TransformerBundle.

amueller · 2012-09-21T11:55:36Z

In my application, I found the get_feature_names very helpful - I was using text data and some handcrafted features.
I fear in general this is hard to do. I thought about checking hasattr(transformer, "get_feature_names") and otherwise just returning estimator_name_0, estimator_name_1, .... This might be a bit problematic, though, as I don't think there is a reliable way to get the output dimensionality of a transformer :-/

Oh, and @ogrisel, for the normalization: each feature should be normalized separately, right?
This is "easily" possible by feeding the object pipelines of preprocessing steps and transformers. As normalization might be quite application-specific, I think this solution is ok for the moment.
The code doesn't actually get too messy doing this.
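A NumPy sketch of that idea: each branch is its own small chain of steps, so every feature group is transformed and then normalized independently before concatenation. `make_branch`, the toy scaling step, and the choice of row-wise L2 normalization are hypothetical illustrations, not what the PR implements:

```python
import numpy as np


def l2_normalize_rows(block):
    """Scale each row of one feature group to unit L2 norm (zero rows stay zero)."""
    norms = np.linalg.norm(block, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return block / norms


def make_branch(*steps):
    """Chain stateless steps into one branch, like a tiny Pipeline."""
    def branch(X):
        for step in steps:
            X = step(X)
        return X
    return branch


def union(branches, X):
    """Apply every branch to the same input and concatenate the results."""
    return np.hstack([branch(X) for branch in branches])


X = np.array([[3.0, 4.0], [0.0, 0.0]])
branches = [
    make_branch(l2_normalize_rows),                       # raw group, normalized
    make_branch(lambda A: A * 100.0, l2_normalize_rows),  # large-scale group, normalized
]
Z = union(branches, X)
print(Z)
```

After per-branch normalization, the second group's 100x larger scale no longer dominates the concatenated representation.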

amueller · 2012-09-21T11:59:21Z

ugh I just tried to work on the example and noticed that #1034 wasn't in master yet.
Without a good way to look at the grid search results, this PR is a lot less useful I think.
Have to work on #1034 more :-/

larsmans · 2012-09-21T12:00:02Z

We might introduce an n_features_out_ attribute/property on all transformers that work on feature vectors. For now, supporting get_feature_names only when all underlying transformers do would be good enough, IMHO.
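The "names only when every branch provides them" policy could look like the sketch below. The `name__feature` prefix scheme and the toy `Vocab`/`Opaque` transformers are assumptions for illustration:

```python
class Vocab:
    """Toy transformer (hypothetical) that knows its output feature names."""

    def __init__(self, names):
        self.names = list(names)

    def get_feature_names(self):
        return list(self.names)


class Opaque:
    """Toy transformer (hypothetical) with no feature-name support."""


def union_feature_names(transformer_list):
    """Return prefixed names, or fail if any branch cannot name its outputs."""
    names = []
    for name, trans in transformer_list:
        if not hasattr(trans, "get_feature_names"):
            raise AttributeError(
                "Transformer %s does not provide get_feature_names." % name)
        # Prefix with the branch name so equal names from different branches stay distinct.
        names.extend(name + "__" + f for f in trans.get_feature_names())
    return names


branches = [("word", Vocab(["good", "movie"])), ("char", Vocab(["go", "mo"]))]
print(union_feature_names(branches))
# ['word__good', 'word__movie', 'char__go', 'char__mo']
```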

amueller · 2012-09-21T12:01:30Z

@larsmans ok, will do that. Should be easy enough.

amueller · 2012-09-21T12:26:47Z

Having a bit of a hard time creating a good example :-/

ogrisel · 2012-09-21T12:29:51Z

Have you been able to use this kind of tool successfully for your Kaggle contest? If so, then we can stick to a simplistic toy example and explain in the narrative documentation which kinds of feature bundles have proven useful in practice on which kinds of problems (e.g. PCA features + raw TF-IDF for text classification).

amueller · 2012-09-21T12:32:46Z

I can tell you how successful I was tomorrow ;)
It was definitely helpful to combine handcrafted features with word n-grams. Doing it with this estimator, I was still able to grid-search CountVectorizer parameters such as min_df, ngram_range, etc. So that definitely helped.

mblondel · 2012-09-21T17:36:08Z

sklearn/pipeline.py

+
+    This estimator applies a list of transformer objects in parallel to the
+    input data, then concatenates the results. This is useful to combine
+    several feature extraction mechanisms into a single estimator.


single feature representation?

I prefer it the way it is: getting the features out is not the important part; the important part is formulating it as an estimator.

I misunderstood what you meant. Since you're talking about extraction mechanisms, it may be clearer to say "in a single transformer".

mblondel · 2012-09-21T17:39:38Z

Nice idea indeed!

amueller · 2012-09-21T19:34:04Z

@mblondel any votes on the name?

mblondel · 2012-09-22T04:10:58Z

Some I like include FeatureAssembler, FeatureCombiner and FeatureUnion.

amueller · 2012-09-22T11:12:41Z

Name votes:
FeatureAssembler II
FeatureCombiner I
FeatureUnion IIII
TransformerBundle III

(If I counted correctly, which is unlikely given my degree in math)
If no-one objects I'll rename to FeatureUnion and change the state of the PR to MRG.

agramfort · 2012-09-22T12:08:37Z

+1 for FeatureUnion

I would have thought of FeatureConcat (FeatureConcatenator?) but FeatureUnion
is fine with me.

amueller · 2012-09-23T16:24:49Z

Renamed, think this is good to go.

amueller · 2012-09-26T19:07:27Z

Any more comments? (GitHub claims this cannot be merged, but I just rebased, so it should be a fast-forward merge.)

ogrisel · 2012-09-26T19:08:48Z

This cannot be merged in master currently, but apart from that +1 for merging :)

GaelVaroquaux · 2012-09-29T14:35:07Z

LGTM. 👍 for merge. Thanks @amueller !

vene · 2012-10-28T15:32:25Z

Thank you for this convenient transformer. In my application I had to hack it a bit, and I wonder whether the feature I wanted could be more generally useful.

Basically, sometimes you want to concatenate the same feature extractor multiple times, and have some of the parameters tied when grid searching.

In my case, I was learning a hyphenator, so my data points consist of 2 strings: the one to the left of the current position and the one to the right of the current position. For this I defined a ProjectionVectorizer that has a column attribute that just says "I only work on X[:, column]" and concatenated two of these. Now, when grid searching, it is common sense to use the same n-gram range for both transformers, so the cleanest way to do this was this quick hack (no error handling):

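from sklearn.pipeline import FeatureUnion

class HomogeneousFeatureUnion(FeatureUnion):
    def set_params(self, **params):
        # Tie every parameter: forward it unchanged to each transformer
        for key, value in params.items():
            for _, transf in self.transformer_list:
                transf.set_params(**{key: value})
        return self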

This can be easily extended to support both tied params and specific params. I'm not sure whether I overengineered this, but I still have the feeling that this might pop up in other people's applications, so I wanted to raise the question.
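The extension to tied and specific parameters together might be sketched as follows, assuming this convention: a key of the form `<transformer name>__<param>` goes to that transformer only, and any other key is tied, i.e. broadcast to every transformer. `ColumnNgrams` is a hypothetical stand-in for the ProjectionVectorizer described above, and both classes are plain Python rather than sklearn estimators (no error handling, like the hack above):

```python
class ColumnNgrams:
    """Hypothetical stand-in for a ProjectionVectorizer: only looks at
    X[i][column] and extracts character n-grams of the configured sizes."""

    def __init__(self, column, ngram_range=(1, 1)):
        self.column = column
        self.ngram_range = ngram_range

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def transform(self, X):
        lo, hi = self.ngram_range
        out = []
        for row in X:
            s = row[self.column]
            out.append([s[i:i + n] for n in range(lo, hi + 1)
                        for i in range(len(s) - n + 1)])
        return out


class TiedUnion:
    """Union whose set_params routes "<name>__<param>" to one transformer
    and broadcasts every other key to all transformers (tied parameters)."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    def set_params(self, **params):
        named = dict(self.transformer_list)
        for key, value in params.items():
            head, sep, rest = key.partition("__")
            if sep and head in named:
                named[head].set_params(**{rest: value})  # specific param
            else:
                for _, trans in self.transformer_list:   # tied param
                    trans.set_params(**{key: value})
        return self


left, right = ColumnNgrams(column=0), ColumnNgrams(column=1)
union = TiedUnion([("left", left), ("right", right)])
union.set_params(ngram_range=(1, 2))          # same n-gram range for both sides
union.set_params(right__ngram_range=(1, 3))   # override only the right side
print(left.ngram_range, right.ngram_range)    # (1, 2) (1, 3)
```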

ogrisel reviewed Sep 20, 2012

ogrisel closed this Sep 20, 2012

larsmans reopened this Sep 20, 2012

mblondel reviewed Sep 21, 2012

amueller added 11 commits September 30, 2012 11:55

ENH add FeatureStacker estimator

4b4d8fd

ENH add feature stacker example

7b78abb

COSMIT + DOC more dosctrings, minor improvements

67c5a07

ENH implement get_feature_names

26d17c6

TST added tests, fix feature names.

d27e8fc

ENH add parallel fit and transform with joblib.

3d5e6cb

ENH add transformer weights

ec08e8c

TST add test for feature weights in feature stacker

dac5e59

DOC move example (there is nothing to plot) and add some text

098cfbc

MISC renaming FeatureStacker to FeatureUnion, adding docs

78789a1

DOC added FeatureUnion to whatsnew.

d087830

amueller merged commit d087830 into scikit-learn:master Sep 30, 2012
