Re: [Scikit-learn-general] Pickling custom Transformers in a Pipeline

Andreas Mueller Tue, 05 Apr 2016 13:27:02 -0700

What's the type of self.custom?

Also, you can step into the debugger to see which function it is that cannot be pickled.
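
As an alternative to stepping through the debugger, one rough way to narrow this down is to try pickling each attribute on its own and descend into the ones that fail. This is only a sketch, assuming Python 3; `find_unpicklable`, `Helper` and `Holder` are made-up names, and the lambda stored on `Helper` is an assumption about what might be going wrong, not something the thread confirms.

```python
import pickle


def find_unpicklable(obj, name="obj", _seen=None):
    """Return (path, error) pairs for attributes of obj that fail to pickle."""
    seen = set() if _seen is None else _seen
    seen.add(id(obj))  # guard against reference cycles
    failures = []
    for attr, value in vars(obj).items():
        path = "%s.%s" % (name, attr)
        try:
            pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
        except (pickle.PicklingError, AttributeError, TypeError) as exc:
            failures.append((path, exc))
            # Descend into failing objects that carry attributes of their own.
            if hasattr(value, "__dict__") and id(value) not in seen:
                failures.extend(find_unpicklable(value, path, seen))
    return failures


class Helper(object):  # hypothetical stand-in for TextPreprocess
    def __init__(self):
        self.param_file = "params.txt"
        self.rewrite = lambda s: s.upper()  # assumed culprit: a lambda


class Holder(object):  # hypothetical stand-in for CustomTransformer
    def __init__(self):
        self.custom = Helper()


for path, exc in find_unpicklable(Holder(), "cm"):
    print(path, "->", type(exc).__name__)
```

The deepest path reported is usually the attribute to look at first; for a fitted pipeline the same function could be called on each step in turn.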




On 04/05/2016 04:14 PM, Fred Mailhot wrote:

Hi all,
I've got a pipeline with some custom transformers that's not pickling, and I'm not sure why. I've had this previously when using custom preprocessors & tokenizers with CountVectorizers. I dealt with it then by defining the custom bits at the module level.
I assumed I could avoid that by creating custom transformers that directly subclass TransformerMixin and importing them to the module where the pipeline is defined.
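
For context, the module-level workaround mentioned above works because pickle stores a function by reference (its module and qualified name) rather than by value, so only functions that can be looked up by name are picklable. A minimal sketch of the difference, assuming Python 3; every name in it is made up for illustration:

```python
import pickle


def split_on_whitespace(text):
    """Module-level function: pickle can record it by its importable name."""
    return text.split()


def make_local_tokenizer():
    # A function defined inside another function has no importable name.
    def tokenizer(text):
        return text.split()
    return tokenizer


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print("module-level function:", can_pickle(split_on_whitespace))
print("nested function:", can_pickle(make_local_tokenizer()))
print("lambda:", can_pickle(lambda text: text.split()))
```

The same rule applies to functions stored as attributes on an object, which is why subclassing TransformerMixin does not by itself avoid the problem if the transformer holds such a function somewhere.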
The transformer is implemented like this:

==============================
[...imports...]
from text_preprocess import TextPreprocess


class CustomTransformer(TransformerMixin):

    def __init__(self, param_file="params.txt"):
        self.param_file = param_file
        self.custom = TextPreprocess(self.param_file)

    def transform(self, X, *_):
        if isinstance(X, basestring):
            X = [X]
        return ["%s %s" % (x, " ".join([item["rewrite"] for item in
                self.custom.match(x)["info"] if "rewrite" in item])) for x in X]

    def fit(self, *_):
        return self
==============================
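
If the helper object turns out to hold something unpicklable, one common option is to leave it out of the pickled state and rebuild it on load via `__getstate__` and `__setstate__`. The following is a sketch under stated assumptions, written for Python 3: the `TextPreprocess` below is a hypothetical stand-in (the real class is not shown in the thread), its lambda is an assumed cause, and the `TransformerMixin` base is omitted only so the example runs without scikit-learn.

```python
import pickle


class TextPreprocess(object):
    """Hypothetical stand-in; the real class is not shown in the thread."""

    def __init__(self, param_file):
        self.param_file = param_file
        # Assumed source of the problem: a function with no importable name.
        self.rewrite = lambda text: text.lower()


class CustomTransformer(object):

    def __init__(self, param_file="params.txt"):
        self.param_file = param_file
        self.custom = TextPreprocess(self.param_file)

    def __getstate__(self):
        # Copy the instance state and leave out the unpicklable helper.
        state = self.__dict__.copy()
        del state["custom"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then rebuild the helper from them.
        self.__dict__.update(state)
        self.custom = TextPreprocess(self.param_file)

    def fit(self, *_):
        return self

    def transform(self, X, *_):
        return [self.custom.rewrite(x) for x in X]


data = pickle.dumps(CustomTransformer(), pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)
print(restored.transform(["Hello World"]))
```

The trade-off is that the helper is reconstructed from `param_file` at load time, so whatever it reads has to be available wherever the pickle is loaded.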

The full pipeline looks like this:

==============================
cm = CustomTransformer()

vec = FeatureUnion([("char_ng",
         CountVectorizer(analyzer="char_wb", tokenizer=string.split,
                         ngram_range=(3, 5), max_features=None, min_df=1,
                         max_df=0.5, stop_words=None, binary=False)),
        ("word_ng",
         CountVectorizer(analyzer="word", ngram_range=(2, 3),
                         max_features=5000, min_df=1, max_df=0.5,
                         stop_words="english", binary=False))])

pipeline = Pipeline([("custom", cm), ("vec", vec),
         ("lr", LogisticRegressionCV(scoring="f1_macro"))])
==============================

And I get the following error when I fit & dump:

==============================
In [62]: pipeline.fit(docs, [0, 0, 0, 1])
Out[62]:
Pipeline(steps=[('custom', <cm_transformer.CustomTransformer object at 0x113dd2310>), ('vec', FeatureUnion(n_jobs=1, transformer_list=[('char_ng', CountVectorizer(analyzer='char_wb', binary=False, decode_error=u'strict',
  ...None,
 refit=True, scoring='f1_macro', solver='lbfgs', tol=0.0001,
 verbose=0))])

In [63]: pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)
---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-63-99a63544716d> in <module>()
----> 1 pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)

PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
==============================

Any pointers would be appreciated. There are hints here and there on SO, but most point to the solution I referred to above...
Thanks!
Fred.



------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
