What's the type of self.custom?

Also, you can step into the debugger to see which function it is that cannot be pickled.
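
Something along these lines (just a sketch, reusing the names from your post; adjust to your setup) would show both the type and the point of failure:

==============================
import pickle
import pdb

from cm_transformer import CustomTransformer  # module name taken from your traceback

cm = CustomTransformer()
print(type(cm.custom))  # what does TextPreprocess actually store here?

try:
    pickle.dumps(cm, pickle.HIGHEST_PROTOCOL)
except pickle.PicklingError:
    pdb.post_mortem()  # drops you into the frame where pickle gave up
==============================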



On 04/05/2016 04:14 PM, Fred Mailhot wrote:
Hi all,

I've got a pipeline with some custom transformers that's not pickling, and I'm not sure why. I've had this previously when using custom preprocessors & tokenizers with CountVectorizers. I dealt with it then by defining the custom bits at the module level.
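
For illustration, the kind of thing I mean (sketched from memory, not the actual code): a tokenizer defined inline as a lambda won't pickle, while the same thing defined at module level will:

==============================
from sklearn.feature_extraction.text import CountVectorizer

def my_tokenizer(text):
    # module-level function: pickle stores a reference to it, so this is fine
    return text.split()

vec_ok = CountVectorizer(tokenizer=my_tokenizer)

# a lambda has no importable name, so dumping this vectorizer fails with
# "Can't pickle <type 'function'>"
vec_bad = CountVectorizer(tokenizer=lambda text: text.split())
==============================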

I assumed I could avoid that by creating custom transformers that directly subclass TransformerMixin and importing them to the module where the pipeline is defined.

The transformer is implemented like this:

==============================
[...imports...]
from text_preprocess import TextPreprocess


class CustomTransformer(TransformerMixin):

    def __init__(self, param_file="params.txt"):
        self.param_file = param_file
        self.custom = TextPreprocess(self.param_file)

    def transform(self, X, *_):
        if isinstance(X, basestring):
            X = [X]
        return ["%s %s" % (x, " ".join([item["rewrite"] for item in
                self.custom.match(x)["info"] if "rewrite" in item]))
                for x in X]

    def fit(self, *_):
        return self
==============================
The full pipeline looks like this:

==============================
cm = CustomTransformer()

vec = FeatureUnion([("char_ng",
                     CountVectorizer(analyzer="char_wb", tokenizer=string.split,
                                     ngram_range=(3, 5), max_features=None, min_df=1,
                                     max_df=0.5, stop_words=None, binary=False)),
                    ("word_ng",
                     CountVectorizer(analyzer="word", ngram_range=(2, 3),
                                     max_features=5000, min_df=1, max_df=0.5,
                                     stop_words="english", binary=False))])

pipeline = Pipeline([("custom", cm), ("vec", vec),
                     ("lr", LogisticRegressionCV(scoring="f1_macro"))])
==============================
And I get the following error when I fit & dump:
==============================
In [62]: pipeline.fit(docs, [0, 0, 0, 1])
Out[62]:
Pipeline(steps=[('custom', <cm_transformer.CustomTransformer object at 0x113dd2310>), ('vec', FeatureUnion(n_jobs=1, transformer_list=[('char_ng', CountVectorizer(analyzer='char_wb', binary=False, decode_error=u'strict',
  ...None,
     refit=True, scoring='f1_macro', solver='lbfgs', tol=0.0001,
     verbose=0))])

In [63]: pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)
---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-63-99a63544716d> in <module>()
----> 1 pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)

PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
==============================
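
I suppose I could also try pickling each attribute of the transformer on its own to find the culprit, something like (untested sketch):

==============================
import pickle

cm = CustomTransformer()
for name, value in vars(cm).items():
    try:
        pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
    except Exception as exc:
        # report the attribute that pickle refuses
        print("%s %s %s" % (name, type(value), exc))
==============================
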
Any pointers would be appreciated. There are hints here and there on SO, but most point to the solution I referred to above...

Thanks!
Fred.



------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
