What's the type of self.custom?
You can also step into the debugger to see which function cannot be
pickled.
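As an alternative to stepping through the debugger, here is a rough sketch of a helper that walks an object's attributes and reports the innermost ones that fail to pickle. The names `find_unpicklable`, `Matcher` and `Step` are made up for illustration; `Matcher` is only a stand-in for whatever `self.custom` turns out to be, and I'm assuming (without knowing) that it holds a lambda somewhere.

```python
import pickle


def find_unpicklable(obj, path="obj", _seen=None):
    """Return the paths of the innermost objects under `obj` that fail to pickle."""
    _seen = set() if _seen is None else _seen
    if id(obj) in _seen:  # guard against reference cycles
        return []
    _seen.add(id(obj))
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return []  # this whole subtree pickles fine
    except Exception:
        pass  # something at or below `obj` is the problem

    # Collect the children worth descending into.
    if isinstance(obj, dict):
        children = [("%s[%r]" % (path, k), v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [("%s[%d]" % (path, i), v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__") and not isinstance(obj, type):
        children = [("%s.%s" % (path, k), v) for k, v in vars(obj).items()]
    else:
        children = []

    culprits = []
    for child_path, child in children:
        culprits.extend(find_unpicklable(child, child_path, _seen))
    # If no child is to blame, `obj` itself is the unpicklable leaf.
    return culprits or [path]


class Matcher(object):  # hypothetical stand-in for an object like TextPreprocess
    def __init__(self):
        self.patterns = ["a", "b"]
        self.normalize = lambda s: s.lower()  # lambdas cannot be pickled


class Step(object):  # hypothetical stand-in for a custom transformer
    def __init__(self):
        self.param_file = "params.txt"
        self.custom = Matcher()


steps = [("custom", Step())]
culprits = find_unpicklable(steps, "steps")
print(culprits)
```

For a real pipeline the call would presumably be `find_unpicklable(pipeline.steps, "pipeline.steps")`, since `Pipeline.steps` is a list of (name, estimator) tuples.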
On 04/05/2016 04:14 PM, Fred Mailhot wrote:
Hi all,
I've got a pipeline with some custom transformers that's not pickling,
and I'm not sure why. I've had this previously when using custom
preprocessors & tokenizers with CountVectorizers. I dealt with it then
by defining the custom bits at the module level.
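The module-level workaround helps because pickle stores a function by reference (its module and name) rather than by value, so the function has to be importable under that name at load time. A minimal sketch of the difference, using a hypothetical `split_on_whitespace` tokenizer that is not part of the original code:

```python
import pickle


def split_on_whitespace(text):
    """Module-level tokenizer: pickle can look it up again by name."""
    return text.split()


# The same logic as a lambda has no importable name.
anonymous_tokenizer = lambda text: text.split()

# Round-tripping the named function works.
restored_tokenizer = pickle.loads(pickle.dumps(split_on_whitespace))

# Pickling the lambda is expected to fail.
try:
    pickle.dumps(anonymous_tokenizer)
    lambda_pickled = True
except Exception:
    lambda_pickled = False
```

The same restriction applies to functions defined inside other functions or methods.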
I assumed I could avoid that by creating custom transformers that
directly subclass TransformerMixin and importing them to the module
where the pipeline is defined.
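One possible reason an importable class is not enough: pickle also serializes every instance attribute, so a single unpicklable attribute makes the whole object fail. The sketch below assumes, purely for illustration, that the helper object holds a lambda; `Matcher` is a hypothetical stand-in and says nothing about what `TextPreprocess` really contains. It drops the helper in `__getstate__` and rebuilds it from the saved parameter in `__setstate__`. The class does not subclass `TransformerMixin` here so that the example runs with the standard library alone.

```python
import pickle


class Matcher(object):
    """Hypothetical helper holding a lambda, which pickle cannot serialize."""

    def __init__(self, param_file):
        self.param_file = param_file
        self.rewrite = lambda s: s.upper()


class CustomTransformer(object):
    """Sketch only: rebuilds the helper on load instead of pickling it."""

    def __init__(self, param_file="params.txt"):
        self.param_file = param_file
        self.custom = Matcher(self.param_file)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["custom"]  # drop the unpicklable helper
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.custom = Matcher(self.param_file)  # recreate from the saved parameter


original = CustomTransformer("my_params.txt")
restored = pickle.loads(pickle.dumps(original, pickle.HIGHEST_PROTOCOL))
```

This only makes sense when the helper can be reconstructed entirely from attributes that do pickle, such as a file name.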
The transformer is implemented like this:
==============================
[...imports...]
from text_preprocess import TextPreprocess


class CustomTransformer(TransformerMixin):

    def __init__(self, param_file="params.txt"):
        # Keep the parameter name and the stored attribute consistent.
        self.param_file = param_file
        self.custom = TextPreprocess(self.param_file)

    def transform(self, X, *_):
        # Accept a single string as well as a list of strings.
        if isinstance(X, basestring):
            X = [X]
        return ["%s %s" % (x, " ".join([item["rewrite"] for item in
                self.custom.match(x)["info"] if "rewrite" in item])) for x in X]

    def fit(self, *_):
        return self
==============================

The full pipeline looks like this:
==============================
cm = CustomTransformer()

vec = FeatureUnion([("char_ng",
                     CountVectorizer(analyzer="char_wb", tokenizer=string.split,
                                     ngram_range=(3, 5), max_features=None,
                                     min_df=1, max_df=0.5, stop_words=None,
                                     binary=False)),
                    ("word_ng",
                     CountVectorizer(analyzer="word", ngram_range=(2, 3),
                                     max_features=5000, min_df=1, max_df=0.5,
                                     stop_words="english", binary=False))])

pipeline = Pipeline([("custom", cm), ("vec", vec),
                     ("lr", LogisticRegressionCV(scoring="f1_macro"))])
==============================

And I get the following error when I fit & dump:

==============================
In [62]: pipeline.fit(docs, [0, 0, 0, 1])
Out[62]:
Pipeline(steps=[('custom', <cm_transformer.CustomTransformer object at 0x113dd2310>), ('vec',
FeatureUnion(n_jobs=1, transformer_list=[('char_ng',
CountVectorizer(analyzer='char_wb', binary=False, decode_error=u'strict',
        ...None,
        refit=True, scoring='f1_macro', solver='lbfgs', tol=0.0001,
        verbose=0))])

In [63]: pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)
---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-63-99a63544716d> in <module>()
----> 1 pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)

PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
==============================

Any pointers would be appreciated. There are hints here and there on
SO, but most point to the solution I referred to above...
Thanks!
Fred.
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general