[PorterStemmer] Remove stem_word from PorterStemmer. Breaks backwards compatibility!
Prior to this change, the public API of the PorterStemmer was a mess. NLTK's version was
based off Vivake Gupta's implementation at http://tartarus.org/~martin/PorterStemmer/python.txt,
endorsed by Martin himself at http://tartarus.org/~martin/PorterStemmer/. However,
Gupta's implementation is a shoddy port of Martin Porter's own implementation in C,
and has several vestigial quirks lying around. These include the claim that the stem()
method takes a "char pointer" as an argument (no such thing in Python) and the need
to pass in start and end indexes between which stem() should read the word from the
given char array.
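
A rough sketch of the call shape described above. It uses a toy suffix rule, not the real Porter algorithm, and the names toy_stem and legacy_stem are hypothetical; it only illustrates why the start and end indexes are noise in Python:

```python
def toy_stem(word):
    """Toy rule (NOT the Porter algorithm): drop a single trailing 's'."""
    if len(word) > 2 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def legacy_stem(p, i, j):
    """C-style call shape (hypothetical): stem the characters p[i]..p[j], inclusive."""
    return toy_stem(p[i:j + 1])


word = "stemmers"
# The caller must spell out 0 and len(word) - 1 on every call,
# even though a Python string already knows where it starts and ends.
assert legacy_stem(word, 0, len(word) - 1) == toy_stem(word) == "stemmer"
```

In C the indexes are needed because a char pointer carries no length; a Python str does, so a single-argument stem(word) loses nothing.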
At some point in NLTK's history, during or prior to the 2006 commit that added porter.py
to the current Git repository:
nltk@edf4677
this was "solved" by renaming Vivake's stem() method to stem_word() and creating
a wrapper for it called stem() that conformed to the StemmerI interface.
This was completely pointless; the right thing to do would've been to remove the
unnecessary parts of Vivake's stem() method and thereby achieve conformity to StemmerI.
This commit does this, but at the cost of breaking backwards compatibility for
anyone who was using stem_word(word) instead of stem(word); those people will
need to adjust their application code when updating to the latest version of NLTK.
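
A minimal sketch of the before and after layouts and of the migration, assuming toy stand-in classes rather than NLTK's actual code. OldStyleStemmer, NewStyleStemmer and _toy_rule are hypothetical names, and the suffix rule is a placeholder for the real algorithm:

```python
def _toy_rule(word):
    """Placeholder for the real algorithm: drop a single trailing 's'."""
    if len(word) > 2 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


class OldStyleStemmer:
    """Hypothetical pre-change layout: stem() is a thin wrapper over stem_word()."""

    def stem_word(self, p, i=None, j=None):
        # Leftover C-style indexes, defaulted so stem_word(word) also works.
        i = 0 if i is None else i
        j = len(p) - 1 if j is None else j
        return _toy_rule(p[i:j + 1])

    def stem(self, word):
        return self.stem_word(word)


class NewStyleStemmer:
    """Hypothetical post-change layout: only stem(word) remains."""

    def stem(self, word):
        return _toy_rule(word)


# Code written against stem(word) behaves the same with either layout.
for stemmer in (OldStyleStemmer(), NewStyleStemmer()):
    assert stemmer.stem("cats") == "cat"

# Code written against stem_word(word) is what breaks after the change.
assert not hasattr(NewStyleStemmer(), "stem_word")
```

Under these assumptions the fix in application code is a rename: replace each stemmer.stem_word(word) call with stemmer.stem(word).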