[PorterStemmer] Remove _adjust_case for consistency with Lancaster and Snowball
Handling of upper and lower case is not specified in Martin Porter's
"An algorithm for suffix stripping" paper; the algorithm description
there never even mentions the existence of different letter cases.
Nor does Martin's C implementation of the stemmer at:
http://tartarus.org/~martin/PorterStemmer/c.txt
handle case in the way that NLTK's version has been doing; instead,
it simply requires that the user convert their word to lowercase
before calling stem().
Since there is no Porter-specific reason to preserve our (odd)
behaviour here, and our other StemmerI implementations don't do it,
we should probably purge it, as this commit does.
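
For callers, the practical consequence is that case normalisation
happens before stem() is called. A minimal sketch of that pattern,
assuming a stemmer that only recognises lowercase suffixes; toy_stem
and stem_lowercased are hypothetical names used for illustration and
are not part of NLTK or of Martin Porter's code:

```python
from typing import Callable


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a case-unaware stemmer.

    It only strips a lowercase "ing" suffix, so uppercase input
    passes through unchanged. This is NOT the Porter algorithm.
    """
    if len(word) > 5 and word.endswith("ing"):
        return word[:-3]
    return word


def stem_lowercased(stem: Callable[[str], str], word: str) -> str:
    """Lowercase the word first, then hand it to the stemmer."""
    return stem(word.lower())


if __name__ == "__main__":
    # Without normalisation the uppercase suffix is not recognised.
    print(toy_stem("RUNNING"))
    # With normalisation the caller gets a lowercase stem back.
    print(stem_lowercased(toy_stem, "RUNNING"))
```

The wrapper takes the stemmer as a callable so the same pattern can
be applied to any object exposing a stem-like function.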