[PorterStemmer] Remove _adjust_case for consistency with Lancaster an… · ExplodingCabbage/nltk@a2d1dfa
Commit a2d1dfa

[PorterStemmer] Remove _adjust_case for consistency with Lancaster and Snowball
Handling of upper and lower case is not specified in Martin Porter's "An algorithm for suffix stripping" paper; the algorithm description there never even mentions the existence of different letter cases. Nor does Martin's C implementation of the stemmer at http://tartarus.org/~martin/PorterStemmer/c.txt handle case in the way that NLTK's version has been doing; instead, it simply requires that the user convert their word to lowercase before calling stem(). Since there is no Porter-specific reason to preserve our (odd) behaviour here, and our other StemmerI implementations don't do it, we should probably purge it, as this commit does.
1 parent 8b7ffe6 commit a2d1dfa

nltk/stem/porter.py

Lines changed: 1 addition & 13 deletions
@@ -529,18 +529,6 @@ def _step5(self, word):
 
         return word
 
-    def _adjust_case(self, word, stem):
-        lower = word.lower()
-
-        ret = ""
-        for x in range(len(stem)):
-            if lower[x] == stem[x]:
-                ret += word[x]
-            else:
-                ret += stem[x]
-
-        return ret
-
     def stem(self, word):
         stem = word.lower()
 
@@ -562,7 +550,7 @@ def stem(self, word):
         stem = self._step4(stem)
         stem = self._step5(stem)
 
-        return self._adjust_case(word, stem)
+        return stem
 
     def __repr__(self):
         return '<PorterStemmer>'
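
To illustrate what the removed helper did, here is a minimal standalone sketch of the same logic. It is not part of this commit: the function name `adjust_case` and the bounds check are assumptions added for illustration, and the stems passed in are supplied by hand rather than produced by NLTK. A caller who relied on the old behaviour could in principle apply something like this to the now-lowercase result of stem().

```python
def adjust_case(word, stem):
    """Copy casing from `word` onto `stem`, position by position.

    Hypothetical standalone version of the removed _adjust_case method.
    Where the lowercased word and the stem agree at an index, the original
    character (with its original case) is kept; otherwise the stem's own
    character is used.
    """
    lower = word.lower()
    chars = []
    for i, ch in enumerate(stem):
        # The bounds check is an addition for safety; the removed method
        # indexed directly and assumed the stem was no longer than the word.
        if i < len(word) and i < len(lower) and lower[i] == ch:
            chars.append(word[i])
        else:
            chars.append(ch)
    return "".join(chars)


if __name__ == "__main__":
    # Stems are given by hand here; after this commit stem() returns them lowercase.
    print(adjust_case("Running", "run"))   # Run
    print(adjust_case("HAPPY", "happi"))   # HAPPi
```

The second call shows why the commit message calls the old behaviour odd: a character rewritten by the stemmer stays lowercase while the rest of the word keeps its original case.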
