Added function to calculate Corpus-level BLEU and RIBES by alvations · Pull Request #1229 · nltk/nltk · GitHub

Added function to calculate Corpus-level BLEU and RIBES #1229


Merged
@hoontw merged 20 commits into nltk:develop on Dec 22, 2015

Conversation

@alvations (Contributor)

In the process of adding functions to calculate corpus-level BLEU and RIBES, the following changes were made:

  • Set the default uniform weights for 4-grams when calculating BLEU scores.
    • The original paper also set the default weights and limits to 4-grams, so the default weights were set at [0.25] * 4.
  • Added a function for corpus-level BLEU that calculates the micro-average precision (see the usage sketch after this list).
    • The original BLEU paper (Papineni et al. 2002) was meant to calculate a micro-averaged corpus-level score, so the modified precision and brevity penalty functions no longer return a float but the sums of numerators and denominators. Simply averaging sentence-level BLEU (i.e. macro-averaging) will not give the BLEU score intended in the paper.
  • Changed _modified_precision() to return the numerator and denominator instead of a single-sentence score, for use by corpus_bleu().
    • sentence_modified_precision() returns the floating-point precision score for a single sentence.
    • Also, by disentangling the float return from the precision and penalty functions, smoothing on the precision can easily be extended now that _modified_precision() and _brevity_penalty() return numerators and denominators, e.g. when someone wants to implement the BLEU smoothing from http://acl2014.org/acl2014/W14-33/pdf/W14-3346.pdf.
  • Changed the nltk.translate namespaces.
    • To preserve current users' code, translate/__init__.py imports sentence_bleu as the default nltk.translate.bleu. So from the user's end BLEU still works the same way, and calculating corpus-level BLEU requires from nltk.translate.bleu_score import corpus_bleu.
  • Added a function for the corpus-level RIBES score, i.e. corpus_ribes().
    • Unlike BLEU's micro-averaged precision, RIBES calculates a macro-average by averaging the best RIBES score for each hypothesis and its corresponding references, so we can safely sum the sentence scores and divide by len(). See line 307 of RIBES v1.03.1 (http://www.kecl.ntt.co.jp/icl/lirg/ribes/).
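For reference, a minimal usage sketch of the merged functions, assuming the namespaces described above; the sentences are illustrative. corpus_bleu() and corpus_ribes() take a list of reference lists aligned with a list of tokenised hypotheses, while sentence_bleu() scores one hypothesis against its references.

from nltk.translate.bleu_score import sentence_bleu, corpus_bleu
from nltk.translate.ribes_score import corpus_ribes

# Each hypothesis is paired with its own list of tokenised references.
hyp1 = 'the cat is on the mat'.split()
ref1a = 'the cat is on the mat'.split()
ref1b = 'there is a cat on the mat'.split()
hyp2 = 'he was interested in world history because he read the book'.split()
ref2a = 'he was interested in world history because he read the book'.split()

list_of_references = [[ref1a, ref1b], [ref2a]]
hypotheses = [hyp1, hyp2]

print(sentence_bleu([ref1a, ref1b], hyp1))           # single-sentence BLEU
print(corpus_bleu(list_of_references, hypotheses))   # micro-averaged BLEU
print(corpus_ribes(list_of_references, hypotheses))  # macro-averaged RIBES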

@alvations changed the title from "Added function to calculate Corpus-level BLEU" to "Added function to calculate Corpus-level BLEU and RIBES" on Dec 9, 2015
@hoontw self-assigned this on Dec 10, 2015
@alvations (Contributor, Author)

@hoontw Thanks for the review! The Fraction return type suggestion is great! That simplifies all the silly duck functions I was creating. The revised version of the code looks much cleaner now.
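For illustration, a minimal sketch of why a Fraction return type helps here; the per-sentence values are hypothetical. The numerator and denominator stay recoverable for corpus-level micro-averaging, while float() still yields the sentence-level score.

from fractions import Fraction

# Hypothetical per-sentence modified precisions kept as Fractions.
p1 = Fraction(5, 9)   # 5 matched n-grams out of 9 in sentence 1
p2 = Fraction(3, 7)   # 3 matched n-grams out of 7 in sentence 2

# Corpus-level micro-average: sum numerators and denominators separately.
# (Caveat: Fraction auto-reduces, e.g. Fraction(4, 8) becomes 1/2, so a
# real implementation should carry the raw counts alongside the score.)
micro = Fraction(p1.numerator + p2.numerator,
                 p1.denominator + p2.denominator)   # 8/16
print(float(micro))   # 0.5

# Sentence-level score: just coerce the Fraction to a float.
print(float(p1))      # 0.5555...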

@hoontw (Contributor) commented Dec 18, 2015

Looks much better than before! Just a few more minor tweaks and it is good to go.

@alvations (Contributor, Author)

@hoontw sorry for the late commits, I was moving around and at last I've reached where I'm supposed to be =)

@hoontw (Contributor) commented Dec 22, 2015

Thanks @alvations for the changes. It's great to have corpus-level metrics for MT.

hoontw added a commit that referenced this pull request Dec 22, 2015
Added function to calculate Corpus-level BLEU and RIBES
@hoontw merged commit 1d7966a into nltk:develop Dec 22, 2015
@alvations (Contributor, Author)

Now we can do some tuning using the corpus-level metrics, possibly something like https://github.com/alvations/mosesdecoder/blob/master/scripts/training/propy/pro.py.

from nltk.util import ngrams


-def bleu(references, hypothesis, weights):
+def sentence_bleu(references, hypothesis, weights=[0.25, 0.25, 0.25, 0.25]):
@dimazest (Contributor)

Better to set weights to a tuple, not a list, to avoid a mutable default parameter.
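A short sketch of the pitfall and the suggested fix; the function names here are illustrative.

# The pitfall: a mutable default is created once and shared across calls.
def risky(weights=[0.25, 0.25, 0.25, 0.25]):
    weights.append(0.0)   # mutates the single shared default list
    return weights

risky()
print(risky())   # the earlier append leaks into this call: six elements now

# The suggested fix: an immutable tuple default cannot be mutated.
def sentence_bleu_sketch(references, hypothesis,
                         weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical signature with an immutable tuple default."""
    return weights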

for i, _ in enumerate(weights, start=1)
)
# Calculates the modified precision *p_n* for each order of ngram.
p_ns = []
@dimazest (Contributor)

A list comprehension would look much better here.
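A sketch of the suggested rewrite, reusing the names visible in the snippet; modified_precision here is a hypothetical stub standing in for the real computation.

# Hypothetical stub for the real modified-precision computation.
def modified_precision(references, hypothesis, n):
    return 1.0

references = [['the', 'cat', 'is', 'on', 'the', 'mat']]
hypothesis = ['the', 'cat', 'is', 'on', 'the', 'mat']
weights = (0.25, 0.25, 0.25, 0.25)

# Replace the append loop with a single list comprehension.
p_ns = [modified_precision(references, hypothesis, i)
        for i, _ in enumerate(weights, start=1)]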

@alvations (Contributor, Author)

@dimazest Thanks for the suggestions! Changes made in #1238
