Added function to calculate Corpus-level BLEU and RIBES by alvations · Pull Request #1229 · nltk/nltk · GitHub

Added function to calculate Corpus-level BLEU and RIBES #1229


Merged
@hoontw merged 20 commits into nltk:develop on Dec 22, 2015

Conversation

@alvations (Contributor)

In the process of adding functions to calculate corpus-level BLEU and RIBES, the following changes were made:

  • Set the default uniform weights for 4-grams when calculating BLEU scores.
    • The original paper also set the default weights and limits to 4-grams, so the default weights were set at [0.25] * 4.
  • Added a function for corpus-level BLEU that calculates the micro-average precision (see the usage sketch after this list).
    • The original BLEU paper (Papineni et al. 2002) was meant to calculate a micro-averaged corpus-level score, so the modified precision and brevity penalty functions no longer return a float but the sums of numerators and denominators. Simply averaging sentence-level BLEU (i.e. macro-averaging) will not give the BLEU score intended in the paper.
  • Changed _modified_precision() to return the numerator and denominator instead of a single-sentence score, for use by corpus_bleu().
    • sentence_modified_precision() returns the floating-point precision score for a single sentence.
    • Also, by disentangling the float return from the precision and penalty functions, smoothing on the precision can easily be extended now that _modified_precision() and _brevity_penalty() return numerators and denominators, e.g. when someone wants to implement the BLEU smoothing from http://acl2014.org/acl2014/W14-33/pdf/W14-3346.pdf.
  • Changed the nltk.translate namespaces.
    • To preserve current users' code, translate/__init__.py imports sentence_bleu as the default nltk.translate.bleu. So from the user's end BLEU still works the same way, and calculating corpus-level BLEU requires from nltk.translate.bleu_score import corpus_bleu.
  • Added a function for the corpus-level RIBES score, i.e. corpus_ribes().
    • Unlike BLEU's micro-averaged precision, RIBES calculates a macro-average by averaging the best RIBES score for each hypothesis and its corresponding references, so we can safely sum the sentence scores and divide by len(). See line 307 of RIBES v1.03.1 (http://www.kecl.ntt.co.jp/icl/lirg/ribes/).
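For reference, a minimal usage sketch of the merged functions, assuming the namespaces described above; the sentences are illustrative. corpus_bleu() and corpus_ribes() take a list of reference lists aligned with a list of tokenised hypotheses, while sentence_bleu() scores one hypothesis against its references.

from nltk.translate.bleu_score import sentence_bleu, corpus_bleu
from nltk.translate.ribes_score import corpus_ribes

# Each hypothesis is paired with its own list of tokenised references.
hyp1 = 'the cat is on the mat'.split()
ref1a = 'the cat is on the mat'.split()
ref1b = 'there is a cat on the mat'.split()
hyp2 = 'he was interested in world history because he read the book'.split()
ref2a = 'he was interested in world history because he read the book'.split()

list_of_references = [[ref1a, ref1b], [ref2a]]
hypotheses = [hyp1, hyp2]

print(sentence_bleu([ref1a, ref1b], hyp1))           # single-sentence BLEU
print(corpus_bleu(list_of_references, hypotheses))   # micro-averaged BLEU
print(corpus_ribes(list_of_references, hypotheses))  # macro-averaged RIBES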

@alvations changed the title from "Added function to calculate Corpus-level BLEU" to "Added function to calculate Corpus-level BLEU and RIBES" on Dec 9, 2015
@hoontw self-assigned this on Dec 10, 2015
@alvations (Contributor, Author)

@hoontw Thanks for the review! The Fraction return type suggestion is great! That simplifies all the silly duck functions I was creating. The revised version of the code looks much cleaner now.
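For illustration, a minimal sketch of why a Fraction return type helps here; the per-sentence values are hypothetical. The numerator and denominator stay recoverable for corpus-level micro-averaging, while float() still yields the sentence-level score.

from fractions import Fraction

# Hypothetical per-sentence modified precisions kept as Fractions.
p1 = Fraction(5, 9)   # 5 matched n-grams out of 9 in sentence 1
p2 = Fraction(3, 7)   # 3 matched n-grams out of 7 in sentence 2

# Corpus-level micro-average: sum numerators and denominators separately.
# (Caveat: Fraction auto-reduces, e.g. Fraction(4, 8) becomes 1/2, so a
# real implementation should carry the raw counts alongside the score.)
micro = Fraction(p1.numerator + p2.numerator,
                 p1.denominator + p2.denominator)   # 8/16
print(float(micro))   # 0.5

# Sentence-level score: just coerce the Fraction to a float.
print(float(p1))      # 0.5555...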

@hoontw (Contributor) commented Dec 18, 2015

Looks much better than before! Just a few more minor tweaks and it is good to go.

@alvations (Contributor, Author)

@hoontw sorry for the late commits, I was moving around and at last I've reached where I'm supposed to be =)

@hoontw (Contributor) commented Dec 22, 2015

Thanks @alvations for the changes. It's great to have corpus-level metrics for MT.

hoontw added a commit that referenced this pull request Dec 22, 2015
Added function to calculate Corpus-level BLEU and RIBES
@hoontw merged commit 1d7966a into nltk:develop Dec 22, 2015
@alvations (Contributor, Author)

Now we can do some tuning using the corpus-level metrics, possibly something like https://github.com/alvations/mosesdecoder/blob/master/scripts/training/propy/pro.py.

from nltk.util import ngrams


-def bleu(references, hypothesis, weights):
+def sentence_bleu(references, hypothesis, weights=[0.25, 0.25, 0.25, 0.25]):
@dimazest (Contributor)

Better to set weights to a tuple, not a list, to avoid a mutable default parameter.
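A short sketch of the pitfall and the suggested fix; the function names here are illustrative.

# The pitfall: a mutable default is created once and shared across calls.
def risky(weights=[0.25, 0.25, 0.25, 0.25]):
    weights.append(0.0)   # mutates the single shared default list
    return weights

risky()
print(risky())   # the earlier append leaks into this call: six elements now

# The suggested fix: an immutable tuple default cannot be mutated.
def sentence_bleu_sketch(references, hypothesis,
                         weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical signature with an immutable tuple default."""
    return weights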

for i, _ in enumerate(weights, start=1)
)
# Calculates the modified precision *p_n* for each order of ngram.
p_ns = []
@dimazest (Contributor)

A list comprehension would look much better here.
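A sketch of the suggested rewrite, reusing the names visible in the snippet; modified_precision here is a hypothetical stub standing in for the real computation.

# Hypothetical stub for the real modified-precision computation.
def modified_precision(references, hypothesis, n):
    return 1.0

references = [['the', 'cat', 'is', 'on', 'the', 'mat']]
hypothesis = ['the', 'cat', 'is', 'on', 'the', 'mat']
weights = (0.25, 0.25, 0.25, 0.25)

# Replace the append loop with a single list comprehension.
p_ns = [modified_precision(references, hypothesis, i)
        for i, _ in enumerate(weights, start=1)]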

@alvations (Contributor, Author)

@dimazest Thanks for the suggestions! Changes made in #1238
