Trying to download the IWSLT17 test set causes a HTTP Error 404. #128

Closed · seanswyi opened this issue Dec 14, 2020 · 3 comments · Fixed by #133

@seanswyi commented Dec 14, 2020

Hi. I'm trying to run `sacrebleu -t iwslt17 -l fr-en --echo src > ${INPUT_NAME}` and am getting the following error:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/condaenv/lib/python3.7/site-packages/sacrebleu.py", line 1159, in download_test_set
    with urllib.request.urlopen(dataset) as f, open(tarball, 'wb') as out:
  File "/home/user/anaconda3/envs/condaenv/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/user/anaconda3/envs/condaenv/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/user/anaconda3/envs/condaenv/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/user/anaconda3/envs/condaenv/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/user/anaconda3/envs/condaenv/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/user/anaconda3/envs/condaenv/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

I've tried visiting the website https://wit3.fbk.eu/ as well as the URL used in the SacreBLEU code (the `'iwslt17': {` entry), and noticed that the website is "under maintenance." Contacting Mauro Cettolo, who I believe is the website maintainer, has led me to believe the new link is https://wit3.fbk.eu/2017-01-d.

Is there any way this could be addressed so that the command I initially tried will work? Thanks.
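
The failure is easy to reproduce outside of SacreBLEU, since the traceback above shows that `download_test_set()` fetches the tarball with a plain `urllib.request.urlopen()` call. Below is a minimal sketch of that step; the URL is a made-up placeholder, not the actual link configured for `iwslt17`:

```python
import urllib.error
import urllib.request

# Placeholder for whichever tarball URL is configured for the 'iwslt17' entry.
url = "https://wit3.fbk.eu/some/stale/path/fr-en.tgz"

try:
    # Mirrors the urlopen() call seen in the traceback above.
    with urllib.request.urlopen(url, timeout=30) as response:
        data = response.read()
    print(f"OK: fetched {len(data)} bytes")
except urllib.error.HTTPError as err:
    # A stale or moved link fails here, e.g. "HTTP Error 404: Not Found".
    print(f"Download failed: HTTP Error {err.code}: {err.reason}")
```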

@ozancaglayan (Collaborator) commented

Not only that, but the download links there seem to point to Google Drive instead of wit3.fbk.eu. This is bad; do you have any further news on that, @mjpost?

ozancaglayan added a commit that referenced this issue Jan 12, 2021
Use GitHub links for IWSLT test/dev sets. Also, perform an MD5 check
if dataset.py is launched from CLI i.e. 'python sacrebleu/dataset.py'
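
That MD5 check boils down to hashing the downloaded tarball and comparing the digest against a recorded checksum. A hedged sketch of the idea follows; the file name and expected digest are invented for illustration and this is not the actual code in dataset.py:

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Invented example values, for illustration only.
tarball = "fr-en.tgz"
expected_md5 = "0123456789abcdef0123456789abcdef"

actual_md5 = md5sum(tarball)
if actual_md5 != expected_md5:
    raise RuntimeError(f"MD5 mismatch for {tarball}: expected {expected_md5}, got {actual_md5}")
print(f"{tarball}: MD5 OK")
```
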
@mjpost (Owner) commented Jan 12, 2021

Has anyone tested these, e.g., to ensure they have the same checksums?

@ozancaglayan (Collaborator) commented

Yes, I did it through the __main__() function of dataset.py (see the PR), which does this for every URL that has an MD5 sum.

@ozancaglayan ozancaglayan linked a pull request Jan 12, 2021 that will close this issue
@ozancaglayan ozancaglayan self-assigned this Jan 12, 2021
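
To make the check described above concrete, verifying every URL that has an MD5 sum could look roughly like the sketch below; the DATASETS table here is a made-up stand-in, not sacrebleu's actual data structure or the code in the PR:

```python
import hashlib
import urllib.request

# Made-up stand-in for a dataset table mapping names to (url, expected MD5).
DATASETS = {
    "iwslt17/fr-en": ("https://example.org/fr-en.tgz", "0123456789abcdef0123456789abcdef"),
    "iwslt17/de-en": ("https://example.org/de-en.tgz", "fedcba9876543210fedcba9876543210"),
}

def check_all(datasets):
    """Download each URL that has a recorded MD5 and report whether it matches."""
    for name, (url, expected) in datasets.items():
        if not expected:
            continue  # nothing to verify for this entry
        with urllib.request.urlopen(url) as response:
            actual = hashlib.md5(response.read()).hexdigest()
        status = "OK" if actual == expected else f"MISMATCH (got {actual})"
        print(f"{name}: {status}")

if __name__ == "__main__":
    check_all(DATASETS)
```
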
ozancaglayan added a commit that referenced this issue Jan 18, 2021
This commit incorporates several bugfixes and API improvements for the upcoming release

- TER: Correctly handle the --short option (#131)
- sacrebleu: use correct method with sacrelogger
- Update docstrings for bleu methods
- BLEU: Change default value for `floor` smoothing to 0.1 (#129)
- test: fix test case for the floor param changes
- Use 'exp' smoothing for compat.sentence_bleu() (#98)
- Bleu: add smoothing value to signature (#98)
- dataset: Fix IWSLT links (#128)
- API: move __repr__() to BaseScore
- Change version to 1.5.0, update Changelog
- compat: let raw_corpus_bleu() use the floor default value
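
Since two of those items change smoothing behaviour, here is a brief usage sketch against the Python API around the 1.5.0 release; the argument names (`smooth_method`, `smooth_value`) are assumed from that era's API and may differ in other versions:

```python
import sacrebleu

hyp = "the cat sat on the mat"
refs = ["the cat is on the mat"]

# compat.sentence_bleu() now defaults to 'exp' smoothing (#98).
exp_score = sacrebleu.sentence_bleu(hyp, refs)

# 'floor' smoothing now defaults to a floor value of 0.1 (#129).
floor_score = sacrebleu.sentence_bleu(hyp, refs, smooth_method="floor", smooth_value=0.1)

print(f"exp:   {exp_score.score:.2f}")
print(f"floor: {floor_score.score:.2f}")
```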