Google Scholar

A repository of conversational datasets

M Henderson, P Budzianowski, I Casanueva, S Coope, D Gerz, G Kumar, N Mrkšić…

arXiv preprint arXiv:1904.06472, 2019 · arxiv.org

Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.
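The abstract names '1-of-100 accuracy' without defining it. Below is a minimal Python sketch that assumes the usual reading of the metric. Each test context is scored against 100 candidate responses: its true response plus 99 others drawn from the same batch. The context counts as correct only if its true response scores strictly highest. The function names, the `score_fn` scorer, and the handling of ties and of an incomplete final batch are illustrative assumptions, not code from the repository.

```python
from typing import Callable, List, Sequence


def one_of_k_accuracy(scores: Sequence[Sequence[float]]) -> float:
    """Fraction of contexts whose true response outranks every other candidate.

    scores[i][j] is the model's score for context i paired with candidate
    response j. By assumption the true response for context i is candidate i
    (the diagonal); all other candidates in the row act as negatives.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must be non-empty")
    correct = 0
    for i, row in enumerate(scores):
        if len(row) != n:
            raise ValueError("scores must be a square matrix")
        # Ties count as misses: the true response must score strictly highest.
        if all(row[i] > s for j, s in enumerate(row) if j != i):
            correct += 1
    return correct / n


def batched_one_of_100(
    contexts: Sequence[str],
    responses: Sequence[str],
    score_fn: Callable[[str, str], float],
    batch_size: int = 100,
) -> float:
    """Average 1-of-k accuracy over consecutive batches of batch_size examples.

    score_fn is a hypothetical (context, response) -> float scorer standing in
    for any response selection model. Trailing examples that do not fill a
    complete batch are dropped (an assumption made for this sketch).
    """
    if len(contexts) != len(responses):
        raise ValueError("contexts and responses must have the same length")
    accuracies: List[float] = []
    for start in range(0, len(contexts) - batch_size + 1, batch_size):
        ctx = contexts[start:start + batch_size]
        rsp = responses[start:start + batch_size]
        # Score every context in the batch against every response in the batch.
        matrix = [[score_fn(c, r) for r in rsp] for c in ctx]
        accuracies.append(one_of_k_accuracy(matrix))
    if not accuracies:
        raise ValueError("need at least one full batch")
    return sum(accuracies) / len(accuracies)
```

With the toy 3 × 3 matrix `[[0.9, 0.1, 0.2], [0.3, 0.8, 0.5], [0.7, 0.6, 0.4]]`, `one_of_k_accuracy` returns 2/3, because the third context's true response (0.4) is outranked by the first candidate (0.7). Under this reading, 1-of-100 accuracy is the case k = 100, so a model that picks at random would score about 1%.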

Cited by 110 · All 4 versions
