QBLink is a dataset for sequential question answering in which the questioner asks multiple related questions about the same concept, one at a time. After each question, the answerer provides an answer before the next question is asked. The dataset is designed to evaluate how well question-answering systems leverage the additional context provided by the history of previously asked questions and answers.
The dataset consists of 18,644 sequences (about 56,000 question-answer pairs). Each sequence starts with a lead-in that defines the topic of the questions, followed by three questions and their answers. Two examples are provided below.
Lead-in: Only twenty-one million units in this system will ever be created.
Question 1: Name this digital payment system whose transactions are recorded on a “block chain”.
Answer: Bitcoin
Question 2: Bitcoin was invented by this person, who, according to a dubious Newsweek cover story, is a 64-year-old Japanese-American man who lives in California.
Answer: Satoshi Nakamoto
Question 3: This online drugs marketplace, Chris Borglum’s onetime favorite, used bitcoins to conduct all of its transactions. It was started in 2011 by Ross Ulbricht using the pseudonym Dread Pirate Roberts.
Answer: Silk Road
Lead-in: He signed the Goldwater-Nichols Act after the Packard Commission investigated the Department of Defense.
Question 1: Name this Republican president who firmly advocated supply-side economics. He was an actor and a governor of California before becoming president.
Answer: Ronald Wilson Reagan
Question 2: In March of 1981, this man shot President Reagan in order to impress the actress Jodie Foster. His acquittal due to insanity angered many Americans and led to the Insanity Defense Reform Act of 1984.
Answer: John Warnock Hinckley, Jr.
Question 3: Codenamed Operation Urgent Fury, the invasion of this Caribbean island was launched by Reagan in order to protect American students and topple a Communist government set up after the deposition of Maurice Bishop.
Answer: Invasion of Grenada
The dataset is pre-partitioned into training, development and testing subsets.
Training: QBLink-train.json
Development: QBLink-dev.json
Testing: QBLink-test.json
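The three files can be read with any JSON parser. The following is a minimal sketch in Python, assuming the files sit in a local directory and that each one parses as a single JSON array; `SPLIT_FILES` and `load_split` are illustrative names and are not part of the dataset release.

```python
import json
from pathlib import Path

# File names as listed above; the directory they live in is an assumption.
SPLIT_FILES = {
    "train": "QBLink-train.json",
    "dev": "QBLink-dev.json",
    "test": "QBLink-test.json",
}


def load_split(name, data_dir="."):
    """Return the list of sequences stored in one split file."""
    path = Path(data_dir) / SPLIT_FILES[name]
    with path.open(encoding="utf-8") as f:
        sequences = json.load(f)
    # Guard against a file that is valid JSON but not an array.
    if not isinstance(sequences, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return sequences
```

For example, `load_split("dev")` would return the development sequences as a Python list of dictionaries.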
Each file is an array of sequences. Each sequence has the following fields:
id: Sequence id.
tournament: The Quiz Bowl tournament the sequence appeared in.
lead_in: The lead-in sentence of the sequence (the sentence that introduces the sequence of questions, as in the two examples above).
category: Category of the sequence (e.g., History, Literature, Philosophy).
sub_category: Sub-category of the sequence.
q1, q2, q3: First, second, and third questions of the sequence. Each contains:
question_text: Question text (the key is spelled "quetsion_text" in the example sequence below).
raw_answer: Answer text.
wiki_page: The Wikipedia page that corresponds to the answer. For example, if the answer is 'The Magic Flute', the corresponding Wikipedia page is https://en.wikipedia.org/wiki/The_Magic_Flute. For brevity, wiki_page stores only the page title, 'The_Magic_Flute'.
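Since wiki_page holds only the page title, the full URL can be rebuilt by prepending the English Wikipedia prefix shown above. A small sketch, where `wiki_url` is a hypothetical helper name and the percent-encoding of unusual characters is an assumption that goes beyond what the description states:

```python
from urllib.parse import quote

WIKI_PREFIX = "https://en.wikipedia.org/wiki/"


def wiki_url(wiki_page):
    """Expand a short wiki_page value into a full English Wikipedia URL."""
    # quote() leaves letters, digits and underscores untouched and
    # percent-encodes characters that are not safe in a URL path.
    return WIKI_PREFIX + quote(wiki_page)


print(wiki_url("The_Magic_Flute"))
# https://en.wikipedia.org/wiki/The_Magic_Flute
```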
Here is a JSON representation of an example sequence:
{
  "id": 1,
  "tournament": "2014 PACE NSC",
  "lead_in": "The speaker of this poem declares \"I miss Europe with its ancient parapets!\" before describing \"sidereal archipelagos\" and \"islands whose delirious skies are open to the sea-wanderer\". For 10 points each:",
  "category": "Literature",
  "sub_category": "Literature European",
  "q1": {
    "quetsion_text": "Name this poem which opens with its title object relating the murder of its crew, after which it runs into the \"furious lashing of the tides\".",
    "raw_answer": "The Drunken Boat [or Le Bateau Ivre]",
    "wiki_page": "Random_walk"
  },
  "q2": {
    "quetsion_text": "The Drunken Boat was written by this French poet of A Season in Hell. He engaged in a torrid affair with Paul Verlaine and abandoned poetry by age 20.",
    "raw_answer": "Arthur Rimbaud [or Jean Nicolas Arthur Rimbaud]",
    "wiki_page": "Arthur_Rimbaud"
  },
  "q3": {
    "quetsion_text": "Rimbaud wrote of a \"sublime Trumpet full of strange piercing sounds\" and the \"divine shudderings of viridian seas\" in a poem assigning colors to these entities. Georges Perec's novel A Void only includes four of them.",
    "raw_answer": "the vowels [accept A, E, I, O, and U in any order]",
    "wiki_page": "Vowel"
  }
}
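Because each question is meant to be answered with the lead-in and the earlier question-answer pairs as context, a system typically needs the history that precedes each question. The sketch below shows one way to build it, assuming the key names used in the example above ("lead_in", "q1" to "q3", "raw_answer"). It accepts either "question_text" or "quetsion_text", since both spellings appear in this description. `question_text` and `with_history` are illustrative names, and the sample values are placeholders rather than dataset content.

```python
def question_text(question):
    """Return the question text under either spelling of the key."""
    if "question_text" in question:
        return question["question_text"]
    return question["quetsion_text"]


def with_history(sequence):
    """Yield (history, question, answer) for each question, in order.

    history holds the lead-in followed by every earlier question and
    answer, i.e. the context available when the question is asked.
    """
    history = [sequence["lead_in"]]
    for key in ("q1", "q2", "q3"):
        question = sequence[key]
        text = question_text(question)
        answer = question["raw_answer"]
        # Copy the list so later appends do not alter what was yielded.
        yield list(history), text, answer
        history.extend([text, answer])


# Placeholder sequence, for illustration only.
sample = {
    "lead_in": "lead-in",
    "q1": {"quetsion_text": "first question", "raw_answer": "first answer"},
    "q2": {"quetsion_text": "second question", "raw_answer": "second answer"},
    "q3": {"quetsion_text": "third question", "raw_answer": "third answer"},
}
for history, text, answer in with_history(sample):
    print(len(history), text, "->", answer)
```

The history grows by two entries per step, so the three questions are seen with 1, 3, and 5 context items respectively. Whether to include gold answers or a system's own predictions in the history is a modeling choice that this sketch leaves open; it uses the gold answers.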
Ahmed Elgohary <elgohary@cs.umd.edu>, Chen Zhao <chenz@cs.umd.edu>, and Jordan Boyd-Graber <jbg@cs.umd.edu>
EMNLP'18 Paper Bibtex:
@inproceedings{Elgohary:Zhao:Boyd-Graber-2018,
Title = {Dataset and Baselines for Sequential Open-Domain Question Answering},
Author = {Ahmed Elgohary and Chen Zhao and Jordan Boyd-Graber},
Booktitle = {Empirical Methods in Natural Language Processing},
Year = {2018},
Location = {Brussels, Belgium}
}