Medi-Test: Generating Tests from Medical Reference Texts
Round 1
Reviewer 1 Report
The authors propose a method of generating test questions using a structured NLP approach in which information is (a) taken from known sources, (b) organized and processed into expected patterns of relationships, (c) used to generate items from templated patterns matched onto expected content, and (d) supplemented with media-derived information by passing the original data through the Tesseract OCR engine.
This approach for generating test items seems to follow the work of Ha and Mitkov and requires the generation of known patterns of relationships to produce test items. Conceptually this is a viable method, but I think there is room for improvement in this manuscript.
The main criticism I have of this manuscript is that it presents an idea for the approach with little evidence to support its claims. Arguably, the information presented here can be replicated, but given that there are existing studies already pursuing the field of item generation (which in essence is a two-part, linked natural language generation application), I would suggest a more detailed description of the pilot application: the unique number of items requested, the number of users, the characteristics of the required texts, and the number of items produced should all be presented and scrutinized.
A second critique I have of this study is its focus on OCR rather than on the logic for creating questions. In many parts of the manuscript (background, methods, and even the figures), the study appears to focus on text recognition and modification rather than on the novel contribution: the knowledge structure created to hold information for generating questions. The complexity of creating knowledge structures for question generation is of particular interest to me, yet the study as described focuses on the procedural pieces surrounding the actual production of items.
I think this is a novel contribution and should be presented, but there is a lot of information missing from the current manuscript (item difficulty, the actually created items, actual use cases for the items, and the limitations of this approach). These topics are needed for readers, myself included, to evaluate whether the presented approach is a viable method or merely an idea for implementation. In its current state, I feel more needs to be presented to support the claims made in this paper.
Author Response
Thank you for reviewing our paper. We will briefly describe below the modifications we made to our paper to address the reviewer's comments.
The first step in presenting evidence to support our claims was to include an overview of our system at the end of the introduction section. Then, the test generation section (Section 6) was extended with a more detailed discussion of the unique number of concepts and instances in the ontology, the number of users, the number of test questions generated for each image, and details about the characteristics of the text required to generate the ontology and tests (i.e., raw, un-annotated medical texts in doc or pdf format), etc.
Additionally, we revised the figure presenting the architecture of our MediTest system and the examples for every processing module. Thus, we better explained that the main focus of our system is to generate ontologies from raw textbooks and to generate test questions, and that extracting test questions from images using OCR is only a secondary aim, one which increases the attractiveness and variability of our tests. Since medical ontologies have existed for several decades, a subsection detailing this aspect was added to the introduction. This also helps to better explain that the novel contribution of this paper lies in the fully automatic process of generating test questions for different medical fields starting only from textbooks.
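As a purely illustrative aside, the OCR step mentioned above could look like the following minimal Python sketch, which uses the Tesseract engine through the pytesseract wrapper; the file name is a placeholder, and this is not the authors' code.

```python
# Illustrative sketch only (not the authors' implementation): recognizing
# the text inside a figure with Tesseract, so the recovered text can be
# fed into the same question-generation pipeline as the textbook prose.
import pytesseract        # assumes the Tesseract binary is installed
from PIL import Image     # assumes the Pillow package is installed

def ocr_image(path: str) -> str:
    """Return the raw text Tesseract recognizes in the image at `path`."""
    return pytesseract.image_to_string(Image.open(path))

if __name__ == "__main__":
    print(ocr_image("figure.png"))  # "figure.png" is a placeholder name
```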
Also, a discussion of the limitations of our system was added to Section 7.
Reviewer 2 Report
The manuscript describes Medi-test, a system to generate exams from medical reference texts. The paper is interesting overall, but it needs major changes to (1) meet the formal structure of a scientific report and (2) provide an enhanced description of the methodology and its limitations.
Major changes needed:
1) The authors should provide more background. How does the literature address the problem of generating medical exams from scientific/medical texts? The authors mention some examples, but they should describe the approaches these take and position their proposal with respect to them.
2) I am afraid that the heuristic block in Figure 1 covers all the tasks that cannot be automated. The authors should describe the whole process of test generation in detail: which parts are manual and which parts are automatic?
3) Why are the authors not using clinical reference ontologies? The manuscript states that several ontologies can be used and compared. Can you give examples?
4) Do the tests need to be supervised by experts? Did the authors check with clinical experts the understandability/validity of sample tests under different combinations of input parameters? The abstract states this, but the manuscript does not refer to it.
5) In the discussion (currently missing) or conclusion, the authors should compare their approach with previous examples (those stated in the background). What are the benefits of Medi-test? What are its limitations?
Minor comments:
1) Please do not replicate the affiliation.
2) The text between the author contributions and the conflict of interest sections should be updated and disclosed.
3) The English writing should be revised.
Author Response
Thank you for reviewing our paper. We will briefly describe below the modifications we made to our paper to address the reviewer's comments.
The first step in presenting evidence to support our claims was to include further details in the introductory section to better describe the state of the art in automatically generating test questions and medical ontologies, as well as an overview of our system at the end of the section.
Additionally, we revised the figure presenting the architecture of our MediTest system and the examples for every processing module. Thus, we better explained that the main focus of our system is to generate ontologies from raw textbooks and to generate test questions, and that extracting test questions from images using OCR is only a secondary aim, one which increases the attractiveness and variability of our tests.
All processes in Figure 1 are performed automatically, except for the visual inspection of the ontology in Protégé, which is only used to check the ontology. This is not a mandatory step in generating test questions since, as better explained in Section 6, we used only a small set of concepts and relations from the generated ontologies in order to restrict the system to generating questions with the lowest possible error rate. The test generation section (Section 6) was then extended with a more detailed discussion of the unique number of concepts and instances in the ontology, the number of users, the number of test questions generated for each image, and details about the characteristics of the text required to generate the ontology and tests (i.e., raw, un-annotated medical texts in doc or pdf format), etc. A discussion of the limitations of our system was also added to Section 7.
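For illustration only, here is a minimal Python sketch of the kind of restricted, template-based generation described above: questions are built only from triples whose relation appears on a whitelist. The triples, relation name, and question template are hypothetical and are not taken from the paper or its code.

```python
# Illustrative sketch only (not the authors' implementation): building a
# multiple-choice question from ontology triples, restricted to a small
# whitelist of relations so that only low-error-rate questions are produced.
import random

# Hypothetical (subject, relation, object) triples extracted from a textbook.
TRIPLES = [
    ("insulin", "is_produced_by", "pancreas"),
    ("bile", "is_produced_by", "liver"),
    ("erythropoietin", "is_produced_by", "kidneys"),
    ("thyroxine", "is_produced_by", "thyroid gland"),
]

# Only relations judged reliable are allowed into question generation.
ALLOWED_RELATIONS = {"is_produced_by"}

# One question template per whitelisted relation (hypothetical wording).
TEMPLATES = {"is_produced_by": "Where is {subject} produced?"}

def generate_question(triples, n_distractors=3):
    """Pick a whitelisted triple and build a stem, options, and the answer."""
    usable = [t for t in triples if t[1] in ALLOWED_RELATIONS]
    subject, relation, answer = random.choice(usable)
    # Distractors are objects of other triples sharing the same relation.
    distractors = sorted({o for _, r, o in usable if r == relation and o != answer})
    options = random.sample(distractors, min(n_distractors, len(distractors)))
    options.append(answer)
    random.shuffle(options)
    return TEMPLATES[relation].format(subject=subject), options, answer

if __name__ == "__main__":
    stem, options, answer = generate_question(TRIPLES)
    print(stem)
    for i, option in enumerate(options, 1):
        print(f"  {i}. {option}")
```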
In the conclusion section, our approach was compared with previous examples to highlight the benefits of Medi-test.
We also made all the minor changes suggested (i.e., deleted the duplicated affiliation, disclosed the author contributions, and revised the English).
Round 2
Reviewer 2 Report
The authors have improved the paper.