Frankenstein: a Platform Enabling Reuse of Question Answering Components
Author(s): Kuldeep Singh, Andreas Both, Arun Sethupat Radhakrishna, Saeedeh Shekarpour
Full text: submitted version
Abstract: Recently, remarkable efforts by the question answering (QA) community have yielded core components that accomplish QA tasks. However, implementing a QA system has remained costly.
To provide an efficient way to develop QA systems collaboratively, the Frankenstein framework was developed; it allows dynamic composition of question answering pipelines based on the input question.
In this paper, we provide a full range of reusable components as independent modules of Frankenstein, populating the ecosystem and making it possible to create many different components and QA systems. Using only the components described here, 380 different QA systems can be created, offering the QA community many new insights.
Additionally, we provide resources that support the performance analysis of QA tasks, QA components, and complete QA systems.
Hence, Frankenstein is dedicated to improving the efficiency of the QA research process.
Keywords: Question Answering; Reusability; Integration; Annotation Model; Evaluation; Model
Review 1 (by Tommaso Pasini)
The paper is about the integration of already available resources into the FRANKENSTEIN framework. I have a few concerns about the content:
1) On page 3 you claim that, unlike QALL-ME, which has "configuration difficulties", your system solves this problem. But on page 9 you say that in the future you will provide a simple configuration that directs the SPARQL results to GERBIL. So is it not simple now? Is it not present at all? This is not clear to me.
2) While for the other Entity Linkers you state what their output is, for completeness you should also say that the BabelFy output is a BabelNet synset.
3) Why is BabelFy not included in the Components for Class Linking?
4) It is not clear to me how the alignment to QA component annotations is performed. Is that an automatic process? Is it error prone? If it is not, how is the error evaluated? Please explain it better.
5) Does each new module have to provide a REST service that outputs JSON in the required format in order to be integrated into the framework?
6) Why didn't you develop a user interface to arrange components together instead of using Bash scripts?
7) On pages 10-11 you wrote a kind of README that should not be in the paper.
8) Finally, could you discuss the implications of having each step of the pipeline as a web service? I mean, I expect the time performance to be very poor when we want the system to answer many questions. Do you have a benchmark of this kind?
=== POST REBUTTAL ===
I have read and appreciate the authors' remarks. They have been useful and I'm happy to raise my score to one.
Review 2 (by Serena Villata)
*** I thank the authors for their rebuttal; the review is unchanged, but they answered my main questions. ***
The paper presents a platform called Frankenstein for the reuse of the different components to be employed in a Question-Answering system. The system can be of valuable support to researchers in the NLP and Semantic Web domains, as it provides not only a complete set of components for "constructing" the QA pipeline, but also the textual resources (annotated corpora) to evaluate the performance of such systems. The paper is well written and structured. The authors followed precisely the guidelines for submissions to the Resource Track, easing the work of reviewers in assessing the different evaluation criteria. The main drawbacks of the paper are (1) a lack of novelty, and (2) a lack of proper comparison with the related literature. More details below.
Typo:
- page 2: tools to BE easily integrated and reused in the Frankenstein framework.
--- Potential impact
- Does the resource break new ground? No, I cannot say it breaks new ground. However, this is the only software framework of this kind for QA.
- Does the resource plug an important gap? No, I cannot say that either, even if it surely provides a useful and impactful resource.
- How does the resource advance the state of the art? No system of this kind for QA has been proposed yet. In a way, it collects the different approaches and allows users to combine them as they prefer.
- Has the resource been compared to other existing resources (if any) of similar scope? Yes, it has been compared with QA state-of-the-art systems. However, this comparison should be improved to properly highlight, from the technical point of view, the pros and cons with respect to the present contribution.
- Is the resource of interest to the Semantic Web community? Yes, definitely. QA is one of the main tasks addressed in the Semantic Web community exploiting NLP methods.
- Is the resource of interest to society in general? Yes, it is of interest to NLP researchers.
- Will the resource have an impact, especially in supporting the adoption of Semantic Web technologies? Yes.
- Is the resource relevant and sufficiently general, does it measure some significant aspect? Yes, it is highly general. This is one of the main advantages of this resource.
--- Reusability
- Is there evidence of usage by a wider community beyond the resource creators or their project? Alternatively, what is the resource’s potential for being (re)used; for example, based on the activity volume on discussion forums, mailing list, issue tracker, support portal, etc? Yes.
- Is the resource easy to (re)use? For example, does it have good quality documentation? Are there tutorials available? etc. Yes, actually the goal of the resource is reuse of the different modules and testing.
- Is the resource general enough to be applied in a wider set of scenarios, not just for the originally designed use? Yes.
- Is there potential for extensibility to meet future requirements? Yes.
- Does the resource clearly explain how others use the data and software? Yes.
- Does the resource description clearly state what the resource can and cannot do, and the rationale for the exclusion of some functionality? Yes, the scope is clear.
--- Design & Technical quality
- Does the design of the resource follow resource specific best practices? Yes.
- Did the authors perform an appropriate re-use or extension of suitable high-quality resources? For example, in the case of ontologies, authors might extend upper ontologies and/or reuse ontology design patterns. Yes.
- Is the resource suitable to solve the task at hand? Yes, definitely.
- Does the resource provide an appropriate description (both human and machine readable), thus encouraging the adoption of FAIR principles? Is there a schema diagram? For datasets, is the description available in terms of VoID/DCAT/DublinCore? The description of the resource is appropriate.
- If the resource proposes performance metrics, are such metrics sufficiently broad and relevant? Yes.
- If the resource is a comparative analysis or replication study, was the coverage of systems reasonable, or were any obvious choices missing? Absolutely reasonable.
--- Availability
- Is the resource (and related results) published at a persistent URI (PURL, DOI, w3id)? Yes.
- Does the resource provide a license specification? (See creativecommons.org, opensource.org for more information) Yes, GPL 3.0.
- How is the resource publicly available? For example as API, Linked Open Data, Download, Open Code Repository. GitHub.
- Is the resource publicly findable? Is it registered in (community) registries (e.g. Linked Open Vocabularies, BioPortal, or DataHub)? Is it registered in generic repositories such as FigShare, Zenodo or GitHub? GitHub.
- Is there a sustainability plan specified for the resource? Is there a plan for the maintenance of the resource? Yes, the resource is currently maintained by the WDAqua project and will then be transferred to AskNow.
- Does it use open standards, when applicable, or have good reason not to? Yes, it uses open standards.
Review 3 (by Simon Walk)
The authors present a set of independent reusable components of Frankenstein to augment and support QA research and improve the interoperability of QA results. The paper is riddled with grammar mistakes, which makes reading it a bit of a chore, but the general idea and the presentation of the different modules of Frankenstein are well done. Personally, I liked the paper and I think it will be well received at ESWC and in the Semantic Web Community in general. One thing that could be further improved is to give a little more information about how the presented resource differs from, and how it extends, the authors' previously published contributions. I know that this was partly done, but given that Frankenstein builds upon the results of quite a few previous papers, I think this is rather important. Overall, I recommend a weak accept for the paper.
I would like to thank the authors for their answers, which explain - in more detail - how the presented modules for Frankenstein extend previous work. I will keep my initial overall score.
Review 4 (by Elena Cabrio)
The paper "Frankenstein: a Platform Enabling Reuse of Question Answering Components" describes a platform that allows components of existing question answering systems to be plugged in as independent modules, enabling their evaluation, reuse and adaptation to different domains. The topic of the paper is interesting, and goes in the (wise) direction of limiting the proliferation of independent QA systems, towards convergence on a single modular platform that allows for software and component reuse. The paper presents solid work, in which all the classical and most frequent modules of a QA pipeline are considered and analyzed. My main concern with this paper is the amount of novelty w.r.t. the papers by the same authors (or a subset of them) on the same topic. In particular, given that the WWW paper by the same authors is not available yet, the differences from this paper (and therefore the new contributions) should be clarified better from the beginning.
Questions and remarks:
- Among all the existing QA systems, according to which criteria did you select the ones you incorporated into your platform?
- [Page 10] "we developed individual benchmarks out of them for each separate QA task": how exactly was this carried out? Are the resulting datasets available in the platform to allow for component testing?
- [Page 12] Why create a new platform rather than extend the GERBIL framework from the beginning? (Some authors of the paper are contributors to both platforms.)
The paper requires proofreading; there are some typos, e.g.:
- [Page 2] to easily integrated -> to be easily integrated
- [Page 9] briefly describe -> briefly described
- [Page 10] user can -> users can
- [Page 11] WDAqua -> wrong word splitting
- [Page 12] It providing -> it provides
I acknowledge that I read the rebuttal, and I thank the authors for having addressed my main concern by specifying how this paper extends previous work. I will therefore change my initial overall score to weak accept.