Paper 34 (Research track)

Formal Query Generation for Question Answering over Knowledge Bases

Author(s): Hamid Zafar, Giulio Napolitano, Jens Lehmann

Full text: submitted version; camera-ready version

Decision: accept

Abstract: Question answering (QA) systems often consist of several components such as Named Entity Disambiguation (NED), Relation Extraction (RE), and Query Generation (QG). In this paper, we focus on the QG step of a QA pipeline over a large-scale Knowledge Base (KB), in the presence of noisy annotations and complex sentence structures. We propose SQG, a SPARQL Query Generator with a modular architecture, enabling easy integration with other components for the construction of a fully functional QA pipeline. SQG can be used on large open-domain KBs and handles noisy inputs by discovering a minimal subgraph based on the uncertain inputs it receives from the NED and RE components. This ability allows SQG to consider a set of candidate entities/relations, as opposed to only the most probable ones, which leads to a significant boost in the performance of the QG component. The captured subgraph covers multiple candidate walks, each of which corresponds to a SPARQL query. To enhance accuracy, we present a ranking model based on Tree-LSTM that takes into account the syntactic structure of the question and the tree representation of the candidate queries to find the one representing the correct intention behind the question.
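The core idea in the abstract — keeping *sets* of candidate entities and relations from NED/RE and turning the walks they induce in the KB into candidate SPARQL queries — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy KB, entity and relation names are invented, and SQG's actual subgraph discovery and ranking are far richer.

```python
# Minimal sketch (not the authors' implementation) of query generation from
# uncertain annotations: every (entity, relation) candidate pair that is
# realised in the KB yields a one-hop candidate walk / SPARQL query.
from itertools import product

# Toy KB: (subject, predicate, object) triples; names are illustrative
KB = {
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Berlin", "dbo:leader", "dbr:Michael_Mueller"),
    ("dbr:Hamburg", "dbo:country", "dbr:Germany"),
}

def candidate_queries(entity_candidates, relation_candidates):
    """Keep every (entity, relation) pair that actually occurs in the KB;
    each surviving pair becomes a candidate SPARQL query."""
    queries = []
    for e, r in product(entity_candidates, relation_candidates):
        if any(s == e and p == r for s, p, _ in KB):
            queries.append(f"SELECT ?x WHERE {{ {e} {r} ?x . }}")
    return queries

# Uncertain annotations: several candidates per mention, not just the top one
qs = candidate_queries(
    entity_candidates={"dbr:Berlin", "dbr:Hamburg"},
    relation_candidates={"dbo:leader", "dbo:mayor"},
)
print(qs)  # only the Berlin + dbo:leader pair survives
```

In SQG the surviving candidates are then ranked by Tree-LSTM similarity between the question's dependency structure and each query's tree representation; the sketch stops before that step.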

Keywords: Knowledge Bases; Question Answering; Query Generation

Review 1 (by anonymous reviewer)

 

(RELEVANCE TO ESWC) The paper deals with SPARQL query generation over a knowledge graph, and thus is highly relevant to ESWC community.
(NOVELTY OF THE PROPOSED SOLUTION) The approach proposed in the paper is novel as far as I am aware (using the similarity of the graph structure in the candidate SPARQL query with the utterance dependency pattern).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) I am not convinced that the intuition behind the solution is valid (details later). While some measure of utterance-query similarity guides all query ranking methods, it is not clear to me why the dependency pattern in the utterance should match the KB-based query structure.
(EVALUATION OF THE STATE-OF-THE-ART) The paper has weak baselines. The coverage of related work is really narrow and confined to the Semantic Web community. In-house comparison systems like EARL are used, which are yet to be peer-reviewed.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The method is explained well and the evaluation explores many aspects, yet it misses key expectations, such as establishing the necessity of Tree-LSTM vis-a-vis simpler models.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The method is fairly reproducible in my opinion.
(OVERALL SCORE) ** Summary of the Paper **
The paper tries to decouple the formal SPARQL query generation component in a KB-QA system, and proposes a novel method for this task that leverages similarity of utterance dependency structures and graph patterns in the candidate SPARQL queries. Evaluation on the recent LC-QuAD dataset shows reasonable performance of the proposed approach.
** Short description of the problem tackled in the paper, main contributions, and results **
- Generating a formal query, executable over the KB, from an utterance and its string-to-entity mappings
- A novel method for the above problem, based on graph walks and tree similarity LSTMs
- Satisfactory results on the LC-QuAD dataset
** Strong Points (SPs) **
- The problem tackled is relevant: the perspective of decoupling and improving individual components of KB-QA systems is a timely problem to work on.
- The method based on walks in a localized graph neighborhood is interesting, and novel as far as I know.
- The evaluation explores several aspects, and the paper is generally clear and well-written
** Weak Points (WPs) ** 
- The intuition is not clear as to why the KB-based relationship structure in the SPARQL query should be structurally similar to the dependency graph pattern of the utterance, which is guided by English grammar. This aspect could have been brought out by an evaluation of TreeLSTMs vs. normal LSTMs or RNNs, but that is missing.
- Baselines are weak. I would have expected an attempt at comparison with Abujabal et al. (WWW 2017), which tries to formulate a backbone query under similar situations, and the mapping is established via ILP. The paper tries to treat query generation as an independent block, but given that this is the core of KB-QA systems, a complete decoupling is difficult, necessitating stronger end-to-end system performance comparison.
- Related work coverage is really narrow - and looks only at the Semantic Web community. This is highly detrimental in my opinion, given that semantic parsing has been a very active research area in the NLP and Web IR communities - there is no reference to the series of works by Percy Liang's group or Luke Zettlemoyer's group, and Scott Yih's works also require deeper discussion:
Semantic parsing on Freebase from question-answer pairs. J. Berant, A. Chou, R. Frostig, P. Liang. EMNLP 2013.
Paraphrase-driven learning for open question answering. A. Fader, L. Zettlemoyer, O. Etzioni. ACL 2013.
More accurate question answering on Freebase. H. Bast, E. Haussmann. CIKM 2015.
While most of these works deal with Freebase, several of them involve query generation and ranking (Bast and Haussmann use learning-to-rank). Lack of concrete discussion on semantic parsing makes the paper weak. I would strongly recommend including 3-4 references from the NLP and IR communities, and not just survey papers, which tend to ignore several of these impactful works too. The title should also be made more specific.
- According to the response, the input comprises "interesting utterances" with entity and relation annotations. What "interesting" means here is not clear to me. It may be better simply to say "utterances".
- While a walk does allow traversing the same edge more than once, I think this should be made explicit in the authors' model. Additionally, some examples where this is needed could be provided, e.g., "Which Brazilian footballers played both for Barcelona and Real Madrid?", along with the specific walk applicable here.
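The edge-reuse case the reviewer asks about can be made concrete with a small sketch. All names here (the `dbo:`/`dbr:` identifiers and the helper functions) are illustrative assumptions, not taken from the paper: the point is only that the answer variable must be left via the same relation twice, once per club, so the candidate walk necessarily revisits an edge label.

```python
# Illustrative walk (triple patterns) for "Which Brazilian footballers
# played both for Barcelona and Real Madrid?" -- identifiers are invented.
walk = [
    ("?x", "dbo:team", "dbr:FC_Barcelona"),
    ("?x", "dbo:team", "dbr:Real_Madrid_CF"),   # same predicate, reused
    ("?x", "dbo:nationality", "dbr:Brazil"),
]

def to_sparql(walk):
    """Render a walk (list of triple patterns) as a SPARQL SELECT query."""
    body = " ".join(f"{s} {p} {o} ." for s, p, o in walk)
    return f"SELECT DISTINCT ?x WHERE {{ {body} }}"

def reuses_predicate(walk):
    """True if some (subject, predicate) pair occurs more than once,
    i.e. the walk traverses the same edge label repeatedly."""
    seen = [(s, p) for s, p, _ in walk]
    return len(seen) != len(set(seen))

print(reuses_predicate(walk))  # True: dbo:team leaves ?x twice
print(to_sparql(walk))
```

A model that forbade revisiting an edge label would be unable to generate this query, which is why the reviewer asks the authors to make the walk semantics explicit.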

 

Review 2 (by anonymous reviewer)

 

(RELEVANCE TO ESWC) The paper presents a framework for query generation to support question answering over knowledge graphs. As such the work is highly relevant to ESWC.
The authors argue for a strictly modular approach in question answering pipelines, where the query generation (i.e. translation of natural language query strings to SPARQL) has a well-defined role. The specific task is formally well described.
(NOVELTY OF THE PROPOSED SOLUTION) The ideas are not entirely novel overall; there is a vast amount of related work on question answering generally and on query generation specifically. The authors are aware of this state of the art and combine it into a coherent piece of work.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The idea of the proposed approach mainly revolves around finding and ranking walks in a graph that relate target entities, relations and answer nodes in the knowledge graph. As such, the approach is rather light-weight when it comes to analysing the semantics of the query (linguistically) and the representation of the query (logically). Still, the presented approach is correct and complete.
(EVALUATION OF THE STATE-OF-THE-ART) The proposed approach has been extensively evaluated against selected baselines. The evaluation is methodologically sound and detailed in the results. The results clearly show an advantage over the selected baselines. However, the selection of the baselines may be questionable. (see below)
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Overall, the properties of the proposed approach are well described and explained, both formally and through the evaluations.
The expressiveness of the supported queries could be made more explicit.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) It may be true that for the tasks as defined by the authors, these are the only comparable baselines.
However, if one looks at narrower tasks (graph exploration for query generation, or ranking models) or broader tasks such as end-to-end question answering, the scope may be quite different. For example, there are other related systems for query generation, using transformation rules based on sentence grammars or query templates and slot filling, that claim to be the state of the art. Unfortunately, these are evaluated using different benchmarks, albeit for very similar tasks.
(OVERALL SCORE) The paper presents a framework for query generation to support question answering over knowledge graphs. As such the work is highly relevant to ESWC.

 

Review 3 (by Michel Dumontier)

 

(RELEVANCE TO ESWC) The paper addresses the problem of query generation (QG) as part of a question answering (QA) system over a large knowledge graph (KG). Accurate QG is important because an inaccurate component would otherwise generate incorrect or inadequate queries over the KG, resulting in poor-quality answers. QG/QA work is therefore important to the ESWC community because it will enable non-(SPARQL)query approaches to answering questions - here, primarily focusing on natural language as an input.
(NOVELTY OF THE PROPOSED SOLUTION) The authors describe SQG, a SPARQL Query Generator, as a modular query builder for QA pipelines. SQG employs a ranking mechanism of candidate queries based on Tree-LSTM similarity. The solution includes support for a number of features (coping with KBs; identification of question type; managing noisy inputs; support for more complex queries -  aggregation, sort, comparison; and dealing with syntactic ambiguity in the question). Overall the approach to use the Tree-LSTM similarity for this problem appears novel, overcomes the limitations of linear solutions, and yields good results.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed solution is clearly described, appears correct, and is illustrated through examples. The approach is evaluated using the LC-QuAD dataset that consists of 5,000 question-answer pairs of various complexity and types (simple and compound, boolean and count). I assume this is a standard dataset used to evaluate such work in the community.
(EVALUATION OF THE STATE-OF-THE-ART) The authors measure the performance of SQG in terms of precision, recall and F1-measure on a subset of LC-QuAD containing 3,200 questions that corresponds to a previously reported assessment for Sina QB and NLIWOD QB. The reported approach outperforms other systems, but it is unclear whether any kind of parameterization is possible or was explored.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Experiments demonstrate how their approach can generate better alignment between the question and query.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) SQG is implemented using Python/pytorch and the code of SQG is published at https://github.com/AskNowQA/query_generation. The LC-QuAD dataset is also freely available. Performance metrics are clear.
(OVERALL SCORE) A clearly written paper presenting a novel QG approach with good results that improves the state of the art. Moreover, QG as an independent module has the potential to improve arbitrary QA pipelines (provided they engineer a decoupling), improving the overall performance of query answering for natural language questions.

 

Metareview by Adrian Paschke

 

The paper presents a framework for query generation to support question answering over knowledge graphs. It is a well-written paper, with a well-developed methodology and evaluation. However, the following weak points brought up by the reviewers should be addressed in the final camera-ready version:
1. Include the Deep LSTM results mentioned in rebuttal
2. Improve state of the art review - NLP area
3. Add some words on the expressiveness of the supported queries
4. Add a justification for the simpler baselines
5. Clarify the novelty of the work - reviewers are in disagreement over this because the paper is not very clear on it

 
