EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs
Author(s): Mohnish Dubey, Debayan Banerjee, Debanjan Chaudhuri, Jens Lehmann
Full text: submitted version
Abstract: In order to answer natural language questions over knowledge graphs, most processing pipelines involve entity and relation linking. Traditionally, entity linking and relation linking have been performed either as dependent sequential tasks or as independent parallel tasks. In this paper, we propose a framework, called EARL, which performs entity linking and relation linking as a single joint task. We model the linking task as an instance of the Generalised Travelling Salesman Problem (GTSP). EARL uses a graph-connection-based solution to the problem. The system determines the best semantic connection between all keywords of the question by referring to the knowledge graph. This is achieved by exploiting the connection density between entity candidates and relation candidates. The connection-density-based solution performs on par with the exact and approximate GTSP solutions. We have empirically evaluated the framework on a dataset of 5000 questions. Our system surpasses the state of the art for the entity linking task, reporting an accuracy of 0.65 compared to 0.40 for the next best entity linker.
Keywords: Entity Linking; Relation Linking; Generalised Travelling Salesman Problem; Question Answering
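To make the GTSP framing from the abstract concrete, the following is a minimal, hypothetical Python sketch (not the authors' implementation; all names are illustrative): each question keyword yields a cluster of candidate entities or relations, and the generalised travelling salesman objective is the cheapest tour that visits exactly one candidate per cluster.

```python
from itertools import product, permutations

def gtsp_brute_force(clusters, cost):
    """Pick one node per cluster and find the cheapest closed tour.

    clusters: list of candidate lists, one list per question keyword
    cost: dict mapping (u, v) node pairs to an edge cost; missing
          pairs default to 1.0 (i.e. weakly connected in the KG)
    """
    best_tour, best_cost = None, float("inf")
    for choice in product(*clusters):           # one candidate per keyword
        for order in permutations(choice):      # tour over the chosen nodes
            c = sum(cost.get((order[i], order[i + 1]), 1.0)
                    for i in range(len(order) - 1))
            c += cost.get((order[-1], order[0]), 1.0)  # close the cycle
            if c < best_cost:
                best_tour, best_cost = order, c
    return best_tour, best_cost
```

Brute force is exponential in the number of keywords, which illustrates why the paper turns to approximate solvers and, ultimately, to the connection-density heuristic.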
Review 1 (by anonymous reviewer)
(RELEVANCE TO ESWC) The authors present a system for question answering that exploits a knowledge graph and semantic web techniques; as such, it seems very relevant to ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) The main novelty of the system presented by the authors is the approach to joint entity and relation linking to a knowledge graph, based on an adapted version of the GTSP.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed system is explained in detail and thoroughly evaluated. The only aspect that could make it more complete would be the effect of the proposed entity and relation linking system on question answering; as it stands, it is not obvious why this system would be better suited to QA than to any other NLP task.
(EVALUATION OF THE STATE-OF-THE-ART) The state of the art for the entity and relation linking tasks is well presented in the Related Work section, along with a comparison table.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors define the problem and present their solution with an architecture diagram, example figures and an algorithm of the features used. They compare three variations of their system, using a GTSP solver, an approximate solver, and connection density, and evaluate various aspects of the system.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) A GitHub page with the code, data, models and experimental setup is provided.
(OVERALL SCORE) The paper presents a system for entity and relation linking in the context of question answering. The authors compare Generalised Travelling Salesman Problem solvers and their approach in terms of accuracy and time complexity. They show that their connection density approach can produce results similar to those of GTSP solvers with better runtime performance. The system is well explained, and the methodology is easy to follow thanks to the examples and figures provided.
For this reason I recommend the acceptance of this paper as long as the issues with Figure 5 and Experiment 3 are addressed.
Strong points:
- The problem formulation is clearly explained, as is the proposed solution.
- The authors evaluate the solution on an openly available dataset, which they augment with annotations to evaluate entity and relation linking.
- Various aspects of the system are evaluated and compared to state-of-the-art systems.
Weak points:
- No study of the effect of improved entity and relation linking on retrieving the correct answers.
- Figure 5 is unnecessary: the X axis has no meaning (folds are not continuous) and it would be enough to present the average MRR of each method over the 5 folds.
- Experiment 3 has no context. It is not clear why you could not compare with the systems mentioned in Section 2 (BOA and PATTY in this case) or even with a baseline approach.
Questions to authors:
- Why is there almost no discussion about Experiment 3? How good is an accuracy of 0.85?
- Likewise, you manually annotated the dataset, but you do not provide any details of this annotation process. How many entity and relation labels? How many people annotated? Most of your results are based on this dataset, so it would be useful to have these details.
- Could your system be used by a full QA system to generate RDF queries?
Review 2 (by Chenyan Xiong)
(RELEVANCE TO ESWC) Knowledge graph question answering is a common task for the semantic web and a popular application of knowledge graphs.
(NOVELTY OF THE PROPOSED SOLUTION) The joint linking of entities and relations is almost a standard approach now in state-of-the-art knowledge graph question answering systems. Nevertheless, the connectivity features are novel and should be able to improve the state-of-the-art systems as well.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The connectivity features are intuitive and straightforward.
(EVALUATION OF THE STATE-OF-THE-ART) Several state-of-the-art KGQA systems also jointly ground entities and relations, for example AQQU and STAGG. This paper should include these methods as baselines or provide a more convincing argument that it does not have to.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The influence of joint entity-relation linking on the entity part is well demonstrated. However, the evaluation of the relation linking part is missing. For a joint learning method for the KGQA task, relation linking is at least as important as the entity side.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The connectivity features should be easily implemented and adopted by many KGQA systems.
(OVERALL SCORE) This paper presents an entity-relation joint linking method for knowledge graph question answering (KGQA). Instead of linking entities and then relations separately, this paper performs the two steps jointly by finding the set of entities and relations that best matches the question. The authors first formalize the joint linking as a Generalized Traveling Salesman Problem (GTSP) and then argue that it is not a good fit for KGQA. They then develop the EARL algorithm, which essentially ranks the candidate set by the connectivity of its entities and relations. Experiments are conducted to evaluate the linking accuracy.
Strong Points:
- The motivation for joint learning in KGQA is strong, and joint learning is the trend in KGQA.
- The connectivity features are intuitive and effective, while also being novel as a consistency signal compared to other semantic parsing systems.
- The improvement on entity linking is good.
Weaknesses and Concerns: There are many concerns about various aspects of the current paper. The first one is about the differences from other KGQA systems that also do joint linking. For example, in AQQU [“More Accurate Question Answering on Freebase”] and STAGG, the setup is very similar to that in this paper. All three (AQQU, STAGG, and this paper) first get a set of candidate triples and then re-rank them using signals from entity and relation matches with the question. Multiple candidate entities are kept for each spot. Both the entity linking scores and the relation matching scores are used in the ranking models of AQQU and STAGG: they also jointly link entities and relations. In my opinion, the novelty of this paper lies in the explicit modeling of the candidate set consistency/connectivity. However, this paper fails to address these similarities with prior research and only compares with relatively weak baselines. The stated contributions are not well supported. The evaluation of the relation linking accuracy is rather limited: other than an overall accuracy score, nothing else is provided. The goal of joint linking in KGQA is to improve the final performance of the system; relation linking accuracy is a crucial part of that. The entity linking comparison alone (even against the joint-linking results of AQQU or STAGG) is not sufficient to demonstrate the advantage of joint linking in KGQA. The writing of this paper needs major improvements. The description of the proposed method, which is rather straightforward, is not clear; much guessing is required to understand what the proposed approach does. The related work and evaluation sections are also hard to read.
It is unclear what the connection to the GTSP contributes to this paper. The eventual approach, EARL, is a specific KGQA solution rather than a solution to the GTSP. One possible contribution might be to serve as a basic baseline, but it is not clear whether that is worth the space and added complexity in this paper. I recommend rejecting this paper. My personal suggestion would be to reshape the paper, focusing it on the connectivity signals, and to conduct more comprehensive experiments demonstrating the effectiveness of these novel features in more state-of-the-art KGQA systems.
Review 3 (by Haofen Wang)
(RELEVANCE TO ESWC) Entity linking and relation linking are important for KBQA, and both are relevant topics for ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) EARL performs entity linking and relation linking as a single joint task and uses a graph-connection-based solution to the problem. Specifically, the authors model the task as an instance of the Generalized Traveling Salesman Problem (GTSP) and use approximate GTSP algorithms, which come close to an exact GTSP solution in terms of accuracy while being significantly more efficient. The proposed solution has some novelty.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) This paper proposes EARL, which performs entity linking and relation linking as a single joint task. The proposed solution is correct and complete.
(EVALUATION OF THE STATE-OF-THE-ART) In this paper, EARL is compared with AGDISTIS in the disambiguation task, and with DBpedia Spotlight as well as FOX + AGDISTIS in the entity linking evaluation experiment. However, in Experiment 3, no other methods are compared with EARL. Without comparison experiments, an accuracy of 0.85 does not convey much meaningful information. More experiments and results are needed here.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) EARL performs entity linking and relation linking in a unified way and analyzes the subdivision graph of the knowledge graph fragment containing the candidates for relevant entities and relations. Therefore, EARL can handle cases where only entities or only relations are recognized in the utterance. Furthermore, EARL models entity and relation linking as an instance of the Generalized Traveling Salesman Problem (GTSP), which is a novel idea. In addition, the approximate GTSP algorithms the authors use come close to an exact GTSP solution in terms of accuracy while being significantly more efficient. The complexity of the approximate algorithm is discussed in detail.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) EARL is now open source on GitHub, so it is easy to reproduce the experimental study.
(OVERALL SCORE) Summary: This paper proposes a framework called EARL. EARL performs entity linking and relation linking as a single joint task and uses a graph-connection-based solution to the problem. EARL improves the state-of-the-art performance for entity linking from 40% to 65% accuracy. In addition, a fully annotated LC-QuAD dataset is published with this paper.
Strong Points:
1. EARL models entity linking and relation linking as a single joint task. The authors model the task as an instance of the Generalised Travelling Salesman Problem (GTSP) and use approximate GTSP algorithms, which come close to an exact GTSP solution in terms of accuracy while being significantly more efficient.
2. EARL improves the state-of-the-art performance for entity linking from 40% to 65% accuracy.
3. A fully annotated LC-QuAD dataset is published with this paper.
4. EARL performs entity linking and relation linking in a unified way and analyzes the subdivision graph of the knowledge graph fragment containing the candidates for relevant entities and relations. Therefore, EARL can handle cases where only entities or only relations are recognized in the utterance.
Weak Points:
1. The comparison experiments are not sufficient to show the effectiveness of EARL. For example, in Experiment 3, no other methods are compared with EARL. Without such comparison experiments, an accuracy of 0.85 cannot convince me of the advance of EARL.
2. Several parts of the content need improved clarity. For example, in Section 4.1 (E/R Prediction), the training process and the network architecture are hard to understand. In Section 4.3, why is the hop count of U1 3 and the connection count of U1 2? What is the relationship between Fig. 4(d) and Fig. 4(a)(b)(c)?
3. The analysis in comparison with related work is lacking.
Questions to the Authors (QAs):
1. In Section 4.1 (E/R Prediction), the training process is not clearly introduced. What is the meaning of “The network is trained using labels for resources, ontology and properties in the knowledge graph”?
2. In Section 4.2, top candidates for each keyword are retrieved from Elasticsearch; it is not clear how the candidates for each keyword are ranked.
3. In Section 4.2, Wikidata labels are used for entity candidates as an external knowledge base. However, in the experiment sections, Wikidata labels are not used by the other compared methods; is this fair to those methods?
4. In Section 4.3, why is the hop count of U1 3 and the connection count of U1 2? What is the relationship between Fig. 4(d) and Fig. 4(a)(b)(c)?
5. In Experiment 2, EARL is compared with AGDISTIS and DBpedia Spotlight on two datasets: LC-QuAD and QALD-7. Why is EARL not compared with more methods, e.g. deep learning based methods (DSSM-based similarity comparison methods)?
6. What are the experiment settings in Experiment 2 for EARL, AGDISTIS and DBpedia Spotlight? How did you tune the parameters in Experiment 2?
7. Why are there no comparison experiments in Experiment 3? What are the experiment settings in Experiment 3?
8. In the discussion section, the complexity of EARL is given as O(N^2 L^2). This is just the complexity of computing the connection density. Is the complexity still O(N^2 L^2) when considering the whole EARL pipeline? In addition, EARL can process a question in a few hundred milliseconds on average on a standard desktop computer. Does this time include all the processing steps, i.e. keyword extraction, candidate generation and disambiguation?
After Rebuttal: The authors' rebuttal answered most of the questions, and the authors acknowledged the weaknesses in clarity and the lack of comparison experiments. The writing of this paper needs improvements in clarity, and more comparison experiments are needed. We keep our review score unchanged.
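The O(N^2 L^2) bound raised in Question 8 above can be illustrated with a short, hypothetical Python sketch (illustrative only; the function names and the adjacency check are assumptions, not the paper's code): with N keywords and up to L candidates per keyword, every candidate is checked against every candidate of every other keyword, giving at most N^2 L^2 pairwise knowledge-graph lookups.

```python
def connection_counts(candidate_lists, connected):
    """Score each candidate by how many candidates of *other* keywords
    it connects to in the knowledge graph (a connection-count signal).

    candidate_lists: list of N lists, each holding up to L candidate URIs
    connected: callable (u, v) -> bool, an adjacency check against the KG
    Cost: at most N^2 * L^2 calls to `connected`.
    """
    scores = {}
    n = len(candidate_lists)
    for i in range(n):
        for u in candidate_lists[i]:
            count = 0
            for j in range(n):
                if i == j:
                    continue  # only count connections across keywords
                count += sum(1 for v in candidate_lists[j] if connected(u, v))
            scores[u] = count
    return scores
```

Note that this covers only the density computation itself; as the reviewer points out, keyword extraction, candidate generation and disambiguation add their own costs on top of this bound.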
Metareview by John McCrae
This paper presents the EARL system for entity linking and its application to question answering. The work on entity linking is well presented and the joint learning framework is novel; however, the application to question answering is much less clearly defined. The evaluation is seen as particularly lacking, and as such it seems that this paper does not sufficiently support its claims. I note that the authors promise to expand the evaluation in their rebuttal; however, it would not be possible for the reviewers to re-evaluate this, and as such I would recommend that the authors carry out this extended evaluation and resubmit.