# Paper 197 (Research track)

Author(s): Hawre Hosseini, Tam T. Nguyen, Ebrahim Bagheri

Full text: submitted version

Keywords: Entity linking; Semantic retrieval; DBpedia; Knowledge graph

Decision: reject

Review 1 (by anonymous reviewer)

(RELEVANCE TO ESWC) The topic is quite relevant to ESWC, since entity linking and its related tasks are very important for the Semantic Web community.
(NOVELTY OF THE PROPOSED SOLUTION) In this paper, an incremental solution is presented for an already established problem.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed solution aims to improve the existing work [15] in terms of involving online user-generated content and proposing a Markov Random Field retrieval framework with neural embedding-based features.
(EVALUATION OF THE STATE-OF-THE-ART) The proposed solution has been compared with the state-of-the-art [15], however, the results are not convincing (see Questions to the Authors below).
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) In general, the properties of the proposed approach have been appropriately demonstrated and discussed.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experiments have been conducted on the datasets used by existing work [15]. However, for the proposed method, more experiments would be needed to show that the incorporation of online user-generated content, i.e., reviews in this work, applies to other domains, e.g., when no reviews are available.
(OVERALL SCORE) Summary of the Paper
This paper deals with the problem of implicit entity linking, introduced earlier by Perera et al. [15], which aims to link entities that are not explicitly mentioned in tweets, but are core to the understanding of the tweets, to resources in knowledge bases. The main contributions include online user-generated content as additional context, a Markov Random Field-based method, and an experimental comparison with existing work.
Strong Points (SPs)
- Implicit entity linking is a very interesting problem that has not been well studied yet.
- The general idea of formalizing the problem of implicit entity linking as an ad hoc retrieval task is interesting.
Weak Points (WPs)
- The proposed method mainly relies on online user-generated content, i.e., reviews in this work, which might not be available for all domains. The experiments should evaluate this issue when the domain changes, whereas the existing approach [15] does not have this restriction.
- The proposed method also heavily relies on entity linking for online user-generated content, which is itself a challenging problem. The impact of the performance of such an explicit entity linking system for online user-generated content on implicit entity linking should be shown in the experiments.
- Some steps described in the paper are quite similar to the existing method without explicit references. E.g., the context expansion for candidate selection is similar to the steps of acquiring factual knowledge and contextual knowledge described in Perera et al. [15]. Please make it clearer which parts are contributions of this work and which are adopted from existing work.
Questions to the Authors (QAs)
- The experimental results show that the proposed method yielded a lower recall for candidate selection, which resulted in better precision and accuracy for entity ranking and linking. This is to be expected. Why not use the F-measure to report the overall performance? Please clarify this issue.
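For context on this question, the F-measure is the (weighted) harmonic mean of precision and recall; the numbers below are purely hypothetical, not from the paper, and only illustrate that higher precision paired with lower recall does not necessarily yield a higher F1:

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta=1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical scores: system A has higher precision, system B higher recall.
f1_a = f_measure(0.80, 0.60)  # higher precision, lower recall
f1_b = f_measure(0.70, 0.75)  # lower precision, higher recall
print(f"F1(A) = {f1_a:.3f}, F1(B) = {f1_b:.3f}")
```

In this toy case F1(B) exceeds F1(A), which is why reporting only precision and accuracy can be misleading when recall drops.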
======== AFTER REBUTTAL ===========
Thanks for the rebuttal and answers. Please improve the paper accordingly.

Review 2 (by anonymous reviewer)

(RELEVANCE TO ESWC) Entity linking is of central importance for Text2KB.
(NOVELTY OF THE PROPOSED SOLUTION) The authors propose a neural-based model for implicit entity linking based on ad-hoc retrieval.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The paper is sound and the formal model seems to be correct. However, some of the configurations used are not made explicit.
(EVALUATION OF THE STATE-OF-THE-ART) A comparison with the state of the art is given. The choice of the entity linker is not well motivated.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Nice discussion of the results. The choice of the statistical test must be clarified.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) No code nor link to results.
(OVERALL SCORE) Formalities: The authors seem to have altered the paper template. If this were to be the case, it would be a reason for a reject without review. I'd hence strongly suggest that the authors check the template they are using and ensure it abides by the requirements of the conference. This detail however plays no role in the following evaluation of the technical content of the paper.
Technical assessment: The authors address a relatively new task, i.e., implicit entity linking. The fundamental assumption of the task seems rather hard to quantify. While the authors present the example ‘Then there’s Ethan Hawke and Patricia Arquette, easily the best characters and best performances of the movie’ and claim that it refers to the movie "Boyhood", one could argue that the movie would have been mentioned explicitly (and not the actors) if the movie were really intended here. It is unclear why the authors suggest that the movie, and not the director, the type of acting, the type of movie, etc., is the implicit mention here. It could also be the article [1], which talks about the two actors. How does one judge objectively what an implicit mention is, given that it is implicit? Still, the assumptions underlying the paper are well presented.
After a presentation of the state of the art, the authors present the problem they tackle more formally. They assume that a certain type of entity is being referred to implicitly and address the problem of finding the most likely entity of this type. They model the problem as an ad-hoc document retrieval problem with two phases: candidate retrieval and candidate ranking. In the candidate retrieval step, they find all entity mentions m with ((m, p, o) \in KB OR (o, p, m) \in KB) AND o rdf:type \theta. To this end, they expand the queries via a biased search. The candidate ranking basically uses social content pertaining to resources o of type \theta as documents for o. The task then reduces to matching the candidates from the tweet to the documents. The authors use a Markov Random Field without regularization. As potential function, they use a neural-based approach (NEMS). The entity length definition used is incorrect (marginal): e = [(w_k, t_k), ..., (w_{k+l}, t_{k+l})] stands for an entity of length (l+1). The idea here is to use embeddings to capture the distributional semantics of entities. The final function used for ranking combines NEMS with two previous models by means of a polynomial kernel. While the authors state that they learn weights using SGD, they do not give any further information as to how they parameterize the SGD.
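The 1-hop candidate retrieval condition described above can be sketched as follows. This is a hypothetical illustration over a toy triple set, not the authors' implementation; the predicate and entity names are invented:

```python
# Toy sketch of the 1-hop candidate retrieval condition:
# keep every entity o of type theta such that (m, p, o) or (o, p, m)
# is in the KB for some mention m. Triples and names are invented.
RDF_TYPE = "rdf:type"

def candidate_entities(kb, mentions, theta):
    """Return entities of type theta linked to any mention by one hop."""
    typed = {s for (s, p, o) in kb if p == RDF_TYPE and o == theta}
    candidates = set()
    for (s, p, o) in kb:
        if p == RDF_TYPE:
            continue
        if s in mentions and o in typed:
            candidates.add(o)       # matches (m, p, o)
        elif o in mentions and s in typed:
            candidates.add(s)       # matches (o, p, m)
    return candidates

# Toy KB inspired by the paper's running example.
kb = {
    ("Boyhood", "rdf:type", "wikidata:Film"),
    ("Before_Sunrise", "rdf:type", "wikidata:Film"),
    ("Ethan_Hawke", "starredIn", "Boyhood"),
    ("Boyhood", "director", "Richard_Linklater"),
}

print(candidate_entities(kb, {"Ethan_Hawke"}, "wikidata:Film"))
```

A two-hop variant (cf. question 3 below) would additionally traverse intermediate entities before checking the type constraint.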
The evaluation section is based on data by Perera et al. The authors show in Tables 1 and 2 that their approach outperforms the previous state of the art. I'd suggest that the authors use a uniform number of digits after the decimal point. While the authors claim that their results are significant, they do not state (1) how they performed the t-test and (2) why they assume that a t-test can be used for the data at hand. The error analysis is highly appreciated and well carried out.
The solution provided by the authors addresses the problem at hand in an interesting way. The task in itself seems ill-defined but given that this definition stems from previous work, this cannot be held against the authors. Some of the assumptions made (e.g., the existence of documents pertaining to resources in the reference KB) only hold for a small portion of the resources available on the Linked Data Cloud. Hence, while the approach is viable for such resources, it will not work for a large number of resources.
[1] http://www.independent.co.uk/arts-entertainment/films/features/ethan-hawke-and-patricia-arquette-interview-growing-up-in-public-9584412.html
Questions
1- Choice of TagMe: The study cited by the author is from 2013. Newer results such as those published at http://gerbil.aksw.org/gerbil/overview suggest different results. Was there any other reason for choosing TagMe?
2- Have you measured the influence of TagMe on the total performance of your tool?
3- The condition ((m, p, o) \in KB OR (o, p, m) \in KB) AND o rdf:type \theta describes a 1-hop model. Have you considered a two-hop model?
4- What does S stand for in Eq. (8)?
5- Please clarify how and why you preferred a t-test over a non-parametric test.
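To make the t-test concern concrete, a minimal sketch with invented per-query scores (not data from the paper): a paired t-test assumes the score differences are roughly normally distributed, whereas a non-parametric alternative such as the sign test uses only the signs of the differences:

```python
import math
from statistics import mean, stdev

# Hypothetical per-query scores for two systems (NOT from the paper).
a = [0.62, 0.71, 0.55, 0.80, 0.68, 0.74, 0.59, 0.77]
b = [0.58, 0.69, 0.50, 0.79, 0.61, 0.70, 0.57, 0.72]

diffs = [x - y for x, y in zip(a, b)]

# Paired t-test statistic: assumes the differences are (approximately)
# normally distributed -- the assumption the review asks to justify.
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# A simple non-parametric alternative (sign test): counts how many
# differences are positive, making no normality assumption.
positives = sum(d > 0 for d in diffs)

print(f"t statistic = {t_stat:.2f}, positive differences = {positives}/{n}")
```

With few queries or skewed differences, the normality assumption behind the t statistic is questionable, which is exactly what the question asks the authors to address.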
Minor
With unbound number of classes => With an unbound number of classes
user generated content => user-generated content
state of the art method => state-of-the-art method
that would identify => that can identify
that could be relevant => that are (potentially) relevant
wikidata:Film dbp:label Film => Punctuation missing?
Please fix "‘Then there’s Ethan Hawke and Patricia Arquette, easily the best characters and best performances of the movie’" on page 8. You can use line breaks in \texttt if you are using TeX.
++ After rebuttal
I thank the authors for their rebuttal and their answers. Please do include those points into the final version of the paper.

Review 3 (by Giuseppe Rizzo)

(RELEVANCE TO ESWC) The paper fits very well into the scope of ESWC/NLP&IR track.
(NOVELTY OF THE PROPOSED SOLUTION) The idea is novel and the approach is quite innovative; however, a few details of the solution should have been given in more depth (see below).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) A few weaknesses, mainly on the protocol used to create examples (see below)
(EVALUATION OF THE STATE-OF-THE-ART) Thorough analysis of the state-of-the-art.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) A few questions arose when reviewing (see below)
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Likewise, a few inputs needed to reproduce the experiments are missing (see below)
Strengths
- the paper is well written and clear with a good balance between motivations/claims, formulation and experimental verification
- use of embeddings to cope with linking towards entities not observed in the original tweets. The generation of the embeddings is performed using a loss function proposed by the authors in this paper. The embeddings are then generated for each gold standard, which is a clear advantage, allowing the approach to potentially adapt to any domain
- higher accuracy wrt the baseline, even though the authors haven't specified whether the baseline has also been tested in linking tweets to reviews; please verify and elaborate in the camera-ready paper if accepted.
Weaknesses
- for the candidate selection the authors rely on a black box, namely TagMe, whose effects on the approach haven't been assessed and studied thoroughly. We can indeed observe a drop in performance wrt the baseline on this specific sub-task.
- the hyper-parameters of the neural network used to generate the embeddings haven't been reported in the paper. Similarly, the authors have only illustrated the neural network in Fig. 2 but haven't described what sort of network is used (feed-forward?)
- the experimental setup has been built around two gold standards, namely Movie and Book, which are an extension of the gold standards proposed by Perera et al. [15]. The original annotated datasets are publicly available while the modified versions are not, which is a shortcoming, given that this paper might represent a reference work for implicit entity linking from tweets to reviews
- lack of a description of the protocol used to extend the original gold standards, which is a shortcoming, as it is unclear what has driven the annotation to link a given tweet to a review
======== AFTER REBUTTAL ===========
Thanks for the answers. Please improve the paper accordingly in the camera ready.

Metareview by John McCrae

The authors present an approach to ad-hoc document retrieval that is heavily based on the work of Perera et al., and there are some questions as to how large a delta this work represents. Furthermore, it seems that the task is itself quite hard to define and the authors again rely on previous work; this task could be more clearly motivated. However, the authors present a result that does seem to improve over the baseline, and the evaluation is sufficiently thorough. Furthermore, the paper is well written and the task is certainly interesting.
