# Paper 66 (Research track)

Topic-Controlled Unsupervised Mutual Enrichment of Relational Document Annotations

Author(s): Felix Kuhr, Bjarne Witten, Ralf Moeller

Full text: submitted version

Abstract: Knowledge graph systems produce huge knowledge graphs representing entities and relations.
Annotating documents with parts of these graphs to have symbolic content descriptions representing the semantics of documents ignores the authors’ higher purpose in mind.
Authors often paraphrase words and use synonyms encoding the semantics of text instead of explicitly expressing the textual semantics.
Hence, it is difficult to annotate documents with entities and relations from generic knowledge graphs.
In this paper, we present an unsupervised approach identifying annotations for documents using annotations of related documents representing a symbolic content description including the authors’ higher purpose in mind and introduce an EM-like algorithm iteratively optimizing the document-specific annotations.

Keywords: semantic computation; unsupervised text annotation; annotation database enrichment

Decision: reject

Review 1 (by anonymous reviewer)

(RELEVANCE TO ESWC) yes, because it is part of the proposed topic list for this specific track.
(NOVELTY OF THE PROPOSED SOLUTION) Novel approach, as it is the first published EM-like algorithm solving the introduced challenge.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed solution seems to be complete as well as correct.
(EVALUATION OF THE STATE-OF-THE-ART) Nice introduction to the general areas this paper covers. There is no evaluation of other state-of-the-art papers, but with this kind of related work section (giving a broader overview of the different research areas this paper covers), this is fine.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The discussion and demonstration are sufficient; the authors used a nice and understandable use case to evaluate their approach.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Based on the detailed description of the two introduced similarity measures and the EM-like algorithm, all experiments should be reproducible. However, the paper would benefit if the authors published the source code as well as the data underlying the evaluation.
(OVERALL SCORE) The authors introduced a topic-controlled approach for the unsupervised addition of knowledge to an existing knowledge graph.
To do so, the authors introduced two similarity measures (based on previously extracted topics): one for the similarity between documents and one based on annotations.
Finally, the authors introduced an EM-like algorithm to enrich existing annotations with new facts.
I liked the detailed and clear description of the similarity measures and the newly introduced algorithm (Section 3).
I would have liked the corpora used, as well as the implementation, to have been published.
I also would have liked an improved version of Table 1 (Example of associative annotations of four documents). I found it quite hard to understand which information already existed and which information was added by the enrichment algorithm.

Review 2 (by Guillermo Palma)

(RELEVANCE TO ESWC) The enrichment of knowledge graphs with annotations from documents with specific content and considering the semantic meaning of words within the text of documents, is relevant to ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) This paper presents an iterative annotation enrichment algorithm for annotation databases, based on a novel topic-controlled approach using two similarity measures (D-Similarity and G-Similarity) in an iterative EM-like algorithm.
This paper introduces two new similarity measures:
D-Similarity computes the relatedness between two documents using the similarity of the documents' topics.
G-Similarity computes the relatedness between a document and a set of documents.
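To make the idea of a topic-based document similarity with values in [0, 1] concrete, here is a minimal sketch. It is not the paper's definition of D-Similarity, which is not reproduced in this review. As an assumption, it uses one minus the Hellinger distance between two topic distributions, and the function names `hellinger_similarity` and `related_documents` and the threshold parameter `tau` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def hellinger_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Return 1 - Hellinger distance between two topic distributions.

    Assumption: this is a stand-in for a topic-based document similarity;
    it is not the formula used in the reviewed paper.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    distance = math.sqrt(squared) / math.sqrt(2.0)
    # Clamp to [0, 1] to guard against floating-point drift.
    return min(1.0, max(0.0, 1.0 - distance))


def related_documents(
    target: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    tau: float,
) -> List[str]:
    """Return names of documents whose similarity to target is at least tau."""
    return sorted(
        name
        for name, topics in corpus.items()
        if hellinger_similarity(target, topics) >= tau
    )


if __name__ == "__main__":
    # Hypothetical three-topic distributions, for illustration only.
    corpus = {"d1": [0.7, 0.2, 0.1], "d2": [0.0, 0.0, 1.0]}
    print(related_documents([0.7, 0.2, 0.1], corpus, tau=0.75))
```

Passing the threshold as an argument rather than fixing it inside the function makes it easy to study how the set of related documents changes with the threshold.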
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The authors demonstrated that the iterative annotation enrichment algorithm proposed is correct and terminates for a finite set of documents as input.
(EVALUATION OF THE STATE-OF-THE-ART) This paper does not cover state-of-the-art techniques for the enrichment and integration of knowledge graphs based on semantic similarity measures, for example MINTE [1].
[1] MINTE: semantically integrating RDF graphs. WIMS, 2017.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) This paper introduces an iterative EM-like algorithm to identify the annotations describing the semantic meaning of a document. The proposed algorithm and the two similarity measures introduced are well described. The complexity of the proposed algorithm is studied. This paper presents the soundness and correctness of the proposed techniques.
On page 5 the paper indicates that for two documents de and dk, the D-similarity SimD(de, dk) \in [0, 1].
From the definition of G-similarity, equations 2 and 3 on page 6, I conclude that the similarity value SimG(ge, gk) \in [0, 6]. ERV is defined in equation 4 on page 7. Why does the G-similarity value (SimGt) have a greater weight in equation 4?
Regarding Algorithm 1 (Iterative Annotation Enrichment):
1) The variable th (line 3) is not used.
2) Typo in the letter G used in G^de in line 14.
3) The variable \tau = 0.75 was defined in line 3 and it is the threshold of D-similarity used in line. Why does D-similarity have a threshold of \tau = 0.75? Why is \tau not an input variable?
4) In line 13 ERVt appears as a variable, but ERV is defined as a function in equation 4. If ERVt is a variable, what is the initial value of ERVr?
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experimental evaluation comprises only one case study: 50 Wikipedia articles in the German automotive industry.
The results of iteratively enriching the annotation database show low average values for the true positive rate (tpr) and the positive predictive value (ppv).
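For reference, the two measures can be computed from sets of annotations as in the following minimal sketch. It uses the standard definitions tpr = TP / (TP + FN) and ppv = TP / (TP + FP). Representing annotations as (subject, predicate, object) tuples, the function name `tpr_ppv`, and the example triples are assumptions made for illustration; they are not taken from the paper's evaluation.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def tpr_ppv(predicted: Set[Triple], ground_truth: Set[Triple]) -> Tuple[float, float]:
    """Return (true positive rate, positive predictive value)."""
    tp = len(predicted & ground_truth)  # correctly recovered annotations
    fp = len(predicted - ground_truth)  # predicted but not in the ground truth
    fn = len(ground_truth - predicted)  # in the ground truth but missed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return tpr, ppv


if __name__ == "__main__":
    # Hypothetical annotations, for illustration only.
    truth = {
        ("BMW", "locatedIn", "Munich"),
        ("BMW", "type", "Company"),
        ("BMW", "produces", "Car"),
        ("BMW", "industry", "Automotive"),
    }
    found = {
        ("BMW", "locatedIn", "Munich"),
        ("BMW", "type", "Company"),
        ("BMW", "produces", "Bicycle"),
    }
    print(tpr_ppv(found, truth))
```

Note that an annotation counted as a false positive here is only absent from the ground truth; it is not necessarily incorrect.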
The parameters used in the Iterative Annotation Enrichment algorithm and in the MALLET library are not explained enough.
On page 10 it is indicated: “We choose one document de from the corpus D and remove 85% annotations of the corresponding ADB ge”
Are the removed annotations in at least two different ADBs?
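The hold-out step quoted above could look roughly like the sketch below. How the paper selects the annotations to remove is not stated in this review, so uniform random sampling with a fixed seed is an assumption, as are the function name `hold_out` and its parameters.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def hold_out(
    annotations: List[T], fraction: float = 0.85, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Split annotations into (kept, removed), removing the given fraction.

    Assumption: annotations are removed uniformly at random; the reviewed
    paper may use a different selection strategy.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    n_removed = round(len(annotations) * fraction)
    removed_idx = set(rng.sample(range(len(annotations)), n_removed))
    kept = [a for i, a in enumerate(annotations) if i not in removed_idx]
    removed = [a for i, a in enumerate(annotations) if i in removed_idx]
    return kept, removed


if __name__ == "__main__":
    # Hypothetical annotation identifiers, for illustration only.
    adb = [f"annotation_{i}" for i in range(20)]
    kept, removed = hold_out(adb)
    print(len(kept), len(removed))
```

The removed annotations then serve as the ground truth that the enrichment step is expected to recover.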
On page 10 it is indicated: “In a third step, we infer the topic distribution for document de …”
What was the method used to infer the topic distribution?
On page 10 it is indicated: “we use a small D-similarity of 0.20 ..”
Is the D-similarity value of 0.20 the same value used for the variable \tau in line 10 of Algorithm 1?
On page 11 it is indicated: “After applying IE techniques to extract the directly extractable data from the text of the documents in D”
What was the IE technique applied to extract data from the text of the documents studied?
Regarding the database enrichment explained on page 11: Figures 1 and 2 present the number of iterations performed by Algorithm 1 in the iterative ADB construction.
Did all 50 documents require the same number of iterations of Algorithm 1 for the ADB construction?
G-similarity plays a fundamental role in Algorithm 1 (Iterative Annotation Enrichment). Why does the experimental study not include a study of the performance of Algorithm 1 with different values of G-similarity?
On page 9, it is indicated: “Applying Algorithm 1 to each document in D leads to the complexity O(n^3*m^2). Obvious, in practise the number of documents dk ∈ D (n’), being similar to document de, is small (n’ << n) and the rank of the similarity matrix M (m’) is small, too (m’ << m). Furthermore, the number of iterations for each document is only a fraction of n (see Section 4).”
But in Section 4 the number of similar documents m’ is not reported for the 50 Wikipedia documents of the case study. Furthermore, the total number of annotations (n) and the number of annotations associated with each Wikipedia document (n’) are not reported.
(OVERALL SCORE) Strong Points (SPs)
* This paper presents a theoretical formalism for the proposed approach.
* An EM-like algorithm that introduces a topic-controlled approach for the iterative enrichment of document annotation databases.
* The annotations identified by Iterative Annotation Enrichment algorithm that do not correspond to the ground truth are not necessarily incorrect.
Weak Points (WPs)
* The experimental evaluation comprises only one specific domain.
* The effect of G-similarity on the results is not discussed.
* Parameters that impact the quality of the results of the proposed Iterative Annotation Enrichment algorithm are not discussed.
* The values obtained of true positive rate (tpr) and positive predictive value (ppv) are low.
* As indicated above, the experimental study has weaknesses, which makes the reproducibility of the results difficult.
* Typo on page 6, in equation (2): "3 if (si = sj /\ pi = pj /\ oi = pj)" should read "oi = oj" instead of "oi = pj".
----- AFTER REBUTTAL PHASE -----
I would like to thank the authors for their responses. Many of my concerns were answered. However, in the proposed algorithm the G-similarity values are important for the quality of the results, and the experimental study does not include an evaluation with different G-similarity values.

Review 3 (by Roberta Cuel)

(RELEVANCE TO ESWC) The structure of the paper is ok,
the case study seems very interesting
(NOVELTY OF THE PROPOSED SOLUTION) I'm not an expert on that
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) the case study and the discussion should be improved
(EVALUATION OF THE STATE-OF-THE-ART) There are some content overlaps with the following paper
www.ifis.uni-luebeck.de/.../tx_wapublications/ki2017_ikbc_workshop_public.pdf
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) it seems ok, but more details should be provided in particular on drawbacks
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) I'm not an expert on that
(OVERALL SCORE) I'm not an expert, but the paper is well organized, and the results seem interesting

Review 4 (by Brian Davis)

(RELEVANCE TO ESWC) The paper is extremely relevant to the ML track.
(NOVELTY OF THE PROPOSED SOLUTION) The novelty lies in the application of unsupervised text annotation using an expectation–maximization (EM) algorithm for topic-controlled enrichment of an annotation database of Wikipedia documents. What is interesting is, of course, that the authors claim that such annotations are not otherwise extractable by existing ontology-based IE approaches and, more importantly, that they take an unsupervised approach.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The EM algorithm is extremely well described and builds strongly on the authors' previous work on unsupervised text annotation. The solution attempts to provide associated and relevant annotations to facts in the knowledge backbone, i.e. DBpedia. However, given the original argument regarding paraphrasing and nominal coreference (the US President examples, etc.), the experimental results do not indicate whether these entity tracking issues are solved by your approach, although relevant annotations are produced within the BMW context. So the evidence does not match the claim, in my opinion.
(EVALUATION OF THE STATE-OF-THE-ART) There is a good knowledge of the state of the art, but a proper IE system will also include some entity tracking/coreference in text.
Not all IE systems are handcrafted, but you are correct that they rely on some supervised intervention. Indeed, SW datasets are lacking in rich lexical information (although this is changing with ongoing efforts in ontology lexicalisation), but a good IE system should attempt to augment its internal language resources with external semantic knowledge (ontology-aware dictionaries) or online thesauri, again all supervised or crafted.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The algorithm is very well described, but I am not sure the evaluation is rigorous enough. It is not yet clear to me how generalisable the approach is beyond the BMW example. The experiment does not seem to go beyond this brand of car or to other classes of car. Also, calling this a case study is somewhat misleading, as this would in my opinion involve some requirements informed by an external stakeholder or the study of a naturally occurring event, though one could argue that Wikipedia is a crowdsourcing event. This is really a preliminary experiment. I feel the example dataset is too narrow to make a broad claim on the effectiveness of the approach. Please justify why only the BMW brand was used.
In addition, can you categorise the types of associative annotations, and say when some become irrelevant? I am not sure if I am missing this, but apart from examples it would be interesting to know what types of associative annotations you generate. Are they synonyms, pronouns/referents, paraphrases?
Are paraphrases and multi-word expressions lost, or does the EM algorithm still capture them as topics? This should be the case, but it is not clear, since the examples seem to be of one-token length.
Can the algorithm cope with co-referents such as "his wife" or "the former president", or at this present stage are you finding single-word synonyms? This is ok, but it would be good to clarify the limitations, if any. It appears to be handling variations of the BMW acronyms.
Another clarification needed is whether you manually checked the true positives as being DBpedia entries, or whether you were exploiting the anchor text, infoboxes, or something else. This isn't mentioned explicitly in Section 4, only that you took 50 articles.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experiment is somewhat replicable given the thorough description of the algorithm (see above for comments), but there are no links to online documentation or datasets for inspection.
(OVERALL SCORE) The EM algorithm is extremely well described and builds strongly on the authors' previous work on unsupervised text annotation. The solution attempts to provide associated and relevant annotations to facts in the knowledge backbone, i.e. DBpedia. However, given the original argument regarding paraphrasing and nominal coreference (the US President examples, etc.), the experimental results do not indicate whether these entity tracking issues are solved by your approach, although relevant annotations are produced within the BMW context. So the evidence does not match the claim, in my opinion.
This is a good paper and the research direction is promising and should be encouraged, but the experimental results are lacking and do not convince me that it is quite ready for publication.
Good Points
The contribution is important with respect to pushing annotation beyond NER using unsupervised techniques.
Well-written paper overall.
Good description of the algorithm in Section 3.
Weak Points
The claims regarding linguistic issues described in the Section 1 examples (Obama etc.) do not seem to be resolved by your experiments. This is ok, but it should be made clear that you are tackling a subset of these problems.
The related work is missing many other mentions of NLP tools for ontology-aware IE, semantic annotation and entity linking, and some better positioning of your unsupervised annotation approach relative to the state of the art, notably other unsupervised learning approaches for IE.
It seems at first glance that some of the content in this paper borrows heavily, in parts, from reference [1] http://ceur-ws.org/Vol-1928/paper2.pdf. Also, the experiment is quite similar, which raises the question: what is the delta between this submission and [1], and how large is it?
Questions
1) Please differentiate between this work and [1]
2) I feel the experimental dataset is too narrow to make a broad claim on the effectiveness of the approach. Please justify why only the one BMW brand was used.
3) Please provide more details on the limitations of the types of associated annotations discovered. See above.
4) Did you manually check the true positives as being DBpedia entries, or were you exploiting the anchor text, infoboxes, or something else? This isn't mentioned explicitly in Section 4, only that you took 50 articles. Are these introduced to the EM algorithm at all?

Metareview by Achim Rettinge

The authors extend their previous work on extracting graph-based document representations by expanding each with related documents. While previous approaches mostly exploit overlaps across documents to improve the extraction quality of a central graph, this paper utilizes them to identify related documents to expand document specific graphs. This is novel and closer to related work on event extraction, which tries to identify common graphs over a set of related documents.
The reviewers' concerns are mainly twofold:
1. Does such an expanded representation live up to the benefits claimed in the introduction? The empirical evidence presented seems not to match the general claim.
2. It seems hard to reproduce the results, and the generality of the outcomes is unclear.
The authors' response did not sufficiently clarify the issues, and the overall assessment remains a weak reject, even if the scoring system does not reflect this sufficiently, and also in light of the reviews of other papers. We therefore recommend a reject.
