Paper 121 (Research track)

How Truth Discovery can benefit from RDF Knowledge Bases and vice versa

Author(s): Valentina Beretta, Sébastien Harispe, Sylvie Ranwez, Isabelle Mougenot

Full text: submitted version

Abstract: This study leverages the information richness of RDF Knowledge Bases (KBs) to improve Truth Discovery models. These models address the problem of identifying facts when conflicting claims are provided by several sources. Assuming that true claims are provided by reliable sources and that reliable sources provide true claims, they iteratively compute value confidence and source trustworthiness in order to establish which claims are true. We propose a model that benefits from the knowledge expressed by an existing RDF KB in the form of rules quantifying the evidence that supports a claim. This quantity is then used to improve the value confidence estimation. Enhancing truth discovery performance makes it possible to efficiently obtain a larger set of reliable facts, which can reciprocally be used to populate RDF KBs. Empirical experiments on synthetic datasets show the potential of our approach.
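To make the iterative computation the abstract refers to more concrete, here is a minimal Python sketch of a Sums-style truth discovery loop (value confidence as the sum of source trustworthiness, and vice versa, with normalization). It is an illustrative sketch only, not the paper's exact rule-boosted formulation; the data structure and names are assumptions.

```python
# Minimal sketch of a Sums-style truth discovery iteration (illustrative only;
# not the paper's exact rule-boosted formulation).

def sums_iteration(claims, n_iter=20):
    """claims: dict mapping a candidate value -> set of sources asserting it."""
    sources = {s for srcs in claims.values() for s in srcs}
    trust = {s: 1.0 for s in sources}   # source trustworthiness
    conf = {v: 0.0 for v in claims}     # value confidence

    for _ in range(n_iter):
        # A value's confidence is the sum of its sources' trustworthiness.
        for v, srcs in claims.items():
            conf[v] = sum(trust[s] for s in srcs)
        # A source's trustworthiness is the sum of its values' confidence.
        for s in sources:
            trust[s] = sum(c for v, c in conf.items() if s in claims[v])
        # Normalize to keep scores bounded across iterations.
        max_c, max_t = max(conf.values()), max(trust.values())
        conf = {v: c / max_c for v, c in conf.items()}
        trust = {s: t / max_t for s, t in trust.items()}
    return conf, trust

# Example: two sources back "Paris", one backs "Lyon" for the same fact;
# "Paris" ends up with the higher confidence.
conf, trust = sums_iteration({"Paris": {"s1", "s2"}, "Lyon": {"s3"}})
```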

Keywords: Truth Discovery; RDF KBs; Rule Mining; Source Trustworthiness; Value Confidence

Decision: reject

Review 1 (by Martin Kaltenböck)

(RELEVANCE TO ESWC) The paper presents research on applying rule mining techniques to predict the truth values of RDF triples. The problem of assessing the veracity of data is relevant to the semantic web. Moreover, many knowledge bases are openly available and could be used for rule mining.
(NOVELTY OF THE PROPOSED SOLUTION) Rule mining techniques have a long history of application for predicting values in the database community. However, the authors go a step further and combine rule mining with truth discovery. In this combination, the two parts are computed independently and combined at a later stage in a formula that produces the final result.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The solution is not self-contradicting. Its correctness can be judged from the results of the experiments. The authors note that, to the best of their knowledge, comparable methods do not exist. In this respect, we cannot judge whether the proposed way of integrating rule mining into truth discovery is the best one.
(EVALUATION OF THE STATE-OF-THE-ART) The relevant related work is described in section 2.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Though the paper is well written and, as far as I can judge, contains no mistakes, its readability and comprehension could benefit from a running example starting in section 2. The definitions and descriptions could be shorter and, possibly, simplified. For example, in Definition 3 the eligible rule is defined for a claim; this becomes clear from the context but is not stated explicitly.
Section 3.3 is called "Assessing claim confidence...", while Section 3.4 is called "Value confidence computation". However, a value is part of a claim, and the reader has to infer from the context what is being done.
The discussion of results in Section 4.2 is extensive and outlines many interesting conjectures about the properties of the proposed method. Figures 2 and 3 could probably contain less information and instead highlight the most important results. In my opinion, a table summarizing the results would also benefit the paper.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The dataset and the source code are available at a public repository.
However, the dataset is hardly described in the paper; only a reference to previous work by the authors is given. The paper would benefit from some basic characteristics of the dataset.
(OVERALL SCORE) The paper introduces a novel method that combines rule mining with truth discovery techniques. Rule mining is used to extract certain recurrent patterns in the data; for this purpose, various knowledge bases are available on the web. The authors introduce formulas to compute the score and the boosting factors for the values predicted by rules. These scores are then combined with the trustworthiness of the sources of claims. The problem is relevant, and the approach is well motivated and described in the paper. The authors conduct several experiments and provide an extensive discussion of the results. The results show a large improvement over the baseline for several datasets and scenarios. However, apart from the baseline, there are no other methods in the experiments.
The strong and weak points are outlined in the justifications above.
Questions to the authors:
1. In this case, the number of rules was limited manually (47 and 62). In a possible application scenario, who would limit the number of rules or select the best ones?
2. The examples of rules in Table 2 are not very convincing; for example, if a person died in a certain place, it is not that common that the person was also born in that place, especially for the people mentioned in DBpedia. Are there many such rules? Does the possible "incorrectness" (in the everyday meaning) of the rules affect the results?


Review 2 (by Catherine Faron Zucker)

(RELEVANCE TO ESWC) This paper addresses the problem of truth discovery (TD), which consists in identifying the most reliable triples among a set of conflicting ones, with the ultimate aim of constructing high-quality knowledge bases from several potentially conflicting sources. The authors propose an approach to improve a state-of-the-art truth discovery model with domain knowledge (ontological rules) to evaluate the confidence of triples (depending on the proportion of rules that confirm a triple); these rules are mined from existing RDF sources on the LOD, and their quality is measured to enable the selection of relevant ones.
The paper is quite relevant to the conference.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed approach is a combination of existing solutions to score rules and to score facts.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The experiments show the added value of using (scored) rules to score the truth of facts and decide whether to integrate them into the integrated high-quality KB. But the proposed metrics to score rules mined from available RDF datasets are not evaluated: they should be compared to state-of-the-art metrics used to score rules (or the experiments should show that those metrics are not relevant for the specific purpose of selecting rules to score facts).
(EVALUATION OF THE STATE-OF-THE-ART) The state of the art on learning ontologies from the LOD, where rules/axioms are scored in order to decide whether or not to integrate them into an ontology, could be investigated, e.g. papers by Lorenz Bühmann and Jens Lehmann. The proposed metrics could be compared with these, and state-of-the-art metrics could be reused.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The properties of the approach are not really discussed.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The approach has been tested on two artificial datasets which are available online.
(OVERALL SCORE) This paper addresses the problem of truth discovery (TD), which consists in identifying the most reliable triples among a set of conflicting ones, with the ultimate aim of constructing high-quality knowledge bases from several potentially conflicting sources. The authors propose an approach to improve a state-of-the-art truth discovery model with domain knowledge (ontological rules) to evaluate the confidence of triples (depending on the proportion of rules that confirm a triple); these rules are mined from existing RDF sources on the LOD, and their quality is measured to enable the selection of relevant ones. The approach has been tested on two artificial datasets.
The paper is quite relevant to the conference.
The datasets for the experiments are available online. An additional experiment on real-world datasets would be valuable.
The experiments show the added value of using (scored) rules to score the truth of facts and decide whether to integrate them into the integrated high-quality KB. But the proposed metrics to score rules mined from available RDF datasets are not evaluated: they should be compared to state-of-the-art metrics used to score rules (or the experiments should show that those metrics are not relevant for the specific purpose of selecting rules to score facts). In both cases, the state of the art on learning ontologies from the LOD, where rules/axioms are scored in order to decide whether or not to integrate them into an ontology, could be investigated, e.g. papers by Lorenz Bühmann and Jens Lehmann. The proposed metrics could be compared with these, and state-of-the-art metrics could be reused.
The Sum truth discovery model should be introduced and presented to make the paper self-contained and the choice of it should be motivated.
The motivation for considering a partial order on values should be revised: the fact that it was taken into account in previous work, whose topic is not even mentioned, is not sufficient. The relevant parts of reference 3 should be summarized to make the paper self-contained, both on taking the partial order into account and on generating an artificial dataset for the experiments.
Section 4.2 is a little indigestible.
The paper should be revised by a native English speaker.
***
After rebuttal: I initially missed the indication of the URL where some materials can be found online. Sorry for that. I have updated my review accordingly.


Review 3 (by Steffen Staab)

(RELEVANCE TO ESWC) Determining the truth value of triples in knowledge bases is a major concern for the semantic web.
(NOVELTY OF THE PROPOSED SOLUTION) The paper extends existing methods for judging the truth value of an object given a subject and a predicate. The authors come up with a nice refinement based on learned rules. As a minor modification, they use Bayes updates with pseudocounts to reflect initial knowledge about the distribution of spurious information.
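As a rough illustration of the pseudocount idea mentioned here (a Beta-Binomial-style smoothed estimate; the exact prior and quantities used in the paper may differ), consider:

```python
# Pseudocount ("prior counts") smoothing of an observed rate, e.g. the fraction
# of eligible rules that approve a claim. alpha and beta encode prior belief
# about how often spurious values would be supported by chance (illustrative).
def smoothed_rate(successes, trials, alpha=1.0, beta=1.0):
    return (successes + alpha) / (trials + alpha + beta)

# With few observations the estimate stays near the prior mean alpha/(alpha+beta);
# with many observations it approaches the raw rate.
print(smoothed_rate(2, 3))      # 0.6 with a uniform prior (raw rate 0.67)
print(smoothed_rate(200, 300))  # ~0.67, dominated by the data
```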
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed method appears to be sound.
However, the suggested evaluation falls short of investigating the topic in real-world knowledge bases. It remains unclear to what extent such an approach would help in a realistic setting. The evaluation does, however, show an improvement over related approaches, though on artificially created knowledge bases.
(EVALUATION OF THE STATE-OF-THE-ART) All important work is considered.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The evaluation targets a discussion of the behavior of the suggested mechanism, which it provides under the constraint of having only artificial datasets. I think the artificial datasets should be one point of evaluation, but others should include a realistic dataset.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Very clear. Examples help to understand core notions such as eligibility and approval.
(OVERALL SCORE) Weak Points:
- I found the abstract and introduction entirely misleading. E.g., the abstract talks about computing the trustworthiness of sources, which is not done in this paper at all. I also found the introduction rather useless. I think that finding errors in triples should be a clear enough reason to pursue such an approach, and all the extra text about schema mapping, duplicate recognition, etc. is rather misleading. Also, the title does not describe the paper well (where is the "vice versa" in the paper?).
- Evaluation (see above)
- not a big step compared to previous approaches
Strong points:
- Conceptually nice approach
- Improvement of existing models


Review 4 (by anonymous reviewer)

(RELEVANCE TO ESWC) The paper fits the conference topics.
(NOVELTY OF THE PROPOSED SOLUTION) The paper focuses on an interesting problem. However, it presents limited novelty, the main contribution being a proposal for combining the values of the different metrics. Also, the improvement in terms of experimental results is not that impressive, even if it appears promising.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) There are some aspects that need to be fixed, as well as additional work that should be taken into account.
As regards the first issue, see details below.
When presenting (2), the implicit assumption is that no fresh variables are introduced in the head and that all atoms are transitively connected. These assumptions should be made explicit when presenting the metric. 
As regards (3), is p(x,j) the actual occurrence in the KB? If so, it is not clear why the OWA is taken into account. For the same reason, it is not clear why, as reported after presenting (3) when discussing normalization, this metric is able to take false facts into account whereas support only takes true facts into account. Similarly, when support is computed, does the number of pairs (x,y) refer to facts asserted in the KB, or does it also include facts that may be derived from the KB? I am aware that this is an already published result, but if something is reported in the paper it needs to be clear and self-contained.
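To make the distinction concrete, the following toy computation counts support and confidence for a rule diedIn(x,y) => bornIn(x,y) using only triples asserted in the KB; it follows standard rule-mining definitions and is not necessarily the paper's exact metric (3).

```python
# Toy support/confidence for the rule diedIn(x,y) => bornIn(x,y), counting
# only asserted triples (standard rule-mining style; illustrative only).
triples = {
    ("bob", "diedIn", "paris"), ("bob", "bornIn", "paris"),
    ("eve", "diedIn", "rome"),  ("eve", "bornIn", "milan"),
}

def pairs(predicate):
    return {(s, o) for s, p, o in triples if p == predicate}

body, head = pairs("diedIn"), pairs("bornIn")
support = len(body & head)        # (x, y) pairs satisfying both body and head
confidence = support / len(body)  # fraction of body pairs that also satisfy the head
print(support, confidence)        # 1 0.5
```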
(EVALUATION OF THE STATE-OF-THE-ART) As regards the second issue (metrics for the evaluation of discovered patterns), new metrics for extracting frequent patterns in the form of Horn-like rules from RDF/OWL knowledge bases have recently been proposed (see the references below). Given the importance that the metrics play in this proposal, these recent works should be taken into account, or a deeper discussion of why they have not been considered should be provided.
- Sazonau V., Sattler U. (2017) Mining Hypotheses from Data in OWL: Advanced Evaluation and Complete Construction. In: d'Amato C. et al. (eds) The Semantic Web – ISWC 2017. ISWC 2017. Lecture Notes in Computer Science, vol 10587. Springer, Cham
- Pellissier Tanon T., Stepanova D., Razniewski S., Mirza P., Weikum G. (2017) Completeness-Aware Rule Learning from Knowledge Graphs. In: d'Amato C. et al. (eds) The Semantic Web – ISWC 2017. ISWC 2017. Lecture Notes in Computer Science, vol 10587. Springer, Cham
- d'Amato C., Staab S., Tettamanzi A. G. B., Minh T. D., Gandon F. L. (2016) Ontology Enrichment by Discovering Multi-Relational Association Rules from Ontological Knowledge Bases. SAC 2016: 333-338
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The presented approach is overall reasonable. Some aspects to be fixed have been suggested above.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) As regards the experimental evaluation, it is overall clear and well written. 
--- AFTER REBUTTAL PHASE ---
I acknowledge that I have read the authors' response, in particular the reference to the adopted datasets. I would suggest making this clear when presenting the experimental study as well.
(OVERALL SCORE) The paper focuses on an interesting problem. However, it presents limited novelty, the main contribution being the proposal for combining the values of different metrics. Also, the improvement in terms of experimental results is moderate.
There are some aspects that need to be fixed, as well as additional work that should be taken into account. For details, see the corresponding sections.


Review 5 (by Steffen Staab)

(RELEVANCE TO ESWC) The paper discusses truth discovery in the face of RDF conflicts, i.e., the scenario where one has conflicting RDF triples about the same entity from different data sources. This scenario is very common when trying to interoperate multiple RDF datasets and is therefore relevant to the semantic web and ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) The paper presents novel methods for truth discovery based on Horn rules mined from knowledge bases. The paper's related work section does not discuss any work similar to this approach, nor am I aware of any, although I am not very knowledgeable in this specific subfield. Therefore, the proposed solution seems novel.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The paper provides sound definitions of preliminaries and uses a notation that unifies RDF and truth discovery. For judging the quality of mined rules, standard rule quality metrics are adopted from the field. The definitions of eligible and approving rules seem straightforward. The core contributions of the paper, a method for combining the support and confidence quality measures of a rule into one and for assessing claim confidence using rules, rely on estimates obtained with empirical Bayes (EB) methods. I am not familiar with these methods and can only speculate on whether they are applied correctly here, but their use seems sensible in this context. The modification of how actual truth and credibility values are then calculated using rule scores is applied to a seemingly standard method of the field (Sums & AdoptedSums). However, only the formulas for this base method are given; an explanation is omitted. While this approach demonstrates that incorporating mined rules into truth discovery is possible in this way, it is not clear whether this was the only or the most sensible approach.
(EVALUATION OF THE STATE-OF-THE-ART) The presented method is a generalization of previous methods that do not use rule mining (Sums & AdoptedSums). The paper compares the proposed methods only against these specializations of their own method and not against any other techniques, i.e., not against any other techniques that use rules mined from knowledge bases. While the authors state that they are not aware of any other such methods, a comparison with some straightforward baselines demonstrating the benefit of the empirical Bayes estimations would have been a big plus. Notably absent from the paper is any discussion of how the presented work differs from existing knowledge base completion methods (cf. [1]), which are already used in practice to estimate the veracity of facts.
[1] Nickel, M., Murphy, K., Tresp, V., & Gabrilovich, E. (2016). A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1), 11-33.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper evaluates only on synthetic datasets. To understand the structure of these datasets, one has to read the authors' previous work on this topic, because the details of how the datasets were synthesized are completely absent from this paper. This makes the entire experiments section very hard to follow. The analysis of results reads as just a textual description of the figures without any truly generalizing findings. If there were any, they were well hidden inside page-long, single-paragraph text deserts.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Datasets and model code are published on GitHub. All figures/data should therefore be easily reproducible. The interesting question of how the results generalize to real-world data is left unanswered by the paper. It is not clear whether the synthetic datasets are in any way relevant to the use case. The authors do not describe how their findings could generalize to, or be applied in, other subfields of the semantic web.
(OVERALL SCORE) Summary: The paper discusses truth discovery in the face of RDF conflicts, i.e., the scenario where one has conflicting RDF triples about the same entity from different data sources. The core contributions of the paper are a method of how to combine the standard support and confidence quality measures of a rule into one and how to assess claim confidence using rules. As a proof-of-concept these methods are applied to an existing truth discovery method and evaluated using synthetic data.
Strong points:
- Sound and comprehensive formal definitions.
- Seemingly sensible estimation methods.
- Code and data sets published on GitHub.
Weak points:
- Evaluation using synthetic data.
- Details of data synthesis only available in previous paper.
- Unreadable experiments section with no clear findings.
Notes to authors:
- For me, the paper would have been a clear accept given a stronger and better written evaluation section.
- The formulas for supp and conf were very confusing to me at first, because I am not at all aware of the #(x,y) : \exists z_1, ... notation. A short description would have helped a lot. Also, the notation does not make it clear that z_1 to z_n may take the same bindings as x and y (at least, that was the only way I could make sense of it).
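For reference, a plausible reading of that notation, following the standard rule-mining definitions of support and confidence (the paper's formulas (2) and (3) may differ in detail), is:

```latex
% Standard rule-mining (AMIE-style) support and confidence for a rule
% \vec{B} => r(x,y); shown as a likely reading of the
% #(x,y) : \exists z_1, ... notation, not as the paper's exact definitions.
\begin{align}
  \mathrm{supp}\big(\vec{B} \Rightarrow r(x,y)\big) &=
    \#(x,y) : \exists z_1, \dots, z_m : \vec{B} \wedge r(x,y) \\
  \mathrm{conf}\big(\vec{B} \Rightarrow r(x,y)\big) &=
    \frac{\mathrm{supp}\big(\vec{B} \Rightarrow r(x,y)\big)}
         {\#(x,y) : \exists z_1, \dots, z_m : \vec{B}}
\end{align}
```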


Metareview by Hala Skaf

This submission extends an existing model for judging the truth value of an object given a subject and a predicate.
Reviewers agree that determining the truth value of triples in knowledge bases is related to the topics of ESWC. However, the novelty of the submission is limited; in addition, the evaluation is conducted on artificial knowledge bases, and it is unclear whether the proposed approach would help in a realistic setting. As pointed out by reviewers, some sections of the submission are entirely misleading and the paper needs proofreading.
As a result, the submission does not fulfill the requirements for an ESWC publication in terms of maturity, technical depth, and quality.

