Event-Enhanced Learning for Knowledge Graph Completion
Author(s): Martin Ringsquandl, Evgeny Kharlamov, Daria Stepanova, Marcel Hildebrandt, Steffen Lamparter, Raffaello Lepratti, Ian Horrocks, Peer Kroeger
Full text: submitted version
Abstract: Statistical learning of relations between entities is a popular approach to address the problem of missing data in Knowledge Graphs. In this work we study how this learning can be enhanced with background of a special kind: event logs, that are sequences of entities that may occur in the graph. Such background naturally occurs in many important applications. We propose various embedding models that combine entities of a Knowledge Graph and event logs. Our evaluation shows that our approach outperforms state-of-the-art baselines on real-world manufacturing and road traffic Knowledge Graphs, as well as in a controlled scenario that mimics manufacturing processes.
Keywords: Knowledge Graph; Representation Learning; Event Logs
Review 1 (by anonymous reviewer)
(RELEVANCE TO ESWC) Knowledge graph completion is at the core of the ESWC community's interests.
(NOVELTY OF THE PROPOSED SOLUTION) Judging from the discussion of the background, the proposed technique appears to be novel (and sufficiently different from previous publications).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) I should probably give "strong accept" here as well, but I must honestly admit that I cannot judge the correctness of the approach; the maths is beyond me. It looks carefully crafted.
(EVALUATION OF THE STATE-OF-THE-ART) There is a detailed discussion of related approaches (Section 2).
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The properties of the approach (computational and otherwise) are discussed in detail. Again, since I cannot be sure whether the discussion is sufficient for this kind of approach, I give 4 instead of 5 points.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The two datasets appear to be handpicked and justified only by the authors' opinion of their relevance. But I personally (i.e. subjectively) tend to accept the authors' choice of datasets as reasonable, as I am sure standard benchmarks for event-log enhanced KG completion do not exist yet. However, it remains to be seen how the approach would fare on a wider range of differently shaped input problems.
(OVERALL SCORE) Revised review: No changes were necessary after the authors' rebuttal. The authors present their work on completing Knowledge Graphs through the exploitation of event logs. The paper is very well written, relevant to ESWC and appears to make a significant contribution.
* My only mild concern with the paper is that it could not dispel the typical reviewer's doubt about generalisability: the two datasets are handpicked and justified only by the authors' opinion of their relevance.
But I personally tend to accept the authors' choice of datasets as reasonable, as I am sure standard benchmarks for event-log enhanced KG completion do not exist yet.
* I must admit that I could not follow the formal underpinnings of the method very well (chapter 3), hence the low confidence rating. That is also why I did not tick "Strong accept".
Minor 4.2:
- The explanation of the table headers should be moved from the text to the table to avoid unnecessary jumping back and forth.
- A GitHub link is not a good way to share experimental data and scripts, given the changing nature of the repository contents. Consider using Zenodo, which allows you to take a snapshot of your GitHub repo, and sharing your dataset with a proper DOI (takes a few minutes to do).
Optional
- I am not a fan of using citations like  as subjects or objects in sentences because it forces the reader in all cases to look in the bibliography (even if  was mentioned before). In such cases of direct reference, consider using Smith, John et al. (2017) instead.
- Page 2: Are you sure the  at the end of the example descriptions is the right sign to use? I have sometimes seen it in maths proofs, but not in this context.
Review 2 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper proposes a novel method to complete KGs.
(NOVELTY OF THE PROPOSED SOLUTION) The paper proposes a novel method to complete KGs using event logs.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed solution uses external event logs to learn entity relations for KG completion. The paper is well written and motivated. The contribution is interesting. The authors propose two methods to combine embedding models.
(EVALUATION OF THE STATE-OF-THE-ART) It would be good to also discuss related work in the area of event log mining and BPM.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The experimental evaluation compares the proposed model with baseline approaches, showing the effectiveness of the model. The datasets used are small but taken from real-world use cases. The parameter analysis presented in Section 4.3 is also very good.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) (Section 4.1) More details on how the original KG was split would be useful.
(OVERALL SCORE)
** Summary of the Paper
The paper proposes a novel method to complete KGs using event logs.
** Short description of the problem tackled in the paper, main contributions, and results
The proposed solution uses external event logs to learn entity relations for KG completion. The paper is well written and motivated. The contribution is interesting. The authors propose two methods to combine embedding models.
** Strong Points (SPs)
1 Novel approach
2 Interesting method
3 Great experimental analysis
** Weak Points (WPs)
1 One dataset is proprietary
2 Small datasets
3 Missing related work on event log mining
** Questions to the Authors (QAs)
1 How does this approach relate to business process mining?
Review 3 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper addresses an emerging topic of the Semantic Web.
(NOVELTY OF THE PROPOSED SOLUTION) The solution proposes a new architecture for learning from event-driven KGs by modifying state-of-the-art KG completion approaches.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The formalism and technical details of the proposed solution are clearly presented in the paper.
(EVALUATION OF THE STATE-OF-THE-ART) The paper evaluates its solution against the baseline solutions Trans_E and TEKE_E by implementing them in TensorFlow, with the source code provided on GitHub. The relevant details of the state of the art are also sufficiently discussed.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper gives intuitive explanations of the problem and the approach, which make it easy to understand the properties of the paper's approach.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The implementation with evaluation data is given on GitHub.
(OVERALL SCORE) The paper studies how relational learning can be applied to extract extra knowledge facts from event logs that have relationships with entities in a KG. The paper analyses existing solutions and comes up with modified embedding models that help combine entities of a KG and event logs. These models are evaluated on two datasets, a manufacturing dataset and a traffic dataset, with quite positive results.
***Strong Points***
1) The paper's idea is quite interesting and somewhat innovative.
2) The application domains are convincing and nicely motivate the problem the paper tries to address.
3) The implementation effort is quite impressive and the source code is given on GitHub.
4) The paper is well written, with intuitive examples that help to understand complicated concepts and technical details.
***Weak Points***
1) Some evaluation details are not clear. The dataset descriptions do not give details on how the test datasets were created. I could not find this in the authors' GitHub repository.
The evaluation datasets seem to be subjectively selected. For example, in the manufacturing data, the authors did some pre-processing steps based on their prior knowledge to reduce noise, and I wonder whether this biases the outcome quality. Also, in the traffic dataset, a subset of the data is chosen without a clear explanation of the data distribution and the nature of the selected data elements. The data provided on GitHub looks like simulated or post-processed data to me. I could not find more metadata or provenance information in the data directories, so I wonder whether such details exist.
2) The window parameter is an important hyperparameter that dictates the quality of the learning model. The evaluation in Table 3 confirms this observation. However, it was chosen subjectively: in the manufacturing dataset, window sizes were given without clear justification, and no information on window size is given for the traffic dataset.
3) Learning from event sequences seems to fit better with recurrent neural networks, which are commonly applied to time-series data such as event/sensor data, but the paper does not touch on this, even in the related work discussion, so I wonder why.
***Questions to the Authors***
1) Please clarify the evaluation details raised in the weak points.
Metareview by Achim Rettinger
The authors present a model to integrate external knowledge from event logs for knowledge graph completion. All reviewers agree that the work is relevant and novel. Only slight doubts remain concerning the reproducibility and generalisability of the empirical results. The authors proposed resolutions for the reviewers' concerns in the author feedback and are requested to implement them in the final version of the paper. There remain no arguments against acceptance of this paper.