**Modeling Relational Data with Graph Convolutional Networks**

**Author(s):** Michael Schlichtkrull, Thomas Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling

**Full text:** submitted version

**Decision:** accept

**Abstract:** Knowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBpedia or Wikidata) remain incomplete. We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to handle the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a stand-alone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved through the use of an R-GCN encoder model to accumulate evidence over multiple inference steps in the graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.

**Keywords:** link prediction; neural network; knowledge graph; deep learning

**Review 1 (by Dagmar Gromann)**

(RELEVANCE TO ESWC) Both the novel method as well as its application tasks are well within the scope of this conference. Categorizing entities into their superordinate classes as well as detecting new relations in knowledge bases are scenarios for the proposed method that are interesting for this community. (NOVELTY OF THE PROPOSED SOLUTION) To the best of my knowledge this is the first adaptation of GCNs to the domain of knowledge graphs. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Properties of the proposed solution are fully specified and the paper is technically sound. (EVALUATION OF THE STATE-OF-THE-ART) On the entity classification task the proposed solution is compared across four data sets with three other models, and on the task of link prediction the comparison is with six other models across two data sets. The chosen reference implementations are state-of-the-art, as is the whole evaluation. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) While the model outperformed other implementations on most tasks, other models performed better on two entity classification data sets. The authors offer a detailed discussion on potential causes and future improvements. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The description allows for an easy reproduction of the proposed method, even though it would be highly appreciated if the authors decided to share their code via a GitHub link in the paper. (OVERALL SCORE) SUMMARY This very clear and well-written paper proposes the translation of Graph Convolutional Networks (GCNs) to the domain of knowledge graphs in the form of modeling relational data, which to the best of my knowledge is a new approach. In contrast to classical GCNs, relation-specific transformations are introduced, i.e., specific to the type of relation and its directionality, which generates a unique neighborhood for each relation type.
To share parameters across those neighborhood types, basis matrices and block diagonal constraints are introduced. The main contributions of this paper are the introduction of Relational Graph Convolutional Networks (R-GCNs), a regularization method that allows for parameter sharing across the model, and a second regularization technique that introduces a sparsity constraint. One further contribution is the demonstration that the proposed R-GCN can successfully operate on large relational data sets. The proposed model is tested on an entity classification and a link prediction task and shows good performance. The evaluation is sound and very thorough in terms of related approaches, baselines and diversity of datasets. Where the model is outperformed by related methods, the authors offer a thorough discussion of potential causes and concrete suggestions for future improvements. STRONG POINTS 1) In terms of originality, the application of R-GCNs to knowledge graphs and the two experimental tasks is novel 2) Both description and evaluation of the approach are very thorough and clear 3) The properties of the model are well situated within related work and clearly explained as well as properly evaluated against similar models in two tasks that are important to this community 4) Style and structure of this paper are very clear and easy to follow WEAK POINTS 1) Some design decisions could be discussed in more detail, e.g. number of layers QUESTIONS TO AUTHORS 1) In the experiments, results with two-layer architectures have been reported. Were deeper architectures tested? It would be interesting to know whether a higher number of layers made a difference. 2) Could you explain the impact and functioning of the self-connection of a relation type to each node in more detail? 3) Where does the 29.8% improvement of R-GCN over DistMult come from? Maybe the authors could explain a little where this value comes from, since I do not find it straightforward to gather it from the presented table.
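The relation-specific propagation with basis-decomposition parameter sharing that this review summarizes can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: all names are made up here, the normalization is simplified to precomputed adjacency matrices, and the block-diagonal variant is omitted.

```python
import numpy as np

def rgcn_layer(h, adj, V, coeffs, W_self):
    """One R-GCN propagation step with basis decomposition (a sketch).

    h       : (N, d_in)  node representations
    adj     : list of R normalized adjacency matrices, each (N, N),
              one per relation type (and direction)
    V       : (B, d_in, d_out) shared basis matrices
    coeffs  : (R, B) relation-specific combination coefficients
    W_self  : (d_in, d_out) weight for the self-connection
    """
    # First regularizer discussed in the reviews: each relation's weight
    # matrix is a linear combination of B shared bases, W_r = sum_b a_rb V_b.
    W = np.einsum('rb,bio->rio', coeffs, V)
    # Self-connection: every node also keeps a transformed copy of itself.
    out = h @ W_self
    for A_r, W_r in zip(adj, W):
        out += A_r @ h @ W_r  # aggregate neighbors under relation r
    return np.maximum(out, 0.0)  # ReLU nonlinearity
```

With B much smaller than the number of relation types R, the parameter count no longer grows linearly in R, which is what makes the model practical on highly multi-relational knowledge bases.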

**Review 2 (by Martin Giese)**

(RELEVANCE TO ESWC) Excerpt from the "Overall score" review: Large knowledge bases, such as DBPedia, Wikidata and others, are, in spite of focused efforts, incomplete, both with respect to entity classification and relations between entities. A significant amount of research is being invested in developing methods for automatic classification and missing edge detection/insertion in knowledge bases, and the authors of this article provide an approach using neural networks for both entity classification and missing relations detection in RDF data. Comparison with existing benchmarks shows promising results for both tasks. (NOVELTY OF THE PROPOSED SOLUTION) Excerpt from the "Overall score" review: The proposed solution, dubbed Graph Convolutional Networks (GCN) by the authors, is a variant of convolutional neural networks adapted to work directly on graphs with vertices and edges. A convolutional neural network makes use of a convolutional filter that scans a part of the input at a time, and in the current setup, for each node in the graph, the filter provides the node together with its nearest neighbours. This allows for a higher-performing solution compared to considering e.g. the full graph for each node, with the penalty of having less information to compute with. The solution is closely related to several other works, of which perhaps the two closest are [1] "Embedding entities and relations for learning and inference in knowledge bases" by B. Yang et al. (2015) and [2] "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering" by M. Defferrard et al. (2016). While [2] also makes use of convolutions, that solution is based on a spectral representation of the graph, whereas the article under review works directly on the nodes and edges of the graph itself. In [1], they introduce the DistMult scoring function, which is used as a part of the missing edge detection of the current article.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed solution is backed by empirical experiments, showing that the solution works and performs either well or best-in-class on commonly accepted benchmarks. (EVALUATION OF THE STATE-OF-THE-ART) Excerpt from the "Overall score" review: Benchmarking relative to other state of the art classifiers and missing-edge detection algorithms shows good or best-in-class results when tested on commonly available and widely used benchmark datasets, such as FB15k-237 and WN18. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The article is strong on describing the technicalities involved in the mathematical machinery, but could be better at discussing pros and cons of the proposed solution compared to alternative approaches. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) I cannot find any reference to the solution source code, which I would prefer to be open source for easy validation of results. However, the results are expected to be easily reproducible due to the open nature of the benchmark data sets. (OVERALL SCORE) **Short description of the problem tackled in the paper, main contributions, and results** Large knowledge bases, such as DBPedia, Wikidata and others, are, in spite of focused efforts, incomplete, both with respect to entity classification and relations between entities. A significant amount of research is being invested in developing methods for automatic classification and missing edge detection/insertion in knowledge bases, and the authors of this article provide an approach using neural networks for both entity classification and missing relations detection in RDF data. Comparison with existing benchmarks shows promising results for both tasks. The proposed solution, dubbed Graph Convolutional Networks (GCN) by the authors, is a variant of convolutional neural networks adapted to work directly on graphs with vertices and edges. 
A convolutional neural network makes use of a convolutional filter that scans a part of the input at a time, and in the current setup, for each node in the graph, the filter provides the node together with its nearest neighbours. This allows for a higher-performing solution compared to considering e.g. the full graph for each node, but with the penalty of having less information to compute with. Benchmarking relative to other state of the art classifiers and missing-edge detection algorithms shows good or best-in-class results when tested on commonly available and widely used benchmark datasets, such as FB15k-237 and WN18. The experimental results are presented in a tabular format which makes it easy to compare against alternative approaches. (Although all of these benchmarks should be considered with a shrewd eye, as pointed out by Kadlec, Bajgar and Kleindienst (2017)). The solution is closely related to several other works, of which perhaps the two closest are [1] "Embedding entities and relations for learning and inference in knowledge bases" by B. Yang et al. (2015) and [2] "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering" by M. Defferrard et al. (2016). While [2] also makes use of convolutions, this solution is based on a spectral representation of the graph, whereas the article under review works directly on the nodes and edges of the graph itself. In [1], they introduce the DistMult scoring function, which is used in the missing edge detection of the current article. The article is well written, demonstrates a strong command of the English language, and is more or less devoid of spelling errors. The ideas are communicated through high level mathematics, relieving the reader from having to know details of e.g. TensorFlow to understand the mathematical details of the contribution.
** Enumerate and explain at least three Strong Points of this work** * Novel application of graph convolutional neural networks on RDF data * Good benchmark results * Clear and easy to understand exposition * Part of the novel research being done regarding neural networks on graphs ** Enumerate and explain at least three Weak Points of this work** (with comments for improvement): * The model only makes use of nearest neighbours, and there is no indication as to how this could be extended to consider a larger portion of the graph (i.e. making the convolution filter wider). In citation graphs, the current node and existing citations may be a good basis for detecting missing citations, but for other graph structures, such as molecules, the global structure may be of greater importance for detection of local properties (features) due to transitive effects on electron distributions, receptor affinity etc. (An alternative hypothesis as to why the algorithm performs less than optimally on the MUTAG dataset?) * Many details from the referenced works on which this solution is based are left out. This makes the article harder to read, and more material (formulas) could be concisely in-lined as a courtesy to the reader. * The article is rather short on related work, and also on the general subject of using neural networks on graphs and relational data. Since the target audience is researchers in semantic technologies, a field which up to now is not dominated by neural networks, more context is easily justified. * The Results paragraph in Section 5.1 suffers from much lower readability than the rest of the article, and could do with some tidying up. Deferring parameter settings to a table and providing a more concise description of the results would improve this section. Also, in my opinion, too much text is spent on describing results regarding FB15k, which the authors themselves say is of less interest due to a known flaw in many of the existing benchmarks.
I would rather see that focus was placed on the results from FB15k-237, and that, in general, the results sections were shortened down. One possibility is to factor out some of the details from the empirical evaluations and place these in a technical note which is simply referred to. * There is a great deal of white-space around tables and figures, that should be trimmed away in the name of compactness. * I cannot find any reference to the solution source code, which I would prefer to be open source for easy validation of results.
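The DistMult scoring function from Yang et al. (2015), which this review notes is reused as the decoder for the paper's missing-edge detection, is compact enough to state in code. The sketch below is stand-alone and illustrative (names are made up here, and the R-GCN encoder that would produce the entity embeddings is omitted): a triple (s, r, o) is scored by a trilinear product of the subject embedding, a diagonal relation matrix, and the object embedding.

```python
import numpy as np

def distmult_score(e_s, r_diag, e_o):
    """DistMult score f(s, r, o) = e_s^T diag(r) e_o.

    e_s, e_o : (d,) subject and object entity embeddings
    r_diag   : (d,) diagonal of the relation matrix
    """
    return float(np.sum(e_s * r_diag * e_o))

def rank_objects(e_s, r_diag, entity_emb):
    """Score every entity as a candidate object for (s, r, ?) and return
    indices sorted from most to least plausible (illustrative helper)."""
    scores = entity_emb @ (e_s * r_diag)
    return np.argsort(-scores)
```

Ranking all candidate objects (or subjects) this way is how metrics such as mean reciprocal rank and Hits@k are computed on benchmarks like FB15k-237 and WN18.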

**Review 3 (by Michael Granitzer)**

(RELEVANCE TO ESWC) The paper presents an approach for knowledge base completion, a highly relevant task within the Semantic Web, and hence the paper has a high relevance to ESWC. (NOVELTY OF THE PROPOSED SOLUTION) The authors apply the recently introduced Graph Convolutional Networks (GCNs) to knowledge base completion. While this in itself would not justify the novelty, they adapt and extend the GCNs in an appropriate manner, therefore adding novel parts. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) While the chosen route seems meaningful, the background information and discussion could be extended for some parts. This concerns in particular the choice of the two regularizers. See also "Demonstration and Discussion of the Properties of the Proposed Approach". (EVALUATION OF THE STATE-OF-THE-ART) The work is put well into the context of the state-of-the-art: differences and similarities are discussed and comparison also takes place in the evaluation section. Further, issues are addressed where related work might not be perfectly comparable. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) While I generally like the demonstration/discussion of the properties, it could be extended when it comes to the regularizers. From my point of view, they are at the core of the paper, as this is the part where the GCNs are extended. Therefore, I would expect a deeper discussion on the background, reasoning and also their properties and influence. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) While a definitive judgment about the reproducibility is only possible after reproduction, the paper seems to contain all the necessary information on the setup, hyperparameters, etc. Also, the datasets used are common for the task. Provision of the evaluation code would be the last detail towards a strong accept on that point.
(OVERALL SCORE) The paper presents an approach to knowledge base completion, in particular entity classification and link prediction, based on Graph Convolutional Networks (GCNs). GCNs are adapted by introducing two different regularizers, in order to account for realistic knowledge base properties, i.e. to account for highly multi-relational data. Strong Points: 1) The paper takes a recent approach and applies it to a different domain by adapting/extending it appropriately 2) Dataset issues/details, which are not "visible from the outside", are discussed and explained in the evaluation section 3) The paper is well written and contains all the necessary information Weak Points: 1) The core point, i.e. the adaptation of GCNs, should be discussed more extensively (cf. previous sections) 2) Counterpart to SP3: While the paper contains all the necessary information, slightly more explanation would be desirable, in particular in section 2, making the paper more self-contained (it was a bit hard to understand before reading the original GCN paper). An explanatory sentence here and there should suffice already. 3) A large scale knowledge base (e.g. DBpedia) might be added to the evaluation in order to account for scalability. I would consider the statement that knowledge bases enable applications such as question answering and information retrieval to be a well known fact in the community, which doesn't need to be backed up by so many references. Providing only a few would spare space for more interesting details (see above).

**Review 4 (by Steffen Remus)**

(RELEVANCE TO ESWC) The paper is about entity type classification and link prediction in knowledge bases. Hence, the paper is relevant for ESWC, but it is clearly written for the ML community. (NOVELTY OF THE PROPOSED SOLUTION) The proposed approach is an extension to existing Graph Convolutional Networks. It is hard to assess the novelty of this approach. (see my remarks below) (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Scalability seems to be a major claim of this paper, but unfortunately I cannot find any arguments about runtime or complexity which show that the approach really scales. (EVALUATION OF THE STATE-OF-THE-ART) Comparison to SotA is hard to assess, as several versions of this paper were already published on arXiv and it has already gathered citations which beat the system's performance, e.g. [1]. [1] Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel. 2017. Convolutional 2D Knowledge Graph Embeddings. arXiv (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper is very well written, but I would have wished to see some examples and some error analysis, i.e. what works, what does not, and why. The related work section is very short and kind of misses out on the fact that there exist other approaches than tensor factorization techniques for knowledge-graph completion, e.g. see [2]. Adding a couple of sentences would definitely improve the paper for ESWC. [2] Victor Martinez, Fernando Berzal, and Juan-Carlos Cubero. 2017. A Survey of Link Prediction in Complex Networks (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The paper evaluates on standard benchmarks and reports replicated results as well as results taken from the other systems' papers. Unfortunately there is no source code provided. Maybe the authors will consider providing the source code, since this would further strengthen the paper.
(OVERALL SCORE) The paper introduces R-GCNs (relational graph convolutional networks), which are an extension to GCNs (graph convolutional networks) and which are explicitly designed to handle highly multi-relational data. The approach is evaluated on entity type classification and link prediction, and shows partial improvements over some compared systems. SPs: - The paper is very well written, the equations seem complete and references are sufficient - The ideas in the approach are definitely worth publishing WPs: - The novelty of the approach is hard to assess, since it was previously published on arXiv and already gathered references which beat the system's performance (see my remarks above) - Scalability is a major claim, but I cannot find any backing remarks about the complexity or the runtime of the approach, neither qualitative (by mentioning the complexity class) nor quantitative (by timing the experiments). - In order to provide proper replicability, some accompanying source code would be helpful My major concern lies in the comparison to SotA and the assessment of the novelty of the system, given that it was first published in March 2017 on arXiv, already went through several revisions, and eventually also gathered some references. I'm not against republishing arXiv papers at conferences -- and the idea of the paper is worth publishing in my opinion -- but the time that has passed since first publishing is quite long already.

**Metareview by Achim Rettinger**

The authors present an extension of Graph Convolutional Networks (GCNs) to knowledge graphs. All reviewers agree that the work is relevant and novel. Only slight doubts remain concerning the reproducibility and demonstration of the empirical results. The authors addressed these in their response and are requested to do so in the final version as well. Therefore, we recommend accepting the work.