Paper 37 (Research track)

Using Link Features for Entity Clustering in Knowledge Graphs

Author(s): Alieh Saeedi, Eric Peukert, Erhard Rahm

Abstract: Knowledge graphs holistically integrate information about entities from
multiple sources. A key step in the construction and maintenance of knowledge
graphs is the clustering of equivalent entities from different sources. Previous
approaches to such entity clustering suffer from several problems, e.g., the
creation of overlapping clusters or the inclusion of several entities from the same
source within clusters. We therefore propose a new entity clustering algorithm,
CLIP, that can be applied both to create entity clusters and to repair entity clusters
determined with another clustering scheme. In contrast to previous approaches,
CLIP uses not only the similarity between entities for clustering but also further
features of entity links, such as the so-called link strength and link degree. To
achieve good scalability, we provide a parallel implementation of CLIP based on
Apache Flink. Our evaluation on different datasets shows that the new approach
can achieve substantially higher cluster quality than previous approaches.

Keywords: Link; Link strength; Knowledge graph; Clustering; Overlap resolve; CLIP
