Paper 30 (Research track)

Addressing Big Data Variety via Type Information Identification

Author(s): Ruth Frimpong, Matt Selway, Wolfgang Mayer, Markus Stumptner

Abstract: A large amount of Linked Data (Big Data) is available, but it is described using many different ontologies. These ontologies are sometimes relatively coarse-grained taxonomies; that is, either the type to which an entity belongs is not known, or the type is too general to be used for ontology/schema matching. Effective matching may require recovering more fine-grained structure. In this work, we propose to deal with coarse type information and the granularity variety challenge of Big Data. We present it as a clustering problem and discuss the features required to obtain useful solutions. Since Big Data has more instance features with relatively fewer schemas, we present a novel clustering algorithm, ExTypifier, an extended version of TYPifier that addresses the above problems by inferring fine-grained type information from data instances to bring entity types to the same level of granularity. Furthermore, we present experimental results that show the effectiveness of ExTypifier in addressing the granularity problem and demonstrate improved results over the original TYPifier algorithm.

Keywords: Big Data; Granularity Variety; Entity Type; Hierarchical Clustering; Ontology Matching; Type Information
