Author(s): Tao Chen, Lan Wang, Dongsheng Wang
Abstract: Knowledge bases are widely developed and used in both academia and industry. The most typical example is DBPedia, which is built on Wikipedia and covers various languages. However, existing Chinese knowledge bases are largely independent of it because they are primarily extracted from other encyclopedia websites, such as Baidu Encyclopedia, instead of Wikipedia. Therefore, although entity linking can be conducted between commonsense knowledge bases, a systematic alignment between their properties or relations is absent, even though knowledge retrieval should be based on a shared scheme. Our work rests on the assumption that commonsense knowledge bases should cover a similar number and a similar set of properties, independent of the language itself. We therefore propose SinoPedia, a Chinese knowledge base extracted from Baidu Encyclopedia that is aligned with DBPedia by mapping its properties to DBPedia's properties or relations using a vector space model. The experiment shows that 81% of the infobox properties in Baidu Encyclopedia can be mapped to properties in the DBPedia ontology. This alignment lets us retrieve knowledge on the basis of a more widely shared scheme.
Keywords: Knowledge base; Semantic web; Linked data; Chinese knowledge base