Paper 45 (Research track)

SinoPedia – A Chinese Knowledge Base aligned with DBPedia by Relations

Author(s): Tao Chen, Lan Wang, Dongsheng Wang

Full text: submitted version

Abstract: Knowledge bases are widely developed and used in both academia and industry; the most typical one is DBPedia, which is built on Wikipedia and covers various languages. However, existing Chinese knowledge bases are largely independent because they are primarily extracted from other encyclopedia websites, e.g. Baidu Encyclopedia, instead of Wikipedia. Therefore, although entity linking can be conducted between commonsense knowledge bases, a systematic alignment between properties or relations is absent, even though knowledge retrieval should be based on a shared scheme. Our work is based on the assumption that commonsense knowledge bases should cover a similar number and a similar set of properties, independent of the language itself. Therefore, we propose SinoPedia, a Chinese knowledge base extracted from Baidu Encyclopedia that is aligned with DBPedia’s properties or relations by mapping their properties based on a vector space model. The experiment shows that 81% of the infobox properties in Baidu Encyclopedia could be mapped to properties in the DBPedia ontology. In this way, knowledge can be retrieved based on a more widely shared scheme.
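
As a rough illustration of the kind of vector-space property mapping the abstract describes, the sketch below matches each source property to the target property with the highest cosine similarity. It is not the authors' implementation: the property labels, the three-dimensional toy vectors, the `align` helper, and the 0.5 threshold are all assumptions made for illustration. A real system would use word embeddings of the translated Baidu property labels and of the DBPedia ontology labels.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(source_vecs, target_vecs, threshold=0.5):
    """Map each source property to its most similar target property.

    A source property whose best similarity falls below `threshold`
    (a hypothetical cut-off) is mapped to None, i.e. left unaligned.
    """
    mapping = {}
    for source_name, source_vec in source_vecs.items():
        best_name, best_score = max(
            ((name, cosine(source_vec, vec)) for name, vec in target_vecs.items()),
            key=lambda pair: pair[1],
        )
        mapping[source_name] = best_name if best_score >= threshold else None
    return mapping


# Toy vectors standing in for embeddings; the values are invented.
TOY_TARGET = {
    "author": [0.9, 0.1, 0.0],
    "publisher": [0.1, 0.9, 0.0],
    "birthDate": [0.0, 0.1, 0.9],
}
TOY_SOURCE = {
    "writer": [0.8, 0.2, 0.1],
    "press": [0.2, 0.8, 0.0],
    "unrelated": [-0.5, -0.5, -0.5],
}

mapping = align(TOY_SOURCE, TOY_TARGET)
print(mapping)
```

With these toy values, "writer" and "press" land on "author" and "publisher", while "unrelated" stays unmapped because none of its similarities reaches the threshold.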

Keywords: Knowledge base; Semantic web; Linked data; Chinese Knowledge base

Decision: reject

Review 1 (by Chenyan Xiong)

(RELEVANCE TO ESWC) The construction and alignment of knowledge graphs are among the core tasks of the semantic web.
(NOVELTY OF THE PROPOSED SOLUTION) It is a straightforward way of using embeddings to align relations. The attribute filtering is rule-based.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The experiments are not comprehensive enough to demonstrate the contribution of this paper.
(EVALUATION OF THE STATE-OF-THE-ART) Little comparison is done w.r.t. state-of-the-art.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Not many analyses of the proposed approach are provided.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The proposed methods are simple thus easy to reproduce.
(OVERALL SCORE) This manuscript constructs a Chinese Knowledge Base that is aligned to DBPedia. It extracts entities and triples from two Chinese common knowledge bases, Baidu Encyclopedia and Hudong Encyclopedia. It uses a rule-based method that filters the schemas in the two encyclopediae to keep the most popular and useful ones. To align the filtered Baidu Encyclopedia to DBpedia, the authors first translate the Chinese triples using a commercial translation API and then align them based on word2vec similarities with DBpedia’s schemas.
There is definitely a need for high quality and large scale Chinese commonsense knowledge bases. The current large scale knowledge bases often do not have sufficient Chinese triples, for example, in Freebase, YAGO, and DBPedia. The rule-based schema filtering and also the word2vec based schema alignment both are reasonable and seem to work.
However, this manuscript is more of a 'work in progress'. Many aspects of this work need improvement to make this a finished paper.
First, the experimental study needs to be more thorough. Currently, barely any baseline is compared. Without a fair comparison with previous approaches, say, the size and schema of other Chinese knowledge bases, the Chinese version of DBpedia, or other alignment technologies, it is hard to verify the stated contributions of this paper. How the labels are generated in evaluating the alignment part also needs more details and justification. How many annotators were used? How well do they agree with each other?
Second, there should be more analyses of the different components of the system. It is not clear how accurately the translation API works, nor what influence schema filtering has on the alignment accuracy. More studies are required to justify the technical contribution of this paper.
Third, the related work is not complete. Similar to the baseline part, there have been other Chinese knowledge bases; many alignment approaches have been developed. More discussions about how this manuscript stands within previous work are necessary.
The writing quality also needs improvement. Many details of the proposed methods are hard to figure out. A fair amount of guessing is required to picture what the authors did.
Overall, I suggest rejecting this paper and hope the authors can improve the quality for the next iteration.


Review 2 (by Petya Osenova)

(RELEVANCE TO ESWC) The paper falls into the scope of the conference, since it presents a new semantic web resource, built by mapping properties of a Chinese knowledge base to DBPedia.
(NOVELTY OF THE PROPOSED SOLUTION) The parameters of novelty are related to:
- the mapping between resources that reflect two different languages;
- the mapping of a locally designed resource to a well-established one, and thus handling discrepancies and relation asymmetry;
- using word embeddings for improving the classification
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The observations on property distribution are quite informative. However, the property clustering and classification part needs more clarification and detail with respect to the proposed process. Also, the results section seems very schematic, and certain connections among the presented statements are missing.
(EVALUATION OF THE STATE-OF-THE-ART) I think that the Related work part is too general, and thus not so closely related to the reported work. Also, I would expect more sources in the References section.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The proposed approach relies on the E-RDF model for selecting relevant properties among the huge quantities. As I mentioned above, although the task and the general idea are clear and justified, there is not enough substance in the clustering and classification explanation.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Unfortunately, the resource itself does not seem to be publicly accessible. The idea is reproducible, but I am not sure that this is true for the model and data used.
(OVERALL SCORE) The paper aims to merge a specific Chinese knowledge base to DBPedia on properties level. The outcome seems promising. 
Strong points:
==============
- merging resources of different construction and language type into a new resource
- handling asymmetric property mapping
- using state-of-the-art techniques for improving classification
Weak points:
==============
- English is good neither in grammar nor in phrasing; it needs improving
- results section is very schematic and lacks a focused discussion
- too broad Related Work section and insufficient references
Questions:
==============
- Is there any error analysis to be provided?
- Are there any false positives/negatives in the dropping properties process?
- How good is the Translation step, presented in Fig. 1 and how does it influence next steps?


Review 3 (by Nitish Aggarwal)

(RELEVANCE TO ESWC) Knowledge base alignment is a relevant problem for ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) There is a lack of novelty, as the paper does not propose any new method in comparison to existing methods of knowledge base alignment.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) It is hard to understand the overall proposed approach, and the contributions are not clear.
(EVALUATION OF THE STATE-OF-THE-ART) There is no evaluation and comparison with any state-of-the-art method.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper describes the results of the achieved alignment; however, it requires deeper insight into the different steps of the proposed solution.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) It is very hard to understand the proposed approach, and the paper has many grammatical errors.
(OVERALL SCORE) The paper presents an approach to align a Chinese knowledge base with DBpedia. The proposed approach first translates the properties in the Chinese knowledge base into English and then finds the best match using similarity based on word2vec scores.
- Strong Points (SPs)
1. Addresses an important problem of different knowledge bases alignment. 
- Weak Points (WPs)
1. The paper is poorly written and it is very hard to follow. 
2. The contributions and the proposed approach are not clear; some formalism is needed.
3. Lack of novelty in the proposed approach.


Review 4 (by Olga Streibel)

(RELEVANCE TO ESWC) SinoPedia is very relevant to the ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) The authors make at least the difference when comparing to other existing approaches to Chinese Knowledge Base as they align better with DBPedia.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The overall contribution could be better described, the results could be better objectified.
(EVALUATION OF THE STATE-OF-THE-ART) The state of the art is covered sufficiently.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The discussion and demonstration of the proposed approach is fair.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The results are fair, I am unsure about their reproducibility.
(OVERALL SCORE) Site 2, Spain - Spanisch
The paper "SinoPedia – A Chinese Knowledge Base aligned with DBPedia by Relations and E-RDF" focuses on the challenge of creating a Chinese knowledge base that is based on a common shared scheme. The proposed SinoPedia is being created in alignment with DBPedia. What distinguishes the contribution from other Chinese knowledge base efforts is that the SinoPedia approach aligns the scheme of a knowledge base in a different language, built from local data sources, with the scheme of DBPedia. One of the contributions is also the online tuning of word vectors for improving the accuracy of classification while creating the knowledge base.
After the introduction, in which the authors clearly state their contribution and briefly compare it with other approaches, Section 2 discusses the related work. The most relevant knowledge bases are mentioned here and references are provided. Section 3 describes the construction methodology for SinoPedia. Here the authors refer to their previous work on a so-called E-RDF model. This entity-relationship model allows an RDF scheme to be expressed as an ER diagram. It would be helpful to have a visualisation or diagram provided as an example here (one does not want to start reading another publication in order to understand this one; a visualized summary of E-RDF would be better). Alongside the E-RDF discussion, the authors start describing the details of the scoring calculation. It is a bit difficult to understand whether the scoring is a standard scoring or something "new" proposed by the authors, and whether the scoring is connected to E-RDF or not. Subsection 3.2 describes the property clustering and classification. Figure 1 shows the process, but only at a very general level. One can follow that the Baidu triples are being re-modelled with the help of the DBPedia ontology; is that what you aim to explain with your picture?
In the Results Section, the authors discuss their findings and results. 
The paper is overall well written and well structured. I would suggest a better presentation of the earlier E-RDF work and a clearer distinction of the contributed online tuning of word vectors.
Also, I could not follow why there are only two examples shown in Results, the Book and the Biological Species Class. I am not sure if the results can be overall justified as objective but they look promising.
I suggest spell check/language correction, e.g. correct Spain into Spanish (Page 2).


Metareview by Christoph Lange

The main concerns of the reviewers, which justify rejection, are the following:
* It is not clear how well the individual components work.
* The description of several components requires more detail (e.g., property clustering and classification).
* The role of E-RDF is not clear.
* The approach has not been evaluated against a baseline.
* The related work covered is too general.
* The resource is not publicly available.

