Indirect Matching of Domain and Top-level Ontologies using OntoWordNet
Author(s): Daniela Schmidt, Rafael Basso, Cassia Trojahn, Renata Vieira
Full text: submitted version
Abstract: Top-level ontologies play an important role in the construction and integration of domain ontologies, providing a well-founded reference model that can be shared across domains. While most efforts in ontology matching have been dedicated particularly to domain ontologies, the problem of matching domain and top-level ontologies has been addressed to a lesser extent. This is a challenging task in the field, especially due to the different levels of abstraction of these ontologies. In this paper, we propose an approach for matching domain and top-level ontologies that exploits existing alignments between WordNet and top-level ontologies as an intermediate layer. We evaluate our approach in the task of matching the DOLCE top-level ontology to domain ontologies from the OAEI Conference track, with the help of the OntoWordNet resource. Our manually validated results may form a baseline for an OAEI task, since there is currently no track involving this kind of challenge.
Keywords: Top-level ontology; Ontology Matching; WordNet
Review 1 (by Stefano Faralli)
(RELEVANCE TO ESWC) The topic addressed in this work is relevant to the ESWC conference.

(NOVELTY OF THE PROPOSED SOLUTION) In my opinion, the big problem with this work is related to its novelty and to the contribution itself. I believe that the proposed baseline cannot be considered a contribution to the community.

(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) I found the definition of ctx_synset_i^e confusing and incorrect. Moreover, why not use a standard baseline similarity measure (e.g., cosine similarity) between weighted vectors (e.g., by frequency) of content words?

(EVALUATION OF THE STATE-OF-THE-ART) The state of the art should also discuss contributions from word sense disambiguation, word sense induction and entity linking. In fact, the proposed baseline is widely adopted in these fields.

(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) One evaluator is not enough. In this kind of manual assessment setting, I would expect at least three experts. From such a minimal setting, an inter-annotator agreement measure is usually also estimated.

(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The simple proposed system can be easily replicated, and the datasets involved are also available on the web.

(OVERALL SCORE) The presented work cannot be considered a mature contribution to the ESWC conference. Even the proposed future work has already been widely studied in different areas. My suggestion to the authors is to review some surveys on entity linking, information extraction, word sense disambiguation and semantic similarity, and to try to develop a framework for the indirect alignment of domain ontologies through existing knowledge bases. Even if the topic is widely studied, many challenges are still pending. An additional consideration about the experimental part of this work:
From personal experience in aligning or inducing hierarchical data structures, I know that errors committed at a high level, where (due to the nature of such structures) the more abstract concepts are rooted, may compromise the correctness of all the direct and indirect relations to less abstract concepts. For that reason, I believe that a correct evaluation of the alignment should also estimate its performance in terms of the correctness of the resulting taxonomy structures. For this experiment, I suggest that the authors have a look at the TExEval evaluation task from SemEval.
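The reviewer's suggested alternative baseline, cosine similarity between frequency-weighted vectors of content words, can be made concrete with a minimal sketch. This is not the method of the paper under review. The tokenizer, the tiny stop-word list and the function names are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "or", "in", "is", "for", "by", "that"}


def content_word_vector(text: str) -> Counter:
    """Tokenize on letters, drop stop words, and weight each word by raw frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_synset(entity_context: str, synset_contexts: dict) -> tuple:
    """Return (synset_id, score) for the candidate whose context is most similar.

    Assumes synset_contexts is non-empty and maps a synset id to its gloss/context text.
    """
    entity_vec = content_word_vector(entity_context)
    scores = {sid: cosine_similarity(entity_vec, content_word_vector(ctx))
              for sid, ctx in synset_contexts.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Consider a hypothetical entity context, "Paper: a scientific document submitted to a conference for review". Compare it with two made-up candidate glosses, "paper#material" and "paper#article". The article reading wins because it shares four content words with the entity context, while the material reading shares only "paper". Unlike raw overlap counting, the cosine normalizes for context length.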
Review 2 (by Lavdim Halilaj)
(RELEVANCE TO ESWC) Addressing the alignment of different ontologies deserves attention and is an important aspect of spreading the use of Semantic Web technologies. It has been addressed for a long time by different authors.

(NOVELTY OF THE PROPOSED SOLUTION) The presented approach is novel.

(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The approach is clearly described and well formalized.

(EVALUATION OF THE STATE-OF-THE-ART) The state of the art is presented in a well-structured way, with information about current initiatives and similar approaches.

(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Although the number of demonstrations described in the paper is limited, they are good enough to help the reader understand the approach. However, information about generalizability, scalability, efficiency, etc., is missing.

(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The evaluation is to some extent general, but I could not find any reference to the implementation that would enable reproducibility.

(OVERALL SCORE) The paper presents an approach to match domain and top-level ontologies by leveraging existing alignments between top-level ontologies and WordNet. The contexts of the entities, which are constructed from their meta-information and neighborhoods, are used to realize the matching process. Addressing the alignment of different ontologies deserves attention and is an important aspect of spreading the use of Semantic Web technologies. It has been addressed for a long time by different authors.

Strong Points (SPs):
- The paper is mostly well written and clearly describes the motivation and the steps followed to match domain and top-level ontologies.
- The approach is presented well and with good formalization.
- Important aspects of the evaluation, including some of the limitations, are discussed.
Weak Points (WPs): Although the topic is very important, there are some major issues that need to be elaborated for the paper to reach a mature state. I list them below:
- According to the authors, the domain ontologies are enriched with definitions from the Cambridge dictionary for concepts that lack them. Such enrichment is problematic, since it might be seen as biased. Further, they claim that the added definitions "...are conform to the meaning of the ontology concept in the chosen domain". This is a hard conclusion to draw, since it is easy to provide a definition for a concept using completely different words.
- The authors say that the approaches presented  and  are similar to theirs. Those approaches focus on the alignment of multiple ontologies using pairwise ontology alignment and alignment through a reference ontology, and they seem to be more generic than aligning domain and top-level ontologies.
- The work is not reproducible, or at least I could not find any reference to an implementation of the tool or any prototype demonstration.
- In the "manual evaluation", only one expert is asked to validate the identified correspondences.
- In my opinion, the baseline for the evaluation should be the total number of correct correspondences that one could identify manually.
- Information about the domain ontologies used in the evaluation, such as the number of concepts and properties, is missing. It would give an impression of how scalable the proposed approach is.

Questions to the Authors (QAs):
- How generic is the proposed approach? To what extent can it be used to match any domain ontology with top-level ontologies?
- Regarding the presented results, the number of correctly identified correspondences seems to be the same in both cases, using the original and the enriched versions of the ontologies. Why is this happening? Did the authors not say that the approach strongly relies on the context of the concepts? It seems that the context of the concept does not play a crucial role. The authors also state this later: "... the adopted description (from the dictionary) may benefit or not the approach to be effective in the selection of the correct synset...In fact, we expected that the descriptions would improve the synset selection and therefore produce an impact on the alignments, however the improvements were not that significant between the two versions". This again might lead to a biased selection of the definitions, as noted above.
- "we can observe that the expert considered a reasonable number of correspondences as correct": how many is "a reasonable number" exactly? What is it as a percentage?

Minor issues:
- Section 4: "In this section, we present our approach to match top-level and domain ontologies..." I think it should be "...to match domain and top-level ontologies", to keep it consistent with the bottom-up approach presented here.
- Many sentences are too long, which forces the reader to read them twice or more in order to grasp them.
- Tables 1 and 2: it would be good to switch their order and show Table 2 first, following the description in the paper.
- It is not completely clear what "top-level" alignments are.
- Please clarify what the "stop words" are.

********* COMMENTS AFTER REBUTTAL ***********
I thank the authors for their rebuttal. I cannot fully recommend the paper for acceptance at this stage, but I hope that the comments from all reviewers have helped the authors better position, evaluate and argue for their work in future submissions.
Review 3 (by Ernesto Jimenez-Ruiz)
(RELEVANCE TO ESWC) The paper presents an approach to match domain ontologies to top-level ontologies using OntoWordNet. The topic is interesting not only for the ontology matching community but also for the Semantic Web community.

(NOVELTY OF THE PROPOSED SOLUTION) Although there are not many available systems able to automate the alignment between domain and top-level ontologies, the core proposed solution reduces to a comparison with WordNet synsets (and their context), and similar approaches have already been used in the literature.

(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The calculation of the context may introduce noise, and the evaluation confirms the reviewer's intuition. How could the proposed method be extended to other domains and to other top ontologies like BFO? The defined method seems to be driven by the selected domain ontologies and top ontology. Using different domain ontologies and different top ontologies may require background knowledge other than WordNet/OntoWordNet.

(EVALUATION OF THE STATE-OF-THE-ART) As mentioned below, the results currently obtained by the different systems are not fully comparable. AML, GOMMA and LogMapBio are not fully relevant to the topic addressed in the paper, but they could also fit in Section 3.3.

(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The approach has been evaluated, but only with a limited set of ontologies and one top ontology.

(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experiments should be easy to reproduce if the validated alignments are made available. However, the experiments are very focused and do not give proper insights into the evaluated systems. A comparison with the synsets selected by other systems (using WordNet) would have been more meaningful.

(OVERALL SCORE) I would like to thank the authors for their response.
I agree with them that the topic is very relevant and the ideas behind the paper are novel, but I believe the work is still at a preliminary stage. Nevertheless, I encourage the authors to continue working along this line and to strengthen their contribution.
-----------------------------------------------
Perhaps the main criticism is the research novelty of the presented approach. In its current state, the approach seems to consist of the calculation of contexts and the linkage to WordNet synsets, which is not completely novel. The paper could be a better fit for the Resources track, as a new OAEI track to match domain ontologies against top-level ontologies. The topic is interesting, and this review should not discourage the authors but motivate them to improve their approach and create a more generic solution.

Comments:
- I may have missed something, but it is not completely clear why OntoWordNet is required instead of WordNet. In Section 4.1, domain concepts are linked to WordNet synsets, and there already exists an alignment between DOLCE-LP and WordNet. Perhaps the role of OntoWordNet could be emphasized.
- The way the contexts are calculated may introduce quite a lot of noise, since not only synonyms are taken into account but also descriptions and sub/superclasses. Are the annotations of sub/superclasses also considered? The evaluation confirms this, as the precision of the presented approach is low and the introduction of additional descriptions does not help. Perhaps splitting the context would be beneficial. One context of synonyms could be used to find suitable synsets, and another context of other annotations and sub/superclasses could resolve ambiguities when several candidate synsets are found. The domain of the ontology itself could also help in the disambiguation. Adding a threshold for the minimum overlap may also avoid unsuitable synsets.
- The assumption that the entities in the domain ontologies have suitable synsets in WordNet may be too strict for some domains. In its current state, the proposed approach may not be applicable to every set of domain ontologies and top ontology.
- Compound labels have been preprocessed. Are only single-word terms considered? The motivation for this step is not clear.
- For the case of "Contribution", I believe the context given by the ontology itself would also help to disambiguate the intended meaning. The current context probably added noise and misled the synset discovery.
- Regarding the evaluation:
  * The precision is expected to be low, given the noise introduced in the context definitions. However, the recall is also lower than one would like. Do you have any insights on the missing links?
  * The provided results are not fully comparable. A more realistic comparison would have taken the WordNet synset(s) that an OAEI system would choose for a given domain concept and compared them with Step 1 of the paper's approach.

Strong Points:
- Interesting topic worth investigating
- Performed user validation
- The outcome of the paper may also be a new resource: a new dataset to evaluate alignment systems

Weak Points:
- Generality of the proposed approach
- Obtained results
- Introduction of noise in the context evaluation
- Novelty and research contribution

Questions for rebuttal:
- What is the role of OntoWordNet? Would WordNet not have been enough?
- How could the presented approach be adapted to other domain ontologies and top ontologies?
- What is the main research contribution/novelty of the presented approach?

Minor comments/typos:
- References  and  are missing conference/journal information (I did not check all).
- In the equations to calculate the context, "or" would be more suitable than "union".
- Page 9:
  * relies -> rely
  * we consider the both... -> remove "the"
  * relies with -> relies on
- Page 12: missing references to AML, LogMap and POMap.
- Page 13: the numbers in the second paragraph do not seem to match.
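A minimal sketch can illustrate the reviewer's suggestion to split the context and add a minimum-overlap threshold. In this sketch, synonyms only retrieve candidate synsets. The remaining annotations rank those candidates, and a candidate that shares too few words is rejected. This is a hypothetical illustration, not the method of the paper. The lemma index, the synset IDs, the set-based contexts and the default threshold of 2 are all assumptions.

```python
from typing import Dict, Optional, Set


def overlap(context_a: Set[str], context_b: Set[str]) -> int:
    """Lesk-style score: number of words the two contexts share."""
    return len(context_a & context_b)


def link_entity(synonyms: Set[str],
                other_context: Set[str],
                lemma_index: Dict[str, Set[str]],
                synset_contexts: Dict[str, Set[str]],
                min_overlap: int = 2) -> Optional[str]:
    """Link a domain entity to a synset, or return None if no candidate is good enough.

    Step 1 uses only the synonym context to retrieve candidate synsets.
    Step 2 uses the remaining context (annotations, sub/superclasses) to rank them.
    min_overlap is a hypothetical threshold that rejects weakly supported synsets.
    """
    candidates: Set[str] = set()
    for word in synonyms:
        candidates |= lemma_index.get(word, set())

    best_id, best_score = None, -1
    for synset_id in sorted(candidates):  # sorted for deterministic tie-breaking
        score = overlap(other_context, synset_contexts.get(synset_id, set()))
        if score > best_score:
            best_id, best_score = synset_id, score

    if best_id is None or best_score < min_overlap:
        return None
    return best_id
```

Returning None, rather than always picking the top candidate, is one way to trade recall for precision. The threshold gives that trade-off a single tunable knob.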
Metareview by Christoph Lange
The reviewers agree on a number of issues with this submission. There is no general agreement on the value of the proposed baseline as a contribution of its own. Major unclear points include the generalisability of the approach and the role of OntoWordNet as a foundation. The paper should demonstrate a broader awareness of the state of the art. An evaluation with just one human annotator was not found to be sufficient.