A Complex Alignment Benchmark: GeoLink Dataset
Author(s): Lu Zhou, Michelle Cheatham, Adila Krisnadhi
Full text: submitted version
Abstract: Ontology alignment has been studied for over a decade, and over that time many alignment systems and methods have been developed by researchers in order to find simple 1-to-1 equivalence matches between two ontologies. However, very few alignment systems focus on finding complex correspondences. One reason for this limitation may be that there is no widely accepted alignment benchmark that contains such complex relationships. In this paper, we propose a real-world dataset from the GeoLink project as a potential complex alignment benchmark. The dataset consists of two ontologies, the GeoLink Base Ontology (GBO) and the GeoLink Modular Ontology (GMO), that were developed in consultation with numerous domain experts. The manually created alignment between these two ontologies includes 1:1, 1:n, and m:n equivalence and subsumption correspondences. The reference alignment for this benchmark is available in both EDOAL and rules syntax.
Keywords: Ontology Alignment; Complex Alignment; Benchmark
Review 1 (by Amelie Gyrard)
Summary: Two ontologies, the GeoLink Base Ontology (GBO) and the GeoLink Modular Ontology (GMO), an ontology-design-pattern-based ontology, are considered for complex alignment in order to build the GeoLink complex alignment benchmark from the real-world GeoLink dataset. The alignment is provided in both EDOAL and rules syntax, under an open-access license. The GeoLink complex alignment benchmark contains 12 different kinds of simple and complex correspondence patterns.
Resource: GeoLink Modular Ontology (GMO): http://www.geolink.org/
Alignment: http://dase.cs.wright.edu/content/complex-alignment-benchmark-geolink-dataset
Ontology documentation: http://schema.geolink.org/1.0/base/main.html
Strengths:
• The authors develop a unified schema, since every data provider in the GeoLink project has its own schema. They make the effort to understand the schemas already employed and design one that unifies them, which is a tedious task.
• An example is provided for each pattern in Section 4.2.
• The HermiT reasoner has been used to detect any inconsistencies.
• GeoLink Base Ontology documentation is provided: http://schema.geolink.org/1.0/base/main.html
Weaknesses:
• http://schema.geolink.org/ did not work when tested (29 January 2018); error: "This site can't be reached." Re-tested 30 January 2018: OK.
• Explain more why Table 2 is important; why did you decide to show this table?
Suggestions for improvement:
• Do you keep the provenance of the ontology/dataset when you change the namespace?
• Explain more about EDOAL and provide the background required for this paper. What is the main difference with the Alignment API?
• Why is it useful to provide the alignment in description logic rule syntax? Explain more.
• Has HermiT been applied to both ontologies plus the dataset? This is not explicitly stated in the paper.
• Suggest your ontology on LOV: http://lov.okfn.org/dataset/lov/suggest?q=http%3A%2F%2Flode.bco-dmo.org%2Flode%2Fsource%3Furl%3Dhttp%3A%2F%2Fschema.geolink.org%2F1.0%2Fbase%2Fmain.owl
• Learn more about Semantic Web best practices to encourage the reuse of your ontology: http://perfectsemanticweb.appspot.com/
• Explain more why this research work would be extremely useful to the community; provide use cases, etc.
References:
• Check the book by Euzenat and Shvaiko on Ontology Matching, 2nd edition.
• Be aware of the latest results: http://oaei.ontologymatching.org/2017/
Review 2 (by Stefano Faralli)
The authors present the GeoLink Dataset. The resource is created as a set of alignments (1:1, 1:n, and m:n) between the GeoLink Base Ontology and the GeoLink Modular Ontology. To the best of my knowledge, this resource represents an important benchmark for the task of non-trivial ontology alignment. I have only a minor complaint about the quality of Figure 1, which is for some reason quite confusing to me. I believe that the figure could be improved both in terms of presentation and, more importantly, in terms of representation of the use case. The authors could also provide a permanent URL to refer to the resource and, in this way, be more consistent with the good practices for this kind of contribution. I think the authors adequately addressed the reviewers' questions.
Review 3 (by Ernesto Jimenez-Ruiz)
First of all, I would like to thank the authors for the rebuttal. Some points are clearer now and I will keep my original score. The creation of a new benchmark in the OAEI would definitely strengthen the contribution as a resource.
--------------------------------------------------------
The paper presents a benchmark that requires the discovery of complex alignments; such a task is a very relevant contribution to the ontology matching community. My main concern about the presented benchmark is the complexity of discovering the alignments. As happened in the past (OAEI 2009), the benchmark may not attract enough systems and be discontinued. It is not clear how the authors plan to address such a situation. For a complex alignment task, systems may need to over-specialize, which is not always desired. To allow more generic systems to participate, the benchmark should provide more than only the ontologies and the reference alignment. It should include (1) training data (one or many input sets), and/or (2) guidance about what to extract; for example, given one (or more) entities, specify which pattern is expected for them, so that the system is required to find suitable entities to fit the pattern. Guidance may also make the task very simple, so a trade-off should be found.
Other comments:
- In the related-work section it is stated that all tracks involve 1-1 equivalence mappings. This is not completely correct: (1) some tracks, like the conference track, include subclass-of mappings in the reference alignment (see page 5 in [s1]), and others also welcome subclass-of mappings for the evaluation; (2) the use of 1-1, 1-n, and n-m is slightly different from the usage in the ontology matching community. When an alignment is 1-1, this typically refers to the fact that an entity from the source/target ontology can only be matched to a unique entity in the target/source ontology. In 1-n alignments, mappings like Person=People and Person=Human can both be present in the alignment. I see the categories "1-1, 1-n and n-m" here refer to the individual mapping and not to the alignment set; however, this may cause confusion to the general reader unless it is clarified.
- Figure 1 is not very representative. Perhaps a more concrete example would be more useful.
- Page 5: it is strange that they could not find something similar to AgentRole in their schemas, since databases typically include n-ary relationships.
- I cannot see a 1-1 correspondence between the SPARQL query on page 6 and the pattern in Figure 2. Are startTime and endTime available in the GBO triples somehow?
- Pages 8-10: when the rule of a pattern is bidirectional, and thinking of a matching task to be given to a system, it makes sense to split the rule in two (e.g., for a given entity or entities and a pattern, find the target(s)). The bidirectional pattern may be too complex.
- The use of EDOAL seems to be a limitation. Why not use another representation language like SWRL? Are all the defined rules Datalog rules? This would enable the use of (scalable) reasoning after the matching.
- Complex mappings (especially the presented patterns) share characteristics with R2RML mappings [s3]. A similar standard could be used to migrate data from one ontology to another using SPARQL. For R2RML benchmarking there are some contributions in the literature (e.g., [s4]).
Typos:
- Page 3: extra dot when referencing a footnote.
Strong points:
- Small benchmark, but based on a real use case.
- Very welcome in the ontology matching community.
- The paper introduces some candidate patterns for the complex alignments.
Weak points:
- Only two small ontologies in the benchmark.
- It is unclear how this benchmark will attract generic alignment systems.
- The benchmark/dataset is not available at a persistent URI, nor does it include a license specification.
- No evaluation has been presented, nor has a state-of-the-art system been identified that could cope with such a benchmark.
Other, more concrete questions for the authors:
- Do you have concrete plans to define a new task within the OAEI? The HOBBIT platform will open the door to new tasks involving a more complex evaluation.
- How do you plan to address the problems previous complex benchmarks faced?
- Have you taken into account the semantics of the complex alignments and their use in a subsequent reasoning step?
- Are there state-of-the-art systems available that are able to cope with the proposed benchmark?
Suggested literature:
[s1] Results of the Ontology Alignment Evaluation Initiative 2017. http://www.dit.unitn.it/~pavel/om2017/papers/oaei17_paper0.pdf
[s2] Matching Disease and Phenotype Ontologies in the Ontology Alignment Evaluation Initiative. Journal of Biomedical Semantics, 2018.
[s3] https://www.w3.org/TR/r2rml/
[s4] RODI: Benchmarking relational-to-ontology mapping generation quality. Semantic Web, 2018.
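The 1-1 vs. 1-n distinction the review draws can be made concrete with a small sketch. This is a minimal illustration, not GeoLink code; the entity names (Person, People, Human, etc.) are hypothetical examples taken from the review's own discussion:

```python
# Distinguish a 1-1 alignment (each entity matched to at most one
# counterpart on either side) from a 1-n alignment, following the
# usage in the ontology matching community described in the review.
# An alignment is represented as a list of (source, target) pairs.

def is_one_to_one(alignment):
    """Return True if no source or target entity appears in more
    than one correspondence of the alignment."""
    sources = [s for s, _ in alignment]
    targets = [t for _, t in alignment]
    return (len(sources) == len(set(sources))
            and len(targets) == len(set(targets)))

# A 1-1 alignment: every entity is matched to a unique counterpart.
strict = [("Person", "Human"), ("Paper", "Article")]

# A 1-n alignment: "Person" is matched to two target entities,
# as in the Person=People, Person=Human example above.
loose = [("Person", "People"), ("Person", "Human")]

print(is_one_to_one(strict))  # True
print(is_one_to_one(loose))   # False
```

Note that this checks the alignment *set*; the GeoLink benchmark instead uses 1:1/1:n/m:n to classify each individual correspondence by how many entities it involves, which is exactly the potential confusion the review points out.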
Review 4 (by Cristina Sarasua)
This paper presents the alignment between two ontologies: the GeoLink Modular Ontology (GMO), developed as an ontology to integrate various datasets in the geosciences and previously published, and the GeoLink Base Ontology (GBO), a simpler version of GMO that the GeoLink project members used as an interface between GMO and the data providers, who did not feel comfortable using the GMO constructions. GMO follows data modelling best practices, but the data providers in the project found the GMO constructions too complex. The main contribution of the paper is the definition of a set of so-called complex mappings between ontology elements, which involve more than one element of each ontology. The authors provide the alignment in the EDOAL format and in rule syntax.
While I acknowledge the need for more complex alignments between ontologies in general, the current submission provides a limited contribution for an ESWC Resources track paper. The authors present a collection of 12 patterns, of which the first 4 are simple patterns, the next 4 are alignments between classes and subclasses, and the last 4 are based on the idea of property chains, introduced by others in the related work, as the authors specify. Therefore, this work is in a way the adoption of some predefined patterns, and it is not clear how it covers new research challenges. Moreover, the authors do not report on any sort of evaluation of their solution; they just ensure that the modelled alignment does not produce inconsistencies according to HermiT. While this is a Resources track submission, and not a Research track submission, a resource that is supposed to be a benchmark should be somehow evaluated. All in all, I find the research direction very interesting, and I encourage the authors to continue working on new contributions in the field. This paper could be a valuable contribution to the OAEI workshop, because it describes the experience the authors gained in a case study. The authors could elaborate further on the problems they encountered and the measures they took to solve them. For the paper to fit the Resources track, I would suggest that the authors investigate the application of further complex patterns across various domains and ontologies, to come up with more complex Alignment Ontology Design Patterns (http://ontologydesignpatterns.org/wiki/Submissions:AlignmentODPs).
Additionally, I think this work is still too focused on manual intervention. Having ontology experts model the integration is usually a guarantee of high-quality ontologies and ontology alignments. However, in order to populate the Web of Data with these complex patterns, the transformation method presented here (i.e., writing ad hoc GBO-to-GMO SPARQL CONSTRUCT queries) and the manual identification of complex patterns will not scale. Therefore, I would strongly encourage the authors to work on (semi-)automatic methods that aid ontology experts and data providers in curating these complex alignments. For all the reasons mentioned above, I recommend that the paper be rejected.
**Positive aspects**
* The Web of Data does not yet contain sufficient complex alignments, because the methods developed by the community fail to create them, even though the data and the ontologies do need them.
* It is a good first step towards the adoption of complex patterns.
* Section 4.2 explains the patterns with understandable examples.
* The paper is in general well written. It could be a bit more specific, though. For example, when the authors mention that GBO was a "flatter" version, they should directly show an example of a construct in GMO whose GBO counterpart is flatter, and explain its limitations. Ideally, the authors would use exactly the same example in the explanation of the SPARQL CONSTRUCT query. The text tries to give an example inline (Section 3), but I would suggest that the authors provide the same running example across all sections and include a code/graphic representation of the constructs immediately after mentioning the problem.
**Negative aspects**
* The paper lacks details about the process the authors followed to identify all the mappings.
* The contribution does not point to any evaluation of the proposed alignment.
* The contribution is limited to two specific ontologies. Therefore, the paper does not show how generalisable these patterns are (i.e., whether they fit other scenarios and the extent to which they are actually needed in other alignment scenarios).
* If GeoLink is an ontology integrating other datasets in the geosciences, the paper should at least contain a specific example of such integration; even if GeoLink is described in the cited work, this paper is about the alignment and it should be self-contained. It would also give the reader a better understanding of the need for GeoLink and this alignment between GBO and GMO. Fig. 1 could be replaced with an example where the authors provide an excerpt of the ontologies.
* In the comparison to the related work, the authors say that their work is very much related to the work by Thiéblin et al., and they mention that the benchmark by Thiéblin et al. is under development. However, the authors do not specify the concrete commonalities and differences between their work and the work by Thiéblin et al. Besides the two projects belonging to different topical domains, I assume the two approaches have conceptual and implementation differences. Moreover, according to the OM2017 program (http://disi.unitn.it/~pavel/om2017/papers/om2017_poster6.pdf and http://oaei.ontologymatching.org/2017/multifarm/index.html) and this other paper (https://www.researchgate.net/publication/321028656_Cross-Querying_LOD_Datasets_Using_Complex_Alignments_An_Application_to_Agronomic_Taxa), it looks like a contribution by Thiéblin et al. has already been published. Therefore, the authors should elaborate on the part of the text that just indicates that the benchmark is still under development, and indicate how they go beyond that work.
* I suggest the authors find a different way to share a reusable data file with the rule syntax, since PDF might not be the best format in which to receive and parse data.
* In terms of a Resources track submission, the paper has the potential to advance Semantic Web methods, and this kind of complex alignment will very likely be of interest to the community. However, as mentioned before, the work needs to be expanded. The authors might want to revisit the criteria of the Resources track (https://2018.eswc-conferences.org/resources-track/) to document their resource further, as well as to ensure its sustainability and availability. At the moment, the authors only provide the .owl files, a PDF with the rules, a README listing the names of the files, and the .xml with the EDOAL alignment. The website allows the user to browse through the elements, but the resource is not well documented.
* From the text, I infer that the formal definition of a complex alignment is a 1:n or m:n alignment. The paper should define that formally.
** After the rebuttal **
Thank you for your answer. If, as the authors say in the rebuttal, the "goal with this work is not to develop a synthetic benchmark to illustrate many types of complex relations but rather to evaluate the performance of complex alignment systems on a real-world alignment task", I think that the paper should provide details about the things that went wrong, the things that went well, the reasons for both, and an exhaustive description of the evaluation process. Even if specific scenarios (and domains) trigger specific mappings, the mappings listed in this submission are defined as general alignment patterns, so I expect them to be generalizable. The direction of the work is interesting, but I still think that the submission is more suitable for a workshop like OAEI than for the Resources track of ESWC.
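The ad hoc GBO-to-GMO transformation the review above refers to can be sketched as a simple triple-rewriting step that reifies a flat property into a role node, in the spirit of the AgentRole-style patterns discussed in the reviews. This is a minimal Python sketch under assumed, hypothetical vocabulary (hasChiefScientist, providesAgentRole, ChiefScientistRole are illustrative names, not the actual GeoLink terms); the benchmark itself expresses such mappings in EDOAL/rule syntax and via SPARQL CONSTRUCT queries:

```python
# Sketch of a 1:n complex correspondence: a flat GBO-style triple
# (cruise, hasChiefScientist, person) is rewritten into the reified
# GMO-style pattern (cruise -> role -> person). All names are
# hypothetical illustrations, not the actual GeoLink vocabulary.

def rewrite_flat_role(triples, flat_prop, role_class):
    """Rewrite every (s, flat_prop, o) triple into three triples
    that route the relation through a fresh role node; pass other
    triples through unchanged."""
    out = []
    for i, (s, p, o) in enumerate(triples):
        if p == flat_prop:
            role = f"_:role{i}"  # fresh blank node for the role
            out.append((s, "providesAgentRole", role))
            out.append((role, "rdf:type", role_class))
            out.append((role, "performedBy", o))
        else:
            out.append((s, p, o))
    return out

gbo = [("ex:cruise42", "hasChiefScientist", "ex:alice")]
gmo = rewrite_flat_role(gbo, "hasChiefScientist", "ChiefScientistRole")
for t in gmo:
    print(t)
```

Writing one such rewrite per correspondence by hand is exactly the manual, non-scaling step the review criticizes; a (semi-)automatic method would have to discover the mapping between the flat property and the reified pattern instead of hard-coding it.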
Review 5 (by Raphael Troncy)
This paper has been discussed at length among the reviewers and with the Resources track chairs. While there is a consensus that the research direction is very interesting and that the authors should be encouraged to continue working on this contribution, a number of significant weaknesses have been pointed out which prevent this paper from being readily accepted at this year's ESWC 2018 Resources track. In particular, the reviewers have wondered how the proposed work could be generalized beyond the two specific ontologies being used. They also suggest addressing the lack of evaluation of the proposed alignments.