Paper 194 (Research track)

Milan: Automatic Generation of R2RML Mappings

Author(s): Sahil Nakul Mathur, Declan O’Sullivan, Rob Brennan

Full text: submitted version

Abstract: Milan automatically generates R2RML mappings between a source relational database and a target ontology. It uses a novel multi-level algorithm to address the inter-model semantic gap by resolving naming conflicts and structural or semantic heterogeneity. This enables high fidelity mapping generation for realistic databases that are de-normalised or utilise features of the relational data model that do not easily map to RDF. Milan is unlike many state of the art mapping systems which first produce a direct mapping ontology, and then apply ontology alignment techniques. Despite the importance of mappings for interoperability across relational databases and ontologies, a labour and expertise-intensive task, the current state of the art has achieved only limited automation. An experimental evaluation of Milan with respect to the state of the art systems using the Relational-to-Ontology Data Integration (RODI) metric is provided which shows that Milan outperforms all systems in all categories.

Keywords: RDB2RDF; Automatic Mapping; Schema and Ontology Matching; OBDA; Mapping Rules; Linked Data

Decision: reject

Review 1 (by anonymous reviewer)

(RELEVANCE TO ESWC) The paper presents an approach for automated generation of R2RML mapping between a relational schema and an ontology.
As such, it clearly addresses a research topic relevant to ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) The approach is based on a new algorithm that addresses shortcomings in current state-of-the-art approaches, in particular those related to complex correspondences (class - table, object property - referential integrity, datatype property - columns).
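To make the three correspondence types named above concrete, here is a minimal sketch of what such a mapping can look like in R2RML. It is an illustration only, not output from Milan: the tables `employee` and `department`, their columns, the `ex:` ontology terms, and the helper `triples_map` are all hypothetical.

```python
# Illustrative only: table names, columns and ex: terms are hypothetical.
PREFIXES = (
    "@prefix rr: <http://www.w3.org/ns/r2rml#> .\n"
    "@prefix ex: <http://example.org/onto#> .\n"
)


def triples_map(table, key, cls, columns, foreign_keys=()):
    """Build one R2RML TriplesMap as Turtle text.

    columns:      {column name: datatype property}
    foreign_keys: (object property, child column, parent table, parent column)
    """
    parts = [
        f'rr:logicalTable [ rr:tableName "{table}" ]',
        # class - table correspondence
        f'rr:subjectMap [ rr:template "http://example.org/{table}/{{{key}}}" ; '
        f"rr:class {cls} ]",
    ]
    # datatype property - column correspondence
    for column, prop in columns.items():
        parts.append(
            f"rr:predicateObjectMap [ rr:predicate {prop} ; "
            f'rr:objectMap [ rr:column "{column}" ] ]'
        )
    # object property - referential integrity (foreign key) correspondence
    for prop, child, parent_table, parent in foreign_keys:
        parts.append(
            f"rr:predicateObjectMap [ rr:predicate {prop} ; rr:objectMap [ "
            f"rr:parentTriplesMap <#{parent_table}Map> ; "
            f'rr:joinCondition [ rr:child "{child}" ; rr:parent "{parent}" ] ] ]'
        )
    body = " ;\n    ".join(parts)
    return f"<#{table}Map> a rr:TriplesMap ;\n    {body} .\n"


mapping = (
    PREFIXES
    + "\n"
    + triples_map("department", "id", "ex:Department", {"name": "ex:name"})
    + "\n"
    + triples_map(
        "employee",
        "id",
        "ex:Employee",
        {"full_name": "ex:name"},
        [("ex:worksFor", "dept_id", "department", "id")],
    )
)
print(mapping)
```

The foreign key `employee.dept_id` becomes an object property through `rr:parentTriplesMap` and `rr:joinCondition`, which is the standard R2RML construct for referencing another triples map.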
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Overall, the work is well motivated, requirements and use cases are clearly described, deficiencies in the state-of-the-art are explained. The presented approach appears novel.
However, the presentation makes it very difficult to follow and understand the main ideas.
The difficulties are on a number of levels: 
1) Syntactic and grammatical errors make the paper hard to read. (see list below) 
2) Semantic ambiguities and imprecisions force the reader to guess (in particular in Section 5).
E.g. “ The Datatype of data property is retrieved by querying its rdfs:label , rdfs:range and if present its owl:UnionOf .” How?
3) Some details are omitted in a way that might be considered offensive:
“ These query templates are trivial hence is left to readers.”
Examples might help in conveying the key ideas of the approach.
(EVALUATION OF THE STATE-OF-THE-ART) The evaluation results appear impressive, but they are so coarse-grained that they provide only limited insight into which elements of the approach are responsible for an improvement.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Some explanations are provided in the analysis of the results, but they are not substantiated experimentally.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The level of detail provided 
As the software itself is not published, the work is effectively not reproducible.
(OVERALL SCORE) The paper presents an approach for automated generation of R2RML mapping between a relational schema and an ontology.
The approach is based on a new algorithm that addresses shortcomings in current state-of-the-art approaches, in particular those related to complex correspondences (class - table, object property - referential integrity, datatype property - columns).
Evaluations based on the RODI benchmark show results outperforming the state-of-the-art.
Overall, the work is well motivated, requirements and use cases are clearly described, deficiencies in the state-of-the-art are explained. The presented approach appears novel.
However, the presentation makes it very difficult to follow and understand the main ideas.
The difficulties are on a number of levels: 
1) Syntactic and grammatical errors make the paper hard to read. (see list below) 
2) Semantic ambiguities and imprecisions force the reader to guess (in particular in Section 5).
E.g. “ The Datatype of data property is retrieved by querying its rdfs:label , rdfs:range and if present its owl:UnionOf .” How?
3) Some details are omitted in a way that might be considered offensive:
“ These query templates are trivial hence is left to readers.”
Examples might help in conveying the key ideas of the approach.
The evaluation results appear impressive, but they are so coarse-grained that they provide only limited insight into which elements of the approach are responsible for an improvement. Some explanations are provided in the analysis of the results, but they are not substantiated experimentally.
In summary, while the results appear impressive at first glance, the benefits of the approach are not sufficiently well explained.
Typos etc:
page 9: rdfs:InverseOf does not exist. Do you mean owl:inverseOf?
page 9: owl:UnionOf -> owl:unionOf
page 10: ow:sameAs -> owl:sameAs
page 11: “in 5.4” -> “in Section 5.4”?
page 12: dont -> do not or don’t
desription -> description 
page 13: obect property -> object property


Review 2 (by Andrea Giovanni Nuzzolese)

(RELEVANCE TO ESWC) The automatic generation of R2RML mappings is a relevant topic to ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) The paper proposes a number of novel solutions for generating R2RML mappings. Those solutions have been identified with an analysis of the state of the art.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The paper seems to have been written in a hurry. There are many typos and unclear sentences, which hurt the readability of the paper.
(EVALUATION OF THE STATE-OF-THE-ART) There is a fair analysis of the state of the art, which identifies new requirements that the paper then investigates in order to propose the new R2RML mapping solution.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The section describing the experiment should be reworked as many aspects remain unclear.
In general, the section is hard to read.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The software is not published.
It is not clear which parts of RODI have been used.
(OVERALL SCORE) The paper presents Milan, a system that automatically generates R2RML mappings between a relational database and target ontology.
The paper needs to be significantly improved and reworked. Hence, it is not suitable for publication in its current form.
*** PROS ***
- the analysis of requirement is fair;
- the analysis of the state of the art is good;
- the paper introduces a solution with a certain degree of novelty.
*** CONS ***
- the paper is hard to read;
- there are many aspects of the evaluation that are unclear. In general the evaluation seems to contain flaws;
- the experiments are not reproducible as neither the software nor the dataset is published.
*** AFTER THE REBUTTAL ***
My impression is that the paper needs to be significantly improved in order to make it acceptable. There are a lot of problems that affect its readability and clarity. Hence, I keep my scores and I suggest not accepting the paper.


Review 3 (by anonymous reviewer)

(RELEVANCE TO ESWC) The topic addressed by the paper is very important and timely.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed approach extends the state-of-the-art approaches.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) With respect to automatic generation of mappings, the paper seems to address the most relevant aspects. I'd be curious to understand how Milan could be improved by human intervention.
(EVALUATION OF THE STATE-OF-THE-ART) The authors provide quite a clear comparison with competing approaches also summarized by a table.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The discussion is long and detailed, even if the paper is generally a bit hard to read and would benefit from better concrete examples.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The use of the RODI benchmark makes the evaluation very easily comparable to the existing approaches. However, since the authors claim that their approach takes into consideration several aspects neglected by the state of the art, the reader is left wondering whether the RODI benchmark is a "low hanging fruit" for comparison and what would happen with more difficult cases (e.g. a database with completely meaningless table/column names: in my experience I have seen real DBs using names like "a", "b", "c" that were inaccessible without human interpretation).
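The concern about opaque names can be illustrated with a small sketch of purely lexical matching. This is not Milan's matcher: the tokenizer, the scoring function `name_similarity`, and all names below are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def tokens(name):
    """Split snake_case, kebab-case and camelCase names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def name_similarity(db_name, onto_label):
    """Compare two names after tokenising and sorting, so token order is ignored."""
    a = " ".join(sorted(tokens(db_name)))
    b = " ".join(sorted(tokens(onto_label)))
    return SequenceMatcher(None, a, b).ratio()


# Same tokens in a different order and naming convention: a perfect match.
good = name_similarity("author_paper", "PaperAuthor")
# A single-letter table name gives a lexical matcher almost nothing to use.
opaque = name_similarity("a", "PaperAuthor")
print(good, opaque)
```

With a name like "a", any signal has to come from elsewhere, for example from structure or from the data itself, which is why such databases would make a harder test than well-named benchmark schemas.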
(OVERALL SCORE) The paper presents the Milan approach for automatic generation of R2RML mappings.
The strong points are related to the fact that Milan considers and addresses a number of features that were only partially solved by the state of the art. Moreover, the evaluation on the RODI benchmark confirms the quality of the proposed solution.
The weak points are a general lack of clarity of the paper that I found quite hard to read and the doubts I expressed above w.r.t. the potential adoption of a different evaluation setting. Moreover, the Milan solution itself doesn't seem to be publicly available.


Review 4 (by anonymous reviewer)

(RELEVANCE TO ESWC) This paper is about R2RML mappings, which are part of the Semantic Web standards
(NOVELTY OF THE PROPOSED SOLUTION) Combines existing techniques into one system
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) To the best of my knowledge this is correct
(EVALUATION OF THE STATE-OF-THE-ART) Compares to existing systems using the RODI benchmark
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Experiments are presented. However, the system does not seem to be available
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Experiments are based on an existing benchmark RODI. Algorithms are presented but given issues with readability of the paper, it is hard to follow
(OVERALL SCORE) Summary of the Paper **Short description of the problem tackled in the paper, main contributions, and results**
This paper presents a system, Milan, which automatically generates R2RML mappings. It combines different functional processes together (naming, structural, semantic) in conjunction with the Hungarian Algorithm. The system is evaluated using the RODI benchmark and compared to similar systems. Milan outperforms all the other systems. Therefore I know that the system is doing something very good, however, the details of the system are very hard to follow and understand. 
Strong Points (SPs)  ** Enumerate and explain at least three Strong Points of this work**
- Combination of different techniques into one system
- The results of Milan in the RODI benchmark, which outperforms other systems
Weak Points (WPs) ** Enumerate and explain at least three  Weak Points of this work**
- The system relies heavily on the Hungarian Algorithm, however it is never explained. 
- Paper is very hard to follow. Does not have a nice flow, many grammatical mistakes. Very heavy on terminology. Seems like the paper was written in a rush. 
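For readers unfamiliar with it, the Hungarian algorithm solves the assignment problem: given a score or cost for every pairing of items from two sets, it finds the best one-to-one pairing in polynomial time. The sketch below applies that problem to a made-up table-to-class similarity matrix. The scores and names are hypothetical, and how Milan builds its own matrix is not shown here. For brevity the code enumerates all permutations rather than implementing the Hungarian algorithm itself, which reaches the same optimum far more efficiently.

```python
from itertools import permutations

tables = ["person", "paper", "conf"]
classes = ["Conference", "Author", "Document"]

# Hypothetical similarity scores: rows follow `tables`, columns follow `classes`.
similarity = [
    [0.10, 0.80, 0.20],
    [0.30, 0.40, 0.70],
    [0.90, 0.10, 0.50],
]


def best_assignment(scores):
    """Return the one-to-one row -> column assignment with maximum total score.

    Brute force over all permutations, O(n!); the Hungarian algorithm
    finds the same optimum in O(n^3).
    """
    n = len(scores)
    return max(
        permutations(range(n)),
        key=lambda cols: sum(scores[row][cols[row]] for row in range(n)),
    )


assignment = best_assignment(similarity)
for row, col in enumerate(assignment):
    print(f"{tables[row]} -> {classes[col]}")
```

Each table is paired with exactly one class, and the pairing is chosen globally rather than by picking each row's best score in isolation.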
Questions/Comments to the Authors (QAs) ** Enumerate the questions to be answered by the authors during the rebuttal process**
1) It’s my understanding that the system relies heavily on the Hungarian algorithm. However, the paper never introduces the algorithm or explains how it relates to creating mappings. This is what makes this paper weak. A clear description of the algorithm and its relationship to mappings is crucial. Examples should be included. Additionally, the paper lacks a high-level intuition of Milan that does not dive into the details.
2) The research question is overloaded. I would suggest to rephrase to the following: 
"How and to what extent can RDB2RDF mappings be automatically created such that they are complete and accurate?"
This part "Milan, based on semantic, lexical and structural analysis of both a source relational database and a target ontology" is HOW you are planning to do it. It should not be part of the research question. 
3) The requirements seem to have been defined based on what the proposed system Milan can do. In other words, it seems to me that the requirements are biased towards Milan, when they should be independent and agnostic of systems. They state specific techniques (i.e. naming convention, tokenization, and token re-ordering.) The requirements should state the WHAT, not the HOW. 
Table 1 is missing R2, does that mean that existing systems do not satisfy that requirement? 
4) Readability of the paper.
The paper is very hard to follow. It completely lacks examples of mappings. The visuals of fig 1 are hard to follow, it should show an example database and ontology. 
There is a lot of jumping around, pointing to sections ahead. This is not an optimal style of writing.  As a reader, I should not be forced to jump around the paper. There should be a flow.
The paper dives into the NPD without explaining what that database is.
Many writing mistakes. These are just a few. 
- In page 2, the following is repeated: To date, most Relational database (RDB) to RDF mappings have additional complexity since the relational and RDF data models do not exactly align, have different expressivity and each emphasizes a different modeling repertoire [13].
- Contribution paragraph repeated.
- Several papers such as Tarasowa et al[18] and RODI benchmark [16] and RODI benchmark [11]
- D2QRQ
- "Hence for human validation of RDB to RDF mappings.” —> this is not a sentence
- “The algorithm first detects 1:1 followed n : 1 class-table relationships” not a sentence
The paper has to be revisited completely.
5) If this is an extension to [13], then what exactly are you extending? What are the new things? 
6) You should consider reviewing the paper "R2O, an extensible and semantically based database-to-ontology mapping language". http://oa.upm.es/5678/ Your Section 2  is similar to Section 3 of this paper. 
7) Is the system available to download and test?
=====
Comments after rebuttal
The rebuttal confirms that this paper is 1) on the right track but 2) still needs to be improved. First of all, the entire presentation of the paper needs to be revisited (research questions, examples, explanations, etc). I also did not get an answer to my questions 6 and 7.
Therefore my recommendation is to reject the paper. I look forward to seeing a next version of this work!


Metareview by Hsofia Pinto

The paper presents an approach for automated generation of R2RML mappings between a relational schema and an ontology, and the topic is of interest to the ESWC community. However, the approach is not explained clearly or at an adequate level of detail, and the experiment descriptions lack the detail needed to reproduce them.
Therefore, in its present state and after rebuttal and discussion among reviewers, the paper cannot be recommended.
However, the authors are strongly encouraged to improve their work based on the comments provided by the reviewers.

