Semantic Integration of Geospatial Data from Earth Observations through Topological Relations
Author(s): Helbert Arenas, Nathalie Aussenac-Gilles, Catherine Comparot, Cassia Trojahn
Full text: submitted version
Abstract: Earth observation is a rapidly evolving domain. Recently launched satellites, which deliver between 8 and 10 TB of image data per day, open new opportunities in domains ranging from environmental monitoring to urban planning and climate studies. However, domain-oriented applications require raw image metadata to be enriched with data coming from various sources (either static or dynamic) in order to support decision-making processes related to the observed areas. One of the challenges to be addressed is the integration of heterogeneous data that rely heavily on spatio-temporal representations. This paper presents a semantic approach to data integration that enriches the metadata of satellite imagery with various open data sets relevant to describing Earth observations for a particular need. We propose a semantic vocabulary that specializes standards (such as SOSA and GeoSPARQL), as well as a process, based on spatial and temporal features, to select, map and integrate heterogeneous geospatial data sets. This process relies on image tiles to handle data with a fixed spatial component, while temporal relationships are calculated on the fly based on temporal topology.
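The abstract states that temporal relationships are calculated on the fly from temporal topology but does not say which relation set is used. As a minimal sketch, assuming Allen's thirteen interval relations (the set that OWL-Time's interval relations are also based on), the relation between, say, an image acquisition window and the validity window of a weather forecast could be computed as below. The function name `allen_relation`, the relation labels and the example timestamps are illustrative assumptions, not taken from the paper.

```python
from datetime import datetime


def allen_relation(a_start: datetime, a_end: datetime,
                   b_start: datetime, b_end: datetime) -> str:
    """Return the Allen relation holding from interval A to interval B.

    Both intervals must be proper (start strictly before end).
    """
    if not (a_start < a_end and b_start < b_end):
        raise ValueError("intervals must satisfy start < end")
    # A and B share no instant.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    # A and B share exactly one endpoint.
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "metBy"
    # From here on, the intervals overlap over a non-zero duration.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "startedBy"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finishedBy"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlappedBy"


# Hypothetical example: a one-minute acquisition inside a forecast window.
acquisition = (datetime(2017, 6, 1, 10, 30), datetime(2017, 6, 1, 10, 31))
forecast = (datetime(2017, 6, 1, 9, 0), datetime(2017, 6, 1, 12, 0))
print(allen_relation(*acquisition, *forecast))  # during
```

Computing the relation only when a query needs it, rather than materializing all pairwise relations as triples, keeps the store small at the cost of repeated computation.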
Keywords: earth observation data; satellite imagery; semantic integration; vocabularies and ontologies; Ontology-based Data Integration
Review 1 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper presents an architecture for using Semantic Web technologies in mapping earth observation data. Hence, it is relevant to the conference. (NOVELTY OF THE PROPOSED SOLUTION) The approaches used in the paper are not clearly motivated (see review). Moreover, there is absolutely no evaluation. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) See above. No evaluation available. (EVALUATION OF THE STATE-OF-THE-ART) See above. No evaluation available. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors provide access to data, which shows that they did indeed use real data. However, they do not show that their approach works reliably. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) No experimental study. (OVERALL SCORE) Summary of the paper: The paper aims to compute enriched geospatial data with temporal relations. The authors propose (1) a vocabulary and (2) an integration process based on the topology of the entities. After presenting the goals of their work, they present work related to publishing and linking earth observation data. There has been some work on geospatial and temporal linking in recent years that the authors do not consider. These works include:
- Panayiotis Smeros, Manolis Koubarakis. Discovering Spatial and Temporal Links among RDF Data. Proceedings of the Workshop on Linked Data on the Web (LDOW 2016), co-located with the 25th International World Wide Web Conference (WWW 2016).
- Mohamed Ahmed Sherif, Kevin Dreßler, Panayiotis Smeros, Axel-Cyrille Ngonga Ngomo. Radon - Rapid Discovery of Topological Relations. AAAI 2017.
- Kleanthi Georgala, Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo. An Efficient Approach for the Generation of Allen Relations. Proceedings of the 22nd European Conference on Artificial Intelligence (ECAI 2016).
I'd suggest that the authors discuss the difference between their work and the state of the art in geospatial and temporal link discovery. In Section 3, the authors present the architecture of their framework. Thereafter, they present their integrated vocabulary. While both sections describe the work clearly, it is unclear whether the authors really compared the approach taken with other alternatives. Why particular vocabularies were chosen is motivated in one or two sentences (e.g., improved reasoning capabilities in GeoSPARQL), but specifics would really help here (i.e., a clean comparison of alternatives, a clear set of requirements and an objective selection of vocabularies based thereon). Section 5 presents the data used within the use case presented in the paper. Both dynamic and static data are used. The authors use a mapping tool to generate RDF from JSON. They claim that the conversion is not easy using standard tools (which is most probably the case), but they do not say why. Finally, the authors rely on Shapely to compute geospatial relations. Overall, I'm afraid the paper reads more like a deliverable for the SparkInData project. The authors most probably created a reliable piece of software for the aforementioned purposes, but they fail to motivate (1) the choice of vocabularies, (2) their architecture, (3) the data selection and (4) their integration approach. Hence, while the paper would most probably be a valuable "in-use" contribution, I cannot recommend it for the research track.
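Review 1 notes that the authors rely on Shapely to compute geospatial relations. Shapely's predicates work on arbitrary geometries; the dependency-free sketch below is only a stand-in for the simplest case relevant to a tile-based approach, namely two axis-aligned rectangles such as image tiles or bounding boxes, and it labels the result with GeoSPARQL Simple Features relation names. The `Box` type and the `box_relation` function are hypothetical; real image footprints are general polygons and need a full geometry library.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned, non-degenerate rectangle, e.g. a tile in lon/lat degrees."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def box_relation(a: Box, b: Box) -> str:
    """Classify a with respect to b using Simple Features relation names."""
    if a == b:
        return "sfEquals"
    # Length of the intersection along each axis (negative means a gap).
    dx = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    dy = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    if dx < 0 or dy < 0:
        return "sfDisjoint"
    if dx == 0 or dy == 0:
        # Only the boundaries meet: a shared edge segment or a single corner.
        return "sfTouches"
    if b.xmin <= a.xmin and a.xmax <= b.xmax and b.ymin <= a.ymin and a.ymax <= b.ymax:
        return "sfWithin"
    if a.xmin <= b.xmin and b.xmax <= a.xmax and a.ymin <= b.ymin and b.ymax <= a.ymax:
        return "sfContains"
    # Interiors intersect and neither rectangle encloses the other.
    return "sfOverlaps"


tile = Box(0.0, 0.0, 1.0, 1.0)
print(box_relation(Box(0.25, 0.25, 0.5, 0.5), tile))  # sfWithin
```

For rectangles the checks reduce to coordinate comparisons, which is one reason a fixed tile grid makes the spatial side of the integration cheap; Review 4's caveat that topological relations often cannot be derived from geometry alone still applies.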
Strong points
+ Timely and relevant paper
+ Solution seems to be implementable
+ Good reuse of vocabularies
Weak points
- Data and solution choices barely motivated
- No evaluation
- State of the art missing rather relevant papers
Questions to the authors: While I do have some questions (see the summary), I'd simply suggest that the authors rework the paper completely and provide a proper evaluation of their approach. Minor: "an homogeneous" => "a homogeneous". Font mismatches in 5.1: might be my reader (Acrobat Reader DC). -- After rebuttal: Looking forward to the updated version of the paper.
Review 2 (by Payam Barnaghi)
(RELEVANCE TO ESWC) This work provides an RDF/SOSA/GeoSPARQL-based model for the integration of heterogeneous satellite image data. It uses metadata descriptions and a linked data model to make the data more accessible and to integrate it with other existing sources. (NOVELTY OF THE PROPOSED SOLUTION) The paper presents a model based on the common Semantic Web representation frameworks and query language and describes a system for creating, storing and querying the metadata. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The work presents a good approach to an interesting problem; however, the implementation details, the evaluation of the system, and the complexity and efficiency of the proposed solution are not sufficiently described and discussed. (EVALUATION OF THE STATE-OF-THE-ART) The paper provides a sufficient overview of the related work and describes the background concepts and models. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The work lacks a detailed evaluation and discussion of complexity and performance issues. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The proposed system and the annotation methods should be made available to support the reproducibility of the work. (OVERALL SCORE) This is interesting work, but it requires further evaluation and discussion. At this stage, it would make a very good poster/short paper.
Review 3 (by anonymous reviewer)
(RELEVANCE TO ESWC) The topic of this paper is the management of geospatial data using Semantic Web technologies. The topic is quite relevant to ESWC and fits the track well. (NOVELTY OF THE PROPOSED SOLUTION) I appreciate the contribution as a complete framework. However, the proposed approach seems rather straightforward. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The approach is a practical framework and serves the goal of semantic spatial data management. (EVALUATION OF THE STATE-OF-THE-ART) Some state of the art is provided but should be extended. Some recent work on OBDA/I and geospatial data has not been discussed. In particular, there is a line of research by Bereta et al. extending the Ontop system [c] (formerly called Quest) to support GeoSPARQL [a], which has been used in several use cases [b]. There are also a few triple stores supporting GeoSPARQL, e.g., Stardog, Strabon, and Oracle Spatial and Graph.
[a] K. Bereta and M. Koubarakis. Ontop of geospatial databases. In The Semantic Web – ISWC 2016: 15th International Semantic Web Conference, Kobe, Japan, October 17–21, 2016, Proceedings, Part I, pages 37–52. Springer International Publishing, Cham, 2016.
[b] S. Brüggemann, K. Bereta, G. Xiao, and M. Koubarakis. Ontology-based data access for maritime security. In Proc. of ESWC, 2016.
[c] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-Muro, and G. Xiao. Ontop: Answering SPARQL queries over relational databases. Semantic Web Journal, 2017.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) There are extensive discussions and examples of the proposed approach. However, a performance study is missing. The authors should provide at least a qualitative study of performance to show whether the approach can be implemented efficiently. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The framework is general and easy to understand.
There are no experimental results in this paper. (OVERALL SCORE) Summary of the Paper: This paper proposes a framework for managing and integrating geospatial data from earth observation. The framework relies on standard Semantic Web technologies and is essentially an ETL pipeline converting data from different data sources in different formats into RDF/OWL. The authors also propose a semantic vocabulary for the task. However, there are two major issues: (1) the approach seems rather straightforward, and (2) the evaluation is completely missing.
Strong Points (SPs)
- The topic is highly relevant to ESWC.
- The framework is easy to understand.
- There are many examples explaining the main points.
Weak Points (WPs)
- Some related works are missing (see above).
- The performance and scalability of such a framework are unclear.
- What is the technical challenge?
- References need to be cleaned up. E.g., -  should be inlined into  and . -  dl-lite -> DL-Lite -  w3c -> W3C
Questions to the Authors (QAs)
- What is the technical challenge of this work?
- What about performance and scalability? Are there any experiments?
- What are the lessons learnt through this study?
- Are there any practical use cases of this framework (apart from collecting the data)?
- Will the tools (e.g., the Python library) developed in this framework be made publicly available?
-------------
I acknowledge that I have read the rebuttal.
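Review 3 characterizes the framework as an ETL pipeline that converts data from heterogeneous sources into RDF, and Review 1 mentions that the authors map JSON to RDF. As a minimal sketch of that conversion step, assuming a hypothetical JSON weather record with `station_id`, `time`, `property` and `value` fields and a placeholder namespace `http://example.org/eo/`, one record can be turned into SOSA observation triples serialized as N-Triples. The field names, namespace and function names are illustrative assumptions, not the paper's actual mapping; only the SOSA, RDF and XSD terms are standard.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOSA = "http://www.w3.org/ns/sosa/"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/eo/"  # hypothetical namespace for generated IRIs


def iri(value: str) -> str:
    return f"<{value}>"


def typed_literal(value, datatype: str) -> str:
    # Escape backslashes and double quotes; enough for simple one-line values.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{XSD}{datatype}>'


def observation_to_ntriples(record_json: str) -> list[str]:
    """Map one JSON record (hypothetical schema) to SOSA observation triples."""
    rec = json.loads(record_json)
    obs = iri(f"{BASE}observation/{rec['station_id']}/{rec['time']}")
    return [
        f"{obs} {iri(RDF_TYPE)} {iri(SOSA + 'Observation')} .",
        f"{obs} {iri(SOSA + 'hasFeatureOfInterest')} "
        f"{iri(BASE + 'station/' + rec['station_id'])} .",
        f"{obs} {iri(SOSA + 'observedProperty')} "
        f"{iri(BASE + 'property/' + rec['property'])} .",
        f"{obs} {iri(SOSA + 'hasSimpleResult')} "
        f"{typed_literal(rec['value'], 'double')} .",
        f"{obs} {iri(SOSA + 'resultTime')} "
        f"{typed_literal(rec['time'], 'dateTime')} .",
    ]


record = ('{"station_id": "st42", "time": "2017-06-01T10:00:00Z", '
          '"property": "air_temperature", "value": 21.4}')
for line in observation_to_ntriples(record):
    print(line)
```

A declarative mapping language would normally replace such hand-written code in a real pipeline; the sketch only makes the shape of the JSON-to-SOSA mapping concrete.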
Review 4 (by anonymous reviewer)
(RELEVANCE TO ESWC) The topic is clearly relevant for ESWC. (NOVELTY OF THE PROPOSED SOLUTION) There is no original research contribution, just an execution of well-known steps without an evaluation. The idea of integrating raster data with sensors is certainly very nice but of course not new. The same is true for the topological relations. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) There are some concerns with respect to the modeling that I describe below. Another major concern is the lack of information about the topological relations. This should have been the main part of the paper (given the title), but the paper lacks any details. (EVALUATION OF THE STATE-OF-THE-ART) Some work is missing, but overall the references are fine [assuming this is what is meant by Evaluation of the State-of-the-Art]. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) This is my main concern. The paper discusses a sequence of steps without showing the results, lessons learned, or shortcomings. That said, the goal is certainly noble, and hopefully the effort will be successful. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) See above. (OVERALL SCORE) The paper entitled 'Semantic Integration of Geospatial Data from Earth Observations through Topological Relations' presents a sequence of steps taken by the authors to integrate geospatial data, e.g., from remote sensing missions, with other data sources using spatial and temporal relations. The paper is clearly relevant for ESWC 2018 and, aside from minor typos and language issues, easily readable and well motivated. Interestingly, while the paper starts very promisingly, there is a substantial drop in quality and structure towards the end, with the conclusions almost appearing out of the blue. There is no clear evaluation, no use case beyond a few code snippets, no detailed lessons learned, and so forth. Most importantly, the paper remains entirely unclear about its main part: topological relations.
We learn nothing about them or the accuracy of their extraction, aside from the software library used to extract them. This is very worrisome, as it is well known that topological relations (often) cannot be extracted from geometry alone. To put it a bit provocatively, this is what the paper has to say about the extraction of topological relations: "In our approach, we use a python script to calculate the topological relationships between instances of classes." I also found the sentence "[SOSA] describes sensors and their observations, the involved procedures, the studied features of interest, the samples used to do so, and the observed properties." in the paper. If I am not mistaken, this is taken from the very first sentence of the official W3C specification for this ontology (https://www.w3.org/TR/vocab-ssn/) without quotation marks or any reference to the ontology or the specification as such. Another issue relates to the modeling choices made by the authors. The paper states that "In our model, the class eom:Footprint specializes both geo:Feature and sosa:FeatureOfInterest: a footprint is a closed polygon (a geometry) that represents the geographic area covered by the image." and later that "The specific geographic position of the measurement is represented as an instance of the class mfo:MeteoFeatureOfInterest, a subclass of sosa:FeatureOfInterest." This would be incompatible with GeoSPARQL, which clearly distinguishes between a spatial feature and its many possible geometries. In fact, the authors get it right in their code snippets, e.g., on page 11. I would suggest clarifying this part and reworking the ontology and Figure 2. Finally, there seems to be some confusion about raster versus vector data and about the difference between features in the real world and data about them. For instance, tiles (as artifacts of remote sensing) are modeled as features. This is a very odd choice, as it puts them on the same level as buildings and countries. I would suggest a clean model here.
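Review 4 recommends keeping GeoSPARQL's distinction between a feature and its geometries. A minimal sketch of that modeling follows, written as a small Python function that emits Turtle so that it stays dependency-free: the footprint is a feature (an `eom:Footprint`, declared as a subclass of `geo:Feature` and `sosa:FeatureOfInterest`, as the quoted paper describes), while the polygon itself is a separate `geo:Geometry` attached via `geo:hasGeometry` and serialized with `geo:asWKT`. The `eom:` namespace IRI, the `_geom` naming pattern, the function name and the example polygon are placeholders, not the paper's actual vocabulary or data.

```python
PREFIXES = """\
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eom: <http://example.org/eom#> .
"""


def footprint_turtle(local_name: str, wkt: str) -> str:
    """Describe a footprint feature and its geometry as two distinct resources."""
    feature = f"eom:{local_name}"
    geometry = f"eom:{local_name}_geom"
    return PREFIXES + f"""
# Class axiom: a footprint is a (spatial) feature of interest, not a geometry.
eom:Footprint rdfs:subClassOf geo:Feature, sosa:FeatureOfInterest .

# The feature: the area covered by the image.
{feature} a eom:Footprint ;
    geo:hasGeometry {geometry} .

# The geometry: one possible spatial representation of that feature.
{geometry} a geo:Geometry ;
    geo:asWKT "{wkt}"^^geo:wktLiteral .
"""


print(footprint_turtle("fp1", "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"))
```

Because the geometry is its own resource, the same footprint can later carry further geometries (e.g., a simplified or reprojected polygon) without changing what the feature denotes.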
A few minor issues: The authors switch back and forth between footnotes with URLs and references for the discussed ontologies. This may be just my eyes, but it looks like the font size shrinks from section 5.1 on. My PDF viewer gives the same results. Maybe the authors can take a brief look and correct me if my system shows wrong results. "https://www.w3.org/2015/spatial/wiki/SOSA_Ontology" is provided as a reference for SOSA but the URL merely points to a wiki discussion page. See above for a link to the specs. Overall, this is interesting work that follows a common idea of linking data across different types and providers, but it suffers from several shortcomings and a lack of novel contribution which makes it more a project report than a research paper.
Metareview by Olaf Hartig
This paper presents a procedure based on which the authors have integrated geospatial data. While the reviewers consider the topic relevant and interesting, they point out a number of significant weaknesses of the paper--most importantly, the lack of any research contribution or any evaluation whatsoever. Due to these weaknesses, the paper cannot be accepted as a research paper for the conference.