Semantic annotation for Mineral Intelligence
Author(s): Danielle Ziebelin, Philippe Genoud, Marie-Jeanne Natete, Daniel Cassard, Francois Tertre
Full text: submitted version
Abstract: The MICA Web of data platform is currently being developed in the frame of the H2020 MICA project ‘The European Raw Materials Intelligence Capacity Platform (EU-RMICP)’. The objective is to develop a web of data platform, integrating metadata on data sources related to primary and secondary mineral resources and providing the end users with an expertise on the methods and tools used in mineral intelligence. The system annotates mineral resources, covers significant parts of mineral supply chains (from prospecting to recycling), taking into account the environmental, technical, political and social dimensions. The platform is based on an ontology of the domain of mineral resources (coupled relative commodities, time and space, etc.). The system is coupled with an ‘RDF triplestore’, a database storing the ontologies, fact-sheets, doc-sheets and flow-sheets (i.e., specific formatted forms) respectively related to methods and documentation, scenarios and metadata. In practice, the system is connected with eight existing European Data Platforms, and the inference engine to search, select, infer and rank the results is specifically developed for this application. The mining resources are indexed by space and time.
Keywords: Semantic Web technologies; Ontologies developed for an application; Annotated data set; Linked data; Semantic web architecture
Review 1 (by Fiona McNeill)
The resource described in the paper seems comprehensive and useful. It seems clear to me that the web of data platform described addresses an important need, is of interest to the SW community and has the potential to be useful beyond the scope of the project it is being developed for. It seems to have been developed with appropriate design principles in mind - for example, it uses persistent URIs which can be dereferenced, etc. However, I feel there are some major issues with the paper. Firstly, there is no link to the resource. There is a link to the project page, but there is no obvious way to access the resource from there. I would infer from the paper that it is appropriately set up for users, with adequate instructions, etc., but I can't verify this. It also doesn't really reference other people's work. It mentions a few existing data platforms it builds on but doesn't discuss how innovative this work is, how it relates to what others have done, etc. Are the ideas of FactSheets, DocSheets and FlowSheets, for example, entirely novel, or do they build on existing ideas? It also doesn't really discuss its use. Who has used it? Has it been evaluated? These are major issues and need to be addressed before the paper is suitable for the ESWC track. The paper itself is fairly well written. It appears to have been copied over from somewhere as there are lots of words with random hyphens in them - this should be tidied up. It goes into a lot of detail in places where this doesn't seem necessary - e.g., most of section 3. It is important to assure the reader that these things have been considered and standards adhered to, but I would have thought the detail belongs in documents for users rather than a high-level description of the tool. In place of this, I'd like to see more detail of the actual process - described fairly briefly in section 2 - and, more importantly, more grounding about how it fits into the bigger picture and some discussion of evaluation.
How much has this tool actually been used? How successful was it? What bits could be improved? Etc. There isn't any evidence that the tool has been used outside the project, though I would have thought the ideas were fairly transferable. In fact, there isn't really any discussion of its use within the project and what the user experience has been, which I would like to see. In summary, I think there is a lot of potential here and the tool seems useful and potentially important. However, the paper does not provide all the information that I believe a resource paper should, and for that reason I am not convinced the paper should be accepted.
Review 2 (by Alasdair Gray)
The paper describes the architecture and documents used in the MICA project for capturing information about mineral resources. This paper strikes me as an EU deliverable reporting on the development of a system which has been reskinned into a paper; in fact it is deliverable 6.2 (http://www.mica-project.eu/wp-content/uploads/2016/03/MICA_D6.2_Note-accompanying-the-EU-RMICP-Delivery.pdf). The paper does not give any indication as to why design choices have been made, or even what the alternatives would be. In fact there is more detail in the deliverable about this than there is in this paper. There is no identification of requirements. It is unclear to me what a reader of the paper gets from it other than specific plumbing details of your system. Large parts of section 3 explain the basic principles of data on the web and permanent URIs using redirection. This is not required for a paper at ESWC2018. Structurally the paper has problems. The introduction jumps straight into the approach adopted by the project rather than scoping the paper and indicating the contributions, i.e. what is the resource that is being made available for reuse. Figure 1 is provided without explanation. What are the details of the 7 thematic domains? Why 7? What is a factSheet template? How is it manifested? How was the ontology developed? What were the competencies identified that it needed to satisfy? How was it evaluated? Where is the ontology published? What is the structure of a linkSheet? How does it relate to a VoID linkset? References should be provided for EUR-Lex and EC Publications Office. What version of VocBench was used? Rather than a website reference you should cite the appropriate paper corresponding to the version used. The paper is lacking a discussion of related work and suitable citations to the state of the art. How does the approach compare with other annotation frameworks? What ontologies and standards have been reused?
Technical reports such as  and  should be cited as such and not simply as a URL.  should be replaced by the data on the web best practices (https://www.w3.org/TR/dwbp/). The paper is lacking any form of evaluation of the framework or ontology or whatever the resource is that is being presented. There is no indication of community uptake or discussion of sustainability. Minor issues: - What is MICA, acronym should be expanded in the abstract? - Figure 2 is not informative to a reader of the paper. - Several words have random hyphenation in them as if they have been copied and pasted from other documents, e.g. paretic-lar.
Review 3 (by Jodi Schneider)
Thank you for your replies. --- The submission appears to be a copy-paste of a technical report, complete with hyphenation typos (e.g. "This knowledge is ac-cessible through a URI (publicURI)."). There is no indication that the authors are aware of the ESWC audience (nor is there a consistent expectation of audience throughout the report); for instance, definitions of RDF and triplestore are given in Figure 1. The submission describes a "web of data" platform; it is not made clear whether this platform, the underlying data, or the structures are meant to be the "resource" (since this is the resource track). The notion of FactSheets, docSheets and flowSheets sounds interesting but these concepts are insufficiently described; Figure 2 does not help. I suggest that the authors clearly consider what ESWC authors know, what they might want to do with the platform (or its data or its structures) and compose a submission next year. Images are difficult to read and interpret; text has non-idiomatic English which is readable but requires some interpretation; and more detail is given almost everywhere.
Review 4 (by anonymous reviewer)
This paper refers to a knowledge-based system developed under the umbrella of an H2020 project. Basically, it presents a web-oriented platform integrating metadata and general resources related to primary and secondary minerals. The paper is far from being suitable for presentation at a scientific venue. It is quite well written, but the style and the organization of the manuscript seem to be those of a technical report. There is a very large introductory section, a literature/state-of-the-art comparison is entirely lacking and very poor experimental evidence of the resource at work is given. Hence, two fundamental issues arise: the scientific and technical relevance of the proposal is questionable and no evidence of added value with respect to related work in the literature emerges. From the formatting standpoint, two remarks are urgent: there are several typos, probably coming from cut and paste errors, to be fixed and most of the figures are unreadable or of poor quality. I guess also that the formatting protocol of the conference is not fully respected everywhere in the manuscript. Apart from these aspects, I'm really doubtful about the appropriateness of this submission. The proposed work presents limited added value with respect to more articulated and complex KBS. The domain of minerals - probably not already covered by similar modeling efforts - could take advantage of similar approaches, but the system is still at too early a development stage (the ontology modeling only accounts for 300 concepts) to be really adoptable in concrete situations. Thanks for the reply. I confirm my opinion on the paper. As is, it cannot be accepted in a scientific venue.