SemSur: A Core Ontology for the Semantic Representation of Research Findings
Author(s): Said Fathalla, Sören Auer, Christoph Lange
Full text: submitted version
Abstract: The way how research is communicated using text publications has not changed much. We have the vision that ultimately researchers will work on a common structured knowledge base comprising comprehensive semantic and machine-comprehensible descriptions of their research, thus making research contributions transparent and comparable. The current approach for structuring, systematizing and comparing research results is based on survey or review articles. We present the SemSur ontology for semantically capturing the information commonly found in survey and review articles. SemSur allows to represent individual research problems, approaches, implementations and evaluations in a structured and comparable way. We discuss possible applications and present an evaluation of our approach with the retrospective, exemplary semantification of a survey. We demonstrate the utility of the SemSur ontology to answer queries about the different research contributions covered by the survey.
Keywords: Semantic Metadata Enrichment; Quality Assessment; SWRL rules; Knowledge-based Scholarly Communication; Semantic Publishing
Review 1 (by Michel Dumontier)
This paper describes an improved and extended version of the authors' earlier Semantic Survey Ontology, which aims to create a knowledge graph for representing research findings. The new SemSur covers more domains, is better aligned with external ontologies, and defines rules for eliciting implicit knowledge. However, the authors do not sufficiently demonstrate the strengths of their work relative to other ontology research. The goal of making research contributions transparent, structured, and comparable is well motivated, and the paper is a good starting point for moving from today's document-based scholarly communication to an efficient knowledge-based one. I find SemSur’s structure impressive: it is built around 5 core concepts (research problem, approach, implementation, evaluation, and publication). Its similarity to the structure of most scientific papers makes the ontology acceptable and understandable to researchers without a technical background. The authors have clearly put a lot of effort into studying other ontology research. They list 12 existing ontologies describing the content and structure of scholarly articles and reuse 9 of them in their own ontology. I appreciate that they provide the URIs of all reused ontologies and state the sources of the classes and relations used in SemSur. The newly defined classes and relations, such as the “isContinuationOf” and “isCoAuthorOf” relations, are quite novel and important for a well-structured research ontology. The visualizations of the SemSur graph (Figures 4 and 5) greatly help readers understand the SemSur ontology and how it works. Furthermore, the authors applied two evaluation strategies: an expert assessment by 10 ontology experts, and a satisfaction questionnaire completed by 18 researchers in other fields of computer science. The 10 questions designed for the 18 researchers are specific and functional, which helps prevent respondents from misunderstanding the questions or giving vague answers.
The evaluation results from both experts and non-experts suggest that SemSur is practical and user-friendly. Finally, the overall structure of the paper is very clear and understandable, and the work fits the topics of the ESWC 2018 conference very well. However, some points still need to be improved. First, among the keywords, “Quality Assessment” is not precise enough for this paper; I also suggest adding a keyword related to ontologies. Second, in the introduction, the authors do not describe their specific contributions relative to existing work. The same problem occurs in the related work section: the authors list plenty of ontology research, but they do not show how SemSur differs from it. I did not find any description of why SemSur is better than other existing ontologies, why previous ontologies cannot solve the problem, or which parts of SemSur are novel or more advanced than other researchers’ work. The methodology section is too brief to give a good explanation. The methodology follows guidelines proposed in another paper. However, although this is a core part of the paper, the authors give no evidence or reasons for choosing these guidelines. The same problem arises in the choice of SWRL for defining the SemSur rules: they state that SWRL was selected because of its popularity, and I do not think popularity is a strong argument in a scientific paper. SemSur reuses 9 existing ontologies, but the authors do not explain why they chose these 9, what their advantages are, or why they fit SemSur. In the evaluation section, they focus more on the non-expert evaluation results than on the expert assessment. In my opinion, they do not provide strong evidence for the performance of SemSur. Additionally, there are some editing mistakes; for instance, the same classes and relations appear twice in Tables 2 and 3.
In general, this paper is weak on explanation, rigor, and persuasiveness. Therefore, I would suggest a weak reject. I do like the work and agree that the challenge the authors are tackling is very promising and important for the research world. However, the paper does not describe the work very well. The authors need to provide more reasonable arguments, stronger evidence, and a more convincing evaluation.
Review 2 (by anonymous reviewer)
This paper presents the SemSur ontology resource. SemSur stands for Semantic Survey. The intended use of the ontology is to capture knowledge presented in survey and review articles. It forms part of a (very ambitious) research project to represent scientific results and publishing using knowledge graphs. It's definitely on topic for this conference and track, and the overall goal and intended use of the ontology are clear. However, from reading the paper I can't judge whether the ontology is on course to help meet these goals. The authors mention various related works in the paper. I'm not sure how comprehensive this list is, but in any case, the related work section does not do a good job of specifying how this work relates to previous work; it's merely an annotated list of related work. In particular, it does not highlight the novelties of this work, and it does not show how this ontology resource plugs a gap or breaks new ground. While the ontology is available online and the link in the paper works, the ontology doesn't actually load into Protege without problems (at least when I tried to load it on Wed. Feb. 7th). The following imported ontologies are not available (404 errors): <http://www.hozo.jp/owl/EXPOApr19.xml/> and <http://www.w3.org/2016/03/mls#>. I'm not sure how much this affects the appearance of the ontology when browsing it, but I did find some odd classifications, e.g., "AttributeRole SubClassOf Abstract" and "Normal_Disease SubClassOf ExperimentalDesignStrategy". These sound very odd to me. There are a lot of high-level classes as well (under owl:Thing), but I assume this is due to problems with the imported ontologies. On the plus side, the ontology specifies a license, with an annotation pointing to a URL. It contains version information and uses standard properties to specify things like titles. The above problems and the cluttered appearance of the ontology do not give me confidence in it.
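The import check described above can be partly scripted. The following is only a minimal sketch: it assumes the ontology is serialized as RDF/XML, and the embedded document, its ontology IRI (http://example.org/hypothetical-ontology) and the helper name declared_imports are hypothetical stand-ins, not the actual SemSur file. It lists the declared owl:imports targets so that each one can then be resolved by hand or with an HTTP client.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML header; only the two import IRIs come from the review.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/hypothetical-ontology">
    <owl:imports rdf:resource="http://www.hozo.jp/owl/EXPOApr19.xml/"/>
    <owl:imports rdf:resource="http://www.w3.org/2016/03/mls#"/>
  </owl:Ontology>
</rdf:RDF>"""


def declared_imports(rdf_xml: str) -> list[str]:
    """Return the IRIs named in owl:imports statements, in document order."""
    root = ET.fromstring(rdf_xml)
    # ElementTree addresses namespaced names as "{namespace}local".
    return [el.attrib[f"{{{RDF}}}resource"] for el in root.iter(f"{{{OWL}}}imports")]


imports = declared_imports(SAMPLE)
for iri in imports:
    print(iri)
```

The sketch deliberately stops short of fetching the IRIs, since whether an import resolves depends on when and from where it is requested.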
I suppose the authors could fix the broken imports during the rebuttal phase of the review. I also don't feel like I have any idea as to the coverage of the ontology. While I don't expect the authors to have produced a final ontology (given the enormity of their task), they could and should produce a better summary of the coverage and of the maturity level of each area covered by the ontology; I would think this is vital for users of the ontology. I find the methodology section to be very brief. It's good that domain experts were consulted, but not enough details about this are provided in the paper. How many experts? Which domains or specific sub-domains? The wording in this section is very generic and, after reading it, I don't feel I have any clear idea of how the ontology was actually developed. I find Section 4 hard to follow. I realize that describing an ontology in any paper is difficult, but from reading this section I don't feel like I have a good overview of the ontology. The work also uses a non-standard query language (SQWRL). I wonder why the authors didn't use SPARQL. They may have valid reasons for this, and it would be interesting to hear them. Section 5, the "Example Scenario" section, isn't understandable. Why do the authors mention the metrics comparison between SemSur 1.0 and SemSur 2.0 here? The paper does present an evaluation, but it is not well described and there are some problematic aspects to the protocol. The evaluation seems to be based on some competency questions. They seem reasonable, but it would be good to know who came up with these competency questions and what they mean in terms of coverage. Also, how were the researchers recruited? What's their background? The results are very subjective, and without this information it's not possible for the reader to interpret them. Most worryingly, the protocol involved showing the recruited participants a presentation on the benefits of SemSur.
This surely biases them and the results of this particular study. I find this to be a very odd and worrying approach. I don't feel that I can reliably take anything away from this evaluation. Overall, I find this to be a poorly written paper. It is not cohesive or easy to follow. While the resource itself meets some of the requirements of this track, I don't find the work, as written down in this paper, to be of a high enough standard for publication in this conference. If properly described and published then this work has the potential to be of interest to the community, but as it stands the work feels like it is at a preliminary stage. Viewing the ontology in Protege doesn't give me confidence it is at a level of maturity to be reused by others.
Review 3 (by Krzysztof Janowicz)
The paper entitled 'SemSur: A Core Ontology for the Semantic Representation of Research Findings' introduces a second version of the SemSur ontology, designed to provide metadata for survey articles, such as whether a paper has one or more authors, what problem is being addressed, and so on. The paper as such fits the resource track and is well-illustrated and easy to follow. At the same time, I am unsure whether incremental changes in the form of new versions should lead to new publications as full papers. The SemSur ontology was published as version 1 in a paper in 2017. If version 2 is accepted for ESWC, will there be a version 3 with new lessons learned at ISWC? Clearly, ontologies, as software engineering artifacts, will keep evolving. I believe that this is a decision that should be taken by the chairs and thus will not take it further into account for my review. Another point that I am unsure about is the focus on surveys and review articles. Wouldn't the ontology be equally useful for papers in general? Ideally, all research papers address a certain problem. They all have authors. They may be a continuation of previous work, just as in the example of the two SemSur papers given by the authors (note that neither of these papers is a survey article). I understand why one would like to use it for surveys, but that does not fully explain the selected scope. Finally, Semantic Survey Ontology is an odd name; why not just Survey Ontology or Survey Article Ontology? As far as modeling is concerned, there are a few surprising decisions. For instance, 'Unpublished' is a subclass of 'Publication'. This is not only odd, it will also require changing the type once an unpublished item gets published in the future. In fact, not (yet) being published is a status, not a type. Manuals can be unpublished just as technical reports can. Another example is the use of transitivity.
The authors state that "[f]or instance, isContinuationOf is a transitive relation represents that a publication is the continuation of another one (e.g. this publication is the continuation of the previous publication of SemSur)." This reminds me a bit of the well-known realization that similarity is not a transitive property: (green) apples can be similar to oranges because both of them are fruits, but (green) apples are also similar to (green) frogs because of their common color. Nonetheless, oranges are not similar to (green) frogs. A work can be a continuation of another work by building upon certain parts, e.g., using the same data. Another work building upon this second work can use the same methods but not the same data, thereby showing that isContinuationOf is not a transitive property. Would you agree? Another issue worth discussing is the explanation of ontology engineering given in the paper. This remains on a very generic level and mostly refers to rather old (and outdated) work on how to design ontologies. As such, these text fragments do not contribute to the understanding or usability of the presented work and are generic to a degree that they could be part of any other ontology paper. I would also tend to disagree with the notion that one should reuse other ontologies 'as early as possible in the ontology building life cycle'. Typically, methodologies have argued the other way around, by first focusing on developing conceptual models and only then checking whether these are already present in other ontologies. As far as the evaluation and use case are concerned, I doubt that there is a real need for answering queries such as 'Single-Author Publications proposed approach X?' but I am happy to be convinced otherwise. Given that the ontology has been published before, providing more usage evidence from third parties would have been useful.
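Returning to the transitivity point, a small sketch may make the concern concrete. It is an illustration only: the three papers are hypothetical, and the hand-rolled closure below merely mimics what a reasoner would derive for a property declared transitive; it is not the authors' SWRL rule set or an OWL reasoner. Paper B reuses the data of paper A, and paper C reuses the methods of B but not the data of A.

```python
from itertools import product


def transitive_closure(pairs):
    """Return the transitive closure of a set of (subject, object) pairs."""
    closure = set(pairs)
    while True:
        # Chain every pair (a, b) with every pair (b, d) to derive (a, d).
        inferred = {(a, d) for (a, b), (c, d) in product(closure, repeat=2) if b == c}
        if inferred <= closure:
            return closure
        closure |= inferred


# Hypothetical isContinuationOf assertions: (later paper, earlier paper).
asserted = {("paperB", "paperA"), ("paperC", "paperB")}

closure = transitive_closure(asserted)
inferred_only = closure - asserted
print(inferred_only)  # {('paperC', 'paperA')}
```

Declaring the property transitive thus forces the statement that C is a continuation of A, although under the assumptions above C shares nothing with A directly. Whether that inference is acceptable depends on the intended meaning of isContinuationOf, which the paper would need to spell out.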
All that said, the paper and ontology are interesting and fit well within the larger effort to go beyond the paper/pdf publication paradigm. Hence, the SemSur ontology may indeed see uptake in the future.
Review 4 (by anonymous reviewer)
The paper describes the development of the Semantic Survey (SemSur) Ontology for describing research findings and enabling the generation of a "knowledge-based model" for querying, analyzing, and comparing research findings. The SemSur ontology reuses terms from existing ontologies, such as the Semantic Web for Research Communities (SWRC) and the scientific EXPeriment Ontology (EXPO), among others. The paper also describes the use of rules for simple inferences.
1. The scope and structure of the SemSur ontology are limited to a few terms focused on computer-science-related publications, with generic terms such as "Experiment Design", "Experiment requirements", etc. It is not clear how these terms can be used to query and retrieve papers describing research findings at a fine level of granularity. For example, it is not clear which SemSur ontology terms can be used to annotate and query research papers related to "generating RDF from relational database" (from Table 5). In particular, it is not clear how the query will differentiate research findings related to "RDF and relational database", "RDF and XML database", and "RDF and graph database".
2. Although a key objective of the SemSur ontology is to support comparison between research findings through the generation of a "knowledge-based model", it is not clear how it supports formal modeling of various details of a research study as compared to a systematic review of a domain topic. For example, a review of existing techniques for "schema-based matching" by Shvaiko et al. systematically defines a set of specific criteria for matching, such as "element and structure-level." It is not clear how the SemSur ontology described in the paper can be used to replace review papers for comparative evaluation of research findings.
3. The paper does not compare the proposed approach with existing semantic techniques for representing research publications, such as Nanopublications (nanopub.org) and Research Objects (www.researchobject.org). It is not clear how the SemSur ontology differs from these existing projects that aim to formally model research findings.
4. The evaluation of the SemSur ontology using a satisfaction questionnaire and expert assessment does not demonstrate the effectiveness of the proposed ontology for formally modeling research findings as compared to existing approaches. In addition, the evaluation does not provide examples of query results using the SemSur ontology and other approaches (e.g., Google Scholar) corresponding to the queries in Table 5. For example, what were the limitations of the query results from Google Scholar regarding "knowledge graph refinement", and how were these limitations addressed in the SemSur ontology-based query results?
Overall, it is not clear how the SemSur ontology addresses the limitations of existing approaches to formally modeling research findings; therefore, its utility as an ontology resource for the scientific community is not clear.