Paper 124 (Research track)

An Assessment of schema.org's Coverage of Terms from Key Medical Datasets

Author(s): Kody Moodley, Josef Hardi, John Graybeal, Michel Dumontier, Mark Musen

Full text: submitted version

Abstract: schema.org is an initiative by the purveyors of the major search engines to define a common vocabulary for structuring Web content from a variety of domains, promoting data interoperability, potentially allowing for increased discoverability in search results, and enabling Web content to benefit from sophisticated search services. schema.org's health-lifesci extension provides specialized attributes for describing healthcare and medical data. Before applying these extensions to increase interoperability of medical data, it is valuable to know the current expressivity of the schema.org vocabulary for capturing key biomedical attributes. We are not aware of any quantitative evaluations addressing this question, and we fill this gap by providing such an evaluation. We propose a mapping of attributes from a selection of prominent community specifications for drug and clinical trial metadata to schema.org terms. We also define a mechanism for measuring the coverage of schema.org for attributes in these specifications. For our selected specifications, schema.org showed roughly a 60%, 66% and 10% coverage ratio for drug, medical dataset and clinical trial metadata, respectively. Our study shows that: 1) a substantial portion of drug and medical dataset metadata can immediately leverage schema.org for the potential benefits, and 2) precise descriptions of clinical trial data are not supported by schema.org. Our proposed mapping provides clues for: 1) extending schema.org to support detailed description of clinical trial data, and 2) further improving coverage of drug and medical dataset attributes, should these items be required.

Keywords: schema.org; Linked data; Scientific metadata; Semantic markup

Decision: reject

Review 1 (by Francesco Ronzano)

(RELEVANCE TO ESWC) The paper deals with the comparison, in terms of expressiveness, of different schemas (schema.org and biomedical vocabularies), a topic of great interest for ESWC attendants.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed analysis, as far as I know, is novel.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The analysis is complete and extensively commented.
(EVALUATION OF THE STATE-OF-THE-ART) There is an extensive review of SOA.
(OVERALL SCORE) ** Summary of the Paper **
This paper presents an extensive study of how the schema.org vocabulary (together with its health-lifesci extension) covers the expressiveness of the most recognized biomedical vocabularies used to describe datasets, drugs and clinical trials (8 vocabularies). After introducing the official (health-lifesci) and "unofficial" (Bioschemas / BioCADDIE) extensions of schema.org to support the modeling and interoperability of data in life science, the authors identify and introduce the recognized biomedical vocabularies describing datasets, drugs and clinical trials that will be studied with respect to their mapping to schema.org. For each metadata attribute described by these biomedical vocabularies, the authors ask two researchers to select the most likely mapping to a schema.org term, pointing out if there is no match or if the match identified is partial or exact. Differences in mappings between the two researchers are reconciled by reaching consensus among them, thus producing a final consolidated set of mappings.
In order to quantify and compare how much of the expressiveness of a vocabulary can be reproduced by exploiting the metadata defined by another one, the authors propose the compatibility rating metric: this is a normalized metric that aggregates, by means of a prioritized aggregation model, the percentages of mandatory (MUST), suggested (SHOULD) and accessory (OPTIONAL) attributes expressed in a first vocabulary that can be mapped to a second one (i.e. schema.org). To better explore vocabulary mapping, the coverage ratio is also computed so as to quantify the percentage of attributes of a vocabulary that can be mapped (exact or partial match) to the other vocabulary.
The compatibility rating value and the coverage (total and by attribute requirement level) are computed starting from the reconciled mapping, by considering each biomedical vocabulary against schema.org (including its health-lifesci extension). The results of these mappings are discussed, showing that schema.org covers a considerable amount of biomedical dataset and drug attributes, but only a small portion of the clinical trial description attributes.
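As an illustration, the two metrics described above can be sketched roughly as follows. This is only a sketch: the paper's actual prioritized aggregation model is not reproduced here, and the MUST/SHOULD/OPTIONAL weights, attribute names and match labels below are hypothetical placeholders.

```python
# Hypothetical sketch of the coverage ratio and a weighted compatibility
# rating, as described in the review. The weights (0.6, 0.3, 0.1) are an
# assumed prioritization (MUST > SHOULD > OPTIONAL), not the paper's model.

def coverage_ratio(mappings):
    """Fraction of source attributes with an EXACT or PARTIAL match."""
    if not mappings:
        return 0.0
    matched = sum(1 for m in mappings.values() if m in ("EXACT", "PARTIAL"))
    return matched / len(mappings)

def compatibility_rating(mappings, levels, weights=(0.6, 0.3, 0.1)):
    """Normalized weighted aggregate of per-requirement-level coverage.

    `levels` maps each attribute to MUST / SHOULD / OPTIONAL.
    """
    w = dict(zip(("MUST", "SHOULD", "OPTIONAL"), weights))
    score, total = 0.0, 0.0
    for level, weight in w.items():
        attrs = [a for a, lv in levels.items() if lv == level]
        if attrs:
            covered = sum(1 for a in attrs
                          if mappings.get(a) in ("EXACT", "PARTIAL"))
            score += weight * covered / len(attrs)
            total += weight
    return score / total if total else 0.0

# Toy example: four attributes of a hypothetical drug-metadata specification.
mappings = {"name": "EXACT", "dose": "PARTIAL",
            "indication": "NO MATCH", "batch": "EXACT"}
levels = {"name": "MUST", "dose": "MUST",
          "indication": "SHOULD", "batch": "OPTIONAL"}
print(coverage_ratio(mappings))                       # 0.75
print(round(compatibility_rating(mappings, levels), 3))  # 0.7
```

Normalizing by the total weight of the levels actually present keeps the rating comparable across specifications that omit, say, OPTIONAL attributes entirely.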
** Strong Points (SPs) **
The paper provides a complete evaluation of the biomedical vocabulary coverage provided by the widespread schema.org vocabulary.
The paper is clear and easy to read.
The authors share the detailed results of their peer-reviewed mapping analysis of biomedical vocabularies against schema.org (including its health-lifesci extension) by means of a set of spreadsheets.
** Questions to the Authors (QAs) **
I enjoyed reading this paper that presents a clear analysis of schema.org's coverage of biomedical metadata. It is interesting and could provide useful suggestions both to extend schema.org and to standardize the use of such metadata across datasets.
It would be interesting to add some comments on the disagreement between the two Data Science researchers in mapping the biomedical vocabularies to schema.org: could we add a confusion matrix? Could we abstract some relevant situations in which disagreement occurred?
Looking at the CSV you shared with the results of the mappings of the biomedical vocabularies to schema.org, it seems that several metadata attributes of a biomedical vocabulary are often mapped to the same schema.org attribute by means of a PARTIAL MATCH relation. Thus it happens that even if we have a high coverage ratio (since PARTIAL MATCHES are considered positive examples in computing the coverage ratio), we will lose some expressiveness when using schema.org attributes. It would be great to quantify this issue - for instance, you could consider the ratio between the number of distinct schema.org attributes used and the number of biomedical-vocabulary attributes that have EXACT or PARTIAL MATCHES to schema.org.
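A rough sketch of the quantification suggested here, with hypothetical attribute names and match data (the `schema:` targets are only illustrative):

```python
# Sketch of the reviewer's suggested expressiveness-loss indicator:
# the ratio of distinct schema.org attributes used to the number of
# source attributes that map (EXACT or PARTIAL) onto them. Values well
# below 1.0 indicate many-to-one mappings, i.e. lost expressiveness.

def distinctness_ratio(mapping):
    """mapping: source attribute -> (schema.org attribute, match type)."""
    matched = {src: tgt for src, (tgt, kind) in mapping.items()
               if kind in ("EXACT", "PARTIAL")}
    if not matched:
        return 0.0
    return len(set(matched.values())) / len(matched)

# Hypothetical example: two source attributes collapse onto one target.
mapping = {
    "activeIngredient": ("schema:activeIngredient", "EXACT"),
    "strength":         ("schema:drugUnit", "PARTIAL"),
    "unit":             ("schema:drugUnit", "PARTIAL"),
    "shelfLife":        (None, "NO MATCH"),
}
print(round(distinctness_ratio(mapping), 2))  # 0.67
```

Here 3 matched source attributes use only 2 distinct schema.org targets, giving 2/3; unmatched attributes are excluded so the indicator measures only the loss within the matched portion.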
Typo: Section 3.3 Mapping Process Overview: "The metadata specifications in Table 1 and 2..." ---> it should be only Table 1, since Table 2 deals with the evaluation of mappings.
** After rebuttal **
Thanks to the authors for their answers. After reading their comments, I would leave my final score unchanged.

Review 2 (by Edna Ruckhaus)

(RELEVANCE TO ESWC) This paper presents an evaluation and proposed improvement of the schema.org repository in the biomedical domain. An assessment of schema.org is interesting for the biomedical community that works with structured data on the Web.
(NOVELTY OF THE PROPOSED SOLUTION) The work that is presented is not clearly a research problem with its proposed solution. It is an analysis of an existing resource and a proposed solution of how it can fit better the needs of the biomedical community.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed solution is correct but in some cases not complete. Detailed comments follow in the general score.
(EVALUATION OF THE STATE-OF-THE-ART) There is not really a state of the art (section 2) but more a starting point for the mapping process that is described later on.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The approach is presented as the process and computations of compatibility scores defined to assess the coverage of the resource with respect to existing vocabularies in the field.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The work presents the process and computations of scores that indicate the expressivity of schema.org. With only two subjects it lacks generality, and although the resources are public, it is not clear how the process can be reproducible.
(OVERALL SCORE) This work presents an assessment process of the schema.org vocabulary repository in the area of biomedical data. It focuses on three sub-areas: (1) general medical datasets, (2) drugs, (3) clinical trials. The work describes the analysis of the existing vocabularies that are used in the study, a description of the mapping process, and the compatibility scores (also defined in this work) that were obtained. A final analysis is presented on the coverage of schema.org for the three sub-areas.
(SP1) The paper is clear and well written.
(SP2) The results seem useful and give insight on the improvement of schema.org.
(SP3) The scores used for the evaluation are well defined.
(WP1) The assertion in section 1 that it is "apparent" that schema.org often differs from existing specifications is not well supported.
(WP2) There is no related work but reference to existing vocabularies and projects as a starting point for this evaluation.
(WP3) The analysis of existing vocabularies (section 3.1) that will be used in the evaluation process does not seem complete, e.g. SNOMED. In general, the criteria for selection, such as "widely recognizable" in general or "widely referenced" for Drugbank, are very fuzzy.
(WP4) The reference [11] mentioned in 3.1 in relation to the NDF Reference Terminology is over 10 years old and does not seem to reflect the current situation with respect to this vocabulary.
(WP5) In section 3.2.1 the description of the determination of similarities among attributes of both specifications seems very subjective, probably a similarity measure of some sort could complement the manual process done by the evaluators.
(WP6) Some comments on the format and presentation of the paper: 
- some formulas may have a smaller font, 
- you can just present one of the three formulas at the end of 3.2, 
- figure 1 is not needed, the process is very understandable.
(QA1) Why try to resolve conflicts among the evaluators? It seems a stronger approach to average or somehow compute an integrated score, that would scale if you have more than two evaluators.
I acknowledge the answers (in the rebuttal phase) to the issues raised in the review. Some of them have been recognized as possible extensions to this work, which I believe would enrich this research.

Review 3 (by anonymous reviewer)

(RELEVANCE TO ESWC) The paper fits ESWC due to the usage and evaluation of the schema.org vocabulary in medical datasets.
(NOVELTY OF THE PROPOSED SOLUTION) Although the paper gives some insight into schema.org vocabulary usage, a clear novelty is missing.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The approach sounds correct; nevertheless, I would like to see experiments and evaluations with term similarity measures and how these compare to direct term matching. Comparisons to different approaches are also missing.
(EVALUATION OF THE STATE-OF-THE-ART) The evaluation of the SotA briefly describes some similar initiatives, but any comparison of similarities or dissimilarities to this work is missing.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Due to the usage of publicly accessible resources, reproducibility is possible.
(OVERALL SCORE) The authors perform a mapping evaluation between the schema.org vocabulary and resources in the medical domain, specifically drug, medical dataset and clinical trial metadata. They define a mechanism for measuring the coverage of schema.org for attributes in the specifications of the drug and clinical trial metadata. Their experiments suggest that the metadata of drugs (60% coverage) and medical datasets (66%) benefit from the schema.org vocabulary, whereas clinical trial data (10% coverage) does not.
Weak points
- no similarity measures
- no comparison to other approaches

Review 4 (by Jean-Paul Calbimonte)

(RELEVANCE TO ESWC) Using schema.org for semantic interoperability is of high relevance for ESWC. The Health domain, in particular, has constantly been active in this regard.
(NOVELTY OF THE PROPOSED SOLUTION) The paper is an assessment of compatibility and coverage, essentially. The novelty in the proposed approach is limited, although this is also due to the nature of this work.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The metrics and specifications used are well defined and detailed, as well as justified.
(EVALUATION OF THE STATE-OF-THE-ART) The state of the art per se is more related to the usage and definition of vocabularies and specifications in the health domain, especially for metadata. The current description is sufficient.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The results are sufficiently described, although a more insightful discussion could be expected, especially concerning the existing gaps, and variations depending on how applications use biomedical/health vocabularies.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The methodology is clearly detailed for the assessment.
(OVERALL SCORE) This paper provides a quantitative evaluation of the schema.org health-lifesci extension, with regards to its ability to capture biomedical attributes.
The authors have chosen well-established vocabularies and provided a detailed methodology and quantitative indicators (e.g. coverage). The results show that schema.org can actually be used to an important degree in many scenarios, with the notable exception of clinical trials metadata.
- well described methodology
- uses well known vocabs/specifications for assessment
- discussion could be enhanced
- limited in novelty
1. are there any other relevant specifications to consider? If yes, why not include them?
2. are two data scientists enough for the assessment process?
3. are CDISC standards considered, e.g. for clinical trial or data acquisition instruments?
The paper addresses a question that has practical implications in the bio-medical-health domains, when referring to data integration and heterogeneity. The emergence of schema.org and its related extensions has been seen as a clear opportunity for paving the way for commonly agreed vocabularies. It is true that up to now there has been a growing interest in using schema.org in this context, but this paper provides concrete evidence on what type of metadata can readily leverage schema.org and its extensions right away.
The methodology presented is quite clear, and the quantified assessment parameters are well explained and justified. Therefore, it could also be possible to apply the same methodology to other specifications, and assess the compatibility levels.
However, there are also some concerns to be raised in relation to this paper. First, the entire mapping process is based on the expertise of two data scientists. Is this enough for the consolidation of the mappings provided? Another question refers to the actual usage of the specifications. The mapping process has taken into account the specifications only (e.g. which attributes are required, etc.) However, is the use of these specifications following the recommendations or are there other patterns that arise? In some cases, depending on the domain of application, vocabularies can be used in different ways.
Although the authors justify the need for this assessment study, given that this paper is an assessment/evaluation work, the degree of novelty is somewhat limited. However, as it cannot be seen as a benchmark paper either, there is also no justification to say that it would fit better in the benchmark track.
The paper is well written and is easy to follow. However, in terms of presentation it would be advisable to present early on a figure describing the overall methodology, which is quite central and important for the entire paper. It would also be advisable to enhance the presentation of the different formulas, which are currently too large and look strange in comparison with the rest of the text. 
Finally, I also miss a more detailed discussion at the end, covering in more detail some of the questions raised in the introduction, i.e. a table or figure summarizing the types of terms where schema.org (and its extensions) lacks coverage, discussing in more detail potential new extensions, etc.
After rebuttal: thanks to the authors for the comments and clarifications. They confirm some of the limitations and set a list of interesting future works. I maintain the scores.

Metareview by Adrian Paschke

This work presents an evaluation of the schema.org vocabulary repository in terms of its coverage and usefulness for the area of biomedical data in three areas: (1) general medical datasets, (2) drugs, (3) clinical trials.
The suitability of the paper as a Research track paper at ESWC was an issue and it was discussed if the paper can be moved. In case the paper gets accepted the authors should strive to address the following points:
1. The Related Work section needs to be improved. It covers efforts in extending schema.org, and while the ultimate goal of the paper is to serve as a first step to that process, the paper is actually about evaluating a vocabulary's quality and coverage for a scientific domain. As such, the Related Work section needs references to other works that evaluate vocabulary/ontology coverage/quality for a domain. There is a large body of work on ontology quality, including coverage, that is missing, and the Related Work section should cover it.
2. Discuss manual vs. automated similarity assessment.
3. Discuss how generalizable the methodology is if we wanted to evaluate schema.org for other scientific domains.
4. Improve the final discussion, highlighting the main points. It would be particularly helpful if the discussion would address the questions raised in the introduction.
5. Improve the paper presentation following all reviewers' requests in this area.
