Evaluation of Schema.org for Aggregation of Cultural Heritage Metadata
Author(s): Nuno Freire, Valentine Charles, Antoine Isaac
Full text: submitted version
Abstract: In the World Wide Web, a very large number of resources is made available through digital libraries. The existence of many individual digital libraries, maintained by different organizations, brings challenges to the discoverability, sharing and reuse of the resources. A widely-used approach is metadata aggregation, where centralized efforts like Europeana facilitate the discoverability and use of the resources by collecting their associated metadata. The cultural heritage domain embraced the aggregation approach while, at the same time, the technological landscape kept evolving. Nowadays, cultural heritage institutions are increasingly applying technologies designed for the wider interoperability on the Web. In this context, we have identified the Schema.org vocabulary as a potential technology for innovating metadata aggregation. We conducted two case studies that analysed Schema.org metadata from collections from cultural heritage institutions, and used, as evaluation criteria for this metadata, the specific requirements of the Europeana Network. These include the recommendations of the Europeana Data Model, which has been developed as a collaborative effort from all the domains represented in Europeana: libraries, museums, archives, and galleries. We concluded that Schema.org poses no obstacle that cannot be overcome to allow data providers to deliver metadata in full compliance with Europeana requirements and with the desired semantic quality. However, for specific requirements of Europeana or other aggregation networks, due to Schema.org’s cross-domain applicability, its adoption must be accompanied by recommendations and/or specifications regarding how data providers should create their Schema.org metadata.
Keywords: Metadata; Cultural heritage; Metadata aggregation; Schema.org; Europeana Data Model; Digital libraries
Review 1 (by Anna Fensel)
The work analyses the feasibility of publishing the semantic annotations typically available in the cultural heritage domain in the form of schema.org annotations. It compares the semantic models available in Europeana with their schema.org counterparts, and creates a set of corresponding mappings and a software implementation, taking two real-life use cases as a basis. The details of the mappings are provided on GitHub. This work and its conclusions are important as they (potentially) assist cultural heritage data and content providers in making their data and content compliant with schema.org, and in publishing in this format. Schema.org has become a very prominent format on the Web, and being able to publish schema.org annotations is crucial for online visibility. It would be useful to explain why the two selected use cases are sufficiently representative to support claims about all types of content in Europeana. Application-wise, it would be useful to make the results of this work more accessible to cultural heritage data and content providers, for example by giving them the possibility to export schema.org annotations out of Europeana, or by giving them a tool that can guide them in the production and publication of schema.org annotations, like, for example, the tool https://semantify.it/ for the domain of tourism. With respect to the conclusions of the paper: for the cases where no satisfactory solution is present in schema.org at the moment (as particularly pointed out in the paper, these include modelling rights and licenses), corresponding schema.org extensions will possibly appear in the future. There is a typo on page 13: “Schema,org” instead of “schema.org”.
*** after rebuttal note ***
Thank you for replying to the review comments. For the part where you explain the choice of use cases, it could help the presentation to list explicitly (e.g. with bullet points) the requirements you had for the selection of the use cases, so that it is visible that they are not random and that they are sufficiently representative.
Review 2 (by Michele Barbera)
The paper presents an experimental study mapping schema.org metadata into EDM with the objective of testing the suitability of ingesting schema.org metadata into Europeana. The study concludes that the direct ingestion of schema.org metadata into Europeana is feasible, provided that collection publishers adhere to a few policies. The paper represents a valuable contribution, as what is suggested may be the first step towards a wider and easier generation of cultural heritage metadata by collection holders. The technical analysis is detailed and clear. From a methodological point of view, the study was conducted based on digital library management systems, whose objects are presumably mostly of a specific kind. Hence, the authors should address whether the approach could be suitable for other types of Cultural Objects, specifically those that tend to require more complex modelling (e.g. Event Based), like performing arts representations. Although the study is mostly focused on assessing the technical aspects, an issue that is only marginally mentioned by the authors is the nature and the dynamics of the incentives for data publishers to produce schema.org metadata. Interesting studies, such as those periodically conducted by the webdatacommons.org team, suggest a fragmented and highly diversified ecosystem, in which the dynamics of the incentives may not be as straightforward as expected. This phenomenon might partly depend on the opaque and ever-changing strategies and policies of the main commercial search engines, whose strategic decisions are a major driver in shaping the incentives for commercial websites and also cultural institutions. I recommend that the authors address the above points, perhaps by adding a new section to the paper considering potential future directions of their interesting and valuable analysis.
Review 3 (by anonymous reviewer)
Some Cultural Heritage Institutions (CHI) publish metadata about their assets on the Web following the Schema.org schema; this paper investigates the suitability of this metadata as a direct source for metadata integration across CHI. The use case is based on the schemas and requirements of the Europeana metadata aggregator and shows that, with additional guidelines, the Schema.org metadata published by CHI could be used as a source for aggregator portals such as Europeana, considering Europeana a typical and representative case thereof. The paper mentions the DPLA aggregator as being heavily influenced by the Europeana Data Model (EDM), but is there further evidence that Europeana is a typical/representative case? The CHI metadata that is investigated is different from EDM, but the respective CHI are providing a mapping to an aggregator metadata schema heavily influenced by EDM (DPLA): could this add a bias to the evaluation? As the mentioned CHI are doing one type of mapping from their internal metadata set to the set required by DPLA, they might base the Schema.org values on that first conversion. Do the authors know whether this is the case? If it is not the case, it would be interesting to add that piece of information to the paper; if it is the case, it would be interesting to consider in future work the evaluation of a Schema.org dataset from a CHI that does not provide a metadata set to an aggregator based on EDM.
As minor questions/observations:
- In the figure: do you mean “is harvested through” by “Is harvested to”?
- There is a typo in the conclusion section: “Schema,org”
---------
The authors have perfectly addressed my comments and I maintain my "strong accept" rating.
Review 4 (by Monika Solanki)
In this paper, the authors present a study evaluating the suitability of Schema.org for metadata aggregation of cultural heritage data, based on the requirements of the Europeana Data Model. Two case studies are presented. The paper is very well written and motivates the study appropriately. My concerns are with the choice of the case studies. It appears that only digital library management systems have been exploring the use of Schema.org metadata for their resources. It is worth investigating why other cultural heritage institutions such as museums, archives and galleries, which have a much wider presence on the Web, have not yet adopted it. A useful case study would have been to assess the limitations of Schema.org for the aggregation of resources exposed by these organisations. Additionally, I recommend that the authors provide the results of their data quality analysis as a chart or a graph to give an overall idea of the quality across the analysed datasets. There should also be a subsection of section 5 that explicitly details the recommendations made by the authors, based on their analysis, for the adoption of Schema.org.
Review 5 (by Anna Tordai)
This is a metareview for the paper that summarizes the opinions of the individual reviewers. The reviewers agree that this paper is well written and addresses a real-world problem with two use cases on real data. The reviewers have some questions about the representativeness of the use cases as well as the generalizability of the results. The authors have addressed these issues in their rebuttal. Laura Hollink & Anna Tordai