GDPRtEXT – GDPR as a Linked Data Resource
Author(s): Harshvardhan Jitendra Pandit, Kaniz Fatema, Declan O’Sullivan, Dave Lewis
Full text: submitted version
Abstract: The General Data Protection Regulation (GDPR) is the new European data protection law whose compliance affects organisations in several aspects related to the use of consent and personal data. With emerging research and innovation in data management solutions claiming assistance with various provisions of the GDPR, the task of comparing the degree and scope of such solutions is a challenge without a way to consolidate them. With GDPR as a linked data resource, it is possible to link together information and approaches addressing specific articles and thereby compare them. Organisations can take advantage of this by linking queries and results directly to the relevant text, thereby making it possible to record and measure their solutions for compliance towards specific obligations. GDPR text extensions (GDPRtEXT) uses the European Legislation Identifier (ELI) ontology published by the European Publications Office for exposing the GDPR as linked data. The dataset is published using DCAT and includes an online webpage with HTML id attributes for each article and its subpoints. A SKOS vocabulary is provided that links concepts with the relevant text in GDPR. To demonstrate how related legislations can be linked to highlight changes between them for reusing existing approaches, we provide a mapping from the Data Protection Directive (DPD), which was the previous data protection law, to GDPR showing the nature of changes between the two legislations. We also discuss in brief the existing corpora of research that can benefit from the adoption of this resource.
Keywords: GDPR; DPD; linked resource; regulatory technology; legal compliance; SKOS; DCAT; e-governance
Review 1 (by Christophe Guéret)
This paper describes an ontology expressing the GDPR. As a proof of usage/usability, a previous legislation, the DPD, is mapped to the GDPR. The paper is very well written and attention has been paid to the aspects of impact / reusability and availability. I have only a few remarks/questions related to the future of this work:
* What about versioning? If there are new elements updating the current GDPR, how will those be taken into account? And how would/will you handle the case of links being made to updated IRIs?
* For the sake of stability and trust, it could be interesting to hand over the ontology to the W3C or to the same group taking care of ELI. Has that been considered so far?

Update: I would like to thank the authors for addressing my concerns and those of other reviewers. It is good to know permanent links will be added and that a community effort will be aimed at for keeping the resource alive. I would not agree that publishing different revisions of the GDPR as different documents would be enough, despite the fact that the GDPR is indeed final as is. I would recommend that the authors look into having some kind of stable identifier scheme linked to the different versions of the text.
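One way to picture the stable identifier scheme recommended above is a small resolver that maps a version-independent identifier to dated versions of the same fragment. The following is only a sketch: the identifier strings, the `VERSIONS` table, and the future-dated entry are hypothetical placeholders, not IRIs published by GDPRtEXT.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical registry: stable identifier -> versions sorted by the date
# from which each version applies. None of these strings are real GDPRtEXT IRIs.
VERSIONS = {
    "gdpr/article17": [
        (date(2016, 4, 27), "gdpr/2016-04-27/article17"),
        (date(2030, 1, 1), "gdpr/2030-01-01/article17"),  # invented future revision
    ],
}


def resolve(stable_id, on=None):
    """Return the versioned identifier applicable on a given date.

    With no date, the most recent version is returned, so links made to the
    stable identifier keep working when the text is revised.
    """
    versions = VERSIONS[stable_id]
    if on is None:
        return versions[-1][1]
    start_dates = [start for start, _ in versions]
    index = bisect_right(start_dates, on)
    if index == 0:
        raise LookupError(f"{stable_id} has no version applicable on {on}")
    return versions[index - 1][1]


latest = resolve("gdpr/article17")
as_of_2018 = resolve("gdpr/article17", on=date(2018, 5, 25))
```

With such a scheme, a link to the stable identifier always resolves, while a link that must cite the exact wording can pin a dated version.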
Review 2 (by Christoph Lange)
UPDATE AFTER AUTHORS' RESPONSE: Many thanks for your detailed clarifications. The review below is unchanged, but you have largely answered my open questions, and plausibly promised (and sometimes outlined _how_) to address the issues I raised.

This paper presents a linked open dataset derived from the European General Data Protection Regulation (GDPR). The text of the GDPR was semi-automatically split into fragments identified with URIs and annotated using an extension of the ELI ontology; relevant terminology was modelled using SKOS. Links to a similarly annotated version of the preceding Data Protection Directive were established manually. The dataset is available as RDF and HTML.

The review criteria for resources are met to a large extent; details below.

The paper has some linguistic shortcomings, and a few statements are unclear. I annotated these in the PDF at https://www.dropbox.com/s/jw221wgk4g0hxfe/ESWC2018_paper_135.pdf?dl=0.

Further issues:
* Listing 1.1 claims to show a SPARQL query but actually shows a Turtle listing with prefix bindings in SPARQL syntax.
* It would be helpful to elaborate a bit on the relationship of XACML to ODRL.

Review criteria:

> Potential impact
>
> Does the resource break new ground?
> Does the resource plug an important gap?
> How does the resource advance the state of the art?

To some extent. This is not the first semantic representation of the GDPR but complements existing ones [2,3].

> Has the resource been compared to other existing resources (if any) of similar scope?

Yes, to [2,3].

> Is the resource of interest to the Semantic Web community?
> Is the resource of interest to society in general?

To society in general rather than specifically to the Semantic Web community.

> Will the resource have an impact, especially in supporting the adoption of Semantic Web technologies?

It can facilitate the automation of processes and workflows that have to take into account the GDPR.

> Is the resource relevant and sufficiently general, does it measure some significant aspect?

It is not general but addresses a highly relevant specific domain.

> Reusability
>
> Is there evidence of usage by a wider community beyond the resource creators or their project? Alternatively, what is the resource’s potential for being (re)used; for example, based on the activity volume on discussion forums, mailing list, issue tracker, support portal, etc?

Nothing so far. The paper gives no such evidence, and there is no community activity on the homepage or in the repository.

> Is the resource easy to (re)use? For example, does it have good quality documentation? Are there tutorials available? etc.

The overall resource, and the ontology in particular, have a reasonable amount of documentation.

> Is the resource general enough to be applied in a wider set of scenarios, not just for the originally designed use?

In principle the annotation approach could be applied to other laws too, but this is not explained in detail.

> Is there potential for extensibility to meet future requirements?

Yes, the linking from DPD to GDPR could also be applied to future versions of GDPR.

> Does the resource clearly explain how others use the data and software?

Not quite, but it explains how others _could_ use it.

> Does the resource description clearly state what the resource can and cannot do, and the rationale for the exclusion of some functionality?

Some limitations are acknowledged and outlined as future work.

> Design & Technical quality:
> Does the design of the resource follow resource specific best practices?

Largely yes. The combination of skos:Concept with rdfs:subClassOf is a bit unusual, but justified in the paper. Still, I think it is not adequate at least in some of the cases, e.g., "ProvideCopyOfData" sounds more like a skos:narrower of "RightOfDataPortability" than an rdfs:subClassOf.

> Did the authors perform an appropriate re-use or extension of suitable high-quality resources? For example, in the case of ontologies, authors might extend upper ontologies and/or reuse ontology design patterns.

The ELI ontology was reused.

> Is the resource suitable to solve the task at hand?

There is not really a clearly defined _task_ here, but, yes, potentially it is.

> Does the resource provide an appropriate description (both human and machine readable), thus encouraging the adoption of FAIR principles? Is there a schema diagram? For datasets, is the description available in terms of VoID/DCAT/DublinCore?

"Yes" to all.

> Availability
>
> Is the resource (and related results) published at a persistent URI (PURL, DOI, w3id)?

You claim so, but actually I don't see it. I looked into https://openscience.adaptcentre.ie/resources/GDPRtEXT/gdpr/page/GDPR (the Pubby view) and saw URIs like <https://openscience.adaptcentre.ie/resources/GDPRtEXT/gdpr/citation15>. Actually, the following links on the homepage are broken:
http://purl.org/adaptcentre/openscience/resources/GDPRtEXT/gdpr.ttl
http://purl.org/adaptcentre/openscience/resources/GDPRtEXT/gdpr.rdf

> Does the resource provide a licence specification? (See creativecommons.org, opensource.org for more information)

A license is indicated in the paper, but I didn't see it embedded in the RDF implementation.

> How is the resource publicly available? For example as API, Linked Open Data, Download, Open Code Repository.

LOD, SPARQL endpoint, annotated HTML, repository.

> Is the resource publicly findable? Is it registered in (community) registries (e.g. Linked Open Vocabularies, BioPortal, or DataHub)? Is it registered in generic repositories such as FigShare, Zenodo or GitHub?

Not indicated.

> Is there a sustainability plan specified for the resource? Is there a plan for the maintenance of the resource?

No.

> Does it use open standards, when applicable, or have good reason not to?

Yes, it does; several W3C standards.
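The remark above about rdfs:subClassOf versus a SKOS hierarchy relation has a practical consequence that a few lines of code can illustrate. Under RDFS rule rdfs9, every instance of a subclass is also an instance of its superclass, whereas skos:broader/skos:narrower only relate concepts and license no such type inference. The sketch below uses plain tuples rather than an RDF library; the `gdprtext:` prefix and the instance `ex:request42` are assumptions made for illustration, while the two concept names are those quoted in the review.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
SKOS_BROADER = "skos:broader"  # inverse of skos:narrower


def rdfs9_closure(triples):
    """Apply RDFS rule rdfs9 until no new triple can be derived.

    rdfs9: (x rdf:type C) and (C rdfs:subClassOf D) entail (x rdf:type D).
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for instance, predicate, cls in list(inferred):
            if predicate != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                derived = (instance, RDF_TYPE, sup)
                if sub == cls and derived not in inferred:
                    inferred.add(derived)
                    changed = True
    return inferred


# ex:request42 is a hypothetical instance, not part of the dataset.
instance_triple = ("ex:request42", RDF_TYPE, "gdprtext:ProvideCopyOfData")

modelled_as_subclass = {
    ("gdprtext:ProvideCopyOfData", SUBCLASS_OF, "gdprtext:RightOfDataPortability"),
    instance_triple,
}
modelled_as_narrower = {
    ("gdprtext:ProvideCopyOfData", SKOS_BROADER, "gdprtext:RightOfDataPortability"),
    instance_triple,
}

target = ("ex:request42", RDF_TYPE, "gdprtext:RightOfDataPortability")
entailed_with_subclass = target in rdfs9_closure(modelled_as_subclass)
entailed_with_narrower = target in rdfs9_closure(modelled_as_narrower)
```

With the subclass modelling, anything typed as ProvideCopyOfData is also inferred to be a RightOfDataPortability, which is the reading the review questions; with the SKOS relation the two concepts are linked without that commitment.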
Review 3 (by Adila A. Krisnadhi)
Update after rebuttal: I thank the authors for the rebuttal. Most of my concerns have been addressed or promised to be addressed. Nevertheless, nothing significant changes my opinion, so I keep my score the same.

============

This submission presents a linked data resource representing the European General Data Protection Regulation (GDPR). The linked data resource makes use of the European Legislation Identifier (ELI) ontology for exposing the data. Overall, I consider the submission to be (yet another) nice addition to the linked data cloud. The potential impact, however, is rather unclear to me (also because the authors did not expend more effort in discussing this aspect). There are also some problems in findability and sustainability. So, I rate this submission as weak accept.

Potential impact
===================
Although the resource neither breaks new ground nor clearly advances the state of the art in Semantic Web research, practical impact on users of Semantic Web technologies is possible. Unfortunately, the only other resources clearly mentioned as related to this resource are the ELI ontology (which provides the vocabulary) and DPD, which was a predecessor of this resource. How is this resource related to other datasets in the EU Open Data portal? Are there no other linked data resources for legal purposes in Europe related to this resource? This is not explained clearly in the submission and the portal. So, how high the impact is potentially going to be is unclear, beyond simply being a nice addition to existing linked data and ontology resources for legal purposes.

Reusability
==================
Evidence of usage by a wider community is not explicitly discussed. The motivation section refers to a series of papers previously published by the authors, which outlined the need for this resource. Moreover, there is a dedicated GitHub repository that allows feedback from other users. Hence, there is some potential for reuse.

Usage documentation is almost nonexistent beyond simply downloading the resource. There is a SPARQL endpoint, but no mention of APIs. On the other hand, since it is obvious that the backend is Virtuoso, it is possible that one can simply use a generic/default approach to access the resource programmatically according to the default Virtuoso configuration.

Applications to a wider set of scenarios are not obvious either. I think Section 5 should have been used to address this problem more explicitly. Give a concrete example of how exactly automated systems for modeling compliance could be enabled by this resource, i.e., not just mentioning it. Also, there is no discussion of the limitations, i.e., what the resource can and cannot do.

Design & Technical quality
============================
Design-wise, the resource does follow some linked data and ontology design best practices. Human and machine-readable descriptions are provided. A schema diagram is also presented in the submission. However, the submission discusses no concrete example that illustrates, at least briefly and technically, how recording or measuring compliance with some data protection regulation would work. This diminishes the value of the submission a bit.

Availability
================
The resource is published at some institution's URI, and no persistent URI such as PURL, DOI, or w3id is being used. The submission mentions that the resource is available under the CC-BY-4.0 license. Unfortunately, I could not find this license statement in the online portal. The resource can be accessed as linked open data, bulk download, SPARQL endpoint, and git repository, so there is not much of a problem in this aspect. There is no mention that the resource is registered in a public registry, so findability could be rather problematic. I thought the resource should at least be discoverable through the EU Open Data portal? Also, no sustainability plan is discussed. Who is responsible for maintaining the data? What if there is a need to update the resource in the future?
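On the point about programmatic access: any endpoint that implements the SPARQL 1.1 Protocol can be queried with a plain HTTP GET carrying a `query` parameter, and results can be requested in the SPARQL JSON results format. The sketch below shows that generic approach with the Python standard library only. The endpoint URL is a placeholder, not the resource's actual endpoint, and the use of `skos:prefLabel` is an assumption about how the concepts are labelled. The request is built but not sent, and the sample response is invented to show the shape of the result format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; substitute the real SPARQL endpoint of the resource.
ENDPOINT = "https://example.org/sparql"

# Lists SKOS concepts with a label (skos:prefLabel is assumed to be present).
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
} LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json):
    """Flatten a SPARQL JSON results document into a list of dicts."""
    document = json.loads(results_json)
    variables = document["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in document["results"]["bindings"]
    ]


request = build_request(ENDPOINT, QUERY)
# To run the query for real: urllib.request.urlopen(request).read()

# Invented response, for illustrating the result format only.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["concept", "label"]},
    "results": {"bindings": [{
        "concept": {"type": "uri", "value": "https://example.org/Consent"},
        "label": {"type": "literal", "value": "Consent"},
    }]},
})
parsed = rows(SAMPLE_RESPONSE)
```

Because this relies only on the protocol and not on anything Virtuoso-specific, the same code would work against any conforming endpoint.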
Review 4 (by Serena Villata)
*** I thank the authors for their rebuttal. I can now access the resource. The use case example is simple but useful. My viewpoint on drawback (1) remains unchanged: as you said, compliance checking, classification and summarization cannot be addressed using GDPRtEXT alone. For this reason, I find the impact of the resource limited. ***

The paper describes a resource, called GDPRtEXT, which exposes the GDPR as Linked Data. The authors used the ELI ontology to model the concepts expressed in the GDPR. A mapping from the previous DPD regulation is also provided. The final goal of such a resource is compliance checking with respect to privacy-related obligations. The paper is well written and contains only a few typos.

The main positive point of this contribution is that the resource is well documented, best practices are used to build the resource, and it is available online. Despite this proper documentation, the resource in RDF, Turtle, JSON-LD, N3, and N-Triples returns "Not Found The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again." when clicking from the main resource webpage (http://openscience.adaptcentre.ie/projects/GDPRtEXT/). As I cannot access the resource, it is difficult to assess the benefits it may bring to the community.

In addition, the paper presents some drawbacks, mainly concerning relevance and significance:

(1) The main contribution of this resource is a navigable version of the GDPR, leading to a quite superficial approach: the result is a resource which captures only the form of the GDPR (articles, recitals, ...), which is interesting but is already explicit in the GDPR itself. The resource does not capture the semantics and the content of the GDPR. As the authors say, "GDPRtEXT does not contain any inference but provides a definition of useful terms using SKOS". What are the insightful tasks that can exploit this resource? Compliance checking will not be possible based on such a resource alone, since the semantics is not extracted. Classification and summarization can be done using the GDPR itself. In conclusion, there is no evidence that such a resource can be of concrete help in addressing the tasks the authors mention in Sec. 5.

(2) In the motivation, the authors claim that SPARQL can be used to query the RDF version of the GDPR. It would have been interesting to see an example of a SPARQL query showing that certain relevant information cannot be identified based only on the GDPR text.

Typos:
- page 5: to to
- page 5: Protg -> Protege
- page 8: Additionally, The

--- Potential impact
- Does the resource break new ground? No, it is not clear what the impact of such a resource is.
- Does the resource plug an important gap? It does not seem so.
- How does the resource advance the state of the art? It provides a linked data version of the new GDPR. No inference is contained.
- Has the resource been compared to other existing resources (if any) of similar scope? Yes
- Is the resource of interest to the Semantic Web community? It does not seem to bring much to the community; the RDF version is not accessible, so it is difficult to assess it on the real resource.
- Is the resource of interest to society in general? It could be of limited interest.
- Will the resource have an impact, especially in supporting the adoption of Semantic Web technologies? I don't think so.
- Is the resource relevant and sufficiently general, does it measure some significant aspect? No, as highlighted in my comments above.

--- Reusability
- Is there evidence of usage by a wider community beyond the resource creators or their project? Alternatively, what is the resource’s potential for being (re)used; for example, based on the activity volume on discussion forums, mailing list, issue tracker, support portal, etc? No, there is no evidence. All the mentioned tasks can be addressed directly on the GDPR text.
- Is the resource easy to (re)use? For example, does it have good quality documentation? Are there tutorials available? etc. Yes, good quality documentation.
- Is the resource general enough to be applied in a wider set of scenarios, not just for the originally designed use? No.
- Is there potential for extensibility to meet future requirements? Yes
- Does the resource clearly explain how others use the data and software? Yes, the resource is well described from this point of view.
- Does the resource description clearly state what the resource can and cannot do, and the rationale for the exclusion of some functionality? It claims to support tasks that cannot actually be eased thanks to the resource, or at least not using this resource alone, e.g., compliance checking.

--- Design & Technical quality
- Does the design of the resource follow resource specific best practices? Yes.
- Did the authors perform an appropriate re-use or extension of suitable high-quality resources? For example, in the case of ontologies, authors might extend upper ontologies and/or reuse ontology design patterns. Yes.
- Is the resource suitable to solve the task at hand? Unclear
- Does the resource provide an appropriate description (both human and machine readable), thus encouraging the adoption of FAIR principles? Is there a schema diagram? For datasets, is the description available in terms of VoID/DCAT/DublinCore? Yes, DCAT
- If the resource proposes performance metrics, are such metrics sufficiently broad and relevant? Not applicable
- If the resource is a comparative analysis or replication study, was the coverage of systems reasonable, or were any obvious choices missing? Not applicable

--- Availability
- Is the resource (and related results) published at a persistent URI (PURL, DOI, w3id)? Yes, PURL. However, the resource is not accessible.
- Does the resource provide a license specification? (See creativecommons.org, opensource.org for more information) Yes.
- How is the resource publicly available? For example as API, Linked Open Data, Download, Open Code Repository. Download
- Is the resource publicly findable? Is it registered in (community) registries (e.g. Linked Open Vocabularies, BioPortal, or DataHub)? Is it registered in generic repositories such as FigShare, Zenodo or GitHub? Yes
- Is there a sustainability plan specified for the resource? Is there a plan for the maintenance of the resource? Unclear
- Does it use open standards, when applicable, or have good reason not to? Yes