Paper 35 (Research track)

LDP-DL: A language to define the design of Linked Data Platforms

Author(s): Mohammad Noorani Bakerally, Antoine Zimmermann, Olivier Boissier

Full text: preprint

Abstract: Linked Data Platform 1.0 (LDP) is the W3C Recommendation for exposing linked data in a RESTful manner. While there are several implementations of the LDP standard, deploying an LDP is still complex and tightly coupled to the chosen implementation. As a consequence, the same design (in terms of how the data is organised) is difficult to reuse in different LDP deployments. We propose a language for specifying how existing data should be used to generate LDP resources in a way that is independent of and compatible with any LDP implementation. We formally describe the syntax and semantics of the language and its implementation. We show that our approach allows the reuse of the same design for multiple deployments, or the reuse of the same data with different designs, is open to heterogeneous data sources, can cope with hosting constraints, and significantly automates the deployment of LDPs.

Keywords: Linked Data Platform; Design specification; Automatic deployment

Decision: probably accept

Review 1 (by anonymous reviewer)

(RELEVANCE TO ESWC) The paper addresses the important topic of simplifying the engineering of linked data environments.
(NOVELTY OF THE PROPOSED SOLUTION) Efficient engineering of LDP is definitely an essential cornerstone for the acceptance and application of LD approaches. 
However, this work focuses more on providing a semantics (LDP-DL) for LD concepts and their interrelations on a slightly higher level of abstraction instead of fully exploiting the possibilities that model-driven approaches with domain-specific languages would permit.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Regarding the scope addressed, the presented designs, considerations and results are comprehensive and convincing.
(EVALUATION OF THE STATE-OF-THE-ART) It remains unclear why the authors have only considered LDP implementations from the conformance report while bringing up the topic of model-driven engineering themselves, yet do not pursue this any further than borrowing from it the concept of domain-specific languages: Was an investigation of other LDP engineering approaches conducted in that direction? Are there other DSL-based approaches that were not considered?
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) While providing comprehensive elaborations on the formal properties of LDP-DL, examples of its application for LDP engineering using design documents as the central element are missing. Much necessary information is only provided on the authors' webspaces; at most, the reader may obtain a sample design document from a website, where availability cannot be guaranteed. Providing a continuous, even small, example for the entire LDP-DL engineering workflow would be good; the example from Fig. 1 does not cover essential aspects. Further, the conducted performance test of ShapeLDP is not clearly described, hence the results are rather vague.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The formalizations are reproducible. Following the authors' ideas, more powerful engineering concepts can be built on top of LDP-DL.
(OVERALL SCORE) *Summary
This work basically proposes a domain-specific language LDP-DL for the engineering of Linked Data Platforms. To this end, a map-based syntax capturing essential LDP core concepts is introduced, along with model-theoretic semantics and an abstract description of documents expressed in LDP-DL. 
*SPs
- The authors thoroughly motivate the need for tools such as domain-specific languages in LDP engineering.
- Formal semantics for the proposed language is defined, permitting reusability.
- Implementations are available, supporting the feasibility and reproducibility of the concept.
*WPs
- The proposed language focuses only on the immediate abstraction from the LDP concepts. It does not exploit the potentials of domain-specific languages in terms of hiding of complexity or improving engineering efficiency.
- The target audience of users of LDP-DL remains unclear: should it be software developers, subject matter experts, knowledge engineers, application vendors, end users, ...? LDP-DL's level of abstraction requires more knowledge about the underlying technology than it possibly should.
- The process of creating well-defined LDP-DL design documents for connections with existing ontologies or models, and thus the integration of LDP-DL in a knowledge engineering workflow, remains unclear, as does the degree to which the proposed LDP-DL efficiently permits the claimed decoupling of designs from implementations.
*QAs
1) With the idea of model-driven engineering already introduced, why has the use of existing models for (initial?) LDP structuring not been further investigated?
2) Are there comparable approaches to defining DSLs for LDP configuration? If so, what was the reason for not considering them?
3) Were there experiments conducted regarding the suitability of the LDP-DL concepts with regard to engineering workflows? What were the results?
Details:
In general, the paper is rather hard to read and hard to follow - presentation, language, grammar and punctuation should be revised.
2.2:
- Since the requirements were motivated using "may have to" and even though they are easy to follow: are these requirements based on real requirements from the project or have they been derived from experiences?
3.1:
- "dex:parking has been generated for ex:parking": How is this generation performed?
- "ex:parking cannot be used directly as an LDPR": What is the reason for this?
- According to the UML diagram in Fig. 3, no entities/maps of LDP-DL are aware of their parents/ancestors, yet in the last paragraph it is stated that maps can refer to resources of parents/ancestors. Where is this navigability defined, which logic is responsible for permitting this? Does this constitute a requirement for the implementation of LDP-DL components? If yes, then this should be stated as such.
3.2:
- Introducing the abstract syntax directly by means of the UML diagram would increase understandability.
- What are the "undesirable consequences"?
3.3:
- The first paragraph describes possible future work, thus should be placed in the appropriate section.
- "This is why, in the ResourceMap :rm3 [...]": there seems to be something missing in this sentence.
- In general, the purpose of a domain-specific language is to hide underlying complexity from the user of the DSL. However, there is only weak evidence that LDP-DL operates on an appropriate level of abstraction, considering the fact that no target audience is mentioned.
4.2:
- Preferably the evaluation could be presented by requirement?
- The meaning of Fig. 6 is unclear: does this refer to Read, Write, ... operations?
After rebuttals
===============
I thank the authors for the answers. It remains unclear whether the text will undergo some revision.


Review 2 (by Ana Roxin)

(RELEVANCE TO ESWC) Given the topics listed for the LD track of ESWC, this paper fits the following two items:
- Extraction, linking and integration of LD
- Creation, storage and management of LD and LD vocabularies
(NOVELTY OF THE PROPOSED SOLUTION) To the best of my knowledge, no similar approaches exist in the literature today.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The problem addressed in this paper is more or less approximated. The intensive usage of acronyms hinders overall understanding. The example used to justify the need for such an approach (e.g. a linked data platform for a smart city) isn't very pertinent, as in such a context other considerations appear (such as access rights on the data) and those considerations are not taken into account by the authors (mainly because they are not necessarily pertinent to the approach at hand).
Specific remarks:
- Your first requirement isn't straightforward at all - one may implement a platform to make use of available city-related data without relying on Semantic Web technologies or Linked Data
- The context isn't very clear either - first you mention one single "governmental institution responsible" for an entire city, then different organizations are in charge of the city data; and before considering the exploitation and integration of such data, one should consider access rights and ownership of the data. You mention "hosting constraints" but there are also legal constraints that have to be taken into account. What about the future GDPR regulations?
- Each city has its own organizations, policies and regulations. Cities almost never use data from another city. There are even issues when integrating data from the fire service with data from a police service. Also, when considering investing in some "smart city platform", the people in charge rarely consider aspects such as data homogeneity or integration. On the contrary, each [human] actor wants to remain the owner of his data and know exactly who uses it and how. So the constraints you derived from this analysis may be correct from a computer scientist's point of view, but not from a politician's or final user's point of view. It is a bit disturbing to take examples from real life and derive such constraints from them, which are not at all straightforward. As a suggestion, perhaps instead of basing your motivation on such a real-life example, you should do a solid analysis of existing LDP implementations, highlight the related issues, then derive the constraints for your approach.
- What do the authors mean by "open data context"? Used on pages 1 and 2, never explained.
- "useful for domains such as smart cities" it can be useful for numerous application domains, smart cities is just one amongst those
- "addressed some heterogeneity levels (syntactic, semantic, structural)." - layers of interoperability are standard (e.g. physical, syntactic and semantic). Levels of data heterogeneity have not yet been defined, at least not formally and not to the knowledge of the reviewer.
- Is a W3C Recommendation a standard? "Linked Data Platform (LDP) 1.0 W3C Recommendation […] the LDP standard"
- Existing LDP implementations are already criticized in the Introduction, but they haven't yet been presented.
- What do you mean by "interpret LDP-DL documents"?
- Difficult to understand with all the acronyms used - the reader is lost among LDP, LDPRs, LDP-BC, LDPC…
- LDP-DL is never defined!
- To what domain names do the prefixes "ex" and "dex" in section 2.3 correspond?
- In section 2.4 the constraints listed in section 2.2 become "requirements"?
- Why do you choose to rely on MDE? And more generally, why the generation workflow depicted in Fig. 2?
- Fig. 2 is too small and should contain references to all the acronyms mentioned in your text - or your text should highlight elements from the figure, e.g. your text mentions "LDP dataset is deployed in an LDP." and your figure contains "LDP Dataset Deployer".
- Authors could include a figure for highlighting how LDP-DL applies to concepts in Fig. 1(b)
- "an LDP dataset (as described in Sec. 3.1)" - section 3.1 describes the syntax of the LDP-DL language
(EVALUATION OF THE STATE-OF-THE-ART) - The current work section could be improved - main LDP implementations are only listed based on a classification that could be further justified (why do you consider only two categories, and based on which criteria do you decide which LDP implementation belongs to which category?)
- The article is missing a section where the current approach is evaluated against the other existing LDP implementations - authors mention that to their knowledge no such LDP implementation exists, but this claim should be strongly justified. Could there be approaches that are not mentioned in the LDP implementation conformance report, but perform similar actions to the approach you present?
- Also if such comparison is still not possible, then the authors should clearly highlight what their approach brings to existing LDP implementations
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) - The "Evaluation" section should explicitly list each of the identified requirements and specify how they are addressed and/or verified or not. It is unclear which experiments have been performed, what was their exact outcome and how they supported (or not) the initial requirements. 
- Additional explanations are needed regarding why you "define a notion of interpretation and a notion of satisfaction in a model-theoretic way."
- Fig. 5 should contain an illustration of the different steps as they are performed during the workflow e.g. "1 import of data sources"
- You mention that your approach only uses one LDP Server (section 2.4 "In our work, we consider deployment of only one LDP server."), yet Fig. 5 specifies several of them - a comment would be welcome.
- Regarding the LDP Browser (http://opensensingcity.emse.fr/ldp-browser/), you should implement a check to prevent launching the loading of empty URLs for the LDP endpoint. Also, it is unclear whether the user has to be logged in or not.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) All prototypes used for demonstration are available on GitHub, and URLs are provided whenever the tools developed by the authors can be tested online.
(OVERALL SCORE) Summary of the paper: Based on the initial finding that reusing the same design in different LDP deployments is difficult with existing implementations, authors have proposed a formally specified language for automatically generating Linked Data Platforms (LDPs) based on existing data, regardless of the existing server implementation.
SPs:
- Interesting idea and approach
- All tools used are published on GitHub
- URLs are provided whenever the tools developed by the authors can be tested online
WPs:
- The justification of why such an approach is needed could've been better formulated. 
- The "Related Work" section could've been improved. Existing LDP implementations are only mentioned, but not really compared with the approach at hand. Authors should emphasize the advantages of their approach over existing ones.
- Some English mistakes must be corrected, along with a rephrasing of several sentences (see below)
QAs:
- Why do you "only support the deployment of LDPs where LDP resources are LDP-RSs and where LDP containers are LDP-BCs."?
- How does this approach relate to the "containers" as defined in the W3C LDP recommendation?
- Is a W3C Recommendation a standard? "Linked Data Platform (LDP) 1.0 W3C Recommendation […] the LDP standard"
- How does the approach at hand ease the implementation of the "smart city" scenario considered?


Review 3 (by anonymous reviewer)

(RELEVANCE TO ESWC) The paper addresses the topics of the ESWC Linked Data track well.
(NOVELTY OF THE PROPOSED SOLUTION) The approach is novel in the sense that it automatically creates LDPs from existing data sets.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The approach is in an early stage, as the proposed language allows creating a basic design pattern for an LDP and deploying it.
It is an interesting and promising first approach. The presented solution is in an early stage but already provides promising results.
(EVALUATION OF THE STATE-OF-THE-ART) The discussion of the state-of-the-art is rather short. A comparison is done with a limited set of existing implementations and the basic conclusion is that none of the implementations allow for automatic creation of an LDP from existing data. However, the discussion should also include a comparison of the access to the automatically created LDP in comparison with the existing approaches (is it faster? is it as convenient to access the linked data? ... etc.)
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The relationship between the requirements in section 2.2 and the rest of the paper, particularly the evaluation in section 4.2 and the final discussion, remains unclear. It therefore remains open which requirements have been fulfilled and which ones are left to future research. It is noteworthy that on GitHub some assignment of experiments to requirements is done. This should be explained in the paper as well.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experimental data is provided on GitHub. However, the description of the experiments in the paper is rather shallow, thus limiting the reproducibility.
(OVERALL SCORE) I thank the authors for the rebuttal answers!
The key contribution is the definition of a language to define the design of LDPs, thus providing a tool for automating the creation of LDPs from existing data sets. The strong point is the potential the approach has for automation; however, the approach is in an early stage. The paper does show first results on the properties and the benefits of the approach. The discussion of the state-of-the-art is limited and should provide a broader scope.
The paper is hard to read, e.g. it contains many acronyms and often repeats certain acronyms. Despite the fact that the acronyms individually are correctly used, the sheer mass of usage makes it difficult to establish a reading flow.


Metareview by Hala Skaf

The contribution is potentially useful and is well formalized, according to the reviewers. However, the motivation and the comparison to the SoA need further improvement. The reviewers are satisfied with the authors' reply letter and have increased their positive judgement overall. Notice, however, that some essential rewriting and changes have to be introduced in the paper prior to its publication (see detailed reviews), such as some simplifications asked for by reviewers (e.g., in the use of acronyms) and added emphasis on why this approach cannot be easily compared with other existing ones.


One thought to “Paper 35 (Research track)”

  1. Our Rebuttal
    ============

    To begin, we thank the reviewers for their comments.

    All reviewers say that the state of the art (Section 5) is limited. This is because the current scientific literature about LDP is itself limited. Apart from a handful of references in peer-reviewed publications, existing LDP implementations are mostly referenced by the LDP conformance report. To our knowledge, there are only 2 implementations [1,2] not referenced by it. We did not mention them (but admittedly should have) since they can both be classified as LDP resource management systems and the claims we made about these systems apply to them as well.

    Regarding the heavy use of acronyms, it’s unfortunately in line with the official naming of the concepts in the standard. We admit that we should make an effort to reduce them, but it may be at the cost of longer phrases.

    Regarding evaluation, we intend to clarify which experiments were carried out and which requirements they address, and to discuss the benefits of our approach for current LDP implementations. We have already done so on our GitHub page [3].

    Reviewer 1 and 2:

    -Semantics of the language:
    We provide the semantics in a model-theoretic way because we want to be able to prove the conformance of an implementation of our language, independently of the way it is implemented. It can also be used to provide provably valid test cases in the future.
    Moreover, some LDP servers add metadata or auxiliary triples to the content of LDP resources. If the result of processing a design document was strictly given by algebraic operations, the deployment on such servers would not be compatible with our semantics, thus limiting the compatibility of our approach to a specific implementation. With our semantics, we can show that there are infinitely many valid isomorphic LDP datasets that satisfy a design document. This allows using relative IRIs when generating LDP datasets and setting the base IRI at the time of deployment, thus separating the generation and the deployment of LDP datasets.
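
The separation of generation from deployment described above can be illustrated with a minimal sketch. It is not taken from the authors' tools: the `generated` mapping, the `deploy` function and the `example.org` / `localhost` base IRIs are hypothetical names chosen for illustration, and a real LDP dataset would hold RDF graphs rather than plain lists. The sketch only shows that a dataset written with relative IRIs can be resolved against any base IRI chosen at deployment time.

```python
from urllib.parse import urljoin

# Hypothetical generated dataset: each relative container IRI maps to the
# relative IRIs of its members. No base IRI is fixed at generation time.
generated = {
    "": ["parkings/"],
    "parkings/": ["parkings/p1", "parkings/p2"],
}


def deploy(dataset, base_iri):
    """Resolve every relative IRI against the base IRI chosen at deployment."""
    return {
        urljoin(base_iri, container): [urljoin(base_iri, m) for m in members]
        for container, members in dataset.items()
    }


# The same generated dataset deployed under two different base IRIs.
site_a = deploy(generated, "http://example.org/ldp/")
site_b = deploy(generated, "http://localhost:8080/")
```

Both results have the same structure and differ only in their absolute IRIs, which is the sense in which one generated dataset can serve several deployments.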

    -MDE:
    Our approach is based on MDE because it allows us to decouple the design of an LDP from its implementation. Fig. 2 instantiates this approach. As shown in the figure, we exploit MDE possibilities by performing model-to-model and model-to-system transformations. We perform the former by generating LDP datasets from design documents and the latter by generating an LDP from LDP datasets.

    Reviewer 2:
    -Criteria for classifying LDP implementations:
    The two categories are defined in Section 5, so we classify an implementation in one of them if it satisfies the corresponding definition. To make an analogy, LDP resource management systems are like a DBMS, where queries can be issued to create, read, update or delete resources. An LDP framework is similar to a Web framework (such as Django), where the interaction with the DB is encapsulated.

    -Comparison of our proposal with existing ones:
    Our proposal is complementary to and makes use of existing LDP implementations. A direct comparison is not feasible. We can only show the benefits brought by our approach in complement of an existing implementation. Also, we did not find any other approach similar to ours.

    -First requirement (Section 2.2):
    The first requirement should have been restricted to “In order to enhance interoperability and homogenize access”, and the mention of Semantic Web technologies should not be in italics, as it is part of the solution to the requirement.

    -Context (Section 2.2):
    In our project OpenSensingCity, we are considering the context of open data, where data is publicly accessible to everyone, free, and can be reused by third parties. Therefore access rights, intellectual property, etc. are not relevant. Hosting constraints may be a concern in this setting, though.

    Reviewer 3:
    -Early stage of development:
    Our language is in its first version, but it is complete with respect to its specification and implementation. We consider that the handling of dynamic data sources is already a sophisticated feature of our implementation. Moreover, the complexity of the design patterns is restricted less by the language than by its usage. As an example, a design document can use recursive container maps, arbitrarily complex SPARQL queries, etc. Surely, there are features not yet supported, but one has to start somewhere.

    1. https://github.com/OSLC/ldp-service-jena
    2. https://github.com/cavendish-ldp/cavendish
    3. https://github.com/noorbakerally/LDPDatasetExamples