Towards a Binary Object Notation for RDF
Author(s): Victor Charpenay, Sebastian Käbisch, Harald Kosch
Full text: submitted version
Abstract: The recent JSON-LD standard, which specifies an object notation for RDF,
has been adopted by a number of data providers on the Web. In this paper, we
present a novel usage of JSON-LD, as a compact format to exchange and query RDF
data in constrained environments, in the context of the Web of Things.
A typical exchange between Web of Things agents involves small pieces of
semantically described data (RDF data sets of fewer than a hundred triples). In
this context, we show how JSON-LD, serialized in binary JSON formats like
EXI4JSON and CBOR, outperforms the state-of-the-art. Our experiments were
performed on data sets provided by the literature, as well as a production data
set exported from Siemens Desigo CC.
We also provide a formalism for JSON-LD and show how it offers a
lightweight alternative to SPARQL via JSON-LD framing (with polynomial
complexity), which makes it a good candidate as a query mechanism in
Keywords: JSON-LD; EXI; CBOR; HDT; Web of Things; Internet of Things; SPARQL; RDF
Review 1 (by Themis Palpanas)
(RELEVANCE TO ESWC) The paper proposes a new object notation for RDF and demonstrates its advantages.
(NOVELTY OF THE PROPOSED SOLUTION) It is hard to judge the novelty of the proposed solution, because the authors do not clearly discuss what the new ideas and contributions are when compared to existing solutions.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) I am not an expert in this area, and could not check the correctness of the proposed solution.
(EVALUATION OF THE STATE-OF-THE-ART) The paper provides an extensive and informative discussion of the related work.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors explain the properties of the proposed solution through examples and formal statements.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The paper describes the experimental results with four different datasets. It would be useful if more datasets (with even more diverse characteristics) could be added.
(OVERALL SCORE) This paper proposes a new object notation for RDF, describes the details of the proposed formalism, and demonstrates the benefits of the approach using four real datasets.
Review 2 (by Valerio Basile)
(RELEVANCE TO ESWC) This paper describes a method of serializing RDF in a compressed format aimed at embedded devices. Therefore, I think that it is relevant to the IoT and SW communities alike.
(NOVELTY OF THE PROPOSED SOLUTION) While several formats for compact serialization of RDF have been proposed, this approach is based on accepted Web standards, and thus it should be easier to adopt at a large scale.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The theoretical foundation is very well laid out. A series of transformations of JSON-LD are made that lead to a compact format that does not need to be decompressed to be further processed (e.g. for querying). I find the introduction of JSON-LD frames quite elegant as a way of mirroring SPARQL queries on the compacted data.
(EVALUATION OF THE STATE-OF-THE-ART) Several alternatives are put to the test together with the proposed method on datasets with different characteristics. The discussion of the results is fair and generally complete.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The experimental results show that while this may not be the best format for every possible use case, it is definitely a promising approach, especially being based on a set of standards that are already widely adopted.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The datasets used for the experimental evaluation are either available or will be, according to the footnotes in the article. No software implementing the proposed method seems to have been released at the moment, although the specifications are quite detailed.
(OVERALL SCORE) This paper presents solid work towards the implementation of an open standard to store and query RDF data where space is a constraint. The presentation is excellent, the motivation is clear, and the experimental evidence is convincing.
One thing that I would like to see is an experiment on the performance of real-world queries on data stored with this procedure, in comparison with other methods. Minor remark: there is a citation missing on page 3.
Review 3 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper considers the use of JSON-LD as a serialization format for devices under strong resource restrictions. Basically, the message is that JSON-LD with the so-called compaction can be used as a succinct representation of RDF data on, e.g., mobile devices, while the so-called framing is a more efficient method to query RDF data compared to standard SPARQL.
(NOVELTY OF THE PROPOSED SOLUTION) In my opinion, the theoretical results on formalization and compaction are rather straightforward (e.g., the compaction technique is based on the Header-Dictionary-Triples format, or HDT, which is quite popular nowadays). There is a more interesting result on the computational complexity of framing (results/discussions around Theorem 1), but I have serious concerns about it (see later).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) My main concern is the authors' message that "JSON-LD framing extends SPARQL basic graph patterns without increasing the theoretical complexity of query processing." I think there is some misunderstanding here. It is known that evaluating conjunctive queries (or BGPs) is NP-complete if the query is not fixed. If the size of the query is considered to be bounded by a constant, then even *full* SPARQL queries (full first-order queries) are in polynomial time (this is a well-known fact). From this it seems that the authors are effectively introducing a fragment of SPARQL that is tractable even if the size of the query is not assumed to be fixed. However, it then follows that, unless P=NP, arbitrary BGPs cannot be subsumed by JSON-LD framing.
(EVALUATION OF THE STATE-OF-THE-ART) There is a lot of literature on tractable fragments of standard query languages, e.g. acyclic conjunctive queries and queries with bounded treewidth. Some of these works should be mentioned in the paper, given its claims.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors have implemented their approach and have presented promising results.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The authors have implemented their approach and have presented promising results.
(OVERALL SCORE) See above. It would be great if the authors responded to my comment in "Correctness and Completeness".
Review 4 (by Javier D. Fernández)
(RELEVANCE TO ESWC) The authors tackle the problem of compact RDF serialization for embedded devices. To do so, they provide theoretical foundations of (a subset of) JSON-LD 1.1, currently under development. Although the work is in progress, given the increasing attention to RDF in the context of the Web of Things, the authors' insights can be relevant to further develop compact RDF serializations and efficient querying in such scenarios.
(NOVELTY OF THE PROPOSED SOLUTION) The authors formalize three components of the ongoing JSON-LD 1.1 specification: the general syntax (without nesting), some compaction features (contexts) and framing (a way to search for objects matching certain criteria). For the latter, they provide the main semantics and complexity. To the best of my knowledge, such formalizations are novel in the community.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The aforementioned formalizations are sound and cover the fragment of the specification selected by the authors. Nonetheless, I have some doubts that can be resolved during the rebuttal phase:
- In Definition 3, if I'm not wrong, the context c is a map that is applied over U, a set of UTF-8 strings, resulting in an IRI, I. However, the authors use c(f) to refer to the application of c over all IRIs. I would then assume that c is a bidirectional mapping, or perhaps you mean to state that c is I-->U. Update: Clarified in the rebuttal (typo), thanks.
- In Definitions 1 and 2, I understand that the object 'id' has all the provided types t1,...,tl. In other words, the 'AND' of the types. However, when it comes to the frames with a similar syntax (Definition 5), the formalization states that the query is an OR, i.e. UNION, of all the provided types t1,...,tl (Definition 6, Equation 4). Is this correct? If so, I understand that a user cannot specify an 'AND' for the types, is that correct? Update: Clarified in the rebuttal (JSON-LD formalization), thanks.
- The definition of wildcard is a bit unclear. In point 3, variables and wildcards can be mixed, which is then reflected in Equation 6, where wildcards are somehow converted to SPARQL variables. What is the concrete difference between a wildcard and a variable in that case? If none, then one can provide a simpler formalization stating that the wildcards in the frame query are replaced by variables whose names do not overlap with the existing variables in the query. Update: The authors provided a justification in the rebuttal, although I still think the formalization can be simplified.
(EVALUATION OF THE STATE-OF-THE-ART) The evaluation of the state-of-the-art is complete enough, although some other works on RDF compression could be cited:
- Joshi A, Hitzler P, Dong G (2013) Logical Linked Data Compression. In: 10th Extended Semantic Web Conference (ESWC), pp 170–184
- Pan J, Gómez-Pérez J, Ren Y, Wu H, Haofen W, Zhu M (2015) Graph Pattern Based RDF Data Compression. In: 4th Joint International Conference on Semantic Technology (JIST), pp 239–256
- Fernández N, Arias J, Sánchez L, Fuentes-Lorenzo D, Corcho O (2014b) RDSZ: an Approach for Lossless RDF Stream Compression. In: 11th European Conference on the Semantic Web (ESWC), pp 52–67
In turn, Section 2.2 of the state of the art could be significantly shortened.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors are clear about the scope and limitations of the approach.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) I have some concerns regarding the evaluation. First of all, the authors provide neither the data and queries used in the experiment nor the code for reproducibility. I would encourage the authors to clarify in the rebuttal whether they are providing those. Further statistics on the datasets (in particular the original size, in order to clarify the reported compression ratio, but also the number of properties) would be helpful too. Then, the introduction of the evaluation is a bit unclear. On the one hand, the authors state that they implemented framing, and no transformation to RDF or SPARQL is required. On the other hand, I understand that framing is only used as a retrieval mechanism, while the evaluation is only performed to test compaction features. Several comments are in order:
- Are the authors using any framing features in the evaluation?
- If the authors implemented framing, an evaluation of its performance would make the paper much more complete, as a great part of the formalism is devoted to framing.
In turn, I was wondering if it makes sense to evaluate the performance of the serialization against Entity Notation (EN), given that the authors state that it is the closest approach. Update: Thanks for the pointers in the rebuttal. The GitHub repository provides the data and some instructions to reproduce the experiments. Nonetheless, the missing frame evaluation remains a handicap.
(OVERALL SCORE) The paper focuses on formalizing JSON-LD 1.0 as a compact serialization for RDF graphs in the Web of Things, and JSON-LD Framing 1.1 as a mechanism to query JSON-LD by example. Thus, the authors provide theoretical foundations for the JSON-LD syntax (and how to transform it to RDF), its 'context' compaction feature, and some initial semantics and complexity of JSON-LD Framing (taking into account that the latter is a specification in progress). For the latter, the authors provide a transformation of a frame to SPARQL, showing that the solution can be computed in polynomial time. Finally, an evaluation of the JSON-LD compaction features shows that it is competitive w.r.t. well-established compression methods such as HDT, being particularly effective on datasets with few triples and on sets of documents sharing a global context.
* Strong points:
- Relevant and timely topic
- The formalization is simple and may prove ephemeral given that JSON-LD Framing is still a work in progress, but it can serve to guide future developments.
- The paper is easy to follow in general
* Weak points (further elaborated in other sections):
- Lack of code/corpus for reproducibility (partially resolved in the rebuttal)
- Limited evaluation (only serialization space)
- Some definitions can be unclear
* Questions for the authors (further elaborated in other sections):
- In Definition 3, is c(f) a bidirectional mapping?
- In Definitions 5 and 6, do frames consider a UNION of the types? Is there a way to specify AND?
- What is the concrete difference between a wildcard and a variable?
- Are the authors using any framing features in the evaluation? Is there a reason for not testing framing?
- Is Entity Notation a real competitor as a serialization to be included in the evaluation?
- Are the authors releasing the data/code of the experiments?
- Are the time and computing resources needed to serialize/deserialize the datasets an important factor in the envisioned scenario? If so, an extension of the evaluation could be needed.
As another set of comments, the text of the paper can be improved as it contains many typos. The abstract of the paper has much room for improvement. Update: I thank the authors for their comments in the rebuttal. The paper is interesting and has potential, although it needs some minor improvements on the aforementioned issues.
Review 5 (by Floriana Esposito)
(RELEVANCE TO ESWC) The paper fits the conference topics, in particular: Linked Data and Mobile Web, Sensors and Semantic Streams.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed approach exploits current techniques to obtain a new method for compacting RDF data in JSON-LD format.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The approach is well explained and all the methodological details are provided.
(EVALUATION OF THE STATE-OF-THE-ART) The related work section is rich. However, the authors should better underline the differences and the strong points of the proposed approach with respect to related work.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The proposed approach is well described and motivated. Both semantics and complexity are described in a formal and rigorous way.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The evaluation is not reproducible since the dataset is not available. Moreover, an evaluation of querying data is not provided.
(OVERALL SCORE) The paper describes a method for compacting RDF data in JSON-LD format. This approach is interesting in the context of the Web of Things (WoT), where data must be stored in a few kilobytes. The results show that, for around a hundred triples, the proposed approach is able to outperform the state of the art. The authors provide a detailed theoretical description of the approach in order to prove that the complexity of the proposed method is equivalent to that of SPARQL. However, the evaluation section does not include any experiments on querying data in the WoT. Moreover, the dataset used for the evaluation is not available and it is not possible to reproduce the experiment. I think that the paper is interesting from a technical/engineering point of view but it provides few scientific insights.
To summarize:
STRONG POINTS:
- interesting approach in the context of WoT
- the approach is able to outperform the state of the art (in some particular conditions)
- the method is well described
WEAK POINTS:
- the evaluation is not reproducible
- the evaluation does not take into account querying data
- few scientific insights
Minor issues:
- p. 3: SPITFIRE [?], reference is missing
- p. 10: please report the set of ontologies in a table
- graphs in Fig. 1 are hard to read
- p. 13: you state: "datasets of less than hundred triples, the typical amount of data carried by constrained WoT agents". Please provide some objective references for this.
Metareview by Intizar Ali
This paper proposes a compression technique for JSON-LD data serialized over the Web; the proposed technique focuses on resource-constrained devices. The authors are addressing an important challenge faced by the community, particularly for solutions combining the Semantic Web and IoT/WoT domains. While the feedback on the paper is positive overall, we strongly encourage the authors to take the reviewers' comments, concerns and feedback (particularly from Reviewer 3 and Reviewer 4) into account before submitting their camera-ready version. The major concerns on the formalism, the complexity proof, and the correctness of the definitions must be resolved. We also encourage the authors to improve the reproducibility of their evaluation results.