VOCALS- Vocabulary & Catalog of Linked Streams
Author(s): Yehia Abo Sedira, Riccardo Tommasini, Daniele Dell’Aglio, Marco Balduini, Muhammad Intizar Ali, Danh Le Phuoc, Emanuele Della Valle, Jean-Paul Calbimonte
Full text: submitted version
Abstract: The nature of Web data is changing. The popularity of news feeds and social media, the rise of the Web of Things, and the adoption of sensor technologies are examples of streaming data that has reached Web scale. The different nature of streaming data calls for specific solutions to problems like data integration and analytics. We need streaming-specific web resources: new vocabularies to describe, find and select streaming data sources; and systems that can cooperate in real-time to solve streaming processing tasks. To foster interoperability between these streaming services on the web, we propose the Vocabulary & Catalog of Linked Streams (VOCALS). VOCALS is a three-module ontology to (i) publish streaming data following Linked Data principles, (ii) describe streaming services and (iii) track the provenance of the stream processing.
Keywords: RDF Streams; Semantic Web; Ontology; Web Stream Reasoning; Stream Reasoning; Linked Data Streams; Streaming Linked Data; Linked Data
Review 1 (by Amelie Gyrard)
Rebuttal phase
We trust the reviewer regarding the relevance of Vital, Fiesta, and CES. However, we could not find resources that support wider usage of the first two ontologies, or their complete reference documentation (only project deliverables). CES instead was not available.
Vital ontology: http://vital-iot.eu/ontology/ns/ontology.owl
FIESTA-IoT ontology: http://ontology.fiesta-iot.eu/ontologyDocs/ http://purl.org/iot/ontology/fiesta-iot#
Such resources can be easily found on this web page: http://sensormeasurement.appspot.com/?p=ontologies (use the browser's ctrl+f search functionality with the name of the project, authors, etc.)
----
Summary: The Vocabulary & Catalog of Linked Streams (VoCaLS) ontology has been designed with three modules that aim to (i) publish streaming data following Linked Data principles, (ii) describe streaming services and (iii) track the provenance of the stream processing. The ontology is also the result of the work of the W3C RSP (RDF Stream Processing) Community Group. Three main challenges are addressed: Publication & discovery, Access & processing, and Provenance & licensing. Four use cases are provided: (1) Semantic Stream discovery for smart cities, (2) federation operator for RSP engines, (3) RSP services, and (4) RSP Benchmarking.
Strengths:
• The ontology is accessible: https://w3id.org/rsp/vocals#
• Ontology graph visualization is provided with the WebVOWL tool; LODE and Widoco have been used to provide the ontology documentation.
• Paper well-written and structured. Table 2 is interesting.
Weaknesses:
• Numerous suggestions to improve the model and implement the ontologies, but also to reuse the existing ones
• Explore more IoT and smart city ontologies, since the IoT and smart city communities are addressing the real-time data analysis challenge.
• What would be the future plan regarding this ontology? Would it be standardized by W3C, like W3C SSNO and SOSA?
Ontology Modeling or Implementation Issues:
• The authors mention the complex event services (CES) ontology, which is not used within the implementation. Is the CES ontology available online?
• Why not reuse more ontologies? Check ontology catalogues for smart cities or IoT:
o Within the iot-lite ontology, there is a concept of endpoint; why not reuse it? Concepts: vocals:Stream, vocals:StreamEndpoint; vocals:hasEndpoint could be iot-lite:endpoint
o What about the Stream Annotation Ontology (SAO)?
o Vital ontology http://vital-iot.eu/ontology/ns/ontology.owl: UnsubscribeFromObservationStream and SubscribeToObservationStream concepts
o Within the fiesta-iot ontology, there is iot-lite:Service as well, which might be relevant
o Why not use the OWL-S ontology for services (mentioned in section 2 but not used within the ontology)?
o The SD vocabulary is explained within the background section but not used within the implementation or design. Idem for OWL-S etc. How is SD used within your own ontology?
o vprov:R2SOperator: vprov is not used within the ontology code
• The ontology has not been integrated yet with ontology catalogues such as LOV, LOV4IoT or any ontology catalogues relevant to this domain.
o The authors mentioned the submission to the LOV catalogue
o An error occurs with the namespace on LOV suggestion, but it works with the owl extension (https://ysedira.github.io/vocals/docs/core/ontology.xml)
o http://lov.okfn.org/dataset/lov/suggest?q=https%3A%2F%2Fw3id.org%2Frsp%2Fvocals%23
• It seems that the ontology has been developed within a month (December 2017). How can it really be used and employed?
Suggestions for improvements:
• If the section is called background, it means that the technologies are used. Otherwise it should be called related work.
• Why not keep the survey open to accept more answers and to be able to see the questions?
https://docs.google.com/forms/d/e/1FAIpQLSenWSc230cTVGveCM7v19NmYZOfKgh17zaF1VwpJRrx--f4wA/closedform
• Provide more examples within the ontology documentation
• Why not think about unifying the relevant existing ontologies?
• Ontologies designed for IoT might address those real-time issues. Have you explored those ontologies? For instance, check the LOV4IoT catalogue
• Page 7: vsd:StreamingService; the vsd ontology has not been introduced previously
• The ontology could be improved with more tools such as OOPS!, Triplechecker, Vapour
o To learn more about semantic web best practices
http://iot.ee.surrey.ac.uk/fiware/ontologies/iot-lite
IoT-Lite: A Lightweight Semantic Model for the Internet of Things [Bermudez-Edo et al.]
http://iot.ee.surrey.ac.uk/citypulse/ontologies/sao/sao
A Knowledge-based Approach for Real-Time IoT Data Stream Annotation and Processing.
http://perfectsemanticweb.appspot.com/ + see publications
http://lov4iot.appspot.com/
Error on LOV with the namespace:
An error occurred: org.apache.jena.riot.RiotException: Failed to determine the triples content type: (URI=file:///usr/local/lov/lovnode/https://w3id.org/rsp/vocals# : stream=application/xml : hint=null)
at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:755)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:652)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:211)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:104)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:95)
at org.apache.jena.riot.RDFDataMgr.loadModel(RDFDataMgr.java:331)
at org.lov.LovBotVocabAnalyser.analyse(LovBotVocabAnalyser.java:58)
at org.lov.LovBotVocabAnalyser.analyseVocabURI(LovBotVocabAnalyser.java:48)
at org.lov.cli.Suggest.exec(Suggest.java:71)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at org.lov.cli.Suggest.main(Suggest.java:27)
Review 2 (by María Poveda-Villalón)
------
Many thanks for the responses to our comments. After having read them, I've decided to keep my score.
------
This resource paper presents an ontology network consisting of three ontologies to publish streaming data as Linked Data, annotate streaming services and track provenance of the streaming processes. The main point for accepting the paper is that the work is definitely interesting and useful for the community, well supported, described, and published, and it meets most of the resource call requirements. Almost all questions proposed by the call (https://2018.eswc-conferences.org/resources-track/) are positively answered by the presented resource. The only point for which I have some question is: "Did the authors perform an appropriate re-use or extension of suitable high-quality resources? For example, in the case of ontologies, authors might extend upper ontologies and/or reuse ontology design patterns.": It is clear that existing ontologies are reused, like dcat, prov and dcterms. These ontologies can be considered cross-domain, and it is common practice to specialize them for any domain. However, even though some domain-specific ontologies are mentioned, for example in Section 8 (Vocabulary of Interlinked Streams (VoIS) or WeSP), they are not reused in the proposed network. It is advisable to provide some reasons for this, such as lack of coverage or instability of the resources. Actually, it seems to me that VoCaLS might be the evolution of VoIS, or at least somehow related. Will VoIS keep its own direction? Will they both converge?
It would be easier for readers to have that clearly stated rather than trying to put things together, such as: (a) checking the VoIS description and wondering why VoIS is not reused, given that no drawback is pointed out in the paragraph devoted to it in Section 8 and there are some overlapping concepts (https://github.com/streamreasoning/vois/blob/master/vois.png), (b) realizing that the VoIS authors are a subset of the VoCaLS authors, and (c) finally noting that the name of the survey answers provided in footnote 7 (https://goo.gl/zsEJXe) is "VoIS Survey (Responses) : Form Responses 1". In addition, the Web of Things seems to be a concept that is definitely related to, and to some extent overlapping with, the presented resource. Have the authors planned or considered reusing or aligning with the vocabulary being defined by the W3C Web of Things Working Group? Some concepts, such as the endpoints (now "Link" in the WoT vocabulary, to be renamed to "Form"), might be aligned. See https://www.w3.org/TR/wot-thing-description/#vocabularyDefinitionSection
**Additional comments**:
*Important ones*
.- How is R4 split into R4.a, R4.b and R4.c in Table 2? Which part of R4 on page 6 does each sub-requirement in the table refer to? I would suggest including it, for example in the same R4 description on page 6.
*Less important*
.- Footnote 7: I can't see the full text of the headers of the table, so it is quite cumbersome to follow the provided data. Also, I can't find a way to download the data as csv; I can only download the html, which has the same problem. Another visualization or publication format should be provided for this data.
.- Footnote 6: https://goo.gl/forms/tV2U9q7VnDzwR4tU2 As the survey is closed, the URL does not show the questions, so this footnote is not very informative.
*Minor ones*
.- Figure 1: dcat:Catlog -> Catalog. Also at https://ysedira.github.io/vocals/docs/core/index-en.html#desc
.- Figure 2: What is the corresponding namespace for the prefix "frmt"? Also in other parts for vprov and vsd.
It would be nice to have a list/table in the paper with all the prefixes used throughout the paper. Only at the end of the paper did I find the key: https://github.com/ysedira/vocals
.- Page 9: vprov:WidowOperator -> vprov:WindowOperator
.- Page 9, Fig 4: The direction of the properties "contains" and "containedIn" is not clear. One arrow per property would be better, or indicating the direction with a "->" and "<-" somewhere in the property text.
.- Page 11: typo "VoCaLS allows CQELS to add a federation service similar to *the on of* SPARQL 1.1."?
.- Page 12, describing LSD, in the 5th line "Therefore, LDN supports..." Does it mean LSD?
Review 3 (by Maria Maleshkova)
The authors present what they identified as a current gap for a shared metadata declaration of RDF streams, streaming services and operations on streams. In addition, they present a list of requirements such a vocabulary has to fulfill, and VoCaLS as the ontology to actually meet them. The outlined gap is plausible and the architecture of the ontology seems well defined. The necessary technical steps for documentation are more or less conducted, even though the amount of online documentation could be further increased. In particular, the explanations of the core concepts and relations, and of the required minimum criteria for a valid description, are rather short and therefore allow (mis)interpretations. The paper lacks any implementation on a real-world stream or stream service. Consequently, the maturity and completeness of the proposed vocabulary can only be estimated. Especially for the use case of integrating various heterogeneous streams through VoCaLS descriptions, the emerging technical requirements are not considered. The discovery and selection of suitable streaming endpoints is therefore not possible without additional, external descriptions, which are not discussed in the paper.
The further comments follow the structure of the paper:
Title: ‘… Catalog of Linked Data Streams’ creates the expectation of a registry-like database for streams. Instead, the main contribution is the vocabulary. If a real catalog is future work, it would be nice to mention it somewhere.
Chapter 3: It’s hard to grasp the main intentions of the following two questions: ‘Where can I access the history of the historical stream as a static dataset?’ What would be the history of a historical stream? It might be the kind of actions manipulating a stored stream, but that’s just an assumption. Further clarification is needed.
‘How can we describe the process that generated stream windows?’ If this question targets the method by which stream partitions can be defined, it’s unclear why the corresponding query languages are not sufficient. The aggregation of the seven challenges into the formulated requirements seems reasonable in general. Only R3 is not intuitively understandable: ‘R3: enable historical stream processing/analysis and replay, i.e., allowing stream storage and dumping of stream samples;’ It’s difficult to understand how a vocabulary should support historical analysis or maintain stream dumps, as this should be part of the specific streaming service or database. The statement claiming that ‘Streaming Data is relevant to the Linked Data community’ is not justified by the answers. First of all, the set of interviewees is quite biased and, furthermore, the conclusion is not covered by the two sentences before. The statement might be correct nevertheless, but it is not supported by the presented facts. The survey also uncovers a significant lack of knowledge in the community about the foundational ontologies of VoCaLS as listed in Table 1. This would contradict the conclusion that a new, standardized vocabulary is necessary at all, as the community doesn’t even use the already existing solutions. It would be interesting to explain why a new description language like VoCaLS creates a substantial benefit if a large share of the community is very uncertain about the existing vocabularies, most probably because they are not (yet) familiar with them, and what needs to be changed to accomplish better dissemination.
Chapter 4: The amount of text explaining the core module falls rather short considering its importance for the ontology. Therefore, essential parts, e.g. the StreamDescriptor concept, are not sufficiently outlined.
Chapter 5: In a naive understanding, a CatalogService is a static (RDF) document or at least some kind of static content.
Why is it then modeled as a subClassOf StreamingService? Neither a SPARQL endpoint nor RDF files need to be streaming services. In addition, if a CatalogService is itself an endpoint for a certain set of streams (including their endpoints), a central registry for the catalogs themselves is also necessary to enable discovery. It’s not discussed whether this is out of scope or simply not yet considered. Fig. 3: According to the description of a CatalogService, :TW can also host a CatalogService instance for ‘RDF_S1’. However, it seems that most of the necessary information about ‘RDF_S1’ is already covered by the description of :TW, so the relevance of the catalog class is not obvious.
Chapter 6: The PROV Ontology distinguishes agent-centered provenance, object-centered provenance and process-centered provenance. It seems that the authors created VoCaLS Provenance to tackle insufficiencies of PROV-O for the process-centered provenance of streams. The fact that Listings 1.3 and 1.4 also include PROV-O relations supports this assumption. Nevertheless, it would be helpful to make the argumentation more explicit.
Chapter 7.1: How exactly does VoCaLS support Web scale stream discovery?
Chapter 7.2: “VoCaLS allows CQELS to add a federation service similar to the on of SPARQL 1.1. Using VoCalS vocabularies, the query compiler of CQELS can understand federation constructs like the one shown in Listing 1.5.” Supposing that the declaration of endpoints through a variable is provided by CQELS, what is VoCaLS-specific in this query? Why could other vocabularies not do that?
Chapter 7.3: In particular, the statement that “the adoption of VoCaLS is not only an opportunity but a necessity to foster interoperability between different implementation and instances of RSP Services.” raises the question of why both stream publishers and consumers would instantly benefit from applying VoCaLS. Once a significant number of endpoints are available, the integration will be simplified.
The main challenge is therefore how to reach this number. Secondly, it is stated that “VoCaLS Service Description takes RSP into account and, thus, it can annotate RSP Services out-of-the-box.” It’s not straightforward to see which characteristics of RSPs are considered here and which parts of VoCaLS-SD are exclusively for RSP.
Chapter 8: It would be helpful to explain where VoIS and VoCaLS differ, to better understand why both are justified.
To summarize, the proposed ontology seems to target an important gap for RSP services and is therefore of interest for the Semantic Web community. VoCaLS is publicly available and can be imported with the standard tools. Nevertheless, a consistent running example introduced at the beginning of the paper would significantly ease the understanding of the argumentation and the main contributions. Regarding the wide range of targeted challenges, and considering the current level of detail in VoCaLS and its documentation, it seems more like a well-proposed draft than a mature vocabulary. If that is not the case, more detailed explanations of the envisioned correct usage and the current scope/limitations/assumptions are necessary.
Typos:
p.4 “...characterize the contents of *and* the(?) (RDF) stream content...”
p.4 “...selection of stream partitions ...”
Fig. 1: dcat:Catalog instead of ‘dcat:Catlog’ (also in the online documentation)
Fig. 3: Turtle instead of Tuttle
p.8 “which type of time *is* control is applied”
Listing 1.5: “… filters ...”
p.13 “...and receivers are RESTful.”
Namespaces are not consistent: vocals as core module’s namespace but also vsd for vocals-sd and vprov for vocals-prov
Review 4 (by anonymous reviewer)
The paper presents and discusses a vocabulary for describing streams of linked data. The idea behind this work can be considered interesting, but I think that it is not mature enough to be published. By reading the paper, I detected three main issues that should be solved.
1) Marginal extension with respect to existing dictionaries. Very few concepts are defined, and a relevant part of the resource consists of information inherited from existing dictionaries. This aspect highlights the lack of scientific contribution of this work. It can be considered an interesting starting point for defining a specification document, but not a scientific contribution.
2) Comparison with the state of the art. The authors should argue better with respect to existing ontologies, vocabularies, and published work. Examples of existing resources are described in:
- https://iotdb.org/pub/
- http://lov.okfn.org/dataset/lov/vocabs?tag=IoT
Neither is mentioned in the paper. The same holds for already published papers that discuss models for managing streams of data coming from IoT ecosystems:
- María Bermúdez-Edo, Tarek Elsaleh, Payam M. Barnaghi, Kerry L. Taylor: IoT-Lite: A Lightweight Semantic Model for the Internet of Things. UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld 2016: 90-97 (http://personal.ee.surrey.ac.uk/Personal/P.Barnaghi/doc/UIC2016_Bermudez_et_al.pdf)
- Amelie Gyrard, Martin Serrano, Pankesh Patel: Building Interoperable and Cross-Domain Semantic Web of Things Applications. Managing the Web of Things 2017: 305-324 (https://arxiv.org/pdf/1703.01426.pdf)
Thus, more comparisons have to be included in the discussion that the authors provide in the Introduction of the paper.
3) Adoption. Here, I link my point with the emphasis I gave in the previous one when I mentioned works and resources about IoT. First of all, I would avoid mentioning that social networks can be treated as RDF data streams.
Big players like Facebook, Twitter, and Instagram do not release RDF streams of their data (to my knowledge), and even if they did, the streams would not be open (e.g. Twitter). Second, considering the low scientific contribution (as mentioned in the first point), the adoption aspect should compensate for this issue. Currently, this resource has not been adopted by a substantial number of people in the SW community or integrated into several projects/initiatives. Thus, there is still the risk that it might be yet another resource that will be parked within the linked data cloud without being considered. Evidence of its use is mandatory before considering it for publication.
--------------
I thank the authors for their effort in preparing the rebuttal. After reading their reply, I confirm my concerns and the score given earlier.