ViziQuer: A Visual Notation and Tool for RDF Data Analysis Queries
Author(s): Kārlis Čerāns, Uldis Bojars, Agris Sostaks, Juris Barzdins, Julija Ovcinnikova, Lelde Lace, Mikus Grasmanis, Arturs Sprogis
Full text: submitted version
Abstract: Visual SPARQL query formulation notations aim to ease the task of querying RDF data, yet existing approaches fall short of providing a generally accepted visual notation suitable for data analysis and statistics queries. In this paper we present a visual diagram-centered notation and tool for SPARQL select query formulation, capable of covering aggregate/statistics queries and hierarchical queries with subquery structure. We present usage examples of the notation, describe its syntax structure and provide its semantics by defining the translation of visual queries into SPARQL. We report on early pilot studies indicating the potential applicability of the visual notation to formulating SPARQL queries, and briefly describe the web-based open source implementation of the tool.
Keywords: Visual notation; Diagrammatic queries; RDF data endpoints; SPARQL; Ad-hoc queries; Data analysis; Multi-modal query tool
Review 1 (by Victor de Boer)
(RELEVANCE TO ESWC) Improving the usability of SPARQL querying is core to the conference.
(NOVELTY OF THE PROPOSED SOLUTION) Not novel per se, given the authors' previous publications.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The syntax and semantics seem correct as far as I was able to check. The coverage of the visual language is not complete with respect to SPARQL 1.1; specifically, named graphs and property paths are not covered. The authors claim that these would be easy to implement but were left out so as not to damage the translatability to RDF. I think this is a fair point, but for the main claim, that this is a good alternative notation for SPARQL, completeness is preferable.
(EVALUATION OF THE STATE-OF-THE-ART) Seems appropriate.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Well described and well supported by online materials.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Well supported by online materials.
(OVERALL SCORE) This paper presents a visual notation for SPARQL queries, called ViziQuer. The notation is based on diagrams and covers most of SPARQL 1.1 syntax and semantics. This paper builds on previous work by the authors and extends it with a new user study, an abstract notation and a tool description. The paper is well written and the authors make clear where the novel contributions of this paper lie. The paper describes how ViziQuer diagrams are constructed and also details the abstract syntax and semantics. The paper also points to a fairly advanced online tool at http://viziquer.lumii.lv/
Strong points:
- The paper describes an interesting challenge that is core to the ESWC community.
- The syntax and semantics of the notation are described quite rigorously, and good examples are also given throughout the paper.
- The user studies are limited but very relevant and well set up. The materials for the user test are provided in an online appendix, which makes the whole endeavour quite reproducible.
- The online tool is great supplemental material.
Weak points:
- The main challenge in reviewing this paper is determining the extensions beyond the existing papers. There is extra material, however, and I would say that most of these existing publications are workshop papers and can therefore be extended in this conference proceedings paper.
- What is also missing is a discussion of the types of users, domains and use cases for which such a visual query environment is preferable.
-- update after author rebuttal --
I have read the authors' rebuttal. My points were not directly addressed and my view on (and scores for) the paper has not changed.
Review 2 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper is very relevant to the conference; the problem of designing (and implementing) querying interfaces that help users formulate queries is interesting. Indeed, writing (complex) queries can be a major obstacle for non-expert users (this problem has also emerged, and been addressed, for SQL).
(NOVELTY OF THE PROPOSED SOLUTION) The solution proposed by the authors, a notation inspired by UML diagrams, is interesting in general. Nevertheless, the paper lacks a Related Work section, which makes it difficult to grasp the contribution. As far as I know, the problem has been addressed by other research groups (e.g., http://www.inf.unibz.it/~franconi/quelo/). I suggest that the authors provide a more detailed comparison with related research (the short discussion in the Introduction is not enough).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed solution is reasonable, although it fails in its main goal. When reading the paper, and especially Section 2.1, I had the impression that the approach is overly complicated. Looking at Fig. 2, it is really hard to match the left part (the visual notation) to the right part (the SPARQL query). The authors should consider starting with a simpler example and then incrementally adding features. As it stands, the example is not really helpful. The usage of H(.) and count(.) does not help either. Besides that, the authors describe the semantics of their (visual) query language through a mapping to SPARQL. This part is interesting but is orthogonal to the main goal of the paper, that is, helping users write SPARQL queries.
(EVALUATION OF THE STATE-OF-THE-ART) As previously mentioned, the paper lacks a Related Work section, so it is difficult for the reader to fully estimate the novelty of the proposed approach.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The approach described in the paper has been implemented and is available online. The website is well described and complete.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) As previously mentioned, the system is available online.
(OVERALL SCORE) Despite the problems mentioned above, I think the paper has some merit. The authors made a great effort in implementing the system and making it available online. Nevertheless, the authors should consider reworking the example, sticking with one or two examples and incrementally adding features.
---- After rebuttal:
I thank the authors for their response. Nevertheless, the positioning of the paper with respect to the state of the art remains unclear.
Review 3 (by Carlos Buil Aranda)
(RELEVANCE TO ESWC) This is relevant, since visually developing a SPARQL query is a difficult and important problem in the community.
(NOVELTY OF THE PROPOSED SOLUTION) It is not that novel, since other approaches already use UML notation to visualize ontologies.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) From what I understood, the semantics are the same as for SPARQL; I am not sure why they are there. From a user interface point of view, the users do not justify why the visual interface is good or bad.
(EVALUATION OF THE STATE-OF-THE-ART) There is no such section, just a few lines in the introduction.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors fail to explain why the design was effective in their evaluation.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The authors provide the materials for the experiment.
(OVERALL SCORE) In this paper the authors present a system (ViziQuer) that helps end users develop SPARQL queries. To do so, the system uses a UML-like visualization for the query components, assuming that users understand such notation. The authors describe the main components of the visual system, the semantics of these components and their translation to SPARQL, and an evaluation of the system.
In the introduction the authors describe the need for new SPARQL user interfaces, briefly introducing other UIs for aiding users in developing SPARQL queries. They also introduce their system, describing its main characteristics at a high level.
Comments: What I miss from this section (and from the paper) is a more complete description of other systems in the state of the art. Several have appeared recently, such as those presented at the VOILA 2017 workshop. Also, the authors comment on some of the SPARKLIS functionalities later in the introduction; personally, I prefer a more compact way of writing, so that the reader gets all the information at once.
In Section 2 the authors present the basic visual query constructs of the system by means of examples. First they introduce the domain ontology, then the basic query components for drawing a SPARQL query, how modifiers, aggregates, etc. are used within the system, how to generate subqueries and link their projected results to other parts of the query, etc.
Comments: This section contains quite a few examples (~28), so that users are able to use most of the SPARQL operators. One question that comes to mind is how easy it is for end users (with limited knowledge of SPARQL) to learn to use the interface. Do the authors have any insight on this? It seems quite overwhelming.
In Section 3 the authors present the abstract query model as UML components. Figure 8 shows the complexity of the query model, raising the same question as before: how hard is it for users to learn to use the system? Learnability is one of the key aspects of usability [1].
In Section 4 the authors present the semantics of the system, which, if I am not mistaken, are the SPARQL semantics as defined in [3] and [4]. Can the authors explain a bit more about why these semantics are needed when the translation seems almost direct?
In Section 5 the authors evaluate the system. The evaluation consists of two studies: one with users without IT knowledge, who had to interpret queries, and another with IT students, who had to develop SPARQL queries. All users in both studies received an introduction with slides available on the paper's web page. These slides show the data ontology, a query example, examples of several tool operators, and the details needed to add aggregates and modifiers.
Comments: My main comment about this section is, again, about the system's learnability. I believe the slides present most of the elements needed to build a query, the most important being the data ontology. The examples mainly consist of looking at the ontology and writing, in UML, a query similar to that ontology, adding new aggregates by following the examples. I find this a very basic evaluation, which could be improved by adding variables that show, e.g., which aspects of the interface led to such good results and why participants were getting bad results before. By variables I mean execution time for each query, number of clicks, what trial-and-error procedure the students followed, etc. Also, from such experiments I would expect results such as confidence intervals for the completion rate, so that the results can be generalized. However, no such data is provided. I would recommend that the authors consult [1,2].
Overall comments: I think this is an interesting paper; however, the evaluation is weak, since it does not show why the system is good. Besides, the evaluation may be biased by the handout slides used in the users' training.
[1] Nielsen, J. (1994). Usability Engineering. Elsevier.
[2] Sauro, J., & Lewis, J. R. (2016). Quantifying the User Experience: Practical Statistics for User Research. Morgan Kaufmann.
[3] Garlik, S. H., Seaborne, A., & Prud'hommeaux, E. (2013). SPARQL 1.1 Query Language. World Wide Web Consortium.
[4] Pérez, J., Arenas, M., & Gutierrez, C. (2009). Semantics and complexity of SPARQL. ACM Transactions on Database Systems (TODS), 34(3).
=============== After author's response ================
I acknowledge the authors' response. The system seems to be in a mature state. I acknowledge the response about the bias; however, I am still not sure about the effectiveness of the tool, since the handouts with the data ontology show how to create the query. The slides show the schema in UML followed by a query that is a subset of that schema. This is why I would like to see more evidence of the tool's effectiveness in the evaluation. I will maintain my score.
Review 4 (by anonymous reviewer)
(RELEVANCE TO ESWC) This paper presents a UML-based notation for SPARQL queries, with the aim of easing the creation and comprehension of SPARQL queries. This topic is relevant to ESWC, as not everyone can be expected to write textual SPARQL queries, and a diagrammatic notation might help make SPARQL-based querying accessible to larger user groups, although the proposed ViziQuer notation also cannot be used immediately but must be learned first, which will likely limit the impact of the work.
(NOVELTY OF THE PROPOSED SOLUTION) While visual SPARQL querying is not a new topic and several existing approaches already support and ease SPARQL querying through visual notations, the proposed notation seems comparatively expressive and powerful. The paper is based on previous work presented at workshops and as demos at previous ISWC and ESWC editions (references [8,11,12,13]).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Overall, the description of the notation looks correct (though I might have missed details, as not all parts of the paper were easy to read and understand). The approach has been implemented in a tool that demonstrates its applicability. I am not yet convinced that the notation is much easier to learn than textual SPARQL, and I could imagine that it becomes tricky for the untrained user to create diagrams of complex SPARQL queries (such as the diagram shown in Figure 2). While I appreciate that the authors conducted user evaluations to test their approach, I would be careful with interpreting and generalizing the results, as the evaluations should be considered preliminary and seem to have some flaws and limitations.
(EVALUATION OF THE STATE-OF-THE-ART) The discussion of related work is limited. While some approaches are mentioned (OptiqueVQS, QueryVOWL), others are not (e.g., iSPARQL, RDF-GL), which makes it difficult to evaluate the novelty of this work's contribution.
It is simply stated that related work does not include "queries with data aggregation and statistics facilities", which might be true but needs further elaboration.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The notation is illustrated by a number of examples from a realistic use case. The approach has been implemented as an open source tool and is available online as a demo and on GitHub.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The authors provide a link to a website (http://viziquer.lumii.lv/eswc2018_submission/) with comprehensive supplemental material. It includes the example ontology and diagrams, the tasks of the user study, etc. While this is exemplary, I am missing material on the results of the user study and some details on its methodology that would allow the study to be reproduced.
(OVERALL SCORE) The paper presents a UML-based notation for SPARQL queries. Besides basic queries, it can also represent more complex SPARQL queries containing aggregations and subqueries. The abstract query model and semantics are presented, as well as some results from a preliminary evaluation. The notation looks quite expressive and capable of representing most constructs of SPARQL 1.1. I like that it is illustrated by a number of examples and that it has been implemented in a publicly available tool. However, the notation does not look intuitive and requires some training. This makes me wonder whether it is worth the effort to learn yet another (visual) query language instead of learning (textual) SPARQL directly. The diagrammatic notation might help in the creation and comprehension of SPARQL queries, but it will likely not fully replace textual SPARQL. For that reason, I would have liked to see more realistic cases and scenarios of how ViziQuer could be used as a complement to textual SPARQL editors.
Unfortunately, the user evaluations performed are limited and appear somewhat biased: students from the authors' university served as study subjects, and it is not clear how the query writing tasks were created, with the risk that they were unintentionally tailored towards ViziQuer. I would have liked to see an evaluation with a different user group (e.g., administrative staff of the hospital) and a set of tasks created by a third party or selected from a SPARQL benchmark (if this is possible).
Minor comments:
- The quality of writing could be further improved. There are several grammatical errors.
- I do not see why this notation is "multi-modal", as stated by the authors.
- The figures could be of better quality (I would suggest using vector graphics or at least higher-resolution images).
Metareview by Olaf Hartig
This paper introduces a system for end users to write SPARQL queries using a visual UML-based notation. The reviewers agree that the presented work is very relevant and interesting. However, there was no consensus among the reviewers regarding the maturity of the work. In particular, there were major concerns about the representativeness of the user evaluation and about the ease of use of the visual notation. The core of these concerns relates to the learnability of the notation, a point that remained unconvincing to some of the reviewers even after the rebuttal. Another weakness raised by some reviewers (and, in their view, not satisfactorily addressed during the rebuttal) is the insufficient discussion of related work. Due to the remaining concerns of some of the reviewers, the paper in its current state cannot be accepted for publication in the conference. However, given the agreement on relevance and the interest in the presented work, we strongly encourage the authors to address these issues and turn this manuscript into a high-quality paper in the future.