Paper 32 (Research track)

An Easy & Collaborative RDF Data Entry Method using the Spreadsheet Metaphor

Author(s): Markus Schröder, Christian Jilek, Jörn Hees, Sven Hertling, Andreas Dengel

Full text: submitted version

Abstract: Spreadsheets are widely used by knowledge workers, especially
in the industrial sector. Their methodology offers a well-understood,
easy, and fast way to enter data. As filling out a spreadsheet
is more accessible to common knowledge workers than defining RDF
statements, in this paper, we propose an easy-to-use, zero-configuration,
web-based spreadsheet editor that simultaneously transfers spreadsheet
entries into RDF statements. It enables various kinds of users to easily
create semantic data whether they are RDF experts or novices. The
typical scenario we address focuses on creating instance data starting
with an empty knowledge base that is filled incrementally. In a user study,
participants were able to create more statements in less time, with
similar or even significantly better quality, compared to other approaches.

Keywords: spreadsheet; RDF data entry; filling knowledge base

Decision: reject

Review 1 (by anonymous reviewer)

(RELEVANCE TO ESWC) This paper describes an approach to populate RDF knowledge bases with instances by using a spreadsheet-like interface. In general, the topic addressed is relevant to ESWC. While there are many solutions to consume data, little has been done in terms of (user) support to generate data.
(NOVELTY OF THE PROPOSED SOLUTION) The novelty of the approach is questionable. The underlying idea is quite trivial: generate data (basically a matrix) using a spreadsheet. There are other approaches that tackle the problem of converting spreadsheet-like data to RDF. At this point one may wonder why the whole project has not been designed as a plugin for some existing tool like Protege (which, by the way, allows inserting instances).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The tool (which can be tried online) is nice and intuitive. The authors conducted a user study to measure the ease of use as compared to existing tools. Nevertheless, user studies for academic purposes are intrinsically limited. Since the goal of the tool is to foster the creation of RDF data (presumably in a real-world environment), it would have been nice to evaluate its usefulness in contexts where RDF data are actually created (although I realize this may be difficult).
(EVALUATION OF THE STATE-OF-THE-ART) The authors compare their approach with Protege. Nevertheless, the comparison is clearly limited to the specific task (i.e., instance creation). Protege is a very mature and comprehensive tool. As previously mentioned, the authors should consider including their tool as a Protege plugin.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Authors provide a detailed evaluation of the features of their tool through a user study.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) While the user study is clearly not reproducible, the tool is available online.
(OVERALL SCORE) The authors describe a tool to foster the creation of RDF data. The underlying idea is to use a spreadsheet-like interface where columns are seen as properties and sheets as classes (with rows representing the domains and ranges of properties).
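The mapping summarized above (sheets as classes, columns as properties, rows as entities) can be sketched in a few lines of Python. The base namespace, sheet and column names, and the URI scheme below are hypothetical assumptions for illustration only, not the tool's actual implementation:

```python
# Minimal sketch of the spreadsheet-to-RDF mapping described in the review:
# each sheet names a class, each column a property, each row an entity.
# BASE, sheet name, column names, and entity-URI scheme are all hypothetical.

BASE = "http://example.org/"

def sheet_to_triples(sheet_name, columns, rows):
    """Turn one sheet (class) into a list of (subject, predicate, object) triples."""
    triples = []
    cls = BASE + sheet_name
    for i, row in enumerate(rows):
        entity = BASE + sheet_name.lower() + str(i)  # one entity per row
        triples.append((entity, "rdf:type", cls))    # sheet name becomes the class
        for col, value in zip(columns, row):
            triples.append((entity, BASE + col, value))  # column becomes the property
    return triples

triples = sheet_to_triples("Person", ["name", "worksFor"],
                           [["Alice", "ACME"], ["Bob", "ACME"]])
for s, p, o in triples:
    print(s, p, o)
```

Each row yields one `rdf:type` statement plus one statement per filled cell, which is essentially the zero-configuration mapping the paper proposes.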
Strong points:
(i) the tool has been implemented and is available online
(ii) authors conducted a user study
Weak points:
(i) After reading the paper, I was not convinced that devising an ad-hoc tool is the right choice. Authors should consider implementing their tool as a plugin of some existing mature system like Protege.
(ii) The evaluation has been conducted in an academic environment; however, as the ultimate goal is to foster the creation of RDF data in a real context, it would have been much more interesting to evaluate the tool in contexts where RDF data are actually created (even though this may be difficult).

Review 2 (by anonymous reviewer)

(RELEVANCE TO ESWC) The content of the paper is of high relevance to ESWC. The core contribution is a user-friendly, easy-to-use, zero-configuration, web-based spreadsheet editor to create RDF statements and, to some extent, ontologies.
(NOVELTY OF THE PROPOSED SOLUTION) The core novelty is the zero configuration part of the proposed solution. 
More traditional tools that help users create RDF statements either provide RDF-tailored interfaces (e.g., to manage ontologies or to create RDF statements in the form of subject-predicate-object triples), or they are spreadsheet-like editors that require pre-configuration.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) No correctness and completeness analysis is presented.
(EVALUATION OF THE STATE-OF-THE-ART) The approach is evaluated against one state-of-the-art tool, namely Protege, and against a manual, text-editor-based creation process.
Many alternative editors are not considered (e.g., Pohl or others from the related work section), and it is not clear why they were left out.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors refer to a paper containing an application demo, but do not provide a running and accessible demo or a link to a code repository.
Throughout the paper, design choices are discussed and justified, but they could be more strongly linked back to the motivation.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The setup of the experiment seems to be slightly in favour of the spreadsheet system: 12/17 participants considered themselves (near) experts with spreadsheets, while none of the users was an expert in Protege or Turtle.
The authors even state that Protege has nice features for adding data (such as buttons) but that users were overwhelmed (which an expert would not be).
As such, it is somewhat to be expected that the spreadsheet solution comes out ahead.
Also, it is not clear how many participants used or knew about the Turtle syntax.
Overall, the experiments cannot be reproduced due to the manual setup, and the generality is also low given the biased participant group.
(OVERALL SCORE) Summary of the Paper
The authors present a tool to create RDF Abox statements and fill existing
(empty) RDF knowledge bases in an easy and intuitive manner. The proposed
solution uses a spreadsheet interface which can be used by non-RDF experts.
One design choice is to have a one-class-schema-per-table and one-entity-per-row mapping, in contrast to highly configurable data import set-ups.
While the zero-configuration approach is very appealing at first, the overall description still leaves many questions open. 
It is entirely unclear how and where the RDF statements are stored. It seems that the tool generates generic URIs, but what about namespaces, linking, reusing existing identifiers (e.g., from DBpedia), or integrating already existing domain models?
Overall, the paper describes a nice application but should provide more details about the architecture, background algorithms and general potential to be used in combination with existing RDF stores, knowledge graphs/bases. 
In addition, the evaluation seems too tailored. The authors should either rethink the presentation and argumentation line of the results or select a more diverse group of participants and maybe even more tools to compare to. 
Strong Points (SPs)
* intuitive interface and clear design choice (e.g. fixed schema)
* automatically inferring domain and range statements and autocompletion of resource labels
* generally a very nice addition to the RDF editors, especially for non-experts
* comprehensive related work section
Weak Points (WPs)
* Unfortunately, the evaluation seems to be tailored to support the authors' claim through the choice of participants. The main expertise of the participants lies in spreadsheet editors. It is also unclear how many participants have created RDF using text editors.
* It is not clear how background knowledge, other ontologies, and already existing data can and will be used with the editor. The authors should provide more details and insights about the architecture.
* The authors do not position and motivate their paper strongly enough. While the design choices are clear and motivated, the evaluation seems to be slightly disconnected and might lead to the wrong conclusions. 
Questions to the Authors (QAs)
Q1) Why are other editors not considered in the evaluation? 
Q2) What is the underlying system and architecture? How are statements stored, how are additional ontologies and existing statements used?
Q3) Is the tool publicly available? Who can use/download/adapt it? Is the tool already used in a company/organisation/etc.?

Review 3 (by Sören Auer)

(RELEVANCE TO ESWC) Intuitive and efficient user interfaces for RDF curation are very relevant for ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) Unfortunately, the novelty of the submission is not very high. The authors cite a number of related works, some of which comprise similar functionality. For example, one of the first data wikis, published at ISWC 2006, already comprised a tabular data entry method (cf. Fig. 4 on page 742):
OntoWiki–a tool for social, semantic collaboration
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The approach is reasonable although quite simplistic.
(EVALUATION OF THE STATE-OF-THE-ART) Related work is sufficiently discussed; a tabular comparison and analysis of properties would have been a good addition.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The properties of a spreadsheet like authoring method could have been more systematically described and analysed. For example, formulas, formatting (e.g. to indicate different types), autofilling, data linking could have been developed and discussed.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The submission is accompanied by a Web demo and a demo video, and a comprehensive user study was performed.
(OVERALL SCORE) Summary of the Paper
This submission presents an approach for spreadsheet-like authoring of semantic (RDF) data. The paper is well written, structured, and illustrated, but in places a bit verbose: many things could have been presented more compactly and concisely to make room for other interesting discussions.
Strong Points (SPs)
* intuitive user interface
* well developed experimental study
* relevant problem and reproducible implementation
Weak Points (WPs)
* not very novel approach
* technical depth and breadth are relatively limited; there are many interesting extensions which could have been systematically integrated and discussed, e.g., formulas, formatting, autofilling, data linking
* elaborate presentation

Review 4 (by Irene Celino)

(RELEVANCE TO ESWC) The topic of semantic web tool usability is very important and usually quite neglected, so the paper addresses a relevant aspect.
(NOVELTY OF THE PROPOSED SOLUTION) To the best of my knowledge, it is not the first spreadsheet-inspired editor.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The presented tool has a number of interesting features, but its possible limitations and drawbacks are not clearly illustrated. For example, since there seems to be some automatic linking feature based on the reuse of the same resource (with the same label), how is the case of different resources with the same label addressed (say, Cambridge in the UK vs. Cambridge, Massachusetts)? In other words, the tool seems to be partially immature in terms of functionalities.
Moreover, also the evaluation presented in the paper is quite limited (as explained below).
(EVALUATION OF THE STATE-OF-THE-ART) The only related tools that may be missing are the wiki-based editors (Semantic Media Wiki and the like).
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) As said above, the missing features and the potential pitfalls of the tool are not clearly discussed in the paper. I'd suggest that the authors honestly add those to their description. A demo of the tool, however, is available online, which contributes to the demonstration.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) While the evaluation setup is clearly explained and overall reasonable (apart from the Turtle test), I think that with the same configuration and user "exercise" the authors could have collected and discussed several other aspects that are missing (details below).
(OVERALL SCORE) I generally liked the tool presented in the paper, and I think that user-based evaluation is a topic of paramount importance for the semantic web community. Nonetheless, I found that the presented piece of research is somehow an early result that would benefit from a better and more extended evaluation. Hereafter are some of the things I think could be improved.
I found the Turtle test quite out of scope here. I (am supposed to) know the Turtle syntax, yet writing it by hand in a text editor is of course very error-prone; so, even after an explanation of the syntax, I'm not surprised at all that the users' solutions couldn't be parsed in most cases. Maybe a different setup could have been adopted to achieve a Turtle-like result (e.g., something like N-Triples without the namespaces, with one "triple" per line) without the need to comply with a language syntax specification. Also, if the authors repeat such an experiment in the future, they could also compare with a tool like Semantic MediaWiki.
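The simplified entry format suggested here (one whitespace-separated "triple" per line, with no namespaces or Turtle punctuation to get wrong) can be sketched as a small parser. The format and function below are illustrative assumptions, not part of the paper under review:

```python
# Sketch of a simplified, Turtle-free entry format: one "triple" per line,
# whitespace-separated, no prefixes or punctuation. Purely illustrative.

def parse_simple_triples(text):
    """Parse 'subject predicate object' lines into (s, p, o) tuples."""
    triples = []
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 2)  # the object may contain spaces
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'subject predicate object'")
        triples.append(tuple(parts))
    return triples

user_input = """
Alice worksFor ACME
Alice knows Bob
"""
print(parse_simple_triples(user_input))
```

A setup like this would let the study measure modelling ability rather than users' ability to satisfy the Turtle grammar.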
There is no comparison with the "ideal solution": how many of the users with which tool achieved a solution which is comparable to the ideal one? What part of the ideal solution was missing for those who did not model the given information completely? Is the lack of "completeness" of the users' solution correlated to the users' intrinsic "modelling" ability (e.g. compared to the use of an Excel spreadsheet or a ER model)? I suspect that the results may be influenced by that. 
Also, with the data collected, the authors could have tried to test if there was any correlation between the declared ability to use the tools and the actual time spent or completeness/coverage of the achieved result (i.e. are users good at evaluating their own abilities?).
Why was the UEQ submitted only after the use of RSE and not of the other two cases? It would have been interesting to compare those; the scores in figure 4 are not very useful without a term of comparison. Why was the UEQ chosen in the first place? I suspect it is not the most suitable questionnaire for the case at hand; I would have adopted the SUS score for example, which also has some "reference values" to compare to.
I also found the section about "meaningful triple statements" quite odd, in that the authors do not discuss the actual meaning/semantics of the achieved results (again, how did the users perform w.r.t. the "ideal modelling"?). The metrics adopted in Table 2 are not very much related to "meaningfulness" to the best of my understanding; still, they could have been used to compare users' solutions with a reference/ideal solution for the given modelling task; otherwise they are again numbers without context.
A final comment on the "data entry" scope of the presented work: while it is true that spreadsheets are very useful for entering instances (ABox), it seems to me that the RSE was employed to test the modelling of both TBox and ABox, since the users were also asked to create the "sheets" related to the classes; thus, in the end, the authors presented an evaluation of an ontology editor rather than a pure instance-level data entry tool. This is not bad per se; on the contrary, I believe that what we need most is usable ontology editors. When there is the need to insert instances, usually either they are inserted in the same editor (when there are a few instances) or they are more efficiently "translated" from any existing format into RDF via mapping (when there are a lot of instances).

Metareview by Maribel Acosta

This work proposes a web-based spreadsheet tool to generate RDF statements. The paper describes the basic functionalities of the tool as well as extensions to handle specificities of RDF. The authors conducted a user study and compared their approach with the ontology editor tool Protege. 
The reviewers agreed that the problem tackled in this work is relevant to the Semantic Web. Nonetheless, as pointed out by the reviewers, the novelty of this work is compromised as previous work has proposed similar solutions. In addition, the reviewers identified further major issues in this paper: the research contributions of this work remain unclear, certain settings in the experimental study hinder the generality of the empirical results, and the main limitations or pitfalls of the tool are not sufficiently discussed. Due to these issues, the paper cannot be accepted for publication in the conference.
