FERASAT: a Serendipity-fostering Faceted Browser for Linked Data
Author(s): Ali Khalili, Peter van den Besselaar, Klaas Andries de Graaf
Full text: submitted version
Abstract: Accidental knowledge discoveries occur most frequently during capricious and unplanned search and browsing of data. This type of undirected, random and exploratory search and browsing of data results in Serendipity — the art of unsought finding. In our previous work we extracted a set of serendipity-fostering design features for developing intelligent user interfaces on Semantic Web and Linked Data browsing environments. The features facilitate the discovery of interesting and valuable facts in (linked) data which were not initially sought for. In this work, we present an implementation of those features called FERASAT. FERASAT provides an adaptive multigraph-based faceted browsing interface to catalyze serendipity while browsing Linked Data. FERASAT is already in use within the domain of science, technology & innovation (STI) studies to allow researchers who are not familiar with Linked Data technologies to explore heterogeneous interlinked datasets in order to observe and interpret surprising facts from the data relevant to policy and innovation studies. In addition to an analysis of the related work, we describe two STI use cases in the paper and demonstrate how different serendipity design features are addressed in those use cases.
Keywords: serendipity; linked data; intelligent UIs; knowledge discovery; exploration; faceted browsers
Review 1 (by anonymous reviewer)
The problem of universal exploratory browsing of unknown datasets is a very hot topic, and the lack of business-ready, easy-to-use implementations is a serious obstacle to more widespread adoption of linked data, especially in user-facing applications. The authors tackle the problem with an interesting novel approach that focuses on enabling serendipitous discovery of interesting facts and patterns in the data. The two user stories in the STI domain are interesting and clearly explain how the system benefits its users. However, it would also be interesting to understand which particular aspects the users found valuable in comparison with existing solutions, such as the ones referenced in the related works section. Additionally, I would strongly suggest providing a live demo of the system during the paper presentation. Finally, as already suggested in the Future Works section, it would definitely be valuable to perform a large-scale qualitative and quantitative evaluation of the user feedback. Besides life sciences, I'd suggest also evaluating it in other domains, with special attention to those in which the heterogeneity and variety of data tend to be higher (and its quality sometimes lower), such as Cultural Heritage, Tourism and Digital Humanities.
Review 2 (by anonymous reviewer)
This paper presents a faceted browser for Linked Data (FERASAT) that aims at enabling serendipity, which is a very useful feature for users who want to explore and familiarize themselves with datasets available as Linked Data. The main strengths of the paper are that the system seems to be in use and that the paper contains descriptions of first-hand experiences of non-expert users who have worked with the system. In addition, Figure 7 shows a very nice graphical overview of related work. The paper also has some weaknesses, which I comment on in the following. One of them is readability. For instance, whereas the design features are explained in detail in the authors' previous work, this paper gives few details about them, which makes it difficult to read. Terms such as "gulf of execution" and "abduction" are used without explanation; these terms should be explained, as they are not standard terms for the targeted audience. Readability further suffers from several grammatical mistakes and awkward formulations. Although the paper motivates the browser for use in the context of Linked Data, the browser does not focus much on exploiting the links between datasets but can be applied to an arbitrary local RDF dataset without links to external sources. The paper mentions several very interesting features but does not describe how they are implemented. For instance, Section 2 mentions that "the system adapts its behavior", but the paper does not really explain in which way the system adapts and how this is implemented. Another example is that the paper mentions that SPARQL queries are generated and that "a user can select multiple resources and ask for the potential correlations between them", but the paper does not sufficiently explain how these non-trivial tasks are implemented.
Section 2.2 mentions the support of multiple RDF graphs and the use of federated SPARQL queries; again, it would be interesting to read about how this is implemented (do the graphs/sources have to be known in advance or is it possible to freely traverse the links?). So, in summary I have to conclude that the weaknesses seem to outweigh the strengths.
-- after having read the authors' response --
I appreciate the authors' response and acknowledge that some of my concerns could probably be addressed in a revision. However, I have to make the decision on the version that has been submitted, which has several weaknesses including the fact that the technical details are not explained (not even sketched on a high level). Hence, I will keep my original scores.
A few examples of awkward formulations (not complete):
- browsing a set of linked data which is scattered over multiple knowledge graphs
- the first step is to identify properties of interest as semantic links to move forward and backward in the data space
- together with the number of resources containing those values
- to do surprising observations
- discover successful errors in data together with the possible explanations for their occurrence
Review 3 (by anonymous reviewer)
This 15-page paper reports an implementation of the twelve serendipity-fostering design features extracted from a previous study (Khalili et al., 2017). The features (such as: “F10, Allow sharing of surprising observations among multiple users”) are intended to help designers and developers of Semantic Web and Linked Data browsing environments to facilitate discovery and interpretation of new, useful, and interesting facts. The implementation results in a FacEted bRowser And Serendipity cATalyzer (FERASAT). FERASAT provides an adaptive multigraph-based faceted browsing interface to catalyze serendipity while browsing linked data. We have based the evaluation of the paper on the five criteria provided by the In-Use track chairs:
1. Measurable impact of semantic technologies: No measure of the impact of FERASAT is provided.
2. Extent to which the In-use paper addresses real-life problems: The paper addresses real-life problems met by a specific population of users, i.e. social scientists conducting their research in the domain of Science, Technology and Innovation studies.
3. Novelty of the techniques applied in practice: The authors claim that “What distinguishes [their] approach from the [related] work [they mention] is [their] more comprehensive investigation of serendipity design features and their implications on linked data faceted browsing environments for fostering serendipity on LOD”. To corroborate this claim, the authors compared FERASAT with six existing RDF faceted browsers based on the proposed twelve serendipity design features.
4. Inclusion of an evaluation of the technology: An evaluation is provided in the form of a series of five real use cases written by social scientists who browsed data in the FERASAT environment while conducting their research. Two use cases are reported in the paper in the form of a first-person narrative: the first one deals with analyzing structural change in Higher Education systems; the second one deals with evaluating research portfolios with regard to current societal challenges. The use cases are annotated with the types of features that they demonstrate. A link to the three remaining use cases is provided (http://sms.risis.eu/usecases). These use cases respectively deal with: studying links between organizations, studying the localization of innovation in urban areas, and studying the relationship between universities’ environment and performance. They are not presented the same way as the use cases reported in the paper, i.e. they are not written in the form of a first-person narrative. FERASAT is said to be actively used. However, there is no mention of how many users actually use FERASAT. Note that the authors intend to evaluate the usability of their implementation using a rigorous evaluation framework. Given that a set of social scientists provided feedback on their experience of using FERASAT, one might wonder why the paper does not include at least some qualitative feedback about the usability of the system. If the authors have this information, they can add it to their paper.
5. Reflection on the pros and cons of the approach: There is no reflection on the pros and cons of the approach. The use cases only report the pros of the approach.
-------- REBUTTAL --------
Thanks to the authors for their response letter. Concerning their response about the suggestions I made to improve the paper, namely: “- We can also provide more details on the current users of FERASAT (i.e. 388 researchers from all over the world), add some qualitative feedback about the usability of the system and also talk about the Cons of the system in the project (e.g. users that were concerned about privacy/access control when they combined data from different datasets in an easy visual way).” I acknowledge the authors’ intention to modify their paper according to the suggestions I made. I find the example they provide for dealing with the Cons of the system interesting and expect that it will be developed. I regret, however, that the authors do not provide examples or details concerning either the current users of FERASAT or the qualitative feedback about the usability of the system. These could have given a clearer idea of how the authors will take these two points into account. Therefore I maintain my score.
Review 4 (by Tomi Kauppinen)
This paper is a very good piece of work. The task of supporting serendipity is fantastic. Although the topic has been well studied earlier, the authors' work here convincingly provides the whole range for supporting it: tooling (FERASAT), reports from two actual users, and a comparison to other faceted browsers for RDF. Minor details:
- Fig t caption: " RDF facted" -> "RDF faceted"
- Please provide a clearer visual cue that the user stories are quotations (via indented text or quotation marks).
Review 5 (by Anna Tordai)
This is a metareview for the paper that summarizes the opinions of the individual reviewers. The strengths of the paper include the fact that the system is being used by real users, that the system is compared to existing faceted browsers, and that the paper includes first-hand reports of users. The reviewers point out that the paper lacks details on how features are implemented and that, apart from the user reports, no evaluation is performed.
Laura Hollink & Anna Tordai