Paper 17 (In-Use track)

Comparing Keyword-based Query Processing over RDF Datasets and Relational Databases

Author(s): Yenier T. Izquierdo, Grettel M. Garcia, Elisa S. Menendez, Marco A. Casanova, Frederic Dartayre, Carlos H. Levy

Full text: submitted version

Abstract: This paper compares keyword-based query processing in two environments: RDF datasets with schemas and relational databases. The comparison is based on a tool that first translates a keyword-based query into an abstract query, and then compiles the abstract query into a SPARQL or a SQL query such that each result of the SPARQL or SQL query is an answer for the keyword-based query. The tool explores the schema to avoid user intervention during the translation process. The paper includes extensive experiments to compare keyword-based query processing in the two environments, using a full version of IMDb – The Internet Movies Database, and the Mondial database.

Keywords: Keyword search; SQL; SPARQL; Relational Model; RDF

Decision: reject

Review 1 (by Eva Blomqvist)

This paper attempts to evaluate keyword-based query processing over both RDF graphs and relational datasets. It extends an earlier benchmark, and uses a tool developed by the authors to do the keyword-to-structured-query-translation process. 
The paper is interesting, although I am unsure if all the formal definitions are actually necessary in an evaluation paper, since they are not referenced in the result discussion later on. Also, it is not clear whether the evaluation setup is fair and entirely appropriate - what is actually to be evaluated here? What are the research questions? Is this an evaluation of the underlying stores' implementations of their query engines? Or of the tool proposed by the authors? Judging from the discussion at the end, mostly the underlying stores are evaluated, since this is where a difference can be detected, but shouldn't a larger set of stores then have been included, to make sure the results are not only due to these specific implementations?
However, my main concern with this paper is not actually the evaluation, it is that it is not an in-use paper. Rather, it would fit better in the benchmarking and empirical evaluation track (one of this year's special tracks at ESWC). To be an in-use paper it would have to have a much clearer connection to some specific real-world problem and/or use case, and it would have to show usage ("outside the lab") in that specific scenario, e.g., experiments with actual users expressing keyword queries, rather than a benchmarking dataset.
In the call for in-use papers it is written that they should be evaluated on the extent to which they show a measurable IMPACT of semantic technologies and address real-world problems. Although the results presented in the paper are for sure measurable things, they are not measures of impact, but rather of efficiency, i.e. rather a technical evaluation than an evaluation of impact in the real world. Complemented by a discussion of impact in a real-world use case, with actual users, that could have made a nice in-use contribution, but that is not at all the paper's focus at the moment. The paper does not even present any real-world use case showing when this kind of keyword search over different kinds of databases is needed, nor any use case where their results could be useful, e.g., when selecting between an RDF or a relational representation of your company data.
I should also add that I have mainly evaluated the paper as an in-use paper, and I do not have enough competence in the query processing field to actually assess the formal details of the approach. Hence, my focus has been on the evaluation, and the lack of description of any "in-use" aspects.

Review 2 (by anonymous reviewer)

This paper compares the performance of keyword-based query processing over RDF and relational data. As the authors point out themselves, this paper extends their previous work on keyword-based querying over RDF datasets with support for relational databases.
In general, I do appreciate when authors are precise and provide formalisms for concepts and key terms that are relevant for the paper. But Section 3 introduces a very high number of formalisms that are actually not used later on in the paper. So, this section adds an unnecessary level of complexity that makes it more difficult to read the paper without providing additional insights. 
Section 3 introduces a similarity function (match) and a similarity threshold and states "We leave match and μ unspecified at this point.". I could not find precise information on which similarity function and threshold were actually used in the experiments.
Whereas the basics are well formalized, the algorithms are not, i.e., there is mostly only textual description and examples for the described algorithms. But it would be interesting and necessary to actually read about how they work in detail. 
In particular, I am wondering how the keyword matches are computed (Section 4.1, paragraph "Computing keyword matches"). Does the implementation use some kind of indexes?
How does the system handle ambiguous queries, e.g., a single keyword "Paris", which can actually refer to a city or a name?
Section 4.1 mentions "using a small number of equi-joins". What exactly does that mean in the context of RDF datasets?
I am missing an early discussion/definition of what the result of a keyword-based query should be. Section 4.2 finally shows some examples but still does not provide any general answer (in SPARQL: how many variables, labels over URIs - but some datasets might not use rdfs:label to provide more information, etc.).
Why are the experiments run on two different machines (even with different operating systems) - Mondial on Windows and IMDb on macOS? Hence, the times for these two datasets are not comparable. 
Section 5.3 mentions that keywords were surrounded with quotes with the text highlighting the difference between connecting keywords via AND and OR. How exactly were the queries altered in this way? How many queries are affected? How many keywords were combined? 
The section ends with "We note that the use of quotes only improved the query build time, and did not change the final results. Section 5.4 will again detail these points." However, I did not find details on queries involving quotes in Section 5.4.
In summary, this paper features a nice comparison between running keyword-based queries over RDF and relational data. But as explained above, there are many open questions and the novelty over the authors' previous work appears to be very limited.
-- after having read the authors' response --
I thank the authors for their response. Many of my concerns were addressed, some not sufficiently.
For instance, my question regarding the equi-join was not answered. There might, of course, not have been enough room for it within the given constraints regarding the length of the rebuttal text.
More importantly, I do appreciate that the authors plan to rerun the experiments on a Windows machine and measure precision and recall.
Regarding my question on how exactly the queries were altered when combining keywords: the authors provide an explanation of the effect of combining keywords. Yet, my question was about how this is done in a formal way, e.g., if a query has three keywords, are all of them always combined, or only a subset?
The rebuttal does not provide information on how many queries were affected.
In addition, I also agree with the concern raised by the other reviewers that this paper does not really seem to be a good match for the In-Use track.

Review 3 (by anonymous reviewer)

The paper compares keyword-based search for RDF datasets and relational databases. The authors propose a meta-data model to represent the RDF and relational schemas and an abstract language to represent the keyword search query, which is later compiled to either an SQL or a SPARQL query.
The paper is fairly well written, but it has important problems. First, there is no formal presentation of the algorithms that translate an abstract query into a SPARQL and/or SQL one. In addition, there is no proof that this translation is correct. The authors simply provide an example of such a translation, which is not enough in my opinion. Second, the authors consider simple keyword queries, with no negation. The kind of keyword queries that the authors consider is not clearly defined; this should have been specified already in the introduction. Last but not least, the experiments must measure precision and recall, which are standard metrics for keyword-based searches. The query build time and the query processing time are really not of significance if the queries return results that are not really the results that the keyword query should return.
I strongly suggest that the authors provide such experiments otherwise I do not think that there is any real contribution in this paper.

Review 4 (by anonymous reviewer)

This paper presents the QUIOW tool. QUIOW is a tool that supports keyword-based query processing over RDF datasets and relational databases.
I admit that while reading this paper I got confused about its purpose. The title talks about comparing RDF and database keyword-based query processing, while the tool presented in the paper supports both of them. There is no thorough comparison of the two, as would have been expected based on the title, and there is no concrete description of why one tool would support both RDF and relational databases, which also leads to my next comment.
I miss the motivation of the paper. Namely, why do we want such a comparison? E.g., to decide if we want to go for the one or the other? To combine their best cases? To prove that one does it better? And why such a tool? When is such a tool useful? I would expect that to be clear already in the beginning of the paper, ideally accompanied by one sentence verifying that the evaluation says that it's indeed like this or not. But unfortunately that is nowhere explained till the end of the paper.
The Related Work section is well written overall, but fails to point out the differences and similarities between the approaches used for databases and for RDF. Then the QUIOW tool supports a database approach which is combined and compared to an RDF approach, without motivating why these particular approaches were chosen. From the related work it seems that Steiner trees were not used in the past for RDF schemas. What is the reasoning for choosing them? Moreover, it is not specified whether the QUIOW tool is schema-based or graph-based.
However, my major concern is related to the paper's submission to the In Use track. The structure of the paper follows a typical Research Track paper rather than an In Use Track paper. It presents a methodology and how it was implemented in a certain tool. The focus is on the methodology rather than on the tool, while it never becomes clear how this paper reflects a real-life setup. I would consider the "Comparison of semantic technologies with alternative or conventional approaches" theme from the call as the most relevant match, but the comparison is not extensive, while there is a tool presented after all.

Review 5 (by Anna Tordai)

This is a metareview for the paper that summarizes the opinions of the individual reviewers.
The paper presents and evaluates a tool for keyword query processing over RDF data. The reviewers point out that the description of the approach lacks detail, leaving a reader unable to fully grasp what the approach does. Also, the reviewers express concerns regarding the evaluation. There are no clear descriptions of which types of keyword queries are included in the evaluation, and only simple cases are presented. Moreover, the chosen evaluation metrics are not the most relevant ones. Because the system is not deployed in real life with real users, and there is no clear industrial use case, nor impact analysis, the reviewers see this paper as unfit for the In-Use track.
Laura Hollink & Anna Tordai
