Paper 20 (Research track)

An Approach to Query Prefetching for SPARQL Endpoints

Author(s): Mariano Rico, Rizkallah Touma, Anna Queralt, María S. Pérez

Full text: submitted version

Abstract: Data prefetching is a standard technique used to accelerate the access to data stores. In the context of SPARQL endpoints, previous approaches have been based on two main techniques: (1) query augmentation and (2) detection of recurrent patterns in queries. In this paper we present a novel approach for data prefetching in SPARQL endpoints by using machine learning methods. Our approach is based on separating the structure of the SPARQL queries from the retrieved data and measuring two independent types of similarity between queries: structural similarity and content similarity. We then apply a machine learning algorithm to detect recurring patterns in previous queries and predict the next query’s structure and triple patterns. We tested our approach with real-world query logs from the Spanish DBpedia and the results show that this approach predicts the next query with a precision above 90%. We also show that, by caching the predicted queries, we can achieve a higher cache hit rate than previous approaches.

Keywords: Prefetching; Linked Data; Semantic Web; SPARQL Endpoint; Query type; Q-type; Triple pattern

Decision: reject

Review 1 (by anonymous reviewer)

(RELEVANCE TO ESWC) The addressed problem is completely relevant to the realm of Semantic Web.
(NOVELTY OF THE PROPOSED SOLUTION) I think the proposed solution is quite novel and elegant, in particular, 
the prediction of queries at two levels: template (q-type) and content 
(triple patterns).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The solution seems correct.
(EVALUATION OF THE STATE-OF-THE-ART) The state of the art seems to be exhaustively covered; however, the paper does not compare against state-of-the-art approaches.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper is fairly clear and the approach is well elaborated. Nevertheless, some aspects need to be further detailed (and exemplified) in my opinion.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Experimental data is available for download. The authors also provided easy means to reproduce the experiments.
(OVERALL SCORE) Strong points
S1) Experimental data and means to reproduce the experiments are available online.
S2) Legible and well-written
S3) Relevant to the Semantic Web, with great potential for application. All SPARQL endpoints could benefit from such smart prefetching and caching.
S4) It does not require large quantities of historical or training data. The experimental evaluation shows that with two previous queries, it is possible to predict the next one with good accuracy (at least in the DBpedia-Es endpoint).
Weak points
W1) Evaluation is performed only on one dataset (Spanish DBpedia). While it is arguable that all DBpedia endpoints will behave alike in terms of workload, we cannot state the same about the endpoints of other datasets.
W2) The paper focuses mainly on query prediction rather than on the prefetching model. The prefetching and caching architecture is sketched, but there are no concrete hints about how to implement it. For example, it is not clear whether cache retrieval requires some data processing (see detailed review).
W3) The proposed approach is not compared to the state-of-the-art methods.
W4) The quantitative evaluation in the paper is very good; however, I would appreciate a brief qualitative analysis, e.g., what do the easy/difficult and frequent/rare q-types look like? (difficult = those q-types where accuracy is low)
Detailed review and questions
The paper presents a method to prefetch and cache the results of SPARQL queries in order to increase the performance of SPARQL endpoints. The prefetching is achieved by predicting the next SPARQL queries to be run against the endpoint, so that their results are cached and reused. The prediction relies on the previous queries run during a "user session", e.g., all the query requests coming from the same IP address within a time interval. By predicting the upcoming queries and caching their results, we can reduce the endpoint's load and increase performance. The paper sketches the architecture of such a prefetching system and details its query prediction algorithm. The prediction relies on standard classification techniques and operates in two phases: in the first phase the method predicts the next query's syntactic structure, i.e., its template (called q-type in the paper), whereas in the second phase the method determines the values for the triple patterns of the query template. The experimental evaluation conducted on the Spanish DBpedia shows quite satisfactory results: it is possible to predict with very high accuracy the next query of a "user session" from the previous queries present in the log.
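The two-level prediction described above rests on reducing each query to its structural template (q-type). As a rough, hypothetical illustration of that idea (my own sketch, not the authors' implementation; the function name `qtype` and the regex-based normalization are assumptions), two queries that differ only in their concrete terms collapse to the same template:

```python
import re

def qtype(query: str) -> str:
    """Reduce a SPARQL query to a structural template: concrete IRIs,
    prefixed names, and literals are replaced by a placeholder, so
    queries differing only in those terms map to the same template.
    (Hypothetical simplification, not the paper's code.)"""
    t = re.sub(r'<[^>]*>', '?c', query)  # full IRIs
    t = re.sub(r'\b\w+:\w+\b', '?c', t)  # prefixed names like dbo:team
    t = re.sub(r'"[^"]*"', '?c', t)      # string literals
    return ' '.join(t.split())           # normalize whitespace

q_casillas = 'SELECT ?y WHERE { <http://es.dbpedia.org/resource/Iker_Casillas> dbo:team ?y }'
q_pique = 'SELECT ?y WHERE { <http://es.dbpedia.org/resource/Gerard_Pique> dbo:team ?y }'
assert qtype(q_casillas) == qtype(q_pique)  # same q-type, different entity
```

Under this reading, the second-phase classifier then only has to predict the concrete terms that fill the placeholders of the predicted template.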
I believe the paper resorts to an original solution to deal with a relevant problem in SPARQL processing. The paper is also well written and quite understandable. I have only a few observations and questions whose consideration may improve the paper's quality.
Q1) Figure 1: Listings 4 and 5 do not correspond to the serialization of the parse tree in Fig. 2. Why not?
Q2) Examples: Albeit intuitive, the q-type concept is not described in a fully satisfactory way in the paper. Listings 4 and 5 are not particularly illustrative because they simply describe a query with one triple pattern. From that example it seems to me that the q-types of queries with the single triple patterns ?x foaf:mbox ?y and db:Iker_Casillas dbo:team ?y are identical. What other information, besides the number of triple patterns, is captured in a q-type?
Q3) Section 3.3: The example does not make any reference to Q5', although I understood that in the second reference to Q5 the authors meant Q5'. Access to the prefetched data needs to be clarified further. In the example, the modified query actually retrieves the results of the original query for all entities that are like Iker Casillas; that is, the approach is actually prefetching the results for a family of queries. For the next query, which asks for the former teams of Pique, I need to post-process the results of Q5'. When/how is this done in the pipeline of Figure 1?
Q4) Section 4.4: It would be nice to have some hints about the hit rate per q-type. Should we expect it to be lower for q-types with more triple patterns?
Experimental evaluation:
Q5) In the architecture in Fig. 1 the cache is updated for every new query that arrives. For some workloads this may happen often, and updating the cache is not necessarily cheap. Imagine we update the cache in batches: how long can the cache offer a high hit rate (say, > 90%) without needing an update?
Q6) Hit rate is compared with Lorey et al. and Zhang et al., but not on the same endpoint (English DBpedia). To have a fair comparison, the approaches must be evaluated on the same data.
Typos:
- we we have a `cache hit' since the cached results will ... -> we have a `cache hit' since the cached results will ...
- Hogan et al. propose and approach that relies on precomputed similarity... -> Hogan et al. propose an approach that relies on precomputed similarity...
After reading the authors' response:
The clarifications provided by the authors are very much appreciated. The idea proposed in the paper is indeed novel, but the experimental evaluation is still too weak: it lacks performance assessment on other datasets and comparison with other prefetching and caching approaches. It is not possible to extrapolate the performance of a system w.r.t. the state of the art merely from the fact that the approach uses machine learning techniques.


Review 2 (by Pascal Molli)

(RELEVANCE TO ESWC) The paper presents a new approach for data prefetching in SPARQL
endpoints. Given the knowledge of past SPARQL queries, the objective
is to predict the next query structure and triple patterns. Prefetching
is a well-known approach in database systems and semantic web, so this
paper is relevant for ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed approach relies on two different classifiers: one for finding the structure of the next query and the other for finding the triple patterns of the BGPs. The authors write that this approach outperforms previous approaches with a smaller number of cached queries.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Section 3.3 states that the proposed approach is a multi-class classification problem, with no reference. Section 3.3 does not clearly describe the features used by the classifiers. I expected to see, for the motivating example used so far, the features considered for the queries. So, in the end, the crucial part of the approach is not properly described.
There is also the question of how the system can evolve when the queries themselves evolve, as can be the case with DBpedia.
(EVALUATION OF THE STATE-OF-THE-ART) Section 4.4 is the only section that really compares the proposed approach with related work. The authors write:
"Compared to previous approaches, Lorey et al. reported a cache hit rate of around 91% on average [7], which is below the range of our approach."
This is true... but not on the same dataset. Lorey et al. used the English edition of DBpedia 3.6. IMHO, the authors cannot claim that they outperform related approaches based on that. They would have to run their algorithm on the same data, or re-implement Lorey et al.'s approach on their own data. So, in the end, the proposed approach may be nice, but there is no clear evidence of that.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Why should the proposed approach logically outperform previous approaches?
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Section 4 describes the experimentation. The experimentation looks
reproducible.
(OVERALL SCORE) The paper presents a new approach for data prefetching in SPARQL
endpoints. Given the knowledge of past SPARQL queries, the objective
is to predict the next query structure and triple patterns. Prefetching
is a well-known approach in database systems and semantic web, so this
paper is relevant for ESWC.
Strong points:
- motivations are clear.
- The approach is original and interesting (can work).
Weak points:
- The analysis of the weaknesses of previous approaches could be improved.
- A fair comparison with the state of the art is needed in the experiments.
To summarize, this paper is relevant for ESWC. The proposed approach is original and makes sense. The evaluation of the state of the art lacks precision: why should the proposed approach logically outperform previous approaches? The experimental evaluation does not provide a fair comparison with previous approaches.


Review 3 (by anonymous reviewer)

(RELEVANCE TO ESWC) The paper is relevant to the venue
(NOVELTY OF THE PROPOSED SOLUTION) Though the authors do not propose a new fundamental solution of the prefetching problem, they present an innovative application of machine learning algorithms to address the problem at hand. This is acceptable.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed approach targets only a very small subset of the SPARQL 1.1 query fragment. Moreover, only queries of similar types are predicted by the proposed approach.
(EVALUATION OF THE STATE-OF-THE-ART) The related work is good, with the exception of one paper, which is mentioned later in this review.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The evaluation of the approach is rather weak.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The dataset and results are made public by the authors.
(OVERALL SCORE) ** Summary of the Paper **
The article presents a novel approach for prefetching queries over SPARQL endpoints. The proposed approach does so by separating the structure of the query from the data being retrieved and measuring structural and content-based similarities. The authors then apply a machine learning algorithm to predict the structure and triple patterns of the next query. The approach is tested on real-world SPARQL query logs of the Spanish DBpedia and is claimed to achieve over 90% prediction precision.
** Strong Points (SPs)  ** 
- SP1 - A very nicely written paper with an apt number of sections, in good and easy-to-read English. Authors these days tend to overwrite articles, adding unnecessary sections and increasing their structural complexity. It feels good to read and review articles such as this one.
- SP2 - The proposed approach is innovative in the sense that it applies machine learning (multi-class classification problem) to address the SPARQL query prefetching issue at hand. The authors also include the time interval between the incident queries in the construction of the feature vector, which is nice.
- SP3 - Very good performance reported for the considered subset of the SPARQL 1.1 query language. The proposed approach is promising over the demonstrated query types, however small that fragment may be. It is a promising candidate for future examination and extension.
** Weak Points (WPs) **
- WP1 - The current article only considers SPARQL “SELECT” queries (i.e., one of the four query forms in SPARQL, a very small subset of SPARQL). Further, the proposed approach does not predict the full query, but rather only a template (structure) without the query modifiers.
- WP2 - The proposed approach cannot predict structurally diverse consecutive SPARQL queries, which is not a sufficient technical contribution.
It seems that the proposed approach can predict only similar types of queries, i.e., queries that seek to access similar or related types of information. This limitation has to be addressed, since, in practice, a given SPARQL endpoint does not always receive queries that seek information related to the previous queries. Real query log analyses (such as bonifati2017analytical; see the related-work comments later) show that the diversity of user information needs can vary drastically. In such cases the performance of the proposed approach is questionable.
- WP3 - The presented evaluation of the approach is weak.
The authors need to prove the validity of their approach using other real and synthetic query log datasets (such as DBPSB, BSBM, WATDIV, etc.) to sustain their claim. The authors may also consider including the English DBpedia SPARQL query logs for the same purpose. The approach has to be proved (if not formally, then empirically) over a very large set of structurally diverse SPARQL queries. The presented evaluation is comparatively weak at this point for the chosen venue. Also, the proposed approach is not compared (one-to-one) with other existing prefetching approaches; it is therefore not clear whether the achieved above-90% precision is a substantial improvement or not.
** Questions to the Authors (QAs) ** 
I summarize below the questions which seem most crucial to answer; the rest can be found in the other Remarks/Comments section (later in this review):
- Q1 - Will the proposed approach work in the presence of blank nodes in the SPARQL query?
- Q2 - How much time is taken by the prediction or the prefetching system to predict a query (say Q`)? For instance, in the first paragraph of Section 3 it is mentioned that if a prefetched query is encountered, the system returns the answer without executing the query; in this case, how much time is spent “deciding” this? What is the trade-off between this “decision” process (Q`) and executing the query on the endpoint (Q)? Moreover, is the prediction process carried out asynchronously or on the fly? I.e., is the prediction model trained a priori, or does this happen in parallel while the endpoint is receiving queries?
- Q3 - In Section 3.2, what happens if the entire triple pattern is different, i.e., all of s, p, o differ? In this case, how do you train the classifier? More specifically, what will the feature vector representation be? Also, please show with an example how your feature vector would look for each of the four queries illustrated in the running example in Section 3.3.
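For concreteness, one hypothetical shape such a feature vector could take (purely an illustration; the window size `k`, the q-type IDs, and the inter-query time gaps are assumptions on my part, echoing the feature ingredients mentioned in the reviews, not the paper's documented encoding):

```python
def make_features(session, k=2):
    """Turn a user session into (feature vector, label) training examples.
    session: list of (timestamp, qtype_id) pairs in arrival order.
    Features: q-type IDs of the k previous queries plus the time gaps
    between them; label: the q-type of the next query."""
    examples = []
    for i in range(k, len(session)):
        prev = session[i - k:i]
        feats = [q for _, q in prev]                 # previous q-type IDs
        feats += [session[j][0] - session[j - 1][0]  # gaps between them
                  for j in range(i - k + 1, i)]
        examples.append((feats, session[i][1]))      # label: next q-type
    return examples

sess = [(0, 3), (1, 3), (2, 7), (4, 3)]  # (timestamp, q-type id)
examples = make_features(sess, k=2)
assert examples == [([3, 3, 1], 7), ([3, 7, 1], 3)]
```

Each such example could then feed a standard multi-class classifier whose classes are the observed q-types.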
==========After Rebuttal========
I appreciate the authors' clear responses. The methodology used in this paper is definitely interesting and deserves further investigation. It would be useful if the authors presented examples of the feature vectors produced by the prefetching system. Furthermore, there is no concrete proof that the proposed approach outperforms the state of the art. The authors should also analyze the performance trade-off between the prefetching approach and the baseline in order to present a clear idea of the applicability of their system. Unfortunately, the current version of the draft is not suitable for publication at the venue.


Review 4 (by anonymous reviewer)

(RELEVANCE TO ESWC) The article is relevant to ESWC: optimisation in SPARQL endpoints is an important topic, and methods relying on caching and prefetching may substantially help.
(NOVELTY OF THE PROPOSED SOLUTION) The solution is novel and it addresses a relevant problem.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The study of correctness is developed through experiments. Results are promising, but I have some concerns about the dataset and the role of the user's IP, discussed below.
(EVALUATION OF THE STATE-OF-THE-ART) The related work section gives an overview of the studies developed in the context of the semantic web.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The method is presented with clear examples, and it is demonstrated through experiments over the Spanish version of DBpedia.
Q1 - In Section 4.4, the authors state that the proposed approach and [16] can achieve similar results with a different number of cached queries (1000 vs 100). However, if I understood correctly, the sizes of the query answers generated by the authors' approach may be bigger than those generated by [16]. What is the total size of the caches (i.e., number of queries and number of answers per query) in the two methods?
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Reproducibility is discussed in an external resource (https://prefetch.linkeddata.es), which contains the Weka data files used to run the experiments. 
However, to enable reproducibility, other info is needed.  
Q2 - For example, the log files are not available (nor are there instructions on how to retrieve them). It is also not clear how to use the dataset, e.g., how should it be split to generate the training and test sets?
Q3 - To show that the approach generalises, authors should have run the experiments on some other datasets. Why were the experiments not run on one of the USEWOOD datasets? Other studies (e.g., [16]) use such dataset for experiments.
(OVERALL SCORE) Authors propose a method to enhance query prefetching in SPARQL endpoints. The idea is to analyse the query history and to build a set of classifiers to predict information about the next queries, i.e. (i) the type of query and (ii) the triple patterns composing it. The study is validated through experiments over the Spanish version of DBPedia, analysing the different parts of the method and the parameters of the algorithm. 
The problem is relevant, and the approach proposed by the authors goes in an interesting direction, exploiting machine learning techniques to tackle it.
Experimental results are promising and show that the proposed approach may work. The analysis of the parameters (e.g. the query length) is well developed and interesting.
The paper is clear, easy to read, and the examples help in understanding the details of the single parts of the approach.
However, the current article does not provide enough information to reproduce the experiments. Similarly, the article offers only a brief overview of the dataset, but some additional analysis would have been needed (see Q5, Q6, Q7).
It is not clear how much data the predicted queries retrieve. In some cases, such queries may retrieve a huge portion of the dataset (e.g., Q'_5 may retrieve the list of all the athletes of every sport, with their respective teams). If that is the case, the approach may not work in practice.
Finally, the method should have been tested on a second dataset, which would have been important to discuss the generalisation of the approach.
In addition to the questions above, I have the following ones:
Q4- Some parts of the process are not properly described: how is the information about the user IP used at runtime? Are the classifiers using it to make the prediction?
Q5 - The dataset contains 22M queries, issued by 2100 users: what is the distribution of the queries over the users? Is it skewed (e.g., few users issue most of the queries)?
Q6 - What is the distribution of the q-types over the users? How many users always submit queries of the same q-type?
Q7 - If the distribution of the queries over the users is skewed, and every user submits queries of few q-types, the method may work because, in this setting, every user always submits the same (few) types of queries (changing some values, which the method manages by introducing a variable). What is your opinion about this?
Q8 - Are all the users in both the training and the test set?
Q9 - Does this work consider only queries with one BGP?
Q10 - Why is it important to highlight that no international characters are used (pages 5 and 6)?
**** After rebuttal ****
I would like to thank the authors for the useful information they provided. I increased the score for novelty, but I confirm the other scores. The intuition behind this study is interesting and promising, but it requires some additional effort to reach an adequate level of maturity. As a suggestion, the experiments should:
- include another dataset to show that the approach generalises
- include a set of baselines, such as the solutions presented in the related work
- analyse the memory footprint, i.e., the amount of data retrieved from the dataset
I'd like to comment some of the answers:
A1: While our queries might retrieve more data than [16], executing fewer queries saves the overhead of executing several queries that each retrieve less data
I agree that it reduces the overhead, but it may not reduce the total query latency: if your queries retrieve much more data than [16], the total time to send the queries and get back the results can be bigger. This is why I recommend measuring the size of the cache and also making a memory comparison.
A2: The log files are not available because anonymizing the IPs is not trivial. We do provide the training model and explain in the paper that we use the entire set to perform 10-fold cross validation
A3: We used Spanish DBpedia as a first evaluation and did not have time to repeat experiments on another dataset.
Since the data you are using is not available to allow other research groups to reproduce your experiments (from data retrieval to evaluation), it becomes even more important to provide experiments over an available dataset.
A5/A6/A7: Distribution of queries/q-types over IPs
A few IPs launch the majority of queries (distribution is skewed). Most IPs launch queries of very few q-types. This was the initial motivation behind our approach.
I went through the paper again, and this should be made more explicit. It would be good to have this description of the dataset in the experimental section.
If your assumption is that the data has usually this shape, can the query predictor be replaced by a memory that stores user/q-type/query template?


Metareview by Maribel Acosta

This work tackles the problem of data prefetching to enhance the performance of SPARQL endpoints. Based on query logs that record the queries executed over endpoints, the proposed solution applies supervised ML techniques to predict upcoming queries. The empirical results indicate that the proposed approach is able to achieve high precision and recall on query logs from the Spanish DBpedia endpoint. 
The reviewers agreed that the problem tackled in this work is relevant to the Semantic Web and the proposed solution is promising. Nonetheless, the reviewers have identified major issues which hinder the acceptance of the paper in its current state. These issues include: the formal properties of the approach have not been demonstrated, the empirical study does not compare the performance of the proposed approach with respect to state-of-the-art solutions, and the generality of the empirical results is compromised as the experiments only consider query logs for one dataset. Due to these issues, the paper cannot be accepted for publication in the conference. We highly encourage the authors to address the reviewers' comments to improve the overall quality of this work.

