Author(s): Mariano Rico, Rizkallah Touma, Anna Queralt, María S. Pérez
Abstract: Data prefetching is a standard technique for accelerating access to data stores. In the context of SPARQL endpoints, previous approaches have relied on two main techniques: (1) query augmentation and (2) detection of recurrent patterns in queries. In this paper, we present a novel approach to data prefetching in SPARQL endpoints that uses machine learning. Our approach separates the structure of SPARQL queries from the data they retrieve and measures two independent types of similarity between queries: structural similarity and content similarity. We then apply a machine learning algorithm to detect recurring patterns in previous queries and predict the structure and triple patterns of the next query. We tested our approach on real-world query logs from the Spanish DBpedia, and the results show that it predicts the next query with a precision above 90%. We also show that caching the predicted queries achieves a higher cache hit rate than previous approaches.
Keywords: Prefetching; Linked Data; Semantic Web; SPARQL Endpoint; Query type; Q-type; Triple pattern
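The abstract names two similarity measures but does not define them. The following is a minimal sketch of how separating query structure from content might look, under stated assumptions. It assumes a query has already been parsed into a list of triple patterns, with variables written as `?x` or `$x`. Structure is modelled as the multiset of abstracted triple shapes, and content as the set of constant terms (IRIs and literals). Jaccard similarity is a swapped-in, illustrative choice and not necessarily the measure used in the paper. All function names (`structure_of`, `content_of`, `structural_similarity`, `content_similarity`) are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical representation: a query is a list of (subject, predicate, object) patterns.
Triple = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    # SPARQL variables start with '?' or '$'.
    return term.startswith(("?", "$"))


def structure_of(triples: List[Triple]) -> Counter:
    # Abstract away the data: variables become '?v', constants (IRIs/literals) become 'C'.
    return Counter(
        tuple("?v" if is_variable(t) else "C" for t in triple) for triple in triples
    )


def content_of(triples: List[Triple]) -> Set[str]:
    # Keep only the constant terms, i.e. the data the query refers to.
    return {t for triple in triples for t in triple if not is_variable(t)}


def multiset_jaccard(a: Counter, b: Counter) -> float:
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    inter = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    return inter / union


def set_jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def structural_similarity(q1: List[Triple], q2: List[Triple]) -> float:
    # Assumed measure: Jaccard over multisets of abstracted triple shapes.
    return multiset_jaccard(structure_of(q1), structure_of(q2))


def content_similarity(q1: List[Triple], q2: List[Triple]) -> float:
    # Assumed measure: Jaccard over the sets of constant terms.
    return set_jaccard(content_of(q1), content_of(q2))


q1 = [("?p", "rdf:type", "dbo:Person"), ("?p", "dbo:birthPlace", "dbr:Madrid")]
q2 = [("?p", "rdf:type", "dbo:Person"), ("?p", "dbo:birthPlace", "dbr:Sevilla")]
print(structural_similarity(q1, q2))  # 1.0
print(content_similarity(q1, q2))     # 0.6
```

In this example, the two queries share the same shape and differ only in one constant. Structural similarity is therefore maximal, while content similarity is 3/5. Keeping the two scores independent means that a predictor could, for instance, anticipate the shape of the next query separately from the specific resources it will request.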