HOBBIT: A Platform for Benchmarking Big Linked Data
Author(s): Michael Röder, Axel-Cyrille Ngonga Ngomo
Full text: submitted version
Abstract: Linked Data is being increasingly adopted in the new data economy. As a corollary comes the development of many solutions that aim to support the booming number of requirements and requests for Linked Data at scale. This plethora of solutions however also leads to the growing need for objective means that facilitate the selection of adequate solutions for particular use cases. We hence present HOBBIT, a novel distributed benchmarking platform designed for the unified execution of benchmarks for distributed solutions that address the lifecycle of (Big) Linked Data. HOBBIT is based on the FAIR principles and is the first benchmarking platform able to scale up to benchmarking real-world scenarios for Big Linked Data solutions. The platform has already been used in eight benchmarking challenges. We give an overview of the results achieved during these challenges and point to some of the novel insights that were gained from the results of the platform. HOBBIT version 1.0.14 is open-source and available at http://github.com/hobbit-project.
Keywords: Benchmarking platform; Big Linked Data; Benchmarking distributed systems
Review 1 (by Pavel Shvaiko)
(RELEVANCE TO ESWC) The submission describes the HOBBIT platform for benchmarking steps of the Linked Data lifecycle. It focuses on the architecture of the platform and on the results of the first evaluation campaigns. The problem addressed by the paper is relevant and worth further investigation.
(NOVELTY OF THE PROPOSED SOLUTION) Related work is adequate, though the novelty of the work appears to be incremental. The link/compatibility with the OAEI campaigns also has to be clarified.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Adding a summative table that shows how the requirements (Section 2) have been realized within the platform would strengthen the technical depth and the presentation. For example, Q4 ("robustness") is weakly addressed. The architecture components are described in detail. It would also be good to have implementation details, e.g., a code break-down per component. How exactly have the FAIR principles been preserved? How exactly does the HOBBIT platform cover all the steps of the Linked Data lifecycle, especially the ones beyond Section 4?
(EVALUATION OF THE STATE-OF-THE-ART) Two evaluation use cases were provided.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Section 5 provides a discussion of the platform's applications.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The platform is available as open source. The experiments are reproducible.
(OVERALL SCORE) Given the comments above, in my view this is a borderline paper, though I would lean towards weak accept due to its practical significance.
Review 2 (by Riccardo Tommasini)
(RELEVANCE TO ESWC) The article presents the HOBBIT platform, a distributed infrastructure for benchmarking Big Linked Data approaches in the cloud. I am aware of the work thanks to the DEBS challenge 2017, and I believe the semantic web community can benefit from it.
(NOVELTY OF THE PROPOSED SOLUTION) The HOBBIT platform attempts to bring benchmarking to a higher level, both in terms of foundational and empirical research. However, I am not convinced enough to give the full mark here because of other flaws in the paper that prevented me from fully appreciating the main contributions.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The work presents the HOBBIT platform, which is designed and built according to a requirements analysis. This would be fine; however, the presented requirements are not elaborated enough. I will comment on this again below to explain exactly what I did not like.
(EVALUATION OF THE STATE-OF-THE-ART) The related work is presented at the beginning of the paper, making it hard to compare with the proposed approach. Indeed, the authors are forced to point forward to their requirements analysis without going into much detail. A clear understanding of how existing solutions fail to fulfil the presented requirements is necessary to correctly position the work in the state of the art. Comparable solutions exist in the big data community and should be taken into account, e.g.:
- Difallah et al., "OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases"
- Ghazal et al., "BigBench: Towards an Industry Standard Benchmark for Big Data Analytics"
In the context of stream processing, which is my area of expertise, I believe similar work has been carried out in recent years. Although HOBBIT has more ambitious goals, the lessons learned in benchmarking stream processing systems should be considered, as the velocity aspect is one of the four Big Data challenges:
- Boden et al., "Benchmarking Data Flow Systems for Scalable Machine Learning"
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper immediately provides a requirements analysis that is then used in the following sections to support the design choices. My first concern regards the requirements analysis which, in my opinion, could be a further contribution of the work but is not elaborated enough. Indeed, the section does not sufficiently explain the intended message. The authors claim the requirements were collected using a survey that involved experts. However, going through it, one can observe that many details are left out of the discussion. I understand the difficulty of running a crowd-based requirements analysis, and I am not asking for that; however, the current requirements analysis is not mature enough: it lacks a systematic formulation and a minimal explanation that would make it general and valuable for future work as well. Furthermore, the distinction between functional and qualitative requirements (non-functional?) is not convincing. Some of the qualitative ones are not sufficiently motivated, and I was not able to find adequate justification elsewhere either. Finally, some very important requirements are missing, e.g., reproducibility, repeatability, definition of baselines, and soundness and completeness of the KPIs.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The authors "evaluated" the HOBBIT platform in two ways: through two synthetic use cases, and by referring to the successful application of the platform in several challenges (among them the DEBS 2017 challenge). The first synthetic use case regards a triple store benchmark. The authors reproduce it and show how it is possible to run the HOBBIT platform locally and in the cloud, scaling the size of the experiment. The goal of this use case was to show that HOBBIT can be used for local evaluations as well as large-scale ones. The second synthetic use case is a knowledge extraction benchmark. The authors reproduce it and benchmark a plethora of tools. The authors wanted to measure the "scalability" and the accuracy of the tools in a way that was not possible before. The reproducibility of these experiments is guaranteed by the HOBBIT platform by design, since it relies on a containerized implementation. However, the presented use cases do not completely show what HOBBIT can do that was not possible before. For instance, it would be useful to have an idea of the limits of existing benchmarks in terms of the number of queries. What is the maximum? Regarding the presented applications, I think HOBBIT is doing good work in this context, and I consider it a valuable point to include in the paper.
(OVERALL SCORE) The paper presents the HOBBIT platform, a scalable infrastructure for big linked data benchmarking. The authors present the related work on benchmarking linked data, provide a requirements analysis, and present the architecture of the HOBBIT platform, showing how it fulfils the presented requirements.
Strong Points:
- The paper presents a relevant resource for the semantic web community.
- The platform follows the FAIR principles and exploits relevant technologies for empirical research.
- The platform has already been used in many relevant challenges.
Weak Points:
- The paper structure requires a lot of improvement.
- The requirements analysis is not mature enough and does not exploit all the content it refers to.
- There is an unclear concept of scalability that needs further discussion.
- The authors should also take big data benchmarking approaches into account.
Review 3 (by Christian Dirschl)
(RELEVANCE TO ESWC) This paper deals with the HOBBIT platform. The whole effort is a dedicated contribution to the semantic web community: a novel benchmarking platform for Big Linked Data solutions. In that sense, it addresses the benchmarking track. On the other hand, this track is more dedicated to the benchmarking of datasets and not to the technology enabling that. Then again, since this is a novel solution, there is probably no specific call for this kind of platform, so ESWC is definitely the right spot to show it.
(NOVELTY OF THE PROPOSED SOLUTION) This is the first benchmarking platform of its kind, so this is indeed novel work.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The paper describes, in a classical form, all parts of a platform solution, starting from core requirements via technology modules to process descriptions and evaluation. This is very clear and transparent.
(EVALUATION OF THE STATE-OF-THE-ART) Section 1 describes this in detail. Since the authors are well connected in the community, the overview is sufficiently valid.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The evaluation focuses on the platform rather than on a specific dataset. It thus covers how well the platform works as a Big Linked Data platform, e.g., to what extent it scales in real environments.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The platform is open source and available on GitHub.
(OVERALL SCORE) Accepting that this paper is about benchmarking a benchmarking platform, this contribution should be discussed at ESWC.
Strong Points:
- Important contribution to the community
- Well-described platform and processes
- Well-structured paper in good language
Weak Points:
- The topic is potentially not covered by this call.
- Is a detailed description of a platform still research, or rather engineering?
- It is not clear to me why support of the whole LD lifecycle is per se helpful. This should be explained.
Review 4 (by Ana Roxin)
(RELEVANCE TO ESWC) The paper presents a solid contribution, namely the HOBBIT platform. Still, given the description of the considered track ("Benchmarking and Empirical Evaluation"), the paper might be a bit borderline: "Its goal is to provide a place for in-depth experimental studies and benchmarks of significant scale, which have been normally considered as part of the potential submissions in other regular tracks."
(NOVELTY OF THE PROPOSED SOLUTION) The paper at hand presents the HOBBIT platform for Big Linked Data benchmarking. The platform is evaluated through two main experiments: triple store benchmarking and knowledge extraction benchmarking. In the first case, the authors executed the benchmark with three different numbers of queries on a "small machine" and with two larger numbers of queries on a cluster. In the second, the authors focused on the task of identifying named entities in a text and linking them to the DBpedia 2015 knowledge base.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) After reading Section 1, one expects to find "specific requirements for benchmarking BLD platforms" as deduced from the supposed analysis of existing LD benchmarking platforms. Instead, the reader finds the requirements for the HOBBIT platform as extracted from a survey. The listed requirements are mainly supported by the usage of a container architecture for the HOBBIT platform, meaning that its different components are implemented as independent containers. Hidden in a footnote, the reader discovers which choices were made by the authors for the containers (e.g., Docker) and for the message bus (RabbitMQ). The HOBBIT platform is described in Section 3, namely in terms of the components and architecture supporting the functional and qualitative requirements previously listed.
(EVALUATION OF THE STATE-OF-THE-ART) The "Related Work" section is quite extensive and delivers a complete view of available benchmarking platforms for Linked Data/RDF-based systems. Still, this section ends with an "enumeration" of the benefits or advantages of the HOBBIT platform without really having justified them. Moreover, almost the same advantages are cited when presenting the HOBBIT platform in the Introduction.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors should clearly define how the BLD lifecycle differs from the "traditional" LD lifecycle and what new challenges such a lifecycle raises (if there are any; if not, then there is no use in mentioning a BLD lifecycle, and the LD lifecycle is enough). Affirmations such as "The HOBBIT platform is the first benchmarking framework which supports all steps of the BLD lifecycle which can be benchmarked automatically." should be clearly justified.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) While the HOBBIT platform seems to have reached a good level of popularity among researchers from the (Big) Linked Data domain (as mentioned in Section 5), the authors do not provide all elements required for submissions in this ESWC18 track, notably in terms of reproducibility. As the contribution of the paper is the overall HOBBIT benchmarking platform, the experiments conducted and described in the paper can hardly be implemented outside the HOBBIT platform. Or, if they can, the authors should discuss the experimental settings in more detail (as mentioned in the CFP: "experimental settings will have to be extensively described so that the results could be independently reproduced, so that counter-experiments could be designed and subsequent work is enabled to improve over the presented results."). The application range of the benchmarking platform should also be studied and detailed. While the authors mention that the HOBBIT platform has been conceived to address the challenges raised by Big Linked Data (notably by allowing the load to be increased), this concept ("Big Linked Data") has not been formally defined in the article, nor have its differentiating characteristics been enumerated.
(OVERALL SCORE)
Strong Points:
- Good description of the platform components and architecture
Weak Points:
- The experiments conducted could have been further specified.
- Underlying concepts (e.g., Big Linked Data) should have been formally defined.
Questions to the Authors:
- What special characteristics of BLD are not addressed by the existing benchmarking platforms mentioned in Section 1?
- "Each system to be benchmarked using the HOBBIT platform has to implement the API of the benchmark to be used and the API of the platform": are those APIs available online? Or at least the system adapter containers developed for the experiments mentioned in the article at hand?
- Section 4.1 is entitled "Triple store benchmark". Which triple store(s) have been used for obtaining the results displayed in Table 1? What motivated your choice?
Metareview by Emanuele Della Valle
There has been extensive discussion on this paper. It is really a pity that the authors did not write a rebuttal; a follow-up discussion could have clarified the doubts of one of the reviewers. However, the paper was rejected because it presents a benchmarking platform rather than an empirical evaluation and therefore does not fit this track. The authors should have submitted it to the resources track.