Price Sharing for Streaming Data: A Novel Approach for Funding RDF Stream Processing
Author(s): Tobias Grubenmann, Daniele Dell’Aglio, Abraham Bernstein, Dmitry Moor, Sven Seuken
Full text: preprint
Abstract: RDF Stream Processing (RSP) has proposed solutions to continuously query streams of RDF data. As a result, it is today possible to create complex networks of RSP engines to process streaming data in a distributed and continuous fashion. Indeed, some approaches even allow the computation to be distributed across the web. But both producing high-quality data and providing the compute power to process it cost money.
The usual approach to financing data on the Web of Data today is that either some sponsor subsidizes it or the consumers are charged. In the stream setting consumers could exploit synergies and, theoretically, share the access and processing fees, should their needs overlap. But what should be the monetary contribution of each consumer when they have varying valuations of the differing outcomes?
In this article, we propose a model for price sharing in the RDF Stream Processing setting. Based on the consumers’ outcome valuations and the pricing of the raw data streams, our algorithm computes the utility-maximizing prices that the different consumers should contribute, whilst ensuring that no participant has an incentive to manipulate the system by providing misinformation about their value, budget, or requested data stream. We show that our algorithm is able to calculate such prices in a reasonable amount of time for up to one thousand simultaneous queries.
Keywords: RDF Streaming Processing; Price Sharing; Equal-Need Sharing
Review 1 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper discusses a problem related to pricing in the context of RDF stream processing systems.
(NOVELTY OF THE PROPOSED SOLUTION) Even though the proposed solution is novel, it is rather straightforward, with no significant technical challenges.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The solution is correct, but could be extended to cover situations where condition 1 (section 4.4) is not satisfied.
(EVALUATION OF THE STATE-OF-THE-ART) The discussion is brief, but adequate. There is no evaluation of approaches other than the proposed one.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The discussion is easy to follow, though some aspects of the approach (e.g., when condition 1 is not satisfied) are left for future work.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The evaluation of the proposed approach is not complete, and does not help the reader understand the behavior of the proposed algorithm in depth. It is not clear how the time limit parameter for each of the queries is set. It is also not clear how varying the value, budget, and time limit parameters affects the performance of the proposed approach. Even though experiments with synthetic datasets are useful, it would be nice to report results with real datasets (at least one) as well.
(OVERALL SCORE) This paper deals with the problem of price sharing for streaming RDF systems. This is an interesting problem, but the paper does not make significant technical contributions. Presenting a solution that would also work for cases where the consumer gets value from partial streams (condition 1) would make the paper much stronger, since these cases often appear in practice.
SP1. Interesting problem.
SP2. Simple and elegant solution.
SP3. Easy to follow text.
WP1. Solution does not cover the important case of partial streams.
WP2. Inconclusive experimental evaluation.
Update: My most important point on not covering the case of partial streams has not been addressed, and this means that the contribution of the paper is limited.
Review 2 (by anonymous reviewer)
(RELEVANCE TO ESWC) The work presents a price sharing model for streaming data. Although RDF stream processing is mentioned in the title of the paper, there is nothing specific to RDF or semantic data. Thus, the paper is only weakly relevant to ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) Although price sharing is not a novel idea in economics, applying this approach to stream processing in a cloud environment is an interesting idea. Some assumptions of this paper are somewhat questionable, but the approach is basically promising.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) An algorithm for cost sharing (or, better, allocation) is presented. However, this problem seems rather to be an optimization problem with constraints (total budget). Furthermore, the algorithm is not well presented: e.g., S would better be the set of queries rather than of indexes, and what happens to queries if the budget is exhausted?
(EVALUATION OF THE STATE-OF-THE-ART) The authors briefly discuss some cost sharing approaches. This discussion should be extended and should also consider cloud pricing models. In contrast, the discussion of RDF stream processing is not really necessary for this paper.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) See comments below.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The evaluation described in the paper is just a run of the price sharing algorithm without any connection to a real system or deployment. Moreover, details of the synthetic data (apart from "randomly generated") are not given, which limits the reproducibility.
(OVERALL SCORE) The paper presents a model and algorithm for price sharing in (cloud-based) stream processing. The basic assumptions are that users have to pay for running operators and accessing stream sources for a certain time. The main idea is that users specify their expected value (utility) and the budget they are willing to spend. Based on this information a cost distribution is calculated. In principle, this represents an interesting and promising idea. However, there are some weaknesses.
strong points:
S1: Cost/price sharing is an interesting idea for cloud-based stream processing and the paper is among the first to try to address this problem with an economic approach.
S2: The price model and the requirements are well described and motivated.
weak points:
W1: The assumptions underlying this approach are questionable. This holds both for the requirements (partially R2, R3) and the model (see details below and in the correctness section).
W2: Actually, the approach is based on a method described in . Thus, the main contribution of this work is to apply this model to a stream processing scenario. Furthermore, there is nothing special related to stream processing (apart from longer running queries) or even RDF stream processing.
W3: The evaluation is limited to a runtime analysis of the algorithm based on synthetic data. Neither the hypothesis nor the goal of these experiments is really clear. I would argue that calculation time is only a minor issue in this approach.
detailed comments:
* The price model doesn't reflect typical price models for cloud infrastructures: wouldn't it be more realistic to pay for resource usage and not for operators? Second, in stream processing the arrival rate of tuples is important, which is not considered here. There is a big difference between running a query where only 1 tuple per minute arrives and a query with thousands of tuples per millisecond.
* There are some concerns with the requirements (R3, R2) and the corresponding properties (sect. 4.4): though the motivation for R2 and R3 is clear, the explanations (what does "misinformation" mean, getting higher utility by outside computation) are unclear. Why is the algorithm "ignorant" of budget and value? These parameters are used in Alg. 1?! For R3: what is wrong with a query where the platform is used to prepare some data and the compute-intensive learning step is performed outside? How could this be forbidden by licenses?
In summary, the work seems to be in a rather premature state. The authors should revise their assumptions. Furthermore, the paper is probably better suited for a cloud conference than a semantic web venue.
After rebuttal: I thank the authors for their response, but my concerns still hold. Therefore, I do not wish to change the scores in my review.
Review 3 (by Pieter Colpaert)
(RELEVANCE TO ESWC) The Semantic Web puts forward tools for decentralizing data tasks, yet this decentralization comes at a cost for which the business model is unclear. This paper puts forward a price sharing algorithm for funding RDF stream processing. It might be one of the highlights of ESWC2018.
(NOVELTY OF THE PROPOSED SOLUTION) Exciting work putting forward the first step into research for new data-driven business models. Exactly what the community needed.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Q1: Is it correct that information is missing about how this algorithm was implemented, and on what machines the evaluation was executed? This evaluation, which I deem less important than the intriguing ideas behind the paper, seems to lack implementation and design details.
(EVALUATION OF THE STATE-OF-THE-ART) The Linked Data Fragments framework, explained in , was also created from the observation that publishers would be paying too much for hosting a public Web API with too many functionalities, while a much simpler and more cost-efficient interface could be thought of, still offering user agents on the Web good access to the data. Furthermore, it would allow other intermediary agents to perform some actions for other parties. This paper could in fact automate this negotiation of costs.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper is well written and has nice examples to illustrate the proposed approach.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Q2: It’s clear that stream processing is a special resident in the Web of Data, as explained in your Introduction, yet can your approach also be applied to classic webby datasets? The research datasets are not available. The algorithm is well described in a code snippet, yet the code itself is not available either. Q3: Could this data become available?
(OVERALL SCORE) Based on three economic requirements, an algorithm for allocating runtimes and calculating the total prices is put forward. The requirements are proven to be met. In chapter 5, Evaluation, the processing time needed to run the algorithm is evaluated.
Strong points:
* Introducing economics in Semantic Web
* First step into overlooked issue of business model for Web of Data
* Algorithm is built on the basis of clear requirements
Weak points:
* No research data available
* Evaluation is unclear
* Not sure why there is a strong focus on stream processing
3 questions: see above.
Typo: Ontoloty in reference 6
After rebuttal: I do not wish to change the scores on my review. I believe the contribution is indeed limited, as mentioned by other reviewers, but unique and novel in its kind and may lead to interesting discussions.
Review 4 (by anonymous reviewer)
(RELEVANCE TO ESWC) Very relevant
(NOVELTY OF THE PROPOSED SOLUTION) Interesting concept which has been tackled before
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The presentation is clear. The evaluation provides a level of validation.
(EVALUATION OF THE STATE-OF-THE-ART) Quality of Service approaches from stream processing would be relevant here, at the very least to show a gap in their efforts.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) An evaluation of the approach is provided with a discussion on the limitations of the approach.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The description of the experiment is OK. No code or datasets released.
(OVERALL SCORE) This work tackles the problem of sharing access and processing fees for a set of RDF stream processing graphs with overlapping operators and sources among multiple consumers. The paper proposes a model that uses a joint execution plan, which is based on the collected queries of all data consumers. It combines this query execution plan with the outcome valuations, runtime limits and willingness to pay values from all data consumers, the pricing of the raw data streams, and the pricing of the computations to determine a utility-maximizing payment distribution. The approach is evaluated to determine the run-time performance.
Strong points:
- Equal need cost sharing: Their price sharing algorithm follows an equal need cost sharing method for operators/sources that overlap between multiple queries, where the assigned price share for the whole query can never be higher than the price of running the query in isolation.
- Maximized Utility: They allow the user to provide a utility value for running the query for a specific period of time, a runtime limit and a total willingness to pay value. The provided parameters are used by the price sharing algorithm to allocate runtime and charge payments as long as the assigned price is smaller than or equal to the consumer's value, thus maximizing the consumer’s utility.
- Illegitimate Utility Gain: Assigning price shares to queries is ignorant of any value, runtime limits, or budget for consumers, hence the consumers cannot benefit from manipulation by misinformation.
Weak points:
- Multiple users per query: Their algorithm for price distribution calculates the price shares based on the different queries in the model, assuming a single user per query, which may not be the case in a real-world scenario. However, they mention including a special kind of license attached to the streaming result of the query which prohibits the redistribution of the results outside the legal entity (a person or a company). This defeats the concept of contention ratio; thus, the number of users should be part of the price sharing model.
- User Negotiation: The price sharing algorithm charges payments as long as the assigned price is smaller than or equal to the consumer's value, which is beneficial for maximizing the consumer’s utility. However, if the calculated price share is greater than the consumer's value, the algorithm drops the whole query and recalculates the price shares for the remaining queries. (1) Perhaps they should consider some sort of negotiation with the user before not providing the service at all (future work), and (2) in a model with multiple users per query, the query cannot be dropped.
- Scalability: Their evaluation shows that the solution is limited in scalability when tens of thousands of queries are simultaneously involved, which they acknowledge in their conclusion section.
- Related work: QoS for Stream Processing should be covered.
Questions: Have you considered the effect on price sharing of common runtime stream processing optimizations (e.g., operator replication, operator reordering), operator failures, or failure to meet consumer QoS constraints?
After rebuttal: My score remains the same. If accepted I would encourage the authors to acknowledge the limitations highlighted by the reviewers as future research opportunities.
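To make the equal-need sharing and query-dropping behaviour discussed in this review concrete, the following is a minimal Python sketch under simplifying assumptions. It is not the authors' Algorithm 1: it ignores runtime limits and budgets, assumes each operator or source has a single fixed price, and splits that price equally among the active queries that need it. All names (equal_need_shares, allocate, the example operators and queries) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def equal_need_shares(
    operator_price: Dict[str, float],
    query_ops: Dict[str, Set[str]],
    active: Set[str],
) -> Dict[str, float]:
    """Split each operator's price equally among the active queries needing it."""
    shares = {q: 0.0 for q in active}
    for op, price in operator_price.items():
        users = [q for q in active if op in query_ops[q]]
        for q in users:
            shares[q] += price / len(users)
    return shares


def allocate(
    operator_price: Dict[str, float],
    query_ops: Dict[str, Set[str]],
    value: Dict[str, float],
) -> Dict[str, float]:
    """Drop queries whose share exceeds their stated value, then recompute.

    Returns the price share of every query that remains; an empty dict
    means no query can be served. Illustrative assumption: one value per
    query, no budget or runtime dimension.
    """
    active = set(query_ops)
    while active:
        shares = equal_need_shares(operator_price, query_ops, active)
        too_expensive = {q for q in active if shares[q] > value[q]}
        if not too_expensive:
            return shares
        # Shares of the remaining queries can only rise once others leave,
        # so every over-priced query can be removed in the same round.
        active -= too_expensive
    return {}


# Hypothetical example: three queries share one source.
prices = {"src": 6.0, "filter": 2.0, "join": 4.0}
ops = {"q1": {"src", "filter"}, "q2": {"src", "join"}, "q3": {"src"}}
values = {"q1": 6.0, "q2": 8.0, "q3": 1.0}

result = allocate(prices, ops, values)
print(sorted(result.items()))  # [('q1', 5.0), ('q2', 7.0)]
```

In this example q3 is dropped because its initial share of the source (2.0) exceeds its value (1.0); the source is then split between q1 and q2, whose shares (5.0 and 7.0) stay below the prices they would pay in isolation (8.0 and 10.0).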
Metareview by Intizar Ali
This paper presents a price sharing model for streaming data which considers cost and resource sharing among multiple data consumers. Pricing models are well addressed for cloud-based and services domain solutions. However, the proposed study is one of the initial efforts to introduce a pricing model for the Web of Data. A well-established pricing model will be key to creating a self-sustaining economy for the Web of Data, and there is no doubt that this study can bring very interesting discussion to ESWC. However, there are strong reservations regarding the novelty of the approach, which makes a limited contribution beyond the state of the art. Also, one of the reviewers rightly raised a few concerns related to the assumptions and the model itself, which need careful revision. The paper in its current state makes too limited a contribution to be accepted as a full research paper, but a careful revision of the assumptions, pricing model, and its evaluation can certainly have a good impact on the sustainability of the Web of Data. We strongly encourage the authors to submit a revised version of their paper to upcoming editions of semantic web related conferences.