Paper 72 (Research track)

Processing incoherent open government data- A case-study about Romanian public contracts funded by European Union

Author(s): Bogdan Ghita, Octavian Rinciog, Vlad Posea

Full text: submitted version

Decision: reject

Abstract: Lately, many governments have adopted policies and mechanisms for making open data available to citizens, in order to increase the transparency of state administration and institutions. The usage of these data is hampered by the incorrect, incomplete and incoherent nature of the information.

The purpose of this paper is to summarize the general steps that are needed in order to transform raw open data that contain errors to consistent data. These steps are used to correct the open data published by the Romanian government regarding public contracts funded by European Union, supporting entities interested in using these data.

Keywords: Open Data; semantic web; error correction


Review 1 (by Pavel Shvaiko)


(RELEVANCE TO ESWC) The submission provides a case-study on processing incoherent open government data related to the Romanian contracts funded by the EU. The topic addressed is relevant and is worth further investigations. However, the major problem of the submission is that there no relationship at all about the usage of semantics or semantic technologies in this context, what makes it only marginally relevant to the ESWC scope.

(NOVELTY OF THE PROPOSED SOLUTION) The method proposed is rather straightforward, being mostly an engineering exercise, hence the originality of work is low.

(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The method applies ad hoc rules to handle incoherences.

(EVALUATION OF THE STATE-OF-THE-ART) Related work on open government data and data quality in relation to the semantic technologies is weak.

(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Section 6 discusses a usage scenario, though with no explicit links to the semantic technologies.

(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Experiments could be reproduced. There are no specific statements on "generality" of the study.

(OVERALL SCORE) The submission does not justify the ESWC publication, based on the reasons mentioned above.


Review 2 (by anonymous reviewer)


(RELEVANCE TO ESWC) See detail below.
(OVERALL SCORE) The paper addresses a practical topic - use of open data beyond the SW and wider academic communities, and challenges in data correctness and validity when coming from multiple sources. 
Based on a use case reporting EU funding in Romania, the aim is to provide support for analysis, mainly for prevention of corruption. 
The authors propose data validation for reports submitted on requests for refunds for money spent delivering (previously) approved projects. Validation is based on a set of files describing data expected in recording such cases. 
The proposal addresses the underlying issue at a relatively superficial level - the solution is VERY specific, and as described is not easily reused. As is, even for the same use case, any changes in data properties require a manual fix. The methodology described is actually fine wrt to identifying where issues exist and where solutions may lie - the implementation is where it breaks down.
Also, data considered invalid is simply discarded, no attempt is made to correct it, and importantly, to set up a structure to prevent errors in future data capture. Considering the process of validating could be easily reused to prevent input errors in the first place, it's surprising the authors don't consider this aspect of the problem at all. Esp  as it would also simplify input.
Looking at Berner-Lee's design issues (ref1) and proposal for 5* data, this solution is problematic. Note therefore that wrt novelty and proposed solution the actual score is borderline. But this option does not exist so I had to go down as accept is definitely not the case. For reproducibility/generality I moved up from borderline to weak accept.
Two graphs of the validated data are shown but no discussion carried out to illustrate how they are used to prevent fraud. And yet this is the main purpose of the exercise. 
The paper is classified as Linked Data. Three references address linked data, but the paper doesn't at all. And there is no effort to link the output to any other relevant datasets either, or even encode using relevant ontologies, which would be an alternative way to achieve this. So open data, yes, but not necessarily linked. This in itself doesn't invalidate the proposal but the paper should probably be reclassified.
Overall, I would class the paper as borderline, while the issues wrt to implementation and discussion are not too difficult to fix I doubt there is enough time to do so within the conference review process. Without this option I have to  move down to reject.
"In [4], Futia et all discussed about using errors occurred in Italian procurement documents …" - is "using" an error? - otherwise please expand on how they made use of the errors in the documents.
"by correcting the various errors that may occur and to describe them in a consistent manner.." - what needs description - the DATA - this is what I would assume, but the text says the ERRORS.
"In order to achieve the goal of obtaining clear and consistent data, all these cases should be investigated. …This involves error detection and correction, which if not possible, the affected data must be invalidated."
Isn't this a bit extreme - why invalidate - this would simply reduce the amount of data made open, potentially by a significant amount. Why not simply flag as not validated?
I'm not convinced the example of "Praha" vs "Prague" in S4.5 is a good one - neither is a mistake, it's simply the same city in different languages. So this is inconsistent, yes, but not an error. Which brings me back to clearly missed potential to reuse the solution at the input stage.
Wrt fig1 - how can you tell unique customers - you state earlier they do not have a unique code. And the probability the name/label of each is always correctly entered is near 0.
In fact, further on in the paper you give an example of constantly changing names. As the discussion there also answers this question, please bring this to the first place where you mention the issue rather than making it appear to be unresolved. 
Also, what is the difference between a customer and a beneficiary?
In step 1 - "A document is considered valid if a tuple of <one single document type, one month, and one 4-digit number, between 2009 and 2016> …"
Isn't this a bit short-sighted - I'm reading this in 2018, so already you have more than a year's worth of data that will be automatically invalidated. Isn't the whole point of an exercise like this to develop patterns that check for correctness, rather than using fixed values - going back to the example of "Praha" vs "Prague" - that cause the problem to start with?
"Because values belonging to the same type of logical information can be represented in different formats (eg: numbers as string or as number) we normalized the format across each column." - normalised as what? Simply because this matters, the choice will either help to resolve the problem or create a bigger headache.
Table 1 - the numbers in the text that follow do not completely match those in the table. If reporting also joint totals put these in the table. And please also use consistent number formatting, including a comma separator for 1000s for readability. 
The table and text following have "POC (project_unique_code)" - previously this has been abbreviated PUC.
In step 2 - "from a total of 1,487,508 entries 6183 were discarded, representing a negligible percentage of 0.41%." - if those 6183 contained 25% of the funding or reimbursement, or were the most current, say, they are definitely not negligible. Using percentages this way is valid only if each data point is equivalent to all others. Unless this is the case AND you state so the conclusion is not valid.
"… identify the customers recipients …" who are these recipients? Why are they important?
fig 2 - what is "outsources"? Who is authorising the request? Looking at the graph, reimbursement follows authorisation. Is the former counting from after authorisation is granted or do both start at the same time? What is the graph saying - it is presented without any discussion.
Ditto fig 3 - what conclusions can be drawn based on it?
Please run an auto-grammar check and proof-read - there are a large number of errors that would be identified quite easily.
Place "the" in front of cases such as "European Union" - several cases missing, e.g., "projects are financed by [THE] European Union", "first register a project plan to [THE] European Commission"
On the other hand, "The Figure 1 shows an overview …" - delete "the" at the start of the sentence.


Review 3 (by Axel Polleres)


(RELEVANCE TO ESWC) In general, the description of the datasets and the proposed pipeline seems to be more a technical report on how to solve/engineer the specific task than a research paper, and therefore, in my opinion, is not suitable for the research track of ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) My main point of criticism is that the contribution of the paper, the presented data processing pipeline, is heavily tailored to the specific use case, i.e., the processing of the project funding datasets. 
For instance, you extract the date and type of a single document from the file name, which works fine for your use case but isn’t a generalizable approach. The same holds for any step of your process.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Neither is there an evaluation of the data processing pipeline, nor a discussion of how to evaluate/generalize/apply the solution. 
An online version, source code, demo, etc. would be helpful to test and verify the submission.
(EVALUATION OF THE STATE-OF-THE-ART) Your initial data quality discussion should be based on existing reports and should provide some quantitative insight into the discussed errors. Here are some pointers to existing works on Open Data profiling and processing:
[1] Ermilov, Ivan, Sören Auer, and Claus Stadler. "User-driven semantic mapping of tabular data." Proceedings of the 9th International Conference on Semantic Systems. ACM, 2013.
[2] Ritze, Dominique, et al. "Profiling the potential of web tables for augmenting cross-domain knowledge bases." Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.
[3] Mitlöhner, Johann, et al. "Characteristics of open data csv files." Open and Big Data (OBD), International Conference on. IEEE, 2016.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) As mentioned above, no online version, source code, demo, etc. is provided.
There are statistics on the number of detected errors, the invalid values, ... in the processed dataset, however, these are only of descriptive nature and no discussion of potential false positives, misclassifications, etc. is provided.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) See novelty, demonstration & correctness above.
(OVERALL SCORE) The submitted paper describes an ETL pipeline to process Romanian OGD datasets about public contracts funded by the European Union. A total of 59 documents are analysed wrt. representation inconsistencies and errors, and the steps of the data processing pipeline are detailed. Eventually, the paper discusses how the curated dataset is used in the context of fraud detection. (However, I must say that I did not understand how fraud detection is possible based on the presented plots. The author should explain or exemplify this.)
Strong points:
- bring awareness to (at first sight simple) open data quality issues, such as column ordering of datasets, format inconsistencies, .. 
- motivated by real-world use case (the fraud detection application)
Weak points:
- tailored, not generalizable, solution
- missing discussion of novelty and of related/state-of-the-art literature (regarding the data processing but also the data quality)
- missing relevancy to scope of conference


Review 4 (by Judie Attard)


(RELEVANCE TO ESWC) The authors target issues in open data within an open government context with the aim of enhancing use.  This topic is is possibly relevant to the conference (if it focused on Linked Open Data), however the authors fail to apply any Linked Data or semantic web technologies, and the paper does not provide any contribution.
(NOVELTY OF THE PROPOSED SOLUTION) The authors simply provide documentation on the identifications of errors in open government data. They simply describe a number of steps, without providing any context, motivation, approach details, or improvement upon state of the art.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The authors do not propose any solution.
(EVALUATION OF THE STATE-OF-THE-ART) The provided section on state of the art barely grazes the surface. The authors mention only one (!)  publication that has a similar objective to theirs, but fail to provide any discussion on how they compare. Moreover, the authors claim the following: 
"Their authors studied how the usage of this type of information helps both economic growth and confidence in public administration and minimizes corruption in public institutions, increasing transparency [2][7]."
There are also many authors who indicate that benefits of open data cannot yet be proven. The authors cite a Draft paper (2003) and a workshop paper (2012) which are certainly not the most recent or authoritative publications on the topic.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The approach is simply described in a very abstract manner, with no details about the scientific approach or the research contributions.
(OVERALL SCORE) This paper seems to only provide an abstract overview of notes, and therefore is not worthy of being published. There is no concrete contribution, no comprehensive description of the approach (including motivation, context, etc), and not even a proper state of the art discussion. Moreover, no use of linked data or semantic web technologies.


Metareview by Hala Skaf


This submission presents a case-study on processing incoherent open government data related to the Romanian contracts funded by the EU. 
The submission simply provides documentation of errors in open government data. 
Reviewers agree that the submission is not relevant to ESWC conference. The authors do not provide a rebuttal.


Paper 5 (Research track)

Price Sharing for Streaming Data- A Novel Approach for Funding RDF Stream Processing

Author(s): Tobias Grubenmann, Daniele Dell’Aglio, Abraham Bernstein, Dmitry Moor, Sven Seuken

Full text: submitted version

Decision: reject

Abstract: RDF Stream Processing (RSP) has proposed solutions to continuously query streams of RDF data. As a result, it is today possible to create complex networks of RSP engines to process streaming data in a distributed and continuous fashion. Indeed, some approaches even allow to distribute the computation across the web. But both producing high-quality data and providing compute power to process it costs money.

The usual approach to financing data on the Web of Data today is that either some sponsor subsidizes it or the consumers are charged. In the stream setting consumers could exploit synergies and, theoretically, share the access and processing fees, should their needs overlap. But what should be the monetary contribution of each consumer when they have varying valuations of the differing outcomes?

In this article, we propose a model for price sharing in the RDF Stream Processing setting. Based on the consumers’ outcome valuations and the pricing of the raw data streams, our algorithm computes utility-maximizing prices different consumers should contribute whilst ensuring that all the participants have no incentive of manipulating the system by providing misinformation about their value, budget, or requested data stream. We show that our algorithm is able to calculate such prices in a reasonable amount of time for up to one thousand simultaneous queries.

Keywords: RDF Streaming Processing; Price Sharing; Equal-Need Sharing


Review 1 (by anonymous reviewer)


(RELEVANCE TO ESWC) The paper discusses a problem related to pricing in the context of rdf stream processing systems.
(NOVELTY OF THE PROPOSED SOLUTION) Even though the proposed solution is novel, it is rather straight forward with no significant technical challenges.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The solution is correct, but could be extended to cover situations where condition 1 (section 4.4) is not satisfied.
(EVALUATION OF THE STATE-OF-THE-ART) The discussion is brief, but adequate. There is no evaluation of approaches other than the proposed one.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The discussion is easy to follow. Though, some aspects of the approach (eg, when condition 1 is not satisfied) are left for future work.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The evaluation of the proposed approach is not complete, and does not help to deeply understand the behavior of the proposed algorithm. It is not clear how the time limit parameter for each one of the queries is set. It is also not clear how varying the value, budget and time limit parameters will affect the performance of the proposed approach. Even though experiments with synthetic datasets are useful, it would be nice to report results with real datasets (at least one), as well.
(OVERALL SCORE) This paper deals with the problem of price sharing for streaming rdf systems.
This is an interesting problem, but the paper does not make significant technical contributions. Presenting a solution that would also work for cases where the consumer gets value for partial streams (condition 1) would make the paper much stronger, since these cases appear often in practice. 
SP1. Interesting problem.
SP2. Simple and elegant solution.
SP3. Easy to follow text.
WP1. Solution does not cover the important case of partial streams.
WP2. Non-conclusive experimental evaluation.
Update: My most important point on not covering the case of partial streams has not been addressed, and this means that the contribution of the paper is limited.


Review 2 (by anonymous reviewer)


(RELEVANCE TO ESWC) The work presents a price sharing model for streaming data. Though, RDF stream processing is mentioned
in the title of the paper, there is nothing specific to RDF or semantic data. Thus, the paper is only
weekly relevant to ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) Though, price sharing is not a novel idea in economics, the application of this approach to stream
processing in a cloud environment is an interesting idea. Though, some assumptions
of this paper are somehow questionable, the approach is basically promising.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) An algorithm for cost sharing (or better allocation) is presented. However, this problem seems
rather to be an optimization problem with constraints (total budget). Furthermore, the algorithm
isn't well presented: e.g. S should be better the set of queries but not indexes and what happens
to queries if the budget is exhausted?
(EVALUATION OF THE STATE-OF-THE-ART) The authors discuss briefly some cost sharing approaches. This should be extended and consider
also cloud pricing models. In contrast, the discussion of RDF stream processing is actually not
really necessary for this paper.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The evaluation described in the paper is just a run of the price sharing algorithm without any 
connection to a real system or deployment. However, details of the synthetic data (apart from 
"randomly generated") are not given which limits the reproducibility.
(OVERALL SCORE) The paper presents a model and algorithm for price sharing in (cloud-based) stream processing.
The basic assumptions are that users have to pay for running operators and accessing stream sources
for a certain time. The main idea is that users specify their expected value (utility) and
the budget they are willing to spent. Based on this information a cost distribution is calculated.
In principle, this represents an interesting and promising idea. However, there are some 
strong points:
S1: Cost/price sharing is an interesting idea for cloud-based stream processing and the paper is
among the first to try to address this problem with an economic approach.
S2: The price model and the requirements are well described and motivated.
weak points:
W1: The assumptions underlying this approach are questionable. This holds both for the 
requirements (partially R2, R3) and the model (see details below and in the correctness section).
W2: Actually, the approach is based on a method described in [15]. Thus, the main contribution of
this work is to apply this model to a stream processing scenario. Furthermore, there is nothing
special related to stream processing (apart from longer running queries) or even RDF stream processing.
W3: The evaluation is limited to a runtime analysis of the algorithm based on synthetic data. 
Neither the hypothesis nor the goal of these experiments are really clear. I would argue that calculation
time is only a minor issue in this approach.
detailed comments:
* The price model doesn't reflect typical price models for cloud infrastructures: wouldn't it be more
realistic to pay for resource usage and not for operators? Second, in stream processing the arrival
rate of tuples is important which is not considered here. There is a big difference in running a query
where only 1 tuple per minute arrives vs. a query with thousands of tuples per millisecond.
* There are some concerns with the requirements (R3, R2) and the corresponding properties (sect. 4.4):
though, the motivation for R2 and R3 are clear, the explanations (what means "misinformation", getting
higher utility by outside computation) are unclear. Why is the algorithm "ignorant" to budget and value?
These parameters are used in Alg. 1?! For R3: what is wrong with a query where the platform is used
to prepare some data and perform the compute-intensive learning step outside? How could this be forbidden
by licenses? 
In summary, the work seems to be in a rather premature state. The authors should revise their assumptions.
Furthermore, the paper is probably better suited for a cloud conference than a semantic web venue.
After rebuttal:
I thank the authors for their response, but my concerns still hold. Therefore,I do not wish to change the scores in my review.


Review 3 (by Pieter Colpaert)


(RELEVANCE TO ESWC) The Semantic Web puts forward tools for decentralizing data tasks, yet this decentralization comes at a cost for which the business model is unclear. This paper puts forward a price sharing algorithms for funding RDF stream processing. It might be one of the highlights of ESWC2018.
(NOVELTY OF THE PROPOSED SOLUTION) Exciting work putting forward the first step into research for new data-driven business models. Exactly what the community needed.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Q1: Is it correct that I seem to miss information about how this algorithm was implemented, and on what machines the evaluation was executed?
This evaluation, which I deem less important than the intriguing ideas behind the paper, seem to lack implementation and design details.
(EVALUATION OF THE STATE-OF-THE-ART) The Linked Data Fragments framework, explained in [22], was also created from the observation the publishers would be paying too much for hosting a public Web API with too many functionalities, while a much simpler and more cost-efficient interface could be thought off, still offering user agents on the Web good access to the data. Furthermore, it would allow other intermediary agents to perform some actions for other parties. This paper could in fact automate this negotiation of costs.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper is well written and has nice examples to illustrate the proposed approach.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Q2: It’s clear that stream processing is a special resident in the Web of Data, as explained in your Introduction, yet can your approach also be applied to classic webby datasets?
The research datasets are not available. The algorithm is well described in a code snippet, yet the code itself is not available either.
Q3: Could this data become available?
(OVERALL SCORE) Based on three economic requirements, an algorithm to allocate runtimes and calculating the total prices is put forward. The requirements are proven to be implemented. In chapter 5, Evaluation, the processing time is evaluated to run the algorithm.
Strong points:
* Introducing economics in Semantic Web
* First step into overlooked issue of business model for Web of Data
* Algorithm is built on the basis of clear requirements
Weak points:
* No research data available
* Evaluation is unclear
* Not sure why there is a strong focus on stream processing
3 Questions see earlier.
Typo: Ontoloty in reference 6
After rebuttal:
I do not wish to change the scores on my review. I believe the contribution is indeed limited, as mentioned by other reviewers, but unique and novel in its kind and may lead to interesting discussions.


Review 4 (by anonymous reviewer)


(RELEVANCE TO ESWC) Very relevant
(NOVELTY OF THE PROPOSED SOLUTION) Interesting concept which has been tackled before
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The presentation is clear.  Evaluation provide a level of validation.
(EVALUATION OF THE STATE-OF-THE-ART) Quality of Service approach from stream processing would have a relevance here.  At the very least to show a GAP in their efforts.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) An evaluation of the approach is provided with a discussion on the limitations of the approach.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The description of the experiment is OK.  No code or datasets released.
(OVERALL SCORE) This work tackles the problem of sharing access and processing fees for a set RDF stream processing graphs with overlapping operators and sources among multiple consumers. The paper proposes a model that uses a joint execution plan, which is based on the collected queries of all data consumers. It combines this query execution plan with the outcome valuations, runtime limits and willingness to pay values from all data consumers, the pricing of the raw data streams, and the pricing of the computations to determine a utility-maximizing payment distribution.  The approach is evaluated to determine the run-time performance.
Strong points:
- Equal need cost sharing: Their price sharing algorithm follows an equal need cost sharing method for operators/sources that overlap between multiple queries, where the assigned price share for the whole query can never be higher than the price of running the query in isolation.
- Maximized Utility: They allow the user to provide a utility value for running the query for a specific period of time, a runtime limit and a total willingness to pay value. The provided parameters are used by the price sharing algorithm to allocate runtime and charges payments as long as the assigned price is smaller or equal to the consumer's value thus maximizing the consumer’s utility.
- Illegitimate Utility Gain: Assigning price shares to queries is ignorant of any value, runtime limits, or budget for consumers, hence the consumers cannot benefit from manipulation by misinformation.
Weak points
- Multiple users per query: Their algorithm for price distribution calculates the price shares based on the different queries in the model assuming a single user per query which may not be the case in a real world scenario. However, they mention including a special kind of license attached to the streaming result of the query which prohibits the redistribution of the results outside the legal entity (a person or a company) which defeats the concept of contention ratio thus the number of users should be part of the price sharing model.
- User Negotiation: The price sharing algorithm charges payments as long as the assigned price is smaller or equal to the consumer's value which is beneficial for maximizing the consumer’s utility. However, if the calculated price share is greater than the consumer value, the algorithm drops the whole query and recalculates the price shares for the remaining queries. (1) Perhaps they should consider some sort of negotiation with the user before not providing the service at all (Future work), and (2) In a model with multiple users per query, the query cannot be dropped.
- Scalability: Their evaluation shows that the solution is limited in scalability when tens of thousands of queries are simultaneously involved which they acknowledge in their conclusion section.
- Related work:  QoS for Stream Processing should be covered.
Have you considered the effect of any common runtime stream processing optimizations e.g. operator replication, operator reordering, etc., operator failures or not meeting consumer QoS constraints on price sharing?
After rebuttal:
My score remains the same.  If accepted I would encourage the authors to acknowledge the limitations highlighted by the reviewers as future research opportunities.


Metareview by Intizar Ali


This paper presents a price sharing model for streaming data which considers cost and resource sharing among multiple data consumers. Pricing models are well addressed for cloud-based and services domain solutions. However, the proposed study is one of the initial efforts to introduce a pricing model for Web of data. A well-established pricing model will be a key to creating a self-sustaining economy for Web of data and there are no doubts that this study can bring very interesting discussion at the ESWC. However, the strong reservations are regarding the novelty of the approach with limited contribution beyond state of the art. Also, one of the reviewers rightly raised a few concerns related to assumptions and model itself, which needs a careful revision.  The paper in its current state has limited contribution to be accepted as a full research paper, but a careful revision of the assumptions, pricing model, and its evaluation can certainly have a good impact for the sustainability of Web of data. We strongly encourage authors to submit a revised version of their paper in the upcoming editions of semantic web related conferences.


Paper 227 (Research track)

Dynamic Tailoring of RETE Networks in Incremental Scenarios

Author(s): William Van Woensel, Syed Sibte Raza Abidi

Full text: submitted version

Abstract: Decision support systems, with production rule systems at their core, have an opportunity to leverage the embedded semantics of semantic, ontology-based data to improve decision support accuracy. Advances in mobile hardware are enabling these rule-based systems to be deployed on mobile, ubiquitous platforms. By deploying reasoning processes locally, time-sensitive tasks are no longer influenced by network conditions, less bandwidth is wasted, and less re-mote (costly) resources are needed. Despite hardware advances however, recent benchmarks found that, when directly re-using existing (PC- or server-based) technologies, the scalability of reasoning on mobile platforms is greatly limited. To realize efficient semantic reasoning on resource-constrained platforms, utilizing rule-based axiomatizations of ontology semantics (e.g., OWL 2 RL), which are known to trade expressivity for scalability, is a useful first step. Furthermore, the highly dynamic nature of mobile and ubiquitous settings, where data is typically encountered on-the-fly, requires special consideration. We pro-pose a tailored version of the RETE algorithm, the mainstay algorithm for production rule systems. This algorithm dynamically adapts RETE networks based on the evolving relevance of rules, with the goal of reducing their resource consumption. We perform an evaluation of semantic reasoning using our custom algorithm and an OWL2 RL ruleset, both on the PC and mobile platform.

Keywords: RETE; OWL2 RL; rule-based reasoning; OWL reasoning; reasoning optimization

Decision: reject

Review 1 (by anonymous reviewer)

(RELEVANCE TO ESWC) The article depicts optimizations of the Rete algorithm for forward inference in rule based systems, and the approach is empirically evaluated over the OWL 2 RL Benchmark Corpus.
Both seem relevant to ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) According to the authors themselves, the optimizations are similar to [Doorenbos 95] (and to some extent to [Miranker 87]), and a detailed comparison is missing to appreciate the novelty of the approach.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Completeness is claimed to be preserved, but no justification is given for the case of fact retraction (see the full review below).
(EVALUATION OF THE STATE-OF-THE-ART) Pointers to the literature are relevant, but the analysis of the state-of-the-art needs to be more detailed.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The optimizations are very clearly described and illustrated.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Thanks to a precise description of the algorithm, and to the publication of the ruleset and data, the evaluation should be reproducible.
The main weakness of the evaluation is the chosen baselines (see the full review below).
(OVERALL SCORE) The article proposes a series of optimizations of the Rete algorithm, one of the most popular algorithms for forward inference in rule based systems.
The optimizations are meant to target resource-constraint platforms (such as mobile phones), and incremental reasoning, i.e. scenarios where additional facts (or fact retraction) may be dynamically fed to the inference engine. 
The optimizations prevents attempting some joins which will necessarily fail, because the memory of one of the join arguments is empty.
This is performed by "unlinking" alpha nodes from (the Rete graph of) certain rules, or further "pausing" them if they have been unlinked from all rules in which they appear.
A triggering mechanism allows relinking/reactivating alpha nodes once the join operations have a chance to succeed again.
As a side effect, if a paused alpha node never resumes execution, unnecessary matches will be avoided.
A detailed description of the whole unlinking/pausing/resuming mechanism is provided, including an algorithm, called Rete_tailor.   
The strategy is then empirically evaluated over the OWL 2 RL Benchmark Corpus, together with a partially handcrafted ruleset, available online, in addition to helpful links and samples.
The two baselines are a naive implementation of the Rete algorithm, and a simple a priori tailoring strategy, which consists in running the naive Rete over a fragment of the dataset, discard unused rules, and then perform (incomplete) inference over the remainder of the dataset. 
The article is particularly well written, clear, and largely self-contained.
The more technical aspects are introduced progressively, and illustrated with examples.
A precise description of the algorithm, together with the publication of the dataset, make the experiment reproducible.
Possible limitations (such as the need for storing tokens in memory for paused alpha nodes) are explicitly pointed out, as well as similarities to optimizations proposed in the literature.
My main reservations are:
1) An insufficient positioning/analysis of the contribution.
Rete-based forward inference is a relatively mature technology.
A number of improvements of the naive algorithm have already been proposed/evaluated, and cost-models have even been designed (see for instance Wright et al. 2013 below).
In this context, a finer-grained (theoretical and/or empirical) comparison of Rete_tailor with existing optimizations is expected.
In particular, the parallels drawn in Section 5 are too high-level to appreciate the novelty of the approach.
It is argued to be well-suited for resource-bounded scenarios and incremental reasoning, but the article does not explain why previous optimizations of the Rete algorithm may not.
2) The baselines chosen for the empirical evaluation are arguably not appropriate.
Comparing Rete_tailor to the naive Rete algorithm is not very informative, as the latter has long been replaced in practice by optimized versions.
As for the other baseline, it does not guarantee completeness.
So the performance comparison (number of join/match actions) is not only unfair to Rete_tailor, but also somewhat irrelevant, as both systems may not yield the same output.
Instead, one would expect a comparison with some of the optimizations proposed for instance in [Doorenbos 95], with which Rete_tailor has a lot in common, according to the authors.
Other concerns are the following:
3) Rete-based fact retraction is mentioned several times, but never (even briefly) described.
Retracting a fact is arguably a key aspect of incremental reasoning, which is the application scenario targeted by the authors.
To my (limited) knowledge, Rete-based retraction consists in applying the inference mechanism in a reverted fashion.
But it is unclear how the proposed optimizations behave in scenarios where facts are both added and retracted.
In particular, can completeness be affected by the interaction between asserting/retracting a fact and pausing/resuming alpha nodes?
The answer may be obvious, but a short justification would help.
4) Although OWL 2 RL can express disjointness, the assumption seems to be made that the input dataset (data + ruleset) is consistent.
Maybe this could be explicitly said. 
### Questions ###
- Page 5, Section 3.2, definition of h.1: "and \alpha memories in case i <= 2; Fig. 1".
I could not find an "i" in Figure 1.
Is it the index of the alpha node?
- Section 5: How does Rete_tailor compare to [Özacar et al. 2007]?
### Suggestions ###
- Page 7, Section 4.1.4 7, T.iii: "to avoid redundant join and match operation": "redundant" is possibly misleading.
Intuitively, redundancy means matching twice the same token (at different times), or joining twice over the same pair of tokens.
I suggest something more explicit, like "avoid necessarily failing join/match attempts".
- Page 7, Section 3.4: "Code 1" -> "Algorithm 1"
- Page 8, second paragraph: "previously deemed redundant". Same remark as above for "redundant".
- Section 5 helps understanding the intended application context, as well as the specificity of the approach, in particular to what extent the proposed work differs from [Doorenbos 95].
Therefore it may appear earlier in the article.
- Bibliography: at least one of the articles of Forgy (author or the original Rete algorithm) should be mentioned (e.g the one referenced below).
### Typos ###
- Figure 1: t3 is used instead of t2
- Page 4, 2nd paragraph: "Afterwards, the tailored". Part of the sentence is missing.
- Page 7, 3rd paragraph: "is longer empty" -> "is no longer empty"
- Page 7, Section 4.1.4 7: "took place the original dataset". Missing preposition.
- Page 7, case "t_x > t_y": "to beta node \beta_1(network1 > network2)".
Seems like what is meant here is "(network1 > network3)", as network2 is never reached in the case t_x > t_y.
- Page 12: avoid splitting Table 2 over 2 pages.
### References ###
Forgy, Charles L. "Rete: A fast algorithm for the many pattern/many object pattern match problem." Readings in Artificial Intelligence and Databases. 1988. 547-559.
Wright, Ian, and James AR Marshall. "The execution kernel of RC++: RETE*, a faster RETE with TREAT as a special case." Int. J. Intell. Games & Simulation 2.1 (2003): 36-48.
After rebuttal
Thanks to the authors for their response letter.
Please find below my comments about it.
1) "one requires a comparison to the baseline system": fair enough. A comparison to the standard Rete seems indeed relevant. My point was that it is not sufficient.
2) "An explicit comparison with the optimizations from Doorenbos [18] seems unsuitable, since most of these are included in our work": I have to disagree with this statement. I was precisely expecting a comparison between optimizations from Doorenbos [18] (at least the ones implemented by the authors) and the Rete_tailor algorithm. Currently, the experiments do not specifically evaluate the new optimizations proposed in the article (namely pausing, and the lower-failed alpha node heuristic).
3) I also still disagree with the relevance of the other baseline (a priori tailoring).
The evaluation compares performances (token matches, join attempts, reasoning time) of the different algorithms.
Therefore they should produce the same output.
To take an analogy, imagine comparing query evaluation times of two SQL engines.
This is arguably pointless if one of them returns incorrect or incomplete answers.
Put another way, one can only expect a qualitative comparison of Rete_tailor and a priori tailoring (number of inferred triples, ...).
4) "other approaches with the same goal" and "the most relevant work in the field":
I may be missing something obvious here (apologies if I am).
Which "field" exactly are we talking about?
If I understand correctly, the field is "Rete-based forward chaining in incremental scenarios", as opposed to "Rete-based forward chaining in non-incremental scenarios".
And according to the authors, the most relevant work in the former field is Tai [5].
But what exactly makes the incremental case different?
Section 3.3 paragraph 2 is not sufficient for me to understand it.
More exactly:
- Let us first assume that no fact is retracted. Which optimizations are better-suited to the incremental case, and why (is this due to additional parallelization opportunities in the non-incremental case)?
- Now let us assume that facts can be retracted. Are there additional differences between incremental and non-incremental?
And if so, why is the literature about deletion (in the non-incremental case) not relevant?

Review 2 (by Wouter Beek)

(RELEVANCE TO ESWC) While reasoning is an important use case on the Semantic Web, I am not sure whether the adaptation of reasoners for resource-constrained environments is specifically relevant.  The desktop and mobile systems that are used in the evaluation are already almost comparable in hardware specs.  There may only be a very brief window (two to five years) in which mobile and desktop devices have access to different computational resources.  It certainly would have helped to have at least some discussion on the mid-to long-term utility of this work in the paper.  I can imagine some additional long-term applications in an IoT context, but the paper does not mention these at all.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed approach does include non-trivial extensions of an existing approach and of existing algorithms.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The approach is explicitly preserving completeness of entailment results (as opposed to a specific existing approach), and completeness is also evaluated WRT to multiple ontologies.
(EVALUATION OF THE STATE-OF-THE-ART) Does Table 1 show the total number of unlinking/relinking operations over all ontologies?  Is this not a very small number / does this even matter when compared to the total number of operations?  In the text the general applicability is explained to be larger than these numbers suggest: “it is likely that [real-world scenarios] will require many more reverting operations.”  But is this really the case?  Since real-world data seems readily available on the Semantic Web, it should be possible to determine the validity of this supposition.
I am not 100% sure whether I understand the outcomes reported in Table 2.  The main outcome seems to be that the here presented approach has performance comparable to the existing approach of Tai et al., while preserving completeness.  However, the improvement in performance of both approaches compared to the default case seems to be small/negligible.  It is a bit unfortunate that Table 2, which is already somewhat difficult to read/interpret, is split across two pages.
Since the resume heuristic requires an external store, I was expecting the size of this store to be quantified as part of the evaluation.  After all, is the performance benefit in the mobile scenario not (partially) achieved by offloading the mobile process' storage onto another process / the triple store?  If the latter is case (that is at least how I understood the setup), then it would be useful to also know the memory and CPU consumption of the triple store.
Since the different between mobile and desktop environments is crucial to the main research question of this paper, it is unfortunate that "average performance times for PC and mobile are not directly comparable".
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The comparison to other approaches and implementation in the Related Work section is very good.  However, I am missing a more high-level discussion/reflecting on the utility of the here presented research.  I am prepared to accept use cases for mobile reasoners, but to what extent do they really have restricted access to computing resources (when compared to desktop systems)?
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) I did not find a link to the published Open Source code of the here presented implementation, but it is probably not difficult for the authors to add such a link.  Since the data and rulesets are shared online, through a link that is given as a literature reference (which is a bit unusual), availability of the code would enable reproducing the here presented evaluation results.
*After the rebuttal*: Since no link to an Open Source version of the code was shared, the reproducibility of the here presented work is more difficult than originally assessed.
(OVERALL SCORE) This paper extends the existing RETE algorithm with a heuristic-driven approach that results in less memory and/or CPU consumption, while preserving the completeness of reasoning over an important OWL fragment.  The reduction in hardware resources specifically applies to resource-constrained environments like mobile phones, where a process has limited access to RAM.
Strong points:
- The invention of two intelligent extensions that could positively impact performance of the RETE algorithm without sacrificing its completeness in monotonically increasing databases.
- A good Related Work section, where alternative approaches are explained.
- The paper is written very well and it nice to read.  I could find almost no mistakes (not even minor ones).
Weak points:
- The utility of the paper currently relies on a property of the hardware market (desktop versus mobile) that may no longer exist within a couple of years.  (The utility is at least not substantiated, e.g., by referring to discussions in the hardware and/or mobile research communities.
- The evaluation does not deliver the numbers that seem to be most interesting: a direct comparison between desktop a mobile.
- The evaluation results -- assuming I am interpreting them correctly -- seem to indicate that the proposed approach only marginally affects performance.  Yet the text does not draw this conclusion, but sometimes refers to different (real-world) datasets for which the proposed approach would have / could have had different results.
Small grammar errors:
- Ungrammatical sentence: “Afterwards, the tailored ruleset for future
- p6. “This [is] realized”
- p10. ungrammatical phrase: “A reasoning cycle took place the initial
- p11. “seem[s] negligible”
*After rebuttal*
Thanks to the reviewers for their rebuttal.  Unfortunately, my worries
about the main purpose of this paper have not been cleared.  The
authors say that “Although desktop and mobile systems are comparable
in hardware specs, there are huge differences in performance.”  The
paper gives detailed information about the former (in Section 4.1.3),
but never quantifies the -- apparently more important -- differences
in performance.  Based on the details in Section 4.1.3, it is not
clear where the huge differences in performance originate from.  They
must be due to properties not documented in the paper (e.g., more
aggressive power conservation strategies on mobile).  If these
properties are the most important differentiator and motivator for
this research, then they should be documented (in addition to or
instead of the less important hardware specs).
Also: “Regardless of improvements in technology, the processing and
memory capability of the latter will be smaller than those of the
former.”  But this is already not the case according to the
information the paper gives us: a dual-core 2.9Ghz processor (desktop)
does not have a larger processing capability than a quad-core 2.2Ghz
processor (mobile), unless properties not documented in this paper are
in place (see above).

Review 3 (by Brian Davis)

(RELEVANCE TO ESWC) The paper if extremely relevant to the Reasoning track under:
optimisation, approximation, or evaluations of web reasoning algorithms,  i.e., of procedures that take, as an input,  ontologies (usually in RDF, RDFS, OWL, RIF) and test entailments or answer queries.
(NOVELTY OF THE PROPOSED SOLUTION) The paper presents and advancement on the RETE pattern matching algorithm for implementing production rule systems which dynamically adapts RETE networks at run time and aims to reduce consumption of computational resources.   The dynamic tailoring involves unlinking alpha  memories from the network as well pausing alpha nodes such that they will no longer by matched by incoming tokens. The RETE tailor algorithm seeks to guarantee completeness in incremental scenarios linking and resuming at runtime when required.  The delta appears to be that they extend  which extends null left activation with a second heuristic -  lower failed alpha nodes  as well as the addition of a second operation - pausing  - which avoids redundant match operations.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The algorithm is very well described and is clearly differentiated against the state of the art.  The experiment setup is well described, including ontologies, rulesets used and benchmark configurations and metrics.
(EVALUATION OF THE STATE-OF-THE-ART) The authors describe the state of the art quite thoroughly and assert their  contribution relative to the related work.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The RETE Tailor algorithm is very well described and is clearly differentiated against the state of the art.  The experiment setup is well described, including ontologies, rulesets used and benchmark configurations and metrics.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experiments appear to be well documented  referencing algorithms used in benchmark configuration, the datasets used in addition to link to online documentation See
(OVERALL SCORE) The paper presents and advancement on the RETE pattern matching algorithm for implementing production rule systems which dynamically adapts RETE networks at run time and aims to reduce consumption of computational resources.   The dynamic tailoring involves unlinking alpha  memories from the network as well pausing alpha nodes such that they will no longer by matched by incoming tokens. The RETE tailor algorithm seeks to guarantee completeness in incremental scenarios linking and resuming at runtime when required.  The delta appears to be that they extend  which extends null left activation with a second heuristic -  lower failed alpha nodes  as well as the addition of a second operation - pausing  - which avoids redundant match operations.
Strong Points
Mostly very Well written
Algorithm and experiments well described.
Good knowledge or related work
Weak Points
Section 3 is somewhat heavy to read and it could have been structured, broken down more clearly  succinctly as opposed to what appears to be a lengthy discussion in parts.
Minor issue is the use of "Further" at lot in Section 4.2.  I don't think that is and adverb rather an adjective.
discussion in parts.

Metareview by Jeff Z. Pan

The paper addresses the topic of resource aware incremental reasoning by dynamically adapting the RETE networks at runtime to reduce resource consumption. The reviewers agree that this line of research is interesting and is related to the Semantic Web. However, they also identified important weaknesses which led to a reduced value of the current version of this work:
(W1) An insufficient positioning/analysis of the contribution against related works, inluding the Delete and Rederive approaches, incremental reasoning approaches for OWL RL and OWL EL;
(W2) Section 3 is somewhat heavy to read;
(W3) The utility of the paper currently relies on a property of the hardware market (desktop versus mobile) that may no longer exist within a couple of years: some more thourough motivations (supported by evidence) should be provided.
The paper was suggested to be a candidate cond-accept paper. Unfortunately, it didn't manage to stay above of the threshold in the final round of decision making. We hope the comments here are useful for a future submission of the work.