Paper 178 (Research track)

Semi-automatic Alignment of REST APIs to Schema.org for Effective Service Discovery

Author(s): Simon Schwichtenberg, Stefan Heindorf, Christian Gerth, Gregor Engels

Full text: submitted version

Abstract: Today’s web services are usually REST APIs which are described purely syntactically.
Therefore, an effective service discovery is hindered as service requests and offers are usually heterogeneous with respect to their domain terminologies.
Semantic specifications, based on machine-readable ontologies, allow to overcome this heterogeneity.
However, it means a lot of manual effort to provide semantic specifications by establishing links to an ontology.
In this paper, we present ASTRO, a semi-automatic tool that derives semantic OWL-S specifications from syntactic Open API specifications.
It assists service providers to align their specifications to the widely-adopted schema.org ontology in order to reduce heterogeneity.
In our evaluation, we determine the practicality of our approach on a large-scale set of real-world REST APIs from Mashape.
ASTRO considerably reduces manual effort to enrich existing syntactic specifications with semantics.
About 51% of the extracted concepts from Mashape specifications can be mapped to schema.org.
Based on these enriched specifications, we show that the alignment to schema.org improves the effectiveness of the service matchmaker OWLS-MX3 by 61%.

Keywords: Semantics Derivation; Alignment; Domain Ontology; Schema.org; Web Service; REST API; Heterogeneity; Service Discovery; Mashape

Decision: reject

Review 1 (by Jean-Paul Calbimonte)

(RELEVANCE TO ESWC) The topic is relevant to ESWC. Although Semantic Web Services has been deeply studied in the past, current trends in Web Services have changed significantly. This requires researchers to revisit theory and applications that attempt to integrate and reuse services using SW-based techniques.
(NOVELTY OF THE PROPOSED SOLUTION) The workflow proposed in this paper reuses past work from the authors (e.g. for code generation) as well as rather standard techniques for alignment. Although these technologies are applied to a relevant use case, the overall novelty of the approach is not high.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The approach has several limitations which are reflected in the results. Although in some cases there are actually nice MRR scores, as the authors mention, there are several lines of work to improve the current workflow.
(EVALUATION OF THE STATE-OF-THE-ART) The State of the art in web services is actually larger than what is presented in the paper, but for the specific use case, the presented works are at least sufficient.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The discussion brought by the authors is interesting as it even shows that the proposed approach has several limitations. The authors point out the current issues that prevent the results from being better. These include issues in the service descriptions, multilinguality problems, missing data, cryptic codes, etc. While the discussion and the limitations are interesting, they reveal that improvements can be made to the currently presented approach, which would result in a potentially better set of results.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Although the scope of the paper is limited, the proposed metrics are well justified and the dataset is actually a nice collection from Mashape, including real services from the wild.
(OVERALL SCORE) The paper presents a tool that matches REST APIs with schema.org, and generates OWL-S descriptions, in order to make them discoverable.
The paper describes a workflow for matching service requests and offered services from a given collection of REST APIs. The workflow consists first of a syntactic transformation from the Open API format to OWL-S, then of aligning the result to schema.org, so that it can later be matched against the request (also in OWL-S). Then a code generation step is performed to be able to invoke the matched services.
The approach taken by the authors is interesting in the sense that it allows developers to stay within the boundaries of well-known REST/JSON interfaces, while the entire semantic processing does not interfere with the usage of the APIs. It also reuses as much as possible existing agreements for semantic services (OWL-S), matching techniques, well-established ontologies (Schema.org), and code generation tools (their own work in [18]).
However, as a result the overall contribution of the paper in terms of novelty is rather limited, given that the core of the paper is the alignment step (section 3.3), which appears to use standard techniques with Lucene and ConceptNet. Therefore, it seems that the main contribution of this paper is more on the way that the existing technologies have been applied to this specific problem, which could make this work more appropriate to an In-use track.
Nevertheless, as the authors point out, the results reported in the evaluation are not entirely satisfactory, given the heterogeneous nature of the service descriptions, among other reasons. Lack of descriptions, multiple languages, cryptic codes, non-evident synonymy, etc., are some of the problems that lower the scores of the alignment and discoverability. It is precisely due to this heterogeneity that data integration, and in this case service integration, is a hard problem, and it is for these cases that semantics are expected to provide promising results. So many of the future work items announced by the authors would actually be necessary in order to strengthen their claims and end up with much more convincing evidence of the appropriateness of the approach.
It would be important to tackle some of these issues early on, for example, not depending entirely on schema.org alignment. As the authors mention, it does not cover all domains, and even for the Mashape sample this is also not the case. Therefore, the approach could be expected to expand to consider sets of good-quality ontologies that could be selected for alignment. Depending on the domain, specialized ontologies could be more appropriate. Also, language detection can be explored for addressing the multilinguality problems, and for other data, services can be invoked and explored in order to guess the semantics based on the retrieved data. This type of approach has been explored for a while in semantic data integration, although maybe a bit less for services.
Overall, the paper is very well written and it is easy to follow. It has a well-defined scope, although I think that this scope is quite limited, and the results reflect this. The authors point out quite well some of the limitations found in their approach, and while they mention them as future work, it seems that actually addressing these limitations could result in a stronger paper with more convincing results.
Strengths:
- Approach relies on well established standards and ontologies: OWL-S, schema.org, alignment techniques.
- Approach is not "invasive" for application developers of REST APIs.
- Makes available code and data
- Uses real life REST APIs for testing the methods
Weaknesses:
- Contributions appear to be rather incremental
- Results show improvement but still many issues to address
- Novelty of the approach is not high
Questions:
- Would it be a radical change to add a (semi-)automatic ontology selection step, so that the approach does not depend only on a predefined ontology (schema.org)?
- Have you considered using data exploration techniques to solve the issues of cryptic codes and other cases where service descriptions are not useful?
- Service quality can also be a criterion for selection of an API. 
- How representative is your Mashape sample? Are there other possible sources that could help to show a wider applicability of the approach?
After rebuttal: thanks to the authors for their comments. My question about not sticking just to schema.org arises from the results that show that in some cases this vocabulary is not enough. So I thought it would be worth thinking about it. Concerning multilinguality, perhaps for this specific data sample it is not too important, but in the *wild* I am pretty sure this has some relevance (e.g. services in China, India, or Spanish in the Americas...). In any case these are side issues.
The main concerns about the limited novelty of the approach remain, although I raise the score a bit because of the dataset used, and the effort that has been put into completing the entire chain from service discovery to invocation. One may argue that this is more an 'in-use' kind of work, but as the authors argue in the response, there is definitely scientific merit in doing so.


Review 2 (by anonymous reviewer)

(RELEVANCE TO ESWC) The presented approach clearly relies on semantic technologies.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed alignment between schema.org and Mashape is novel.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) I would suggest improving how the weaknesses and strengths are discussed (see detailed review).
(EVALUATION OF THE STATE-OF-THE-ART) The state of the art is sufficiently discussed.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The proposed approach is discussed with sufficient level of detail
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The code of the tool is available
(OVERALL SCORE) The authors present an approach for assisting the search and matching of APIs, by mapping syntactic API descriptions to schema.org. This is an innovative approach and is very relevant for the community, since currently a lot of the service tasks such as discovery, matching, composition and invocation have to be done manually. This is mainly due to the lack of semantics and the heterogeneous structure of the descriptions, which are sometimes even incomplete. The paper is well written, easy to follow, and presents a clear line of argumentation. The evaluation clearly supports the applicability of the approach. A number of improvement suggestions are listed below:
1) Please be very clear about the difference between APIs and REST APIs. Not all APIs are RESTful; in fact, the majority of APIs are actually not. Either only talk about APIs or say explicitly that you are following REST quite loosely. Depending on the community and readers, this might cause quite a discussion. If you are looking at operations, then you are definitely not following REST.
2) Why did you choose OWL-S? This is a very heavy-weight formalism. Would it not be easier to just go for a more lightweight description… such as WSMO-Lite? This can still be used for matching (you would need to do a simple mapping of the concepts).
3) Why not learn an ontology for Mashape, with complete coverage, and then map it to schema.org? Having an ontology for Mashape would already be a great contribution, you would have high coverage, and most probably search requests would be easier to match.
4) I am not sure why you talk about ontology matching in the related work. It is a huge field of research. Maybe extract only the relevant parts and focus on those. It might be that the section title can be improved to avoid confusion.
5) How do requesters and providers agree on the same ontology? This is quite a big assumption. My suggestion here would be to learn a Mashape ontology, do the mapping to schema.org and argue that you support “native” Mashape requests and “standard” ones (e.g. schema.org) (see 4).
6) There are a number of comments regarding the evaluation: i) Be absolutely clear about the disadvantages of your approach (concepts that are not covered by schema.org, missing JSON, etc.) and in the same way highlight the advantages. Currently the advantages are not clearly stated and occur in small bits in the text. Add a separate discussion section and sort out both disadvantages and advantages. This would only strengthen the paper. ii) Why is there no simple precision/recall evaluation of the concept mapping? This is the first thing that one would expect. iii) Instead of saying that schema.org does not cover all needed concepts, say how you can practically compensate for that (plan on using standard domain-specific ontologies?). iv) What is the overall coverage of Mashape?
7) In the discussion section you talk about not needing to learn OWL-S. My suggestion is to focus on the fact that the approach is non-invasive, which is very important, and also to talk about the implications for further tasks such as composition and invocation. Those are actually improved too, right?
8) Section “Effectiveness of ASTRO”, sentence “It has to be tested to what degree this can be done automatically”. This really leaves the reader hanging. I would suggest rephrasing. 
9) “is reduced which improves the the effectiveness” double “the”
I really enjoyed the paper and hope that it gets accepted.


Review 3 (by anonymous reviewer)

(RELEVANCE TO ESWC) The application of a semantic approach to the problem of effective service discovery is relevant to the ESWC conference.
(NOVELTY OF THE PROPOSED SOLUTION) The process as a whole is not novel; the contribution is mainly a system/tool. Additionally, all the magic seems to be done by the Lucene search engine.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The evaluation shows in some way the correctness of the proposed solution.
(EVALUATION OF THE STATE-OF-THE-ART) The related work explains well that previous work focused mainly on WSDL specifications. However, the Ontology Matching related work seems to be out of place.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The description of the approach mixes research problems with technical and system details such as the Lucene search engine.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The metrics and evaluation procedure are presented; additionally, the source code is available in Git.
(OVERALL SCORE) The paper presents a tool named ASTRO to align REST Open API specifications to the Schema.org vocabulary in a semi-automatic way. ASTRO's main goal is to solve the service discovery problem between providers and requesters. Schema.org was selected due to its matching with real-world objects. As a first step, the tool transforms an Open API specification into OWL-S, and then aligns it to Schema.org. The evaluation is done using Mashape Marketplace data, concluding that more work needs to be done since Schema.org is not enough to describe all the API operations from the Mashape Marketplace.
The following are the strong points of the paper:
1. The problem is well presented and it is relevant nowadays, taking into account the growth of cloud services and the Internet of Things (IoT)
2. The idea of using schema.org as the main vocabulary is interesting 
3. The use of Mashape Marketplace data is definitely a good idea since the insights gained during this work come from real data
4. The metrics and evaluation procedure are presented; additionally, the source code is available in Git
The following are the weak points of the paper:
1. The contribution from the research point of view is not clear; the paper contains too many technical details of the tool. In some parts of the paper, the service discovery problem is mixed with the code generation problem.
2. One of the main conclusions is that schema.org is not enough, maybe the paper would be read better presenting this finding like the main contribution “Why schema.org is not enough…”
3. The text mixes technical problems with research problems (Lucene provides this and that functionality)
4. I am missing an architecture diagram and description, e.g. Apache Lucene appears from nowhere and it seems that it is one of the main components for the tool to work. Is Lucene used to facilitate the service discovery?
5. Figure 2 does not bring any value to the paper
6. The motivation phrase “Web Services of today…” is repeated more than three times in the paper (abstract, intro, format conversion, conclusions)
7. I would suggest not to use references for URLs (Reference [1] Open API Specification…)


Metareview by Amrapali Zaveri

While the problem addressed is relevant to ESWC, the reviewers are concerned about the novelty, scientific contribution and generalizability of the proposed approach. Moreover, the reviewers agreed that the paper was too technical and lacked the necessary research ingredients. The authors should not only clarify the novelty of their approach but also provide a discussion of their findings. Considering the limitations identified as a result of the evaluation, there is room for improvement and further evaluation with several different domain-specific ontologies. However, in its current form, it is not suitable for acceptance at ESWC 2018.

