Semantic Data Quality Challenges in Online Marketing and Sales- an Empirical Study
Author(s): Anna Fensel, Zaenal Akbar, Elias Kärle, Christoph Blank, Andreas Gruber, Patrick Pixner
Full text: submitted version
Abstract: The quality of linked data and schema.org instance data is currently a severe bottleneck for building applications that put semantic data to use. Many ongoing data quality research efforts, such as exploring existing datasets and new data acquisition techniques, are promising. To complement them, we approach the problem from a different angle: we create, deploy and test a typical online application that is based on state-of-the-art semantic data (linked data and schema.org instance data), and then evaluate the data quality shortcomings and their level of severity. We provide a design and feasibility pilot of a solution implementing a semantic content and data value chain for online direct marketing and sales. The designed and developed solution is applicable for use on the Web, social media and mobile channels. It has been designed, implemented and evaluated within the tourism sector, and is applicable globally. The state-of-the-art challenges in using semantic data in our solution have been identified and prioritized by 33 experts. The encountered and reported challenges primarily relate to data quality. We discuss the outcomes and potential future solutions that would also be applicable to other sectors.
Keywords: Data quality; Semantic services; Linked Data; schema.org; online marketing; online sales; eTourism
Review 1 (by anonymous reviewer)
This paper describes an implementation and deployment of a semantic content and data value chain for direct marketing and sales. Such a solution is applicable for use on the Web, social media, and mobile channels, and is demonstrated and evaluated through a case study from the tourism sector. The paper is well-presented and tackles a compelling problem from an interesting angle. My main concern regards the data quality challenges identified. In Table 2, the authors list the results of their user study: users identify data quality-related issues, which are briefly described and ranked based on how many users mention them. I would like to know more about these issues. Is it possible to see more details about them? How frequent are these issues? And, most importantly, how much do they impact the system functionalities? The fact that many users report issues related to heterogeneity handling and quality of external data does not tell me whether such issues are impeding the normal functionality of the system or not. Lastly, it would be interesting to understand whether (and, if so, which) countermeasures have been established in order to limit such issues. I thank the authors for their short response.
Review 2 (by anonymous reviewer)
Given the valid arguments raised by the other reviewers, I revise my score to a weak accept. --- A nice use case that illustrates the value of Linked Data as an integration layer for customized and integrated touristic services. The paper is well written and easy to read, although some paragraphs, especially in sections 1 & 2, could be simplified and shortened due to redundancies. The description of the service and its architecture is okay. The implementation seems to be very promising. The evaluation seems to be robust, apart from the fact that the evaluators were computer science students, which might impact the validity of the evaluation when it comes to generalizing the results to non-expert users. Anyway, the evaluation contains some interesting insights which are pretty unique in the area of data quality management in application scenarios, listing not just technical features but also governance-related aspects such as rights issues, business issues, security issues etc. The conclusions drawn in section 7 sound reasonable and provide a good outline for future work. In case this paper is rejected, I recommend submitting it as a poster.
Review 3 (by anonymous reviewer)
The paper presents a description of a system intended to provide tourism-related booking services to clients. Although the application domain is very interesting and a prototype has been implemented in part using semantic technologies, the paper has several shortcomings in terms of quality of presentation and clarity of writing, technical depth, and novelty, and therefore I do not recommend the paper for acceptance at ESWC's In-Use & Industrial Track.
The most noticeable problem with this paper is its writing style. Most statements are too wordy and lack the clarity one would expect from a publication at a scientific venue. There are many words and statements that can simply be removed without affecting the contents of the paper. There are also many words and terms that are not properly defined. There are too many issues to point out, so I just mention a few examples:
- The term "semantic data", which is used heavily in the paper and even in its title, is not a well-defined term. Do you mean Linked Data, Knowledge, RDF Triples, RDFa, microdata?
- A prime example of how unnecessarily wordy your statements are is this sentence: "we conceptualize and prototype an IT multi-stakeholder ecosystem and infrastructure to interoperate across different marketing data and sales content resources employing linked data and schema.org, and enhancing interoperability of distributed marketing resources for allowing meaningful searches and efficient information dissemination.". You can easily break this into simpler, shorter, and more accurate statements, dropping unnecessary words like "IT" and replacing terms like "conceptualize and prototype" or "multi-stakeholder ecosystem and infrastructure" with an accurate description of what you propose.
- On page 4, what is "semantic mining"?
- You suddenly mention a "consortium" on page 5. What is it?
- You mention company names in your evaluation. What are those companies or the products they offer that can take advantage of your solution?
As per the requirements for this track, I do not see a "measurable impact of semantic technologies" - I can see a use of schema.org in your project and a picture of the LOD cloud. But if your use case is only availability of the data in standard formats, then you have a very weak/basic use case that lacks novelty. Also, your evaluation is not really an evaluation. You describe two *potential* use cases by two companies, and have performed a survey on the broader topic with 33 students. Your paper's title is very misleading, as you are targeting a very specific application domain and even for this application you only have a prototype. Finally, it is not clear how much this paper adds to all the previous publications you have listed on http://tourpack.sti2.at/results
Update after rebuttal: Thank you for your response. Regarding my comment about the companies mentioned in your evaluation section, I did not mean to undermine the fact that you have real clients and important real-world use cases. I mentioned this issue as a problem with the clarity of your writing. While reading, for example, the first paragraph on page 11, as a person who had not heard of the company "m-Pulso GmbH", I first had to look in the paper itself to understand that "m-Pulso" refers to "m-Pulso GmbH" and is a company, then had to look online to understand what the company's business is, and still wasn't able to find out what the "product portfolio" and the "existing product" are that could benefit from your work. Basically, you could just replace this paragraph with a sentence (perhaps in the conclusion section) saying that you are working with a company that has a product in this space, and that a solution based on your work will be rolled out to several pilot customers soon.
Review 4 (by anonymous reviewer)
The paper presents tourpack, a platform for aggregation and delivery of touristic content via a mobile application. The project - which has apparently been evaluated at least in an educational scenario, if not fully in a real commercial touristic setting (... while it became feasible from a technology perspective the main blocker to realize / offer a true package are legal and business related packaging... ) - certainly is in line with the in-use conference track. However, in its current state, the paper fails to deliver a coherent and easily readable presentation of the project's challenges and results. This is partly due to a number of language issues that sometimes make the text very difficult to understand. I recommend a thorough language and stylistic review by a native speaker prior to resubmitting.
Review 5 (by Anna Tordai)
This is a metareview for the paper that summarizes the opinions of the individual reviewers. The reviewers mention the relevance of the topic for ESWC and the interest of the use cases by two companies. One reviewer poses questions regarding the application of semantic technologies beyond standardisation. There are many unanswered questions regarding the user study. Some reviewers point out issues with the writing style. Laura Hollink & Anna Tordai