Semantic Concept Discovery Over Event Databases
Author(s): Oktie Hassanzadeh, Shari Trewin, Alfio Massimiliano Gliozzo
Full text: submitted version
Abstract: In this paper, we study the problem of identifying certain types of concepts (e.g., persons, organizations, topics) for a given analysis question, with the goal of assisting a human analyst in writing a deep analysis report. We consider a case where we have a large event database describing events and their associated news articles, along with metadata describing various event attributes such as the people and organizations involved and the topic of the event. We describe the use of semantic technologies in question understanding and deep analysis of the event database, and show a detailed evaluation of our proposed concept discovery techniques using reports from the Human Rights Watch organization and other sources. Our study finds that combining our neural-network-based semantic term embeddings over structured data with an index-based method can significantly outperform either method alone.
Keywords: Event Databases; Concept Discovery; Semantic Embeddings; Decision Support System
Review 1 (by anonymous reviewer)
This paper tackles the problem of identifying concepts to assist humans in writing a deep analysis report. The paper describes the concept discovery framework in detail, providing an overview of both the sources used by the system and the processes and algorithms that compose it. In addition, some experiments are performed and well presented in order to evaluate the system. My doubts and concerns, however, regard the paper's fit with the in-use and resources track. The call explicitly states that "The Semantic Web In-Use and Industry track provides a forum for researchers and industry to discuss novel research taken to the market, or on any other relevant uptake of semantic technologies outside the lab", whereas to me the paper reads more as a research paper. I thank the authors for their detailed response.
Review 2 (by anonymous reviewer)
The paper presents a concrete example of the effective use of semantic technologies in a real application setting. Specifically, the paper describes a solution for concept discovery over existing, large event databases using semantic technologies. The architecture, steps and algorithms of the solution are clearly explained, properly highlighting the role and benefits of semantic technologies. Some of the steps of the proposed solution are not novel per se, but their combination and some adaptations (in particular the semantic embedding algorithm) make the whole framework quite interesting. The reported experimental results provide a detailed validation of the proposed solution, mainly focusing on evaluating the performance of the concept ranking algorithms. The evaluation outcomes also include a brief discussion of the pros and cons of the proposed approach. To better highlight the potential impact of the proposed framework, the authors could further elaborate the discussion of possible (concrete) application fields, also comparing against the state of the art from an application perspective. The paper reports that the solution could support analysts in their reporting work. However, this is a generic statement; it could be substantiated, for example, with a more detailed analysis of how analysts currently produce their reports (i.e., which tools they use) and how they could work with the support of the proposed framework. It would also be interesting to know whether the same framework could be used in different applications and/or with different data sources, e.g., by extending the capability to retrieve people and organizations.
Review 3 (by Andriy Nikolov)
The paper presents a system for discovering concepts in unstructured sources in order to facilitate retrieval of relevant documents and handling of analytical questions. The pipeline involves indexing the data sources with entities (identified by Wikipedia URL) and building embedding vectors. The concept ranking algorithm then identifies the concepts most relevant to those mentioned in the question, based on the index and the embeddings. Overall, the approach looks like a combination of pre-existing methods applied to a large-scale real-world dataset, which I think is in line with the requirements of the in-use track. I found the joint use of the index-based and embedding-based retrieval methods interesting, as well as the evaluation results showing the effects of combining them. What is missing is, on the one hand, a more detailed discussion of the use case and the motivation and, on the other, a more use-case-oriented evaluation. Who are the intended users of the system? What entities would they select as the most relevant ones? In fact, the discussion section already suggests that some entities not included in the ground truth, but returned by the context method, were relevant. Would this change the reported evaluation results, in particular the relative importance of the co-occur and context approaches and the optimal balance between them? I would like to thank the authors for the provided answers and explanation. My main concern was about the fit of the paper with the In-Use track. From the authors' response I can see why a more detailed description of the use case was not possible. Nevertheless, I think that the paper still sufficiently fits the track requirements.
Review 4 (by Anna Tordai)
This is a metareview for the paper that summarizes the opinions of the individual reviewers. The reviewers praise the fact that the system is explained in detail and evaluated in an experimental setting. The authors have addressed the reviewers' concerns about whether this paper fits the In-Use track. We accept that in some cases information about usage of the technology cannot be shared with the community. In such cases, sufficient information has to be shared for the work to be valuable to the research community. We feel this paper fits these criteria. Laura Hollink & Anna Tordai