Modeling and Summarizing News Events using Semantic Triples
Author(s): Radityo Eko Prasojo, Mouna Kacimi, Werner Nutt
Full text: submitted version
Abstract: Summarizing news articles is becoming crucial for allowing quick and concise access to information about daily events. This task can be challenging when the same event is reported with various levels of detail or is subject to diverse viewpoints. A well-established technique in the area of news summarization consists of modeling events as a set of semantic triples. These triples are weighted, mainly based on their frequencies, and then fused to build summaries. Typically, these triples are extracted from main clauses, which might lead to information loss. Moreover, some crucial facets of news, such as reasons or consequences, are mostly reported in subordinate clauses and thus are not properly handled. In this paper, we focus on an existing work that uses a graph structure to model sentences, allowing access to any triple independently of the clause it belongs to. Summary sentences are then generated by taking the top-ranked paths that contain many triples and show grammatical correctness. We further provide several improvements to this approach. First, we leverage node degrees for finding the most important triples and facets shared among sentences. Second, we enhance the process of triple fusion by providing more effective similarity measures that exploit entity linking and predicate similarity. We performed extensive experiments using the DUC2004 and DUC2007 datasets, showing that our approach outperforms baseline approaches by a large margin in terms of ROUGE and PYRAMID scores.
Keywords: Knowledge bases; Text Similarity; Entity Linking
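The abstract mentions using node degrees to find the most important triples. As a rough illustration only, the sketch below builds degree counts from a handful of triples and ranks the triples by the summed degree of their endpoints. The toy triples, the function names and the scoring rule are all assumptions made for this example; the paper's actual ranking may differ.

```python
from collections import Counter

# Hypothetical toy triples (subject, predicate, object); not taken from the paper.
triples = [
    ("earthquake", "hit", "city"),
    ("earthquake", "killed", "people"),
    ("government", "sent", "aid"),
    ("earthquake", "damaged", "bridge"),
    ("aid", "reached", "city"),
]

def node_degrees(triples):
    """Count how many triple edges touch each node (subject or object)."""
    degree = Counter()
    for subj, _pred, obj in triples:
        degree[subj] += 1
        degree[obj] += 1
    return degree

def rank_triples(triples):
    """Order triples by the summed degree of their two endpoints, highest first.

    This scoring rule is an assumption chosen for illustration.
    """
    degree = node_degrees(triples)
    return sorted(triples, key=lambda t: degree[t[0]] + degree[t[2]], reverse=True)

degrees = node_degrees(triples)
ranked = rank_triples(triples)
print(degrees.most_common(3))
print(ranked[0])
```

Here the node "earthquake" takes part in three triples, so triples attached to it rise to the top of the ranking.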
Review 1 (by Francesco Ronzano)
(RELEVANCE TO ESWC) The paper provides an interesting example of how semantic knowledge structures and Semantic Web resources can improve multi-document summarization. (NOVELTY OF THE PROPOSED SOLUTION) The paper provides improvements to an existing approach proposed by Li, Cai and Huang in 2015 (paper: "Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization: Weakly Supervised Abstractive Multi-Document Summarization"). (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Some aspects of the procedure are not clearly explained (see the Overall Score comments on the use of node degree in triple pattern selection). (EVALUATION OF THE STATE-OF-THE-ART) The state-of-the-art review is extensive. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The approach is consistently evaluated. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) It would be great to release the implementation of your summarization approach, as the authors of the original summarization system that you improved (Li, Cai and Huang in 2015 - "Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization: Weakly Supervised Abstractive Multi-Document Summarization") did. They state that: "Code is available at: https://github.com/jerryli1981". (OVERALL SCORE) *** Summary of the Paper *** The paper proposes and evaluates a set of improvements to the multi-document summarization approach proposed by Li, Cai and Huang in 2015 (paper: "Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization: Weakly Supervised Abstractive Multi-Document Summarization"), applied to news events. The authors provide an overview of previous works in which abstractive summaries are generated by relying on (semi-)structured knowledge representations obtained from a set of documents by means of Open Information Extraction systems, constituency / dependency parsers or semantic role labelers. 
Then they provide an overview of their baseline summarization system, the multi-document summarization approach proposed by Li, Cai and Huang in 2015. Given a set of documents to summarize and a set of facets to include in the summary, this approach relies on the OLLIE Open Information Extraction system, the Stanford NER and the SEMAFOR semantic parser to generate and merge triple-based representations of these documents. Triples are then clustered by a semi-supervised approach according to the facet of the summary they deal with; from each cluster, the triple patterns with the highest coverage and grammatical coherence are selected for inclusion in the summary. This paper improves the following aspects of the approach proposed by Li, Cai and Huang in 2015: (1) the process of triple clustering, which is now totally unsupervised and based on k-means applied to embedding-based representations of the words in each triple; (2) the merging of triples, which is improved by relying on DBpedia Spotlight and the Stanford Deterministic Coreference Resolution for entity fusion and on WordNet similarity metrics for predicate fusion; (3) the selection of sentences to include in the final summary, which also considers the degree of the nodes of the triple patterns (besides their pattern coverage and grammatical coherence). The authors evaluate the proposed improvements against the approach of Li, Cai and Huang (2015), which is taken as the baseline. To this purpose, they use the DUC04 and DUC07 datasets and the following summary evaluation metrics: ROUGE scores (unigram and bigram) and pyramid score. They show that their improved system outperforms the baseline approach, also by means of a manual evaluation of the coherence and correctness of 100 random summary sentences generated by each approach. *** Strong Points (SPs) *** The topic addressed by the paper (graph-based abstractive summarization of documents) is of great relevance. 
The paper is clear and well written and the evaluation is consistently organized. *** Weak Points (WPs) *** Some aspects of the procedure are not clearly explained (see below: use of node degree in triple pattern selection). *** Questions to the Authors (QAs) *** The paper proposes incremental improvements over an existing multi-document summarization approach (proposed by Li, Cai and Huang in 2015). In particular, three improvements to the original approach are described. It would be great to include some evaluation (besides better ROUGE/PYRAMID scores) of the a-priori impact of each single modification proposed to the original approach. For instance, you exploit DBpedia Spotlight / Coref Resolution / predicate WordNet similarity to merge entities and predicates of different triples so as to increase the connectedness of the resulting graph. It would be great to evaluate, on a set of multi-document summarization scenarios, how the connectedness of this graph improves. In Section 4.4 you state that you use total / averaged node degree to improve Summary Sentence Selection. From the paper it is unclear how you exploit the node degree for this purpose and how you combine the degree of the nodes of a triple pattern with its pattern coverage and grammatical coherence to decide whether to include that sentence in the summary (could a formula be included to explain how these aspects are considered when ranking a triple pattern?). It would be great to include a more detailed analysis of the types of incoherence found in the sentences of a summary and, if possible, to investigate their origin / cause in the summarization approach. The original multi-document summarization approach, used as the starting baseline in this paper, was evaluated by Li, Cai and Huang on the "TAC2011 Summarization task" dataset. It would be great if, in order to better evaluate your improvements, you could also consider this dataset in your evaluation efforts. 
Typo: Section 4.3: Section 3 show that the fusion of the two triples... --> showS -------------------- ** After rebuttal ** Many thanks to the authors for providing answers to the issues raised. After reading their comments, considering the intention of the authors to better explain the role of node degree in content selection and their willingness to evaluate their approach on the TAC 2011 dataset, I will consequently modify my final score.
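Review 1 asks how merging entities across triples changes the resulting graph. One simple way to quantify this, sketched here under stated assumptions, is to count connected components before and after fusion. The toy triples and the `canonical` mapping are hypothetical; the mapping merely stands in for the output of entity linking and coreference resolution and calls no real tool.

```python
def connected_components(edges):
    """Return the number of connected components, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(node) for node in list(parent)})

# Hypothetical toy triples; three surface forms refer to the same entity.
triples = [
    ("UN", "sent", "observers"),
    ("United Nations", "condemned", "attack"),
    ("the organization", "issued", "statement"),
    ("rebels", "seized", "airport"),
]

# Hypothetical surface-form -> canonical-entity mapping (assumed, not computed).
canonical = {
    "UN": "United_Nations",
    "United Nations": "United_Nations",
    "the organization": "United_Nations",
}

def fuse(triples, canonical):
    """Replace each subject and object by its canonical entity, if one is known."""
    return [(canonical.get(s, s), p, canonical.get(o, o)) for s, p, o in triples]

def to_edges(triples):
    """Treat every triple as an undirected edge between subject and object."""
    return [(s, o) for s, _p, o in triples]

before = connected_components(to_edges(triples))
after = connected_components(to_edges(fuse(triples, canonical)))
print(before, after)
```

A drop in the number of components after fusion indicates that previously isolated triples now share nodes, which is the effect the reviewer suggests measuring per summarization scenario.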
Review 2 (by Daniel Garijo)
(RELEVANCE TO ESWC) The paper is relevant for the ESWC conference, as it proposes a method to merge triples and transform them into human-readable text. (NOVELTY OF THE PROPOSED SOLUTION) The conversion and fusion of triples to text is a novel approach, and the problem is far from being solved. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Although the authors introduce the approach at a high level, they don't provide any formal methods, algorithms or details on how they actually perform their experiments. For example, if existing software is being used, I would expect some details of the configuration used. If existing software is extended, I would expect details on what the necessary extensions were. Otherwise I cannot assess the correctness and completeness of the proposed solution. (EVALUATION OF THE STATE-OF-THE-ART) The authors seem to do a good job of covering the landscape of tools and methods. I am a little surprised, however, not to see any work on summarization based on neural networks, or on abstract meaning representations to cluster together events that could be referring to the same entities. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) ----after rebuttal----- The authors claim that they will do a better assessment of the system and include it in the camera-ready version. The question is, what will happen if the users think the system is not good? We should have seen the results of such an evaluation in order to properly assess its outcomes. 
----original review----- I see two main issues with the current evaluation: 1) There is no comparison against other state-of-the-art methods besides the baseline (which is the basis for this work). 2) The manual evaluation has only two users, which cannot be accepted as a significant evaluation. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) ----after rebuttal----- The authors have shared their code: https://gitlab.inf.unibz.it/rprasojo/summarization, but details of the evaluations are still missing. I have modified my score accordingly. ----original review---- None of the resources developed by the authors seem to be available online. The code is nowhere to be seen. I haven't been able to test the system myself, and the evaluation results are not available. Given that the paper gives no details on the parameters and hyperparameters used, I cannot reproduce the approach described by the authors. (OVERALL SCORE) ----after rebuttal----- I would like to thank the authors for answering my concerns. In particular, I am happy to see that the code will be included in the final version of the paper, and that a better user evaluation is planned. I will raise my score. ----original review----- This paper presents an approach for summarizing triples and transforming them into human-readable text. As I expressed in my points above, I think the paper is relevant for ESWC, well written and deals with an important and timely topic. Among its strong points, I would list the overall description of the system and the examples provided, which help in understanding the main steps. Weaknesses of the paper include the lack of detail in the approach, the lack of a pointer to the resources developed by the authors (only pointers to the reused software/datasets are provided) and the fact that the evaluation only compares the method against versions of itself, instead of the existing state of the art. 
In addition, the manual evaluation has too few users to be considered significant, and the authors don't really explain what the scales of correctness and coherence are. Addressing any or all of these weaknesses would make the contribution of this work very compelling.
Review 3 (by Christian Mader)
(RELEVANCE TO ESWC) The paper fits very well into the NLP track. (NOVELTY OF THE PROPOSED SOLUTION) The authors improve existing work with their own approaches for (for instance) triple similarity detection, entity linking, and graph fusion. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The approach is well evaluated (at least the automatic part). Manual evaluations by humans could have been done with a higher number of individuals actually judging the summary sentences (only 2 were used). (EVALUATION OF THE STATE-OF-THE-ART) State of the art and existing work are extensively covered, and their relation to and shortcomings in comparison with the proposed approach are discussed well. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Examples are provided and the approach is compared to the baseline approach. However, additional figures would have been helpful; I understand that within the page limit this is hard to achieve and that readers with an NLP background already understand the tools used (e.g., Stanford, WordNet). For some of the improvements to the baseline (e.g., entity linking and graph fusion) a more in-depth coverage would be helpful, e.g., in the form of pseudocode algorithms. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) I cannot find a link or reference in the paper that allows readers to perform their own experiments with the tool. This would be very helpful for further assessment. However, I do believe that the methodology of the evaluation study is sound and that the results also hold for input data other than the DUC datasets. (OVERALL SCORE) The paper reports on improvements to existing work regarding automated generation of summaries of news articles. The main contributions are two methods (node degree exploitation and selection of more suitable similarity measures) that improve the state of the art of generating these summaries. 
The approach is evaluated using the DUC'04 and DUC'07 datasets, with an existing approach as the baseline to compare against. The results of the proposed approach (the summary sentences) are both automatically and manually assessed with respect to their correctness and coherence. The results indicate that the baseline approach is improved in terms of precision and F1 measure in all cases. Strong Points (SPs): 1) Well structured and written 2) Extensive automatic evaluation 3) Good discussion of related work Weak Points (WPs): 1) Manual evaluation could have been more extensive (e.g., greater number of humans) 2) Unclear why the authors select the work of Li et al. in particular as baseline 3) More figures would have better illustrated the approach. I thank the authors for the clarifications and confirm my score.
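The reviews refer to unigram and bigram ROUGE scores and to precision and F1. For readers unfamiliar with the metric, the following is a simplified sketch of ROUGE-N against a single reference, using whitespace tokenization and clipped n-gram counts. It omits the stemming, stopword handling and multi-reference averaging of the official ROUGE toolkit, so its numbers are not comparable with those reported in the paper; the function names and example sentences are invented for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) of n-gram overlap with clipped counts."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical candidate and reference sentences.
candidate = "the earthquake hit the city"
reference = "the earthquake damaged the city centre"

p1, r1, f1 = rouge_n(candidate, reference, n=1)
p2, r2, f2 = rouge_n(candidate, reference, n=2)
print(round(p1, 3), round(r1, 3), round(f1, 3))
print(round(p2, 3), round(r2, 3), round(f2, 3))
```

In this toy case four of the five candidate unigrams appear in the reference, giving a unigram precision of 0.8, while only two of the four candidate bigrams match.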
Metareview by John McCrae
This paper is concerned with multi-document summarization of news events. The reviewers agree that this is an interesting area and that the paper is well written. The methodology is seen as generally novel; however, many of the details of this paper are unclear, and the authors making their code available greatly helps to resolve the issue of reproducibility. There are some concerns about the evaluation; however, the authors have compared to recent state-of-the-art and used manual evaluation, so it is likely sufficient for a conference paper. The authors recognize weaknesses in the evaluation and plan to extend it; they should also acknowledge the shortcomings raised by the reviewers in the camera-ready version. The reviews are broadly positive, so I think this paper should be accepted.