Paper 76 (Research track)

Classifying Crisis-information Relevancy with Semantics

Author(s): Prashant Khare, Gregoire Burel, Harith Alani

Full text: submitted version · camera-ready version

Decision: accept

Abstract: Social media platforms have become key portals for sharing and consuming information during crisis situations. However, humanitarian organisations and affected communities often struggle to sift through the large volumes of data typically shared on such platforms during crises to determine which posts are truly relevant to the crisis and which are not. Previous work on automatically classifying crisis information has mostly focused on statistical features. However, such approaches tend to perform poorly when processing data on a type of crisis the model was not trained on, for example when classifying information about a train crash with a classifier trained on floods, earthquakes, and typhoons. In such cases, the model needs to be retrained, which is costly and time-consuming.
In this paper, we explore the impact of semantics in classifying Twitter posts across the same, and different, types of crises. We experiment with 26 crisis events, using a hybrid system that combines statistical features with various semantic features extracted from external knowledge bases. We show that adding semantic features brings no noticeable benefit over statistical features when classifying same-type crises, whereas it improves classifier performance by up to 7.2% when classifying information about a new type of crisis.
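As a rough illustration of the hybrid pipeline the abstract describes (the paper's actual implementation is not reproduced on this page; the tweets, labels, and entity annotations below are hypothetical), a minimal sketch in Python with scikit-learn:

    # Minimal sketch of a hybrid statistical + semantic set-up (an assumed
    # design, not the authors' code): entity labels, e.g. from Babelfy or
    # DBpedia, are appended to the raw tweet text so a single vectoriser
    # covers both statistical and semantic tokens.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    tweets = [
        "Train derails outside the city, several injured",  # relevant
        "Loving this sunny weather today",                  # not relevant
    ]
    labels = [1, 0]

    # Hypothetical entity annotations per tweet.
    entities = [["Derailment", "Accident"], []]
    enriched = [t + " " + " ".join(e) for t, e in zip(tweets, entities)]

    model = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("svm", LinearSVC()),
    ])
    model.fit(enriched, labels)
    print(model.predict(["Flooding reported near the station"]))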

Keywords: semantics; crisis informatics; tweet classification

 

Review 1 (by anonymous reviewer)

 

(RELEVANCE TO ESWC) This paper is highly relevant to the conference as it clearly outlines the advantages of a combination of semantic web technologies and statistical machine learning. Furthermore, it showcases the multilinguality aspects of the semantic web.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed combination of features and evaluation are novel.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The paper seems complete and correct to me. I would like to have answers in the final paper to the questions raised below. Furthermore, I wonder whether there is a complete list of classifiers tested in this approach.
(EVALUATION OF THE STATE-OF-THE-ART) There is no comparison with other approaches on this dataset. However, the authors tried a combination of different methods and ML algorithms.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors highlighted the advantages and disadvantages of their approach as well as discussed future improvement possibilities.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) There is no source code for this approach available but the dataset is a well-known public entity.
(OVERALL SCORE) The overall impression of this paper is very good. It is easy to read and understand. The content is extremely exciting and I have not found similar research as of now. Also, it is interesting to see that the authors strive for multilinguality in their approach. The weak points are that the source code is not available and that there is no evaluation against approaches by other researchers.
General Questions:
1) - How large was the standard deviation of the classifier performance when it was improved by up to 7.2 %?
2) - Citation style: Please check that there is always a whitespace before the citations. Force it with ~\cite{}.
Section 1: 
Page 1: The URL in the footnote is long.
Section 2:
The related work seems exhaustive. 
Section 3:
- I wonder whether the statistical features introduced in 3.2 are actually indicators of anything. Here it would be interesting to see the actual distributions over the labeled data. I cannot believe that the first 6 items (forming 6 features) are distinguishing features when the 7th feature has 10.7k components.
- Please use the citation of Babelfy instead of the footnote. 
- The footnote placement after commas is rather odd.
- For the semantic feature vector: the "Babelfy Entities" feature, for example, seems to be the size of the KB? Could you please indicate the size of each feature in the final version? (One plausible encoding is sketched below.)
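One plausible reading of the feature-size question, sketched under the assumption of a bag-of-entities encoding (the paper's actual encoding is not given on this page): the vector dimension would track the distinct entities observed in the corpus, not the size of the whole knowledge base.

    # Assumed bag-of-entities encoding (hypothetical, not the paper's
    # documented method): the vector length equals the number of distinct
    # entities seen in the corpus, not the size of the KB itself.
    from sklearn.feature_extraction import DictVectorizer

    # Made-up entity annotations per tweet.
    tweet_entities = [
        {"Earthquake": 1, "Nepal": 1},
        {"Flood": 2},
        {"Earthquake": 1, "Typhoon": 1},
    ]

    vec = DictVectorizer(sparse=False)
    X = vec.fit_transform(tweet_entities)
    print(vec.get_feature_names_out())  # 4 distinct entities -> 4 dimensions
    print(X.shape)                      # (3, 4)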
Section 4:
Footnote 12 seems to be floating around.
Section 4.2: I wonder why dbo:location and dbo:place are among the concepts. This might point to a bias in the training corpus, as it seems that the ML algorithms have learned that particular places are linked to particular classes, doesn't it?
The same suspicion goes for Table 5 and the No-of-hashtags feature. Here it would be interesting to see whether the distribution of the number of hashtags is similar across the training classes (a quick check is sketched below).
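The suggested sanity check is easy to run; a minimal sketch assuming the labelled tweets sit in a pandas DataFrame with hypothetical 'text' and 'label' columns:

    # Compare the hashtag-count distribution across the training classes
    # (data and column names are hypothetical).
    import pandas as pd

    df = pd.DataFrame({
        "text": ["#flood in town", "nice day", "#quake #nepal help", "hello"],
        "label": ["relevant", "not_relevant", "relevant", "not_relevant"],
    })
    df["n_hashtags"] = df["text"].str.count(r"#\w+")
    print(df.groupby("label")["n_hashtags"].describe())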
#post-rebuttal update
To the authors,
my questions were answered and I am happy to see that the authors will add source code and datasets to enable repeatability.  Thank you and good luck!

 

Review 2 (by anonymous reviewer)

 

(RELEVANCE TO ESWC) Generally speaking, the paper is about text-based classification of tweets related to some disastrous natural events. The goal is to discriminate between relevant and non-relevant information for some defined categories of events. The paper compares the case of learning classifiers from tweets alone against tweets extended with some external source (here DBpedia and BabelNet).
Overall, the relatedness to ESWC is weak. I would consider the paper more related to conferences of the Information Retrieval/Machine Learning (IR/ML) community, such as SIGIR.
(NOVELTY OF THE PROPOSED SOLUTION) From a scientific point of view, the contribution is rather weak, as the work is essentially an application of existing IR/ML technology. It would have been more interesting if, e.g., an accompanying formal event ontology had been built to be used during the learning process. Right now, from an IR/ML perspective, the contribution is at the "poster" level.
Besides, the work is an extension of [12] (a workshop paper).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Basically weak: why SVM? I wonder, e.g., why hierarchical classifier induction was not used. It is also not clear whether a classifier was induced separately for each category or whether some multi-class classifier induction method was applied, and why nothing from the convolutional neural network realm was tried (the two set-ups in question are sketched below).
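For reference, the two set-ups the reviewer asks about look like this in scikit-learn (a sketch with made-up data, not the authors' code):

    # (a) one binary SVM per category via an explicit one-vs-rest wrapper,
    # versus (b) a single multi-class SVM. Texts and labels are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC

    texts = ["flood waters rising", "train derailed", "quake hits city"]
    categories = ["flood", "crash", "earthquake"]
    X = TfidfVectorizer().fit_transform(texts)

    # (a) a separate binary classifier per category
    per_category = OneVsRestClassifier(LinearSVC()).fit(X, categories)
    # (b) a single multi-class classifier (LinearSVC handles multi-class
    # natively, internally also via a one-vs-rest strategy)
    multiclass = LinearSVC().fit(X, categories)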
(EVALUATION OF THE STATE-OF-THE-ART) The paper compares reasonably with existing work, though it is basically an extension of [12] (a workshop paper), so this relationship needs better clarification.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The proposed approach has been described reasonably clearly.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) While the starting dataset (btw, the same as in [12]) is publicly available, neither the enriched dataset used here nor the code is publicly available. The enriched dataset does not seem replicable from the paper alone. As the paper's results depend on it, the results do not seem replicable.
(OVERALL SCORE) **Short description of the problem tackled in the paper, main contributions, and results** 
Roughly, the paper is about text-based classification of tweets related to some disastrous natural events. The goal is to discriminate between relevant and non-relevant information for some defined categories of events. The paper compares the case of learning classifiers from tweets alone against tweets extended with some external source.
Strong Points (SPs):
The paper shows, using the specific classifier induction method considered here, that adding features from an external source may be of some help.
Weak Points (WPs):
- From an IR/ML perspective, the results seem neither particularly surprising nor generalisable.
- It would have been more interesting if an accompanying formal event ontology had been built to be used during the learning process.
- It is not clear why hierarchical classifier induction was not used.
- It is also not clear whether a classifier was induced separately for each category or whether some multi-class classifier induction method was applied, and why nothing from the convolutional neural network realm was tried.
- The enriched dataset on which the method is based does not seem replicable, nor is it available (the code is not available either), which makes the proposal hard to replicate.
Questions to the Authors (QAs) :
- Please address the weak points above.

 

Review 3 (by Payam Barnaghi)

 

(RELEVANCE TO ESWC) The work presents a semantic approach to extract and analyse crisis-related information from Twitter. The authors have described their approach to extracting the semantic information and have discussed their classification method, which uses SVM.
(NOVELTY OF THE PROPOSED SOLUTION) This is an interesting and timely work and presents a good approach for extracting semantic data from twitter.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed work follows a systematic approach and is implemented and evaluated using existing datasets.
(EVALUATION OF THE STATE-OF-THE-ART) The related work and background information are well described. The evaluation results are well presented. It would have been beneficial if the authors had compared their work against other classification methods and, where applicable, against other existing solutions (there are several other solutions for crisis information extraction, e.g. Twitris) or other similar work on information from Twitter.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The work could also be tested on some other datasets and with different classification techniques. It is good to see that the work has used cross-validation in the evaluation of the results.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The authors are encouraged to make their algorithm/system available online.
(OVERALL SCORE) This is an interesting and timely work. The evaluations and discussions are sufficient.

 

Review 4 (by Edna Ruckhaus)

 

(RELEVANCE TO ESWC) Adding semantic features to the classification of twitter posts in crisis situations is relevant to ESWC research track.
(NOVELTY OF THE PROPOSED SOLUTION) The approach of integrating statistical and semantic features into classification models for twitter posts is novel.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The solution presented is clearly explained. However, the description should include a discussion and evaluation of the cost of adding semantic features to the classification.
(EVALUATION OF THE STATE-OF-THE-ART) The state of the art emphasizes the contribution of this work with respect to related work. The related work covers all important aspects of the problem.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Properties of the proposed approach are discussed. However, in the discussion of the specific important result obtained in the experimental study, it is not clear which properties of the approach hinder the other expected results.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experimental study is general, but it is not clearly reproducible, i.e. there are no links to a repository of the training and evaluation data.
(OVERALL SCORE) This work presents a hybrid system that uses statistical and semantic features for the classification of crisis-related Twitter posts. The main results indicate that adding semantic features to classifiers improves performance, especially when classifying new types of crisis that were not included in the training data.
(SP1) Adding semantic features to the classification of Twitter posts is relevant to the ESWC research track and is an important contribution to sorting the large amount of information generated through Twitter posts during crisis situations.
(SP2) The paper is well written and easy to follow. It is very complete in presenting the contribution of the approach and in emphasizing this contribution in relation to existing work. Also, in general, the description of the approach is understandable.
(WP1) Properties of the proposed approach are discussed. However, in the analysis of the specific important result obtained in the experimental study, i.e. the improvement when "new" types of crisis (with respect to the training data) are classified, it is not clear which properties of the approach hinder the other expected results, i.e. a significant improvement also when the crisis types already exist in the training data.
(WP2) It is not clear how the experiment may be reproduced; the data for training and evaluation does not seem to be available.
(WP3) The description should include a discussion and evaluation of the cost of adding semantic features to the classification. Timeliness of the retrieval of relevant crisis-related information is critical.
(WP4) Some minor typos:
- In the abstract "effected" should be "affected".
- In the description of statistical features, section 3.2, "weight" should be "weigh".
- In section 3.2, semantic features, you should state the unit when expanding the vocabulary, e.g. 3057 what?
- Not sure when it is appropriate to use "crises" or "crisis", should check this.
I acknowledge that the authors are addressing the weak points of the paper that were indicated: ensuring that the experiment is reproducible, and adding the discussion of the cost of adding semantic features.

 

Metareview by Stefan Dietze

 

The authors introduce an approach to classify crisis-related information in social media using a combination of statistical and semantic features (e.g. obtained from background knowledge graphs). Reviewers agree that this is a sound and well-presented submission which contributes to the state of the art. We encourage the authors to take the comments from all reviewers carefully into account when preparing the camera-ready version.

 


One thought on “Paper 76 (Research track)”

  1. Thank you for this paper that I found very interesting!
    I just wanted to let the authors know about the following paper, which is very relevant to their work:

    Stefano Cresci, Maurizio Tesconi, Andrea Cimino, Felice Dell’Orletta. “A Linguistically-driven Approach to Cross-Event Damage Assessment of Natural Disasters from Social Media Messages”. In Proceedings of the 24th international conference companion on World Wide Web. ACM, 2015.
