Author(s): Marco Avvenuti, Stefano Cresci, Leonardo Nizzoli, Maurizio Tesconi
Abstract: Recently, user-generated content in social media opened up new alluring possibilities for understanding the geospatial aspects of many real-world phenomena. Yet, the vast majority of such content lacks explicit, structured geographic information. Here, we describe the design and implementation of a novel approach for associating geographic information to text documents. GSP exploits powerful machine learning algorithms on top of the rich, interconnected Linked Data in order to overcome limitations of previous state-of-the-art approaches. In detail, our technique performs semantic annotation to identify relevant tokens in the input document, traverses a sub-graph of Linked Data for extracting possible geographic information related to the identified tokens, and optimizes its results by means of a Support Vector Machine classifier. We compare our results with those of 4 state-of-the-art techniques and baselines, on ground-truth data from 2 evaluation datasets. Our GSP technique achieves excellent performances, with the best F1 = 0.91, sensibly outperforming benchmarked techniques that achieve F1 < 0.78.
Keywords: Geoparsing; machine learning; linked data; Twitter