The paper “Semantic Linking for Event-Based Classification of Tweets”, by Amosse Edouard, Elena Cabrio, Sara Tonelli and Nhan Le Thanh, has been accepted for presentation at the 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2017).
Detecting which tweets are related to events and classifying them into categories is a challenging task, owing to the peculiarities of Twitter language and the lack of contextual information. We propose to address this challenge by taking advantage of information that can be automatically acquired from external knowledge bases. In particular, we enrich and generalise the textual content of tweets by linking Named Entities (NE) to concepts in both the DBpedia and YAGO ontologies, and we exploit their specific or generic types to replace NE mentions in tweets. We apply this approach to build a supervised classifier that separates event-related from non-event-related tweets and assigns event-related tweets to the event categories defined by the Topic Detection and Tracking (TDT) community. We compare Naive Bayes (NB), Support Vector Machines (SVM) and Long Short-Term Memory (LSTM) classification algorithms, showing that NE linking and replacement improve classification performance and help reduce overfitting, especially with Recurrent Neural Networks (RNN).
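
To make the NE replacement step concrete, here is a minimal Python sketch. It is not the authors' implementation: the `ENTITY_TYPES` table, the `replace_entities` function and the type labels are hypothetical stand-ins for the output of an entity linker that resolves mentions against DBpedia or YAGO. It only illustrates how a mention could be swapped for a specific or a generic type before the text reaches a classifier.

```python
import re

# Hypothetical lookup table standing in for an entity linker.
# In the approach described above, the types would come from DBpedia/YAGO;
# the labels here are illustrative assumptions only.
ENTITY_TYPES = {
    "Barack Obama": {"specific": "Politician", "generic": "Person"},
    "Paris": {"specific": "City", "generic": "Place"},
}


def replace_entities(tweet, level="generic", table=ENTITY_TYPES):
    """Replace each known NE mention in `tweet` with its type label.

    `level` selects which type to use: "specific" or "generic".
    """
    if not table:
        return tweet
    # Try longer mentions first so multi-word names win over shorter overlaps.
    mentions = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(m) for m in mentions) + r")\b",
        re.IGNORECASE,
    )
    # Case-insensitive lookup from mention text to the chosen type label.
    lookup = {m.lower(): types[level] for m, types in table.items()}
    # A single pass avoids re-replacing text that was itself a replacement.
    return pattern.sub(lambda match: lookup[match.group(0).lower()], tweet)


if __name__ == "__main__":
    tweet = "Barack Obama lands in Paris today"
    print(replace_entities(tweet, level="specific"))
    print(replace_entities(tweet, level="generic"))
```

With generic types, tweets that mention different people or places collapse to the same tokens, which is one way to picture how the replacement generalises the text.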