Our group has five papers accepted at CLiC-it 2019, the Sixth Italian Conference on Computational Linguistics, which will take place in Bari on 13-15 November 2019:

1. “Annotation and analysis of the PoliModal corpus of political interviews” authored by Daniela Trotta, Sara Tonelli, Alessio Palmero Aprosio and Annibale Elia


In this paper, we present the first available corpus of Italian political interviews with multimodal annotation, consisting of 56 face-to-face interviews taken from a political talk show. We detail the annotation scheme and present a number of statistical analyses to understand the relation between these multimodal traits and language complexity. We also exploit the corpus to test the validity of existing studies on political orientation and language use, showing that results on our data are not as clear-cut as those on English data.

2. “Automated Short Answer Grading: A Simple Solution for a Difficult Task” authored by Stefano Menini, Sara Tonelli, Giovanni De Gasperis and Pierpaolo Vittorini


The task of short answer grading is aimed at assessing the outcome of an exam by automatically analysing students’ answers in natural language and deciding whether they should pass or fail the exam. In this paper, we tackle this task by training an SVM classifier on real data taken from a university statistics exam, showing that simple concatenated sentence embeddings used as features yield results around 0.90 F1. We also release the dataset, which to our knowledge is the first freely available dataset of this kind in Italian.

3. “Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain” authored by Sara Tonelli, Rachele Sprugnoli and Giovanni Moretti


In this paper we present a multi-genre corpus spanning 50 years of European history. It contains a comprehensive collection of Alcide De Gasperi’s public documents, 2,762 in total, written or transcribed between 1901 and 1954. The corpus comprises different types of texts, including newspaper articles, propaganda documents, official letters and parliamentary speeches. The corpus is freely available and includes several annotation layers, i.e. key-concepts, lemmas, PoS tags, person names and geo-referenced places, representing a high-quality ‘silver’ annotation. We believe that this resource can foster research in historical corpus analysis, stylometry and computational social science, among others.

4. “Cross-Platform Evaluation for Italian Hate Speech Detection” authored by Michele Corazza, Stefano Menini, Elena Cabrio, Sara Tonelli and Serena Villata


Despite the number of approaches recently proposed in NLP for detecting abusive language on social networks, the issue of developing hate speech detection systems that are robust across different platforms is still an unsolved problem. In this paper we perform a comparative evaluation on datasets for hate speech detection in Italian, extracted from four different social media platforms, i.e. Facebook, Twitter, Instagram and WhatsApp. We show that combining such platform-dependent datasets to take advantage of training data developed for other platforms is beneficial, although their impact varies depending on the social network under consideration.

5. “BullyFrame: Cyberbullying meets FrameNet” authored by Silvia Brambilla, Alessio Palmero Aprosio and Stefano Menini


This paper presents BullyFrame, a dataset of cyberbullying interactions collected from WhatsApp conversations in Italian and annotated with FrameNet semantic frames. We describe the creation of the dataset and discuss the problematic aspects found in the annotation process, such as the lack of coverage of FrameNet for the annotation of texts extracted from social media. Finally, we present a preliminary study that describes the relations between the frames and the cyberbullying-related annotation of the original dataset.

We will also present our work “Novel Event Detection and Classification for Historical Texts” authored by Rachele Sprugnoli and Sara Tonelli and published this year in Computational Linguistics, as a research communication paper.