The paper “The Content Types Dataset: a New Resource to Explore Semantic and Functional Characteristics of Texts”, authored by Rachele Sprugnoli, Tommaso Caselli, Sara Tonelli and Giovanni Moretti, has been accepted as a short paper at the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017).

Abstract:

This paper presents a new resource, called the Content Types Dataset, to promote the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset, we also introduce a new NLP task for the automatic classification of Content Types. The annotation scheme and the dataset, available online, are described together with two sets of classification experiments.