Our group has two accepted papers at CLiC-it 2021!
Title: “It is MarkIT That is New: An Italian Treebank of Marked Constructions”, authored by Teresa Paccosi, Alessio Palmero Aprosio and Sara Tonelli.
In this paper we present MarkIT, a treebank of marked constructions in Italian, containing around 800 sentences with dependency annotation. We detail the process to extract the sentences and manually correct them. The resource covers seven types of marked constructions, plus some ambiguous sentences whose syntax can be wrongly classified as marked. We also present a preliminary evaluation of parsing performance, comparing a model trained on existing Italian treebanks with the model obtained by adding MarkIT to the training set.
Title: “REDIT: a Tool and Dataset for Extraction of Personal Data in Documents of the Public Administration Domain”, authored by Teresa Paccosi and Alessio Palmero Aprosio.
New regulations on transparency and recent privacy policy require the public administration (PA) to make its documents available, but also to limit the diffusion of personal data.
The present work offers a first approach to extracting sensitive data from PA documents, in terms of named entities and the semantic relations among them. This speeds up the extraction of personal data and makes it easier to select the items that need to be hidden.
We also present the process of collection and annotation of the dataset.