Two papers from our group have been accepted at the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023), which will take place from November 30th to December 2nd, 2023. See you in Venice!

G. Valer, A. Ramponi and S. Tonelli: When You Doubt, Abstain: A Study of Automated Fact-Checking in Italian Under Domain Shift

Abstract: Data for building fact-checking models for Italian is scarce, often contains ambiguous claims, and lacks textual diversity. This makes it hard to reliably apply such tools in the real world to support fact-checkers’ work. In this paper, we propose a categorization of claim ambiguity and label the largest Italian test set based on it. Moreover, we create challenge sets across two axes of variation: genres and fact-checking sources. Our experiments using transformer-based semantic search show a large drop in performance under domain shift, and indicate the benefit of models’ abstention in case of lacking evidence.

V. Frasnelli and A. Palmero Aprosio: A preliminary release of the Italian Parliamentary Corpus

Abstract: Political debates have been used for years in political and social studies on languages and their cultures. In this paper, we release a preliminary version of the Italian Parliamentary Corpus, a dataset containing 1.2 billion words that includes the political debates in the Italian Parliament from 1848 to 2018. The data has been collected by applying Optical Character Recognition (OCR) software to the original documents, available in PDF format on the websites of Camera dei Deputati and Senato della Repubblica.
