Two papers from our group have been accepted at the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023), which will take place from November 30th to December 2nd, 2023. See you in Venice!
G. Valer, A. Ramponi and S. Tonelli: When You Doubt, Abstain: A Study of Automated Fact-Checking in Italian Under Domain Shift
Abstract: Data for building fact-checking models for Italian is scarce, often contains ambiguous claims, and lacks textual diversity. This makes it hard to reliably apply such tools in the real world to support fact-checkers’ work. In this paper, we propose a categorization of claim ambiguity and label the largest Italian test set based on it. Moreover, we create challenge sets across two axes of variation: genres and fact-checking sources. Our experiments using transformer-based semantic search show a large drop in performance under domain shift, and indicate the benefit of models’ abstention in case of lacking evidence.
V. Frasnelli and A. Palmero Aprosio: A preliminary release of the Italian Parliamentary Corpus
Abstract: Political debates have been used for years in political and social studies on languages and their cultures. In this paper, we release a preliminary version of the Italian Parliamentary Corpus, a dataset containing 1.2 billion words that includes the political debates in the Italian Parliament from 1848 to 2018. The data has been collected by applying Optical Character Recognition (OCR) software to the original documents, available in PDF format on the websites of Camera dei Deputati and Senato della Repubblica.