The paper “Italian Legislative Text Classification for Gazzetta Ufficiale”, authored by Marco Rovera, Alessio Palmero Aprosio, Francesco Greco, Mariano Lucchese, Sara Tonelli and Antonio Antetomaso, has been accepted for publication at the 5th Natural Legal Language Processing Workshop (NLLP 2023).
The paper is the outcome of a two-year collaboration between our group and the Istituto Poligrafico e Zecca dello Stato, aimed at developing an automated system for classifying Italian laws.
Abstract
This work introduces a novel, extensive annotated corpus for multi-label legislative text classification in Italian, based on legal acts from the Gazzetta Ufficiale, the official source of legislative information of the Italian state. The annotated dataset, which we released to the community, comprises over 363,000 titles of legislative acts, spanning over 30 years from 1988 until 2022. Moreover, we evaluate four models for text classification on the dataset, demonstrating how using only the acts’ titles can achieve top-level classification performance, with a micro F1-score of 0.87. Also, our analysis shows how Italian domain-adapted legal models do not outperform general-purpose models on the task. Models’ performance can be checked by users via a demonstrator system provided in support of this work.
GitHub page: https://github.com/dhfbk/gazzetta-ufficiale
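For readers unfamiliar with the micro F1-score reported in the abstract, the following is a minimal sketch of how a micro-averaged F1 can be computed for multi-label classification. The function name `micro_f1` and the example labels are invented for illustration; they are not taken from the dataset, and the evaluation code used in the paper may differ.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 for multi-label classification.

    gold and predicted are lists of label sets, one set per document.
    Counts are pooled over all documents and labels before computing F1.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        tp += len(gold_labels & predicted_labels)  # labels correctly assigned
        fp += len(predicted_labels - gold_labels)  # labels wrongly assigned
        fn += len(gold_labels - predicted_labels)  # labels that were missed

    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical labels for three act titles (illustration only).
gold = [{"tax", "finance"}, {"health"}, {"education", "labour"}]
predicted = [{"tax"}, {"health", "finance"}, {"education", "labour"}]

score = micro_f1(gold, predicted)
print(round(score, 2))
```

Because the counts are pooled across all labels, frequent labels weigh more heavily in the micro average than rare ones.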