The master’s thesis by Marta Sandri, “Offensive language detection: Analysis and Prediction of Inter-annotator Disagreement”, received an honorable mention for the Emanuele Pianta best thesis prize at the Ninth Italian Conference on Computational Linguistics. Marta’s work was partially carried out in our group under the supervision of Elisa Leonardelli and Sara Tonelli. Her thesis was defended at the University of Pavia, Italy, with Prof. Elisabetta Jezek as main advisor.

Thesis summary

The project studies inter-annotator disagreement on a dataset of English tweets, each labelled by 5 annotators as either “offensive” (1) or “not offensive” (0). Depending on how many annotators agree on the majority label, each tweet falls into one of 3 categories: A++ if all 5 annotators agree, A+ if 4 out of 5 agree, and A0 if only 3 out of 5 agree. We created a linguistic taxonomy of classes and sub-classes describing possible sources of disagreement, such as figurative language (Ambiguity macro class), deixis (Missing Information macro class), annotators’ personal opinions (Subjectivity macro class) and noise (Sloppy Annotation macro class), and we used it to annotate the tweets belonging to the most difficult categories, namely A+ and A0.

A computational analysis was then carried out on the annotated data using neural classifiers based on BERT, in order to evaluate how data annotated with their source of disagreement affect classifier performance on an Offensive Language Detection task. The best configuration is trained on the complete dataset and tested on tweets whose disagreement stems from annotators’ personal opinions: among all sources of disagreement, the Subjectivity macro class yields the highest classifier performance. Exploratory experiments were also carried out on a Disagreement Detection task. Here, the best system learns from information about both disagreement and offensiveness, and it likewise reaches its best results when tested on the Subjectivity macro class.
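The agreement categories follow directly from counting votes. With five annotators and a binary label, the majority always has three, four or five votes, so the size of the majority determines the category. The following is a minimal Python sketch of that mapping, not the code used in the thesis. The function name `agreement_category` is hypothetical.

```python
from collections import Counter


def agreement_category(labels: list[int]) -> str:
    """Map five binary labels (1 = offensive, 0 = not offensive) to an
    agreement category (hypothetical helper, for illustration only)."""
    if len(labels) != 5 or any(label not in (0, 1) for label in labels):
        raise ValueError("expected exactly five labels, each 0 or 1")
    # With 5 annotators and 2 possible labels, the majority has 3, 4 or 5 votes.
    majority_size = max(Counter(labels).values())
    return {5: "A++", 4: "A+", 3: "A0"}[majority_size]


if __name__ == "__main__":
    print(agreement_category([1, 0, 1, 1, 1]))  # A+
```

Under this mapping, only tweets in A+ and A0 contain at least one dissenting annotator. These are the tweets the summary describes as annotated with the taxonomy.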

Part of this work was published at EACL 2023 in the paper “Why don’t you do it right? Analysing Annotators’ Disagreement in Subjective Tasks”.