Valerio Basile (University of Turin) will give a talk titled “It’s the End of the Gold Standard as we Know it”.
Supervised machine learning, in particular in Natural Language Processing, is based on the creation of so-called “gold standard” datasets for training and benchmarking. The usual annotation methodologies work well for traditionally relevant tasks in Computational Linguistics. However, critical issues surface when these established techniques are applied to the study of highly subjective phenomena such as irony and sarcasm, or abusive and offensive language.
I am calling for a paradigm shift, away from monolithic, majority-aggregated gold standards, and towards an inclusive framework that preserves the personal opinions and culturally driven perspectives of the annotators. New training sets and supervised machine learning techniques will have to be adapted in order to create fair, inclusive, and ultimately more informed models of subjective semantic and pragmatic phenomena.
My arguments are backed by experiments showing the lack of correlation between the difficulty of an annotation task, its degree of subjectivity, and the quality of the predictions. Furthermore, I show the potential of the proposed framework for explaining the decisions of a supervised NLP classifier.
Valerio Basile is an Assistant Professor in the Content-centered Computing group at the University of Turin, and a member of the Hate Speech Monitoring group. He received his PhD in 2015 from the University of Groningen with a thesis on natural language generation. His work spans several areas, including formal representations of meaning, linguistic annotation, natural language generation, commonsense knowledge, semantic parsing, sentiment analysis, and hate speech detection. More recently, his work has focused on perspectives and bias in supervised machine learning, from data creation to system evaluation.