The focus of this task is entirely on subjective tasks, where training with aggregated labels makes much less sense. To this end, the task organisers created a benchmark of four textual datasets that differ in genre (social media and conversations), language (English and Arabic), task (misogyny, hate speech, and offensiveness detection) and annotation methodology (experts, specific demographic groups, AMT-crowd). All four datasets, however, provide multiple labels for each instance.
DH Group at SemEval 2023
Thursday, 13 July 2023 9:00 to Friday, 14 July 2023 18:00