WhatsApp Dataset on Cyberbullying

We developed a WhatsApp dataset to study cyberbullying among Italian students aged 12-13 in the context of the CREEP EIT project.

The corpus of WhatsApp chats consists of 14,600 tokens across 10 chats. Each chat was annotated by two annotators, who followed the same guidelines and used the web-based CAT tool.

Our guidelines adapt to Italian the Guidelines for the Fine-Grained Analysis of Cyberbullying, developed for English by the Language and Translation Technology Team of Ghent University. Compared with the original guidelines, we added a new type of insult called “Body Shame” to cover expressions that criticize someone based on the shape, size, or appearance of his/her body. We also changed the original type Encouragement to the Harasser into Encouragement to the Harassment, so as to include all incitements exchanged between the bully and his/her assistants.

The dataset and the guidelines are available on our GitHub page: https://github.com/dhfbk/WhatsApp-Dataset

Reference:

Rachele Sprugnoli, Stefano Menini, Sara Tonelli, Filippo Oncini, Enrico Maria Piras. 2018. “Creating a WhatsApp Dataset to Study Pre-teen Cyberbullying”. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2).
