Linguistic resource

Causal-TimeBank corpus

by Sara Tonelli | Feb 19, 2021 | Linguistic resource

Starting from the TimeBank corpus, where events and their temporal relations are annotated, we manually added causal information …

Corpus of Alcide De Gasperi’s public documents

by Sara Tonelli | Jun 25, 2019 | Linguistic resource

The corpus of Alcide De Gasperi’s public documents is a comprehensive collection of documents issued between 1901 and 1954, which had previously been published in four volumes by Il Mulino but were …

WhatsApp Dataset on Cyberbullying

by Rachele Sprugnoli | Sep 3, 2018 | Linguistic resource

We developed a WhatsApp dataset to study cyberbullying among Italian students aged 12-13 in the context of the CREEP EIT project. The corpus of WhatsApp chats consists of 14,600 tokens divided …

Detection of place names in historical travel writings

by Rachele Sprugnoli | Jul 23, 2018 | Linguistic resource

We manually annotated a corpus of 100,000 tokens taken from a collection of English travel writings (both travel reports and guidebooks) about Italy published in the second half of the 19th century and …

Corpus of English Historical Travel Writings

by Rachele Sprugnoli | May 22, 2018 | Linguistic resource

WHAT: a collection of travel writings – non-fictional narratives (reports, diaries, letters) and guidebooks – about Italy written by native English-speaking authors and published between the country …

Histo: event detection and classification for the Digital Humanities

by Rachele Sprugnoli | Apr 5, 2018 | Linguistic resource

We have created a GitHub repository that contains: annotation guidelines designed to detect and classify event mentions in texts; a corpus of historical texts annotated with events (span + class) …

Political Argumentation

by Stefano Menini | Nov 15, 2017 | Linguistic resource

This resource contains two datasets. Each dataset consists of pairs of arguments from Nixon’s and Kennedy’s speeches related to a topic and annotated with a relation of “attack”, …

Code-mixing

by Rachele Sprugnoli | Jul 25, 2017 | Linguistic resource

The present resource is about the automatic identification of English-Italian code-mixing in English historical travel writings about Italy. We release: the domain corpus made of travel narratives and …

Content Types Dataset

by Rachele Sprugnoli | Feb 9, 2017 | Linguistic resource

The Content Types Dataset is a new resource aimed at promoting the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset we also …

SIMPITIKI Italian simplification corpus

by Sara Tonelli | Nov 9, 2016 | Linguistic resource

SIMPITIKI is a simplification corpus for Italian consisting of two sets of simplified pairs: the first one is harvested from the Italian Wikipedia in a semi-automatic way; the second one is …

Agreement/Disagreement Datasets

by Stefano Menini | Oct 7, 2016 | Linguistic resource

This resource includes three datasets. Each dataset consists of pairs of snippets related to a topic and annotated as in agreement or disagreement. The three datasets are: 1960 Elections Dataset: A …

QUANDHO: QUestion ANswering Data for italian HistOry

by Rachele Sprugnoli | Mar 7, 2016 | Linguistic resource

QUANDHO (QUestion ANswering Data for italian HistOry) is an Italian question answering dataset created to cover a specific domain, i.e. the history of Italy in the first half of the XX …
