Three papers have been accepted at the 9th Language Resources and Evaluation Conference, 26-31 May 2014, Reykjavik, Iceland.
- “CROMER: A Tool for Cross-Document Event and Entity Coreference” by Christian Girardi, Manuela Speranza, Rachele Sprugnoli and Sara Tonelli has been accepted as a poster and demo presentation
In this paper we present CROMER (CROss-document Main Events and entities Recognition), a novel web-based tool to manually annotate event and entity coreference across clusters of documents. The tool has been developed so as to handle large collections of documents, perform collaborative annotation (several annotators can work on the same clusters), and enable the linking of the annotated data to external knowledge sources. Given the availability of semantic information encoded in Semantic Web resources, this tool is designed to support annotators in linking entities and events to DBpedia and Wikipedia, so as to facilitate the automatic retrieval of additional semantic information. In this way, event modelling and chaining are made easy, while guaranteeing the highest interconnection with external resources. For example, the tool can be easily linked to event models such as the Simple Event Model [Van Hage et al., 2011] and the Grounded Annotation Framework [Fokkens et al., 2013].
- “Crowdsourcing for the identification of event nominals: an experiment” by Rachele Sprugnoli and Alessandro Lenci has been accepted as an oral presentation
This paper describes the design process and presents the results of a crowdsourcing experiment on the recognition of Italian event nominals. The research question that inspired this experiment is: can the annotation of event nominals in Italian texts be assigned to non-experts through crowdsourcing, as a promising alternative to employing well-trained annotators? In the experiment, the gold standard quality assurance mechanism of CrowdFlower was adopted, multiple judgments were requested from different workers, and the results were compared with expert annotation on the same task. The final results demonstrate that crowdsourcing is not always optimal or trivial for complex linguistic tasks; on the other hand, the use of non-expert contributors made it possible to evaluate how intuitive the recognition of event nominals is, which classes of polysemy are the most ambiguous, and which syntagmatic cues are the most useful for identifying the eventive reading of nominals.
- “A SICK cure for the evaluation of compositional distributional semantic models” by Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi and Roberto Zamparelli has been accepted as a poster
Recently, several compositional extensions of distributional semantic models (DSMs), known as Compositional DSMs or CDSMs, have been proposed, with the purpose of representing the meaning of phrases and sentences by composing the distributional representations of the words they contain. SICK (Sentences Involving Compositional Knowledge) is a data set of sentence pairs that are rich in the lexical, syntactic and semantic phenomena that CDSMs are expected to account for, but do not require dealing with other aspects of existing sentential data sets (multiword expressions, named entities, telegraphic language) that are not within the domain of compositional distributional semantics. The SICK data set consists of 10,000 English sentence pairs, each annotated for relatedness in meaning and for the entailment relation between the two elements of the pair. The sentences were built starting from a subset of two existing sets: the 8K ImageFlickr data set (http://nlp.cs.illinois.edu/HockenmaierGroup/data.html) and the SEMEVAL-2012 STS Video Descriptions data set (http://www.cs.york.ac.uk/semeval2012/task6/index.php?id=data).