Six papers have been accepted at LREC 2016 – the 10th Language Resources and Evaluation Conference, 23-28 May 2016, Portorož (Slovenia):

  • “Exposing Predicate Models as Linked Data by Extending the Lemon Model” by Francesco Corcoglioniti, Alessio Palmero Aprosio, Marco Rospocher, and Sara Tonelli.

Abstract: 

We present PREMON (predicate model for ontologies), a linguistic resource that represents predicate models according to the lemon model by the W3C Ontology-Lexica Community Group. PREMON consists of an ontology that extends lemon for modelling semantic frames, their arguments and the alignment between them, and a set of downloadable RDF datasets (work in progress) with frame and alignment data from PropBank, NomBank, VerbNet, FrameNet, SemLink, and the Predicate Matrix, publicly exposed as Linked Open Data.

Full paper: http://www.lrec-conf.org/proceedings/lrec2016/pdf/854_Paper.pdf

  • “Temporal Information Annotation: Crowd vs. Experts” by Tommaso Caselli, Rachele Sprugnoli and Oana Inel.

Abstract: 

This paper reports on a set of ongoing crowdsourcing experiments on temporal information annotation. These experiments aim at comparing the results of experts with those of the crowd in the annotation of temporal expressions, events, and temporal relations. The goal is to gain better insights into Temporal Processing by comparing expert and crowd data. The results could suggest changes in the annotation procedures and a re-design of the overall task to better reflect how people perceive and process complex linguistic phenomena such as references to time, events, and temporal relations.

Full paper: http://www.lrec-conf.org/proceedings/lrec2016/pdf/966_Paper.pdf
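As a side note, comparisons between expert and crowd annotations of this kind are typically quantified with a chance-corrected agreement coefficient. The snippet below is a minimal, self-contained sketch of Cohen's kappa on toy labels (the label set and data are illustrative, not taken from the paper):

```python
from collections import Counter

def cohens_kappa(expert, crowd):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(expert) == len(crowd) and expert
    n = len(expert)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(e == c for e, c in zip(expert, crowd)) / n
    # Expected agreement if the two annotators labelled independently.
    freq_e, freq_c = Counter(expert), Counter(crowd)
    p_e = sum(freq_e[label] * freq_c[label] for label in freq_e) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy annotations for five mentions: EVENT vs. TIMEX (temporal expression).
expert = ["EVENT", "TIMEX", "EVENT", "EVENT", "TIMEX"]
crowd  = ["EVENT", "TIMEX", "TIMEX", "EVENT", "TIMEX"]
print(round(cohens_kappa(expert, crowd), 2))  # → 0.62
```

Kappa of 1 means perfect agreement, 0 means chance-level agreement, so it gives a more honest picture than raw percent agreement when label distributions are skewed.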

Abstract: 

In this paper we present the evaluation of two Italian Question Answering systems on a specific domain, i.e. the Italian history of the first half of the XX century, and we report on the retraining of one of these systems to improve its precision and make it an effective tool to be integrated in a real application. In addition, we describe the question set and the dataset of question-answer pairs used to retrain and test the system, and the process followed to create such resources. The question set and the dataset will be freely available to the research community.

Full paper: http://www.lrec-conf.org/proceedings/lrec2016/pdf/1050_Paper.pdf

Abstract:

In this work we present the first attempt to apply state-of-the-art Natural Language Processing (NLP) techniques to the outcome of an online public consultation for policy making, with the goal of integrating citizens’ contributions in the Italian school reform. The work is a joint initiative among the Digital Humanities Group at Fondazione Bruno Kessler (FBK), the VU Amsterdam (VU) and the Italian Ministry of Education, Universities and Research (MIUR). The goal of the work is the automatic analysis of the linguistic data contained in the answers given by the participants to the public consultation “La Buona Scuola” (commonly referred to in social media as #labuonascuola).

Full paper: http://www.lrec-conf.org/proceedings/lrec2016/pdf/473_Paper.pdf

  • “D(H)ante: A new set of tools for XIII century Italian” by Angelo Basile and Federico Sangati.

Abstract: 

In this paper we describe the process of transforming the annotated corpus of DanteSearch into a format which is more suitable for developing NLP applications. The objective of the work is twofold: (1) to provide the NLP community with a tool to perform automatic processing of ancient text and (2) to provide the literature community with more powerful tools for simplifying the annotation process and performing more advanced data analysis.

Full paper: http://www.lrec-conf.org/proceedings/lrec2016/pdf/412_Paper.pdf

  • “PARSEME Survey on MWE Resources” by Gyri S. Losnegaard, Federico Sangati, Carla Parra Escartín, Agata Savary, Sascha Bargmann and Johanna Monti.

Abstract: 

This paper summarizes the preliminary results of an ongoing survey on multiword resources carried out within the IC1207 COST Action PARSEME (PARSing and Multi-word Expressions). Despite the availability of language resource catalogs and the inventory of multi-word datasets on the SIGLEX-MWE website, multiword resources are scattered and difficult to find. In many cases, language resources such as corpora, treebanks, or lexical databases include multiwords as part of their data or take them into account in their annotations. However, these resources need to be centralized to make them accessible. The aim of this survey is to create a portal where researchers can easily find multiword(-aware) language resources for their research. We report on the design of the survey and analyze the data gathered so far. We also discuss the problems we have detected upon examination of the data, as well as possible ways of enhancing the survey.

Full paper: http://www.lrec-conf.org/proceedings/lrec2016/pdf/718_Paper.pdf