Two papers accepted at DH2019

News date: Friday, 1 March 2019

Two of our papers have been accepted for presentation at the Digital Humanities Conference 2019 (DH2019) in Utrecht, 9-12 July 2019:

"Word Embeddings for Processing Historical Texts" by Rachele Sprugnoli and Giovanni Moretti

Abstract

In recent years, word embeddings have become important resources for many Natural Language Processing tasks. Several sets of pre-trained word vectors have been released, built from huge amounts of contemporary text. Interest in this type of distributional approach has recently emerged in the Digital Humanities community as well, with studies on vectors built from historical or literary texts and employed to track semantic shifts. This submission aims to expand current research on historical word embeddings by presenting a set of English vectors pre-trained with three different algorithms on a corpus of texts published between 1860 and 1939. These embeddings have been used to train a new model for the identification of place names in historical texts, achieving very satisfactory results in terms of precision, recall and F-measure.
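For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall and F-measure can be computed for a place-name identification task. It is not the authors' evaluation code: the function name, the exact-match criterion and the toy annotations are all assumptions made for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Compare predicted place-name mentions against gold annotations.

    Both arguments are collections of (token_position, name) pairs.
    A prediction counts as correct only on an exact match (an assumption;
    real evaluations may also credit partial matches).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only.
gold = {(0, "London"), (5, "Utrecht"), (9, "Rome")}
predicted = {(0, "London"), (5, "Utrecht"), (7, "Paris"), (12, "Venice")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case two of the four predictions are correct and two of the three gold mentions are found, so precision is 2/4 and recall is 2/3.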

"Computer-Assisted Curation of Digital Cultural Heritage Resources" by Matteo Lorenzini, Marco Rospocher and Sara Tonelli

Abstract

The objective of metadata curatorship is to ensure that users can effectively and efficiently access objects of interest from a repository, digital library, catalogue, etc. using well-assigned metadata values aligned with an appropriately chosen schema. However, we often face problems related to the low quality of the metadata used to describe digital resources, for example wrong definitions, inconsistencies, or resources with incomplete descriptions. There may be many reasons for this, all completely valid: in many cases, those who host a digital repository have few human resources to work on improving metadata, and data providers are often not the metadata creators themselves. In this paper we present our ongoing work aimed at defining computable metrics to assess metadata quality and automating the metadata quality checking process.
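To give a sense of what a computable metadata quality metric might look like, here is a minimal sketch of a completeness score: the fraction of expected fields that carry a non-empty value. It is only an illustration and not the metric defined in the paper; the field list, the function name and the sample record are hypothetical.

```python
# Hypothetical set of fields a schema might expect; not taken from the paper.
REQUIRED_FIELDS = ("title", "creator", "date", "description")


def completeness(record, required=REQUIRED_FIELDS):
    """Return the share of required fields with a non-empty value (0.0 to 1.0)."""
    if not required:
        return 1.0
    filled = 0
    for field in required:
        value = record.get(field)
        # A field counts as filled only if it exists and is not blank.
        if value is not None and str(value).strip():
            filled += 1
    return filled / len(required)


# Sample record, invented for illustration only.
record = {"title": "Portrait of a Lady", "creator": "", "date": "1890"}
score = completeness(record)
print(f"completeness = {score:.2f}")
```

Here the record has a title and a date but an empty creator and no description, so two of the four expected fields are filled. A score like this could flag incomplete descriptions automatically, while problems such as wrong definitions or inconsistencies would need other checks.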