Sara Tonelli will give an invited talk at the FAIR Heritage conference on Digital Methods, Scholarly Editing and Tools for Cultural and Natural Heritage, which was originally scheduled to take place in Tours on June 16-17. The event will instead be held online, and registration is free.
Title & abstract of the talk:
“What NLP can do for Metadata Quality: The Case of Descriptions in Cultural Heritage Records”
Metadata provide access to a wide variety of cultural heritage resources made available through repositories, digital libraries and catalogues. Usually taking the form of a structured set of descriptive elements, metadata assist in the identification, location, processing, tracking, preservation, sharing and retrieval of information, while facilitating content and access management. However, low metadata quality, such as inconsistency or incorrect information in textual record elements, is still an open issue in many repositories, and the manual evaluation of each record is not affordable, even for medium-sized collections. Recent advances in machine learning and natural language processing can nevertheless greatly support curators in checking metadata quality when text is present, since they make it possible to automatically evaluate several records in a short time, highlighting missing information in records and providing aggregate statistics on entire collections. In this talk, I will present ongoing work aimed at automating metadata quality analysis using a machine learning approach. The preliminary results obtained on descriptions of visual artworks, archaeology and architecture extracted from the Italian digital library Cultura Italia are promising, and show that supervised machine learning can be used effectively to assist curators in assessing the content of descriptions, significantly reducing the time needed to check metadata quality.