We propose a method for creating co-word networks starting from keyphrases automatically extracted from the full text of scientific papers.
We release the following resources, available at this link:

  • an R script to convert the output of KD into a list of undirected edges that can be imported into Gephi;
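
To illustrate the conversion step, here is a minimal Python sketch (not the released R script) that builds an undirected co-word edge list from per-document keyphrases. It assumes that two keyphrases are linked when they occur in the same document and that the edge weight is the number of documents they share; the function names `coword_edges` and `to_gephi_csv` and the sample keyphrases are hypothetical. The `Source`, `Target`, `Type` and `Weight` headers follow the column names Gephi recognises when importing an edge table.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def coword_edges(doc_keyphrases):
    """Count how many documents each unordered keyphrase pair shares."""
    weights = Counter()
    for phrases in doc_keyphrases:
        # Normalise and deduplicate so each pair counts once per document.
        unique = sorted({p.strip().lower() for p in phrases if p.strip()})
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1
    return weights


def to_gephi_csv(weights):
    """Serialise the edges as CSV text with Source, Target, Type, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), w in sorted(weights.items()):
        writer.writerow([a, b, "Undirected", w])
    return buf.getvalue()


# Made-up keyphrases standing in for the output of a keyphrase extractor.
docs = [
    ["co-word network", "keyphrase extraction", "Gephi"],
    ["keyphrase extraction", "Gephi"],
]
edges = coword_edges(docs)
csv_text = to_gephi_csv(edges)
```

Sorting each document's keyphrases before pairing them keeps every pair in a single orientation, so an undirected edge is never counted twice under swapped endpoints.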

We mapped the persecution of 202 people from Trentino who were deported to the camps of the Third Reich during the Second World War, giving access to the corresponding dataset both in RDF and through an interactive visualization.

With this script it is possible to identify the topics discussed in a document or corpus starting from a list of key-concepts (single- or multi-token). The script takes as input one or more lists of key-concepts and returns them organized into clusters of related items that can represent topics.
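
The input and output shape described above can be sketched as follows. This is only an illustration, not the script itself: the relatedness criterion used here (two key-concepts are related when they share at least one token) is an assumption standing in for whatever measure the actual script applies, and the function name `cluster_key_concepts` is hypothetical.

```python
from collections import defaultdict


def cluster_key_concepts(*concept_lists):
    """Group key-concepts that share at least one token (connected components)."""
    concepts = sorted({c.lower() for lst in concept_lists for c in lst})
    parent = {c: c for c in concepts}

    def find(x):
        # Follow parent links to the cluster representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # token -> first key-concept that contained it
    for concept in concepts:
        for token in concept.split():
            if token in first_seen:
                # Shared token: merge the two clusters.
                parent[find(concept)] = find(first_seen[token])
            else:
                first_seen[token] = concept

    clusters = defaultdict(list)
    for concept in concepts:
        clusters[find(concept)].append(concept)
    return sorted(clusters.values())


# Two made-up input lists of single- and multi-token key-concepts.
clusters = cluster_key_concepts(
    ["neural network", "network analysis", "climate change"],
    ["climate policy", "tagger"],
)
```

With these inputs the key-concepts fall into three groups: the two "climate" items, the two "network" items, and "tagger" on its own.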

The code for the clustering can be found at:

MARTIN (Monitoring and Analysing Real-time Tweets in Italian Natural language) is a stand-alone application to:

  • Scan real-time information on Twitter
  • Compare tweets by pairs of Twitter users
  • Analyse the language of tweets
  • Visualise the output of the analyses

A first prototype of MARTIN won the second prize at the IBM Watson Services Challenge organized in the context of EVALITA 2016.

The LOD NAVIGATOR takes as input the data made available by the Contemporary Jewish Documentation Center (CDEC) in Milan, in collaboration with regesta.exe, and published in Linked Open Data format, and uses it to show the movements of the Italian victims of the Shoah. To this end, we used the SPARQL endpoint to collect biographical data together with information about the persecution and deportation of each victim.
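
As a rough sketch of this data collection step, the Python fragment below shows how rows returned by a SPARQL endpoint in the standard SPARQL 1.1 JSON results layout could be turned into a date-ordered sequence of places per person. Everything specific in it is an assumption: the `ex:` vocabulary in the query is a placeholder rather than the CDEC ontology, the variable names `victim`, `place` and `date` are hypothetical, and the sample rows are invented with neutral labels.

```python
from collections import defaultdict

# Hypothetical query: the ex: vocabulary is a placeholder, not the CDEC ontology.
QUERY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?victim ?place ?date WHERE {
  ?victim ex:hasPersecutionEvent ?event .
  ?event ex:place ?place ;
         ex:date ?date .
}
"""


def movements(results):
    """Turn SPARQL 1.1 JSON results into date-ordered place lists per person."""
    steps = defaultdict(list)
    for row in results["results"]["bindings"]:
        steps[row["victim"]["value"]].append(
            (row["date"]["value"], row["place"]["value"])
        )
    # ISO 8601 dates sort chronologically when compared as strings.
    return {v: [place for _, place in sorted(s)] for v, s in steps.items()}


# Invented sample in the standard SPARQL JSON results layout.
sample = {
    "head": {"vars": ["victim", "place", "date"]},
    "results": {"bindings": [
        {"victim": {"type": "uri", "value": "http://example.org/person/1"},
         "place": {"type": "literal", "value": "Place B"},
         "date": {"type": "literal", "value": "1944-02-22"}},
        {"victim": {"type": "uri", "value": "http://example.org/person/1"},
         "place": {"type": "literal", "value": "Place A"},
         "date": {"type": "literal", "value": "1943-12-01"}},
    ]},
}
routes = movements(sample)
```

Keeping the query text separate from the parsing function means the same parser can be reused once the placeholder vocabulary is replaced with the one actually published by the endpoint.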

RAMBLE ON is a new project aimed at analysing the mobility of famous individuals of the past by applying Natural Language Processing modules to unstructured texts, more specifically to biographies in the English Wikipedia.

The current release (January 2017) includes:

Tint (The Italian NLP Tool) is an open-source Java-based pipeline for Natural Language Processing (NLP) in Italian.

Tint is based on Stanford CoreNLP and can be used as a stand-alone tool, included as a Java library, or run as a REST API service. It is also deployed on Maven Central, so it can be easily integrated into an existing project.

L-KD is a tool that relies on available linguistic and knowledge resources to perform keyphrase clustering and labelling. The aim of L-KD is to help find and trace themes in English and Italian text data, represented by groups of keyphrases and associated domains.

In this work we build upon Mirko Tavoni's linguistic annotation of Dante's corpus to develop a Part-of-Speech (PoS) tagger for 13th-century Italian.

The objective of the work is twofold:

Keyphrase Digger (KD) is a rule-based system for keyphrase extraction. It is a Java re-implementation of the KX tool (Pianta and Tonelli, 2010) with a new architecture and new features. KD combines statistical measures with linguistic information given by PoS patterns to identify and extract weighted keyphrases from texts.
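
The general idea of combining PoS patterns with a statistical measure can be illustrated with the short Python sketch below. It is not KD's implementation: the pattern set, the use of universal PoS tags, and the weight (frequency multiplied by phrase length) are all assumptions chosen for illustration, and the function name `extract_keyphrases` is hypothetical. The input is assumed to be already tokenised and PoS-tagged.

```python
from collections import Counter

# Assumed PoS patterns (universal tags); KD's real pattern set is not given here.
PATTERNS = {("NOUN",), ("ADJ", "NOUN"), ("NOUN", "NOUN"), ("ADJ", "NOUN", "NOUN")}


def extract_keyphrases(tagged_tokens, top_n=5):
    """Rank n-grams whose tag sequence matches a pattern by frequency * length."""
    counts = Counter()
    max_len = max(len(p) for p in PATTERNS)
    for start in range(len(tagged_tokens)):
        for size in range(1, max_len + 1):
            window = tagged_tokens[start:start + size]
            if len(window) < size:
                break  # ran past the end of the text
            if tuple(tag for _, tag in window) in PATTERNS:
                counts[" ".join(word.lower() for word, _ in window)] += 1
    # Illustrative weight: longer phrases get a boost over their sub-phrases.
    scored = {phrase: n * len(phrase.split()) for phrase, n in counts.items()}
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Made-up, pre-tagged input.
tagged = [
    ("Keyphrase", "NOUN"), ("extraction", "NOUN"), ("is", "VERB"),
    ("useful", "ADJ"), ("and", "CCONJ"),
    ("keyphrase", "NOUN"), ("extraction", "NOUN"), ("is", "VERB"), ("fast", "ADJ"),
]
ranked = extract_keyphrases(tagged, top_n=3)
```

In this example the two-token phrase "keyphrase extraction" ranks above its single-token parts because the length factor doubles its weight.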