With this script it is possible to identify the topics discussed in a document or corpus, starting from a list of key-concepts (single- or multi-token). The script takes as input one or more lists of key-concepts and returns them organized into clusters of related items that can represent topics.
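The clustering step can be pictured with the minimal sketch below. It does not reproduce the script's own method, which is not described here. As an assumption, it treats two key-concepts as related when their token sets overlap (Jaccard similarity at or above a hypothetical `threshold`) and merges related items with union-find, so that each connected group becomes a candidate topic.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens that two key-concepts have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_key_concepts(concepts: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Group key-concepts whose token overlap reaches `threshold` (hypothetical criterion)."""
    tokens = [set(c.lower().split()) for c in concepts]
    parent = list(range(len(concepts)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of related key-concepts into the same cluster.
    for i, j in combinations(range(len(concepts)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect the members of each cluster, keeping input order.
    groups: dict[int, list[str]] = {}
    for i, concept in enumerate(concepts):
        groups.setdefault(find(i), []).append(concept)
    return list(groups.values())


print(cluster_key_concepts(["machine translation", "neural machine translation",
                            "public health", "health policy", "syntax"]))
# prints [['machine translation', 'neural machine translation'], ['public health', 'health policy'], ['syntax']]
```

Token overlap only captures surface similarity; a real system would more likely rely on lexical or semantic resources to relate key-concepts that share no words.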

The code for the clustering can be found at:

MARTIN (Monitoring and Analysing Real-time Tweets in Italian Natural language) is a stand-alone application to:

  • Scan real-time information on Twitter
  • Compare tweets by pairs of Twitter users
  • Analyse the language of tweets
  • Visualise the output of the analyses

A first prototype of MARTIN won the second prize at the IBM Watson Services Challenge organized in the context of EVALITA 2016.

The LOD NAVIGATOR takes as input the data made available by the Contemporary Jewish Documentation Center (CDEC) in Milan and published in Linked Open Data format, and uses them to show the movements of the Italian victims of the Shoah. To this end, we used the SPARQL endpoint to collect biographical data together with information about the persecution and deportation of each victim.

RAMBLE ON is a new project aimed at analysing the mobility of famous individuals of the past by applying Natural Language Processing modules to unstructured texts, more specifically to biographies in the English Wikipedia.

The current release (January 2017) includes:

Tint (The Italian NLP Tool) is an open-source Java-based pipeline for Natural Language Processing (NLP) in Italian.

Tint is based on Stanford CoreNLP and can be used as a stand-alone tool, included as a Java library, or run as a REST API service. It is also available on Maven Central, so it can easily be integrated into an existing project.

L-KD is a tool that relies on available linguistic and knowledge resources to perform keyphrase clustering and labelling. The aim of L-KD is to help find and trace themes in English and Italian text data, represented as groups of keyphrases and associated domains.
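The idea of clustering keyphrases and labelling each group with a domain can be illustrated with a small sketch. It is not L-KD's implementation: a tiny hypothetical word-to-domain lexicon (`DOMAIN_LEXICON`) stands in for the linguistic and knowledge resources, each keyphrase is assigned the domain that most of its tokens point to, and phrases with no known token fall into a hypothetical `"unlabelled"` group.

```python
from collections import Counter, defaultdict

# Hypothetical word-to-domain lexicon standing in for real knowledge resources.
DOMAIN_LEXICON = {
    "election": "politics", "parliament": "politics", "vote": "politics",
    "vaccine": "medicine", "hospital": "medicine", "patient": "medicine",
}


def label_keyphrases(keyphrases: list[str],
                     lexicon: dict[str, str] = DOMAIN_LEXICON) -> dict[str, list[str]]:
    """Cluster keyphrases by their dominant domain and use that domain as the cluster label."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for phrase in keyphrases:
        # Count the domains of the tokens that the lexicon knows about.
        domains = Counter(lexicon[t] for t in phrase.lower().split() if t in lexicon)
        # Most frequent domain wins; on ties, the domain seen first is kept.
        label = domains.most_common(1)[0][0] if domains else "unlabelled"
        clusters[label].append(phrase)
    return dict(clusters)


print(label_keyphrases(["general election", "parliament vote", "vaccine trial", "football match"]))
# prints {'politics': ['general election', 'parliament vote'], 'medicine': ['vaccine trial'], 'unlabelled': ['football match']}
```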

In this work we build upon Mirko Tavoni's linguistic annotation of Dante's corpus to develop a Part-of-Speech (PoS) tagger for 13th-century Italian.

The objective of the work is twofold:

Keyphrase Digger (KD) is a rule-based system for keyphrase extraction. It is a Java re-implementation of the KX tool (Pianta and Tonelli, 2010) with a new architecture and new features. KD combines statistical measures with linguistic information given by PoS patterns to identify and extract weighted keyphrases from texts.
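The combination of PoS patterns and statistics can be sketched as follows. KD's actual patterns and measures are not given here, so the code assumes a single coarse pattern (a sequence of adjectives or nouns ending in a noun, up to `max_len` tokens) and a toy weight, frequency multiplied by phrase length, that favours longer phrases. The input is assumed to be already tokenized and PoS-tagged with hypothetical coarse tags such as `ADJ` and `NOUN`.

```python
from collections import Counter

ALLOWED_TAGS = {"ADJ", "NOUN"}  # hypothetical coarse tag set for the pattern


def extract_keyphrases(tagged: list[tuple[str, str]], max_len: int = 3) -> list[tuple[str, int]]:
    """Return candidate keyphrases matching (ADJ|NOUN)* NOUN, weighted by frequency * length."""
    counts: Counter[str] = Counter()
    for start in range(len(tagged)):
        for end in range(start + 1, min(start + max_len, len(tagged)) + 1):
            window = tagged[start:end]
            tags = [tag for _, tag in window]
            # Linguistic filter: only adjectives/nouns, and the phrase must end in a noun.
            if all(t in ALLOWED_TAGS for t in tags) and tags[-1] == "NOUN":
                counts[" ".join(word.lower() for word, _ in window)] += 1
    # Statistical weight (toy choice): frequency times number of tokens.
    weighted = {phrase: freq * len(phrase.split()) for phrase, freq in counts.items()}
    return sorted(weighted.items(), key=lambda kv: (-kv[1], kv[0]))


sentence = [("Neural", "ADJ"), ("networks", "NOUN"), ("learn", "VERB"),
            ("neural", "ADJ"), ("networks", "NOUN"), ("quickly", "ADV")]
print(extract_keyphrases(sentence))
# prints [('neural networks', 4), ('networks', 2)]
```

Multiplying by length is just one simple way of preferring multi-token phrases over the single words they contain; a fuller system would typically also discard shorter candidates subsumed by longer ones.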

CAT, the Content Annotation Tool (formerly known as CELCT Annotation Tool), is a general-purpose web-based text annotation tool.