KD: Keyphrase Digger

Keyphrase Digger (KD) is a rule-based system for keyphrase extraction. It is a Java re-implementation of the KX tool (Pianta and Tonelli, 2010) with a new architecture and new features. KD combines statistical measures with linguistic information given by PoS patterns to identify and extract weighted keyphrases from texts.
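
To make the idea of combining PoS patterns with a statistical measure more concrete, here is a minimal sketch in Python. It is not KD's code or API: the function name `extract_keyphrases`, the pattern list `PATTERNS`, and the toy weight (frequency multiplied by phrase length) are all assumptions chosen for illustration, and the input is assumed to be already PoS-tagged with Penn Treebank-style tags.

```python
from collections import Counter

# Hypothetical PoS patterns (Penn Treebank-style tags), for illustration only.
# KD's actual patterns are defined per language and are not reproduced here.
PATTERNS = [("NN",), ("JJ", "NN"), ("NN", "NN"), ("JJ", "NN", "NN")]


def extract_keyphrases(tagged_tokens, patterns=PATTERNS):
    """Return (phrase, weight) pairs sorted by decreasing weight.

    tagged_tokens is a list of (word, pos_tag) pairs. A candidate is any
    token span whose tag sequence equals one of the patterns.
    """
    tags = [tag for _, tag in tagged_tokens]
    counts = Counter()
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                words = [word.lower() for word, _ in tagged_tokens[i:i + n]]
                counts[" ".join(words)] += 1
    # Toy statistical measure (an assumption): frequency times phrase length,
    # so that repeated multi-word candidates rank above their single words.
    weighted = {phrase: count * len(phrase.split())
                for phrase, count in counts.items()}
    # Sort by weight (descending), then alphabetically for a stable order.
    return sorted(weighted.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    tagged = [
        ("Keyphrase", "NN"), ("extraction", "NN"), ("is", "VBZ"),
        ("useful", "JJ"), (".", "."),
        ("Keyphrase", "NN"), ("extraction", "NN"), ("works", "VBZ"),
    ]
    for phrase, weight in extract_keyphrases(tagged):
        print(phrase, weight)
```

In this sketch the multi-word candidate "keyphrase extraction" outranks its component words because it matches the NN NN pattern twice and the toy weight rewards longer phrases. A real system would replace that weight with more robust statistics.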

Main Features:

Extraction of multi-words

Multilinguality (EN, IT, and DE)

Easily extendible to other languages

Higher customizability than KX

High processing speed

Clustering of keyphrases under the same lemma

Various accepted formats and PoS tagsets: Stanford PoS Tagger (EN), TreeTagger (IT and EN), TextPro (IT and EN)

Boosting of specific PoS patterns

Integration of the Apache Lucene library
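
Two of the features listed above, clustering of keyphrases under the same lemma and boosting of specific PoS patterns, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not KD's implementation: the function `cluster_and_boost`, the `BOOSTS` table, and the multiplicative boost factor are hypothetical, and lemmas are assumed to be supplied by an external tagger.

```python
from collections import defaultdict

# Hypothetical boost factors per PoS pattern (assumption for illustration).
BOOSTS = {("JJ", "NN"): 1.5}


def cluster_and_boost(candidates, boosts=BOOSTS):
    """Group candidates by lemma and sum their boosted weights.

    candidates is a list of (surface_form, lemma, pos_pattern, weight).
    Returns {lemma: (total_weight, sorted_surface_forms)}.
    """
    clusters = defaultdict(lambda: {"weight": 0.0, "forms": set()})
    for surface, lemma, pattern, weight in candidates:
        # Patterns without an explicit boost keep their original weight.
        factor = boosts.get(tuple(pattern), 1.0)
        clusters[lemma]["weight"] += weight * factor
        clusters[lemma]["forms"].add(surface)
    return {lemma: (data["weight"], sorted(data["forms"]))
            for lemma, data in clusters.items()}


if __name__ == "__main__":
    candidates = [
        ("neural networks", "neural network", ("JJ", "NNS"), 2.0),
        ("neural network", "neural network", ("JJ", "NN"), 2.0),
        ("keyphrase", "keyphrase", ("NN",), 3.0),
    ]
    for lemma, (weight, forms) in cluster_and_boost(candidates).items():
        print(lemma, weight, forms)
```

Here the singular and plural forms end up in one cluster keyed by their shared lemma, and only the candidate matching the boosted JJ NN pattern has its weight multiplied before the cluster total is computed.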

Reference:

Moretti, G., Sprugnoli, R., Tonelli, S. “Digging in the Dirt: Extracting Keyphrases from Texts with KD”. In Proceedings of the Second Italian Conference on Computational Linguistics (CLiC-it 2015), Trento, Italy.

DOWNLOAD KD SOFTWARE PACKAGE.

[Current release v1.2: German added + new function to add a new language + bug fixes.]

TRY THE ONLINE DEMO.