Keyphrase Digger (KD) is a rule-based system for keyphrase extraction. It is a Java re-implementation of the KX tool (Pianta and Tonelli, 2010) with a new architecture and new features. KD combines statistical measures with linguistic information given by PoS patterns to identify and extract weighted keyphrases from texts.
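
The general idea of combining PoS patterns with a statistical measure can be sketched as follows. This is a minimal illustration in Python, not KD's actual code or API: the function name `extract_keyphrases`, the coarse tags (`ADJ`, `NOUN`, ...), the pattern set, and the scoring rule (frequency multiplied by phrase length) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical PoS patterns over a coarse tagset; not KD's actual pattern set.
PATTERNS = {("NOUN",), ("ADJ", "NOUN"), ("NOUN", "NOUN"), ("ADJ", "NOUN", "NOUN")}


def extract_keyphrases(tagged_tokens, patterns=PATTERNS, max_len=3):
    """Return {phrase: weight} for n-grams whose tag sequence matches a pattern.

    The weight is frequency times length in tokens (an assumed scoring rule).
    """
    counts = Counter()
    for start in range(len(tagged_tokens)):
        for n in range(1, max_len + 1):
            window = tagged_tokens[start:start + n]
            if len(window) < n:
                break  # ran past the end of the text
            tags = tuple(tag for _, tag in window)
            if tags in patterns:
                phrase = " ".join(word.lower() for word, _ in window)
                counts[phrase] += 1
    return {phrase: freq * len(phrase.split()) for phrase, freq in counts.items()}


# Input is assumed to be already PoS-tagged by an external tagger.
tagged = [
    ("Keyphrase", "NOUN"), ("extraction", "NOUN"), ("is", "VERB"),
    ("a", "DET"), ("hard", "ADJ"), ("task", "NOUN"),
]
weights = extract_keyphrases(tagged)
```

Here the linguistic filter discards sequences such as "is a", while the statistical part ranks the surviving candidates; multi-word candidates like "keyphrase extraction" receive a higher weight than their single-word parts under the assumed scoring rule.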

Main Features:

• Extraction of multi-words

• Multilinguality (EN, IT, and DE)

• Easily extendible to other languages

• Higher customizability than KX

• High processing speed

• Clustering of keyphrases under the same lemma

• Various accepted formats and PoS tagsets: Stanford PoS Tagger (EN), TreeTagger (IT and EN), TextPro (IT and EN)

• Boost of specific PoS patterns

• Integration of Apache Lucene Library
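
Two of the features above, clustering under the same lemma and boosting of specific PoS patterns, can be illustrated with a short sketch. It is written in Python for brevity and does not reproduce KD's implementation: the function `cluster_by_lemma`, the boost table, and the input tuple layout are hypothetical, and the lemmas are assumed to come from an external tagger or lemmatizer.

```python
from collections import defaultdict

# Hypothetical boost factors per PoS pattern; the values are illustrative only.
BOOSTS = {("ADJ", "NOUN"): 1.5}


def cluster_by_lemma(candidates, boosts=BOOSTS):
    """Group candidate keyphrases by lemma and sum their boosted scores.

    candidates: iterable of (surface_form, lemma, pos_pattern, score).
    Returns {lemma: (total_score, sorted list of surface forms)}.
    """
    totals = defaultdict(float)
    forms = defaultdict(set)
    for surface, lemma, pattern, score in candidates:
        # Patterns without an entry in the boost table keep their score.
        totals[lemma] += score * boosts.get(pattern, 1.0)
        forms[lemma].add(surface)
    return {lemma: (totals[lemma], sorted(forms[lemma])) for lemma in totals}


candidates = [
    ("neural networks", "neural network", ("ADJ", "NOUN"), 2.0),
    ("neural network", "neural network", ("ADJ", "NOUN"), 4.0),
    ("networks", "network", ("NOUN",), 3.0),
]
clusters = cluster_by_lemma(candidates)
```

In this example the singular and plural forms of "neural network" end up in one cluster whose score is the sum of both boosted scores, while "networks" stays in its own cluster with an unchanged score.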

Reference:

Moretti, G., Sprugnoli, R., Tonelli, S. “Digging in the Dirt: Extracting Keyphrases from Texts with KD”. In Proceedings of the Second Italian Conference on Computational Linguistics (CLiC-it 2015), Trento, Italy.

DOWNLOAD KD SOFTWARE PACKAGE.

[Current release v1.2: adds German support, a new function for adding a new language, and bug fixes.]

TRY THE ONLINE DEMO.