This script identifies the topics discussed in a document or corpus, starting from a list of key-concepts (single- or multi-token). It takes as input one or more lists of key-concepts and returns them organized into clusters of related items, each of which can represent a topic.
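To illustrate the input/output shape (several key-concept lists in, clusters of related items out), here is a minimal sketch. It is not the method implemented in the keyphrase_clustering repository or described in the paper. As an assumption, it swaps in plain token-overlap (Jaccard) similarity with single-linkage grouping. The function names `jaccard` and `cluster_key_concepts` and the `threshold` value are hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two key-concepts (hypothetical measure, not the paper's)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_key_concepts(lists, threshold=0.3):
    """Merge several key-concept lists and group related items via single linkage (union-find)."""
    # Deduplicate while preserving first-seen order across all input lists.
    concepts = list(dict.fromkeys(c.strip() for lst in lists for c in lst if c.strip()))
    parent = list(range(len(concepts)))

    def find(i):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of key-concepts whose similarity reaches the threshold.
    for i, j in combinations(range(len(concepts)), 2):
        if jaccard(concepts[i], concepts[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect the members of each connected group; each group is a candidate topic.
    clusters = {}
    for i, concept in enumerate(concepts):
        clusters.setdefault(find(i), []).append(concept)
    return list(clusters.values())
```

For example, `cluster_key_concepts([["health care", "health insurance", "tax cuts"], ["tax relief", "border security", "immigration"]])` returns `[["health care", "health insurance"], ["tax cuts", "tax relief"], ["border security"], ["immigration"]]`. Surface token overlap misses related concepts that share no words (here "border security" and "immigration"), which is why a semantic similarity measure is generally preferable for real topic clustering.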
The code for the clustering can be found at: https://github.com/dhfbk/keyphrase_clustering
The script has been tested on a corpus of U.S. political manifestos, which is available together with its annotations (the data is licensed under a Creative Commons Attribution 4.0 International License).
If you use this script, please cite the following paper, where you can find more details:
Stefano Menini, Federico Nanni, Simone Paolo Ponzetto and Sara Tonelli. “Topic-Based Agreement and Disagreement in US Electoral Manifestos”. To be presented at EMNLP 2017.