As part of a project funded by IPRASE (Istituto provinciale per la Ricerca e la Sperimentazione educativa), we release embeddings and n-grams derived from a large corpus of essays. In particular, we have analysed more than 2,500 essays written by students from different high schools in the Autonomous Province of Trento during the exit exam (the so-called Maturità).

  • WORD VECTORS: we built 300-dimensional embeddings with three different algorithms: GloVe is based on linear bag-of-words contexts, Levy and Goldberg's code on dependency parse trees, and fastText on bags of character n-grams. These pre-trained word embeddings are available in text format and can also be explored through a dedicated stand-alone version of the TensorFlow embedding projector:
  • N-GRAMS: we generated both case-sensitive and case-insensitive n-grams for each school year, with lengths ranging from 1 to 5.
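
As a rough illustration of how the two resources above might be used, the following Python sketch counts n-grams of length 1 to 5 in both case-sensitive and case-insensitive form, and parses word vectors stored as text. It is not the code used to build the released files. The function names, the toy sentence, and the toy 3-dimensional vectors are hypothetical, and the parser assumes the common layout of one token per line followed by space-separated numbers, optionally preceded by a "count dimension" header line; the released files may differ.

```python
from collections import Counter
import io
import math


def count_ngrams(tokens, n_min=1, n_max=5, lowercase=False):
    """Count every n-gram with n_min <= n <= n_max in a list of tokens."""
    if lowercase:
        # Case-insensitive variant: fold tokens before counting.
        tokens = [t.lower() for t in tokens]
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def load_text_vectors(handle):
    """Parse 'token v1 v2 ...' lines (assumed layout, not confirmed)."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:
            # Blank line or an optional 'count dimension' header: skip it.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Toy sentence (hypothetical, not taken from the corpus).
tokens = "La scuola è la scuola".split()
sensitive = count_ngrams(tokens)
insensitive = count_ngrams(tokens, lowercase=True)
print(sensitive[("la",)], insensitive[("la",)])

# Toy 3-dimensional vectors standing in for the 300-dimensional files.
toy_file = io.StringIO("2 3\nscuola 1.0 0.0 0.0\nesame 1.0 1.0 0.0\n")
vectors = load_text_vectors(toy_file)
print(round(cosine(vectors["scuola"], vectors["esame"]), 4))
```

To read one of the released files instead of the toy data, an opened file handle (for example, `open(path, encoding="utf-8")` with the path of a downloaded file) can be passed to `load_text_vectors` in place of the `io.StringIO` object.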

These resources are available in a shared Drive folder: