As part of a project funded by IPRASE (Istituto provinciale per la Ricerca e la Sperimentazione educativa), we release embeddings and n-grams derived from a large corpus of essays. In particular, we analysed more than 2,500 essays written by students from different high schools in the Autonomous Province of Trento during the exit exam (the so-called Maturità).
- WORD VECTORS: we built 300-dimensional embeddings with three different algorithms: GloVe, which is based on linear bag-of-words contexts; Levy and Goldberg's code, which uses contexts derived from dependency parse trees; and fastText, which represents each word as a bag of character n-grams. These pre-trained word embeddings are available in text format and can also be explored through a dedicated stand-alone version of the TensorFlow embedding projector: http://dhlab.fbk.eu/TemiVectors/.
- N-GRAMS: we generated both case-sensitive and case-insensitive n-grams per school year, with n ranging from 1 to 5.
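To make the n-gram description concrete, here is a minimal Python sketch of how case-sensitive and case-insensitive n-gram counts for n from 1 to 5 could be computed. It is an illustration only, not the pipeline used for the released files: the function name `ngram_counts`, the whitespace tokenization, and the sample sentence are all assumptions.

```python
from collections import Counter


def ngram_counts(tokens, n_min=1, n_max=5, lowercase=False):
    """Count all n-grams with n_min <= n <= n_max in a token list.

    Hypothetical helper: set lowercase=True for case-insensitive counts.
    """
    if lowercase:
        tokens = [t.lower() for t in tokens]
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of size n over the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Assumption: plain whitespace tokenization on an invented sample sentence.
tokens = "La scuola è la base della società".split()

cased = ngram_counts(tokens)                    # case-sensitive
uncased = ngram_counts(tokens, lowercase=True)  # case-insensitive
```

In the case-sensitive counts, "La" and "la" are distinct unigrams, while the case-insensitive counts merge them. To obtain counts per school year, the same function would be applied separately to the essays of each year.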
The embeddings and n-grams are available in a shared Google Drive folder: https://drive.google.com/drive/folders/1BZ8JmuLPQqFX859JLUjEpnAPv8o3BCci?usp=sharing
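Once downloaded, embeddings in text format can typically be read with a few lines of Python. The sketch below assumes the common layout used by GloVe and fastText text files (one word per line followed by space-separated values, optionally preceded by a "count dimension" header line); the actual layout of the released files should be checked before use. The helper names `load_vectors` and `cosine` are hypothetical, and the toy 3-dimensional vectors stand in for the real 300-dimensional ones.

```python
import io
import math


def load_vectors(lines):
    """Parse word vectors from an iterable of text lines.

    Assumed format: "word v1 v2 ... vd", with an optional
    "count dim" header line that is skipped.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # optional header line, e.g. "3 3"
        if len(parts) < 2:
            continue  # blank or malformed line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data (invented values) in place of a downloaded file.
sample = io.StringIO(
    "3 3\n"
    "scuola 1.0 0.0 1.0\n"
    "studente 1.0 0.0 0.5\n"
    "mare 0.0 1.0 0.0\n"
)
vecs = load_vectors(sample)
similarity = cosine(vecs["scuola"], vecs["studente"])
```

With a real file, the `io.StringIO` object would be replaced by a file handle opened with `encoding="utf-8"` on the downloaded path.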