This resource supports the automatic identification of English-Italian code-mixing in historical English travel writing about Italy. We release:

This resource is available on our GitHub page:

When using this resource, please cite:

“A little bit of bella pianura: Detecting Code-Mixing in Historical English Travel Writing”, by Rachele Sprugnoli, Sara Tonelli, Giovanni Moretti and Stefano Menini. In Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017). Rome. (PDF)

[1] King, Ben, and Steven P. Abney. “Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods.” In HLT-NAACL, pp. 1110-1119. 2013.

[2] Schulz, Sarah, and Mareike Keller. “Code-switching ubique est - language identification and part-of-speech tagging for historical mixed text.” In Proc. of LaTeCH. 2016.