ESCWA: Cross-lingual Code Switching Corpus
Data download is now available!
The corpus was collected over two days of meetings of the United Nations Economic and Social Commission for Western Asia (ESCWA) in 2019. The data includes intrasentential code alternation between Arabic and English. For Algerian, Tunisian, and Moroccan native speakers, the switch is instead between Arabic and French.
The 2.8-hour ESCWA corpus includes dialectal Arabic and has a Code Mixing Index (CMI) of ~28%.
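This page does not say how the CMI figure was computed. As a rough illustration only, the sketch below assumes the commonly used utterance-level definition attributed to Gambäck and Das, CMI = 100 * (1 - max(w_i) / (n - u)), where n is the number of tokens, u the number of language-independent tokens, and max(w_i) the token count of the dominant language. The function name, the tag labels ("ar", "en", "other"), and the toy utterance are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def code_mixing_index(tags, neutral=("other",)):
    """Return an utterance-level CMI in percent.

    tags: one language tag per token (hypothetical labels such as "ar", "en").
    neutral: tags treated as language independent (names, numbers, fillers).
    Assumes the Gambäck and Das style formula; the corpus may use a variant.
    """
    # Count only tokens that carry a language tag (this is n - u).
    counts = Counter(t for t in tags if t not in neutral)
    lang_tokens = sum(counts.values())
    if lang_tokens == 0:
        # No language-tagged tokens: CMI is defined as 0.
        return 0.0
    dominant = max(counts.values())
    return 100.0 * (1.0 - dominant / lang_tokens)


# Toy utterance: three Arabic tokens, one English token, one neutral token.
example_tags = ["ar", "ar", "en", "ar", "other"]
print(code_mixing_index(example_tags))  # 25.0
```

A monolingual utterance scores 0, and the score rises as the non-dominant language takes a larger share of the tokens. A corpus-level figure such as the ~28% above is typically an aggregate over utterances; the exact aggregation used for ESCWA is not stated here.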
Ali, A., Chowdhury, S., Hussein, A., & Hifny, Y. (2021). Arabic code-switching speech recognition using monolingual data. Interspeech 2021.