ESCWA: Cross-lingual Code Switching Corpus
DATA DOWNLOAD is now available!
The corpus was collected over two days of meetings of the United Nations Economic and Social Commission for West Asia (ESCWA) in 2019. The data includes intrasentential code alternation between Arabic and English. For Algerian, Tunisian, and Moroccan native speakers, the switch is between Arabic and French.
The 2.8-hour ESCWA corpus includes dialectal Arabic and has a Code Mixing Index (CMI) of ~28%.
More details about the ESCWA corpus can be found here.
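To give a sense of what a CMI value measures, the following is a minimal sketch of an utterance-level CMI computation. It assumes the commonly used definition CMI = 100 * (1 - max_i(w_i) / (n - u)), where n is the number of tokens, u the number of language-independent tokens, and w_i the number of tokens in language i. This page does not state which CMI variant or aggregation was used for the ~28% figure, so the function name, the tag labels ("ar", "en", "other"), and the formula choice are all illustrative assumptions.

```python
from collections import Counter


def code_mixing_index(tags, neutral="other"):
    """Utterance-level CMI from per-token language tags (assumed definition).

    tags    -- list of language labels, one per token (labels are hypothetical)
    neutral -- label for language-independent tokens (e.g. numbers, names)
    """
    # Count only tokens that carry a language label.
    counts = Counter(t for t in tags if t != neutral)
    lang_tokens = sum(counts.values())  # this is n - u
    if lang_tokens == 0:
        # No language-bearing tokens: defined as no mixing.
        return 0.0
    dominant = max(counts.values())  # tokens in the most frequent language
    return 100.0 * (1.0 - dominant / lang_tokens)


# Three Arabic tokens, one English token, one language-independent token.
example = ["ar", "ar", "ar", "en", "other"]
print(code_mixing_index(example))
```

A monolingual utterance scores 0 under this definition, and the score rises as the non-dominant languages take a larger share of the tokens. A corpus-level figure is typically obtained by aggregating such scores over utterances, but the exact aggregation behind the number reported here is not specified on this page.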
Chowdhury, S. A., Hussein, A., Abdelali, A., & Ali, A. (2021). Towards one model to rule all: Multilingual strategy for dialectal code-switching Arabic ASR. Interspeech 2021.
Ali, A., Chowdhury, S., Hussein, A., & Hifny, Y. (2021). Arabic code-switching speech recognition using monolingual data. Interspeech 2021.