ESCWA: Cross-lingual Code Switching Corpus
The data download will be available from the end of August 2021. Thank you for your interest.
The corpus was collected over two days of meetings of the United Nations Economic and Social Commission for West Asia (ESCWA) in 2019. The data includes intrasentential code alternation between Arabic and English. For Algerian, Tunisian, and Moroccan native speakers, the switch is between Arabic and French.
The 2.8-hour ESCWA corpus includes dialectal Arabic and has a Code Mixing Index (CMI) of ~28%. The audio waveforms are released with the paper, and the word error rate (WER) can be calculated using our Codalab (coming soon).
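For readers unfamiliar with the two metrics mentioned above, the following is a minimal sketch of how they are commonly computed. It is not the official Codalab scorer, and the corpus-level CMI figure quoted here may have been derived with a different variant of the formula. The function names and the per-token language tags (`"ar"`, `"en"`, `None` for language-independent tokens) are assumptions made for illustration only. WER is taken as the word-level edit distance divided by the number of reference words, and CMI follows the common utterance-level definition 100 * (1 - max_lang_count / (n - u)), where n is the number of tokens and u the number of language-independent tokens.

```python
from collections import Counter
from typing import Optional, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def code_mixing_index(tags: Sequence[Optional[str]]) -> float:
    """Utterance-level CMI in percent.

    tags holds one language tag per token; None marks tokens that are
    language independent (hypothetical tagging scheme for this sketch).
    """
    n = len(tags)
    counts = Counter(tag for tag in tags if tag is not None)
    u = n - sum(counts.values())
    if n == u:
        # No language-tagged tokens, so there is nothing to mix.
        return 0.0
    return 100.0 * (1.0 - max(counts.values()) / (n - u))
```

As a usage example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words, and `code_mixing_index(["ar", "ar", "ar", "en"])` treats Arabic as the dominant language with one English token out of four.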
More details about the ESCWA corpus can be found here.
- Aalto System for the 2017 Arabic Multi-genre Broadcast Challenge
- QCRI Advanced Transcription System (QATS) For The Arabic Multi-Dialect Broadcast Media Recognition: MGB-2 Challenge
- The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition