Egyptian ASR and Five-Class ADI Challenge: MGB-3
The third edition of the Multi-Genre Broadcast challenge (MGB-3), a speech recognition challenge in the wild, evaluates speech recognition and five-class Arabic dialect identification on YouTube recordings in dialectal Arabic.
MGB-3 uses 16 hours of multi-genre data collected from different YouTube channels.
In 2017, the challenge featured two new Arabic tracks based on TV data from Aljazeera as well as YouTube recordings. It was an official challenge at the 2017 IEEE Automatic Speech Recognition and Understanding Workshop.
Five-Class Arabic Dialect Identification (ADI5)
In this task, participants will be supplied with more than 50 hours of labeled speech, divided across the five major Arabic dialects: Egyptian (EGY), Levantine (LAV), Gulf (GLF), North African (NOR), and Modern Standard Arabic (MSA), which amounts to roughly 10 hours per dialect. Participants are encouraged to use these 10 hours per dialect to label additional data from both the MGB-2 and MGB-3 corpora. Dialectal data and baseline code will be shared on the QCRI dialect ID GitHub repository. Overall accuracy across the five dialects is used as the evaluation criterion.
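The evaluation criterion can be illustrated with a short Python sketch. It assumes each test utterance has exactly one reference label and one predicted label drawn from the five dialect codes above; the function names and the toy labels are hypothetical and are not taken from the official QCRI baseline or scoring script. Overall accuracy is the number of correctly labeled utterances divided by the total number of utterances. The per-dialect counts are an extra diagnostic, not part of the stated metric.

```python
from collections import Counter

# Dialect codes as listed in the task description.
DIALECTS = ("EGY", "LAV", "GLF", "NOR", "MSA")


def overall_accuracy(references, hypotheses):
    """Fraction of utterances whose predicted dialect matches the reference label."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("no utterances to score")
    correct = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return correct / len(references)


def per_dialect_counts(references, hypotheses):
    """Map each dialect to (correct predictions, total reference utterances).

    Hypothetical diagnostic helper; the challenge metric is overall accuracy only.
    """
    totals = Counter(references)
    correct = Counter(ref for ref, hyp in zip(references, hypotheses) if ref == hyp)
    return {d: (correct[d], totals[d]) for d in DIALECTS}


if __name__ == "__main__":
    # Toy labels for illustration only, not challenge data.
    refs = ["EGY", "LAV", "GLF", "NOR", "MSA", "EGY"]
    hyps = ["EGY", "GLF", "GLF", "NOR", "MSA", "LAV"]
    print(f"overall accuracy: {overall_accuracy(refs, hyps):.3f}")
    print(per_dialect_counts(refs, hyps))
```

With these toy labels, four of six utterances are correct, so the script prints `overall accuracy: 0.667`. Because overall accuracy pools all utterances, a test set with unequal amounts of data per dialect weights the larger dialects more heavily, which is why a per-dialect breakdown can be useful alongside it.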