
Egyptian ASR and Five-Class ADI Challenge: MGB-3

The third edition of the Multi-Genre Broadcast challenge (MGB-3), "speech recognition in the wild", is an evaluation of speech recognition and five-class Arabic dialect identification using YouTube recordings of dialectal Arabic.
MGB-3 uses 16 hours of multi-genre data collected from different YouTube channels.
In 2017, the challenge featured two new Arabic tracks based on TV data from Aljazeera as well as YouTube recordings. It was an official challenge at the 2017 IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop.
More details about MGB-3 can be found here. An overview paper is available here.

Egyptian Arabic Automatic Speech Recognition

The Arabic track for the 2017 multi-dialect multi-genre evaluation (speech recognition in the wild) is an extension of the 2016 evaluation (MGB-2).
In addition to the 1,200 hours of Aljazeera TV programs used in 2016, MGB-3 explores multi-genre data: comedy, cooking, culture, environment, family and kids, fashion, movies and drama, sports, and science talks (TEDx).
The MGB-3 Arabic data comprises 16 hours of multi-genre data collected from different YouTube channels, all of which have been manually transcribed. The chosen Arabic dialect for this year is Egyptian. Given that dialectal Arabic has no standard orthographic rules, each program has been transcribed by four different transcribers following these transcription guidelines. The MGB-3 data is split into three groups: adaptation, development, and evaluation.
Because dialectal Arabic does not have a clearly defined orthography, different people tend to write the same word in slightly different forms. Therefore, instead of developing strict guidelines to enforce a standardized orthography, variation in spelling is allowed: multiple transcriptions were produced, with transcribers writing the transcripts as they deemed correct. Every file has been segmented and transcribed by four different Egyptian annotators.
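With four independent transcriptions per file, a natural way to score ASR output is to compare a hypothesis against all references and keep the best match. The sketch below illustrates this multi-reference idea in Python; the function names and the toy Arabizi transcripts are ours for illustration and are not the official MGB-3 scoring tools.

```python
# Minimal sketch: score one hypothesis against several references,
# keeping the lowest word error rate (WER). Illustrative only.

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance via dynamic programming."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution (or match)
            prev, d[j] = d[j], cur
    return d[-1]

def multi_reference_wer(references, hypothesis):
    """Return the minimum WER of `hypothesis` over all `references`."""
    hyp = hypothesis.split()
    return min(edit_distance(ref.split(), hyp) / len(ref.split())
               for ref in references)

# Toy example: two spelling variants of the same (hypothetical) utterance.
refs = ["ana 3ayz aruh delwa2ty", "ana 3ayez aroo7 dilwa2ti"]
print(multi_reference_wer(refs, "ana 3ayez aruh delwa2ty"))  # 0.25
```

Taking the minimum over references means a system is not penalized for picking any one of the legitimate spellings.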
The 80 YouTube clips have been manually labeled for speech and non-speech segments. About 12 minutes from each program were selected for transcription. The resulting 16 hours of speech segments were then distributed into adaptation, development, and evaluation sets as follows (the totals are checked in the sketch after the list):
  • Adaptation: 12 minutes × 24 programs
  • Development: 12 minutes × 24 programs
  • Evaluation: 12 minutes × 31 programs
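As a quick sanity check on these figures, the per-split durations implied by roughly 12 minutes per program can be computed directly; this is plain arithmetic, not an official script.

```python
# Quick check of the split sizes implied by ~12 minutes per program.
splits = {"adaptation": 24, "development": 24, "evaluation": 31}
minutes_per_program = 12

for name, programs in splits.items():
    hours = programs * minutes_per_program / 60
    print(f"{name:12s}: {programs} programs, {hours:.1f} hours")

total = sum(splits.values()) * minutes_per_program / 60
print(f"total       : {total:.1f} hours")  # 15.8 hours, i.e. ~16 hours
```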
You can find samples here: audio, segmentation, transcription in Arabic, and transcription in Buckwalter.
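Buckwalter transliteration is a one-to-one ASCII encoding of the Arabic script, which keeps the transcripts easy to handle in plain-text tools. The snippet below sketches the idea with a handful of letter mappings; the table is deliberately incomplete and the helper name is ours, not part of any released tooling.

```python
# Minimal sketch of Buckwalter transliteration: a one-to-one mapping
# from Arabic script to ASCII. Only a few letters are shown; the full
# scheme covers every Arabic letter and diacritic.
AR2BW = {
    "ا": "A",  # alif
    "ب": "b",  # ba
    "ت": "t",  # ta
    "ر": "r",  # ra
    "ع": "E",  # ain
    "ل": "l",  # lam
    "م": "m",  # mim
    "ن": "n",  # nun
    "ي": "y",  # ya
    "ة": "p",  # ta marbuta
}

def to_buckwalter(text):
    """Map Arabic characters to Buckwalter; pass others through."""
    return "".join(AR2BW.get(ch, ch) for ch in text)

print(to_buckwalter("عربي"))  # -> "Erby"
```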
You can find the MGB-3 ASR baseline system here.