Speech Resources

Aiming to make a difference in the availability of Arabic speech resources

We share, support, and encourage everyone to contribute to Arabic speech resources. Numerous efforts have gone into producing spoken Arabic datasets. From the CallHome task (1996/97 NIST benchmark) to the Global Autonomous Language Exploitation (GALE) program (2006-2009), many resources have been created.

ASR Resources

QASR Dataset: 2000 hours
MGB-2 Dataset: 1200 hours
MGB-3 Dataset: 16 hours
MGB-5 Dataset: 62 hours
ESCWA-CS Dataset: 2.8 hours
DACS Dataset: 2 hours

Arabic Dialect Identification Resources

MGB-3 [ADI-5]: 50 hours
MGB-5 [ADI-17]: 3000 hours