
QCRI Aljazeera Speech Resources: QASR



The data download will be available soon. Thank you for your interest.


QASR is, to date, the largest transcribed Arabic speech corpus, comprising around 2,000 hours of multi-dialect and code-switching speech with multi-layer annotation. The data is crawled from the Aljazeera news channel, with lightly supervised transcriptions and linguistically motivated segmentation. QASR is suitable for training and evaluating speech recognition systems, acoustics- and/or linguistics-based Arabic dialect identification, punctuation restoration, speaker identification, speaker linking, and potentially other NLP modules for spoken data.
Some benchmark results using QASR data can be found in [1] [2].
More details about QASR can be found in [1].