Arabic Dialect Identification (ADI)

Arabic is the official language of 22 countries, and its spoken dialects are mutually unintelligible. These dialects are the mother tongues of many people in Arab countries. Dialectal Arabic is also referred to as Colloquial or Conversational Arabic because it is used primarily in conversation and spoken form, whereas Modern Standard Arabic (MSA) is used as the formal language of communication. Unlike MSA, the dialects are not formally taught in school but are passed down from generation to generation. As a result, dialectal Arabic has no standardized writing system. This makes it challenging to model and makes dialect identification a necessary component of Arabic speech pipelines [1].

Dialect identification is a special case of the language identification (LID) problem. It is arguably more challenging than LID because it requires distinguishing between dialects of the same language. Arabic dialects are commonly grouped into four major regional groups: Egyptian, Gulf, Levantine, and North African. Automatically identifying the dialect of an input utterance from the speech signal has been an active research problem, both in its own right and as a way to improve dialectal automatic speech recognition (ASR) [1].
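Viewed this way, dialect identification is a closed-set classification problem. A backend (GMM, phonotactic, i-vector or neural) scores an utterance against each dialect, and the system picks the most probable one. The sketch below assumes the backend returns a log-likelihood log p(x | d) for each dialect d. It converts those scores into posteriors with Bayes' rule and then applies a maximum a posteriori (MAP) decision. The function names `posteriors` and `identify` and the example scores are hypothetical. The four labels follow the groups listed above rather than any particular dataset.

```python
import math

# The four major dialect groups named in the text; real datasets
# (e.g. ADI17) may use finer-grained, country-level labels instead.
DIALECTS = ["Egyptian", "Gulf", "Levantine", "North African"]


def posteriors(log_likelihoods, priors=None):
    """Convert per-dialect log-likelihoods log p(x | d) into posteriors p(d | x).

    log_likelihoods: dict mapping dialect label -> utterance log-likelihood.
    priors: optional dict of strictly positive prior probabilities;
            a uniform prior is used when omitted.
    """
    if priors is None:
        priors = {d: 1.0 / len(log_likelihoods) for d in log_likelihoods}
    # Unnormalized log-posteriors: log p(x | d) + log p(d).
    joint = {d: ll + math.log(priors[d]) for d, ll in log_likelihoods.items()}
    # Log-sum-exp with the maximum subtracted, for numerical stability.
    m = max(joint.values())
    log_evidence = m + math.log(sum(math.exp(v - m) for v in joint.values()))
    return {d: math.exp(v - log_evidence) for d, v in joint.items()}


def identify(log_likelihoods, priors=None):
    """Closed-set MAP decision: return the most probable dialect and its posterior."""
    post = posteriors(log_likelihoods, priors)
    best = max(post, key=post.get)
    return best, post[best]


if __name__ == "__main__":
    # Hypothetical backend scores for a single utterance (not real system output).
    scores = dict(zip(DIALECTS, [-1520.3, -1531.8, -1524.9, -1540.2]))
    label, p = identify(scores)
    print(label, round(p, 3))
```

The log-sum-exp step matters because utterance-level log-likelihoods are large negative numbers. Exponentiating them directly would underflow to zero in double precision.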

Approaches to Arabic dialect identification (ADI) are closely related to those used for language recognition. They include:

- Gaussian mixture models (GMMs) and phonotactic approaches built on phone recognition [2].
- i-vectors combined with dimensionality reduction [3].
- More recently, deep learning techniques such as end-to-end models [4, 5, 6, 7].

ADI has also been closely tied to improving dialectal Arabic ASR. Notable work includes research carried out in the context of the GALE project [8] and a recent PhD thesis [9].
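The phonotactic approach can be illustrated with the PRLM idea (phone recognition followed by language modeling) discussed by Zissman [2]. A phone recognizer first turns each utterance into a phone sequence. A separate phone n-gram language model is then trained for each dialect, and a test utterance is assigned to the dialect whose model gives it the highest likelihood. This is a minimal sketch under several assumptions:

- The phone recognizer itself is not shown, so phone sequences are taken as given.
- The models are add-k smoothed bigrams over a shared phone inventory.
- The names `PhoneBigramLM`, `train_prlm` and `classify`, along with the toy phones and dialect labels, are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sequence boundary markers


class PhoneBigramLM:
    """Add-k smoothed phone bigram language model for a single dialect."""

    def __init__(self, inventory, k=1.0):
        # Shared phone inventory of the (assumed) phone recognizer.
        self.inventory = frozenset(inventory)
        self.k = k
        self.bigrams = Counter()   # counts of (previous, current) token pairs
        self.contexts = Counter()  # counts of each previous token

    def _pad(self, seq):
        unknown = set(seq) - self.inventory
        if unknown:
            raise ValueError(f"phones not in inventory: {sorted(unknown)}")
        return [BOS, *seq, EOS]

    def train(self, sequences):
        for seq in sequences:
            padded = self._pad(seq)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
        return self

    def bigram_prob(self, prev, cur):
        # Possible next tokens: every phone in the inventory plus EOS.
        n_next = len(self.inventory) + 1
        return (self.bigrams[(prev, cur)] + self.k) / (
            self.contexts[prev] + self.k * n_next)

    def log_prob(self, seq):
        padded = self._pad(seq)
        return sum(math.log(self.bigram_prob(p, c))
                   for p, c in zip(padded, padded[1:]))


def train_prlm(data, inventory, k=1.0):
    """data maps a dialect label to a list of recognized phone sequences."""
    return {d: PhoneBigramLM(inventory, k).train(seqs) for d, seqs in data.items()}


def classify(models, seq):
    """Return the dialect whose phone LM scores the sequence highest, plus all scores."""
    scores = {d: m.log_prob(seq) for d, m in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic toy data with made-up labels; not real Arabic phonology.
    inventory = {"a", "b", "g", "q", "t"}
    data = {
        "dialect_A": [["g", "a", "b"], ["g", "a", "t"], ["a", "g", "a"]],
        "dialect_B": [["q", "a", "b"], ["q", "a", "t"], ["a", "q", "a"]],
    }
    models = train_prlm(data, inventory)
    label, scores = classify(models, ["g", "a", "b", "a"])
    print(label)  # dialect_A
```

All dialect models share one phone inventory, so each is normalized over the same set of next tokens and their log-likelihoods are directly comparable. Practical phonotactic systems often use higher-order n-grams, several phone recognizers in parallel, and a calibrated backend on top of these scores.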

Despite these advances, Arabic dialect identification remains a challenging problem, and several special sessions and challenges have been organized around it [10]. These provide useful pointers to many techniques and datasets. In addition, the repositories listed below are a good starting point for building an experimental setup.

References:

  1. A. Ali et al., “Automatic dialect detection in Arabic broadcast speech,” in Interspeech 2016.
  2. M. A. Zissman, “A comparison of four approaches to automatic language identification of telephone speech,” IEEE Transactions on Speech and Audio Processing, vol. 4, no. 1, Jan. 1996.
  3. N. Dehak, P. A. Torres-Carrasquillo, D. Reynolds, and R. Dehak, “Language recognition via i-vectors and dimensionality reduction,” in Interspeech 2011.
  4. O. Ghahabi, A. Bonafonte, J. Hernando, and A. Moreno, “Deep neural networks for i-vector language identification of short utterances in cars,” in Interspeech 2016.
  5. S. Shon, A. Ali, and J. Glass, “Convolutional Neural Network and Language Embeddings for End-to-End Dialect Recognition,” in Proc. Odyssey 2018: The Speaker and Language Recognition Workshop, 2018.
  6. S. Shon et al., “ADI17: A Fine-Grained Arabic Dialect Identification Dataset,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.
  7. M. Lindgren, “Deep learning for spoken language identification,” 2020.
  8. F. Biadsy, J. Hirschberg, and N. Habash, “Spoken Arabic dialect identification using phonotactic modeling,” in Proceedings of the EACL Workshop on Computational Approaches to Semitic Languages, 2009.
  9. A. Ali, “Multi-dialect Arabic broadcast speech recognition,” PhD thesis, The University of Edinburgh, 2018.
  10. M. Zampieri et al., “Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign,” in Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, Association for Computational Linguistics, 2018.