List of relevant publications.


Automatic Speech Recognition

  • S Young, et al. "The HTK book" https://www.danielpovey.com/files/htkbook.pdf
  • M Gales, S Young "The Application of Hidden Markov Models in Speech Recognition" 2007, https://mi.eng.cam.ac.uk/~mjfg/mjfg_NOW.pdf
  • H Bourlard, N Morgan, "Connectionist Speech Recognition: A Hybrid Approach", 1994
  • G Hinton, et al. "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The shared views of four research groups", 2012
  • A Mohamed, G Dahl, G Hinton, "Acoustic Modeling using Deep Belief Networks" 2010
  • G Dahl, D Yu, L Deng, A Acero, "Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition" 2010
  • F Seide, G Li, X Chen, D Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription", 2011
  • A Stolcke, "SRILM - an Extensible Language Modeling Toolkit", 2002
  • T Mikolov, et al. "RNNLM - Recurrent Neural Network Language Modeling Toolkit" 2010
  • A Graves, N Jaitly, "Towards End-To-End Speech Recognition with Recurrent Neural Networks" 2014
  • D Bahdanau, et al "End-to-End Attention-based Large Vocabulary Speech Recognition" 2015
  • W Chan, et al "Listen, Attend and Spell" 2015
  • http://kaldi-asr.org/

Arabic Dialect Identification

  • A Ali, et al "Automatic dialect detection in Arabic broadcast speech" in Interspeech 2016
  • Marc A Zissman, “A comparison of four approaches to automatic language identification of telephone speech,” in IEEE Transactions on Speech and Audio Processing, vol 4, no 1, Jan 1996
  • N Dehak, PA Torres-Carrasquillo, D Reynolds and R Dehak, “Language recognition via i-vectors and dimensionality reduction,” in Interspeech 2011
  • O Ghahabi, A Bonafonte, J Hernando and A Moreno, “Deep neural networks for i-vector language identification of short utterances in cars,” in Interspeech 2016
  • S Shon, A Ali, and J Glass "MIT-QCRI Arabic dialect identification system for the 2017 multi-genre broadcast challenge" Automatic Speech Recognition and Understanding Workshop (ASRU), 2017
  • M Najafian, et al "Exploiting convolutional neural networks for phonotactic based dialect identification" in ICASSP 2018
  • S Shon, A Ali, and J Glass "Convolutional Neural Network and Language Embeddings for End-to-End Dialect Recognition" Proc Odyssey 2018 The Speaker and Language Recognition Workshop 2018
  • F Biadsy, J Hirschberg and N Habash, “Spoken Arabic dialect identification using phonotactic modeling,” in Proceedings of EACL workshop on computational approaches to Semitic languages, 2009
  • A Ali, "Multi-dialect Arabic broadcast speech recognition" PhD thesis, The University of Edinburgh, 2018
  • Zampieri, Marcos, et al "Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign" Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, Association for Computational Linguistics, 2018
  • https://github.com/qcri/dialectID/
  • https://github.com/swshon/dialectID_e2e
  • https://github.com/swshon/dialectID_siam

Text To Speech

  • A Hunt and A Black, "Unit selection in a concatenative speech synthesis system using a large speech database" In ICASSP-96, volume 1, pages 373--376, Atlanta, Georgia, 1996
  • Hifny, Yasser, et al "ArabTalk®: An Implementation for Arabic Text To Speech System" The proceedings of the 4th Conference on Language Engineering 2004
  • HMM/DNN-based Speech Synthesis System (HTS), http://hts.sp.nitech.ac.jp/
  • Abdel-Hamid, Ossama, Sherif Mahdy Abdou, and Mohsen Rashwan "Improving Arabic HMM based speech synthesis quality" Ninth International Conference on Spoken Language Processing 2006
  • Merlin: The Neural Network (NN) based Speech Synthesis System, https://github.com/CSTR-Edinburgh/merlin
  • Ping, Wei, et al "Deep voice 3: Scaling text-to-speech with convolutional sequence learning" (2018)
  • Wang, Yuxuan, et al "Tacotron: Towards end-to-end speech synthesis" arXiv preprint arXiv:1703.10135
  • https://github.com/youssefsharief/arabic-tacotron-tts
  • Rashwan, Mohsen AA, et al "A stochastic Arabic diacritizer based on a hybrid of factorized and unfactorized textual features" IEEE Transactions on Audio, Speech, and Language Processing 19.1 (2011): 166-175
  • Darwish, Kareem, Hamdy Mubarak, and Ahmed Abdelali "Arabic diacritization: Stats, rules, and hacks" Proceedings of the Third Arabic Natural Language Processing Workshop 2017
  • Hifny, Yasser "Hybrid LSTM/MaxEnt Networks for Arabic Syntactic Diacritics Restoration" IEEE Signal Processing Letters 25.10 (2018): 1515-1519
  • https://github.com/nawarhalabi/Arabic-Phonetiser

Language Model

  • I Zitouni (Ed), Natural Language Processing of Semitic Languages, Theory and Applications of Natural Language Processing, Chapter 5. Springer, Berlin, Heidelberg (2014)
  • Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 181–184
  • Ciprian Chelba and Johan Schalkwyk. 2013. Empirical Exploration of Language Modeling for the google.com Query Stream as Applied to Mobile Voice Search, pages 197–229. Springer, New York
  • Stanley Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University, August
  • PF Brown, VJ DellaPietra, PV DeSouza, JC Lai, RL Mercer, "Class-based n-gram models of natural language" Computational Linguistics vol 18 no 4 pp 467-479, 1992
  • R A Solsona, E Fosler-Lussier, H J Kuo, A Potamianos and I Zitouni, "Adaptive language models for spoken dialogue systems," 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, 2002, pp I-37-I-40, doi: 10.1109/ICASSP.2002.5743648
  • G Choueiter, D Povey, S F Chen and G Zweig, "Morpheme-Based Language Modeling for Arabic LVCSR," 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Toulouse, 2006, pp I-I, doi: 10.1109/ICASSP.2006.1660205
  • K Kirchhoff, D Vergyri, J Bilmes, K Duh, A Stolcke, “Morphology-based language modeling for conversational Arabic speech recognition” Computer Speech & Language Vol 20 no 4 pp 589-608, Oct 2006
  • Mikolov, T. Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012
  • W Mulder, S Bethard, MF Moens. A survey on the application of recurrent neural networks to statistical language modeling. Computer Speech & Language Vol 30 no 1 pp 61-98, March 2015
  • Yoon Kim, Yacine Jernite, David Sontag, and Alexander M Rush. 2015. Character-Aware Neural Language Models. CoRR, abs/1508.06615
  • Bengio, Y, Ducharme, R, Vincent, P, and Janvin, C. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155, 2003
  • Mikolov, T, Karafiát, M, Burget, L, Černocký, J, and Khudanpur, S. Recurrent neural network based language model. In INTERSPEECH, pp 1045–1048, 2010
  • Martin Sundermeyer, Hermann Ney, and Ralf Schlüter. 2015. From feedforward to recurrent LSTM neural networks for language modeling. Trans Audio, Speech and Lang Proc 23, 3 (March 2015), 517-529. DOI: https://doi.org/10.1109/TASLP.2015.2400218
  • S Yousfi, SA Berrani, C Garcia. Contribution of recurrent connectionist language models in improving LSTM-based Arabic text recognition in videos. Pattern Recognition Vol 64 pp 245-254, April 2017
  • Devlin, J, Chang, M-W, Lee, K, and Toutanova, K (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv e-prints
  • R Jozefowicz, O Vinyals, M Schuster, N Shazeer, and Y Wu. Exploring the limits of language modeling. arXiv preprint, 1602.02410, 2016. arxiv.org/abs/1602.02410
  • CMU Statistical Language Modeling Toolkit: http://www.speech.cs.cmu.edu/SLM/toolkit.html
  • HTK Toolkit: http://htk.eng.cam.ac.uk/download.shtml
  • SRILM - The SRI Language Modeling Toolkit: http://www.speech.sri.com/projects/srilm/
  • Stanford CoreNLP – Natural language software: https://stanfordnlp.github.io/CoreNLP/
  • The Berkeley NLP Group: http://nlp.cs.berkeley.edu/software.shtml