
Broadcast News Arabic Text to Speech

Abstract: 

 

Several high-resource Text-to-Speech (TTS) systems currently produce natural, human-like speech. In contrast, low-resource languages, including Arabic, have very limited TTS systems due to the lack of resources. We propose a method for building TTS in such a low-resource scenario, covering data collection and pre-training/fine-tuning strategies for TTS training, using broadcast news as a case study. We fine-tune a pre-trained English Tacotron2 model on one hour of broadcast recordings, and then build a FastSpeech2-based Conformer model that uses this fine-tuned Arabic TTS model as a teacher. Our objective evaluation shows a character error rate (CER) of 3.9%, compared with 1.3% for the ground truth. In the subjective evaluation, on a scale where 0 is bad and 5 is excellent, the FastSpeech2-based Conformer model achieved a mean opinion score (MOS) of 4.4 for intelligibility and 4.2 for naturalness.
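The character error rate is obtained by transcribing the synthesized speech with an ASR system and comparing the transcript against the reference text. The snippet below is a minimal, self-contained sketch of that metric only; the edit-distance implementation and the example string pair are illustrative assumptions, not the evaluation code used in the paper.

```python
# Minimal CER sketch: character-level edit distance divided by reference length.
# Illustrative only; the actual evaluation pipeline (ASR system, text
# normalization) is not released on this page.

def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,            # deletion
                curr[j - 1] + 1,        # insertion
                prev[j - 1] + (r != h)  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

if __name__ == "__main__":
    reference = "من خلال أربعة محاور رئيسية"   # reference text (hypothetical pairing)
    hypothesis = "من خلال اربعة محاور رئيسية"  # ASR output of the synthesized audio
    print(f"CER = {cer(reference, hypothesis):.1%}")
```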

 

Model list

 

  • Ground truth: natural speech
  • FastSpeech2 with a fine-tuned Transformer as the teacher model, with vowelization and reduction factor = 1
  • FastSpeech2 with a fine-tuned Transformer as the teacher model, with vowelization and reduction factor = 3
  • FastSpeech2 with a fine-tuned Transformer as the teacher model, without vowelization and reduction factor = 1
  • FastSpeech2 with a fine-tuned Transformer as the teacher model, with vowelization, a Parallel WaveGAN (PWG) vocoder, and reduction factor = 1
Sample sentences:

  • وأشكر ضيفنا في الأستوديو الكاتب الصحفي الأستاذ محمد القدوسي ("I thank our guest in the studio, the journalist and writer Mr. Mohamed Al-Qaddousi")
  • من خلال أربعة محاور رئيسية ("through four main themes")
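The reduction-factor, teacher-model, and Parallel WaveGAN settings listed above are characteristic of an ESPnet2-style recipe. Assuming such a setup, the snippet below is a minimal sketch of how a packaged model could be loaded to synthesize one of the sample sentences; the model and vocoder tags are placeholders rather than released artifacts, and the training configuration itself is not reproduced here.

```python
# Hypothetical inference sketch for an ESPnet2-style FastSpeech2 + Parallel WaveGAN setup.
# The model/vocoder tags below are placeholders, not released models.
import soundfile as sf
from espnet2.bin.tts_inference import Text2Speech

text2speech = Text2Speech.from_pretrained(
    model_tag="<arabic-fastspeech2-conformer-model-tag>",  # placeholder
    vocoder_tag="<parallel-wavegan-vocoder-tag>",          # placeholder
)

# Second demo sentence above: "through four main themes"
text = "من خلال أربعة محاور رئيسية"

output = text2speech(text)  # returns a dict containing the generated waveform
sf.write("sample.wav", output["wav"].numpy(), text2speech.fs)
```

The reduction factor is a training-time hyperparameter (the number of mel-spectrogram frames generated per decoder step), so each configuration in the model list corresponds to a separately trained model rather than an inference-time switch.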
References & Code 
Coming Soon...

Author


massa