In an era where information accessibility is paramount, the advancements in machine learning and speech recognition technology have played a pivotal role in democratizing knowledge. However, a significant challenge persists due to the lack of unnamed data for various languages, hindering the development of high-quality speech recognition and synthesis models. Addressing this issue head-on, the Meta-led Massively Multilingual Speech (MMS) project has made remarkable strides in expanding language coverage and improving the performance of speech recognition and synthesis models.
MMS interprets Holy Bible in 1000+ languages
Through the innovative use of self-supervised learning methods, the MMS project has successfully developed support for over 1,100 languages, surpassing the initial aid of just 100 dialects provided by existing speech recognition models. This breakthrough occurred by leveraging datasets of religious readings, such as interpretations of the Bible in different dialects.
By utilizing openly accessible audio recordings of individuals reading religious texts, the MMS project has curated a comprehensive dataset that includes readings of the New Testament in over 1,100 languages. Furthermore, the project extended its language coverage to over 4,000 dialects by incorporating unnamed recordings of other religious readings.
Despite the dataset’s emphasis on religious texts and the prevalence of male speakers, the resulting models exhibited consistent performance for both male and female voices. Meta, the company leading the project, also assured that no religious bias was part of the process.
Traditionally, equipping supervised speech detection models with only 32 hours of data per language proved insufficient. However, Meta overcame this limitation by leveraging the advantages of the second edition of the wav2vec self-supervised speech representation learning methods.
By training self-supervised models on a staggering 500,000 hours of language data across 1,400 languages, the project significantly reduced the dependency on named data. The outcome models were then fine-tuned for specific speech tasks, such as multilingual speech detectors and language markers.
Preliminary experiments conducted on the MMS data yielded outstanding findings, showcasing the immense potential of the project. As language barriers continue to diminish, the Meta-led MMS project offers a promising glimpse into a future where individuals can access information in their native languages with ease.
As the MMS project continues to push the boundaries of language coverage and speech recognition technology, the possibilities for global communication and knowledge sharing can reach unprecedented heights.
Photo Credit: Brett Jordan (Unsplash)