Meta’s Fundamental AI Research (FAIR) team is dedicated to pushing the boundaries of artificial intelligence, with a special focus on reaching advanced machine intelligence (AMI). This type of AI is designed to emulate human reasoning and perform demanding cognitive tasks, such as language translation. The ultimate aim is to harness this technology to drive innovation and create products that are beneficial for everyone.
One of the key components of this initiative is our collaboration with UNESCO to enhance the support for underserved languages in AI models. By developing AI systems that can tackle multilingual challenges and work with languages that have been traditionally overlooked, we not only promote linguistic diversity and inclusivity in the digital realm but also create more adaptable and intelligent systems capable of learning from new experiences.
Today, we are excited to unveil some of our latest programs, research, and models that align with this vision. We are also extending an invitation to collaborators who wish to contribute to the development of AI translation technologies that encompass a wide spectrum of global languages and dialects.
Language Technology Partner Program
We are on the lookout for partners to join us in expanding and improving Meta’s open-source language technologies, with a particular emphasis on AI translation technologies. Our efforts are concentrated on languages that have not received adequate attention, in support of UNESCO’s initiatives as part of the International Decade of Indigenous Languages.
As part of this partnership, we are seeking collaborators who can contribute over 10 hours of speech recordings with transcriptions, a substantial amount of written text (at least 200 sentences), and sets of translated sentences across diverse languages. These partners will work closely with our teams to integrate these languages into AI-driven speech recognition and machine translation models. Once developed, these models will be open-sourced, making them freely accessible to the community.
Partners will also have the opportunity to participate in technical workshops conducted by our research teams. These workshops are designed to teach partners how to leverage our open-source models to create robust language technologies. We are thrilled to announce that the Government of Nunavut, Canada, has agreed to collaborate with us on this groundbreaking initiative, sharing data in the Inuit languages of Inuktitut and Inuinnaqtun.
To become part of our Language Technology Partner Program, interested parties are encouraged to complete this interest form.
Open Source Translation Benchmark
In tandem with our Language Technology Partner Program, we are also launching an open-source machine translation benchmark. This benchmark serves as a standard test for evaluating how well AI models translate. Crafted by linguistic experts, the benchmark is intended to highlight the rich diversity of human language.
We invite you to access the benchmark, which is available in seven languages, and contribute to translations that will be made open-source for others to use. Our goal is to establish an unparalleled multilingual machine translation benchmark. Access the benchmark here.
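As a rough illustration of how a translation benchmark is typically used to score a model, the Python sketch below compares system outputs against reference translations with a simplified character n-gram F-score. This is only loosely modeled on the chrF metric and is not the benchmark's official scoring method, which this announcement does not specify. The function names, the n-gram order of 3, and the beta value of 2 are all assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after collapsing runs of whitespace."""
    chars = " ".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def ngram_fscore(hypothesis: str, reference: str,
                 max_n: int = 3, beta: float = 2.0) -> float:
    """Average character n-gram F-beta over orders 1..max_n.

    A simplified, chrF-style score (hypothetical helper, not an
    official metric). Orders for which either side has no n-grams
    are skipped.
    """
    b2 = beta ** 2
    scores = []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        # Counter intersection keeps the minimum count of each shared n-gram.
        overlap = sum((hyp & ref).values())
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        scores.append((1 + b2) * precision * recall / (b2 * precision + recall))
    return sum(scores) / len(scores) if scores else 0.0


def score_system(hypotheses: list[str], references: list[str]) -> float:
    """Mean sentence-level score of a system over a set of references."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if not hypotheses:
        return 0.0
    pairs = zip(hypotheses, references)
    return sum(ngram_fscore(h, r) for h, r in pairs) / len(hypotheses)
```

In practice, an established implementation of a published metric would be preferable to a hand-rolled score like this one. The sketch is only meant to show the basic pattern of comparing each model output with an expert-written reference and averaging over the benchmark.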
Our Commitment to Linguistic Diversity
Today's announcements are part of our long-term pledge to support underserved languages. In 2022, we introduced the No Language Left Behind (NLLB) project, a pioneering open-source machine translation engine. It was the first neural machine translation model developed for many languages, and it laid the foundation for future research and advancements in this field.
We partnered with UNESCO and Hugging Face to create a language translator based on NLLB, which we announced during the United Nations General Assembly week last September.
More recently, to further digital empowerment (one of the key thematic areas of the Global Action Plan of the International Decade of Indigenous Languages), we launched the Meta Massively Multilingual Speech (MMS) project. This project extends audio transcription capabilities to over 1,100 languages. Since its launch, we have continued to enhance and expand it, including the introduction of zero-shot speech recognition in 2024. This feature enables the system to transcribe audio in languages it was never trained on.
Ultimately, our ambition is to create intelligent systems that can understand and cater to complex human needs, irrespective of linguistic or cultural differences. As we continue on this path, we are eager to collaborate with others to enhance and broaden machine translation and other language technologies.
In conclusion, these FAIR initiatives are steps toward a more inclusive and linguistically diverse digital future. By engaging with partners and the wider community, we are laying the groundwork for AI systems that can bridge language barriers and foster global connectivity, so that every language and culture is valued and understood.
For those interested in contributing to or learning more about these initiatives, further information can be found on the official Meta and UNESCO websites.