Samsung Research and Samsung R&D Institute Poland (SRPOL) participated in the Workshop on Machine Translation (WMT), one of the world’s largest machine translation research events, to benchmark the quality of their translation tools. Teams from all over the world competed in the event’s eight machine translation tasks, looking for new and innovative ways to understand human language using machines and computer programs.
Biomedical Translation task
The team from the Samsung Research Global AI Center’s Language Lab participated in the Biomedical Translation task, which evaluates systems for translating sentences from the biomedical domain. The group took first place in two language pairs, English to Spanish and Spanish to English, achieving the highest scores in both. This demonstrates the system’s ability to translate sentences between the two languages accurately and fluently. To achieve this, the team incorporated soft-constrained terminology translation, which supplies the model with target-language terminology constraints as hints alongside the source sentence.
Enhancing domain-specific translation performance
In domain-specific translation, one of the most important factors determining quality is terminology translation. Technical terms are harder to translate than general vocabulary because they are used less often and are therefore more difficult for a model to learn. To address this, Samsung Research Global AI Center’s Language Lab developed soft-constrained terminology translation, which provides hints in the form of target-language constraints along with the source sentences fed into the model. This ensures that domain-specific terminology is reflected in the translation results wherever possible.
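To make the idea concrete, here is a minimal sketch of how target-language hints might be attached to a source sentence before it reaches the translation model. The glossary entries, the `<sep>` and `<term>` marker strings, and the function name `add_terminology_hints` are all assumptions made for illustration; the article does not describe the actual input format the Language Lab uses. The constraint is "soft" because the model only sees the hints as extra input and is trained to use them, rather than being forced to emit them during decoding.

```python
import re

# Hypothetical glossary: source term -> preferred target term (English -> Spanish).
GLOSSARY = {
    "myocardial infarction": "infarto de miocardio",
    "blood pressure": "presión arterial",
}


def add_terminology_hints(source, glossary, sep=" <sep> ", term_sep=" <term> "):
    """Append target-language hints for every glossary term found in the source.

    The marker strings are illustrative placeholders, not a documented format.
    """
    hints = []
    for src_term, tgt_term in glossary.items():
        # Whole-word, case-insensitive match of the source term.
        pattern = r"\b" + re.escape(src_term) + r"\b"
        if re.search(pattern, source, flags=re.IGNORECASE):
            hints.append(tgt_term)
    if not hints:
        # No known terminology: pass the sentence through unchanged.
        return source
    return source + sep + term_sep.join(hints)


print(add_terminology_hints("The patient had a myocardial infarction.", GLOSSARY))
```

A sentence with no glossary terms is returned unchanged, so the same model can handle both constrained and unconstrained input.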
Samsung Research is currently working on offering Korean-English patent translation through its online service, SR Translate. The goal is to provide accurate and reliable translations for users who need them quickly, without the accuracy or relevance problems that arise when terminology is translated incorrectly. With this technology, the team hopes to improve the user experience for domain-specific texts such as patents and other legal documents by ensuring that the relevant terminology is used correctly.
Soft-constrained terminology translation has already proven useful since its introduction, improving accuracy on complex multilingual tasks, especially those in specialized domains such as patents and legal documents. It should also reduce the time spent manually checking translated content for errors or discrepancies, making the entire process smoother. Technologies like this benefit not only companies but also end users who depend on accurate information delivered within a reasonable amount of time.
Focus on improving the quality of corpora
In competitions such as the International Workshop on Spoken Language Translation (IWSLT), teams are typically given a limited number of corpora, that is, collections of structured texts, to use for their translation models. The SRPOL team attributed its success to improving the quality of these corpora through data preprocessing and filtering, as well as to optimizing its model architecture and training process.
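As an illustration of what preprocessing and filtering a parallel corpus can involve, the sketch below applies a few common heuristics: whitespace normalization, dropping empty or overlong pairs, dropping pairs with an implausible length ratio, dropping untranslated copies, and removing duplicates. These specific rules and the thresholds `max_len` and `max_ratio` are assumptions chosen for the example; the article does not say which filters SRPOL applied.

```python
def filter_parallel_corpus(pairs, max_len=100, max_ratio=2.5):
    """Return the (source, target) pairs that pass simple quality heuristics.

    The thresholds are illustrative defaults, not values from the article.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize whitespace so trivially different copies compare equal.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # too long to be a reliable sentence pair
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths differ too much, likely misaligned
        if src == tgt:
            continue  # likely an untranslated copy
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # duplicate pair
        seen.add(key)
        kept.append((src, tgt))
    return kept


raw = [
    ("Hello  world", "Hola mundo"),
    ("Hello world", "Hola mundo"),
    ("", "vacío"),
    ("Yes", "Sí , por supuesto que sí lo haré"),
    ("Copy me", "Copy me"),
]
print(filter_parallel_corpus(raw))
```

The copy check is a blunt heuristic: it would also discard legitimately identical pairs such as numbers or proper names, so a real pipeline would likely treat those cases separately.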
Using the improved corpus, SRPOL’s Machine Translation Team created a classifier based on BERT (Bidirectional Encoder Representations from Transformers), a pretrained language model. The classifier categorized millions of sentences from the corpus into different domains, which allows SRPOL to create separate models for general, medical, and legal translation.
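The routing step can be sketched as follows: each sentence is assigned a domain label, and the corpus is split into per-domain buckets that could then be used to train separate models. Note that the classifier here is a deliberately simple keyword scorer standing in for the BERT-based classifier, so that the example runs without any model download; the keyword lists and the function names `classify_domain` and `split_by_domain` are hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword lists; a stand-in for a trained BERT classifier.
DOMAIN_KEYWORDS = {
    "medical": {"patient", "dose", "clinical", "symptom"},
    "legal": {"court", "contract", "plaintiff", "clause"},
}


def classify_domain(sentence):
    """Return the domain whose keywords overlap most with the sentence."""
    tokens = {t.strip(".,;:!?").lower() for t in sentence.split()}
    scores = {d: len(tokens & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the general domain when no keyword matches.
    return best if scores[best] > 0 else "general"


def split_by_domain(sentences, classifier=classify_domain):
    """Group sentences into per-domain buckets using the given classifier."""
    buckets = defaultdict(list)
    for sentence in sentences:
        buckets[classifier(sentence)].append(sentence)
    return dict(buckets)


corpus = [
    "The patient received a second dose.",
    "The court voided the contract.",
    "We went to the beach.",
]
print(split_by_domain(corpus))
```

Because `split_by_domain` takes the classifier as a parameter, the keyword scorer could be swapped for any function that maps a sentence to a domain label, including one backed by a fine-tuned model.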
SRPOL has had great success in machine translation, winning challenges at IWSLT for four consecutive years. This suggests that the goal of attaining a human-like level of language understanding is within reach. As machine translation and language understanding become more commonplace, Samsung will stay at the forefront of this technology, developing tools that overcome language barriers and improve our daily lives.