Executive Summary

The objective of Work Package 3 (WP3), “Database and MT Infrastructure”, is to develop and implement the MMT machine translation architecture and its core components in a way that meets the scalability and big data requirements of the project. It also covers research and innovation aimed at achieving machine translation training in real time and at improving translation quality by exploiting context information.

This report is the second deliverable of WP3. It describes the current implementation status of the technologies and algorithms the consortium worked on during the second year of the project. The version described here was delivered as a demonstrator (release 0.14-alpha) on October 25th, 2016 (see Deliverable D4.2).

Section 2 introduces the enhancements applied to the MMT system during the second year of the project. Section 3 describes the overall MMT architecture, briefly introducing each component and its interactions in the training, updating and translation phases. Section 4 focuses on the context analyser, which informs the adaptation process of machine translation by finding the domains closest to the input document. Section 5 describes the machine translation engine, which includes the text pre- and post-processing modules, the word aligner, the adaptive translation, reordering, and language models, and the MT decoder. Section 6 reports on the development of domain clustering methods. Finally, Section 7 presents recent tests we ran with the current MMT system.

Download the PDF

MMT – D3.2 – Second Report on Database and MT Infrastructure (PDF, 673 KB)