We present a new and unique parallel corpus available in all 20 official European Union (EU) languages, with additional documents available for some EU candidate countries. The average size is about 10 million words per language. The UTF-8-encoded corpus has been manually classified according to EUROVOC (1994) subject domains and is available in XML format. Pair-wise paragraph alignment information is available for all 190+ language pair combinations, and the corpus is accompanied by a tool that produces a bilingual paragraph-aligned parallel corpus for any of these language pairs. In the full paper, we will elaborate on the many uses of parallel corpora and summarise the specific interest of the Joint Research Centre in them.

Motivation to compile the parallel corpus

Parallel corpora are extremely useful for training and evaluating automatic text analysis systems and for generating new linguistic resources, such as subject-specific monolingual and multilingual terminology lists.
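The figure of 190 language pairs follows from counting unordered pairs: with 20 languages there are 20 × 19 / 2 = 190 combinations, and the "+" covers the extra candidate-country languages. The sketch below shows what a pair-wise paragraph-alignment tool could do, under stated assumptions. It is not the tool distributed with the corpus. The list of ISO 639-1 codes, the in-memory paragraph lists, the `(source_ids, target_ids)` link format and the function names `language_pairs` and `align_paragraphs` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Assumption: ISO 639-1 codes for the 20 official EU languages before
# the 2007 enlargement. Candidate-country languages would be appended here.
EU_LANGUAGES = [
    "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "hu",
    "it", "lt", "lv", "mt", "nl", "pl", "pt", "sk", "sl", "sv",
]


def language_pairs(languages):
    """Return every unordered language pair: n * (n - 1) / 2 in total."""
    return list(combinations(sorted(languages), 2))


def align_paragraphs(src_paragraphs, tgt_paragraphs, links):
    """Build a bilingual paragraph-aligned corpus for one language pair.

    Hypothetical input format: paragraphs are 0-indexed lists of strings,
    and each link is a (source_ids, target_ids) tuple, so 1-1, 2-1, 1-2
    and 1-0 alignments can all be expressed. Paragraphs grouped by one
    link are joined with a single space.
    """
    aligned = []
    for src_ids, tgt_ids in links:
        src_text = " ".join(src_paragraphs[i] for i in src_ids)
        tgt_text = " ".join(tgt_paragraphs[j] for j in tgt_ids)
        aligned.append((src_text, tgt_text))
    return aligned


if __name__ == "__main__":
    pairs = language_pairs(EU_LANGUAGES)
    print(len(pairs))  # 190, i.e. 20 * 19 / 2

    # Toy English/German paragraphs; the second English paragraph was
    # split into two German paragraphs, giving a 1-2 link.
    en = ["Article 1", "This Regulation shall enter into force. It applies from today."]
    de = ["Artikel 1", "Diese Verordnung tritt in Kraft.", "Sie gilt ab heute."]
    links = [((0,), (0,)), ((1,), (1, 2))]
    for src, tgt in align_paragraphs(en, de, links):
        print(src, "|||", tgt)
```

Storing only alignment links rather than one pre-built bilingual file per pair keeps the distribution small. Each of the 190+ bilingual corpora can then be generated on demand from the monolingual texts.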