Suggested Solutions for the Bilingual Internet Lexicography Problems in the LexSite Dictionary Project
This article identifies problems facing contemporary bilingual Internet lexicography and describes the LexSite dictionary, which the authors developed in search of solutions. The data for this research were obtained from the four most popular Internet dictionaries that users turn to for translations in the English-Russian language pair: Google Translate, Yandex.Translator, ABBYY Lingvo and Multitran. The first two combine a dictionary with an automatic translator, while the other two are strictly dictionaries. The authors ran a quantitative and qualitative comparison of the translations offered by these dictionaries against the meanings of the same lexical units found in English thesauri. They also subjected the usage examples provided by the dictionaries to contextual analysis, and evaluated the completeness of the translations and the quality of translations of individual lexemes and idioms. To test translation veracity, the authors queried words that have homoforms, as well as non-existent words made up for these tests, and then analyzed the methods the dictionaries applied to simulate translations of the non-existent words. They also investigated the grammatical and lexical comments included in the lexical entries, and the impact that user involvement in adding entries has on the presentation and content of the dictionaries. The research identified the following major issues faced by bilingual Internet dictionaries: (1) Combined dictionaries provide too few translations, while 'pure' dictionaries produce numerous, poorly systematized results. (2) Translations of multi-word strings and idioms made by combined dictionaries are of low quality. (3) Combined dictionaries often make up translations of words that they cannot find. (4) Both types of dictionaries often fail to recognize homoforms and give translations of the wrong source words. (5) Many entries in these dictionaries come with unsystematic or wrong grammatical and lexical comments. (6) The departure of these dictionaries from the role of a reference source is aggravated by attempts to involve users in creating dictionary entries. The outcome of the study suggests that the surveyed online dictionaries are unsuitable as sources of basic language information, since they produce data of unpredictable quality. The solutions found by the authors have been implemented in the Internet dictionary LexSite. When it cannot find a translation for a user query, the dictionary tells the user that the word was not found. It translates phrases that include the requested word and shows its usage in linguistic context. LexSite shows translations of idioms if exact matches are found. The dictionary recognizes the morphological specifics of the languages and, if the requested word has homoforms, provides relevant comments. To disambiguate polysemous query words, LexSite offers reverse translations while preserving the search results.
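Three of the LexSite behaviors described above (reporting a miss instead of inventing a translation, flagging homoforms, and offering reverse translations) can be illustrated with a minimal sketch. It is not the LexSite implementation: the data, the names `TRANSLATIONS`, `WORD_FORMS`, `LookupResult`, `reverse_translations` and `lookup`, and the toy English-Russian entries are all assumptions made up for this example.

```python
from dataclasses import dataclass, field

# Toy data invented for illustration; not taken from LexSite.
# Maps a base form (lemma) to its Russian translations.
TRANSLATIONS = {
    "leaf": ["лист"],
    "leave": ["уходить", "оставлять"],
    "sheet": ["лист", "простыня"],
}
# Maps an inflected form to every lemma it may belong to.
# "leaves" is a homoform: plural of "leaf" and a form of "leave".
WORD_FORMS = {
    "leaves": ["leaf", "leave"],
}


@dataclass
class LookupResult:
    query: str
    found: bool
    homoform: bool = False
    translations: dict = field(default_factory=dict)  # lemma -> translations
    reverse: dict = field(default_factory=dict)       # translation -> lemmas


def reverse_translations(target):
    """Return every source lemma that has `target` among its translations."""
    return sorted(lemma for lemma, ts in TRANSLATIONS.items() if target in ts)


def lookup(query):
    """Look up a word without ever fabricating a translation."""
    word = query.strip().lower()
    lemmas = list(WORD_FORMS.get(word, []))
    if word in TRANSLATIONS and word not in lemmas:
        lemmas.append(word)
    if not lemmas:
        # Unknown word: report the miss instead of simulating a result.
        return LookupResult(query=query, found=False)
    translations = {lemma: TRANSLATIONS[lemma] for lemma in lemmas}
    # Reverse translations help the user tell senses apart.
    reverse = {
        t: reverse_translations(t)
        for ts in translations.values()
        for t in ts
    }
    return LookupResult(
        query=query,
        found=True,
        homoform=len(lemmas) > 1,  # more than one possible source word
        translations=translations,
        reverse=reverse,
    )
```

In this sketch, a made-up query such as `lookup("blorptastic")` comes back with `found` set to `False` and no translations, while `lookup("leaves")` is flagged as a homoform and lists translations separately for each candidate source word.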
Keywords
Internet dictionary, bilingual lexicography, LexSite dictionary, machine translation systems, machine translation, lexical dataset
Authors
Name | Organization | Email |
Berg Elena B. | Ural State Law University | elenabkct@gmail.com |
Kit Mark | Language Interface | clodpool@gmail.com |
Suggested Solutions for the Bilingual Internet Lexicography Problems in the LexSite Dictionary Project | Voprosy leksikografii – Russian Journal of Lexicography. 2019. № 16. DOI: 10.17223/22274200/16/6