WORD and low frequency vocabulary in dictionaries of the text editor
This article examines problems in the computerized spell checking of Russian-language texts. It investigates and evaluates the built-in spell-checking engine of the Microsoft Word® 2016™ text editor (2018 modification). It shows that including some obsolete and low-frequency words in the internal (system) dictionaries not only fails to improve the speller's performance but can also cause errors and typos to go undetected. Many flaws and gaps of previous MS Word versions have been fixed in MS Word 2016. Nevertheless, the automated analysis of word agreement, both in phrases and in standalone word combinations, raises even more questions, especially in comparison with the earlier Orfo™-based spellers. Even the detection of spelling errors (the most developed area of analysis) and the suggestion of corrections remain far from perfect. For words that the system does not recognize and underlines, the spell checker's suggestion module offers possible corrections (no more than three in the 2017 edition) or proposes revising the phrase. It cannot always propose the normative spelling, especially when that spelling differs from the underlined word by several letters. The suggestion list often contains the word split in two by a space, with no regard for whether the two resulting words fit together. The article gives examples of words that are quite frequent in modern usage but absent from the WinWord system dictionary; such words should be accepted without remark rather than flagged as mistakes. At the same time, there is no reason to keep in the system dictionary rare, low-frequency short lexical units that coincide with the beginnings or endings of more common words, because they can arise when a word is unintentionally split by a space.
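The effect of rare short entries on a dictionary-lookup speller can be illustrated with a minimal sketch. It is not MS Word's engine, and the word lists are hypothetical English stand-ins chosen for readability rather than examples from the article. The function `unknown_words` and the sets `COMMON` and `RARE` are names assumed here for illustration.

```python
# Toy lexicons (assumed for illustration; not taken from any real speller).
COMMON = {"we", "go", "be", "the", "hills", "often", "beyond"}
RARE = {"oft", "en", "yond"}  # archaic or specialist short words


def unknown_words(text, lexicon):
    """Return the words a pure dictionary-lookup checker would underline."""
    return [word for word in text.lower().split() if word not in lexicon]


# "often" and "beyond" have each been accidentally split by a space.
damaged = "we oft en go be yond the hills"

# With the rare entries present, every fragment is a known word,
# so the accidental splits pass unnoticed.
flagged_with_rare = unknown_words(damaged, COMMON | RARE)

# Without them, the fragments are underlined and the typos surface.
flagged_without_rare = unknown_words(damaged, COMMON)

print(flagged_with_rare)
print(flagged_without_rare)
```

In this toy setting, dropping the rare entries turns a silent miss into a visible warning. The cost is that a genuine use of one of those rare words would then be underlined.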
The article contains examples of specially constructed phrases with errors: transposed letters within a word, an omitted letter (hyphaeresis) or a doubled letter (gemination), and words wrongly split or run together. Every word that results from these errors is present in, or can be generated from, the system dictionary. The word forms in these phrases do not agree with one another, yet MS Word is unable to detect syntax errors of this type. Similar phrases can be used to test the spell checkers of other MS Word versions, both earlier and later. The author provides a list of rare words that MS Word accepts as correct even though each is quite likely to be a misspelling of a more common word. It is advisable to remove some such 'specific' rare words from the internal system dictionaries, or to deactivate them until the spell checker has more information about the contexts in which they can be used. Many of the flaws described by the author and by other Internet users have recently been eliminated from MS Word, but the content of its Russian system dictionary and the recommendations of the suggestion module still raise many questions.
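The four error types named above (letter transposition, omission, doubling, and word splitting) can be generated mechanically to find cases where a typo yields another dictionary word. The sketch below is one possible way to build such test cases and is not the author's procedure. The toy lexicon and all function names are assumptions made for illustration.

```python
def transpositions(word):
    """Variants with two adjacent letters swapped."""
    swapped = {word[:i] + word[i + 1] + word[i] + word[i + 2:]
               for i in range(len(word) - 1)}
    return swapped - {word}  # swapping identical letters changes nothing


def omissions(word):
    """Variants with one letter dropped."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def doublings(word):
    """Variants with one letter repeated."""
    return {word[:i] + word[i] + word[i:] for i in range(len(word))}


def splits(word):
    """All ways to break the word in two with a space."""
    return {(word[:i], word[i:]) for i in range(1, len(word))}


def undetected_errors(word, lexicon):
    """Typos of `word` that a dictionary-lookup checker would accept."""
    hits = {variant
            for variant in transpositions(word) | omissions(word) | doublings(word)
            if variant in lexicon}
    hits |= {left + " " + right
             for left, right in splits(word)
             if left in lexicon and right in lexicon}
    return sorted(hits)


# Hypothetical toy lexicon; "oft" and "en" play the role of rare entries.
LEXICON = {"form", "from", "of", "off", "often", "oft", "en"}

for target in ("from", "of", "often"):
    print(target, undetected_errors(target, LEXICON))
```

Each hit is a real-word error: the result is spelled correctly, so only a check of agreement or context could catch it.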
Keywords
Microsoft Word, WinWord, computer spell checker, text editor, speller, obsolete vocabulary, spelling mistakes, normative spelling, archaisms, Russian language
Authors
Name | Organization | Email |
Lavoshnikova Elina K. | Lomonosov Moscow State University | elavoshnikova@mail.ru |
WORD and low frequency vocabulary in dictionaries of the text editor | Vestnik Tomskogo gosudarstvennogo universiteta – Tomsk State University Journal. 2018. № 435. DOI: 10.17223/15617793/435/5