Handwriting recognition and data mining: Possibilities of neural network technologies (based on Admiral Fyodor Lütke’s Diary) | Imagologiya i komparativistika – Imagology and Comparative Studies. 2025. № 23. DOI: 10.17223/24099554/23/17

Handwriting recognition and data mining: Possibilities of neural network technologies (based on Admiral Fyodor Lütke’s Diary)

This article describes a practical experiment in automated text mining of a historical manuscript. Using Admiral Fyodor Lütke’s diary from the collections of the State Archive of the Russian Federation as a case study, the authors demonstrate an integrated approach based on two neural networks: a purpose-built, specially trained network that transcribes the manuscript into machine-readable text, and a GPT-type large language model (YandexGPT) that labels the main metadata in the source. The metadata identified by the model make it possible to assess the main themes of the diary and to obtain lists of keywords (for each text segment), persons, geographical locations, and place names. Using ready-made tools for statistical analysis of the whole body of metadata (here, the web application Voyant-Tools), the researchers carried out a nearly automatic text-mining process and obtained information about the composition of the diary, the characteristics of the persons mentioned in the text (including members of the imperial family, representatives of court society, officials of the Military Department, etc.), and the distribution of geographical areas and toponyms in the text. The process was carried out without directly reading the historical source itself. The results of the YandexGPT analysis, after subsequent processing in Voyant-Tools, showed that the system correctly identified the main themes of the text. The list of main themes correlated with the set of selected keywords, which served as a control group for the list of themes and made it possible to obtain a more objective picture.
When compiling the register of persons mentioned in the text and identifying a particular person, the system correctly recorded the patronymic and the rank or title in addition to the name. In some cases the system offered alternative interpretations of a name that appeared in the text. The characterizations obtained for most of the persons identified in the transcribed text were also quite accurate. Notably, in addition to describing a person’s status and position, the system was able to determine with a high degree of accuracy the nature of that person’s relationship with the author (Lütke) and with other persons. Using the identified toponyms and geographical objects, the authors reconstructed the route of Lütke’s European voyage in 1839 and recorded his assessments of this part of the world, expressed directly or less explicitly in the diary of that period. The authors also obtained a complete list of the topographical and geographical objects mentioned by Lütke in his diary. The resulting metadata give a highly accurate idea of the text under study and its author. The authors established that the two neural networks can work together “seamlessly” (without human participation), relying on mathematical algorithms and technical means. The resulting metadata can serve as a navigation system that lets the researcher learn about the content of a manuscript that was originally undeciphered. The authors declare no conflicts of interest.
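The statistical step described in the abstract (counting themes, persons, and toponyms across labelled text segments, and checking themes against keywords) might look roughly like the sketch below. It is an illustration only and not the authors' pipeline: the `SegmentMetadata` record, the function names, and all sample labels are hypothetical placeholders, and no call to YandexGPT or Voyant-Tools is made.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SegmentMetadata:
    """Hypothetical record of the labels a language model might return for one text segment."""
    segment_id: int
    themes: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    persons: list[str] = field(default_factory=list)
    toponyms: list[str] = field(default_factory=list)


def aggregate(segments, attribute):
    """Count in how many segments each label of the given kind appears."""
    counts = Counter()
    for seg in segments:
        # Use a set so a label repeated inside one segment is counted once.
        counts.update(set(getattr(seg, attribute)))
    return counts


def theme_keyword_overlap(segments):
    """Share of segments where at least one theme also occurs among the keywords."""
    if not segments:
        return 0.0
    hits = 0
    for seg in segments:
        keywords = {k.lower() for k in seg.keywords}
        if any(t.lower() in keywords for t in seg.themes):
            hits += 1
    return hits / len(segments)


# Invented placeholder data, not results from the diary.
segments = [
    SegmentMetadata(1, themes=["court life"], keywords=["court life", "audience"],
                    persons=["Person A"], toponyms=["Place A"]),
    SegmentMetadata(2, themes=["travel"], keywords=["road", "weather"],
                    persons=["Person A", "Person B"], toponyms=["Place A", "Place B"]),
    SegmentMetadata(3, themes=["travel"], keywords=["travel"],
                    persons=[], toponyms=["Place B", "Place B"]),
]

print(aggregate(segments, "themes").most_common())
print(aggregate(segments, "persons").most_common())
print(aggregate(segments, "toponyms").most_common())
print(theme_keyword_overlap(segments))
```

Counting per segment rather than per mention is one possible design choice; it keeps a single long entry from dominating the lists.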

Keywords

neural networks, machine data analysis, Fyodor Lütke, diary, Nicholas I, imperial family, large language models

Authors

Name | Organization | E-mail
Boltunova Ekaterina M. | National Research University Higher School of Economics | ekboltunova@hse.ru
Laptev Anton K. | National Research University Higher School of Economics | aklaptev@hse.ru
Total: 2

References

State Archive of the Russian Federation (hereafter GARF). F. 1463. Op. 1. D. 1111-1114.
Guldi Jo. The Dangerous Art of Text Mining: A Methodology for Digital History. Cambridge : Cambridge University Press, 2023. 496 p.
Lütke F.P. Fourfold Voyage to the Arctic Ocean on the Military Brig “Novaya Zemlya” in 1821-1824. Moscow : Geografgiz, 1948. 337 p. (In Russian).
Lütke F.P. Voyage Around the World on the Sloop-of-War “Senyavin” in 1826-1829. Moscow : Geografgiz, 1948. 303 p. (In Russian).
Russian State Historical Archive (hereafter RGIA). F. 472. Op. 13. D. 885.