Using the analysis of semantic proximity of words in solving the problem of determining the genre of texts within deep learning
An important task in the processing of text corpora is the classification of texts by topic and genre. This work is usually done manually, so processing large corpora takes an extremely long time. Moreover, an unambiguous classification is not always possible: in most cases the same text can be attributed to several topics and genres, only one of which is the principal one. Fully automating the classification process, or at least narrowing the researcher's choice to a list of the most likely topics and genres, is therefore of practical interest. To solve this problem, the authors propose using convolutional neural networks, which, on the one hand, are efficient classifiers and, on the other hand, remain insufficiently used and studied for text recognition. To present the data in a form suitable for processing by a convolutional neural network, the word2vec model was chosen. This model produces vector representations of words that reflect their semantic proximity. The word2vec model was implemented with the Skip-gram architecture, which, despite being slow to train, works well with rare words. The model hyperparameters were selected on the basis of numerous experiments. The output of the trained model is the probability that a work belongs to each class. Based on an analysis of the results obtained, we conclude that the proposed convolutional neural network model is correct and reflects the literary perception of genre fairly accurately.
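The pipeline described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the window size, vector dimension, filter width, number of filters, and the genre labels are all hypothetical, and the weights are random rather than trained. It shows only how Skip-gram forms (center, context) training pairs, and how a convolution over word vectors followed by max-over-time pooling and a softmax yields one probability per class.

```python
import math
import random


def skipgram_pairs(tokens, window=2):
    """Return the (center, context) pairs that Skip-gram trains on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def conv_max_pool(embedded, kernel, bias):
    """Slide one filter over consecutive word vectors, apply ReLU,
    and keep the maximum response (max-over-time pooling)."""
    width = len(kernel)
    best = 0.0  # a ReLU output is never negative
    for start in range(len(embedded) - width + 1):
        s = bias
        for k in range(width):
            s += sum(w * x for w, x in zip(kernel[k], embedded[start + k]))
        best = max(best, s)
    return best


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, vectors, kernels, biases, dense, genres):
    """Forward pass: embed words, convolve, pool, score, normalize."""
    embedded = [vectors[t] for t in tokens if t in vectors]
    features = [conv_max_pool(embedded, k, b) for k, b in zip(kernels, biases)]
    scores = [sum(w * f for w, f in zip(row, features)) for row in dense]
    return dict(zip(genres, softmax(scores)))


# Hypothetical toy setup: random stand-ins for trained word2vec vectors
# and trained network weights.
random.seed(0)
dim, width, n_filters = 4, 2, 3
genres = ["novel", "poetry", "drama"]  # assumed labels, not from the paper
text = "the old sea and the quiet shore".split()

vectors = {w: [random.uniform(-1, 1) for _ in range(dim)]
           for w in sorted(set(text))}
kernels = [[[random.uniform(-1, 1) for _ in range(dim)]
            for _ in range(width)] for _ in range(n_filters)]
biases = [0.0] * n_filters
dense = [[random.uniform(-1, 1) for _ in range(n_filters)] for _ in genres]

pairs = skipgram_pairs(["a", "b", "c"], window=1)
probs = classify(text, vectors, kernels, biases, dense, genres)
print(pairs)
print(probs)
```

In practice the word vectors would come from a trained word2vec model and the filters from a trained network; because the softmax output assigns a probability to every genre, the same output can also be used to shortlist the most likely genres for a researcher rather than to pick a single one.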
Keywords
natural language text processing, text mining, word2vec model, convolutional neural networks, machine learning
Authors
Name | Organization | Email |
Batraeva Inna A. | Saratov State University | BatraevaIA@info.sgu.ru |
Nartsev Andrey D. | Saratov State University | narcev.andrey@gmail.com |
Lezgyan Artem S. | Saratov State University | lezgyan@yandex.ru |

Using the analysis of semantic proximity of words in solving the problem of determining the genre of texts within deep learning | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2020. № 50. DOI: 10.17223/19988605/50/2