On the possibility of mathematical modelling of the evolution of the polysemy of natural language signs using non-stationary birth-death processes
We consider the possibility of mathematically modelling the evolution of the polysemy of an ensemble of natural language signs by means of non-stationary birth-death processes. We show that an adequate mathematical model of the polysemy of an ensemble of signs can be built on the basis of a hidden non-stationary model of the birth and death of the meanings of linguistic signs. We assume that the intensities of the birth and death processes decay exponentially:

$$\lambda(t) = \lambda_0 \exp\left(-\frac{t - t_0}{\tau_1}\right), \qquad \mu(t) = \mu_0 \exp\left(-\frac{t - t_0}{\tau_2}\right),$$

where $t$ is the current time; $t_0$ is the moment at which the sign appears in the ensemble; $\lambda_0$, $\mu_0$ are the initial values of the birth and death intensities; $\tau_1 = G / \lambda_0$, $\tau_2 = G / \mu_0$ are the decay time constants of the intensities; and $G$ is the average number of meanings that a sign can acquire and lose during its lifetime:

$$G = \int_{t_0}^{\infty} \lambda(t)\,dt = \lambda_0 \tau_1, \qquad G = \int_{t_0}^{\infty} \mu(t)\,dt = \mu_0 \tau_2 .$$

We obtained the conditional (for fixed parameters $\theta = (t_0, \lambda_0, \mu_0, G)$) probability distribution $P_n(t \mid \theta)$ of the states $n$ of this process in closed form, as a sum over $k$ involving factorials and gamma functions of the auxiliary functions $a(t)$, $b(t)$ and $c(t)$, where $b(t) = \exp\left(-\int \mu(t)\,dt\right)$.

In the hidden model of the statistical ensemble of birth-death processes, the parameters $t_0$, $\lambda_0$, $\mu_0$, $G$ of each individual process (of each linguistic sign) vary randomly from one sign to another according to certain distribution laws. Under the assumption that signs arrive as a Poisson flow, the density of the parameter $t_0$ can be taken as uniform on a sufficiently large time interval, while the distributions of the parameters $\lambda_0$, $\mu_0$, $G$ are unknown. The unconditional probability distribution $P_n(t)$ of the state $n$ of an ensemble of birth-death processes (of the polysemy of an ensemble of signs) at time $t$ is the mathematical expectation of the conditional distribution $P_n(t \mid \theta)$ over the distribution of the parameters $t_0$, $\lambda_0$, $\mu_0$, $G$.
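The exponentially decaying intensities described above can be checked numerically. The following is a minimal sketch, not the author's implementation: the parameter values are purely illustrative, the function names are hypothetical, and the integration limits for b(t) (from t0 to t) are an assumption. It confirms that the integral of each intensity over the lifetime of a sign equals G when the decay constant is G divided by the initial intensity.

```python
import math


def intensity(t, t0, rate0, G):
    """Intensity rate0 * exp(-(t - t0) / tau) with tau = G / rate0."""
    if t < t0:
        return 0.0  # assumption: the sign has no meanings before it appears
    tau = G / rate0
    return rate0 * math.exp(-(t - t0) / tau)


def integrate(f, a, b, steps=100_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def survival_factor(t, t0, mu0, G):
    """b(t) = exp(-integral of mu from t0 to t); the limits are an assumption."""
    tau2 = G / mu0
    return math.exp(-G * (1.0 - math.exp(-(t - t0) / tau2)))


# Illustrative (hypothetical) parameter values, not taken from the paper.
t0, lam0, mu0, G = 0.0, 0.5, 0.2, 3.0

# Integrate far enough into the tail (50 decay constants) to approximate infinity.
total_births = integrate(lambda t: intensity(t, t0, lam0, G), t0, t0 + 50 * G / lam0)
total_deaths = integrate(lambda t: intensity(t, t0, mu0, G), t0, t0 + 50 * G / mu0)

print(total_births, total_deaths, survival_factor(10.0, t0, mu0, G))
```

Both integrals come out close to G, so a single parameter G fixes both decay constants once the initial intensities are chosen.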
We have solved the problem of estimating the parameter distributions (that is, identifying the hidden model) from the empirical polysemy distribution $P_{ne}$ obtained from a representative dictionary, followed by the calculation of the optimal theoretical distribution $P_n(t)$. As the identification criterion (a measure of closeness between distributions) we chose a logarithmic RMS criterion of the form

$$\delta = \sqrt{\frac{1}{n_0} \sum_{n=1}^{n_0} \left( \frac{\log P_n(t) - \log P_{ne}}{\log P_{ne}} \right)^2} \to \min,$$

which is convenient when the distributions change by several orders of magnitude across different $n$. The criterion was applied to the dictionary of Pushkin's language. The distributions $P_n(t)$ and $P_{ne}$ agree well, which confirms that a hidden mathematical model of a non-stationary birth-death process can be used to simulate the evolution of the polysemy of an ensemble of natural language signs.
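A logarithmic RMS criterion of the kind described above can be sketched as follows. This is an illustration under stated assumptions rather than the author's code: the function name is hypothetical, and the exact form (log difference normalised by the empirical log probability, averaged over n0 states, then square-rooted) is one reading of the criterion. Working with logarithms keeps small tail probabilities from being swamped by the large ones.

```python
import math


def log_rms_criterion(p_model, p_empirical):
    """Relative logarithmic RMS distance between two discrete distributions.

    Assumed form: sqrt(mean(((log p_model - log p_emp) / log p_emp) ** 2)).
    """
    if len(p_model) != len(p_empirical):
        raise ValueError("distributions must have the same length")
    if not p_empirical:
        raise ValueError("distributions must be non-empty")
    total = 0.0
    for pm, pe in zip(p_model, p_empirical):
        # log is undefined for p <= 0, and log(pe) = 0 when pe == 1.
        if pm <= 0.0 or pe <= 0.0 or pe == 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
        total += ((math.log(pm) - math.log(pe)) / math.log(pe)) ** 2
    return math.sqrt(total / len(p_empirical))


# Hypothetical distributions over n = 1..4 spanning several orders of magnitude.
empirical = [0.6, 0.3, 0.09, 0.001]
model = [0.58, 0.31, 0.1, 0.0012]

print(log_rms_criterion(model, empirical))
```

In an identification procedure, a value like this would be minimised over the parameters of the assumed distributions of the hidden model.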
Keywords
heterogeneous process of birth and death, hidden Markov model, model identification, language sign, polysemy
Authors
Name | Organization | Email |
Poddubny Vasiliy V. | Tomsk State University | vvpoddubny@gmail.com |
On the possibility of mathematical modelling of the evolution of the polysemy of natural language signs with using of non-stationary birth-death processes | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2016. № 3(36).