Estimation of the function of realizability of solving the labor-consuming problems on distributed computer systems (CS) is made. A set of integral equations for calculating the function of realizability of problem solution on distributed CSs is derived. A parallel algorithm for its computing is described.
Технико-экономические показатели функционирования распределенных вычислительных систем и функцияосуществимости решения сложных задач .pdf The distributed computing system (CS) is an association spatially removed from each other concentrated CS, based on principles [1]:1))parallelism of functioning concentrated CS (i.e. abilities of several or all concentrated systems in common and simultaneously to solve one complex problem presented by the parallel program);2)programmable structure (i.e., opportunities automatically to adjust a communication network between concentrated CS);3))homogeneity of structure (i.e., program compatibility various concentrated CS and uniformity of elementary machines in each of them).The concept of distributed computer systems allows creating of robust data processing means with theoretically unlimited power.Quality of CS functioning is estimated by means of system of characteristics of power, reliability, robustness, realizability of problem solution, and technical and economic efficiency [1].1. Technical and economic efficiency analysis of large scale distributed computer systems functioningIt is required to solve the following problem: we have the distributed CS containing N elementary machines and restoring devices (RD) containing m devices, it is required to calculate expenses and income that will come while exploitation of the computer system.For this project the considering method is based on stochastic models describing the process of CS functioning [1]. Stochastic models lead to prime design formulas for coordinates of a vector-function r(t) and D(t) accordingly expenses and incomes.If X and u. are intensions accordingly of stream of breakdowns in one elementary machine (EM) and of recovering of broke-down EM with one RD; then, c\ and c2 are costs accordingly to exploitation of one EM and maintenance of one RD in a time unit.Let K(t) be the average number of trig machines in CS, then Mt(i) is the average number of busy RD in a restoring system. The time of system reconfiguration corresponds to v-1. It makes following equations being correct for coordinates of the vector-function of cost r;(t):dГг. {t) = c [N - Ki {t)] + c2 [m - Мг. {t)] ] (1)Г, (0) = 0, i € E. J The solution of the system (1) is presented by functions:Г (t) = -ßi + yt + рг. 5(f),where under NX < (X+u)ßi = ^(N"0Ц (c-c2), y^-N^(c1-c2)+mc2, 5(t) = e-^ (X+u)„ ik-muNX-mu.. _2k X where E1 =(N-m, N-m+1,..,N), E2 = (0, 1,..., N-m-1).The vector-function r(t) as the solution of a set of equations (1) in stationary conditions can be written as follows:Г (t) = g t.It is explained on the basis of the fact that under bigger t functions ß (X+u)g = CjwulI - (c 2 от + c3wu + c4N + c5) .V Xv jIn that way, using a quite easy method of calculating vector-functions T(t) and D(t), we can get the correlation between the reliability index and the cost of computer systems.2. Realizability of solving problems by CS continued approachThe theory of realizability studies the process of solving problems by non-absolutely reliable computer systems in three regimes. In the mono-program regime a function Ф(г, i) of potential realizability of solving problems by a CS is used as a quantitativecharacteristic. The function is the probability of the fact that by a CS which began functioning with i, 0< i < N, working elementary machines (EMs), for the time t > 0 there will be solved a problem presented as an adaptive parallel program. It is clear thatФ(г, i) = 1 - exp[-ßjK(x, i)dt] , (4)0where ß 1 is the average time of solving a problem by one working EM; K(x, i) is themean value of a number of working EM at a moment т > 0. Let N be the number of EMs constituting a CS; m is the number of repairing devices, X-1 and u-1 are mean values, respectively, of time between failures of an EM and of time to repair a faulty EM by one device. Then, under the condition that for the same time interval the average number of failures in a CS does not exceed the average number of repairs that can be performed all m devices, i.e. under the condition NX < Nu, one can obtain:K(t, i) = + iX~(N - A*1 e-^', (N - m) < i < N.(5) X + u X+uThe calculation of Ф(г, i) should be made when elaborate analysis of realizability of solving problems by a CS is necessary, when it is necessary to estimate the time for a system to gain stationary regime. For instance, such a situation occurs when creating a specialized CS. When operating a CS in commercial situations there will hardly occur the necessity of such calculation. Numerical analysis shows that at modern reliability parameters of microcomputer devices the stationary regime is gained in about 10 hours. Therefore, one may assume that the functioning regime of a general CS is stationary, as a rule. Consequently, for the users of a CS to estimate the realizability of their problems to be solved, it is necessary, to use the simplest formula:Ф(>, i) = 1 - exp[-ßNut /(X + u)]. (6)It should be noted that one of multiprogram regimes - the solution of a set of problems, without loss of generality is reduced to the mono-program one. Indeed, the re-alizability of solving any problem from the set depends on the parameters of the subsystem that has just been attributed to it; the analysis of efficiency of the subsystem does not differ from the analysis of the whole system when solving a problem.The analysis of a CS functioning in the regime of serving a flow of problems is reduced to solving the following problem in a simplified set-up. Let with intensity a the simplest (Poisson) flow of simple problems (represented by sequential programs) be loaded into a CS. It is necessary to calculate mean value A(t) of the number of problems contained in the system at a moment t > 0. Let intensity a be such that there holds a < N ß, i.e. always in the system there are operating and vacant EM to solve entering problems. ThenA(0) = j , j e{0,1,..., i},A(t + At) = A(t) + aAt - A(t)ßAt + o(At),A'(t) = a-ßA(t),A(t) = a / ß + (j -a / ß) exp(-ßt).If the inequality a < N ß does not hold, the calculation of A(t) is made in a similar way. Formulas (4) - (6) allow simply executing express-analysis of functioning of non-absolutely reliable CS in the basic regimes of information processing.3. Realizability function of parallel solving problemsWe suppose that labour-consuming problem is represented by the adaptable parallel program [1] and it can be solved if the system has at least one working EM.The model for calculation of the probability of parallel solving problem on distributed CS during the time f (T) for not more than / failures and restorations of EMs is proposed [2], T - the time of solving problem is just on one EM.Let us introduce the notation: p n( f (t), l, i) is the probability that during the time f(t) CS will solve Q(01] part of the problem provided that at instant of time f (t) and at the initial instant of time there were n and i working EMs, respectively, and during the timef (t) the CS had / failures and restorations of EMs, t e [0, T];P rec(n, f (t)) is the probability of restoration of one failed EM during the time f (t) for failure of (N-m) EMs and m < N restoration devices;Pft(n, f (t)) is the probability of failure of any EM from n working EMs during thetime f (t);Vflt(n, f (t)) is the survival probability functioning of CS from n working EMs during the time f (t);V rec("> f (t)) is the probability of nonrestoring of any failed EM during the time f (t) in case of failure of the (N - n)th EM and in the presence of m < N restoration devices;W n( f (t)) = V rec("> f (t ))V flt(«> f (t)) is the probability that for the time interval f (t) no failures or restorations on CS of n EMs will happen.Let (/-1) failures have happened in the time interval [0, f (т)), and let a failure occur at the instant f (т) (т< t), then we obtain thatP n(f (t), l, i) = P я+1( f (t), l -1, i) P n(f (t) - f (t) + T tun, 0, i) x xd(Pflt(« +1, f (t))Vrec(« +1, f (t)) ), 1 < n < N.Further, assuming (/ - 1) failures in the time interval [0, f (т)), and failure or restoration of EM at the instant f (т) (т< t), and also considering thatPn(f (t) - f (T) + Ttun, 0,1) = Wn( (t - T)k-1) ,(Where kn is the acceleration of problem solution on n EMs, which, for simplicity, is assumed to be independent on the part of the problem solved), we obtainp n{f (t), i, i) = p я+1( f (t), i -1, i)w n( (t - т)к-1) xxd( pflt(n +1, f (t))Vrec(n +1, f (t)) ) + pn-i(f (t), l -1, i) x x Wn( (t - t)k-1) x d( Prec(n -1, f (t)) Va(n -1, f (t)) ), 1 < n < N .We suppose / failures and restorations of EM on the distributed CS during the time f (t). Therefore, the probabilities pn(f (t),/, i) are found from the set of equations:Px{f (t),/,1) = Jp 2( f (T),1 -\,l)W l(t-T)d (P flt (2, f (T))V rec(2, f (T))), 0P (t),/,/) = Jp „+1( f (T),/-1,i)W „((t-t)k-1)d (P flt(«+1, f (T))V rec(«+1, f (T))) +0< + JP^iCf (t),/-1,i)W„((t-T)k-1)d(Prec(«-1,f (t))Vflt(n-1,f (t))), 2> f (t)) = exp(-u(n)f (t)) ,V&(n, f (t)) = exp(-Xnf (t)).In fig. 1 one of calculations of realizability function of parallel solving problem with following parameters kn = T/n, T = 310 hours, N = 32 EM, i = 32 EM, u = 0.0 are presented graphically. Estimation of function f (t) was executed for CS, in which failure rate for each EM X = 0.00001; forf2(t), X = 0.00005; for f3 (t), X = 0.0001. In this case, function f (t) - realizability of solving labor-consuming problems on CS (£=1,2,3). Here, f (t) = 0, Vt e [0, T0), (£=1,2,3); T0 - minimal time of parallel solving problem on distributed CS with all serviceable EM.Pit) 0,9970,9920.9870,9820,9770,9720,967Fig. 1. Realizability function of parallel solving problem on distributed CS3.2. Parallel algorithmWe divide the calculation of each integral for pn(f (t), l, i) of the set of equation (1) into К parts. Every part is solved on corresponding EM of parallel CS.Parallel algorithm for calculating (1) and (2) was implemented in the cluster CS [3] using programming language C and MPI technologies. Results of the paralleling efficiency of algorithm are shown in Fig.2.REFERENCES
Khoroshevsky V.G., Mamoilenko S.N., Maidanov Y.S., Smirnov S.V. Robust cluster computer systems // Optoelectronics, Instrumentation and data processing. 2004. V. 40. No. 1.
Khoroshevsky V.G. Architecture of computer systems (in Russia). M.: MSTU Publishing House, 2005. 512 p.
Pavsky K.V. Analysis of the time of solution of parallel problems on programmable structure computer systems // Optoelectronics, instrumentation and data processing. 2000. No. 2. P. 54 -62.