Testing goodness-of-fit with interval data
The main concepts of interval data analysis were originally developed in measurement theory in metrology, where interval uncertainty arises naturally. It is assumed that every observation is measured by an instrument with absolute error $\Delta$. Thus, if the precise value of an observed response is $x$ and the measurement error is $\varepsilon \in [-\Delta, \Delta]$, then the measurement equals $\tilde{x} = x + \varepsilon$. In this case we deal with an ordinary complete sample $X_n = \{\tilde{x}_1, \ldots, \tilde{x}_n\}$. Alternatively, each measurement can be represented as an interval $(\tilde{x} - \Delta, \tilde{x} + \Delta) = (L, R)$. For a sample of $n$ observations we then obtain an interval sample of the form $I_n = \{(L_1, R_1), \ldots, (L_n, R_n)\}$. Nonparametric estimation of the distribution function from interval data is based on maximizing the log-likelihood function

$$\ln L(I_n) = \sum_{i=1}^{n} \ln\bigl(F(R_i) - F(L_i)\bigr)$$

over the values of $F$ at the boundary points $L_i, R_i$, $i = 1, \ldots, n$, subject to the monotonicity of the distribution function. The Turnbull and ICM algorithms are used to compute this nonparametric estimate. Both algorithms give estimates of the same accuracy, but the ICM algorithm requires less computing time. Unknown distribution parameters can be estimated by the maximum likelihood method, which maximizes the likelihood function with respect to the parameter $\theta$:

$$L(I_n \mid \theta) = \prod_{i=1}^{n} \bigl(F(R_i \mid \theta) - F(L_i \mid \theta)\bigr).$$

Thus, the maximum likelihood estimate can be written as

$$\hat{\theta} = \arg\max_{\theta \in \Theta} \ln L(I_n \mid \theta).$$

In this paper, modifications of the classical goodness-of-fit tests are proposed for the composite hypothesis $H_0\colon F(t) \in \{F_0(t; \theta),\ \theta \in \Theta\}$. The main idea of the modification is to use the nonparametric estimate of the distribution function obtained by the ICM algorithm instead of the empirical distribution function.
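The two estimators described above can be illustrated with a short Python sketch. It is not the authors' implementation: it uses Turnbull's self-consistency (EM) iteration rather than ICM (the abstract reports that both reach the same accuracy), treats the intervals as closed when locating Turnbull's innermost intervals, and assumes a normal $F(t \mid \theta)$ with $\theta = (\mu, \sigma)$, maximized by a deliberately simple compass search over $(\mu, \ln\sigma)$. All function names (`measurements_to_intervals`, `turnbull_npmle`, `interval_loglik`, `normal_interval_mle`) are hypothetical.

```python
import math


def measurements_to_intervals(xs, delta):
    """Measurement x~ with absolute error bound delta -> interval (x~ - delta, x~ + delta)."""
    return [(x - delta, x + delta) for x in xs]


def turnbull_npmle(intervals, tol=1e-10, max_iter=10_000):
    """Turnbull's self-consistency (EM) iteration; intervals are treated as closed.

    Returns the innermost intervals [q_j, p_j] and the probability mass on each.
    """
    # Sort endpoints; at ties left endpoints (flag 0) come first, so touching
    # intervals count as overlapping.
    pts = sorted([(l, 0) for l, _ in intervals] + [(r, 1) for _, r in intervals])
    # Innermost intervals: a left endpoint immediately followed by a right one.
    support = [(a[0], b[0]) for a, b in zip(pts, pts[1:]) if a[1] == 0 and b[1] == 1]
    n, m = len(intervals), len(support)
    # alpha[i][j] = 1 if innermost interval j lies inside observation i.
    alpha = [[1.0 if l <= q and p <= r else 0.0 for q, p in support]
             for l, r in intervals]
    probs = [1.0 / m] * m
    for _ in range(max_iter):
        new = [0.0] * m
        for row in alpha:
            denom = sum(a * s for a, s in zip(row, probs))
            for j in range(m):
                new[j] += row[j] * probs[j] / (n * denom)
        converged = max(abs(a - b) for a, b in zip(new, probs)) < tol
        probs = new
        if converged:
            break
    return support, probs


def normal_cdf(t, mu, sigma):
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def interval_loglik(intervals, mu, sigma):
    """ln L(I_n | theta) = sum_i ln(F(R_i | theta) - F(L_i | theta)), normal F assumed."""
    total = 0.0
    for l, r in intervals:
        prob = normal_cdf(r, mu, sigma) - normal_cdf(l, mu, sigma)
        total += math.log(max(prob, 1e-300))  # guard against log(0) far in the tails
    return total


def normal_interval_mle(intervals, tol=1e-8):
    """theta^ = argmax ln L(I_n | theta) via a simple compass search over (mu, ln sigma)."""
    mids = [(l + r) / 2.0 for l, r in intervals]
    mu0 = sum(mids) / len(mids)
    sd0 = math.sqrt(sum((x - mu0) ** 2 for x in mids) / len(mids)) or 1.0
    theta = [mu0, math.log(sd0)]  # start from the midpoint moments
    best = interval_loglik(intervals, theta[0], math.exp(theta[1]))
    step = 0.5
    while step > tol:
        improved = False
        for i in range(2):
            for sign in (1.0, -1.0):
                trial = list(theta)
                trial[i] += sign * step
                value = interval_loglik(intervals, trial[0], math.exp(trial[1]))
                if value > best:
                    theta, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # no better neighbour: refine the mesh
    return theta[0], math.exp(theta[1])
```

For the symmetric example `normal_interval_mle(measurements_to_intervals([-1.0, 0.0, 1.0], 0.1))`, the estimate of $\mu$ comes out essentially at 0. The compass search only keeps the sketch dependency-free; a general-purpose optimizer would normally replace it.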
In this case, the Kolmogorov-type test statistic is

$$D_n = \sup_{t} \bigl|\hat{F}_n(t) - F_0(t, \hat{\theta})\bigr|,$$

the $\omega^2$-type (Cramér–von Mises–Smirnov) statistic is

$$\omega_n^2 = \int_{-\infty}^{\infty} \bigl(\hat{F}_n(t) - F_0(t, \hat{\theta})\bigr)^2 \, dF_0(t, \hat{\theta}),$$

and the Anderson–Darling-type statistic is

$$\Omega_n^2 = \int_{-\infty}^{\infty} \frac{\bigl(\hat{F}_n(t) - F_0(t, \hat{\theta})\bigr)^2}{F_0(t, \hat{\theta})\bigl(1 - F_0(t, \hat{\theta})\bigr)} \, dF_0(t, \hat{\theta}),$$

where $\hat{F}_n(t)$ is the nonparametric estimate of the distribution function from the interval data, and $0 < x_0 < x_1 < \ldots < x_m$ are the ordered distinct values of $L_i$ and $R_i$, $i = 1, \ldots, n$. We have formulated the sequence of steps for estimating the p-value of the proposed tests. The hypothesis is not rejected if the obtained p-value is larger than the significance level $\alpha$. As an example, we have tested the normality hypothesis on an interval sample of consumer demand prices for the bio-energy drink of SPC "SAVA". It has been shown that there is no reason to reject the hypothesis that consumer demand prices are normally distributed.
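A Python sketch of how the three statistics and a Monte Carlo p-value might be computed. Two assumptions go beyond the abstract. First, $\hat F_n$ is taken to be a right-continuous step function that jumps only at $x_0, \ldots, x_m$, with $\hat F_n(x_m) = 1$ (the abstract does not say how mass inside Turnbull intervals is assigned). Second, the p-value uses a generic Monte Carlo scheme in which a hypothetical callable `simulate_statistic` performs one replication (for instance: simulate a sample from $F_0(t, \hat\theta)$, convert it to intervals, re-estimate $\theta$ and $\hat F_n$, return the statistic); the authors' actual steps are not given here. On a segment where $\hat F_n = c$ and $w = F_0(t, \hat\theta)$ runs from $u$ to $v$, substituting $w = F_0$ gives $\int_u^v (c - w)^2\,dw = \bigl((v-c)^3 - (u-c)^3\bigr)/3$ and $\int_u^v \frac{(c-w)^2}{w(1-w)}\,dw = -(v-u) + c^2 \ln\frac{v}{u} - (1-c)^2 \ln\frac{1-v}{1-u}$, which the code sums over segments. Function names are illustrative only.

```python
import math
import random


def _segments(xs, fhat, cdf0):
    """Yield (c, u, v): F^_n equals c on a segment where w = F_0(t) runs from u to v.

    Assumes F^_n is right-continuous, constant on [x_k, x_{k+1}), zero before x_0
    and equal to 1 from x_m on; fhat[k] = F^_n(x_k).
    """
    if abs(fhat[-1] - 1.0) > 1e-9:
        raise ValueError("F^_n(x_m) must equal 1")
    yield 0.0, 0.0, cdf0(xs[0])                       # (-inf, x_0)
    for k in range(len(xs) - 1):
        yield fhat[k], cdf0(xs[k]), cdf0(xs[k + 1])   # [x_k, x_{k+1})
    yield 1.0, cdf0(xs[-1]), 1.0                      # [x_m, +inf)


def kolmogorov_stat(xs, fhat, cdf0):
    # sup |c - w| over w in [u, v] is attained at an end of the segment
    return max(max(abs(c - u), abs(c - v)) for c, u, v in _segments(xs, fhat, cdf0))


def omega2_stat(xs, fhat, cdf0):
    # integral of (c - w)^2 dw from u to v, summed over segments
    return sum(((v - c) ** 3 - (u - c) ** 3) / 3.0 for c, u, v in _segments(xs, fhat, cdf0))


def anderson_darling_stat(xs, fhat, cdf0):
    # integral of (c - w)^2 / (w (1 - w)) dw, using the closed form above
    total = 0.0
    for c, u, v in _segments(xs, fhat, cdf0):
        piece = -(v - u)
        if c > 0.0:
            piece += c * c * math.log(v / u)
        if c < 1.0:
            piece -= (1.0 - c) ** 2 * math.log((1.0 - v) / (1.0 - u))
        total += piece
    return total


def monte_carlo_p_value(observed, simulate_statistic, n_sim=1000, seed=0):
    """Share of simulated statistics >= observed; simulate_statistic(rng) is hypothetical."""
    rng = random.Random(seed)
    exceed = sum(simulate_statistic(rng) >= observed for _ in range(n_sim))
    return exceed / n_sim
```

Under the decision rule stated above, $H_0$ would not be rejected when the returned p-value exceeds $\alpha$. The plain share of exceedances is used here; the variant $(k + 1)/(N + 1)$ is also common.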
Keywords
goodness-of-fit tests,
interval data,
nonparametric estimation of distribution function,
Turnbull algorithm,
ICM-algorithm
Authors
Vozhov Stanislav S. | Novosibirsk State Technical University | vss920414@gmail.com |
Chimitova Ekaterina V. | Novosibirsk State Technical University | chimitova@corp.nstu.ru |
References
Kreinovich V. Interval computations and interval-related statistical techniques: estimating uncertainty of the results of data processing and indirect measurements // Advanced Mathematical and Computational Tools in Metrology and Testing X. Singapore : World Scientific, 2015. P. 38-49. (Book series: Advances in Mathematics for Applied Sciences. V. 86). DOI: 10.1142/9789814678629_0014.
Voshchinin A.P. Interval data analysis: development and prospects // Zavodskaya Laboratoriya (Industrial Laboratory). 2002. V. 68, no. 1. P. 118-126.
Voshchinin A.P. A method of analyzing data with interval errors in problems of hypothesis testing and parameter estimation of implicit linearly parameterized functions // Zavodskaya Laboratoriya (Industrial Laboratory). 2000. V. 66, no. 3. P. 51-64.
Orlov A.I. Basic ideas of interval data statistics // Scientific Journal of KubSAU. 2013. No. 94 (10). P. 1-26.
Zenkova Zh.N., Krakovetskaya I.V. Turnbull nonparametric estimate for interval-censored data in a marketing study of demand for bio-energy drinks // Tomsk State University Journal of Control and Computer Science. 2013. No. 3 (24). P. 64-69.
Lemeshko B.Yu., Postovalov S.N. On estimating distribution parameters from interval observations // Vychislitel'nye Tekhnologii (Computational Technologies). 1998. V. 3, no. 2. P. 31-38.
Lemeshko B.Yu., Postovalov S.N. On solving problems of statistical analysis of interval observations // Vychislitel'nye Tekhnologii (Computational Technologies). 1997. V. 2, no. 1. P. 28-36.
Lemeshko B.Yu., Postovalov S.N. Statistical analysis of observations having an interval representation // Collection of Scientific Papers of NSTU. Novosibirsk : NSTU Publ., 1996. No. 1. P. 3-12.
Turnbull B.W. Nonparametric estimation of a survivorship function with doubly-censored data // J. Am. Statist. Assoc. 1974. V. 69. P. 169-173.
Vozhov S.S. Investigation of the properties of the nonparametric estimate of the distribution function from interval data // Collection of Scientific Papers of NSTU. Novosibirsk : NSTU Publ., 2015. No. 1 (79). P. 33-44.
Vozhov S., Chimitova E. Investigation of Maximum Likelihood Estimates and Goodness-of-Fit Tests for Data with Known Measurement Error // Applied methods of statistical analysis. Applications in survival analysis, reliability and quality control. AMSA'2015, Novosibirsk, 14-19 Sept. 2015 : proc. of the intern. workshop. Novosibirsk : NSTU publ., 2015. P. 124-130.
Groeneboom P. Asymptotics for interval censored observations // Technical Report 87-18. Department of Mathematics, University of Amsterdam, 1987. 69 p.
Groeneboom P. Nonparametric maximum likelihood estimation for interval censored data // Technical Report, Statistics Department, Stanford University, 1991. 87 p.
Groeneboom P., Wellner J.A. Information Bounds and Nonparametric Maximum Likelihood Estimation. Basel : Birkhäuser Verlag, 1992. 126 p.