About the influence of rounding errors on distributions of statistics of the goodness-of-fit tests
Most existing tests are designed for testing statistical hypotheses about continuous random variables. This standard assumption underpins the correct application of these tests. In practice it is often violated, because every measurement carries some rounding error. As a result, samples contain repeated observations, which should not occur if the random variable is continuous. Rounding errors can affect the results of statistical hypothesis tests and, in some situations, lead to incorrect conclusions. However, a change in the properties of tests caused by rounding errors does not rule out their correct application. This work has two goals. The first is to show how the distributions of the statistics of various tests for statistical hypotheses can change depending on the rounding error Δ and the sample size n. The second is to give recommendations on how to ensure correct conclusions from the applied tests when such changes cannot be ignored. To support this research, the developed software system can simulate the distributions of the statistics of the corresponding tests when the standard continuity assumption is violated (for given Δ and n). The number of simulation experiments used to study these distributions and to estimate the achieved significance level (p-value) by statistical simulation was, as a rule, at least N = 10^6. Using statistical simulation on a set of 30 tests (goodness-of-fit tests and special tests for normality and for exponentiality), the paper demonstrates how significantly the distributions of the test statistics can change depending on the rounding error for limited sample sizes n.
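The kind of simulation described above can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: a fully specified standard normal null hypothesis, the Kolmogorov statistic √n·D_n as the only test, and rounding modeled as rounding to the nearest multiple of Δ. All function names are hypothetical, and the number of replications is far below N = 10^6 so that the run stays short.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Return sqrt(n) * D_n with D_n = sup_x |F_n(x) - F(x)|.

    The usual order-statistic formula stays exact when the sorted
    sample contains ties, as rounded samples do.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * d


def round_to_grid(x, delta):
    """Round x to the nearest multiple of delta (assumed rounding model); delta == 0 keeps x."""
    return delta * round(x / delta) if delta > 0 else x


def simulate_ks(n, delta, n_sim, seed=1):
    """Simulated values of sqrt(n) * D_n under H0: N(0, 1), observations rounded with step delta."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    return [
        ks_statistic([round_to_grid(rng.gauss(0.0, 1.0), delta) for _ in range(n)], cdf)
        for _ in range(n_sim)
    ]


def upper_point(values, alpha):
    """Empirical upper alpha percentage point (critical value) of the simulated statistic."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int((1 - alpha) * len(ordered)))]


if __name__ == "__main__":
    n, n_sim = 100, 2000  # illustrative; far fewer replications than N = 10^6
    for delta in (0.0, 0.1, 0.5):  # rounding step in units of sigma
        crit = upper_point(simulate_ks(n, delta, n_sim), 0.05)
        print(f"delta = {delta} sigma: simulated 5% critical value {crit:.3f}, asymptotic 1.358")
```

Under these assumptions the simulated critical value for Δ = 0.5σ is expected to lie well above the asymptotic value 1.358. A sample that really comes from the hypothesized law but is coarsely rounded would then be rejected too often if asymptotic critical values were used, and for a fixed Δ the shift grows with n.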
To ensure correct conclusions from the applied tests under such non-standard conditions, the paper proposes (and implements) an interactive study of the real distribution of the test statistic (for given Δ and n) by statistical simulation. Two samples of size n = 50, containing measurement results with a rounding error of Δ = 0.2σ, illustrate the results of applying the considered set of tests to checking the hypotheses that the samples belong to the normal and the exponential law, respectively. A significant difference is shown between the p-value estimates obtained from the asymptotic distributions of the statistics and those obtained from their real distributions.
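The comparison of p-values can be sketched in the same spirit. The sketch assumes a fully specified exponential law with unit scale (so σ = 1 and Δ = 0.2σ = 0.2), the Kolmogorov statistic, rounding to the nearest multiple of Δ, and a hypothetical observed statistic value of 1.36. It does not use the paper's samples, which are not given here. It also ignores parameter estimation, which further changes the null distributions when the hypothesis is composite. Function names are illustrative only.

```python
import math
import random


def ks_statistic(sample, cdf):
    """sqrt(n) * D_n; the order-statistic formula stays exact with ties."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * d


def exp_cdf(x):
    """CDF of the exponential law with unit scale (so sigma = 1)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def kolmogorov_pvalue(s, terms=100):
    """Asymptotic p-value P(K >= s) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 s^2)."""
    if s < 0.2:  # the limiting CDF is below 1e-12 here, so the p-value is 1 to that accuracy
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * s * s) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def simulate_null(n, delta, n_sim, seed=1):
    """Simulated sqrt(n) * D_n under H0: Exp(1), observations rounded to multiples of delta."""
    rng = random.Random(seed)
    return [
        ks_statistic([delta * round(rng.expovariate(1.0) / delta) for _ in range(n)], exp_cdf)
        for _ in range(n_sim)
    ]


def simulated_pvalue(s_obs, null_stats):
    """Monte Carlo p-value: share of simulated statistics at least as large as s_obs."""
    return sum(s >= s_obs for s in null_stats) / len(null_stats)


if __name__ == "__main__":
    s_obs = 1.36  # hypothetical observed statistic, not taken from the paper
    null_stats = simulate_null(n=50, delta=0.2, n_sim=10000)
    print(f"asymptotic p-value: {kolmogorov_pvalue(s_obs):.4f}")
    print(f"simulated p-value (n = 50, delta = 0.2 sigma): {simulated_pvalue(s_obs, null_stats):.4f}")
```

With these assumptions the asymptotic p-value is about 0.05, while the simulated p-value is expected to be noticeably larger. The asymptotic distribution would thus suggest rejecting H0 at the 5% level, whereas the distribution that accounts for rounding would not. The exact simulated value depends on the seed and the number of replications.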
Keywords
goodness-of-fit tests,
normality tests,
exponential tests,
statistics,
distribution of statistics,
rounding errors
Authors
Lemeshko Boris Yu. | Novosibirsk State Technical University | lemeshko@ami.nstu.ru
Lemeshko Stanislav B. | Novosibirsk State Technical University | skyer@mail.ru