Choosing a factor extraction method depending on the research context: practical recommendations
The study compares the applicability of seven main factor extraction methods depending on the number of indicators, their distribution, model error, sample size, and the structure of communalities. Based on a statistical experiment, the following recommendations are made: (a) if model error is present, use the principal axis method or alpha-factor analysis; (b) if model error is absent, choose between the maximum likelihood method and the generalized least squares method. The authors declare no conflict of interest.
Selection of factor extraction methods in complicated research contexts: practice recommendations
Suleymanova A.N., Zangieva I.K.

Introduction

Despite the long-standing and widespread use of factor analysis and the large number of factor extraction methods, in the social sciences the statement “factor analysis was applied” often means that the analysis was, in fact, dimension reduction with principal component analysis. According to social science publications indexed in the Scopus database in 2017, principal component analysis is used at least twice as often as all actual factor extraction methods taken together: among 1574 articles with “factor analysis” in the keywords, abstract, or title, only 133 mention the principal axis, maximum likelihood, or least squares methods; alpha-factor analysis and image factoring are not mentioned once; 110 articles mention principal component analysis. The remaining articles do not specify which factor extraction method was applied.

At the same time, textbooks and manuals on quantitative data analysis often limit themselves to a discussion of the factor extraction process (factorization) and rarely elaborate on the benefits of each method with regard to the properties of the input data or on how the results might differ [1; 2]. The neglect of the other factorization methods may be due to the lack of a comprehensive set of best practices for selecting among them. These methods have been compared repeatedly in different research circumstances, but a comparison usually covers only two or three of them applied to a very specific type of data [3-7].

In this article we compare six factor extraction methods (unweighted least squares, or ULS; generalized least squares, or GLS; maximum likelihood, or ML; principal axis, or PAX; alpha-factor analysis, or AF; and image factor analysis, or IF) and one dimension reduction method (principal component analysis, or PCA) across a broad spectrum of possible real-world data types. We intend to organize existing recommendations and build a theoretically and empirically confirmed algorithm for selecting a factor extraction method depending on a set of input data characteristics, determined from existing research and the known properties of the methods: (1) sample size; (2) indicators-to-factors ratio; (3) size of communalities; (4) range of communalities; (5) whether there is a significant chance of model error; and (6) distribution of indicators. We call a particular combination of these characteristics a research context. The main goal of the research is to develop an algorithm for selecting a factor extraction method depending on the research context. We compare the methods on their contextual performance, assessed by the occurrence of Heywood cases and by the mean squared errors of the factor loadings and communalities produced by each method.

Related work

Factorization methods are, in essence, mathematical instruments with a definite algorithm, which means that in every research context the choice of method should be supported by theoretically justified and empirically tested recommendations. Yet the best practice for these methods so far is a fragmented set of insights into the performance of each separate method. Acito and Anderson [3. P. 228-230] compared alpha-factor (AF) analysis, image factor (IF) analysis, and PCA using Monte Carlo simulations in situations where the quality of PCA results declined: (a) the sample size is small, (b) there are fewer than 6 indicators per factor, and (c) the sizes of the communalities differ significantly. In these situations, AF and IF were more effective than PCA. Mislevy determined that ML is preferable when a small number of factors is extracted from a large number of indicators.
GLS, on the contrary, is preferable when many factors are extracted from a small number of indicators. Both methods are applicable if the numbers of both indicators and factors are small [7. P. 20-21]. Compared to unweighted least squares (ULS) and maximum likelihood (ML), generalized least squares (GLS) is more appropriate when the sample size and the communalities are small [8. P. 292]. According to Fabrigar et al. [9. P. 272-275], ML is the most appropriate method when the distribution of indicators is close to normal; otherwise, they advise analysts to use PAX. MacCallum et al. [10. P. 84-86] determined that the sample size and the indicators-to-factors ratio have little effect on ML results if communalities are high. In this case, factor loadings are always recovered almost perfectly. However, if some communalities are small, the indicators-to-factors ratio and the sample size start to influence the quality of recovery of factor loadings.

The findings about the properties of factor extraction methods can be summarized as follows (table 1):

Table 1. Factor extraction methods features
Is the method sensitive to the aspect of research context?
Factor extraction method | Distribution of indicators | Sample size | Communalities size | Communalities range | Number of indicators | Model error
Principal component analysis | Yes | Yes | Yes | Yes | Yes | Yes
Principal axis method | No | Yes | Yes | Yes | Yes | Yes
Unweighted least squares | No | No | No | No | No | No
Generalized least squares | No | Yes | Yes | No | No | Yes
Maximum likelihood | Yes | Yes | Yes, but large and consistent communalities lower the sensitivity to sample size and number of indicators | Yes | Yes
Image factoring | Yes | No | Yes | No | No | Yes
Alpha-factor analysis | Yes | No | Yes | No | No | Yes

The theoretically justified recommendations for selecting a factor extraction method, which we are going to verify empirically in a statistical experiment, are as follows:
• use PCA if the data meet every prerequisite for factor analysis;
• use ULS if (1) the distribution of indicators differs from normal, (2) the sample size is small, (3) communalities are small, (4) there is model error, and (5) there are few indicators and a lot of factors;
• use GLS if (1) the distribution of indicators differs from normal, (2) the range of communalities is wide, and (3) there are few indicators and a lot of factors;
• use PAX if the only deviation from ideal data properties is a non-normal distribution of indicators;
• use AF or IF if the only deviation from ideal data properties is a small sample size;
• use ML if the only deviations from ideal data properties are a small sample size and few indicators with a lot of factors.

This hypothesis can be summarized and visualized in the following scheme of a theoretically substantiated selection algorithm for factor extraction methods depending on research context.

[Figure: decision tree. Decision nodes: certainty of model error absence (error suspected / no error); distribution of indicators (normal / other); sample size (under 300 / over 300); size of communalities (over 0.6 / under 0.6). Terminal nodes: principal axis analysis, alpha-factor analysis, maximum likelihood method, generalized least squares method.]
Fig. 1. Theoretically substantiated selection algorithm for factor extraction methods depending on research context

Design of experiment

We determined six aspects of the research context that might affect the contextual performance of factor extraction methods: (a) presence of model error, (b) indicators-to-factors ratio, (c) distribution of indicators, (d) sample size, (e) size of communalities, and (f) range of communalities. According to best practices, the threshold parameters for the chosen aspects of the research context are: no fewer than 200 cases in a sample [9, 11], communalities no smaller than 0.6, a communalities range no wider than 0.3, 6 indicators per factor, and normally distributed indicators [12]. We also consider model error in the form of mild cross-loadings. Below- and above-threshold parameters of the models were specified as follows (table 2):

Table 2. Specification of research context hypothetically best for use of each factorization method
 | PCA | ULS | PAX | GLS | AF & IF | ML
Indicators to factors ratio | 6/1 | 3/1 | 6/1 | 3/1 | 3/1 | 3/1
Communalities size >0.6 0.6 >0.6 >0.6 >0.6
Communalities range 0.3 0.3 >0.3
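
The performance criteria used in this study (occurrence of Heywood cases and mean squared errors of factor loadings and communalities) can be illustrated with a small Monte Carlo run. The sketch below is not the authors' simulation code. It assumes a one-factor population (which sidesteps rotation), normally distributed indicators, no model error, 6 indicators with loadings of 0.8 (communalities of 0.64), and 200 cases per sample. It uses a plain NumPy version of iterated principal axis factoring, and it counts a replication as a Heywood case when any estimated communality reaches or exceeds 1. All function names are hypothetical.

```python
import numpy as np


def simulate_one_factor(n, loadings, rng):
    """Draw n cases from a one-factor model with unit-variance indicators."""
    loadings = np.asarray(loadings, dtype=float)
    factor = rng.standard_normal(n)
    unique_sd = np.sqrt(1.0 - loadings**2)  # unique part keeps variance at 1
    noise = rng.standard_normal((n, loadings.size)) * unique_sd
    return factor[:, None] * loadings + noise


def principal_axis_one_factor(data, n_iter=50):
    """Iterated principal axis factoring for a single factor (assumed setup)."""
    corr = np.corrcoef(data, rowvar=False)
    # Starting communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(n_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues ascending
        lam = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        h2 = lam**2
    if lam.sum() < 0:  # resolve the sign indeterminacy of the eigenvector
        lam = -lam
    return lam, h2


def evaluate(true_loadings, n, replications, seed=0):
    """Average the criteria over simulated samples from a known population."""
    rng = np.random.default_rng(seed)
    true_loadings = np.asarray(true_loadings, dtype=float)
    err_load, err_comm, heywood = [], [], 0
    for _ in range(replications):
        data = simulate_one_factor(n, true_loadings, rng)
        lam, h2 = principal_axis_one_factor(data)
        err_load.append(np.mean((lam - true_loadings) ** 2))
        err_comm.append(np.mean((h2 - true_loadings**2) ** 2))
        heywood += int(np.any(h2 >= 1.0))  # communality of 1 or more
    return {
        "mse_loadings": float(np.mean(err_load)),
        "mse_communalities": float(np.mean(err_comm)),
        "heywood_rate": heywood / replications,
    }


result = evaluate([0.8] * 6, n=200, replications=100)
print(result)
```

Changing the arguments of `evaluate` (for example, fewer cases or lower loadings) gives a rough feel for how a research context shifts these criteria; comparing several extraction methods would require replacing `principal_axis_one_factor` with the corresponding estimators.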
Keywords
factor analysis,
Monte Carlo simulation,
factor extraction,
principal component analysis
Authors
Suleymanova Anna Nailevna | National Research University Higher School of Economics | Master of Sociology, Senior Lecturer, Chair of Methods for Collecting and Analyzing Sociological Information, Department of Sociology, Faculty of Social Sciences | asuleymanova@hse.ru |
Zangieva Irina Kazbekovna | National Research University Higher School of Economics | Candidate of Sociological Sciences, Associate Professor, Chair of Methods for Collecting and Analyzing Sociological Information, Department of Sociology, Faculty of Social Sciences | izangieva@hse.ru |
Total: 2
References
1. Kim, J.O. & Mueller, W. (1978) Factor analysis: Statistical Methods and Practical Issues. Beverly Hills, CA: Sage.
2. Harman, H. (1976) Modern Factor Analysis. Chicago: The University of Chicago Press.
3. Acito, F. & Anderson, R. (1980) A Monte Carlo Comparison of Factor Analytic Methods. Journal of Marketing Research. 17(2). pp. 228-236.
4. Browne, M. (1968) A comparison of factor analysis techniques. Psychometrika. 33(3). pp. 267-334.
5. Costello, A. & Osborne, J. (2005) Best practices in Exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment Research & Evaluation. 10(7). pp. 1-9.
6. Keith, T., Caemmerer, J. & Reynolds, M. (2016) Comparison of methods for factor extraction for cognitive test-like data: Which overfactor, which underfactor? Intelligence. 54. pp. 37-54.
7. Mislevy, R. (1986) Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics. 11(1). pp. 3-31.
8. Ihara, M. & Okamoto, M. (1985) Experimental comparison of least-squares and maximum likelihood method in factor analysis. Statistics & Probability Letters. 3. pp. 287-293.
9. Fabrigar, L., MacCallum, R., Strahan, E. & Wegener, D. (1999) Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods. 4(3). pp. 272-299.
10. MacCallum, R., Widaman, K., Zhang, Sh. & Hong, S. (1999) Sample size in factor analysis. Psychological Methods. 4(1). pp. 84-99.
11. Marsh, H., Balla, J. & McDonald, R. (1988) Goodness-of-Fit Indexes in Confirmatory Factor Analysis: The Effect of Sample Size. Psychological Bulletin. 103(3). pp. 391-410.
12. De Winter, J. & Dodou, D. (2016) Common Factor Analysis versus Principal Component Analysis: A Comparison of Loadings by Means of Simulations. Communications in Statistics: Simulation and Computation. 45(1). pp. 299-321.
13. Briggs, N. & MacCallum, R. (2003) Recovery of weak common factors by maximum likelihood and ordinary least squares estimation. Multivariate Behavioral Research. 38(1). pp. 25-56.
14. Nylund, K., Asparouhov, T. & Muthen, B. (2007) Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study. Structural Equation Modeling: A Multidisciplinary Journal. 14(4). pp. 535-569.
15. Coughlin, K. (2013) An Analysis of Factor Extraction Strategies: A Comparison of the Relative Strengths of Principal Axis, Ordinary Least Squares, and Maximum Likelihood in Research Contexts that Include both Categorical and Continuous Variables. Graduate Theses and Dissertations. [Online] Available from: http://scholarcommons.usf.edu/etd/4459.2013 (Accessed: 26th October 2022).