• Title, Summary, Keyword: Multicollinearity

ILL-CONDITIONING IN LINEAR REGRESSION MODELS AND ITS DIAGNOSTICS

  • Ghorbani, Hamid
    • The Pure and Applied Mathematics
    • /
    • v.27 no.2
    • /
    • pp.71-81
    • /
    • 2020
  • Multicollinearity is a common problem in linear regression models that arises when two or more regressors are highly correlated; it causes serious problems for the ordinary least squares estimates of the parameters as well as for model validation and interpretation. This paper first reviews the problem of multicollinearity, its effects on linear regression, and several important measures for detecting it, and then highlights the role of eigenvalues and eigenvectors in multicollinearity detection. Finally, a real data set is evaluated, and the fitted linear regression model is investigated with multicollinearity diagnostics.
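
The eigenvalue-based diagnostic this paper reviews can be sketched as follows. This is an illustrative example on synthetic data (the variable names and the 0.1 cutoff are my own, not the paper's): a near-zero eigenvalue of the regressors' correlation matrix signals a near-linear dependency, and the associated eigenvector shows which regressors participate in it.

```python
import numpy as np

# Synthetic regressors: x2 is nearly collinear with x1, x3 is independent.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # near-duplicate of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Correlation matrix of the regressors and its eigendecomposition
# (eigh returns eigenvalues in ascending order).
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)

# The smallest eigenvalue is close to zero; its eigenvector loads heavily
# (with opposite signs) on x1 and x2, the near-collinear pair.
print("eigenvalues:", np.round(eigvals, 4))
print("eigenvector of smallest eigenvalue:", np.round(eigvecs[:, 0], 3))
```

Here the eigenvector pattern, not just the eigenvalue magnitude, is what identifies the variables involved in the dependency.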

Multicollinearity and misleading statistical results

  • Kim, Jong Hae
    • Korean Journal of Anesthesiology
    • /
    • v.72 no.6
    • /
    • pp.558-569
    • /
    • 2019
  • Multicollinearity represents a high degree of linear intercorrelation between explanatory variables in a multiple regression model and leads to incorrect results of regression analyses. Diagnostic tools for multicollinearity include the variance inflation factor (VIF), the condition index and condition number, and the variance decomposition proportion (VDP). Multicollinearity can be expressed through the coefficient of determination (R_h²) of a multiple regression model with one explanatory variable (X_h) as the response variable and the others (X_i, i ≠ h) as its explanatory variables. The variance (σ_h²) of each regression coefficient in the final regression model is proportional to the VIF, defined as 1/(1 − R_h²). Hence, an increase in R_h² (strong multicollinearity) increases σ_h². A larger σ_h² produces unreliable probability values and confidence intervals for the regression coefficients. The square root of the ratio of the maximum eigenvalue of the correlation matrix of the standardized explanatory variables to each eigenvalue is referred to as the condition index; the condition number is the maximum condition index. Multicollinearity is present when the VIF exceeds 5 to 10 or the condition indices exceed 10 to 30. However, these thresholds cannot identify which explanatory variables are multicollinear. VDPs obtained from the eigenvectors can identify the multicollinear variables by showing the extent of the inflation of σ_h² for each condition index. When two or more VDPs that correspond to a common condition index higher than 10 to 30 are themselves higher than 0.8 to 0.9, their associated explanatory variables are multicollinear. Excluding multicollinear explanatory variables leads to statistically stable multiple regression models.
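
The two main diagnostics described above can be computed directly from their definitions. A minimal sketch, assuming synthetic data and illustrative variable names (not the paper's): VIF_h = 1/(1 − R_h²) from regressing each standardized variable on the others, and the condition indices from the eigenvalues of the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=n)   # strongly correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Standardize: the condition index is defined on the correlation matrix
# of the standardized explanatory variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

def vif(Z, h):
    """VIF_h = 1 / (1 - R_h^2), with R_h^2 from regressing Z_h on the rest."""
    others = np.delete(Z, h, axis=1)
    beta, *_ = np.linalg.lstsq(others, Z[:, h], rcond=None)
    resid = Z[:, h] - others @ beta
    r2 = 1.0 - resid.var() / Z[:, h].var()
    return 1.0 / (1.0 - r2)

vifs = [vif(Z, h) for h in range(Z.shape[1])]

# Condition indices: sqrt(lambda_max / lambda_i); condition number is the max.
eigvals = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))
cond_indices = np.sqrt(eigvals.max() / eigvals)

print("VIFs:", np.round(vifs, 2))                     # x1, x2 exceed the 5-10 rule
print("condition number:", round(cond_indices.max(), 2))
```

Note how the VIF flags the collinear pair (x1, x2) individually, while the condition number summarizes the whole design in a single value, consistent with the abstract's point that thresholds alone do not say which variables are involved.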

Effects of Multicollinearity in Logit Model (로짓모형에 있어서 다중공선성의 영향에 관한 연구)

  • Ryu, Si-Kyun
    • Journal of Korean Society of Transportation
    • /
    • v.26 no.1
    • /
    • pp.113-126
    • /
    • 2008
  • This research explores the effects of multicollinearity on the reliability and goodness of fit of the logit model. To investigate these effects on the multinomial logit model, numerical experiments were performed. Explanatory variables (attributes of the utility functions) with correlations ranging from ρ = 0.0 to ρ = 0.9 were generated, and the rho-squared values and t-statistics, the indices of goodness of fit and reliability of the logit model, were traced. From the well-designed numerical experiments, the following findings were validated: 1) when a new explanatory variable is added, some rho-squared values increase while others decrease; 2) higher correlations between generic variables worsen the goodness of fit of the logit model; 3) multicollinearity tends to produce over-evaluated parameters; 4) the reliability of the estimated parameters tends to decrease when the correlations between attributes are high. These results suggest that the existence of multicollinearity should be examined, and proper treatments applied to diminish it, when developing a logit model.

Multicollinearity in Logistic Regression

  • Lee, Jong-Han;Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods
    • /
    • v.2 no.2
    • /
    • pp.303-309
    • /
    • 1995
  • Many measures for detecting multicollinearity in linear regression have been proposed in the statistics and numerical analysis literature. Among them, the condition number and the variance inflation factor (VIF) are the most popular. In this study, we give new interpretations of the condition number and VIF in linear regression, using the geometry of the explanatory space. Along the same lines, we derive natural analogues of the condition number and VIF for logistic regression. These computer-intensive measures can easily be extended to evaluate multicollinearity in generalized linear models.

A Research on Improving the Evaluation Model for Management Innovative Enterprises (서비스 경영 혁신 기업 평가 모형의 개선 방안 연구)

  • Roh, Jae-Whak
    • International Commerce and Information Review
    • /
    • v.12 no.4
    • /
    • pp.279-302
    • /
    • 2010
  • A better selection model for management innovative enterprises is needed, since the Korean government provides multiple benefits to the selected enterprises. However, the validity of the current selection model is questionable because the assessment items are insufficiently considered. In particular, the two most important assessment items, strategy and performance, are suspected of multicollinearity because of their high correlation. Ignoring multicollinearity among these items leads to erroneous selection that double-counts the same components under different item names. Principal component analysis is applied to factor out uncorrelated items, and new estimations are carried out using the resulting principal components. A comparison between the results estimated with and without principal components shows that the present selection model weights the performance items more heavily than their real effect warrants, a consequence of multicollinearity between performance and strategy.
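
The core move in this paper, replacing correlated assessment items with uncorrelated principal components, can be sketched as follows. This is a hedged illustration with synthetic "strategy" and "performance" scores (not the paper's evaluation data): the component scores are uncorrelated by construction, so they can enter a regression without the double-counting the abstract describes.

```python
import numpy as np

# Two highly correlated assessment items (synthetic stand-ins).
rng = np.random.default_rng(2)
n = 300
strategy = rng.normal(size=n)
performance = 0.9 * strategy + rng.normal(scale=0.4, size=n)
X = np.column_stack([strategy, performance])

# PCA on the standardized items: eigendecomposition of the covariance
# (here equal to the correlation matrix, since columns are standardized).
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
components = Z @ eigvecs          # scores on the principal components

# The raw items are strongly correlated; the component scores are not.
raw_corr = np.corrcoef(Z, rowvar=False)[0, 1]
pc_corr = np.corrcoef(components, rowvar=False)[0, 1]
print("item correlation:", round(raw_corr, 3))
print("component correlation:", round(abs(pc_corr), 6))
```

Regressing on `components` instead of `X` removes the multicollinearity at the cost of interpreting combined components rather than the original items, which is the trade-off the paper weighs.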

The Use Ridge Regression for Yield Prediction Models with Multicollinearity Problems (수확예측(收穫豫測) Model의 Multicollinearity 문제점(問題點) 해결(解決)을 위(爲)한 Ridge Regression의 이용(利用))

  • Shin, Man Yong
    • Journal of Korean Society of Forest Science
    • /
    • v.79 no.3
    • /
    • pp.260-268
    • /
    • 1990
  • Two types of ridge regression estimators were compared with the ordinary least squares (OLS) estimator in order to select the "best" estimator when multicollinearity existed. The ridge estimators were based on Mallows's (1973) $C_P$-like statistic and Allen's (1974) PRESS-like statistic. The evaluation was conducted based on the predictive ability of a yield model developed by Matney et al. (1988). A total of 522 plots from the data of the Southwide Loblolly Pine Seed Source study were used. All of the ridge estimators had better predictive ability than the OLS estimator, and the ridge estimator based on Mallows's statistic performed best. Thus, ridge estimators can be recommended as an alternative when multicollinearity exists among independent variables.
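
The mechanism behind the paper's result can be shown in closed form. This is an illustrative sketch on synthetic data (not the yield model or data above, and with an arbitrary ridge constant k rather than the paper's $C_P$- or PRESS-based choices): adding k to the diagonal of X'X stabilizes the estimate in the nearly collinear direction.

```python
import numpy as np

# Nearly collinear regressors; the true model is y = x1 + x2 + noise.
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)

XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)                    # OLS estimate
k = 1.0                                                     # illustrative ridge constant
beta_ridge = np.linalg.solve(XtX + k * np.eye(2), X.T @ y)  # ridge estimate

# Near-collinearity leaves the individual OLS coefficients poorly determined
# (only their sum, about 2, is well estimated); ridge shrinks the unstable
# direction and returns coefficients near (1, 1).
print("OLS:  ", np.round(beta_ols, 2))
print("ridge:", np.round(beta_ridge, 2))
```

A data-driven choice of k, as in the Mallows- and PRESS-based estimators the paper compares, replaces the fixed `k = 1.0` used here.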

Exploring a Way to Overcome Multicollinearity Problems by Using Hierarchical Construct Model in Structural Equation Model (SEM에서 위계모형을 이용한 다중공선성 문제 극복방안 연구 : 소셜커머스의 재구매의도 영향요인을 중심으로)

  • Kwon, Sundong
    • Journal of Information Technology Applications and Management
    • /
    • v.22 no.2
    • /
    • pp.149-169
    • /
    • 2015
  • This study examined how to overcome multicollinearity problems in a structural equation model by creating a hierarchical construct model of repurchase intention in social commerce. Based on a literature review of social commerce, this study selected price, quality, service, and social influence as independent variables, and, as their detailed sub-variables, system quality, information quality, transaction safety, order fulfillment and after-sales service, communication, subjective norms, and reputation. In the empirical analysis of the hierarchical construct model, all the independent variables were found to have a significant impact on the repurchase intention of social commerce. Next, this study analyzed a competing model in which the eight variables price, system quality, information quality, transaction safety, order fulfillment and after-sales service, communication, subjective norms, and reputation directly influence repurchase intention. In this analysis, system quality, information quality, transaction safety, and communication appeared to be insignificant. This study shows that a hierarchical construct model is useful for overcoming the multicollinearity problem in structural equation models and for increasing explanatory power.

Optimal fractions in terms of a prediction-oriented measure

  • Lee, Won-Woo
    • Journal of the Korean Statistical Society
    • /
    • v.22 no.2
    • /
    • pp.209-217
    • /
    • 1993
  • The multicollinearity problem in a multiple linear regression model may have deleterious effects on predictions. Thus, it is desirable to consider the optimal fractions with respect to the unbiased estimate of the mean squared errors of the predicted values. Interestingly, the optimal fractions can also be illuminated by a Bayesian interpretation of the general James-Stein estimators.

Optimizing SVM Ensembles Using Genetic Algorithms in Bankruptcy Prediction

  • Kim, Myoung-Jong;Kim, Hong-Bae;Kang, Dae-Ki
    • Journal of information and communication convergence engineering
    • /
    • v.8 no.4
    • /
    • pp.370-376
    • /
    • 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. However, its performance can be degraded by the multicollinearity problem, in which the multiple classifiers of an ensemble are highly correlated with one another. This paper proposes genetic-algorithm-based optimization techniques for SVM ensembles to solve the multicollinearity problem. Empirical results on bankruptcy prediction for Korean firms indicate that the proposed optimization techniques can improve the performance of SVM ensembles.