• Title/Summary/Keyword: Partial least squares regression

Search Result 188, Processing Time 0.029 seconds

Shrinkage Structure of Ridge Partial Least Squares Regression

  • Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.2
    • /
    • pp.327-344
    • /
    • 2007
  • Ridge partial least squares regression (RPLS) is a regression method which can be obtained by combining ridge regression and partial least squares regression and is intended to provide better predictive ability and less sensitive to overfitting. In this paper, explicit expressions for the shrinkage factor of RPLS are developed. The structure of the shrinkage factor is explored and compared with those of other biased regression methods, such as ridge regression, principal component regression, ridge principal component regression, and partial least squares regression using a near infrared data set.

  • PDF

Unified Non-iterative Algorithm for Principal Component Regression, Partial Least Squares and Ordinary Least Squares

  • Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.2
    • /
    • pp.355-366
    • /
    • 2003
  • A unified procedure for principal component regression (PCR), partial least squares (PLS) and ordinary least squares (OLS) is proposed. The process gives solutions for PCR, PLS and OLS in a unified and non-iterative way. This enables us to see the interrelationships among the three regression coefficient vectors, and it is seen that the so-called E-matrix in the solution expression plays the key role in differentiating the methods. In addition to setting out the procedure, the paper also supplies a robust numerical algorithm for its implementation, which is used to show how the procedure performs on a real world data set.

  • PDF

Missing Values Estimation for Time Course Gene Expression Data Using the Sequential Partial Least Squares Regression Fitting (순차적 부분최소제곱 회귀적합에 의한 시간경로 유전자 발현 자료의 결측치 추정)

  • Kim, Kyung-Sook;Oh, Mi-Ra;Baek, Jang-Sun;Son, Young-Sook
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.2
    • /
    • pp.275-290
    • /
    • 2008
  • The size of microarray gene expression data is very big and its observation process is also very complex. Thus missing values are frequently occurred. In this paper we propose the sequential partial least squares(SPLS) regression fitting method to estimate missing values for time course gene expression data that has correlations among observations over time points. The SPLS method is to combine the sequential technique with the partial least squares(PLS) regression fitting method. The usefulness of method proposed is evaluated through some simulation study for three yeast time course data.

A modified partial least squares regression for the analysis of gene expression data with survival information

  • Lee, So-Yoon;Huh, Myung-Hoe;Park, Mira
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.5
    • /
    • pp.1151-1160
    • /
    • 2014
  • In DNA microarray studies, the number of genes far exceeds the number of samples and the gene expression measures are highly correlated. Partial least squares regression (PLSR) is one of the popular methods for dimensional reduction and known to be useful for the classifications of microarray data by several studies. In this study, we suggest a modified version of the partial least squares regression to analyze gene expression data with survival information. The method is designed as a new gene selection method using PLSR with an iterative procedure of imputing censored survival time. Mean square error of prediction criterion is used to determine the dimension of the model. To visualize the data, plot for variables superimposed with samples are used. The method is applied to two microarray data sets, both containing survival time. The results show that the proposed method works well for interpreting gene expression microarray data.

Combining Ridge Regression and Latent Variable Regression

  • Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.1
    • /
    • pp.51-61
    • /
    • 2007
  • Ridge regression (RR), principal component regression (PCR) and partial least squares regression (PLS) are among popular regression methods for collinear data. While RR adds a small quantity called ridge constant to the diagonal of X'X to stabilize the matrix inversion and regression coefficients, PCR and PLS use latent variables derived from original variables to circumvent the collinearity problem. One problem of PCR and PLS is that they are very sensitive to overfitting. A new regression method is presented by combining RR and PCR and PLS, respectively, in a unified manner. It is intended to provide better predictive ability and improved stability for regression models. A real-world data from NIR spectroscopy is used to investigate the performance of the newly developed regression method.

  • PDF

Development of Virtual Metrology Models in Semiconductor Manufacturing Using Genetic Algorithm and Kernel Partial Least Squares Regression (유전알고리즘과 커널 부분최소제곱회귀를 이용한 반도체 공정의 가상계측 모델 개발)

  • Kim, Bo-Keon;Yum, Bong-Jin
    • IE interfaces
    • /
    • v.23 no.3
    • /
    • pp.229-238
    • /
    • 2010
  • Virtual metrology (VM), a critical component of semiconductor manufacturing, is an efficient way of assessing the quality of wafers not actually measured. This is done based on a model between equipment sensor data (obtained for all wafers) and the quality characteristics of wafers actually measured. This paper considers principal component regression (PCR), partial least squares regression (PLSR), kernel PCR (KPCR), and kernel PLSR (KPLSR) as VM models. For each regression model, two cases are considered. One utilizes all explanatory variables in developing a model, and the other selects significant variables using the genetic algorithm (GA). The prediction performances of 8 regression models are compared for the short- and long-term etch process data. It is found among others that the GA-KPLSR model performs best for both types of data. Especially, its prediction ability is within the requirement for the short-term data implying that it can be used to implement VM for real etch processes.

A Study on the Several Robust Regression Estimators

  • Kim, Jee-Yun;Roh, Kyung-Mi;Hwang, Jin-Soo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.2
    • /
    • pp.307-316
    • /
    • 2004
  • Principal Component Regression(PCR) and Partial Least Squares Regression(PLSR) are the two most popular regression techniques in chemometrics. In the field of chemometrics usually the number of regressor variables greatly exceeds the number of observation. So we have to reduce the number of regressors to avoid the identifiability problem. In this paper we compare PCR and PLSR techniques combined with various robust regression methods including regression depth estimation. We compare the efficiency, goodness-of-fit and robustness of each estimators under several contamination schemes.

  • PDF

A Method for Screening Product Design Variables for Building A Usability Model : Genetic Algorithm Approach (사용편의성 모델수립을 위한 제품 설계 변수의 선별방법 : 유전자 알고리즘 접근방법)

  • Yang, Hui-Cheol;Han, Seong-Ho
    • Journal of the Ergonomics Society of Korea
    • /
    • v.20 no.1
    • /
    • pp.45-62
    • /
    • 2001
  • This study suggests a genetic algorithm-based partial least squares (GA-based PLS) method to select the design variables for building a usability model. The GA-based PLS uses a genetic algorithm to minimize the root-mean-squared error of a partial least square regression model. A multiple linear regression method is applied to build a usability model that contains the variables seleded by the GA-based PLS. The performance of the usability model turned out to be generally better than that of the previous usability models using other variable selection methods such as expert rating, principal component analysis, cluster analysis, and partial least squares. Furthermore, the model performance was drastically improved by supplementing the category type variables selected by the GA-based PLS in the usability model. It is recommended that the GA-based PLS be applied to the variable selection for developing a usability model.

  • PDF

Expressions for Shrinkage Factors of PLS Estimator

  • Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society
    • /
    • v.17 no.4
    • /
    • pp.1169-1180
    • /
    • 2006
  • Partial least squares regression (PLS) is a biased, non-least squares regression method and is an alternative to the ordinary least squares regression (OLS) when predictors are highly collinear or predictors outnumber observations. One way to understand the properties of biased regression methods is to know how the estimators shrink the OLS estimator. In this paper, we introduce an expression for the shrinkage factor of PLS and develop a new shrinkage expression, and then prove the equivalence of the two representations. We use two near-infrared (NIR) data sets to show general behavior of the shrinkage and in particular for what eigendirections PLS expands the OLS coefficients.

  • PDF

A new classification method using penalized partial least squares (벌점 부분최소자승법을 이용한 분류방법)

  • Kim, Yun-Dae;Jun, Chi-Hyuck;Lee, Hye-Seon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.22 no.5
    • /
    • pp.931-940
    • /
    • 2011
  • Classification is to generate a rule of classifying objects into several categories based on the learning sample. Good classification model should classify new objects with low misclassification error. Many types of classification methods have been developed including logistic regression, discriminant analysis and tree. This paper presents a new classification method using penalized partial least squares. Penalized partial least squares can make the model more robust and remedy multicollinearity problem. This paper compares the proposed method with logistic regression and PCA based discriminant analysis by some real and artificial data. It is concluded that the new method has better power as compared with other methods.