• Title/Summary/Keyword: variable selection

Variable Selection Based on Direction Vectors

  • Kyungmee Choi
    • Communications for Statistical Applications and Methods / v.5 no.1 / pp.25-33 / 1998
  • We review a multivariate version of Kendall's tau based on the direction vectors of observations. Using this statistic, we propose an analog of the forward variable selection method that selects a set of independent variables for further study in building the eventual prediction model. The method assumes neither a distribution for the observations nor a linear model, and it is robust to outliers while retaining high asymptotic efficiency relative to the parametric Pearson correlation coefficient.

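As a rough illustration of rank-based screening, the sketch below computes the classical bivariate Kendall's tau (tau-a) and ranks candidate variables by its absolute value against the response. It is not the multivariate direction-vector statistic of the paper, whose definition is not given in the abstract; the names `kendall_tau`, `rank_by_tau`, and the toy data are assumptions made for illustration.

```python
from itertools import combinations

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def kendall_tau(x, y):
    """Classical tau-a: average product of pairwise difference signs."""
    pairs = list(combinations(range(len(x)), 2))
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    return total / len(pairs)

def rank_by_tau(candidates, response):
    """Order candidate names by |tau| with the response, strongest first."""
    scores = {name: abs(kendall_tau(col, response))
              for name, col in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: x1 is monotone in the response, x2 is only weakly related.
response = [1.0, 2.1, 2.9, 4.2, 5.0]
candidates = {
    "x1": [10, 20, 30, 40, 50],
    "x2": [3, 1, 4, 1, 5],
}
ranking = rank_by_tau(candidates, response)
print(ranking)
```

Because tau depends only on the signs of pairwise differences, a single extreme observation changes it very little, which is the kind of outlier robustness the abstract refers to.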

Input Variable Selection by Principal Component Analysis and Mutual Information Estimation (주요성분분석과 상호정보 추정에 의한 입력변수선택)

  • Jo, Yong-Hyeon;Hong, Seong-Jun
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.11a / pp.175-178 / 2006
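  • This paper proposes an input variable selection method that combines principal component analysis (PCA) with mutual information estimation. PCA uses second-order statistics to find independence among the input variables, while the mutual information estimate uses adaptive partitioning to compute the probability density function of the input variables and thereby measure the dependence between variables more accurately. In experiments on artificially generated signals (6 independent signals and 1 dependent signal, each with 500 samples), the proposed method selected variables quickly and accurately.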


Geometrical description based on forward selection & backward elimination methods for regression models (다중회귀모형에서 전진선택과 후진제거의 기하학적 표현)

  • Hong, Chong-Sun;Kim, Moung-Jin
    • Journal of the Korean Data and Information Science Society / v.21 no.5 / pp.901-908 / 2010
  • A geometrical description is proposed to represent the process of forward selection and backward elimination, two of the many variable selection methods for multiple regression models. The graphical method shows forward selection in the first quadrant and backward elimination in the second quadrant of a half circle with unit radius. At each step, the SSR is represented by the norm of a vector, and the extra SSR or partial coefficient of determination is represented by the angle between two vectors. Lines are dotted when the partial F test is statistically significant, so that the statistical analysis can be read from the plot. The final regression models from forward selection and backward elimination can be obtained from this description, and the goodness of fit of the model can also be explored.
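The forward-selection step that the figure depicts can be sketched in vector terms: at each step, pick the candidate whose component orthogonal to the already chosen variables explains the most remaining variation, and record that gain as the extra SSR. This is a minimal sketch under stated assumptions, not the authors' plotting method; `forward_select`, the fixed step count `k`, and the toy data are hypothetical, and no partial F test is applied.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def center(v):
    """Subtract the mean so the intercept is handled implicitly."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def forward_select(columns, y, k):
    """Greedy forward selection; returns [(name, extra_SSR), ...]."""
    resid = center(y)
    cols = {name: center(v) for name, v in columns.items()}
    chosen = []
    for _ in range(k):
        # Extra SSR from adding each remaining (already orthogonalized) column.
        gains = {name: dot(resid, v) ** 2 / dot(v, v)
                 for name, v in cols.items() if dot(v, v) > 1e-12}
        if not gains:
            break
        best = max(gains, key=gains.get)
        chosen.append((best, gains[best]))
        q = cols.pop(best)
        qq = dot(q, q)
        # Remove the chosen direction from the residual and the other columns.
        coef = dot(resid, q) / qq
        resid = [r - coef * a for r, a in zip(resid, q)]
        for name, v in cols.items():
            c = dot(v, q) / qq
            cols[name] = [b - c * a for b, a in zip(v, q)]
    return chosen

# Hypothetical data: y depends on x1 and x2 only; x3 is unrelated.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, -1, 1, -1, 1, -1]
x3 = [0, 1, 0, 0, 1, 0]
y = [2 * a + 0.5 * b for a, b in zip(x1, x2)]
steps = forward_select({"x1": x1, "x2": x2, "x3": x3}, y, k=2)
sst = dot(center(y), center(y))
print(steps, sst)
```

In this toy case the two extra SSR values add up to the total sum of squares, since `y` lies exactly in the span of `x1` and `x2`.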

Spectral Analysis Accompanied with Seasonal Linear Model as Applied to Intra-Day Call Prediction (스펙트럼 분석과 계절성 선형 모델을 이용한 Intra-Day 콜센터 통화량예측)

  • Shin, Taek-Soo;Kim, Myung-Suk
    • The Korean Journal of Applied Statistics / v.24 no.2 / pp.217-225 / 2011
  • In this paper, a seasonal variable selection method that combines spectral analysis with a seasonal linear model is suggested. The method is applied to the prediction of intra-day call arrivals at a large North American commercial bank call center, and a significant intra-month seasonal variable is detected. This newly detected seasonal factor is included in the seasonal linear model, which is then compared with seasonal linear models lacking the variable to see whether it improves forecasting performance. The seasonal linear model with the new variable outperformed the models without it in one-day-ahead forecasting.

A study on the variable structure control method including robot operational condition (로보트 운용조건을 포함한 가변구조 제어방식에 관한 연구)

  • 이홍규;이범희;최계근
    • Institute of Control, Robotics and Systems (제어로봇시스템학회): Conference Proceedings / 1988.10a / pp.72-75 / 1988
  • Because set-point regulation by the variable structure control method concerns only the initial and final locations of a manipulator, many constraints may arise when it is applied to path tracking with obstacle avoidance. The variable structure parameters should be selected in the trajectory planning step so that the constraints on travel time and path deviation are satisfied. This paper presents an algorithm for selecting the variable structure parameters under the constraints of the system dynamics, the travel time, and the path deviation. The study unifies trajectory planning and tracking control using the variable structure control method.


Multi-objective Genetic Algorithm for Variable Selection in Linear Regression Model and Application (선형회귀모델의 변수선택을 위한 다중목적 유전 알고리즘과 응용)

  • Kim, Dong-Il;Park, Cheong-Sool;Baek, Jun-Geol;Kim, Sung-Shick
    • Journal of the Korea Society for Simulation / v.18 no.4 / pp.137-148 / 2009
  • The purpose of this study is to implement a variable selection algorithm that helps construct a reliable linear regression model. If all candidate variables are used to construct a linear regression model, the significance of the model decreases and the 'curse of dimensionality' arises. Moreover, if the number of observations is smaller than the number of variables (the dimension), the regression model cannot be constructed. Because of these problems, we treat variable selection as a combinatorial optimization problem and apply a genetic algorithm (GA) to it. Typical measures of statistical significance are $R^2$, the F-value of the regression model, the t-values of the regression coefficients, and the standard error of the estimates. We design the GA to optimize multiple objective functions, because the statistical significance of a model cannot be judged by a single measure. We perform experiments on simulated data designed to cover a variety of situations. The algorithm shows better performance than LARS (Least Angle Regression), an algorithm for solving variable selection problems. We also modify the algorithm to solve the portfolio selection problem, which constructs a portfolio by selecting stocks, and conclude that the algorithm is able to solve real problems.

A study on variable selection and classification in dynamic analysis data for ransomware detection (랜섬웨어 탐지를 위한 동적 분석 자료에서의 변수 선택 및 분류에 관한 연구)

  • Lee, Seunghwan;Hwang, Jinsoo
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.497-505 / 2018
  • Ransomware attacks on computer systems are very common all over the world. As antivirus and detection methods are constantly improved to detect and mitigate ransomware, the ransomware itself becomes correspondingly better at avoiding detection. Several new methods have been implemented and tested to optimize protection against ransomware. In our work, 582 ransomware samples and 942 normalware samples, together with 30,967 dynamic action sequence variables, are used to detect ransomware efficiently. Several variable selection techniques are combined with various machine learning based classification techniques to protect systems from ransomware. Among the combinations tried, chi-square variable selection with a random forest gives the best detection rate and accuracy.
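The chi-square screening step named in the abstract can be sketched for binary (0/1) variables as follows. This is a minimal illustration, assuming each variable and the class label are coded 0/1; `chi_square_2x2`, `select_top_k`, and the toy data are hypothetical, and the random forest classifier that the paper applies afterwards is not shown.

```python
def chi_square_2x2(feature, label):
    """Pearson chi-square statistic for two binary (0/1) sequences."""
    n = len(feature)
    obs = [[0, 0], [0, 0]]  # obs[f][l] counts feature value f with label l
    for f, l in zip(feature, label):
        obs[f][l] += 1
    row = [sum(obs[0]), sum(obs[1])]
    col = [obs[0][0] + obs[1][0], obs[0][1] + obs[1][1]]
    stat = 0.0
    for f in (0, 1):
        for l in (0, 1):
            expected = row[f] * col[l] / n
            if expected > 0:  # skip empty margins to avoid division by zero
                stat += (obs[f][l] - expected) ** 2 / expected
    return stat

def select_top_k(features, label, k):
    """Keep the k variable names with the largest chi-square statistic."""
    scores = {name: chi_square_2x2(col, label) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: 1 = ransomware, 0 = normalware.
label = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "a": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the label exactly
    "b": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the label
    "c": [1, 1, 1, 0, 0, 0, 0, 1],  # partly related
}
selected = select_top_k(features, label, k=2)
print(selected)
```

With tens of thousands of candidate variables, a univariate filter like this is cheap because each variable is scored independently of the others.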

Input Variable Selection by Using Fixed-Point ICA and Adaptive Partition Mutual Information Estimation (고정점 알고리즘의 독립성분분석과 적응분할의 상호정보 추정에 의한 입력변수선택)

  • Cho, Yong-Hyun
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.5 / pp.525-530 / 2006
  • This paper presents an efficient input variable selection method that uses both fixed-point independent component analysis (FP-ICA) and adaptive partition mutual information (AP-MI) estimation. FP-ICA, which is based on the secant method, is applied to quickly find the independence between input variables. AP-MI estimation is applied to estimate the dependence accurately by equally partitioning the samples of each input variable when calculating the probability density function (PDF). The proposed method has been applied to two input variable selection problems: 7 artificial signals of 500 samples and 24 environmental pollution signals of 55 samples. The experimental results show that the proposed method selects variables quickly and accurately. It also performs better than both AP-MI estimation without FP-ICA and regular partition MI estimation.
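For orientation, the regular partition baseline mentioned at the end of the abstract can be sketched as a histogram estimate of mutual information on an equal-width grid. This is not the adaptive partition estimator of the paper, whose details the abstract does not give; `mutual_information`, the bin count, and the toy signals are assumptions for illustration.

```python
import math

def bin_index(v, lo, hi, bins):
    """Map v in [lo, hi] to an equal-width bin index in 0..bins-1."""
    if hi == lo:
        return 0
    return min(int((v - lo) / (hi - lo) * bins), bins - 1)

def mutual_information(x, y, bins=4):
    """Histogram (regular partition) estimate of I(X;Y) in nats."""
    n = len(x)
    xlo, xhi, ylo, yhi = min(x), max(x), min(y), max(y)
    joint = {}
    for a, b in zip(x, y):
        key = (bin_index(a, xlo, xhi, bins), bin_index(b, ylo, yhi, bins))
        joint[key] = joint.get(key, 0) + 1
    # Marginal counts obtained by summing the joint counts.
    px, py = {}, {}
    for (i, j), c in joint.items():
        px[i] = px.get(i, 0) + c
        py[j] = py.get(j, 0) + c
    # Sum p(i,j) * log(p(i,j) / (p(i) p(j))) over the occupied cells.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

# Hypothetical signals: y_dep is a function of x, y_ind is unrelated to x's bin.
x = list(range(8))
y_dep = [2 * v for v in x]
y_ind = [0, 7, 0, 7, 0, 7, 0, 7]
mi_dep = mutual_information(x, y_dep)
mi_ind = mutual_information(x, y_ind)
print(mi_dep, mi_ind)
```

A fixed grid can leave many cells empty or overfilled when the data are unevenly spread, which is the usual motivation for adapting the partition to the samples.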

Variable selection for latent class analysis using clustering efficiency (잠재변수 모형에서의 군집효율을 이용한 변수선택)

  • Kim, Seongkyung;Seo, Byungtae
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.721-732 / 2018
  • Latent class analysis (LCA) is an important tool for exploring unseen latent groups in multivariate categorical data. In practice, it is important to select a suitable set of variables, because including too many variables makes the model complicated and reduces the accuracy of the parameter estimates. Dean and Raftery (Annals of the Institute of Statistical Mathematics, 62, 11-35, 2010) proposed a headlong search algorithm based on Bayesian information criterion values to choose meaningful variables for LCA. In this paper, we propose a new statistic that measures the adequacy of LCA using the posterior probabilities obtained from each fitted model, and we develop a variable selection procedure based on it. The effectiveness of the proposed method is demonstrated through numerical studies.

Guided Selection of Human Antibody Light Chains against TAG-72 Using a Phage Display Chain Shuffling Approach

  • Kim, Sang-Jick;Hong, Hyo-Jeong
    • Journal of Microbiology / v.45 no.6 / pp.572-577 / 2007
  • To enhance the therapeutic potential of murine monoclonal antibodies, humanization by CDR grafting is usually used to reduce immunogenic mouse residues. Most humanized antibodies still retain mouse residues that are critical for antigen binding, and these residues may evoke immune responses in humans. Previously, we constructed a new humanized version (AKA) of the mouse CC49 antibody, which is specific for the tumor-associated glycoprotein TAG-72. In this study, a guided selection strategy using phage display was used to select a completely human antibody light chain against TAG-72. The heavy chain variable region (VH) of AKA was used to guide the selection of a human TAG-72-specific light chain variable region (VL) from a human VL repertoire constructed from human PBL. Most of the selected VLs were found to originate from members of the human germline VK1 family, whereas the VL of AKA is more homologous to the VK4 family. A competition binding assay of the selected Fabs with mouse CC49 suggested that the epitopes of the Fabs overlap with that of CC49. In addition, the Fabs showed better antigen-binding affinity than the parental AKA. The selected human VLs may be used to guide the selection of human VHs to obtain a completely human anti-TAG-72 antibody.