• Title/Summary/Keyword: Partial Least Square

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.575-587 / 2015
  • Advances in data collection techniques produce high dimensional data sets, in which discrimination is an important and commonly encountered problem. It is especially difficult when the data are heterogeneous, that is, when the classes do not share a common variance-covariance structure. An example is the classification of microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important for studying evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high dimensional heterogeneous data sets. It combines the recently introduced regularized elimination for variable selection in partial least squares (rePLS) with quadratic discriminant analysis (QDA), a classification procedure for heterogeneous data. The proposed and existing methods are compared on a simulated data set; in addition, the proposed procedure is applied to classify microbial habitat preferences by codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory, ranging from 89.1% to 100% on test data. Codon/bi-codon usages of interest, and the mutual interactions that influence the respective habitat preferences, are identified. The proposed method also produced results that concur with known biological characteristics, which will help researchers better understand the divergence of species.
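
For reference, the standard QDA discriminant score, which lets each class keep its own covariance matrix, can be written as below. The symbols $\mu_k$, $\Sigma_k$ and $\pi_k$ (class mean, class covariance and class prior) are textbook notation assumed here, not taken from the abstract, and the rePLS variable-selection step is not shown.

```latex
\delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
              - \tfrac{1}{2}(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k)
              + \log\pi_k,
\qquad
\hat{y}(x) = \arg\max_k \, \delta_k(x)
```

If every class shares one covariance matrix ($\Sigma_k = \Sigma$), the terms that are quadratic in $x$ cancel between classes and the rule reduces to the linear, homogeneous case.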

Anthocyanins in 'Cabernet Gernischet' (Vitis vinifera L. cv.) Aged Red Wine and Their Color in Aqueous Solution Analyzed by Partial Least Square Regression

  • Han, Fu-Liang;Jiang, Shou-Mei;He, Jian-Jun;Pan, Qiu-Hong;Duan, Chang-Qing;Zhang, Ming-Xia
    • Food Science and Biotechnology / v.18 no.3 / pp.724-731 / 2009
  • Anthocyanins are considered one of the main color determinants in aged red wine. The anthocyanins in aged red wine made from 'Cabernet Gernischet' (Vitis vinifera L. cv.) grapes were investigated by high performance liquid chromatography-electrospray ionization-mass spectrometry (HPLC-ESI-MS), and their color in aqueous solution was evaluated using partial least squares (PLS) regression. A total of 37 anthocyanins were identified in this wine, including 22 pyranoanthocyanins. The PLS analysis indicated that different anthocyanins had distinct color values: malvidin 3-O-(6-O-acetyl)-glucoside-4-vinylguaiacol (Mv3-acet-glu-vg) had the highest color value, while malvidin 3-O-glucoside (Mv3-glu) had the lowest. Among the free non-acylated anthocyanins, peonidin 3-O-glucoside (Pn3-glu) had the highest color value. The coumaroylated anthocyanins had higher color values than their corresponding acetylated anthocyanins and parent anthocyanins, and the pyranoanthocyanins also had higher color values than their original anthocyanins. The color of the anthocyanins therefore depended on their structure. This work will help to reveal the evolution of aged red wine.

Non-linear Data Classification Using Partial Least Square and Residual Compensator (부분 최소 자승법과 잔차 보상기를 이용한 비선형 데이터 분류)

  • 김경훈;김태영;최원호
    • Journal of Institute of Control, Robotics and Systems / v.10 no.2 / pp.185-191 / 2004
  • Partial least squares (PLS) is a multivariate statistical process method that has been developed into various algorithms combining the characteristics of principal component analysis, dimensionality reduction, and analysis of the relationship between input and output variables. However, it has been somewhat limited by its dependence on linear mathematics. An algorithm is proposed that classifies non-linear data using PLS together with a residual compensator (RC) based on a radial basis function network (RBFN). The RBFN-based RC compensates for the error that PLS makes on the non-linear data. Experimental results are given to verify its efficiency compared with previous work.
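
To make the linear part of this description concrete, here is a minimal sketch of a single-component PLS model for one response (often called PLS1), written in plain Python. It is not the authors' algorithm: the function name `pls1_one_component` is hypothetical, only one latent component is extracted, and the RBFN-based residual compensator is not implemented.

```python
from math import sqrt


def pls1_one_component(X, y):
    """Fit a one-component PLS1 model and return a predict(x_row) function.

    X is a list of rows (lists of floats); y is a list of floats.
    Illustrative sketch only: a single latent component, no residual compensator.
    """
    n, p = len(X), len(X[0])

    # Center the inputs and the response.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: the direction in X-space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("X and y have zero covariance; no PLS direction exists")
    w = [v / norm for v in w]

    # Scores (the latent variable) and the regression of y on those scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)

    def predict(x_row):
        score = sum((x_row[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict
```

The residuals left by such a linear fit (observed minus predicted) are the quantity that a residual compensator of the kind described in the abstract would then try to model.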

Discrimination of Alismatis Rhizoma According to Geographical Origins using Near Infrared Spectroscopy (근적외선분광법을 이용한 택사의 산지 판별법 연구)

  • Lee, Dong Young;Kim, Seung Hyun;Kim, Hyo Jin;Sung, Sang Hyun
    • Korean Journal of Pharmacognosy / v.44 no.4 / pp.344-349 / 2013
  • Near infrared spectroscopy (NIRS) combined with multivariate analysis was used to discriminate the geographical origin of Alisma orientale from Korea (n=94) and China (n=72). Two-thirds of the samples were selected randomly for the training set, and the remaining one-third formed the test set. The NIR spectra were pretreated with a second-derivative transform. Partial least squares discriminant analysis (PLS-DA) models correctly discriminated 100% of the Korean and Chinese A. orientale samples. These results demonstrate the potential of NIR spectroscopy combined with multivariate analysis as a rapid and accurate method for discriminating A. orientale according to its geographical origin.
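
The random two-thirds/one-third partition described above could be sketched as follows. The abstract does not say whether the split was made within each origin, so the per-class (stratified) split, the function name `stratified_split` and the fixed seed are all assumptions made for illustration.

```python
import random


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Randomly split sample indices into training and test sets, class by class."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return sorted(train), sorted(test)


# Class sizes taken from the abstract: Korea n=94, China n=72.
labels = ["Korea"] * 94 + ["China"] * 72
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class keeps the Korea/China proportions roughly equal in both sets, which matters when the class sizes differ.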

Nondestructive Quantification of Intact Ambroxol Tablet using Near-infrared Spectroscopy (근적외분광분석법을 사용한 암브록솔 정제의 비파괴적 정량분석)

  • 임현량;우영아;김도형;김효진;강신정;최현철;최한곤
    • YAKHAK HOEJI / v.48 no.1 / pp.60-64 / 2004
  • Near-infrared (NIR) spectroscopy was used for the rapid, nondestructive determination of ambroxol content in intact ambroxol tablets containing 30 mg (12.5% m/m nominal concentration), with NIR spectra collected over the range 1100∼1750 nm. The laboratory-made samples had nominal ambroxol concentrations of 10.3∼15.9% m/m. The measurements were made in reflection mode using a fiber-optic probe, and calibration was carried out by partial least squares regression (PLSR) with autoscaling. The model was validated by randomly splitting the data set into a calibration set (7 samples) and a validation set (5 samples). The developed NIR method gave results comparable to the known values of tablets from a laboratory manufacturing process, with a standard error of calibration (SEC) and a standard error of prediction (SEP) of 0.49% and 0.49% m/m, respectively. The method showed good accuracy and repeatability. NIR spectroscopic determination in intact tablets offers potential for real-time monitoring of a running production process.
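
As a rough illustration of two terms used above, the sketch below shows autoscaling (each spectral variable is mean-centered and scaled to unit standard deviation) and a root-mean-square error of the kind reported as SEC or SEP. Definitions of SEC and SEP differ between texts in their degrees-of-freedom and bias corrections; the plain root-mean-square form used here is an assumption, and both function names are hypothetical.

```python
from math import sqrt


def autoscale(X):
    """Mean-center each column of X and scale it to unit sample standard deviation."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [
        sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
        for j in range(p)
    ]
    if any(s == 0 for s in stds):
        raise ValueError("a constant column cannot be autoscaled")
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]


def rms_error(reference, predicted):
    """Root-mean-square difference between reference and predicted values.

    Computed on the calibration set this plays the role of SEC, and on the
    validation set the role of SEP (assumed convention: no bias correction).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return sqrt(sum(squared) / len(squared))
```

In practice the column means and standard deviations would be estimated on the calibration samples only and then reused to scale the validation samples.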

Quantitative Analysis of Indomethacin by the Portable Near-Infrared (NIR) System (근적외분광분석법을 이용한 인도메타신의 정량분석)

  • 김도형;우영아;김효진
    • YAKHAK HOEJI / v.47 no.5 / pp.261-265 / 2003
  • A near-infrared (NIR) system was used for the rapid and simple determination of indomethacin in buffer solution for dissolution tests of tablets and capsules. Indomethacin standards ranging from 10 to 50 ppm were prepared using a mixture of phosphate buffer (pH 7.2) and water (1 : 4). The NIR transmittance spectra of the indomethacin standard solutions were collected using quartz cells with 1 mm and 2 mm pathlengths. Partial least squares regression (PLSR) was used to develop calibration models over the spectral range 1100∼1700 nm. The model using the 1 mm quartz cell was better than that using the 2 mm quartz cell. The PLSR models gave a standard error of prediction (SEP) of 0.858 ppm. To validate the developed calibration model, routine analysis was performed using another set of standard solutions. The NIR routine analysis showed good correlation with the actual values. The SEP was 1.414 ppm for 7 indomethacin samples in routine analysis, an error that is permissible under the regulations of the Korean Pharmacopoeia (VII). These results show the potential of the method for real-time monitoring of indomethacin during a dissolution test.

AI Technology Analysis using Partial Least Square Regression

  • Choi, JunHyeog;Jun, Sunghae
    • Journal of the Korea Society of Computer and Information / v.25 no.3 / pp.109-115 / 2020
  • In this paper, we propose an artificial intelligence (AI) technology analysis using a partial least squares (PLS) regression model. AI technology now affects most areas of society, so it is necessary to understand it. To analyze AI technology, we collect AI-related patent documents from patent databases around the world and extract AI technology keywords from them using text mining techniques. We then analyze the AI keyword data with a PLS regression model. This regression model is based on the partial least squares technique used in advanced analyses in fields such as bioinformatics, social science, and engineering. To show the performance of the proposed method, we conduct experiments using AI patent documents and illustrate how our research can be applied to real problems. The approach is applicable not only to AI technology but also to other technological fields, and PLS regression analysis can likewise contribute to understanding various other technologies.

Utilization of R Program for the Partial Least Square Model: Comparison of SmartPLS and R (부분최소제곱모형을 위한 R 프로그램의 활용: SmartPLS와 R의 비교)

  • Kim, Yong-Tae;Lee, Sang-Jun
    • Journal of Digital Convergence / v.13 no.12 / pp.117-124 / 2015
  • As Big Data has increased the acceptance of statistical analysis, the need for advanced second-generation statistical methods such as the structural equation model (SEM) is also increasing. This study suggests how R, as open software, can be used when the partial least squares model, one type of SEM, is applied to statistical analysis. R is free software and part of the GNU Project, as well as a powerful and useful tool for statistical analysis, including Big Data. The study used R and SmartPLS, a representative statistical package for PLS-SEM, to analyze the internal consistency reliability, convergent validity, and discriminant validity of the measurement model. It also analyzed the path coefficients and moderator effects of the structural model and compared the results from the two tools. R produced the same results as SmartPLS for both the measurement model and the structural model. The study therefore confirmed that R could be a powerful alternative to a commercial statistical package in the future.

Development of On-line Sorting System for Detection of Infected Seed Potatoes Using Visible Near-Infrared Transmittance Spectral Technique (가시광 및 근적외선 투과분광법을 이용한 감염 씨감자 온라인 선별시스템 개발)

  • Kim, Dae Yong;Mo, Changyeun;Kang, Jun-Soon;Cho, Byoung-Kwan
    • Journal of the Korean Society for Nondestructive Testing / v.35 no.1 / pp.1-11 / 2015
  • In this study, an online seed potato sorting system using a visible and near infrared (40∼1100 nm) transmittance spectral technique and a statistical model was evaluated for the nondestructive discrimination of infected and sound seed potatoes. Seed potatoes that had been artificially infected with Pectobacterium atrosepticum, which is known to cause a soil-borne disease, were prepared for the experiments. After transmittance spectra were acquired from sound and infected seed potatoes, an algorithm for detecting infected seed potatoes was developed using partial least squares discriminant analysis. The coefficient of determination ($R^2_p$) of the prediction model was 0.943, and the classification accuracy was above 99% (n = 80) for discriminating diseased seed potatoes from sound ones. This online sorting system has good potential as a basis for techniques that detect agricultural products infected or contaminated by pathogens.

A Study on Antecedents of Online Trust in the Context of e-Government Services (전자정부 서비스 사용에 있어 온라인 신뢰에 관한 연구)

  • Moon, Chul-Woo;Kim, Jae-Hyoun
    • Journal of Internet Computing and Services / v.12 no.3 / pp.57-67 / 2011
  • Trust is generally assumed to be an important precondition for people's adoption of e-government services. This study analyzes the direct and indirect impact of information privacy, interactivity, subjective norms and word-of-mouth on perceived trust in e-government services and trust toward government. Partial least squares (PLS) was applied to citizen survey data for hypothesis testing. PLS permits the simultaneous testing of cause-effect hypotheses while also allowing evaluation of the measurement model. The statistical results indicate that interactivity, subjective norms and word-of-mouth positively affect perceived trust in e-government services, which in turn affects the level of political efficacy and trust toward the government. Interactivity was also found to affect word-of-mouth. However, information privacy has no significant effect on trust in e-government services.