• Title, Summary, Keyword: Partial least squares discriminant analysis

Partial Least Squares-discriminant Analysis for the Prediction of Hemodynamic Changes Using Near Infrared Spectroscopy

  • Seo, Youngwook; Lee, Seungduk; Koh, Dalkwon; Kim, Beop-Min
    • Journal of the Optical Society of Korea / v.16 no.1 / pp.57-62 / 2012
  • Using continuous-wave near-infrared spectroscopy, we measured time-resolved concentration changes of oxy-hemoglobin and deoxy-hemoglobin in the primary motor cortex in response to finger-tapping tasks. These data were processed using partial least squares-discriminant analysis (PLS-DA) to develop a prediction model for a brain-computer interface. The task protocol consisted of repeated blocks of 15 s of finger tapping followed by 45 s of relaxation. The location of the motor cortex was confirmed by the anti-phasic behavior of the oxy- and deoxy-hemoglobin changes. The results were compared with those obtained using the hidden Markov model (HMM), which has been reported to produce the best prediction models. Our data suggest that PLS-DA is better than HMM at determining the onset of events.
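
The PLS-DA approach named above can be sketched for the two-class case (task versus rest). This is a minimal NumPy sketch under stated assumptions. Class labels are coded 0/1, a PLS1 regression is fitted with the NIPALS algorithm, and the predicted response is thresholded at 0.5. The helper names `pls1_fit` and `plsda_predict`, the synthetic "oxy/deoxy" features, and the threshold are illustrative assumptions, not the authors' pipeline. Onset detection over time is not shown.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with the NIPALS algorithm on mean-centred data."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)          # unit-norm weight vector
        t = Xc @ w                      # score vector
        tt = t @ t
        p = Xc.T @ t / tt               # X loading
        c = yc @ t / tt                 # y loading
        Xc = Xc - np.outer(t, p)        # deflate X
        yc = yc - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # regression vector for centred X
    return {"x_mean": x_mean, "y_mean": y_mean, "B": B}

def plsda_predict(model, X, threshold=0.5):
    """Assign class 1 when the predicted response exceeds the threshold."""
    y_hat = (X - model["x_mean"]) @ model["B"] + model["y_mean"]
    return (y_hat > threshold).astype(int)

# Synthetic example: 0 = rest, 1 = finger tapping (hypothetical data)
rng = np.random.default_rng(0)
n = 100
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 4))
X[:, 0] += 3.0 * labels      # an "oxy-Hb" feature rises during the task
X[:, 1] -= 2.0 * labels      # a "deoxy-Hb" feature falls (anti-phasic)
model = pls1_fit(X, labels.astype(float), n_components=2)
pred = plsda_predict(model, X)
accuracy = (pred == labels).mean()
print("training accuracy:", accuracy)
```

With as many components as X has independent columns, this PLS1 fit reduces to ordinary least squares on the centred data. Using fewer components is what regularizes the model.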

Discrimination Model of Cultivation Area of Alismatis Rhizoma using a GC-MS-Based Metabolomics Approach (Discrimination Model of the Cultivation Area of Taeksa Using a GC-MS-Based Metabolomics Technique)

  • Leem, Jae-Yoon
    • YAKHAK HOEJI / v.60 no.1 / pp.29-35 / 2016
  • Traditional Korean medicines could be managed more scientifically through the development of logical criteria for verifying their cultivation region, which would contribute to the advancement of the traditional herbal medicine industry. Volatile compounds were obtained from 14 samples of domestic Taeksa and 30 samples of Chinese Taeksa by steam distillation. Metabolites in the gas chromatography/mass spectrometry (GC/MS) data of 35 training samples were identified using the NIST mass spectral library. Multivariate statistical analyses, namely Principal Component Analysis (PCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), were performed on the qualitative and quantitative data. Finally, trans-(2,3-diphenylcyclopropyl)methyl phenyl sulfoxide (47.265 min), 1,2,3,4-tetrahydro-1-phenyl-naphthalene (47.781 min), spiro[4-oxatricyclo[5.3.0.0.(2,6)]decan-3-one-5,2'-cyclohexane] (54.62 min), 6-[7-nitrobenzofurazan-4-yl]amino-morphinan-4,5-epoxy (54.86 min), and p-hydroxynorephedrine (55.14 min) were selected as marker metabolites for verifying the origin of Taeksa. The resulting statistical model was able to determine the origin of Taeksa. The cultivation areas of the test samples (3 domestic and 6 Chinese Taeksa) were predicted with the established OPLS-DA model, and all 9 samples were classified correctly.
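
OPLS-DA, the model used above to classify the test samples, separates variation in X that is correlated with the class coding from variation orthogonal to it. The sketch below covers only the single-component orthogonal filtering step. It assumes the commonly published NIPALS-style OPLS formulation for a single response and is not necessarily the implementation used in the study. The helper `opls_filter` and the synthetic data, including a large class-unrelated "batch effect", are hypothetical. Fitting a one-component PLS-DA to the filtered matrix would then give the predictive part of the model.

```python
import numpy as np

def opls_filter(X, y):
    """Remove one y-orthogonal component from X (single-component OPLS filter).

    X: (n_samples, n_features); y: (n_samples,) class coding such as 0/1.
    Returns the filtered, centred X plus the orthogonal score and loading.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)                 # predictive weight
    t = Xc @ w                             # predictive score
    p = Xc.T @ t / (t @ t)                 # loading of the predictive score
    w_o = p - (w @ p) * w                  # part of p orthogonal to w
    w_o /= np.linalg.norm(w_o)             # orthogonal weight
    t_o = Xc @ w_o                         # orthogonal score, uncorrelated with y
    p_o = Xc.T @ t_o / (t_o @ t_o)         # orthogonal loading
    X_filtered = Xc - np.outer(t_o, p_o)   # strip the y-orthogonal variation
    return X_filtered, t_o, p_o

# Synthetic example: two origins plus a nuisance effect unrelated to origin
rng = np.random.default_rng(1)
labels = np.repeat([0.0, 1.0], 20)
nuisance = rng.normal(scale=3.0, size=40)
X = rng.normal(size=(40, 6))
X[:, 0] += 2.0 * labels                    # class-related variable
X[:, 1:3] += nuisance[:, None]             # class-unrelated variation
X_f, t_o, p_o = opls_filter(X, labels)
```

Because the orthogonal weight is built to be perpendicular to Xᵀy, the removed score carries no information about the class. The filtered matrix therefore keeps the same covariance with y as the original.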

A new classification method using penalized partial least squares (Classification Method Using Penalized Partial Least Squares)

  • Kim, Yun-Dae; Jun, Chi-Hyuck; Lee, Hye-Seon
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.931-940 / 2011
  • Classification is the task of generating, from a learning sample, a rule that assigns objects to one of several categories. A good classification model should classify new objects with a low misclassification error. Many types of classification methods have been developed, including logistic regression, discriminant analysis, and classification trees. This paper presents a new classification method using penalized partial least squares, which can make the model more robust and remedy the multicollinearity problem. The proposed method is compared with logistic regression and PCA-based discriminant analysis on several real and artificial data sets. The new method is found to give better classification performance than the other methods.

Comparative Analysis of Cultivation Region of Angelica gigas Using a GC-MS-Based Metabolomics Approach (Comparative Analysis of the Cultivation Region of Chamdanggwi Using GC-MS-Based Metabolomics Technology)

  • Jiang, Guibao; Leem, Jae Yoon
    • Korean Journal of Medicinal Crop Science / v.24 no.2 / pp.93-100 / 2016
  • Background: A set of logical criteria that can accurately identify and verify the cultivation region of raw materials is a critical tool for the scientific management of traditional herbal medicine. Methods and Results: Volatile compounds were obtained by steam distillation extraction from 19 and 32 samples of Angelica gigas Nakai cultivated in Korea and China, respectively. The metabolites were identified by GC/MS, querying against the NIST reference library. Data binning was performed to normalize the number of variables used in the statistical analysis. Multivariate statistical analyses, namely Principal Component Analysis (PCA), Partial Least Squares-Discriminant Analysis (PLS-DA), and Orthogonal Partial Least Squares-Discriminant Analysis (OPLS-DA), were performed using the SIMCA-P software. Significant variables were selected to verify the marker compounds: those with a Variable Importance in the Projection (VIP) score higher than 1.0 in OPLS-DA and those with p-values below 0.05 in one-way ANOVA. Among the 19 variables extracted, styrene, α-pinene, and β-terpinene were selected as markers of the origin of A. gigas. Conclusions: The statistical model developed was suitable for determining the geographical origin of A. gigas. The cultivation regions of six Korean and eight Chinese A. gigas samples were predicted using the established OPLS-DA model, and 13 of the 14 samples were classified accurately.
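
The VIP > 1.0 screening rule described above relies on the variable importance in the projection. The sketch below computes VIP scores from a PLS1 model on 0/1 class labels using the widely used formula VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)). Here SS_a is the response variance explained by component a, w_a is the unit-norm weight vector, and p is the number of variables. Because the squared VIPs then average to 1, a threshold of 1.0 flags variables of above-average importance. This sketch rests on assumptions: the study used OPLS-DA in SIMCA-P, whereas this uses plain PLS1, and `pls1_vip` and the data are hypothetical. The ANOVA criterion (p < 0.05) is not shown; `scipy.stats.f_oneway` is one way to apply it.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """Fit PLS1 by NIPALS and return one VIP score per column of X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))
    ss = np.zeros(n_components)          # y variance explained per component
    for a in range(n_components):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)           # unit-norm weight vector
        t = Xc @ w
        tt = t @ t
        p = Xc.T @ t / tt
        c = yc @ t / tt
        Xc = Xc - np.outer(t, p)         # deflate X
        yc = yc - c * t                  # deflate y
        W[:, a] = w
        ss[a] = c ** 2 * tt
    # columns of W are unit-norm, so W ** 2 holds normalized squared weights
    return np.sqrt(n_features * (W ** 2 @ ss) / ss.sum())

# Synthetic example: only columns 0 and 3 differ between the two origins
rng = np.random.default_rng(2)
labels = np.repeat([0.0, 1.0], 25)
X = rng.normal(size=(50, 8))
X[:, [0, 3]] += 2.0 * labels[:, None]
vip = pls1_vip(X, labels, n_components=2)
candidates = np.flatnonzero(vip > 1.0)   # VIP > 1.0 screening rule
```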

Comparison of 12 Isoflavone Profiles of Soybean (Glycine max (L.) Merrill) Seed Sprouts from Three Different Countries

  • Park, Soo-Yun; Kim, Jae Kwang; Kim, Eun-Hye; Kim, Seung-Hyun; Prabakaran, Mayakrishnan; Chung, Ill-Min
    • KOREAN JOURNAL OF CROP SCIENCE / v.63 no.4 / pp.360-377 / 2018
  • The levels of 12 isoflavones were measured in sprouts of 68 soybean (Glycine max (L.) Merrill) varieties from three countries (China, Japan, and Korea). Differences in isoflavone profiles were analyzed using data mining methods. Principal component analysis (PCA) showed that the CSRV021 variety was separated from the others along the first two principal components. Owing to its high isoflavone levels, this variety appears to be the most suitable for functional food production. Partial least squares discriminant analysis (PLS-DA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA) revealed meaningful differences in isoflavone composition among samples from different countries of origin. Hierarchical clustering analysis (HCA) grouped these phytochemicals into clusters corresponding to closely related biochemical pathways. These results indicate the usefulness of metabolite profiling combined with chemometrics as a tool for assessing food quality and identifying metabolic links in biological systems.
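
The PCA result above, in which one variety separates along the first two principal components, can be illustrated with a minimal NumPy sketch that computes PCA scores via the singular value decomposition. The autoscaling step, the helper name `pca_scores`, and the synthetic data are assumptions for illustration only. The data are 68 samples by 12 variables, with one sample scaled up.

```python
import numpy as np

def pca_scores(X, n_components=2, autoscale=True):
    """Return PCA scores and the fraction of variance explained by each PC.

    autoscale=True standardizes each variable to unit variance first,
    an assumed (though common) choice for concentration data.
    """
    Xc = X - X.mean(axis=0)
    if autoscale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # same as Xc @ Vt[:k].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic example: 68 varieties x 12 isoflavones, one with high levels
rng = np.random.default_rng(3)
X = rng.lognormal(mean=0.0, sigma=0.3, size=(68, 12))
X[10] *= 4.0
scores, explained = pca_scores(X)
outlier = int(np.argmax(np.linalg.norm(scores, axis=1)))
```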

Differentiation of Roots of Glycyrrhiza Species by ¹H Nuclear Magnetic Resonance Spectroscopy and Multivariate Statistical Analysis

  • Yang, Seung-Ok; Hyun, Sun-Hee; Kim, So-Hyun; Kim, Hee-Su; Lee, Jae-Hwi; Whang, Wan-Kyun; Lee, Min-Won; Choi, Hyung-Kyoon
    • Bulletin of the Korean Chemical Society / v.31 no.4 / pp.825-828 / 2010
  • To classify Glycyrrhiza species, samples of different species were analyzed using a ¹H NMR-based metabolomics technique. Partial least squares discriminant analysis (PLS-DA) was applied to the ¹H NMR data sets as the multivariate statistical method. The PLS-DA score plots showed a clear separation among the Glycyrrhiza species. The PLS-DA model was validated. The key metabolites contributing to the separation in the score plots were lactic acid, alanine, arginine, proline, malic acid, asparagine, choline, glycine, glucose, sucrose, 4-hydroxyphenylacetic acid, and formic acid. Compounds present at relatively high levels were glucose and 4-hydroxyphenylacetic acid in G. glabra; lactic acid, alanine, and proline in G. inflata; and arginine, malic acid, and sucrose in G. uralensis. This is the first study to perform global metabolomic profiling and differentiation of Glycyrrhiza species using ¹H NMR and multivariate statistical analysis.

Unraveling dynamic metabolomes underlying different maturation stages of berries harvested from Panax ginseng

  • Lee, Mee Youn; Seo, Han Sol; Singh, Digar; Lee, Sang Jun; Lee, Choong Hwan
    • Journal of Ginseng Research / v.44 no.3 / pp.413-423 / 2020
  • Background: Ginseng berries (GBs) show temporal metabolic variations across maturation stages that determine their organoleptic and functional properties. Methods: We analyzed the metabolic variations accompanying five maturation stages of GBs, namely immature green (IG), mature green (MG), partially red (PR), fully red (FR), and overmature red (OR), using mass spectrometry (MS)-based metabolomic profiling and multivariate analyses. Results: The partial least squares discriminant analysis score plot based on gas chromatography-MS datasets highlighted a metabolic disparity between preharvest (IG and MG) and harvest/postharvest (PR, FR, and OR) GB extracts along PLS1 (34.9%), with MG distinctly segregated along PLS2 (18.2%). Forty-three significantly discriminant primary metabolites were identified across the five developmental stages (variable importance in projection > 1.0, p < 0.05). Among them, most amino acids, organic acids, 5-C sugars, ethanolamines, purines, and palmitic acid were detected in preharvest GB extracts, whereas 6-C sugar, phenolic acid, and oleamide levels were distinctly higher during later maturation stages. Similarly, the partial least squares discriminant analysis based on liquid chromatography-MS datasets separated preharvest from harvest/postharvest stages along PLS1 (11.1%), whereas MG and PR were separated from IG, FR, and OR along PLS2 (5.6%). Overall, 24 secondary metabolites were significantly discriminant (variable importance in projection > 1.0, p < 0.05). Most of them showed higher relative abundance during preharvest stages, the exceptions being ginsenosides Rg1 and Re. Furthermore, we observed strong positive correlations between the total flavonoid and phenolic metabolite contents of GB extracts and their antioxidant activity. Conclusion: Understanding the dynamic metabolic variations associated with GB maturation stages can help determine the optimal harvest time with respect to the related agroeconomic traits.

Discrimination model of cultivation area of Corni Fructus using a GC-MS-Based metabolomics approach (Discrimination Model of the Cultivation Area of Sansuyu Using a GC-MS-Based Metabolomics Technique)

  • Leem, Jae-Yoon
    • Analytical Science and Technology / v.29 no.1 / pp.1-9 / 2016
  • It is believed that traditional Korean medicines can be managed more scientifically through the development of logical criteria for verifying their region of cultivation, and that this could contribute to the advancement of the traditional herbal medicine industry. This study attempted to establish such criteria for Sansuyu. Volatile compounds were obtained by steam distillation from 20 samples of domestic Corni fructus (Sansuyu) and 45 samples of Chinese Sansuyu. Metabolites in the gas chromatography/mass spectrometry (GC/MS) data of 53 training samples were identified using the NIST Mass Spectral Library. Data binning at 0.2 min intervals was performed to normalize the number of variables used in the statistical analysis. Multivariate statistical analyses, namely principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), and orthogonal partial least squares-discriminant analysis (OPLS-DA), were performed using the SIMCA-P software package. Significant variables with a variable importance in the projection (VIP) score higher than 1.0 were obtained from OPLS-DA, and variables with a p-value below 0.05 in one-way ANOVA were selected to verify the marker compounds. Finally, among the 11 variables extracted, six were selected as markers of the origin of Sansuyu: 1-ethylbutyl-hydroperoxide (9.089 min), nonadecane (20.170 min), butylated hydroxytoluene (25.319 min), 5β,7βH,10α-eudesm-11-en-1α-ol (25.921 min), 7,9-bis(2-methyl-2-propanyl)-1-oxaspiro[4.5]deca-6,9-diene-2,8-dione (34.257 min), and 2-decyldodecyl-benzene (54.717 min). The statistical model developed was suitable for determining the geographical origin of Sansuyu. The cultivation areas of four Korean and eight Chinese Sansuyu samples were predicted with the established OPLS-DA model, and 11 of the 12 samples were classified accurately.
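
The 0.2 min data binning step above can be sketched as follows. The sketch assumes that intensities falling in the same retention-time bin are summed and that the chromatographic window runs from 0 to 60 min. Neither detail is stated in the abstract, and `bin_chromatogram` is a hypothetical helper. Three of the example retention times come from the marker list above; the fourth retention time and all intensities are made up.

```python
import numpy as np

def bin_chromatogram(rt, intensity, bin_width=0.2, rt_start=0.0, rt_end=60.0):
    """Sum intensities into fixed retention-time bins.

    rt: retention times (min); intensity: matching detector intensities.
    Every sample ends up with the same number of variables (bins),
    regardless of how many peaks it contains.
    """
    n_bins = int(round((rt_end - rt_start) / bin_width))
    edges = rt_start + bin_width * np.arange(n_bins + 1)
    idx = np.digitize(rt, edges) - 1              # bin index of each point
    keep = (idx >= 0) & (idx < n_bins)            # drop points outside the window
    binned = np.bincount(idx[keep], weights=intensity[keep], minlength=n_bins)
    return edges[:-1], binned

# Example: two peaks fall into the same 9.0-9.2 min bin and are summed
rt = np.array([9.089, 9.15, 20.170, 54.717])
intensity = np.array([100.0, 50.0, 30.0, 10.0])
edges, binned = bin_chromatogram(rt, intensity)
```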

Impurity profiling and chemometric analysis of methamphetamine seizures in Korea

  • Shin, Dong Won; Ko, Beom Jun; Cheong, Jae Chul; Lee, Wonho; Kim, Suhkmann; Kim, Jin Young
    • Analytical Science and Technology / v.33 no.2 / pp.98-107 / 2020
  • Methamphetamine (MA) is currently the most abused illicit drug in Korea. MA is produced by chemical synthesis, and the final product contains small amounts of precursor chemicals, intermediates, and by-products. To identify and quantify these trace compounds in MA seizures, the study examined a practical and feasible approach that combines chromatographic fingerprinting with a suite of traditional chemometric methods and recently introduced machine learning approaches. Analyses were performed by gas chromatography (GC) coupled with a flame ionization detector (FID) and mass spectrometry (MS). After examination of all peaks in 71 samples, 166 impurities were selected as characteristic components. Two groups of chemometric techniques were employed to classify the 71 MA seizures. The unsupervised techniques were principal component analysis (PCA), hierarchical cluster analysis (HCA), and K-means clustering. The supervised techniques were partial least squares-discriminant analysis (PLS-DA), orthogonal partial least squares-discriminant analysis (OPLS-DA), support vector machines (SVM), and a deep neural network (DNN) built with Keras. The PCA, HCA, K-means clustering, PLS-DA, OPLS-DA, SVM, and DNN results for quality evaluation were in good agreement. However, the tested MA seizures possessed distinct features, such as chirality, cutting agents, and boiling points. The study indicates that the established qualitative and semi-quantitative methods can serve as practical and useful analytical tools for characterizing trace compounds in illicit MA seizures. Moreover, they provide a statistical basis for identifying the synthesis route, sources of supply, trafficking routes, and connections between seizures. This can support drug law enforcement agencies in their efforts to eliminate organized MA crime.
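
K-means clustering, one of the unsupervised methods listed above, can be sketched in a few lines of NumPy using Lloyd's algorithm. The deterministic farthest-point initialization is an assumption chosen for reproducibility; k-means++ with several restarts is more common. The three synthetic "impurity profile" groups are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with farthest-point initialization.

    Returns cluster labels and centroids.
    """
    # Farthest-point initialization: start at X[0], then repeatedly add the
    # sample farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(dist)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Synthetic example: three well-separated groups of samples, 5 features each
rng = np.random.default_rng(4)
centres = np.array([[0.0] * 5, [10.0] * 5, [-10.0] * 5])
groups = np.repeat([0, 1, 2], 20)
X = centres[groups] + rng.normal(size=(60, 5))
labels, centroids = kmeans(X, k=3)
```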

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir; Rasheed, Zahid
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.575-587 / 2015
  • Advances in data collection techniques have produced high-dimensional data sets, in which discrimination is an important and commonly encountered problem. Resolving it is crucial when the data are heterogeneous, i.e., when the classes do not share a common variance-covariance structure. One example is classifying microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important for studying evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high-dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high-dimensional heterogeneous data sets. It combines the recently introduced regularized elimination for variable selection in partial least squares (rePLS) with quadratic discriminant analysis (QDA), a classification procedure for heterogeneous classes. The proposed and existing methods are compared on a simulated data set. In addition, the proposed procedure is applied to classify microbial habitat preferences by codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized, and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory, ranging from 89.1% to 100% on test data. The codons/bi-codons, and the mutual interactions among them, that influence each habitat preference are identified. The results of the proposed method also concur with known biological characteristics, which will help researchers better understand species divergence.
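
QDA, the classification stage of rePLS-QDA, fits a separate mean and covariance matrix to each class. This is what lets it handle the heterogeneous (non-common covariance) case described above. A minimal NumPy sketch follows. The small ridge term `reg`, the class name `QDA`, and the synthetic two-class data with equal means but very different spreads are assumptions, and the rePLS variable-selection step is not shown.

```python
import numpy as np

class QDA:
    """Quadratic discriminant analysis: one mean and covariance per class."""

    def __init__(self, reg=1e-6):
        self.reg = reg          # small ridge added to each covariance (assumption)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            cov = np.cov(Xc, rowvar=False) + self.reg * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            self.params_.append((Xc.mean(axis=0), np.linalg.inv(cov), logdet,
                                 np.log(len(Xc) / len(X))))
        return self

    def decision_function(self, X):
        scores = []
        for mean, inv_cov, logdet, log_prior in self.params_:
            diff = X - mean
            # squared Mahalanobis distance of every sample to this class mean
            maha = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
            scores.append(-0.5 * logdet - 0.5 * maha + log_prior)
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[self.decision_function(X).argmax(axis=1)]

# Synthetic example: same class means, very different covariances
rng = np.random.default_rng(5)
y = np.repeat([0, 1], 100)
X = np.vstack([rng.normal(scale=0.2, size=(100, 2)),    # tight class
               rng.normal(scale=3.0, size=(100, 2))])   # diffuse class
model = QDA().fit(X, y)
pred = model.predict(X)
```

A linear discriminant with a pooled covariance could not separate these two classes, because their means coincide. QDA separates them through the class-specific covariance terms.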