• Title/Summary/Keyword: Crossvalidation

Application of Statistical Geo-Spatial Information Technology to Soil Stratification (통계적 지반 공간 정보 기법을 이용한 지층구조 분석)

  • Kim, Han-Saem;Kim, Hyun-Ki;Shin, Si-Yeol;Chung, Choong-Ki
    • Journal of the Korean Geotechnical Society / v.27 no.7 / pp.59-68 / 2011
  • Subsurface investigation results always reflect some degree of soil uncertainty, which sometimes requires statistical correction of the data to support sound engineering decisions. This study proposes a closed-form framework for extracting outlying data points from test results using statistical geo-spatial information analysis that combines outlier analysis with kriging-based cross-validation. The proposed method is applied to soil stratification using borehole data from Yeouido.

A Study on cDNA Microarray Normalization (cDNA Microarray Normalization에 대한 연구)

  • Kim, Jong-Yeong;Lee, Jae-Won
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.331-334 / 2003
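  • In microarray experiments, normalization is the process of removing the various technical sources of variation that affect gene expression levels. Several methods have been proposed for cDNA microarray normalization; among them, print-tip lowess normalization is the one most commonly used when a print-tip effect is present. The window width of the lowess function used for normalization must be chosen according to the characteristics of the data in order to obtain results suited to the purpose of the study. This paper discusses a procedure for computing the optimal window width for each tip. The results are also compared with those of conventional print-tip lowess normalization, which uses the same window width for every tip, to confirm their validity with respect to the basic principles of normalization.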

QSPR Study of the Absorption Maxima of Azobenzene Dyes

  • Xu, Jie;Wang, Lei;Liu, Li;Bai, Zikui;Wang, Luoxin
    • Bulletin of the Korean Chemical Society / v.32 no.11 / pp.3865-3872 / 2011
  • A quantitative structure-property relationship (QSPR) study was performed to predict the absorption maxima of azobenzene dyes. The entire set of 191 azobenzenes was divided into a training set of 150 azobenzenes and a test set of 41 azobenzenes using the Kennard-Stone algorithm. A seven-descriptor model, with a squared correlation coefficient ($R^2$) of 0.8755 and a standard error of estimation (s) of 14.476, was developed by applying stepwise multiple linear regression (MLR) analysis to the training set. The reliability of the proposed model was further illustrated using several evaluation techniques: a leave-many-out cross-validation procedure, randomization tests, and validation on the test set.
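
The Kennard-Stone split mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not list the descriptors, so `points` stands for hypothetical descriptor vectors, Euclidean distance is assumed, and the function name `kennard_stone` is made up for the example.

```python
from math import dist


def kennard_stone(points, n_train):
    """Return indices of n_train points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")

    # Seed the training set with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}

    # For every unselected point, track the distance to its nearest selected point.
    nearest = {
        i: min(dist(points[i], points[first]), dist(points[i], points[second]))
        for i in remaining
    }

    while len(selected) < n_train:
        # Pick the point whose nearest selected neighbour is farthest away
        # (ties are broken in favour of the lowest index).
        pick = max(remaining, key=lambda i: (nearest[i], -i))
        selected.append(pick)
        remaining.remove(pick)
        del nearest[pick]
        for i in remaining:
            nearest[i] = min(nearest[i], dist(points[i], points[pick]))

    return selected
```

With 191 descriptor vectors, `kennard_stone(points, 150)` would return the training indices and the 41 indices left over would form the test set. The selection spreads the training set across descriptor space, so the test compounds tend to lie inside the region the model was fitted on.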

Classification Accuracy Improvement for Decision Tree (의사결정트리의 분류 정확도 향상)

  • Rezene, Mehari Marta;Park, Sanghyun
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.787-790 / 2017
  • Data quality is a central issue in classification problems; noisy instances in the training dataset generally prevent robust classification performance. Such instances may cause the generated decision tree to over-fit, and its accuracy may decrease. Decision trees are useful, efficient, and commonly used for solving real-world classification problems in data mining. In this paper, we introduce a preprocessing technique to improve the classification accuracy of the C4.5 decision tree algorithm. In the proposed preprocessing method, a naive Bayes classifier is used to remove noisy instances from the training dataset. We applied the method to a real e-commerce sales dataset and compared its performance with that of the existing C4.5 decision tree classifier. In the experiments, the proposed method improved classification accuracy by 8.5% on the training dataset and by 14.32% under 10-fold cross-validation.
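
For readers unfamiliar with the 10-fold procedure behind the second figure, a minimal sketch follows. It is not the authors' implementation: `k_fold_indices` and `cross_val_accuracy` are hypothetical helper names, and `fit` and `predict` are placeholders for any classifier (for example a C4.5-style tree).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold f takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield train_idx, test_idx


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Pooled accuracy: every sample is predicted once by a model that never saw it."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

Accuracy measured on the training dataset reuses the data the tree was grown on, whereas the cross-validated figure scores each instance with a tree that did not see it. That is why the abstract reports the two improvements separately.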

Semiparametric Regression Splines in Matched Case-Control Studies

  • Kim, In-Young;Carroll, Raymond J.;Cohen, Noah
    • Proceedings of the Korean Statistical Society Conference / 2003.05a / pp.167-170 / 2003
  • We develop semiparametric methods for matched case-control studies using regression splines. Three methods are developed: an approximate cross-validation scheme to estimate the smoothing parameter inherent in regression splines, as well as Monte Carlo Expectation Maximization (MCEM) and Bayesian methods to fit the regression spline model. We compare the approximate cross-validation approach, MCEM and Bayesian approaches using simulation, showing that they appear approximately equally efficient, with the approximate cross-validation method being computationally the most convenient. An example from equine epidemiology that motivated the work is used to demonstrate our approaches.

Comparison of QSAR Methods (CoMFA, CoMSIA, HQSAR) of Anticancer 1-N-Substituted Imidazoquinoline-4,9-dione Derivatives

  • Suh, Myung-Eun;Park, So-Young;Lee, Hyun-Jung
    • Bulletin of the Korean Chemical Society / v.23 no.3 / pp.417-422 / 2002
  • Quantitative Structure Activity Relationship (QSAR) methods were compared on new imidazoquinolinedione derivatives using Comparative Molecular Field Analysis (CoMFA), Comparative Molecular Similarity Indices Analysis (CoMSIA), and the Hologram Quantitative Structure Activity Relationship (HQSAR). In CoMFA, the cross-validation value $q^2$ was 0.625 and the Pearson correlation coefficient $r^2$ was 0.973. In CoMSIA, $q^2$ was 0.52 and $r^2$ was 0.979. In HQSAR, $q^2$ was 0.501 and $r^2$ was 0.924. The best result was obtained with the CoMSIA method, according to a comparison of the calculated values with the measured in vitro cytotoxic activities against human ovarian cancer cell lines.
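
The abstract does not define its two statistics. Under the definition conventionally used in QSAR work (an assumption here, not something the abstract states), the cross-validated coefficient is computed from predictions made for each compound by a model fitted without that compound:

$$q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{SS}} = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$

Here $y_i$ is the measured activity of compound $i$, $\bar{y}$ is the mean measured activity, and $\hat{y}_{(i)}$ is the prediction for compound $i$ from the model trained with compound $i$ left out. The conventional $r^2$ has the same form with $\hat{y}_{(i)}$ replaced by the fitted value $\hat{y}_i$ from the model trained on all compounds, which is why $r^2$ is typically larger than $q^2$, as in all three methods reported above.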

Application of artificial neural network to differential diagnosis of lung lesion: Preliminary results

  • Lee, Hae-Jun;Lee, Yu-Kyung;Hwang, Kyung-Hoon
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.1614-1615 / 2011
  • It is difficult to differentiate lung cancer from benign inflammatory lung lesions because of the high false-positive rate of F-18 FDG PET. We investigated whether applying an artificial neural network to this diagnosis may be helpful. We reviewed the medical records and F-18 FDG PET images of 12 patients and selected clinical and PET variables such as SUV. Using the selected variables and the confirmed diagnoses, a multilayer perceptron was applied with cross-validation and compared with visual interpretation. The neural network correctly classified 83% of the lung lesions and greatly reduced the false-positive rate. However, the false-negative rate was not affected. Applying a neural network to the differential diagnosis between lung cancer and benign inflammatory lesions may be helpful. Further studies with more patients are warranted.

Two Class Approximation of TLB (Tomato Late Blight) Activity Data (토마토 역병균 항균 활성 데이터의 이분번 근사모델링)

  • Hahn, Hoh-Gyu;M.D., Ashek Ali;Cho, Seung-Joo
    • The Korean Journal of Pesticide Science / v.9 no.2 / pp.140-145 / 2005
  • Quantitative Structure Activity Relationship (QSAR) modeling assumes a relationship between physical properties and biological activity. However, activity data measured at a single concentration, such as percent activity, have not been used extensively for modeling purposes, probably because these values are qualitative rather than quantitative. To use percent activity data for molecular modeling, we divided the whole data set into two classes, one representing active compounds and the other inactive compounds. The percent activity data of $\beta$-ketoacetoanilides measured against TLB (Tomato Late Blight) were investigated. CoMFA (Comparative Molecular Field Analysis) was used as a discriminant function. CoMFA provides 3D (three-dimensional) information, which is crucial for chemical insight, and it can also serve as a predictive model. The resulting model classified 98% of the given data correctly. When the LOO (leave-one-out) cross-validation procedure was applied, the classification accuracy was 69%. Therefore, a two-class approximation of percent activity data with CoMFA can be used to understand the relationship between chemical structure and biological activity and to design subsequent chemical analogs.

Prediction accuracy of incisal points in determining occlusal plane of digital complete dentures

  • Kenta Kashiwazaki;Yuriko Komagamine;Sahaprom Namano;Ji-Man Park;Maiko Iwaki;Shunsuke Minakuchi;Manabu, Kanazawa
    • The Journal of Advanced Prosthodontics / v.15 no.6 / pp.281-289 / 2023
  • PURPOSE. This study aimed to predict the positional coordinates of incisor points from the scan data of conventional complete dentures and to verify their accuracy. MATERIALS AND METHODS. The standard triangulated language (STL) data of 100 scanned pairs of complete upper and lower dentures were imported into computer-aided design software, from which the position coordinates of the points corresponding to each landmark of the jaw were obtained. The x, y, and z coordinates of the incisor point (XP, YP, and ZP) were obtained from the maxillary and mandibular landmark coordinates using regression or calculation formulas, and accuracy was verified by determining the deviation between the measured and predicted coordinate values. YP was obtained in two ways, using the hamular-incisive-papilla (HIP) plane and using facial measurements. Multiple regression analysis was used to predict ZP. The root mean squared error (RMSE) was used to verify the accuracy of XP and YP. To verify the accuracy of ZP, the RMSE was obtained after cross-validation using the remaining 30 cases of denture STL data. RESULTS. The RMSE was 2.22 for predicting XP. When predicting YP, the RMSE was 3.18 for the method using the HIP plane and 0.73 for the method using facial measurements. Cross-validation revealed an RMSE of 1.53. CONCLUSION. YP and ZP could be predicted from anatomical landmarks of the maxillary and mandibular edentulous jaw, and YP could be predicted with better accuracy when the position of the lower border of the upper lip was added.

Development of a deep neural network model to estimate solar radiation using temperature and precipitation (온도와 강수를 이용하여 일별 일사량을 추정하기 위한 심층 신경망 모델 개발)

  • Kang, DaeGyoon;Hyun, Shinwoo;Kim, Kwang Soo
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.2 / pp.85-96 / 2019
  • Solar radiation is an important variable for estimating the energy balance and water cycle in natural and agricultural ecosystems. A deep neural network (DNN) model was developed to estimate daily global solar radiation. Temperature and precipitation, which are more widely available from weather stations than other variables such as sunshine duration, were used as inputs to the DNN model. Five-fold cross-validation was applied to train and test the DNN models. Meteorological data were collected at 15 weather stations in Korea over a long period (e.g., more than 30 years). The DNN model obtained from the cross-validation had a relatively small RMSE ($3.75\;\mathrm{MJ}\;\mathrm{m}^{-2}\;\mathrm{d}^{-1}$) for estimates of daily solar radiation at the weather station in Suwon. The DNN model explained about 68% of the variation in observed solar radiation at the Suwon weather station. The measurements of solar radiation in 1985 and 1998 were found to be considerably low for a short period of time compared with sunshine duration. This suggests that the quality of the solar radiation observations should be assessed in further studies. When data for those years were excluded from the analysis, the DNN model had slightly better agreement statistics: the values of $R^2$ and RMSE were 0.72 and $3.55\;\mathrm{MJ}\;\mathrm{m}^{-2}\;\mathrm{d}^{-1}$, respectively. Our results indicate that a DNN would be useful for the development of a solar radiation estimation model using temperature and precipitation, which are usually available in downscaled scenario data for future climate conditions. Thus, such a DNN model would be useful for assessing the impact of climate change on crop production where solar radiation is a required input variable to a crop model.
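
The two agreement statistics quoted in this abstract can be computed as in the sketch below. It assumes the common definitions (RMSE as the root of the mean squared residual, $R^2$ as one minus the ratio of residual to total sum of squares); the abstract does not say which variant of $R^2$ the authors used, and the function names are illustrative only.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

In a cross-validated setting, `predicted` would hold the held-out estimates for each day, so both statistics describe performance on data the model did not see during training.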