• Title/Abstract/Keyword: Classification trees

Search results: 309

Threshold를 이용한 의사결정나무의 생성 (Induction of Decision Trees Using the Threshold Concept)

  • 이후석;김재련
    • Journal of Society of Korea Industrial and Systems Engineering
    • /
    • Vol. 21, No. 45
    • /
    • pp.57-65
    • /
    • 1998
  • This paper addresses data classification using the induction of decision trees. A weakness of existing induction techniques is that the resulting trees are too large, because construction continues until every leaf node contains a single class. Our study aims both to overcome this weakness and to construct decision trees that are small and accurate. First, we construct decision trees using a classification threshold and an exception threshold in the construction stage. Next, we present a two-stage pruning method that uses the classification threshold together with reduced-error pruning. Empirical results show that our method obtains decision trees that are both accurate and small.

  • PDF
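
The stopping rule sketched in the abstract above is easy to prototype. Below is a minimal sketch in Python, assuming a Gini-based split search; the parameter names `classification_threshold` (stop splitting once a node's majority class reaches this purity) and `exception_threshold` (treat smaller nodes as leaves) are illustrative stand-ins for the paper's thresholds, and the two-stage pruning step is not reproduced.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Exhaustive search for the axis-parallel split with lowest weighted Gini."""
    best = (None, None, float("inf"))
    for feat in range(len(X[0])):
        for thr in sorted({x[feat] for x in X}):
            left = [y[i] for i, x in enumerate(X) if x[feat] <= thr]
            right = [y[i] for i, x in enumerate(X) if x[feat] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (feat, thr, score)
    return best[0], best[1]

def grow(X, y, classification_threshold=0.9, exception_threshold=5):
    majority, count = Counter(y).most_common(1)[0]
    purity = count / len(y)
    # Stop early instead of growing to single-class leaves: the node is
    # already pure enough, or too small to split reliably.
    if purity >= classification_threshold or len(y) <= exception_threshold:
        return {"leaf": majority, "purity": round(purity, 3)}
    feat, thr = best_split(X, y)
    if feat is None:                      # no valid split remains
        return {"leaf": majority, "purity": round(purity, 3)}
    L = [i for i, x in enumerate(X) if x[feat] <= thr]
    R = [i for i, x in enumerate(X) if x[feat] > thr]
    return {"split": (feat, thr),
            "left": grow([X[i] for i in L], [y[i] for i in L],
                         classification_threshold, exception_threshold),
            "right": grow([X[i] for i in R], [y[i] for i in R],
                          classification_threshold, exception_threshold)}

X = [[1, 0], [2, 1], [3, 0], [8, 1], [9, 0], [10, 1]]
y = ["a", "a", "a", "b", "b", "a"]
print(grow(X, y, classification_threshold=0.8, exception_threshold=2))
```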

New Splitting Criteria for Classification Trees

  • Lee, Yung-Seop
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 8, No. 3
    • /
    • pp.885-894
    • /
    • 2001
  • The decision tree method is one of the best-known data mining techniques. Classification trees are used to predict a class label. When a tree grows, the conventional splitting criteria measure node impurity as the weighted average of the impurities of the left and right child nodes. In this paper, new splitting criteria for classification trees are proposed that improve the interpretability of trees compared with the conventional methods. The criteria search only for interesting subsets of the data, as opposed to modeling all of the data equally well. As a result, the tree is very unbalanced but extremely interpretable.

  • PDF
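
The abstract does not spell out the new criteria, but the contrast it draws can be illustrated with a toy comparison in Python: a conventional weighted-average impurity score against a hypothetical one-sided score that rewards splitting off a single very pure subset, which is the behavior that yields unbalanced but interpretable trees.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_score(left, right):
    # Conventional criterion: weighted average of the two child impurities.
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def one_sided_score(left, right):
    # Illustrative stand-in: score only the purer child, ignoring how mixed
    # the other side stays; repeatedly carving off pure subsets like this
    # produces deep, unbalanced but readable trees.
    return min(gini(left), gini(right))

left, right = ["a"] * 9, ["a"] * 5 + ["b"] * 6
print(weighted_score(left, right))   # penalised by the mixed right child
print(one_sided_score(left, right))  # 0.0: the left child is pure
```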

Object Classification Method Using Dynamic Random Forests and Genetic Optimization

  • Kim, Jae Hyup;Kim, Hun Ki;Jang, Kyung Hyun;Lee, Jong Min;Moon, Young Shik
    • Journal of the Korea Society of Computer and Information
    • /
    • Vol. 21, No. 5
    • /
    • pp.79-89
    • /
    • 2016
  • In this paper, we propose an object classification method that uses genetic optimization to build a dynamic random forest from an optimal combination of unit trees. As an ensemble model that combines unit decision trees through bagging, the random forest achieves good generalization performance by injecting randomness into the training samples and the feature selection assigned to each decision tree. However, because a random forest is composed of randomly generated unit trees, it shows excellent classification performance only when a sufficient number of trees are combined; there is no quantitative method for choosing the number of trees, so one can only generate random tree structures repeatedly. The proposed algorithm composes the random forest from an optimal combination of trees while maintaining the forest's generalization performance. To achieve this, we cast the problem of improving classification performance as an optimization problem of finding the optimal tree combination, and apply a genetic algorithm to solve it. Experiments show that the proposed algorithm improves classification performance by about 3~5% in specific cases, on a common database and our own infrared database, compared with the existing random forest, and that the optimal tree combination is found at the level of 55~60% of the maximum number of trees.
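
A minimal sketch of the tree-combination search under simplified assumptions: scikit-learn's RandomForestClassifier supplies the unit trees, and an ordinary bitmask genetic algorithm (population size, selection scheme, and mutation rate are arbitrary illustrative choices) searches for the subset whose majority vote maximises validation accuracy. This is not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
# Cache each unit tree's validation predictions once: shape (n_trees, n_val).
votes = np.array([t.predict(X_val) for t in forest.estimators_])

def fitness(mask):
    # Accuracy of the majority vote of the selected trees only. A fuller
    # treatment would score the final subset on a third holdout set.
    if mask.sum() == 0:
        return 0.0
    pred = (votes[mask.astype(bool)].mean(axis=0) > 0.5).astype(int)
    return (pred == y_val).mean()

pop = rng.integers(0, 2, size=(30, len(forest.estimators_)))
for generation in range(40):
    scores = np.array([fitness(m) for m in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[:10]]                      # truncation selection
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, pop.shape[1])        # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(pop.shape[1]) < 0.02     # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print(f"full forest: {forest.score(X_val, y_val):.3f}, "
      f"GA subset of {best.sum()} trees: {fitness(best):.3f}")
```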

분류와 회귀나무분석에 관한 소고 (Note on classification and regression tree analysis)

  • 임용빈;오만숙
    • Journal of Korean Society for Quality Management
    • /
    • Vol. 30, No. 1
    • /
    • pp.152-161
    • /
    • 2002
  • The analysis of large data sets with hundreds of thousands of observations and thousands of independent variables is a formidable computational task. A less parametric method, capable of identifying important independent variables and their interactions, is the tree-structured approach to regression and classification. It gives a graphical and often illuminating way of looking at data in classification and regression problems. In this paper, we review and summarize the methodology used to construct a tree, multiple trees, and the sequential strategy for identifying active compounds in large chemical databases.
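
For readers who want to try the tree-structured view the note reviews, here is a present-day stand-in using scikit-learn's CART implementation (the paper predates the library, so this is not its code):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# The printed rules are the "graphical and often illuminating" view of the
# data that makes trees attractive for screening important variables.
print(export_text(tree, feature_names=load_iris().feature_names))
```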

A review of tree-based Bayesian methods

  • Linero, Antonio R.
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 6
    • /
    • pp.543-559
    • /
    • 2017
  • Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable for a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.
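
As a didactic caricature of the Bayesian viewpoint the survey covers, the toy below places a uniform prior over single-split regression trees (stumps) and weights each candidate split by its marginal likelihood under a conjugate Gaussian model with the leaf means integrated out. Real Bayesian CART and BART samplers explore far richer tree spaces with MCMC; every modelling choice here is illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 60))
y = np.where(x < 0.4, 0.0, 1.0) + rng.normal(0, 0.3, 60)   # true jump at 0.4

sigma2, tau2 = 0.3 ** 2, 1.0 ** 2   # known noise var, prior var of leaf means

def log_marginal(values):
    # Leaf data ~ N(0, sigma2*I + tau2*11^T): the leaf mean integrated out.
    n = len(values)
    if n == 0:
        return 0.0
    cov = sigma2 * np.eye(n) + tau2 * np.ones((n, n))
    return multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(values)

splits = np.linspace(0.05, 0.95, 19)            # uniform prior over stumps
logpost = np.array([log_marginal(y[x < s]) + log_marginal(y[x >= s])
                    for s in splits])
post = np.exp(logpost - logpost.max())          # normalise in a stable way
post /= post.sum()
print("posterior mode split:", splits[post.argmax()])
print("posterior mass near 0.4:", post[(splits > 0.3) & (splits < 0.5)].sum())
```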

퍼지의사결정을 이용한 RC구조물의 건전성평가 (Integrity Assessment for Reinforced Concrete Structures Using Fuzzy Decision Making)

  • 박철수;손용우;이증빈
    • Computational Structural Engineering Institute of Korea: Conference Proceedings
    • /
    • Proceedings of the Computational Structural Engineering Institute of Korea 2002 Spring Conference
    • /
    • pp.274-283
    • /
    • 2002
  • This paper presents an efficient model for reinforced concrete structures using CART-ANFIS (classification and regression trees with an adaptive neuro-fuzzy inference system). A fuzzy decision tree partitions the input space of a data set into mutually exclusive regions, each of which is assigned a label, a value, or an action to characterize its data points. Fuzzy decision trees used for classification problems are often called fuzzy classification trees, and each terminal node contains a label that indicates the predicted class of a given feature vector. In the same vein, decision trees used for regression problems are often called fuzzy regression trees, and the terminal node labels may be constants or equations that specify the predicted output value of a given input vector. Note that CART can select relevant inputs and partition the input space, while ANFIS refines the regression and makes it everywhere continuous and smooth. CART and ANFIS are therefore complementary, and their combination constitutes a solid approach to fuzzy modeling.

  • PDF
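
The complementarity claimed above (CART partitions, ANFIS smooths) can be seen in one dimension: fit a crisp CART split with scikit-learn, then replace the hard threshold with a sigmoid membership so the prediction becomes everywhere continuous. This toy blend merely stands in for the ANFIS refinement stage and is far simpler than it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.where(x.ravel() < 4, 1.0, 3.0) + rng.normal(0, 0.2, 200)

cart = DecisionTreeRegressor(max_depth=1).fit(x, y)
t = cart.tree_.threshold[0]                        # CART's crisp split point
left = cart.tree_.children_left[0]                 # node ids of the two leaves
right = cart.tree_.children_right[0]
v_left = float(cart.tree_.value[left][0][0])       # leaf means
v_right = float(cart.tree_.value[right][0][0])

def fuzzy_predict(q, beta=3.0):
    # Sigmoid membership in the right region; beta sets the transition width.
    m = 1.0 / (1.0 + np.exp(-beta * (q - t)))
    return (1 - m) * v_left + m * v_right

print("crisp split at x =", round(float(t), 2))
print("smooth prediction at the split:", round(float(fuzzy_predict(t)), 2))
```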

Classification of tree species using high-resolution QuickBird-2 satellite images in the valley of Ui-dong in Bukhansan National Park

  • Choi, Hye-Mi;Yang, Keum-Chul
    • Journal of Ecology and Environment
    • /
    • Vol. 35, No. 2
    • /
    • pp.91-98
    • /
    • 2012
  • This study examined the possibility of classifying tree species using high-resolution QuickBird-2 images, through comparison of the spectral characteristics (digital numbers [DNs]) of tree species, tree species classification, and accuracy verification. In October 2010, three conifer and eight broad-leaved tree species were examined in the areas studied. The spectral characteristics of each species were observed, and the study area was classified by image classification. The results were as follows: the panchromatic band and multi-spectral band 4 were found to be useful for tree species classification, and DN values of conifers were lower than those of broad-leaved trees. Vegetation indices such as the normalized difference vegetation index (NDVI), soil brightness index (SBI), green vegetation index (GVI), and Biband showed patterns similar to band 4 and the panchromatic (PAN) band, and Tukey's multiple comparison test was significant among tree species. However, species within the same genus, such as Pinus densiflora-P. rigida and Quercus mongolica-Q. serrata, showed similar DN patterns, so the supervised classification results were difficult to distinguish within the same genus. Random selection of validation pixels showed an overall classification accuracy of 74.1%, and the Kappa coefficient was 70.6%. The classification accuracy of Pterocarya stenoptera, 89.5%, was the highest. The classification accuracy of broad-leaved trees was lower than expected, ranging from 47.9% to 88.9%. P. densiflora-P. rigida and Q. mongolica-Q. serrata were classified as the same species because they did not show significant differences in spectral patterns.
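
The vegetation indices compared above are simple band arithmetic; for example, NDVI is computed from QuickBird-2's red (band 3) and near-infrared (band 4) digital numbers. The DN values below are invented for illustration.

```python
import numpy as np

red = np.array([[310.0, 295.0], [402.0, 388.0]])    # band 3 DNs (toy values)
nir = np.array([[820.0, 760.0], [455.0, 430.0]])    # band 4 DNs (toy values)

# NDVI = (NIR - Red) / (NIR + Red); higher NIR relative to red gives
# higher NDVI, so species with low band-4 DNs score lower.
ndvi = (nir - red) / (nir + red)
print(ndvi)
```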

SUPPORT Applications for Classification Trees

  • Lee, Sang-Bock;Park, Sun-Young
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 15, No. 3
    • /
    • pp.565-574
    • /
    • 2004
  • Classification tree algorithms, such as CART by Breiman et al. (1984), recursively partition the data space with the aim of making the distribution of the class variable as pure as possible within each partition, and consist of several steps. In the SUPPORT (smoothed and unsmoothed piecewise-polynomial regression trees) method of Chaudhuri et al. (1994), a weighted averaging technique is used to combine piecewise polynomial fits into a smooth one. We focus on applying SUPPORT to a binary class variable. A logistic model is considered in the calculation, and the results show good classification rates compared with other methods such as CART, QUEST, and CHAID.

  • PDF
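
The weighted-averaging idea behind SUPPORT can be sketched on a one-dimensional toy problem: fit a separate polynomial on each side of a knot, then blend the two fits with a smooth weight function so the combined curve has no jump at the boundary. The logistic weight used here is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.1, 200)

knot = 0.0
left = np.polyfit(x[x <= knot], y[x <= knot], 2)    # piecewise quadratics
right = np.polyfit(x[x > knot], y[x > knot], 2)

def smooth_fit(q, width=0.3):
    # Logistic weight of the right-hand piece; width sets how gradually
    # one polynomial hands over to the other around the knot.
    w = 1.0 / (1.0 + np.exp(-(q - knot) / width))
    return (1 - w) * np.polyval(left, q) + w * np.polyval(right, q)

gap = abs(np.polyval(left, knot) - np.polyval(right, knot))
print(f"jump between raw pieces at knot: {gap:.3f}")
print(f"smooth blended value at knot:   {smooth_fit(np.array([knot]))[0]:.3f}")
```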

특징공간을 사선 분할하는 퍼지 결정트리 유도 (Fuzzy Decision Tree Induction for Obliquely Partitioning a Feature Space)

  • 이우향;이건명
    • Journal of KIISE: Software and Applications
    • /
    • Vol. 29, No. 3
    • /
    • pp.156-166
    • /
    • 2002
  • Decision tree induction is a useful machine learning method for extracting classification rules from examples described by feature values. Decision trees fall into two broad classes, univariate and multivariate, according to how they partition the feature space. Data obtained in the field often contain errors in the feature values themselves, due to observation error, uncertainty, subjective judgment, and so on. To build decision trees that are robust to such errors, researchers have incorporated fuzzy techniques into decision tree induction. Most fuzzy decision tree research to date has applied fuzzy techniques to univariate decision trees; applications to multivariate decision trees are hard to find. This paper proposes a method that applies fuzzy techniques to multivariate decision trees to produce what we call fuzzy oblique decision trees, and presents experimental results that illustrate the characteristics of the proposed method.
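
A minimal sketch of the kinds of split the paper contrasts: a crisp univariate split, a crisp oblique (multivariate) split, and a fuzzy oblique split whose half-space membership decays smoothly with signed distance to the hyperplane, so a small error in a feature value shifts the membership slightly instead of flipping the decision. The hyperplane and steepness parameter are made up for illustration.

```python
import numpy as np

w, b = np.array([1.0, -1.0]), 0.0        # oblique hyperplane: x1 - x2 = 0

def crisp_univariate(x, feature=0, threshold=0.5):
    # Axis-parallel test on a single feature.
    return x[feature] > threshold

def crisp_oblique(x):
    # Linear-combination test over all features.
    return w @ x + b > 0

def fuzzy_oblique(x, beta=4.0):
    # Sigmoid of the signed distance: degree of membership in {w.x + b > 0}.
    return 1.0 / (1.0 + np.exp(-beta * (w @ x + b)))

x = np.array([0.52, 0.50])               # a point very near both boundaries
print(crisp_univariate(x), crisp_oblique(x))   # hard True/True
print(round(float(fuzzy_oblique(x)), 3))       # soft membership near 0.5
```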

Split Effect in Ensemble

  • Chung, Dong-Jun;Kim, Hyun-Joong
    • The Korean Statistical Society: Conference Proceedings
    • /
    • Proceedings of the Korean Statistical Society 2005 Autumn Conference
    • /
    • pp.193-197
    • /
    • 2005
  • The classification tree is one of the most suitable base learners for ensembles. Over the past decade, it has been found that bagging gives the most accurate prediction when used with unpruned trees, and boosting when used with stumps. Researchers have tried to understand the relationship between the size of the trees and the accuracy of the ensemble. Experiments show that large trees make boosting overfit the dataset, while stumps help avoid overfitting. This means that the accuracy of each classifier must be sacrificed for better weighting at each iteration. Hence, the split effect in boosting can be explained by the trade-off between the accuracy of each classifier and better weighting of the misclassified points. In bagging, combining larger trees gives more accurate predictions because bagging has no such trade-off; it is therefore advisable to make each classifier as accurate as possible.

  • PDF
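
The pairing described above is easy to check empirically with scikit-learn: bagging over unpruned trees versus AdaBoost over stumps. The scores depend on the synthetic data and say nothing about the paper's own experiments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging pairs best with large, unpruned trees (no depth limit)...
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=None),
                            n_estimators=100, random_state=0)
# ...while boosting pairs best with stumps (single-split trees).
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=100, random_state=0)

for name, model in [("bagging + unpruned", bagging),
                    ("boosting + stumps", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```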