• Title/Summary/Keyword: Rule Discovery

Search Result 92, Processing Time 0.027 seconds

Subgroup Discovery Method with Internal Disjunctive Expression

  • Kim, Seyoung;Ryu, Kwang Ryel
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.1
    • /
    • pp.23-32
    • /
    • 2017
  • We can obtain useful knowledge from data by using a subgroup discovery algorithm. Subgroup discovery is a rule model learning method that finds data subgroups containing specific information from data and expresses them in a rule form. Subgroups are meaningful as they account for a high percentage of total data and tend to differ significantly from the overall data. Subgroup is expressed with conjunction of only literals previously. So, the scope of the rules that can be derived from the learning process is limited. In this paper, we propose a method to increase expressiveness of rules through internal disjunctive representation of attribute values. Also, we analyze the characteristics of existing subgroup discovery algorithms and propose an improved algorithm that complements their defects and takes advantage of them. Experiments are conducted with the traffic accident data given from Busan metropolitan city. The results shows that performance of the proposed method is better than that of existing methods. Rule set learned by proposed method has interesting and general rules more.

Development of a Knowledge Discovery System using Hierarchical Self-Organizing Map and Fuzzy Rule Generation

  • Koo, Taehoon;Rhee, Jongtae
    • Proceedings of the Korea Inteligent Information System Society Conference
    • /
    • 2001.01a
    • /
    • pp.431-434
    • /
    • 2001
  • Knowledge discovery in databases(KDD) is the process for extracting valid, novel, potentially useful and understandable knowledge form real data. There are many academic and industrial activities with new technologies and application areas. Particularly, data mining is the core step in the KDD process, consisting of many algorithms to perform clustering, pattern recognition and rule induction functions. The main goal of these algorithms is prediction and description. Prediction means the assessment of unknown variables. Description is concerned with providing understandable results in a compatible format to human users. We introduce an efficient data mining algorithm considering predictive and descriptive capability. Reasonable pattern is derived from real world data by a revised neural network model and a proposed fuzzy rule extraction technique is applied to obtain understandable knowledge. The proposed neural network model is a hierarchical self-organizing system. The rule base is compatible to decision makers perception because the generated fuzzy rule set reflects the human information process. Results from real world application are analyzed to evaluate the system\`s performance.

  • PDF

An Active Candidate Set Management Model on Association Rule Discovery using Database Trigger and Incremental Update Technique (트리거와 점진적 갱신기법을 이용한 연관규칙 탐사의 능동적 후보항목 관리 모델)

  • Hwang, Jeong-Hui;Sin, Ye-Ho;Ryu, Geun-Ho
    • Journal of KIISE:Databases
    • /
    • v.29 no.1
    • /
    • pp.1-14
    • /
    • 2002
  • Association rule discovery is a method of mining for the associated item set on large databases based on support and confidence threshold. The discovered association rules can be applied to the marketing pattern analysis in E-commerce, large shopping mall and so on. The association rule discovery makes multiple scan over the database storing large transaction data, thus, the algorithm requiring very high overhead might not be useful in real-time association rule discovery in dynamic environment. Therefore this paper proposes an active candidate set management model based on trigger and incremental update mechanism to overcome non-realtime limitation of association rule discovery. In order to implement the proposed model, we not only describe an implementation model for incremental updating operation, but also evaluate the performance characteristics of this model through the experiment.

Rule Discovery and Matching for Forecasting Stock Prices (주가 예측을 위한 규칙 탐사 및 매칭)

  • Ha, You-Min;Kim, Sang-Wook;Won, Jung-Im;Park, Sang-Hyun;Yoon, Jee-Hee
    • Journal of KIISE:Databases
    • /
    • v.34 no.3
    • /
    • pp.179-192
    • /
    • 2007
  • This paper addresses an approach that recommends investment types for stock investors by discovering useful rules from past changing patterns of stock prices in databases. First, we define a new rule model for recommending stock investment types. For a frequent pattern of stock prices, if its subsequent stock prices are matched to a condition of an investor, the model recommends a corresponding investment type for this stock. The frequent pattern is regarded as a rule head, and the subsequent part a rule body. We observed that the conditions on rule bodies are quite different depending on dispositions of investors while rule heads are independent of characteristics of investors in most cases. With this observation, we propose a new method that discovers and stores only the rule heads rather than the whole rules in a rule discovery process. This allows investors to define various conditions on rule bodies flexibly, and also improves the performance of a rule discovery process by reducing the number of rules. For efficient discovery and matching of rules, we propose methods for discovering frequent patterns, constructing a frequent pattern base, and indexing them. We also suggest a method that finds the rules matched to a query issued by an investor from a frequent pattern base, and a method that recommends an investment type using the rules. Finally, we verify the superiority of our approach via various experiments using real-life stock data.

Weighted Association Rule Discovery for Item Groups with Different Properties (상이한 특성을 갖는 아이템 그룹에 대한 가중 연관 규칙 탐사)

  • 김정자;정희택
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.6
    • /
    • pp.1284-1290
    • /
    • 2004
  • In market-basket analysis, weighted association rule(WAR) discovery can mine the rules which include more beneficial information by reflecting item importance for special products. However, when items are divided into more than one group and item importance for each group must be measured by different measurement or separately, we cannot directly apply traditional weighted association rule discovery. To solve this problem, we propose a novel methodology to discovery the weighted association rule in this paper In this methodology, the items should be first divided into sub-groups according to the properties of the items, and the item importance is defined or calculated only with the items enclosed to the sub-group. Our algorithm makes qualitative evaluation for network risk assessment possible by generating risk rule set for risk factor using network sorority data, and quantitative evaluation possible by calculating risk value using statistical factors such as weight applied in rule generation. And, It can be widely used for new model of more delicate analysis in market-basket database in which the data items are distinctly separated.

Characterization in Terms of the NUX Rule of G-inserted Mutant Hammerhead Ribozymes with High Level of Catalytic Power

  • Kuwabara, Tomoko;Warashina, Masaki;Kato, Yoshio;Kawasaki, Hiroaki;Taira, Kazunari
    • BMB Reports
    • /
    • v.34 no.1
    • /
    • pp.51-58
    • /
    • 2001
  • Attempts using in vitro and in vivo selection procedures have been made to search for hammerhead ribozymes that have higher activities than the wild-type ribozyme and also to determine whether other sequences might be possible in the catalytic core of the hammerhead ribozyme. Active sequences selected in the past conformed broadly to the consensus core sequence except at A9, and no sequences were associated with higher activity than that of the hammerhead with the consensus core, an indication that the consensus sequence derived from viruses and virusoids is probably the optimal sequence [Vaish et al. (1997) Biochemistry 36, 6495-6501]. Recently, during construction of ribozyme expression vectors, we isolated a mutant hammerhead ribozyme, with an insertion of G between A9 and G10.1, that appeared to show significant activity [Kawasaki et al. (1996) Nucleic Acids Res. 24, 3010-3016; Kawasaki et al. (1998) Nature 393, 284-289]. We, therefore, characterized kinetic properties of the G-inserted mutant ribozymes in terms of the NUX rule. We demonstrate that the NUX rule is basically applicable to the G-inserted ribozymes and, more importantly, one type of G-inserted ribozyme was very active with $k_{cat}$, value of $6.4\;min^{-1}$ in 50 mM Tris-HCl (pH 8.0) and 10 mM $MgCl_2$ at $37^{\circ}C$.

  • PDF

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.1-16
    • /
    • 2015
  • Traffic accident is one of the major cause of death worldwide for the last several decades. According to the statistics of world health organization, approximately 1.24 million deaths occurred on the world's roads in 2010. In order to reduce future traffic accident, multipronged approaches have been adopted including traffic regulations, injury-reducing technologies, driving training program and so on. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze relationship between traffic accident and related factors including vehicle design, road design, weather, driver behavior etc. Insight derived from these analysis can be used for accident prevention approaches. Traffic accident data mining is an activity to find useful knowledges about such relationship that is not well-known and user may interested in it. Many studies about mining accident data have been reported over the past two decades. Most of studies mainly focused on predict risk of accident using accident related factors. Supervised learning methods like decision tree, logistic regression, k-nearest neighbor, neural network are used for these prediction. However, derived prediction model from these algorithms are too complex to understand for human itself because the main purpose of these algorithms are prediction, not explanation of the data. Some of studies use unsupervised clustering algorithm to dividing the data into several groups, but derived group itself is still not easy to understand for human, so it is necessary to do some additional analytic works. Rule based learning methods are adequate when we want to derive comprehensive form of knowledge about the target domain. It derives a set of if-then rules that represent relationship between the target feature with other features. Rules are fairly easy for human to understand its meaning therefore it can help provide insight and comprehensible results for human. Association rule learning methods and subgroup discovery methods are representing rule based learning methods for descriptive task. These two algorithms have been used in a wide range of area from transaction analysis, accident data analysis, detection of statistically significant patient risk groups, discovering key person in social communities and so on. We use both the association rule learning method and the subgroup discovery method to discover useful patterns from a traffic accident dataset consisting of many features including profile of driver, location of accident, types of accident, information of vehicle, violation of regulation and so on. The association rule learning method, which is one of the unsupervised learning methods, searches for frequent item sets from the data and translates them into rules. In contrast, the subgroup discovery method is a kind of supervised learning method that discovers rules of user specified concepts satisfying certain degree of generality and unusualness. Depending on what aspect of the data we are focusing our attention to, we may combine different multiple relevant features of interest to make a synthetic target feature, and give it to the rule learning algorithms. After a set of rules is derived, some postprocessing steps are taken to make the ruleset more compact and easier to understand by removing some uninteresting or redundant rules. We conducted a set of experiments of mining our traffic accident data in both unsupervised mode and supervised mode for comparison of these rule based learning algorithms. Experiments with the traffic accident data reveals that the association rule learning, in its pure unsupervised mode, can discover some hidden relationship among the features. Under supervised learning setting with combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method that requires a lot of efforts to tune the parameters.

The HCARD Model using an Agent for Knowledge Discovery

  • Gerardo Bobby D.;Lee Jae-Wan;Joo Su-Chong
    • The Journal of Information Systems
    • /
    • v.14 no.3
    • /
    • pp.53-58
    • /
    • 2005
  • In this study, we will employ a multi-agent for the search and extraction of data in a distributed environment. We will use an Integrator Agent in the proposed model on the Hierarchical Clustering and Association Rule Discovery(HCARD). The HCARD will address the inadequacy of other data mining tools in processing performance and efficiency when use for knowledge discovery. The Integrator Agent was developed based on CORBA architecture for search and extraction of data from heterogeneous servers in the distributed environment. Our experiment shows that the HCARD generated essential association rules which can be practically explained for decision making purposes. Shorter processing time had been noted in computing for clusters using the HCARD and implying ideal processing period than computing the rules without HCARD.

  • PDF

Efficient Storage Structures for a Stock Investment Recommendation System (주식 투자 추천 시스템을 위한 효율적인 저장 구조)

  • Ha, You-Min;Kim, Sang-Wook;Park, Sang-Hyun;Lim, Seung-Hwan
    • The KIPS Transactions:PartD
    • /
    • v.16D no.2
    • /
    • pp.169-176
    • /
    • 2009
  • Rule discovery is an operation that discovers patterns frequently occurring in a given database. Rule discovery makes it possible to find useful rules from a stock database, thereby recommending buying or selling times to stock investors. In this paper, we discuss storage structures for efficient processing of queries in a system that recommends stock investments. First, we propose five storage structures for efficient recommending of stock investments. Next, we discuss their characteristics, advantages, and disadvantages. Then, we verify their performances by extensive experiments with real-life stock data. The results show that the histogram-based structure improves the query performance of the previous one up to about 170 times.

An Active Candidate Set Management Model for Realtime Association Rule Discovery (실시간 연관규칙 탐사를 위한 능동적 후보항목 관리 모델)

  • Sin, Ye-Ho;Ryu, Geun-Ho
    • The KIPS Transactions:PartD
    • /
    • v.9D no.2
    • /
    • pp.215-226
    • /
    • 2002
  • Considering the rapid process of media's breakthrough and diverse patterns of consumptions's analysis, a uniform analysis might be much rooms to be desired for interpretation of new phenomena. In special, the products happening intensive sails on around an anniversary or fresh food have the restricted marketing hours. Moreover, traditional association rule discovery algorithms might not be appropriate for analysis of sales pattern given in a specific time because existing approaches require iterative scan operation to find association rule in large scale transaction databases. in this paper, we propose an incremental candidate set management model based on twin-hashing technique to find association rule in special sales pattern using database trigger and stored procedure. We also prove performance of the proposed model through implementation and experiment.