• Title/Summary/Keyword: Emerging Pattern Mining

Sequential Pattern Mining for Intrusion Detection System with Feature Selection on Big Data

  • Fidalcastro, A;Baburaj, E
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.10 / pp.5023-5038 / 2017
  • Big data is an emerging technology that deals with a wide range of data sets whose sizes are beyond the ability of commonly used software tools to process. When we consider a huge network, we have to process a large amount of generated network information, consisting of both normal and abnormal activity logs, in a large volume of multi-dimensional data. An Intrusion Detection System (IDS) is required to monitor the network and to detect malicious nodes and activities in it. The massive amount of data makes it difficult to detect threats and attacks. Sequential pattern mining, which has become an emerging popular trend because it considers the quantities, profits, and time order of items, may be used to identify patterns of malicious activity. Here we propose a sequential pattern mining algorithm with fuzzy logic feature selection and fuzzy weighted support for huge volumes of network logs, implemented in Apache Hadoop YARN, which solves the problem of speed and time constraints. Fuzzy logic feature selection selects the important features from the feature set. Fuzzy weighted support assigns weights to the inputs and avoids multiple scans. In our simulation, we use the attack log from an NS-2 MANET environment and compare the proposed algorithm with the state-of-the-art sequential pattern mining algorithm SPADE and with a Support Vector Machine, in a Hadoop environment.
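
The abstract builds on the support of a sequential pattern over activity logs. As a minimal sketch, assuming each log is an ordered list of event names, the standard support is the fraction of logs that contain the pattern as an order-preserving subsequence (gaps allowed). The `weighted_support` function is a hypothetical illustration that scales support by the mean of per-item weights; it is not the authors' fuzzy weighted support, and the event names and weights are made up. Nothing here reproduces the Hadoop YARN distribution.

```python
from typing import Dict, List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of pattern occur in sequence in the same order, gaps allowed."""
    remaining = iter(sequence)
    # `in` advances the shared iterator, so each match must come after the previous one.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database) / len(database)


def weighted_support(pattern: Sequence[str], database: List[Sequence[str]],
                     weights: Dict[str, float]) -> float:
    """Hypothetical scheme: support scaled by the mean weight of the pattern's items.

    Items missing from `weights` default to a weight of 1.0.
    """
    mean_weight = sum(weights.get(item, 1.0) for item in pattern) / len(pattern)
    return mean_weight * support(pattern, database)


# Toy logs with hypothetical event names.
logs = [
    ["login_fail", "login_fail", "port_scan", "root_access"],
    ["login_ok", "file_read", "logout"],
    ["port_scan", "login_fail", "root_access"],
    ["login_fail", "port_scan", "file_read"],
]
print(support(["login_fail", "root_access"], logs))  # 0.5
print(weighted_support(["login_fail", "root_access"], logs,
                       {"login_fail": 0.6, "root_access": 1.0}))
```

Checking each pattern with a separate pass over the logs keeps the sketch short; avoiding repeated scans, as the abstract describes, would need a different data layout (for example, the vertical id-lists used by SPADE).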

A Post-analysis of the Association Rule Mining Applied to Internet Shopping Mall

  • Kim, Jae-Kyeong;Song, Hee-Seok
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.06a / pp.253-260 / 2001
  • Understanding and adapting to changes in customer behavior is an important aspect of a company's survival in a continuously changing environment. The aim of this paper is to develop a methodology that automatically detects changes in customer behavior from customer profiles and sales data at different time snapshots. For this purpose, we first define three types of change: emerging patterns, unexpected changes, and added/perished rules. Then we develop similarity and difference measures for rule matching to detect all types of change. Finally, the degree of change is evaluated to detect significantly changed rules. The proposed methodology can evaluate the degree of change as well as automatically detect all kinds of change from data at different time snapshots. A case study for evaluation and the practical business implications of this methodology are also provided.

Multi-parametric Diagnosis Indexes and Emerging Pattern based Classification Technique for Diagnosing Cardiovascular Disease (심혈관계 질환 진단을 위한 복합 진단 지표와 출현 패턴 기반의 분류 기법)

  • Lee, Heon-Gyu;Noh, Ki-Yong;Ryu, Keun-Ho;Jung, Doo-Young
    • The KIPS Transactions:PartD / v.16D no.1 / pp.11-26 / 2009
  • To diagnose cardiovascular disease, we propose an EP-based (emerging-pattern-based) classification technique using multi-parametric diagnosis indexes. We analyzed linear and nonlinear features of HRV for three recumbent postures and extracted four diagnosis indexes from ST segments to build the multi-parametric diagnosis indexes. In this paper, a classification model using essential emerging patterns was applied to diagnose disease. This classification technique discovers the disease patterns of the patient group; these emerging patterns are frequent in patients with cardiovascular disease but infrequent in the normal group. To evaluate the proposed classification algorithm, data from 120 patients with AP (angina pectoris), 13 patients with ACS (acute coronary syndrome), and 128 normal subjects were used. When the multi-parametric indexes were used, the accuracy in classifying the three groups was about 88.3%.
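
The abstract defines emerging patterns as itemsets that are frequent in the patient group but infrequent in the normal group. A common formalization uses the growth rate, the ratio of a pattern's support in the target class to its support in the background class. The sketch below mines EPs by brute force (itemsets of at most two items) and classifies with a simplified CAEP-style aggregate score. It is an illustration under assumptions, not the paper's classifier: it does not select essential EPs, it omits CAEP's score normalization, and the feature names (`hrv_low`, `st_dep`, and so on) and records are hypothetical discretized HRV/ST features.

```python
import math
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets of discretized features) containing every item."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def growth_rate(itemset, target, background):
    """supp_target / supp_background; infinite when the pattern never occurs in background."""
    s_t, s_b = support(itemset, target), support(itemset, background)
    if s_b == 0:
        return math.inf if s_t > 0 else 0.0
    return s_t / s_b


def mine_eps(target, background, min_supp, min_gr, max_len=2):
    """Brute-force search for itemsets of up to max_len items that are EPs of target."""
    items = sorted(set().union(*target))
    eps = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            if support(itemset, target) >= min_supp:
                gr = growth_rate(itemset, target, background)
                if gr >= min_gr:
                    eps[itemset] = gr
    return eps


def ep_score(instance, eps, target):
    """CAEP-style score: sum of GR/(GR+1) * supp over the EPs contained in instance."""
    score = 0.0
    for ep, gr in eps.items():
        if ep <= instance:
            weight = 1.0 if math.isinf(gr) else gr / (gr + 1)
            score += weight * support(ep, target)
    return score


def classify(instance, data_by_class, min_supp=0.25, min_gr=2.0):
    """Assign the class whose emerging patterns give the instance the highest score."""
    scores = {}
    for label, target in data_by_class.items():
        background = [t for other, ts in data_by_class.items() if other != label for t in ts]
        eps = mine_eps(target, background, min_supp, min_gr)
        scores[label] = ep_score(instance, eps, target)
    return max(scores, key=scores.get)


data = {
    "patient": [{"hrv_low", "st_dep", "supine"}, {"hrv_low", "st_dep"},
                {"hrv_low", "st_flat"}, {"st_dep", "supine"}],
    "normal": [{"hrv_high", "st_flat", "supine"}, {"hrv_high", "st_flat"},
               {"hrv_low", "st_flat"}, {"hrv_high", "supine"}],
}
print(classify({"hrv_low", "st_dep"}, data))  # patient
```

The GR/(GR+1) factor makes a pattern that never appears in the other class (infinite growth rate) count with full weight, while weakly discriminating patterns contribute less.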

Media coverage of the conflicts over the 4th Industrial Revolution in the Republic of Korea from 2016 to 2020: a text-mining approach

  • Yang, Jiseong;Kim, Byungjun;Lee, Wonjae
    • Asian Journal of Innovation and Policy / v.11 no.2 / pp.202-221 / 2022
  • The media has depicted an abrupt socio-technological change in the Republic of Korea with the 4th Industrial Revolution. Because technologies cannot realize their potential without social acceptance, studying the conflicts incurred by such a change is imperative. However, little literature has focused on conflicts caused by technologies. Therefore, the current study applied text-mining techniques to investigate media coverage of conflicts related to the 4th Industrial Revolution in the Republic of Korea from 2016 to 2020. We found that the overall amount and pattern of coverage conform to the issue-attention cycle. Also, the three major topics ("SMEs & Startups," "Mobility Conflict," and "Human & Technology") indicate quarrels between conflicting social entities. Moreover, the temporal change in media coverage implies a political rather than technological use of the term. However, we also found deliberative discussion in the media of the socio-technological impact. This study is significant because it expands the discussion of media coverage of technologies to the realm of social conflicts. Furthermore, we explored news articles from the five most recent years with a text-mining approach, which enhanced the objectivity of the research.

Protein Disorder/Order Region Classification Using EPs-TFP Mining Method (EPs-TFP 마이닝 기법을 이용한 단백질 Disorder/Order 지역 분류)

  • Lee, Heon Gyu;Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / v.17 no.6 / pp.59-72 / 2012
  • Since a protein displays its specific functions when a disordered region of its sequence transitions to an ordered region, provoking a biological reaction, separating disorder regions from order regions in sequence data is urgently needed for predicting the three-dimensional structure and characteristics of the protein. To classify disorder and order regions efficiently, this paper proposes a classification/prediction method using sequence data that obtains results unbiased toward specific protein characteristics and improves classification speed. The emerging-pattern-based EPs-TFP method uses only essential emerging patterns, from which redundant emerging patterns have been removed. This classification method finds sequence patterns of disorder regions that appear frequently in disorder regions but relatively infrequently in order regions. We extend the TFP method, which is built on the P-tree and T-tree concepts, into a classification/prediction method to improve the performance of the proposed algorithm. We used Disprot 4.9 and CASP 7 data to evaluate the EPs-TFP technique; the order/disorder classification results show a sensitivity of 73.6, a specificity of 69.51, and an accuracy of 74.2.
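
The abstract says EPs-TFP keeps only essential emerging patterns, with redundant ones removed. One common reading of "essential" is minimality: a pattern passes the support and growth-rate thresholds, and no proper subset of it also passes. The sketch below applies that filter to precomputed candidate statistics. The minimality rule is an assumption about what "essential" means here, the P-tree/T-tree machinery of TFP is not reproduced, and the items `A` to `E` with their numbers are placeholder toy values, not protein data.

```python
from typing import Dict, FrozenSet, Tuple

Pattern = FrozenSet[str]
Stats = Tuple[float, float]  # (support in target class, growth rate)


def essential_eps(candidates: Dict[Pattern, Stats],
                  min_supp: float, min_gr: float) -> Dict[Pattern, Stats]:
    """Keep patterns that pass both thresholds and have no proper subset that also passes."""
    qualified = {p: s for p, s in candidates.items()
                 if s[0] >= min_supp and s[1] >= min_gr}
    # A qualified pattern with a qualified proper subset is treated as redundant.
    return {p: s for p, s in qualified.items()
            if not any(other < p for other in qualified)}


candidates = {
    frozenset({"A"}): (0.40, 4.0),
    frozenset({"A", "B"}): (0.30, 6.0),   # superset of an EP: dropped
    frozenset({"C", "D"}): (0.35, 3.5),   # kept: its subset {"C"} is not an EP
    frozenset({"C"}): (0.50, 1.2),        # growth rate too low
    frozenset({"E"}): (0.05, 10.0),       # support too low
}
kept = essential_eps(candidates, min_supp=0.1, min_gr=2.0)
print(sorted(sorted(p) for p in kept))  # [['A'], ['C', 'D']]
```

Keeping only minimal patterns shrinks the pattern set that the classifier must match against each new sequence window.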

A Post-Analysis of Decision Tree to Detect the Change of Customer Behavior on Internet Shopping Mall

  • Kim, Jae kyeong;Song, Hee-Seok;Kim, Tae-Sung
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.456-463 / 2001
  • Understanding and adapting to changes in customer behavior in an internet shopping mall is an important aspect of surviving in a continuously changing environment. This paper develops a methodology based on decision tree algorithms to automatically detect changes in customer behavior from customer profiles and sales data at different time snapshots. We first define three types of change: emerging patterns, unexpected changes, and added/perished rules. Then, we develop similarity and difference measures for rule matching to detect all types of change. Finally, a degree of change is defined to evaluate the amount of change. A case study of a Korean internet shopping mall is used to evaluate the performance of our methodology, and practical business implications of the methodology are also provided.

Bioinformatics and Genomic Medicine (생명정보학과 유전체의학)

  • Kim, Ju-Han
    • Journal of Preventive Medicine and Public Health / v.35 no.2 / pp.83-91 / 2002
  • Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational sciences. Clinical informatics has long developed methodologies to improve biomedical research and clinical care by integrating experimental and clinical information systems. The informatics revolutions in both bioinformatics and clinical informatics will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high-throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding permanently, in much the same way that biochemistry did a generation ago. The paper describes how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics and proteomics. Basic data preprocessing with normalization, primary pattern analysis, and machine learning algorithms will be presented. The use of integrated biochip informatics technologies, text mining of factual and literature databases, and integrated management of biomolecular databases will be discussed. Each step will be illustrated with real examples of clinical relevance. Issues of linking molecular genotype and clinical phenotype information will also be discussed.

An Emerging Pattern Mining based Classification Method for Automated Prediction of Myocardial Ischemia ECG Signals (심근허혈 심전도 신호의 자동화된 예측을 위한 출현 패턴 마이닝 기반의 분류 방법)

  • Heon Gyu Lee;Ming Hao Park;Keun Ho Ryu
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.19-22 / 2008
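  • Recently, myocardial ischemia diseases such as myocardial infarction and angina pectoris have been increasing rapidly owing to Westernized dietary patterns, smoking, obesity, and similar causes. In this paper, to diagnose ischemic heart disease from electrocardiogram (ECG) signals, we used emerging pattern mining to classify ischemia beats, which are diagnostic signals of myocardial infarction and angina pectoris. In addition, taking into account fast pattern discovery and storage efficiency, we extended the Apriori-T frequent pattern mining algorithm so that it can generate emerging patterns. Experiments applying the proposed classification algorithm to 138 control (normal) and ischemia beat records from the PhysioNet ST-T database showed a prediction accuracy of at least 75% and at most 95%.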
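
The abstract extends the Apriori-T frequent pattern algorithm so that it generates emerging patterns. The sketch below shows the general idea with a plain level-wise (Apriori-style) search: itemsets are grown only while they stay frequent in the target class, and each frequent itemset is reported as an EP when its growth rate against the background class reaches a threshold. It is a simplified stand-in, not the authors' extension: Apriori-T stores candidates in a T-tree, whereas this sketch uses ordinary Python sets, and the beat features (`st_dep`, `t_inv`, `hr_high`, `st_norm`) and records are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets of beat features) that contain the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_eps(target, background, min_supp, min_gr):
    """Level-wise search: grow itemsets frequent in target, report those with GR >= min_gr."""
    items = sorted(set().union(*target))
    level = [frozenset([i]) for i in items if support(frozenset([i]), target) >= min_supp]
    eps, size = {}, 1
    while level:
        for itemset in level:
            s_t, s_b = support(itemset, target), support(itemset, background)
            gr = float("inf") if s_b == 0 else s_t / s_b
            if gr >= min_gr:
                eps[itemset] = gr
        size += 1
        frequent = set(level)
        # Join: union pairs of frequent (size-1)-itemsets that yield size-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: every (size-1)-subset must be frequent (Apriori property),
        # then keep candidates that meet the minimum support in the target class.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, size - 1))
                 and support(c, target) >= min_supp]
    return eps


ischemia = [{"st_dep", "t_inv"}, {"st_dep", "t_inv", "hr_high"},
            {"st_dep"}, {"t_inv", "hr_high"}]
normal = [{"hr_high"}, {"t_inv"}, {"st_norm"}, {"st_norm", "hr_high"}]
eps = apriori_eps(ischemia, normal, min_supp=0.5, min_gr=2.0)
for pattern, gr in sorted(eps.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), gr)
```

Because only target-frequent itemsets are extended, the Apriori property bounds the search, while the growth-rate test against the background class is what turns frequent patterns into emerging ones.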

Power Consumption Patterns Analysis Using Expectation-Maximization Clustering Algorithm and Emerging Pattern Mining (기대치-최대화 군집 알고리즘과 출현 패턴 마이닝을 이용한 전력 소비 패턴 분석)

  • Jin Hyoung Park;Heon Gyu Lee;Jin-Ho Shin;Keun Ho Ryu;Hiseok Kim
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.261-264 / 2008
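  • For efficient operation of power utilities and for competition in the electricity market, customers' power consumption patterns must be analyzed and accurately predicted. To this end, this paper proposes a mining technique that can accurately predict customers' power consumption patterns, applied to nationwide high-voltage customer data collected by an automatic meter reading system. First, nine feature vectors were extracted to accurately distinguish load patterns suited to the characteristics of Korean customers by contract type, and 34 representative load profiles were generated using the expectation-maximization (EM) clustering algorithm. Finally, an emerging-pattern-based classification model was built for each representative profile from the extracted feature vectors and used to classify customers' power consumption patterns. Experiments on data from a total of 3,895 high-voltage customers, measured by a Korean automatic meter reading system, showed a classification accuracy of about 91%.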
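
The first stage of this method clusters customers with the expectation-maximization (EM) algorithm to obtain representative load profiles. The sketch below fits a one-dimensional Gaussian mixture with EM to toy values of a single hypothetical load feature, which is a deliberate simplification: the paper clusters nine-dimensional feature vectors into 34 profiles, which would call for a multivariate mixture (for example, scikit-learn's `GaussianMixture`, which is also fitted with EM). The deterministic quantile initialization and the variance floor are choices made for this sketch.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture to xs with expectation-maximization."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared overall variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(xs) / n
    variances = [sum((x - overall_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means, and variances (with a small floor).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)
    return weights, means, variances


def assign(xs, weights, means, variances):
    """Hard cluster label: the component with the largest weighted density."""
    return [max(range(len(means)),
                key=lambda j: weights[j] * gaussian_pdf(x, means[j], variances[j]))
            for x in xs]


# Toy values of one hypothetical load feature for ten customers.
loads = [1.0, 1.1, 0.9, 1.2, 0.8, 5.0, 5.1, 4.9, 5.2, 4.8]
w, m, v = em_gmm_1d(loads, k=2)
print([round(mu, 2) for mu in m])
print(assign(loads, w, m, v))
```

Each fitted component plays the role of a representative profile; in the paper's pipeline, the resulting cluster labels would then serve as the classes for the emerging-pattern-based classifier.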

The Evaluation for Web Mining and Analytics Service from the View of Personal Information Protection and Privacy (개인정보보호 관점에서의 웹 트래픽 수집 및 분석 서비스에 대한 타당성 연구)

  • Kang, Daniel;Shim, Mi-Na;Bang, Je-Wan;Lee, Sang-Jin;Lim, Jong-In
    • Journal of the Korea Institute of Information Security & Cryptology / v.19 no.6 / pp.121-134 / 2009
  • Consumer-centric marketing is surely one of the most successful emerging businesses, but it poses a threat to personal privacy. Service providers and users have many conflicting interests. Enterprises assert that using anonymized personal data poses no problem, while individuals on their own may be unable to raise the latent problems. Web traffic analysis technology itself does not create issues, but when it is applied to personal data it can raise concerns. The most criticized ethical issue involving web traffic analysis is the invasion of privacy. We therefore need to inspect how much and what kinds of personal information are being used, and whether any personal information is being handled illegally. In this paper, we inspect the operation of consumer-centric marketing tools such as web log analysis solutions and data-gathering services delivered through web browser toolbars. We also use reverse engineering to inspect a Microsoft Internet Explorer-based toolbar application that records and analyzes personal web browsing patterns. Finally, we identify and explore security and privacy requirements for developing more reliable solutions. This study is important for balancing personal privacy protection with the development of the web traffic analysis industry.