• Title/Summary/Keyword: data minning

Development of Data Mining Tool for the Utilization of Shipbuilding Knowledge based on Genetic Programming (조선기술지식 활용을 위한 유전적 프로그래밍 기반의 데이터 마이닝 도구개발)

  • Lee Kyung-Ho;Oh June;Park Jong-Hyun;Park Jong-Hoon
    • Proceedings of the Computational Structural Engineering Institute Conference / 2006.04a / pp.185-191 / 2006
  • With the development of information technology, companies increasingly stress the need for knowledge management, and many build ERP systems that include it. However, knowledge within an organization is not easy to formalize, and companies have found that building information systems helps knowledge management. This paper focuses on engineering knowledge. Because engineering data embodies experts' experience and know-how, it is a treasure house of knowledge. Korean shipyards lead the world shipbuilding industry and have accumulated a large store of knowledge and data, but they lack data mining tools to make use of it. This paper describes the development of data mining tools, based on genetic programming (GP), for utilizing shipbuilding knowledge.

Analysis of Success Factors for Mobile Commerce using Text Mining and PLS Regression

  • Kim, Yong-Hwan;Kim, Ja-Hee;Park, Ji hoon;Lee, Seung-Jun
    • Journal of the Korea Society of Computer and Information / v.21 no.11 / pp.127-134 / 2016
  • In this paper, we propose factors that influence mobile commerce satisfaction, identified through data mining and PLS regression analysis. We extracted the most frequent words from mobile application reviews, which contain a large number of user requests, and employed content analysis to condense the large volume of text. We then conducted a survey using the categories into which the data were condensed, treating them as factors that influence mobile commerce satisfaction. To avoid multicollinearity, we employed PLS regression analysis instead of multiple regression analysis. Because the discovered factors, which are potentially related to customer satisfaction, come from customers' direct requests, the results may serve as an appropriate indicator for the mobile commerce market to improve its services.

100 Article Paper Text Minning Data Analysis and Visualization in Web Environment (웹 환경에서 100 논문에 대한 텍스트 마이닝, 데이터 분석과 시각화)

  • Li, Xiaomeng;Li, Jiapei;Lee, HyunChang;Shin, SeongYoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.157-158 / 2017
  • Big data from articles can be analyzed and text-mined using the Python language. Python is a programming language that is easy to use. We research and use Python to create a Web environment in which the results of the analysis are shown directly in the browser. This paper uses 100 articles from Altmetric, which tracks a range of sources. Such big data needs to be collected and analyzed with an effective method. Once the results are obtained, the Python wordcloud package is used to produce an image that directly shows the most frequent words.

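The word-frequency step that the abstract above describes can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function name `top_words`, the stop-word list, and the sample texts are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def top_words(texts, n=10):
    """Return the n most frequent non-stop-words across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of ASCII letters as tokens.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample texts, not data from the paper.
    sample = [
        "Text mining of article data",
        "Visualization of text data in the browser",
    ]
    print(top_words(sample, 3))
```

A list of (word, count) pairs like this is the kind of frequency table that a word-cloud renderer would typically take as input.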

Analysis of Economic Development Based on Environment Resources in the Mining Sector

  • NAZIR, Munawir;MURDIFIN, Imaduddin;PUTRA, Aditya Halim Perdana Kusuma;HAMZAH, Nasir;MURFAT, Moch Zulkifli
    • The Journal of Asian Finance, Economics and Business / v.7 no.6 / pp.133-143 / 2020
  • The purpose of this study is to investigate the regional economic potential of the mining sector in North Morowali, Central Sulawesi, Indonesia, and to formulate pro-business regional development management that creates synergy between the local government and mining sector entrepreneurs. The study uses a descriptive qualitative approach, drawing on primary data from focus group discussions (FGD) and secondary data from the statistical bureau in North Morowali. It uses SWOT analysis to determine the economic potential of North Morowali and the Location Quotient (LQ) to analyze the economic potential of the mining sector. The research period covers one year (2018-2019). All the mining products have considerable potential as a source of financing in North Morowali, but this mining potential has not been fully exploited. The absence of regulations, of facilities such as road access, and of optimal land and sea transportation makes it difficult to optimize and access mining products comprehensively. As a new province in Central Sulawesi, North Morowali needs more effort and a stronger government role to focus attention on it as an area with great potential in the mining sector.
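For readers unfamiliar with the Location Quotient mentioned above, the standard definition is the sector's share of the regional economy divided by the same sector's share of a reference economy; values above 1 are usually read as regional specialization. The sketch below assumes that standard definition; the abstract does not state which variant the study used, and the figures are made up.

```python
def location_quotient(region_sector, region_total, ref_sector, ref_total):
    """LQ = (region_sector / region_total) / (ref_sector / ref_total)."""
    if min(region_total, ref_sector, ref_total) <= 0:
        raise ValueError("totals and the reference sector value must be positive")
    return (region_sector / region_total) / (ref_sector / ref_total)


if __name__ == "__main__":
    # Made-up figures for illustration only; not taken from the study.
    lq = location_quotient(
        region_sector=30.0, region_total=100.0,
        ref_sector=150.0, ref_total=1000.0,
    )
    print(f"LQ = {lq:.2f}")
```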

The Difference Analysis between Maturity Stages of Venture Firms by Classification Techniques of Big Data (빅데이터 분류 기법에 따른 벤처 기업의 성장 단계별 차이 분석)

  • Jung, Byoungho
    • Journal of Korea Society of Digital Industry and Information Management / v.15 no.4 / pp.197-212 / 2019
  • The purpose of this study is to identify the maturity stages of venture firms through classification analysis, which is widely used as a big data technique. Venture companies need to develop a competitive advantage in the market, and a company's maturity can be classified into five stages. I analyze the difference in the growth stage of venture firms between the survey responses and the statistical classification methods. Firm growth was distinguished into five stages, ranging from start-up to decline. Popular big data classification methods include k-means cluster analysis, hierarchical cluster analysis, artificial neural networks, and decision tree analysis. The variables used were asset increase, capital increase, sales increase, operating profit increase, R&D investment increase, operation period, and number of retirements. The results show that the sample sizes of the groups differed greatly across the big data analysis techniques. In particular, the decision tree and neural network methods produced three groups rather than five. The group sizes differed across all of the classification methods. Furthermore, the results may differ depending on the variables selected and the sample size. Each classified group also showed a number of competitive differences. The research implication is that analysts need to interpret statistics through management theory in order to interpret big data classification results correctly. In addition, the choice of classification analysis should be made by considering not only management theory but also practical experience. Finally, the growth of venture firms needs to be examined by time-series analysis and monitored closely for individual firms. Future research will need to include significant variables of a company's maturity stages.

Utilization of machining templates to improve 5-axis CAM machining process (5축 CAM 가공 작업 프로세스 개선을 위한 가공 템플릿 활용)

  • Lee, Dong-Cheon;Kim, Seon-Yong
    • Design & Manufacturing / v.11 no.1 / pp.45-49 / 2017
  • Many efforts are currently being made to increase manufacturing efficiency, and there is growing interest in implementing machining operations through CAM automation and optimization. This trend is gradually appearing in 5X milling as well as 3X milling tasks. In 5X milling, however, it is difficult to hire CAM experts with 5X machining experience, and employing them is burdensome because of the high cost. For this reason, there are manufacturers that look to CAM software to provide an NC automation program with which beginners can easily generate 5X milling operations in a short time and with which the existing 5X milling process can be improved. These requirements call for an NC automation process that includes practical machining strategies equivalent to those an NC expert would generate. To support this, the 3D part should be machined directly from an NC template, provided by CimatronE, that includes the machining procedures, a standard cutter library, automatic machining area selection, part shape analysis tools, and machining condition settings that take material stiffness into account, so that 5-axis machining data can be created with minimal operation. CimatronE's user-friendly NC machining automation tools improve and speed up the 5-axis machining process, maximizing work efficiency and improving productivity compared with existing machining tasks.

An Empirical Study of Profiling Model for the SMEs with High Demand for Standards Using Data Mining (데이터마이닝을 이용한 표준정책 수요 중소기업의 프로파일링 연구: R&D 동기와 사업화 지원 정책을 중심으로)

  • Jun, Seung-pyo;Jung, JaeOong;Choi, San
    • Journal of Korea Technology Innovation Society / v.19 no.3 / pp.511-544 / 2016
  • Standards boost technological innovation by promoting information sharing, compatibility, stability and quality. Identifying groups of companies that particularly benefit from these functions of standards in their technological innovation and commercialization helps to customize the planning and implementation of standards-related policies for demand groups. For this purpose, this study profiles SMEs whose R&D objective is to respond to standards, as well as those that need to implement a standards system for technological commercialization. It then suggests a prediction model that can distinguish such companies from others. To this end, decision tree analysis is conducted to profile the characteristics of the subject SMEs through data mining. The subject SMEs include (1) those that engage in R&D to respond to standards (Group 1) and (2) those in need of product standard or technological certification policies for commercialization purposes (Group 2). The study then proposes a prediction model, based on discriminant analysis, that can distinguish Groups 1 and 2 from others using several variables. The practicality of the discriminant formula is statistically verified. The study suggests that Group 1 companies are distinguished by variables such as time spent on R&D planning, Korean Standard Industry Classification (KSIC) category, number of employees and novelty of technologies. The profiling result for Group 2 companies suggests that they are differentiated by variables such as KSIC category, major clients of the companies, time spent on R&D and ability to test and verify their technologies. The prediction model proposed herein is designed based on the outcomes of the profiling and discriminant analysis. Its purpose is to serve the planning and implementation of standards-related policies by providing objective information on companies in need of relevant support, and thereby to enhance the overall success rate of standards-related projects.

A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.77-92 / 2014
  • Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, with manual categorization, not only can the accuracy of the categorization not be guaranteed, but the categorization also requires a large amount of time and huge costs. Many studies have been conducted towards the automatic creation of categories to solve the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because they assume that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the requirement of a multi-categorized training set imposed by traditional multi-categorization algorithms, we propose a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing relationships among categories, topics, and documents. First, we attempt to find the relationship between documents and topics by using the result of topic analysis for single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate the matching scores for each document to multiple categories. The results imply that a document can be classified into a certain category if and only if the matching score is higher than the predefined threshold. For example, we can classify a certain document into three categories that have larger matching scores than the predefined threshold. The main contribution of our study is that our methodology can improve the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed intensive experiments with news articles. News articles are clearly categorized by theme, and they contain less vulgar language and slang than other typical text documents. We collected news articles from July 2012 to June 2013. The number of articles varies widely across categories. This is because readers have different levels of interest in each category. Additionally, the result is also attributed to the differences in the frequency of the events in each category. In order to minimize the distortion of the result from the number of articles in different categories, we extracted 3,000 articles equally from each of the eight categories. Therefore, the total number of articles used in our experiments was 24,000. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." By using the news articles that we collected, we calculated the document/category correspondence scores by utilizing topic/category and document/topic correspondence scores. The document/category correspondence score can be said to indicate the degree of correspondence of each document to a certain category. As a result, we could present two additional categories for each of the 23,089 documents. Precision, recall, and F-score were 0.605, 0.629, and 0.617 respectively when only the top 1 predicted category was evaluated, whereas they were 0.838, 0.290, and 0.431 when the top 1-3 predicted categories were considered. It was very interesting to find a large variation between the scores of the eight categories on precision, recall, and F-score.
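
The scoring and evaluation described in this abstract can be illustrated with a small sketch. Two caveats apply: the abstract does not give the formula for combining document/topic and topic/category scores, so the weighted sum over topics used in `doc_category_scores` is an assumption, and the function names and toy weights are hypothetical. The `f_score` function is the usual harmonic mean of precision and recall, which reproduces the reported F-scores from the reported precision and recall.

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def doc_category_scores(doc_topic, topic_category):
    """Combine document/topic and topic/category weights into document/category scores.

    Assumption: scores are combined as a weighted sum over topics
    (a matrix product); the abstract does not specify the exact formula.
    """
    n_categories = len(topic_category[0])
    return [
        [sum(doc[t] * topic_category[t][c] for t in range(len(doc)))
         for c in range(n_categories)]
        for doc in doc_topic
    ]


def assign_categories(scores, threshold):
    """Return, per document, the indices of categories whose score exceeds the threshold."""
    return [[c for c, s in enumerate(row) if s > threshold] for row in scores]


if __name__ == "__main__":
    # Precision/recall pairs reported in the abstract.
    print(round(f_score(0.605, 0.629), 3))  # 0.617
    print(round(f_score(0.838, 0.290), 3))  # 0.431
    # Hypothetical toy weights: 1 document, 2 topics, 3 categories.
    scores = doc_category_scores([[0.7, 0.3]], [[0.9, 0.1, 0.0], [0.2, 0.0, 0.8]])
    print(assign_categories(scores, threshold=0.2))
```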