• Title/Summary/Keyword: Learning data set

A study on the standardization strategy for building of learning data set for machine learning applications (기계학습 활용을 위한 학습 데이터세트 구축 표준화 방안에 관한 연구)

  • Choi, JungYul
    • Journal of Digital Convergence / v.16 no.10 / pp.205-212 / 2018
  • With the development of high-performance CPUs and GPUs, artificial intelligence algorithms such as deep neural networks, and large amounts of data, machine learning has been extended to various applications. In particular, the large amount of data collected from the Internet of Things, social network services, web pages, and public data is accelerating the use of machine learning. Learning data sets for machine learning exist in various formats depending on application field and data type, which makes it difficult to process the data effectively and apply them to machine learning. This paper therefore studies a method for building learning data sets for machine learning according to standardized procedures. It first analyzes the requirements for learning data sets by problem type and data type. Based on that analysis, it presents a reference model for building learning data sets for machine learning applications. Finally, it identifies the target standardization organization and a standard development strategy for building learning data sets.

Study on the Improvement of Machine Learning Ability through Data Augmentation (데이터 증강을 통한 기계학습 능력 개선 방법 연구)

  • Kim, Tae-woo;Shin, Kwang-seong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.346-347 / 2021
  • In pattern recognition with machine learning, the larger the amount of learning data, the better the performance. However, it is not always possible to secure a large amount of learning data covering the types and information of the patterns that must be detected in daily life. It is therefore necessary to significantly inflate a small data set for general machine learning. This study examines techniques for augmenting data so that machine learning can be performed. A representative method of performing machine learning with a small data set is transfer learning, in which basic learning is performed on a general-purpose data set and the target data set is then substituted in at the final stage. In this study, a model trained on a general-purpose data set such as ImageNet is used as a feature extractor, together with augmented data, to detect a desired pattern.

Deep Learning for Pet Image Classification (애완동물 분류를 위한 딥러닝)

  • Shin, Kwang-Seong;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.151-152 / 2019
  • In this paper, we propose an improved learning method based on a small data set for animal image classification. First, a CNN training model is created for the small data set, and the data set is used to expand the training set. Second, bottleneck features of the small data set are extracted using a network pre-trained on a large data set, such as VGG16, and stored in two NumPy files as a new training data set and a new test data set. Finally, a fully connected network is trained on the new data set.

Performance Change according to Data Set Size Change in Semi-Supervised Learning based Object Detection (준지도 학습 기반 객체 탐지 모델에서 데이터셋 변화에 따른 성능 변화)

  • Seungsoo Yu;Wonjun Hwang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.11a / pp.88-90 / 2022
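  • Semi-supervised learning is a method in which training proceeds with labels on part of the data and no labels on the rest. Object detection is a computer vision task that locates multiple objects in an image by marking them with multiple bounding boxes. Naturally, the larger the data set used in the model training stage and the more objects it contains, the better the model's performance will generally be. However, depending on the experimental environment, problems can arise, such as being unable to secure an adequate data set or the experimental equipment being unable to handle the data set. This paper therefore examines a semi-supervised learning based object detection model, trains the model while varying the size of the data set, and investigates how performance changes with data set size.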

INCREMENTAL INDUCTIVE LEARNING ALGORITHM IN THE FRAMEWORK OF ROUGH SET THEORY AND ITS APPLICATION

  • Bang, Won-Chul;Bien, Zeung-Nam
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1998.06a / pp.308-313 / 1998
  • In this paper we discuss a type of inductive learning called learning from examples, whose task is to induce general descriptions of concepts from specific instances of those concepts. In many real-life situations, however, new instances can be added to the set of instances. For such cases, we propose, for the first time within the framework of rough set theory, an algorithm that finds a minimal set of rules for decision tables without recalculating over the overall set of instances. The learning method presented here is based on the rough set concept proposed by Pawlak [2][11]. We present an algorithm that finds a minimal set of rules using reduct change theorems, which give criteria for minimum recalculation, along with an illustrative example. Finally, the proposed learning algorithm is applied to a fuzzy system to learn sampled I/O data.

Blended-Transfer Learning for Compressed-Sensing Cardiac CINE MRI

  • Park, Seong Jae;Ahn, Chang-Beom
    • Investigative Magnetic Resonance Imaging / v.25 no.1 / pp.10-22 / 2021
  • Purpose: To overcome the difficulty of building a large, high-quality data set in medical imaging, a concept of 'blended-transfer learning' (BTL), which uses a combination of source data and target data, is proposed for the target task. Materials and Methods: Source and target tasks were defined as training of the source and target networks, respectively, to reconstruct cardiac CINE images from undersampled data. In transfer learning (TL), the entire neural network (NN), or some parts of it, obtained after conducting a source task on an open data set was adopted as the initial network of the target network to improve the learning speed and the performance of the target task. Using BTL, an NN effectively learned the target data while preserving knowledge from the source data to the maximum extent possible. The ratio of the source data to the target data was reduced stepwise from 1 in the initial stage to 0 in the final stage. Results: NNs that performed BTL showed improved performance compared to those that performed TL or standalone learning (SL), and they also generalized better. The learning curve was evaluated using the normalized mean square error (NMSE) of reconstructed images for both target data and source data. BTL reduced the learning time by a factor of 1.25 to 100 and provided better image quality. Its NMSE was 3% to 8% lower than with SL. Conclusion: The NN that performed the proposed BTL showed the best performance in terms of learning speed and learning curve. It also showed the highest reconstructed-image quality, with the lowest NMSE on the test data set. Thus, BTL is an effective way of learning for NNs in the medical-imaging domain, where both the quality and the quantity of data are always limited.
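
The stepwise reduction of source data described in this abstract can be illustrated with a minimal sketch. The abstract does not give the number of stages, the step size, or how batches are mixed, so the equal-step schedule, the interpretation of the ratio as the fraction of each batch drawn from source data, and the function names `source_ratio_schedule`, `batch_composition`, and `nmse` are all assumptions made for illustration. The NMSE normalization shown (squared error divided by the reference energy) is one common definition and may differ from the one used in the paper.

```python
def source_ratio_schedule(num_stages):
    """Return the assumed fraction of source data for each training stage.

    The fraction steps down in equal increments from 1.0 (first stage)
    to 0.0 (last stage). Equal steps are an assumption; the abstract
    only says the ratio is reduced stepwise from 1 to 0.
    """
    if num_stages < 2:
        raise ValueError("need at least two stages to go from 1 to 0")
    return [1 - stage / (num_stages - 1) for stage in range(num_stages)]


def batch_composition(batch_size, source_ratio):
    """Split a batch into (source samples, target samples) for a given ratio."""
    n_source = round(batch_size * source_ratio)
    return n_source, batch_size - n_source


def nmse(reconstructed, reference):
    """Normalized mean square error: sum of squared errors over reference energy."""
    error = sum((r - t) ** 2 for r, t in zip(reconstructed, reference))
    energy = sum(t ** 2 for t in reference)
    return error / energy


if __name__ == "__main__":
    # Hypothetical run with 5 stages and batches of 8 samples.
    for stage, ratio in enumerate(source_ratio_schedule(5)):
        n_source, n_target = batch_composition(8, ratio)
        print(f"stage {stage}: {n_source} source + {n_target} target samples")
```

In this sketch the first stage trains on source data only and the last stage on target data only, which matches the 1-to-0 progression in the abstract under the stated interpretation.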

Leveraging Big Data for Spark Deep Learning to Predict Rating

  • Mishra, Monika;Kang, Mingoo;Woo, Jongwook
    • Journal of Internet Computing and Services / v.21 no.6 / pp.33-39 / 2020
  • This paper builds recommendation systems that leverage deep learning and the Big Data platform Spark to predict item ratings on the Amazon e-commerce site. Recommendation systems in e-commerce have become extremely popular in recent years and are important to both customers and sellers: they provide users with the products and services they are interested in. Recommendation systems need users' previous shopping activities and digital footprints to make the best recommendation for the next item to purchase. We developed the recommendation models on Amazon AWS cloud services to predict users' ratings of items from the massive data set of Amazon customer reviews. We also present a Big Data architecture that accommodates the large-scale data set for storage and computation. We adopted deep learning because it is known to achieve higher accuracy on massive data sets. Finally, we compare the accuracy and performance of the deep learning architecture with Spark ML against the traditional Big Data architecture using Spark ML alone.

Finding Unexpected Test Accuracy by Cross Validation in Machine Learning

  • Yoon, Hoijin
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.549-555 / 2021
  • Machine learning (ML) splits data into three parts, usually 60% for training, 20% for validation, and 20% for testing. The split is purely quantitative rather than selecting each set by a criterion, which is an important concept for the adequacy of test data. ML measures a model's accuracy on the validation data and revises the model until the validation accuracy reaches a certain level. After validation, the completed model is tested on the test data, which the model has not yet seen. If the test data covers the model's attributes well, the test accuracy will be close to the model's validation accuracy. To check whether ML's test data works adequately, we design an experiment to see whether a model's test accuracy is always close to its validation accuracy, as expected. The experiment builds 100 different SVM models for each of six data sets published in the UCI ML repository. Among the test and validation accuracies of the 600 cases, we find some unexpected cases in which the test accuracy differs greatly from the validation accuracy. Consequently, it is not always true that ML's test data is adequate to assure a model's quality.
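
The 60/20/20 split and the validation-versus-test comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's experimental code: the function names `split_60_20_20` and `unexpected_cases`, the fixed random seed, and the gap threshold of 0.1 are assumptions, since the abstract does not say how large a difference counts as unexpected.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle items and split them 60/20/20 into train, validation, test.

    The split is purely quantitative, as in the abstract: no criterion
    other than size decides which sample lands in which set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 60 // 100
    n_val = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def unexpected_cases(results, threshold=0.1):
    """Return indices of runs whose test accuracy is far from validation accuracy.

    `results` is a list of (validation_accuracy, test_accuracy) pairs.
    The threshold is a hypothetical cutoff chosen for illustration.
    """
    return [
        index
        for index, (val_acc, test_acc) in enumerate(results)
        if abs(val_acc - test_acc) > threshold
    ]


if __name__ == "__main__":
    train, val, test = split_60_20_20(range(100))
    print(len(train), len(val), len(test))
    # Made-up accuracies for two runs, only to show the comparison.
    print(unexpected_cases([(0.90, 0.88), (0.90, 0.60)]))
```

Repeating the split with different seeds gives different train, validation, and test sets from the same data, which is one simple way to produce many models per data set for this kind of comparison.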

A Model of Strawberry Pest Recognition using Artificial Intelligence Learning

  • Guangzhi Zhao
    • International Journal of Internet, Broadcasting and Communication / v.15 no.2 / pp.133-143 / 2023
  • In this study, we propose a big data set of strawberry pests, collected directly for training a diagnosis model, and an automatic pest diagnosis model architecture based on deep learning. First, a big data set of strawberry pests, which did not previously exist, was collected directly from the web. More than 12,000 images were collected and classified, and these data were used to train a deep learning model. Second, the deep-learning-based automatic pest diagnosis module classifies which pest or disease a user-supplied picture corresponds to. In particular, we propose a model architecture, based on a convolutional neural network, that can optimally classify pests. With it, farmers can easily identify diseases and pests without professional knowledge and respond quickly.

A Study on Training Data Selection Method for EEG Emotion Analysis using Semi-supervised Learning Algorithm (준 지도학습 알고리즘을 이용한 뇌파 감정 분석을 위한 학습데이터 선택 방법에 관한 연구)

  • Yun, Jong-Seob;Kim, Jin Heon
    • Journal of IKEEE / v.22 no.3 / pp.816-821 / 2018
  • Recently, machine learning algorithms based on artificial neural networks have come into wide use as classifiers in EEG research for emotion analysis and disease diagnosis. When a machine learning model is used to classify EEG data, if the training data consists only of data with similar characteristics, classification performance may deteriorate when the model is applied to data from another group. In this paper, we propose a method that addresses this problem by constructing the training data set from several groups of data selected using a semi-supervised learning algorithm. We then compare two models: one trained on a training data set consisting of data with similar characteristics, and one trained on the training data set constructed with the proposed method.