• Title/Summary/Keyword: ONNX

Search Results: 4

Performance comparison of ONNX Runtime on embedded device and possibility of new runtime (임베디드 기기에서 ONNX Runtime 성능 비교와 새로운 Runtime 의 가능성)

  • Kim, Sungmin; Bum, Junghyun; Choo, Hyunseung
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.886-888 / 2020
  • ONNX is one of the standards for exchanging artificial neural network models. Researchers who implement neural network models can guarantee compatibility across heterogeneous platforms by distributing their models in ONNX format. In the ONNX standard, the engine that executes an ONNX model on a given platform is called an ONNX Runtime; such a runtime may be a pure software implementation or one combined with various hardware acceleration technologies. This paper compares the performance of three engines registered on the ONNX Backend Scoreboard with that of C-ONNX, newly proposed here, on a workstation and an embedded device, and examines the potential of C-ONNX as a runtime specialized for embedded devices.
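
For context, a minimal sketch of how such a CPU-side latency comparison is commonly measured with the onnxruntime Python package is shown below. This is not the paper's C-ONNX; the model file name and input shape are placeholders for the model under test.

    import time

    import numpy as np
    import onnxruntime as ort

    # Load the model on the plain CPU execution provider; swapping in an
    # accelerated provider is how different engines would be compared.
    sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy NCHW input

    sess.run(None, {input_name: x})  # warm-up run, excluded from timing

    start = time.perf_counter()
    for _ in range(100):
        sess.run(None, {input_name: x})
    mean_s = (time.perf_counter() - start) / 100
    print(f"mean latency: {mean_s * 1000:.2f} ms")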

Model Transformation and Inference of Machine Learning using Open Neural Network Format (오픈신경망 포맷을 이용한 기계학습 모델 변환 및 추론)

  • Kim, Seon-Min; Han, Byunghyun; Heo, Junyeong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.3 / pp.107-114 / 2021
  • Recently, as academic interest in artificial intelligence has grown, the technology has been introduced in many fields and machine learning models have been operated in a variety of frameworks. However, these frameworks use different data formats and lack interoperability; to overcome this, the Open Neural Network Exchange format, ONNX, has been proposed. In this paper we describe how to transform multiple machine learning models to ONNX, and propose algorithms and an inference system that can identify the machine learning technique from the integrated ONNX format. Furthermore, we compare the inference results of the models before and after the ONNX transformation, showing that the transformation causes no loss or performance degradation of the learning results.
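
As an illustration of one such transformation path (not necessarily the authors' pipeline), the sketch below exports a PyTorch torchvision model to ONNX with torch.onnx.export and checks that ONNX Runtime reproduces the original outputs within floating-point tolerance. The model choice and tolerances are assumptions for the example.

    import numpy as np
    import torch
    import torchvision
    import onnxruntime as ort

    # Any trained model works here; resnet18 is only a stand-in
    # (older torchvision versions use pretrained=False instead of weights=None).
    model = torchvision.models.resnet18(weights=None).eval()
    x = torch.randn(1, 3, 224, 224)

    torch.onnx.export(model, x, "resnet18.onnx",
                      input_names=["input"], output_names=["output"])

    with torch.no_grad():
        ref = model(x).numpy()

    sess = ort.InferenceSession("resnet18.onnx", providers=["CPUExecutionProvider"])
    out = sess.run(None, {"input": x.numpy()})[0]

    # Agreement within tolerance is what "no loss after transformation"
    # looks like in practice.
    print("max abs diff:", np.abs(ref - out).max())
    np.testing.assert_allclose(ref, out, rtol=1e-3, atol=1e-5)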

Lightweight of ONNX using Quantization-based Model Compression (양자화 기반의 모델 압축을 이용한 ONNX 경량화)

  • Chang, Duhyeuk; Lee, Jungsoo; Heo, Junyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.1 / pp.93-98 / 2021
  • With the development of deep learning and AI, models have grown in scale and have been integrated into other fields, blending into our lives. However, in resource-constrained environments such as embedded devices, it is difficult to deploy such models, and problems such as power shortages arise. To address this, lightweight methods have been proposed, such as cloud or offloading technologies, reducing the number of parameters in the model, or optimizing computations. In this paper, quantization of trained models is applied to ONNX, the interchange format used across various frameworks; the neural network structure and inference performance are compared against the original models, and various quantization modules are analyzed. Experiments show that the size of the weight parameters is compressed and inference time is improved relative to the original model.
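
For reference, post-training dynamic quantization of an ONNX model can be sketched with onnxruntime's quantization module as below. The file names are placeholders, and this shows only one of the quantization approaches such a study might compare, not the paper's exact setup.

    import os

    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Dynamic quantization: weights are stored as int8, activations are
    # quantized on the fly at inference time; no calibration data needed.
    quantize_dynamic(
        model_input="model_fp32.onnx",
        model_output="model_int8.onnx",
        weight_type=QuantType.QInt8,
    )

    # Weight compression shows up directly in the file size.
    for path in ("model_fp32.onnx", "model_int8.onnx"):
        print(path, round(os.path.getsize(path) / 1e6, 1), "MB")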

Conversion Tools of Spiking Deep Neural Network based on ONNX (ONNX기반 스파이킹 심층 신경망 변환 도구)

  • Park, Sangmin; Heo, Junyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.165-170 / 2020
  • A spiking neural network operates by a different mechanism than a conventional neural network. A conventional network passes each neuron's input through an activation function that does not model any biological mechanism and transfers the output value to the next neuron, and deep structures such as VGGNet, ResNet, SSD, and YOLO have achieved good results with this approach. Spiking neural networks, by contrast, operate much more like the biological mechanism of real neurons, but studies of deep structures built from spiking neurons have not been pursued as actively as deep networks built from conventional neurons. This paper proposes a method of loading a deep neural network model built from conventional neurons into a conversion tool and converting it into a spiking deep neural network by replacing each conventional neuron with a spiking neuron.
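
A hedged sketch of the core substitution idea follows, written against PyTorch modules rather than the paper's ONNX-based tool: each ReLU is replaced by a simple integrate-and-fire unit that accumulates membrane potential over time steps and emits a spike when it crosses a threshold. The threshold value and the soft-reset rule are illustrative assumptions, not the paper's conversion rules.

    import torch
    import torch.nn as nn

    class IntegrateAndFire(nn.Module):
        """Stateful spiking unit: accumulate input, spike past a threshold."""

        def __init__(self, threshold: float = 1.0):
            super().__init__()
            self.threshold = threshold
            self.potential = None  # membrane potential, kept across time steps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.potential is None or self.potential.shape != x.shape:
                self.potential = torch.zeros_like(x)
            self.potential = self.potential + x
            spikes = (self.potential >= self.threshold).float()
            self.potential = self.potential - spikes * self.threshold  # soft reset
            return spikes

    def convert_to_snn(model: nn.Module) -> nn.Module:
        """Recursively swap every ReLU for an integrate-and-fire unit."""
        for name, child in model.named_children():
            if isinstance(child, nn.ReLU):
                setattr(model, name, IntegrateAndFire())
            else:
                convert_to_snn(child)
        return model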