• Title/Summary/Keyword: floating-point arithmetic

Search Results: 66

Analysis of Some Strange Behaviors of Floating Point Arithmetic using MATLAB Programs (MATLAB을 이용한 부동소수점 연산의 특이사항 분석)

  • Chung, Tae-Sang
    • The Transactions of The Korean Institute of Electrical Engineers / v.56 no.2 / pp.428-431 / 2007
  • A floating-point number system is used to represent a wide range of real numbers using a finite number of bits. The standard the IEEE adopted in 1985 divides the range of real numbers into intervals of [$2^E,2^{E+1}$), where E is an integer represented with finitely many bits, and defines an equal count of equally spaced discrete numbers in each interval. Since the numbers are defined discretely, not only does the number representation itself include errors, but floating-point arithmetic also exhibits some strange behaviors that do not agree with real-world arithmetic. In this paper, errors in floating-point number representation, errors in arithmetic operations, and errors due to the order of arithmetic operations are analyzed theoretically and verified with the results of MATLAB program executions.
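
To make the abstract's points concrete, the short Python sketch below (an illustration only; the paper itself uses MATLAB) shows the spacing within a binade, a representation error, and an order-of-operations effect.

```python
# Illustrative sketch: IEEE 754 doubles are equally spaced within each binade
# [2^E, 2^(E+1)), so representation and ordering effects can be observed directly.
import math

# Spacing (ulp) doubles from one binade to the next.
print(math.ulp(1.0))      # 2.220446049250313e-16
print(math.ulp(2.0))      # 4.440892098500626e-16

# Representation error: 0.1 and 0.2 are not exactly representable.
print(0.1 + 0.2 == 0.3)   # False

# Order of operations matters: floating-point addition is not associative.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)        # 1.0
print(a + (b + c))        # 0.0, because 1.0 is below the ulp of 1e16
```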

Algebraic Accuracy Verification for Division-by-Convergence based 24-bit Floating-point Divider Complying with OpenGL (Division-by-Convergence 방식을 사용하는 24-비트 부동소수점 제산기에 대한 OpenGL 정확도의 대수적 검증)

  • Yoo, Sehoon;Lee, Jungwoo;Kim, Kichul
    • Journal of IKEEE / v.17 no.3 / pp.346-351 / 2013
  • Low cost and low power are important requirements in mobile systems. Thus, when a floating-point arithmetic unit is needed, a 24-bit floating-point format can be more useful than a 32-bit floating-point format. However, a 24-bit floating-point arithmetic unit can be risky because it usually has lower accuracy than a 32-bit floating-point arithmetic unit. Consecutive floating-point operations are performed in 3D graphics processors, so verification of floating-point operation accuracy is important. Among 3D graphics arithmetic operations, floating-point division is one of the most difficult operations for which to satisfy the accuracy of $10^{-5}$ required in OpenGL ES 3.0. No 24-bit floating-point divider whose accuracy is algebraically verified has been reported. In this paper, a 24-bit floating-point divider is analyzed, and it is algebraically verified that its accuracy satisfies the OpenGL requirement.
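
Division by convergence is commonly realized with Newton-Raphson or Goldschmidt iteration; the paper's hardware is not reproduced here, so the sketch below is a generic Newton-Raphson reciprocal iteration in Python, with an assumed initial approximation and iteration count.

```python
# Hedged sketch of division by convergence via Newton-Raphson reciprocal
# iteration; the seed and iteration count are illustrative assumptions.
import math

def newton_divide(a: float, b: float, iterations: int = 4) -> float:
    """Approximate a / b by refining x toward 1/b, then multiplying by a (b > 0)."""
    m, e = math.frexp(b)                    # b = m * 2**e with m in [0.5, 1)
    x = 48.0 / 17.0 - 32.0 / 17.0 * m       # classic linear seed for 1/m on [0.5, 1)
    for _ in range(iterations):
        x = x * (2.0 - m * x)               # Newton step: relative error roughly squares
    return a * x * 2.0 ** (-e)              # undo the scaling: 1/b = (1/m) * 2**(-e)

print(newton_divide(355.0, 113.0))          # close to 355 / 113
print(355.0 / 113.0)
```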

A design of floating-point arithmetic unit for superscalar microprocessor (수퍼스칼라 마이크로프로세서용 부동 소수점 연산회로의 설계)

  • 최병윤;손승일;이문기
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.5 / pp.1345-1359 / 1996
  • This paper presents a floating-point arithmetic unit (FPAU) for a superscalar microprocessor that executes fifteen operations, such as addition, subtraction, data format conversion, and comparison, using two pipelined arithmetic paths and a new rounding and normalization scheme. By using two pipelined arithmetic paths, each arithmetic operation can be assigned to the arithmetic path in which high-speed operation is possible. The proposed normalization and rounding scheme enables the FPAU to execute rounding in parallel with normalization and to reduce the timing delay of post-normalization. In addition, by predicting the leading-one position of results from the input operands, the leading-one detection (LOD) operation used to normalize results in a conventional arithmetic unit can be eliminated. Because the FPAU can execute fifteen single-precision or double-precision floating-point arithmetic operations through a three-stage pipelined datapath and supports IEEE Standard 754, it has a structure suitable for integration into a superscalar microprocessor.
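
The conventional step that the paper eliminates, leading-one detection (LOD) followed by a normalization shift, can be sketched in a few lines; this Python fragment only shows what LOD computes on an integer significand, not the proposed prediction logic.

```python
# Illustrative sketch of conventional leading-one detection and normalization;
# the paper's contribution is to predict this position from the operands instead.
def normalize(significand: int, exponent: int, width: int = 24):
    """Shift the significand so its leading one sits at bit (width - 1)."""
    if significand == 0:
        return 0, 0
    lod = significand.bit_length() - 1        # position of the leading one
    shift = (width - 1) - lod
    if shift >= 0:
        return significand << shift, exponent - shift
    return significand >> (-shift), exponent + (-shift)

# Example: a subtraction that cancels high bits needs a large left shift.
sig, exp = normalize(0b101101, 10)
print(bin(sig), exp)                          # 24-bit significand, adjusted exponent
```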

A Fixed-point Digital Signal Processor Development System Employing an Automatic Scaling (자동 스케일링 기능이 지원되는 고정 소수집 디지털 시그날 프로세서 개발 시스템)

  • 김시현;성원용
    • Journal of the Korean Institute of Telematics and Electronics A / v.29A no.3 / pp.96-105 / 1992
  • The use of fixed-point digital signal processors, such as the TMS320C25, requires scaling of data at each arithmetic step to prevent overflows while keeping accuracy. Software that automates this process has been developed for the TMS320C25. The programmer uses a model of a hypothetical floating-point digital signal processor and a floating-point format for data representation; however, the program and data are automatically translated to a fixed-point version by this software, so the execution speed is not sacrificed. A fixed-point variable has a unique binary-point location, which depends on the range of the variable, and the range is estimated from a floating-point simulation. The number of shifts needed for each arithmetic or data-transfer step is determined by the binary points of the variables associated with the operation. A fixed-point code generator has also been developed using the proposed automatic scaling software. This code generator produces floating-point assembly programs from the specifications of FIR, IIR, and adaptive transversal filters, and these floating-point programs are then transformed to fixed-point versions by the automatic scaling software.
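
A minimal Python sketch of the range-driven scaling idea described above, assuming a 16-bit word and a range taken from a floating-point simulation; the names and formats are illustrative and do not reproduce the actual TMS320C25 tool.

```python
# Minimal sketch of range-driven automatic scaling for a 16-bit fixed-point target.
import math

def integer_bits(max_abs: float) -> int:
    """Bits needed left of the binary point to hold |x| <= max_abs."""
    return max(0, math.ceil(math.log2(max_abs + 1e-12)))

def to_fixed(x: float, max_abs: float, word: int = 16) -> int:
    """Quantize x using a binary point chosen from the simulated range."""
    frac_bits = word - 1 - integer_bits(max_abs)   # one bit reserved for the sign
    return round(x * (1 << frac_bits))

def from_fixed(q: int, max_abs: float, word: int = 16) -> float:
    frac_bits = word - 1 - integer_bits(max_abs)
    return q / (1 << frac_bits)

# The range observed during the floating-point simulation decides the scaling.
observed_max = 3.2
x = 2.718281828
q = to_fixed(x, observed_max)
print(q, from_fixed(q, observed_max))              # quantized value and its error
```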

Design of a Floating Point Multiplier for IEEE 754 Single-Precision Operations (IEEE 754 단정도 부동 소수점 연산용 곱셈기 설계)

  • Lee, Ju-Hun;Chung, Tae-Sang
    • Proceedings of the KIEE Conference / 1999.11c / pp.778-780 / 1999
  • Arithmetic unit speed depends strongly on the algorithms employed to realize the basic arithmetic operations (add, subtract, multiply, and divide) and on the logic design. Recent advances in VLSI have increased the feasibility of hardware implementation of floating-point arithmetic units, and microprocessors require a powerful floating-point processing unit as a standard option. This paper describes the design of a floating-point multiplier for IEEE 754-1985 single-precision operation. A Booth encoding algorithm is used to reduce the partial products, and a Wallace tree of 4-2 CSAs is adopted in the fraction multiplication part to generate the $32{\times}32$ single-precision product. A new scheme of rounding and sticky-bit generation is adopted to reduce area and timing, and the design also includes a true sign generator. This multiplier has been implemented in an ALTERA FLEX EPF10K70RC240-4.
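
As a rough illustration of the partial-product reduction mentioned above, the sketch below performs standard radix-4 (modified) Booth recoding and sums the resulting partial products in software; the paper's actual encoder and Wallace-tree wiring are not reproduced.

```python
# Radix-4 (modified) Booth recoding: each overlapping bit group of the multiplier
# selects a partial product in {-2, -1, 0, +1, +2} * M, roughly halving the
# partial-product count that the Wallace tree then has to sum.
def booth_radix4_digits(y: int, bits: int):
    """Radix-4 Booth digits of an unsigned multiplier y (y < 2**bits)."""
    digits = []
    prev = 0                                    # implicit bit y[-1] = 0
    for i in range(0, bits + 2, 2):             # one extra group covers unsigned values
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)       # digit = y[2i] + y[2i-1] - 2*y[2i+1]
        prev = b1
    return digits

def booth_multiply(x: int, y: int, bits: int = 24) -> int:
    """Sum the Booth-selected partial products (the role of the Wallace tree)."""
    total = 0
    for i, d in enumerate(booth_radix4_digits(y, bits)):
        total += d * x << (2 * i)               # each digit is weighted by 4**i
    return total

assert booth_multiply(0xABCDEF, 0x123456) == 0xABCDEF * 0x123456
print(hex(booth_multiply(0xABCDEF, 0x123456)))
```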

A Design of Dual-Phase Instructions for an Effective Logarithm and Exponent Arithmetic (효율적인 로그와 지수 연산을 위한 듀얼 페이즈 명령어 설계)

  • Kim, Chi-Yong;Lee, Kwang-Yeob
    • Journal of IKEEE / v.14 no.2 / pp.64-68 / 2010
  • This paper proposes efficient logarithm and exponent calculation methods that use a dual-phase instruction set, without an additional ALU, for a mobile environment. Using the dual-phase instruction set, the design extracts the exponent and mantissa from the floating-point representation and computes a 24-bit single-precision floating-point logarithm approximation using a Taylor series expansion. The dual-phase instruction set also reduces instruction execution cycles. The proposed dual-phase architecture reduces performance degradation while maintaining a small size.
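
In the spirit of the exponent/mantissa decomposition described above, here is a hedged Python sketch: it splits x with frexp and applies a Taylor expansion of ln(1 + u). The number of terms and the exact series are illustrative choices, not the paper's coefficients.

```python
# Sketch of log2(x) = e + log2(m) with m in [1, 2), using a Taylor series for ln(1 + u).
import math

def log2_approx(x: float, terms: int = 8) -> float:
    m, e = math.frexp(x)          # x = m * 2**e, with m in [0.5, 1)
    m, e = 2.0 * m, e - 1         # renormalize so m is in [1, 2)
    u = m - 1.0                   # ln(m) = ln(1 + u), u in [0, 1)
    # Convergence is slow when u is near 1; real designs use more terms or range reduction.
    ln_m = sum((-1.0) ** (k + 1) * u ** k / k for k in range(1, terms + 1))
    return e + ln_m / math.log(2.0)

print(log2_approx(10.0), math.log2(10.0))
```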

Design of Pipelined Floating-Point Arithmetic Unit for Mobile 3D Graphics Applications

  • Choi, Byeong-Yoon;Ha, Chang-Soo;Lee, Jong-Hyoung;Salcic, Zoran;Lee, Duck-Myung
    • Journal of Korea Multimedia Society / v.11 no.6 / pp.816-827 / 2008
  • In this paper, a two-stage pipelined floating-point arithmetic unit (FP-AU) is designed. The FP-AU processor supports seventeen operations for use in a 3D graphics processor and has an area-efficient, low-latency architecture that makes use of a modified dual-path computation scheme, a new normalization circuit, and a modified compound adder based on a flagged prefix adder. The FP-AU has about 4 ns of delay time under logic synthesis conditions using a $0.18{\mu}m$ CMOS standard cell library and consists of about 5,930 gates. Because it achieves a 250 MFLOPS execution rate and supports saturated arithmetic along with a number of graphics-oriented operations, it can be applied efficiently to mobile 3D graphics accelerators.
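
Saturated arithmetic, mentioned above, simply clamps results to a representable range instead of letting them wrap or overflow; the [0, 1] range in this tiny Python sketch is an assumption typical of color channels, not a detail taken from the paper.

```python
# Minimal illustration of saturated arithmetic as used in graphics pipelines:
# results are clamped to a range instead of wrapping or overflowing.
def saturating_add(a: float, b: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return min(hi, max(lo, a + b))

print(saturating_add(0.7, 0.6))    # 1.0 rather than 1.3
print(saturating_add(0.2, -0.5))   # 0.0 rather than -0.3
```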

A design of Floating Point Arithmetic Unit for Geometry Operation of Mobile 3D Graphic Processor (모바일 3D 그래픽 프로세서의 지오메트리 연산을 위한 부동 소수점 연산기 구현)

  • Lee, Jee-Myong;Lee, Chan-Ho
    • Proceedings of the IEEK Conference / 2005.11a / pp.711-714 / 2005
  • We propose floating-point arithmetic units for the geometry operations of a mobile 3D graphics processor. The proposed arithmetic units conform to the single-precision format of IEEE Standard 754-1985, the standard for floating-point arithmetic. The rounding algorithm applies the nearest-toward-zero form. The proposed adder/subtractor unit and multiplier have a latency of one clock cycle, and the inversion unit has a latency of three clock cycles. We estimate the number of arithmetic operations required for the viewing transformation. The first stage of the geometry operation is composed of translation, rotation, and scaling operations. The translation operation requires three additions, the rotation operation requires three additions and six multiplications, and the scaling operation requires three multiplications. The viewing transformation is performed in 15 clock cycles; if the adder and the multiplier have their own input/output ports, it can be done in 9 clock cycles. The error margin of the proposed arithmetic units is smaller than $10^{-5}$, which is the requirement in the OpenGL standard. The proposed arithmetic units operate at a 100 MHz clock frequency.
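
To make the operation counts above concrete, the sketch below applies a translate-rotate-scale sequence to one vertex; it uses a single-axis rotation for brevity, so its own operation count is not claimed to match the paper's.

```python
# Per-vertex geometry steps mentioned in the abstract: translate, rotate, scale.
import math

def transform_vertex(v, t, angle, s):
    x, y, z = v
    # Translation: 3 additions.
    x, y, z = x + t[0], y + t[1], z + t[2]
    # Rotation about the z axis (other axes are analogous).
    c, si = math.cos(angle), math.sin(angle)
    x, y = c * x - si * y, si * x + c * y
    # Scaling: 3 multiplications.
    return s[0] * x, s[1] * y, s[2] * z

print(transform_vertex((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), math.pi / 2, (2.0, 2.0, 2.0)))
```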

Design and Simulation of ARM Processor with Floating Point Instructions (부동소수점 명령어를 지원하는 ARM 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.187-193 / 2020
  • Floating-point arithmetic in a microprocessor is the computation of addition, subtraction, multiplication, and division on floating-point data to improve accuracy. In general, when designing a processor, floating-point instructions are often excluded because of their complexity, and only integer instructions are provided. However, to carry out the computations not only for engineering and scientific applications but also for artificial intelligence and neural networks, which are in the spotlight today, floating-point operations must be included. In this paper, we design a 32-bit ARMv4-family processor with floating-point arithmetic instructions using VHDL and verify it with ModelSim. As a result, the ARM floating-point instructions are executed successfully.

MLP Design Method Optimized for Hidden Neurons on FPGA (FPGA 상에서 은닉층 뉴런에 최적화된 MLP의 설계 방법)

  • Kyoung Dong-Wuk;Jung Kee-Chul
    • The KIPS Transactions: Part B / v.13B no.4 s.107 / pp.429-438 / 2006
  • Neural networks (NNs) are applied to solve a wide variety of nonlinear problems in several areas, such as image processing and pattern recognition. Although an NN can be simulated in software, many potential NN applications require real-time processing and thus need to be implemented in hardware. Hardware implementations of multi-layer perceptrons (MLPs), one kind of NN, usually use fixed-point arithmetic because of its simpler logic and shorter processing time compared to floating-point arithmetic. However, a fixed-point MLP has the drawback that it cannot directly reuse MLP software based on floating-point arithmetic. We propose a design method for MLPs with a fully pipelined, floating-point arithmetic architecture, whose processing speed is proportional to the number of hidden nodes. The numbers of input and output nodes of an MLP are generally constrained by the given problem, but the number of hidden nodes can be optimized based on user experience. Our design method therefore uses an optimized number of hidden nodes to improve processing speed, especially for repeated processing in fields such as image processing and pattern recognition.
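
The claim that processing speed tracks the number of hidden nodes is easy to see in a plain floating-point forward pass; the Python sketch below uses illustrative layer sizes that are not taken from the paper.

```python
# Floating-point MLP forward pass (illustrative sizes): the inner loops over
# hidden nodes are the work that a fully pipelined hardware design parallelizes,
# which is why throughput tracks the hidden-node count.
import math, random

def mlp_forward(x, w_hidden, w_output):
    # Hidden layer: one dot product (plus sigmoid) per hidden node.
    h = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
         for row in w_hidden]
    # Output layer: one dot product per output node.
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_output]

random.seed(0)
n_in, n_hidden, n_out = 4, 8, 2          # assumed sizes
w_h = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
w_o = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
print(mlp_forward([0.1, 0.2, 0.3, 0.4], w_h, w_o))
```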