• Title/Summary/Keyword: Floating-point

Search Result 494, Processing Time 0.032 seconds

Algebraic Accuracy Verification for Division-by-Convergence based 24-bit Floating-point Divider Complying with OpenGL (Division-by-Convergence 방식을 사용하는 24-비트 부동소수점 제산기에 대한 OpenGL 정확도의 대수적 검증)

  • Yoo, Sehoon;Lee, Jungwoo;Kim, Kichul
    • Journal of IKEEE
    • /
    • v.17 no.3
    • /
    • pp.346-351
    • /
    • 2013
  • Low-cost and low-power are important requirements in mobile systems. Thus, when a floating-point arithmetic unit is needed, 24-bit floating-point format can be more useful than 32-bit floating-point format. However, a 24-bit floating-point arithmetic unit can be risky because it usually has lower accuracy than a 32-bit floating-point arithmetic unit. Consecutive floating-point operations are performed in 3D graphic processors. In this case, the verification of the floating-point operation accuracy is important. Among 3D graphic arithmetic operations, the floating-point division is one of the most difficult operations to satisfy the accuracy of $10^{-5}$ which is the required accuracy in OpenGL ES 3.0. No 24-bit floating-point divider, whose accuracy is algebraically verified, has been reported. In this paper, a 24-bit floating-point divider is analyzed and it is algebraically verified that its accuracy satisfies the OpenGL requirement.

Design of the floating point multiplier performing IEEE rounding and addition in parallel (IEEE 반올림과 덧셈을 동시에 수행하는 부동 소수점 곱셈 연산기 설계)

  • 박우찬;정철호
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.11
    • /
    • pp.47-55
    • /
    • 1997
  • In general, processing flow of the conventional floating-point multiplication consists of either multiplication, addition, normalization, and rounding stage of the conventional floating-point multiplier requries a high speed adder for increment, increasing the overall execution time and occuping a large amount of chip area. A floating-point multiplier performing addition and IEEE rounding in parallel is designed by using the carry select addder used in the addition stage and optimizing the operational flow based on the charcteristics of floating point multiplication operation. A hardware model for the floating point multiplier is proposed and its operational model is algebraically analyzed in this paper. The proposed floating point multiplier does not require and additional execution time nor any high spped adder for rounding operation. Thus, performance improvement and cost-effective design can be achieved by this suggested approach.

  • PDF

Design and Simulation of ARM Processor with Floating Point Instructions (부동소수점 명령어를 지원하는 ARM 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.2
    • /
    • pp.187-193
    • /
    • 2020
  • Floating point arithmetic in microprocessor is the computation of addition, subtraction, multiplication, and division of floating point data to improve accuracy. In general, when designing a processor, floating point instructions are often excluded because of its complexity and only integer instructions are provided. However, in order to carry out the computations for not only engineering and technical operations but also artificial intelligence and neural networks that are in the spotlight today, floating point operations must be included. In this paper, we design a 32-bit ARMv4 family of processors with floating-point arithmetic instructions using VHDL and verify with ModelSim. As a result, ARM's floating point instructions are successfully executed.

Goldschmidt's Double Precision Floating Point Reciprocal Computation using 32 bit multiplier (32 비트 곱셈기를 사용한 골드스미트 배정도실수 역수 계산기)

  • Cho, Gyeong-Yeon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.5
    • /
    • pp.3093-3099
    • /
    • 2014
  • Modern graphic processors, multimedia processors and audio processors mostly use floating-point number. Meanwhile, high-level language such as C and Java uses both single-precision and double precision floating-point number. In this paper, an algorithm which computes the reciprocal of double precision floating-point number using a 32 bit multiplier is proposed. It divides the mantissa of double precision floating-point number to upper part and lower part, and calculates the reciprocal of the upper part with Goldschmidt's algorithm, and computes the reciprocal of double precision floating-point number with calculated upper part reciprocal as the initial value is proposed. Since the number of multiplications performed by the proposed algorithm is dependent on the mantissa of floating-point number, the average number of multiplications per an operation is derived from some reciprocal tables with varying sizes.

A Fixed-point Digital Signal Processor Development System Employing an Automatic Scaling (자동 스케일링 기능이 지원되는 고정 소수집 디지털 시그날 프로세서 개발 시스템)

  • 김시현;성원용
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.29A no.3
    • /
    • pp.96-105
    • /
    • 1992
  • The use of fixed-point digital signal processors, such as the TMS 320C25, requires scaling of data at each arithmetic step to prevent overflows while keeping the accuracy. A software which automatizes this process is developed for TMS 320C25. The programmers use a model of a hypothetical floating-point digital signal processor and a floating-point format for data representation. However, the program and data are automatically translated to a fixed-point version by this software. Thus, the execution speed is not sacrificed. A fixed-point variable has a unique binary-point location, which is dependent on the range of the variable. The range is estimated from the floating-point simulation. The number of shifts needed for arithmetic or data transfer step is determined by the binary-points of the variables associated with the operation. A fixed-point code generator is also developed by using the proposed automatic scaling software. This code generator produces floating-point assembly programs from the specifiations of FIR, IIR, and adaptive transversal filters, then floating-point programs are transformed to fixed-point versions by the automatic scaling software.

  • PDF

Design of a Floating Point Unit for 3D Graphics Geometry Engine (3D 그래픽 Geometry Engine을 위한 부동소수점 연산기의 설계)

  • Kim, Myeong Hwm;Oh, Min Seok;Lee, Kwang Yeob;Kim, Won Jong;Cho, Han Jin
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.10 s.340
    • /
    • pp.55-64
    • /
    • 2005
  • In this paper, we designed floating point units to accelate real-time 3D Graphics for Geometry processing. Designed floating point units support IEEE-754 single precision format and we confirmed 100 MHz performance of floating point add/mul unit, 120 MHz performance of floating point NR inverse division unit, 200 MHz performance of floating point power unit, 120 MHz performance of floating point inverse square root unit at Xilinx-vertex2. Also, using floating point units, designed Geometry processor and confirmed 3D Graphics data processing.

Newton-Raphson's Double Precision Reciprocal Using 32 bit multiplier (32 비트 곱셈기를 사용한 뉴톤-랍손 배정도실수 역수 계산기)

  • Cho, Gyeong-Yeon
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.18 no.6
    • /
    • pp.31-37
    • /
    • 2013
  • Modern graphic processors, multimedia processors and audio processors mostly use floating-point number. High-level language such as C and Java use both single precision and double precision floating-point number. In this paper, an algorithm which computes the reciprocal of double precision floating-point number using a 32 bit multiplier is proposed. It divides the mantissa of double precision floating-point number to upper part and lower part, and calculates the reciprocal of the upper part with Newton-Raphson algorithm. And it computes the reciprocal of double precision floating-point number with calculated upper part reciprocal as the initial value. Since the number of multiplications performed by the proposed algorithm is dependent on the mantissa of floating-point number, the average number of multiplications per an operation is derived from some reciprocal tables with varying sizes.

A Study on the Behavior of Floating-Point Unit Conforming the ANSI/IEEE Std. 754-1985 (ANSI/IEEE Std. 754-1985에 의거한 부동소수점 연산기의 동작원리에 관한 연구)

  • Kim, Kwang-Uk;Chung, Tae-Sang
    • Proceedings of the KIEE Conference
    • /
    • 1999.11c
    • /
    • pp.788-790
    • /
    • 1999
  • A software implementation of floating-point addition and multiplication is presented. For this, the ANSI/IEEE standard for binary floating-point arithmetic is reviewed briefly. The architecture and behavior of the $Intel^{(R)}\;80{\times}87$ FPU is fully studied and basic algorithms for floating-point addition and multiplication are used for the implementation. Some examples and their verifications are also presented.

  • PDF

C++ Template-based Fixed-Point Arithmetic Library (C++ 템플릿 기반의 Fixed-Point 연산 라이브러리)

  • Hwang, Seon Joong;Kim, Seon Wook;Min, Byung Gueon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.04a
    • /
    • pp.49-52
    • /
    • 2010
  • 디지털 신호처리 알고리즘들은 실제 시스템에 적용할 때 임베디드 시스템 등 하드웨어의 성능과 소비전력 및 비용에 제약이 있을 경우 연산 정밀도가 높은 floating-point 연산 대신 제한된 정밀도와 적은 연산 비용을 요구하는 fixed-point 연산을 사용하여 구현한다. 시스템의 개발단계에서는 적용할 알고리즘을 floating-point 연산을 이용한 코드를 먼저 작성한 후 이를 fixed-point 연산으로 대체하는 과정을 거치게 되는데, 이는 숙련된 개발자와 상당한 양의 개발기간을 요하는 까다로운 작업이다. 이에 본 연구에는 코드작성 편의를 높이고 개발기간을 단축하기 위해 C++ template 기반의 fixed-point 연산 라이브러리를 개발하였다. 이는 floating-point 연산 코드와 fixed-point 연산 코드를 별도로 개발할 필요 없이 하나의 코드를 이용하여 자유로이 연산 정밀도를 지정할 수 있으며 개발자는 기존의 floating-point 연산을 이용하는 코드를 작성하는 것처럼 쉽게 코드를 작성할 수 있도록 한다. 또한, template 기반으로 작성되어 기존의 연구들과 달리 추가적인 작업도구 없이도 범용 C++ 컴파일러가 최적화된 코드를 생성할 수 있도록 되어있는 것이 특징이다.

A Variable Latency K'th Order Newton-Raphson's Floating Point Number Divider (가변 시간 K차 뉴톤-랍손 부동소수점 나눗셈)

  • Cho, Gyeong-Yeon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.9 no.5
    • /
    • pp.285-292
    • /
    • 2014
  • The commonly used Newton-Raphson's floating-point number divider algorithm performs two multiplications in one iteration. In this paper, a tentative K'th Newton-Raphson's floating-point number divider algorithm which performs K times multiplications in one iteration is proposed. Since the number of multiplications performed by the proposed algorithm is dependent on the input values, the average number of multiplications per an operation in single precision and double precision divider is derived from many reciprocal tables with varying sizes. In addition, an error correction algorithm, which consists of one multiplication and a decision, to get exact result in divider is proposed. Since the proposed algorithm only performs the multiplications until the error gets smaller than a given value, it can be used to improve the performance of a floating point number divider unit. Also, it can be used to construct optimized approximate reciprocal tables.