Paper 2012-49TC-5-3

## Optimizing the Chien Search Machine without Using a Divider

안 형 근\*

(Hyeong-Keon An)

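This paper presents an optimal design method for the Chien machine, which finds error locations in a Reed-Solomon decoder, without using a highly complex divider circuit. The optimization is achieved with very simple squaring and fourth-power circuits and with parallel processing. The method is applicable to most modern digital communication and consumer electronic devices.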

### Abstract

In this paper, we present a new method for finding the error locations of a received Reed-Solomon code word. The new design is much faster and has a much simpler logic circuit than the former design method. The optimization is made possible by a greatly simplified square/$X^4$ calculating circuit, by parallel processing, and by not using the very complex divider. A Reed-Solomon decoder using this new Chien machine can be applied to data protection in almost all digital communication and consumer electronic devices<sup>[7]</sup>.

Keywords: Reed-Solomon (RS), Decoder, GF(2<sup>4</sup>), Square computing, Digital, Chien search, Divider

### I. Introduction

Reed-Solomon encoders and decoders are commonly used in data transmission and storage applications, such as broadcast equipment, wireless LANs, cable modems, xDSL, satellite communications, microwave networks, and digital TV. In this paper, we show how to optimize the Chien search machine of an RS codec. In abstract algebra, the Chien search, named after R. T. Chien, is a fast algorithm for determining the roots of polynomials defined over a finite field. The most typical use of the Chien search is in finding the roots of the error locator polynomials encountered in decoding Reed-Solomon codes and BCH codes.

In this paper, we propose a new method to optimize the arithmetic logic unit of the Chien machine<sup>[6]</sup>. To do so, the huge $GF(2^8)$ multiplier is replaced by a much simpler $GF(2^4)$ multiplier together with squaring and $X^4$ circuits, and the very complicated divider is removed. Equation (1) shows the $v$-th order error locator polynomial, whose roots are the $v$ error locations in one RS code word. Section II presents the steps for optimizing the circuit that calculates the coefficients of the error locator polynomial (Equation (1)). Section III presents the optimized processor structure. Section IV applies the optimized processor to the case of 4 errors in one RS code word. Section V gives a step-by-step example of finding 4 error locations in a received code word using the processor.

Finally, Section VI gives concluding remarks and outlines future work to improve the design of the Reed-Solomon decoder.

<sup>\*</sup> Regular member, Tong Myung University

Received: February 21, 2012; revision completed: May 12, 2012

### II. Optimizing the processor for calculating the coefficients of the error locator polynomial

The error locator polynomial is given by Equation (1).

$$\mathbf{x}^{v} + \sigma_1 \mathbf{x}^{v-1} + \dots + \sigma_{v-1} \mathbf{x} + \sigma_v = 0 \tag{1}$$

$$A_{n} = \begin{bmatrix} S_{1} & S_{2} & \cdots & S_{n} \\ S_{2} & S_{3} & \cdots & S_{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ S_{n} & S_{n+1} & \cdots & S_{2n-1} \end{bmatrix} \tag{2}$$

In Equation (2), $A_n$ is the $n$-th order characteristic matrix and $S_k$ is the $k$-th syndrome. If there are $v$ errors in the Reed-Solomon code word, the coefficients of the error locator polynomial are calculated as in Equation (3)<sup>[2,4]</sup>.

$$\delta_{v} = \begin{pmatrix} \sigma_{v} \\ \sigma_{v-1} \\ \sigma_{v-2} \\ \vdots \\ \sigma_{1} \end{pmatrix} = A_{v}^{-1} \begin{pmatrix} S_{v+1} \\ S_{v+2} \\ S_{v+3} \\ \vdots \\ S_{2v} \end{pmatrix} \tag{3}$$

Now we define a new error locator polynomial coefficient vector $\delta'_v$ as in Equation (4). Since $\operatorname{Det}(A_v)\,A_v^{-1} = \operatorname{Adj}(A_v)$, it can be computed as in Equation (5).

$$\delta'_v = \operatorname{Det}(A_v)\ \delta_v \tag{4}$$

$$\delta'_{v} = \operatorname{Adj}(A_{v}) \cdot \begin{pmatrix} S_{v+1} \\ S_{v+2} \\ \vdots \\ S_{2v} \end{pmatrix} \tag{5}$$

The new error locator polynomial is:

$$\operatorname{Det}(A_{v})\, x^{v} + \sum_{k=1}^{v} \sigma'_{k}\, x^{v-k} = 0 \tag{6}$$

By doing this, the new error locator polynomial (6) can be solved without a dividing circuit; only multipliers and adders are needed<sup>[5]</sup>. In addition to the multiplier, $X^4$ and $X^2$ circuits can be used when evaluating Equation (1), which speeds up the calculation through parallel processing. Fig. 1 shows the $X^4$ and $X^2$ circuits in GF(2<sup>4</sup>); as the figure shows, these circuits are very simple. The huge GF(2<sup>8</sup>) multiplier is also replaced by 3 GF(2<sup>4</sup>) multipliers and 4 GF(2<sup>4</sup>) adders, as shown in Fig. 2.

Fig. 1. $X^4$ circuit in GF(2<sup>4</sup>).

Fig. 2. GF(2<sup>8</sup>) multiplier built from 3 GF(2<sup>4</sup>) multipliers and 4 adders.

Hence, to optimize the processor circuit:

(1) We added simple $X^4$ and $X^2$ circuits, which allow parallel processing and speed up the computation.

(2) We replaced the huge $GF(2^8)$ multiplier with 3 $GF(2^4)$ multipliers and 4 $GF(2^4)$ adders, which simplifies the circuit and speeds up the processor by working in $GF(2^4)$.

(3) The total gate count of the 3 GF($2^4$) multipliers, 4 GF($2^4$) adders and the X<sup>2</sup>/X<sup>4</sup> circuits is smaller than that of the huge GF($2^8$) multiplier<sup>[6, 8]</sup>.

(4) We do not use the huge divider circuit, which simplifies the machine greatly.

In Fig. 1, the $X^4$ circuit is derived by applying the $X^2$ circuit twice.
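To illustrate why the squaring circuits are so small, here is a minimal Python sketch of $X^2$ and $X^4$ in GF(2<sup>4</sup>) as pure XOR networks. It assumes the field polynomial $x^4 + x + 1$, which the paper does not state; the function names are hypothetical, and the bit equations would differ for another polynomial.

```python
# Assumption: GF(2^4) is built on x^4 + x + 1 (the paper does not name its polynomial).
POLY = 0b10011


def gf16_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^4); used here only as a reference."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # reduce as soon as the degree reaches 4
            a ^= POLY
    return result


def gf16_square(a: int) -> int:
    """X^2 as a fixed XOR network on the bits a3 a2 a1 a0 (no multiplier)."""
    a0, a1, a2, a3 = a & 1, (a >> 1) & 1, (a >> 2) & 1, (a >> 3) & 1
    return (a0 ^ a2) | (a2 << 1) | ((a1 ^ a3) << 2) | (a3 << 3)


def gf16_fourth(a: int) -> int:
    """X^4 obtained by applying the X^2 network twice."""
    return gf16_square(gf16_square(a))


# Compare the XOR networks against the reference multiplier for every element.
print(all(gf16_square(a) == gf16_mul(a, a) for a in range(16)))
```

Under this assumed polynomial the squarer needs only two XOR gates, which is consistent with the claim that the $X^2$ and $X^4$ circuits are far shallower than a multiplier.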

### III. Optimized processor structure and operation for the Chien search machine

All GF(2<sup>8</sup>) operations are converted to GF(2<sup>4</sup>) operations for optimization, and parallel processing requires multiplier, squarer and X<sup>4</sup> circuits. Let

$$C = A \times B, \quad D = A^{2}, \quad E = A^{4}, \qquad A, B, C, D, E \in GF(2^8)$$

Each element of GF(2<sup>8</sup>) is written as a pair of GF(2<sup>4</sup>) elements, for example $A = A0 + \beta A1$ and $B = B0 + \beta B1$, so that

$$C = C0 + \beta C1, \quad D = D0 + \beta D1, \quad E = E0 + \beta E1, \qquad C0, C1, D0, D1, E0, E1 \in GF(2^4)$$

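Then, with $y \in GF(2^4)$ denoting the constant for which $\beta^2 = \beta + y$ (the relation under which the product below follows),

$$\begin{aligned} C0 &= A0\,B0 + A1\,B1\,y \\ C1 &= A0\,B1 + A1\,B0 + A1\,B1 \end{aligned} \tag{8}$$

Also

$$\begin{aligned} D0 &= A0^{2} + A1^{2}\,y \\ D1 &= A1^{2} \end{aligned} \tag{9}$$

$$C = C0 + \beta\,\bigl(A0\,B0 + (A0+A1)(B0+B1)\bigr) \tag{10}$$

$$\begin{aligned} E0 &= A0^{4} + A1^{4} y^{2} + A1^{4} y \\ E1 &= A1^{4} \end{aligned} \tag{11}$$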

The $X^4$ circuit is shown in Fig. 3.

The $X^4$ operation in $GF(2^8)$ is described by the following $GF(2^4)$ micro executions:

$A1^4$ : E1, $X^4$ unit in $GF(2^4)$

$y\,A1^4$, $y^2 A1^4$ : y multiplier

$A0^4 + A1^4 y^2 + A1^4 y$ : adder in $GF(2^4)$, E0
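The decomposition above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's circuit: GF(2<sup>4</sup>) is taken to use $x^4 + x + 1$, and the constant `Y` is set to `0b1000` so that $z^2 + z + y$ has no root in GF(2<sup>4</sup>); neither value is given in the paper, and the names `mul8`, `square8` and `fourth8` are hypothetical. Elements of GF(2<sup>8</sup>) are pairs `(X0, X1)` standing for $X0 + \beta X1$.

```python
# Assumptions (not stated in the paper): GF(2^4) uses x^4 + x + 1, and the
# constant y with beta^2 = beta + y is 0b1000.
POLY4 = 0b10011
Y = 0b1000


def mul4(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^4)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= POLY4
    return result


def sq4(a: int) -> int:
    """X^2 unit in GF(2^4)."""
    return mul4(a, a)


def mul8(a, b):
    """Equation (10): three GF(2^4) multiplications and four additions (XOR)."""
    a0, a1 = a
    b0, b1 = b
    p00 = mul4(a0, b0)
    p11 = mul4(a1, b1)
    pmix = mul4(a0 ^ a1, b0 ^ b1)   # adders 1 and 2, third multiplier
    c1 = pmix ^ p00                 # adder 3
    c0 = p00 ^ mul4(Y, p11)         # y multiplier, adder 4
    return (c0, c1)


def square8(a):
    """Squaring rule D0 = A0^2 + A1^2 y, D1 = A1^2."""
    a0, a1 = a
    d1 = sq4(a1)
    return (sq4(a0) ^ mul4(Y, d1), d1)


def fourth8(a):
    """Equation (11): E0 = A0^4 + A1^4 y^2 + A1^4 y, E1 = A1^4."""
    a0, a1 = a
    e1 = sq4(sq4(a1))
    y_e1 = mul4(Y, e1)       # y multiplier
    y2_e1 = mul4(Y, y_e1)    # y multiplier applied again
    return (sq4(sq4(a0)) ^ y2_e1 ^ y_e1, e1)


a, b = (0x3, 0xA), (0x7, 0x5)
print(mul8(a, b), square8(a), fourth8(a))
```

In software the three operations run one after another; in the processor described here they map to separate GF(2<sup>4</sup>) units, which is what allows them to run at the same time.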





Fig. 4. (a) Optimized processor structure, containing GF(2<sup>4</sup>) adders, multipliers, a y multiplier, and GF(2<sup>4</sup>) X<sup>4</sup> and X<sup>2</sup> units. (b) Connection of Machine 1 to the buses.

Equations (9), (10) and (11) show that 3 GF(2<sup>4</sup>) multipliers, 4 GF(2<sup>4</sup>) adders, a y multiplier, and X<sup>4</sup> and X<sup>2</sup> circuits in GF(2<sup>4</sup>) are enough to perform the GF(2<sup>8</sup>) multiply, X<sup>4</sup> and X<sup>2</sup> operations simultaneously. Using these GF(2<sup>4</sup>) micro execution units, we can build a faster and logically simpler processor for calculating the error locations in a Reed-Solomon decoder<sup>[3, 8]</sup>. Fig. 4 shows the processor structure.

### IV. Application of optimized Processor to the analysis of 4 error case

The error locator polynomial for the 4-error case is given in Equation (12).

$$\operatorname{Det}(A_4)\,x^4 + \sigma_1' x^3 + \sigma_2' x^2 + \sigma_3' x + \sigma_4' = 0 \tag{12}$$

The $X^4$ operation in $GF(2^8)$ was described in Section III; here we describe the $X^2$ and multiply operations in terms of $GF(2^4)$ execution units. Parallel processing of $X^4$, $X^2$ and multiply in $GF(2^8)$ is therefore possible.

$X^2$ operation in $GF(2^8)$ using $GF(2^4)$ micro executions:

$A0^2$ : $X^2$ unit in $GF(2^4)$

$A1^2$ : D1, $X^2$ unit in $GF(2^4)$

$A1^2\,y$ : y multiplier

$A0^2 + A1^2\,y$ : $GF(2^4)$ adder, D0

These micro operations follow from Equation (9).

Multiply operation in $GF(2^8)$ using $GF(2^4)$ micro executions:

A1B1, A0B0 : 2 $GF(2^4)$ multiplications

A0+A1, B0+B1 : 2 $GF(2^4)$ adders

(A0+A1)(B0+B1) : 3rd $GF(2^4)$ multiplication

(A0+A1)(B0+B1) + A0B0 : 3rd GF(2<sup>4</sup>) adder, C1

A1B1y : y multiplier

A0B0 + A1B1y : GF(2<sup>4</sup>) adder, C0

These micro operations follow from Equation (10).
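As a quick check that the three-multiplier form yields the same C1 as the four-product form, recall that addition in $GF(2^4)$ is bitwise XOR, so any term added to itself cancels:

$$\begin{aligned} (A0+A1)(B0+B1) + A0\,B0 &= A0\,B0 + A0\,B1 + A1\,B0 + A1\,B1 + A0\,B0 \\ &= A0\,B1 + A1\,B0 + A1\,B1 = C1 \end{aligned}$$

The product $A0\,B0$ is needed for C0 anyway, so reusing it here saves one $GF(2^4)$ multiplier at the cost of two extra adders.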

Since a $GF(2^4)$ multiplication takes much more time than a $GF(2^4)$ addition, $x^2$ or $x^4$ operation, the $GF(2^8)$ multiply, $x^2$ and $x^4$ operations can be processed in parallel without loss of computation time. Fig. 5 shows the critical path of the $GF(2^4)$ multiplier, which is 2.5 times longer than that of the $GF(2^4)$ $x^2$ and $x^4$ circuits in Fig. 1.

The reason for carrying out the $X^2$, $X^4$ and multiply operations of $GF(2^8)$ with execution units in $GF(2^4)$ is that the $X^2$, $X^4$ and multiplier units of $GF(2^4)$ can be executed simultaneously, which greatly reduces the computing time.

Fig. 5. Critical path of the GF(2<sup>4</sup>) multiplier (x2->z0).

IV-1. Computing $X^k$ (k = 1, 2, 3, 4)

1. Compute X<sup>2</sup> and X<sup>4</sup> with the GF(2<sup>4</sup>) X<sup>2</sup> and X<sup>4</sup> circuits.

2. Compute $X^3$ by multiplying $X^2$ by $X$ using the $GF(2^4)$ multipliers.

IV-2. Computing the error locator polynomial coefficients

From (6), with $v = 4$:

$$\sum_{k=0}^{4} \sigma'_{k}\, x^{4-k} = 0, \qquad \sigma'_{0} = \det(A_{4}) \tag{13}$$

Here no dividing circuit is needed.

1. For example, $\sigma'_4$ is computed as:

$$\begin{aligned} \sigma_{4}' &= S_{5} \gamma_{1} + S_{6} \gamma_{2} + S_{7} \gamma_{3} + S_{8} \gamma_{4} \\ &= (S_{3}S_{5}^{2}S_{7} + S_{5}^{4} + S_{5}S_{6}^{2}S_{3} + S_{4}^{2}S_{7}S_{5}) \\ &\quad + (S_{2}S_{6}^{3} + S_{3}S_{4}S_{7}S_{6} + S_{4}S_{5}^{2}S_{6} + S_{2}S_{5}S_{7}S_{6} + S_{3}S_{5}S_{6}^{2} + S_{4}^{2}S_{6}^{2}) \\ &\quad + (S_{2}S_{4}S_{7}^{2} + S_{3}S_{5}^{2}S_{7} + S_{3}S_{4}S_{6}S_{7} + S_{4}^{2}S_{5}S_{7} + S_{2}S_{5}S_{6}S_{7} + S_{3}^{2}S_{7}^{2}) \\ &\quad + (S_{2}S_{4}S_{6}S_{8} + S_{4}^{3}S_{8} + S_{2}S_{5}^{2}S_{8} + S_{3}^{2}S_{6}S_{8}) \end{aligned} \tag{14}$$

2. Equation (14) takes about 44T, where T is the $GF(2^4)$ multiplier execution time, since operations such as $S_5^2$, $S_5^4$ and $S_3S_4$ ($x^2$, $x^4$ and multiply operations) are done simultaneously.

3. Computing all $\sigma'_{j}$ (j = 4, 3, 2, 1, 0) therefore takes approximately 220T (44 × 5). The old method takes about 60 × 5 T<sub>8</sub>, where T<sub>8</sub> is the GF(2<sup>8</sup>) multiplier execution time, so the new processor is much faster than the old method<sup>[2,4]</sup>.

### V. Step-by-step example of finding 4 error locations using the processor

Let the transmitted code word be $(0, 0, \dots, 0)$ (all zeros) and the received code word be $(\alpha^{13}, \alpha^8, \alpha^5, 1, 0, \dots, 0) \in GF(2^8)$. Find the error locator polynomial and its roots.

Solution:

First, we compute the syndromes as follows.

$$S_8 = r(x)\big|_{x = \alpha^8} = \alpha^{13} + \alpha^{16} + \alpha^{21} + \alpha^{24} = \alpha^{181} \in GF(2^8) = (\alpha^{10}, \alpha^{12}) \in GF(2^4) \tag{15}$$

$$\begin{aligned} S_7 &= r(x)\big|_{x = \alpha^7} = \bigl(r_0 + r_1 x + r_2 x^2 + r_3 x^3\bigr)\big|_{x = \alpha^7} = \alpha^{13} + \alpha^{8}\alpha^{7} + \alpha^{5}\alpha^{14} + \alpha^{21} \\ &= \alpha^{13} + \alpha^{15} + \alpha^{19} + \alpha^{21} = \alpha^{254} \in GF(2^8) = (\alpha^{0}, \alpha^{13}) \in GF(2^4) \end{aligned} \tag{16}$$

Similarly,

$$S_6 = \alpha^{138} \in GF(2^8) = \alpha + \beta \alpha^6 = (\alpha, \alpha^6) \in GF(2^4) \tag{17}$$

$$\begin{aligned} S_5 &= S_4 = 0 \in GF(2^8) = (0, 0) \in GF(2^4) \\ S_3 &= \alpha^{109} \in GF(2^8) = (0, \alpha^{3}) \in GF(2^4) \\ S_2 &= \alpha^{74} \in GF(2^8) = (\alpha^{6}, \alpha^{14}) \in GF(2^4) \\ S_1 &= \alpha^{39} \in GF(2^8) = (\alpha^{8}, \alpha^{2}) \in GF(2^4) \end{aligned} \tag{18}$$
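The GF(2<sup>8</sup>) syndrome values can be reproduced with a short Python sketch. The paper does not state its primitive polynomial; the sketch assumes $x^8 + x^4 + x^3 + x^2 + 1$ (0x11D), a common choice for Reed-Solomon codes, under which the exponents of (15) to (18) come out as listed. The helper names are hypothetical, and the GF(2<sup>4</sup>) pair representation is not reproduced here.

```python
# Assumption: GF(2^8) is generated by x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
PRIM = 0x11D

EXP = [0] * 510   # EXP[i] = alpha^i, doubled in length to skip a modulo
LOG = [0] * 256   # LOG[alpha^i] = i (LOG[0] is unused)
value = 1
for i in range(255):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & 0x100:
        value ^= PRIM
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


# Received word r(x) = alpha^13 + alpha^8 x + alpha^5 x^2 + x^3
received = [EXP[13], EXP[8], EXP[5], 1]


def syndrome(k: int) -> int:
    """S_k = r(alpha^k), evaluated by Horner's rule."""
    x = EXP[k]
    acc = 0
    for coeff in reversed(received):
        acc = gf_mul(acc, x) ^ coeff
    return acc


syndromes = {k: syndrome(k) for k in range(1, 9)}
for k, s in syndromes.items():
    print(f"S_{k} =", "0" if s == 0 else f"alpha^{LOG[s]}")
```

Note that $S_4 = S_5 = 0$ holds for any choice of primitive polynomial, because the four terms cancel in pairs (for example $\alpha^{13} + \alpha^{13} + \alpha^{15} + \alpha^{15}$ for $S_5$).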

The fourth-order characteristic matrix is<sup>[5]</sup>

$$A_{4} = \begin{pmatrix} S_{1} & S_{2} & S_{3} & S_{4} \\ S_{2} & S_{3} & S_{4} & S_{5} \\ S_{3} & S_{4} & S_{5} & S_{6} \\ S_{4} & S_{5} & S_{6} & S_{7} \end{pmatrix} = \begin{pmatrix} \alpha^{39} & \alpha^{74} & \alpha^{109} & 0 \\ \alpha^{74} & \alpha^{109} & 0 & 0 \\ \alpha^{109} & 0 & 0 & \alpha^{138} \\ 0 & 0 & \alpha^{138} & \alpha^{254} \end{pmatrix} \tag{19}$$

From Equations (3), (4), (5) and (19), we get:

$$\begin{pmatrix} \sigma'_{4} \\ \sigma'_{3} \\ \sigma'_{2} \\ \sigma'_{1} \end{pmatrix} = \operatorname{Adj}(A_{4}) \cdot \begin{pmatrix} S_{5} \\ S_{6} \\ S_{7} \\ S_{8} \end{pmatrix} = \begin{pmatrix} C_{11} & C_{21} & C_{31} & C_{41} \\ C_{12} & C_{22} & C_{32} & C_{42} \\ C_{13} & C_{23} & C_{33} & C_{43} \\ C_{14} & C_{24} & C_{34} & C_{44} \end{pmatrix} \begin{pmatrix} S_{5} \\ S_{6} \\ S_{7} \\ S_{8} \end{pmatrix} \tag{20}$$

where $C_{ij}$ is the cofactor of $A_4$ at row $i$, column $j$.

Table 1. Values of $C_{ij}$.

|                 | GF(2<sup>8</sup>) | GF(2<sup>4</sup>)            |
|-----------------|-------------------|------------------------------|
| C<sub>11</sub>  | $\alpha^{130}$    | $(\alpha, \alpha^3)$         |
| C<sub>12</sub>  | $\alpha^{95}$     | $(\alpha^2, \alpha^2)$       |
| C<sub>13</sub>  | $\alpha^{217}$    | $(\alpha^{14}, \alpha^{11})$ |
| C<sub>14</sub>  | $\alpha^{101}$    | $(\alpha^3, \alpha)$         |
| C<sub>22</sub>  | $\alpha^{101}$    | $(\alpha^3, \alpha)$         |
| C<sub>23</sub>  | $\alpha^{182}$    | $(\alpha^2, \alpha^9)$       |
| C<sub>24</sub>  | $\alpha^{66}$     | $(\alpha^8, \alpha^3)$       |
| C<sub>33</sub>  | 0                 | (0, 0)                       |
| C<sub>44</sub>  | $\alpha^{72}$     | $(\alpha^{10}, \alpha^6)$    |

\* $C_{ij} = C_{ji}$ (i, j = 1, 2, 3, 4)

Because $S_4$ and $S_5$ are zero, the cofactors of $A_4$ in (19) reduce to

$$\begin{aligned} C_{11} &= S_3 S_6^{2}, & C_{12} &= C_{21} = S_2 S_6^{2}, & C_{13} &= C_{31} = S_3^{2} S_7, \\ C_{14} &= C_{41} = S_3^{2} S_6, & C_{22} &= S_1 S_6^{2} + S_3^{2} S_7, & C_{23} &= C_{32} = S_2 S_3 S_7, \\ C_{24} &= C_{42} = S_2 S_3 S_6, & C_{33} &= S_1 S_3 S_7 + S_2^{2} S_7, & C_{34} &= C_{43} = S_1 S_3 S_6 + S_2^{2} S_6, \\ C_{44} &= S_3^{3}. \end{aligned}$$

Table 1 shows the values of  $C_{ij}$  in  $GF(2^8)$  and  $GF(2^4)$ .

From Equation (20), we obtain Equation (21).

$$\begin{pmatrix} \sigma'_{4} \\ \sigma'_{3} \\ \sigma'_{2} \\ \sigma'_{1} \end{pmatrix} = \begin{pmatrix} \alpha^{77} \\ \alpha^{149} \\ \alpha^{65} \\ \alpha^{146} \end{pmatrix} \in GF(2^{8}) = \begin{pmatrix} (1, \alpha^{4}) \\ (\alpha^{7}, \alpha^{4}) \\ (\alpha^{10}, \alpha^{9}) \\ (\alpha^{11}, \alpha^{11}) \end{pmatrix} \in GF(2^{4}) \tag{21}$$

Also

$$\det(A_4) = \sum_{k=1}^{4} C_{k1} S_k = \alpha^{71} \in GF(2^8) = (1, \alpha^3) \in GF(2^4) \tag{22}$$

So the error locator polynomial is:

$$\sigma(x) = \det(A_4)\, x^{4} + \sigma_1' x^{3} + \sigma_2' x^{2} + \sigma_3' x + \sigma_4' = \alpha^{71} x^{4} + \alpha^{146} x^{3} + \alpha^{65} x^{2} + \alpha^{149} x + \alpha^{77} = 0 \tag{23}$$

Now substitute $x = 1, \alpha, \alpha^2, \alpha^3, \alpha^4, \dots, \alpha^{254}$ into (23) to obtain Table 2.

Table 2 shows that the error locations are $\alpha^0$, $\alpha$, $\alpha^2$ and $\alpha^3$, which agrees with the four nonzero positions of the received code word.

| $x$              | $\sigma(x) \in GF(2^8)$  | $\sigma(x) \in GF(2^4)$             |
|------------------|--------------------------|-------------------------------------|
| $\alpha^{0} = 1$ | 0                        | (0, 0)                              |
| $\alpha$         | 0                        | (0, 0)                              |
| $\alpha^{2}$     | 0                        | (0, 0)                              |
| $\alpha^{3}$     | 0                        | (0, 0)                              |
| $\alpha^{4}$     | $\alpha^{220}$ (nonzero) | $(\alpha^4, \alpha^2)$ (nonzero)    |
| ⋮                | ⋮                        | ⋮                                   |
| $\alpha^{254}$   | $\alpha^{210}$ (nonzero) | $(\alpha^5, \alpha^{13})$ (nonzero) |

Table 2. Error location finding table.
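The whole example can be replayed in software with the following Python sketch of a division-free search. It works directly in GF(2<sup>8</sup>) with log/antilog tables rather than with the GF(2<sup>4</sup>) units of the processor, and it assumes the primitive polynomial 0x11D, which the paper does not state; all function and variable names are hypothetical.

```python
# Assumption: GF(2^8) is generated by x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
PRIM = 0x11D
EXP, LOG = [0] * 510, [0] * 256
value = 1
for i in range(255):
    EXP[i], LOG[value] = value, i
    value <<= 1
    if value & 0x100:
        value ^= PRIM
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def evaluate(coeffs, x):
    """Horner evaluation; coeffs[0] is the constant term."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def det3(m):
    """3x3 determinant; all signs vanish in characteristic 2."""
    (a, b, c), (d, e, f), (g, h, k) = m
    return (gf_mul(a, gf_mul(e, k) ^ gf_mul(f, h))
            ^ gf_mul(b, gf_mul(d, k) ^ gf_mul(f, g))
            ^ gf_mul(c, gf_mul(d, h) ^ gf_mul(e, g)))


def cofactor(A, r, c):
    minor = [[A[i][j] for j in range(4) if j != c] for i in range(4) if i != r]
    return det3(minor)


# Syndromes S_1..S_8 of the received word (alpha^13, alpha^8, alpha^5, 1, 0, ...)
received = [EXP[13], EXP[8], EXP[5], 1]
S = [0] + [evaluate(received, EXP[k]) for k in range(1, 9)]

A = [[S[i + j + 1] for j in range(4)] for i in range(4)]   # matrix (19)
C = [[cofactor(A, i, j) for j in range(4)] for i in range(4)]

# det(A4) by expansion along the first column, as in (22)
det = 0
for k in range(4):
    det ^= gf_mul(C[k][0], A[k][0])

# (20): (sigma'_4, sigma'_3, sigma'_2, sigma'_1) = Adj(A4) * (S5, S6, S7, S8)
delta = []
for i in range(4):
    acc = 0
    for j in range(4):
        acc ^= gf_mul(C[j][i], S[5 + j])   # Adj[i][j] = C[j][i]
    delta.append(acc)
sigma4, sigma3, sigma2, sigma1 = delta


def sigma_poly(x):
    """Left-hand side of (23); only multiplications and XORs, no division."""
    x2 = gf_mul(x, x)
    x3 = gf_mul(x2, x)
    x4 = gf_mul(x2, x2)
    return (gf_mul(det, x4) ^ gf_mul(sigma1, x3) ^ gf_mul(sigma2, x2)
            ^ gf_mul(sigma3, x) ^ sigma4)


roots = [k for k in range(255) if sigma_poly(EXP[k]) == 0]
print("det = alpha^%d" % LOG[det])
print("error locations: alpha^k for k in", roots)
```

Under the assumed polynomial, the determinant and coefficients agree with (21) and (22), and the search returns the exponents 0 to 3 as in Table 2.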

### VI. Conclusion

In this paper, we showed that by using subfield theory, eliminating the divider circuit, and applying parallel processing, the Chien search machine can be designed with much higher speed and a simpler circuit, resulting in an optimized Chien search processor<sup>[3, 7]</sup>.

In future work, we will design an optimized processor that finds the error values in the Reed-Solomon decoder using a very efficient, high-speed Galois field divider<sup>[1]</sup>.

#### References

- [1] Hyojin Choi, Hyunwoo Ji, Wonyong Sung, "Optimal design of a parallel BCH decoder for NAND flash memory error correction," 16th Korean Conference on Semiconductors, pp. 505-506, 2009.
- [2] H. K. An, "Design Optimization of the Arithmatic Logic Unit Circuit for the Processor to Determine the Number of Errors in the Reed Solomon Decoder," Jour. of KICS, pp. 649-654, Nov. 2011.
- [3] Paulius Ruzgys, "Analyzing and Implementing a Reed-Solomon decoder in ADSL," MS Thesis, Institute of Electronic Systems, Aalborg Univ., Denmark, June 2007.
- [4] Joschi Brauchle, Ralf Koetter, "A Systematic Reed Solomon Encoder with Arbitrary Parity Positions," IEEE GLOBECOM 2009 Proceedings.
- [5] T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms, Hoboken, NJ: John Wiley & Sons, Inc., 2005.
- [6] Yueh-Teng Hsu, USP7984366, "Efficient chien search method in reed-solomon decoding, and recording medium," Aug. 2007.
- [7] M. L. Curry, A. Skjellum, H. L. Ward, "Accelerating Reed-Solomon coding in RAID systems with GPUs," in Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, 2008.
- [8] Pusan Patel, "Parallel Multiplier design for Galois/Counter Mode of operation," MS Thesis, Dep. of EE, Univ. of Waterloo, Canada, 2008.

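- About the Author -

Hyeong-Keon An (Regular member)
1979: Graduated from Seoul National University, Department of Electrical Engineering
1981: Graduated from KAIST, Department of Electrical and Electronic Engineering
1988: Ph.D. in Electrical Engineering, State University of New York
1988-1998: Principal engineer, Samsung Electronics
1998-1999: Director, Telson Electronics
2000-present: Professor, Department of Information and Communications, Tong Myung University
Main research interests: Digital System Design, LCD/OLED display, semiconductors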
