## 역 셔플익스체인지 네트워크의 재정돈성

박 병 수

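#### Summary

This paper proposes a new rearrangement algorithm for the multistage reverse shuffle/exchange network. The best known lower bound on the number of stages for rearrangeability in symmetric multistage networks is $2\log N-1$. For nonsymmetric multistage networks, however, rearrangeability has not been proved so far, and the best known upper bound for rearrangeability in nonsymmetric multistage networks is currently $3\log N-3$. This paper therefore establishes the rearrangeability of the multistage reverse shuffle/exchange interconnection network for every arbitrary permutation with $N \le 16$. Rearrangeability is established by keeping topological equivalence with a class of rearrangeable networks and adding one stage in the middle, whose switches are set by the proposed algorithm, so that an overall reduced network size is allowed. As a result, this paper achieves the upper bound $2\log N$ for the rearrangeability of the reverse shuffle/exchange network when $N \le 16$, and shows the possibility of this bound when the number of inputs grows to $N > 16$.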

### Rearrangeability of Reverse Shuffle/Exchange Networks

Byoungsoo Park †

#### **ABSTRACT**

This paper proposes a new rearrangement algorithm for the multistage reverse shuffle/exchange network. The best known lower bound on the number of stages for rearrangeability in symmetric networks is  $2\log N - 1$ . However, this bound has never been proved for nonsymmetric networks. Currently, the best upper bound for the rearrangeability of a shuffle/exchange network among nonsymmetric networks is  $3\log N - 3$  stages. We describe the rearrangeability of the reverse shuffle/exchange multistage interconnection network for every arbitrary permutation with  $N \le 16$ . This rearrangeability is established by adding one more stage in the middle of the network so that the reduced network is topologically equivalent to a class of rearrangeable networks. The results in this paper enable us to establish an upper bound of  $2\log N$  stages for the rearrangeable reverse shuffle/exchange network with  $N \le 16$ , and lead to the possibility of this bound when  $N > 16$ .

#### 1. Introduction

In any parallel computer system that consists of many processing elements and memories, the communication pathways among the processing elements and memories are the most important issue. A multistage interconnection network is rearrangeable if its permutation states are able to realize any one-to-one connection of input terminals to output terminals. The Benes binary network, which is derived from the three-stage Clos network[3], is a rearrangeable network that requires optimal hardware, as does a single-stage recirculating network such as the shuffle-exchange[15].

Most research on permutation networks has focused on rearrangeability, and routing algorithms have been developed for this class of multistage interconnection networks[1, 5, 7, 14].

Since the single-stage multiple-pass recirculation technique in (Fig. 1) requires less hardware to realize any arbitrary permutation, the universality of shuffle/exchange networks has been studied by many researchers in order to reduce the number of stages required for arbitrary permutations. By sorting an arbitrary data sequence, Stone[12] proposed an algorithm with  $(\log N)^2$  passes based on Batcher's sorting network[2]. Siegel[11] also presented an algorithm that realizes any arbitrary permutation on a recirculating shuffle/exchange network with  $2(\log N)^2$  passes. Parker[8] showed that  $3\log N$  passes are sufficient for rearrangeability. Wu and Feng[15] showed that  $3\log N - 1$  passes are sufficient. Furthermore, Huang and Tripathi[4] constructed the best known shuffle/exchange network with passes.



(Fig. 1) A reverse shuffle/exchange network

In this paper, we suggest a new rearrangement algorithm with  $2\log N$  stages for the reverse shuffle/exchange interconnection network when  $N \le 16$ . This network consists of three parts: the first half stages  $(0, \log N - 2)$ , the middle stage  $(\log N - 1)$ , and the last half stages  $(\log N, 2\log N - 1)$ . Each part is controlled by its own control algorithm.

The switch settings in the first half stages are controlled by an algorithm that recursively partitions the inputs into groups according to a bit fixed in sequence. In order to transform an arbitrary permutation into one that is passable by the last half stages, and to obtain topological equivalence to the Benes network, the switching elements in the middle stages between the first half and the last half are set as follows: in one stage each switching element is set according to a complement relationship, and in the other stage every switching element is always set to the straight connection. Thus, all switching elements in the latter stage are redundant. In this way, we prove that a single middle stage is sufficient for the rearrangeability of the  $2\log N$ -stage reverse shuffle/exchange network when  $N \le 16$ . The last half is controlled by the usual bit composition method, in which the order of the fixed bits is opposite to that in the first half stages. This approach suggests the possibility of establishing a new upper bound if it can be extended to larger  $N$ .
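The three-part split described above can be illustrated with a short sketch. The function name `stage_partition` is hypothetical and not part of the paper; the sketch only restates the stage ranges $(0, \log N - 2)$, $(\log N - 1)$ and $(\log N, 2\log N - 1)$ for a power-of-two $N$.

```python
def stage_partition(n_inputs):
    """Return (first_half, middle, last_half) stage indices.

    Hypothetical helper: it only enumerates the stage ranges given in the
    text for a 2*log2(N)-stage network.
    """
    n = n_inputs.bit_length() - 1            # n = log2(N) for a power of two
    if n < 2 or (1 << n) != n_inputs:
        raise ValueError("N must be a power of two, at least 4")
    first_half = list(range(0, n - 1))       # stages 0 .. log N - 2
    middle = [n - 1]                         # stage log N - 1
    last_half = list(range(n, 2 * n))        # stages log N .. 2 log N - 1
    return first_half, middle, last_half


first_half, middle, last_half = stage_partition(16)
print(first_half, middle, last_half)
```

For $N = 16$ the three lists together contain $2\log N = 8$ stage indices.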

#### 2. Permutation Routing

In this section, we describe the conventional notation and definitions required for multistage reverse shuffle/exchange networks. Such  $N \times N$  multistage networks are characterized by three parameters: communication paths, stages, and interconnection links[11]. Networks with  $\log N$  stages of  $2\times2$  switching elements (SE's), each of which has a straight or a cross connection, are considered throughout this paper (for simplicity,  $\log_2 N$  is denoted as  $\log N$ ). Each stage consists of  $N/2$  switching elements, and the link patterns between stages depend on the type of network, for example the data manipulator, omega network, baseline, regular SW Banyan network (S = F = 2), and so on. The stages are labeled from 0 to  $\log N-1$ , where 0 is the leftmost stage and  $\log N-1$  is the rightmost stage. The levels of connectivity between the stages of switching elements are labeled in sequence from 0 to  $\log N-1$ . The SE's are labeled from 0 to  $N/2-1$ , where 0 is the top switching element and  $N/2-1$  is the bottom one. The link connectivity at level  $i$  of the multistage interconnection network identifies the permutation, or fixed connection, of the reverse shuffle/exchange network and is given by:

$$\sigma_i^{-1}(b_{n-1}b_{n-2}\cdots b_i\cdots b_0) = \underbrace{b_ib_{i-1}\cdots b_0}_{i+1 \text{ bits}}\ \underbrace{b_{n-1}\cdots b_{i+1}}_{n-i-1 \text{ bits}} \tag{1}$$

where  $0 \le i < \log N$ .
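As a minimal sketch, the link function (1) can be written as a bit manipulation. It assumes that (1) is read as moving the low $i+1$ bits $b_i \cdots b_0$ in front of the remaining $n-i-1$ bits $b_{n-1} \cdots b_{i+1}$; the function name `sigma_inv` is hypothetical.

```python
def sigma_inv(address, i, n):
    """Apply the level-i link permutation of Eq. (1) to an n-bit address.

    Assumed reading of Eq. (1): the result is b_i ... b_0 followed by
    b_{n-1} ... b_{i+1}.
    """
    if not 0 <= i < n:
        raise ValueError("level i must satisfy 0 <= i < n")
    low = address & ((1 << (i + 1)) - 1)     # b_i ... b_0
    high = address >> (i + 1)                # b_{n-1} ... b_{i+1}
    return (low << (n - i - 1)) | high


# For i = 0 this is the inverse perfect shuffle: b_0 moves to the msb.
print(format(sigma_inv(0b0001, 0, 4), "04b"))
```

Under this reading, every level gives a one-to-one mapping of the $2^n$ addresses, and level $i = n-1$ leaves each address unchanged.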

The binary representation  $b_{n-1} \cdots b_i \cdots b_0$  is used to describe the address of a number, where bit  $b_{n-1}$  is the most significant bit (msb) and bit  $b_0$  is the least significant bit (lsb). For decomposition, a function  $\Psi$  denotes that one group is divided into two groups:

$$\Psi\{q_{k,j}\} = \{q_{k+1,j}\} \{q_{k+1,j+2^k}\} \tag{2}$$

where  $0 \le j < 2^k$ ,  $0 \le k \le n - 1$ .

The depth of decomposition is denoted by  $k$ . The group  $q_{k,j}$  represents the  $j$ th group of the decomposition at depth  $k$ , and  $2^{n-k}$  is the number of elements in the group.
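The index bookkeeping of (2) can be sketched as follows. The helper names `psi` and `decompose` are hypothetical; the sketch tracks only the group labels $(k, j)$, not the elements placed in each group.

```python
def psi(k, j):
    """Split the label of q_{k,j} into q_{k+1,j} and q_{k+1,j+2^k}, as in Eq. (2)."""
    return (k + 1, j), (k + 1, j + 2 ** k)


def decompose(depth):
    """Apply psi repeatedly, starting from the single group q_{0,0}."""
    groups = [(0, 0)]
    for _ in range(depth):
        groups = [child for group in groups for child in psi(*group)]
    return groups


print(decompose(2))
```

At depth 2 the labels come out in the order $q_{2,0}, q_{2,2}, q_{2,1}, q_{2,3}$, so $q_{1,0}$ yields $q_{2,0}$ and $q_{2,2}$ while $q_{1,1}$ yields $q_{2,1}$ and $q_{2,3}$.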



(Fig. 2) The partition in the first half stages

#### 2.1 The first half stages

The routing scheme for arbitrary permutations in these stages of the reverse shuffle/exchange network is based on a recursive decomposition of the input data of each stage. However, many different decompositions satisfy the following definitions. We therefore select one that assures no conflict in the remaining stages and also assures the sorted result that we want at the final destination. The decomposition is realized by the following definitions.

Definition 2.1: All  $2^n$  integers (0 through  $2^n-1$ ) that are represented as  $b_{n-1}b_{n-2}\cdots b_0$  can be separated into two different groups, each of which has  $2^{n-1}$  elements.

Assume that bit position  $i$  is fixed and that the binary representation of an element in one group is always the same as that of an element in the other group except for bit  $b_i$ . For example, let the  $2^n$  integers be a group  $q_{0,0}$ , and select an element from  $q_{0,0}$ . If we regard the fixed bit as a *don't care bit*, the general binary representation of all  $2^n$  integers is as follows:

$$2^{n} = b_{n-1} b_{n-2} \cdots b_{i+1}\, c\, b_{i-1} \cdots b_{0} \tag{3}$$

where  $b_i$  is c (don't care bit).

Taking  $c$  to be 0 or 1 in (3) gives a pair of integers, and by (2) the group is divided into two groups  $q_{1,0}$  and  $q_{1,1}$ . It can then be guaranteed that one integer of the pair is placed in  $q_{1,0}$  and the other in  $q_{1,1}$ . As the values of the bit positions other than  $b_i$  change, each of the other pairs is likewise placed in different groups. Eventually, if we apply the same scheme to the rest of them, it is guaranteed that every pair is separated one by one between groups  $q_{1,0}$  and  $q_{1,1}$ , each of which has  $2^{n-1}$  elements.
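The separation property of Definition 2.1 can be checked mechanically. The sketch below uses a hypothetical helper `is_valid_split`; the two example groups are the upper and lower stage-0 outputs listed for $N = 16$ in Table 1, with the msb $b_3$ as the fixed bit.

```python
def is_valid_split(group_a, group_b, i, n):
    """True if the two groups partition 0 .. 2^n - 1 and every pair of
    integers that differ only in bit i is split between them."""
    a, b = set(group_a), set(group_b)
    if a & b or (a | b) != set(range(2 ** n)):
        return False
    # The partner of x (same bits except b_i) must sit in the other group.
    return all((x ^ (1 << i)) in b for x in a)


upper = [6, 5, 11, 2, 9, 7, 0, 4]        # stage-0 upper outputs in Table 1
lower = [3, 1, 8, 12, 14, 13, 10, 15]    # stage-0 lower outputs in Table 1
print(is_valid_split(upper, lower, i=3, n=4))
```

A split by value, such as 0 to 7 against 8 to 15, also separates the pairs for $i = 3$, but it fails the check for any other fixed bit.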

Procedure: SE's setting at the first half stages

```
begin
  While n-1 \ge j \ge 1 do
    b_j := don't care bit;
    Upper output[] := x_0;
    Lower output[] := y_0;
    Link_start := x_0;
    //the initialization for link_start//
    While 1 \le m \le \lfloor N/2^{i+1} \rfloor do
    //i is stage//
      If (y_{m-1} = x_m & y_{m-1} \neq Link_start)
      //checking the link connection//
      begin
        Upper output[] := x_m;
        Lower output[] := y_m;
      end;
      Else begin
        Upper output[] := x_m;
        Lower output[] := y_m;
        Link_start := x_m;
        //the new initialization for link_start//
      end;
    endwhile;
  endwhile;
end
```

Definition 2.2: All  $2^n$  integers (0 through  $2^n-1$ ) that are represented as  $b_{n-1}b_{n-2}\cdots b_0$  can be decomposed into  $2^{n-1}$  groups.

Definition 2.1 shows that there exist two groups  $q_{1,0}$  and  $q_{1,1}$ . By applying Definition 2.1 again, group  $q_{1,0}$  can be divided into two different groups  $q_{2,0}$  and  $q_{2,2}$ , and similarly group  $q_{1,1}$  can be divided into two different groups  $q_{2,1}$  and  $q_{2,3}$ . Finally, by (2), the recursive method in (Fig. 2) generates the  $2^{n-1}$  groups

$$q_{n-1,0},\ q_{n-1,1},\ \cdots,\ q_{n-1,2^{n-1}-1}.$$

A decomposition that is proper for a permutation network can be obtained from Definitions 2.1 and 2.2. It depends on the order in which the bit positions are fixed. We therefore define the decomposition method required for the permutation network as a backward-decomposition, in which the bit position is fixed in sequence from bit  $b_{n-1}$  to bit  $b_1$ . This decomposition has the properties of Definitions 2.1 and 2.2.

Now we are ready to discuss the procedure for the SE's setting at the first half stages shown above. As stated before, the decomposition is realized after  $\log N$  stages. The bits  $b_{n-1}$ ,  $b_{n-2}$ , ...,  $b_1$  are fixed sequentially stage by stage: bit  $b_{n-1}$  in stage 0, bits  $b_{n-1}$  and  $b_{n-2}$  in stage 1, and bits  $b_{n-1}$ ,  $b_{n-2}$  and  $b_{n-3}$  in stage 2. The first SE in each block is set straight. After the linkage of the destination address is found from one of the two inputs of the first SE, that input can be connected to the upper or lower output. If the other input is equal to one of the destination addresses that have already been decomposed, the linkage ends. One more SE at the same stage is then set straight, and a new linkage begins. This method continues until all SE's in the stage are set.
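The linkage idea described above can be sketched as a looping-style pass over one block. This is a minimal sketch under stated assumptions, not the paper's exact procedure: the names `set_block` and `block_outputs` are hypothetical, each SE is given as an (upper input, lower input) pair of destination tags, and two tags count as linked when they agree on the low `kept_bits` bits, that is, after the don't care bits are dropped.

```python
def set_block(pairs, kept_bits):
    """Return one setting per SE (0 = straight, 1 = cross) so that tags
    agreeing on the low kept_bits bits leave on different outputs."""
    mask = (1 << kept_bits) - 1
    where = {}                                  # residue -> [(se, port), ...]
    for se, pair in enumerate(pairs):
        for port, tag in enumerate(pair):
            where.setdefault(tag & mask, []).append((se, port))
    if any(len(spots) != 2 for spots in where.values()):
        raise ValueError("each residue must occur exactly twice in a block")
    setting = [None] * len(pairs)
    for start in range(len(pairs)):
        if setting[start] is not None:
            continue
        setting[start] = 0                      # first SE of a linkage: straight
        se, lower_port = start, 1               # this input leaves on the lower output
        while True:
            residue = pairs[se][lower_port] & mask
            # The linked tag (same residue) has to leave on an upper output.
            se, port = next(w for w in where[residue] if w != (se, lower_port))
            if setting[se] is not None:
                break                           # the linkage has closed
            setting[se] = port                  # port 1 must cross to go up
            lower_port = 1 - port
    return setting


def block_outputs(pairs, setting):
    """Upper and lower outputs of each SE after applying the settings."""
    upper = [pair[s] for pair, s in zip(pairs, setting)]
    lower = [pair[1 - s] for pair, s in zip(pairs, setting)]
    return upper, lower


# Stage 0 for the example permutation p of Section 3 (N = 16, b_3 dropped).
stage0 = [(6, 3), (5, 1), (8, 11), (2, 12), (9, 14), (7, 13), (0, 10), (15, 4)]
settings = set_block(stage0, kept_bits=3)
print(settings)
print(block_outputs(stage0, settings))
```

On this input the sketch gives the stage-0 settings 0, 0, 1, 0, 0, 0, 0, 1, the same values as the stg.0 rows of Table 1. Starting every linkage with a straight SE is one of several valid choices, so other permutations may admit different but equally valid settings.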

#### 2.2 The middle stages

Since the decomposition defined in the last section is realized according to the sequentially fixed bits, each group has certain properties. These properties will be used to eliminate conflicts in the following stages.

Property 2.1: Assume that  $T = \{e \bmod 2^k \mid e \in q_{k, j}\}$  for all  $q_{k, j}$ . The set  $T$  consists of  $2^{k-1}$  different elements.

Property 2.2: After the decomposition is realized completely, each group for  $k = n - 2$  and  $\alpha = k + 1$  has

$$q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_0} = 1, \qquad 0 \le \beta \le 2^{\alpha-1} - 1 \tag{4}$$

$$q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta}^{l_0} = 1, \qquad 0 \le \beta \le 2^{\alpha} - 1 \tag{5}$$

where  $u_i$  and  $l_i$  mean the ith bit of upper and lower connections, respectively.

From Properties 2.1 and 2.2, we observe that the *lsb* of one element in every group is always the complement of the *lsb* of the other. Therefore, we introduce the following definition for the next theorem:

Definition 2.3: There exist two combinations between  $q_{\alpha,\beta}$  and  $q_{\alpha,\beta+2^{\alpha-1}}$  such that:

$$\begin{cases} \text{don't care group}: & q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta}^{l_1} = 0, \ q_{\alpha,\beta+2^{\alpha-1}}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{l_1} = 0 \\ \text{no don't care group}: & \text{otherwise} \end{cases} \tag{6}$$

where  $0 \le \beta \le 2^{\alpha-1} - 1$ .
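A small sketch of the test in Definition 2.3 follows. It assumes that condition (6) compares bit 1 of the upper and lower inputs within each of the two SE's $\beta$ and $\beta+2^{\alpha-1}$; the helper names `bit` and `is_dont_care` are hypothetical, and the input pairs are the upper and lower entries of Table 2 for $N = 16$.

```python
def bit(value, i):
    """The i-th bit of value (bit 0 is the lsb)."""
    return (value >> i) & 1


def is_dont_care(se_inputs, beta, alpha):
    """Condition (6) under the assumed reading: in SE beta and in
    SE beta + 2^(alpha-1), the upper and lower inputs agree on bit 1."""
    partner = beta + 2 ** (alpha - 1)
    return all(bit(u, 1) == bit(l, 1)
               for u, l in (se_inputs[beta], se_inputs[partner]))


# (upper, lower) entries of SE 0..7 in Table 2.
stage3_inputs = [(6, 9), (3, 10), (5, 4), (1, 14),
                 (11, 0), (8, 13), (2, 7), (12, 15)]
print([is_dont_care(stage3_inputs, beta, alpha=3) for beta in range(4)])
```

For these inputs the pairs (SE 1, SE 5) and (SE 2, SE 6) satisfy the condition, while (SE 0, SE 4) and (SE 3, SE 7) do not.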

Theorem 2.1: The SE's setting in the  $(\log N - 1)$ th stage of the reverse shuffle/exchange network can guarantee the following conditions:

$$q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-2}}^{u_0} = q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_1} = 1$$

$$q_{\alpha,\beta+2^{\alpha-1}}^{u_0} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_0} = q_{\alpha,\beta+2^{\alpha-2}}^{u_1} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_1} = 1$$

where  $k=n-2$ ,  $\alpha=k+1$ , and  $0 \le \beta \le 1$ .



(Fig. 3) The complemented SE's loop for  $\alpha = k + 1$ , and  $0 \le \beta \le 1$ 

**Proof**: For simplicity, we will only argue the result for  $\beta = 0$ . The proof for  $\beta = 1$  is identical. We can consider four cases that satisfy all conditions in (Fig. 3).

Case i:  $link\_a$  and  $link\_c$  are don't care groups.

It is obvious from Definition 2.3 that  $link\_a$  and  $link\_c$  always satisfy (6). This means  $q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_1} = 1$  and  $q_{\alpha,\beta+2^{\alpha-2}}^{u_1} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_1} = 1$ . Therefore, the connections of SE1 and SE3 can be fixed independently such that:

$$q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-2}}^{u_0} = 1 \quad\text{and}\quad q_{\alpha,\beta+2^{\alpha-1}}^{u_0} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_0} = 1.$$

Case ii: Only  $link\_a$  is a don't care group.

In this case  $q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_1} = 1$  always holds. In order to satisfy  $q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-2}}^{u_0} = 1$ , the connection of SE1 can be decided. Following this sequence, the connections of SE3 and SE2 can also be fixed to hold  $q_{\alpha,\beta+2^{\alpha-2}}^{u_1} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_1} = 1$  and  $q_{\alpha,\beta+2^{\alpha-1}}^{u_0} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_0} = 1$ , respectively. Since  $link\_a$  satisfies the don't care condition (6), it is not necessary to change the connection of SE0.

Case iii: Only  $link\_c$  is a don't care group.

If  $link\_c$  takes the place of  $link\_a$ , the result is the same as in Case ii.

Case iv: There is no don't care group.

By Properties 2.1 and 2.2, we obtain:

$$q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_1} = 0 \tag{7}$$

Therefore, SE2 can be set to the cross connection. It means  $q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_0} = 0$ . Next, assume that  $q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-2}}^{u_0} = 1$ . Then, we also get:

$$q_{\alpha,\beta+2^{\alpha-2}}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_0} = 1 \quad\text{and}\quad q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_1} = 1$$

Lastly, in order to check the connection of SE3, let  $q_{\alpha,\beta+2^{\alpha-1}}^{u_0} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_0}$  be 1. Then, we have:

$$q_{\alpha,\beta+2^{\alpha-1}}^{u_1} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_1} = 1 \tag{8}$$

From (8), we also get:

$$q_{\alpha,\beta+2^{\alpha-2}}^{u_1} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_1} = 1$$
Four cases complete this proof.

Property 2.3: For  $k = \log N - 2$ ,  $\alpha = k + 1$  and  $0 \le \beta \le 2^{\alpha - 1} - 1$ , after the SE's setting in the  $\log N$ th stage, the two *lsb's* in the outputs of the SE's consist of:

$$q_{\alpha,2\beta}^{u_1u_0} = \{00, 01, 10, 11\}, \text{ and } q_{\alpha,2\beta+1}^{u_1u_0} = \{00, 01, 10, 11\}$$

The following procedure illustrates only the first stage of the middle stages; all SE's of the second stage are set to straight connections.

Procedure: SE's setting for the middle stages

```
begin
  While 0 \le \beta \le 2^{\alpha-1} - 1 do
  //u_i and l_i are the ith bit in upper and lower input of the same group q_{\alpha,\beta}//
    If (q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta}^{l_1} = 0 & q_{\alpha,\beta+2^{\alpha-1}}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{l_1} = 0)
    begin
      q_{\alpha,\beta} := don't care group;
      q_{\alpha,\beta+2^{\alpha-1}} := don't care group;
    end;
  endwhile;
  While 0 \le \beta \le 2^{\alpha-1} - 1 do
    If (q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_1} = 0 & no don't care group)
      q_{\alpha,\beta+2^{\alpha-1}}^{u} \leftrightarrow q_{\alpha,\beta+2^{\alpha-1}}^{l};
      //exchange upper into lower input//
  endwhile;
  While 0 \le \beta \le 2^{\alpha-2} - 1 do
    If (q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-2}}^{u_0} = 0)
      If (don't care group exist)
        exchange in don't care group SE;
      Else begin
        q_{\alpha,\beta+2^{\alpha-2}}^{u} \leftrightarrow q_{\alpha,\beta+2^{\alpha-2}}^{l};
        q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u} \leftrightarrow q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{l};
      end;
  endwhile;
end
```

#### 2.3 The last half stages

Once a valid position adjustment has been obtained in the middle stages, the actual routing of the permutation on the reverse shuffle/exchange network is performed in these stages. To deliver arbitrary inputs to their final destinations, we proceed in the opposite direction to the decomposition done in the first half stages; this is called composition. In contrast to the backward-decomposition, it is a forward-composition in which the control bit at each stage is fixed in sequence from bit  $b_0$  to bit  $b_{n-1}$ .

Definition 2.4: A connection can be defined according to control bit as follows:

$$\begin{cases} con\_bit = 0: & \text{upper output} \\ con\_bit = 1: & \text{lower output} \end{cases} \tag{9}$$

Definition 2.4 follows from the properties of Definitions 2.1 and 2.2. If the two inputs of each switching element always consist of a 0 and a 1 in the con\_bit of (9), it is possible to realize a permutation without conflict. We now prove that the routing scheme defined above is conflict-free in this network.



(Fig. 4) The links between the middle stage and the last half stages for no conflicts

Theorem 2.2: There are no conflicts at the last half stages iff

$$q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-2}}^{u_0} = q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_1} = 1$$

$$q_{\alpha,\beta+2^{\alpha-1}}^{u_0} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_0} = q_{\alpha,\beta+2^{\alpha-2}}^{u_1} \oplus q_{\alpha,\beta+3\cdot2^{\alpha-2}}^{u_1} = 1$$

where  $k = \log N - 2$ ,  $\alpha = k + 1$ , and  $0 \le \beta \le 1$ .

Proof: The proof follows from (Fig. 4). The inputs of SE0 (switching element 0) at stage 5 are connected to the upper outputs of SE0 and SE2 at stage 3 according to the reverse shuffle/exchange link (1). Because  $q_{\alpha,\beta}^{u_0} \oplus q_{\alpha,\beta+2^{\alpha-2}}^{u_0} = 1 \ (\alpha = 3, \beta = 0)$ , these bits are complements of each other, which means there is no conflict in SE0 at stage 5. The inputs of the other SE's at stage 5 are conflict-free by the same choice of bits. Thus, it is possible to realize the permutation without conflict at stage 5. Next, the upper outputs of SE0 and SE4 at stage 3 are linked to the inputs of SE0 at stage 6 according to the link function (1). Because  $q_{\alpha,\beta}^{u_1} \oplus q_{\alpha,\beta+2^{\alpha-1}}^{u_1} = 1 \ (\alpha = 3, \beta = 0)$  and by Property 2.3, one input of SE0 at stage 6 is 0 and the other is 1, which means there is no conflict in SE0 at stage 6. The inputs of the other SE's at stage 6 are conflict-free by assumption. Thus, it is possible to realize the permutation without conflict at stage 6 as well. Therefore, no conflicts can exist in any of the last half stages.



(Fig. 5) The cases of no conflict for next stage.

(Fig. 4) shows that the SE's setting in the middle stages guarantees no conflicts for the permutation in the last half stages. An example is shown in (Fig. 5). Under the assumption of no conflicts at stage  $i$ , there are two cases without conflict, namely  $\{(00, 11), (10,01)\}$  and  $\{(00, 01), (10, 11)\}$ . As stated in the earlier literature[6][7][15], the bit control can be obtained by simple bit operations. Since it can realize all permutations without conflicts, similar algorithms have been used in multistage interconnection networks. The routing procedure for the last half stages is as follows:

**Procedure**: SE's setting at the last half stages

begin

While  $0 \le i < n$  do

If  $con\_bit_i = 0$  goto upper output;

Else goto lower output;

endwhile;

end
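The control-bit rule of Definition 2.4 for a single switching element can be sketched as follows. The function name `set_switch` is hypothetical; the sketch assumes that each input carries its destination tag and that the stage examines one fixed bit of that tag.

```python
def set_switch(upper_in, lower_in, bit_index):
    """Return 0 (straight) or 1 (cross) for one 2x2 SE.

    Each input goes to the upper output if its control bit is 0 and to the
    lower output if it is 1. Equal control bits mean both inputs request
    the same output, which is a conflict.
    """
    u = (upper_in >> bit_index) & 1
    l = (lower_in >> bit_index) & 1
    if u == l:
        raise ValueError("conflict: both inputs request the same output")
    return 0 if u == 0 else 1


# Tags 6 (0110) and 5 (0101) differ in b_0, so the SE can be set.
print(set_switch(6, 5, bit_index=0))
```

The conditions of Theorem 2.2 are what keep the conflict branch from being reached in the last half stages: the two tags meeting at each SE always differ in the bit examined there.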

#### 3. Example for Permutation

In order to demonstrate the routing algorithm of the last section, an example is worked through in the following three steps. Let us take the arbitrary permutation  $p$ :

$$p = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 6 & 3 & 5 & 1 & 8 & 11 & 2 & 12 & 9 & 14 & 7 & 13 & 0 & 10 & 15 & 4 \end{pmatrix}.$$

 $\langle \text{Table 1} \rangle$  The SE's setting in the first half stages for N = 16

| SE's     | 0       | 1       | 2       | 3       |
|----------|---------|---------|---------|---------|
| upper    | 6,c110  | 5,c101  | 11,c011 | 2,c010  |
| lower    | 3,c011  | 1,c001  | 8,c000  | 12,c100 |
| stg.0    | 0       | 0       | 1       | 0       |
| **SE's** | **4**   | **5**   | **6**   | **7**   |
| upper    | 9,c001  | 7,c111  | 0,c000  | 4,c100  |
| lower    | 14,c110 | 13,c101 | 10,c010 | 15,c111 |
| stg.0    | 0       | 0       | 0       | 1       |

| SE's     | 0       | 1       | 2       | 3       |
|----------|---------|---------|---------|---------|
| upper    | 6,cc10  | 11,cc11 | 9,cc01  | 0,cc00  |
| lower    | 5,cc01  | 2,cc10  | 7,cc11  | 4,cc00  |
| stg.1    | 0       | 0       | 0       | 0       |
| **SE's** | **4**   | **5**   | **6**   | **7**   |
| upper    | 3,cc11  | 8,cc00  | 13,cc01 | 10,cc10 |
| lower    | 1,cc01  | 12,cc00 | 14,cc10 | 15,cc11 |
| stg.1    | 0       | 0       | 1       | 0       |

| SE's     | 0       | 1       | 2       | 3       |
|----------|---------|---------|---------|---------|
| upper    | 6,ccc0  | 9,ccc1  | 3,ccc1  | 10,ccc0 |
| lower    | 11,ccc1 | 0,ccc0  | 8,ccc0  | 13,ccc1 |
| stg.2    | 0       | 0       | 0       | 1       |
| **SE's** | **4**   | **5**   | **6**   | **7**   |
| upper    | 5,ccc1  | 4,ccc0  | 1,ccc1  | 14,ccc0 |
| lower    | 2,ccc0  | 7,ccc1  | 12,ccc0 | 15,ccc1 |
| stg.2    | 0       | 1       | 0       | 0       |

\* c is the don't care bit

Step 1: In the first half stages.

Assume that the arbitrary inputs above arrive at the first stage.  $\langle \text{Table 1} \rangle$  shows an example of the decomposition for the SE's setting at the first half stages and the linkages according to the inputs at each stage. For stage 0, the switches can be set such that both the upper and the lower outputs are composed of (c000, c001, c010, c011, c100, c101, c110, and c111) when bit  $b_3$  is disregarded, where c is the don't care bit. For stage 1, each block can be decomposed independently; similarly, each block consists of (cc00, cc01, cc10, and cc11) when bits  $b_3$  and  $b_2$  are disregarded. Finally, for stage 2, there are eight subblocks, each consisting of (ccc0 and ccc1) when bits  $b_3$ ,  $b_2$  and  $b_1$  are disregarded. At each stage of  $\langle \text{Table 1} \rangle$ , 0 and 1 mean the straight and the cross connection, respectively.

Step 2: In the middle stages.

From Theorem 2.1 and the procedure for the middle stages, the SE's setting at stage 3 is shown in  $\langle \text{Table 2} \rangle$ .

 $\langle \text{Table 2} \rangle$  The SE's setting in the middle stages for N=16

| SE's     | 0       | 1(d)     | 2(d)     | 3       |
|----------|---------|----------|----------|---------|
| upper    | 6,cc10  | 3,cc11   | 5,cc01   | 1,cc01  |
| lower    | 9,cc01  | 10,cc10  | 4,cc00   | 14,cc10 |
| stg.3    | 0       | 1        | 0        | 0       |
| **SE's** | **4**   | **5(d)** | **6(d)** | **7**   |
| upper    | 11,cc11 | 8,cc00   | 2,cc10   | 12,cc00 |
| lower    | 0,cc00  | 13,cc01  | 7,cc11   | 15,cc11 |
| stg.3    | 1       | 0        | 1        | 1       |

\* d is don't care group
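The conditions of Theorem 2.1 can be checked against the example as a sanity test. The sketch below rests on two assumptions that are not stated explicitly in the text: the upper and lower rows of Table 2 are read as the inputs of each stage-3 SE, with a setting of 1 exchanging them, and the SE offsets in the theorem are read as $2^{\alpha-2}$, $2^{\alpha-1}$ and $3\cdot2^{\alpha-2}$. All names are hypothetical.

```python
TABLE2_INPUTS = [(6, 9), (3, 10), (5, 4), (1, 14),
                 (11, 0), (8, 13), (2, 7), (12, 15)]
TABLE2_SETTING = [0, 1, 0, 0, 1, 0, 1, 1]     # 0 = straight, 1 = cross


def upper_outputs(inputs, setting):
    """Upper output of each SE after applying its setting."""
    return [pair[s] for pair, s in zip(inputs, setting)]


def theorem_2_1_holds(upper, alpha, beta):
    """Check the four complement conditions for one value of beta."""
    q, h = 2 ** (alpha - 2), 2 ** (alpha - 1)

    def b(se, i):
        return (upper[se] >> i) & 1

    return ((b(beta, 0) ^ b(beta + q, 0)) == 1
            and (b(beta, 1) ^ b(beta + h, 1)) == 1
            and (b(beta + h, 0) ^ b(beta + 3 * q, 0)) == 1
            and (b(beta + q, 1) ^ b(beta + 3 * q, 1)) == 1)


upper = upper_outputs(TABLE2_INPUTS, TABLE2_SETTING)
print(upper)
print([theorem_2_1_holds(upper, alpha=3, beta=beta) for beta in (0, 1)])
# Property 2.3: the two lsb's over the even SE's and over the odd SE's.
print(sorted(u & 3 for u in upper[0::2]), sorted(u & 3 for u in upper[1::2]))
```

Under these assumptions both values of $\beta$ satisfy the four conditions, and the two lsb's of the upper outputs take each of 00, 01, 10 and 11 exactly once over the even SE's and once over the odd SE's.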

Step 3: In the last half stages.

Once the permutation leaving the middle stage is free of conflicts, the control bit position is fixed sequentially from bit  $b_0$  at stage 5 to bit  $b_3$  at stage 8. In  $\langle \text{Table 3} \rangle$ , the connections are given by the *con\_bit* (0 is the upper output, and 1 is the lower output).

Finally, the nine-stage reverse shuffle/exchange network in (Fig. 6) for  $N=16$  shows the routing of the example permutation  $p$  obtained from the results of  $\langle \text{Table 1} \rangle$ ,  $\langle \text{Table 2} \rangle$  and  $\langle \text{Table 3} \rangle$ .

 $\langle$ Table 3 $\rangle$  The SE's setting in the last half stages for N=16

| : S8's :    | 0 | 1 | nceil <b>y</b> arise |   |
|-------------|---|---|----------------------|---|
| stg.4       | 0 | 0 | 1                    | 1 |
| stg.5       | 1 | 0 | 1                    | 1 |
| stg.6       | 0 | 0 | 1                    | 0 |
| stg.7       | 0 | 1 | 0                    | 1 |
| <b>96</b> % | 4 |   | <b>.6.</b>           | 7 |
| stg.4       | 0 | 0 | 1                    | 1 |
| stg.5       | 0 | 0 | 0                    | 1 |
| stg.6       | 1 | 0 | 1                    | 1 |
| stg.7       | 0 | 0 | 0                    | 0 |

#### 4. Conclusions

This paper presents a new permutation routing method for rearrangeability on the reverse shuffle/exchange multistage interconnection network. It achieves the same upper bound as  $3\log N - 3$  stages[5] for  $N = 16$ . However, the algorithm for arbitrary permutations on the network is totally different. In order to satisfy the symmetric structure of the theoretical lower bound for rearrangeability, it is necessary to emulate it through the middle stages. Thus, a simple method for understanding rearrangeability with  $2\log N$  stages on the reverse shuffle/exchange network ( $N \le 16$ ) can be realized. This result indicates the possibility of establishing a new upper bound for  $N > 16$ . We are currently studying the problem of proving that eleven stages are sufficient for the rearrangeability of a reverse shuffle/exchange network. This work will improve the upper bound for  $N > 16$  if it is proven.

(Fig. 6) The example of routing permutation

#### References

- [1] S. Andresen, "The looping algorithm extended to base 2<sup>t</sup> rearrangeable switching networks," *IEEE Trans. Commun.*, vol. COM-25, no. 10, pp. 1057-1063, Oct. 1977.
- [2] K. E. Batcher, "Sorting networks and their applications," *Proc. AFIPS Spring Joint Comput. Conf.*, vol. 32, pp. 307-314, 1968.
- [3] C. Clos, "A study of non-blocking switching networks," *Bell Syst. Tech. J.*, vol. 32, pp. 406-424, Mar. 1953.
- [4] S. T. Huang and S. K. Tripathi, "Finite state model and compatibility theory: New analysis tools for permutation networks," *IEEE Trans. Comput.*, vol. C-35, pp. 12-27, July 1986.
- [5] K. Y. Lee, "On the rearrangeability of a (2logN-1) stage permutation network," *IEEE Trans. Comput.*, vol. C-34, pp. 412-425, May 1985.
- [6] B. L. Menezes and U. Bakhru, "New bounds on the reliability of augmented shuffle-exchange networks," *IEEE Trans. Comput.*, vol. 44, no. 1, pp. 123-129, Jan. 1995.
- [7] D. C. Opferman and N. T. Tsao-Wu, "On a class of rearrangeable switching networks, Part I: Control algorithm," *Bell Syst. Tech. J.*, vol. 50, pp. 1579-1600, 1971.
- [8] D. S. Parker, "Notes on shuffle/exchange-type switching networks," *IEEE Trans. Comput.*, vol. C-29, pp. 213-222, Mar. 1980.
- [9] C. S. Raghavendra and A. Varma, "Rearrangeability of the 5-stage shuffle/exchange network for N=8," *IEEE Trans. Commun.*, vol. COM-35, pp. 808-812, Aug. 1987.
- [10] D. J. Shyy and C. T. Lea, "Rearrangeable non-blocking networks," *IEEE Trans. Commun.*, vol. 42, no. 5, pp. 2084-2086, May 1994.
- [11] H. J. Siegel, "The universality of the shuffle-exchange type permutation networks," in *Proc. 10th Ann. Symp. Comp. Architecture*, pp. 70-79, Mar. 1977.
- [12] H. S. Stone, "Parallel processing with the perfect shuffle," *IEEE Trans. Comput.*, vol. C-20, pp. 153-161, Feb. 1971.
- [13] A. Varma and C. S. Raghavendra, "Rearrangeability of multistage shuffle/exchange network," *IEEE Trans. Commun.*, vol. 36, no. 10, pp. 1138-1147, Oct. 1988.
- [14] A. Waksman, "A permutation network," *J. Ass. Comput. Mach.*, vol. 15, pp. 159-163, Jan. 1968.
- [15] C. L. Wu and T. Feng, "The universality of the shuffle-exchange network," *IEEE Trans. Comput.*, vol. C-30, pp. 324-332, May 1981.



박 병 수

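1986: Graduated from the Department of Electronic Engineering, Hanyang University

1989: Graduated from the Department of Electronic Engineering, Graduate School, Hanyang University (M.S. in Engineering)

1994: Texas A&M Univ., Dept. of Electrical Eng. (Ph.D. in Engineering)

1994-1995: Senior Researcher, Hyundai Electronics Co., Ltd.

1995-present: Assistant Professor, Department of Information Science, Sangmyung University

Research interests: interconnection algorithms, performance analysis, ATM switches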