# EFFICIENT BIST SCHEMES FOR RNS DATAPATHS

D. G. Nikolos<sup>1</sup>, D. Nikolos<sup>1,2</sup>, H. T. Vergos<sup>1,2</sup>, and C. Efstathiou<sup>3</sup>

<sup>1</sup>Computer Engineering and Informatics Dept., University of Patras, 26500 Patras, Greece <sup>2</sup>Computer Technology Institute, 3 Kolokotroni Str., 26221 Patras, Greece <sup>3</sup>Informatics Dept., TEI of Athens, Ag. Spyridonos St., 12210 Egaleo, Athens, Greece.

# ABSTRACT

It has recently been shown that accumulators can be used efficiently for test pattern generation as well as for test response compaction. In this paper we present a BIST scheme for accumulators where the accumulator is simultaneously used as a test pattern generator and a response compactor during its own testing. We also show that the proposed BIST scheme is especially suitable for accumulator, adder and multiplier-accumulator RNS channels leading to minimal hardware overhead and short test sequences

# 1. INTRODUCTION

Built-In Self-Test (BIST) [1] is an effective approach for testing contemporary integrated circuits (ICs) that reduces the need for external test, since the circuit and its tester are implemented in the same chip, enabling the circuit to test itself. The main components of a BIST scheme are: the Test Pattern Generator (TPG) that produces the test patterns applied to the Circuit Under Test (CUT), the Test Response Compactor (TRC) that compacts the responses of the CUT to a single pattern called signature and the BIST controller which generates the necessary control signals and compares the signature of the CUT with the expected one. Minimal test applications. Also, complete (100%) fault coverage is often desirable. In this paper, we present an efficient BIST scheme for Residue Number System (RNS) accumulator, adder and multiplieraccumulator channels.

The RNS has been widely investigated and used in Digital Signal Processing (DSP) applications [2][3]. A set of L moduli, suppose  $(m_1, m_2, \ldots, m_L)$ , that are pair-wise relative prime is used to define a RNS. Any integer X, with  $0 \le X < M$ , where  $M = m_1 \times m_2 \ldots \times m_L$ , has a unique representation in the RNS given by the L-tuple of residues  $X = (x_1, x_2, \ldots, x_L)$ , where  $X_i = X \mod m_i$ . A two operand RNS operation, suppose \*, is defined as  $(z_1, z_2, \ldots, z_L) = (x_1, x_2, \ldots, x_L) * (y_1, y_2, \ldots, y_L)$ , where  $z_i = (x_i * y_i) \mod m_i$ . In most RNS applications \* is either addition, subtraction or multiplication. According to the above, each residue of the result can be computed independently of the others allowing fast data processing in L parallel independent channels.

The latency of an RNS operation depends on the latency of the slowest among the channels. The delay of an adder modulo mis greater when  $m \neq 2^n$ . Efficient designs for the RNS channels have been recently proposed in [4] and [5] for m of the form  $2^n - 1$ . The authors of [4] propose parallel-prefix design methodologies for modulo  $2^n - 1$  adders that lead to implementations that can operate as fast as modulo  $2^n$  adders. Modulo  $2^n - 1$  modified Booth multipliers that can operate as fast as the corresponding integer multipliers were introduced in [5]. According to the above, RNS channels based on moduli of the form  $2^n, 2^n - 1, 2^k - 1, 2^l - 1, \ldots$ , with l < k < n, lead to the fastest implementations. In the most cases three channels are adequate. For this reason, we focus to RNS with moduli of the  $(2^n, 2^n - 1, 2^{n-1} - 1)$  form.

The already known BIST schemes which can be applied to an accumulator impose unacceptably large hardware overhead. Figures 1 (a) and (b) respectively present an accumulator and a BIST scheme for it, according to which test patterns are generated by two LFSRs and test response compaction takes place in the MISR. The hardware overhead of this scheme consists of the hardware required for registers A and C to function respectively as an LFSR and MISR in test mode, and the hardware for the implementations of LFSR B and the multiplexer MUX. As we will show in the last section this hardware overhead is too large. It has been shown that a MISR can also be used for test pattern generation. In the BIST scheme of figure 2 (a), one part of a test vector applied to the adder is generated by LFSR A and the other by MISR C [6]. The hardware overhead of this scheme is significantly lower than the previous one but still high. The deterministic BIST scheme proposed in [7] is shown in figure 2 (b). Although suitable for RNS adders, it leads to high hardware overhead in the accumulator case.



Figure 1: (a) Typical Accumulator (b) LFSR Based BIST scheme

Because of the high integration capabilities of the current manufacturing technologies the misconception that hardware is cheap has been spread. Given that 20% increase of the chip area implies a 50% increase of cost [8], we perceive that the hardware overhead reduction remains a significant target. To this end, we will give minimal hardware overhead BIST schemes for accumulator, adder and multiplier-accumulator RNS channels.

It was shown that an accumulator can be used efficiently as a Test Pattern Generator (TPG) [9] as well as a Test Response Compactor (TRC) [10][11]. The two BIST schemes we present in this paper, are based on using the accumulator simultaneously as a test pattern generator and a response compactor during its own testing. The first of the proposed schemes has the minimal hardware overhead and offers 100% fault coverage in nearly all cases. The second one sacrificing slightly larger hardware achieves complete fault coverage with shorter test sequences than the first scheme.



Figure 2: (a) MISR based BIST scheme (b) Deterministic BIST

## 2. LOW HARDWARE OVERHEAD BIST SCHEMES

Figure 3 presents the BIST scheme with the minimal hardware overhead. We will call this the Minimal Hardware Overhead (MinHO) BIST scheme.



Figure 3: MinHO BIST scheme for accumulators

In test mode, at every clock cycle, the adder receives a test vector consisting of two parts; part A is the output of an LFSR while part B is the sum of all the previously received input vectors. At the end of testing register C contains the signature. We have to consider :

- the maximum length of the test sequence generated at the inputs of the adder
- the pre-compaction fault coverage that can be achieved
- the post compaction fault coverage (PCFC).

## 2.1. Test Sequence Length

We consider that the LFSR is characterized by a primitive polynomial of degree n. It is well known that

$$||x+y|_{m} + z|_{m} = |x+y+z|_{m}, \qquad (1)$$

where  $|a|_m = a \mod m$ .

Let  $A_i$  and  $B_i$  denote the LFSR state and the contents of register C at clock cycle *i* respectively. Since the LFSR implements a primitive polynomial, at the first  $2^n - 1$  clock cycles it will generate all distinct non-zero  $2^n - 1$  states. Therefore, the first  $2^n - 1$ input vectors of the adder will be distinct, irrespectively of whether the adder is modulo  $2^n$  or modulo  $2^n - 1$ .

The content of register C at clock cycle k is  $B_k = \left| \sum_{i=1}^{k-1} A_i \right|_m$ , where m is  $2^n$  or  $2^n - 1$  depending on the RNS channel examined.

Considering the  $2^n - 1$  distinct LFSR states as decimal numbers we get the first  $2^n - 1$  terms of an arithmetic series with common difference equal to one. Taking into account the sum of the

first  $2^n - 1$  terms of an arithmetic series with common difference one, as well as the relation (1), we get

$$\mathsf{B}_{2^{n}} = \left| \frac{(1+2^{n}-1)(2^{n}-1)}{2} \right|_{m} = \left| 2^{n-1}(2^{n}-1) \right|_{m}.$$
 (2)

At first we will study a modulo  $2^n$  adder. Since

$$2^{n-1}(2^n - 1) = 2^{2n-1} - 2^{n-1} = 2^{n-1} \cdot 2^n - 2 \cdot 2^{n-1} + 2^{n-1}$$
$$= 2^n(2^{n-1} - 1) + 2^{n-1},$$

from (2) we get that  $B_{2^n} = |2^n(2^{n-1}-1) + 2^{n-1}|_{2^n} = |2^{n-1}|_{2^n}$ . Therefore,  $B_{2^n} \neq 0$ . Since  $B_1 = 0$  we conclude that the first and the  $2^n$ th input vectors are distinct. For  $1 \leq k \leq 2^n - 1$ , we know that,

$$A_{2^{n}-1+k} = A_{k},$$

$$B_{k} = \left| \sum_{i=1}^{k} A_{i} \right|_{2^{n}},$$

$$B_{2^{n}-1+k} = \left| 2^{n-1} + \left| \sum_{i=1}^{k} A_{i} \right|_{2^{n}} \right|_{2^{n}}$$

and  $B_1 = 0$ . Therefore for  $k < 2^n - 1$  we get  $B_{2^n-1+k} \neq B_k$ and for  $k = 2^n - 1$  we get,

$$\left|\sum_{i=1}^{k} A_{i}\right|_{2^{n}} = \left|\frac{(1+2^{n}-1)(2^{n}-1)}{2}\right|_{2^{n}} = \left|2^{n-1}\right|_{2^{n}}.$$
 (3)

Hence,  $B_{2^{n+1}-2} = |2^{n-1} + 2^{n-1}|_{2^n} = |2^n|_{2^n} = 0 = B_1$ . Since  $A_{2^{n+1}-1} = A_1$  we conclude that the modulo  $2^n$  adder at clock cycle  $2^{n+1} - 1$  will receive a test vector which it was already received. Considering the circuit of figure 3 as an FSM with its internal state determined by the contents of LFSR A and Register C we can see that each state explicitly defines the next one. Therefore, the machine enters a loop and it will never reach a state that has not been reached before. We can see that only a small subset of all possible test vectors can be applied to the modulo  $2^n$  adder.

Now we will study the modulo  $2^n - 1$  adders, that is,  $m = 2^n - 1$ . Relation (2) implies  $B_{2^n} = |2^{n-1}(2^n - 1)|_{2^{n-1}}$ . In 1s complement arithmetic zero has two representations, the all 0s and the all 1s representation. Since the all 0s representation is produced only when all 0s vectors are added, the  $B_{2^n}$  has the all 1s representation. Hence, although  $B_{2^n}$  and  $B_1$  have the same value they are two different vectors.  $B_{2^n+1}$  and  $B_2$  are identical, the same is valid for  $A_2$  and  $A_{2^n+1}$ , hence the modulo  $2^n - 1$  adder receives  $2^n$  different vectors. As we can see a small subset of all possible test vectors can be applied to the modulo  $2^n - 1$  adder.

#### 2.2. Pre-compaction Fault Coverage

For determining the pre-compaction fault coverage, for the modulo  $2^n$  channel we simulated parallel-prefix Ladner-Fischer adders, while for a modulo  $2^n - 1$  and  $2^{n-1} - 1$  we simulated the parallel prefix adders of [4].

We found out that the attained fault coverage and the test sequence lengths depend heavily on the chosen primitive polynomial. Polynomials with many terms give better results than polynomials with a small number of terms. Polynomials with terms spread over the range of the polynomial give also better results than polynomials with the same number of terms concentrated in a region.

#### 2.3. Post-Compaction Fault Coverage (PCFC)

With extensive simulations we found out that the PCFC is near to 100% in all cases except the modulo  $2^n$  adders. However, the required test sequences are significantly longer than the test sequences of the LFSR or MISR based BIST schemes.

Modulo  $2^n$  adders are known to suffer more from aliasing compared to modulo  $2^n - 1$  adders or stored carry bit adders [10][11]. The stored carry bit adder is a modulo  $2^n$  adder with its carry output added as a carry input during the next clock cycle. It is easy to see that the stored carry bit adder receives  $2^n + 1$  different input patterns. Using the stored carry bit adder the achieved post-compaction fault coverage becomes 100%.

In the case of the modulo  $2^8 - 1$  and modulo  $2^7 - 1$  adders from Table 1 we can see that a test sequence longer than the maximum test sequence (256 and 127 vectors respectively) is required in order to achieve a fault coverage approaching 100%. This could not happen in a case that the test response compactor is a circuit distinct from the CUT. However in our case the CUT is also used as a test pattern generator and response compactor. Hence, the appearance of an error due to a fault, at the output of the adder influences the test vectors that will be applied to the adder during the next clock cycles.

The PCFC of the MinHO BIST scheme is given in Table 1. We can see that the required test sequences are significantly longer than the test sequences of the LFSR or MISR based BIST schemes.



Figure 4: Partially Inverted Feedback BIST scheme for accumulators

In figure 4 we present a BIST scheme, hereafter called Partially Inverted Feedback (PIF) BIST scheme. PIF requires slightly larger hardware overhead than the MinHO BIST scheme as we will show in the next section, but it achieves complete fault coverage with short test sequences.

In the PIF BIST scheme we can see that in test mode some of the bits of Register C are inverted before they are applied as a part of a test vector to the adder in test mode. The selection of the bits to be inverted is a design parameter. In our experiments this selection was made in a random manner.

The application of MinHO as well as the PIF BIST scheme to a MAC is evident. Figure 5 presents the way that PIF BIST scheme can be used in the case of an adder. Removing from the block diagram of figure 5 the block "INV module" we get the application of the MinHO BIST scheme. The INV modules has NOT gates instead of XOR gates.



Figure 5: Partially inverted feedback BIST scheme for adders

## 3. EVALUATION AND COMPARISONS

In this section we compare the MinHO and PIF BIST schemes against the LFSR and MISR based BIST schemes as well as against the embedded BIST scheme.

We examined parallel prefix modulo  $2^n$ , modulo  $2^n - 1$  and modulo  $2^{n-1} - 1$  accumulator (ACC), adder and multiplier- accumulator channels. Modified Booth multipliers with Wallace tree partial product reduction were assumed in every case [5]. For the modulo  $2^n$  parallel prefix adders Ladner-Fischer (LF) carry computation unit was used. The modulo  $2^n - 1$  and modulo  $2^{n-1} - 1$ adders have been designed according to [4]. We have examined the n = 8 and n = 16 cases.

For our comparisons, we use as metrics:

- the test application time in number of test vectors
- the post-compaction fault coverage attained and
- · the area overhead imposed by each BIST scheme

The procedure we followed for our experiments is the following. For every circuit and every technique 40 LFSR seeds were randomly generated. In the case of the PIF BIST scheme technique, 40 combinations of seeds and position of bits to be inverted were randomly generated. For each of these cases we used 4 LFSR primitive polynomials. For the 160 different cases that we produced, the sequence applied to the circuit was generated, and pre-compaction fault coverage was calculated. For the best 4 attained results out of the 160 simulated pre-compaction results, the post-compaction fault coverage was calculated. The best post compaction result out of these 4 is shown in Table 1. Each entry in Table 1, is an ordered triplet. The first element is for the modulo  $2^n - 1$ and modulo  $2^{n-1} - 1$  channel respectively.

From Table 1 we can see that the LFSR based BIST scheme achieves 100% PCFC with the shortest test vector sequences while the MISR based BIST scheme, as well as the PIF achieve 100% PCFC in almost all cases with slightly longer test sequences. The MinHO BIST scheme in most cases achieves 100% PCFC but with significantly longer test vector sequences.

A RNS accumulator, adder or MAC is usually embedded in a larger circuit, hence its inputs and outputs are not accessible by the primary inputs and outputs of the chip. For applying a test set in such embedded circuits one usually relies on scan paths. Therefore, for evaluating the hardware overheads imposed by the different BIST schemes we use as a basis a scheme in which the flip-flops of the input and output registers of the RNS channel are chained together in a single scan path.

For getting realistic measures of the area overhead we described the examined RNS adders in HDL and used the Synopsys<sup>®</sup> tools driven by the UMC VST 25 implementation technology (0.25  $\mu$ m, up to 5-metal layers, 1.8 / 3.3 V) for our implementations. Our targeted operating frequency was set to 300 MHz for

|                                                                          | LFSR Based    | MISR Based        | Deterministic | MinHO               | PIF             |  |  |  |  |
|--------------------------------------------------------------------------|---------------|-------------------|---------------|---------------------|-----------------|--|--|--|--|
| $\langle 2^8, 2^8 - 1, 2^7 - 1 \rangle$ RNS Accumulators / Adders        |               |                   |               |                     |                 |  |  |  |  |
| PCFC(%)                                                                  | (100,100,100) | (100,100,100)     | (100,100,100) | (100,99.41,99.33)   | (100,100,100)   |  |  |  |  |
| Test Cycles                                                              | (21,59,47)    | (26,65,111)       | (80,80,80)    | (103,371,410)       | (44,80,63)      |  |  |  |  |
| $(2^{16}, 2^{16}, 1, 2^{15}, 1)$ RNS Accumulators / Adders               |               |                   |               |                     |                 |  |  |  |  |
| PCFC(%)                                                                  | (100,100,100) | (100,100,100)     | (100,100,100) | (100,100,100)       | (100,100,100)   |  |  |  |  |
| Test Cycles                                                              | (56,579,387)  | (60,1102,709)     | (288,288,288) | (1696,6230,6319)    | (81,745,610)    |  |  |  |  |
| $\langle 2^8, 2^8-1, 2^7-1 \rangle$ RNS Multiplier-Accumulators          |               |                   |               |                     |                 |  |  |  |  |
| PCFC(%)                                                                  | (100,100,100) | (100,99.91,99.80) | Cannot Apply  | (100,99.96,99.95)   | (100,100,99.90) |  |  |  |  |
| Test Cycles                                                              | (50,308,562)  | (69,696,81)       | Cannot Apply  | (1460,2493,2834)    | (77,310,128)    |  |  |  |  |
| $\langle 2^{16}, 2^{16}-1, 2^{15}-1 \rangle$ RNS Multiplier-Accumulators |               |                   |               |                     |                 |  |  |  |  |
| PCFC(%)                                                                  | (100,100,100) | (100,100,100)     | Cannot Apply  | (100,99.95,99.97)   | (100,100,100)   |  |  |  |  |
| Test Cycles                                                              | (127,821,959) | (110,1106,1110)   | Cannot Apply  | (12070,17191,18575) | (159,1260,869)  |  |  |  |  |

Table 1: Fault Coverage and Test Times

|                                                  | LFSR Based | MISR Based   | Deterministic | MinHO  | PIF    |  |  |  |  |
|--------------------------------------------------|------------|--------------|---------------|--------|--------|--|--|--|--|
| RNS Accumulators                                 |            |              |               |        |        |  |  |  |  |
| $\langle 2^8, 2^8 - 1, 2^7 - 1 \rangle$          | 54.97%     | 14.97%       | 65.88%        | 14.25% | 11.63% |  |  |  |  |
| $\langle 2^{16}, 2^{16} - 1, 2^{15} - 1 \rangle$ | 53.80%     | 17.00%       | 61.20%        | 4.23%  | 13.15% |  |  |  |  |
| RNS Adders                                       |            |              |               |        |        |  |  |  |  |
| $\langle 2^8, 2^8 - 1, 2^7 - 1 \rangle$          | 20.56%     | Cannot Apply | 35.29%        | 11.07% | 6.28%  |  |  |  |  |
| $\langle 2^{16}, 2^{16} - 1, 2^{15} - 1 \rangle$ | 9.42%      | Cannot Apply | 19.83%        | 0.29%  | 5.35%  |  |  |  |  |
| RNS Multiplier-Accumulators                      |            |              |               |        |        |  |  |  |  |
| $\langle 2^8, 2^8 - 1, 2^7 - 1 \rangle$          | 25.40%     | 10.78%       | Cannot Apply  | 4.16%  | 6.75%  |  |  |  |  |
| $\langle 2^{16}, 2^{16} - 1, 2^{15} - 1 \rangle$ | 15.79%     | 8.36%        | Cannot Apply  | 2.52%  | 3.67%  |  |  |  |  |

Table 2: Area Comparisons

accumulators and 50 MHz for multiplier accumulators, for typical process parameters.

The comparison results are presented in Table 2. As we can see the MinHO and the PIF BIST schemes require the least hardware overhead. The reason that the hardware overhead of the MinHO technique is larger than that of PIF for n = 8 is that the test vector sequence length in this case is larger than the LFSR cycle length. This means that a comparator that compares the LFSR state with the stored LFSR last state is not enough. Therefore, we need larger control unit in order to detect the last test vector. The MISR based technique has not been used in adders since it requires as much hardware overhead as the LFSR based BIST scheme and does not achieve as good PCFC.

## 4. CONCLUSIONS

In this paper we presented two BIST schemes, MinHO and PIF, which require minimal hardware overhead. In the case of MinHO, we have given the criteria for the selection of the LFSR so as near to 100% (in most cases 100%) post compaction fault coverage to be obtained. We have shown also that the PIF BIST scheme with hardware overhead slightly larger than that of MinHO achieves 100% post compaction fault coverage with test sequence lengths comparable to those of the LFSR based BIST scheme.

# 5. REFERENCES

 M. Abramovici, M. A. Breuer and A. D. Friedman, Digital Systems Testing and Testable Design, *Computer Science Press*, New York, NY, 1990.

- [2] E. D. Di Claudio, et. al., "Fast Combinatorial RNS Processors for DSP Applications", *IEEE Trans. Comp.*, Vol. 44, No. 5, pp. 624-633, May 1995.
- [3] V. Paliouras and T. Stouraitis, "Multifunction Architectures for RNS Processors", *IEEE Trans. Circ. Syst. II*, Vol. 46, No. 8, pp. 1041-1054, Aug. 1999.
- [4] L. Kalampoukas et. al., "High Speed Parallel-Prefix Modulo 2<sup>n</sup> - 1 adders", *IEEE Trans. Comp.*, Vol. 49, No. 7, pp. 673-680, July 2000.
- [5] C. Efstathiou and H. T. Vergos, "Modified Booth 1's Complement and Modulo 2" - 1 Multipliers", *ICECS '2K*, Vol. II, pp. 637-640, Dec. 2000.
- [6] K. Kim, et. al., "On Using Signature Registers as Pseudorandom Test Pattern Generators in Built-In Self-Testing" *IEEE Trans. CAD*, Vol. 7, No. 8, pp. 919-928, Aug. 1998.
- [7] H. T. Vergos et. al., "Deterministic BIST for RNS adders", *IEEE Trans. Comp.*, to appear.
- [8] M. G. Walker, "Modeling the wiring of deep submicron ICs", *IEEE Spectrum*, pp. 65-71, March 2000.
- [9] Al. Stroele "BIST Pattern Generators Using Addition and Substraction Operators", JETTA 11, pp. 69-80, 1997.
- [10] J. Rajski and J. Tyszer, "Test Responses Compaction in Accumulators with Rotated Carry Adders", *IEEE Trans. CAD*, Vol. 12, No. 4, pp. 531-539, April 1993.
- [11] J. Rajski and J. Tyszer, "Accumulator Based Compaction of Test Responses", *IEEE Trans. Comp.*, Vol. 42, No. 6, pp. 643-650, June 1993.