# Design of An Efficient (*m*, 3)-Adder Circuit for Implementation of High-Performance Multipliers

# Myungchul Yoon

Professor, Department of Electronics and Electrical Engineering, Dankook University, Republic of Korea.

## Abstract

The efficient (m, 3)-adder circuits  $(4 \le m \le 7)$  are presented in this paper. An (m, 3)-adder adds m bits of the same weight simultaneously and produces 3 outputs: S, C1, C2 such that the binary number (C2 C1 S) represents the number of 1s in the m bits. The (m. 3)-adders are designed to implement a high performance (7, 3)-adder based multiplier. Using (7, 3)-adders, a multiplier can reduce partial products in fewer number of stages than using (3, 2)-adders. The new (m, 3)-adders are faster, smaller, and consume less power than the existing (m, 3)-adder circuits. A 64-bit radix-4 Booth multiplier is designed with the new (m, 3)-adders to estimate the efficiency of the (m, 3)-adders. The multiplier reduces 33 partial products into two 128-bit binary numbers in three (7, 3)-adder-stages and one (3, 2)adder-stage while the Wallace-tree requires eight stages. In addition, the number of (m, 3)-adders used in the reduction stages is about 1/3 that of (3, 2)-adders used in the Wallace-tree multiplier. Although the (7, 3)-adder has delay 1.7 times larger and consumes power 1.3 times more than a (3, 2)-adder, reduction of the partial products with the (7, 3)-adders is about 24% faster and consumes power 50% less than reducing by the (3, 2)-adders. Through the design of a 64-bit radix-4 Booth multiplier with the new (m, 3)-adders, it is verified that a fast and low-power multiplier can be designed with the efficient (7, 3)-adder.

**Keywords:** High-Speed multiplier, (7, 3)-adder, Multiplier design, Wallace-tree multiplier, (m, 3)-adder circuit.

# I. INTRODUCTION

Binary multiplier is a widely used building block in the design of microprocessors, digital signal processors, and graphic processors so that it is one of the key hardware in most digital signal processing systems. Therefore, various studies have been conducted to implement a high speed multiplier [1]-[6].

The operation of *N*-bit multipliers consists of three main steps: 1) generation of partial products; 2) reduction of the partial products to obtain two 2N-bit binary numbers; 3) carry-propagate addition of the two numbers to obtain the final result. Usually, the step-3 is not a big concern in multiplier design because it can be solved by using today's highly energy-delay optimized carry propagation adders (CPA). Therefore, most

research on the multiplier design is focused on the step-1 and the step-2.

The generation scheme employed in step-1 determines the number of partial products. Since the number of partial products affects delay, power, and size of multipliers, most of studies try to reduce the number of partial products.

The radix-r ( $r = 2^k$ ) modified Booth encoding [4][5] is widely used to reduce the number of partial products. While ordinary  $N \times N$  multipliers generate N partial products, the radix-rmodified Booth multipliers generate [(N + 1)/k] partial products. Radix-4 modified Booth is the most widely used Booth encoding scheme since it can reduce half of the partial products with little overhead for encoding. Higher radix Booth encoding is less popular because generation of partial products requires odd multiples of the multiplicand which requires carry-propagation additions. Although the requirement of CPAs increases delay of the multiplier, higher radix such as radix-8 [7][8] or radix-16 [9] Booth multiplier is used for the purpose of speed-power optimization.

When the number of partial products is determined, the next issue is how to add all the partial products to get the result. Generally, summation of partial products is reduced to a carry-propagation addition of two binary numbers through the reduction step. Wallace-tree [2] is a widely used reduction scheme. Although many reduction schemes are developed so far, most of multipliers use a (3, 2)-adder, usually called full-adder, as the basic unit for addition. Some wide-input adders, called adder compressor [10], are used for multiplier design, but the structure of most adder compressors is basically cascades of (3, 2)-adders.

Recently, the (7, 3)-adder based multiplier design is proposed [11]. An (*m*, 3)-adder ( $4 \le m \le 7$ ) adds *m* bits of the same weight simultaneously and produces 3 outputs: S (sum), C1 (carry-1), and C2 (carry-2). S has the same weight with the inputs while the weights of C1 and C2 are 2 and 4 times greater than the weight of inputs respectively. A (7, 3)-adder based multiplier uses a (7, 3)-adder as the basic unit for the reduction step, and other (*m*, 3)-adders, (3, 2)-adder and (2, 2)-adder, as auxiliary units. While a stage with (3, 2)-adders reduces 1/3 of bits, a stage with (7, 3)-adders can reduce 4/7 of bits so that the partial products are reduced into two binary numbers with much fewer stages.



**Fig. 1.** Critical path for the reduction step of a 64-bit radix-4 unsigned Booth multiplier with Wallace-tree structure.

One of the main problems of the (7, 3)-adder based multiplier is the complexity of (m, 3)-adder circuits. A set of (m, 3)-adders are presented in [11], and it is shown that the multiplier built with those adders can faster than the (3, 2)-adders based multiplier. However those (m, 3)-adders are bulky so that the size and the power consumption of the multiplier become much larger than that.

In this paper, new (m, 3)-adder  $(4 \le m \le 7)$  circuits are presented. The new (m, 3)-adders are smaller, faster, and consume less power than the (m, 3)-adders in [11]. A 64-bit radix-4 Booth multiplier is designed to verify their efficiency. The multiplier built with the new (m, 3)-adders is faster and consumes less power than the (3, 2)-adder based multiplier.

In Section II, the advantages and disadvantages of the (7, 3)adder based multiplier are compared with high radix Booth multipliers. The newly designed (m, 3)-adder circuits are described in Section III. Simulations results for the circuits are presented in Section IV, and Section V concludes this paper.

# II. THE (7, 3)-ADDER BASED MULTIPLIER vs. THE BOOTH MULTIPLIER

Recently the (7, 3)-adder based multiplier is suggested as an alternative to the (3, 2)-adder based multipliers. The main advantage of using (7, 3)-adders is that it can reduce the partial products more rapidly than the (3, 2)-adder. In the reduction step, one stage with (3, 2)-adder reduces 1/3 of the number of bits in the same digit while a stage with (7, 3)-adders can reduce 4/7 of them, therefore, the reduction rate of (7, 3)-adder is about double the (3, 2)-adder's. The critical path of the reduction step for the 64-bit radix-4 Booth unsigned multiplier employing the Wallace tree and a (7, 3)-adder based tree are shown in Fig. 1 and Fig. 2 respectively. As we can see in Fig. 1 and Fig. 2, 33 partial products can be reduced by three (7, 3)-adder stages and



**Fig. 2.** Critical path for the reduction step of a 64-bit radix-4 unsigned Booth multiplier with (7, 3)-adder based structure.

one (3, 2)-adder stage while 8 stages are required when only (3, 2)-adders are used.

As in Fig. 2, the number of bits in a digit is reduced by half with one stage of (7, 3)-adders, of which the effect is the same as quadrupling the radix of Booth multiplier. Therefore, inserting one (7, 3)-adder stage results in almost the same effect of employing 4 times higher radix Booth algorithm. This property can be used for the speed-power or the speed-size optimization in (7, 3)-adder based Booth multiplier design. The delay and complexity of the circuit for generating partial products increase rapidly along the radix of the Booth algorithm. The speed-size optimized multiplier can be designed by comparing the delay and hardware overheads for a stage of (7, 3)-adder to those in quadrupling the radix of a Booth multiplier.

Radix-4 Booth multiplier is the most effective because it removes a stage of (7, 3)-adders with little overhead in delay and hardware. A radix-16 Booth multiplier requires to produce  $\times 3$ ,  $\times 5$  and  $\times 7$  multiples of the multiplicand which requires carry propagation adders (CPAs). Since the speed of a (7, 3)adder is much faster than that of the CPA, a radix-16 Booth multiplier is slower than a (7, 3)-adder based radix-4 Booth multiplier. Therefore, the (7, 3)-adder based radix-4 Booth multiplier is the best option for the speed-oriented multiplier design.

# III. DESIGN OF A 64-BIT RADIX-4 BOOTH MULTIPLIER WITH EFFICIENT (m, 3)-ADDER CIRCUITS

The most critical part in the (7, 3)-adder based multiplier is the design of efficient (m, 3)-adders. Because the delay, power, and size of an adder increase rapidly with the number of inputs and most of the adders used in (7, 3)-adder based multipliers are the



**Fig. 3.** Circuit diagram of the new (m, 3)-adders. (a) Common structure of the (m, 3)-adder circuit. (b) The Early-Signal Follower (ESF) circuit. (c) The tree circuit of (4, 3)-adder. (d) The tree circuit of (5, 3)-adder. (e) The tree circuit of (6, 3)-adder. (f) The tree circuit of (7, 3)-adder.

**Table 1.** Comparison of the number of components used to build a radix-4 64-bit unsigned number Booth multiplier with Wallace-tree and (7, 3)-adder based design

|             |        | Wallace-Tree | (7, 3) based |  |
|-------------|--------|--------------|--------------|--|
|             | (2, 2) | 67           | 26           |  |
| # of adders | (3,2)  | 2021         | 218          |  |
|             | (4, 3) | -            | 95           |  |
|             | (5,3)  | -            | 100          |  |
|             | (6, 3) | -            | 102          |  |
|             | (7, 3) | -            | 378          |  |
|             | CPA    | 119-bit      | 120-bit      |  |
| # of stages | (7, 3) |              | 3            |  |
|             | (3,2)  | 8            | 1            |  |
|             | CPA    | 1            | 1            |  |

(7, 3)-adder, the design of the (7, 3)-adder determines the speed, power, and size of the multiplier.

It is natural that a (7, 3)-adder is slower, larger, and consumes more power than a (3, 2)-adder. However, the number of (7, 3)adders used to build a multiplier is much less than the number of (3, 2)-adders in a Wallace-tree multiplier, so that a (7, 3)adder based multiplier could be faster, smaller, and lower power than a (3, 2)-adder based multiplier.

The possibility of a faster (7, 3)-adder based multiplier is verified in [11]. However the (7, 3)-adder presented in [11] is bulky and consumes large power so that the multiplier built with the adders is bigger and consumes more power than the Wallace multiplier, though the number of (7,3)-adders in the multiplier is 1/4 the number of (3, 2)-adders required to build a Wallace multiplier.

A lower-power, smaller, and faster (m, 3)-adder than the (m, 3)adders in [11] is designed. The newly designed (m, 3)-adders are drawn in Fig. 3. The basic structure for the new (m, 3)adders is a binary tree made with simple NMOS multiplexers. The **0** signal at the vertex of the tree propagates to a leaf of the tree according to the inputs. The position of a leaf stands for the number of 1's in the inputs. A problem of this structure is that propagation speeds of the **0**-signal and the **1**-signal differ by the property of NMOS transistor which results in the difference in the rising delay and the falling delay at the leaves. The difference increases rapidly along the number of inputs. To solve this problem, the newly designed (m, 3)-adder circuits employ a dual tree structure. The one tree is used to increase the falling speed while the other tree is to boost the rising speed.

A 64-bit radix-4 Booth multiplier is designed by using the new (m, 3)-adders. The radix-4 Booth multiplier is chosen because it can reduce 1/2 of the number of partial products with little overhead in speed and hardware. The components required to build a 64-bit radix-4 Booth multiplier with the Wallace-tree structure and with the (7, 3)-adder based tree structure is given in Table 1. Only a half number of reduction stages is required

|               |         |        | ( <i>m</i> , 3)-adder |        |        |        |  |
|---------------|---------|--------|-----------------------|--------|--------|--------|--|
|               |         | (3, 2) | (4, 3)                | (5, 3) | (6, 3) | (7, 3) |  |
| # of Tr.      | Ref.    | 26     | -                     | -      | -      | -      |  |
|               | In [11] | -      | 54                    | 82     | 154    | 216    |  |
|               | New     | -      | 116                   | 138    | 163    | 188    |  |
| Delay<br>(ps) | Ref.    | 280    | -                     | -      | -      | -      |  |
|               | In [11] | -      | 315                   | 381    | 422    | 528    |  |
|               | New     | -      | 335                   | 373    | 405    | 477    |  |
| Power<br>(µW) | Ref.    | 107    |                       |        |        |        |  |
|               | In [11] | -      | 130                   | 343    | 297    | 826    |  |
|               | New     | -      | 105                   | 109    | 116    | 140    |  |

\* Power is measure at 100 MHz frequency.

for the (7, 3)-adder based multiplier, and the number of (m, 3)adders is about 1/3 of the number of (3, 2) adders in the Wallace-tree multiplier. These ratios can be the guideline for the (m, 3)-adder design. The delay of a (7, 3)-adder should be less than twice of a (3, 2)-adder's, and its power and size are less than triple of the (3, 2)-adder's.

After the reduction step, the two 128-bit binary numbers should be added by a CPA to obtain the final result. Among 128 bits, some bits in the lowest digits can be obtained during the reduction steps. The Wallace tree multiplier can reach at the final value up to the 9 lowest digits (two digits in the first stage and one digit per stage in the other stages). While one digit can reach the final value in a (3, 2)-adder stage, two digits per one (7, 3)-adder stage are possible to get the final values, because the carry propagation addition of (2, 2) - (3, 2) or (2, 2) - (4, 3)adders can be done in the delay of a (7, 3)-adder stage.

# **IV. EXPERIMENTS**

The characteristics of newly designed (m, 3)-adders are estimated by simulations and they are compared with those of the existing (m, 3)-adder circuits in [11]. The simulation is performed by HSPICE with 1.2V-0.13µm model parameter.

Comparisons of the newly designed circuits and the circuits in [11] are summarized in Table 2. The delay of an adder is measured by cascading ten identical adders (fanout=1). The delays and power of the circuits are measured for various combination of input changes. The delay and power are taken as the worst case values in the simulations.

It can be found in Table 2 that the newly designed (m, 3)-adder circuits are smaller, faster and consume less power than the adders in [11]. Although (4, 3)-adder in [11] is smaller and faster than the new (4, 3)-adder, the number of (4, 3)-adder is only 14 % of the number of all (m, 3)-adders and the delay of (4, 3)-adder is not important since the speed of a multiplier is determined by the delay of the (7, 3)-adder. Therefore the characteristics of the (7, 3)-adder are the most critical features.

Table 2. Comparison of (m, 3)-adder circuits



Fig. 4. A circuit of the reference (3, 2)-adder (full-adder)

Although the number of transistors in the new (6, 3) and (7, 3)adder is comparable to that in [11], the size of these new adders is smaller than that in [11] due to the difference in the transistor sizes. While the (6, 3) and (7, 3)-adders in [11] are needed to use large size transistors to reduces their delay, the size of NMOS used in the new adders is just 1.5 times of the minimum size transistor.

Although the size of the (7, 3)-adder based multiplier with the new (m, 3)-adders is smaller than the multiplier with the adders in [11], it is still bigger than the (3, 2)-adder based multipliers. Compared to the reference (3, 2)-adder [12] as in Fig. 4, about 7 times more transistors are used to build a (7, 3)-adder circuit. Although the number of (m, 3)-adders needs to build the multiplier is about 1/3 of the number of (3, 2)-adders in the Wallace-multiplier, it cannot avoid increasing the size of multiplier due to the large discrepancy of the transistor count.

Despite of the disadvantage in the multiplier size, the (7, 3)adder based multiplier with the new circuits has advantages in speed and power. Compared with the (3, 2)-adder, the delay of the new (7, 3)-adder is 1.7 times larger than that of the (3, 2)adder. Although the delay of a stage is increases 70%, its effect is overwhelmed by the reduced number of stages. As the result, the reduction process becomes 22% faster by using the new (7, 3)-adder circuits.. Furthermore, the power of the (7, 3)-adder is 1.3 times larger than that of the (3, 2)-adder. Since the number of (m, 3)-adders used in the (7, 3)-adder based 64-bit radix-4 Booth multiplier is about 1/3 of the number of (3, 2)-adders in the Wallace multiplier, a (7, 3)-adder based multiplier with the new (m, 3)-circuits can greatly reduce the power of the multiplier. Considering the numbers in Table 1, using the new (m, 3)-adders saves about 50% of the power consumed in the eight (3, 2)-adder reduction stages.

Consequently, the (m,3)-adder circuits presented in this paper are very efficient in reducing the power and the delay of multipliers.

## V. CONCLUSION

A set of (m, 3)-adder circuits are presented in this paper. The newly designed (m, 3)-adders are smaller, faster, and consume

less power than existing (m, 3)-adder circuits. The (m, 3)adders are useful in the reduction step of a multiplier which reduces the summation of many partial products into an addition of two binary numbers. A 64-bit radix-4 Booth multiplier is designed by employing (7, 3)-adder based design and it is compared with the (3, 2)-adder based Wallace multiplier. By the new (m, 3)-adders, the (7, 3)-adder based design performs the reduction step 24% faster with 50% less power than the Wallace tree implementation. However, the size of the (7, 3)-adder is much bigger than a (3, 2)-adder so that the (7, 3)-based multiplier is larger than Wallace multipliers.

Through the design of a 64-bit radix-4 Booth multiplier with the newly designed (m, 3)-adders, we show that the (7, 3)-adder based multiplier can be faster and consume less power than (3, 2)-adder based multipliers.

#### REFERENCES

- M. C. Park, B. W. Lee, G. M. Kim, and D. H. Kim, 1993, "Compact and Fast Multiplier Using Dual Array Tree Structure", *IEEE Int. Symp. on Circuit and Systems*, Chigago, vol. 3, pp. 1817-1820.
- [2] C. S. Wallace, 1964, "A Suggestion for a Fast Multiplier", *IEEE Trans. Electron. Compt.*, vol. 13, no. 1, pp. 14-17.
- [3] L. Dadda, 1965. Some Schemes for Parallel Multipliers, Alta Frequenza.
- [4] A. D. Booth, 1951, "A Signed Binary Multiplication Technique," *Jour. of Mech. Appl. Math.*, vol. 4, pp. 236-240.
- [5] W. C. Yeh and C. W. Jen, 2000, "High-Speed Booth Encoded Parallel Multiplier Design", *IEEE Trans. on Compt.*, vol. 49, no. 7, pp. 692-701.
- [6] P. Mehta, D. Gawali, 2009, "Conventional versus Vedic mathematical method for Hardware implementation of a multiplier", *IEEE Int. Conf. on Advances in Computing, Control, and Telecommun. Technologies*, pp. 640-642.
- [7] E. M. Schwarz, R. M. A. III, and L. J. Sigal, 1997, "A radix-8 CMOS S/390 multiplier," in *Proc. 13th IEEE Symp. Comput. Arithmetic (ARITH)*, pp. 2–9.
- [8] G. Colon-Bonet and P. Winterrowd, 2008, "Multiplier evolution: A family of multiplier VLSI implementations" *Comput. J.*, vol. 51, no. 5, pp. 585– 594.
- [9] R. Riedlinger *et al.*, 2012, "A 32 nm, 3.1 billion transistor, 12 wide issue itanium processor for missioncritical servers," *IEEE J. Solid-State Circuits*, vol. 47, no. 1, pp. 177–193.
- [10] M. Rao, S. Dubey, 2012, "A high speed and area efficient Booth recoded Wallace tree multiplier for fast arithmetic circuits", Microelectronics and Electronics (PrimeAsia), 2012 Asia Pacific Conference on Postgraduate Research. pp. 220-223.

- [11] M. Yoon, 2019, "Design of A Fast Multiplier with (*m*, 3)-Adders,' IJERT vol. 12, no. 10, pp. 1757-1763.
- [12] Neil H.E. Weste, and David M. Harris, 2010, Kamran, Principles of CMOS VLSI Design: A systems perspective, Pearson, Fourth Edition.