# Low-Swing Clock Domino Logic Incorporating Dual Supply and Dual Threshold Voltages

Seong-Ook Jung University of Illinois Urbana, IL 61801 sojung@uivlsi.csl.uiuc.edu Ki-Wook Kim Pluris Incorporation Cupertino, CA 95014 kkim@pluris.com Sung-Mo Kang University of California Santa Cruz, CA 95064 kang@soe.ucsc.edu

# **ABSTRACT**

High-speed domino logic now prevails in the performance-critical blocks of a chip. A Low Voltage Swing Clock (LVSC) domino logic family is developed for substantial dynamic power saving. To boost the transition speed of the proposed circuitry, a well-established dual threshold voltage technique is exploited. The dual supply voltage technique in the LVSC domino logic is geared to reduce power consumption in both the clock tree and the logic gates. The Delay Constrained Power Optimization (DCPO) algorithm allocates the low supply voltage to logic gates such that the dynamic power consumed by the logic gates is minimized. Delay time variations due to gate-to-source voltage change and input signal arrival time difference are considered for accurate timing analysis in DCPO.

# **Categories and Subject Descriptors**

B.6 [Hardware]: Logic Design; B.7 [Hardware]: Integrated Circuits

# **General Terms**

Design

# **Keywords**

domino logic, low swing clock, dual supply voltage, dual threshold voltage, low power

# 1. INTRODUCTION

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

*DAC 2002* June 10-14, 2002, New Orleans, Louisiana, USA. Copyright 2002 ACM 1-58113-461-4/02/0006 ...\$5.00.

Domino logic is faster than its static logic counterpart since the output of a domino logic gate drives only NMOS transistors, while the output of a static logic gate drives both PMOS and NMOS transistors [4]. Thus, domino logic has become a popular circuit design style for meeting high speed requirements. However, domino logic may be less power efficient than static logic due to significant clock loading and high switching activity. The total dynamic power of domino logic is consumed mainly by the clock tree and the domino logic gates. Since switching power consumption dominates dynamic power consumption, the total dynamic power consumption $P_{total}$ of a circuit implemented in domino logic can be represented by

$$P_{total} = P_{clk} + P_{logic} \tag{1}$$

$$P_{clk} = \alpha_{clk} \cdot (C_{clk-line} + C_{clk-tr}) \cdot V_{clk-swing}^2 \cdot f_{clk} \tag{2}$$

$$P_{logic} = \sum_{i} \alpha_{logic\_i} \cdot C_{logic\_i} \cdot V_{logic-swing}^{2} \cdot f_{clk} \tag{3}$$

where $P_{clk}$ and $P_{logic}$ are the dynamic power consumption in the clock tree and the logic gates, respectively. $\alpha_{clk}$ and $\alpha_{logic\_i}$ are the switching probabilities of the clock and of each domino logic gate, respectively. $C_{clk-line}$, $C_{clk-tr}$, and $C_{logic\_i}$ represent the clock interconnection line capacitance, the clock load transistor capacitance, and the internal and output node capacitance of each domino logic gate. $V_{clk-swing}$ and $V_{logic-swing}$ are the voltage swing levels of the clock and the logic gates. $f_{clk}$ is the operating frequency.
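As a rough numerical illustration of Equations (1)-(3), the following Python sketch evaluates the clock and logic power for a handful of made-up parameters. The function names and all numeric values are assumptions chosen for illustration only; just the functional form is taken from the equations above.

```python
def clock_power(alpha_clk, c_clk_line, c_clk_tr, v_clk_swing, f_clk):
    """Equation (2): dynamic power of the clock tree."""
    return alpha_clk * (c_clk_line + c_clk_tr) * v_clk_swing ** 2 * f_clk


def logic_power(gates, v_logic_swing, f_clk):
    """Equation (3): sum over gates given as (alpha_i, C_i) pairs."""
    return sum(alpha * cap for alpha, cap in gates) * v_logic_swing ** 2 * f_clk


def total_power(p_clk, p_logic):
    """Equation (1)."""
    return p_clk + p_logic


# Hypothetical values, not taken from the paper.
F_CLK = 1e9                                  # operating frequency in Hz
GATES = [(0.5, 20e-15), (0.25, 30e-15)]      # (switching probability, farads)

p_clk_full = clock_power(1.0, 2e-12, 1e-12, 1.5, F_CLK)
p_clk_half = clock_power(1.0, 2e-12, 1e-12, 0.75, F_CLK)
p_logic = logic_power(GATES, 1.5, F_CLK)

# Halving the clock swing quarters the clock power (quadratic dependence).
print(p_clk_half / p_clk_full)
print(total_power(p_clk_full, p_logic))
```

The ratio printed first depends only on the two swing values, which is why reducing the clock swing is attractive whenever the clock tree accounts for a large share of the total.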

Since switching power consumption is quadratically proportional to the supply voltage, supply voltage reduction is one of the most effective ways to reduce power consumption. Dual supply voltage techniques for static logic have been proposed in [1, 3]; they use a high supply voltage $(V_{ddH})$ in the critical path to obtain high performance and a low supply voltage $(V_{ddL})$ in the non-critical path to reduce power consumption. Rjoub et al. [6] proposed a design technique that saves power by reducing the voltage swing only on the domino gate output nodes. A gate-level voltage scaling approach for domino logic was presented in [8]. However, it focuses only on logic power minimization, replacing domino logic gates using $V_{ddH}$ in the non-critical path with domino logic gates using $V_{ddL}$.

In this paper, we propose a new domino logic family, namely LVSC domino logic, in which a low swing clock is used for the domino logic gates to reduce clock tree power consumption. In order to improve performance, a low threshold voltage $(V_t)$ is applied to the NMOS transistors in the critical path. The proposed LVSC domino logic is effectively combined with the dual supply voltage technique. A Delay Constrained Power Optimization (DCPO) algorithm is also proposed to replace logic gates using $V_{ddH}$ with logic gates using $V_{ddL}$. In this algorithm, delay time variations due to gate-to-source voltage change and input arrival time difference are considered. The experimental results for the ISCAS'85 benchmark circuits show that the proposed technique saves the total dynamic power consumption significantly while maintaining performance.

# 2. LVSC DOMINO LOGIC FOR DUAL SUPPLY VOLTAGE TECHNIQUES

# 2.1 Level Shifting Issue in Dual Supply Voltage Technique



Figure 1: DC current and level shifter in dual supply voltage technique for static logic

The reduction of the voltage swing level is the most effective way to reduce power consumption, since dynamic power consumption is quadratically proportional to the voltage swing level. However, if the low supply voltage is applied to all logic gates, performance is degraded since the driving capability is lowered by the reduced gate-to-source voltage. Dual supply voltage techniques for static logic use $V_{ddH}$ for static logic gates in the critical path to achieve high performance while using $V_{ddL}$ for static logic gates in the non-critical path to reduce power consumption. However, when the output of a logic gate using $V_{ddL}$ drives a logic gate using $V_{ddH}$, the PMOS transistor of the gate using $V_{ddH}$ does not completely turn off, since its gate-to-source voltage remains less than 0 ( $V_{GS_p} < 0$ ) even when the input is at its high level. Thus, DC current flows through the PMOS and NMOS transistors as shown in Figure 1 (a). To prevent DC current, a level shifter should be inserted between the output of the logic gate using $V_{ddL}$ and the input of the logic gate using $V_{ddH}$. Figure 1 (b) shows an example of the level shifter. Since the level shifter consumes much more power than an inverter, the power saving varies with the number of added level shifters. Furthermore, the level shifter is much slower than normal gates. Thus, the level shifter can limit the dual supply voltage technique for static logic [9].

On the other hand, in the dual supply voltage technique for domino logic, DC current does not flow and no level shifter is required, since the output of a domino logic gate using $V_{ddL}$ drives only NMOS transistors in the following domino logic gate. Thus, the dual supply voltage technique is a very efficient way to reduce the power consumption of domino logic while keeping performance.

# 2.2 Proposed LVSC Domino Logic

Shieh and Wang [8] proposed the dual supply voltage technique for domino logic. The power consumption in the clock tree is a large portion of the total power consumption in domino logic due to the large clock interconnection capacitance, the large clock load transistor capacitance, and the high switching probability. However, the previous dual supply voltage technique for domino logic does not consider reducing the power consumption in the clock tree: it uses $V_{ddH}$ for the clock tree. In addition, all inputs of domino logic gates in the critical path must be driven by $V_{ddH}$, which limits the replacement of the high supply voltage with $V_{ddL}$. The delay variations due to gate-to-source voltage change and input arrival time difference are not considered either, which results in a significant delay offset.

To reduce the power consumption in the clock tree, the clock can share the supply voltage with the logic gates such that a high voltage swing clock is used for high $V_{dd}$ domino logic gates, while a low voltage swing clock is used for low $V_{dd}$ domino logic gates. We refer to this clock scheme as Dual Voltage Swing Clock (DVSC). In this scheme, two clock trees are required, one for the high supply voltage devices and one for the low supply voltage devices. This can complicate routing and thus increase the effective load capacitance of the clock tree. As a result, the power saving can be offset by the increased interconnection line capacitance.



(a) Domino Logic Gate using VddL

(b) Domino Logic Gate using VddH

Figure 2: Low Voltage Swing Clock (LVSC) in dual supply voltage technique for domino logic

To further reduce the power consumption in the clock tree, we propose the Low Voltage Swing Clock (LVSC) scheme, in which a low voltage swing clock controls all domino logic gates regardless of the supply voltage level of each gate, as shown in Figure 2. Since the low voltage swing clock controls the PMOS precharge transistor in a high $V_{dd}$ logic gate, as shown in Figure 2 (b), the PMOS transistor cannot completely turn off. As a result, DC current flows through the PMOS transistor, which is similar to static logic without the level shifter. To resolve this problem, the $V_t$ of the PMOS precharge transistor is increased by biasing its well with a voltage higher than $V_{ddH}$. A domino logic gate usually has fewer PMOS transistors than NMOS transistors. Thus, the well of the PMOS precharge transistor can be readily separated from those of the other PMOS transistors, which use a different well bias. Even though the precharge time may increase due to the high $V_t$ of the PMOS precharge transistor, the precharge time of a domino logic gate does not determine the circuit performance. Note that the evaluation time of a domino logic gate using $V_{ddH}$, which is in the critical path, can also be degraded by the low voltage swing clock. To compensate for this performance degradation, low $V_t$ is applied to the NMOS transistors in the critical path.

# 2.3 Simulation Results for LVSC Domino Logic

All simulation results in this section are based on a 0.18 $\mu m$ CMOS process. The high and low supply voltages are 1.5V and 1.0V, respectively. Figure 3 shows the delay time variation according to the input arrival time difference for a 2-input AND gate. VDDH-conv denotes the conventional domino logic gate using $V_{ddH}$ and mVDDH\_LVSC denotes the domino logic gate using $V_{ddH}$ in the LVSC scheme. At the primary input stage, in which the two input signals and the clock arrive at the same time, the delay time of the domino logic gate in the LVSC scheme increases by 15% compared to the conventional domino logic gate, since the clock gated NMOS transistor (MNCLK) of the domino logic gate in the LVSC scheme is driven by $V_{ddL}$. The delay overhead can be reduced to 5% by applying low $V_t$ to MNCLK, at the cost of a small subthreshold current $(I_{sub})$ overhead.

Figure 3: Delay time variations according to the input arrival time difference for 2-input AND gate

Figure 4: Subthreshold current increase in LVSC scheme for several domino logic gates

From the next logic level onward, the LVSC scheme is faster than the conventional domino logic gate regardless of the input arrival time difference. The reason is that the clock is already high, so the improved driving capability of the low $V_t$ NMOS transistors in the critical path outweighs the effect of the low voltage swing clock. The delay time tends to decrease as the input arrival time difference increases, and it becomes invariant to the input arrival time difference beyond some specific value. We refer to this value as the arrival time difference threshold, or $\tau$, of the gate.

Since low $V_t$ is used for performance, the change in $I_{sub}$ needs to be considered. In domino logic, all domino logic gates are in the precharge phase during the standby mode. Thus, $I_{sub}$ in the standby mode is determined by both the stacked NMOS evaluation transistors and the PMOS transistor of the output inverter. Figure 4 shows the $I_{sub}$ increase in the LVSC scheme for several domino logic gates. To drive a large fanout, the stage ratio between the dynamic part and the static part of a domino logic gate is usually greater than 2, and the static output inverter of domino logic is usually high-skewed for high speed operation. Furthermore, the $I_{sub}$ of a stacked transistor (NMOS evaluation transistor) is much less than that of a non-stacked transistor [2]. Thus, the increase in $I_{sub}$ caused by the low $V_t$ NMOS transistors in the critical path is a relatively small portion of the total static power consumption, since the large PMOS transistor of the output inverter is the main contributor to the total $I_{sub}$. As a result, the performance of a domino logic gate can be improved by applying low $V_t$ to the NMOS transistors in the critical path with only a small increase in $I_{sub}$.

# 3. CIRCUIT DELAY CHARACTERIZATION

# 3.1 Gate delay

The gate delay d(u) of a gate u can be represented using the alpha-power law model [7].

$$d(u) \propto \frac{C_L \cdot V_{dd}}{K \cdot (V_{GS} - V_{th})^{\alpha}} \tag{4}$$

where  $C_L$  denotes load capacitance and  $V_{GS}$  is the gate-to-source voltage. K is a drivability factor and equals  $\mu_0 C_{ox}(W/L)$ .  $\mu_0$  is the zero bias mobility.  $C_{ox}$  is the gate oxide capacitance per unit area.  $\alpha$  is the velocity saturation index.

The following expression for the gate delay $d_0(u)$ of a domino logic gate u can be obtained from Equation (4) [5].

$$d_0(u) \propto \frac{C_{dyn} \cdot V_{ddH}}{K_n \cdot (V_{ddH} - V_{thn})^{\alpha}} + \frac{C_{out} \cdot V_{ddH}}{K_p \cdot (V_{ddH} - V_{thp})^{\alpha}} \tag{5}$$

where $d_0(u)$ is the intrinsic delay of gate u under three conditions: the gate uses the high supply voltage $(V_{ddH})$, all fanin gates use $V_{ddH}$ so that $V_{GS}$ equals $V_{ddH}$, and the input arrival time difference has no effect on the gate delay. $C_{dyn}$ and $C_{out}$ represent the capacitance of the dynamic node and the output node in the domino logic gate. $V_{thn}$ $(K_n)$ and $V_{thp}$ $(K_p)$ denote the threshold voltage (the drivability factor) of the NMOS evaluation transistor and the PMOS transistor of the output inverter. The first term in Equation (5) represents the delay time from the input to the dynamic node and the second term represents the delay time from the dynamic node to the output node.
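To make the dependence in Equations (4) and (5) concrete, the sketch below evaluates both expressions up to their proportionality constant. It is only an illustration: the function names, the normalized capacitances and drivability factors, the velocity saturation index of 1.3, and the threshold values are assumptions, and the PMOS threshold is treated as a magnitude.

```python
def stage_delay(c_load, v_dd, k, v_gs, v_th, alpha):
    """Equation (4), up to a proportionality constant."""
    return c_load * v_dd / (k * (v_gs - v_th) ** alpha)


def intrinsic_domino_delay(c_dyn, c_out, v_ddh, k_n, k_p, v_thn, v_thp, alpha):
    """Equation (5): input-to-dynamic-node term plus dynamic-node-to-output term."""
    return (stage_delay(c_dyn, v_ddh, k_n, v_ddh, v_thn, alpha)
            + stage_delay(c_out, v_ddh, k_p, v_ddh, v_thp, alpha))


# Hypothetical, normalized parameters (only the 1.5 V supply comes from the text).
ALPHA = 1.3
nominal = intrinsic_domino_delay(1.0, 1.0, 1.5, 1.0, 1.0, 0.4, 0.4, ALPHA)
low_vt = intrinsic_domino_delay(1.0, 1.0, 1.5, 1.0, 1.0, 0.3, 0.4, ALPHA)

# Lowering the NMOS threshold shortens only the first term of Equation (5).
print(nominal, low_vt)
```

Under these assumed values the low-threshold variant is faster, which matches the qualitative argument for applying low $V_t$ to NMOS transistors in the critical path.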

In the proposed circuit scheme, timing analysis needs to be carried out at the transistor level, because replacing the supply voltage of a gate, together with the resulting changes in $V_{GS}$ and in input signal arrival time difference, affects the delay time of each logic gate. The gate delay d(u) and the delay increase $\Delta d(u)$ of a domino logic gate u are given by the following equations.

$$d(u) = d_0(u) + \Delta d_{V_{dd}}(u) + \Delta d_{V_{GS}}(u) + \Delta d_{z(u)}(u) \tag{6}$$

$$\Delta d(u) \stackrel{\triangle}{=} d(u) - d_0(u) = \Delta d_{V_{dd}}(u) + \Delta d_{V_{GS}}(u) + \Delta d_{z(u)}(u) \tag{7}$$

$$\Delta d_{V_{dd}}(u) \propto \frac{C_{dyn} \cdot V_{ddL}}{K_n \cdot (V_{ddH} - V_{thn})^{\alpha}} + \frac{C_{out} \cdot V_{ddL}}{K_p \cdot (V_{ddL} - V_{thp})^{\alpha}} - d_0(u) \tag{8}$$

$$\Delta d_{V_{GS}}(u) \propto \frac{C_{dyn} \cdot V_{dd}}{K_n \cdot (V_{ddL} - V_{thn})^{\alpha}} - \frac{C_{dyn} \cdot V_{dd}}{K_n \cdot (V_{ddH} - V_{thn})^{\alpha}} \tag{9}$$

$$\Delta d_{z(u)}(u) = \begin{cases} 0 & \text{if } z(u) \ge \tau(u) \\ d_{AT0} & \text{otherwise} \end{cases} \tag{10}$$

The gate delay varies depending on (1) the supply voltage of the gate u ( $\Delta d_{V_{dd}}(u)$ ), (2) the gate-to-source voltage of the NMOS transistor, namely the supply voltage of the fanin gate ( $\Delta d_{V_{GS}}(u)$ ), and (3) the arrival time difference of the input signals from the fanin gates ( $\Delta d_{z(u)}(u)$ ). z(u) denotes the arrival time difference between the latest signal and the other signals, and $\tau(u)$ denotes the arrival time difference threshold. $d_{AT0}$ represents the increase in delay time due to the reduced input arrival time difference.

Suppose that the supply voltage of a gate u is scaled down. Then, the gate delay d(u) increases by $\Delta d_{V_{dd}}(u)$. Since the supply voltage of gate u is scaled down, the gate-to-source voltage of the NMOS transistor in each fanout gate of u is reduced from $V_{ddH}$ to $V_{ddL}$. As a result, the gate delay d(v) of a fanout gate v of u increases by $\Delta d_{V_{GS}}(v)$. Another determinant of the fanout gate delay is the variation of the input signal arrival time difference that results from the gate delay change in u, which is denoted by $\Delta d_{z(v)}(v)$. These delay variations on the fanout gates are integral parts of the timing analysis and can be extracted from SPICE simulation.
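A minimal sketch of how Equations (6) and (10) could be evaluated from precharacterized increments is given below. The function names and the boolean flags are hypothetical, and the increments are assumed to be looked up from a characterization table rather than computed from Equations (8) and (9). The numeric increments reuse the unit values of Example 3.1, while the intrinsic delay of 2 is made up.

```python
def arrival_penalty(z, tau, d_at0):
    """Equation (10): no penalty once the arrival time difference reaches tau."""
    return 0.0 if z >= tau else d_at0


def gate_delay(d0, low_vdd, fanin_low_vdd, z, tau, dd_vdd, dd_vgs, d_at0):
    """Equation (6): intrinsic delay plus the three delay increments.

    low_vdd:        True if the gate itself uses the low supply voltage
    fanin_low_vdd:  True if the fanin gate driving it uses the low supply voltage
    """
    delay = d0
    if low_vdd:
        delay += dd_vdd          # Delta d_Vdd(u)
    if fanin_low_vdd:
        delay += dd_vgs          # Delta d_VGS(u)
    return delay + arrival_penalty(z, tau, d_at0)


# Fanout gate of a scaled gate: reduced V_GS and inputs arriving together.
d_v = gate_delay(d0=2.0, low_vdd=False, fanin_low_vdd=True,
                 z=0.0, tau=2.0, dd_vdd=2.0, dd_vgs=1.0, d_at0=1.0)
print(d_v)  # 4.0
```

In this sketch a fanout gate pays both the $V_{GS}$ increment and the arrival time penalty even though its own supply voltage is unchanged, which is the effect the timing analysis has to capture.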

# 3.2 Definitions and preliminaries

A combinational circuit is represented by a directed acyclic graph G=(V,E). A node  $u\in V$  represents a gate in the circuit. There is an edge  $(u,v)\in E$  if the output signal of node u is fed directly as an input of node v.

**Definition:** The arrival time AT(u) for gate u defines the latest path delay from the primary inputs to the gate output of u such that

$$AT(u) = \max_{i \in FI(u)} AT(i) + d_0(u) + \Delta d_{V_{dd}}(u) + \Delta d_{V_{GS}}(u) + \Delta d_{z(u)}(u) \tag{11}$$

where FI(u) is a set of fanin gates of gate u.

The required time RT(u) is the latest time the signal must be available at the output of u such that

$$RT(u) = \min_{j \in FO(u)} RT(j) - (d_0(j) + \Delta d_{V_{dd}}(j) + \Delta d_{V_{GS}}(u, j) + \Delta d_{z(j)}(j)) \tag{12}$$

where FO(u) is a set of fanout gates of gate u. The slack time ST(u) is defined as the time difference between required time and arrival time such that

$$ST(u) = RT(u) - AT(u). \tag{13}$$

If all the gates in a circuit have non-negative slack time, then the circuit is said to be safe.
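The definitions in Equations (11)-(13) can be sketched as a forward and a backward pass over the circuit graph. The Python code below is a simplified illustration, not the implementation used in this work: it assumes each gate already has a single total delay d(u), it ignores the per-edge dependence of the $V_{GS}$ term in Equation (12), and the helper name `timing` and the four-gate circuit are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def timing(fanin, delay, required_at_outputs):
    """Return (AT, RT, ST) dictionaries for a combinational DAG.

    fanin: gate -> list of fanin gates (every gate must appear as a key)
    delay: gate -> total gate delay d(u)
    required_at_outputs: delay constraint applied to gates without fanout
    """
    order = list(TopologicalSorter(fanin).static_order())
    fanout = {u: [] for u in fanin}
    for v, preds in fanin.items():
        for u in preds:
            fanout[u].append(v)

    at = {}
    for u in order:                       # fanins are visited before u
        at[u] = max((at[i] for i in fanin[u]), default=0.0) + delay[u]

    rt = {}
    for u in reversed(order):             # fanouts are visited before u
        rt[u] = min((rt[j] - delay[j] for j in fanout[u]),
                    default=required_at_outputs)

    st = {u: rt[u] - at[u] for u in order}
    return at, rt, st


# Hypothetical circuit: a and b feed c, and c feeds d.
fanin = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
delay = {"a": 3.0, "b": 5.0, "c": 2.0, "d": 4.0}
at, rt, st = timing(fanin, delay, required_at_outputs=11.0)

print(st)                                  # slack per gate
print(all(s >= 0 for s in st.values()))    # True when the circuit is safe
```

For this small circuit only gate a has positive slack (2), so it is the only gate whose delay could grow without violating the constraint.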

**Theorem** 3.1. For an edge $(u,v) \in E$ in G, with a delay increase of $\Delta d_{V_{dd}}(u)$ in u, if $\Delta AT(v) \equiv \{(d_0(v) + \Delta d_{V_{dd}}(u) + \Delta d_{V_{GS}}(v) + \Delta d_{z(v)}(v)) - (AT(v) - AT(u))\} \geq 0$ then AT(v) increases by $\Delta AT(v)$.

**Definition:** Gate v is arrival-time sensitive to gate u, or  $v \in ATS(u, \Delta d(u))$ , if  $\Delta AT(v) \geq 0$  with delay increase in u by  $\Delta d(u)$ .

**Theorem** 3.2. A circuit is safe for delay variation  $\Delta d(u)$  in gate u, if the slack time for all gates in the arrival time sensitive gate set is non-negative such that

$$\forall v \in ATS(u, \Delta d(u)), \quad ST(v) \ge 0. \tag{14}$$



(a) Timing information for high Vdd configuration (circuit delay = 15)



(b) Timing information when gate u has low Vdd (circuit delay = 16)

Figure 5: Timing analysis example

**Example** 3.1. This example shows that the proposed timing analysis ensures accuracy by accounting for timing variations due to gate-to-source voltage and input signal arrival time. Let us suppose the following. First, for any gate (say u), $V_{dd}$ scaling increases the gate delay by 2 (or, $\Delta d_{V_{dd}}(u) = 2$). Second, if a gate (say u) lowers its supply voltage, then the delay time of each corresponding fanout gate (say v is a fanout gate of u) increases by 1 (or, $\Delta d_{V_{GS}}(v) = 1$). Third, if the arrival time difference of input signals z(v) is less than the arrival time difference threshold $\tau(v)$, then the delay time increases by 1 (or, $\Delta d_{z(v)}(v) = 1$ for $z(v) < \tau(v)$).

Now let us investigate the gate u in Figure 5(b). Voltage scaling on u adds 2 to its gate delay. The arrival time for gate u increases by $\Delta d_{V_{dd}}(u) = 2$, resulting in AT(u) = 7. Since the arrival time difference z(v) becomes 0, which is less than $\tau(v)$ of 2, the gate delay increases by 1 (namely, $\Delta d_{z(v)}(v) = 1$). Furthermore, the $V_{GS}$ reduction on v increases d(v) by $\Delta d_{V_{GS}}(v) = 1$. Thus, the arrival time for gate v increases from 9 to 11. Since the updated arrival time difference z(x) of 1 is less than $\tau(x)$ of 2, the delay time of gate x increases by $\Delta d_{z(x)}(x) = 1$. In this particular example, the circuit delay increases by 1 due to the delay variations caused by gate-to-source voltage and input arrival time difference. Thus, the supply voltage of gate u cannot be scaled down. However, the previous dual supply voltage technique for domino logic does not consider those delay time variations. Thus, it replaces the supply voltage of gate u with $V_{ddL}$, which results in an increased critical path delay.

```
procedure DCPO (circuit C, delay_constraint d(C))
  Build a circuit graph G for the circuit C;
  Create gate delay table for each gate in library with
    different V_{GS} and input arrival time difference z;
  Set circuit delay as d(C);
  Initialize all the gates using V_{ddH} in LVSC scheme;
  repeat
    for each gate u using V_{ddH}
      if circuit C is safe after voltage scaling on u
        add u to candidate gate set;
        set the weight of gate u to the gain \sigma(u);
      end
    end
    Build a timing sensitivity graph
      for the candidate gate set;
    Prune delay-orthogonal gates by applying
      maximal weight independent set algorithm
      to the timing sensitivity graph;
    for each gate u in resulting delay-orthogonal set
      Update gate delay d(u) := d(u) + \Delta d_{V_{dd}}(u);
      Update gate delay of fanout gates
        according to V_{GS} change and
        input arrival time difference variation;
      Update arrival time for all gates
        in ATS(u, \Delta d_{V_{dd}}(u));
      Update slack time for all gates
        in ATS(u, \Delta d_{V_{dd}}(u));
    end
  until (candidate gate set is empty)
  return power optimal circuit C_{power\_opt}
end
```

Figure 6: Delay constrained power optimization (DCPO) algorithm

# 4. DELAY CONSTRAINED POWER OPTIMIZATION (DCPO) ALGORITHM

The main idea of the power optimization algorithm is to scale down the supply voltage of gates in the non-critical paths under the delay constraint d(C), which is obtained from the initial circuit with $V_{ddH}$. For each candidate gate u, the gain represents the amount of power-delay product reduction resulting from low $V_{dd}$ application, and is formally defined as

$$\sigma(u) = \frac{1}{(1 - \Delta P(u)/P_o)(1 + \Delta D(u)/D_o)} \tag{15}$$

where $\Delta P(u)$ represents the saved power, $P_o$ is the original power before the update, $\Delta D(u)$ is the circuit delay overhead, and $D_o$ is the original circuit delay. The larger the gain, the more favorable the candidate gate is for $V_{dd}$ reduction.
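Equation (15) is simple enough to check numerically. The following sketch uses a hypothetical function name and made-up power and delay figures to show how the gain rewards saved power and penalizes delay overhead.

```python
def gain(delta_p, p_o, delta_d, d_o):
    """Equation (15): reciprocal of the normalized power-delay product after scaling."""
    return 1.0 / ((1.0 - delta_p / p_o) * (1.0 + delta_d / d_o))


# Hypothetical candidates: both save 10% power, only the second adds delay.
no_overhead = gain(delta_p=10.0, p_o=100.0, delta_d=0.0, d_o=15.0)
with_overhead = gain(delta_p=10.0, p_o=100.0, delta_d=1.0, d_o=15.0)

print(no_overhead, with_overhead)
```

A gate that saves power without delay overhead receives the larger gain, so it would be preferred when weights are compared.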

The pseudo-code of the algorithm is shown in Figure 6. Given a combinational circuit, a circuit graph G is built. For all the gates defined in the cell library, a timing information base is generated to keep the precharacterized delay variations according to $V_{dd}$, $V_{GS}$, and z. Then all the gates are configured as domino cells with the high supply voltage in the LVSC scheme.

In each iteration, we identify a candidate gate set that saves power while the circuit delay overhead stays within the given delay constraint d(C). If a gate satisfies the delay constraint, the power-delay product gain in Equation (15) is assigned to the gate as a weight, which is used as a priority in the candidate gate selection step. Then, in order to extract a set of gates that are independent of each other with respect to timing variation, a timing sensitivity graph is set up. If a gate v is in $ATS(u, \Delta d_{V_{dd}}(u))$, then there is a directed edge from u to v in the timing sensitivity graph. Applying the maximal weight independent set algorithm to the timing sensitivity graph yields a set of temporally orthogonal gates. When two gates u and v are chosen as a result of this step, the gate delay variation of one gate does not affect the timing information, such as arrival time, required time, or slack time, of the other gate. Thus, the timing information of all the gates in this set can be updated at the same time. The iteration continues until the candidate gate set is empty. Finally, the procedure yields a power-optimal circuit configuration within the given delay overhead.
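The selection step can be sketched with a simple greedy heuristic. The text does not specify which maximal weight independent set algorithm is used, so the code below is an assumption: it picks candidates in decreasing order of gain and skips any gate adjacent to one already picked, which gives a maximal independent set but not necessarily one of maximum total weight. The function name, the gate names, and the weights are hypothetical.

```python
def greedy_independent_set(weights, edges):
    """Pick gates in decreasing weight order, skipping neighbors of picked gates.

    weights: candidate gate -> gain sigma(u)
    edges:   iterable of (u, v) pairs of the timing sensitivity graph;
             edge direction is ignored when testing independence, and
             edges touching non-candidate gates are dropped
    """
    neighbors = {u: set() for u in weights}
    for u, v in edges:
        if u in neighbors and v in neighbors:
            neighbors[u].add(v)
            neighbors[v].add(u)

    chosen, blocked = [], set()
    for u in sorted(weights, key=weights.get, reverse=True):
        if u not in blocked:
            chosen.append(u)
            blocked.add(u)
            blocked |= neighbors[u]      # timing-sensitive gates are excluded
    return chosen


# Hypothetical candidate set with gains and timing sensitivity edges.
weights = {"a": 1.20, "b": 1.10, "c": 1.05, "d": 1.30}
edges = [("a", "b"), ("b", "c"), ("d", "a")]

print(greedy_independent_set(weights, edges))  # ['d', 'b']
```

No two gates in the returned list share an edge, so under this model their supply voltages can be lowered in the same iteration without one update invalidating the timing of the other.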

# 5. EXPERIMENTAL RESULTS

To simplify the analysis, technology mapping is carried out to map the logic to a library which contains AND and OR gates. For domino logic realization, all the internal inverters are eliminated by using duplication technique.

Initially, all logic gates use $V_{ddH}$. First, all logic gates are replaced with logic gates using $V_{ddH}$ in the LVSC scheme. The replacement results for the ISCAS85 benchmark circuits are shown in column "mVDDH\_LVSC" in Table 1. All the values shown in Table 1 are delay improvement and power reduction percentages with respect to the initial circuits with $V_{ddH}$. " $\Delta t_D$ " denotes the critical path speed-up. " $\Delta P_{logic}$ " and " $\Delta P_{std}$ " denote the dynamic and standby power reductions, respectively. We assume that all domino logic gates are precharged in the standby mode. The critical path delay time is improved by up to 11.68% with a small increase in dynamic and static power consumption. However, the total dynamic power consumption can be reduced significantly since the low voltage swing clock is used. Then, the DCPO algorithm is applied to further reduce the total dynamic power consumption within the delay constraint. The results for the ISCAS85 benchmark circuits are shown in column "LVSC\_DualVDD" in Table 1. "Rep" denotes the percentage of gates whose high supply voltage is replaced with the low supply voltage. The dynamic power saving in the logic gates ranges from 13.54% to 36.94% and the standby power saving in the logic gates ranges from 5.63% to 27.61%. The dynamic power saving in the clock tree (" $\Delta P_{clk}$ ") is assumed to be 44.44% since the clock swing voltage is reduced from 1.5V to 1.0V.

Table 2 shows the total dynamic power saving for the ISCAS85 benchmark circuits according to the ratio of $P_{clk}$ over $P_{total}$. "Conv." and "LVSC" denote the total dynamic power saving without and with the low swing clock, respectively. If $P_{clk}$ is 20% of $P_{total}$, the total dynamic power saving of the LVSC scheme reaches 29.77% on average, while the total dynamic power saving of the conventional scheme is 20.88% on average. For "C432" and "C6288", the power saving of the LVSC scheme is nearly twice that of the conventional scheme. If $P_{clk}$ is 50% of $P_{total}$, the dynamic power saving of the LVSC scheme is more than 2.5 times that of the conventional scheme on average. Thus, the power saving of the proposed LVSC scheme is significantly greater than that of the conventional scheme.
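The entries of Table 2 appear consistent with weighting the logic and clock savings of Table 1 by their share of $P_{total}$ in Equation (1). The sketch below reproduces the C432 row under that reading; the weighting formula and the function name are assumptions inferred from the reported numbers, not a procedure stated in the text.

```python
def total_saving(clk_ratio, dp_logic, dp_clk):
    """Weight the logic and clock savings (percent) by their share of P_total."""
    return (1.0 - clk_ratio) * dp_logic + clk_ratio * dp_clk


DP_CLK = 44.44          # clock tree saving assumed in the text, percent
DP_LOGIC_C432 = 14.86   # Table 1, LVSC_DualVDD column for C432, percent

for ratio in (0.2, 0.3, 0.4, 0.5):
    conv = total_saving(ratio, DP_LOGIC_C432, 0.0)     # full-swing clock: no clock saving
    lvsc = total_saving(ratio, DP_LOGIC_C432, DP_CLK)  # low swing clock
    print(f"{ratio:.0%}: Conv. {conv:.2f}%  LVSC {lvsc:.2f}%")
```

As the clock share grows, the conventional saving shrinks because the clock tree is left untouched, while the LVSC saving keeps increasing.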

Table 1: Delay speed-up and power saving for ISCAS85 benchmark circuits

| Circuit | Gates | mVDDH\_LVSC         |                    |                  | LVSC\_DualVDD |              |                    |                  |                  |
|---------|-------|--------------------|--------------------|------------------|--------------|--------------|--------------------|------------------|------------------|
|         |       | $\Delta t_D$       | $\Delta P_{logic}$ | $\Delta P_{std}$ | Rep          | $\Delta t_D$ | $\Delta P_{logic}$ | $\Delta P_{clk}$ | $\Delta P_{std}$ |
| C432    | 171   | 9.34%              | -2.05%             | -2.73%           | 40.35%       | 4.18%        | 14.86%             | 44.44%           | 11.60%           |
| C880    | 376   | 11.06%             | -1.78%             | -2.75%           | 80.21%       | 4.93%        | 31.76%             | 44.44%           | 23.30%           |
| C1355   | 669   | 11.26%             | -1.57%             | -2.98%           | 36.31%       | 6.04%        | 13.54%             | 44.44%           | 5.63%            |
| C1908   | 550   | 11.34%             | -1.67%             | -2.87%           | 63.46%       | 1.86%        | 24.77%             | 44.44%           | 17.84%           |
| C2670   | 953   | 9.73%              | -1.70%             | -2.89%           | 92.67%       | 6.36%        | 36.94%             | 44.44%           | 27.61%           |
| C3540   | 1492  | 11.68%             | -1.78%             | -2.84%           | 81.62%       | 0.57%        | 32.34%             | 44.44%           | 24.18%           |
| C5315   | 2133  | 11.28%             | -1.60%             | -2.88%           | 86.00%       | 0.00%        | 34.19%             | 44.44%           | 25.50%           |
| C6288   | 4588  | 10.33%             | -1.78%             | -2.84%           | 45.70%       | 0.00%        | 14.68%             | 44.44%           | 17.30%           |
| C7552   | 3048  | 9.92%              | -1.55%             | -2.95%           | 91.57%       | 0.00%        | 31.85%             | 44.44%           | 27.51%           |
| Avg.    |       | 10.66%             | -1.72%             | -2.86%           | 68.65%       | 2.60%        | 26.10%             | 44.44%           | 20.05%           |

Table 2: Total dynamic power saving for ISCAS85 benchmark circuits according to the ratio of  $P_{clk}$  over  $P_{total}$ 

| Circuit | 20%    |        | 30%    |        | 40%    |        | 50%    |        |
|---------|--------|--------|--------|--------|--------|--------|--------|--------|
|         | Conv.  | LVSC   | Conv.  | LVSC   | Conv.  | LVSC   | Conv.  | LVSC   |
| C432    | 11.89% | 20.78% | 10.40% | 23.73% | 8.92%  | 26.69% | 7.43%  | 29.65% |
| C880    | 25.41% | 34.30% | 22.23% | 35.56% | 19.06% | 36.83% | 15.88% | 38.10% |
| C1355   | 10.83% | 19.72% | 9.48%  | 22.81% | 8.12%  | 25.91% | 6.77%  | 28.99% |
| C1908   | 19.82% | 28.70% | 17.34% | 30.67% | 14.86% | 32.64% | 12.39% | 34.61% |
| C2670   | 29.55% | 38.44% | 25.86% | 39.19% | 22.16% | 39.94% | 18.47% | 40.69% |
| C3540   | 25.87% | 34.76% | 22.64% | 35.97% | 19.40% | 37.18% | 16.17% | 38.39% |
| C5315   | 27.35% | 36.24% | 23.93% | 37.27% | 20.51% | 38.29% | 17.10% | 39.32% |
| C6288   | 11.74% | 20.63% | 10.28% | 23.61% | 8.81%  | 26.58% | 7.34%  | 29.56% |
| C7552   | 25.48% | 34.37% | 22.30% | 35.63% | 19.11% | 36.89% | 15.93% | 38.15% |
| Avg.    | 20.88% | 29.77% | 18.27% | 31.60% | 15.66% | 33.44% | 13.05% | 35.27% |

# 6. CONCLUSION

Domino logic has been widely used in high performance circuit design. Even though high performance and small area can be achieved using domino logic gates, the power consumption of domino logic is greater than that of static logic due to the clock network and the high switching probability. In this paper, a dual supply voltage technique for domino logic is proposed to reduce the total power consumption. To reduce the power consumption in the clock tree, a low voltage swing clock domino logic family is proposed. The low voltage swing clock scheme uses low $V_t$ NMOS evaluation transistors in the critical path to overcome the performance degradation caused by the low voltage swing clock, at the cost of a small increase in $I_{sub}$. The delay variations due to the gate-to-source voltage and the input arrival time difference when the supply voltage is replaced are accounted for in the timing analysis. The delay constrained power optimization algorithm with the proposed low swing clock domino logic yields significant power saving while maintaining the critical path delay.

# 7. ACKNOWLEDGMENTS

This work was supported in part by Semiconductor Research Corporation under Grant SRC891.

# 8. REFERENCES

- [1] C. Chen, A. Srivastava, and M. Sarrafzadeh. On gate level power optimization using dual-supply voltages. *IEEE Trans. VLSI Systems*, 9(5):616-629, Oct. 2001.
- [2] Z. Chen, M. Johnson, L. Wei, and K. Roy. Estimation of standby leakage power in CMOS circuits considering accurate modeling of transistor stacks. In Proc. Int. Symp. Low Power Electronics and Design, pages 239-244, 1998.
- [3] M. Igarashi, K. Usami, K. Nogami, F. Minami, Y. Kawasaki, T. Aoki, M. Takano, C. Mizuno, T. Ishikawa, M. Kanazawa, and S. S. A Low-power design method using multiple supply voltages. In *Proc. Int. Symp. Low Power Electronics and Design*, pages 36–41, 1997.
- [4] S. M. Kang and Y. Leblebici. CMOS Digital Integrated Circuits: Analysis and Design. McGraw-Hill, 2nd ed., 1999.
- [5] B. Ramasubramanian, H. Schmit, and L. R. Carley. Mixed-swing QuadRail for low power dual-rail domino logic. In Proc. Int. Symp. Low Power Electronics and Design, pages 82–84, 1999.
- [6] A. Rjoub, O. Koufopavlou, and S. Nikolaidis. Low power / low swing domino CMOS logic. In Proc. IEEE Int. Symp. Circuits and Systems, pages II.13-II.16, 1998.
- [7] T. Sakurai and A. R. Newton. Alpha-power law MOSFET model and its application to CMOS inverter delay and other formulas. *IEEE J. Solid State Circuits*, 29(6):655–662, June 1994.
- [8] S. J. Shieh and J. S. Wang. Design of low-power domino circuits using multiple supply voltages. In Proc. IEEE Int. Conf. Electronics, Circuits and Systems, pages 711-714, 2001.
- [9] Y. J. Yeh, S. Y. Kuo, and J. Y. Jou. Converter free multiple scaling techniques for low-power CMOS digital design. *IEEE Trans. Computer-Aided Design*, 20(1):172-176, Jan. 2001.