

# Optimization and Control of $V_{DD}$ and $V_{TH}$ for Low-Power, High-Speed CMOS Design

Tadahiro Kuroda

Department of Electrical Engineering, Keio University  
3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan

## Abstract

It is essential to control $V_{DD}$ and $V_{TH}$ for low-power, high-speed CMOS design. In this paper, it is shown that these two parameters can be controlled by designers as objectives of design optimization to find better trade-offs between power and speed. Quantitative analysis of trade-offs between power and speed is presented. Some of the popular circuit techniques and design examples to control $V_{DD}$ and $V_{TH}$ are introduced. A simple theory to compute optimum multiple $V_{DD}$'s and $V_{TH}$'s is described. Scaling scenarios of variable and/or multiple $V_{DD}$'s and $V_{TH}$'s are discussed to show future technology directions.

## 1. Introduction

Lowering the supply voltage, $V_{DD}$, is the most attractive way to reduce power dissipation because of the quadratic dependence of power on $V_{DD}$. However, as $V_{DD}$ becomes lower, circuit delay increases and chip throughput degrades. There are three approaches to maintaining chip throughput at low $V_{DD}$: 1) utilize parallel and/or pipeline architectures to compensate for the degraded circuit speed [1], 2) lower $V_{TH}$ to recover the circuit speed, and 3) employ multiple $V_{DD}$'s and $V_{TH}$'s for non-critical circuits. The idea behind the first approach is that circuits can be slow if the architecture is good; silicon area is traded for power reduction. The idea behind the second approach is that circuits should be fast. However, a low $V_{TH}$ increases subthreshold leakage current and, consequently, standby power dissipation, so $V_{TH}$ should be raised in standby mode. Furthermore, the requirement for circuit speed in active mode changes from time to time. Consequently, variable $V_{DD}$ and $V_{TH}$ are essential. In the third approach, some circuits should be fast and others can be slow; in other words, this approach utilizes surplus timing. Since speed requirements differ spatially from circuit to circuit, multiple $V_{DD}$'s and $V_{TH}$'s are effective.

Circuit design techniques for the second and third approaches, as well as theoretical models for quantitative understanding, are discussed in detail in the following sections.

## 2. Variable $V_{DD}$ and $V_{TH}$

Figure 1 depicts equi-power (solid lines) and equi-speed (broken lines) curves on the $V_{DD}$-$V_{TH}$ plane, calculated by using a theoretical model [2] and device parameters for a $0.11\mu\text{m}$ CMOS device. It is assumed that the logic depth is 15 stages and the average activation ratio is 0.15. A rectangle in the figure illustrates the ranges of $V_{DD}$ change and $V_{TH}$ fluctuation that should be taken into account. This rectangle is a design window because all the circuit specifications should be satisfied within the rectangle for yield-conscious design. In the design window, the circuit speed becomes the slowest at the upper-left corner, while at the lower-right corner, the power dissipation becomes the highest. The equi-speed and equi-power curves are normalized at these corners, as designated by the normalization factors $\kappa_s$ and $\kappa_p$, so that the amount of speed and power that must be improved or degraded compared to those in the typical condition can be calculated by sliding and sizing the design window on the $V_{DD}$-$V_{TH}$ plane.

Fig. 1 $V_{DD}$-$V_{TH}$ design space.

When the design window is moved toward lower $V_{DD}$ and lower $V_{TH}$ along the equi-speed curve, power dissipation is reduced. Since the subthreshold leakage current increases rapidly as $V_{TH}$ is lowered, the power dissipation increases again beyond the point where the leakage current dominates the power dissipation. In Fig. 1 it can be seen that the power dissipation is at a minimum around where the power dissipation due to the subthreshold leakage current makes up several tens of percent of the total power dissipation. This condition is also depicted as a broken line in the figure to indicate the optimum $V_{TH}$'s. The power minimum condition differs from circuit to circuit. In this simulation, a high-end microprocessor is assumed. For instance, if a memory circuit is assumed where the logic depth is larger and the activation ratio is lower, the optimum $P_{Leakage}/P_{total}$ should be lower. A quantitative analysis is found in [3], which leads to approximately the same conclusion.

Fig. 2 Variable threshold-voltage CMOS (VTCMOS).

Lowering both $V_{DD}$ and $V_{TH}$, however, raises problems. An exponential increase in subthreshold leakage current due to $V_{TH}$ reduction not only shortens battery life in portable equipment, but also disables IDDQ testing. For these reasons it is very difficult to lower $V_{TH}$ below 0.2V. In addition, a significant delay increase due to $V_{TH}$ variation at a low $V_{DD}$ degrades worst-case circuit speed, yet it is difficult to reduce the variation $\Delta V_{TH}$ by means of process and device refinement.

There are two approaches to solve these problems. Conventional power-down schemes either on a board or in a chip can solve the battery life problem. The other approach is to control  $V_{TH}$  through substrate bias, which can solve all three problems.

A Variable Threshold-voltage CMOS (VTCMOS) technology [4-6] controls $V_{TH}$ by means of substrate bias control, as depicted in Fig. 2. The measured chip leakage current of an MPEG-4 chip fabricated in VTCMOS technology is plotted in Fig. 3. VTCMOS technology sets the leakage current below 10mA in active mode and below 10$\mu$A in standby mode, independently of the processed $V_{TH}$ and temperature. The analytical model, device design, and scaling scenario for VTCMOS technology are found in [7].

The penalty associated with body bias is negligibly small. The current penalty of a charge pump for reverse body bias is only 0.2%. In order to control body bias by feedback control, a sensing circuit for body potential is required, which consumes about 1$\mu$A. The energy for charging and discharging the well capacitance is around 50nJ per cycle. The area penalty due to body tie separation and routing is around 5%. As far as the delay penalty in transitions between active and standby modes is concerned, it takes around 100$\mu$s to go to the standby mode by using the charge pump, while it takes only 0.1$\mu$s to go back to the active mode by connecting the substrate to $V_{DD}/V_{SS}$ by using a switching transistor. This "slow falling asleep, but fast awakening" feature is often acceptable.

Fig. 3 Measured chip leakage current.

Recently, the range of body bias has been extended from reverse to forward. Forward substrate bias is used during active operation in order to lower  $V_{TH}$  for high-speed operation, and zero substrate bias during standby mode in order to raise  $V_{TH}$  for low leakage. The substrate biasing technique has begun to be applied to high-end products such as microprocessors and communications chips for low-power, high-speed operation [8-9].

The power supply voltage can be varied by an embedded DC-DC converter. If both  $V_{DD}$  and  $f_{CLK}$  are dynamically varied in response to computational load demands, the energy/operation can be reduced for the low computational periods, while retaining peak throughput when required. This strategy, called dynamic voltage scaling (DVS), was first applied to a MIPS-compatible RISC core in 1998 [5]. Measured performance in MIPS/W was improved by a factor of more than two compared with that of a conventional design. In 2000, a DVS processor with an ARM8 core was reported [10]. Operating systems for voltage scheduling have also been extensively investigated [11-12]. The power efficiency of the embedded DC-DC converter has been improved to 95% [13].
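To illustrate why scaling $V_{DD}$ together with $f_{CLK}$ lowers the energy per operation, the following minimal sketch assumes that energy per operation is proportional to $V_{DD}^2$ and that clock period follows the delay function $\tau(v) = v/(v - v_{TH})^\alpha$ used in the Appendix. The values `v_max = 1.5`, `vth = 0.3`, and `alpha = 1.3`, as well as the function names, are illustrative assumptions and are not taken from the measured chips.

```python
def delay(v, vth=0.3, alpha=1.3):
    """Normalized gate delay tau(v) = v / (v - vth)**alpha, valid for v > vth."""
    return v / (v - vth) ** alpha


def voltage_for_speed(fraction, v_max=1.5, vth=0.3, alpha=1.3, tol=1e-9):
    """Supply voltage at which clock frequency is `fraction` of that at v_max.

    Uses bisection; delay(v) decreases monotonically for v > vth and alpha >= 1.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    target = delay(v_max, vth, alpha) / fraction
    lo, hi = vth + 1e-6, v_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delay(mid, vth, alpha) > target:
            lo = mid  # too slow at mid, so the answer lies at a higher voltage
        else:
            hi = mid
    return hi


def energy_per_operation_ratio(fraction, v_max=1.5, vth=0.3, alpha=1.3):
    """Energy per operation relative to running at v_max, assuming E ~ V**2."""
    v = voltage_for_speed(fraction, v_max, vth, alpha)
    return (v / v_max) ** 2


if __name__ == "__main__":
    for load in (1.0, 0.75, 0.5, 0.25):
        v = voltage_for_speed(load)
        print(f"load {load:4.2f}: V_DD = {v:.3f} V, "
              f"energy/op = {energy_per_operation_ratio(load):.2f} x")
```

Under these assumptions, halving the required throughput allows a substantially lower supply voltage, so the energy per operation falls well below that of running at full voltage and idling.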

To probe further, [14-15] are helpful references.

## 3. Multiple $V_{DD}$'s and $V_{TH}$'s

There are three ways to reduce power dissipation while maintaining the maximum operating frequency by utilizing surplus timing in non-critical paths: 1) employing multiple power supplies to lower the supply voltage, 2) employing multiple threshold voltages to reduce leakage current, and 3) employing multiple transistor widths, $W$'s, to reduce circuit capacitance.



Fig. 4 Power reduction in dual- $V_{DD}$  design.

In the past, single  $V_{DD}$ , single  $V_{TH}$ , and single  $W$  were employed in CMOS design. Recently dual  $V_{DD}$ 's, dual  $V_{TH}$ 's, and several  $W$ 's are often used for low-power design. In the future, will many more multiple  $V_{DD}$ 's,  $V_{TH}$ 's, and  $W$ 's be used for low-power design? How many parameters will be required for what degree of power reduction? How will the parameters be optimized? Which of the three approaches will be most effective?

Theoretical models are developed to answer these questions and to derive knowledge for future design [16]. For simplicity the theoretical models assume non-crossing parallel signal paths that are composed of concatenated gates.

### 3.1 Dual $V_{DD}$ 's

Clustered voltage scaling employing two power supplies ( $V_L < V_H$ ) is discussed first.

The ratio $V_L/V_H$ should be chosen to minimize the power dissipation of the circuits. A theory to deal with the optimal $V_L/V_H$ is described in [17]. According to the theory, the power reduction ratio $R$ can be calculated as a function of $V_L/V_H$ when $p(t)$ is provided, in which $p(t)$ represents the normalized number of paths whose delay is $t$ when $V_L=V_H$. The power ratio $R$ is calculated for several artificial examples of $p(t)$, as depicted in Fig. 4(a). Interestingly, $R$ becomes minimum at $V_L$'s between $0.6V_H$ and $0.7V_H$ for all the examples, even though the minimum value of $R$ depends on $p(t)$. This means that $V_L$ should always be set between about $0.6V_H$ and $0.7V_H$ to minimize the power dissipation. In order to verify this theory, a discrete cosine transform block in an MPEG-4 video codec is designed by using an EDA tool for clustered voltage scaling [18] at various $V_L$'s, and the power dissipation is monitored. As shown in Fig. 4(b), the experimental result shows good agreement with the theory when a lambda-shaped $p(t)$ is assumed. Power dissipation is reduced by about 40%.

Two MPEG-4 video codec chips are developed by the two approaches, namely controlling $V_{DD}$ and $V_{TH}$, and employing two $V_{DD}$'s [19]. Power dissipation of the chips is simulated and measured. By optimizing $V_{DD}$ and $V_{TH}$, the power supply voltage can be lowered to 2.5V from 3.3V so that power dissipation is reduced by 43% in all the circuits. By employing one more $V_{DD}$, 1.75V for non-critical circuits, power dissipation is further reduced by 25%, in total by 55% compared to the conventional design at 3.3V.
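As a rough check of the first figure, assume that power dissipation scales purely with $V_{DD}^2$ at a fixed operating frequency (a simplifying assumption that neglects leakage and any change in capacitance). Lowering the supply from 3.3V to 2.5V then gives

$$1 - \left( \frac{2.5}{3.3} \right)^2 = 1 - \frac{6.25}{10.89} \approx 0.43,$$

which is consistent with the 43% reduction quoted for the 2.5V design.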

### 3.2 Multiple $V_{DD}$ 's

With multiple power supplies $\{V_1 > V_2 > \dots > V_n\}$, power dissipation is given by

$$P_n = f \cdot \left\{ \left( C_1 - \sum_{i=2}^n C_i \right) \cdot V_1^2 + \sum_{i=2}^n C_i \cdot V_i^2 \right\}, \quad (1)$$

where  $C_i$  is total capacitance of circuits and interconnections that will operate under  $V_i$ , and  $f$  is an operating frequency. The ratio of power dissipation in the multiple power supplies compared to that in a single power supply is given by

$$R_{VDD} \equiv \frac{P_n}{P_1} = 1 - \sum_{i=2}^n \left[ \left( \frac{C_i}{C_1} \right) \cdot \left\{ 1 - \left( \frac{V_i}{V_1} \right)^2 \right\} \right]. \quad (2)$$

As shown in a design example of a 64-bit integer datapath for a 333MHz CPU core in reference [16], delay and capacitance are roughly proportional to each other. Therefore, $C_i/C_1$ is calculated by

$$\frac{C_i}{C_1} = \frac{\int_0^1 p(t) \cdot t_i \cdot dt}{\int_0^1 p(t) \cdot t \cdot dt}, \quad (3)$$

where $p(t)$ is a normalized path-delay distribution function, and $t_i$ is the total delay at $V_1$ of the circuits that will operate under $V_i$. Consider a path whose total delay $t$ is between $t_{i,0}$ and $t_{i-1,0}$, where $t_{i,0}$ denotes the path delay at $V_1$ that becomes equal to the cycle time (=1) when all the circuits operate under $V_i$. Among the many combinations of power supplies that stretch the total delay of the path to the cycle time, power dissipation is minimized when $\{V_i, V_{i-1}\}$ is applied, as is derived from the Theorem and Lemma in the Appendix. Accordingly, $t_i$ is given by

$$t_i = \frac{t_{i,0}}{t_{i,0} - t_{i+1,0}} (t - t_{i+1,0}) \quad (t_{i+1,0} \leq t \leq t_{i,0}),$$

$$t_i = \frac{t_{i,0}}{t_{i-1,0} - t_{i,0}} (t_{i-1,0} - t) \quad (t_{i,0} \leq t \leq t_{i-1,0}), \quad (4)$$

where  $t_{i,0}$  is given by

$$t_{i,0} = \left( \frac{V_1}{V_i} \right) \cdot \left( \frac{V_i - V_{TH}}{V_1 - V_{TH}} \right)^\alpha, \quad (5)$$

$$t_{n+1,0} = 0,$$

where $V_{TH}$ is the threshold voltage, and $\alpha$ is the velocity saturation index. From equations (2)-(5), $R_{VDD}$ can be calculated for given $p(t)$, $V_1$, $V_i$, and $V_{TH}$.
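The calculation of $R_{VDD}$ from equations (2)-(5) can be sketched numerically as follows. This is a minimal illustration, not the original tool: the function names, the midpoint-rule integration, and the value `alpha = 1.3` are assumptions, and the lambda-shaped $p(t)$ is taken from the caption of Fig. 5 with the cycle time normalized to 1.

```python
def lambda_shape(t):
    """Lambda-shaped path-delay distribution p(t) = 0.5 - |0.5 - t| on [0, 1]."""
    return 0.5 - abs(0.5 - t)


def critical_delay(v_i, v1, vth, alpha):
    """t_{i,0} of Eq. (5): delay at V_1 that stretches to the cycle time at V_i."""
    return (v1 / v_i) * ((v_i - vth) / (v1 - vth)) ** alpha


def delay_share(t, i, t0):
    """t_i of Eq. (4): part of a path of delay t (at V_1) that runs under V_i.

    t0[k] holds t_{k,0} for k = 1..n and t0[n + 1] = 0; t0[0] is unused.
    Only called for i >= 2.
    """
    lower, mid, upper = t0[i + 1], t0[i], t0[i - 1]
    if lower <= t <= mid:
        return mid * (t - lower) / (mid - lower)
    if mid < t <= upper:
        return mid * (upper - t) / (upper - mid)
    return 0.0


def power_ratio(voltages, vth, alpha=1.3, p=lambda_shape, steps=2000):
    """R_VDD of Eq. (2) for supplies V_1 > V_2 > ... > V_n (all above vth)."""
    if any(a <= b for a, b in zip(voltages, voltages[1:])) or voltages[-1] <= vth:
        raise ValueError("voltages must be strictly decreasing and above vth")
    v1, n = voltages[0], len(voltages)
    t0 = [None] + [critical_delay(v, v1, vth, alpha) for v in voltages] + [0.0]
    dt = 1.0 / steps
    ts = [(k + 0.5) * dt for k in range(steps)]        # midpoint rule on [0, 1]
    denom = sum(p(t) * t for t in ts) * dt             # denominator of Eq. (3)
    ratio = 1.0
    for i in range(2, n + 1):
        num = sum(p(t) * delay_share(t, i, t0) for t in ts) * dt
        ratio -= (num / denom) * (1.0 - (voltages[i - 1] / v1) ** 2)
    return ratio


if __name__ == "__main__":
    print("dual  :", round(power_ratio([1.5, 0.9], vth=0.3), 3))
    print("triple:", round(power_ratio([1.5, 1.02, 0.69], vth=0.3), 3))
```

Sweeping the lower supplies with `power_ratio` and taking the minimum is one way to reproduce the kind of surface shown for triple supplies, under the stated assumptions.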



Fig. 5 Power dissipation ratio in triple $V_{DD}$'s where $V_1 = 1.5V$, $V_{TH} = 0.3V$, $p(t) = 0.5 - |0.5 - t/t_{1,0}|$ (lambda shape); (a) 3-D graph, and (b) contour lines.



Fig. 6 Optimum multiple  $V_{DD}$ 's and optimized power dissipation compared to that in single power supply  $V_1$ . Rule of thumb (lines) shows a good agreement with calculation in theoretical model (marks).

The calculated result for dual supplies $\{V_1, V_2\}$ shows good agreement with the simulation result in Fig. 4. For triple power supplies $\{V_1, V_2, V_3\}$, the computed 3-D graph and its contour lines are depicted in Fig. 5.

In Fig. 6, the calculated optimum $V_{DD}$'s and the optimized power dissipation are plotted. Taking the results of after-layout static timing analysis into consideration, a lambda-shaped $p(t)$ is adopted here.

A rough rule of thumb for optimum  $V_{DD}$ 's is derived:

| [Rule of Thumb for Optimum Multiple $V_{DD}$ 's] |                                                                                      |
|--------------------------------------------------|--------------------------------------------------------------------------------------|
| For $\{V_1, V_2\}$                               | $\frac{V_2}{V_1} = 0.5 + 0.5 \frac{V_{TH}}{V_1}$                                     |
| For $\{V_1, V_2, V_3\}$                          | $\frac{V_2}{V_1} = \frac{V_3}{V_2} = 0.6 + 0.4 \frac{V_{TH}}{V_1}$                   |
| For $\{V_1, V_2, V_3, V_4\}$                     | $\frac{V_2}{V_1} = \frac{V_3}{V_2} = \frac{V_4}{V_3} = 0.7 + 0.3 \frac{V_{TH}}{V_1}$ |

This rule of thumb gives nearly optimum $V_{DD}$'s, under which the power dissipation is within 1% of the precise minimum. It is also understood from Fig. 6 that the more $V_{DD}$'s are used, the less power is dissipated, but the effect saturates. The power reduction effect also diminishes as the power supply voltage is scaled down. The following equation gives a good approximation.

$$\frac{P_n}{P_1} = 0.3 + 0.7 \frac{V_{TH}}{V_1}. \quad (6)$$
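The rule of thumb for optimum multiple $V_{DD}$'s can be applied mechanically, as in the short sketch below. Because the table sets every ratio between adjacent supplies to the same value, the supplies form a geometric sequence. The function name and the example values $V_1 = 1.5V$ and $V_{TH} = 0.3V$ (borrowed from the caption of Fig. 5) are illustrative choices.

```python
# Coefficients (a, b) of the rule of thumb V_{k+1}/V_k = a + b * V_TH / V_1,
# indexed by the number of supplies n.
VDD_RULE = {2: (0.5, 0.5), 3: (0.6, 0.4), 4: (0.7, 0.3)}


def rule_of_thumb_supplies(v1, vth, n):
    """Return [V_1, ..., V_n] according to the rule-of-thumb table."""
    if n not in VDD_RULE:
        raise ValueError("the rule of thumb is tabulated only for n = 2, 3, 4")
    a, b = VDD_RULE[n]
    ratio = a + b * vth / v1
    return [v1 * ratio ** k for k in range(n)]


if __name__ == "__main__":
    for n in (2, 3, 4):
        supplies = rule_of_thumb_supplies(1.5, 0.3, n)
        print(n, [round(v, 3) for v in supplies])
```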



Fig. 7 Leakage current ratio in triple $V_{TH}$'s where $V_{DD} = 1.5V$, $V_{TH,1} = 0.3V$, $p(t) = 0.5 - |0.5 - t/t_{1,0}|$ (lambda shape); (a) 3-D graph, and (b) contour lines.



Fig. 8 Optimum multiple  $V_{TH}$ 's and optimized leakage current compared to that in single threshold voltage  $V_{TH,1}$ . Rule of thumb (lines) shows a good agreement with calculation in theoretical model (marks).

### 3.3 Multiple $V_{TH}$ 's

With multiple threshold voltages $\{V_{TH,1} < V_{TH,2} < \dots < V_{TH,n}\}$, chip leakage current is given by

$$I_n = \left( \frac{I_0}{W_0} \right) \cdot \left\{ \left( W_1^* - \sum_{i=2}^n W_i^* \right) \cdot 10^{\frac{-V_{TH,1}}{S}} + \sum_{i=2}^n W_i^* \cdot 10^{\frac{-V_{TH,i}}{S}} \right\}, \quad (7)$$

where $W_i^*$ is the total gate width of the pMOS and nMOS transistors whose threshold voltage is $V_{TH,i}$ and whose sources are connected to $V_{DD}$ and $V_{SS}$, respectively, and $S$ is the subthreshold swing. The ratio of chip leakage current with multiple threshold voltages to that with a single threshold voltage is given by

$$R_{VTH} \equiv \frac{I_n}{I_1} = 1 - \sum_{i=2}^n \left[ \left( \frac{W_i^*}{W_1^*} \right) \cdot \left\{ 1 - 10^{-\frac{V_{TH,i} - V_{TH,1}}{S}} \right\} \right]. \quad (8)$$
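The leakage ratio can be evaluated directly once the width fractions $W_i^*/W_1^*$ are known. The sketch below divides Eq. (7) for $n$ thresholds by the same expression for a single threshold, so each term carries the exponent $-(V_{TH,i} - V_{TH,1})/S$. The function name and the default `s = 0.1` (in volts per decade) are illustrative assumptions, not values given in the paper.

```python
def leakage_ratio(width_fractions, thresholds, s=0.1):
    """R_VTH = I_n / I_1 for thresholds [V_TH1, V_TH2, ..., V_THn].

    width_fractions[k] is W*_{k+2} / W*_1, i.e. the share of total gate width
    assigned to the (k + 2)-th threshold voltage.
    """
    if len(width_fractions) != len(thresholds) - 1:
        raise ValueError("need one width fraction per additional threshold")
    vth1 = thresholds[0]
    ratio = 1.0
    for fraction, vth_i in zip(width_fractions, thresholds[1:]):
        ratio -= fraction * (1.0 - 10.0 ** (-(vth_i - vth1) / s))
    return ratio


if __name__ == "__main__":
    # Hypothetical split: 90% of the gate width moved to a threshold 0.15 V higher.
    print(round(leakage_ratio([0.9], [0.3, 0.45]), 4))
```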

In a typical design where buffer sizes and the number of repeaters are optimally designed, delay and transistor width are roughly proportional to each other, and $W_i^*/W_1^*$ is calculated by

$$\frac{W_i^*}{W_1^*} = \frac{\int_0^1 p(t) \cdot t_i \cdot dt}{\int_0^1 p(t) \cdot t \cdot dt}. \quad (9)$$



Fig. 9 Optimum multiple  $V_{TH}$ 's and optimized power dissipation compared to that in single threshold voltage  $V_{TH,1}$ . Chip leakage current is unchanged.

The chip leakage current ratio  $R_{VTH}$  can be computed in the same way as in  $R_{VDD}$ .

A computed 3-D graph for triple threshold voltages  $\{V_{TH,1}, V_{TH,2}, V_{TH,3}\}$  and its contour lines are depicted in Fig. 7. In Fig. 8 calculated optimum  $V_{TH}$ 's and the optimized chip leakage current are plotted.

A rough rule of thumb for the optimum  $V_{TH}$ 's is derived:

[Rule of Thumb for Optimum Multiple  $V_{TH}$ 's]

|                                                  |                                    |
|--------------------------------------------------|------------------------------------|
| For $\{V_{TH,1}, V_{TH,2}\}$                     | $V_{TH,2} = 0.10V_{DD} + V_{TH,1}$ |
| For $\{V_{TH,1}, V_{TH,2}, V_{TH,3}\}$           | $V_{TH,2} = 0.06V_{DD} + V_{TH,1}$ |
|                                                  | $V_{TH,3} = 0.07V_{DD} + V_{TH,2}$ |
| For $\{V_{TH,1}, V_{TH,2}, V_{TH,3}, V_{TH,4}\}$ | $V_{TH,2} = 0.04V_{DD} + V_{TH,1}$ |
|                                                  | $V_{TH,3} = 0.05V_{DD} + V_{TH,2}$ |
|                                                  | $V_{TH,4} = 0.06V_{DD} + V_{TH,3}$ |

This rule of thumb gives almost optimum  $V_{TH}$ 's as shown in Fig. 8.
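The $V_{TH}$ rule of thumb amounts to adding fixed fractions of $V_{DD}$ step by step, as the following sketch shows. The function name and the example values $V_{DD} = 1.5V$ and $V_{TH,1} = 0.3V$ (borrowed from the caption of Fig. 7) are illustrative choices.

```python
# Increments (as fractions of V_DD) between successive thresholds,
# indexed by the number of threshold voltages n, taken from the table above.
VTH_STEPS = {2: [0.10], 3: [0.06, 0.07], 4: [0.04, 0.05, 0.06]}


def rule_of_thumb_thresholds(vdd, vth1, n):
    """Return [V_TH1, ..., V_THn] according to the rule-of-thumb table."""
    if n not in VTH_STEPS:
        raise ValueError("the rule of thumb is tabulated only for n = 2, 3, 4")
    thresholds = [vth1]
    for step in VTH_STEPS[n]:
        thresholds.append(thresholds[-1] + step * vdd)
    return thresholds


if __name__ == "__main__":
    for n in (2, 3, 4):
        values = rule_of_thumb_thresholds(1.5, 0.3, n)
        print(n, [round(v, 3) for v in values])
```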

It is also understood from Fig. 8 that the more $V_{TH}$'s are used, the less leakage current flows, but the effect saturates. The leakage reduction effect also diminishes as the power supply voltage is scaled down. At $V_{DD}=1.5V$, the shares of total transistor width assigned to $V_{TH,1}$, $V_{TH,2}$, $V_{TH,3}$, and $V_{TH,4}$ are 0.4%, 3%, 11%, and 85%, respectively.

For designs such as high-end microprocessors, where leakage current accounts for a fairly large share of the power dissipation because of a low $V_{TH}$, reducing leakage current by more than one order of magnitude is very effective.

For other designs where the leakage current is suppressed to a fairly small amount, the leakage current reduction can be converted to a reduction of AC power by lowering  $V_{TH}$ 's, and accordingly,  $V_{DD}$ . In Fig. 9,  $V_{TH}$ 's are lowered to the point where chip leakage current is the same as that in  $V_{TH,1}$ . As a result, AC power is reduced by about 20%.

### 3.4 Multiple transistor widths

Fig. 10 Power dissipation ratio in using three $W$'s. $V_{DD}=1.5V$, $V_{TH,1}=0.3V$, $p(t)=\text{lambda}$ shape; (a) 3-D graph, and (b) contour lines.

Optimizing transistor sizing is a standard design practice that employs a cell library in which several transistor size options are available for each gate. However, optimum sizing in terms of power dissipation is not well known. In this section, optimum transistor sizing is discussed theoretically. When multiple transistor widths $\{W_1 > W_2 > \dots > W_n\}$ are employed, power dissipation is given by

$$P_n = f \cdot \left\{ \left( C_{MOS,1} - \sum_{i=2}^n C_{MOS,i} \right) + \sum_{i=2}^n \left( C_{MOS,i} \cdot \frac{W_i}{W_1} \right) + C_{INT} \right\} \cdot V^2, \quad (10)$$

where  $C_{MOS,i}$  is the total gate and diffusion capacitance of transistors whose channel width will be scaled to  $W_i$ , and  $C_{INT}$  is the total interconnection capacitance. The ratio of power dissipation when using the multiple transistor width to that when using a single transistor width is given by

$$R_W \equiv \frac{P_n}{P_1} = 1 - \frac{1}{m+1} \sum_{i=2}^n \left( \frac{C_{MOS,i}}{C_{MOS,1}} \right) \cdot \left( 1 - \frac{W_i}{W_1} \right), \quad (11)$$

where $m$ is $C_{INT}/C_{MOS,1}$. Since delay and transistor capacitance are roughly proportional to each other, $C_{MOS,i}/C_{MOS,1}$ is calculated by

$$\frac{C_{MOS,i}}{C_{MOS,1}} = \frac{\int_0^1 p(t) \cdot t_i \cdot dt}{\int_0^1 p(t) \cdot t \cdot dt}. \quad (12)$$

The power dissipation ratio  $R_W$  can be computed in the same way as in  $R_{VDD}$ .
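Equation (11) can be evaluated directly once the capacitance fractions are known. The sketch below is a minimal illustration; the function name and the example numbers are hypothetical and serve only to show how the interconnection term $m$ dilutes the saving obtained from narrower transistors.

```python
def width_power_ratio(cap_fractions, width_ratios, m):
    """R_W of Eq. (11).

    cap_fractions[k] is C_MOS,(k+2) / C_MOS,1 and width_ratios[k] is
    W_(k+2) / W_1; m is C_INT / C_MOS,1.
    """
    if len(cap_fractions) != len(width_ratios):
        raise ValueError("need one width ratio per capacitance fraction")
    saving = sum(c * (1.0 - w) for c, w in zip(cap_fractions, width_ratios))
    return 1.0 - saving / (m + 1.0)


if __name__ == "__main__":
    # Hypothetical case: 80% of the transistor capacitance scaled to W_1 / 2.
    for m in (0.0, 1.0, 2.0):
        print(m, round(width_power_ratio([0.8], [0.5], m), 3))
```

With no interconnection capacitance (`m = 0`) the full transistor-capacitance saving appears in $R_W$; as `m` grows, the same sizing yields a proportionally smaller reduction in total power.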

A computed 3-D graph for triple transistor width  $\{W_1, W_2, W_3\}$  and its contour lines are depicted in Fig. 10. A rough rule of thumb for the optimum  $W$ 's is derived:

[Rule of Thumb for Optimum Multiple  $W$ 's]

|                              |                                                                    |
|------------------------------|--------------------------------------------------------------------|
| For $\{W_1, W_2\}$           | $W_2 = \frac{1}{2}W_1$                                             |
| For $\{W_1, W_2, W_3\}$      | $W_2 = \frac{2}{3}W_1, W_3 = \frac{1}{3}W_1$                       |
| For $\{W_1, W_2, W_3, W_4\}$ | $W_2 = \frac{3}{4}W_1, W_3 = \frac{1}{2}W_1, W_4 = \frac{1}{4}W_1$ |

Transistor capacitance is reduced by 40%, which corresponds to a 15% reduction in total capacitance.

## 4. Conclusion

It is essential to control $V_{DD}$ and $V_{TH}$ for low-power, high-speed CMOS design. Variable $V_{DD}$ and $V_{TH}$ are essential, since the requirement for circuit speed changes from time to time. Multiple $V_{DD}$'s and $V_{TH}$'s are effective, since speed requirements differ spatially from circuit to circuit. In particular, using multiple $V_{TH}$'s is effective in reducing leakage current in active mode. In order for designers to control these two parameters as objectives of design optimization for better trade-offs between power and speed, a CAD tool and a design methodology should be developed.

## 5. Acknowledgments

The author is grateful to A. Harris, M. Hamada, and N. Miura for discussion.

## 6. Appendix

**Theorem:** Power dissipation is minimized when supply voltages are chosen for gates whose path delay is $t$ such that:

$$V_k \text{ and } V_{k-1} \text{ for } t_{k,0} < t \leq t_{(k-1),0} \quad (k=1 \text{ to } n-1, t_{0,0} = 1)$$

$$V_{n-1} \text{ for } 0 \leq t \leq t_{(n-1),0}$$

**(Proof)** Clear from the following Lemma.

**Lemma:** Power dissipation is increased by dividing a circuit into two parts while keeping the total delay unchanged, one part operating at a higher supply voltage and the other part at a lower supply voltage than the original supply voltage.

**(Proof)**

If a function $f(v)$ is “concave up”, then for any points $v_L$ and $v_H$ in the domain of $f$, and any $\gamma \in (0,1)$,

$$f(\gamma v_H + (1-\gamma)v_L) < \gamma f(v_H) + (1-\gamma)f(v_L)$$

It is clear from the graph that the power dissipation function, $p(v) = v^2$, is concave up for $v > 0$. It is also found from the graph that the delay function, $\tau(v) = \frac{v}{(v - v_{TH})^\alpha}$, is concave up for $\alpha > 1$ and $v > v_{TH}$.

The inverse function $\tau^{-1}(v)$ is also concave up, which can be easily found by sketching the graph, simply by reflecting the graph of $\tau(v)$ in the line $y=v$.

If $f_1, f_2$ are both concave up, and $f_2$ is also monotonically increasing, then

$$\begin{aligned} f_2(f_1(\gamma v_H + (1-\gamma)v_L)) &< f_2(\gamma f_1(v_H) + (1-\gamma)f_1(v_L)) \\ &< \gamma f_2(f_1(v_H)) + (1-\gamma)f_2(f_1(v_L)) \end{aligned}$$

i.e., the composition $f_2(f_1(v))$ is also concave up.

$$\text{Now, } v_M^2 = p(v_M) = p(\tau^{-1}(\tau(v_M)))$$

We are given that $\tau(v_M) = \gamma\tau(v_H) + (1-\gamma)\tau(v_L)$ and we have shown above that $p(\tau^{-1}(v))$ is concave up.

$$\begin{aligned} \therefore v_M^2 &= p(\tau^{-1}(\tau(v_M))) = p(\tau^{-1}(\gamma\tau(v_H) + (1-\gamma)\tau(v_L))) \\ &< \gamma p(\tau^{-1}(\tau(v_H))) + (1-\gamma)p(\tau^{-1}(\tau(v_L))) \\ &= \gamma p(v_H) + (1-\gamma)p(v_L) \\ &= \gamma v_H^2 + (1-\gamma)v_L^2 \end{aligned}$$

Therefore, power dissipation is increased by dividing a circuit into two parts while keeping the total delay unchanged.

Q.E.D.
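The statement of the Lemma can be checked numerically for a particular case. The sketch below picks $v_H$, $v_L$, and $\gamma$, solves $\tau(v_M) = \gamma\tau(v_H) + (1-\gamma)\tau(v_L)$ for the single supply $v_M$ with the same total delay, and compares $v_M^2$ with $\gamma v_H^2 + (1-\gamma)v_L^2$. The parameter values (`vth = 0.3`, `alpha = 1.3`, and the voltages) and the function names are illustrative assumptions; a numerical check of one case is not a substitute for the proof.

```python
def tau(v, vth=0.3, alpha=1.3):
    """Delay function tau(v) = v / (v - vth)**alpha, valid for v > vth."""
    return v / (v - vth) ** alpha


def equivalent_single_supply(v_high, v_low, gamma, vth=0.3, alpha=1.3, tol=1e-12):
    """Solve tau(v_m) = gamma * tau(v_high) + (1 - gamma) * tau(v_low) by bisection."""
    target = gamma * tau(v_high, vth, alpha) + (1.0 - gamma) * tau(v_low, vth, alpha)
    lo, hi = v_low, v_high          # tau is decreasing, so v_m lies between them
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tau(mid, vth, alpha) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def split_vs_single(v_high, v_low, gamma, vth=0.3, alpha=1.3):
    """Return (power with a single supply, power with the two-supply split)."""
    v_m = equivalent_single_supply(v_high, v_low, gamma, vth, alpha)
    return v_m ** 2, gamma * v_high ** 2 + (1.0 - gamma) * v_low ** 2


if __name__ == "__main__":
    single, split = split_vs_single(1.5, 0.9, 0.5)
    print(f"single supply: {single:.3f}, split supplies: {split:.3f}")
```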

## 7. References

- [1] A. Chandrakasan, *et al.*, “Low-power CMOS digital design,” *IEEE J. Solid-State Circuits*, vol. 27, no. 4, pp. 473-484, April 1992.
- [2] T. Kuroda, and T. Sakurai, “Overview of Low-Power ULSI Circuit Techniques,” *IEICE Trans. on Electronics*, vol. E78-C, no. 4, pp. 334-344, April 1995.
- [3] K. Nose, and T. Sakurai, “Optimization of VDD and VTH for Low-Power and High-Speed Applications,” in *Proc. of ASPDAC*, pp. 469-474, Jan. 2000.
- [4] T. Kuroda, *et al.*, “A 0.9V 150MHz 10mW 4mm<sup>2</sup> 2-D discrete cosine transform core processor with variable-threshold-voltage scheme,” *ISSCC'96 Dig. Tech. Papers*, pp. 166-167, Feb. 1996.
- [5] T. Kuroda, *et al.*, “Variable supply-voltage scheme for low-power high-speed CMOS digital design,” *IEEE J. Solid-State Circuits*, vol. 33, no. 3, pp. 454-462, Mar. 1998.
- [6] T. Kuroda, *et al.*, “Variable threshold-voltage CMOS technology,” *IEICE Trans. on Electronics*, vol. E83-C, no. 11, pp. 1705-1715, Nov. 2000.
- [7] H. Im, *et al.*, “VTCMOS Characteristics and Its Optimum Conditions Predicted by a Compact Analytical Model,” in *ISLPED'01 Dig. Tech. Papers*, pp. 123-128, Aug. 2001.
- [8] S. Narendra, *et al.*, “1.1V 1GHz Communications Router with On-Chip Body Bias in 150nm CMOS,” in *ISSCC'02 Dig. Tech. Papers*, pp. 270-271, Feb. 2002.
- [9] S. Vangal, *et al.*, “A 5GHz 32b Integer-Execution Core in 130nm Dual-V<sub>T</sub> CMOS,” in *ISSCC'02 Dig. Tech. Papers*, pp. 412-413, Feb. 2002.
- [10] T. Burd, *et al.*, “A Dynamic Voltage Scaled Microprocessor System,” *IEEE J. Solid-State Circuits*, vol. 35, pp. 1571-1580, Nov. 2000.
- [11] T. Ishihara, and H. Yasuura, “Voltage Scheduling Problem for Dynamically Variable Voltage Processors,” in *ISLPED'98 Dig. Tech. Papers*, pp. 197-202, Aug. 1998.
- [12] S. Lee, and T. Sakurai, “Run-Time Voltage Hopping for Low-Power Real-Time Systems,” in *Proc. of DAC*, pp. 806-809, Jun. 2000.
- [13] F. Ichiba, *et al.*, “Variable Supply-Voltage Scheme with 95%-Efficiency DC-DC Converter for MPEG-4 Codec,” in *ISLPED Dig. Tech. Papers*, pp. 54-58, Aug. 1999.
- [14] A. Chandrakasan, W. Bowhill, and F. Fox, Eds., *Design of High-Performance Microprocessor Circuits*, Chapter 4, IEEE Press, Piscataway, NJ, 2000.
- [15] V. Oklobdzija, Ed., *The Computer Engineering Handbook*, Section IV, CRC Press, New York, 2002.
- [16] M. Hamada, Y. Ootaguro, and T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” in *Proc. of CICC'2001*, pp. 89-92, May 2001.
- [17] T. Kuroda, and M. Hamada, “Low-power CMOS digital design with dual embedded adaptive power supplies,” *IEEE J. Solid-State Circuits*, vol. 35, no. 4, pp. 652-655, April 2000.
- [18] K. Usami, *et al.*, “Design Methodology of Ultra Low-Power MPEG4 Codec Core Exploiting Voltage Scaling Techniques,” in *Proc. of DAC*, pp. 483-488, June 1998.
- [19] M. Takahashi, *et al.*, “A 60mW MPEG4 video codec using clustered voltage scaling with variable supply-voltage scheme,” *IEEE J. Solid-State Circuits*, vol. 33, no. 11, pp. 1772-1780, Nov. 1998.