## **Dynamic VTH Scaling Scheme for Active Leakage Power Reduction**

Chris H. Kim and Kaushik Roy, Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. Emails: {hyungil, kaushik}@ecn.purdue.edu

## Abstract

We present a Dynamic VTH Scaling (DVTS) scheme to reduce leakage power during the active mode of a circuit. The power-saving strategy of DVTS is similar to that of Dynamic VDD Scaling (DVS), which adaptively changes the supply voltage according to the current workload of the system. Instead of adjusting the supply voltage, DVTS controls the threshold voltage through body bias control in order to reduce leakage power. The power-saving potential of DVTS and its impact on dynamic and leakage power in future technologies are discussed, and the pros and cons of the DVTS system are examined in detail. Finally, feedback loop hardware for DVTS that tracks the optimal VTH for a given clock frequency is proposed. Simulation results show that 92% energy savings can be achieved with DVTS for 70 nm circuits.

## 1. Introduction

It has long been a concern that leakage power will continue to increase and severely limit the battery life of future portable digital systems. Furthermore, projections for future technologies indicate that leakage power will be substantial even when the chip is in active mode. Hence, leakage power must be managed not only in the standby mode of the system but also in the active mode. Fig. 1 shows another reason why active leakage control is important: it plots the dynamic and leakage power of a 70 nm inverter at different operating temperatures. The leakage power, which is 10% of the total power at room temperature, rises to 49% as the temperature goes up to 125°C. Because the operating temperature rises in active mode due to the switching activity of the transistors, the leakage problem is amplified.

Figure 1. Dynamic and leakage power for different temperatures. Simulation results are from an inverter using the 70 nm predictive technology model from UC Berkeley [4].

Recently, Dynamic VDD Scaling (DVS) has gained a lot of attention as an efficient method to reduce total power dissipation [1,2,3]. For background or high-latency tasks, which can be executed at a reduced frequency, the supply voltage of the system is scaled down and the power consumption is minimized. Two underlying facts make DVS efficient: (1) systems do not need to deliver maximum performance at all times, and (2) the total power is dominated by dynamic power. Thus, for circuits where leakage power is dominant, DVS will not be as efficient in saving total power as it is for dynamic-power-dominant circuits. In very-low-VTH VLSI systems with sub-1 V supplies, where active leakage is substantial, dynamically scaling VTH by controlling the body bias can be effective for total power savings [5,6].

This paper presents a Dynamic VTH Scaling (DVTS) scheme for active leakage power reduction. Whenever there is slack during computation, VTH is adaptively raised by changing the body bias voltage (VBB), so that the circuit delivers just enough throughput for the current workload. To examine the effectiveness of DVTS, DVS and DVTS are compared for current (0.25 µm) and future (0.07 µm) process generations. The advantages and disadvantages of DVTS relative to DVS are also investigated carefully. DVTS hardware with a feedback loop consisting of a voltage-controlled oscillator (VCO), charge pumps, and a feedback controller is proposed. The operating system determines the clock frequency of the system for a given workload at run time, and the DVTS hardware tracks the optimal VTH for that clock frequency by dynamically adjusting VBB.

## 2. The DVTS Scheme

#### 2.1 Overview

Fig. 2 shows how the DVTS scheme adaptively controls the body bias to change VTH. When the workload is below the maximum, the operating system recommends a lower clock frequency to the hardware. The DVTS hardware then increases the PMOS body bias and decreases the NMOS body bias, raising VTH and reducing power dissipation. When there is no workload at all, VTH can be raised up to the limit set by the maximum VBB, significantly reducing standby leakage power. Power savings using the DVS and DVTS schemes for different technology generations are shown in Fig. 3. The simulation results are for a single inverter with a fan-out of 1.
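To make "raise VTH until the reduced clock is just met" concrete, the sketch below computes the highest VTH that still meets a target fraction of the maximum clock frequency. It assumes the alpha-power-law delay model (gate delay proportional to VDD/(VDD − VTH)^α) with a hypothetical α = 1.3, uses the nominal VDD = 0.9 V and VTH = 0.15 V annotated with Fig. 3, and stands in for the VBB limit with a hypothetical VTH ceiling `vth_max`. It is an illustration under these assumptions, not the authors' simulation setup.

```python
def target_vth(f_ratio: float, vdd: float = 0.9, vth_nom: float = 0.15,
               alpha: float = 1.3, vth_max: float = 0.35) -> float:
    """Highest VTH (V) that still meets f = f_ratio * f_max at fixed VDD.

    Alpha-power-law sketch: gate delay ~ VDD / (VDD - VTH)**alpha, so the
    achievable frequency scales as (VDD - VTH)**alpha. Solving
    f_ratio = ((VDD - VTH) / (VDD - VTH_nom))**alpha for VTH gives the
    closed form below. alpha and vth_max are illustrative assumptions.
    """
    if not 0.0 < f_ratio <= 1.0:
        raise ValueError("f_ratio must be in (0, 1]")
    vth = vdd - f_ratio ** (1.0 / alpha) * (vdd - vth_nom)
    # The maximum body bias caps how far VTH can be raised.
    return min(vth, vth_max)


for ratio in (1.0, 0.8, 0.6, 0.4):
    print(f"f/f_max = {ratio:.1f} -> VTH = {target_vth(ratio):.3f} V")
```

With these made-up parameters the ceiling is reached below roughly two-thirds of f_max, after which VTH can no longer follow the clock, which is qualitatively the behavior the body-bias limit imposes.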

**250 nm technology**: Reducing the clock frequency reduces the total power roughly proportionally, since dynamic power dominates. However, reducing the clock rate alone does not change the energy consumed per operation. By scaling the supply voltage together with the frequency, in contrast, significant power savings can be obtained, as shown in Fig. 3(a), because the dynamic power dominates the total power.
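The argument above rests on energy per operation being P/f. The sketch below evaluates that quantity, treating one clock cycle as one operation, for frequency-only scaling and for DVS. The linear V-f tracking rule, the voltage floor, and all numeric values are hypothetical choices meant to resemble a dynamic-dominated 0.25 µm circuit, not data from Fig. 3.

```python
def energy_per_cycle(vdd: float, f: float, c_eff: float, i_leak: float) -> float:
    """Energy per clock cycle (J) = (dynamic + leakage power) / f.

    c_eff folds switching activity into an effective switched capacitance.
    """
    p_dyn = c_eff * vdd ** 2 * f   # dynamic power, W
    p_leak = vdd * i_leak          # leakage power, W
    return (p_dyn + p_leak) / f


def dvs_vdd(f_ratio: float, vdd_nom: float, vdd_min: float) -> float:
    """Assumed linear V-f tracking with a floor (a real DVS uses a measured V-f curve)."""
    return max(vdd_min, vdd_nom * f_ratio)


# Illustrative, dynamic-dominated 0.25 um-style numbers (all hypothetical).
VDD_NOM, VDD_MIN, F_MAX = 2.5, 1.0, 200e6   # V, V, Hz
C_EFF, I_LEAK = 10e-12, 1e-6                # F, A

e_nom = energy_per_cycle(VDD_NOM, F_MAX, C_EFF, I_LEAK)
for r in (1.0, 0.75, 0.5):
    f = r * F_MAX
    e_freq_only = energy_per_cycle(VDD_NOM, f, C_EFF, I_LEAK)
    e_dvs = energy_per_cycle(dvs_vdd(r, VDD_NOM, VDD_MIN), f, C_EFF, I_LEAK)
    print(f"f/f_max={r:.2f}: freq-only {e_freq_only / e_nom:.3f}, DVS {e_dvs / e_nom:.3f}")
```

Frequency scaling alone leaves the per-cycle energy essentially unchanged (the leakage share per cycle even grows slightly), while DVS cuts it roughly in proportion to VDD².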

Scaling the threshold voltage instead of the supply voltage saves mostly leakage power. For 0.25 µm technology, where leakage is a small fraction of the total power, DVTS is less efficient than DVS in saving total power. Moreover, the maximum VTH attainable by body biasing is bounded: the maximum VBB is set by the reverse breakdown voltage of the diffusion-substrate junction [6]. Thus, once the highest VTH is reached, DVTS cannot scale any further. In our simulations, DVTS could track the optimal VTH only down to a clock rate of 70% of the maximum. Below this clock frequency, represented by the broken line in Fig. 3(a), the DVTS scheme cannot provide the optimal VTH due to this physical constraint.
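The upper bound on VTH can be seen from the standard long-channel body-effect expression, VTH = VTH0 + γ(√(2φF + VSB) − √(2φF)). The sketch below evaluates it with the reverse body bias clamped at a maximum that stands in for the junction-breakdown limit. VTH0, γ, 2φF, and the clamp value are hypothetical, and short-channel effects are ignored.

```python
import math


def nmos_vth(vsb: float, vth0: float = 0.45, gamma: float = 0.4,
             two_phi_f: float = 0.8, vsb_max: float = 2.0) -> float:
    """NMOS VTH (V) under reverse body bias vsb (V), long-channel body-effect model.

    VTH = VTH0 + gamma * (sqrt(2*phi_F + VSB) - sqrt(2*phi_F)).
    vsb is clamped to [0, vsb_max]; vsb_max stands in for the junction
    breakdown limit on VBB. All parameter values are illustrative.
    """
    vsb = min(max(vsb, 0.0), vsb_max)
    return vth0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


for vsb in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"VSB = {vsb:.1f} V -> VTH = {nmos_vth(vsb):.3f} V")
```

The square-root dependence gives diminishing VTH increments per volt of body bias, and the clamp reproduces the hard ceiling beyond which DVTS can no longer follow a falling clock frequency.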

Figure 2. Dynamic VTH scaling by adaptively changing the body bias for a given clock frequency profile.

Figure 3. Power versus clock frequency for DVS and DVTS systems in different technology generations (nominal VDD = 0.9 V, nominal VTH = 0.15 V).

**70 nm technology**: We used the Predictive Technology Model (PTM) to simulate 70 nm devices [4]. As shown in Fig. 3(b), leakage accounts for 52% of the total power when the operating temperature is set to 125°C. Reducing the clock frequency alone lowers only the dynamic power, leaving the leakage power virtually unchanged. Because leakage power is so high in scaled CMOS technologies, DVTS becomes comparable to DVS in saving total power. From Fig. 3(b), 92% total energy savings can be achieved using DVTS for 70 nm process technology. Another merit of DVTS for future technologies such as 70 nm is the wide range of power and delay control available by adjusting VTH alone: Fig. 3(b) shows that VTH can be adjusted to its optimal value over a wide range of clock frequencies.
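Why a modest VTH increase pays off so well at 70 nm follows from the exponential subthreshold model, I_sub ∝ exp(−VTH/(n·kT/q)). The sketch below computes the leakage reduction factor for a given VTH increase at 125°C. The slope factor n = 1.5 is an assumed value, and DIBL, gate leakage, and the temperature dependence of VTH itself are ignored.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant / elementary charge, V/K


def thermal_voltage(temp_c: float) -> float:
    """kT/q in volts at temp_c degrees Celsius."""
    return K_B_OVER_Q * (temp_c + 273.15)


def leakage_ratio(delta_vth: float, temp_c: float = 125.0, n: float = 1.5) -> float:
    """I_sub(VTH + delta_vth) / I_sub(VTH) for I_sub ~ exp(-VTH / (n * kT/q)).

    n (the subthreshold slope factor) is an assumed value; DIBL, gate leakage,
    and the temperature dependence of VTH itself are ignored.
    """
    return math.exp(-delta_vth / (n * thermal_voltage(temp_c)))


for dv in (0.05, 0.10, 0.20):
    print(f"dVTH = {dv:.2f} V -> leakage x{leakage_ratio(dv):.3f} at 125 C")
```

Under these assumptions a 0.1 V increase in VTH cuts subthreshold leakage by roughly a factor of seven at 125°C, and each further step multiplies the reduction.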



Figure 4. Dynamic and leakage power composition of an inverter with a fan-out of 1 for DVS and DVTS.

Additionally, DVTS can substantially reduce standby leakage power. Fig. 4 shows the dynamic and leakage power savings for DVS and DVTS. Both dynamic and leakage power are lowered by either DVS or DVTS. DVS helps reduce leakage power because both the subthreshold leakage and the leakage due to Drain-Induced Barrier Lowering (DIBL) decrease as the supply voltage is scaled down. Conversely, DVTS can reduce dynamic power by suppressing short-circuit current. Nevertheless, DVS mainly reduces dynamic power, whereas DVTS mainly reduces leakage power. As the power composition at low clock rates in Fig. 4 shows, DVS cannot suppress leakage power as much as DVTS. Thus, DVTS is more effective in reducing active and standby leakage power for future technology generations.
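The asymmetry between DVS and DVTS for leakage can be illustrated with a simple DIBL-aware model, P_leak ∝ VDD · exp((−VTH + η·VDD)/(n·kT/q)). Lowering VDD reduces leakage only through the η·VDD term and the VDD prefactor, whereas raising VTH acts on the exponent directly. The DIBL coefficient η, slope factor n, and the 0.1 V step are hypothetical; the nominal operating point is the one annotated with Fig. 3.

```python
import math


def leakage_power(vdd: float, vth: float, eta: float = 0.08,
                  n: float = 1.5, v_t: float = 0.0343) -> float:
    """Relative subthreshold leakage power: VDD * exp((-VTH + eta*VDD) / (n*v_t)).

    eta is an assumed DIBL coefficient and v_t ~ kT/q near 125 C; the
    pre-exponential factor is dropped, so only ratios are meaningful.
    """
    return vdd * math.exp((-vth + eta * vdd) / (n * v_t))


VDD, VTH, DV = 0.9, 0.15, 0.1                # nominal point from Fig. 3; DV is hypothetical
base = leakage_power(VDD, VTH)
dvs = leakage_power(VDD - DV, VTH) / base    # DVS: lower VDD by DV
dvts = leakage_power(VDD, VTH + DV) / base   # DVTS: raise VTH by DV
print(f"DVS  (VDD - {DV} V): leakage power x{dvs:.2f}")
print(f"DVTS (VTH + {DV} V): leakage power x{dvts:.2f}")
```

As long as η is well below 1, the same voltage step buys far more leakage reduction when applied to VTH than to VDD; the dynamic-power benefit of DVS is deliberately left out of this comparison.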

#### 2.2 Advantages of DVTS

So far, we have shown the advantages of DVTS for future technologies and described the exponential reduction in active leakage power it provides. Additional merits of DVTS are as follows.

*No voltage level converters*: DVS or multiple-VDD systems require a voltage level shifter wherever a low-VDD signal drives a high-VDD receiver. Although conventional level converters prevent static power consumption, their dynamic power consumption is large enough to cancel out the power savings gained from supply voltage scaling [7]. Since DVTS systems use the same supply voltage throughout the chip, no voltage level converters are required.

*Simple hardware*: Charge pumps are a simple solution for boosting voltages. They need no external inductors, and their power consumption is very low compared to the buck converters used in DVS systems. Our DVTS system uses charge pumps to generate the body bias voltages, as shown in Fig. 8.

*Less power lost charging/discharging internal nodes*: The transition energy wasted charging and discharging the VDD-to-ground capacitance is a power overhead of the DVS scheme. For low-to-high and high-to-low supply transitions, current is drawn from or returned to the power supply, so transition energy is consumed even though no computation is performed during this period. Since the supply voltage is fixed in DVTS, less transition energy is lost charging and discharging internal nodes.

*Compensation of chip-to-chip variation*: DVTS generates a VBB that gives the desired VTH for the current clock frequency. VDD fluctuations and temperature changes cause the delay of a circuit to vary; continuous control of VBB takes these variations into account and adapts VTH to regulate the delay [8]. For example, if a VDD fluctuation increases the delay of a circuit, the DVTS feedback loop further lowers VTH to compensate for the increase.

*Improvement in noise immunity*: Signal integrity has become an important issue for deep-submicron devices as crosstalk noise becomes considerable. Raising VTH during low-workload periods in DVTS helps improve noise immunity, especially for noise-susceptible circuits such as domino logic and pulsed flip-flops.

#### 2.3 Drawbacks of DVTS

The following discussions address the overheads and drawbacks of the DVTS.

**Substrate capacitance**: Results from Variable Threshold CMOS (VTCMOS) show that, for a 0.3 µm triple-well test chip with 120,000 transistors, the energy required to charge the substrate from -3.3 V to -0.5 V is around 10 nJ [6]. This transition energy comes into play as overhead in DVTS systems whenever the substrate capacitance is charged or discharged.
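One way to reason about this overhead is a break-even time: raising VTH for a slack interval saves net energy only if the leakage power saved, multiplied by the interval, exceeds the substrate transition energy. The sketch below uses the ~10 nJ figure reported for the VTCMOS chip [6] as an order of magnitude; the assumption that entering and leaving the high-VTH state costs two such transitions, and the leakage savings values, are hypothetical.

```python
def break_even_time(e_transition_j: float, p_leak_saved_w: float,
                    n_transitions: int = 2) -> float:
    """Shortest slack interval (s) for which raising VTH saves net energy.

    Assumes each substrate transition costs e_transition_j and that entering
    and leaving the high-VTH state takes n_transitions transitions.
    """
    if p_leak_saved_w <= 0:
        raise ValueError("leakage power saved must be positive")
    return n_transitions * e_transition_j / p_leak_saved_w


E_TRANS = 10e-9   # J, order of magnitude reported for the 0.3 um VTCMOS chip [6]
for p_saved in (0.1e-3, 1e-3, 10e-3):   # W, hypothetical leakage savings
    t = break_even_time(E_TRANS, p_saved)
    print(f"leakage saved {p_saved * 1e3:.1f} mW -> break-even slack {t * 1e6:.1f} us")
```

The higher the leakage that DVTS removes, the shorter the slack interval needed to recover the substrate charging energy, which is why the overhead matters less in leakage-dominated technologies.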

**Substrate noise**: Because charge pumps operate without external inductors, they generate an unregulated body bias voltage. Any fluctuation in the body bias induces VTH variation or acts as a noise source for logic.

**Process complexity**: The PMOS and NMOS body biases of the DVTS control circuit must be isolated from the target system so that the control circuit can function as a reference. Thus, deep N-well or triple-well technology is essential for DVTS systems. However, the overall cost penalty of using these processes is less than 5% [6].

## 3. System Implementation

A block diagram of the DVTS feedback loop is presented in Fig. 5. A clock speed scheduler embedded in the operating system determines the (reference) clock frequency at run time. The DVTS controller adjusts the PMOS and NMOS body biases so that the VCO oscillation frequency tracks the given reference clock frequency. The error signal, which is the difference between the reference clock frequency and the oscillator frequency, is fed into the feedback controller. The continuous feedback loop also compensates for variations in temperature and supply voltage. The following sections describe the design of each sub-block.
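The loop described above can be sketched as a discrete-time simulation. This is a minimal illustration, not the proposed hardware: it models only the NMOS body bias, assumes a hypothetical VCO whose frequency is linear in VBB (f = f0 + k·VBB), moves VBB by one fixed step per update toward zero error, holds VBB inside a small dead band, and clamps VBB to hypothetical limits.

```python
def simulate_dvts_loop(f_ref_mhz: float, f0_mhz: float = 1000.0,
                       k_vco: float = 200.0, step_v: float = 0.01,
                       deadband_mhz: float = 1.0, vbb_min: float = -1.0,
                       vbb_max: float = 0.3, n_iter: int = 1000):
    """Bang-bang sketch of the DVTS loop for the NMOS body bias.

    Hypothetical VCO model: f_vco = f0_mhz + k_vco * VBB (MHz, k_vco in MHz/V),
    i.e. a higher (less reverse-biased) VBB lowers VTH and speeds up the VCO.
    Each update moves VBB by one fixed step toward zero error.
    Returns (vbb, f_vco) after n_iter updates.
    """
    vbb = 0.0
    for _ in range(n_iter):
        f_vco = f0_mhz + k_vco * vbb
        error = f_ref_mhz - f_vco              # > 0: VCO slower than reference
        if abs(error) <= deadband_mhz:
            continue                           # within dead band: hold VBB
        vbb += step_v if error > 0 else -step_v
        vbb = min(max(vbb, vbb_min), vbb_max)  # body-bias limits
    return vbb, f0_mhz + k_vco * vbb


print(simulate_dvts_loop(900.0))   # settles near VBB = -0.5 V
print(simulate_dvts_loop(700.0))   # clamps at vbb_min: 800 MHz is the floor
```

The second call shows the same saturation discussed for 0.25 µm: once VBB hits its limit, the loop can no longer close the error.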



Figure 5. Schematic of the DVTS hardware.

#### 3.1 Voltage Controlled Oscillator

The inverter-chain-based VCO in Fig. 6 converts the PMOS and NMOS body biases into a corresponding oscillation frequency. As the feedback loop adjusts the two body biases, the VCO output frequency changes accordingly, and the error between this frequency and the reference clock frequency is detected to drive the feedback loop. The VCO must closely track the chip's critical path delay across temperature, supply voltage, and process variations. Fig. 7 shows SPICE simulation results for the ratio of the VCO period to the critical path delay of an 8-by-8 multiplier. Figs. 7(a) and 7(b) show that this ratio is almost constant under temperature and supply voltage variations, respectively. We conclude that the simple inverter-chain-based VCO in Fig. 6 has a delay-VBB characteristic proportional to that of the actual critical path, so it can serve as an equivalent critical path representing the actual system.
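For reference, an odd-length inverter ring oscillates with period T = 2·N·t_pd, where t_pd is the per-stage delay. The sketch below computes that frequency and a simple spread metric for the VCO-period to critical-path-delay ratio; the stage count, stage delay, and the spread metric itself are hypothetical illustrations, not values or criteria from Fig. 7.

```python
def ring_oscillator_freq(n_stages: int, t_pd: float) -> float:
    """Frequency (Hz) of an n-stage inverter ring with per-stage delay t_pd (s).

    One period is a rising plus a falling edge travelling around the ring,
    so T = 2 * n_stages * t_pd.
    """
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("an inverter ring needs an odd number of stages >= 3")
    return 1.0 / (2 * n_stages * t_pd)


def ratio_spread(vco_periods, cp_delays) -> float:
    """Relative spread (max/min - 1) of VCO-period / critical-path-delay ratios.

    A value near 0 means the VCO tracks the critical path proportionally.
    """
    ratios = [tv / tc for tv, tc in zip(vco_periods, cp_delays)]
    return max(ratios) / min(ratios) - 1.0


# Hypothetical numbers: 11 stages at 50 ps each.
print(f"{ring_oscillator_freq(11, 50e-12) / 1e6:.1f} MHz")   # 909.1 MHz
```

A spread close to zero across temperature and supply corners is what allows the VCO to stand in for the critical path.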

#### 3.2 Charge Pump

Figure 6. Block diagram of the inverter-chain-based voltage-controlled oscillator.

Figure 7. Ratio of VCO period to critical path delay versus VBB. Critical path delay was measured from an 8-by-8 multiplier.

Figure 8. Charge pump for P-well body bias.

The circuit diagram and the equivalent diode-capacitor diagram of the charge pump for substrate biasing are shown in Fig. 8. The clock $\phi$ and the inverted clock $\bar{\phi}$ drive the intermediate nodes of the diodes to shift charge from the P-well to ground. VBB is determined by the frequency of the clocks driving the intermediate nodes, so for the feedback algorithm to control VBB, this clock frequency must be programmable. The ring oscillator in Fig. 9(a) shows how the clock frequency can be programmed by the control signals from the feedback algorithm.
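Why the pump clock frequency sets VBB can be seen from a first-order Dickson-style charge-pump model: the unloaded output is N·(C/(C+Cs)·Vφ − Vd), and the load current subtracts N·I_load/(f·(C+Cs)), so a faster clock loses less voltage. The sketch below evaluates this simplified form (extra input/output diode drops omitted) at the pump clock frequencies listed in Table 1; the stage count, capacitances, diode drop, and substrate load current are all hypothetical.

```python
def pump_vbb_magnitude(f_clk: float, n_stages: int = 3, v_phi: float = 0.9,
                       v_diode: float = 0.3, c_pump: float = 10e-12,
                       c_stray: float = 1e-12, i_load: float = 50e-6) -> float:
    """|VBB| (V) from a first-order Dickson-style charge-pump model.

    |VBB| ~ N * (C/(C+Cs) * V_phi - V_d) - N * I_load / (f * (C+Cs)),
    with the extra input/output diode drops omitted. All values are hypothetical.
    """
    c_tot = c_pump + c_stray
    v_open = n_stages * (c_pump / c_tot * v_phi - v_diode)   # unloaded limit
    v_drop = n_stages * i_load / (f_clk * c_tot)             # loss from load current
    return max(0.0, v_open - v_drop)


for f_mhz in (15, 33, 50):   # pump clock frequencies listed in Table 1
    print(f"{f_mhz} MHz -> |VBB| = {pump_vbb_magnitude(f_mhz * 1e6):.2f} V")
```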



Figure 9. Ring oscillator for generating the charge pump clock input: (a) programmable clock generator circuit; (b) control signals and clock output waveforms. ctrl[0:2] are the control signals generated by the feedback algorithm block.

#### 3.3 Feedback Algorithm

The feedback controller generates a control signal that changes the frequency of the charge pump clock. The transient response of VBB varies with the type of feedback controller used; a simple feedback controller is proposed for our DVTS implementation. The charging and discharging frequencies of the charge pumps are determined by the feedback control rules in Table 1. For a positive error, the P-substrate is charged so that VTH is lowered and the VCO clock frequency ramps up to lock with the reference clock frequency. For a negative error, the feedback controller acts in exactly the reverse manner. Only simple hardware, such as a few shifters and a small number of logic gates, is required to implement this feedback table.
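The rule table can be expressed as a small lookup. The sketch below reproduces the error bands of Table 1 for the NMOS VBB. Table 1 leaves a few boundary values uncovered (error[n] = 0, 2², 2³); as a labeled assumption, error[n] = 0 is treated as "hold" and the lower bounds of the positive charging bands are treated as inclusive. The function name and return format are hypothetical.

```python
from typing import Optional, Tuple


def pump_command(error: int) -> Tuple[str, Optional[int]]:
    """Map error[n] to a charge-pump action and clock frequency (MHz) per Table 1.

    Returns ("discharge" | "charge" | "hold", clock_mhz or None).
    Assumption: error = 0 holds, and the positive bands include their lower bound.
    """
    if error < -2**3:
        return ("discharge", 50)
    if error <= -2**2:          # -2^3 <= error <= -2^2
        return ("discharge", 33)
    if error < 0:               # -2^2 < error < 0
        return ("discharge", 15)
    if error < 2**2:            # dead band: neither pump runs
        return ("hold", None)
    if error < 2**3:
        return ("charge", 15)
    if error < 2**4:
        return ("charge", 33)
    return ("charge", 50)       # 2^4 <= error


for e in (-12, -5, -1, 0, 3, 6, 10, 20):
    print(e, pump_command(e))
```

Because the band edges are powers of two, comparisons like these reduce to checking a few high-order bits of error[n], which is consistent with the small amount of hardware the text calls for.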

Table 1. A simple feedback rule table for regulating the NMOS VBB.

| Error range                                        | Discharging clock | Charging clock |
|----------------------------------------------------|-------------------|----------------|
| $\mathrm{error}[n] < -2^3$                         | 50 MHz            | X              |
| $-2^3 \le \mathrm{error}[n] \le -2^2$              | 33 MHz            | X              |
| $-2^2 < \mathrm{error}[n] < 0$                     | 15 MHz            | X              |
| $0 < \mathrm{error}[n] < 2^2$                      | X                 | X              |
| $2^2 < \mathrm{error}[n] < 2^3$                    | X                 | 15 MHz         |
| $2^3 < \mathrm{error}[n] < 2^4$                    | X                 | 33 MHz         |
| $2^4 \le \mathrm{error}[n]$                        | X                 | 50 MHz         |

## 4. Conclusions

To mitigate the active leakage problem, a Dynamic VTH Scaling (DVTS) scheme is presented. Simulation results show that DVTS becomes comparable to Dynamic VDD Scaling (DVS) in saving total power for future technologies such as 70 nm. Moreover, DVTS has additional merits such as dramatic savings in standby leakage power, simple hardware, and compensation of VDD and temperature variations. Finally, a feedback loop consisting of a VCO, charge pumps, and a feedback controller is proposed to realize the DVTS scheme. Fabrication of the DVTS test chip is in progress.

## 5. Acknowledgments

This research was funded in part by the DARPA MARCO Gigascale Silicon Research Center under contract # SA3273JB and by Intel Corporation. The authors would also like to thank M. Lin for valuable discussions.

## 6. References

- [1] P. Macken et al., "A Voltage Reduction Technique for Digital Systems", International Solid-State Circuits Conference, Feb. 1990, pp. 238-239
- [2] T. D. Burd et al., "A Dynamic Voltage Scaled Microprocessor System", IEEE Journal of Solid-State Circuits, vol. 35, no. 11, Nov. 2000, pp. 1571-1580
- [3] G. Wei and M. Horowitz, "A Fully Digital, Energy-Efficient Adaptive Power-Supply Regulator", IEEE Journal of Solid-State Circuits, vol. 34, no. 4, Apr. 1999, pp. 520-528
- [4] http://www-device.eecs.berkeley.edu/~ptm/
- [5] K. Nose et al., "Vth-hopping Scheme for 82% Power Saving in Low-voltage Processors", Proceedings of IEEE Custom Integrated Circuits Conference, May 2001, pp. 93-96
- [6] T. Kuroda et al., "A 0.9-V 150-MHz, 10-mW, 4 mm², 2-D Discrete Cosine Transform Core Processor with Variable Threshold-Voltage (VT) Scheme", IEEE Journal of Solid-State Circuits, vol. 31, no. 11, Nov. 1996, pp. 1770-1779
- [7] K. Usami and M. Horowitz, "Clustered Voltage Scaling Technique for Low-Power Design", International Symposium on Low-Power Electronic Design, Apr. 1995, pp. 3-8
- [8] S. Narendra, D. Antoniadis, and V. De, "Impact of Using Adaptive Body Bias to Compensate Die-to-die Vt Variation on Within-die Vt Variation", International Symposium on Low-Power Electronic Design, Aug. 1999, pp. 229-232