SIGDA Super Compendium, ISLPED 1999, Abstracts

<BODY> <H3><A HREF="../../../../../main_ss.htm">Main Page</A><BR> ISLPED'99 : [ <A HREF="lpedtoc.htm">Table of Contents</A> ] [ <A HREF="lpedsess.htm">Session Index</A> ] [ <A HREF="lpedai.htm">Author Index</A> ]</H3> <HR> <H2>ISLPED'99 ABSTRACTS</H2> <B> Sessions: [<A HREF="#keynote1">Keynote Session 1</A>] [<A HREF="#sessk11">K1.1</A>] [<A HREF="#sessk12">K1.2</A>] [<A HREF="#sesss1">S1</A>] [<A HREF="#sesss2">S2</A>] [<A HREF="#sesss3">S3</A>] [<A HREF="#sesss4">S4</A>] [<A HREF="#sessp1">P1</A>] [<A HREF="#sessp2">P2</A>] [<A HREF="#sesss5">S5</A>] [<A HREF="#sesss6">S6</A>] [<A HREF="#panel">Panel</A>] [<A HREF="#keynote2">Keynote Session 2</A>] [<A HREF="#sessk21">K2.1</A>] [<A HREF="#sessk22">K2.2</A>] [<A HREF="#sesss7">S7</A>] [<A HREF="#sesss8">S8</A>] [<A HREF="#sesss9">S9</A>] [<A HREF="#sesss10">S10</A>] [<A HREF="#sessp3">P3</A>] [<A HREF="#sessp4">P4</A>] </P> </B> <HR> <H2> <A NAME ="keynote1"> Keynote Session 1 </A></H2> <I>Chair: David Blaauw</I><P> <HR> <H2> <A NAME="sessk11"> Session K1.1 </A></H2> <DL> <DT> <A HREF="../../pdffiles/k1_1scan.pdf" NAME="k1_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>Low Power RF Integrated Circuits: Principles and Practice [p. 1] </B> <DD><I>A. A. Abidi, H. Darabi</I><P> Ultralow power wireless communications, such as paging receivers operating from a single cell, have prompted the development and discovery of circuit techniques and architectures which lower power consumption of RF IC's ten-fold or more compared to today's norms. This paper illustrates many of these design principles, mainly in a CMOS context. It argues for seeking strategic combinations of high quality off-chip passives with RF integrated circuits, and searching for better architectures in wireless receivers (and transmitters) to lower power. 
The principles are illustrated with specific examples.<P> </DL> <HR> <H2> <A NAME="sessk12"> Session K1.2: </A></H2> <DL> <DT> <A HREF="../../pdffiles/k1_2scan.pdf" NAME="k1_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>Algorithm and Architecture of a 1V Low Power Hearing Instrument DSP [p. 7]</B> <DD><I>Finn Meller, Nikolai Bisgaard, John Melanson</I><P> This paper presents a 1 V digital signal processor used in the Danalogic hearing aid manufactured by GN Danavox. The processor is the first general purpose programmable used in behind-the-ear and in-the-ear hearing aid applications. It is integrated with memories, in a 0.5um CMOS process with standard thresholds. At 2 MHz processing speed, the processor consumes 800uA from a single cell battery. Using a dual multiply-accumulate architecture, the processor executes a 256-point block floating-point FFT in just 2900 instruction cycles.<P> </DL> <HR> <H2> Session S1: <A NAME="sesss1"> Low-Power Digital Circuit Techniques </A></H2> <I>Chair: Kaushik Roy</I><P> <DL> <DT> <A HREF="../../pdffiles/01_1.pdf" NAME="01_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S1.1 A Shared-Bitline SRAM Cell Architecture for 1-V Ultra Low-Power Word-Bit Configurable Macrocells [p. 12]</B> <DD><I>Hiroki Morimura, Satoshi Shigematsu, Shinsuke Konaka</I><P> 1-V ultra low-power SRAM circuit techniques are described for word-bit configurable memory macrocells. A shared bitline SRAM cell architecture with modified address assignment is proposed to reduce wasted memory-cell current to zero while suppressing the area penalty. For the new SRAM cell design, we devise a multiplexer-merged charge-transfer amplifier for high-sensitivity read operation and a bitline precharge scheme with an equalizing line for high-speed write-recovery operation. A 1-V operating 64-kb (2kw x 16b x 2) test chip was designed using a 0.35-um multithreshold-voltage CMOS (MTCMOS) logic process. 
The simulated power dissipation is 1/4 (486 mW) that of the conventional 1-V word-bit configurable SRAM macrocell with a 13% area increase.<P> <DT> <A HREF="../../pdffiles/01_2.pdf" NAME="01_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S1.2 Retractile Clock-Powered Logic [p. 18]</B> <DD><I>Nestoras Tzartzanis, William Athas</I><P> Retractile clock-powered logic is presented as a low-overhead energy-recovery logic style. It uses energy-efficient clock-steering circuits, pass-transistor logic, and a four-phase clocking scheme to recover energy from all circuit nodes but the latches. A 16-bit retractile clock-powered adder is described and evaluated through HSPICE simulations. The simulation results indicate that this approach can offer superior energy versus delay performance, but the benefit depends strongly on the switching activity of the clock-powered nodes.<P> <DT> <A HREF="../../pdffiles/01_3.pdf" NAME="01_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S1.3 Energy-Efficient Dynamic Circuit Design in the Presence of Crosstalk Noise [p. 24]</B> <DD><I>Ganesh Balamurugan, Naresh R. Shanbhag</I><P> This paper describes the impact of crosstalk noise on low power design techniques based on voltage scaling. It is shown that this power saving strategy aggravates the crosstalk noise problem and reduces circuit noise immunity. A new energy-efficient, noise-tolerant dynamic circuit technique is presented to address this problem. In a 0.35um CMOS technology and at a given supply voltage, the proposed technique provides an improvement in noise immunity of 1.8X (for an AND gate) and 2.5X (for an adder carry chain) over domino at the same speed. We use this fact to operate the noise-tolerant circuit at a lower supply voltage to obtain energy savings of about 30%, while expending 30% more area. 
Also, to achieve a given noise immunity, the proposed technique consumes 40% less energy compared to existing noise-tolerance techniques.<P> </DL> <HR> <H2>Session S2: <A NAME="sesss2"> Signal Processing Architecture </A></H2> <I>Chair: Naresh Shanbhag</I><P> <DL> <DT> <A HREF="../../pdffiles/02_1.pdf" NAME="02_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S2.1 Energy-Efficient Signal Processing via Algorithmic Noise-Tolerance [p. 30]</B> <DD><I>Rajamohana Hegde, Naresh R. Shanbhag</I><P> In this paper, we propose a framework for low-energy digital signal processing (DSP) where the supply voltage is scaled beyond the critical voltage required to match the critical path delay to the throughput. This deliberate introduction of input-dependent errors leads to degradation in the algorithmic performance, which is compensated for via algorithmic noise-tolerance (ANT) schemes. The resulting setup, which comprises the DSP architecture operating at sub-critical voltage and the error control scheme, is referred to as soft DSP. It is shown that technology scaling renders the proposed scheme more effective, as the delay penalty suffered due to voltage scaling is reduced by short-channel effects. The effectiveness of the proposed scheme is also enhanced when arithmetic units with a higher "delay-imbalance" are employed. A prediction-based error-control scheme is proposed to enhance the performance of the filtering algorithm in the presence of errors due to soft computations. For a frequency selective filter, it is shown that the proposed scheme provides 60% reduction in energy dissipation for filter bandwidths up to 0.5 pi (where 2 pi corresponds to the sampling frequency fs) over that achieved via conventional voltage scaling, with a maximum of 0.5dB degradation in the output signal-to-noise ratio (SNR<SUB>sigma</SUB>). 
It is also shown that the proposed algorithmic noise-tolerance schemes can be used to improve the performance of DSP algorithms in the presence of bit-error rates of up to 10<SUP>-3</SUP> due to deep submicron (DSM) noise.<P> <DT> <A HREF="../../pdffiles/02_2.pdf" NAME="02_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S2.2 Reducing Power Consumption of Turbo Code Decoder Using Adaptive Iteration With Variable Supply Voltage [p. 36]</B> <DD><I>Oliver Yuk-Hang Leung, Chung-Wai Yue, Chi-ying Tsui, Roger S. Cheng</I><P> Turbo codes have become popular for next-generation wireless communication systems because of their remarkable coding performance. One of the problems of decoding turbo codes in the receiver is the complexity and high power consumption, since multiple iterations of the Soft Output Viterbi Algorithm (SOVA) have to be carried out to decode a data frame. In this paper, we address the issues of reducing the complexity and power consumption of the turbo code decoder. An approach using cyclic redundancy checking (CRC) to adaptively terminate the SOVA iteration of each frame is presented. This results in a system with a variable workload, in which the amount of computation required for each data frame is different. Dynamic voltage scaling is then used to further reduce the power consumption. However, since the workload is not yet known at the time when the data is being decoded, optimum voltage assignment is not feasible. In this work, we propose two heuristic algorithms to assign supply voltage for different decoding iterations. Simulation results show that a significant reduction of power consumption is achieved compared with a system using a fixed supply voltage.<P> <DT> <A HREF="../../pdffiles/02_3.pdf" NAME="02_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S2.3 A Low Energy Architecture for Fast PN Acquisition [p. 
42]</B> <DD><I>Christopher Deng, Charles Chien</I><P> Spread spectrum systems are being widely deployed today and are becoming more prevalent as most next-generation wireless systems are adopting spread spectrum for their common air interface. These systems include the digital cellular IS-95A/B/C, IEEE 802.11 wireless local area networks, as well as third-generation wideband code-division multiple access systems. In spread-spectrum systems, the receiver must synchronize to the transmitted pseudo-noise (PN) code to obtain the performance improvement achieved through spreading. Since PN acquisition must process the spread-spectrum signal at a speed much faster than the transmitted data rate, its energy consumption can become significant and should be minimized for portable applications. Typically, either matched filters or serial correlators are used to acquire the PN code timing. This paper describes a hybrid PN acquisition architecture which employs both matched filters and serial correlators to achieve lower energy consumption and faster acquisition time than the traditional approaches of using either matched filters or serial correlators alone. The hybrid architecture has been implemented in RTL VHDL and synthesized down to gate level in a 0.5-micron CMOS library. Synthesis results show a factor of four reduction in energy for the hybrid scheme as compared to the matched filters architecture and a factor of two reduction in energy as compared to the serial architecture.<P> </DL> <HR> <H2>Session S3: <A NAME="sesss3"> Power Management and Low Voltage </A></H2> <I>Chair: Rick Carley<BR> </I><P> <DL> <DT> <A HREF="../../pdffiles/03_1.pdf" NAME="03_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S3.1 Vibration-to-Electric Energy Conversion [p. 
48]</B> <DD><I>Scott Meninger, Jose Oscar Mur-Miranda, Rajeevan Amirtharajah, Anantha Chandrakasan, Jeffrey Lang</I><P> A system is proposed to convert ambient mechanical vibration into electrical energy for use in powering autonomous low-power electronic systems. The energy is transduced through the use of a variable capacitor, which has been designed with MEMS (microelectromechanical systems) technology. A low-power controller IC has been fabricated in a 0.6um CMOS process and has been tested and measured for losses. Based on the tests, the system is expected to produce 8uW of usable power.<BR> Keywords - Energy Conversion, MEMS, Low-Power, Self-Powered<P> <DT> <A HREF="../../pdffiles/03_2scan.pdf" NAME="03_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S3.2 Variable Supply-Voltage Scheme with 95%-Efficiency DC-DC Converter for MPEG-4 Codec [p. 54]</B> <DD><I>Fuyuki Ichiba, Kojiro Suzuki, Shinji Mita, Tadahiro Kuroda, Tohru Furuyama</I><P> A variable supply-voltage (VS) scheme with a high power-conversion-efficiency DC-DC converter is presented. A new pulse width modulation (PWM) circuit for the DC-DC converter is proposed to reduce both power consumption and chip area. The power conversion efficiency reaches up to 95%, and the area is less than half that of the conventional design. The VS scheme contains critical path replica circuits of an MPEG-4 codec LSI, and its output voltage is controlled by monitoring the delay time of the replica circuits. Consequently, the VS scheme can automatically generate the minimal internal supply voltage that meets the demand from the operation frequency of an MPEG-4 codec LSI. 
The advantages of this circuit are successfully demonstrated through fabrication of a test chip using a 0.3 um CMOS technology.<BR> Keywords: DC-DC, low power, low voltage, PWM, variable supply voltage.<P> <DT> <A HREF="../../pdffiles/03_3.pdf" NAME="03_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S3.3 Circuit Methods for the Integration of Low Voltage (1.1-1.8V) Analog Functions on System-on-a-Chip IC's in a Single-Poly CMOS Processes [p. 60]</B> <DD><I>Vladimir Koifman, Yachin Afek, Joseph Shor</I><P> Several new building blocks are demonstrated, which enable low-power (1.1-1.8V) analog functionality in a single-poly, digital CMOS process. These cells facilitate the integration of analog converters on system-on-a-chip IC's without adding any extra cost to the process. A voice A/D, designed with these circuits, exhibited an SNR of 68 dB at an analog supply voltage of 1.1V, and 75 dB at 1.8V. This is despite the noisy digital environment of an on-chip DSP operating at 60 MHz and a digital supply voltage of 2.5V.<P> </DL> <HR> <H2>Session S4: <A NAME="sesss4"> Cache/Memory Systems </A></H2> <I>Chair: Christian Piguet</I><P> <DL> <DT> <A HREF="../../pdffiles/04_1.pdf" NAME="04_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S4.1 Using Dynamic Cache Management Techniques to Reduce Energy in a High-Performance Processor [p. 64]</B> <DD><I>Nikolaos Bellas, Ibrahim Hajj, Constantine Polychronopoulos</I><P> In this paper, we propose a technique that uses an additional mini cache, the L0-Cache, located between the instruction cache (I-Cache) and the CPU core. This mechanism can provide the instruction stream to the data path and, when managed properly, it can effectively eliminate the need for high utilization of the more expensive I-Cache. In this work, we propose, implement, and evaluate a series of run-time techniques for dynamic analysis of the program instruction access behavior, which are then used to proactively guide the access of the L0-Cache. 
The basic idea is that only the most frequently executed portions of the code should be stored in the L0-Cache since this is where the program spends most of its time. We present experimental results to evaluate the effectiveness of our scheme in terms of performance and energy dissipation for a series of SPEC95 benchmarks. We also discuss the performance and energy tradeoffs that are involved in these dynamic schemes.<P> <DT> <A HREF="../../pdffiles/04_2.pdf" NAME="04_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S4.2 Reducing Power in Superscalar Processor Caches Using Subbanking, Multiple Line Buffers and Bit-Line Segmentation [p. 70]</B> <DD><I>Kanad Ghose, Milind B. Kamble</I><P> Modern microprocessors employ one or two levels of on-chip caches to bridge the burgeoning speed disparities between the processor and the RAM. These SRAM caches are a major source of power dissipation. We investigate architectural techniques that do not compromise the processor cycle time for reducing the power dissipation within the on-chip cache hierarchy in superscalar microprocessors. We use a detailed register-level simulator of a superscalar microprocessor that simulates the execution of the SPEC benchmarks and SPICE measurements for the actual layout of a 0.5 micron, 4-metal-layer cache, optimized for a 300 MHz clock. We show that a combination of subbanking, multiple line buffers and bit-line segmentation can reduce the on-chip cache power dissipation by as much as 75% in a technology-independent manner.<BR> Key words: Low power caches, power estimation.<P> <DT> <A HREF="../../pdffiles/04_3.pdf" NAME="04_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S4.3 Energy Efficient Data Transfer and Storage Organization for a MAP Turbo Decoder Module [p. 76]</B> <DD><I>Curt Schurgers, Francky Catthoor, Marc Engels</I><P> Turbo codes are the most recent breakthrough in coding theory. 
Although their decoding algorithm is highly data dominated, no systematic memory optimization study has been performed yet. We have applied the IMEC Data Transfer and Storage Exploration (DTSE) methodology to the MAP (Maximum A Posteriori) class of turbo decoding algorithms. We present an extensive overview of our optimizations and tradeoffs, which result in a parametric family of new optimized algorithms. The optimal choice of parameters depends on the specific turbo code and on the implementation target, which can be either hardware or software.<P> </DL> <HR> <H2>Session P1: <A NAME="sessp1"> Low Power Circuits </A></H2> <I>Chair: Barton Brock</I><P> <DL> <DT> <A HREF="../../pdffiles/p1_1.pdf" NAME="p1_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P1.1 Mixed-Swing QuadRail for Low Power Dual-Rail Domino Logic [p. 82]</B> <DD><I>Bharath Ramasubramanian, Herman Schmit, L. Richard Carley</I><P> This paper describes a new mixed-swing topology for dual-rail domino logic that results in a simultaneous energy and delay reduction. HSPICE simulation results for a 1-bit full adder cell show a 24% delay decrease and a 24% energy reduction for the mixed-swing topology compared to standard dual-rail domino. Energy and delay trends with supply voltage scaling are also presented for the adder cell. An 8-bit by 8-bit multiplier design with mixed-swing dual-rail domino adders is presented. Simulation results show this implementation to be 10% faster with an 18% energy savings. <DT> <A HREF="../../pdffiles/p1_2.pdf" NAME="p1_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P1.2 Databus Charge Recovery: Practical Considerations [p. 85]</B> <DD><I>Benjamin Bishop, Mary Jane Irwin </I><P> The charge recovery databus is a scheme which reduces energy consumption through the application of adiabatic circuit techniques. Previous work [2] gives a solid theoretical analysis of this scheme, including quantitative data assuming random bus values. 
We extend this earlier work by presenting a quantitative analysis of the charge recovery databus using 15 benchmarks and 4 high-level bus coding schemes. We show that a very simple implementation of the charge recovery databus is capable of reducing average energy consumption by 28% beyond traditional high-level bus encoding techniques.<P> <DT> <A HREF="../../pdffiles/p1_3.pdf" NAME="p1_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P1.3 SC<SUP>2</SUP>L; A Low-Power High-Performance Dynamic Differential Logic Family [p. 88]</B> <DD><I>Amr M. Fahim, Mohamed I. Elmasry </I><P> A new dynamic differential logic family, Short-Circuit Current Logic (SC<SUP>2</SUP>L), is proposed for low-power high-performance applications. It achieves low power consumption by using an aggressively reduced logic swing without requiring restoration circuitry. Using a 0.35um CMOS technology and a nominal supply voltage of 3.3V, a SC<SUP>2</SUP>L full-adder 8 carry ripple adder (CRA) is implemented. It offers an order of magnitude lower power-delay product than several other logic families.<BR> Keywords: Digital circuits, high-performance, low-power, low swing logic<P> <DT> <A HREF="../../pdffiles/p1_4.pdf" NAME="p1_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P1.4 Conforming Inverted Data Store for Low Power Memory [p. 91]</B> <DD><I>You-Sung Chang, Bong-II Park, Chong-Min Kyung</I><P> In this paper, we propose a 'conforming inverted data store' scheme for reducing the power consumption in memory components. It reduces the power consumption by conforming memory contents to a precharging value of the memory. It selectively stores normal or inverted data so as to reduce the total number of accessed bits that differ from the precharging value. In this way, bitline toggling during memory access is minimized, and this ultimately contributes to a reduction in power consumption. 
We develop two practical implementations of the proposed method, namely vertical-strip and horizontal-strip inversion schemes. Simulation results indicate that implementation of the strip-based inversion schemes contributes to a power reduction of up to 50%.<P> <DT> <A HREF="../../pdffiles/p1_5.pdf" NAME="p1_5"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P1.5 Ultra-Low Power Digital Subthreshold Logic Circuits [p. 94]</B> <DD><I>Hendrawan Soeleman, Kaushik Roy</I><P> Numerous efforts to balance the trade-off between power, area and performance have been made in the medium performance, medium power region of the design spectrum. However, not much study has been done at the two extreme ends of the design spectrum, namely ultra-low power with acceptable performance at one end (the focus of this paper), and high performance with power within limits at the other. One solution to achieve the ultra-low power requirement is to operate the digital logic gates in the subthreshold region. We analyze both CMOS and Pseudo-NMOS logic families operating in the subthreshold region. We compare the results with CMOS in the normal strong inversion region and with another known low-power logic, namely energy recovery logic. Our results show an energy per switching reduction of two orders of magnitude for an 8x8 carry save array multiplier when it is operated in the subthreshold region.<BR> Keywords: Ultra-low power, digital logic, subthreshold circuits<P> <DT> <A HREF="../../pdffiles/p1_6.pdf" NAME="p1_6"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P1.6 Single-Phase Source-Coupled Adiabatic Logic [p. 97]</B> <DD><I>Suhwan Kim, Marios C. Papaefthymiou</I><P> Adiabatic circuits offer a promising alternative to conventional circuitry for low energy design. Their operation is nevertheless subject to fundamental energy-speed trade-offs, just like any other physical realization of Boolean logic. 
Thus, adiabatic circuits with very low energy consumption at low frequencies fail to function at high operating frequencies. Conversely, high-speed adiabatic circuits tend to be dissipative at low clock rates. This paper describes SCAL, a single-phase source-coupled adiabatic logic family that operates efficiently across a wide range of operating frequencies. In layout-based simulations with 0.5um CMOS process parameters, pipelined carry-lookahead adders developed in our logic function correctly from 10MHz up to 280MHz. Our SCAL adders are less dissipative than corresponding designs in alternative adiabatic families that remain functional across the same frequency range. Moreover, they are about as dissipative as other adiabatic circuits that are geared towards very efficient operation at low frequencies. In comparison with their CMOS counterparts, our SCAL adders are 3 to 10 times more energy efficient.<P> </DL> <HR> <H2>Session P2: <A NAME="sessp2"> System & Analog Low Power Design </A></H2> <I>Chair: Enrico Macii</I><P> <DL> <DT> <A HREF="../../pdffiles/p2_1.pdf" NAME="p2_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P2.1 Global Register Allocation for Minimizing Energy Consumption [p. 100]</B> <DD><I>Yumin Zhang, Xiaobo (Sharon) Hu, Danny Z. Chen</I><P> Data referencing during program execution can be a significant source of energy consumption especially for data-intensive programs. In this paper, we propose an approach to minimize such energy consumption by allocating data to proper registers and memory. Through careful analysis of boundary conditions between consecutive blocks, our approach efficiently handles various control structures including branches, merges and loops, and achieves superior allocation results for the whole program. 
The computational cost of our approach for solving the global register allocation problem is rather low compared with known approaches, while the quality of our results is very encouraging.<P> <DT> <A HREF="../../pdffiles/p2_2.pdf" NAME="p2_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P2.2 Power Macro-Models for DSP Blocks with Application to High-Level Synthesis [p. 103]</B> <DD><I>Subodh Gupta, Farid N. Najm</I><P> In this paper, we propose a modeling approach for the average power consumption of macro-blocks that are typically used in digital signal processing (DSP) systems, such as adders, multipliers and delay elements, in terms of their input/output signal switching statistics. The resulting power macro-model, consisting of a quadratic or cubic equation in four variables, can be used to estimate the average power consumed in the macro-block for any given input/output signal statistics. This enables high-level power estimation and allows one to compare the power performance of different competing DSP systems during high-level synthesis. This approach has been implemented and models have been built and tested for many macro-blocks.<P> <DT> <A HREF="../../pdffiles/p2_3.pdf" NAME="p2_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P2.3 Power Minimization of High-Performance Submicron CMOS Circuits Using a Dual-V<SUB>dd</SUB> Dual-V<SUB>th</SUB> (DVDV) Approach [p. 106]</B> <DD><I>Muhammad M. Khellah, K.I. Elmasry</I><P> A new method, called DVDV, for low-power design of high-performance CMOS logic circuits is presented. DVDV utilizes a library of gates with dual supply voltages (V<SUB>dd</SUB>) and dual threshold voltages (V<SUB>th</SUB>) to achieve high performance with minimum dynamic and leakage power. A Depth-First-Search (DFS) based heuristic for DVDV node assignment is described. 
Exercising the techniques on a set of benchmarks shows significant power savings over the dual-V<SUB>dd</SUB> (with a single V<SUB>th</SUB>) scheme, and faster speeds than those possible with the dual-V<SUB>th</SUB> (and a single V<SUB>dd</SUB>) approach.<BR> Keywords: low-power, low-voltage, logic design, CMOS, dual-V<SUB>dd</SUB>, dual-V<SUB>th</SUB><P> <DT> <A HREF="../../pdffiles/p2_4.pdf" NAME="p2_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P2.4 A Completely On-Chip Voltage Regulation Technique for Low Power Digital Circuits [p. 109]</B> <DD><I>L. Richard Carley, Akshay Aggarwal</I><P> This paper describes a completely on-chip voltage regulation technique for locally generating an adaptive low voltage power supply rail from a given higher voltage power supply without requiring any external component. The on-chip regulator, based on delay servoing, primarily comprises a critical path replica, a charge pump and a high performance voltage buffer, which is the most critical component of the design. Simulation results in a 0.5um CMOS process demonstrate that the buffer offers a low DC output impedance, a high degree of voltage regulation (output ripple of 12% of Vdd) and a superior line regulation (up to the maximum clock frequency of 50MHz) even under strongly varying load conditions. The regulator response for a typical worst case load exhibits a maximum voltage fluctuation of 4% of Vdd with a reasonably fast response time.<P> <DT> <A HREF="../../pdffiles/p2_5.pdf" NAME="p2_5"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P2.5 Comparison of Class A Amplifiers for Low-Power and Low-Voltage Switched Capacitor Applications [p. 112]</B> <DD><I>Christoph Schwoerer, Dominique Morche, Patrice Senn</I><P> Owing to their higher output dynamic range, two-stage amplifiers may become an interesting alternative to cascoded single-stage amplifiers for low voltage switched capacitor applications. 
Therefore, a comparison of the minimum power consumption of both approaches, based on an optimisation methodology, is presented. It is worked out which amplifier type should be used to achieve minimum power consumption for a given supply voltage, capacitor ratio and desired settling precision.<P> </DL> <HR> <H2>Session S5: <A NAME="sesss5"> Power Estimation </A></H2> <I>Chair: Rajendran Panda</I><P> <DL> <DT> <A HREF="../../pdffiles/05_1.pdf" NAME="05_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S5.1 Lower and Upper Bounds on the Switching Activity in Scheduled Data Flow Graphs [p. 115]</B> <DD><I>Lars Kruse, Eike Schmidt, Gerd Jochens, Wolfgang Nebel</I><P> In this paper we present an approach to calculate lower and upper bounds for the switching activity in scheduled data flow graphs. The technique can be used to prune the design space in high level synthesis for low power before allocation and binding of functional units and registers. The low power allocation and binding problem is formulated. It is shown that this problem can be relaxed to the bipartite weighted matching problem which is solvable in where n is the number of functional units or registers, respectively. The application of the technique on benchmarks shows the tightness of the bounds. Most of the investigated bounds were less than 1% off the minimum and maximum solutions, respectively.<BR> Keywords: High-level power estimation, bounds estimation<P> <DT> <A HREF="../../pdffiles/05_2.pdf" NAME="05_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S5.2 Energy-Per-Cycle Estimation at RTL [p. 121]</B> <DD><I>Subodh Gupta, Farid N. Najm</I><P> We present a novel macromodeling technique for estimating the energy dissipated in a logic circuit for every input vector pair (we call this the energy-per-cycle). The macromodel is based on classifying the input vector pairs on the basis of their Hamming distances and using a different equation-based macromodel for every Hamming distance. 
The variables of our macromodel are the zero-delay transition counts at three logic levels inside the circuit. We present an automatic characterization process by which such macromodels can be constructed. This energy-per-cycle macromodel provides a transient energy waveform, and can also be used to estimate the moving average energy over any time window. This approach has been implemented and models have been built and tested for many circuits. The average error observed in estimating the energy-per-cycle is under 20%. The model can also be used to measure the long-term average power, with an observed error of under 10% on average.<P> <DT> <A HREF="../../pdffiles/05_3.pdf" NAME="05_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S5.3 Efficient Switching Activity Computation During High-Level Synthesis of Control-Dominated Designs [p. 127]</B> <DD><I>A. Bogliolo, L. Benini, B. Riccó, G. De Micheli</I><P> In this work we propose an exact technique for efficient computation of signal statistics during high-level synthesis for low power of general control-dominated designs. Our approach does not require iterative simulation: simulation is performed once and for all to collect boundary information that is then repeatedly exploited for computing signal statistics for alternative implementations.<P> <DT> <A HREF="../../pdffiles/05_4.pdf" NAME="05_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S5.4 Non-Stationary Effects in Trace-Driven Power Analysis [p. 133]</B> <DD><I>Radu Marculescu, Diana Marculescu, Massoud Pedram</I><P> The objective of this paper is to present an analytic technique for power analysis under non-stationary conditions. We use the transitive closure calculation to identify the transient component in the behavior of the target machine and then, based on the fundamental matrix and a symbolic approach (or support from simulation), we find the actual power distribution that corresponds to the transient regime. 
The present technique complements the current techniques (either for average or peak power estimation) to handle the case when transient effects exist and cannot be ignored.<BR> Keywords: power consumption, transient regime, Markov chains<P> </DL> <HR> <H2>Session S6: <A NAME="sesss6"> Circuit-Level Low-Power Design Techniques </A></H2> <I>Chair: George Stamoulis </I><P> <DL> <DT> <A HREF="../../pdffiles/06_1.pdf" NAME="06_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S6.1 Low Power Synthesis of Dual Threshold Voltage CMOS VLSI Circuits [p. 139]</B> <DD><I>Vijay Sundararajan, Keshab K. Parhi</I><P> The use of dual threshold voltages can significantly reduce the static power dissipated in CMOS VLSI circuits. With the supply voltage at 1V and threshold voltage as low as 0.2V, the subthreshold leakage power of transistors starts dominating the dynamic power. Also, a large number of devices often spend a long time in a standby mode where the leakage power is the only source of power consumption. We present a near-optimal approach to synthesize low static power CMOS VLSI circuits with two threshold voltages that reduces power consumption compared with a previous approach by up to 29.45%. Also presented is a technique which finds static power optimal configurations for CMOS VLSI circuits when an arbitrary number of threshold voltages is allowed.<P> <DT> <A HREF="../../pdffiles/06_2.pdf" NAME="06_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S6.2 Clock Distribution Using Multiple Voltages [p. 145]</B> <DD><I>Jatuchai Pangjun, Sachin S. Sapatnekar </I><P> Clock networks account for a significant fraction of the power dissipation of a chip and are critical to the performance. This paper presents theory and algorithms for building a low power clock tree. Two low power schemes are used: a reduced swing scheme and one using multiple supply voltages. 
We analyze the issue of tree construction and present conclusions relevant to various technology generations according to the National Technology Roadmap for Semiconductors (NTRS). Our experimental results show average power savings of 45% for a 0.25 um technology using multiple supply voltages, and 31% using reduced swing buffers.<P> <DT> <A HREF="../../pdffiles/06_3.pdf" NAME="06_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S6.3 Monotonic Static CMOS and Dual V<SUB>T</SUB> Technology [p. 151]</B> <DD><I>Tyler Thorp, Gin Yee, Carl Sechen</I><P> We developed a methodology and tools for synthesizing monotonic static CMOS networks, which consist of alternating low-skewed and high-skewed static gates. When used with a dual V<SUB>T</SUB> process, monotonic static CMOS can simultaneously reduce standby static power and increase performance by using low V<SUB>T</SUB> devices in the evaluation networks and making all other devices high V<SUB>T</SUB>. Experimental results show monotonic static CMOS to be 1.67 times faster than traditional static CMOS.<P> <DT> <A HREF="../../pdffiles/06_4.pdf" NAME="06_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S6.4 VIP - An Input Pattern Generator for Identifying Critical Voltage Drop for Deep Sub-Micron Designs [p. 156]</B> <DD><I>Yi-Min Jiang, Tak K. Young, Kwang-Ting Cheng </I><P> We present a novel input pattern generator for dynamic power network simulation. The obtained patterns successfully identify critical voltage drop areas for a set of industrial designs, which are difficult to find using functional vectors. The search engine of the pattern generator for worst-case IR voltage drop is based on the multi-objective genetic algorithm. To achieve high coverage for critical voltage drop cells, we propose to model the search criteria as the maximum weighted matching of a bipartite graph, and guide the search direction according to the matching results.
Experimental results show that, compared with the other approaches, our patterns give a higher coverage of critical voltage drop cells.<P> </DL> <HR> <A NAME="panel"> <H2>Evening Panel:<P> </A> <A HREF="../../pdffiles/panel.pdf" NAME="panel"><IMG SRC="../../../../../icons/pdficon.gif"></A> From Devices to Systems : Re-Directing the Future of Low Power Design [p. 162]</B> </H2> <I>Moderator: Massoud Pedram <BR> Panelists : William C. Athas, Edmund K. Cheng, Robert P. Colwell, William P. Kaiser, Jose Munoz, Saed Younis</I><P> </DL> <HR> <H2> <A NAME="keynote2"> Keynote Session 2 </A></H2> <I>Chair: Jason Cong</I><P> <HR> <H2> <A NAME="sessk21"> Session K2.1 </A></H2> <DL> <DT> <A HREF="../../pdffiles/k2_1scan.pdf" NAME="k2_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>Technology and Design Challenges for Low Power and High Performance [p. 163]</B> <DD><I>Vivek De, Shekhar Borkar</I><P> We discuss key barriers to continued scaling of supply voltage and technology for microprocessors to achieve low-power and high-performance. In particular, we focus on short-channel effects, device parameter variations, excessive subthreshold and gate oxide leakage, as the main obstacles dictated by fundamental device physics. Functionality of special circuits in the presence of high leakage, SRAM cell stability, bit line delay scaling, and power consumption in clocks & interconnects, will be the primary design challenges in the future. Soft error rate control and power delivery pose additional challenges. All of these problems are further compounded by the rapidly escalating complexity of microprocessor designs. 
The excessive leakage problem is particularly severe for battery-operated, high-performance microprocessors.<BR> Keywords: Microprocessor, VLSI design, memory, low-power design.<P> </DL> <HR> <H2> <A NAME="sessk22"> Session K2.2 </A></H2> <DL> <DT> <A HREF="../../pdffiles/k2_2scan.pdf" NAME="k2_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>Advanced Battery Systems: Chemistry, Construction and Characteristics [p. 169]</B> <DD><I>Subbarao Surampudi</I><P> </DL> <HR> <H2>Session S7: <A NAME="sesss7"> Digital Logic and Arithmetic </A></H2> <I>Chair: John Arends</I><P> <DL> <DT> <A HREF="../../pdffiles/07_1.pdf" NAME="07_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S7.1 Power Scalable Processing Using Distributed Arithmetic [p. 170]</B> <DD><I>Rajeevan Amirtharajah, Thucydides Xanthopoulos, Anantha Chandrakasan</I><P> A recent trend in low power design has been the employment of reduced precision processing methods for decreasing arithmetic activity and average power dissipation. Such designs can trade off power and arithmetic precision as system requirements change. This work explores the potential of Distributed Arithmetic (DA) computation structures for low power precision-on-demand computation. We present two proof-of-concept VLSI implementations whose power dissipation changes according to the precision of the computation performed.<P> <DT> <A HREF="../../pdffiles/07_2.pdf" NAME="07_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S7.2 Challenges in Clockgating in a Low Power ASIC Methodology [p. 176]</B> <DD><I>David Garrett, Mircea Stan, Alvar Dean</I><P> Gating the clock is an important technique used in low power design to disable unused modules of a circuit.
Gating can save power both by preventing unnecessary activity in the logic modules and by eliminating power dissipation in the clock distribution network. There is an inherent pitfall, though, in implementing gating groups for hierarchical gated clock distribution because the groups are typically developed at the logic level with no information about the physical layout of the clocktree. Depending on the distribution of underlying sinks, maintaining gating groups can cause a wiring overhead that is potentially greater than the savings due to reduced switching. We look at modifications of zero-skew tree algorithms to consider both the physical and logical aspects of hierarchical gating. The algorithms are applied to data taken from a low power ASIC design. The best gated clocktree is created using both physical and logical information.<BR> Keywords: clocktree, clockgating, low power, physical design<P> <DT> <A HREF="../../pdffiles/07_3.pdf" NAME="07_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S7.3 Modeling and Automating Selection of Guarding Techniques for Datapath Elements [p. 182]</B> <DD><I>William E. Dougherty, Donald E. Thomas</I><P> While guarded evaluation has proven an effective energy saving technique in arithmetic circuits, good methodologies do not exist for determining when and how to guard for maximal savings. Three new internal guarding techniques are presented in adders that increase energy savings up to 38% over existing external guarding techniques. This allows guarded evaluation to be effective at duty cycles as much as 20% higher than are currently practical. A modeling methodology is presented defining the energy and energy delay of a unit in a generic application space. These models can easily be incorporated into an automated selection technique to determine the optimal guarded implementation. This technique is tested on a DSP ASIP, increasing overall energy savings by preventing unnecessary guarding.
The data is generalized and it is observed that guarding is most beneficial when the ratio of guarding transistors to driven computational transistors is 1/10 or lower.<BR> Keywords: Guarded evaluation, low power design, datapath energy modeling.<P> <DT> <A HREF="../../pdffiles/07_4.pdf" NAME="07_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S7.4 The Design of a Low Energy FPGA [p. 188]</B> <DD><I>Varghese George, Hui Zhang, Jan M. Rabaey</I><P> This work presents the design of an energy efficient FPGA architecture. Significant reduction in the energy consumption is achieved by tackling both circuit design and architecture optimization issues concurrently. A hybrid interconnect structure incorporating Nearest Neighbor Connections, Symmetric Mesh Architecture, and Hierarchical connectivity is used. The energy of the interconnect is also reduced by employing low-swing circuit techniques. These techniques have been employed to design and fabricate an FPGA. Preliminary analysis shows energy improvement of more than an order of magnitude when compared to existing commercial architectures.<BR> Keywords: FPGA, low power, low swing signalling<P> </DL> <HR> <H2>Session S8: <A NAME="sesss8"> Power Aware System Design </A></H2> <I>Chair: Diana Marculescu</I><P> <DL> <DT> <A HREF="../../pdffiles/08_1scan.pdf" NAME="08_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S8.1 Stochastic Modeling of a Power-Managed System: Construction and Optimization [p. 194]</B> <DD><I>Qinru Qiu, Qing Wu, Massoud Pedram</I><P> The goal of a dynamic power management policy is to reduce the power consumption of an electronic system by putting system components into different states, each representing a certain performance and power consumption level. The policy determines the type and timing of these transitions based on the system history, workload and performance constraints. In this paper, we propose a new abstract model of a power-managed electronic system.
We formulate the problem of system-level power management as a controlled optimization problem based on the theories of continuous-time Markov decision processes and stochastic networks. This problem is solved exactly and efficiently using a "policy iteration" approach. Our method is compared with existing heuristic approaches for different workload statistics. Experimental results show that the power management method based on Markov decision processes outperforms heuristic approaches in terms of power dissipation savings for a given level of system performance.<P> <DT> <A HREF="../../pdffiles/08_2.pdf" NAME="08_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S8.2 The Impact of Battery Capacity and Memory Bandwidth on CPU Speed-Setting: A Case Study [p. 200]</B> <DD><I>Thomas L. Martin, Daniel P. Siewiorek</I><P> The purpose of this paper is to report the power and performance of an application on a real system as the CPU frequency varies. Previous work in CPU speed-setting considered only the power of the CPU and only CPU's that vary supply voltage with frequency. This work takes a broader approach, considering total system power, battery capacity and main memory bandwidth. The results, which are up to a factor of four less than ideal, show that all three must be considered when setting the CPU speed, whether the speed is fixed at a single value or varied dynamically during operation.<P> <DT> <A HREF="../../pdffiles/08_3.pdf" NAME="08_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S8.3 Selective Instruction Compression for Memory Energy Reduction in Embedded Systems [p. 206]</B> <DD><I>Luca Benini, Alberto Macii, Enrico Macii, Massimo Poncino</I><P> We propose a technique for reducing the energy required by firmware code to execute on embedded systems. The method is based on the idea of compressing the most commonly executed instructions so as to reduce the energy dissipated in memory accesses.
Instruction decompression is performed on the fly by a hardware module located between processor and memory: no changes to the processor architecture are required. Hence, our technique is well-suited for systems employing IP cores whose internal architecture cannot be modified. We describe a number of decompression schemes and architectures that effectively trade off hardware complexity for memory energy and bandwidth reduction, as proved by experimental data collected by executing several sample programs.<P> <DT> <A HREF="../../pdffiles/08_4.pdf" NAME="08_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S8.4 Energy-Efficient Design of Battery-Powered Embedded Systems [p. 212]</B> <DD><I>Tajana Simunic, Luca Benini, Giovanni De Micheli</I><P> Energy-efficient design of battery-powered embedded systems demands optimizations in both hardware and software. In this work we leverage cycle-accurate energy consumption models to explore compiler and source code optimizations aimed at reducing energy consumption. In addition, we extend cycle-accurate architectural power simulation with battery models that provide battery lifetime estimates. The enhanced simulator and software optimizations are used to study and optimize the power dissipation of SmartBadge, a wearable system based on the ARM microprocessor developed by HP Laboratories. We found that standard compiler optimizations give less than 1% energy savings. Source code optimizations are capable of up to 90% energy savings.
In addition, our analysis of battery lifetime for the MPEG decoder implemented on the SmartBadge shows that battery efficiency varies greatly with discharge currents on a cycle-by-cycle basis and can cause up to 16% reduction in battery lifetime.<P> </DL> <HR> <H2>Session S9: <A NAME="sesss9"> Low Voltage Technology and Modeling </A></H2> <I>Chair: Jim Burn</I><P> <DL> <DT> <A HREF="../../pdffiles/09_1.pdf" NAME="09_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S9.1 A Physical Alpha-Power Law MOSFET Model [p. 218]</B> <DD><I>Keith A. Bowman, Blanca L. Austin, John C. Eble, Xinghai Tang, James D. Meindl</I><P> A new compact physics-based Alpha-Power Law MOSFET Model is introduced to enable projections of low power circuit performance for future generations of technology by linking the simple mathematical expressions of the original Alpha-Power Law Model with their physical origins. The new model, verified by HSPICE simulations and measured data, includes: 1) a subthreshold region of operation for evaluating the on/off current trade-off that becomes a dominant low power design issue as technology scales, 2) the effects of vertical and lateral high field mobility degradation and velocity saturation, and 3) threshold voltage roll-off. Model projections for MOSFET CV/I indicate a 2X-performance opportunity compared to NTRS extrapolations for the 250, 180, and 150nm generations subject to maximum leakage current estimates of the roadmap. NTRS and model calculations converge at the 70nm technology generation, which exhibits pronounced on/off current interdependence for low power gigascale integration (GSI).<P> <DT> <A HREF="../../pdffiles/09_2.pdf" NAME="09_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S9.2 Hysteresis Effect in Floating-Body Partially-Depleted SOI CMOS Domino Circuits [p. 223]</B> <DD><I>R. Puri, C.T.
Chuang</I><P> This paper investigates the basic mechanisms of hysteretic delay and noise margin variations for floating-body Partially-Depleted SOI CMOS domino circuits in detail. Three cases, based on whether the input signals are "domino input signals" from other domino circuits; "static input signals" from static circuits or latches; or a combination of "domino and static input signals" are examined and differentiated. It is shown that hysteretic delay variation is larger and noise margin worse for the latter case with "mixed domino and static input signals." Although the delay and noise margin disparities between the three types of input signals are significant at the beginning of the clock cycles, they converge as the circuit approaches steady-state.<P> <DT> <A HREF="../../pdffiles/09_3scan.pdf" NAME="09_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S9.3 Impact of Using Adaptive Body Bias to Compensate Die-to-Die Vt Variation on Within-Die Vt Variation [p. 229]</B> <DD><I>Siva Narendra, Dimitri Antoniadis, Vivek De</I><P> Scaling of supply voltage (Vdd) is essential for controlling active power dissipation in complex digital circuits. Transistor threshold voltage (Vt) variation is one of the key limiters to Vdd scaling. Several adaptive body biasing schemes have been proposed earlier to reduce the impact of die-to-die Vt variation. Unfortunately, body bias degrades short channel effect (SCE) in the MOSFET. As technology is scaled down, this adverse effect of body biasing poses an increasingly serious challenge to controlling SCE and results in worse within-die Vt variation.
The scaling trends of body bias values required to reduce die-to-die Vt variations and the resulting increase in within-die Vt variation are presented across three different technology generations.<P> </DL> <HR> <H2>Session S10: <A NAME="sesss10"> Integrated Wireless Subsystems </A></H2> <I>Chair: Lou Williams</I><P> <DL> <DT> <A HREF="../../pdffiles/10_1scan.pdf" NAME="10_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S10.1 A 1.2V, 430 MHz, 4dBm Power Amplifier and a 250uW Front-End, Using a Standard Digital CMOS Process [p. 233]</B> <DD><I>T. Melly, A.-S. Porret, C.C. Enz, M. Kayal, E. Vittoz</I><P> Autonomous transceivers working in the ISM UHF bands should meet both requirements of a long battery lifetime and a small overall volume, which implies cutting the receiving power consumption down to less than 1mW. Ultimately, this goal will only be reached by using original topologies and lowering the supply voltage down to single battery cell operation. An RF front-end and a power-amplifier (PA) designed for the 433 MHz European ISM band are presented. Both RF building blocks have been integrated in a standard 0.5 um digital CMOS process with 0.65 V threshold voltages. The front-end includes an LNA and a downconverter mixer. It achieves a total double sideband (DSB) noise figure of 9 dB, with a dynamic range of 85 dB for a 60 kHz bandwidth, while dissipating only 250 uW at 1.2 V supply voltage. The PA includes two fully integrated Class A stages together with an output Class C amplifier. It achieves a +4 dBm output power with a 15% overall efficiency under 1.2 V supply voltage.<BR> Keywords: Low Power, Low Voltage, UHF band, Low Noise Amplifier (LNA), Mixer, Flicker Noise, Power Amplifier (PA).<P> <DT> <A HREF="../../pdffiles/10_2.pdf" NAME="10_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S10.2 CMOS Front-End LNA-Mixer for Micropower RF Wireless Systems [p. 238]</B> <DD><I>Razieh Rofougaran, Tsung-Hsien Lin, William J.
Kaiser</I><P> Motivated by the emerging needs for low power, low cost narrow-band wireless communication systems, the first micropower RFIC front-end has been implemented in standard CMOS technology. The front-end, an LNA combined with a down-conversion mixer, has been designed and fabricated in an HP 0.8 um CMOS process. This mandates the use of high-Q discrete inductors to provide sufficient gain for the LNA. Employing these design methods, the front-end supply current is less than 110 uA with a 3V supply voltage for operation at 450 MHz. High-Q inductors have been manufactured using low-temperature co-fired ceramic (LTCC) technology. The front-end's gain is 25 dB with an IIP3 of -15 dBm. This is the lowest current consumption reported to date for a CMOS front-end operating at this frequency.<P> <DT> <A HREF="../../pdffiles/10_3.pdf" NAME="10_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>S10.3 Differential PLL for Wireless Applications Using Differential CMOS LC-VCO and Differential Charge Pump [p. 243]</B> <DD><I>Ayman ElSayed, Akbar Ali, M.I. Elmasry </I><P> A differentially controlled monolithic LC-VCO along with a differential charge pump are used to implement a differential PLL for substrate noise immunity. The differential VCO control is achieved with minimal increase in the power consumption and without sacrificing the tuning range. In a 0.5um CMOS technology the measured VCO phase noise is -119dBc @1.0MHz and the tuning range is 26% of the 1.25GHz center frequency, at a total current consumption of 4.0mA from a 3V supply. The common mode rejection of the VCO control lines is more than 2000 at DC.
The new differential charge pump architecture provides common mode correction without the need for a clean reference.<P> </DL> <HR> <H2>Session P3: <A NAME="sessp3"> Circuit Design & Optimization </A></H2> <I>Chair: Chi-ying Tsui</I><P> <DL> <DT> <A HREF="../../pdffiles/p3_1.pdf" NAME="p3_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P3.1 Passive Precharge and Rippled Power Logic (PPRPL) [p. 249]</B> <DD><I>Samuel B. Schaevitz, Christopher Lin</I><P> A low-power, high-speed logic style using Passive Precharge and Rippled Power is proposed. Ultra-low threshold voltage (Vt) devices permit high speed operation, while the heavy leakage current pre-charges dynamic nodes. High Vt devices prevent leakage through the logic. The high Vt devices provide power to evaluate a sequence of logic gates and are activated in series for periods of time which are short relative to the clock period. The power effectively ripples through the logic path. These innovations combine to produce low power circuits that maintain very high speeds. A 16 bit by 16 bit multiplier was simulated in HSPICE using this logic style. We achieved a clock rate of 1 GHz with a latency of 1.3 ns. At that clock frequency the power dissipation is 10.9 mW.<P> <DT> <A HREF="../../pdffiles/p3_2scan.pdf" NAME="p3_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P3.2 Technology Scaling Behavior of Optimum Reverse Body Bias for Standby Leakage Power Reduction in CMOS IC's [p. 252]</B> <DD><I>Ali Keshavarzi, Siva Narendra, Shekhar Borkar, Charles Hawkins, Kaushik Roy, Vivek De</I><P> We demonstrate that there is an optimum reverse body bias, unique to any technology generation, that minimizes the standby leakage power consumption of an IC design implemented in that technology.
We also show: (1) the optimum reverse body bias value reduces by ~2X per technology generation, and (2) the maximum achievable leakage power reduction by reverse body biasing diminishes by ~4X per generation under a constant field technology scaling scenario. The optimum point occurs as a result of reduction in subthreshold leakage with applied reverse bias. Therefore, new junction engineering techniques to reduce the bulk band-to-band tunneling leakage current component across the junction are needed to preserve the effectiveness of reverse body biasing for standby leakage control in future technologies.<P> <DT> <A HREF="../../pdffiles/p3_3.pdf" NAME="p3_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P3.3 An Architectural Solution for the Inductive Noise Problem due to Clock-Gating [p. 255]</B> <DD><I>Mondira Deb Pant, Pankaj Pant, D. Scott Wills, Vivek Tiwari</I><P> As we approach Gigascale Integration, chip power consumption is becoming a critical system parameter. Clock-gating idle units provides needed reductions in power consumption. However, it introduces inductive noise that can limit voltage scaling. This paper introduces an architectural approach for reducing inductive noise due to clock-gating through gradual activation/deactivation of units. This technique provides a 2x reduction in ground bounce on a 16 bit ALU simulated in SPICE, while reducing simulated SPEC95 performance by less than 5% on a typical superscalar architecture.<P> <DT> <A HREF="../../pdffiles/p3_4.pdf" NAME="p3_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P3.4 An Optimization Technique for Dual-Output Domino Logic [p. 258]</B> <DD><I>Sumant Ramprasad, Ibrahim N. Hajj, Farid N. Najm</I><P> Dynamic logic circuits [2] are used in high-performance circuits due to their speed and area advantage over static CMOS circuits. One well-known dynamic logic family is the domino CMOS family, which, however, suffers from its inability to perform inversions.
Various methods have been proposed to overcome this restriction. One such method is the dual-output domino logic family. In the standard dual-output domino logic gate shown in Figure 1 each dual-output gate consists of two standard domino logic gates, producing the output, R and its complement, R. The advantage of the dual-output.<P> <DT> <A HREF="../../pdffiles/p3_5.pdf" NAME="p3_5"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P3.5 Statistically Optimized Asynchronous Barrel Shifters for Variable Length Codecs [p. 261]</B> <DD><I>Peter A. Beerel, Sangyun Kim, Pei-Chuan Yeh, Kyeounsoo Kim</I><P> This paper presents low-power asynchronous barrel shifters for variable length encoders and decoders useful in portable applications using multimedia standards. Our approach is to create multi-level asynchronous barrel shifters optimized for the skewed shift control statistics often found in these codecs. For common shifts, data passes through one level, whereas for rare shifts, data passes through multiple levels. We compare our optimized designs with the straightforward asynchronous and synchronous designs. Both pre- and post-layout HSPICE simulation results indicate that, compared to their synchronous counterparts, our designs provide over a 40% savings in average energy consumption for a given average performance.<P> <DT> <A HREF="../../pdffiles/p3_6.pdf" NAME="p3_6"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P3.6 Inverse Polarity Techniques for High Speed/Low-Power Multipliers [p. 264]</B> <DD><I>Pascal C.H. Meier, Rob A. Rutenbar, L. Richard Carley </I><P> Various high-speed techniques have been developed for multipliers, but with the increasing popularity of mobile computing, a recent goal has been to minimize power dissipation. A popular delay-reduction technique applied to adder circuits is polarity inversion of bits.
As this optimization reduces transistor count, it also has the potential for lowering power dissipation, and can be effectively applied to Wallace tree partial product reduction stages. We illustrate how this technique reduces power, interconnect capacitance, and chip area. Power reduction of up to 25% is achieved.<BR> Keywords: Multiplier, low power, inverse polarity.<P> </DL> <HR> <H2>Session P4: <A NAME="sessp4"> Architecture & Systems </A></H2> <I>Chair: Mahadevamurty Nemani</I><P> <DL> <DT> <A HREF="../../pdffiles/p4_1scan.pdf" NAME="p4_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P4.1 Instruction Fetch Energy Reduction Using Loop Caches for Embedded Applications With Small Tight Loops [p. 267]</B> <DD><I>Lea Hwang Lee, Bill Moyer, John Arends </I><P> A fair amount of work has been done in recent years on reducing power consumption in caches by using a small instruction buffer placed between the execution pipe and a larger main cache [1,2,6]. These techniques, however, often degrade the overall system performance. In this paper, we propose using a small instruction buffer, also called a loop cache, to save power. A loop cache has no address tag store. It consists of a direct-mapped data array and a loop cache controller. The loop cache controller knows precisely whether the next instruction request will hit in the loop cache, well ahead of time. As a result, there is no performance degradation.<BR> Keywords: Low cost, low power, embedded systems, small program loops, instruction buffering.<P> <DT> <A HREF="../../pdffiles/p4_2.pdf" NAME="p4_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P4.2 A Methodology for Power Efficient Partitioning of Data-Dominated Algorithm Specifications Within Performance Constraints [p. 270]</B> <DD><I>K. Masselos, K. Danckaert, F. Catthoor, C.E. Goutis, H. DeMan </I><P> A methodology for power efficient partitioning of real-time data-dominated system specifications is presented.
The proposed methodology aims at reducing the memory requirements in realizations of such applications by applying extensive code transformations in the initial system specification before partitioning over processors. This reorganization basically aligns the data production and consumption between the different procedures of the initial specification, thus reducing the memory size requirements (and the resulting power) of the system's realizations, especially those in the interfaces between different processors. The main novel contribution is that performance issues are explicitly taken into account during power oriented system-level transformations. The proposed methodology can be applied both in a parallel (programmable) processor context and also in heterogeneous hardware-software architectures.<P> <DT> <A HREF="../../pdffiles/p4_3.pdf" NAME="p4_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P4.3 Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption [p. 273]</B> <DD><I>Koji Inoue, Tohru Ishihara, Kazuaki Murakami</I><P> This paper proposes a new approach using way prediction for achieving high performance and low energy consumption of set-associative caches. By accessing only a single predicted cache way, instead of accessing all the ways in a set, the energy consumption can be reduced. This paper shows that the way-predicting set-associative cache improves the ED (energy-delay) product by 60-70% compared to a conventional set-associative cache.<P> <DT> <A HREF="../../pdffiles/p4_4.pdf" NAME="p4_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P4.4 Designing Power Efficient Hypermedia Processors [p. 276]</B> <DD><I>Chunho Lee, Johnson Kin, Miodrag Potkonjak, William H. Mangione-Smith </I><P> A distributed hypermedia system that supports collaboration is an emerging platform for creation, discovery, management and delivery of information.
We present an approach to low power system design space exploration for distributed hypermedia applications. Traditionally, low power design and synthesis of application specific programmable processors has been done in the context of a given number of operations required to complete a task. Our approach utilizes the modern advances in compiler technology and architectural enhancements that are well matched to the compiler technology. This work is, to the best of our knowledge, the first attempt to address the need for synthesis of low power hypermedia processors. Also, this is the first work to address power efficiency through exploiting instruction level parallelism (ILP) found in hypermedia tasks by a production-quality ILP compiler. Using the developed framework we conduct an extensive exploration of low power system design space for a hypermedia application under area and throughput constraints. The framework introduced in this paper is very valuable in making early low power design decisions such as architectural configuration trade-offs including the cache and issue width trade-off under area and throughput constraints, and the number of branch units and issue width.<P> <DT> <A HREF="../../pdffiles/p4_5scan.pdf" NAME="p4_5"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>P4.5 Dynamic Power Estimation Using the Probabilistic Contribution Measure (PCM) [p. 279]</B> <DD><I>Hoon-Sang Jin, Myung-Soo Jang, Jin-Suk Song, Jin-Yong Lee, Taek-Soo Kim, Jeong-Taek Kong</I><P> In this paper, we present CubicPower, which is a dynamic power estimator based on Verilog/VHDL simulators. We propose the power characterization model and the probabilistic contribution measure (PCM) algorithm to calculate the actual power consumption of cell instances with given switching information. In addition to PCM, the state dependency and non-switching activity of gates are taken into account for more accurate power estimation.
Experimental results of CubicPower show less than 10% error compared with the results of PowerMill simulation and the measured values of the IMS test equipment. Due to the PCM algorithm, CubicPower is more accurate than the leading commercial dynamic power estimator at the gate level and is 2-3 orders of magnitude faster than PowerMill.<P> </DL> <HR> <H2> <A NAME="sesst1"> Tutorial 1: </A></H2> <DL> <DT> <A HREF="../../pdffiles/t1scan.pdf" NAME="t1"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>T1 Circuit Styles and Strategies for CMOS VLSI Design on SOI [p. 282]</B> <DD><I>Fari Assaderaghi</I><P> This paper reviews specific circuit styles and strategies employed in the design of CMOS VLSI on partially-depleted (PD) SOI. These strategies address issues and problems that arise on PD SOI circuits (mainly due to the floating-body effect) such as delay hysteresis, noise margin reduction, etc. These circuit approaches also try to utilize SOI-specific properties to achieve a larger performance gain than that of a simple re-map of a bulk design to SOI. Although many aspects of CMOS design pertaining to SOI will be covered, the emphasis will be on dynamic and static circuits and high-performance SRAM's.<P> </DL> <HR> <H2> <A NAME="sesst2"> Tutorial 2: </A></H2> <DL> <DT> <A HREF="../../pdffiles/t2.pdf" NAME="t2"><IMG SRC="../../../../../icons/pdficon.gif"></A> <B>T2 System-Level Power Optimization: Techniques and Tools [p. 288]</B> <DD><I>Luca Benini, Giovanni De Micheli</I><P> This tutorial presents a cohesive view of power-conscious system-level design. We consider systems as consisting of a hardware platform executing software programs. We address the problems of power estimation and minimization for such systems. We consider the major constituents of systems: processors, memories and communication resources. We analyze power dissipation in these components and we survey computer-aided power reduction techniques. 
We also consider global system-level control schemes, such as dynamic power management. We conclude by pointing out further research problems which are still open in this domain.<P> </DL>