Micromachined microsystems and Micro Electro Mechanical
Systems (MEMS) have made possible the development of highly
accurate and portable sensors and instruments for a variety of
applications in health care, industry, consumer products,
avionics, and defense. The design of low-power circuits for these
applications, and the use of micromachined sensors and actuators in
combination with integrated circuits to implement even lower
power microinstruments, have now become possible and are the focus
of attention. This paper reviews the state of the art in the
development of micromachined microsystems and MEMS,
discusses low-power design approaches for microsystems, and
reviews some recent developments in power generation and energy
harvesting from the environment.
Keywords:
MEMS, Micromachining, Low-Power, Microsystems, Power
Sources, Energy Harvesting
Processors in portable electronic devices generally have a computational
load which has time-varying performance requirements.
Dynamic Voltage Scaling is a method to vary the processor's supply
voltage so that it consumes the minimal amount of energy by
operating at the minimum performance level required by the active
software processes. A dynamically varying supply voltage has
implications on the processor circuit design and design flow, but
with some minimal constraints it is straightforward to design a processor
with this capability.
Keywords: Energy efficient, variable voltage, processor, circuit design.
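The trade-off described above can be illustrated with a first-order model. The sketch below is not taken from the paper: it assumes switching energy per cycle scales as C * V^2 and that the maximum clock rate scales as (V - Vt)^2 / V, and the constants V_MAX, V_T and F_MAX are hypothetical values chosen only for illustration.

```python
# Illustrative first-order DVS model; every constant here is an assumption.
V_MAX = 3.3      # volts, hypothetical maximum supply
V_T = 0.8        # volts, hypothetical threshold voltage
F_MAX = 100e6    # Hz, hypothetical clock rate at V_MAX


def max_frequency(v):
    """Assumed delay model: achievable clock rate ~ (V - Vt)^2 / V."""
    scale = ((v - V_T) ** 2 / v) / ((V_MAX - V_T) ** 2 / V_MAX)
    return F_MAX * scale


def min_voltage_for(f_required, step=0.01):
    """Lowest supply voltage (searched in 10 mV steps) that sustains f_required."""
    v = V_T + step
    while v < V_MAX and max_frequency(v) < f_required:
        v += step
    return min(v, V_MAX)


def relative_energy_per_cycle(v):
    """Switching energy per cycle scales as C * V^2; normalized to V_MAX."""
    return (v / V_MAX) ** 2


if __name__ == "__main__":
    for fraction in (1.0, 0.5, 0.25):
        v = min_voltage_for(fraction * F_MAX)
        print(f"{fraction:4.0%} speed -> {v:.2f} V, "
              f"{relative_energy_per_cycle(v):.2f}x energy per cycle")
```

Under these assumptions, lowering the required performance lets the supply drop, and the energy per operation falls roughly with the square of the voltage, which is the effect DVS exploits.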
Comparisons among different dual-VT design choices for a large
on-chip cache with single-ended sensing show that the design
using a dual-VT cell and low-VT peripheral circuits is the best, and
provides 10% performance gain with 1.2x larger active leakage
power, and 1.6% larger cell area compared to the best design
using high-VT cells.
Keywords: Dual-VT, SRAM, Single-Ended Sensing.
In this paper we present a completely on-chip voltage regulation technique which promises to adjust the degree of voltage regulation in a digital logic chip in the face of process induced delay variations so as to minimize energy dissipation while always guaranteeing the target operating frequency. For this purpose the delay of a critical path replica of the circuit being regulated is constantly compared with the target delay to provide the regulator with the information needed to select the optimum voltage levels. The proposed solution is even more attractive in that no external components are required. Based on this scheme, a completely on-chip voltage regulator has been fabricated in a commercial 0.5 um CMOS process and used to generate the inner rail voltages for a DSP multiplier-accumulator (MAC) implemented in mixed swing QuadRail. Measured results indicate that the voltages generated by the regulator offer a very high degree of load regulation thus verifying the fast response time of the on-chip output buffer.
Digital sub-threshold logic circuits have recently been proposed for applications in the ultra-low power end of the design spectrum, where performance is of secondary importance. To improve the switching performance of the sub-threshold logic family at comparable energy per switching event, we propose the use of sub-DTMOS (sub-threshold Dynamic Threshold MOS) transistors. The stability of sub-threshold DTMOS logic to temperature and process variations eliminates the need for the additional stabilization schemes that may be required for regular sub-threshold MOS logic families to ensure proper operation in the sub-threshold region.
We introduce the notion of energy scalable computation on general purpose processors. The principal idea is to maximize computational quality for a given energy constraint. The desirable energy-quality behavior of algorithms is discussed. Subsequently, the energy-quality scalability of three distinct categories of commonly used signal processing algorithms (viz. filtering, frequency domain transforms and classification) is analyzed on the StrongARM SA-1100 processor, and transformations are described which obtain significant improvements in the energy-quality scalability of the algorithms.
This paper presents a new approach for power reduction by taking a global, software-centric view. It analyzes the sources of power consumption: tasks that require services from hardware components. When a component is not used by any task, it can enter a sleeping state to save power. Operating systems have detailed information about tasks; therefore, the OS is the best place for power reduction. Our technique is effective in identifying hardware idleness and shutting down unused components. We implement this technique in Linux and show that it can save more than 50% power compared to traditional hardware-centric shutdown techniques.
Quality of service (QoS) is one of the key features for new Internet-based multimedia and other applications. Meanwhile, energy remains a major concern for systems that run such applications. We address the issue of combining system design concerns and QoS requirements to design systems that can deliver QoS guarantees. In this paper, we discuss how to satisfy QoS requirements and minimize the system's energy consumption. Specifically, we consider the following problem: given a set of applications, each specifying its required amount of computation and service time, how do we allocate CPU time and determine the voltage profile on a variable voltage system such that all the applications' requirements are satisfied and the system's total energy consumption is minimized? We optimally solve several basic cases and propose a dynamic programming procedure for the general case. Simulation shows that the new approach saves 38.75% energy over the system shut-down technique.
Portable wireless systems require long battery lifetime while still delivering high performance. The major contribution of this work is combining new power management (PM) and power control (PC) algorithms to trade off performance for power consumption at the system level in portable devices. First we present the formulation for the solution of the PM policy optimization based on renewal theory. Next we present the formulation for power control (PC) of the wireless link that enables us to obtain further energy savings when the system is active. Finally, we discuss the measurements obtained for a set of PM and PC algorithms implemented for the WLAN card on a laptop. The PM policy we developed based on our renewal model consumes three times less power than the default PM policy for the WLAN card while maintaining high performance. Power control saves an additional 53% in energy at the same bit error rate. With both power control and power management algorithms in place, we observe on average a factor of six in power savings.
Power consumption is a key point in the design of high-speed switched capacitor (SC) circuits, which allow a number of analog functions to be implemented efficiently. Among them, SC Sigma-Delta modulators are very popular for A/D conversion: in this kind of circuit, operational amplifiers are the most power-hungry cells because of their requirements in terms of DC gain and unity-gain frequency. A new amplifier with 110dB DC gain and a unity-gain frequency of 250MHz is presented. Its large power consumption (20mW) makes its use in commercial applications critical; however, by combining this cell with a fast adaptive biasing circuit, high performance may be achieved with a reasonable dissipation. This approach has been used in the design of a 6th-order bandpass Sigma-Delta modulator featuring 73dB DR and suitable for the conversion at IF (10.7MHz) of the FM radio signal.
The power consumption of mixed-signal systems featuring an analog front-end, a digital back-end, and signal processing tasks that can be computed with multiplications and accumulations is analyzed. An implementation is proposed, composed of switched-capacitor mixed analog/digital multiply-accumulate units in the analog front-end, followed by an A/D converter. For signal processing tasks that include decimation, this implementation is shown to be superior in power consumption to an equivalent implementation with a high-speed A/D converter in the front-end. The power savings are due solely to the relaxed requirement on the A/D conversion rate, which is a direct consequence of the decimation. In a case study of a narrowband FIR filter, realized with four multiply-accumulate units and a decimation factor of 100, the power saving is a factor of 54. Implementation details are given, and the power consumption and thermal noise are analyzed.
A low power monolithic Clock and Data Recovery IC for 2.5 Gb/s
SDH STM-16 systems has been designed and fabricated using
Maxim GST-2 27 GHz-fT Silicon bipolar technology. The circuit
performs the following functions: signal amplification and
limitation, clock recovery and decision; a single 3.3 V supply
voltage is required, and power consumption is below
350 mW. This IC and a previously presented transimpedance
amplifier thus form a chip set for the receiver with a
total power dissipation below 0.5 W. Preliminary measurements
under a 2^23 - 1 PRBS data stream have shown an input sensitivity
below 20 mVpp and an rms jitter of 10 ps.
Keywords:
Clock recovery, optical communications, SDH, low power.
The design of the standard CMOS IC core of a commercial wireless burglar alarm system is presented as an example of a very low-power analog VLSI design for battery-operated systems. The main constraint is battery life, which must be at least five years (with a standard camera battery). The chip is composed of a digital (decision) part and an analog interface with sensors. The entire chip absorbs 10 uA. Measurements on each individual component and tests in the working environment show full functionality and compliance with specifications. Even though the example is application specific, the design solutions and each individual element can also be used in many other battery-operated low-frequency devices (e.g. environmental parameter monitoring).
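To put the two figures quoted above (10 uA chip current, five-year battery life) in perspective, the short calculation below estimates the charge the chip alone would draw over that lifetime. It is a back-of-the-envelope sketch: it ignores battery self-discharge and any load outside the chip, and the helper name `lifetime_years_for` and the capacity passed to it are hypothetical, not from the paper.

```python
HOURS_PER_YEAR = 365 * 24          # leap days ignored

chip_current_a = 10e-6             # 10 uA, as stated in the abstract
target_years = 5                   # minimum battery life, as stated

# Charge drawn by the chip over the target lifetime, in milliamp-hours.
required_mah = chip_current_a * HOURS_PER_YEAR * target_years * 1000


def lifetime_years_for(capacity_mah, current_a=chip_current_a):
    """Years a battery of the given capacity lasts at a constant current."""
    return (capacity_mah / 1000) / current_a / HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"Charge drawn in {target_years} years: {required_mah:.0f} mAh")
    # 1000 mAh is an arbitrary example capacity, not a figure from the paper.
    print(f"A 1000 mAh cell would last {lifetime_years_for(1000):.1f} years")
```

At 10 uA the chip draws about 438 mAh in five years, which shows why a current budget at the microampere level is what makes a multi-year lifetime from a small battery plausible.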
Memory-processor integration offers new opportunities for reducing the energy of a system. In the case of embedded systems, one solution consists of mapping the most frequently accessed addresses onto the on-chip SRAM to guarantee power and performance efficiency. This option is especially effective when memory access patterns can be profiled and studied at design time (as in typical real-time embedded systems). In this work, we propose an algorithm for the automatic partitioning of on-chip SRAM in multiple banks that can be independently accessed. Starting from the dynamic execution profile of an embedded application running on a given processor core, we synthesize a multi-banked SRAM architecture optimally fitted to the execution profile. The algorithm provides a globally optimum solution to the problem under realistic assumptions on the power cost metrics, and with constraints on the number of memory banks. Results, collected on a set of embedded applications for the ARM processor, have shown average energy savings around 42%.
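The flavor of such a partitioning problem can be sketched with a small dynamic program. The abstract does not give the authors' algorithm or cost model, so everything below is an assumption for illustration: memory is divided into equal blocks in address order, banks are contiguous runs of blocks, and the caller supplies a hypothetical `energy_per_access(n)` function giving the cost of one access to a bank of n blocks.

```python
from functools import lru_cache
from itertools import accumulate


def partition_banks(accesses, max_banks, energy_per_access):
    """Split blocks (in address order) into at most max_banks contiguous banks.

    accesses[i] is the profiled access count of block i, and
    energy_per_access(n) is the assumed cost of one access to an n-block bank.
    Returns (total_energy, list of (start, end) half-open block ranges).
    """
    n = len(accesses)
    prefix = [0, *accumulate(accesses)]

    def cost(i, j):
        # Energy of one bank holding blocks [i, j).
        return (prefix[j] - prefix[i]) * energy_per_access(j - i)

    @lru_cache(maxsize=None)
    def best(j, k):
        """Cheapest way to cover blocks [0, j) with at most k banks."""
        if j == 0:
            return 0.0, ()
        if k == 0:
            return float("inf"), ()
        options = []
        for i in range(j):
            energy, banks = best(i, k - 1)
            options.append((energy + cost(i, j), banks + ((i, j),)))
        return min(options)

    energy, banks = best(n, max_banks)
    return energy, list(banks)


if __name__ == "__main__":
    profile = [900, 50, 30, 10, 5, 5, 0, 0]      # hypothetical access counts
    linear = lambda blocks: 1.0 + 0.5 * blocks   # hypothetical cost model
    for k in (1, 2, 3):
        print(k, partition_banks(profile, k, linear))
```

With a skewed profile like the hypothetical one above, the program isolates the hot blocks in a small bank so that most accesses pay the low per-access cost. A realistic cost model would also charge for the extra decoding and wiring of each bank, which is why the number of banks is constrained.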
In recent years reducing power has become a critical design goal for high-performance microprocessors. This work attempts to bring the power issue to the earliest phase of high-performance microprocessor development. We propose a methodology for power-optimization at the micro-architectural level. First, major targets for power reduction are identified within a superscalar micro-architecture; then an optimization of the superscalar micro-architecture is performed that generates a set of energy-efficient configurations forming a convex hull in the power-performance space. The energy-efficient families are then compared to find configurations that dissipate the lowest power given a performance target, or, conversely, deliver the highest performance given a power budget. Application of the developed methodology to a superscalar micro-architecture shows that at the architectural level there is a potential for reducing power by up to 50%, given a performance requirement, and for up to 15% performance improvement, given a power budget.
Deep-submicron CMOS designs have resulted in large leakage energy dissipation in microprocessors. While SRAM cells in on-chip cache memories always contribute to this leakage, there is a large variability in active cell usage both within and across applications. This paper explores an integrated architectural and circuit-level approach to reducing leakage energy dissipation in instruction caches. We propose gated-Vdd, a circuit-level technique to gate the supply voltage and reduce leakage in unused SRAM cells. Our results indicate that gated-Vdd together with a novel resizable cache architecture reduces energy-delay by 62% with minimal impact on performance.
Microprocessors represent a significant portion of the energy consumed
in portable electronic devices. Dynamic Voltage Scaling
(DVS) allows a device to reduce energy consumption by lowering
its processor speed at run-time, allowing a corresponding reduction
in processor voltage and energy. A voltage scheduler determines
the appropriate operating voltage by analyzing application constraints
and requirements. A complete software implementation,
including both applications and the underlying operating system,
shows that DVS is effective at reducing the energy consumed
without requiring extensive software modification.
Keywords:
Low-power, energy-efficient, RTOS, operating systems.
In this work, MOS Current Mode Logic (MCML) is
analyzed for application to low power, mixed signal
environments. A small MCML cell library is developed and
optimized for several different performance requirements. The
cells are then applied to the generation of pipelined CORDIC
structures and compared with equivalent CMOS circuits. MCML
CORDICs are designed which can operate from 125MHz to
310MHz with power consumption varying between 4.3mW and
18.6mW. These power results are up to 1.5 times lower than those of
CMOS CORDICs with equivalent propagation delays. Design
was done in a 0.25µm standard CMOS process from ST
Microelectronics.
Keywords:
Current mode logic, CORDIC, Low-energy design, Digital logic.
Realization of high-performance domino logic depends strongly on energy-efficient and noise-tolerant interconnect design in ultra deep submicron processes. We characterize the cycle-averaged power model for interconnects, accounting for switching statistics and dynamic behaviors. For the sake of signal integrity, we also characterize cross-coupling effects, which reflect the logical correlation between adjacent wires. Based on the new models for interconnect power and capacitive crosstalk, we optimize the coupling power consumed by interconnects under crosstalk constraints. Experimental results show that the optimized designs reduce power consumption significantly.
Two novel low power flip-flops are presented in this paper.
The proposed flip-flops use new gating techniques that reduce power
dissipation by deactivating the clock signal. The presented circuits
overcome the clock duty-cycle limitation of previously reported
gated flip-flops.
Circuit simulations with the inclusion of parasitics show that
an appreciable reduction in power dissipation is possible if the input signal has
reduced switching activity. A 16-bit counter is presented as a
simple low power application.
Keywords:
CMOS digital integrated circuits, flip-flops, low-power circuits,
transition probability.
A synthesis method for generating hybrid pass gate circuits is presented. These circuits combine features from both complementary CMOS and pass-gate architectures. The simulation results using a 0.7 um technology show that circuits synthesized according to the proposed method may achieve significant improvements in terms of area, power and delay over traditional full swing pass transistor logic and complementary CMOS.
Energy is one of the limited resources for modern systems, especially battery-operated devices and personal digital assistants. The lag in new technologies for more powerful batteries is changing traditional system design philosophies. For example, due to the limitation on battery life, it is more realistic to design for the optimal benefit from a limited resource than to design to meet all the applications' requirements. We consider the following problem: a system achieves a certain amount of utility from a set of applications by providing them certain levels of quality of service (QoS). We want to allocate the limited system resources to get the maximal system utility. We formulate this utility maximization problem, which is NP-hard in general, and propose heuristic algorithms that are capable of finding solutions provably arbitrarily close to the optimal. We have also derived explicit formulae to guide the allocation of resources to actually achieve such solutions. Simulation shows that our approach can use 99.9% of the given resource to achieve 25.6% and 32.17% more system utility than two other heuristics, while providing QoS guarantees to the application program.
This paper deals with the power minimization problem for data-dominated
applications based on a novel concept called partially
guarded computation. We divide a functional unit into two parts -
MSP (Most Significant Part) and LSP (Least Significant Part) -
and allow the functional unit to perform only the LSP
computation if the range of output data can be covered by LSP.
We dynamically disable MSP computation to remove unnecessary
transitions thereby reducing power consumption. We also propose
a systematic approach for determining optimal location of the
boundary between the two parts during high-level synthesis.
Experimental results show about 10~44% power reduction with
about 30~36% area overhead and less than 3% delay overhead in
functional units.
Keywords: Low Power, Partially Guarded Computation
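The guarding idea above can be mimicked in a small behavioral model. This is only a sketch under stated assumptions, not the paper's circuit: it models an unsigned adder, picks a hypothetical 16-bit width with an 8-bit LSP, and uses the simple guard condition that both operands fit in LSP_BITS - 1 bits, which guarantees the sum fits in the LSP.

```python
WIDTH = 16       # total adder width (assumed for illustration)
LSP_BITS = 8     # hypothetical boundary between LSP and MSP


def guarded_add(a, b):
    """Return (sum, msp_active) for an unsigned WIDTH-bit addition.

    The MSP is skipped when both operands fit in LSP_BITS - 1 bits, because
    the sum is then guaranteed to fit in the LSP alone.
    """
    limit = 1 << (LSP_BITS - 1)
    lsp_mask = (1 << LSP_BITS) - 1
    if a < limit and b < limit:
        return (a + b) & lsp_mask, False       # MSP inputs stay frozen
    return (a + b) & ((1 << WIDTH) - 1), True  # full-width computation


def msp_activity(pairs):
    """Fraction of additions in which the MSP has to switch."""
    active = sum(guarded_add(a, b)[1] for a, b in pairs)
    return active / len(pairs)


if __name__ == "__main__":
    samples = [(3, 5), (100, 27), (200, 9), (4000, 12)]  # made-up operands
    for a, b in samples:
        print(a, b, guarded_add(a, b))
    print("MSP active fraction:", msp_activity(samples))
```

Moving the boundary changes how often the guard fires and how much extra logic is needed to detect it, which is the trade-off the boundary-selection step during high-level synthesis has to resolve.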
In contrast to current design practice for (programmable) processor mapping, which mainly targets performance, we focus on a systematic trade-off between cycle budget and energy consumed in the background memory organization. The latter is a crucial component in many of today's designs, including multi-media, network protocols and telecom signal processing. We have a systematic way and tool to explore both freedoms and to arrive at Pareto charts, in which for a given application the lowest cost implementation of the memory organization is plotted against the available cycle budget per submodule. This is achieved by making optimal use of a parallelized memory architecture. We indicate, with results on a digital audio broadcasting receiver and an image compression demonstrator, how to use the Pareto plot effectively to gain significantly in overall system energy consumption within the global real-time constraints.
This paper presents a state assignment technique called priority encoding, which uses multi-code assignment plus clock gating to reduce power dissipation in sequential circuits. The basic idea is to assign multiple codes to states so as to enable more effective clock gating in the sequential circuit. Practical design examples are studied and simulated by PSPICE. Experimental results demonstrate that the priority encoding technique can result in sizable power saving.
In this paper, we review the Bluetooth technology, a new universal radio interface enabling electronic devices to connect and communicate wirelessly via short-range connections. Motivations for the radio requirements are given, and the implications of system parameters like operating modes, frequency hopping, interference resistance are discussed from a low-power perspective. Specific characteristics enabling low-cost single-chip implementations and supporting low power consumption are outlined.
A new high-speed Domino circuit, called HS-Domino, is developed. HS-Domino resolves the trade-off between performance and noise margins in conventional CD-Domino logic while dissipating low dynamic power with minimal area overhead. A dual-threshold (MTCMOS) implementation of HS-Domino and DDCVS logic is also devised. This implementation achieves low leakage values during standby, while maintaining high performance and low dynamic power during the active mode.
In this paper, we propose an adiabatic register file for ultra-low-energy applications, which uses a new reversible adiabatic logic, nRERL [1]. The nRERL register file discards garbage information with minimal energy dissipation. We designed a 16x8b three-port nRERL register file. From SPICE simulations, we found that the nRERL register file consumes less than 10% of the energy consumed by a conventional register file at frequencies below 1MHz. We also describe how to design a RAM, a large array of the storage cells.
Minimum power CMOS ASIC macrocells are designed by minimizing the macrocell area using a new methodology to optimally insert repeaters for n-tier multilevel interconnect architectures. The minimum macrocell area and power dissipation are projected for the 100, 70 and 50 nm technology generations and compared with an n-tier design without repeaters. Repeater insertion and a novel interconnect geometry scaling technique decrease the power dissipation by 58-68%, corresponding to a macrocell area reduction of 70-78%, for the global clock frequency designs of these three technology generations.
Recovering and reusing circuit energies that would otherwise be
dissipated as heat can reduce the power dissipated by a VLSI
chip. To accomplish this requires a power source that can
efficiently inject and extract energy, and an efficient power
delivery system to connect the power source to the circuit nodes.
The additional circuitry and timing required to support this
process can readily exceed the power-savings benefit. Clock-powered
logic is a circuit-level, energy-recovery approach that has
been implemented in two generations of small-scale
microprocessor experiments. The results have shown that it is
possible and practical to extract useful amounts of power savings
by leveraging the additional circuitry for other compatible
purposes. The capabilities and limitations of clock-powered logic
as a competitive low-power approach are presented and discussed
in this paper.
Keywords:
Energy-recovery CMOS, clock-powered logic, adiabatic charging,
microprocessors, ER-CMOS, supply-voltage scaling.
We present new modeling and simulation techniques to improve the accuracy and efficiency of transient analysis of large power distribution grids. These include an accurate model for the inherent decoupling capacitance of non-switching devices, as well as a statistical switching current model for the switching devices. Moreover, three new simulation techniques are presented for problem size-reduction and speed-up. Results of application of these techniques on three PowerPC(TM) microprocessors are also presented.
We introduce an energy consumption analysis of complex digital systems through a case study of the ARM7TDMI RISC processor, using a new energy measurement technique. We developed a cycle-accurate energy consumption measurement system based on charge transfer which is robust to spiky noise and is capable of collecting a range of power consumption profiles in real time. The relative energy variation of the RISC core is measured by changing the opcode, the instruction fetch address, the register number, the register value, the data fetch address, and the immediate operand value in each pipeline stage, respectively. We demonstrate energy characterization of a pipelined RISC processor for high-level power reduction.
Power is increasingly becoming a design constraint for embedded systems. The processor is responsible for the energy consumed by the software component of an embedded system. Estimating the power of this component is a major concern because of the rising complexity of processors and the slowness of estimation tools. This work attempts to estimate the energy dissipation of the PR1900 processor based on an instruction-set model with improved accuracy. The model is integrated in a simulation framework and validated. A speedup of over 200 times has been obtained with an average 1.4% loss in accuracy relative to gate-level estimation. An analysis of the energy dissipated by the instructions vis-à-vis the processor architecture has been carried out, and a substantial reduction in the measurement effort needed to build the processor energy model has been achieved.
We extend earlier work on high-level average power estimation
to include the power due to interconnect loading.
The resulting technique is a combination of an RTL-level gate
count prediction method and average interconnect estimation
based on Rent's rule. The method can be adapted to
be used with different place and route engines and standard
cell libraries. For a number of benchmark circuits, the
method is verified by extracting wire lengths from a layout
of each circuit and then comparing the predicted (at RTL)
power against that measured using SPICE. An average error
of 14.4% is obtained for the average interconnect length,
and an average error of 25.8% is obtained for average power
estimation including interconnect effects.
Categories and Subject Descriptors:
B.7.2 [Hardware]: Integrated Circuits - Design Aids
General Terms:
High-level power estimation, Register transfer level (RTL)
power estimation, Interconnect capacitance estimation
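For readers unfamiliar with Rent's rule, which the abstract above relies on, it relates the number of external terminals T of a logic block to its gate count G as T = t * G^p, where t is the average number of terminals per gate and p is the Rent exponent. The snippet below evaluates only this relation; the parameter values are hypothetical, and the abstract does not state how the authors derive average wire length from it.

```python
def rent_terminals(gates, terminals_per_gate, rent_exponent):
    """Rent's rule: T = t * G**p external terminals for a block of G gates."""
    return terminals_per_gate * gates ** rent_exponent


if __name__ == "__main__":
    # t = 3.0 and p = 0.6 are illustrative values, not taken from the paper.
    for gates in (100, 1_000, 10_000):
        terminals = rent_terminals(gates, terminals_per_gate=3.0,
                                   rent_exponent=0.6)
        print(f"{gates:>6} gates -> about {terminals:.0f} terminals")
```

Because p is below 1, the terminal count grows more slowly than the gate count, and the exponent is the quantity that interconnect-length estimates of this kind typically depend on.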
This paper describes power analysis at sub-zero temperatures for a high performance dynamic multiport register file (6 Read and 2 Write ports, 32 wordlines x 64 bitlines) fabricated in 0.25 um Silicon on Insulator (SOI) and bulk technologies. Based on the hardware, it is shown that the performance of both the register file and the latch improves by 2-3.5% per 10 °C reduction in temperature. The standby power for SOI reduces by 1.5% to 3% per 10 °C temperature drop down to -30 °C. The SOI chip is shown to have a more significant performance improvement at low temperatures than the bulk chip, due to the floating body effect, which partially offsets the increase in the threshold voltages (Vt). The low temperature performance gain is attributed partly to a reduction in capacitance (around 7-8%); the rest is due to dynamic threshold voltages. At -30 °C the register file is capable of functioning close to 1.02 GHz for read and write operations in a single cycle.
This paper describes an adaptive power management architecture to reduce power consumption in digital filters. The proposed approach combines two low-power techniques which utilize supply voltage reduction. The first technique, multiple voltage distribution (MVD), attempts to reduce power consumption by assigning reduced supply voltages to circuit modules while satisfying timing constraints. The second technique, adaptive voltage scaling (AVS), dynamically adjusts these multiple voltages to meet throughput requirements resulting in further power reduction. An FIR filter application using the combined MVD-AVS power management scheme for two adaptively scaled supply voltages is shown to consume one-third the power of a fixed supply voltage scheme, and half the power consumed with a single supply AVS.
A self-timed radix-2 division scheme for low power consumption
is proposed. By replacing dual-rail dynamic circuits in non-critical
data paths with single-rail static circuits, power dissipation is
decreased, yet performance is maintained by speculative remainder
computation. SPICE simulation results show that the proposed
design can achieve 33.8-ns latency for 56-bit mantissa division
and 47% energy reduction compared to a fully dual-rail version.
Keywords:
Low power, radix-2 division, self-timed, RSD.
In this paper we propose a novel rate calculation algorithm
called Quantized Rate Selection (QRS) for quantized un-dithered
dynamic supply voltage scaling (DSVS) systems.
The algorithm monitors the total buffered workload, and
where possible selects a rate value equal to a quantized
rate value. At quantized rate values, energy dissipation
of quantized DSVS systems approaches continuous voltage
level DSVS systems. Our experimental work on FMIDCT
computation using nine video sequences and a 4-level quantized
undithered system shows that additional energy savings of 1.4%
to 18.5% can be achieved from QRS, compared to the existing
averaging technique.
General Terms:
Dynamic Supply Voltage Scaling, Averaging, Quantization
and Dithering
In this work we propose an architecture for the acquisition and digitization of cardiac signals in a pacemaker, based on Sigma-Delta modulation. Due to the characteristics of such an application, the proposed system presents the typical design challenges of low-voltage, low-power circuits. The work demonstrates that, thanks to the narrow bandwidth typical of biological signals (50-150 Hz), oversampling conversion techniques can be advantageous in terms of power dissipation at a given dynamic range. The converter is designed in a 0.8 um CMOS technology using the switched Op-Amp technique. The Sigma-Delta converter is a third order modulator with an oversampling frequency of about 8 kHz, and the circuit can operate at a minimum supply voltage of 2 V while dissipating 2 uW at most. According to simulation results the dynamic range is larger than 50 dB.
This paper presents the design of a 3rd-order lowpass Sigma-Delta analog-to-digital (A/D) converter using a continuous-time (CT) loopfilter. The loopfilter has been implemented by using active RC-integrators. The influence of the low supply voltage on the building blocks such as the amplifier and the common mode feedback as well as on the overall Sigma-Delta modulator is discussed. Simulation results of the 1.5V CT Sigma-Delta A/D converter show a 75 dB dynamic range in a bandwidth of 25kHz. The expected power consumption is less than 300uW.
In this paper, the design and performance of a CMOS base-band
circuit for WCDMA direct conversion receiver are presented.
Consisting of one 5th-order anti-aliasing filter, one
4th-order tunable channel filter, and three variable gain amplifier
(VGA) stages, the baseband chain provides 72dB gain
range with 2dB gain step and is tunable to select three different
bandwidths (from 5MHz to 20MHz radio-frequency
spacing). It dissipates only 18mW from a single 3V supply.
The input IP3 is 10dBm, and the input-referred noise in the
passband is 41nV/√Hz.
Keywords:
Wideband CDMA, Baseband, Filter, VGA
A low-voltage analog multiplier operating at 1.2V is
presented. The multiplier core consists of four MOS transistors
operating in the saturation region. The circuit exploits the
quadratic relation between current and voltage of the MOS
transistor in saturation. The circuit was designed using standard
0.6um CMOS technology. Simulation results indicate an IP3 of
4.9dBm and a spur free dynamic range of 45dB.
Keywords:
Low-voltage, RF, CMOS, analog multiplier.
Gate capacitance has a complex dependency on terminal
voltages, but the impact of this voltage dependency on
power and delay has not been fully investigated, especially in low-voltage,
low-power designs. By introducing an effective gate capacitance,
CG,eff, it is shown that the power and delay of CMOS digital circuits can be
estimated accurately. CG,eff is a strong function of VTH/VDD, and VTH/VDD
tends to increase in the low-voltage region. Hence, the effective capacitance
relative to the oxide capacitance, COX, decreases in low-voltage, low-power
designs. Therefore, considering CG,eff for accurate power and delay
estimation will become more important in the future.
Keywords:
Gate capacitance, low supply voltage, low-power design.
Recent studies [MGK 98, Tiw 98] have confirmed that a
significant amount of energy is dissipated in the process of instruction
dispatching and issue in modern superscalar microprocessors. We
propose a model for the energy dissipated by instruction dispatching
and issuing logic in modern superscalar microprocessors and validate
it through register level simulations and SPICE-measured dissipation
coefficients from 0.5 micron CMOS layouts of relevant circuits.
Alternative organizations are studied for instruction window buffers
that result in energy savings of about 47% over traditional designs.
Keywords: power minimization, superscalar processor, instruction
dispatching, instruction issue, window buffer
Novel techniques for the power efficient synthesis of sum-of-product computations are presented. Simple and efficient heuristics for scheduling and assignment are described. Different partly static cost functions are proposed to drive the synthesis tasks. The proposed cost functions target the power consumption either in the buses connecting the functional units with the storage elements or inside the functional units. The partly static nature of the proposed cost functions reduces the time of the synthesis procedure. Experimental results from different relevant digital signal processing algorithmic kernels prove that the proposed synthesis techniques lead to significant power savings.
Adaptive encoding has been shown to be an effective approach to bus power minimization in situations where characterization of the input statistics is not available. In this paper, we propose a novel technique for adaptive bus encoding that, unlike existing solutions, exploits spatial correlations in the input data being transmitted to increase the accuracy of the dynamic selection of the encoding function. We discuss the encoding algorithm and describe an architecture for its implementation as a bus interface. We present experimental data collected in a realistic simulation framework on a number of meaningful benchmarks, and we compare them to those obtained through the application of existing encoding schemes.
Advances in technology have allowed portable electronic devices to become smaller and more complex, placing stringent power and performance requirements on the device's components. The M.CORE M3 architecture was developed specifically for these embedded applications. To address the growing need for longer battery life and higher performance, an 8-Kbyte, 4-way set-associative, unified (instruction and data) cache with programmable features was added to the M3 core. These features allow the architecture to be optimized based on the application's requirements. In this paper, we focus on the features of the M340 cache sub-system and illustrate the effect on power and performance through benchmark analysis and actual silicon measurements.
The memory system usually consumes a significant amount of energy in many battery-operated devices. In this paper, we provide a quantitative comparison and evaluation of the interaction of two hardware cache optimization mechanisms (block buffering and sub-banking) and three widely used compiler optimization techniques (linear loop transformation, loop tiling, and loop unrolling). Our results show that the pure hardware optimizations (eight block buffers and four sub-banks in a 4K, 2-way cache) provided up to 4% energy saving, with an average saving of 2% across all benchmarks. In contrast, the pure software optimization approach that uses all three compiler optimizations provided at least 23% energy saving, with an average of 62%. However, a closer observation reveals that hardware optimization becomes more critical for on-chip cache energy reduction when executing optimized codes.
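As an illustration of one of the compiler optimizations named above, the following is a minimal sketch of loop tiling applied to a matrix transpose. The function names and the tile size are assumptions chosen for this example and are not taken from the paper. The tiled version visits the same elements as the naive one, but in small blocks, so that each block of rows and columns is reused while it is still likely to be resident in the cache.

```python
def transpose_naive(a, n):
    """Reference n x n transpose with the usual row-by-row loop nest."""
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[j][i] = a[i][j]
    return out


def transpose_tiled(a, n, tile=4):
    """Same computation, iterated tile by tile (tile size is an assumed parameter)."""
    out = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):          # tile origin, row direction
        for jj in range(0, n, tile):      # tile origin, column direction
            # min() handles the partial tiles when n is not a multiple of tile
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j][i] = a[i][j]
    return out
```

Only the iteration order changes, so the result is identical to the untiled loop; any energy benefit comes from the improved locality of the memory accesses.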
This paper presents a procedure to generate energy-efficient
code for the Motorola DSP56K processor based on increasing the
packing efficiency and minimizing the number of address instructions.
The key features are a novel scheduling algorithm that reduces
the dependencies between instructions, a register allocation
algorithm that spills variables based on their packability, and
an address code generation algorithm that minimizes the number of
additional instructions. The size of the code generated by this
procedure is on the average 45% (25%) smaller than that generated by
Motorola's g56K (SPAM).
Categories and Subject Descriptors:
2.3 [Software and System Design]: Compilers, DSP and
embedded systems
General Terms: Code Generation, Low Power
This paper presents Pyramid code, an optimal code for transmitting sequential addresses over a DRAM bus. Constructed by finding an Eulerian cycle on a complete graph, this code is optimal for conventional DRAM in the sense that it minimizes the switching activity on the time-multiplexed address bus from CPU to DRAM. Experimental results on a large number of testbenches with different characteristics (i.e. sequential vs. random memory access behaviors) are reported and demonstrate a reduction of bus activity by as much as 50%.
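The quantity being minimized, switching activity on a time-multiplexed address bus, can be sketched as follows. This does not implement Pyramid code; it only counts bit transitions for plain binary addresses, under the assumption that each address is sent as a row word followed by a column word on the same bus. The function names and the `col_bits` split are hypothetical.

```python
def transitions(words):
    """Total number of bit flips between consecutive words on the bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def multiplexed_bus_words(addresses, col_bits):
    """Split each address into (row, column) and send both over one bus.

    col_bits is an assumed parameter: the number of low-order bits that
    form the column address; the remaining high-order bits form the row.
    """
    mask = (1 << col_bits) - 1
    words = []
    for addr in addresses:
        words.append(addr >> col_bits)   # row word
        words.append(addr & mask)        # column word
    return words


# Sequential 8-bit addresses, 4 row bits and 4 column bits (assumed widths).
sequential = list(range(256))
bus_activity = transitions(multiplexed_bus_words(sequential, col_bits=4))
```

Because row and column words alternate on the bus, transitions occur between them even when consecutive addresses differ in a single bit, which is why codes designed for non-multiplexed buses are not automatically optimal here.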
This paper proposes a novel technique for power-performance trade-off based on profile-driven code execution. Specifically, we show that there is an optimal level of parallelism for energy consumption and propose a compiler-assisted technique for code annotation that can be used at run-time to adaptively trade-off power and performance. As shown by experimental results, our approach is up to 23% better than clock throttling and is as efficient as voltage scaling (up to 10% better in some cases). The technique proposed in this paper can be used by an ACPI-compliant power manager for prolonging battery life or as a passive cooling feature for thermal management.
This paper proposes an efficient asynchronous hardwired matrix-vector
multiplier for the two-dimensional discrete cosine
transform and inverse discrete cosine transform (DCT/IDCT). The
design achieves low power and high performance by taking
advantage of the typically large fraction of zero and small-valued
data in DCT and IDCT applications. In particular, it skips
multiplication by zero and dynamically activates/deactivates
required bit-slices of fine-grain bit-partitioned adders using
simplified, static-logic-based speculative completion sensing. The
results extracted by both bit-level analysis and HSPICE
simulations indicate significant improvements compared to
traditional designs.
Keywords:
Asynchronous matrix-vector multiplier, discrete cosine transform
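The zero-skipping idea can be mimicked in software. The sketch below is only an analogy for the data-dependent behavior described above: it does not model the asynchronous circuit, the bit-partitioned adders, or the completion sensing. The function name and the returned counters are assumptions made for illustration.

```python
def matvec_skip_zeros(matrix, vector):
    """Matrix-vector product that skips all work for zero-valued inputs.

    Returns (result, performed, skipped), where the last two values count
    the multiplications that were executed and avoided.
    """
    rows = len(matrix)
    result = [0] * rows
    performed = 0
    skipped = 0
    # Iterate column by column so that one zero input skips a whole column.
    for j, x in enumerate(vector):
        if x == 0:
            skipped += rows
            continue
        for i in range(rows):
            result[i] += matrix[i][j] * x
            performed += 1
    return result, performed, skipped
```

With typical DCT/IDCT data, where many inputs are zero, the `skipped` counter grows at the expense of `performed`, which is the effect the hardware exploits to save power.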
In this paper, we describe area and power reduction techniques for a low-latency adaptive finite-impulse response filter for magnetic recording read channel applications. Various techniques are used to reduce area and power dissipation while speed remains as the main performance criterion for the target application. A parallel transposed direct form architecture operates on real-time input data samples and employs a fast, low-area multiplier based on selection of radix-8 pre-multiplied coefficients in conjunction with one-hot encoded bus leading to a very compact layout and reduced power dissipation. Area, speed and power comparisons with other low-power implementation options are also shown. The proposed filter has been fabricated using a 0.18 um L-effective CMOS technology and operates at 550 MSamples/s.
An 80,000 transistor, low swing, 32 x 32-bit multiplier was fabricated in a standard 0.35 um, Vth = 0.5 V CMOS process and in a 0.35 um, back-bias tunable, near-zero Vth process. While standard CMOS at Vdd = 3.3 V runs at 136 MHz, the same performance can be achieved in the low-Vth version at Vdd = 1.3 V, resulting in more than 5 times lower power. Similar power reductions are obtained for frequencies down to 10 MHz. In addition, the low-Vth version is able to run at 188 MHz, which is 38% faster than standard CMOS.
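As a rough consistency check on these figures, one can assume that dynamic power dominates and scales as C Vdd^2 f. This full-swing, first-order model is an assumption made here: it ignores the low-swing signaling and the leakage of the near-zero Vth process, so it is not the paper's power analysis.

```latex
% First-order assumption: P_dyn is proportional to C V_dd^2 f, compared at equal frequency.
\[
\left.\frac{P(3.3\,\mathrm{V})}{P(1.3\,\mathrm{V})}\right|_{f = 136\,\mathrm{MHz}}
\approx \left(\frac{3.3}{1.3}\right)^{2} \approx 6.4,
\qquad
\frac{188\,\mathrm{MHz}}{136\,\mathrm{MHz}} \approx 1.38 .
\]
```

Both values are in line with the reported "more than 5 times lower power" and "38% faster".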
A broad range of high-volume consumer applications require low-power,
battery operated, wireless microsystems and sensors.
These systems should reconcile sufficient battery lifetime with
reduced dimensions, low cost and versatility. The design of such
systems highlights many tradeoffs between performance,
lifetime, cost and power consumption. Also, special circuit and
design techniques are needed to comply with the reduced supply
voltage (down to 1V).
These considerations are illustrated by design examples taken
from a transceiver chip realized in a standard 0.5um digital
CMOS process. The chip is dedicated to a distributed sensors
network and is based on a direct-conversion architecture. The
circuit prototype operates in the 434 MHz ISM band and
consumes only 1mW in receive mode. It achieves a -95dBm
sensitivity for a data rate of 24kbit/s. The transmitter section is
designed for 0dBm output power under the minimum 1V supply,
with a global efficiency higher than 15%.
Keywords:
RF, Transceiver, Low-Power, Low-Voltage, CMOS.
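The power levels quoted above can be converted from dBm to milliwatts with the standard definition. In the sketch below, the helper names are hypothetical, and reading "global efficiency" as RF output power divided by DC supply power is an assumption, not something stated in the abstract.

```python
import math


def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw):
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(mw)


# Receiver sensitivity of -95 dBm expressed in milliwatts.
sensitivity_mw = dbm_to_mw(-95)

# Transmitter: 0 dBm output power is 1 mW by definition.
tx_out_mw = dbm_to_mw(0)

# Assumed reading of "global efficiency": P_out / P_dc > 15 %,
# which bounds the DC power drawn by the transmitter from above.
max_tx_dc_mw = tx_out_mw / 0.15
```

Under that reading, an efficiency above 15% at 0 dBm output corresponds to a transmitter DC consumption below roughly 6.7 mW.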
A fully differential 0.35um CMOS LNA plus mixer, tailored to a double conversion architecture, has been realized for GPS applications. The LNA makes use of an inductively degenerated input stage and a resonant LC load, featuring 12% frequency tuning accomplished by a MOS varactor. The mixer is a Gilbert-cell-like topology in which an NMOS and a PMOS differential pair, shunted together, realize the input stage. This topology saves power for a given mixer gain and linearity. The measured front-end performance is 40dB gain, 3.8dB NF, -25.5dBm IIP3, 1.3 GHz input frequency, and 140MHz output frequency, with 8mA drawn from a 2.8V supply.
A bias boosting technique for a 3.2V, 1.9GHz Class AB RF amplifier designed in a 30GHz BiCMOS process is presented in this paper. In a Class AB amplifier, the average current drawn from the supply depends on the input signal level. As the output power increases, so do the average currents in both the emitter and the base of the power transistor. The increased average current causes an increased voltage drop in the biasing circuitry and the ballast resistor. This reduces the conduction angle in the amplifier, pushing it deep into Class B and even Class C operation, and reduces the maximum output power by 25%. To avoid this power reduction, the amplifier would need a larger bias, which inevitably increases the power dissipation at low output power levels. The proposed bias boosting circuitry dynamically increases the bias of the power transistor as the output power increases. As a result, the amplifier has lower power dissipation at low power levels together with an increased maximum output power.
This paper presents a framework for CMOS ring oscillator phase noise analysis for given power consumption specifications. The model considers both linear and nonlinear operation. It indicates that fast rail-to-rail switching has to be achieved for low phase noise and that the up-conversion of low-frequency noise from the current bias/control circuit can be significant. Our phase noise model is validated via simulation and measurement results. We also present a coupled-ring oscillator whose phase noise is -114dBc/Hz at a 600kHz offset from the 960MHz carrier frequency.
Scaling of feature sizes in semiconductor technology has been responsible for the increasingly higher computational capacity of silicon. This has been the driver for the revolution in communications and computing. However, questions regarding the limits of scaling (and hence Moore's Law) have arisen in recent years due to the emergence of deep submicron (DSM) noise. The tutorial describes noise in deep submicron CMOS and its impact on digital as well as analog circuits. In particular, noise-tolerance is proposed as an effective means for achieving energy and performance efficiency in the presence of DSM noise.
Wireless communications and more specifically, the fast growing
penetration of cellular phones and cellular infrastructure are the
major drivers for the development of new programmable Digital
Signal Processors (DSPs). In this tutorial, an overview will be
given of recent developments in DSP processor architectures that
make them well suited to execute computationally intensive
algorithms typically found in communications systems. DSP
processors have adapted instruction sets, memory architectures
and data paths to execute compute intensive communications
algorithms efficiently and in a low power fashion. Basic building
blocks include convolutional decoders (mainly the Viterbi
algorithm), turbo coding algorithms, FIR filters, speech coders,
etc. This is illustrated with examples of different commercial and
research processors. Please note that the authors do not endorse
the processors used in this tutorial. These processors are used to
illustrate how different solutions are proposed for the same
problem.
Keywords: Digital Signal Processing, architectures,
programmable processors, wireless communications.