DAC 2002 ABSTRACTS
Session 1
Chair: Aart de Geus
Organizers: Sharon Turnoy, Deirdre Hanford
Panel Members: Moshe Gavrielov, Richard Goering, Lucio Lanza, Vishal Saluja,
Jay Vleeschhouwer [p. 1]
-
-
The EDA sector is capturing unprecedented attention on Wall
Street. With seven IPOs in 2001 alone and strong performance by
the EDA "blue chips," the industry has gained new prominence
with the capital markets. In this panel, Aart de Geus will
moderate a discussion between representatives of the various
constituencies who play a role in shaping Wall Street's opinion of
EDA: financial analysts, portfolio managers, venture capitalists,
CEOs, and the press.
Questions discussed will include: How do investors and analysts
currently view EDA? What contributes to that perception? What
factors drive EDA's current favor with Wall Street, and why is the
sector "hot" compared to two to three years ago? How do we
sustain that favor? Given the highly complex nature of our
industry, how do investors decipher the strength of a particular
EDA firm? Is EDA tied to the semiconductor industry's
performance? What role does the press play in shaping the view?
Session 2
Chair: Gang Qu
Organizers: Ahmed A. Jerraya, Krzysztof Kuchcinski
-
2.1 IP Delivery for FPGAs Using Applets and JHDL [p. 2]
-
Michael J. Wirthlin, Brian McMurtrey
This paper introduces an FPGA IP evaluation and delivery
system that operates within Java applets. The use of such
applets allows designers to create, evaluate, test, and obtain
FPGA circuits directly within a web browser. Based on
the JHDL design tool, these applets allow structural viewing,
circuit simulation, and netlist generation of application-specific
circuits. Applets can be customized to provide varying
levels of IP visibility and functionality as needed by both
customer and vendor.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Simulation, Hardware
description languages
General Terms
Design
Keywords
Intellectual Property, JHDL, Applet, FPGA
-
2.2 Watermarking Integer Linear Programming Solutions [p. 8]
-
Seapahn Megerian, Milenko Drinic, Miodrag Potkonjak
Linear programming (LP) in its many forms has proven to be an
indispensable tool for expressing and solving optimization problems
in numerous domains. We propose the first set of generic
watermarking techniques for integer-LP (ILP). The proof of authorship
by watermarking is achieved by introducing additional constraints to limit
the solution space, and it can be used as an effective means of
intellectual property protection (IPP) and authentication.
We classify and analyze the types of constraints in the ILP watermarking
domain and show how ILP formulations provide more
degrees of freedom for embedding signatures than other existing
approaches. To demonstrate the effectiveness of the proposed ILP
watermarking techniques, the generic discussion is further concretized
using two examples, namely Satisfiability and Scheduling.
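As a rough illustration of the idea (not the authors' formulation), the following minimal Python sketch embeds a signature into a toy 0-1 ILP by hashing it into forced-value constraints; the knapsack instance, the hash-to-constraint mapping, and all names are hypothetical, and a brute-force search stands in for a real ILP solver.

    import hashlib
    from itertools import product

    # Toy 0-1 knapsack ILP: maximize profit subject to a capacity bound.
    N = 6
    profit = [4, 2, 7, 1, 5, 3]
    weight = [3, 1, 4, 2, 3, 2]
    CAP = 8

    def watermark_constraints(signature, n, k=2):
        """Hash an authorship signature into k forced-value constraints."""
        bits = bin(int(hashlib.sha256(signature.encode()).hexdigest(), 16))[2:]
        return [(i % n, int(bits[i])) for i in range(k)]

    def solve(extra):
        """Brute-force stand-in for an ILP solver, honoring extra constraints."""
        best = (-1, None)
        for x in product((0, 1), repeat=N):
            if sum(w * xi for w, xi in zip(weight, x)) > CAP:
                continue
            if any(x[i] != v for i, v in extra):   # watermark constraints
                continue
            best = max(best, (sum(p * xi for p, xi in zip(profit, x)), x))
        return best

    print("unmarked:", solve([]))
    print("marked:  ", solve(watermark_constraints("Alice-2002", N)))

The marked optimum can only be equal to or worse than the unmarked one; the paper's contribution lies in choosing constraints that embed a convincing signature while keeping that loss small.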
Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]:
Security and Protection -- Digital Watermarking
General Terms
Algorithms, Economics, Theory, Legal Aspects.
Keywords
Digital Watermarking, Intellectual Property Protection
-
2.3 Model Design Using Hierarchical Web-Based Libraries [p. 14]
-
Fabrice Bernardi, Jean-François Santucci
Design tools can be profitably associated with libraries of
reusable modeling components that make both the description
and the validation of models much easier. Furthermore,
current and future applications will increasingly be based
on three fundamental technologies: object orientation,
client/server, and the Internet. In this article we propose
an object-oriented architecture for defining Web-based
hierarchical model libraries. The originality of our approach
lies in the fact that it is based on: (i) a notion of genericity
of use, (ii) inheritance and abstraction links between the
stored models, and (iii) Web-based procedures for storing and
consulting libraries.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-Aided Design;
D.2.11 [Software]: Software Engineering - Software Architectures;
D.2.13 [Software]: Software Engineering - Reusable Software
General Terms
Design, Management
Keywords
model reuse, model libraries, Web-based access, abstraction hierarchy
-
2.4 Behavioral Synthesis via Engineering Change [p. 18]
-
Milenko Drinic, Darko Kirovski
Engineering change (EC) is a technique that enables a designer
to rapidly perform minor specification alterations
while minimally resynthesizing only small portions of the
specification across several levels of design abstraction.
In this paper, we introduce the first EC-based synthesis technique
for coordinated design optimization in multiple steps.
The technique has four phases: optimization region identification,
feedback formulation, resynthesis in the first step, and
finally resynthesis in the second design step. To demonstrate
the technique, we focus on behavioral synthesis and its
transformation, scheduling, and register assignment steps.
We develop a generic EC-based approach for design optimization
across multiple consecutive synthesis steps, and we
show how one can use EC to enhance the coordinated application
of transformations and scheduling, and of scheduling
and register assignment.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design
Aids - Optimization
General Terms
Design
Keywords
Engineering change, transformations, scheduling, register assignment
Chair: Gang Qu
Organizers: Grant E. Martin, Majid Sarrafzadeh
-
3.1 A Universal Technique for Fast and Flexible Instruction-Set Architecture Simulation [p. 22]
-
Achim Nohl, Gunnar Braun, Oliver Schliebusch, Rainer Leupers, Heinrich Meyr, Andreas Hoffmann
In the last decade, instruction-set simulators have become
an essential development tool for the design of new programmable
architectures. Consequently, the simulator performance
is a key factor for the overall design efficiency.
Motivated by the extremely poor performance of commonly used
interpretive simulators, research on fast compiled instruction-set
simulation started ten years ago. However,
due to the restrictiveness of the compiled technique,
it has not gained acceptance in commercial products.
This paper presents a new retargetable simulation
technique which combines the performance of traditional
compiled simulators with the flexibility of interpretive simulation.
This technique is not limited to any class of architectures
or applications and can be utilized from architecture
exploration up to end-user software development.
The workflow and applicability of the so-called just-in-time
cache-compiled simulation (JIT-CCS) technique are
demonstrated on state-of-the-art real-world architectures.
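For intuition only, here is a minimal Python sketch of the cache-compiled idea on a hypothetical three-instruction ISA: each instruction is decoded ("compiled") into a closure the first time it is executed and is re-decoded only if the memory word changes, which is what lets the technique keep interpretive flexibility (e.g., for self-modifying code) at near compiled-simulation speed. This is not the LISA-based JIT-CCS implementation itself.

    def compile_insn(word):
        """Decode one encoded instruction into a Python closure, once."""
        op, a, b = word >> 16, (word >> 8) & 0xFF, word & 0xFF
        if op == 0: return lambda r: r.__setitem__(a, b)            # LDI ra, imm
        if op == 1: return lambda r: r.__setitem__(a, r[a] + r[b])  # ADD ra, rb
        if op == 2: return lambda r: r.__setitem__(a, r[a] - r[b])  # SUB ra, rb
        raise ValueError("bad opcode")

    def run(program, steps):
        regs, cache, pc = [0] * 256, {}, 0
        for _ in range(steps):
            word = program[pc]
            hit = cache.get(pc)
            if hit is None or hit[0] != word:      # miss, or code was modified
                hit = (word, compile_insn(word))   # decode once, cache closure
                cache[pc] = hit
            hit[1](regs)                           # execute from the cache
            pc = (pc + 1) % len(program)
        return regs[:4]

    prog = [(0 << 16) | (1 << 8) | 5,   # LDI r1, 5
            (0 << 16) | (2 << 8) | 7,   # LDI r2, 7
            (1 << 16) | (1 << 8) | 2]   # ADD r1, r2
    print(run(prog, 3))                 # -> [0, 12, 7, 0]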
Categories and Subject Descriptors
I.6.3 [Simulation and Modeling]: Simulation Support
Systems; I.6.3 [Simulation and Modeling]: Model Validation
and Analysis; D.3.2 [Programming Languages]:
Design Languages - LISA; C.0 [General]: Modeling of Computer
Architecture
General Terms
Design, Languages, Performance
Keywords
Retargetable simulation, compiled simulation, instruction
set architectures
-
3.2 A Fast On-Chip Profiler Memory [p. 28]
-
Roman Lysecky, Susan Cotterell, Frank Vahid
Profiling an application executing on a microprocessor is part of
the solution to numerous software and hardware optimization and
design automation problems. Most current profiling techniques
suffer from runtime overhead, inaccuracy, or slowness, and the
traditional non-intrusive method of using a logic analyzer does not
work for today's systems-on-a-chip with embedded cores. We
introduce a novel on-chip memory architecture that overcomes
these limitations. The architecture, which we call ProMem, is
based on a pipelined binary tree structure. It achieves single-cycle
throughput, so it can keep up with today's fastest pipelined
processors. It can also be laid out efficiently and scales very well,
becoming more efficient the larger it gets. The memory can be
used in a wide variety of common profiling situations, such as
instruction profiling, value profiling, and network traffic profiling,
which in turn can be used to guide numerous design automation
tasks.
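As a purely functional illustration (the paper's contribution is the pipelined hardware, which is not modeled here), the sketch below shows the profiling behavior ProMem implements: monitored addresses sit at the leaves of a binary tree, and each observed event performs a logarithmic descent to bump a counter; in hardware the tree levels are pipelined so one event completes per cycle. All names and the trace are hypothetical.

    import bisect

    class ProfilerModel:
        def __init__(self, watch_addrs):
            self.addrs = sorted(watch_addrs)   # tree leaves, kept sorted
            self.count = [0] * len(self.addrs)

        def observe(self, addr):
            """One event: an O(log n) descent, pipelined in the real hardware."""
            i = bisect.bisect_left(self.addrs, addr)
            if i < len(self.addrs) and self.addrs[i] == addr:
                self.count[i] += 1

    p = ProfilerModel([0x100, 0x104, 0x200])
    for a in [0x100, 0x104, 0x100, 0x300]:     # instruction-address trace
        p.observe(a)
    print(dict(zip(map(hex, p.addrs), p.count)))
    # -> {'0x100': 2, '0x104': 1, '0x200': 0}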
Keywords
Profiling, system-on-a-chip, platform tuning, adaptive
architectures, low power, embedded CAD, binary tree, memory
design, embedded systems.
-
3.3 Design of an One-cycle Decompression Hardware
for Performance Increase in Embedded Systems [p. 34]
-
Haris Lekatsas, Jörg Henkel, Venkata Jakkula
Code compression is known as an effective technique to reduce
instruction memory size on an embedded system. However, code
compression can also be very effective in increasing processor-to-memory
bandwidth and hence provide increased system performance.
In this paper we describe our design and design methodology
of the first running prototype of a one-cycle code decompression
unit that decompresses compressed instructions on-the-fly.
We describe in detail the architecture that enables decompression
of multiple instructions in one cycle and we present the design
methodologies and tools used. The stand-alone decompression unit
does not require any modifications to the processor core. We observed
performance increases of up to 63%, with an average of 25%, over
a wide variety of applications running on the hardware prototype
under various system configurations.
Categories and Subject Descriptors
B.3 [Hardware]: Memory Structures; C.3 [Computer Systems
Organization]: Special-purpose and Application-based Systems-Real-time
and embedded systems
General Terms
Algorithms, Design, Performance
Session 4
Chair: Jacob White
Organizers: Jaijeet Roychowdhury, Mustafa Celik
-
4.1 A Factorization-Based Framework for Passivity-Preserving Model Reduction of RLC Systems [p. 40]
-
Q. Su, V. Balakrishnan, C.-K. Koh
We present a framework for passivity-preserving model reduction
for RLC systems that includes, as a special case, the well-known
PRIMA model reduction algorithm. This framework provides a
new interpretation for PRIMA, and offers a qualitative explanation
as to why PRIMA performs remarkably well in practice. In addition,
the framework enables the derivation of new error bounds for
PRIMA-like methods. We also show how the framework offers a
systematic approach to computing reduced-order models that better
approximate the original system than PRIMA, while still preserving
passivity.
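For readers unfamiliar with PRIMA-style projection, the following numpy sketch shows the basic congruence-transform mechanics on a small dense system (G + sC)x = Bu. It is a generic block-Arnoldi reduction under assumed definiteness of G and C, not the error-bound framework of this paper, and real tools would use sparse factorizations.

    import numpy as np

    def prima_like_reduce(G, C, B, q):
        """Project onto a q-block Krylov subspace of (G^-1 C, G^-1 B)."""
        A = np.linalg.solve(G, C)
        V = [np.linalg.qr(np.linalg.solve(G, B))[0]]
        for _ in range(q - 1):
            W = A @ V[-1]
            for Vi in V:                      # block Gram-Schmidt
                W -= Vi @ (Vi.T @ W)
            V.append(np.linalg.qr(W)[0])
        V = np.hstack(V)
        # Congruence transform: preserves passivity for suitably definite G, C.
        return V.T @ G @ V, V.T @ C @ V, V.T @ B

    rng = np.random.default_rng(0)
    n, p = 50, 2
    M = rng.standard_normal((n, n))
    G = M @ M.T + n * np.eye(n)               # symmetric positive definite
    C, B = np.eye(n), rng.standard_normal((n, p))
    Gr, Cr, Br = prima_like_reduce(G, C, B, q=3)
    print(Gr.shape)                           # (6, 6): q*p reduced states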
Categories and Subject Descriptors
G.1.3 [NUMERICAL ANALYSIS]: Numerical Linear Algebra -
Linear systems; F.2.1 [ANALYSIS OF ALGORITHMS AND
PROBLEM COMPLEXITY]: Numerical Algorithms and Problems -
Computations on matrices
General Terms
Algorithms
Keywords
Model Reduction, Large Scale Systems, RLC interconnect, Passivity
Preserving, Factorization
-
4.2 Model Order Reduction for Strictly Passive and Causal Distributed Systems [p. 46]
-
Luca Daniel, Joel Phillips
This paper presents a class of algorithms suitable for model reduction
of distributed systems. Distributed systems are not suitable
for treatment by standard model-reduction algorithms such as
PRIMA, PVL, and the Arnoldi schemes because they generate matrices
that are dependent on frequency (or other parameters) and
cannot be put in a lumped or state-space form. Our algorithms
build on well-known projection-based reduction techniques, and so
require only matrix-vector product operations and are thus suitable
for operation in conjunction with electromagnetic analysis codes
that use iterative solution methods and fast-multipole acceleration
techniques. Under the condition that the starting systems satisfy
system-theoretic properties required of physical systems, the reduced
systems can be guaranteed to be passive. For distributed
systems, we argue that causality of the underlying representation is
as important a consideration as passivity has become.
Categories and Subject Descriptors: B.7.2 Simulation, B.8.2 Performance
Analysis and Design Aids, G.1.1 Interpolation, G.1.2 Approximation,
I.6 Simulation and Modeling.
General Terms: Algorithms, Performance, Design
Keywords: Passive reduced order modeling, Distributed systems.
-
4.3 Guaranteed Passive Balancing Transformations for Model Order Reduction [p. 52]
-
Joel Phillips, Luca Daniel, L. Miguel Silveira
The major concerns in state-of-the-art model reduction algorithms
are: achieving accurate models of sufficiently small size, numerically
stable and efficient generation of the models, and preservation
of system properties such as passivity. Algorithms such
as PRIMA generate guaranteed-passive models, for systems with
special internal structure, using numerically stable and efficient
Krylov-subspace iterations. Truncated Balanced Realization (TBR)
algorithms, as used to date in the design automation community,
can achieve smaller models with better error control, but do not
necessarily preserve passivity. In this paper we show how to construct
TBR-like methods that guarantee passive reduced models
and in addition are applicable to state-space systems with arbitrary
internal structure.
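As background, a minimal square-root TBR sketch for a stable state-space system (A, B, C) is given below; it computes the gramians, balances them, and truncates. It is the plain (non-passivity-guaranteed) algorithm the paper starts from, not the guaranteed-passive variant it contributes.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

    def tbr(A, B, C, k):
        """Square-root balanced truncation to order k."""
        P = solve_continuous_lyapunov(A, -B @ B.T)    # controllability gramian
        Q = solve_continuous_lyapunov(A.T, -C.T @ C)  # observability gramian
        Lp, Lq = cholesky(P, lower=True), cholesky(Q, lower=True)
        U, s, Vt = svd(Lq.T @ Lp)                     # Hankel singular values
        S = np.diag(s[:k] ** -0.5)
        T, Ti = Lp @ Vt[:k].T @ S, S @ U[:, :k].T @ Lq.T
        return Ti @ A @ T, Ti @ B, C @ T, s

    rng = np.random.default_rng(1)
    M = rng.standard_normal((6, 6))
    A = -(M @ M.T) - 6 * np.eye(6)                    # stable by construction
    B, C = rng.standard_normal((6, 1)), rng.standard_normal((1, 6))
    Ar, Br, Cr, hsv = tbr(A, B, C, k=2)
    print(hsv)    # decay of the singular values guides the order choice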
Categories and Subject Descriptors: B.7.2 Simulation, B.8.2 Performance
Analysis and Design Aids, I.6 Simulation and Modeling.
General Terms: Algorithms, Performance, Design
Keywords: Passive reduced order modeling, Truncated balanced
realization.
Session 5
Chair: Steven Teig
Organizers: Ralph Otten, Timothy Kam
-
5.1 Uncertainty-Aware Circuit Optimization [p. 58]
-
Xiaoliang Bai, Chandu Visweswariah, Philip N. Strenski, David J. Hathaway
Almost by definition, well-tuned digital circuits have a large number of
equally critical paths, which form a so-called "wall" in the slack histogram.
However, by the time the design has been through manufacturing, many
uncertainties cause these carefully aligned delays to spread out.
Inaccuracies in parasitic predictions, clock slew, mode-to-hardware
correlation, static timing assumptions and manufacturing variations all
cause the performance to vary from prediction. Simple statistical principles
tell us that the variation of the limiting slack is larger when the height of
the wall is greater. Although the wall would be the optimal solution if the
static timing predictions were perfect, in the presence of uncertainty in timing
and manufacturing it may no longer be the best choice. The application of
formal mathematical optimization in transistor sizing increases the height
of the wall, thus exacerbating the problem. As a practical matter, schematic
restructuring downstream in the design methodology is also easier
to conceive when there are fewer equally critical paths. This paper describes
a method that gives formal mathematical optimizers the incentive to avoid
the wall of equally critical paths, while giving up as little as possible
in nominal performance. Surprisingly, such a formulation reduces
the degeneracy of the optimization problem and can render the optimizer more
effective. This "uncertainty-aware" mode has been implemented and applied to
several high-performance microprocessor macros. Numerical results are
included.
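The paper's actual objective function is not reproduced in this abstract; as a hedged illustration of the incentive it describes, the log-sum-exp surrogate below scores a "wall" of equally critical paths worse than a single critical path with the same worst slack. The temperature parameter and the numbers are hypothetical.

    import numpy as np

    def softmin_slack(slacks, temp=10.0):
        """Soft minimum of path slacks; drops as more paths crowd the wall."""
        s = np.asarray(slacks, dtype=float)
        return -temp * np.log(np.sum(np.exp(-s / temp)))

    print(softmin_slack([0.0, 50, 60, 70]))   # one critical path -> about -0.1
    print(softmin_slack([0.0, 0, 0, 0]))      # wall of four paths -> about -13.9

An optimizer maximizing such a surrogate trades a little nominal slack for fewer equally critical paths, which is the behavior the paper formalizes.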
-
5.2 Congestion-Driven Codesign of Power and Signal Networks [p. 64]
-
Haihua Su, Jiang Hu, Sachin S. Sapatnekar, Sani R. Nassif
We present a global wire design methodology that simultaneously
considers the performance needs for both signal
lines and power grids under congestion considerations. An
iterative procedure is employed in which the global routing
is performed according to a congestion map that includes
the resource utilization of the power grid, followed by a step
in which the power grid is adjusted to relax the congestion
in crowded regions. This adjustment is in the form of wire
removal in noncritical regions, followed by a wire sizing step
that overcomes the effects of wire removal. Experimental
results show that the overall routability can be significantly
improved while the power grid noise is maintained within
the voltage droop constraint.
Categories and Subject Descriptors
B.8.2 [Performance and Reliability]: Performance Analysis
and Design Aids
General Terms
Algorithms
Keywords
wire congestion, codesign, signal routing, power grid noise
-
5.3 On Metrics for Comparing Routability Estimation Methods for FPGAs [p. 70]
-
Parivallal Kannan, Shankar Balachandran, Dinesh Bhatia
Interconnect management is a critical design issue for large FPGA
based designs. One of the most important issues for planning interconnection
is the ability to accurately and efficiently predict the
routability of a given design on a given FPGA architecture. The recently
proposed routability estimation procedure, fGREP [6], produced
estimates within 3 to 4% of an actual detailed router. Other
known routability estimation methods include RISA [5], Lou's
method [7], and Rent's rule based methods [1][12][9]. Comparing
these methods has been difficult because of the different reporting
methods used by the authors. We propose a uniform reporting metric
based on comparing the estimates produced with the results of
an actual detailed router on both local and global levels. We compare
all the above methods using our reporting metric on a large
number of benchmark circuits and show that the enhanced fGREP
method produces tight estimates that outperform most other techniques.
Categories and Subject Descriptors
B.7.2 [Design Aids for Integrated Circuits]:
General Terms
Algorithms, Measurement, Experimentation
Keywords
FPGA, fGREP, routability estimation, congestion, RISA, Rent's
rule
Session 6
Chair: Andrew B. Kahng
Organizer: Bob Dahlberg
Panel Members: Ron Collett, Patrick Groeneveld, Lambert van den Hoven, Lavi Lev,
Nancy Nettleton, Paul Rodman [p. 76]
-
-
As chip design becomes ever more complex, fewer design teams
are succeeding. Who's to blame? On one hand, tools are hard to
use, buggy, not interoperable, and have missing functionality. On
the other hand, there is a wide range of engineering skills within the
user population, and tools can be abused within flawed
methodologies. This panel will quantify and prioritize the key gaps,
including interoperability, that must be addressed on both sides.
Session 7
Chairs: Dennis Sylvester, Kaustav Banerjee
Organizers: Dennis Sylvester, Kaustav Banerjee
-
7.1 Life Is CMOS: Why Chase the Life After? [p. 78]
-
George Sery, Shekhar Borkar, Vivek De
This paper discusses potential solutions for scaling CMOS device
technology to gate lengths approaching 10nm. Promising
circuit and design techniques to control leakage power are
described. Energy-efficient microarchitecture trends for general purpose
microprocessors are elucidated.
Categories and Subject Descriptors
B.7 INTEGRATED CIRCUITS
B.7.1 Types and Design Styles - Microprocessors and
microcomputers, VLSI.
General Terms
Performance, Design
Keywords
Technology scaling, Leakage control, Microarchitecture
-
7.2 The Next Chip Challenge: Effective Methods for Viable Mixed Technology SoCs [p. 84]
-
H. Bernhard Pogge
The next generation of computer chips will continue the trend
toward greater complexity than their predecessors. Many of them will
combine different chip technologies and are termed SoCs (Systems
on a Chip). They present major new challenges to the process,
system, circuit, design, and test communities; at the same time,
they also offer new opportunities. For one, the desire to bring
more functionality onto a single chip tends to require additional
processing, which in turn results in various degrees of device
compromises. The chips also tend to become larger due to the added
device content, which generally impacts the yieldability of the
final chip. And such chips will require potentially new approaches
to validate the intended design performance. Chip sector reuse must
also be brought into the discussion and, wherever possible, into
practice. The net effect is higher chip cost. Much of the industry's
effort is therefore focused on addressing these challenges; however,
so far without much success. The alternative has been to continue
placing chips onto substrate modules. Yet this solution imposes
practical limits on achievable wiring densities and bandwidth, due
to the spacing requirements of the C4 interconnection. Furthermore,
every C4 joint adds a signal delay of about 50 psec. All of these
handicaps would potentially benefit greatly from new SoC methods,
starting with the fabrication methodology and extending into the
chip design and test areas.
Such a direction has been set in motion. The opportunity for a
uniquely new chip fabrication method has emerged from combining a
set of somewhat diverse processes. It is based on a judicious
selection of process elements from the traditional chip area
combined with those of a more recent chip-packaging process
methodology. This approach simultaneously overcomes all of the key
process limitations experienced with today's SoC chip designs and
eliminates certain chip-packaging technology handicaps. Yet it does
not require new process tooling; it relies on currently existing
tooling and process methodologies. This new process direction has
been found to be quite applicable to a number of desirable SoC
device designs and offers new opportunities for yet another
expansion of the current semiconductor technology base over the
next few years. However, effective SoC design and fabrication
require much closer and earlier collaboration between the process,
design, and test communities.
General Terms: Design
Key Words: SoCs (System on a Chip); Chip Fabrication
methods; Chip/Packing integration; Chip Subsector concepts
-
7.3 Few Electron Devices: Towards Hybrid CMOS-SET Integrated Circuits [p. 88]
-
Adrian M. Ionescu, Michel J. Declercq, Santanu Mahapatra, Kaustav Banerjee, Jacques Gautier
In this paper, CMOS evolution and their fundamental and practical
limitations are briefly reviewed, and the working principles,
performance, and fabrication of single-electron transistors (SETs) are
addressed in detail. Some of the unique characteristics and
functionality of SETs, like unrivalled integration and low power,
which are complementary to the sub-20 nm CMOS1, are
demonstrated. Characteristics of two novel SET architectures, namely,
C-SET and R-SET, aimed at logic applications are compared. Finally,
it is shown that combination of CMOS and SET in hybrid ICs appears
to be attractive in terms of new functionality and performance,
together with better integrability for ULSI, especially because of their
complementary characteristics. It is envisioned that efforts in terms of
compatible fabrication processes, packaging, modeling, electrical
characterization, co-design and co-simulation will be needed in the
near future to achieve substantial advances in both memory and logic
circuit applications based on CMOS-SET hybrid circuits.
Categories and Subject Descriptors
B.7 INTEGRATED CIRCUITS
B.7.1 Types and Design Styles - Advanced Technologies
General Terms
Design, Experimentation, Measurement, Performance
Keywords
Nanoelectronics, Single-Electron Transistors, Ultimate CMOS,
Hybrid CMOS-SET Circuits, Low power, Inverter, Quantizer.
-
7.4 Carbon Nanotube Field-Effect Transistors and Logic Circuits [p. 94]
-
R. Martel, V. Derycke, J. Appenzeller, S. Wind, and Ph. Avouris
In this paper, we present recent advances in the understanding of
the properties of semiconducting single-wall carbon nanotubes and
in the exploration of their use as field-effect transistors (FETs).
Both electrons and holes can be injected into a nanotube transistor,
either by controlling the metal-nanotube Schottky barriers present
at the contacts or simply by doping the bulk of the nanotube.
These methods give complementary nanotube FETs that can be
integrated together to make inter- and intra-nanotube logic
circuits. The devices' performance and general characteristics
suggest that they can compete with silicon MOSFETs. While this
is true for simple prototype devices, several issues
remain to be explored before a nanotube-based technology is
possible; these are also discussed.
Categories and Subject Descriptors
B.6.0 [Logic Design]: General - novel logic devices.
General Terms
Measurement, Performance, Design, Experimentation.
Keywords
Nanoelectronics, Carbon Nanotube, Semiconductor, Field-Effect
Transistor, FET, Schottky Barrier, Circuits, Inverter, Logic Gate,
SWNT.
Session 8
Chair: Yaron Wolfsthal
Organizers: Carl Pixley, Karem Sakallah
-
8.1 Efficient State Representation for Symbolic Simulation [p. 99]
-
Valeria Bertacco, Kunle Olukotun
Symbolic simulation is attracting increasing interest for the validation
of digital circuits. It allows the verification engineer to explore
all, or a major portion, of the circuit's state space without having
to design specific and time-consuming test stimuli. However, the
complexity and unpredictable run-time behavior of symbolic simulation
have limited its scope to small-to-medium circuits.
In this paper, we propose a novel approach to symbolic simulation
that reduces the size of the BDDs of the state vector while
maintaining an exact representation of the set of states visited. The
method exploits the decomposition properties of Boolean functions.
By restructuring the next-state functions into their disjoint-support
components, we gain better insight into the role of each input variable.
Consequently, we can simplify the next-state functions without
significantly sacrificing simulation accuracy. Our experimental
results show that this approach can effectively reduce
the memory requirements of symbolic simulation while
surrendering only a small portion of the design's state space.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Verification, Simulation; B.8
[Hardware]: Performance and Reliability
General Terms
Design, Verification, Performance, Theory
Keywords
Formal Verification, Symbolic Simulation, BDDs
-
8.2 Handling Special Constructs in Symbolic Simulation [p. 105]
-
Alfred Kölbl, James Kukula, Kurt Antreich, Robert Damiano
Symbolic simulation is a formal verification technique which combines
the flexibility of conventional simulation with powerful symbolic
methods. Some constructs, however, which are easy to handle in conventional
simulation need special consideration in symbolic simulation.
This paper discusses some special constructs that require unique
treatment in symbolic simulation, such as the symbolic representation
of arrays, an efficient symbolic method for storing arrayed instances,
and the handling of symbolic data-dependent delays. We present results
which demonstrate the effectiveness of our symbolic array model
in the simulation of highly regular structures like FPGAs, memories or
cellular automata.
Categories and Subject Descriptors
B.5.2 [Hardware]: Register-Transfer-Level Implementation - Design
Aids, Verification; B.6.3 [Hardware]: Logic Design - Design Aids,
Verification; B.7.2 [Hardware]: Integrated Circuits - Design Aids, Verification
General Terms
Verification
Keywords
Symbolic Simulation, Formal Verification
-
8.3 A Hybrid Verification Approach: Getting Deep into the Design [p. 111]
-
Scott Hazelhurst, Gila Kamhi, Osnat Weissberg, Limor Fix
One method of handling the computational complexity of the verification
process is to combine the strengths of different approaches.
We propose a hybrid verification technology combining symbolic
trajectory evaluation with either symbolic model checking or SAT-based
model checking. This reduces significantly the cost (both
human and computing) of verifying circuits with complex initialisation,
as well as simplifying proof development by enhancing verification
productivity. The approach has been tested on current Intel
designs.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - verification; F.3.1 [Specifying
and Verifying and Reasoning about Programs]: mechanical verification
General Terms
Verification, Theory
Keywords
symbolic model checking, symbolic trajectory evaluation, hybrid
verification
-
8.4 Can BDDs Compete with SAT Solvers on Bounded Model Checking? [p. 117]
-
Gianpiero Cabodi, Paolo Camurati, Stefano Quer
Bounded Model Checking (BMC) based on propositional
satisfiability (SAT) methods has recently proven
its efficacy for bug hunting. BDD-based tools are
able to verify broader sets of properties (e.g., CTL formulas),
but recent experimental comparisons between SAT and
BDDs in formal verification have led to the conclusion that SAT
approaches are more robust and scalable than BDD techniques.
In this work we extend BDD-based verification to larger
circuit and problem sizes, so that it can indeed compete
with SAT-based tools. The approach we propose solves
Bounded Model Checking problems using BDDs. In order
to cope with larger models it exploits approximate traversals,
yet it is exact, i.e. it does not produce false negatives
or positives. It reaps relevant performance enhancements
from mixed forward and backward, approximate and exact
traversals, guided search, conjunctive decompositions and
generalized cofactor based BDD simplifications.
We experimentally compare our tool with BMC in NuSMV
(using mchaff as the SAT engine) and show that BDDs
can accomplish large verification tasks and cope
better with increasing sequential depths.
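To make the traversal idea concrete, here is a minimal explicit-state sketch of bounded forward reachability; in the paper the frontier and image would be BDDs (possibly approximate, with exactness recovered as described), but the fixpoint logic is the same. The toy counter example is hypothetical.

    def bmc_reach(init, image, bad, depth):
        """First depth at which a bad state is reached, else None (fixpoint)."""
        frontier, seen = set(init), set(init)
        for k in range(depth + 1):
            if frontier & bad:
                return k
            frontier = {t for s in frontier for t in image(s)} - seen
            if not frontier:
                return None      # exact fixpoint: property holds at all depths
            seen |= frontier

    # 3-bit counter whose next-state function skips state 5.
    image = lambda s: {6 if (s + 1) % 8 == 5 else (s + 1) % 8}
    print(bmc_reach({0}, image, {5}, 10))   # -> None (5 is unreachable)
    print(bmc_reach({0}, image, {4}, 10))   # -> 4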
Session 9
Chair: Andreas Kanstein
Organizers: Limor Fix, Shin-ichi Minato
-
9.1 RTL C-Based Methodology for Designing and Verifying a Multi-Threaded Processor [p. 123]
-
Luc Séméria, Andrew Seawright, Renu Mehra, Daniel Ng, Arjuna Ekanayake, Barry Pangrle
An RTL C-based design and verification methodology is
presented that enabled the successful high-speed validation of a
7-million-gate simultaneous multi-threaded (SMT) network processor.
The methodology is centered on a statically scheduled C-based coding
style, C-to-HDL translation, and a novel RTL-C to RTL-Verilog
equivalence-checking flow. It leverages improved simulation performance
combined with static techniques to reduce the amount of
RTL-Verilog and gate-level verification required during development.
Categories - B.5.2 [Register-Transfer-Level Implementation]
Design Aids: Automatic synthesis, Hardware description languages,
Optimization, Simulation, Verification.
General Terms - Design, Verification, Performance, Languages.
Keywords - C/C++, RTL, design, verification, formal equivalence checking.
-
9.2 High-Level Specification and Automatic Generation of IP Interface Monitors [p. 129]
-
Marcio T. Oliveira, Alan J. Hu
A central problem in functional verification is to check that a circuit
block is producing correct outputs while enforcing that the environment
is providing legal inputs. To attack this problem, several researchers
have proposed monitor-based methodologies, which offer
many benefits. This paper presents a novel, high-level specification
style for these monitors, along with a linear-size, linear-time translation
algorithm into monitor circuits. The specification style naturally
fits the complex, but well-specified interfaces used between
IP blocks in systems-on-chip. To demonstrate the advantage of our
specification style, we have specified monitors for various versions
of the Sonics OCP protocol as well as the AMBA AHB protocol,
and have developed a prototype tool that automatically translates
specifications into Verilog or VHDL monitor circuits.
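The flavor of such a monitor, stripped down to a toy req/ack handshake rather than the OCP or AHB rules, is sketched below in Python; the paper's tool compiles a high-level specification into equivalent RTL monitor circuits, which is not shown here.

    class HandshakeMonitor:
        """Flags an ack with no pending request, or a request dropped early."""
        def __init__(self):
            self.pending = False

        def step(self, req, ack):
            if ack and not (self.pending or req):
                return "illegal ack"
            if self.pending and not req and not ack:
                return "req dropped before ack"
            self.pending = (req or self.pending) and not ack
            return None

    m = HandshakeMonitor()
    trace = [(1, 0), (1, 0), (1, 1), (0, 1)]       # (req, ack) per cycle
    print([m.step(r, a) for r, a in trace])
    # -> [None, None, None, 'illegal ack']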
Categories and Subject Descriptors
B.5.2 [Register-Transfer Level Implementation]: Design Aids;
B.6.3 [Logic Design]: Design Aids; C.0 [Computer Systems
Organization]: General - Systems specification methodology; J.6
[Computer-Aided Engineering]: Computer-aided design (CAD)
General Terms
Documentation, Languages, Verification
Keywords
Formal Verification, Regular Expressions, Pipelining, Alternation
-
9.3 Achieving Maximum Performance: A Method for the Verification of Interlocked Pipeline Control Logic [p. 135]
-
Kerstin Eder, Geoff Barrett
Getting the interlock logic which controls pipeline flow correct
is an important prerequisite for maximising pipeline performance.
Unnecessary pipeline stalls can only be eliminated when they can
be distinguished from those stalls which are necessary to preserve
functional correctness.
We propose a method for deriving a maximum pipeline performance
specification from a complete functional specification of the
pipeline control logic. The performance specification can be used
to generate simulation testbench assertions. On the other hand, the
specification can serve as a basis for formal property checking. The
most promising aspect of our work is, however, the potential to synthesise
the actual control logic from its formal description.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids -
Verification; B.5.1 [Register-Transfer-Level Implementation]: Design
- Control Design, Pipeline
General Terms
Performance, Verification
Keywords
Pipeline Stall, Interlock Logic, Verification
-
9.4 Formal Verification of Module Interfaces against Real Time Specifications [p. 141]
-
Arindam Chakrabarti, Pallab Dasgupta, P. P. Chakrabarti, Ansuman Banerjee
One of the main concerns of the designer of a circuit module is to guarantee
that the interface of the module conforms to specific protocols (such as
PCI Bus or Ethernet) by which it interacts with its environment. The
computational complexity of verifying such open systems under all possible
environments has been shown to be very hard (EXPTIME complete [10]). On
the other hand, designers are typically required to guarantee correct behavior
only for specific valid behaviors of the environment (such as a valid PCI
Bus environment). Designers attempt to model these behaviors through an
appropriate test bench for the module. In this paper we present a module
verifier tool based on a proposed real-time temporal logic called Open-RTCTL,
which allows combined specification of the correctness properties and the input
environments. The tool accepts the design in a subset of Verilog. By making
the designer specify the environment constraints, we are able to verify a
module in isolation, and thereby avoid the state explosion problem due to
composition of modules. We present experimental results on modules from
the Texas-97 Benchmark circuits [14] to demonstrate the space/time
efficiency of the tool.
Categories and Subject Descriptors
B.7.2. [Hardware]: Integrated Circuits - Verification
General Terms
Verification
Keywords
Formal Verification, Temporal Logic
Session 10
Chair: Mark Hahn
Organizers: Chandu Visweswariah, Narendra V. Shenoy
-
10.1 Automated Timing Model Generation [p. 146]
-
Ajay J. Daga, Loa Mize, Subramanyam Sripada, Chris Wolff, Qiuyang Wu
The automated generation of timing models from gate-level
netlists facilitates IP reuse and dramatically improves chip-level
STA runtime in a hierarchical design flow. In this paper we
discuss two different approaches to model generation, the design
flows they lend themselves to and results from the application of
these model generation solutions to large customer designs.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-Aided Design (CAD)
General Terms
Design, Performance, Algorithms, Verification
Keywords
Static Timing Analysis, Model Generation, EDA
-
10.2 Timing Model Extraction of Hierarchical Blocks by Graph Reduction [p. 152]
-
Cho W. Moon, Harish Kriplani, Krishna P. Belkhale
A timing model extractor builds a timing model of a digital circuit for use with a static timing analyzer. This paper proposes a novel method of generating a gray-box timing model from a gate-level netlist by reducing a timing graph. Previous methods of generating timing models sacrificed accuracy and/or did not scale well with design size. The proposed method is simple, yet it provides model accuracy including arbitrary levels of latch time borrowing, correct support for self-loop timing checks, and the capability to support timing constraints that span multiple blocks. Also, the CPU and memory resources required to generate the model scale well with the size of the circuit. We were able to extract a model for a 456K-gate block using under 2 minutes of CPU time and 464 MB of memory on a Sun Fire 880 machine. The generated model can provide a capacity improvement in timing verification of more than two orders of magnitude.
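A toy version of the graph-reduction step, for a purely combinational max-delay graph and ignoring latch time borrowing and timing checks, might look like the sketch below: internal nodes are eliminated one at a time, splicing worst-case delays around them so that only boundary-to-boundary arcs remain.

    def reduce_timing_graph(edges, boundary):
        """Collapse internal nodes, keeping worst-case port-to-port delays."""
        delay = {}
        for u, v, d in edges:
            delay[(u, v)] = max(delay.get((u, v), float("-inf")), d)
        internal = {n for e in edges for n in e[:2]} - set(boundary)
        for n in internal:
            ins  = [(u, d) for (u, v), d in delay.items() if v == n]
            outs = [(v, d) for (u, v), d in delay.items() if u == n]
            for key in [k for k in delay if n in k]:
                del delay[key]
            for u, d1 in ins:                 # splice around the removed node
                for v, d2 in outs:
                    delay[(u, v)] = max(delay.get((u, v), float("-inf")), d1 + d2)
        return delay

    edges = [("in1", "g1", 2), ("in2", "g1", 3), ("g1", "g2", 1),
             ("in1", "g2", 1), ("g2", "out", 2)]
    print(reduce_timing_graph(edges, {"in1", "in2", "out"}))
    # -> {('in1', 'out'): 5, ('in2', 'out'): 6}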
-
10.3 Efficient Stimulus Independent Timing Abstraction Model Based on a New Concept of Circuit Block Transparency [p. 158]
-
Martin Foltin, Brian Foutz, Sean Tyler
We have developed a new timing abstraction model for digital
circuit blocks that is stimulus independent, port based, supports
designs with level triggered latches, and can be input into
commercial STA (Static Timing Analysis) tools. The model is
based on an extension of the concept of latch transparency to
circuit block transparency introduced in this paper. It was
implemented, tested and is being used in conjunction with
transistor level STA for microprocessor designs with tens of
millions of transistors. The STA simulation times are significantly
shorter than with gray box timing models, which can decrease the
overall chip timing verification time. The model can also be used
in the intellectual property encapsulation domain.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - simulation, verification
General Terms
Performance, Design, Verification.
Keywords
Timing analysis, timing model, VLSI design, circuit optimization
-
10.4 An Implication-based Method to Detect Multi-Cycle Paths in Large Sequential Circuits [p. 164]
-
Hiroyuki Higuchi
This paper proposes a fast multi-cycle path analysis method
for large sequential circuits. It determines whether or not all
the paths between every flip-flop pair are multi-cycle paths.
The proposed method is based on ATPG techniques, especially
on implication techniques, to utilize circuit structure
and multi-cycle path condition directly. The method also
checks whether or not the multi-cycle path may be invalidated
by static hazards in combinational logic parts. Experimental
results show that our method is much faster than
conventional ones.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids
General Terms
Algorithms, Design, Verification
Keywords
multi-cycle path, sequential circuits, implication, ATPG
Session 11
Chair & Organizer: Majid Sarrafzadeh
-
11.1 WITHDRAWN
-
11.2 The Wearable Motherboard: A Framework
for Personalized Mobile Information Processing (PMIP) [p. 170]
-
Sungmee Park, Kenneth Mackenzie, Sundaresan Jayaraman
Textiles and computing share a synergistic relationship, which is
being harnessed to create a new paradigm in personalized mobile
information processing (PMIP). In this paper, we provide an
overview of this "interconnection" between the two fields and
present the vision for "E-Textiles," which represents the
convergence of the two fields. We discuss the role of the Georgia
Tech Wearable Motherboard in pioneering this paradigm of
"fabric is the computer" and serving as a framework for PMIP.
Finally, recent research in this area resulting in the realization of a
"computational fabric network" is discussed.
-
11.3 Challenges and Opportunities in Electronic Textiles Modeling and Optimization [p. 175]
-
Diana Marculescu, Radu Marculescu, Pradeep K. Khosla
This paper addresses an emerging new field of
research that combines the strengths and capabilities of electronics
and textiles in one: electronic textiles, or e-textiles. E-textiles, also
called Smart Fabrics, have not only "wearable" capabilities like any
other garment, but also local monitoring and computation, as well as
wireless communication capabilities. Sensors and simple
computational elements are embedded in e-textiles, as well as built
into yarns, with the goal of gathering sensitive information,
monitoring vital statistics and sending them remotely (possibly over a
wireless channel) for further processing. Possible applications include
medical (infant or patient) monitoring, personal information
processing systems, or remote monitoring of deployed personnel in
military or space applications. We illustrate the challenges imposed by
the dual textile/electronics technology on their modeling and
optimization methodology.
Categories and Subject Descriptors: I.6 [Simulation and
Modeling]: Modeling methodologies; B.8.2 [Performance and
reliability]: performance analysis and design aids.
General terms: design, performance
Session 12
Chair: Stephen Ohr
Organizers: Linda Marchant, Philippe Magarshack
Panelists: Masao Hotta, Mike Brunoli, Felicia James,
Rudy Koch, Roy McGuffin, Andrew Moore [p. 181]
-
-
There is considerable controversy as to whether or not the trade in
analog intellectual properties (IP) will ever represent a viable
business opportunity. One school of thought suggests that analog
design will always be too specialized to constitute a major market;
another school of thought says this will be big business if certain
nagging technical problems are solved. While the demand for
analog interface components is high, the ability of IP creators to
render it in a tradable format is limited. And the ability of digital
design teams to successfully utilize analog IPs - without a
considerable amount of handholding - is similarly limited. EDA
tools here are looked upon as both culpable and offering the best
hopes for the future.
This panel of experts - representing analog designers, analog
EDA tool providers, silicon foundries, and analog IP vendors -
brings their own points of view on some of the business and
technology issues that need to be resolved to provide the
context for analog IP development and trade. Among the open
questions:
- Is IP created and validated within the design environment for
re-use a more productive approach than imported IP? How
many 'traditional' analog designers would admit they are still
using kit parts and breadboards - maybe even SPICE and
manual IC layout techniques - in their everyday jobs today?
- What CAD tools are needed to help analog designers? Are
newly emerging EDA technologies, designed to enhance
analog design productivity, maturing rapidly enough to be
accepted by designers?
- What is the future of analog design at the very low voltage swings
coming with sub-100nm CMOS?
- Though silicon foundries need to process a wide set of
external IP offerings in order to allow their users to build
complete systems, are foundries seeing enough activity in
analog IP designs to justify specialized fab runs or the kind
of process tuning that would allow analog and digital IPs to
coexist on the same chip?
- How likely - and how soon - can we get to analog IP
development and trade?
Session 13
Chair: Giovanni De Micheli
Organizers: Renu Mehra, Enrico Macii
-
13.1 Task Scheduling and Voltage Selection for Energy Minimization [p. 183]
-
Yumin Zhang, Xiaobo (Sharon) Hu, Danny Z. Chen
In this paper, we present a two-phase framework that integrates
task assignment, ordering, and voltage selection (VS)
to minimize the energy consumption of real-time dependent
tasks executing on a given number of variable-voltage processors.
Task assignment and ordering in the first
phase strive to maximize the opportunities that can be exploited
for lowering voltage levels during the second phase,
i.e., voltage selection. In the second phase, we formulate the
VS problem as an Integer Programming (IP) problem and
solve the IP efficiently. Experimental results demonstrate
that our framework is very effective in executing tasks at
lower voltage levels under different system configurations.
Categories and Subject Descriptors
I.2.8 [Problem Solving, Control Methods, and Search]:
Scheduling
General Terms
Algorithms, Design
Keywords
voltage selection, task scheduling
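To see why voltage selection is naturally an integer program, consider the hedged toy below: discrete (speed, voltage) levels per task, energy proportional to V^2 per cycle, and a deadline. Brute force replaces the paper's IP solver, and all numbers are hypothetical.

    from itertools import product

    LEVELS = [(1.0, 1.0), (0.8, 0.7), (0.6, 0.45)]   # (speed, volts), hypothetical
    cycles, deadline = [4, 3, 5], 18.0

    def total_time(cfg):
        return sum(c / s for (s, _), c in zip(cfg, cycles))

    def total_energy(cfg):
        return sum(c * v ** 2 for (_, v), c in zip(cfg, cycles))  # E ~ C*V^2

    best = min(((total_energy(cfg), [v for _, v in cfg])
                for cfg in product(LEVELS, repeat=len(cycles))
                if total_time(cfg) <= deadline))
    print(best)   # lowest-energy feasible voltage assignment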
-
13.2 Battery-Conscious Task Sequencing for Portable Devices Including Voltage/Clock Scaling [p. 189]
-
Daler Rakhmatov, Sarma Vrudhula, Chaitali Chakrabarti
Operation of battery-powered portable systems can no longer be
sustained once a battery becomes discharged. Maximization of the
battery lifetime is a difficult task due to nonlinearity of battery behavior
that depends on the characteristics of the system load profile.
We address the problem of task sequencing without and with voltage/
clock scaling that shapes the profile so that the battery lifetime
is maximized. We developed an accurate analytical battery model
and validated it with measurements taken on a real lithium-ion battery
used in a pocket computer. We use the model as a basis for a
unique battery-conscious cost function and utilize its properties to
develop several novel algorithms, including insertion of recovery
periods and voltage/clock scaling for delay slack distribution.
Categories and Subject Descriptors
J.6.2 [Computer-Aided Engineering]: Computer-Aided Design
General Terms
Algorithms, Performance, Design
Keywords
Battery, modeling, low-power design, scheduling, voltage scaling
-
13.3 An Energy Saving Strategy Based on Adaptive Loop Parallelization [p. 195]
-
I. Kadayif, M. Kandemir, M. Karakoy
In this paper, we evaluate an adaptive loop parallelization strategy
(i.e., a strategy that allows each loop nest to execute on a different
number of processors when doing so is beneficial) and measure
the potential energy savings when unused processors during execution
of a nested loop in a multi-processor on-a-chip (MPoC)
are shut down (i.e., placed into a power-down or sleep state). Our
results show that shutting down unused processors can lead to as
much as 67% energy savings with up to 17% performance loss in
a set of array-intensive applications. We also discuss and evaluate
a processor pre-activation strategy based on compile-time analysis
of nested loops. Based on our experiments, we conclude that an
adaptive loop parallelization strategy combined with idle processor
shut-down and pre-activation can be very effective in reducing
energy consumption without increasing execution time.
Categories and Subject Descriptors
D.3.4 [Programming Languages]: Processors - Compilers, Optimization
General Terms
Design, Experimentation, Performance
Keywords
Adaptive Parallelization, Multiprocessing, Energy Consumption
Session 14
Chair: Maciej Ciesielski
Organizers: Malgorzata Marek-Sadowska, Steven Nowick
-
14.1 River PLAs: A Regular Circuit Structure [p. 201]
-
Fan Mo, Robert K. Brayton
A regular circuit structure called a River PLA and its reconfigurable
version, Glacier PLA, are presented. River PLAs
provide greater regularity than circuits implemented with
standard-cells. Conventional optimization stages such as
technology mapping, placement and routing are eliminated. These
two features make the River PLA a highly predictable structure.
Glacier PLAs can be an alternative to FPGAs, but with a simpler
and more efficient design methodology.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Automatic synthesis.
General Terms
Algorithms.
Keywords
Programmable Logic Array, River routing.
-
14.2 WITHDRAWN
-
14.3 Layout-Aware Synthesis of Arithmetic Circuits [p. 207]
-
Junhyung Um, Taewhan Kim
In deep-submicron (DSM) technology, wires are as important as or more
important than logic components, since wire-related problems such
as crosstalk and noise are critical in system-on-chip (SoC) design.
Recently, a method [12] was proposed for generating a partial product
reduction tree (PPRT) with optimal timing using bit-level adders to
implement arithmetic circuits, outperforming the current best
designs. However, in conventional approaches, including
[12], interconnects are not primary optimization targets
in the synthesis of arithmetic circuits, mainly due to their
high integration complexity and unpredictable wire effects,
resulting in unsatisfactory layouts with long and tangled wire
connections. To overcome this limitation, we propose a new module
generation/synthesis algorithm for arithmetic circuits utilizing
carry-save-adder (CSA) modules, which not only optimizes the circuit
timing but also generates a much more regular interconnect topology
for the final circuits. Specifically, we propose a two-step algorithm:
(Phase 1: CSA module generation) an optimal-timing
CSA module generation algorithm for an arithmetic expression under
a general CSA timing model; (Phase 2: bit-level interconnect
refinement) an optimal refinement of the interconnects between the CSA
modules that retains the global CSA-tree structure produced by
Phase 1. We show that the timing of the circuits produced by
our approach is equal or very close to that of [12] in most test cases
(even without including the interconnect delay), while at the
same time the interconnects in the layout are significantly shorter and
more regular.
Categories and Subject Descriptors
B.2.4 [Arithmetic and Logic Structures]: High-Speed Arithmetic
- Algorithms, Cost/Performance
General Terms: Algorithms, Design and Performance
Keywords: Carry-save-adder, layout, high performance
Session 15
Chair: Nikil Dutt
Organizers: Diederik Verkest, Luca Benini
-
15.1 Automatic Data Migration for Reducing Energy Consumption in Multi-Bank Memory Systems [p. 213]
-
V. De La Luz, M. Kandemir, I. Kolcu
An architectural solution to reducing memory energy consumption
is to adopt a multi-bank memory system instead of a monolithic
(single-bank) memory system. Some recent multi-bank memory
architectures help reduce memory energy by allowing an unused
bank to be placed into a low-power operating mode. This paper
describes an automatic data migration strategy which dynamically
places the arrays with temporal affinity into the same set of banks.
This strategy increases the number of banks which can be put into
low-power modes and allows the use of more aggressive energy saving
modes. Experiments using several array-dominated applications
show the usefulness of data migration and indicate that large
energy savings can be achieved with low overhead.
Categories and Subject Descriptors
B.3 [Hardware]: Memory Structures
General Terms
Design, Experimentation, Performance
Keywords
Energy Consumption, Multi-Bank Memories, Data Migration
-
15.2 Exploiting Shared Scratch Pad Memory Space in Embedded Multiprocessor Systems [p. 219]
-
Mahmut Kandemir, J. Ramanujam, A. Choudhary
In this paper, we present a compiler strategy to optimize data accesses
in regular array-intensive applications running on embedded
multiprocessor environments. Specifically, we propose an optimization
algorithm that targets the reduction of extra off-chip memory
accesses caused by inter-processor communication. This is
achieved by increasing the application-wide reuse of data that resides
in the scratch-pad memories of processors. Our experimental
results obtained on four array-intensive image processing applications
indicate that exploiting inter-processor data sharing can reduce
the energy-delay product by as much as 33.8% (and 24.3%
on average) on a four-processor embedded system. The results also
show that the proposed strategy is robust in the sense that it gives
consistently good results over a wide range of several architectural
parameters.
Categories and Subject Descriptors
B.3 [Hardware]: Memory Structures; D.3.4 [Software]: Programming
Languages - Processors, Compilers
General Terms
Algorithms, Management, Performance
Keywords
Embedded multiprocessors, energy consumption, scratch pad memories,
access patterns, compiler optimizations, data tiles
-
15.3 Address Assignment Combined with Scheduling in DSP Code Generation [p. 225]
-
Yoonseo Choi, Taewhan Kim
One of the important issues in embedded system design
is to optimize program code for the microprocessor to be stored
in ROM. In this paper, we propose an integrated approach to the
DSP address code generation problem for minimizing the number
of addressing instructions. Unlike previous works in which code
scheduling and offset assignment are performed sequentially without
any interaction between them, our work tightly couples offset
assignment problem with code scheduling to exploit scheduling on
minimizing addressing instructions more effectively. We accomplish
this by developing a fast but accurate two-phase procedure
which, for a sequence of code schedules, finds a sequence of memory
layouts with minimum addressing instructions. Experimental
results with benchmark DSP programs show improvements of
13%-33% in the address code size over Solve-SOA/GOA [7].
Categories and Subject Descriptors
C.3 [Special-purpose and application-based systems]: [Signal
processing systems]
General Terms
Algorithms, Performance
Keywords
Offset assignment, Scheduling, Code Generation
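The cost being minimized here can be illustrated with a classic single-offset-assignment toy: an address register reaches adjacent memory slots for free via post-increment/decrement, and any longer jump costs an explicit addressing instruction, so the memory layout (and, in the paper, the code schedule as well) determines the count. The sketch and access sequence are hypothetical.

    def addressing_cost(layout, accesses):
        """Explicit address-register loads for a variable-to-slot layout."""
        slot = {v: i for i, v in enumerate(layout)}
        cost, ar = 1, slot[accesses[0]]          # initial AR load
        for v in accesses[1:]:
            if abs(slot[v] - ar) > 1:            # beyond free auto-inc/dec
                cost += 1
            ar = slot[v]
        return cost

    seq = list("abcadbc")
    print(addressing_cost(list("abcd"), seq))    # -> 4
    print(addressing_cost(list("acbd"), seq))    # -> 3, a better layout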
Session 16
Chair: Jaijeet Roychowdhury
Organizers: Jaijeet Roychowdhury, Joel R. Phillips
-
16.1 Multifunctional Photonic Integration for the Agile Optical Internet [p. 231]
-
Edward H. Sargent
An agile, transparent optical network is emerging. This paper
enumerates the functions that will be needed at nodes in order to
add transparency and agility to the network while robustly
assuring optical fibre channel performance. One key requirement
will be scalable integration of multiple functions on a platform.
The paper presents recent results along one strategic axis for
integration - the use of functionalized self-organized photonic
crystals and heterostructures thereof to control the flow and
features of light.
Categories and Subject Descriptors
C.2.1 [Network Architecture and Design]: Enabling technologies.
General Terms
Experimentation, Theory.
Keywords
Agile optical networks, reconfigurability, optical performance
monitoring, optoelectronic integration, photonic crystals, optical
nonlinearity, electro-optics, optical polymers, semiconductor
nanocrystals.
-
16.2 Computer Aided Design of Long-Haul Optical Transmission Systems [p. 235]
-
James G. Maloney, Brian E. Brewington, Curtis R. Menyuk
We present a general overview of the role of computer models in
the design and optimization of commercial optical transmission
systems. Specifically, we discuss (1) the role of modeling in a
commercial setting, (2) achieving the proper balance between
accuracy and computation speed, (3) model verification against
experiment, and (4) case studies demonstrating the benefits of
modeling. Ideally, experiments are preferable to models when
describing system performance, particularly to support claims of
a system's functionality to a customer. However, modeling is
often the only choice for many of the problems that a commercial
networking company must solve. Because there are design
parameter spaces that are either too expensive or too time consuming
to verify experimentally, the main role of modeling in
industry is to study what experiments cannot. For example,
when an analytical solution of a statistical problem is infeasible,
a common modeling solution is to perform Monte Carlo trials to
study the statistical behavior. Another typical modeling task
involves looking at variations of hardware that would be
prohibitively expensive to acquire and test. Implementing
modeling in industry involves a balance between three needs:
cost-efficiency, time-efficiency, and accuracy. We will discuss
the approaches we have taken at PhotonEx to meet these needs:
leveraging academic research, developing "reduced models," and
utilizing computational clusters. Specifically, we will use case
studies to illustrate the application of these approaches to
modeling long-haul optical transmission systems.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-Aided Engineering -
computer-aided design (CAD) and computer-aided
manufacturing (CAM).
General Terms
Algorithms, Measurement, Performance, Design,
Experimentation, Theory, Verification
Keywords
Optical Communication, Long-Haul (LH) Transmission, Ultra-Long
Haul (ULH) Transmission, Optical Modeling
-
16.3 A Fast Optical Propagation Technique for Modeling Micro-Optical Systems [p. 236]
-
Timothy P. Kurzweg, Steven P. Levitan, Jose A. Martinez, Mark Kahrs, Donald M. Chiarulli
As designers become more aggressive in introducing optical
components to micro-systems, rigorous optical models are
required for system-level simulation tools. Common optical
modeling techniques and approximations are not valid for most
optical micro-systems, and those techniques that provide accurate
simulation are computationally slow. In this paper, we introduce
an angular frequency optical propagation technique that greatly
reduces computation time while achieving the accuracy of a full
scalar formulation. We present simulations of a diffractive optical
MEM Grating Light Valve to show the advantages of this optical
propagation method and the integration of the technique into a
system-level multi-domain CAD tool.
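For readers unfamiliar with the angular spectrum approach named in the keywords, the following sketch shows the textbook core of one scalar free-space propagation step: FFT the field into plane-wave components, multiply by the exact transfer function, and inverse-FFT. The grid, wavelength, and aperture are illustrative assumptions; the paper's contribution is in accelerating this class of computation, not in this textbook form.

    # Textbook angular-spectrum propagation of a scalar field (one step).
    # Wavelength, grid, and aperture below are illustrative assumptions.
    import numpy as np

    wl, dx, n, z = 1.0e-6, 0.5e-6, 512, 50e-6    # wavelength, grid step, size, distance
    x = (np.arange(n) - n // 2) * dx
    X, Y = np.meshgrid(x, x)
    field = (X**2 + Y**2 < (20e-6) ** 2).astype(complex)   # circular aperture

    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    kz2 = (1.0 / wl) ** 2 - FX**2 - FY**2        # squared longitudinal spatial frequency
    H = np.where(kz2 > 0,
                 np.exp(2j * np.pi * z * np.sqrt(np.maximum(kz2, 0.0))),
                 0.0)                            # evanescent components dropped

    out = np.fft.ifft2(np.fft.fft2(field) * H)   # propagated field at distance z
    print("peak intensity after propagation:", np.abs(out).max() ** 2)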
Categories and Subject Descriptors
I.6.5 [Simulation and Modeling]: Model Development -
modeling methodologies
General Terms
Algorithms, Design
Keywords
Optical Propagation, Angular Spectrum, CAD, Optical
Micro-systems, Optical MEMS
Chair: Lawrence T. Pileggi
Organizers: Rob A. Rutenbar, Andrew B. Kahng
Panelists: Bob Brodersen, Anthony Hill, John Kibarian,
Desmond A. Kirkpatrick, Mitsumasa Koyanagi, Mark Lavin [p. 242]
-
-
Every year, the design and EDA communities are besieged by dire
warnings about the impending doom of "design as we know it."
Every year, another unpleasant physical effect from the evil
depths of deep submicron physics surfaces, compromising our
designs in new and vile ways. Every year, the same story: more
nanometer woes. Rather than endorse a new winner in this year's
race for the "next worst thing" from the nanometer arena, this
panel gathers a set of world-class technology experts to debate
what effects are hiding just around the next corner, waiting to
pounce on the unwary tool or chip designer. Which among these
is really the most important, when will it happen, and why?
Chair: Rathish Jayabharathi
Organizers: Kwang-Ting (Tim) Cheng, T. M. Mak
-
18.1 Low-Cost Sequential ATPG with Clock-Control DFT [p. 243]
-
Miron Abramovici, Xiaoming Yu, Elizabeth M. Rudnick
We present a new clock-control DFT technique
for sequential circuits, based on clock partitioning
and selective clock freezing, and we use it to break the global
feedback loops and to generate clock waves to test the
resulting sequential circuit with self-loops. Clock waves
allow us to significantly reduce the complexity of sequential
ATPG. Unlike scan, our non-intrusive DFT technique
does not introduce any delay penalty; the generated tests
may be applied at speed, have shorter application time, and
dissipate less power.
Categories and Subject Descriptors: B.8.1 [Performance
and Reliability]: Reliability, Testing, and
Fault-Tolerance
General Terms: Algorithms, Design, Reliability
-
18.2 Effective Diagnostics through Interval Unloads in a BIST Environment [p. 249]
-
Peter Wohl, John A. Waicukauski, Sanjay Patel, Greg Maston
Logic built-in self test (BIST) is increasingly being adopted to
improve test quality and reduce test costs for rapidly growing
designs. Compared to deterministic automated test pattern generation
(ATPG), BIST presents inherent fault diagnostic challenges.
Previous diagnostic techniques have been limited in their diagnosis
resolution and/or require significant hardware overhead. This
paper proposes an interval-based scan-unload method that ensures
diagnosis resolution down to gate-level faults with minimal hardware
overhead. Tester fail-data collection is based on a novel construct
incorporated into the design-extensions of the standard test-interface
language (STIL). The implementation of the proposed
method is presented and analyzed.
Categories and Subject Descriptors: B.8.1 [Performance
and Reliability]: Reliability, Testing and Fault-Tolerance.
General Terms: Algorithms, Design.
Keywords: built-in self-test (BIST), fault diagnosis.
-
18.3 On Output Response Compression in the Presence of Unknown Output Values [p. 255]
-
Irith Pomeranz, Sandip Kundu, Sudhakar M. Reddy
A circuit may produce unknown output values during simulation
of an input sequence due to an unknown initial state or due to the
existence of tri-state elements. For circuits tested using BIST,
unknown output values make it impossible to determine a single
unique signature for the fault free circuit. To accommodate
unknown output values in a BIST scheme, we describe a procedure
for synthesizing a minimal logic block that replaces
unknown output values by a known constant. The proposed procedure
ensures that the BIST scheme will be able to detect all the
faults detectable by the input sequence applied to the circuit
while allowing a single unique signature to be obtained.
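A toy illustration of the masking idea: if a block replaces every unknown output value by a known constant before compaction, the fault-free signature becomes unique, while a flipped known bit still perturbs it. The signature polynomial and response sequences below are invented; the paper's point is synthesizing a minimal such masking block from the circuit itself.

    # Toy sketch: mask unknown (X) responses to a constant before LFSR
    # signature compaction; taps and responses are invented examples.
    X = None                                   # marker for an unknown value

    def signature(bits, taps=(16, 14, 13, 11), width=16):
        reg = 0
        for b in bits:
            b = 0 if b is X else b             # masking block: X -> known 0
            fb = b
            for t in taps:
                fb ^= (reg >> (t - 1)) & 1     # XOR feedback from tap bits
            reg = ((reg << 1) | fb) & ((1 << width) - 1)
        return reg

    good = [1, 0, X, 1, 1, X, 0, 1]            # fault-free response with Xs
    bad = [1, 0, X, 1, 0, X, 0, 1]             # one known bit flipped
    print(hex(signature(good)), hex(signature(bad)))   # signatures differ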
-
18.4 Software-Based Diagnosis for Processors [p. 259]
-
Li Chen, Sujit Dey
Software-based self-test (SBST) is emerging as a promising
technology for enabling at-speed test of high-speed microprocessors
using low-cost testers. We explore the fault diagnosis capability
of SBST, in which functional information can be used to
guide and facilitate the generation of diagnostic tests. By using a
large number of carefully constructed diagnostic test programs,
the fault universe can be divided into fine-grained partitions,
each corresponding to a unique pass/fail pattern. We evaluate the
quality of diagnosis by constructing diagnostic-tree-based fault
dictionaries. We demonstrate the feasibility of the proposed
method by applying it to a processor example. Experimental
results show its potential as an effective method for diagnosing
larger processors.
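The pass/fail partitioning can be pictured with a toy fault dictionary: each fault maps to its tuple of diagnostic-program outcomes, and faults sharing a tuple land in the same indistinguishable partition. Fault names and outcomes below are invented for illustration only.

    # Toy pass/fail fault dictionary: faults with identical outcome tuples
    # across diagnostic test programs fall into the same partition.
    # Fault names and outcomes are invented for illustration.
    from collections import defaultdict

    outcomes = {                      # 1 = this diagnostic program fails
        "alu_sa0": (1, 0, 1, 0),
        "alu_sa1": (1, 0, 1, 1),
        "dec_sa0": (0, 1, 0, 0),
        "dec_sa1": (0, 1, 0, 0),      # same tuple as dec_sa0: indistinguishable
    }

    classes = defaultdict(list)
    for fault, sig in outcomes.items():
        classes[sig].append(fault)
    for sig, faults in classes.items():
        print(sig, "->", faults)      # more programs would split the classes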
Categories and Subject Descriptors
B.8.1 [Performance and Reliability]: Reliability, Testing, and
Fault-Tolerance.
General Terms
Algorithms, Measurement, Reliability, Experimentation.
Keywords
Microprocessor, self-test, instruction, diagnostics.
Chair: Wayne Wolf
Organizers: Anand Raghunathan, Xiabo (Sharon) Hu
-
19.1 Design of a High-Throughput Low-Power IS95 Viterbi Decoder [p. 263]
-
Xun Liu, Marios C. Papaefthymiou
The design of high-throughput large-state Viterbi decoders relies
on the use of multiple arithmetic units. The global communication
channels among these parallel processors often consist of long interconnect
wires, resulting in large area and high power consumption.
In this paper, we propose a data-transfer oriented design
methodology to implement a low-power 256-state rate-1/3 IS95
Viterbi decoder. Our architectural level scheme uses operation partitioning,
packing, and scheduling to analyze and optimize interconnect
effects in early design stages. In comparison with other
published Viterbi decoders, our approach reduces the global data
transfers by up to 75% and decreases the number of global buses
by up to 48%, while enabling the use of deeply pipelined datapaths
with no data forwarding. In the RTL implementation of the individual
processors, we apply precomputation in conjunction with
saturation arithmetic to further reduce power dissipation with provably
no coding performance degradation. Designed using a 0.25
µm standard cell library, our decoder achieves a throughput of 20
Mbps in simulation and dissipates only 450 mW.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - Algorithms
implemented in hardware
General Terms
Design, Performance
Keywords
Communications, Pipelining, Bus reduction
-
19.2 A Detailed Cost Model for Concurrent Use With Hardware/Software Co-Design [p. 269]
-
Daniel Ragan, Peter Sandborn, Paul Stoaks
Hardware/software co-design methodologies generally focus on
the prediction of system performance or co-verification of system
functionality. This study extends this conventional focus through
the development of a methodology and software tool that
evaluates system (hardware and software) development,
fabrication, and testing costs (dollar costs) concurrent with
hardware/software partitioning in a co-design environment.
Based on the determination of key metrics such as gate count
and lines of software, a new tool called Ghost evaluates software
and hardware development, fabrication, packaging and testing
costs. Ghost enables optimization of hardware/software
partitioning as a function of specific combinations of hardware
foundries and software development environments.
Categories and Subject Descriptors
E3 [HW/SW co-design]: specification, modeling, co-simulation and
performance analysis, system-level scheduling and partitioning.
General Terms Design, Economics.
Keywords Cost Modeling, Cost-Performance Trade-off.
-
19.3 Efficient Code Synthesis from Extended Dataflow Graphs for Multimedia Applications [p. 275]
-
Hyunok Oh, Soonhoi Ha
This paper presents efficient automatic code synthesis techniques
from dataflow graphs for multimedia applications. Since
multimedia applications require large size buffers containing
composite type data, we aim to reduce the buffer sizes with
a fractional rate dataflow (FRDF) extension and a buffer sharing technique. In
an H.263 encoder experiment, the FRDF extension and buffer
sharing technique enable us to reduce the buffer size by 67%. The
final buffer size is no larger than that of a manually written reference code.
Keywords
memory optimization, software synthesis, multimedia, dataflow
Chair: Rajesh Gupta
Organizers: Annette Reutter, Donatella Sciuto
-
20.1 Transformation Based Communication
and Clock Domain Refinement for System Design [p. 281]
-
Ingo Sander, Axel Jantsch
The ForSyDe methodology has been developed for system level design.
In this paper we present formal transformation methods for
the refinement of an abstract and formal system model into an implementation
model. The methodology defines two classes of design
transformations: (1) semantic-preserving transformations and
(2) design decisions. In particular we present and illustrate communication
and clock domain refinement by way of a digital equalizer
system.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design-Aids; J.6 [Computer-Aided
Engineering]: Computer-Aided Design (CAD)
General Terms
Design, Theory
Keywords
System Design, System Modeling, Design Refinement
-
20.2 Model Composition for Scheduling Analysis in Platform Design [p. 287]
-
Kai Richter, Dirk Ziegenbein, Marek Jersak, Rolf Ernst
Categories and Subject Descriptors
C.0 [General]: System Architecture; C.3 [Computer Systems Organization]:
Special-Purpose and Application-Based Systems - realtime
and embedded systems; C.4 [Computer Systems Organization]:
Performance of Systems
General Terms
Algorithms, Performance, Verification
Keywords
Platform-Based Design, Performance Analysis, Scheduling, Formal
Analysis
-
20.3 Timed Compiled-Code Simulation of Embedded Software
for Performance Analysis of SOC Design [p. 293]
-
Jong-Yeol Lee, In-Cheol Park
In this paper, a new timing generation method is proposed for the performance analysis of embedded software. The time stamp generation of I/O accesses is crucial to performance estimation and architecture exploration in the timed functional simulation, which simulates the whole design at a functional level with timing. A portable compiler is modified to generate time-deltas, which are the estimated cycle counts between two adjacent I/O accesses, by counting the cycles of the intermediate representation (IR) operations and using a machine description that contains information on a target processor. Since the proposed method is based on the machine-independent IR of a compiler, the method can be applied to various processors by changing the machine description. The experimental results show that the proposed method is effective in that the average estimation error is about 2% and the maximum speed-up over the corresponding instruction-set simulators is about 300 times. The proposed method is also verified in a timed functional simulation environment.
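The time-delta idea can be pictured in a few lines: walk the compiler's intermediate representation, accumulate estimated cycles from a machine-description table, and emit the accumulated count at each I/O access. The opcode costs and IR stream below are invented stand-ins for the modified compiler's real IR and machine description.

    # Sketch of compile-time time-delta generation: accumulate estimated
    # cycles of IR operations and emit the count at each I/O access.
    # Opcode costs and the IR stream are invented stand-ins.
    COST = {"add": 1, "mul": 3, "load": 2, "store": 2}   # assumed machine description

    ir = ["load", "add", "mul", "io", "add", "add", "store", "io"]

    deltas, acc = [], 0
    for op in ir:
        if op == "io":
            deltas.append(acc)       # time stamp delta at this I/O access
            acc = 0
        else:
            acc += COST[op]
    print("time-deltas between I/O accesses:", deltas)   # [6, 4]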
Chair: Ziyad Hanna
Organizer: Shin-ichi Minato
-
21.1 Automated Equivalence Checking of Switch Level Circuits [p. 299]
-
Simon Jolly, Atanas Parashkevov, Tim McDougall
A chip that is required to meet strict operating criteria in terms of speed,
power, or area is commonly custom designed at the switch level. Traditional
techniques for verifying these designs, based on simulation, are expensive
in terms of resources and cannot completely guarantee correct operation.
Formal verification methods, on the other hand, provide for a
complete proof of correctness, and require less effort to set up. This paper
presents Motorola's Switch Level Verification (SLV) tool, which employs
detailed switch level analysis to model the behavior of MOS transistors
and obtain an equivalent RTL model. This tool has been used for equivalence
checking at the switch level for several years within Motorola for
the PowerPC, M*Core and DSP custom blocks. We focus on the novel
techniques employed in SLV, particularly in the areas of pre-charged and
sequential logic analysis, and provide details on the automated and integrated
equivalence checking flow in which the tool is used.
Categories and Subject Descriptors
J.6 [Computer-Aided Engineering]: Computer-Aided Design.
General Terms
Algorithms, Design, Verification.
Keywords
Custom design, switch level analysis, equivalence checking, formal verification,
MOS circuits, VLSI design.
-
21.2 A Practical and Efficient Method for Compare-Point Matching [p. 305]
-
Demos Anastasakis, Robert Damiano, Hi-Keung Tony Ma, Ted Stanion
An important step in using combinational equivalence
checkers to verify sequential designs is identifying and
matching corresponding compare-points in the two
sequential designs to be verified. Both non-function-based and
function-based matching methods are usually employed in
commercial verification tools. In this paper, we describe a
heuristic algorithm using ATPG for matching compare-points
based on the functionality of the combinational
blocks in the sequential designs. Results on industrial-sized
circuits show our methods are both practical and efficient.
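As background, one simple function-based matching ingredient (simpler than this paper's ATPG-based heuristic) pairs registers whose input-cone functions agree on a set of random vectors. In the toy sketch below the cone functions are lambdas standing in for real netlist cones, and any candidate pair found this way would still need to be formally proven equivalent.

    # Toy function-based matching: registers whose input-cone functions agree
    # on random vectors become candidate compare-point pairs. Cones are toy
    # lambdas here; candidates still need formal equivalence proofs.
    import random

    random.seed(1)
    golden = {"r1": lambda a, b, c: a & b, "r2": lambda a, b, c: a ^ c}
    revised = {"u7": lambda a, b, c: a ^ c, "u9": lambda a, b, c: b & a}

    vectors = [tuple(random.getrandbits(1) for _ in range(3)) for _ in range(64)]
    sig = lambda fn: tuple(fn(*v) for v in vectors)

    golden_sigs = {sig(fn): name for name, fn in golden.items()}
    matches = {name: golden_sigs.get(sig(fn)) for name, fn in revised.items()}
    print(matches)   # expected: {'u7': 'r2', 'u9': 'r1'}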
Categories and Subject Descriptors
J.6 [Computer-aided engineering]: Verification, compare-point
matching.
General Terms
Algorithms, Experimentation, Verification.
Keywords
Combinational verification, equivalence checking, latch
mapping.
-
21.3 Self-referential Verification of Gate-level Implementations of Arithmetic Circuits [p. 311]
-
Ying-Tsai Chang, Kwang-Ting (Tim) Cheng
Verification of gate-level implementations of arithmetic circuits is
challenging due to a number of reasons: the existence of some
hard-to-verify arithmetic operators (e.g. multiplication), the use of
different operand ordering, the incorporation of merged arithmetic
with cross-operator implementations, and the employment of circuit
transformations based on arithmetic relations. It is hence a
peculiar problem that does not fit quite well into the existing RTL-to-gate
equivalence checking methodology. In this paper, we propose
a self-referential functional verification approach which uses
the gate-level implementation of the arithmetic circuit under verification
to verify itself. Specifically, the verification task is decomposed
into a sequence of equivalence checking subproblems,
each of which compares circuit pairs derived from the implementation
under verification based on the proposed self-referential functional
equations. A decomposition-based heuristic using structural
information is employed to guide the verification process for better
efficiency. Experimental results on a number of implementations
of the multiply-add units and the inner product units with different
architectures demonstrate the versatility of this approach.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids -
verification
General Terms
Algorithm, Verification
Keywords
Arithmetic circuit verification
Chair: Michael Santarini
Organizers: Sudhakar Jilla, Mark Miller
Panelists: Tommy Eng, Sandeep Khanna, Kamalesh Ruparel, Tom Russell, Kazu
Yamada [p. 317]
-
-
The traditional ASIC netlist handoff is changing - but to what? Is
RTL handoff finally a reality? Or, will a placement-based handoff
model emerge? Are differences among underlying tool
technologies and methodologies only cosmetic? Or, are there
fundamental business and IP distinctions? These and other
questions will be discussed as the panel examines the future of the
designer - ASIC vendor - EDA vendor relationship.
Chair: Joerg Henkel
Organizers: Marco Di Natale, Xiaobo (Sharon) Hu
-
23.1 Software Synthesis from Synchronous Specifications Using Logic Simulation Techniques [p. 319]
-
Yunjian Jiang, Robert K. Brayton
This paper addresses the problem of automatic generation of implementation
software from high-level functional specifications in
the context of embedded system on chip designs. Software design
complexity for embedded systems has increased so much that a
high-level functional programming paradigm needs to be adopted
for formal verifiability, maintainability and short time-to-market.
We propose a framework for efficiently generating implementation
software from a synchronous state machine specification for embedded
control systems. The framework is generic enough to allow
hardware/software partitioning for a given architecture platform. It is
demonstrated that logic optimization and simulation techniques
can be combined to produce fast execution code for such embedded
systems. Specifically, we propose a framework for software synthesis
from multi-valued logic, including fast evaluation of logic functions,
and scheduling techniques for node execution. Experiments
are performed to show the initial results of our algorithms in this
framework.
-
23.2 Complex Library Mapping for Embedded Software Using Symbolic Algebra [p. 325]
-
Armita Peymandoust, Tajana Simunic, Giovanni De Micheli
Embedded software designers often use libraries that have been
pre-optimized for a given processor to achieve higher code
quality. However, using such libraries in legacy code
optimization is nontrivial and typically requires manual
intervention. This paper presents a methodology that maps
algorithmic constructs of the software specification to a library of
complex software elements. This library-mapping step is
automated by using symbolic algebra techniques. We illustrate
the advantages of our methodology by optimizing an algorithmic
level description of MPEG Layer III (MP3) audio decoder for the
Badge4 [2] portable embedded system. During the optimization
process we use commercially available libraries with complex
elements ranging from simple mathematical functions such as
exp to the IDCT routine. We implemented and measured the
performance and energy consumption of the MP3 decoder
software on Badge4 running the embedded Linux operating system.
The optimized MP3 audio decoder runs 300 times faster than the
original code obtained from the standards body while consuming
400 times less energy. Since our optimized MP3 decoder runs 3.5
times faster than real-time, additional energy can be saved by
using processor frequency and voltage scaling.
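A toy of the symbolic-algebra mapping step: test whether a polynomial appearing in the source matches the truncated series form of a library element, and if so rewrite it as a library call. The routine name fast_exp below is hypothetical, standing in for the commercial library elements the paper maps onto.

    # Toy symbolic mapping step: test whether a source polynomial matches
    # the truncated series of a library element; fast_exp is a hypothetical
    # library routine, standing in for real commercial elements.
    import sympy as sp

    x = sp.symbols("x")
    source_expr = 1 + x + x**2 / 2 + x**3 / 6          # polynomial found in code
    exp_series = sp.series(sp.exp(x), x, 0, 4).removeO()

    if sp.simplify(source_expr - exp_series) == 0:
        print("rewrite as library call: fast_exp(x)")  # hypothetical mapping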
Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]:
Microprocessor/microcomputer applications, Real-time and
embedded systems, Signal processing systems.
General Terms
Algorithms, Performance, Design, Experimentation, Theory.
Keywords
Embedded software optimization, Automated library mapping,
Symbolic algebra, Polynomial representation, Computation
intensive software.
-
23.3 Retargetable Binary Utilities [p. 331]
-
Maghsoud Abbaspour, Jianwen Zhu
Since software is playing an increasingly important role in system-on-chip,
retargetable compilation has been an active research area
in the last few years. However, the retargeting of equally important
downstream system tools, such as assemblers, linkers and debuggers,
has either been ignored, or falls short of meeting the requirements
of modern programming languages and operating systems.
In this paper, we present techniques that can automatically
retarget the GNU binutils tool kit, which contains a large array of
production-quality downstream tools. Other than having all the advantages
enjoyed by open-source software by aligning to a de facto
standard, our techniques are systematic, as a result of using a formal
model of instruction set architecture (ISA) and application binary
interface (ABI); and simple, as a result of leveraging free software
to the largest extent.
Categories and Subject Descriptors
D.3.4 [Processors]: Retargetable compilers
General Terms
Design, Languages
Chair: Ivo Bolsens
Organizers: Grant E. Martin, Kurt Keutzer
-
24.1 Exploiting Operation Level Parallelism through Dynamically Reconfigurable Datapaths [p. 337]
-
Zhining Huang, Sharad Malik
Increasing non-recurring engineering (NRE) and mask costs are
making it harder to turn to hardwired Application Specific
Integrated Circuit (ASIC) solutions for high performance
applications [12]. The volume required to amortize these high
costs has been increasing, making it increasingly expensive to
afford ASIC solutions for medium volume products. This has led
to designers seeking programmable solutions of varying sorts,
the so-called programmable platforms. These
programmable platforms span a large range from bit-level
programmable Field Programmable Gate Arrays (FPGAs), to
word-level programmable application-specific, and in some cases
even general-purpose processors. The programmability comes
with a power and performance overhead. Attempts to reduce this
overhead typically involve making some core hardwired ASIC-like
logic blocks accessible to the programmable elements. This
paper presents one such hybrid solution in this space – a relatively
simple processor with a dynamically reconfigurable datapath
acting as an accelerating co-processor. This datapath consists of
hardwired function units and reconfigurable interconnect. We
present a methodology for the design of these solutions and
illustrate it with two complete case studies: an MPEG 2 coder,
and a GSM coder, to show how significant speedups can be
obtained using relatively little hardware. The co-processor can be
viewed as a VLIW processor with a single instruction per kernel
loop. We compare the efficiency of exploiting the operation level
parallelism using classic VLIW processors and this proposed class
of dynamically configurable co-processors. This work is part of
the MESCAL project, which is geared towards developing design
environments for the development of application specific
platforms.
Categories and Subject Descriptors
J.6 [COMPUTER-AIDED ENGINEERING]: Computer-aided
design (CAD)
General Terms
Design, Performance
-
24.2 Dynamic Hardware Plugins in an FPGA with Partial Run-time Reconfiguration [p. 343]
-
Edson L. Horta, John W. Lockwood, David E. Taylor, David Parlour
Tools and a design methodology have been developed to
support partial run-time reconfiguration of FPGA logic on
the Field Programmable Port Extender. High-speed Internet
packet processing circuits on this platform are implemented
as Dynamic Hardware Plugin (DHP) modules that
fit within a specific region of an FPGA device. The PARBIT
tool has been developed to transform and restructure bitfiles
created by standard computer aided design tools into partial
bitstreams that program DHPs. The methodology allows
the platform to hot-swap application-specific DHP modules
without disturbing the operation of the rest of the system.
Keywords
FPGA, partial RTR, reconfiguration, hardware, modularity,
network, routing, packet, Internet, IP, platform computing
Categories and Subject Descriptors
B.7.2 [Hardware]: Circuits - Design Aids; B.7.1 [Hardware]:
Circuits - VLSI; B.4.3 [Hardware]: Input/Output
and Data Communications - Interconnections (Subsystems);
C.2.1 [Computer Systems Organization]: Computer-Communication
Networks - Network Architecture and Design
General Terms
Design, Experimentation
-
24.3 A Reconfigurable FPGA-Based Readback Signal Generator For Hard-Drive Read
Channel Simulator [p. 349]
-
Jinghuan Chen, Jaekyun Moon, Kia Bazargan
A hard disk readback signal generator designed to provide
noise-corrupted signals to a channel simulator has been implemented
on a Xilinx Virtex-E FPGA device. The generator
simulates pulses sensed by read heads in hard drives.
All major distortion and noise processes, such as intersymbol
interference, transition noise, electronics noise, head and
media nonlinearity, intertrack interference, and write timing
error, can be generated according to the statistics and parameters
defined by the user. Reconfigurable implementation
enables an update of the signal characteristics in runtime.
The user also has the flexibility to choose from a set of
bitstreams to simulate particular combinations of noise and
distortion. Such customized restructuring helps reduce the
area consumption and hence virtually increases the capacity
of the FPGA device. The time to generate the readback
signals has been reduced by four orders of magnitude compared to its software
counterpart.
Chair: Rob Aitken
Organizers: Miron Abramovici, T. M. Mak
-
25.1 Embedded Software-Based Self-Testing for SoC Design [p. 355]
-
A. Krstic, W.-C. Lai, L. Chen, K.-T. Cheng, S. Dey
At-speed testing of high-speed circuits is becoming increasingly difficult with external testers due to the growing gap between design and tester performance, growing cost of high-performance testers and increasing yield loss caused by inherent tester inaccuracy. Therefore, empowering the chip to test itself seems like a natural solution. Hardware-based self-testing techniques have limitations due to performance and area overhead and problems caused by the application of non-functional patterns.
Embedded software-based self-testing has recently become the focus of intense research. In this methodology, the programmable cores are used for on-chip test generation, measurement, response analysis and even diagnosis. After the programmable core on a System-on-Chip (SoC) has been self-tested, it can be reused for testing on-chip buses, interfaces and other non-programmable cores. The advantages of this methodology include at-speed testing, low design-for-testability overhead and application of functional patterns in the functional environment. In this paper, we give a survey and outline the roadmap and challenges of this emerging embedded software-based self-testing paradigm.
Categories and Subject Descriptors
B.8.1 [Integrated Circuits]: Performance and Reliability - reliability, testing and fault-tolerance.
General Terms
Algorithms, Performance, Reliability.
Keywords
VLSI test, SoC test, functional test, microprocessor test.
-
25.2 A Novel Wavelet Transform Based Transient Current
Analysis for Fault Detection and Localization [p. 361]
-
Swarup Bhunia, Kaushik Roy, Jaume Segura
Transient current (IDD) based testing has been often cited
and investigated as an alternative and/or supplement to quiescent
current (IDDQ) testing. While the potential of IDD
testing for fault detection has been established, there is no
known efficient method for fault diagnosis using IDD analysis.
In this paper, we present a novel integrated method
for fault detection and localization using wavelet transform
based IDD waveform analysis. The time-frequency resolution
property of wavelet transform helps us detect as well
as localize faults in digital CMOS circuits. Experiments
performed on measured data from a fabricated 8-bit shift
register and simulation data from more complex circuits
show promising results for both detection and localization.
The wavelet-based detection method shows better sensitivity
than spectral and time-domain methods. The effectiveness
of the localization method in the presence of process variation,
measurement noise and complex power supply network is
addressed.
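The localization idea rests on the time-frequency resolution of the wavelet transform: a defect-induced glitch in the IDD waveform shows up as a large detail coefficient at a specific time position. A minimal one-level Haar sketch follows; the waveform is synthetic, purely for illustration, not measured data.

    # Sketch: one-level Haar decomposition of a synthetic IDD waveform;
    # a defect glitch appears as a localized detail coefficient in time.
    import numpy as np

    t = np.linspace(0.0, 1.0, 256)
    idd = np.exp(-40 * t) * np.sin(2 * np.pi * 8 * t)   # nominal transient current
    idd[101:103] += 0.6                                  # injected defect glitch

    approx = (idd[0::2] + idd[1::2]) / np.sqrt(2)        # low-pass (coarse) part
    detail = (idd[0::2] - idd[1::2]) / np.sqrt(2)        # high-pass (detail) part

    k = int(np.argmax(np.abs(detail)))
    print("largest detail coefficient near sample", 2 * k)   # ~ glitch position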
Categories and Subject Descriptors
B.8.2 [Hardware]: Performance and Reliability - Reliability,
Testing, and Fault-Tolerance
General Terms
Algorithms, Reliability, Experimentation
Keywords
Transient current (IDD), wavelet transform, fault localization
-
25.3 Signal Integrity Fault Analysis Using Reduced-Order Modeling [p. 367]
-
Amir Attarha, Mehrdad Nourani
This paper aims at the analysis of signal integrity for the purpose of
testing high-speed interconnects. This requires taking into account
the effect of inputs as well as parasitic RLC elements of the interconnect.
To improve the analysis/simulation time in integrity fault
testing, we use reduced-order modeling that essentially performs
the analysis in the frequency domain. To demonstrate the generality
and usefulness of our method, we also discuss its application for
test pattern generation targeting signal integrity loss.
-
25.4 Enhancing Test Efficiency for Delay Fault Testing Using Multiple-Clock Schemes [p. 371]
-
Jing-Jia Liou, Li-C. Wang, Kwang-Ting Cheng, Jennifer Dworak, M. Ray Mercer, Rohit Kapur,
Thomas W. Williams
In conventional delay testing, the test clock is a single pre-defined
parameter that is often set to be the same as the system clock. This
paper discusses the potential of enhancing test efficiency by using
multiple clock frequencies. The intuition behind our work is that
for a given set of AC delay patterns, a carefully-selected, tighter
clock is more effective at screening out potentially
defective chips. Then, by using a smarter test clock scheme and
combining with a second set of AC delay patterns, the overall quality
of AC delay test can be enhanced while the cost of including the
second pattern set can be minimized. We demonstrate these concepts
through analysis and experiments using a statistical timing
analysis framework with defect-injected simulation.
Categories and Subject Descriptors
B.8.1 [Hardware]: Reliability, Testing, and Fault-Tolerance
General Terms
Experimentation, Measurement, Reliability
Keywords
Delay Testing, Statistical Timing Analysis, Transition Fault Model
Chair: Ahmed A. Jerraya
Organizers: Ahmed A. Jerraya, Kurt Keutzer
-
26.1 Going Mobile: The Next Horizon for Multi-million Gate
Designs in the Semi-Conductor Industry [p. 375]
-
Christian Berthet
The complexity of a System-on-Chip design is not only in the
million transistors packed in a square millimeter. The major
challenge for technical success of a SoC is to make sure that
millions of lines of software fit in with millions of gates.
In this paper, the problem of multi-million-gate design is
illustrated from the viewpoint of a practical development of a
complex digital system done at STMicroelectronics for a
GSM/GPRS cellular application.
Categories and Subject Descriptors
C.3 [Computer Systems Organisation]: Special-purpose and
application-based systems - real-time and embedded systems.
General Terms
Design.
Keywords
SoC Design, HW/SW co-design.
Chair: Sachin Sapatnekar
Organizers: Abhijit Dharchoudhury, Tadahiro Kuroda
-
27.1 HiPRIME: Hierarchical and Passivity Reserved Interconnect Macromodeling Engine for RLKC Power Delivery [p. 379]
-
Yahong Cao, Yu-Min Lee, Tsung-Hao Chen, Charlie Chung-Ping Chen
This paper proposes a general hierarchical analysis methodology,
HiPRIME, to efficiently analyze RLKC power delivery
systems. After partitioning the circuits into blocks, we develop
and apply the IEKS (Improved Extended Krylov Subspace)
method to build the Multi-port Norton Equivalent
circuits which transform all the internal sources to Norton
current sources at ports. Since there are no active elements
inside the Norton circuits, passive or realizable model order
reduction techniques such as PRIMA can be applied.
To further reduce the top-level hierarchy runtime, we develop
a second-level model reduction algorithm and prove
its passivity. Experimental results show 400-700X runtime
improvement with less than 0.2% error.
-
27.2 High-Level Current Macro-Model For Power-Grid Analysis [p. 385]
-
Srinivas Bodapati, Farid N. Najm
We present a frequency domain current macro-modeling technique
for capturing the dependence of the block current
waveform on its input vectors. The macro-model is based on
estimating the Discrete Cosine Transform (DCT) of the current
waveform as a function of input vector pair and then
taking the inverse transform to estimate the time domain
current waveform. The input vector pairs are partitioned
according to Hamming distance and a current macro-model
is built for each Hamming distance using regression. Regression
is done on a set of current waveforms generated for
each circuit, using HSPICE. The average relative error in
peak current estimation using the current macro-model is
less than 20%.
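A toy version of the macro-model recipe can run end to end in a few lines: take the DCT of training current waveforms, regress each retained coefficient on the input-pair Hamming distance (as in the paper), and inverse-DCT to predict a time-domain waveform. All waveforms below are synthetic stand-ins for the HSPICE training data used in the paper.

    # Toy DCT-based current macro-model: regress retained DCT coefficients
    # on Hamming distance, then inverse-DCT to predict a waveform.
    import numpy as np
    from scipy.fft import dct, idct

    rng = np.random.default_rng(0)
    T, K = 64, 8                            # waveform samples, DCT coeffs kept

    def fake_current(hd):                   # synthetic stand-in for HSPICE runs
        t = np.linspace(0, 1, T)
        return hd * np.exp(-12 * t) * np.sin(14 * t) + 0.02 * rng.standard_normal(T)

    hds = np.repeat(np.arange(1, 9), 10)    # training Hamming distances
    coefs = np.array([dct(fake_current(h), norm="ortho")[:K] for h in hds])

    A = np.column_stack([hds, np.ones_like(hds)])   # linear regression per coeff
    W, *_ = np.linalg.lstsq(A, coefs, rcond=None)

    def predict(hd):                        # macro-model: input feature -> waveform
        c = np.zeros(T)
        c[:K] = np.array([hd, 1.0]) @ W
        return idct(c, norm="ortho")

    print("predicted peak current at HD = 5:", round(float(predict(5).max()), 3))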
Categories and Subject Descriptors
B.7 [Hardware]: Integrated Circuits - CAD; B.7.2 [Integrated
Circuits]: Design Aids - Modeling
General Terms
Algorithms
Keywords
Power grid, Current macro-model, DCT
-
27.3 Macro-Modeling Concepts For The Chip Electrical Interface [p. 391]
-
Brian W. Amick, Claude R. Gauthier, Dean Liu
The power delivery network is made up of passive elements in the
distribution network, as well as the active transistor loads. A chip
typically has three types of power supplies that require attention:
core, I/O, and analog. Core circuits consist of digital circuits and
have the largest current demand. In addition to all of the system
issues/models for the core, modeling the I/O subsystem has the
additional requirement of modeling return paths and discontinuities.
The analog circuits present yet a different challenge to the macromodeling
of the supply network because they place a tight demand
on supply variations. This paper presents a design methodology on
how to generate macro-models of the entire chip electrical interface.
This methodology can be used by the chip, package, and system
designers and is being used to design high-reliability servers.
Categories and Subject Descriptors
C.5.3 [Computer System Implementation]: VLSI Systems.
General Terms
Performance, Design, Reliability.
Keywords
VLSI Power Distribution, Inductance, High Speed Microprocessor
Design, Analog and I/O Power Delivery.
-
27.4 Modeling and Analysis of Regular Symmetrically
Structured Power/Ground Distribution Networks [p. 395]
-
Hui Zheng, Lawrence T. Pileggi
In this paper we propose a novel and efficient methodology for
modeling and analysis of regular symmetrically-structured power/
ground distribution networks. The modeling of inductive effects is
simplified by a folding technique which exploits the symmetry in
the power/ground distribution. Furthermore, employment of susceptance
[10,11] (inverse of inductance) models enables further
simplification of the analysis, and is also shown to preserve the
symmetric positive definiteness of the circuit equations. Experimental
results demonstrate that our approach can provide up to 8x
memory savings and up to 10x speedup over the already efficient
simulation based on the original sparse susceptance matrix without
loss of accuracy. Importantly, this work demonstrates that by employing
limited regularity, one can create excellent power/ground
distribution designs that are dramatically simpler to analyze, and
therefore amenable to more powerful global design optimization.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits] Design Aids - verification
General Terms
Design, Verification
Keywords
Power/Ground Distribution, Susceptance, Folding Technique, Design
Regularity
-
27.5 Clock Tree Optimization in Synchronous CMOS Digital Circuits for Substrate Noise
Reduction Using Folding of Supply Current Transients [p. 399]
-
Mustafa Badaroglu, Kris Tiri, Stéphane Donnay, Piet Wambacq, Ingrid Verbauwhede,
Georges Gielen, Hugo De Man
In a synchronous clock distribution network with zero latencies,
digital circuits switch simultaneously on the clock edge, therefore
they generate substrate noise due to the sharp peaks on the supply
current. We present a novel methodology that optimizes the clock tree
for less substrate noise generation by using statistical single-cycle supply
current profiles computed for every clock region taking the timing
constraints into account. Our methodology is novel as it uses an
error-driven compressed data set during the optimization over a
number of clock regions specified for a significant reduction in
substrate noise. It also produces a quality analysis of the computed
latencies as a function of the clock skew. The experimental results
show a more than 2x reduction in substrate noise generation for circuits
having four clock regions whose latencies are optimized.
Categories and Subject Descriptors
B.5.1 [Register-Transfer-Level Implementation]: Design - datapath
design. B.6.1 [Logic Design]: Design Styles - sequential
circuits. B.6.3 [Logic Design]: Design Aids - optimization,
simulation. B.7.1 [Integrated Circuits]: Types and Design Styles- VLSI.
B.8.2 [Performance and Reliability]: Performance
Analysis and Design Aids.
General Terms
Algorithms, Design, Performance, Reliability.
Keywords
Substrate noise, di/dt noise, low-noise digital design, clock
distribution networks, supply current shaping and optimization.
Chair: Marek Perkowski
Organizers: Soha M. Hassoun, Yusuke Matsunaga
-
28.1 Resynthesis and Peephole Transformations for the Optimization
of Large-Scale Asynchronous Systems [p. 405]
-
Tiberiu Chelcea, Steven M. Nowick
Several approaches have been proposed for the syntax-directed compilation
of asynchronous circuits from high-level specification languages,
such as Balsa and Tangram. Both compilers have been successfully
used in large real-world applications; however, in practice,
these methods suffer from significant performance overheads
due to their reliance on straightforward syntax-directed translation.
This paper introduces a powerful new set of transformations, and
an extended channel-based language to support them, which can be
used as an optimizing back-end for Balsa. The transforms described
in this paper fall into two categories: resynthesis and peephole. The
proposed optimization techniques have been fully integrated into a
comprehensive asynchronous CAD package, Balsa. Experimental
results on several substantial design examples indicate significant
performance improvements.
-
28.2 Design of Asynchronous Circuits by Synchronous CAD Tools [p. 411]
-
Alex Kondratyev, Kelvin Lwin
The roadblock to wide acceptance of asynchronous methodology
is poor CAD support. Current asynchronous design tools require a
significant re-education of designers, and their features are far
behind synchronous commercial tools. This paper considers a
particular subclass of asynchronous circuits (Null Convention
Logic or NCL) and suggests a design flow that is based entirely
on commercial CAD tools. This new design flow shows a
significant area improvement over known flows based on NCL.
-
28.3 Implementing Asynchronous Circuits using a Conventional EDA Tool-Flow [p. 415]
-
Christos P. Sotiriou
This paper presents an approach by which asynchronous circuits
can be realised with a conventional EDA tool flow and conventional
standard cell libraries. Based on a gate-level asynchronous circuit
implementation technique, direct-mapping, and by identifying the
delay constraints and exploiting certain EDA tool features, this paper
demonstrates that a conventional EDA tool flow can be used to
describe, place, route and timing-verify asynchronous circuits.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles
General Terms
Design, Experimentation, Standardization
Keywords
Asynchronous, EDA, Tool-Flow
-
28.4 Transformation Rules for Designing CNOT-based Quantum Circuits [p. 419]
-
Kazuo Iwama, Yahiko Kambayashi, Shigeru Yamashita
This paper gives a simple but nontrivial set of local transformation
rules for Control-NOT(CNOT)-based combinatorial circuits. It is
shown that this rule set is complete, namely, for any two equivalent
circuits, S1 and S2, there is a sequence of transformations, each of
them in the rule set, which changes S1 to S2. Our motivation is to
use this rule set for developing a design theory for quantum circuits
whose Boolean logic parts should be implemented by CNOT-based
circuits. As a preliminary example, we give a design procedure
based on our transformation rules which reduces the cost of CNOT-based
circuits.
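A useful background fact when reasoning about CNOT-based circuits (not the paper's rule set itself): each CNOT gate is linear over GF(2), so an entire CNOT circuit is an invertible 0/1 matrix, and two circuits are equivalent exactly when their matrices agree. The two example circuits in the sketch below are invented.

    # Background fact for CNOT-based circuits: CNOT(c, t) is linear over
    # GF(2), so a circuit is an invertible 0/1 matrix and two circuits
    # are equivalent iff the matrices agree. Example circuits invented.
    import numpy as np

    def cnot_matrix(gates, n):
        m = np.eye(n, dtype=np.uint8)
        for c, t in gates:                 # CNOT(c, t): row t ^= row c
            m[t, :] ^= m[c, :]
        return m

    s1 = [(0, 1), (1, 2), (0, 1)]          # circuit S1
    s2 = [(1, 2), (0, 2)]                  # circuit S2
    print(np.array_equal(cnot_matrix(s1, 3), cnot_matrix(s2, 3)))   # True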
Categories and Subject Descriptors
B.6.m [LOGIC DESIGN]: Miscellaneous
General Terms
Design, Theory
Keywords
Quantum Circuit, CNOT Gate, Local Transformation Rules
-
28.5 Fast Three-Level Logic Minimization Based on Autosymmetry [p. 425]
-
Anna Bernasconi, Valentina Ciriani, Fabrizio Luccio, Linda Pagli
Sum of Pseudoproducts (SPP) is a three level logic synthesis technique developed
in recent years. In this framework we exploit the "regularity" of Boolean
functions to decrease minimization time. Our main results are: 1) the
regularity of a Boolean function f of n variables is expressed by its
autosymmetry degree k (with 0 <= k <= n), where k = 0 means no
regularity (that is, we are not able to provide any advantage over standard
synthesis); 2) for k >= 1 the function is autosymmetric, and a new
function fk is identified in polynomial time: fk is
"equivalent" to, but smaller than f, and depends on n - k variables only;
3) given a minimal SPP form for fk, a minimal SPP form for f
is built in linear time; 4) experimental results show that 61% of the
functions in the classical ESPRESSO benchmark suite are autosymmetric,
and the SPP minimization time for them is critically reduced; we can also solve
cases otherwise practically intractable. We finally discuss the role and
meaning of autosymmetry.
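A simplified reading of the autosymmetry degree: the binary translations a under which f is invariant form a GF(2) vector space, and k is its dimension. The brute-force check below works only for tiny n and is a simplification of the paper's more general definition.

    # Simplified autosymmetry check on a truth table: translations a with
    # f(x ^ a) == f(x) for all x form a GF(2) vector space of dimension k.
    # (A simplification of the paper's definition; brute force, tiny n only.)
    def autosymmetry_degree(f, n):
        L = [a for a in range(2 ** n)
             if all(f[x ^ a] == f[x] for x in range(2 ** n))]
        return len(L).bit_length() - 1     # |L| is always a power of two

    f = [0, 1, 1, 0]                       # f = x0 xor x1, n = 2
    print(autosymmetry_degree(f, 2))       # 1: reduced function has n - k = 1 variable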
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Automatic Synthesis, Optimization.
General Terms
Algorithms, Design, Theory.
Keywords:
Three-Level Logic, Synthesis, Autosymmetry.
Chair: C.-J. Richard Shi
Organizers: Joel R. Phillips, Kartikeya Mayaram
-
29.1 An Efficient Optimization-based Technique to Generate Posynomial
Performance Models for Analog Integrated Circuits [p. 431]
-
Walter Daems, George Gielen, Willy Sansen
This paper presents a new direct-fitting method to generate posynomial
response surface models with arbitrary constant exponents
for linear and nonlinear performance parameters of analog integrated
circuits. Posynomial models enable the use of efficient geometric
programming techniques for circuit sizing and optimization.
The automatic generation avoids the time-consuming nature
and inaccuracies of handcrafted analytic model generation. The
technique is based on the fitting of posynomial model templates
to numerical data from SPICE simulations. Attention is paid to
estimating the relative "goodness-of-fit" of the generated models.
Experimental results illustrate the significantly better accuracy of
the new approach.
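Because a posynomial with fixed exponents is linear in its nonnegative coefficients, template fitting reduces to a nonnegative least-squares problem. The sketch below fits an assumed monomial template to synthetic "simulation" data, standing in for the SPICE data used in the paper.

    # Sketch: fitting a posynomial template with fixed exponents reduces to
    # nonnegative least squares in the coefficients. Data is synthetic;
    # the paper fits SPICE-simulated performance data instead.
    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(0)
    x = rng.uniform(1.0, 10.0, size=(200, 2))            # design variables (W, L)

    # Assumed template: f = c0*W/L + c1*L/W + c2*W*L (exponents fixed a priori)
    terms = np.column_stack([x[:, 0] / x[:, 1],
                             x[:, 1] / x[:, 0],
                             x[:, 0] * x[:, 1]])

    truth = terms @ np.array([2.0, 0.5, 0.1])            # pretend SPICE response
    y = truth * (1 + 0.01 * rng.standard_normal(200))    # 1% "simulation" noise

    coef, resid = nnls(terms, y)                         # coefficients >= 0
    print("fitted coefficients:", np.round(coef, 3))     # ~ [2.0, 0.5, 0.1]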
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids; B.8.2 [Performance
and Reliability]: Performance Analysis and Design Aids; I.6.5
[Simulation and Modeling]: Model Development
General Terms
Performance, Design, Algorithms
Keywords
Performance Modeling for Analog Circuits, Posynomial Response
Surface Modeling, Geometric Programming
-
29.2 Remembrance of Circuits Past: Macromodeling
by Data Mining in Large Analog Design Spaces [p. 437]
-
Hongzhou Liu, Amit Singhee, Rob A. Rutenbar, L. Richard Carley
The introduction of simulation-based analog synthesis tools creates a
new challenge for analog modeling. These tools routinely visit 10^3 to
10^5 fully simulated circuit solution candidates. What might we do
with all this circuit data? We show how to adapt recent ideas from
large-scale data mining to build models that capture significant
regions of this visited performance space, parameterized by
variables manipulated by synthesis, trained by the data points
visited during synthesis. Experimental results show that we can
automatically build useful nonlinear regression models for large
analog design spaces.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design aids - verification
General Terms
Algorithms
-
29.3 Optimal Design of Delta-Sigma ADCs by Design Space Exploration [p. 443]
-
Ovidiu Bajdechi, Georges Gielen, Johan H. Huijsing
An algorithm for architecture-level exploration of delta-sigma ADC
design space is presented. The algorithm finds an optimal
solution by exhaustively exploring both single-loop and cascaded
architectures, with single-bit or multi-bit quantizer,
for a range of oversampling ratios. A fast filter-level step
evaluates the performance of all loop-filter topologies and
passes the accepted solutions to the architecture-level optimization
step which maps the filters on feasible architectures
and evaluates their performance. The power consumption of
each accepted architecture is estimated and the best top-ten
solutions in terms of the ratio of peak SNDR versus power
consumption are further optimized for yield. Experimental
results for two different design targets are presented. They
show that previously published solutions are among the best
architectures for a given target but that better solutions can
be designed.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-Aided Engineering
General Terms
Design
Keywords
ADC, CAD, delta-sigma
-
29.4 Systematic Design of a 200 MS/s 8-bit Interpolating/Averaging A/D Converter [p. 449]
-
J. Vandenbussche, K. Uyttenhove, E. Lauwers, M. Steyaert, G. Gielen
The systematic design of a high-speed, high-accuracy Nyquist-rate
A/D converter is proposed. The presented design
methodology covers the complete flow and is supported by
software tools. A generic behavioral model is used to explore the
A/D converter's specifications during high-level design and
exploration. The inputs to the flow are the specifications of the
A/D converter and the technology process. The result is a
generated layout and the corresponding extracted behavioral
model. The approach has been applied to a real-life test case,
where a Nyquist-rate 8-bit 200 MS/s 4-2 interpolating/averaging
A/D converter was developed for a WLAN application.
Categories and Subject Descriptors
B.7.m Integrated Circuits: miscellaneous
General Terms
Design
Keywords
A/D converters, Interpolating, Flash, Simulated Annealing.
Chair: Massoud Pedram
Organizers: Chaitali Chakrabarti, Sarma Vrudhula
-
30.1 Petri Net Modeling of Gate and Interconnect Delays for Power Estimation [p. 455]
-
Ashok K. Murugavel, N. Ranganathan
In this paper, a new type of Petri net, called the Hierarchical Colored
Hardware Petri net (HCHPN), is proposed to model real-delay switching
activity for power estimation. The logic circuit is converted into an
HCHPN and simulated as a Petri net to obtain the switching activity
estimate and thus the power values. The method is accurate and is significantly
faster than other simulative methods. The HCHPN yields
an average error of 4.9% with respect to Hspice for the ISCAS '85
benchmark circuits. The per-pattern simulation time is about 46
times less than that of PowerMill.
-
30.2 Power Estimation in Global Interconnects and its Reduction Using
a Novel Repeater Optimization Methodology [p. 461]
-
Pawan Kapur, Gaurav Chandra, Krishna C. Saraswat
The purpose of this work is two fold. First, to quantify and
establish future trends for the dynamic power dissipation in global
wires of high performance integrated circuits. Second, to develop a
novel and efficient delay-power tradeoff formulation for
minimizing power due to repeaters, which can otherwise constitute
50% of total global wire power dissipation. Using the closed form
solutions from this formulation, power savings of 50% on repeaters
are shown with minimal delay penalties of about 5% at the 50 nm
technology node. These closed-form, analytical solutions provide a
fast and powerful tool for designers to minimize power.
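The flavor of the delay-power tradeoff can be reproduced with the classic repeater-insertion formulas: compute the delay-optimal repeater count and size, then back both off and record the delay penalty against the repeater power saved. All wire and device numbers below are illustrative assumptions, not the paper's model; note how flat the delay curve is near the optimum, which is exactly the slack a power-oriented formulation can exploit.

    # Classic repeater-insertion tradeoff (Bakoglu-style formulas); all wire
    # and device numbers are illustrative assumptions.
    import numpy as np

    R0, C0 = 10e3, 1e-15      # driver resistance (ohm), input cap (F), assumed
    Rw, Cw = 50e3, 2e-12      # total wire resistance (ohm) and cap (F), assumed

    def line_delay(k, h):
        # Elmore delay of a line split into k segments by size-h repeaters
        seg = 0.7 * (R0 / h) * (C0 * h + Cw / k) \
            + (Rw / k) * (0.4 * Cw / k + 0.7 * C0 * h)
        return k * seg

    k_opt = np.sqrt(0.4 * Rw * Cw / (0.7 * R0 * C0))   # delay-optimal count
    h_opt = np.sqrt((R0 * Cw) / (Rw * C0))             # delay-optimal size
    d_opt = line_delay(k_opt, h_opt)

    for s in (1.0, 0.7, 0.5):                          # back off count and size
        d_pen = 100 * (line_delay(s * k_opt, s * h_opt) / d_opt - 1)
        p_save = 100 * (1 - s * s)                     # repeater cap (power) ~ k*h
        print(f"scale {s:.1f}: delay +{d_pen:5.1f}%, repeater power -{p_save:4.1f}%")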
-
30.3 Low-Swing Clock Domino Logic Incorporating Dual Supply and Dual Threshold Voltage [p. 467]
-
Seong-Ook Jung, Ki-Wook Kim, Sung-Mo Kang
High-speed domino logic now prevails in the performance-critical blocks of
a chip. A Low Voltage Swing Clock (LVSC) domino logic family is developed
for substantial dynamic power savings. To boost the transition speed in the
proposed circuitry, a well-established dual threshold voltage technique is
exploited. A dual supply voltage technique in the LVSC domino logic is
geared to reduce power consumption in the clock tree and logic gates
effectively. The Delay Constrained Power Optimization (DCPO) algorithm allocates
low supply voltage to logic gates such that dynamic power consumed by logic
gates is minimized. Delay time variations due to gate-to-source voltage
change and input signal arrival time difference are considered for accurate
timing analysis in DCPO.
Categories and Subject Descriptors
B.6 [Hardware]: Logic Design; B.7 [Hardware]: Integrated Circuits
General Terms
Design
Keywords
domino logic, low swing clock, dual supply voltage, dual threshold
voltage, low power
-
30.4 DRG-Cache: A Data Retention Gated-Ground Cache for Low Power [p. 473]
-
Amit Agarwal, Hai Li, Kaushik Roy
In this paper we propose a novel integrated circuit and
architectural level technique to reduce leakage power
consumption in high-performance cache memories using a single-Vt
(transistor threshold voltage) process. We utilize the concept of
Gated-Ground [5] (NMOS transistor inserted between Ground
line and SRAM cell) to achieve reduction in leakage energy
without significantly affecting performance. Experimental results
on gated-Ground caches show that data is retained (DRG-Cache)
even if the memory is put in the standby mode of operation.
Data is restored when the gated-Ground transistor is turned on.
Turning off the gated-Ground transistor in turn gives large
reduction in leakage power. This technique requires no extra
circuitry; the row decoder itself can be used to control the gated-Ground
transistor. The technique is applicable to data and
instruction caches as well as different levels of cache hierarchy
such as the L1, L2, or L3 caches. We fabricated a test chip in
TSMC 0.25µm technology to show the data retention capability and
the cell stability of DRG-cache. Our simulation results on 100nm
and 70nm processes (Berkeley Predictive Technology Model)
show 16.5% and 27% reduction in consumed energy in L1 cache
and 50% and 47% reduction in L2 cache with less than 5% impact
on execution time and less than a 4% increase in area overhead.
Categories and Subject Descriptors
B.3.2 [Memory Structure]: Design Styles --- Cache memories;
B.3.1 [Memory Structure]: Semiconductor Memories --- Static
memory (SRAM); B.7.1 [Integrated Circuits]: Types and Design
Styles --- Memory technology.
General Terms: Design, Performance and
Experimentation.
Keywords: Gated-ground, SRAM, low leakage cache.
Chair: Gary Smith
Organizers: Daya Nadamuni, Sharad Malik
Panelists: Rick Chapman, John Fogelin, Kurt Keutzer, Grant Martin, Brian Bailey
[p. 479]
-
-
As designers struggle with developing application solutions
consisting of complex systems-on-a-chip with a significant
software component, they must deal with a diversity of tools with
very different philosophies and assumptions, to help manage this
task. On one hand are tools which assume a clean separation
between the hardware and software parts of the design with an
abstraction of the hardware available for software development.
On the other hand are tools that try to handle the hardware and
software parts of the design concurrently. What drives these
different philosophies? Which of these is critical for emerging
system designs? Which of these is viable going forward? Our
panel of experts consisting of designers, embedded software tool
providers, system design tool providers and an academic will
answer these challenging questions.
Chair: Rajendran Panda
Organizers: Renu Mehra, Sarma Vrudhula
-
32.1 Dynamic and Leakage Power Reduction in MTCMOS
Circuits Using an Automated Efficient Gate Clustering Technique [p. 480]
-
Mohab Anis, Shawki Areibi, Mohamed Mahmoud, Mohamed Elmasry
Reducing power dissipation is one of the principal concerns
in VLSI design today. Scaling causes subthreshold leakage currents
to become a large component of total power dissipation. This
paper presents two techniques for efficient gate clustering in MTCMOS
circuits by modeling the problem via Bin-Packing (BP) and
Set-Partitioning (SP) techniques. An automated solution is presented,
and both techniques are applied to six benchmarks to verify
functionality. Both methodologies offer significant reduction
in both dynamic and leakage power over previous techniques during
the active and standby modes respectively. Furthermore, the
SP technique takes the circuit's routing complexity into consideration
which is critical for Deep Sub-Micron (DSM) implementations.
Sufficient performance is achieved, while significantly reducing
the overall sleep transistors' area. Results obtained indicate
that our proposed techniques can achieve on average 90% savings
for leakage power and 15% savings for dynamic power.
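The bin-packing view can be pictured with a first-fit-decreasing pass: each gate carries a peak-current demand, each sleep transistor a current capacity, and gates are packed so few sleep transistors (and thus little area) are needed. All numbers below are invented; the paper's formulation additionally accounts for mutually exclusive switching and routing complexity.

    # First-fit-decreasing sketch of MTCMOS gate clustering as bin packing:
    # gates (peak currents, mA) are packed onto sleep transistors of fixed
    # current capacity. All numbers are invented for illustration.
    gates = {"g1": 2.1, "g2": 0.7, "g3": 1.4, "g4": 0.9, "g5": 1.8, "g6": 0.4}
    capacity = 3.0                 # per-sleep-transistor current budget, assumed

    bins = []                      # each bin: [remaining capacity, [gate names]]
    for name, cur in sorted(gates.items(), key=lambda kv: -kv[1]):
        for b in bins:
            if b[0] >= cur:        # first bin with room wins
                b[0] -= cur
                b[1].append(name)
                break
        else:
            bins.append([capacity - cur, [name]])

    for i, (slack, members) in enumerate(bins):
        print(f"sleep transistor {i}: {members}, slack {slack:.1f} mA")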
Categories and Subject Descriptors: B.7.1 [Integrated Circuits]:
Types and Design Styles
General Terms: Design
-
32.2 Total Power Optimization By Simultaneous Dual-Vt Allocation and Device Sizing in High Performance Microprocessors [p. 486]
-
Tanay Karnik, Yibin Ye, James Tschanz, Liqiong Wei, Steven Burns, Venkatesh Govindarajulu, Vivek De, Shekhar Borkar
We describe various design automation solutions for design
migration to a dual-Vt process technology. We include the
results of a Lagrangian Relaxation based tool, iSTATS, and a
heuristic iterative optimization flow. Joint dual-Vt allocation
and sizing reduces total power by 10+% compared with Vt
allocation alone, and by 25+% compared with pure sizing
methods. The heuristic flow requires 5x more computation
time than iSTATS due to its iterative nature.
Categories and Subject Descriptors
B.7 INTEGRATED CIRCUITS
B.7.1 Types and Design Styles - Microprocessors and
microcomputers, VLSI.
General Terms
Algorithms, Performance, Design, Experimentation,
Verification.
Keywords
Dual-Vt design, multiple threshold, sizing, optimization.
-
32.3 An Optimal Voltage Synthesis Technique for a Power-Efficient Satellite Application [p. 492]
-
Dong-In Kang, Jinwoo Suh, Stephen P. Crago
This paper presents an optimal voltage synthesis technique for a
satellite application to maximize system performance subject to
energy budget. A period of a satellite's orbit is partitioned into
several independent regions with different characteristics such as
type of computation, importance, performance requirements, and
energy consumption. Given a periodic energy recharge model,
optimal voltages for the regions are synthesized such that the
overall performance is maximized within the energy budget in the
period.
Categories and Subject Descriptors
C.4 [PERFORMANCE OF SYSTEMS] Design studies,
Modeling techniques, Performance attributes.
General Terms
Algorithms, Management, Performance, Design.
Keywords
Power-aware design, power-efficient design, satellite application,
queueing.
Chair: L. Miguel Silveira
Organizers: Georges G. Gielen, Kartikeya Mayaram
-
33.1 Fast and Accurate Behavioral Simulation of Fractional-N
Frequency Synthesizers and other PLL/DLL Circuits [p. 498]
-
Michael H. Perrott
Techniques for fast and accurate simulation of fractional-N
synthesizers at a detailed behavioral level are presented.
The techniques allow a uniform time step to be used for the
simulator, and can be applied to a variety of phase locked
loop (PLL) and delay locked loop (DLL) circuits beyond
fractional-N synthesizers, as well as to a variety of simulation
frameworks such as Verilog and Matlab. Simulated results
from a custom C++ simulator are shown to compare well to
measured results from a prototype fractional-N synthesizer
using a Delta-Sigma modulator to dither its divide value.
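A minimal flavor of uniform-time-step behavioral PLL simulation: integrate the phase error through a first-order loop filter driving a VCO at a fixed step. The loop parameters below are illustrative assumptions and this naive sketch does not reproduce the paper's fractional-N techniques, which in particular resolve divider edge timing within a time step.

    # Naive uniform-time-step, phase-domain PLL sketch: phase detector ->
    # one-pole loop filter -> VCO. All parameters are illustrative.
    import numpy as np

    dt, n = 1e-9, 200_000                 # uniform step (s), number of steps
    kpd, kvco = 0.5, 2 * np.pi * 5e6      # detector gain, VCO gain (rad/s/V), assumed
    a = 2 * np.pi * 50e3 * dt             # discretized loop-filter pole, assumed

    f_ref, f0 = 1e6, 0.9e6                # reference and free-running VCO freq (Hz)
    phi_ref = phi_vco = vctl = 0.0
    for _ in range(n):
        err = phi_ref - phi_vco                   # phase detector
        vctl += a * (kpd * err - vctl)            # one-pole loop filter
        phi_ref += 2 * np.pi * f_ref * dt
        phi_vco += (2 * np.pi * f0 + kvco * vctl) * dt

    print("settled static phase error (rad):", err)   # type-I loop: nonzero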
Categories and Subject Descriptors
I.6.5 [Simulation and Modeling]: Model Development
General Terms
Algorithms
Keywords
fractional-N, frequency, synthesizer, sigma, delta, PLL, DLL
-
33.2 Time-domain Steady-state Simulation of Frequency-Dependent
Components Using Multi-interval Chebyshev Method [p. 504]
-
Baolin Yang, Joel Phillips
Simulation of RF circuits often demands analysis of distributed
component models that are described via frequency-dependent multiport
Y, Z, or S parameters. Frequency-domain methods such as
harmonic balance are able to handle these components without difficulty,
while they are more difficult for time-domain simulation
methods to treat. In this paper, we propose a hybrid frequency-time
approach to treat these components in steady-state time-domain
simulations. Efficiency is achieved through the use of the multi-interval
Chebyshev (MIC) simulation method and a low-order rational-fitting
model for preconditioning matrix-implicit Krylov-subspace
solvers.
Categories and Subject Descriptors
I.6 [Computing Methodologies]: Simulation and Modeling
General Terms
Algorithms
Keywords
RF circuit simulation, frequency dependent, S parameter
-
33.3 A Time-domain RF Steady-State Method for Closely Spaced Tones [p. 510]
-
Jaijeet Roychowdhury
Verifying circuits with two or more closely-spaced driving frequencies
is important in RF and wireless communications, e.g., in
the design of down-conversion mixers. Existing steady-state calculation
methods, like harmonic balance, rely on Fourier series expansions
to find the difference-frequency components typically of
interest. Time-domain methods are, however, better suited for circuits
with strong nonlinearities such as switching. Towards this
end, we present a purely time-domain method for direct computation
of difference tones in closely-spaced multi-tone problems.
Our approach is based on multiple artificial time scales for decoupling
the tones driving the circuit. Our method relies on a novel
multi-time reformulation that expresses circuit equations directly
in terms of time-scales corresponding to difference tones. We apply
the new technique to an RF-CMOS mixer to predict baseband
bit-streams and down-conversion gain and distortion, in two orders
of magnitude less CPU time than traditional time-stepping simulation.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Analog Verification
General Terms
Algorithms, Design, Verification
Keywords
MPDE, difference-frequency time scales, multi-time PDEs, harmonic
balance, shooting, envelope, analog, mixed-signal, analog/RF
simulation, RF switching mixers, artificial time scales, homotopy,
continuation methods
-
33.4 An Algorithm for Frequency-Domain Noise Analysis in Nonlinear Systems [p. 514]
-
Giorgio Casinovi
Many mixed-technology and mixed-signal designs require
the assessment of the system's performance in the presence
of signals that are best characterized as having continuous
frequency spectra: for instance, the simulation of a circuit's
behavior in the presence of electrical noise. This paper
describes an algorithm for frequency-domain simulation of
nonlinear systems capable of handling signals with continuous
spectra. The algorithm is based on an orthogonal series
expansion of the signals in the frequency domain. Thus it
is, in a sense, the dual approach to frequency-domain simulation
with respect to harmonic balance, which relies on a
time-domain series expansion of the signals. Signal spectra
are obtained from the solution of a system of nonlinear algebraic
equations whose dimension increases with the desired
spectral accuracy. Numerical results obtained on an optical
amplifier are presented.
Categories and Subject Descriptors
J.6 [Computer-Aided Engineering]: Computer-Aided
Design; I.6.8 [Simulation and Modeling]: Types of Simulation
- continuous
General Terms
Algorithms, Verification
Keywords
Circuit simulation, frequency-domain analysis
Chair: Anand Raghunathan
Organizers: Anand Raghunathan, Marco Di Natale
-
34.1 System-level Performance Optimization of the Data Queueing
Memory Management in High-Speed Network Processors [p. 518]
-
Ch. Ykman-Couvreur, J. Lambrecht, D. Verkest, F. Catthoor, A. Nikologiannis, G. Konstantoulakis
In high-speed network processors, data queueing has to allow real-time
memory (de)allocation, buffering, retrieving, and forwarding of incoming
data packets. Its implementation must be highly optimized to combine
high speed, low power, large data storage, and high memory bandwidth. In
this paper, such data queueing is used as a case study to demonstrate the
effectiveness of a new system-level exploration method for optimizing the
memory performance in dynamic memory management. Assuming that a multi-bank
memory architecture is used for data storage, the method trades off bank
conflicts against memory accesses during real-time memory (de)allocation. It
has been applied to the data queueing module of the PRO3 system
[8]. Compared with the conventional memory management technique for
embedded systems, our exploration method can save up to 90% of the bank
conflicts, which in turn improves the worst-case memory performance of
data queueing operations by 50%.
Categories and Subject Descriptors
B.3.3 [Memory Structures]: Performance Analysis and Design Aids
General Terms
Memory performance
Keywords
System-level exploration, memory management
-
34.2 Analysis of Power Consumption on Switch Fabrics in Network Routers [p. 524]
-
Terry Tao Ye, Luca Benini, Giovanni De Micheli
In this paper, we introduce a framework to estimate the power consumption
on switch fabrics in network routers. We propose different
modeling methodologies for node switches, internal buffers and
interconnect wires inside switch fabric architectures. A simulation
platform is also implemented to trace the dynamic power consumption
with bit-level accuracy. Using this framework, four switch
fabric architectures are analyzed under different traffic throughput
and different numbers of ingress/egress ports. This framework
and analysis can be applied to the architectural exploration for low
power high performance network router designs.
Categories and Subject Descriptors
C.4 [Performance of Systems]: Design studies, Modeling techniques
General Terms
Design, Experimentation, Performance
Keywords
Networks on Chip, Interconnect Networks, Systems on Chip, Power
Consumption
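The bit-level tracing idea can be illustrated with a toy toggle-count power model in Python (the P = 0.5 C V^2 f * activity model is standard; the capacitance, voltage and rate values are invented, and this is not the authors' simulator):
C_WIRE = 0.2e-12    # capacitance per bit line (F), assumed
VDD    = 1.8        # supply voltage (V), assumed
F_CLK  = 500e6      # word rate (Hz), assumed

def dynamic_power(words, width=32):
    toggles = 0
    for prev, cur in zip(words, words[1:]):
        toggles += bin((prev ^ cur) & ((1 << width) - 1)).count("1")
    rate = toggles / max(len(words) - 1, 1)      # average toggles per cycle
    return 0.5 * C_WIRE * VDD * VDD * F_CLK * rate

trace = [0xDEADBEEF, 0x00000000, 0xFFFFFFFF, 0x12345678]
print("estimated dynamic power: %.3e W" % dynamic_power(trace))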
-
34.3 Memory Optimization in Single Chip Network Switch Fabrics [p. 530]
-
David Whelihan, Herman Schmit
Moving high bandwidth (10Gb/s+) network switches from the
large scale, rack mount design space to the single chip design
space requires a re-evaluation of the overall design requirements.
In this paper, we explore the design space for these
single chip devices by evaluating the ITRS. We find that unlike
ten years ago, when interconnect was scarce, the limiting factor
in today's designs is on-chip memory. We then discuss an architectural
technique for maximizing the effectiveness of queue
memory in a single chip switch. Next, we show simulation results
that indicate that a more than two-order-of-magnitude
improvement in dropped packet probability can be achieved
by re-distributing memory and allowing sharing between the
switch's ports. Finally, we evaluate the cost of the optimized
architecture in terms of other on-chip resources.
Categories and Subject Descriptors
B.3.2 [Hardware]: Design Styles
General Terms
Design, Performance
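A toy Python simulation of the memory re-distribution idea, comparing per-port private queues with one shared pool under bursty traffic (all parameters are invented; it only illustrates why sharing reduces drops):
import random

PORTS, PRIVATE_SLOTS, SHARED_SLOTS = 4, 8, 32
random.seed(1)

def simulate(shared, cycles=10000):
    queues, pool, drops = [0] * PORTS, 0, 0
    for _ in range(cycles):
        # bursty arrivals: port 0 receives extra traffic
        p = 0 if random.random() < 0.5 else random.randrange(PORTS)
        if shared:
            if pool < SHARED_SLOTS:
                pool += 1
            else:
                drops += 1
        elif queues[p] < PRIVATE_SLOTS:
            queues[p] += 1
        else:
            drops += 1
        # one departure per cycle
        if shared and pool:
            pool -= 1
        elif not shared:
            busy = [i for i, q in enumerate(queues) if q]
            if busy:
                queues[random.choice(busy)] -= 1
    return drops

print("private drops:", simulate(False), " shared drops:", simulate(True))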
Chair: Alan Mantooth
Organizers: Joel R. Phillips, Kartikeya Mayaram
-
35.1 Behavioral Modeling of (Coupled) Harmonic Oscillators [p. 536]
-
Piet Vanassche, Georges Gielen, Willy Sansen
A new approach is presented for the construction of behavioral
models for harmonic oscillators and sets of coupled harmonic oscillators.
The models can be used for system-level simulations and
trade-off analysis. Besides the steady-state behavior, the model also
takes the transient behavior of the oscillation amplitudes and phase
differences into account. This behavior is modeled using a set of
linear differential equations. Their extraction from a netlist description
is based upon the ideas of perturbation analysis and stochastic
averaging. The modeling equations are valid in the neighbourhood
of the oscillator's steady-state operating point and can be evaluated
at very low computational cost. Another major advantage of
the approach is that it explicitly separates the fast-varying steady-state
behavior and the slow-varying transient behavior. This allows
for straightforward application of multi-rate simulation techniques,
greatly boosting simulation speeds. The technique is illustrated for
a quadrature oscillator.
Categories and Subject Descriptors
G.1.7 [Mathematics of Computing]: Ordinary Differential Equations;
I.6.5 [Computing Methodologies]: Model Development - modeling
methodologies
General Terms
Algorithms, Theory
Keywords
Averaging, Behavioral modeling, Coupled harmonic oscillators, Perturbation
theory
-
35.2 Model Checking Algorithms for Analog Verification [p. 542]
-
Walter Hartong, Lars Hedrich, Erich Barke
In this contribution we present the first method for model checking
on nonlinear analog systems. Based on digital CTL model checking
algorithms and results in hybrid model checking, we have developed
a concept to adapt these ideas to analog systems. Using
an automatic state space subdivision method the continuous state
space is transferred into a discrete model. In doing this, the most
challenging task is to retain the essential nonlinear behavior of the
analog system. To describe analog specification properties, an extension
to the CTL language is needed. Two small examples show
the properties and advantages of this new method and the capability
of the implemented prototype tool.
Categories and Subject Descriptors
I.6.m [Computing Methodologies]: Simulation and Modeling
General Terms
Algorithms, Language, Theory, Verification
Keywords
Model Checking, Formal Methods, Nonlinear Analog Systems
-
35.3 Regularization of Hierarchical VHDL-AMS Models using Bipartite Graphs [p. 548]
-
Jochen Mades, Manfred Glesner
The powerful capability of VHDL-AMS to describe complex continuous
systems in the form of differential algebraic equations (DAEs)
often leads to problems during numerical simulation. This paper
presents a discrete algorithm to analyze unsolvable DAE systems
and to correct the underlying hierarchical VHDL-AMS description
automatically in interaction with the designer, avoiding time-consuming
manual error correction.
Categories and Subject Descriptors
G.2.2 [Discrete Mathematics]: Graph algorithms
General Terms
Algorithms, Design
Keywords
DAEs, VHDL-AMS, Regularization, Structural Solvability, Bipartite
Graphs
-
35.4 Improving the Generality of the Fictitious Magnetic Charge Approach
to Computing Inductances in the Presence of Permeable Materials [p. 552]
-
Yehia Massoud, Jacob White
In this paper we present an improvement to the fictitious
magnetic charge approach to computing inductances in the
presence of permeable materials. The improvement replaces
integration over a "non-piercing" surface with a line integral
and an efficient quadrature scheme. Eliminating the need to
generate non-piercing surfaces substantially simplifies handling
problems with general geometries of permeable materials.
Computational results are presented to demonstrate
the accuracy and versatility of the new method.
Chair: David J. Hathaway
Organizers: Louis Scheffer, Narendra V. Shenoy
-
36.1 A General Probabilistic Framework for Worst Case Timing Analysis [p. 556]
-
Michael Orshansky, Kurt Keutzer
The traditional approach to worst-case static-timing analysis is
becoming unacceptably conservative due to an ever-increasing
number of circuit and process effects. We propose a
fundamentally different framework that aims to significantly
improve the accuracy of timing predictions through fully
probabilistic analysis of gate and path delays. We describe a
bottom-up approach for the construction of joint probability
density function of path delays, and present novel analytical and
algorithmic methods for finding the full distribution of the
maximum of a random path delay space with arbitrary path
correlations.
Categories and Subject Descriptors
J.6.1 [Computer-Aided Engineering]: Computer-aided design.
General Terms
Algorithms
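A Monte Carlo stand-in for the analytical machinery, sketched in Python: sample correlated Gaussian path delays and inspect the distribution of their maximum (the means and covariances below are invented):
import numpy as np

mean = np.array([1.00, 0.98, 0.95])             # nominal path delays (ns)
cov  = np.array([[0.010, 0.006, 0.004],
                 [0.006, 0.010, 0.005],
                 [0.004, 0.005, 0.010]])        # correlated variations

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=100000)
circuit_delay = samples.max(axis=1)             # max over paths, per sample

print("mean max delay: %.3f ns" % circuit_delay.mean())
print("99.9%% point  : %.3f ns" % np.quantile(circuit_delay, 0.999))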
-
36.2 False Timing Path Identification Using ATPG Techniques and Delay-Based Information [p. 562]
-
Jing Zeng, Magdy Abadir, Jacob Abraham
A well-known problem in timing verification of VLSI circuits using
static timing analysis tools is the generation of false timing paths.
This leads to a pessimistic estimation of the processor speed and
wasted engineering effort spent optimizing unsensitizable paths.
Earlier results have shown how ATPG techniques can be used to
identify false paths efficiently [6],[9], as well as how to bridge the
gap between the physical design on which the static timing analysis
is based and the test view on which ATPG technique is applied
to identify false paths [9]. In this paper, we will demonstrate efficient
techniques to identify more false timing paths by utilizing
information from an ordered list of timing paths according to the
delay information. More than 10% additional false timing paths,
out of the total timing paths analyzed, are identified compared to
earlier results on the MPC7455, a Motorola processor implementing
the PowerPC™ instruction set architecture.
Categories and Subject Descriptors
J.6 [Computer-Aided Engineering]: Computer-aided design; B.8.2
[Performance and Reliability]: Performance Analysis and Design
Aids; F.2.3 [Analysis of Algorithms and Problem Complexity]:
Tradeoffs between Complexity Measures; B.6.3 [Logic Design]:
Design Aids - Optimization
General Terms
Algorithms, Performance
Keywords
static timing analysis, false timing paths, ATPG, timing slack
-
36.3 False-Path-Aware Statistical Timing Analysis and Efficient Path
Selection for Delay Testing and Timing Validation [p. 566]
-
Jing-Jia Liou, Angela Krstic, Li-C. Wang, Kwang-Ting Cheng
We propose a false-path-aware statistical timing analysis framework.
In our framework, cell as well as interconnect delays are
assumed to be correlated random variables. Our tool can characterize
the statistical delay distribution of the entire circuit and
produce a set of true critical paths.
Categories and Subject Descriptors
B.8.2 [Hardware]: Performance Analysis and Design Aids
General Terms
Algorithm, Performance, Reliability
Keywords
Critical path selection, false path, statistical timing analysis
-
36.4 A Fast, Inexpensive and Scalable Hardware Acceleration Technique for Functional Simulation [p. 570]
-
Srihari Cadambi, Chandra S. Mulpuri, Pranav N. Ashar
We introduce a novel approach to accelerating functional
simulation. The key attributes of our approach are high-performance,
low-cost, scalability and low turn-around-time
(TAT). We achieve speedups between 25 and 2000x over zero
delay event-driven simulation and between 75 and 1000x
over cycle-based simulation on benchmark and industrial circuits
while maintaining the cost, scalability and TAT advantages
of simulation. Owing to these attributes, we believe
that such an approach has potential for very wide deployment
as replacement or enhancement for existing simulators.
Our technology relies on a VLIW-like virtual simulation processor
(SimPLE) mapped to a single FPGA on an off-the-shelf
PCI-board. Primarily responsible for the speed are (i)
parallelism in the processor architecture, (ii) high pin count
on the FPGA enabling large instruction bandwidth, and (iii)
high speed (124 MHz on Xilinx Virtex-II) single-FPGA implementation
of the processor with regularity driven efficient
place and route. Companion to the processor is the very fast
SimPLE compiler which achieves compilation rates of 4 million
gates/hour. In order to simulate the netlist, the compiled
instructions are streamed through the FPGA, along
with the simulation vectors. This architecture plugs naturally
into any existing HDL simulation environment. We
have a working prototype based on a commercially available
PCI-based FPGA board.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Simulation, Verification
General Terms
Algorithms, Performance, Experimentation, Verification
Keywords
Functional simulation, hardware acceleration, VLIW, FPGA
Chair: David Dill
Organizers: Nate James, Shishpal Rawat
Panelists: Gérard Berry, Limor Fix, Harry Foster, Rajeev Ranjan, Gunnar Stalmarck, Curt Widdoes [p. 576]
-
-
Do formal verification tools and methodologies require a drastic
overhaul to move beyond equivalence checking? Equivalence
checking catches errors in synthesis and local hand-modifications
to designs. However, powerful formal verification technologies
are emerging to combat "behavioral" errors, which represent
today's biggest verification problems. Nonetheless, formal
verification experts are split on how formal tools should adapt to
this challenge. Some of our panelists feel that designers can
sufficiently benefit from new formal verification technologies by
making incremental changes to current methodologies. Others,
however, argue that major changes are required to reap
meaningful benefits from these new technologies. Just how much
change is enough, what is the capacity of our current tools and
what is limiting the full deployment of FV technology?
Our panel of experts, consisting of users, tool providers, and core
engine builders, will answer these challenging questions. The
panel will debate these issues while discussing real life examples
from the user base. They will provide a perspective of how the
progression of technology will bring the real promise of formal
verification to the user base.
Chair: Noel Menezes
Organizers: Charles J. Alpert, Steven Teig
-
38.1 S-Tree: A Technique for Buffered Routing Tree Synthesis [p. 578]
-
Milos Hrkic, John Lillis
We present the S-Tree algorithm for synthesis of buffered interconnects. The
approach incorporates a unique combination of real-world issues (handling
of routing and buffer blockages, cost minimization, critical sink isolation,
sink polarities), robustness and scalability. The algorithm is able to
achieve slack comparable to that of buffered P-Tree [7] using fewer
resources (wires and buffers) in an order of magnitude less CPU time.
Categories and Subject Descriptors
B.7.2 [Hardware]: Integrated Circuits - Design Aids
General Terms
Algorithms
Keywords
timing algorithms, Steiner trees, routing, buffer insertion.
-
38.2 An Algorithm for Integrated Pin Assignment and Buffer Planning [p. 584]
-
Hua Xiang, Xiaoping Tang, D. F. Wong
The buffer block methodology has become increasingly popular as
more and more buffers are needed in deep-submicron design, and
it leads to many challenging problems in physical design. In this
paper, we present a polynomial-time exact algorithm for integrated
pin assignment and buffer planning for all two-pin nets from one
macro block (source block) to all other blocks of a given buffer
block plan, as well as minimizing the total cost αW + βR for any
positive α and β, where W is the total wire length and R is the number
of buffers. By applying this algorithm iteratively (each time
picking one block as the source block), we obtain a polynomial-time
algorithm for pin assignment and buffer planning for nets among
multiple macro blocks. Experimental results demonstrate its efficiency
and effectiveness.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Placement and routing;
J.6 [Computer Applications]: Computer-Aided Engineering -
Computer-aided design
General Terms
Algorithms, Design, Performance, Theory
Keywords
buffer insertion, pin assignment, min-cost maximum flow
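The min-cost max-flow formulation named in the keywords can be exercised on a toy instance with the networkx library (the nets, pin slots, capacities and costs below are invented; the real algorithm works on a full buffer block plan):
import networkx as nx

G = nx.DiGraph()
nets = ["n1", "n2", "n3"]                 # two-pin nets from the source block
pins = ["p1", "p2"]                       # candidate source-block pin slots
for n in nets:
    G.add_edge("s", n, capacity=1, weight=0)
# edge cost stands in for alpha*wirelength + beta*buffers per choice
cost = {("n1", "p1"): 4, ("n1", "p2"): 7,
        ("n2", "p1"): 6, ("n2", "p2"): 3,
        ("n3", "p1"): 5, ("n3", "p2"): 5}
for (n, p), w in cost.items():
    G.add_edge(n, p, capacity=1, weight=w)
for p in pins:
    G.add_edge(p, "t", capacity=2, weight=0)   # each pin slot serves <= 2 nets

flow = nx.max_flow_min_cost(G, "s", "t")
for n in nets:
    for p, f in flow[n].items():
        if f:
            print(n, "->", p)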
-
38.3 An Efficient Routing Database [p. 590]
-
Narendra V. Shenoy, William Nicholls
Routing is an important problem in the process of design creation. In
this paper, we focus on the problem of designing a database for the
non-partitioned routing problem. New technology libraries describe
constraints that are hard to manage in grid-based approaches to the
routing database. While general region query based data-structures
have been proposed, they typically suffer from speed problems when
applied to large blocks. We introduce an interval-based approach.
It provides more flexibility than grid-based techniques. It exploits
the notion of preferred direction for metal layers to manage the memory
efficiently. It supports efficient region queries. We finally present
a comparison study for real industrial designs on this database.
Categories and Subject Descriptors
J.6 [Computer-Aided Engineering]: Computer-aided design
General Terms
Algorithms, Design
Keywords
Physical design, routing, database
Chair: Rolf Ernst
Organizers: Krzysztof Kuchcinski, Miodrag Potkonjak
-
39.1 Automatic Generation of Embedded Memory Wrapper for Multiprocessor SoC [p. 596]
-
Ferid Gharsalli, Samy Meftali, Frédéric Rousseau, Ahmed A. Jerraya
Embedded memory plays a critical role in improving the
performance of systems-on-chip (SoC). In this paper, we
present a new methodology for embedded memory design in the
case of application specific multiprocessor system-on-chip. This
approach facilitates the integration of standard memory
components. The concept of memory wrapper allows automatic
adaptation of physical memory interfaces to a communication
network that may have a different number of access ports. We
also give a generic architecture to produce this memory
wrapper. This approach has been successfully applied to a low-level
image processing application.
General Terms
Design, Experimentation.
Categories and Subject Descriptors
M1.9 and T4.5: High level and architectural synthesis
Keywords
Embedded Memory, System-on-Chip, Memory access, Memory
Wrapper Generation.
-
39.2 A Novel Synthesis Technique for Communication Controller Hardware from Declarative Data Communication Protocol Specifications [p. 602]
-
Robert Siegmund, Dietmar Müller
An innovative methodology for the efficient design of communication
controller hardware for popular protocols such as ATM, USB
or CAN is proposed. In our approach, controller hardware in the form
of RTL models is synthesized from a formal specification of a communication
protocol. The difference from previously published work on
hardware synthesis from protocol specifications is that in our
approach a complete communication architecture, consisting of the
interacting transaction producer and consumer controllers as well
as the interconnect between them, is synthesized from a single
protocol specification in the same synthesis tool run. This ensures
conformity of all producer and consumer controllers to the protocol
specification while greatly reducing the modeling effort for the
controller specifications.
The formalism used for protocol specification and a corresponding
hardware synthesis algorithm from such specifications
are presented. The methodology has been applied to the design
of various communication controllers including IEC14443 Wireless
SmartCard, ATM and CAN. The novelty and efficiency of our
methodology are demonstrated through comparison to state-of-the-art
protocol synthesis tools such as [10].
Categories and Subject Descriptors
B.4.4 [Hardware]: Input/Output and Data Communications Devices
- Performance Analysis and Design Aids; B.5.2 [Hardware]:
Register-Transfer-Level-Implementation - Design Aids
General Terms
Algorithms, Design, Languages
Keywords
Protocol Specification, Controller Hardware Synthesis, Interface-based
Design
-
39.3 An Integrated Algorithm for Memory Allocation and Assignment in High-level Synthesis [p. 608]
-
Jaewon Seo, Taewhan Kim, Preeti R. Panda
With the increasing design complexity and performance
requirement, data arrays in behavioral specification are usually
mapped to fast on-chip memories in behavioral synthesis. This paper
describes a new algorithm that overcomes two limitations of previous
work on the problem of memory allocation and array mapping
to memories. Specifically, its key features are (1) a tight link to the
scheduling effect, which was totally or partially ignored by the existing
memory synthesis systems, and supporting (2) non-uniform
access speeds among the ports of memories, which greatly diversify
the possible (practical) memory configurations. Experimental data
on a set of benchmark filter designs are provided to show the effectiveness
of the proposed exploration strategy in finding globally best
memory configurations.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids
General Terms
Algorithms, Performance, Design
Keywords
memory allocation, memory assignment, memory design, scheduling effect
-
39.4 High-Level Synthesis of Multiple-Precision Circuits Independent of Data-Objects Length [p. 612]
-
M. C. Molina, J. M. Mendías, R. Hermida
This paper presents a heuristic method to perform the high-level
synthesis of multiple-precision specifications. The scheduling is
based on the balance of the number of bits calculated per cycle,
and the allocation on the bit-level reuse of the hardware resources.
The implementations obtained are multiple-precision datapaths
independent of the number and widths of the specification
operations. As a result, impressive area savings are achieved in
comparison with implementations produced by conventional algorithms.
Categories and Subject Descriptors
B.5.2 [Register-transfer-level Implementation]: Design Aids -
automatic synthesis, optimization.
General Terms
Algorithms, Design.
Keywords
High-level synthesis, multiple-precision, scheduling, allocation.
Chair: Giuseppe Lipari
Organizers: Marco Di Natale, Xiaobo (Sharon) Hu
-
40.1 Schedulability of Event-Driven Code Blocks in Real-Time Embedded Systems [p. 616]
-
Samarjit Chakraborty, Thomas Erlebach, Simon Künzli, Lothar Thiele
Many real-time embedded systems involve a collection of
independently executing event-driven code blocks, having
hard real-time constraints. Tasks in many such systems,
like network processors, are either not preemptable or have
restrictions on the number of preemptions allowed. All the
previous work on the schedulability analysis of such systems
either have exponential complexity, or allow unbounded number
of preemptions and are usually based on heuristics. In
this paper we present the exact necessary and sufficient conditions
under EDF, for the schedulability of such a collection
of code blocks in a non-preemptive environment, and give
efficient algorithms for testing them. We validate our analytical
results with experiments and show that the schedulability
analysis problem in such systems can be exactly and
efficiently solved in practice.
Categories and Subject Descriptors
C.3 [Computer Systems Organization]: Special-purpose
and application-based systems - Real-time and embedded systems
General Terms
Algorithms, Performance, Verification
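A brute-force stand-in for the paper's efficient test, sketched in Python: simulate non-preemptive EDF over the hyperperiod and flag deadline misses (the task parameters are invented, and periods are taken as deadlines):
tasks = [(5, 2), (10, 3), (20, 4)]       # (period = deadline, execution time)
HYPER = 20                               # hyperperiod of the periods above

ready, running_end, missed = [], 0, False
for t in range(HYPER):
    for period, wcet in tasks:
        if t % period == 0:
            ready.append((t + period, wcet))   # (absolute deadline, wcet)
    if t >= running_end and ready:
        ready.sort()                           # earliest deadline first
        deadline, wcet = ready.pop(0)
        running_end = t + wcet                 # non-preemptive: runs to completion
        if running_end > deadline:
            missed = True
print("deadline missed:", missed)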
-
40.2 Associative Caches in Formal Software Timing Analysis [p. 622]
-
Fabian Wolf, Jan Staschulat, Rolf Ernst
Precise cache analysis is crucial to formally determine program
running time. As cache simulation is unsafe with respect to the
conservative running time bounds for real-time systems, current
cache analysis techniques combine basic block level cache modeling
with explicit or implicit program path analysis. We present an
approach that extends instruction and data cache modeling from the
granularity of basic blocks to program segments thereby increasing
the overall running time analysis precision. Data flow analysis and
local simulation of program segments are combined to safely predict
cache line contents for associative caches in software running
time analysis. The experiments show significant improvements in
analysis precision over previous approaches on a typical embedded
processor.
Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]: Real-time
and embedded systems; D.2.4 [Software Engineering]: Software/
Program Verification - formal methods; C.4 [Performance of
Systems]: Performance Attributes
General Terms
Algorithms, Performance, Verification
Keywords
Cache Analysis, Embedded Software, Real-Time, Timing Analysis
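A minimal set-associative LRU cache model of the kind such an analysis must reason about, in Python (sizes and the address trace are invented; the paper's contribution is the static analysis, not the simulation):
from collections import OrderedDict

SETS, WAYS, LINE = 64, 4, 32
cache = [OrderedDict() for _ in range(SETS)]

def access(addr):
    index = (addr // LINE) % SETS
    tag   = addr // (LINE * SETS)
    s = cache[index]
    if tag in s:
        s.move_to_end(tag)            # refresh LRU position
        return "hit"
    if len(s) >= WAYS:
        s.popitem(last=False)         # evict least recently used way
    s[tag] = True
    return "miss"

trace = [0x1000, 0x1004, 0x2000, 0x1000, 0x9000, 0x1000]
print([access(a) for a in trace])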
-
40.3 Compiler-Directed Scratch Pad Memory Hierarchy Design and Management [p. 628]
-
M. Kandemir, A. Choudhary
One of the primary challenges in embedded system design is designing
the memory hierarchy and restructuring the application to take advantage
of it. This task is particularly important for embedded image
and video processing applications that make heavy use of large multidimensional
arrays of signals and nested loops. In this paper, we show
that a simple reuse vector/matrix abstraction can provide the compiler
with useful information in a concise form. Using this information, the
compiler can either adapt the application to an existing memory hierarchy
or come up with a suitable one. Our initial results indicate that
the compiler is very successful in both optimizing code for a given
memory hierarchy and designing a hierarchy with reasonable performance/
size ratio.
Categories and Subject Descriptors
B.3 [Hardware]: Memory Structures; D.3.4 [Programming Languages]:
Processors - Compilers, Optimization
General Terms
Design, Experimentation, Performance
Keywords
Data Reuse, Scratch Pad Memory, Memory Hierarchy
Chair: Chris Rowen
Organizers: Kurt Keutzer, Majid Sarrafzadeh
-
41.1 Unlocking the Design Secrets of a 2.29 Gb/s Rijndael Processor [p. 634]
-
Patrick R. Schaumont, Henry Kuo, Ingrid M. Verbauwhede
This contribution describes the design and performance testing of
an Advanced Encryption Standard (AES) compliant encryption
chip that delivers 2.29 Gb/s of encryption throughput at 56 mW
of power consumption. We discuss how the high level reference
specification in C is translated into a parallel architecture.
Design decisions are motivated from a system level viewpoint.
The prototyping setup is discussed.
Categories and Subject Descriptors
C.3 [Special-Purpose and Application-based Systems]
General Terms
Measurement, Performance, Design, Security.
Keywords
Rijndael, Encryption, Domain-Specific, Low-Power.
-
41.2 The iCORE™ 520 MHz Synthesizable CPU Core [p. 640]
-
Nick Richardson, Lun Bin Huang, Razak Hossain, Tommy Zounes, Naresh Soni, Julian Lewis
This paper describes a new implementation of the ST20-C2 CPU
architecture. The design involves an eight-stage pipeline with
hardware support to execute up to three instructions in a cycle.
Branch prediction is based on a 2-bit predictor scheme with a 1024-entry
Branch History Table, a 64-entry Branch Target Buffer, and
a 4-entry Return Stack. The implementation of all blocks in the
processor was based on synthesized logic generation and automatic
place and route. The full design of the CPU from microarchitectural
investigations to layout required approximately eight man-years. The
CPU core, without the caches, has an area of approximately 1.5 mm2
in a 6-metal 0.18µm CMOS process. The design operates up to 520
MHz at 1.8V, among the highest reported speeds for a synthesized
CPU core [1].
Categories and Subject Descriptors
M.2.4 [Design Methodologies]: High performance ASIC design:
Optimizing microarchitecture to layout.
General Terms: Performance, Design.
Keywords: Synthesis, ASIC, CPU, embedded, st20, pipeline,
cache, branch-prediction, high-frequency, microarchitecture
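The 2-bit predictor mentioned above is the textbook saturating-counter design; a Python sketch (the table size comes from the abstract, everything else is generic):
class BranchHistoryTable:
    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries        # start weakly not-taken

    def predict(self, pc):
        return self.counters[pc % self.entries] >= 2   # True = taken

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

bht = BranchHistoryTable()
hits = 0
outcomes = [True] * 8 + [False] + [True] * 8   # a loop branch pattern
for taken in outcomes:
    hits += bht.predict(0x4000) == taken
    bht.update(0x4000, taken)
print("prediction accuracy: %d/%d" % (hits, len(outcomes)))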
-
41.3 A Flexible Accelerator for Layer 7 Networking Application [p. 646]
-
Gokhan Memik, William H. Mangione-Smith
In this paper, we present a flexible accelerator designed for networking applications. The accelerator can be utilized efficiently by a variety of Network Processor designs. Most Network Processors employ hardware accelerators for implementing key tasks. New applications require new tasks, such as pattern matching, to be performed on the packets in real-time. Using our proposed accelerator, we have implemented several such tasks and measured their performance. Specifically, the accelerator achieves 25-fold improvement on the performance of pattern matching, and 10-fold improvement for tree lookup, over optimized software solutions. Since the accelerator is used for different tasks, the hardware requirements are small compared to an accelerator group that implements the same set of tasks. We also present accurate analytic models to estimate the execution time of these networking tasks.
Categories and Subject Descriptors
B.2.4 [Arithmetic and logic structures]: High-Speed Arithmetic,
Cost/Performance.
General Terms
Algorithms, Measurement, Performance, Design.
Keywords
Application-Specific Processor, Networking Applications, Network Processor, Accelerator, Table Lookup, Pattern Matching.
Chair: Jan Rabaey
Organizers: Joachim Kunkel, Dennis Brophy
Panelists: Raul Camposano, Davoud Samani, Larry Lerner, Rick Hetherington [p. 652]
-
-
The PC industry was the major consumer of silicon in the '80s
and '90s. It defined the requirements for EDA. In a world
dominated by PCs, clock frequency was the ultimate measure of
performance. Times have changed. Today the Internet, wireless
communications and consumer applications such as high-end
gaming stations have replaced the PC as the primary driver. How
we measure "cutting edge" has also changed. Raw computing
speed measured in terms of clock speed has given way to alternate
forms of measuring performance - MBits/s, MMACS, and
Polygons/sec now capture the differing computation requirements
for these domains. Depending on the application domain, weight
and power are just as important if not more important than raw
computational power. How do these changes drive technology
development in EDA? Which of these is the dominant EDA
driver? How much of this drive is a function of EDA following
the money? How much of this drive is a function of unique
technical challenges posed by these domains? This diverse set of
panelists with representation from various application segments
and the EDA industry will attempt to answer these questions.
Chair: Cheng-Kok Koh
Organizers: Kaushik Roy, Noel Menezes
-
43.1 Estimation of the Likelihood of Capacitive Coupling Noise [p. 653]
-
Sarma B. K. Vrudhula, David Blaauw, Supamas Sirichotiyakul
Categories and Subject Descriptors
B.7.2 [Hardware]: Integrated Circuits - Design Aids, Reliability
and Testing
General Terms
Performance, Reliability, Verification
Keywords
Noise, Signal Integrity, Deep Submicron
The corruption of signals due to capacitive and inductive coupling
of interconnects has become a significant problem in the design of
deep submicron (DSM) circuits. Noise simulators, based on worst-case
assumptions, are overly pessimistic. As a result, when they
are used on industrial ICs with hundreds of thousands of nets, thousands
of nets are reported as having potential noise violations. There
is a need to prioritize the problem nets based on the likelihood of
the noise and possibly even eliminate them from further consideration
if the likelihood is negligible. In this paper, a probabilistic
approach is described which allows for a quantitative means to prioritize
nets based on the likelihood of the reported noise violation.
We derive upper bounds on the probability that the total noise injected
on a given victim net by a specific set of aggressors exceeds
a threshold. This bound is then used to determine a lower bound on
the expected number of clock cycles (ENC) before the first violation
occurs on a given net. Nets can be prioritized based on the ENC.
We demonstrate the utility of this approach through experiments
carried out on a large industrial processor design using a state-of-the-art
industrial noise analysis tool. A significant and interesting
result of this work is that a substantial portion (25%) of the nets
were found to have an ENC of more than five years. If five years is
deemed a sufficiently long time, then these nets could be eliminated
from further consideration.
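The ENC arithmetic is easy to reproduce; a back-of-the-envelope Python sketch with hypothetical numbers (if the per-cycle probability of a violation is bounded by p, the expected number of cycles to the first violation is at least 1/p):
p_violation = 1.0e-17          # per-cycle upper bound for one net, assumed
f_clock     = 1.0e9            # 1 GHz clock, assumed

enc_cycles  = 1.0 / p_violation
enc_seconds = enc_cycles / f_clock
print("ENC ~ %.1f years" % (enc_seconds / (3600 * 24 * 365)))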
-
43.2 Crosstalk Noise Estimation for Noise Management [p. 659]
-
Paul B. Morton, Wayne Dai
One of the main challenges in developing an effective crosstalk
noise management strategy is to develop a crosstalk noise estimate
which is both accurate and leads to a tractable optimization problem
that can be used to optimally redistribute uncommitted routing
resources to resolve crosstalk noise violations. Devgan's [4] estimate
comes very close to meeting these objectives; however, it is
extremely pessimistic for nets with long couplings or aggressor nets
with short rise times. The increased complexity of more sophisticated
estimates, such as that proposed by Vittal et al. [13], leads to
a crosstalk management problem that is very hard to solve. In this
paper we develop two estimates, similar in form to Devgan's estimate,
that are based on local approximations of Vittal's estimate.
Our estimates are substantially more accurate than Devgan's estimate
while still allowing us to formulate the crosstalk management
problem in a form that can be solved efficiently.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout, Placement and routing
General Terms
Design, Performance
Keywords
crosstalk, noise, estimation, local approximation, optimal spacing,
noise management
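The flavor of a Devgan-style closed-form estimate can be sketched in a few lines of Python: each aggressor injects roughly Cc*Vdd/t_rise through its coupling capacitance, and the victim's driver resistance converts the current to a voltage (all values invented; the paper's refined estimates are not reproduced here):
def noise_estimate(aggressors, r_victim, vdd=1.8):
    # aggressors: list of (coupling capacitance in F, aggressor rise time in s)
    return sum(cc * vdd / tr for cc, tr in aggressors) * r_victim

aggs = [(30e-15, 100e-12), (15e-15, 50e-12)]
print("peak noise bound: %.3f V" % noise_estimate(aggs, r_victim=200.0))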
-
43.3 Variable Frequency Crosstalk Noise Analysis:
A Methodology to Guarantee Functionality from DC to FMAX [p. 665]
-
Byron Krauter, David Widiger
One means of reducing pessimism in crosstalk
analysis is to consider timing orthogonality. While earlier
works have addressed the temporal alignment of timing windows
[1, 2, 3, 4], these treatments have overlooked one key
point. Crosstalk noise failures are frequency dependent. A chip
that functions at one frequency can fail due to crosstalk noise
at faster and slower frequencies. Moreover, because system
developers and manufacturers need chips that operate over a
wide range of frequencies, noise analysis tools should guarantee
a wide range of operation. In this paper, we explain this
phenomenon and propose a simple procedure to include timing
information and guarantee functionality from dc to fmax.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - simulation, verification
General Terms
Algorithms, Design, Verification
Keywords
crosstalk, noise analysis, timing windows, timing orthogonality,
GCD frequency, LCM window, frequency-dependent noise
-
43.4 Towards Global Routing With RLC Crosstalk Constraints [p. 669]
-
James D. Z. Ma, Lei He
Conventional global routing minimizes total wire length and congestion.
Experiments using large industrial benchmark circuits show that up to
24% of nets in such routing solutions may have RLC crosstalk violations
at a 3 GHz clock. We develop an extremely efficient length-scaled Keff
(LSK) model that has a high fidelity for long-range RLC crosstalk.
We formulate an extended global routing problem (denoted as GSINO) to consider
simultaneous shield insertion and net ordering with RLC crosstalk constraints,
then propose an effective three-phase GSINO algorithm. The GSINO algorithm
completely eliminates the RLC crosstalk violations, and has small area
and wire length overhead compared to conventional routing.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Placement and Routing.
General Terms
Algorithms, Design, Theory
Chair: Yervant Zorian
Organizers: Seiji Kajihara, Kwang-Ting (Tim) Cheng
-
44.1 Reduction of SOC Test Data Volume, Scan Power and Testing Time
Using Alternating Run-length Codes [p. 673]
-
Anshuman Chandra, Krishnendu Chakrabarty
We present a test resource partitioning (TRP) technique that simultaneously
reduces test data volume, test application time and scan power. The proposed
approach is based on the use of alternating run-length codes for test data
compression. Experimental results for the larger ISCAS-89 benchmarks
and an IBM production circuit show that reduced test data volume, test
application time and low power scan testing can indeed be achieved in all
cases.
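A simplified Python illustration of the alternating run-length view of a test vector (the actual technique maps run lengths to variable-length codewords; here we only extract the alternating runs):
def alternating_runs(bits):
    runs, expect, count = [], "0", 0      # by convention start with a 0-run
    for b in bits:
        if b == expect:
            count += 1
        else:
            runs.append(count)            # run of `expect` ended (may be 0)
            expect, count = b, 1
    runs.append(count)
    return runs

data = "0000001111100000000011"
print(alternating_runs(data))             # -> [6, 5, 9, 2]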
-
44.2 Embedded Test Control Schemes for Compression in SOCs [p. 679]
-
Douglas Kay, Sung Chung, Samiha Mourad
This paper presents novel control schemes for testing embedded
cores in a system on a chip (SOC). The approach converts a traditional
BIST scheme into an externally controllable one to achieve high
test quality within optimal test execution time without inserting
test points. The scheme promotes design and test reuse without
revealing IP information.
Categories and Subject Descriptors
Testing; Test generation and debugging
General Terms
Algorithms, Performance, Design, Experimentation
Keywords
BIST, Data Compression, Test Resource Allocation
-
44.3 Wrapper/TAM Co-Optimization, Constraint-Driven Test Scheduling,
and Tester Data Volume Reduction for SOCs [p. 685]
-
Vikram Iyengar, Krishnendu Chakrabarty, Erik Jan Marinissen
This paper describes an integrated framework for plug-and-play
SOC test automation. The framework builds on a new approach
to wrapper/TAM co-optimization based on rectangle packing. We
first tailor TAM widths to each core's test data needs. We then
use rectangle packing to develop an integrated scheduling algorithm
that incorporates precedence and power constraints in the
test schedule, while allowing the SOC integrator to designate a
group of tests as preemptable. Finally, we study the relationship
between TAM width and tester data volume to identify an effective
TAM width for the SOC. We present experimental results for
non-preemptive, preemptive, and power-constrained test scheduling,
as well as for effective TAM width identification for an academic
benchmark SOC and three industrial SOCs.
Categories and Subject Descriptors
B.7.3 [Integrated circuits]: Reliability and testing
General Terms
Algorithms, Design, Experimentation, Reliability, Verification
Chair: Rolf Ernst
Organizers: Diederik Verkest, Donatella Sciuto
-
45.1 Communication Architecture Based Power Management for Battery Efficient System Design [p. 691]
-
Kanishka Lahiri, Anand Raghunathan, Sujit Dey
Communication-based power management (CBPM) is a new battery-driven
system-level power management methodology in which the system level
communication architecture regulates the execution of various system
components, with the aim of improving battery efficiency, and hence, battery
life. Unlike conventional power management policies (that attempt to
efficiently shut down idle components), CBPM may delay the execution of
selected system components even when they are active, in order to adapt the
system-level current discharge profile to suit the battery's characteristics.
In this paper, we present a methodology for the design of CBPM based
systems, which consists of system-level performance and power profiling,
battery discharge analysis, instrumentation of system components to facilitate
CBPM, definition of CBPM policies, and generation of the CBPM based
system architecture. We present extensive evaluations of CBPM,
and demonstrate its application to the design of an IEEE 802.11 Wireless
LAN MAC processor system. Our results indicate that CBPM based systems
are significantly more battery efficient than those based on conventional
power management techniques. Further, we demonstrate that CBPM
enables design-time as well as run-time tradeoffs between system performance
and battery life.
Categories and Subject Descriptors
C.5.4 [Computer System Implementation]: VLSI Systems
General Terms
Algorithms, Design, Experimentation
Keywords
Embedded Systems, Communication Architectures, Battery Efficiency,
Power Management, Low Power Design
-
45.2 Scheduler-Based DRAM Energy Management [p. 697]
-
V. Delaluz, A. Sivasubramaniam, M. Kandemir, N. Vijaykrishnan, M. J. Irwin
Previous work on DRAM power-mode management focused on
hardware-based techniques and compiler-directed schemes to explicitly
transition unused memory modules to low-power operating
modes. While hardware-based techniques require extra logic
to keep track of memory references and make decisions about future
mode transitions, compiler-directed schemes can only work
on a single application at a time and demand sophisticated program
analysis support. In this work, we present an operating system
(OS) based solution where the OS scheduler directs the power
mode transitions by keeping track of module accesses for each process
in the system. This global view combined with the flexibility
of a software approach brings large energy savings at no extra hardware
cost. Our implementation using a full-fledged OS shows that
the proposed technique is also very robust when different system
and workload parameters are modified, and provides the first set of
experimental results for memory energy optimization with a multiprogrammed
workload on a real platform. The proposed technique
is applicable to both embedded systems and high-end computing
platforms.
Categories and Subject Descriptors
D.4.0 [Operating Systems]: General; D.4.1 [Operating Systems]:
Process Management - Scheduling; B.3.1 [Memory Structures]:
Dynamic Memory (DRAM)
General Terms
Management, Design, Experimentation
Keywords
Energy Management, DRAM, Operating Systems, Scheduler, Energy Estimation.
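A toy Python rendering of the scheduler-directed idea: on every context switch, power down the DRAM modules outside the incoming process's footprint (module sets, mode names and power numbers are invented):
MODULES = 4
POWER_MW = {"active": 300, "nap": 30}        # per-module power, assumed
footprint = {"procA": {0, 1}, "procB": {1, 2, 3}}

def switch_to(process):
    used = footprint[process]
    return ["active" if m in used else "nap" for m in range(MODULES)]

energy_mj, quantum_s = 0.0, 0.010            # 10 ms scheduling quantum
for proc in ["procA", "procB", "procA"]:
    modes = switch_to(proc)                  # OS transitions modules here
    energy_mj += sum(POWER_MW[m] for m in modes) * quantum_s   # mW * s = mJ
print("energy over schedule: %.1f mJ" % energy_mj)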
-
45.3 An Integer Linear Programming Based Approach for Parallelizing Applications in On-Chip Multiprocessors [p. 703]
-
I. Kadayif, M. Kandemir, U. Sezer
With energy consumption becoming one of the first-class optimization
parameters in computer system design, compilation techniques
that consider performance and energy simultaneously are expected
to play a central role. In particular, compiling a given application
code under performance and energy constraints is becoming an important
problem. In this paper, we focus on an on-chip multiprocessor
architecture and present a parallelization strategy based on
integer linear programming. Given an array-intensive application,
our optimization strategy determines the number of processors to
be used in executing each nest based on the objective function and
additional compilation constraints provided by the user. Our initial
experience with this strategy shows that it is very successful in
optimizing array-intensive applications on on-chip multiprocessors
under energy and performance constraints.
Categories and Subject Descriptors
D.3.4 [Programming Languages]: Processors - Compilers, Optimization
General Terms
Design, Experimentation, Performance
Keywords
Embedded Systems, Loop-Level Parallelism, Constraint-Based Compilation
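The paper solves the nest-parallelization problem with integer linear programming; the toy exhaustive search below plays the same role on a three-nest example, choosing a processor count per nest to minimize energy under a deadline (all tables fabricated):
from itertools import product

PROCS = [1, 2, 4]                               # allowed processor counts
time   = [[10, 6, 4], [8, 5, 3], [12, 7, 5]]    # time[nest][choice]
energy = [[5, 7, 11], [4, 6, 9], [6, 8, 13]]    # energy[nest][choice]
DEADLINE = 16

best = None
for choice in product(range(len(PROCS)), repeat=len(time)):
    t = sum(time[n][c] for n, c in enumerate(choice))
    e = sum(energy[n][c] for n, c in enumerate(choice))
    if t <= DEADLINE and (best is None or e < best[0]):
        best = (e, [PROCS[c] for c in choice])
print("minimum energy and processors per nest:", best)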
Chair: Srivaths Ravi
Organizers: Anand Raghunathan, Alfred E. Dunlop
-
46.1 Embedding Infrastructure IP for SOC Yield Improvement [p. 709]
-
Yervant Zorian
In addition to the functional IP cores, today's SOC necessitates
embedding a special family of IP blocks, called Infrastructure IP
blocks. These are meant to ensure the manufacturability of the SOC
and to achieve adequate levels of yield and reliability. The
Infrastructure IP leverages the manufacturing knowledge and feeds
back the information into the design phase. This paper analyzes the
key trends and challenges resulting in manufacturing susceptibility
and field reliability that necessitate the use of such Infrastructure IP.
It also describes several examples of such embedded IPs for
detection, analysis and correction.
Categories and Subject Descriptors
B.8.1 [Performance and Reliability]: Reliability, testing and fault
tolerance.
General Terms: Measurement, Performance, Design,
Reliability, and Verification.
Keywords: Semiconductor IP, Embedded Test and Repair, Yield
Optimization, Test Resource Partitioning.
-
46.2 Using Embedded FPGAs for SoC Yield Improvement [p. 713]
-
Miron Abramovici, Charles Stroud, Marty Emmert
In this paper we show that an embedded FPGA
core is an ideal host to implement infrastructure IP for yield
improvement in a bus-based SoC. We present methods for
testing, diagnosing, and repairing embedded FPGAs, for
which complete testability is achieved without any area overhead
or performance degradation. We show how an FPGA
core can provide embedded testers for other cores in the SoC,
so that cores designed to be tested with external vectors can
be tested with BIST, and the entire SoC can be tested with a
low-cost tester.
Categories and Subject Descriptors: B.8.1 [Performance
and Reliability]: Reliability, Testing, and Fault-Tolerance
General Terms: Algorithms, Design, Reliability
Chair: Joao Marques-Silva
Organizers: Malgorzata Merek-Sadowska, Soha M. Hassoun
-
47.1 A Proof Engine Approach to Solving Combinational Design Automation Problems [p. 725]
-
Gunnar Andersson, Per Bjesse, Byron Cook, Ziyad Hanna
There are many approaches available for solving combinational
design automation problems encoded as tautology or
satisfiability checks. Unfortunately there exists no single
analysis that gives adequate performance for all problems
of interest, and it is therefore critical to be able to combine
approaches.
In this paper, we present a proof engine framework where
individual analyses are viewed as strategies: functions between
different proof states. By defining our proof engine
in such a way that we can compose strategies to form new,
more powerful, strategies we achieve synergistic effects between
the individual methods. The resulting framework has
enabled us to develop a small set of powerful composite default
strategies.
We describe several strategies and their interplay; one of
the strategies, variable instantiation, is new. The strength
of our approach is demonstrated with experimental results
showing that our default strategies can achieve up to several
orders of magnitude of speed-up compared to BDD-based techniques
and search-based satisfiability solvers such as ZCHAFF.
Categories and Subject Descriptors
B.6.3 [Hardware]: Logic Design - Design Aids
General Terms
Algorithms, Design, Verification
-
47.2 Solving Difficult SAT Instances in the Presence of Symmetry [p. 731]
-
Fadi A. Aloul, Arathi Ramani, Igor L. Markov, Karem A. Sakallah
Research in algorithms for Boolean satisfiability and their implementations
[23, 6] has recently outpaced benchmarking efforts. Most of the
classic DIMACS benchmarks [10] can be solved in seconds on commodity
PCs. More recent benchmarks take longer to solve because of their
large size, but are still solved in minutes [25]. Yet, small and difficult
SAT instances must exist because Boolean satisfiability is NP-complete.
We propose an improved construction of symmetry-breaking clauses
[9] and apply it to achieve significant speed-ups over current state-of-the-art
in Boolean satisfiability. Our techniques are formulated as preprocessing
and can be applied to any SAT solver without changing its
source code. We also show that considerations of symmetry may lead to
more efficient reductions to SAT in the routing domain.
Our work articulates SAT instances that are unusually difficult for
their size, including satisfiable instances derived from routing problems.
Using an efficient implementation to solve the graph automorphism problem
[18, 20, 22], we show that in structured SAT instances difficulty may
be associated with large numbers of symmetries.
Categories and Subject Descriptors
I.1.2 [Algorithms]: Algebraic algorithms.
General Terms
Algorithms, experimentation, verification.
Keywords
SAT, CNF, faster, search, symmetry, difficult, instances, speed-up.
-
47.3 Satometer: How Much Have We Searched? [p. 737]
-
Fadi A. Aloul, Brian D. Sierawski, Karem A. Sakallah
We introduce Satometer, a tool that can be used to estimate the percentage
of the search space actually explored by a backtrack SAT
solver. Satometer calculates a normalized minterm count for those
portions of the search space identified by conflicts. The computation
is carried out using a zero-suppressed BDD data structure and
can have adjustable accuracy. The data provided by Satometer can
help diagnose the performance of SAT solvers and can shed light
on the nature of a SAT instance.
Categories and Subject Descriptors
I.1 [Symbolic and Algebraic Manipulation]: Expressions and
Their Representation, Algorithms.
General Terms
Algorithms, Measurement, Experimentation, Verification.
Keywords
BDDs, ZBDDs, SAT, CNF, backtrack search, conflict diagnosis,
search space coverage, search progress.
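What Satometer measures can be illustrated by brute force in Python for a tiny variable count: the fraction of the 2^n assignment space already refuted by recorded conflict clauses (Satometer itself uses zero-suppressed BDDs; the clauses here are invented):
from itertools import product

n = 4
conflicts = [[(0, True), (1, False)],    # clause (x0 OR NOT x1)
             [(2, True), (3, True)]]     # clause (x2 OR x3)

def refuted(assignment):
    # an assignment is ruled out if it falsifies some recorded conflict clause
    return any(all(assignment[v] != pol for v, pol in clause)
               for clause in conflicts)

count = sum(refuted(a) for a in product([False, True], repeat=n))
print("search space covered: %.1f%%" % (100.0 * count / 2 ** n))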
-
47.4 SAT with Partial Clauses and Back-Leaps [p. 743]
-
Slawomir Pilarski, Gracia Hu
This paper presents four new powerful SAT optimization
techniques: partial clauses, back-leaps, immediate
implications, and local decisions. These optimization
techniques can be combined with SAT mechanisms used
in Chaff, SATO, and GRASP to develop a new solver
that significantly outperforms its predecessors on a large
set of benchmarks. Performance improvements for standard
benchmark groups vary from 1.5x to 60x. Performance
improvements measured using VLIW microprocessor
benchmarks amount to 3.31x.
-
47.5 Combining Strengths of Circuit-based and CNF-based
Algorithms for a High-Performance SAT Solver [p. 747]
-
Malay K. Ganai, Lintao Zhang, Pranav Ashar, Aarti Gupta, Sharad Malik
We propose Satisfiability Checking (SAT) techniques that lead to a consistent performance improvement of up to 3x over state-of-the-art SAT solvers like Chaff on important problem domains in VLSI CAD. We observe that in circuit oriented applications like ATPG and verification, different software engineering techniques are required for the portions of the formula corresponding to learnt clauses compared to the original formula. We demonstrate that by employing the same innovations as in advanced CNF-based SAT solvers, but in a hybrid approach where these two portions of the formula are represented differently and processed separately, it is possible to obtain the consistently highest performing SAT solver for circuit oriented problem domains. We also present controlled experiments to highlight where these gains come from. Once it is established that the hybrid approach is faster, it becomes possible to apply low overhead circuit-based heuristics that would be unavailable in the CNF domain for greater speedup.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-aided Engineering - Computer-aided design.
General Terms
Algorithms, Performance, Experimentation, Verification.
Keywords
Boolean Satisfiability (SAT), Boolean Constraint Propagation (BCP), Conjunctive Normal Form (CNF), Bounded Model Checking (BMC).
Chair: Noel Menezes
Organizers: Jaijeet Roychowdhury, Mustafa Celik
-
48.1 A Solenoidal Basis Method For Efficient Inductance Extraction [p. 751]
-
Hemant Mahawar, Vivek Sarin, Weiping Shi
The ability to compute the parasitic inductance of the interconnect
is critical to the timing verification of modern VLSI circuits. A
challenging aspect of inductance extraction is the solution of large,
dense, complex linear systems of equations via iterative methods.
Accelerating the convergence of the iterative method through preconditioning
is made difficult due to the non-availability of the system
matrix. This paper presents a novel algorithm to solve these
linear systems by restricting current to a discrete solenoidal subspace
in which Kirchhoff's law is obeyed, and solving a reduced
system via an iterative method such as GMRES. A preconditioner
based on the Green's function is used to achieve near-optimal convergence
rates in several cases. Experiments on a number of benchmark
problems illustrate the advantages of the proposed method
over FastHenry.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - placement and routing,
simulation, verification
General Terms
Algorithms
Keywords
Inductance extraction, interconnect, solenoidal basis, iterative methods,
preconditioning
-
48.2 On the Efficacy of Simplified 2D On-Chip Inductance Models [p. 757]
-
Tao Lin, Michael W. Beattie, Lawrence T. Pileggi
Full three-dimensional (3D) inductance models of on-chip interconnect
contain an extremely large number of forward coupling
terms. It is therefore desirable to use a two-dimensional (2D) approximation
in which forward couplings are not included. Unlike
capacitive coupling, however, truncating mutual inductance terms
can result in loss of accuracy and even instability. This paper investigates
whether ignoring forward couplings is an acceptable choice
for all good IC designs or if full 3D models are necessary in certain
on-chip interconnect configurations. We show that the significance
of the forward coupling inductance depends on various aspects of
the design.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - simulation, verification
General Terms
Theory, algorithms
Keywords
On-chip inductance, PEEC, sparsified model
-
48.3 A Physical Model for the Transient Response of Capacitively
Loaded Distributed rlc Interconnects [p. 763]
-
Raguraman Venkatesan, Jeffrey A. Davis, James D. Meindl
Rapid approximation of the transient response of high-speed
global interconnects is needed to estimate the time delay,
crosstalk, and overshoot in a GSI multilevel wiring network. This
paper outlines a rigorous and physical solution for transients in a
capacitively-loaded distributed rlc interconnect using a
convergent series of modified Bessel functions. Compact models
for time delay and repeater insertion are derived. The single-line
model is extended to evaluate crosstalk in two-coupled lines.
These solutions are validated by HSPICE simulation, and have
potential applications to rapid rlc timing analysis, global wire
sizing and repeater insertion, signal integrity estimation, and
reliability modeling (e.g. voltage overshoot and high current
density concerns).
Categories and Subject Descriptors
B.4.3 [Input/Output and Data Communications]:
Interconnections (Subsystems) - physical structures (e.g.
backplanes, cables, chip carriers).
General Terms
Performance, Design, and Verification.
Keywords
Interconnects, distributed rlc lines, transient response, time delay,
crosstalk, overshoot, repeaters.
-
48.4 HSpeedEx: A High-Speed Extractor for Substrate Noise
Analysis in Complex Mixed-Signal SOC [p. 767]
-
Adil Koukab, Catherine Dehollain, Michel Declercq
The unprecedented impact of noise coupling on
Mixed-Signal Systems-on-a-Chip (MS-SOC) functionality brings a
new set of challenges for Electronics Design Automation (EDA) tool
developers. In this paper, we propose a new approach which
combines a thorough physical comprehension of the noise coupling
effects with an improved Boundary-Element-Method (BEM) to
accelerate the substrate model extraction and to avoid the dense
matrix storage. The low computational effort required, as well as the
speed and accuracy achieved, makes this method a highly promising
alternative for verifying complex MS-SOCs.
-
48.5 Combined BEM/FEM Substrate Resistance Modeling [p. 771]
-
E. Schrik, N.P. van der Meijs
For present-day micro-electronic designs, it is becoming ever
more important to accurately model substrate coupling effects.
Basically, either a Finite Element Method (FEM) or a
Boundary Element Method (BEM) can be used. The FEM
is the most versatile and flexible whereas the BEM is faster,
but requires a stratified, layout-independent doping profile
for the substrate. Thus, the BEM is unable to properly
model any specific, layout-dependent doping patterns that
are usually present in the top layers of the substrate, such
as channel stop layers. This paper describes a way to incorporate
these doping patterns into our substrate model
by combining a BEM for the stratified doping profiles with
a 2D FEM for the top-level, layout-dependent doping patterns,
thereby achieving improved flexibility compared to
BEM and improved speed compared to FEM. The method
has been implemented in the SPACE layout-to-circuit extractor
and has been successfully verified against two other tools.
Categories and Subject Descriptors
I.6.5 [Simulation and Modeling]: Model Development -
Modeling Methodologies; B.7.2 [Integrated Circuits]: Design
Aids - Verification; I.6.4 [Simulation and Modeling]:
Model Validation and Analysis
General Terms
Verification, Design
Keywords
substrate noise, modeling, finite element method, boundary element method
Chair: Jan Rabaey
Organizers: Grant E. Martin, Majid Sarrafzadeh
-
49.1 System Design Methodologies for a Wireless Security Processing
Platform [p. 777]
-
Srivaths Ravi, Anand Raghunathan, Nachiketh Potlapally, Murugan Sankaradass
Security protocols are critical to enabling the growth of a wide range of
wireless data services and applications. However, they impose a high
computational burden that is mismatched with the modest processing capabilities
and battery resources available on wireless clients. Bridging the security
processing gap, while retaining sufficient programmability in order to support
a wide range of current and future security protocol standards, requires
the use of novel system architectures and design methodologies.
We present the system-level design methodology used to design a programmable
security processor platform for next-generation wireless handsets.
The platform architecture is based on (i) a configurable and extensible
processor that is customized for efficient domain-specific processing,
and (ii) layered software libraries implementing cryptographic algorithms
that are optimized to the hardware platform. Our system-level design
methodology enables the efficient co-design of optimal cryptographic algorithms
and an optimized system architecture. It includes novel techniques
for algorithmic exploration and tuning, performance characterization and
macro-modeling of software libraries, and architecture refinement based
on selection of instruction extensions to accelerate performance-critical,
computation-intensive operations. We have designed a programmable security
processor platform to support both public-key and private-key operations
using the proposed methodology, and have evaluated its performance
through extensive system simulations as well as hardware prototyping. Our
experiments demonstrate large performance improvements (e.g., 31.0X for
DES, 33.9X for 3DES, 17.4X for AES, and up to 66.4X for RSA) compared
to well-optimized software implementations on a state-of-the-art embedded
processor.
Categories and Subject Descriptors
C.0 [Computer Systems Organization]: General- System architectures;
C.1.0 [Computer Systems Organization]: Processor architectures- General;
C.2.0 [Computer Systems Organization]: Computer-Communication
Networks- General, Security and protection; C.5.3 [Computer Systems Organization]:
Computer System Implementation- Microcomputers, Portable
devices; E.3 [Data]: Data encryption- DES, Public key cryptosystems
General Terms
Security, Performance, Design, Algorithms
Keywords
Security, Security processing, Encryption, Decryption, Wireless, Handset,
Embedded system, Performance, DES, 3DES, AES, RSA, SSL, IPSec, Design
methodology, Platform, System architecture
-
49.2 Constraint-Driven Communication Synthesis [p. 783]
-
Alessandro Pinto, Luca P. Carloni, Alberto L. Sangiovanni-Vincentelli
Constraint-driven Communication Synthesis enables the automatic
design of the communication architecture of a complex system from
a library of pre-defined Intellectual Property (IP) components. The
key communication parameters that govern all the point-to-point
interactions among system modules are captured as a set of arc
constraints in the communication constraint graph. Similarly, the
communication features offered by each of the components available
in the IP communication library are captured as a set of feature resources
together with their cost figures. Then, every communication
architecture that can be built using the available components while
satisfying all constraints is implicitly considered (as an implementation
graph matching the constraint graph) to derive the optimum
design solution with respect to the desired cost figure. The corresponding
constrained optimization problem is efficiently solved by
a novel algorithm that is presented here together with its rigorous
theoretical foundations.
Categories and Subject Descriptors
J.6.1 [Computer Applications]: Computer-Aided Engineering - CAD.
General Terms
Algorithms.
Keywords
Communication Synthesis, Systems-on-Chip, Network Design.
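A hedged sketch of the problem setup (the arc constraints, library entries,
and the greedy per-arc choice below are illustrative assumptions; the paper's
algorithm optimizes globally over all implementation graphs that match the
constraint graph):

# Each point-to-point constraint must be covered by some link type from
# an IP communication library; here we greedily pick the cheapest
# adequate component per arc. All names and numbers are hypothetical.
constraints = {  # arc -> (min bandwidth Mb/s, max latency ns)
    ("cpu", "dsp"): (800, 15),
    ("dsp", "mem"): (1600, 20),
}
library = [  # (name, bandwidth Mb/s, latency ns, cost)
    ("serial_link", 200, 5, 1.0),
    ("bus32", 1000, 15, 2.5),
    ("bus64", 2000, 12, 4.0),
]

total_cost = 0.0
for arc, (bw_req, lat_req) in constraints.items():
    feasible = [(cost, name) for name, bw, lat, cost in library
                if bw >= bw_req and lat <= lat_req]
    cost, name = min(feasible)  # cheapest component meeting both bounds
    total_cost += cost
    print(arc, "->", name)
print("total cost:", total_cost)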
-
49.3 Component-Based Design Approach for Multicore SoCs [p. 789]
-
W. Cesário, A. Baghdadi, L. Gauthier, D. Lyonnard, G. Nicolescu, Y. Paviot,
S. Yoo, A. A. Jerraya, M. Diaz-Nava
This paper presents a high-level component-based methodology
and design environment for application-specific multicore SoC
architectures. Component-based design provides primitives to
build complex architectures from basic components; this bottom-up
approach allows design architects to explore efficient custom
solutions with the best performance. The system specifications
are represented as a virtual architecture described in a
SystemC-like model and annotated with a set of configuration
parameters. Our component-based design environment provides
automatic wrapper-generation tools able to synthesize hardware
interfaces, device drivers, and operating systems that implement a
high-level interconnect API. This approach, applied to a VDSL
system, shows a drastic design-time reduction without any
significant efficiency loss in the final circuit.
Categories and Subject Descriptors
C.3 [SPECIAL-PURPOSE AND APPLICATION-BASED
SYSTEMS]: Microprocessor/microcomputer applications, Real-time
and embedded systems.
General Terms
Design, Experimentation, Standardization.
Keywords
Multicore System-on-Chip, Component-based design, HW/SW
interfaces abstraction.
-
49.4 Traffic Analysis for On-chip Networks Design of Multimedia Applications [p. 795]
-
Girish Varatkar, Radu Marculescu
The objective of this paper is to introduce self-similarity
as a fundamental property exhibited by the bursty traffic
between on-chip modules in typical MPEG-2 video applications.
Statistical tests performed on relevant traces extracted from common
video clips establish unequivocally the existence of self-similarity
in video traffic. Using a generic communication architecture,
we also discuss the implications of our findings on on-chip buffer
space allocation and present quantitative evaluations for typical
video streams. We believe that our findings open up new directions
of research with deep implications for some fundamental issues in
on-chip network design for multimedia applications.
Categories and Subject Descriptors: C.4 [Performance of
Systems]: Modeling techniques; B.8.2 [Performance and Reliability]:
Performance analysis and design aids.
General Terms: Performance, Design
Keywords: System-level design, on-chip networks, communication
analysis, self-similarity, long-range dependence.
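One standard statistical test for this property (hedged: a textbook method,
not necessarily the exact tests the authors ran) estimates the Hurst
parameter H from the variance of the aggregated trace, which for a
self-similar process decays as m^(2H-2); H well above 0.5 indicates
long-range dependence:

import math, random

def hurst_aggregated_variance(trace, block_sizes=(1, 2, 4, 8, 16, 32)):
    """Aggregated-variance estimate of the Hurst parameter H:
    Var(X^(m)) ~ m^(2H-2), so H = 1 + slope/2 on a log-log plot."""
    def var(xs):
        mu = sum(xs) / len(xs)
        return sum((x - mu) ** 2 for x in xs) / len(xs)
    pts = []
    for m in block_sizes:
        agg = [sum(trace[i:i + m]) / m
               for i in range(0, len(trace) - m + 1, m)]
        pts.append((math.log(m), math.log(var(agg))))
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # least squares
    return 1 + slope / 2

random.seed(0)
iid_traffic = [random.random() for _ in range(4096)]
print(hurst_aggregated_variance(iid_traffic))  # ~0.5: no long-range dependence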
Chair: Magdy Abadir
Organizers: Carl Pixley, Masahiro Fujita
-
50.1 Deriving a Simulation Input Generator and a Coverage Metric From a Formal Specification [p. 801]
-
Kanna Shimizu, David L. Dill
This paper presents novel uses of functional interface specifications
for verifying RTL designs. We demonstrate how a simulation environment,
a correctness checker, and a functional coverage metric
are all created automatically from a single specification. Additionally,
the process exploits the structure of a specification written with
simple style rules. The methodology was used to verify a large-scale
I/O design from the Stanford FLASH project.
Categories and Subject Descriptors
B.5.2 [RTL Implementation]: Design Aids
General Terms
Documentation, Performance, Design, Verification
Keywords
testbench, input generation, BDD minimization, coverage
-
50.2 Hole Analysis for Functional Coverage Data [p. 807]
-
Oded Lachish, Eitan Marcus, Shmuel Ur, Avi Ziv
One of the main goals of coverage tools is to provide the user with
an informative presentation of coverage information. Specifically,
information on large, cohesive sets of uncovered tasks with common
properties is very useful. This paper describes methods for discovering
and reporting large uncovered spaces (holes) for cross-product
functional coverage models. Hole analysis is a presentation
method for coverage data that is both succinct and informative.
Using case studies, we show how hole analysis was used to detect
large uncovered spaces and improve the quality of verification.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Verification; D.2.4 [Software
Engineering]: Software/Program Verification; D.2.5 [Software Engineering]:
Testing and Debugging - Testing tools
General Terms
Verification, Measurement, Algorithms, Experimentation
Keywords
Functional verification, Coverage analysis
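A minimal sketch of the idea (the coverage model and attribute names are
invented for illustration; the paper's hole analysis also reports
higher-dimensional and aggregated holes): a hole is a cohesive uncovered
slice of the cross-product space, such as every task sharing one attribute
value:

from itertools import product

# Hypothetical cross-product coverage model and observed coverage.
domains = {
    "cmd":   ["read", "write", "flush"],
    "resp":  ["ack", "retry"],
    "cache": ["hit", "miss"],
}
covered = {("read", "ack", "hit"), ("read", "ack", "miss"),
           ("write", "ack", "hit"), ("write", "retry", "hit")}

names = list(domains)
space = set(product(*(domains[n] for n in names)))
uncovered = space - covered

# Report one-dimensional holes: attribute values whose entire slice of
# the cross-product space is uncovered -- succinct and informative.
for i, name in enumerate(names):
    for value in domains[name]:
        tasks = {t for t in space if t[i] == value}
        if tasks <= uncovered:
            print("hole:", name, "=", value, "(", len(tasks), "tasks )")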
-
50.3 Effective Safety Property Checking Using Simulation-Based Sequential ATPG [p. 813]
-
Shou Sheng, Koichiro Takayama, Michael S. Hsiao
In this paper, we present a successful application of simulation-based
sequential Automatic Test Pattern Generation (ATPG) to safety-property
verification, targeting safety properties of large, industrial-strength
hardware designs for which current formal methods fail. Several
techniques are developed to increase the effectiveness and efficiency
of the test generator during state exploration and justification,
including (1) incorporation of a small combinational ATPG engine, (2)
reset signal masking, (3) threshold-value simulation, and (4) weighted
Hamming distance. Experimental results on both ISCAS89 benchmark
circuits and real industry circuits show that this simulation-based
verifier achieves results better than or comparable to the current
state-of-the-art formal verification tools BINGO and CHAFF.
Categories and Subject Descriptors
B.7.2 [INTEGRATED CIRCUITS]: Design Aids, Verification, Simulation;
B.8.1 [PERFORMANCE AND RELIABILITY]: Testing
General Terms
Algorithms, Verification
Keywords
Verification, Simulation-Based, Sequential ATPG
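To give a feel for technique (4) (a hedged sketch; the weights and
don't-care handling here are illustrative, not the paper's exact scheme),
a weighted Hamming distance lets the test generator rank candidate states
by closeness to the target state of the property, with hard-to-justify
flip-flops weighted more heavily:

def weighted_hamming(state, target, weights):
    """Distance between binary state vectors; 'X' bits in the target
    are don't-cares and cost nothing."""
    return sum(w for s, t, w in zip(state, target, weights)
               if t != "X" and s != t)

target = "1X0X"            # target state of the safety property
weights = [4, 1, 2, 1]     # e.g., counter bits weighted higher (assumed)
candidates = ["1000", "0101", "1101"]
best = min(candidates, key=lambda s: weighted_hamming(s, target, weights))
print(best, weighted_hamming(best, target, weights))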
-
50.4 A Comparison of Three Verification Techniques:
Directed Testing, Pseudo-Random Testing and Property Checking [p. 819]
-
Mike G. Bartley, Darren Galpin, Tim Blackmore
This paper describes the verification of two versions of a bridge
between two on-chip buses. The verification was performed just
as the Infineon Technologies Design Centre in Bristol was
introducing pseudo-random testing (using Specman) and property
checking (using GateProp) into their verification flows and thus
provides a good opportunity to compare these two techniques
with the existing strategy of directed testing using VHDL bus
functional models.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids
- Verification.
General Terms
Verification.
Chair: Enrico Macii
Organizer: Mani Srivastava
-
51.1 Energy-Efficient Communication Protocols [p. 824]
-
Carla F. Chiasserini, Pavan Nuggehalli, Vikram Srinivasan, Ramesh R. Rao
Wireless networking has experienced a great deal of popularity, and significant
advances have been made in wireless technology. However, energy efficiency
of radio communication systems is still a critical issue due to the limited
battery capacity of portable devices. In this paper, we deal with the
charge recovery effect that takes place in electrochemical cells and show
how we can take advantage of this mechanism to increase the energy delivered
by a battery. Then, we present energy-aware traffic shaping techniques, as
well as scheduling and routing algorithms, which exploit the battery
recovery effect.
Categories and Subject Descriptors
C.2.2 [Computer-Communication Networks]: Network Protocols
General Terms
Algorithms
Keywords
Wireless networks, energy efficiency, battery charge recovery
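A hedged toy model of the recovery effect (the rate constants and the 1.3x
transient cost are invented for illustration; real cells follow
diffusion-based electrochemical models): part of the charge consumed by a
discharge burst becomes temporarily unavailable and diffuses back during
idle slots, so shaped, bursty traffic with idle gaps leaves more usable
charge than a continuous drain of the same average load:

def run(schedule, nominal=1000.0, k_recover=0.05):
    """Toy cell: a draw of `load` costs 1.3*load of available charge,
    0.3*load of which is recoverable; an idle slot recovers 5% of the
    recoverable pool. Returns (delivered, remaining available)."""
    available, recoverable, delivered = nominal, 0.0, 0.0
    for load in schedule:
        if load and available >= 1.3 * load:
            available -= 1.3 * load
            recoverable += 0.3 * load
            delivered += load
        else:                      # idle slot: charge recovery
            r = k_recover * recoverable
            available += r
            recoverable -= r
    return delivered, available

burst = [10, 10, 0, 0] * 30    # shaped traffic: bursts plus idle gaps
steady = [5, 5, 5, 5] * 30     # same average load, no idle slots
print(run(burst))              # same charge delivered...
print(run(steady))             # ...but the shaped schedule keeps more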
-
51.2 Reliable and Energy-Efficient Digital Signal Processing [p. 830]
-
Naresh Shanbhag
This paper provides an overview of algorithmic noise-tolerance
(ANT) for designing reliable and energy-efficient digital signal
processing systems. Techniques such as prediction-based,
error cancellation-based, and reduced precision redundancy
based ANT are discussed. Average energy-savings range
from 67% to 71% over conventional systems. Fluid IP core
generators are proposed as a means of encapsulating the
benefits of an ANT-based low-power design methodology.
CAD issues arising in such a methodology are also discussed.
Categories and Subject Descriptors
B.7 [Hardware]: Integrated Circuits
General Terms
Design
Keywords
reliability, energy-efficiency, low-power, deep submicron, noise,
noise-tolerance, broadband, communications
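Reduced-precision-redundancy ANT, one of the techniques surveyed, fits in a
few lines (hedged: the bit-width and threshold choice are illustrative). A
small, reliably operated low-precision replica shadows the main block; its
output replaces the main output whenever the two disagree by more than the
replica's own quantization error, catching large soft errors at small
energy cost:

def rpr_ant(main_out, x, coeffs, frac_bits=4):
    """Arbitrate between a possibly erroneous full-precision output
    (e.g., from a voltage-overscaled main block) and a coarse but
    reliable reduced-precision replica of the same dot product."""
    q = 1.0 / (1 << frac_bits)
    replica = sum(round(c / q) * q * xi for c, xi in zip(coeffs, x))
    threshold = q * sum(abs(xi) for xi in x)  # replica's worst-case error
    return replica if abs(main_out - replica) > threshold else main_out

coeffs = [0.71, -0.33, 0.12]
x = [1.0, 2.0, -1.5]
exact = sum(c * xi for c, xi in zip(coeffs, x))
print(rpr_ant(exact, x, coeffs))        # agreement: keep precise output
print(rpr_ant(exact + 3.0, x, coeffs))  # large soft error: use replica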
-
51.3 CMOS: A Paradigm for Low Power Wireless? [p. 836]
-
Michiel Steyaert, Peter Vancorenland
An overview and comparison of different topologies for wireless
architectures is presented, with the main focus on power
consumption and on the possibilities for integration and reduction
of external components. Architectures with a reduced number of
building blocks (both internal and external) are presented, whose
main benefits are low cost, in terms of both the CMOS technology
and the power consumption.
Categories and Subject Descriptors
B.7 [Hardware]: Integrated Circuits
General Terms
Design
Keywords
CMOS, wireless receivers, low power
Chair: Ralph Otten
Organizers: Ralph Otten, Steven Teig
-
52.1 TCG-S: Orthogonal Coupling of P*-admissible Representations for General Floorplans [p. 842]
-
Jai-Ming Lin, Yao-Wen Chang
We extend in this paper the concept of the P-admissible floorplan representation
to that of the P*-admissible one. A P*-admissible representation
can model the most general floorplans. Each of the currently existing
P*-admissible representations, SP, BSG, and TCG, has its strengths
as well as weaknesses. We show the equivalence of the two most promising
P*-admissible representations, TCG and SP, and integrate TCG with
a packing sequence (part of SP) into a new representation, called TCG-S.
TCG-S combines the advantages of SP and TCG and at the same time
eliminates their disadvantages. With the property of SP, faster packing
and perturbation schemes are possible. Inheriting the nice properties of
TCG, the geometric relations among modules are transparent to TCG-S
(implying faster convergence to a desired solution), placement with
position constraints becomes much easier, and incremental update for
cost evaluation can be realized.
superior representation which exhibits an elegant solution structure to
facilitate the search for a desired floorplan/placement. Extensive experiments
show that TCG-S results in the best area utilization, wirelength
optimization, convergence speed, and stability among existing works
and is very flexible in handling placement with special constraints.
Categories and Subject Descriptors
J.6 [COMPUTER-AIDED ENGINEERING]: Computer-aided design
(CAD)
General Terms
Algorithms
-
52.2 Floorplanning with Alignment and Performance Constraints [p. 848]
-
Xiaoping Tang, D. F. Wong
In this paper, we present a floorplanning algorithm based on sequence
pair representation. Our floorplanner has the following important
features: 1) It is explicitly designed for fixed-frame floorplanning,
which is different from traditional, well-researched min-area
floorplanning. Moreover, we also show that it can be adapted
to minimize total area. 2) It addresses the problem of handling
alignment constraint which arises in bus structure. 3) It deals with
performance constraint such as bounded net delay, while many existing
floorplanners just minimize total wire length. 4) More importantly,
even with all these constraints the algorithm is very fast: it evaluates
the feasibility of a sequence pair and translates it to a floorplan
typically in O(n log log n) time, where n is the number of blocks and
the number of constrained blocks is O(n), which is significantly faster
than the O(n^3) method operating on constraint graphs. Our algorithm is
based on computing the longest common subsequence of a pair of weighted
sequences. Experimental results on the MCNC benchmarks for block
placement show the promise of the method.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Placement and routing;
J.6 [Computer Applications]: Computer-Aided Engineering -
Computer-aided design
General Terms
Algorithms, Design, Performance, Theory
Keywords
longest common subsequence, floorplanning, sequence pair
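The weighted longest-common-subsequence evaluation at the core of the
algorithm is easiest to see in its naive quadratic form (a hedged sketch;
the paper's contribution includes doing this in O(n log log n) and handling
the alignment and performance constraints). Block c lies to the left of
block b exactly when c precedes b in both sequences, so b's x-coordinate is
the heaviest such chain ending before it:

def x_coordinates(gamma_plus, gamma_minus, width):
    """Naive O(n^2) sequence-pair evaluation: x_b is the longest
    width-weighted common subsequence of blocks preceding b in both
    sequences. y-coordinates follow symmetrically from the reversed
    first sequence."""
    pos_minus = {b: i for i, b in enumerate(gamma_minus)}
    x = {}
    for i, b in enumerate(gamma_plus):
        x[b] = 0
        for c in gamma_plus[:i]:              # c precedes b in gamma+
            if pos_minus[c] < pos_minus[b]:   # ...and in gamma-: c left of b
                x[b] = max(x[b], x[c] + width[c])
    return x

width = {"a": 4, "b": 3, "c": 2, "d": 5}
print(x_coordinates(["a", "b", "c", "d"], ["c", "a", "d", "b"], width))
# {'a': 0, 'b': 4, 'c': 0, 'd': 4}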
-
52.3 Algorithms for Simultaneous Satisfaction of Multiple Constraints and Objective Optimization in a Placement Flow with Application to Congestion Control [p. 854]
-
Ke Zhong, Shantanu Dutt
This paper addresses the problem of tackling multiple constraints
simultaneously during a partitioning-driven placement (PDP) process,
where a larger solution space is available for constraint-satisfying
optimization than in post-placement methods. A general methodology of
multi-constraint satisfaction that balances violation correction and
primary optimization is presented. A number of techniques are introduced
to ensure its convergence and enhance its solution-search capability
with intermediate relaxation. Application of our approach to congestion
control, modeled as pin-density and external-net-distribution balance
constraints, shows that it effectively reduces overall congestion by
14.3% and improves chip area by 8.9%, with reasonable running time and
only a 1.6% increase in wire length. To our knowledge, this is the first
approach to congestion reduction during placement optimization that
achieves a substantial congestion improvement with a very small increase
in wire length.
Categories and Subject Descriptors:
J.6 [Computer-Aided Engineering]:
General Terms
Algorithms, Experimentation
Keywords
Multi-constraint satisfaction, partitioning-driven placement,
intermediate relaxation, congestion reduction, connector generation,
minimization of objective deterioration.
Chair: Jamil Kawa
Organizers: Chandu Visweswariah, Louis Scheffer
-
53.1 Coping with Buffer Delay Change Due to Power and Ground Noise [p. 860]
-
Lauren Hui Chen, Malgorzata Marek-Sadowska, Forrest Brewer
Variations in power and ground levels affect VLSI circuit performance.
Trends in device technology and in packaging have necessitated
a revision in conventional delay models. In particular, simple
scalable models are needed to predict delays in the presence of
uncorrelated power and ground noise. In this paper, we analyze the
effect of such noise on signal propagation through a buffer and
present simple, closed-form formulas to estimate the corresponding
change of delay. The model captures both positive (slowdown) and
negative (speedup) delay changes. It is consistent with short-channel
MOSFET behavior, including carrier velocity saturation effects.
An application shows that repeater chains using buffers instead of
inherently faster inverters tend to have superior supply level-induced
jitter characteristics.
Categories and Subject Descriptors:
J.6 [Computer-Aided Engineering]: Computer-aided design (CAD).
General Terms:
Algorithms.
Keywords:
Power and ground noise, differential mode noise, common mode
noise, incremental delay change.
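For context, a hedged, generic version of the dependence the paper models
(not the authors' closed-form result): with velocity-saturated
short-channel devices following the alpha-power law, gate delay scales
roughly as

\[
t_d \;\propto\; \frac{C_L\,V_{DD}}{(V_{DD}-V_T)^{\alpha}},
\qquad 1 \le \alpha \le 2,
\]

and a first-order expansion in the uncorrelated supply and ground
deviations,

\[
\Delta t_d \;\approx\; \frac{\partial t_d}{\partial V_{DD}}\,\Delta V_{DD}
 + \frac{\partial t_d}{\partial V_{SS}}\,\Delta V_{SS},
\]

captures both the slowdown and the speedup cases, depending on the signs
of the two noise terms.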
-
53.2 Osculating Thevenin Model for Predicting Delay and Slew of Capacitively Characterized Cells [p. 866]
-
Bernard N. Sheehan
To extrapolate from one point to another using a line, one had
better get the slope right. In this paper we apply a similar
concept to the important problem in Static Timing Analysis
(STA) of predicting cell timing for RC loads using capacitive
characterization data. Instead of a line we have a Thevenin
circuit, and instead of matching slopes we match load
sensitivities. We present a table-driven, highly accurate cell
delay and slew prediction procedure that can improve STA when
interconnect effects dominate. It is significantly more accurate
(especially slew) than previously published Ceff methods.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - simulation, verification.
General Terms
Algorithms, Verification.
Keywords
Static Timing Analysis, Effective Capacitance
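The "match the slope" analogy can be written down directly (a hedged
paraphrase of the idea, not the paper's exact equations): the Thevenin
driver is chosen so that its predicted delay agrees with the capacitive
characterization table both in value and in sensitivity to load at the
operating point, i.e. it osculates the table data:

\[
t_d^{\mathrm{thev}}(C)\big|_{C=C^{*}} = t_d^{\mathrm{table}}(C^{*}),
\qquad
\frac{\partial t_d^{\mathrm{thev}}}{\partial C}\bigg|_{C=C^{*}}
 = \frac{\partial t_d^{\mathrm{table}}}{\partial C}\bigg|_{C=C^{*}},
\]

so the extrapolation from a purely capacitive load to the full RC load is
first-order accurate, just as a tangent line is for a curve.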
-
53.3 Timed Pattern Generation for Noise-on-Delay Calculation [p. 870]
-
Seung Hoon Choi, Florentin Dartu, Kaushik Roy
Computing noise-on-delay effects is required for all circuits,
from simple ASIC designs to microprocessors. Transistor-level
simulation engines make accurate delay calculation affordable
only if the number of simulations per stage is very small. We
propose a solution that predicts the alignment of aggressor signals
with respect to the victim signal that induces the worst-case noise
effect on delay. The aggressor alignment can be used to set up a
detailed simulation. The worst-case delay in the presence of noise
is predicted within 5% error for more than 99% of the cases tested,
using an average of 1.27 simulations per stage transition.
Categories and Subject Descriptors
B.8.2 [Performance and Reliability]: Performance Analysis and Design Aids
General Terms: Performance, Verification
-
53.4 VeriCDF: A New Verification Methodology for Charged Device Failures [p. 874]
-
Jaesik Lee, Ki-Wook Kim, Sung-Mo Kang
A novel tool for full-chip verification of CDM-ESD protection is
reported. Until recently, ESD protection has been simulated at the
device level, leading to well-known limitations in capturing global
features such as power protection circuits and package parasitics.
In practice, fatal failures occur due to unexpected discharge paths
in multi-power-supply chips, which can only be verified by chip-level
simulation. Combined with a new macromodeling concept, the hierarchical
approach provides an effective analysis methodology for mixed-signal
chips, enabling analysis of chip-level discharging paths and of
gate-oxide reliability. Simulation results on a CMOS ASIC chip
fabricated in a 0.25-µm technology are in accordance with the
measurement data. Scanning electron microscopy located a gate-oxide
fault where our analysis predicted it.
Categories and Subject Descriptors
B.8 [Hardware]: Performance and Reliability; C.4 [Computer
Systems Organization]: Performance of Systems
General Terms
Verification
Keywords
Reliability, Modeling, Simulation
Chair: Henk Corporaal
Organizers: Jo Dale Carothers, Luca Benini
-
54.1 A Framework for Evaluating Design Tradeoffs in Packet Processing Architectures [p. 880]
-
Lothar Thiele, Samarjit Chakraborty, Matthias Gries, Simon Künzli
We present an analytical method to evaluate embedded network
packet processor architectures, and to explore their design space.
Our approach is in contrast to those based on simulation, which
tend to be infeasible when the design space is very large. We illustrate
the feasibility of our method using a detailed case study.
Categories and Subject Descriptors
C.0 [Computer Systems Organization]: General - Modeling of
computer architecture, System architectures; B.4.1 [Hardware]:
Input/output and data communications - Data communication devices
General Terms
Performance, Design
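The analytical machinery behind such frameworks is typically that of
arrival and service curves (a hedged, textbook rendering; the paper's
model adds considerably more structure for packet processors): if the
traffic of a flow is bounded by an arrival curve \alpha and a processing
element guarantees a service curve \beta, the worst-case delay is bounded
by the maximal horizontal distance between the two:

\[
d_{\max} \;\le\; \sup_{t \ge 0}\,\inf\{\tau \ge 0 : \alpha(t) \le \beta(t+\tau)\}.
\]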
-
54.2 Energy Estimation and Optimization of Embedded VLIW Processors based
on Instruction Clustering [p. 886]
-
A. Bona, M. Sami, D. Sciuto, C. Silvano, V. Zaccaria, R. Zafalon
The aim of this paper is to propose a methodology for the definition
of an instruction-level energy estimation framework for
VLIW (Very Long Instruction Word) processors. The power
modeling methodology is the key to defining an effective
energy-aware software optimisation strategy for state-of-the-art
ILP (Instruction Level Parallelism) processors.
The methodology is based on an energy model for VLIW
processors that exploits instruction clustering to achieve an
efficient and fine grained energy estimation. The approach
aims at reducing the complexity of the characterization problem
for VLIW processors from exponential, with respect to
the number of parallel operations in the same very long instruction,
to quadratic, with respect to the number of instruction
clusters. Furthermore, the paper proposes a spatial
scheduling algorithm based on a low-power reordering
of the parallel operations within the same long instruction.
Experimental results have been carried out on the Lx processor,
a 4-issue VLIW core jointly designed by HPLabs and
STMicroelectronics. The results have shown an average error
of 1.9% between the cluster-based estimation model and
the reference design, with a standard deviation of 5.8%. For
the Lx architecture, the spatial instruction scheduling algorithm
provides an average energy saving of 12%.
Categories and Subject Descriptors
B.7.0 [Design Aids]: General; B.6.0 [Logic Design]: General
General Terms
Experimentation
Keywords
power estimation, vliw architectures
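The clustering idea can be sketched as follows (hedged: the additive
per-cluster-plus-pairwise form and all table values are illustrative
assumptions consistent with the abstract, not the published model).
Rather than characterizing every combination of operations in a long
instruction, which is exponential in the issue width, operations map to a
few energy clusters and only per-cluster and cluster-pair terms are
characterized, which is quadratic in the number of clusters:

from itertools import combinations

# Hypothetical characterization tables (a real flow measures these).
cluster_of = {"add": "alu", "sub": "alu", "mul": "mult",
              "ld": "mem", "st": "mem", "nop": "idle"}
e_base = {"alu": 12.0, "mult": 45.0, "mem": 30.0, "idle": 2.0}  # pJ
e_pair = {frozenset(p): 3.0
          for p in combinations(["alu", "mult", "mem", "idle"], 2)}
e_pair[frozenset(["alu"])] = 1.5  # two ops from the same cluster

def bundle_energy(ops):
    """Energy of one very long instruction from cluster-level tables:
    per-cluster base costs plus pairwise interaction terms."""
    cl = [cluster_of[op] for op in ops]
    e = sum(e_base[c] for c in cl)
    for a, b in combinations(cl, 2):
        e += e_pair.get(frozenset([a, b]), 0.0)
    return e

print(bundle_energy(["add", "mul", "ld", "nop"]), "pJ")  # 4-issue bundle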
-
54.3 Energy Exploration and Reduction of SDRAM Memory Systems [p. 892]
-
Yongsoo Joo, Yongseok Choi, Hojun Shim,
Hyung Gyu Lee, Kwanho Kim, Naehyuck Chang
In this paper, we introduce a precise energy characterization of SDRAM
main memory systems and explore the amount of energy associated
with design parameters, leading to energy reduction techniques
that we are able to recommend for practical use.
We build an in-house energy simulator for SDRAM main memory
systems based on cycle-accurate energy measurement and state-machine-based
characterizations which independently characterize
dynamic and static energy. We explore energy behavior of the
memory systems by changing design parameters such as processor
clock, memory clock and cache configuration. Finally we propose
new energy reduction techniques for the address bus and practical
mode control schemes for the SDRAM devices. We save 10.8 mJ (40.2% of
the total energy) for 24M instructions of an MP3 decoder and 12 mJ
(14.5%) for a JPEG compressor, using a typical 32-bit, 64 MB SDRAM
memory system.
Categories and Subject Descriptors
B.8.2 [Performance and Reliability]: Performance Analysis and
Design Aids; C.4 [Performance of Systems]
General Terms
Design, Experimentation, Measurement, Performance
Keywords
low power, memory system, SDRAM
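A minimal sketch of the state-machine-based characterization (hedged: the
states, commands, and every number below are placeholders, not the
authors' measured values): static energy accrues per cycle according to
the device state, while each command adds an independently characterized
dynamic cost:

E_STATIC = {"IDLE": 0.10, "ACTIVE": 0.40, "POWER_DOWN": 0.02}   # nJ/cycle
E_DYNAMIC = {"ACT": 2.0, "READ": 1.5, "WRITE": 1.7, "PRE": 1.0}  # nJ
NEXT_STATE = {"ACT": "ACTIVE", "PRE": "IDLE", "READ": "ACTIVE",
              "WRITE": "ACTIVE", "PDN": "POWER_DOWN"}

def memory_energy(trace):
    """trace: one SDRAM command (or None) per memory clock cycle.
    Static energy is accumulated by state, dynamic energy by command."""
    state, energy = "IDLE", 0.0
    for cmd in trace:
        energy += E_STATIC[state]              # static cost of this cycle
        if cmd is not None:
            energy += E_DYNAMIC.get(cmd, 0.0)  # dynamic cost of the command
            state = NEXT_STATE.get(cmd, state)
    return energy

trace = ["ACT", "READ", "READ", None, "PRE", None, "PDN", None, None]
print(memory_energy(trace), "nJ")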
Chair: Petru Eles
Organizers: Ahmed A. Jerraya, Krzysztof Kuchcinski
-
55.1 Coordinated Transformations for High-Level Synthesis
of High Performance Microprocessor Blocks [p. 898]
-
Sumit Gupta, Timothy Kam, Michael Kishinevsky, Shai Rotem,
Nick Savoiu, Nikil Dutt, Rajesh Gupta, Alex Nicolau
High performance microprocessor designs are partially characterized
by functional blocks consisting of a large number of operations
that are packed into very few cycles (often single-cycle) with little
or no resource constraints but tight bounds on the cycle time. Extreme
parallelization and conditional and speculative execution of operations
are essential to meet the processor performance goals. However,
this is a tedious task for which classical high-level synthesis
(HLS) formulations are inadequate and thus rarely used. In this paper,
we present a new methodology for application of HLS targeted
to such microprocessor functional blocks that can potentially speed
up the design space exploration for microprocessor designs. Our
methodology consists of a coordinated set of source-level and fine-grain
parallelizing compiler transformations that targets these behavioral
descriptions, specifically loop constructs in them and enables
efficient chaining of operations and high-level synthesis of the
functional blocks. As a case study in understanding the complexity
and challenges in the use of HLS, we walk the reader through
the detailed design of an instruction length decoder drawn from
the Pentium®-family of processors. The chief contribution of this
paper is formulation of a domain-specific methodology for application
of high-level synthesis techniques to a domain that rarely, if
ever, finds use for it.
Categories and Subject Descriptors: B.5.1 [Register-Transfer-Level
Implementation]: Design Aids; C.5.3 [Computer System Implementation]:
Microcomputers - Microprocessors
General Terms: Design
Keywords: High-level synthesis, microprocessor design
-
55.2 Forward-Looking Objective Functions: Concept and Applications in High Level Synthesis [p. 904]
-
Jennifer L. Wong, Seapahn Megerian, Miodrag Potkonjak
The effectiveness of traditional CAD optimization algorithms is
proportional to the accuracy of the targeted objective functions.
However, behavioral synthesis tools are not used in isolation; they
form a strongly connected design flow where each tool optimizes
its own objective function without considering the consequences
on the optimization goals of the subsequently applied tools.
Therefore, efforts to optimize one aspect of a design often have
unforeseen negative impacts on other phases of the design process.
Our objective is to establish a systematic way of developing
and validating new types of objective functions that consider the
effects on subsequently applied synthesis steps. We demonstrate
the generic forward-looking objective function (FLOF) strategy on
three main steps in behavioral synthesis: (i) Transformation, (ii)
Scheduling, and (iii) Register Assignment. We show how the
FLOF can be used in the first two phases to reduce the total
number of registers required in the third phase.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Optimization.
General Terms
Algorithms, Design.
Keywords
Objective Functions, Behavioral Synthesis, Transformations,
Scheduling, Register Assignment
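As a hedged illustration of a forward-looking objective (the weighting and
the register-need estimate are assumptions, not the paper's function): a
scheduler that optimizes only latency can be made to look forward to
register assignment by penalizing the peak number of simultaneously live
values, a lower bound on the registers the third phase will need:

def flof_cost(lifetimes, alpha=1.0):
    """lifetimes: value -> (birth_step, death_step) for one candidate
    schedule. Conventional objective = latency; the forward-looking
    term estimates the registers needed downstream as peak liveness."""
    latency = max(death for _, death in lifetimes.values())
    peak_live = max(
        sum(1 for birth, death in lifetimes.values() if birth <= t < death)
        for t in range(latency + 1))
    return latency + alpha * peak_live

sched_a = {"v1": (0, 3), "v2": (0, 3), "v3": (1, 3), "v4": (3, 4)}
sched_b = {"v1": (0, 1), "v2": (1, 2), "v3": (2, 3), "v4": (3, 4)}
print(flof_cost(sched_a), flof_cost(sched_b))  # equal latency; b needs fewer registers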
-
55.3 ILP-Based Engineering Change [p. 910]
-
Farinaz Koushanfar, Jennifer L. Wong, Jessica Feng, Miodrag Potkonjak
We have developed a generic integer linear programming (ILP)-based
engineering change (EC) methodology. The EC methodology
has three components: enabling, fast, and preserving.
Enabling EC provides a user with the means to specify the
amount of flexibility and how this flexibility should be distributed
throughout the solution so that one can guarantee
that a specific set of EC demands can be satisfied while preserving
the quality of the initially obtained solution. Fast
EC conducts changes in a fraction of the time needed to
solve the problem while preserving or in some cases improving
the quality of the initial solution. Preserving EC maintains
either user specified components of the solution or as
much as possible of the initial solution while still guaranteeing
an optimal solution to the altered problem instance. We
applied the generic methodology to the Boolean Satisfiability (SAT)
problem. The effectiveness of all proposed approaches
and algorithms is demonstrated on standard benchmarks.
Categories and Subject Descriptors
B.6 [Logic Design]: General; B.6.3 [Design Aids]: Software
Engineering; B.7 [Integrated Circuits]: VLSI (very large scale
integration)
General Terms
Automatic synthesis, optimization
Keywords
Engineering change, satisfiability (SAT), integer linear programming,
synthesis
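The ILP view of SAT that such a methodology builds on can be stated
compactly (hedged: the textbook clause encoding plus a generic
solution-preserving objective, not the paper's exact formulation). Each
clause j with positive literals P_j and negative literals N_j becomes a
linear constraint, and a preserving EC maximizes agreement with the
initial assignment x^0 subject to the modified clauses:

\[
\max \;\sum_{i:\,x_i^{0}=1} x_i + \sum_{i:\,x_i^{0}=0} (1 - x_i)
\quad\text{s.t.}\quad
\sum_{i \in P_j} x_i + \sum_{i \in N_j} (1 - x_i) \;\ge\; 1
\;\;\forall j,
\qquad x_i \in \{0,1\}.
\]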