### LVDCSL: Low Voltage Differential Current Switch Logic, A Robust Low Power DCSL Family

Dinesh Somasekhar, Kaushik Roy<sup>1</sup>

Purdue University

#### Abstract

In this paper we present a robust Differential Current Switch Logic gate suitable for low $V_{DD}$, low power operation. Differential Current Switch Logic gates achieve high performance and low power by restricting internal node voltage swings. Traditional DCSL is, however, highly sensitive to load imbalance because of the presence of a cross coupled inverter pair at the output. We describe LVDCSL, a low voltage DCSL family which preserves the essential features of DCSL, namely high speed, low power, restricted internal voltage swings and a latching input stage. It is, however, much more robust to mismatched output loads, and is capable of working at far lower voltages. In addition, spikes in output transitions are greatly reduced, simplifying the interface to conventional CMOS circuits. Our results show that LVDCSL is capable of working at under 2V in a $0.35\mu m$ CMOS process while being faster than comparable Domino gates. At the same time total power consumption is reduced. LVDCSL achieves a 40% delay improvement and a 22% power reduction in comparison with Domino gates.

#### 1 INTRODUCTION

Unlike conventional CMOS circuit designs, which use low functionality gates with limited fan-in, differential cascode voltage switch (DCVS) circuits allow much higher functionality with greater fan-in [1, 2, 3]. This is especially true for DCVS logic styles which use internal sense circuits to speed output transitions, as in the case of Enable/Disable Cascode voltage switch logic (ECDL) [5, 7], Sample Set Differential Logic (SSDL) [4] and Latched CMOS Differential Logic (LCDL) [6]. The DCVS logic style is especially good at implementing the exclusive-or type of functionality found in arithmetic circuits, using large NMOS trees [7, 8, 9]. However, a common shortcoming of all these circuits is high power consumption, because of the number of internal nodes being toggled in a large NMOS tree.

Differential Current Switch Logic is a DCVS logic gate which attempts to achieve low power consumption by restricting the voltage swing of internal nodes [10]. Unfortunately it is not free of its own problems. In this paper we present a new form of DCSL logic. While its structure is entirely different from that of previous DCSL gates, it achieves the same goals of low power and high performance. In particular, it is capable of operating well in newer low voltage CMOS processes in which threshold voltages are an appreciable fraction of the supply voltage.

This paper is organized as follows. We briefly relate DCSL to various DCVS logic families in section 1. A description of the original DCSL form is presented in section 2, wherein we describe its main problems. The robust DCSL logic gate is presented in section 3. A description of the various stages of operation is given in section 4, where we also illustrate how the problems in the original DCSL forms are solved. The performance of the LVDCSL gate is compared with that of Domino gates in section 5. The improvement in performance is evaluated by replacing input sections in the critical path of a 64-bit adder with high fan-in DCSL gates. We summarize the salient features and the shortcomings of LVDCSL in section 6.

Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ©1997 ACM 0-89791-903-3/97/08..\$3.50
#### 2 DIFFERENTIAL CURRENT SWITCH LOGIC

Differential Current Switch Logic (DCSL) is a DCVS logic gate. In common with DCVS logic families, it consists of a large evaluation NMOS tree with differential/complementary inputs and outputs which provides the gate functionality. The NMOS tree is designed so that there is exactly one path from one of the outputs to ground through the tree. We can conceive the simplest form of precharged DCVS by precharging the tree outputs high. The NMOS tree then evaluates its inputs by discharging one of the outputs. DCSL uses techniques employed in ECDL and SSDL to speed the output transition, namely the addition of a cross coupled inverter pair across the output which acts as a simple sense amplifier. In fact, we can extend ECDL and SSDL gates to get precharged low and precharged high DCSL logic families, which give superior performance with reduced power [10].

<sup>1</sup>This work was supported in part by Intel and by ARPA under contract F33615-95-C-1625. Dinesh Somasekhar is in the Department of Electrical Engineering, Purdue University, West Lafayette, IN 47906 USA. E-mail: somasekh@ecn.purdue.edu. Kaushik Roy is with the Department of Electrical Engineering, Purdue University, West Lafayette, IN 47906 USA. E-mail: kaushik@ecn.purdue.edu

Figure 1: Precharged High DCSL gate

The topology of a precharged high DCSL gate is shown in figure 1. While conventional DCVS gates allow a path from both outputs to the NMOS tree, DCSL disconnects the high going output from the NMOS tree by using two cross-coupled transistors T4 and T8. The effect of this is to prevent internal nodes of the NMOS tree from charging all the way up to the supply (less the threshold drop of the NMOS devices). In fact, simulations show that in a 5V process with a 1V threshold it is possible to restrict internal voltage swings to under 1V. Disconnecting the NMOS tree also reduces the total capacitance seen by the high going output and speeds its transitions.
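As a first-order illustration of why a restricted swing saves power (a textbook approximation added here, not a result taken from the simulations), assume that the energy drawn from the supply when an internal node of capacitance $C_{int}$ is charged through a swing $\Delta V$ is roughly $C_{int} V_{DD} \Delta V$. The symbol $C_{int}$ is introduced only for this estimate. Using the 5V supply, 1V threshold and sub-1V swing quoted above:

$$E_{full} \approx C_{int}\, V_{DD}\, (V_{DD} - V_{tn}) = C_{int} \cdot 5\,\mathrm{V} \cdot 4\,\mathrm{V}$$

$$E_{restricted} \approx C_{int}\, V_{DD}\, \Delta V < C_{int} \cdot 5\,\mathrm{V} \cdot 1\,\mathrm{V}$$

$$\frac{E_{restricted}}{E_{full}} < \frac{1}{4}$$

Under these assumptions each toggled internal node would cost less than a quarter of the full-swing energy. The estimate ignores short-circuit currents and the energy spent in the output stage, so it indicates the trend rather than the measured saving.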
A detailed description of the operation of DCSL gates is provided in [10]. The main problems with the operation of the gates are the following:

OUTPUT SPIKES: Both outputs start falling until the inverter loop cuts in to drive one of the outputs high. To allow the gate to be easily interfaced to conventional CMOS circuits, this low-going spike has to be limited. It is also advantageous to limit the spike because it leads to added power loss. We observe that the output spike will have a magnitude of at least $V_{tp}$, since the PMOS devices do not turn on and limit the spike until the output falls below $V_{CC} - V_{tp}$. In reality the spike lies between $V_{tp}$ and $V_{CC}/2$.

REDUCED SUPPLY VOLTAGES: Modern processes have lowered the supply voltage from 5V to between 1.5V and 3.3V. Threshold voltages, in contrast, have not decreased to the same extent, in order to prevent excessive leakage and reduced noise margins in Domino gates. A 3.3V process has thresholds in the range of 0.7 to 0.9V, as opposed to 1 to 1.1V in a 5V process. While the original DCSL gate worked well with $V_t$ being $\frac{1}{5}V_{DD}$, it fails to operate well with $V_t$ around $\frac{1}{3}V_{DD}$. As mentioned before, the output spike is greater than $V_{tp}$, hence the source of T4 and T5 must fall well below $V_{CC} - V_{tp} - V_{tn}$ before a path to the NMOS tree is even established. This fact severely affects gate operation at low voltages with relatively high threshold voltages. Simulations in a $0.5\mu m$ process revealed that at least 3V was required for proper operation with a $V_t$ of 0.9V. A side-effect seen was that gate height is limited to quite low values.

Figure 2: Robust DCSL gate

GATE ROBUSTNESS: The cross-coupled inverter pair is as sensitive to imbalances in capacitive output loading as it is to current differences in the NMOS tree. This is aggravated because the disconnection of the evaluation NMOS tree after the outputs toggle prevents the outputs from recovering.
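The supply voltage argument above can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, equal NMOS and PMOS threshold magnitudes are assumed, and the 3.3V row is just an example operating point. It evaluates the ratio $V_t/V_{DD}$ and the level $V_{CC} - V_{tp} - V_{tn}$ below which the source of T4/T5 must fall before a path to the NMOS tree is established in the original DCSL gate.

```python
def threshold_ratio(v_supply, v_t):
    """Threshold voltage as a fraction of the supply voltage."""
    return v_t / v_supply


def tree_path_level(v_supply, v_tp, v_tn):
    """Level V_CC - V_tp - V_tn from the text (original DCSL gate).

    The tree-side node has to fall below this level before a path
    to the NMOS tree is established.
    """
    return v_supply - v_tp - v_tn


# Example operating points (supply, threshold); equal |V_tp| and V_tn
# are assumed here purely for illustration.
operating_points = [(5.0, 1.0), (3.3, 0.9), (3.0, 0.9)]

for v_supply, v_t in operating_points:
    ratio = threshold_ratio(v_supply, v_t)
    level = tree_path_level(v_supply, v_t, v_t)
    print(f"V_CC={v_supply:.1f}V  V_t/V_CC={ratio:.2f}  "
          f"V_CC-V_tp-V_tn={level:.2f}V")
```

As the supply approaches twice the threshold voltage the level shrinks towards zero, which is one way to see why the original output stage runs out of headroom at low supply voltages.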
While there are other points of concern, namely the sensitivity of the gate to injected noise, these can be avoided by proper design. The LVDCSL gate tackles the above problems by using an entirely different output stage.

#### 3 LVDCSL: A LOW VOLTAGE, ROBUST DCSL GATE

We note in figure 1 that the sensitivity of the gate to output imbalance can be greatly decreased by decoupling the inputs of the inverters formed by P2/T1 and P3/T2 from the outputs. Also, on the high going CLK edge T1 and T2 turn on immediately when evaluation starts. This shunts current away from the evaluation tree. It is apparent that a current differential will build up faster if T1 and T2 are off at the start of evaluation and the entire current is used to charge up the NMOS tree. Not only does this improve the robustness of the gate but, more importantly, it reduces the spike seen at the output node. Further spike reduction is possible by having the gates of the pull-up PMOS devices in the inverters low at the start of evaluation. The improved LVDCSL circuit incorporates the above techniques to improve gate robustness at the expense of speed.

The new topology of the DCSL gate is shown in figure 2. It achieves the above requirements. We shall demonstrate this while explaining gate functionality in the following section.

#### 4 OPERATION OF THE GATE

We can view the operation of the gate as being split over three states, namely the *precharge*, *evaluate* and *stable output* states.

#### 4.1 Precharge State

Figure 3: Precharged state

The precharge state has CLK# high and CLK low. The state of the gate is shown in figure 3. The grey transistors in the figure are not in the active path, while the other transistors play a role in driving the outputs or internal nodes of the gate. As shown in the figure, the outputs are precharged high, while internal nodes A and B are charged low.
We note that if the inputs of the NMOS tree are all high, which is the case when the gate is fed by a preceding Domino internal node, there is a path from A/B to ground via the NMOS tree. This allows CLK#, the precharge clock, to be deactivated with the gate maintaining its precharge state.

Unlike the previous DCSL circuit, every transistor in the path from $V_{CC}$ to the evaluation NMOS tree is activated, except for T1 and T2. Switching on T1 and T2 immediately establishes a path to the evaluation NMOS tree. In the previous DCSL circuit, transistors T4/T5 (figure 1) impede the evaluation path, since they are not fully on. Removing this impediment increases the robustness of the gate. The fact that the PMOS devices P3/P4 are on limits the output spike to a great extent. T3 and T4, the pull-down NMOS devices in the inverters, are off. This means that when gate evaluation starts there is no shunting of current away from the NMOS evaluation tree. All the above factors result, as mentioned previously, in a lower glitch, greater robustness and an ability to operate with supply voltages slightly greater than twice the NMOS threshold voltage. While the presence of two clock phases may appear to be a disadvantage, it is mitigated by the fact that only CLK is time critical.

#### 4.2 Evaluate State

Figure 4: Evaluate State

Figure 4 shows the paths switched on when the gate enters the evaluate state, with CLK high, CLK# deactivated and the inputs to the NMOS tree set up. When T1 and T2 turn on, the NMOS tree begins to charge up through T1, T7 and T2, T8. Assuming that the NMOS tree has a stronger path on the left, node A will be held at a lower voltage than B. As node B goes higher than $V_{tn}$, T3 switches on and the positive feedback loop rapidly drives the outputs in the proper direction. T8 turns off because of OUT going low, which in turn disconnects the NMOS tree from the high output OUT#. This limits the voltage swing in the internal nodes of the NMOS tree.
In common with the previous DCSL circuits, this achieves our goal of limiting the power consumption at internal nodes of the large NMOS evaluation tree.

#### 4.3 Stable Outputs

Figure 5: Stable State

The gate comes to rest in the state shown in figure 5. In this state changes in the inputs will not disturb the gate state. We observe that the NMOS evaluation tree can disturb the state of the gate only by pulling an output node to ground. Since the path from the high output is disconnected, the NMOS evaluation tree cannot affect the outputs. Hence all inputs to the gate may be precharged after evaluation, in case they are fed by a similar gate. This allows for simple pipelined configurations.

Figure 6: States in LVDCSL operation

The various states of operation and the voltage waveforms at various nodes are shown in figure 6. These show the three states of operation, the voltage buildup at the internal nodes and the final outputs. We note that CLK# needs to go low before CLK goes high (it may go low well in advance of the CLK rising edge). CLK needs to go low before CLK# goes high, to avoid through-path currents during the precharge phase. While not shown, our simulations are carried out with all inputs originally high and some set of them going low just before the CLK rising edge.

#### 5 PERFORMANCE WITH RESPECT TO DOMINO

The performance of LVDCSL was verified with a $0.35\mu m$ process. The supply voltage used was 2.2V and below. Threshold voltages of the devices were around 0.45V. Previous work [10] has shown the marked advantage of using DCSL gates in comparison to similar DCVS gates. In this work, instead of contrasting with DCVS gates, we compare to high performance Domino gates. A 64-bit adder was selected for this purpose. Domino gates were selectively replaced to evaluate the advantage of LVDCSL.
The critical carry lookahead path has as its basic building block an 8-bit CLA circuit, composed of two 4-input propagate-generate Domino gates followed by a 2-input static CMOS gate. The overall delay of the Domino gates (not including the static CMOS gate) was 210ps. The functional complexity achieved by LVDCSL allows the two-stage 8-bit CLA circuit to be implemented in a single 8-input stage. Figure 7 shows the performance and depth of the output spike with respect to variation in the width of the NMOS devices in the inverter loop. We see that the gate is capable of surpassing the speed of a 4-input Domino gate (210ps). In fact, the functionality achieved by this gate is a full 8-bit propagate-generate computation, as opposed to the 4-bit computation achieved in the Domino gate. The output voltage at the bottom of the glitch stays above $\frac{V_{DD}}{2}$, which allows the gate to directly drive static CMOS gates.

Figure 7: LVDCSL performance versus T3, T4 width

Figure 8: Performance versus P3, P4 width

While the speed of the gate is high, additional time has to be allowed for the setup time of the inputs with respect to the clock. In spite of this (a 100ps setup margin), the initial stage delay achieved by the 8-bit CLA dropped from 0.55ns to 0.33ns, a 40% improvement in performance.

Figure 9: Comparison of LVDCSL power consumption with respect to Domino

We note that the following factors help in improving the delay with respect to Domino:

- The possibility of achieving shorter delays in a single DCSL stage.
- A decrease in the number of stages, because the higher functionality of the DCSL gate allows Domino gates to be combined. This may not be true in all situations; in the case of the adder, however, it allows the compression of two stages into one.
- LVDCSL has a much lighter loading at the gate inputs, since the NMOS transistors in the evaluation tree are small. This factor also improves power consumption.
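The 40% figure follows directly from the two stage delays quoted above. A minimal check, with a hypothetical helper name and the delays taken from the text:

```python
def relative_improvement(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


# Initial stage delay of the 8-bit CLA, as quoted in the text (ns).
domino_stage_delay_ns = 0.55
lvdcsl_stage_delay_ns = 0.33  # includes the setup margin, per the text

improvement = relative_improvement(domino_stage_delay_ns,
                                   lvdcsl_stage_delay_ns)
print(f"Delay improvement: {improvement:.0%}")
```

The same helper can be applied to any other before/after pair of figures, for example average supply currents, when comparing the two logic styles.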
The main factor which causes an increase in delay as compared to Domino is the need to allow for the setup time of the inputs with respect to CLK.

LVDCSL achieves the above high performance without compromising power. Figure 9 shows the power consumption of an 8-bit stage as compared to Domino. The graph shows the highly spiked Domino currents (8mA peak) consumed during precharge. In contrast, LVDCSL draws supply current during both precharge and evaluate; however, the peak current is much smaller (4.1mA peak). The average current consumption for Domino is 1.131mA, while LVDCSL consumes 0.873mA, a 22% improvement. We also show various waveforms in figure 10. The DCSL gate offers a lighter load on the clock line (8%) as opposed to Domino. While the numbers quoted above do not include the power drawn from the clock, accounting for it would further skew the advantage towards LVDCSL.

Figure 10: Waveforms for adder

Figure 11: Performance with respect to supply voltage

Figure 11 shows simulations for an 8-bit carry lookahead circuit using the robust DCSL gate. The height of the NMOS tree is 4. While the gate voltage degrades as we approach $V_{tp} + V_{tn} = 1.1V$, the graph shows that the gate is usable down to twice the threshold voltage. It demonstrates the capability of DCSL to operate at fairly low voltages.

The advantage of using the new gate was verified by replacing the first stage of the critical path of a 64-bit $0.35\mu m$ adder. Single-stage DCSL gates replace the carry lookahead circuitry for the 8-bit propagate-generate computation. The reduction in stages, with the adder implemented using a combination of DCSL and Domino gates, gives an overall speed improvement of 26%. The number of stages decreases from 6 to 4.

#### 6 CONCLUSION

In this paper we have presented a differential current switch logic gate which is capable of operating at low voltages.
The salient features of LVDCSL as compared to previous forms of DCSL are:

- Greatly increased robustness: output load mismatches of a factor of 5 are tolerated.
- Capable of operating at voltages down to $2 \times V_t$, as opposed to $3 \times V_t$ in previous designs.
- Compared to a high performance logic family like Domino, the gate is capable of higher speed at a lower power consumption.

The main disadvantages of LVDCSL, in the authors' view, are:

- The high complexity of the output stage prevents its use in simple gates. The layout of the output stage is critical in the sense that internal nodes A and B have to be balanced.
- Unlike Domino, the gate does have a setup time with respect to CLK. In addition, the complexity of the output structure does not allow very short cycle times.

As mentioned by the reviewers of this paper, we foresee the utility of this gate in selected places such as carry generation circuitry for adders. It is, however, possible to greatly reduce transistor count and power without impacting performance by judiciously replacing the initial stages of existing high performance designs. LVDCSL circuits are targeted at very high performance circuits where power is often a secondary issue. As such, we have restricted our comparisons to Domino; further work is needed to quantify the power-performance trade-off with respect to static CMOS.

#### REFERENCES

- [1] Wai Lee, Uming Ko and P. T. Balsara, "A Comparative Study on CMOS Digital Circuit Families for Low-Power Applications," in *Proceedings of the Intl. Workshop on Low Power Design*, 1994, pp. 129-131.
- [2] K. M. Chu and D. L. Pulfrey, "Comparison of CMOS Circuit Techniques: Differential Cascode Voltage Switch Logic versus Conventional Logic," *IEEE J. Solid State Circuits*, vol. 22, no. 4, pp. 528-532, Aug 1987.
- [3] L. G. Heller, W. R. Griffin, J. W. Davis and N. G. Thomas, "Cascode voltage switch logic: A differential CMOS logic family," in *Proc. IEEE Int. Solid-State Circuits Conf.*, 1984, pp. 16-17.
- [4] T. A. Grotjohn and B. Hoefflinger, "Sample-set differential logic (SSDL) for complex high speed VLSI," *IEEE J. Solid State Circuits*, vol. 21, no. 2, pp. 367-369, Apr 1986.
- [5] Shih-Lien Lu, "Implementation of Iterative Networks with CMOS Differential Logic," *IEEE J. Solid State Circuits*, vol. 23, no. 4, pp. 1013-1017, Aug 1988.
- [6] Wu Chung-Yu and Cheng Kuo-Hsing, "Latched CMOS Differential Logic (LCDL) for Complex High-Speed VLSI," *IEEE J. Solid State Circuits*, vol. 26, no. 9, pp. 1324-1328, Sep 1991.
- [7] Lu Shih-Lien L. and Milos D. Ercegovac, "Evaluation of two-summand adders implemented in ECDL CMOS differential logic," *IEEE J. Solid State Circuits*, vol. 26, no. 8, pp. 1152-1160, Aug 1991.
- [8] N. Kanopoulos et al., "Design of self-checking circuits using DCVS logic: A case study," *IEEE Transactions on Computers*, vol. 41, no. 7, pp. 891-896, Jul 1992.
- [9] N. Kanopoulos and J. H. Carabetta, "Design and implementation of a totally self-checking 16 multiplied by 16 bit array multiplier," *Integration, the VLSI Journal*, vol. 14, no. 2, pp. 215-228, Dec 1992.
- [10] Dinesh Somasekhar and Kaushik Roy, "Differential Current Switch Logic: A Low Power DCVS Logic Family," *IEEE J. Solid State Circuits*, June 1996.