RECYCLING CLOCK NETWORK ENERGY
IN HIGH-PERFORMANCE DIGITAL DESIGNS
USING ON-CHIP DC-DC CONVERTERS

by

Mehdi Alimadadi

M.A.Sc., University of British Columbia, 2000
B.A.Sc., Iran University of Science and Technology, 1989

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate Studies
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

July 2008

© Mehdi Alimadadi, 2008
ABSTRACT

Power consumption of CMOS digital logic designs has increased rapidly over the last several years. It has become an important issue, not only in battery-powered applications, but also in high-performance digital designs because of packaging and cooling requirements. At the multi-GHz clock rates in use today, charging and discharging CMOS gates and wires, especially in clocks with their relatively large capacitances, leads to significant power consumption. Recovering and recycling the stored charge or energy about to be lost when these nodes are discharged to ground is a potentially good strategy that must be explored for use in future energy-efficient design methodologies.

This dissertation investigates a number of novel clock energy recycling techniques to improve the overall power dissipation of high-performance logic circuits. If efficient recycling of clock network energy can be demonstrated, it might be used in many high-performance chip designs to lower power and save energy.

A number of chip prototypes were designed and constructed to demonstrate that this energy can be successfully recycled or recovered in different ways:

- Recycling clock network energy by supplying a secondary DC-DC power converter: the output of this power converter can be used to supply another region of the chip, thereby avoiding the need to draw additional energy from the primary supply. One test chip demonstrates that energy in the final clock load can be recycled, while another demonstrates that clock distribution energy can be recycled.

- Recovering clock network energy and returning it back to the power grid: each clock cycle, a portion of the energy just drawn from the supply is transferred back at the end of the cycle, effectively reducing the power consumption of the clock network.

The recycling methods described in this thesis preserve a nearly square clock waveform; the loss of this shape has been a limitation of previous work in this area. Overall, the results provided in this thesis demonstrate that energy recycling is very promising and must be pursued in a number of other areas of the chip in order to obtain an energy-efficient design.
TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
List of Symbols
Acknowledgments
Dedication

1 Introduction
  1.1 Main Motivation
  1.2 Research Challenges and Objectives
  1.3 Research Contributions
  1.4 Thesis Outline

2 Background
  2.1 Discrete Switching Power Converters
    2.1.1 Basic Switching Converters
    2.1.2 Zero Voltage Switching
  2.2 CMOS Inverter Driver Circuit
  2.3 Integrated Switching Power Converters
  2.4 Literature Survey
    2.4.1 Switching Power Converters
    2.4.2 Low-Swing Power Converters
    2.4.3 Resonant Clock Strategies
  2.5 Implementation Considerations

3 Integrated Buck Converters
  3.1 Integrated Clock Driver/Buck Converter

5 Low-Power Clock Driver
  5.1 Introduction
  5.2 Circuit Design
  5.3 Complete Circuit
  5.4 Simulation
  5.5 Chip Implementation
  5.6 Chip Measurements
  5.7 Summary

6 Conclusions
  6.1 Future Work
    6.1.1 Continuation of the Work
    6.1.2 Investigating New Ideas

References

Appendices
  A Discrete Switching Power Converters
    A.1 Buck (Step-Down) Switching Converters
    A.2 Boost (Step-Up) Switching Converters
    A.3 Buck-Boost Switching Converters
  B On-Chip Passive Components
    B.1 Inductors
    B.2 Capacitors
LIST OF TABLES

Table 2.1. Performance comparison of reviewed converters
Table 2.2. Comparison of recent microprocessors
Table 3.1. Summary of comparison between integrated buck converters
Table 6.1. Chip prototype results
LIST OF FIGURES

Figure 1.1. Recycling clock energy with a DC-DC converter (approximate model)
Figure 1.2. Reducing clock power consumption with an inductor
Figure 2.1. Basic switching converter topologies
Figure 2.2. ZVS operation in a synchronous buck converter
Figure 2.3. A CMOS inverter driver with tapering factor $r$
Figure 2.4. A CMOS inverter chain driving a CMOS buck converter
Figure 2.5. ZVS operation in a CMOS synchronous buck converter
Figure 2.6. Block diagram of a four-phase interleaved DC-DC converter [13]
Figure 2.7. Circuit diagram of the fully integrated two-stage buck converter [15]
Figure 2.8. Circuit diagram of the fully integrated boost converter [21]
Figure 2.9. Low swing DC-DC conversion technique [24]
Figure 2.10. Cascode bridge circuit [26]
Figure 2.11. Block diagram of a power management on-chip [27]
Figure 2.12. Implicit DC-DC conversion through charge recycling [28]
Figure 2.13. Simple lumped circuit model of the resonant clock distribution [11]
Figure 2.14. A tapered H-tree clock distribution network
Figure 2.15. Components of a resonant clock sector [10]
Figure 3.1. Efficiency block diagram
Figure 3.2. Integrated clock driver/buck converter
Figure 3.3. Circuit of the reference clock for the integrated clock driver/buck converter
Figure 3.4. Circuit diagram of the integrated clock driver/buck converter
Figure 3.5. Timing diagram of $V_{clk}$
Figure 3.6. Simplified circuit model for analyzing $V_{clk}$ during clock fall time
Figure 3.7. Simulated waveforms for the integrated clock driver/buck converter
Figure 3.8. Simulated output voltage and input power of the integrated buck converter
Figure 3.9. Simulated raw and effective efficiencies of the integrated buck converter
Figure 3.10. Implementation of the integrated clock driver/buck converter
Figure 3.11. Block diagram of the test bench setup
Figure 3.12. The effect of $F_{sw}$ on $V_{out}$
Figure 3.13. The effect of $D$ on $V_{out}$
Figure 3.14. The effect of $F_{sw}$ on $P_{in1}$ and $P_{in2}$
Figure 3.15. The effect of $D$ on $P_{in1}$ and $P_{in2}$
Figure 3.16. The effect of $F_{sw}$ on $\eta$
Figure 3.17. The effect of $D$ on $\eta$
Figure 3.18. The effect of $F_{sw}$ on $\eta_{eff}$
Figure 3.19. The effect of $D$ on $\eta_{eff}$
Figure 3.20. Low-swing buck converter
Figure 3.21. Circuit diagram of the low-swing buck converter
Figure 3.22. Simulation results for each variant of the circuit
Figure 3.23. Deep n-well implementation cross sectional view
Figure 3.24. Chip micrograph
Figure 3.25. Measured prototype performance
Figure 4.1. Integrated clock driver/boost converter
Figure 4.2. Circuit diagram of the integrated clock driver/boost converter
Figure 4.3. Simulation results of the integrated clock driver/boost converter
Figure 4.4. Chip micrograph of the integrated clock driver/boost converter
Figure 4.5. Integrated clock driver/buck-boost converter
Figure 4.6. Circuit diagram of the integrated clock driver/buck-boost converter
Figure 4.7. Simulation results of the integrated clock driver/buck-boost converter
Figure 4.8. Chip micrograph of the integrated clock driver/buck-boost converter
Figure 5.1. Low-power clock driver
Figure 5.2. Circuit diagram of the low-power clock driver and the reference clock
Figure 5.3. Simulated clock waveforms of Figure 5.1(b) and Figure 5.2
Figure 5.4. Simulated $M_{p2}$ drain current waveforms of Figure 5.1(b) and Figure 5.2
Figure 5.5. Simulated $M_{n1}$ drain current waveforms of Figure 5.1(b) and Figure 5.2
Figure 5.6. Effect of changing inductor value on power savings in Figure 5.2
Figure 5.7. Chip micrograph
Figure 5.8. Test and simulation results
Figure A.1. A basic buck converter
LIST OF ABBREVIATIONS

ABB: Adaptive Body Biasing
AC: Alternating Current
ALUCAP: Aluminum Cap
ASITIC: Analysis and Simulation of Inductors and Transformers for Integrated Circuits
CCM: Continuous Conduction Mode
CMOS: Complementary Metal-Oxide-Semiconductor
DC: Direct Current
DCM: Discontinuous Conduction Mode
DVFS: Dynamic Voltage and Frequency Scaling
DRC: Design Rule Checking
ESR: Equivalent Series Resistance
PGS: Patterned Ground Shield
I/O: Input/Output
LVDS: Low-Voltage Differential Signaling
MIM: Metal-Insulator-Metal
MOSFET: Metal-Oxide-Semiconductor Field-Effect Transistor
MSV: Multiple Supply Voltages
NMOS: Negative-Channel Metal-Oxide-Semiconductor
PDA: Personal Digital Assistant
PMOS: Positive-Channel Metal-Oxide-Semiconductor
PWM: Pulse Width Modulation
RAM: Random Access Memory
SoC: System on Chip
ZVS: Zero-Voltage Switching
LIST OF SYMBOLS

$C_{\text{clk}}$: Clock Load Capacitance
$C_{F}$: Filter Capacitor
$C_{\text{ox}}$: Oxide Capacitance
$C_{x}$: Parasitic Capacitance
$D$: Clock Duty Cycle
$F_{\text{sw}}$: Clock/Switching Frequency
$I_{L}$: Inductor Current
$I_{\text{out}}$: Output Current
$L$: CMOS Transistor Length
$L_{F}$: Filter Inductor
$M_{n}$: NMOS Transistor
$M_{p}$: PMOS Transistor
$\eta$: Raw Efficiency
$\eta_{\text{eff}}$: Effective Efficiency
$P_{\text{in}}$: Input Power
$P_{\text{out}}$: Output Power
$r$: Tapering Factor
$T_{\text{delay}}$: ZVS delay-time
$T_{\text{sw}}$: Clock/Switching Period
$V_{\text{clk}}$: Clock Node Voltage
$V_{\text{DD}}$: Supply Voltage
$V_{gs}$: Gate-to-Source Voltage
$V_{t}$: Threshold Voltage
$W$: CMOS Transistor Width
ACKNOWLEDGMENTS

I offer my enduring gratitude to the faculty, staff and my fellow students at UBC, who have inspired me to continue my work in this field.

I owe particular thanks to (in alphabetical order) Drs. William Dunford, Guy Lemieux, Shahriar Mirabbasi, Patrick Palmer and Resve Saleh for enlarging my vision of science and providing coherent answers to my endless questions. I also thank my colleague, Ph.D. student Samad Sheikhaei, without whose help my work would not have been as successful.

Special thanks are owed to my parents, who have supported me throughout my years of education.
DEDICATION

To my parents
and my aunt and uncle who encouraged me
to continue my education …
1 INTRODUCTION

1.1 Main Motivation

Power consumption of digital logic has increased rapidly for the last several years. It has become an important issue, not only in battery-powered applications, but also in high-performance digital designs because of packaging and cooling requirements. As current manufacturing reaches the nanometer range, clock switching frequency has increased dramatically. This further increases the dynamic power loss of those designs because of the continuous charging and discharging of capacitance that characterizes CMOS logic behavior.

In high-performance chip designs, the clock itself consumes a significant amount of power. For example, the clock network in IBM’s POWER6 processor can operate above 5GHz and consumes 22% (roughly 22W) of the total power and is second only to leakage power [1]. As another example, the Intel Itanium 2 microprocessor clock, with adaptive frequency changes around 2GHz, consumes 25% (roughly 25W) of the total power [2]. In an older 1GHz Itanium 2 microprocessor, the clock consumes 33% (roughly 43W) of the total power [3]. Clearly, it is very important to reduce clock power consumption as much as possible.

A typical buffered clock network consists of a balanced H-tree distribution network terminated by a chain of inverter drivers. The final drivers are sized large enough to drive hundreds to thousands of latches and very long wires [4]. To reduce skew, groups of these final drivers can be shorted by a mesh, effectively producing a larger driver and clock capacitance. Charging and discharging large clock capacitance of this nature is the main cause of the high power consumption.
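
The cost of charging and discharging the clock capacitance can be estimated with the standard first-order CMOS relations. The sketch below is illustrative only: the function names and the example values (1 nF of lumped clock capacitance, a 1.0 V supply, a 3 GHz clock) are assumptions chosen for round numbers, not measurements from this thesis.

```python
def clock_energy_per_cycle(c_clk: float, v_dd: float) -> dict:
    """First-order energy bookkeeping for one full-swing clock cycle.

    The supply delivers C*V^2 while charging the clock node. Half of it is
    dissipated in the pull-up path and half (C*V^2/2) is stored on the
    capacitance; a conventional driver then dumps that stored half to ground.
    """
    drawn = c_clk * v_dd ** 2
    stored = 0.5 * c_clk * v_dd ** 2
    return {"drawn_from_supply": drawn, "stored_on_clock": stored}


def clock_dynamic_power(c_clk: float, v_dd: float, f_clk: float) -> float:
    """Average dynamic power of a clock node that toggles every cycle."""
    return c_clk * v_dd ** 2 * f_clk


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    c_clk, v_dd, f_clk = 1e-9, 1.0, 3e9
    energy = clock_energy_per_cycle(c_clk, v_dd)
    print(f"drawn per cycle : {energy['drawn_from_supply']:.2e} J")
    print(f"stored on clock : {energy['stored_on_clock']:.2e} J")
    print(f"dynamic power   : {clock_dynamic_power(c_clk, v_dd, f_clk):.2f} W")
```

The stored half is the portion that a recycling or recovery scheme can, at best, intercept before it is discharged to ground.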

There have been a few methods of clock energy reduction previously reported in the literature, such as gated clocks, low-swing signals, double-edge triggered flip-flops, adiabatic switching, and resonant clocking. Clock gating, which is the most common method, is done by masking the clock input to a sub-circuit with an appropriate signal to cut down its activity and thus its power [5] [6]. One drawback is the high level of design effort needed to ensure that there are no potential timing problems in the circuit because of clock gating. Another disadvantage is the resulting explosion of different clock gating states that makes the circuits difficult to verify and test.

Low-swing signaling and double-edge triggered flip-flops utilize complex circuitry and are sometimes employed in high-performance designs. Low-swing is used in the distribution of the clock but not the final drivers [7]. Double-edge triggered flip-flops are sometimes employed in ASICs [8], which often operate below 1GHz, but not in custom microprocessors operating over 1GHz.

Adiabatic switching is done by slow charging and discharging of the clock [9], but it is too slow to be employed for high-performance circuits. Resonant clocking is a promising technique for high-speed clocks. It operates by recycling the clock energy using another charge reservoir [10] or by exchanging the charge between load capacitances of two differential clock networks [11]. In both methods, because of the resonating nature of the circuits, a sinusoidal clock waveform is generated. This type of clock waveform is problematic because sharp edges are needed to define precise timing points. Although promising, resonant techniques are not yet practical enough for most applications.
1.2 Research Challenges and Objectives

The main objective of this dissertation is to investigate methods of recycling or recovering charge in the clock network by using fully integrated power conversion techniques in a system-on-chip. It takes an extremely large amount of energy to operate a high frequency clock in high-performance designs. In each cycle, the energy stored in the clock is wasted by discharging it to ground. One way of recovering energy is to re-deploy the charge elsewhere in the circuit as a second voltage source. Such re-deployment is called energy recycling in this thesis, and it can be used to enable further energy-reducing strategies. Another way of recovering the energy is to return it to the original power grid, a concept called energy recovery. The goals of this thesis are to:

- apply energy recycling and energy recovery techniques to energy stored in clocked capacitances,
- design switching converters that operate at a very high frequency so passive components are small enough to fit on-chip, and
- demonstrate the proposed solutions through chip fabrication and testing.

In this dissertation, novel voltage converter circuits are introduced to recycle energy stored in the clock network on every cycle. As shown in Figure 1.1, these converters operate by taking their input energy from the clock network and producing a useful DC energy source. By running at a high clock frequency of roughly 3GHz, the size of the passive components for the low-pass filter needed by the switch-mode power supply is greatly reduced, and this enables on-chip integration. However, operating at a high switching frequency results in high switching losses in the power converter. These losses are reduced by employing zero voltage switching (ZVS) techniques and by directly integrating clock-tree drivers with converter power-transistor drivers. Also, using low-swing signaling helps in reducing dynamic losses in the gate driver.

Although the main goal of this thesis is to reduce energy by recovering stored energy in the clock, many energy-saving techniques rely upon having voltages other than the primary $V_{DD}$ supply available on-chip. For example, since dynamic power is a quadratic function of voltage, circuitry that is not performance-critical can operate at a lower supply voltage to save significant energy. Also, adaptive body biasing can use new voltage sources to dynamically adjust transistor threshold voltages between high-performance and low-power modes. Generating these voltages with an on-chip power converter rather than bringing them in from outside can simplify chip and board design and reduce costs.
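
The quadratic dependence of dynamic power on supply voltage can be made concrete with the standard first-order expression for CMOS dynamic power. The activity factor $\alpha$, the switched capacitance $C$, and the reduced supply $V_{low} = 0.7\,V_{DD}$ are illustrative assumptions, not values taken from this thesis:

$$P_{dyn} = \alpha\, C\, V_{DD}^{2}\, F_{sw}, \qquad \frac{P_{dyn}(V_{low})}{P_{dyn}(V_{DD})} = \left(\frac{V_{low}}{V_{DD}}\right)^{2} = 0.7^{2} = 0.49$$

Under these assumptions, a block that tolerates the lower supply at the same switching frequency would dissipate roughly half the dynamic power.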

Since the extra regulated voltage may not always be needed, a more practical solution would recycle energy in a way that more directly reduces power consumption of the clock network. Figure 1.2 shows one way this can be done using an inductor to recover energy from $C_{clk}$. Here, rather than providing a second voltage supply, the energy is returned back to the on-chip power-supply grid through a circuit configuration resembling a DC-DC boost converter. To achieve a nearly square clock waveform, the energy is transferred in a non-resonant way.

This thesis investigates recycling and recovery strategies to reduce the effective clock power consumption. Here, the main challenge comes from the necessity of being limited to on-chip CMOS power transistors and passive components. Other challenges to overcome to make these methods feasible are:

- minimizing the impact of using charge-recycling methods and driver integration on internal signals of the original system,
- reducing the increased dynamic losses that result from the high switching frequency needed to shrink the passive components,
- avoiding complex circuit solutions as they are prone to malfunction, have a higher chance of failure, and can easily introduce more energy losses than they save,
- avoiding technology-dependent solutions that may not be valid for future generations of finer feature size CMOS technologies, and
- coping with inaccuracy of models and simulation results at very high frequencies with very large current densities.

1.3 Research Contributions

Overall, this dissertation presents several “firsts” in the field of on-chip power supplies and clock distribution. It is the first work to consider recycling or returning energy stored in the clock distribution network back to the power supply system. The work also includes switch-mode DC-DC converters with the smallest area at 0.27mm$^2$, and the highest operating frequency at 3+ GHz reported to date. Furthermore, it represents the first work to employ ZVS at such high operating frequencies.

To enable successful on-chip integration, the following techniques have been applied:
• A high switching frequency of 3GHz was used to reduce the size of the converter passive components, so they could be moved on-chip.

• Converter switching drivers were integrated with clock-tree drivers, to improve efficiency of the voltage converter by reducing the power needed to drive the converter at such high frequencies.

• Zero-voltage switching was employed in the clock network to recycle its energy by redeploying it through the power converter.

• A novel delay circuit was created to provide the time delay needed to implement ZVS at such a high switching frequency.
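
The first technique in the list above rests on the inverse relationship between switching frequency and filter size. A minimal sketch, assuming an ideal buck converter in continuous conduction mode with peak-to-peak inductor ripple $\Delta I_L = V_{out}(1 - D)/(L_F F_{sw})$; the function name and all numeric values are hypothetical and are not taken from the prototypes in this thesis:

```python
def buck_filter_inductance(v_out: float, duty: float,
                           f_sw: float, ripple_current: float) -> float:
    """Inductance that yields the target peak-to-peak ripple current.

    Assumes an ideal buck converter in continuous conduction mode, where
    the inductor current falls by V_out * (1 - D) / (L * F_sw) during the
    off-time of each switching period.
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must lie strictly between 0 and 1")
    if f_sw <= 0.0 or ripple_current <= 0.0:
        raise ValueError("frequency and ripple current must be positive")
    return v_out * (1.0 - duty) / (ripple_current * f_sw)


if __name__ == "__main__":
    # Hypothetical operating point: 0.6 V output, 50% duty, 100 mA ripple.
    for f_sw in (3e6, 3e9):
        l_f = buck_filter_inductance(0.6, 0.5, f_sw, 0.1)
        print(f"F_sw = {f_sw:.0e} Hz -> L_F = {l_f:.2e} H")
```

With these assumed numbers, moving from a 3 MHz to a 3 GHz switching frequency shrinks the required inductance by a factor of 1000, from the microhenry range to the nanohenry range, which is the scale at which on-chip spiral inductors become plausible.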

Also, this dissertation introduces “reduce, reuse and recycle” as a complete energy savings strategy using an on-chip buck converter circuit as an example. While the previous contributions focused on recycling energy at the final output of a clock driver circuit, this method focuses on the energy used to operate the “front end” of a converter circuit which can also be applied to clock networks. The following additional contributions were made:

• Low-swing signaling is used to reduce energy in the front-end drive chain (energy reduction).

• Supply stacking of two separate front-end drive-chains allows the charge used by the PMOS drive chain to be re-used by the NMOS drive chain (energy reuse).

• Surplus charge from the PMOS drive chain is sent to the load by the switching converter (energy recycling).

Although the first two concepts (low-swing signaling and supply stacking) have been implemented before, this thesis demonstrates how they are part of an overall strategy encompassing reducing, reusing, and recycling energy. The third concept, energy recycling, is a new contribution.

This dissertation also presents a novel clock driver circuit that returns energy back to the power supply grid with the help of a charged inductor. The circuit configuration is based on the boost converter topology. This method differs from previous contributions by improving the power consumption of a clock tree itself instead of producing an auxiliary voltage supply. The following additional contributions were made:

- The gating delay circuit also provides ZVS delay time for turning on the final PMOS driver.
- Compared to resonant clocking schemes, the clock waveform is kept nearly square.
- The clock duty cycle is fixed to avoid concerns of clock jitter and timing uncertainty.
- This method is simpler than the other circuits presented here. Modification to the original clock driver is done by adding only one inductor and two transistors; the other methods require a large filter capacitor.

1.4 Thesis Outline

The remainder of this dissertation is organized as follows. Chapter 2 provides background on discrete and integrated DC-DC switching power converters, including an explanation of typical switch-mode power converter topologies. It also describes some of the previous work that has been done in the area of on-chip converters and high performance clocking. Chapter 3 presents the charge-recycling architectures using buck converter topology. Chapter 4 explores alternative converter topologies, namely boost and buck-boost designs. Chapter 5 presents the low-power clock driver design. Lastly, Chapter 6 provides a final summary and discusses future work.
2 BACKGROUND

Discrete switch-mode power converters [12] are popular as they are very efficient regulator circuits. The use of switch-mode DC-to-DC power converters has increased in recent years as more electronic devices, such as laptop computers and cell phones, are powered from batteries. Electronic circuits powered through a DC-DC converter receive a regulated voltage even as the battery voltage drops. DC-DC converters can also adjust the voltage level needed to supply different sub-circuits of a system, as they can provide higher or lower voltage levels than the battery voltage or even a negative voltage, if needed.

A key quality metric for power converters is the conversion efficiency. Typical efficiencies are 50% to 70% at the lower end and 80% to 95% at the higher end. Other key quality metrics are the output voltage regulation and the output voltage ripple, which is usually kept below 5% peak-to-peak.

In a discrete switch-mode converter, efficiency is compromised due to the parasitic elements of the circuit. Integrating the converter within a system-on-chip diminishes the problem by reducing the stray components. Therefore, a number of efforts are underway to move the power converter on-chip [13] [14] [15]. Moving the converter on-chip could also lower the number of required power pins and improve the quality of voltage regulation.

The rest of this chapter provides background on discrete and integrated DC-DC power converters including a brief explanation of the basic topologies and a detailed survey of the previously published on-chip power converters. It also describes some of the previous work which has been done in the area of high-performance clocking.
2.1 Discrete Switching Power Converters

2.1.1 Basic Switching Converters

Switch-mode converters are built around an inductor that is periodically connected in different configurations. The input of these converters is usually an unregulated DC voltage, such as a rectified AC voltage or a battery. By adjusting the ratio of time spent in each configuration, i.e., the duty cycle, the output voltage can be established and regulated. Ideally, the switching frequency itself has no effect on the output voltage.

Switch-mode converters are more efficient than linear converters, in the range of 80% to 95% for a discrete design. As switches are either fully closed or fully open, the voltage drop happens only across the inductor, which ideally is a lossless component. That is, the voltage drop is due to energy stored, but not dissipated, in the inductor. The higher efficiency of these converters has made them attractive for all types of applications. For example, using these converters can increase battery life in a portable device. As there are switches in the circuit that are closed and opened, harmonics are present in the system that need to be dealt with by employing a suitable filter.

There are two basic DC-DC converter topologies that can generate output voltages that are lower or higher than the input voltage. A third simple configuration can be derived from the two basic ones to generate a negative output voltage with a magnitude that is either greater than or less than the input voltage magnitude [12].

One of the basic switching conversion topologies is the step-down or buck converter, shown in Figure 2.1(a). Basically, its operation can be described as averaging a PWM square wave signal by passing it through a low-pass filter. The average or DC value is $D \times V_{DD}$, which implies that the output voltage is a function of the magnitude ($V_{DD}$) and also the duty cycle ($D$) of the square waveform. The operation of the buck converter is fairly simple as there are only two operational states. In the first state, the switch is closed, the diode is reverse-biased and current builds up in the inductor. In the second state, the switch is opened. Current in the inductor cannot change instantly, so the current finds its way through the diode and the energy is transferred from the inductor to the load. The ideal DC gain of a buck converter is $D$.

![Buck configuration](a)

![Boost configuration](b)

![Buck-boost configuration](c)

Figure 2.1. Basic switching converter topologies

Another basic DC-DC conversion topology is the step-up or boost converter. Components used are similar to the buck converter but connected in a different configuration as shown in Figure 2.1(b). Similarly, there are two operational states. In the first state, the switch is closed, current builds up in the inductor and the diode is reverse-biased, isolating the output stage. In the second state, the switch is opened. Current in the inductor cannot change instantly, so the current finds its way through the diode. The inductor voltage will be in series with the source voltage, so the output capacitor receives a voltage that is higher than the supply voltage. The load receives energy from the input source as well as the inductor. Therefore, the ideal DC gain of a boost converter is $1/(1 - D)$.

The buck-boost topology also uses the same components as the buck converter but connected in yet another configuration, as shown in Figure 2.1(c). Again, there are two operational states. In the first state, the switch is closed, so current builds up in the inductor and the diode is reverse-biased, isolating the output stage. In the second state, the switch is opened. Current in the inductor cannot change instantly, so the current finds its way through the diode. The inductor will be in parallel with the output and the energy is transferred from the inductor to the load. Hence the ideal DC gain of a buck-boost converter is $-D/(1 - D)$.
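The three ideal DC gains quoted above can be compared side by side with a short sketch. It assumes lossless components and continuous conduction, and the function names are illustrative only.

```python
def _check_duty(duty: float, allow_one: bool = False) -> None:
    """Reject duty cycles outside the range where the ideal gain is defined."""
    upper_ok = duty <= 1.0 if allow_one else duty < 1.0
    if not (duty >= 0.0 and upper_ok):
        raise ValueError(f"duty cycle {duty} is out of range")


def buck_gain(duty: float) -> float:
    """Ideal step-down gain: V_out / V_in = D."""
    _check_duty(duty, allow_one=True)
    return duty


def boost_gain(duty: float) -> float:
    """Ideal step-up gain: V_out / V_in = 1 / (1 - D)."""
    _check_duty(duty)
    return 1.0 / (1.0 - duty)


def buck_boost_gain(duty: float) -> float:
    """Ideal inverting gain: V_out / V_in = -D / (1 - D)."""
    _check_duty(duty)
    return -duty / (1.0 - duty)


if __name__ == "__main__":
    for d in (0.25, 0.5, 0.75):
        print(f"D={d}: buck={buck_gain(d):.3f}, "
              f"boost={boost_gain(d):.3f}, buck-boost={buck_boost_gain(d):.3f}")
```

At $D = 0.5$ the buck-boost gain has unit magnitude, which marks the boundary between its step-down and step-up ranges.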

### 2.1.2 Zero Voltage Switching

In advanced switch-mode power converters, zero voltage switching (ZVS) operation is used to manage dynamic power losses in the power transistors [16]. The basic idea of ZVS is that these power transistors are turned on when the voltage across their terminals is zero, which ideally results in no power loss during switching. Consider the circuit diagram of a synchronous buck converter shown in Figure 2.2(a). $S_1$ acts as the switch to connect to the supply and $S_2$ acts as the diode from Figure 2.1(a). $C_x$ includes all capacitances at node $V_{inv}$. When $S_1$ is on, $V_{inv} = V_{DD}$ and the current in the inductor is increasing. $S_1$ is turned off in accordance with the required converter output voltage, in other words, the duty cycle of the gating signal. $S_2$ is kept off and the inductor current moves the charge stored in $C_x$ to $C_F$ and, as a result, $V_{inv}$ decreases. When $V_{inv} = 0$, $S_2$ is turned on to achieve ZVS for $S_2$. Because $S_1$ is off and no supply voltage is connected to the circuit, the inductor current decreases and, by design, reaches a negative value. At this time, $S_2$ is turned off and the negative inductor current charges $C_x$. $V_{inv}$ rises, and when $V_{inv} = V_{DD}$, $S_1$ is turned on to achieve ZVS for $S_1$.

![Synchronous buck configuration](image)

(a) Synchronous buck configuration

![Idealized timing diagram](image)

(b) Idealized timing diagram

Figure 2.2. ZVS operation in a synchronous buck converter

In Figure 2.2, the value of $C_x$ affects the rise and fall times of $V_{inv}$ as a larger capacitor will slow down the transitions of the node $V_{inv}$. The output voltage has a ripple due to the switching action. The percentage ripple in the output voltage is usually specified to be less than, for example, 5% peak-to-peak. Therefore, as a first order of approximation, it is valid to assume that $V_{out}$ is constant.

### 2.2 CMOS Inverter Driver Circuit

CMOS transistors used as switches in the converter of Figure 2.2 are big compared to other transistors used in digital logic. The gate inputs of these transistors have significant capacitance. To achieve rapid turn-on and turn-off transitions, a tapered driver, which is a chain of inverters whose size successively grows by the tapering factor $r$, is used to drive those big transistors. Figure 2.3 shows such a driver chain with $n$ stages. To increase the overall efficiency of a power converter, the driver circuit should be designed so that the power consumption in the driver chain is minimized.

![Diagram of CMOS inverter driver circuit](image)

**Figure 2.3.** A CMOS inverter driver with tapering factor $r$

As described in [17], two parameters $\beta$ and $r$ characterize the inverter chain: $\beta$ is the ratio of PMOS to NMOS transistor sizes in an inverter, and $r$ is the ratio of the transistor sizes in consecutive inverter stages.

A common practice is to widen PMOS transistors so that in an inverter stage, the resistances of PMOS and NMOS transistors are matched [17]. This typically requires $\beta = 2.5 \sim 3.5$. As a result, high-to-low and low-to-high propagation delays are equalized. In addition, the rise/fall times of inputs and outputs of the inverters are equalized, which minimizes the short-circuit dissipation. As such, most power dissipation in an inverter driver is associated with the dynamic power, and only a minor fraction (<10%) is due to short-circuit currents.

Increasing $r$ is also a key to reducing power consumed by the front-end inverter chain. In this work, to keep the design simple without varying too many parameters, a value of $r = 4$, corresponding to a fan-out of four, is chosen for the inverter chain. Using a fan-out of four (FO4) is a common practice that minimizes propagation delay in the inverter chain [17].
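The effect of the tapering factor on chain length and total switched gate capacitance can be illustrated with a small sketch. It assumes that every inverter's input capacitance scales linearly with its size and that the first inverter has unit size; the function name and the example load ratios are hypothetical and are not taken from the designs in this work.

```python
def tapered_chain(load_to_input_ratio: float, r: float = 4.0) -> list[float]:
    """Relative inverter sizes of a driver chain tapered by factor r.

    The first inverter has size 1. Stages are added until the last inverter
    drives the final load with a fan-out of at most r.
    """
    if r <= 1.0:
        raise ValueError("tapering factor must be greater than 1")
    if load_to_input_ratio < 1.0:
        raise ValueError("load must be at least as large as the first inverter")
    sizes = [1.0]
    while sizes[-1] * r < load_to_input_ratio:
        sizes.append(sizes[-1] * r)
    return sizes


if __name__ == "__main__":
    ratio = 4096.0  # hypothetical load-to-input capacitance ratio
    for r in (4.0, 8.0):
        chain = tapered_chain(ratio, r)
        # Sum of sizes is proportional to the gate capacitance switched in the chain.
        print(f"r={r}: {len(chain)} stages, relative chain capacitance {sum(chain):.0f}")
```

Under these assumptions a larger $r$ gives fewer stages and less switched capacitance in the chain, at the cost of a larger fan-out per stage.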

### 2.3 Integrated Switching Power Converters

To implement switch-mode DC-DC power converters on chip, the power switches of the converter are replaced by CMOS transistors. As an example, comparing the circuit diagram of the buck converter in Figure 2.1(a) with the CMOS inverter in Figure 2.4 reveals a similarity, which leads to the use of a CMOS inverter as the power switches of a buck converter.

![Figure 2.4. A CMOS inverter chain driving a CMOS buck converter](image)

The CMOS transistors in the converter of Figure 2.4 are playing the role of power transistors, so they need to be big to pass high currents. Big transistors have small on-state resistance, resulting in a reduced static power loss. On the other hand, bigger transistors have larger stray capacitances, which increases the dynamic power loss and needs to be addressed. A bigger transistor will need a bigger gate driver circuit as well.

Integrated power converters usually work at higher switching frequencies to shrink the size of the passive components. Therefore, to reduce the amount of heat generated, higher efficiency values are much preferred and, as such, implementation of ZVS operation is very important. To do this, separate gating signals are needed for each transistor as shown in Figure 2.5.

![Figure 2.5. ZVS operation in a CMOS synchronous buck converter](image)

Here, after $M_p$ is turned off, $M_n$ is kept off until the inductor current discharges $C_x$ to zero and then $M_n$ is turned on, achieving ZVS for $M_n$. Similarly, after $M_n$ is turned off, $M_p$ is kept off until the negative inductor current charges $C_x$ to $V_{DD}$ and then $M_p$ is turned on, achieving ZVS for $M_p$.

For integrated power converter designs, to save on-chip area, smaller inductor and capacitor values are preferred. Graphs illustrating the required inductor and capacitor values for different currents and switching frequencies are given in Appendix A. Choosing a mid-level output current will give a good compromise between inductor and capacitor values while higher switching frequencies will reduce both.
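The trends described above can be reproduced with the standard textbook ripple relations for an ideal buck converter in continuous conduction, $L = (V_{in} - V_{out})D / (\Delta I_L F_{sw})$ and $C = \Delta I_L / (8 F_{sw} \Delta V_{out})$. The sketch below is an illustration under those assumptions and is not necessarily the method used to generate the graphs in Appendix A; the ripple ratios and operating point are hypothetical.

```python
def buck_filter_values(v_in: float, v_out: float, i_out: float, f_sw: float,
                       current_ripple_ratio: float = 0.4,
                       voltage_ripple_ratio: float = 0.05) -> tuple[float, float]:
    """Return (L, C) for an ideal CCM buck converter.

    current_ripple_ratio: peak-to-peak inductor ripple as a fraction of i_out (assumed).
    voltage_ripple_ratio: peak-to-peak output ripple as a fraction of v_out (assumed).
    """
    duty = v_out / v_in
    delta_i = current_ripple_ratio * i_out
    delta_v = voltage_ripple_ratio * v_out
    inductance = (v_in - v_out) * duty / (delta_i * f_sw)
    capacitance = delta_i / (8.0 * f_sw * delta_v)
    return inductance, capacitance


if __name__ == "__main__":
    for f_sw in (100e6, 1e9, 3e9):
        l_f, c_f = buck_filter_values(v_in=1.0, v_out=0.5, i_out=0.1, f_sw=f_sw)
        print(f"F_sw={f_sw / 1e6:.0f} MHz: L={l_f * 1e9:.2f} nH, C={c_f * 1e12:.1f} pF")
```

With a fixed ripple ratio, the inductor shrinks and the capacitor grows as the output current increases, while both shrink in proportion to the switching frequency.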
### 2.4 Literature Survey

Discrete power converters have been around for many years and, as such, they have been studied in detail in many publications. While earlier papers, such as [18] and [19], focused on design optimization, more recent papers have focused on advanced control methods that can be implemented using digital signal processors [8].

As on-chip power converters are becoming popular, various approaches to integrating them on-chip have been reported in the latest literature. Some have tried to implement discrete design techniques such as using a multi-phase configuration to improve the quality of the output voltage while others have tried to implement integrated design techniques such as using low-swing transistors to reduce the consumption of the converter itself. Those designs are discussed briefly in the following sections.

### 2.4.1 Switching Power Converters

Physical constraints push on-chip integrated power converters to use small inductors and capacitors. Recent work has focused on reducing the size of these components while maintaining high efficiency. In [20], an analytical solution is derived for the optimal DC-DC converter design, linking power efficiency directly to CMOS front-end parameters and inductor technology.

In recent years, many integrated power converters have been reported, mostly switching at a few megahertz and with off-chip passive components. A converter switching at 480MHz [14] operates at one of the highest reported frequencies. It contains four single-phase modules that operate as stand-alone converters and receive synchronization signals from a block synchronizer as shown in Figure 2.6. In a multiphase topology, the switching times of the inductors are staggered to cancel out the output voltage ripple. This design utilizes air-core inductors mounted on the package of a 90nm CMOS chip. At 233MHz, a power efficiency of 83.2% has been reported with voltage conversion of 1.2V to 0.9V at 0.3A load current, and at 480MHz, an efficiency of 72% was reported with voltage conversion of 1.8V to 0.9V at 0.5A.

On the other hand, [15] is an example of a fully integrated step-down converter fabricated in a 0.18µm SiGe RF BiCMOS process. The converter provides a programmable 1.5V to 2V output voltage at a 200mA current rating with a switching frequency of 45MHz. This design, shown in Figure 2.7, utilizes a two-stage interleaved ZVS synchronous buck topology, and has a maximum efficiency of 65%.

Also, [21] is an example of a fully integrated step-up converter fabricated in a 0.5µm process. The circuit diagram for this design is shown in Figure 2.8. The target specifications are input and output voltages of 5V, a maximum load current of 200mA, and an average switching frequency of 75 MHz. The conversion efficiency was not reported for this design.

Figure 2.6. Block diagram of a four-phase interleaved DC-DC converter [13]

Figure 2.7. Circuit diagram of the fully integrated two-stage buck converter [15]

Figure 2.8. Circuit diagram of the fully integrated boost converter [21]

Table 2.1, partially taken from [13], shows a comparison of the previously reported on-chip converters, some of which have on-chip passives. Among fully integrated converters, [21] has the highest switching frequency of 75MHz and uses an area of 1.5mm × 1.5mm to fit the large on-chip passive components. On the other hand, [14] has the highest reported switching frequency of 480MHz and uses on-package inductors. It later appeared in [13], switching at a lower frequency of 233MHz, to boost the efficiency.

<table>
<thead>
<tr>
<th>Reference</th>
<th>[22]</th>
<th>[23]</th>
<th>[24]</th>
<th>[14]</th>
<th>[13]</th>
<th>[21]</th>
<th>[25]</th>
<th>[15]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>0.35µm</td>
<td>1.5µm</td>
<td>0.18µm</td>
<td>90nm</td>
<td>90nm</td>
<td>0.50µm</td>
<td>1.5µm</td>
<td>0.18µm SiGe RF BiCMOS</td>
</tr>
<tr>
<td>Switching frequency, $F_{sw}$ (MHz)</td>
<td>1</td>
<td>2</td>
<td>102</td>
<td>480</td>
<td>233</td>
<td>75</td>
<td>10</td>
<td>45</td>
</tr>
<tr>
<td>Input voltage, $V_i$ (V)</td>
<td>1.2</td>
<td>3.3</td>
<td>1.8</td>
<td>1.8</td>
<td>1.2 ~ 1.4</td>
<td>5</td>
<td>4</td>
<td>2.8</td>
</tr>
<tr>
<td>Output voltage, $V_o$ (V)</td>
<td>0.5</td>
<td>1.7</td>
<td>0.9</td>
<td>0.9</td>
<td>0.9 ~ 1.1</td>
<td>5</td>
<td>2</td>
<td>1.8</td>
</tr>
<tr>
<td>Output current, $I_o$ (A)</td>
<td>0.02</td>
<td>0.07</td>
<td>0.25</td>
<td>0.5</td>
<td>0.3 ~ 0.4</td>
<td>0.2</td>
<td>0.25</td>
<td>0.2</td>
</tr>
<tr>
<td>Efficiency, $\eta$ (%)</td>
<td>91</td>
<td>92</td>
<td>88</td>
<td>72</td>
<td>83 ~ 85</td>
<td>N/A</td>
<td>50</td>
<td>65</td>
</tr>
<tr>
<td>Filter inductor, $L_F$</td>
<td>10µH</td>
<td>4.7µH</td>
<td>8.8nH</td>
<td>3.6nH *</td>
<td>6.8nH **</td>
<td>50nH</td>
<td>1µH</td>
<td>11nH ***</td>
</tr>
<tr>
<td>Filter capacitor, $C_F$</td>
<td>20µF</td>
<td>10µF</td>
<td>3.0nF</td>
<td>2.5nF</td>
<td>2.5nF</td>
<td>650pF</td>
<td>180nF</td>
<td>6nF</td>
</tr>
<tr>
<td>On/off-chip passives</td>
<td>off</td>
<td>off</td>
<td>off</td>
<td>Off-chip, on-package inductors</td>
<td>Off-chip, on-package inductors</td>
<td>on</td>
<td>on</td>
<td>on</td>
</tr>
</tbody>
</table>

* This design uses four inductors, 3.6nH each.
** This design uses four inductors, 6.8nH each.
*** This design uses two inductors, 11nH each.
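To put the efficiency figures of Table 2.1 in terms of absolute power, the output power, input power and dissipated power can be derived from the tabulated output voltage, output current and efficiency. The sketch below uses only the single-valued table entries (it omits [13], which is given as ranges, and [21], which has no reported efficiency), and the helper name is illustrative.

```python
# (reference, V_out [V], I_out [A], efficiency [%]) copied from Table 2.1
TABLE_2_1 = [
    ("[22]", 0.5, 0.02, 91.0),
    ("[23]", 1.7, 0.07, 92.0),
    ("[24]", 0.9, 0.25, 88.0),
    ("[14]", 0.9, 0.5, 72.0),
    ("[25]", 2.0, 0.25, 50.0),
    ("[15]", 1.8, 0.2, 65.0),
]


def power_budget(v_out: float, i_out: float, efficiency_percent: float):
    """Return (output power, input power, dissipated power) in watts."""
    p_out = v_out * i_out
    p_in = p_out / (efficiency_percent / 100.0)
    return p_out, p_in, p_in - p_out


if __name__ == "__main__":
    for ref, v_out, i_out, eta in TABLE_2_1:
        p_out, p_in, p_loss = power_budget(v_out, i_out, eta)
        print(f"{ref}: P_out={p_out * 1e3:.0f} mW, "
              f"P_in={p_in * 1e3:.0f} mW, loss={p_loss * 1e3:.0f} mW")
```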

To reduce the size and footprint of the passive components, the switching frequency of the converters needs to be increased. The integrated clock driver/power converter designs introduced in this dissertation use GHz-range switching frequencies for full on-chip passive component integration and a smaller passive component footprint. Reduced efficiencies at those higher frequencies are compensated by employing charge recycling methods, as described in detail later in this dissertation.
### 2.4.2 Low-Swing Power Converters

To enhance the efficiency characteristics of high-frequency switching DC-DC converters, [24] proposes a low-voltage-swing MOSFET gate drive technique as shown in Figure 2.9.

It has been reported that an efficiency of 88% at a switching frequency of 102 MHz is achieved for a voltage conversion from 1.8V to 0.9V with a low-swing DC-DC converter based on a 0.18µm CMOS technology. This corresponds to a power reduction of 27.9% as compared to a standard full-swing DC-DC converter.

![Figure 2.9. Low swing DC-DC conversion technique [24]](image)

Another low-swing design presented in [26] utilizes a cascode bridge circuit as shown in Figure 2.10. The circuit can operate at input voltages higher than the maximum voltage that can be applied directly across the terminals of a MOSFET. It has been reported that an efficiency of 79.6% is achieved for 5.4V to 0.9V conversion in a 0.18µm CMOS technology. This DC-DC converter operates at a switching frequency of 97MHz while supplying a DC current of 250mA to the load.

Moreover, [27] combines the low-swing idea with digital control as a power management solution, as shown in Figure 2.11. In this scheme, normally the DC-DC converter works in pulse width modulation (PWM) mode to achieve high-quality regulation as well as good efficiency. However, in standby mode in which the load current is very low, pulse width modulation control leads to low efficiency due to excessive switching loss. To extend the standby time, pulse frequency modulation is used for light-load operation to achieve good efficiency. This digitally-controlled buck converter is implemented in 0.25µm CMOS. The PWM switching frequency is 1.5MHz. The converter achieves a maximum of 91% efficiency at 200mA output current. Maximum input voltage is 5.5V and the output voltage ranges from 1V to 1.8V.

![Figure 2.10. Cascode bridge circuit [26]](image)

![Figure 2.11. Block diagram of a power management on-chip [27]](image)

In contrast, using a different methodology, the approach in [28] consists of stacking CMOS logic domains to operate from a voltage supply that is a multiple of the nominal supply voltage. DC-DC down conversion is performed using charge recycling without the need for explicit power converters as shown in Figure 2.12. This high-voltage power delivery system would need start-up devices to avoid device overstress during power-on. Level shifters that translate logic levels between stacked domains are also needed. The approach clearly requires that the stacked loads have well-balanced charge utilization for high efficiency. One context in which this approach may be more easily applicable is in a multi-core microprocessor in which each core could be designed to operate in a different stacked domain. Current utilization in each domain could be controlled with workload balancing; level-shifting voltage interfaces would only have to be present to interface between cores or with the chip pads.

Figure 2.12. Implicit DC-DC conversion through charge recycling [28]

The low-swing buck converter design introduced in this dissertation improves upon this previous work by introducing “reduce, reuse and recycle” as a complete energy savings strategy for an on-chip buck converter circuit. Energy reduction in the front-end drive chain is achieved by using low-swing signaling at 660MHz. Charge reuse is achieved by supply-stacking separate front-end drive-chains for the output transistors. And finally, energy recycling is achieved by taking surplus charge available from the top front-end drive-chain along with the charge available in the clock load capacitance and sending it to the load as a regulated supply. Although the first two concepts have been implemented before, the third concept of energy recycling is a new contribution, as described in detail later in this dissertation.

### 2.4.3 Resonant Clock Strategies

A clock signal distribution network in an integrated circuit requires a capacitive clock distribution model. An approach to global clock distribution presented in [29] augments traditional tree-driven grids with on-chip inductors. The large clock capacitance then resonates with the inductance $L_{\text{spiral}}$ shown in Figure 2.13. This approach promises to significantly reduce the power necessary to drive the grid, since the energy of the fundamental resonates back and forth between electric and magnetic forms rather than being dissipated as heat. Consequently, the clock drivers must only supply the energy needed to overcome losses at the fundamental. Furthermore, because the effective capacitance of the clock network is dramatically reduced, the number of gain stages and the associated latency required to drive the clock is reduced as well, resulting in considerable improvement in skew and jitter.

![Figure 2.13. Simple lumped circuit model of the resonant clock distribution [11]](image)

While the non-resonant power scales linearly with frequency, [29] and [10] report that the resonant power is fairly constant, with better-than-80% power savings at the desired resonance frequency of 1.1GHz. To minimize energy dissipation at the fundamental, there might be some need to tune the grid resonance to the clock frequency with MOS capacitors that can be switched onto the clock load. Local buffering would not be resonant and would dissipate the same amount of power as a non-resonant distribution. Hence, with resonant clocking there would be a desire to shift more of the clock load to the resonant grid. This approach can scale to higher clock frequencies for a given clock load by the addition of more inductors to the network. Sinusoidal clocks are, however, generally undesirable because of slower signal transition times. The slow transition results in increased skew and jitter as there is no precise moment to define the clock event.

The concept presented in [29] is improved in [11] by introducing a distributed differential oscillator global clock network. Here, the distribution is differential with the use of symmetric inductors placed between the two clock phases, eliminating the need to add large capacitors to the clock distribution as in the resonant single-ended distribution of [29].

The low-power clock driver circuit introduced in this thesis differs from resonant clocking by providing a quasi-square clock waveform with sharp edges at the frequency of 4GHz using a circuit configuration that resembles a boost converter. This improves the power consumption of a clock tree itself.

### 2.5 Implementation Considerations

The designs introduced here rely on the charge stored in the clock load capacitance. Thus, the exact location to connect these circuits depends on where the clock load capacitance is located, i.e., it depends on the configuration of clock distribution network.

Clock distribution networks have been studied in detail in [4]. They usually form a tree structure. If the ends of the branches are connected to each other, a mesh structure is formed, which has the benefit of reduced interconnect resistance within the clock tree. A single buffer can be used to drive the entire clock if the clock is distributed entirely on metal. The buffer needs to be able to provide enough current to drive the clock network capacitance while keeping the clock waveform intact.

One of the goals of clock designers is to minimize the clock skew. One common way of achieving this is by using a symmetric layout such as an H-tree, as shown in Figure 2.14. In such a layout, each clock path from the clock source to an individual clock load has the same delay, assuming exact matching of the layout and no process variations. The interconnect capacitance in an H-tree is greater than that of a standard clock tree [30] because total wire length tends to be greater. Thus, using an H-tree reduces skew while increasing the power.

In the example of Figure 2.14, the last inverter that drives the clock tree trunk is the biggest inverter which drives the capacitance of the whole H-tree. Therefore, the last inverter could be used for the power switches in the integrated clock driver/buck converter, or it could be where the inductor would be located in the low-power clock driver design.

![Figure 2.14. A tapered H-tree clock distribution network](image)

Another approach is to distribute buffers throughout the clock network. This method requires more area but will be necessary if the resistance of the clock interconnects is significant. In a well-balanced clock distribution network, buffers are the primary source of clock skew [4] as active device characteristics vary more than passive device characteristics. Buffers may also be used to drive local loads. In this case, the integrated clock driver/buck converter must be replicated for each region that has its own local load buffer.

Another concern is the area overhead when these energy-recycling designs are utilized in a real microprocessor environment. To investigate this concern, power consumption of a few recent microprocessor designs is summarized in Table 2.2, which is partially taken from [3].

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th>IBM POWER6</th>
</tr>
</thead>
<tbody>
<tr>
<td>Switching frequency, $F_{sw}$ (GHz)</td>
<td>1.0</td>
<td>1.3</td>
<td>~2.0</td>
<td>5+</td>
</tr>
<tr>
<td>Overall power consumption (W)</td>
<td>130</td>
<td>125</td>
<td>100</td>
<td>~100</td>
</tr>
<tr>
<td>Chip area (mm$^2$)</td>
<td>421</td>
<td>400 est.</td>
<td>596</td>
<td>341</td>
</tr>
<tr>
<td>Clock power share (%)</td>
<td>33</td>
<td>30</td>
<td>25</td>
<td>22</td>
</tr>
</tbody>
</table>

As an example, the clock network in IBM’s POWER6 processor consumes 22% of the overall chip power, or about 22W, with an overall area of 341mm$^2$. This results in an estimated clock power density of 65mW/mm$^2$. Using $P = CV_{DD}^2F_{sw}$, in which $P$, $C$, $V_{DD}$ and $F_{sw}$ are clock power consumption, clock capacitance, supply voltage and clock frequency, respectively, and assuming $V_{DD} = 1.0V$ and $F_{sw} = 5$GHz, it can be estimated that the overall clock capacitance of the chip is 4.4nF or, in other words, 13pF/mm$^2$. In the low-power clock driver design, a clock load gate capacitance of 21pF has been used, which corresponds to an area of 1.6mm$^2$ of a high-performance microprocessor. In contrast, the area needed to implement the inductor in that design is 0.1mm$^2$, which is much smaller than the 1.6mm$^2$ region from which it recovers power.
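The estimate above can be retraced numerically. The sketch below takes the chip power, clock power share and area from the last column of Table 2.2; the 5GHz clock frequency is an assumption (the table lists "5+") chosen because it reproduces the quoted 4.4nF, and the variable names are illustrative.

```python
def clock_capacitance(p_clk: float, v_dd: float, f_sw: float) -> float:
    """Solve P = C * V_DD^2 * F_sw for the switched clock capacitance C."""
    return p_clk / (v_dd ** 2 * f_sw)


CHIP_POWER_W = 100.0      # approximate overall power (Table 2.2)
CLOCK_SHARE = 0.22        # clock power share (Table 2.2)
CHIP_AREA_MM2 = 341.0     # chip area (Table 2.2)
V_DD = 1.0                # supply voltage assumed in the text
F_SW = 5e9                # assumed clock frequency, not stated exactly in the source

p_clk = CLOCK_SHARE * CHIP_POWER_W                  # about 22 W
power_density = p_clk / CHIP_AREA_MM2               # W per mm^2
c_clk = clock_capacitance(p_clk, V_DD, F_SW)        # F
cap_density = c_clk / CHIP_AREA_MM2                 # F per mm^2
area_for_21pf = 21e-12 / cap_density                # mm^2 represented by a 21 pF load

if __name__ == "__main__":
    print(f"clock power density: {power_density * 1e3:.1f} mW/mm^2")
    print(f"clock capacitance:   {c_clk * 1e9:.2f} nF")
    print(f"capacitance density: {cap_density * 1e12:.1f} pF/mm^2")
    print(f"area for 21 pF:      {area_for_21pf:.2f} mm^2")
```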

In comparison, [10] reports a 90nm CMOS resonant clocking test-chip with $C_{clk} = 7.5pF$ and four sets of $LC$ passives as shown in Figure 2.15. This results in a clock phase and amplitude that are both uniform across the entire clock network [11]. Local buffering would not be resonant and would dissipate power as a non-resonant network [29]. Tuning of the grid to the clock resonance frequency could be done by switching MOS capacitors onto the clock load, but if the $Q$ of the resonator is small, resonance can be achieved over a wide frequency band [29].

In Figure 2.15, each $C_{decap}$ is 20pF and $L$ is 1nH, occupying a chip area of $80\mu m \times 80\mu m$ and $90\mu m \times 90\mu m$, respectively [10]. Since the H-tree itself does not include any buffers, the four sets of $LC$ passives are in parallel which results in an effective decoupling capacitance of 80pF and an effective inductance of 250pH at clock resonance frequency of 3.7GHz. It has been reported that approximately 20% of the clock power is being recycled in the test chip which, with a redesign, would likely approach the 80% observed in 0.18µm test-chips [10].
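The quoted effective values and resonance frequency can be checked with the usual relation $f_0 = 1/(2\pi\sqrt{LC})$. The sketch below assumes that the decoupling capacitance is large enough for $C_{clk}$ alone to set the resonance; it also evaluates the case where $C_{decap}$ is treated as being in series with $C_{clk}$, which is a modelling assumption of this sketch and not a statement from [10].

```python
import math


def resonant_frequency(inductance: float, capacitance: float) -> float:
    """Resonance frequency of an ideal LC tank, f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


N_SETS = 4
L_EACH = 1e-9          # 1 nH per inductor
C_DECAP_EACH = 20e-12  # 20 pF per decoupling capacitor
C_CLK = 7.5e-12        # total clock capacitance

l_eff = L_EACH / N_SETS              # parallel inductors: 250 pH
c_decap_eff = C_DECAP_EACH * N_SETS  # parallel capacitors: 80 pF

# Case 1: C_decap treated as an ideal AC short, so only C_clk resonates with L.
f0_ideal = resonant_frequency(l_eff, C_CLK)

# Case 2 (assumption): C_decap in series with C_clk slightly lowers the capacitance.
c_series = C_CLK * c_decap_eff / (C_CLK + c_decap_eff)
f0_series = resonant_frequency(l_eff, c_series)

if __name__ == "__main__":
    print(f"L_eff = {l_eff * 1e12:.0f} pH, C_decap_eff = {c_decap_eff * 1e12:.0f} pF")
    print(f"f0 with C_clk only:       {f0_ideal / 1e9:.2f} GHz")
    print(f"f0 with series C_decap:   {f0_series / 1e9:.2f} GHz")
```

The first case lands close to the 3.7GHz figure quoted above.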

Figure 2.15. Components of a resonant clock sector [10]
### 3 INTEGRATED BUCK CONVERTERS

In this chapter, an integrated clock driver/buck converter design and a low-swing buck converter design are discussed. In the first design, the energy stored in the clock load capacitance provides the input power to voltage converters operating at the clock speed of roughly 3GHz. The second design, a low-swing buck converter, introduces “reduce, reuse and recycle” as a complete energy savings strategy at 660MHz. Energy reduction in the front-end drive chain is achieved by using low-swing signaling (half-rail swings instead of full-rail). Charge reuse is achieved by one drive-chain reusing charge from the other drive-chain. And finally, energy recycling is achieved by taking surplus charge available from one drive-chain, along with the charge available in the clock-load-capacitance, and sending it to the load as a regulated supply.

In the designs introduced here, high-speed switching losses are reduced by employing zero voltage switching and by directly integrating the clock-tree drivers with the converter power-transistor drivers. Also, the designs are implemented in an open loop, lacking output voltage regulation, but with the goal of having less than 5% ripple on $V_{out}$.

The techniques proposed in these designs are valid for finer feature size CMOS technologies as well.
### 3.1 Integrated Clock Driver/Buck Converter

### 3.1.1 Introduction

This section describes a new method where the energy of the clock is recovered to supply on-chip DC-DC converters [32] [33]. This work differs from resonant clocking by providing a quasi-square clock waveform with sharp edges because the inductors are not working in a resonating mode with the clock capacitors. Here, part of the challenge comes from the necessity of being limited to on-chip CMOS power transistors and passive components, as the design is limited to the same technology as the rest of the circuit.

Directly integrating a clock driver intended for high-performance logic with a DC-DC power converter merges several compatible concepts. The converter switching losses are merged into the clock-tree switching losses, the multi-GHz clock frequency reduces the size of converter passive components so that they can be put on-chip, and the final clock drivers and the DC-DC converter power transistors are both very wide to improve switching time of the clock and reduce static losses of the converter. Also, these large, low-impedance transistors need to be driven by a tapered inverter chain to keep up with the very high frequency. Similarly, the power used by this chain should be minimized in both cases. But higher switching frequency increases the dynamic power loss. To compensate for this loss, two major ideas have been used in this work: charge-recycling and zero-voltage switching.

Output voltage regulation can be achieved by modulating the clock duty cycle, a scheme compatible with single-edge triggered clocking. The converters’ output voltage can be used to supply sub-circuits that operate at other voltage levels as it is challenging to bring in and distribute several voltage domains. Since the switching DC-DC converters are small, several of them can be deployed in different regions to produce independent, regional power supplies. This allows several different regulated voltages to be on-chip at the same time, all powered from the same off-chip primary supply. Many power-saving techniques such as mixed-voltage islands and adaptive body biasing (ABB) [34] can utilize these additional supply voltages. An on-chip DC-DC converter can power these schemes without the need for external pins, external components, or board design effort. Another advantage of on-chip converters is the ability to respond quickly to dynamic load conditions in many-core processors, a key requirement for achieving the savings promised by dynamic voltage and frequency scaling (DVFS) [35].

Figure 3.1 shows how integrating the clock driver with the power converter helps increase the overall efficiency. The integrated clock driver/power converter in Figure 3.1(a) receives $P_{in1}$. Part of $P_{in1}$ is required to operate the clock network. If a dedicated clock driver were constructed, this power consumption would be $P_{in2}$. We use $P_{in1} - P_{in2}$ to operate the power converter and recycle energy from the clock driver. As shown in Figure 3.1(b), if this power and circuitry were removed from the integrated design, a stand-alone power converter would still be needed that provides $P_{out}$ using just the incremental power $P_{in1} - P_{in2}$. Recycling the clock power increases the effective efficiency.

(a) Raw efficiency  
(b) Effective efficiency

Figure 3.1. Efficiency block diagram

To compare the dual-purpose circuit with traditional on-chip power converters, a new metric, effective efficiency, is introduced. Effective efficiency ($\eta_{\text{eff}}$) is defined as the output power of the converter divided by the incremental power needed to operate the converter.

\[
\eta_{\text{eff}} = \frac{P_{out}}{P_{in1} - P_{in2}} \tag{3.1}
\]

Effective efficiency captures how efficient a traditional converter would have to be if it were to supply the same output power using just the additional input power needed by the dual-purpose circuit.
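A minimal sketch of Equation (3.1) follows. The raw efficiency is taken here to be $P_{out}/P_{in1}$, which is an assumption based on the panel label in Figure 3.1(a), and the power values in the usage example are hypothetical rather than measured results.

```python
def effective_efficiency(p_out: float, p_in1: float, p_in2: float) -> float:
    """Equation (3.1): output power divided by the incremental input power.

    p_in1: input power of the integrated clock driver/power converter.
    p_in2: input power a dedicated clock driver alone would need.
    """
    incremental = p_in1 - p_in2
    if incremental <= 0.0:
        raise ValueError("integrated design must draw more power than the clock driver alone")
    return p_out / incremental


def raw_efficiency(p_out: float, p_in1: float) -> float:
    """Assumed definition: converter output power over total input power."""
    return p_out / p_in1


if __name__ == "__main__":
    # Hypothetical powers in watts, chosen only to illustrate the two metrics.
    p_out, p_in1, p_in2 = 0.75, 2.0, 1.0
    print(f"raw efficiency:       {raw_efficiency(p_out, p_in1):.1%}")
    print(f"effective efficiency: {effective_efficiency(p_out, p_in1, p_in2):.1%}")
```

Because the clock power $P_{in2}$ would be spent in any case, the effective efficiency is always at least as high as the raw figure for the same circuit.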

### 3.1.2 Circuit Design

One of the basic switch-mode DC-DC conversion topologies is the step-down or buck converter. Its operation can be described as averaging a square wave signal by passing it through a low-pass filter as shown in Figure 3.2(a). The average or DC value is $D \times V_{DD}$ which implies that the output voltage is a function of the magnitude, $V_{DD}$, and also the duty cycle, $D$, of the square waveform. A basic integrated clock driver/buck converter circuit is shown in Figure 3.2(b). Here, a chain of cascaded inverters (not shown) is used as a driver buffer for node $V_{\text{clk-in}}$. $C_{\text{clk}}$ is the sum of all transistor and wiring capacitances that are connected to the clock node.

The idealized timing diagram of the internal signals is presented in Figure 3.2(c), where $D$, $T_{\text{sw}}$, and $T_{\text{delay}}$ represent the clock duty cycle, switching period (i.e., clock period), and ZVS delay-time, respectively. As shown in Figure 3.2(c), there are three intervals of operation:

- Interval 1 (time 0 to $D \times T_{\text{sw}}$) is intended to drive the load and charge $C_{\text{clk}}$ through $M_p$. During this time, the inductor current increases linearly since the voltage across it is constant.

Figure 3.2. Integrated clock driver/buck converter

- Interval 2 (time $D \times T_{sw}$ to $D \times T_{sw} + T_{delay}$) is intended for charge recycling. Therefore, both $M_n$ and $M_p$ are off. The charge that is stored in $C_{clk}$ is moved to the output circuit through the inductor, as the inductor current cannot be disrupted abruptly. This results in a rapid drop of $V_{clk}$, which is intended. In this short period of time, the inductor current can be assumed to be approximately constant. It is worth mentioning that if there were no delay present, then at time $D \times T_{sw}$, $C_{clk}$ would be discharged to ground through $M_n$, wasting the stored energy.

- Interval 3 (time $D \times T_{sw} + T_{delay}$ to $T_{sw}$) starts when the voltage across $M_n$ is close to zero. At this time, $M_n$ is turned on to provide a low-resistance path for the inductor current. As there is no energy supplied to the system and the voltage across the inductor is constant, the inductor current decreases linearly. ZVS operation occurs when $M_n$ is turned on while its source-drain voltage is close to zero, thereby reducing dynamic power loss.

Theoretically, in interval 3, when the falling inductor current crosses zero, $M_n$ could be turned off to allow charging $C_{clk}$ with the negative inductor current. Then, at the beginning of the next switching cycle, $M_p$ would be turned on with zero voltage across it, i.e., ZVS operation for $M_p$. In practice, this might increase the output voltage ripple, as $C_F$ should provide the required charge for the large $C_{clk}$. Moreover, the inductor RMS current and thus the power loss in the inductor resistance would be increased. In this design, the minimum inductor current is set to be close to zero; therefore, no ZVS operation is implemented for $M_p$. In practice, due to the process variation, the inductor current may go slightly negative. However, as the inductor current does not stop at zero, the converter is considered to be operating in continuous conduction mode (CCM).

At the end of interval 3, $M_p$ is turned on and $M_n$ turned off at roughly the same time. That is, the delay element should only delay a rising edge on $V_{clk}$, not the falling edge.

### 3.1.3 Complete Circuit

To be able to calculate the effective efficiency using Equation (3.1), a reference clock driver is needed. In this section, this reference circuit will be described first. This will be followed by the integrated clock/converter circuit.

#### Reference Clock Circuit

To evaluate the performance of the integrated clock driver/power converter circuit, a reference circuit containing tapered inverters that form a clock driver was designed using a reference clock capacitance $C_{clk}$. In this work, the clock capacitance $C_{clk}$ is assumed to be 12pF. The approach described in [17] is used here to design the inverter chain. A common practice is to use wider PMOS transistors than NMOS transistors so that the resistance of PMOS and NMOS transistors is matched. In this circuit, PMOS transistors are three times wider than NMOS transistors, except for the last inverter stage in which the PMOS is four times wider, as shown in Figure 3.3. This is done to keep the reference circuit similar to the integrated design, where $M_p$ needs to be wider to drive $C_{clk}$ and $L_F$ simultaneously. As is common practice, a fan-out ratio of four is chosen for the inverter chain. To increase the overall efficiency of a power converter, the driver circuit can be designed so that the power consumption in the drive chain is minimized.

![Transistor dimensions are in μm.](image)

**Figure 3.3. Circuit of the reference clock for the integrated clock driver/buck converter**

#### Integrated Clock Driver/Buck Converter

A detailed circuit diagram of the integrated clock driver/buck converter, including the buffer delay circuitry, is shown in Figure 3.4. Some transistors have been added to implement the capacitors.

To control the exact on/off timing of $M_n$ and $M_p$, the inverter driving those transistors is replaced with two separate inverters, with the same total transistor sizes and roughly the same power consumption as the original single driver. To implement the delay time, the gate of $M_1$ is connected to $V_{clk}$ instead of being connected to the gate of $M_2$. Therefore, compared to $V_p$, the rising edge of $V_n$ is delayed by $T_{delay}$, a duration which depends on how quickly $L_F$ drains $C_{clk}$ and how fast $M_1$ turns on to raise $V_n$. A drop in $V_{clk}$ causes $M_1$ and then $M_n$ to turn on, and consequently $V_{clk}$ drops faster. Since the gate of $M_2$ is connected to $V_m$, no falling edge delay is observed for $V_n$. To prevent $M_1$ and $M_2$ from being on concurrently at the rising edge of $V_m$, the source of $M_1$ is connected to $V_p$ instead of $V_{DD}$. Therefore, $V_n$ falls at the falling edge of $V_p$.

Figure 3.4. Circuit diagram of the integrated clock driver/buck converter

In interval 1 of the operation, $C_{clk}$ stores some energy, which is then delivered to the load in interval 2. The output voltage is therefore given by $V_{out} = D_{eff} \times V_{DD}$ where

$$D_{eff} = D + \frac{1}{2} \frac{T_{delay} - T_{fall}}{T_{sw}} \quad (3.2)$$

$T_{fall}$ is the fall time of $V_{clk}$ if there was no ZVS delay and $T_{delay}$ is the fall time of $V_{clk}$ in the presence of ZVS, as shown in Figure 3.5.
Equation (3.2) suggests that if $T_{delay}$ is equal to $T_{fall}$, the duty cycle remains unchanged. Any $T_{delay}$ larger than $T_{fall}$ would increase the effective duty cycle accordingly. $T_{delay}$ can be calculated using the simplified circuit model given in Figure 3.6.

![Simplified circuit model](image)

Figure 3.6. Simplified circuit model for analyzing $V_{clk}$ during clock fall time

At time $t = 0$ when $M_p$ turns off, $V_{clk}(0) = V_{DD} - I_{Lmax} \cdot R_{on-PMOS}$. During clock fall time, $I_{Lmax}$ can be assumed to be constant, therefore $V_{clk}(t) = V_{clk}(0) - \frac{1}{C_{clk}} \cdot I_{Lmax} \cdot t$. The time that it takes for $V_{clk}$ to reach zero can be determined by:

$$T_{delay} = C_{clk} \left( \frac{V_{DD}}{I_{Lmax}} - R_{on-PMOS} \right) \quad (3.3)$$
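
The following Python sketch evaluates the ZVS delay from the constant-current model, $T_{delay} = C_{clk}(V_{DD}/I_{Lmax} - R_{on-PMOS})$, and feeds it into Equation (3.2). Only $C_{clk} = 12$pF comes from the text. The supply voltage, peak inductor current, PMOS on-resistance, fall time without ZVS, and switching frequency are assumed values used purely for illustration, and the function names are hypothetical.

```python
def zvs_delay(c_clk, v_dd, i_lmax, r_on_pmos):
    """Time for V_clk to ramp from V_DD - I_Lmax*R_on down to zero
    when C_clk is discharged by a constant current I_Lmax."""
    return c_clk * (v_dd / i_lmax - r_on_pmos)


def effective_duty(duty, t_delay, t_fall, t_sw):
    """Effective duty cycle per Equation (3.2)."""
    return duty + 0.5 * (t_delay - t_fall) / t_sw


C_CLK = 12e-12    # F, clock capacitance stated in the text
V_DD = 1.0        # V, assumed supply voltage
I_LMAX = 0.190    # A, assumed peak inductor current
R_ON = 0.5        # ohm, assumed PMOS on-resistance
T_FALL = 30e-12   # s, assumed fall time without the ZVS delay
F_SW = 3e9        # Hz, assumed switching frequency

t_delay = zvs_delay(C_CLK, V_DD, I_LMAX, R_ON)
d_eff = effective_duty(0.5, t_delay, T_FALL, 1.0 / F_SW)
print(f"T_delay = {t_delay * 1e12:.1f} ps, D_eff = {d_eff:.3f}")
```

Under these assumptions the delay is roughly 57ps, and because it is longer than the assumed fall time, the effective duty cycle rises slightly above the nominal 50%.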

### 3.1.4 Simulation

To evaluate the performance of the integrated clock driver/power converter circuit, it is simulated in 90nm CMOS technology using standard-$V_t$ transistors. A square wave signal with ~30psec rise/fall time, which is about the rise/fall time of an inverter with a fan-out of four, is used as the clock source.

Simulated waveforms for the integrated buck converter are shown in Figure 3.7. The circuit is simulated with a 50% duty cycle and 70mA load current. The inductor current, shown as $L_f$ in Figure 3.7(b), exhibits a triangular shape as expected, with minimum and maximum values of around −50mA and 190mA, respectively. In the first half cycle of the clock, the $M_p$ source current provides the energy to charge up $C_{clk}$ as well as $L_F$. Because of the high current, there is a voltage drop of ~0.1V across $M_p$, as suggested by the droop of $V_{clk}$ to ~0.9V in Figure 3.7(a). In this figure, the reference clock circuit output is shown as $V_{clk-ref}$. Both clocks have similar edge slopes. In the second half cycle of the clock, the inductor current discharges $C_{clk}$. As can be seen in Figure 3.7(b), the $M_n$ source current is always positive, which means that all the charge in $C_{clk}$ is delivered to the load instead of to ground.

Simulation results of the buck converter circuit at different duty cycles and output currents are given in Figure 3.8 and Figure 3.9. $P_{out}$ can also be derived from Figure 3.8(a). The output voltage increases as $D$ is increased and, at the same time, the effective efficiency decreases. For example, at 70mA output current, by varying the duty cycle from 30% to 70%, the output voltage changes from 0.27V to 0.7V. The corresponding effective efficiency ranges from 286% down to 135%. For the reference circuit (the clock driver alone), simulations determined its power consumption, $P_{in2}$, was 41mW.
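
To make the relationship between these figures concrete, the sketch below back-solves Equation (3.1) for the incremental and total input power implied by the quoted output voltages and effective efficiencies at 70mA. The resulting powers are derived here from the quoted numbers, not read from the simulation plots, and the function name is hypothetical.

```python
P_IN2_W = 0.041  # W, reference clock driver power from simulation (41 mW)


def implied_powers(v_out, i_out, eta_eff, p_in2=P_IN2_W):
    """Back-solve Equation (3.1): return (P_out, P_in1 - P_in2, P_in1) in watts."""
    p_out = v_out * i_out            # converter output power
    incremental = p_out / eta_eff    # incremental input power P_in1 - P_in2
    return p_out, incremental, p_in2 + incremental


# (duty cycle, V_out in volts, effective efficiency) at 70 mA, as quoted in the text
for duty, v_out, eta_eff in ((0.30, 0.27, 2.86), (0.70, 0.70, 1.35)):
    p_out, inc, p_in1 = implied_powers(v_out, 0.070, eta_eff)
    print(f"D={duty:.0%}: P_out={p_out * 1e3:.1f} mW, "
          f"P_in1-P_in2={inc * 1e3:.1f} mW, implied P_in1={p_in1 * 1e3:.1f} mW")
```

The implied incremental power grows from about 6.6mW at a 30% duty cycle to about 36.3mW at 70%, which is why the effective efficiency falls even though the output power rises.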
Figure 3.7. Simulated waveforms for the integrated clock driver/buck converter
Figure 3.8. Simulated output voltage and input power of the integrated buck converter
Figure 3.9. Simulated raw and effective efficiencies of the integrated buck converter

### 3.1.5 Chip Implementation

As models and simulation results of large passive on-chip components are inaccurate at very high frequencies and current densities, the integrated clock driver/buck converter is fabricated to assess the difficulties of implementing power regulation in deep-submicron technologies.

The block diagram and micrograph of the clock driver/buck converter chip are shown in Figure 3.10. The area of the integrated converter including $L_F$ and $C_F$ is $0.27\text{mm}^2$. The inductor alone is $0.1\text{mm}^2$. The total die area is $1\text{mm}^2$ to allow for probe station testing.

In order to avoid potential hot spots on the chip, especially at high load currents, some layout decisions were made to transfer heat out of the chip as quickly as possible. Higher metal layers such as $M6$ and $M7$ are better for transferring heat as they are the thickest. Power and ground grids are connected to high-power transistors through a large number of vias. These vias transfer the heat from the transistors located on the substrate to the surface of the chip and then to the probe pins, which serve as heat sinks.

In order to satisfy the specified maximum current densities and to avoid electromigration, paths that would normally carry high currents are widened. This also helps in reducing resistive voltage drops across the circuit. To satisfy DRC rules for maximum width and density of metal layers in 90nm CMOS process, wide paths such as those used in the inductor layout are slotted.

Large transistors inject high currents into the substrate through the large drain junction capacitances and by forward-biasing the source-bulk junction diodes. In order to prevent latch-up caused by those high currents, the layout of the circuit incorporates substrate contacts with sufficiently small spacing to minimize the resistance [36].

A few provisions in the chip layout are needed for testing purposes. To match the chip input impedance with the signal generator output impedance, a $50\Omega$ termination resistor is added on chip. The probes available in the lab provide a limited number of connections that can be made simultaneously. Also, since it is very difficult to monitor 3GHz waveforms on the chip without being invasive, these types of measurements were not attempted.

(a) Chip block diagram

(b) Chip micrograph

Figure 3.10. Implementation of the integrated clock driver/buck converter

### 3.1.6 Chip Measurements

The test bench for the integrated clock driver/buck converter was set up as shown in Figure 3.11. For precise power measurement, all the parasitic resistances in the test setup were accounted for through measurement and calibration. As a result, a supply voltage of 1.0V was applied at the chip probe pads. An external signal generator provides the clock signal to the chip under test.

![Block diagram of the test bench setup](image)

**Figure 3.11. Block diagram of the test bench setup**

#### Investigating the Output Voltage

The converter output voltage vs. the output current is plotted in Figure 3.12. In each graph, the duty cycle is kept constant and the switching frequency and load are changed to produce different curves. As expected, the output voltage does not change much with frequency. However, at 3.5GHz, Figure 3.12(a) suggests that the chip may not be working properly since the output voltage is significantly higher than the other data points.

Figure 3.13 can be derived from Figure 3.12 by keeping the frequency constant while the duty cycle is changed. It shows that at higher duty ratios, the output voltage is higher as expected.
Figure 3.12. The effect of $F_{sw}$ on $V_{out}$
Figure 3.13. The effect of $D$ on $V_{out}$

(a) With $F_{sw} = 2\text{GHz}$

(b) With $F_{sw} = 2.5\text{GHz}$

#### Investigating the Input Power

The input power to the integrated clock driver/buck converter, $P_{in1}$, is plotted in Figure 3.14 along with the input power to the reference clock circuit, $P_{in2}$. Figure 3.14 shows that as the frequency is increased, the input power to both circuits increases due to more switching activity. Because of a test anomaly, there is not much difference between data points at 2GHz and 2.5GHz. Similar to the previous conclusion, Figure 3.14(a) suggests that the chip may not be working properly at 3.5GHz, as $P_{in1}$ is lower than at the other data points.

By keeping the frequency constant while the duty cycle is changed, Figure 3.15 can be derived. Higher duty cycles increase $P_{in1}$ because the clock duty cycle also sets the conversion duty cycle of the power converter. A change in duty cycle is not expected to affect $P_{in2}$, since it does not change the switching activity. Although the figure is consistent with this expectation, the conclusion cannot be drawn from the data because of the test anomaly described earlier.
Figure 3.14. The effect of $F_{sw}$ on $P_{in1}$ and $P_{in2}$
Figure 3.15. The effect of $D$ on $P_{\text{in}1}$ and $P_{\text{in}2}$

(a) With $F_{\text{sw}} = 2\text{GHz}$

(b) With $F_{\text{sw}} = 2.5\text{GHz}$

#### Investigating Raw and Effective Efficiencies

Raw efficiency of the integrated converter is defined by $\eta = \frac{P_{\text{out}}}{P_{\text{in}1}}$ and is plotted in Figure 3.16 and Figure 3.17. These figures show that the raw efficiency does not change much at different duty ratios and different frequencies. Again, the chip may not be working properly at 3.5GHz. The key metric for measuring the performance of the integrated clock driver/power converters is the effective efficiency. Since the overall input power operates two separate functions, the amount of power needed to operate the reference stand-alone clock network is not included as input power to the converter when calculating effective efficiency. Instead, only the incremental amount of power is counted as input power. When some additional energy is recycled from the clock, it is possible for the output power to exceed the incremental input power. Since energy cannot be spontaneously created, an effective efficiency greater than 100% is proof that energy recycling is taking place. Effective efficiency also represents the efficiency required of a stand-alone power converter to compete with an energy recycling architecture.

The effective efficiency, defined by $\eta_{\text{eff}} = \frac{P_{\text{out}}}{P_{\text{in}1} - P_{\text{in}2}}$, is very sensitive to the value of $P_{\text{in}1} - P_{\text{in}2}$. This problem is especially pronounced at lower output currents, where $P_{\text{in}1}$ is close to $P_{\text{in}2}$. If there is a slight inaccuracy in the measured values of $P_{\text{in}1}$ and $P_{\text{in}2}$, the corresponding effective efficiency value can change dramatically.
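
The sensitivity can be illustrated with a short worst-case calculation. The sketch below assumes each of $P_{\text{in}1}$ and $P_{\text{in}2}$ may be off by up to a fixed error and reports the resulting range of $\eta_{\text{eff}}$. All numerical values, including the 1mW error, are hypothetical and are not measured data from this chip.

```python
def effective_efficiency(p_out, p_in1, p_in2):
    """Effective efficiency: P_out / (P_in1 - P_in2)."""
    return p_out / (p_in1 - p_in2)


def efficiency_bounds(p_out, p_in1, p_in2, err):
    """Worst-case (low, high) effective efficiency when P_in1 and P_in2
    are each uncertain by +/- err."""
    if (p_in1 - err) <= (p_in2 + err):
        raise ValueError("error band is too wide: the denominator may reach zero")
    low = effective_efficiency(p_out, p_in1 + err, p_in2 - err)
    high = effective_efficiency(p_out, p_in1 - err, p_in2 + err)
    return low, high


ERR = 1.0  # mW, assumed measurement uncertainty on each input power
# (label, P_out, P_in1, P_in2) in mW; hypothetical operating points
for label, p_out, p_in1, p_in2 in (("low current", 10.0, 46.0, 41.0),
                                   ("high current", 50.0, 80.0, 41.0)):
    nominal = effective_efficiency(p_out, p_in1, p_in2)
    low, high = efficiency_bounds(p_out, p_in1, p_in2, ERR)
    print(f"{label}: nominal {nominal:.0%}, range {low:.0%} to {high:.0%}")
```

For the assumed low-current point, a nominal 200% spans roughly 143% to 333%, whereas the assumed high-current point moves only between about 122% and 135% around a nominal 128%.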

As can be seen in Figure 3.18 and Figure 3.19, effective efficiency is increased at lower output currents. Since the available energy in $C_{\text{clk}}$ is constant with respect to output current, at low output currents a greater proportion of the output energy comes from recycling. In addition, a higher $F_{\text{sw}}$ results in more energy being stored in the capacitor per second. Hence, $\eta_{\text{eff}}$ benefits from increasing the frequency and lowering the output current. Achieving an effective efficiency above 100% is definitive proof that energy is being recovered from the clock.
Figure 3.16. The effect of $F_{sw}$ on $\eta$
Figure 3.17. The effect of $D$ on $\eta$
Figure 3.18. The effect of $F_{sw}$ on $\eta_{eff}$
Figure 3.19. The effect of $D$ on $\eta_{eff}$

### 3.1.7 Summary

The integrated clock driver/power converter designs presented here are capable of recovering energy from the clock and supplying it to the converter. The results show that the use of on-chip passives with power switching by CMOS inverters in ZVS mode allows for good efficiency [32] [33]. By converting unused potential energy into a useful regulated supply, the designer can power other parts of a circuit instead of wasting energy by simply dissipating unwanted charge to ground. Many applications can benefit from this new design technique. Optimization of the designs will require further investigation into the simulation tools, particularly their use in designing on-chip passives.

Table 3.1 provides a summary of performance comparison between this work and two other previously published buck converters. The output voltage ripple given is part of the design specification. Note the high levels of efficiency relative to the other designs.

Table 3.1. Summary of comparison between integrated buck converters

<table>
<thead>
<tr>
<th>Converter type</th>
<th colspan="2">Previous Work</th>
<th>This Work</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>4-Phase Buck [14]</td>
<td>2-Phase Buck [15]</td>
<td></td>
</tr>
<tr>
<td>Technology</td>
<td>90nm CMOS</td>
<td>0.18µm SiGe RF BiCMOS</td>
<td></td>
</tr>
<tr>
<td>Layout Area (mm²)</td>
<td>0.14 * (excludes L)</td>
<td>27</td>
<td></td>
</tr>
<tr>
<td>Switching Frequency, $F_{sw}$ (MHz)</td>
<td>480</td>
<td>45</td>
<td></td>
</tr>
<tr>
<td>Inductor, $L_F$ (pH)</td>
<td>3 600 (per phase)</td>
<td>11 000 (per phase)</td>
<td></td>
</tr>
<tr>
<td>Capacitor, $C_F$ (pF)</td>
<td>2 500</td>
<td>6 000</td>
<td></td>
</tr>
<tr>
<td>Supply Voltage, $V_{in}$ (V)</td>
<td>1.8</td>
<td>2.8</td>
<td></td>
</tr>
<tr>
<td>Output Voltage, $V_{out}$ (V)</td>
<td>0.9</td>
<td>1.5 ~ 2</td>
<td></td>
</tr>
<tr>
<td>Output Voltage Ripple</td>
<td>&lt; 5%</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output Current, $I_{out}$ (mA)</td>
<td>500</td>
<td>200</td>
<td></td>
</tr>
<tr>
<td>Effective Efficiency, $\eta_{eff}$ (%)</td>
<td>72</td>
<td>65</td>
<td></td>
</tr>
</tbody>
</table>

* Layout area was reported in [13].

Among the previously published on-chip DC-DC converters, [14] has the highest reported switching frequency, 480MHz, but it still uses on-package inductors. In contrast, [15] implemented a fully on-chip buck converter in 0.18µm SiGe RF BiCMOS technology that was 65% efficient. It also used an area of 27mm\(^2\) to fit the large passive components. The buck converter in this work achieves a much higher effective efficiency using only 1/100\(^{th}\) of the area.

### 3.2 Low-Swing Buck Converter

#### 3.2.1 Introduction

A high switching frequency is the key design parameter that enables the full integration of the active and passive devices of a switching converter. At these high frequencies, the energy dissipated in the power MOSFETs and gate drivers is a significant part of the total losses of a DC-DC converter. Although the integrated clock driver/converter circuit presented in Section 3.1 recycles the energy stored in the main clock capacitor, it does not attempt to save energy used in the “front-end” driver chain. In this section, the energy-conscious techniques of reduce, reuse and recycle are applied to the front-end driver chain.

In this design two separate chains of inverters are used to drive each of the power transistors in a buck converter circuit. A switching frequency near 1GHz\(^1\) results in a reduction in the filter inductor and capacitor area which allows full integration of these power supplies. To compensate for the switching power loss under high-frequency operation, low-swing drivers and supply stacking techniques are used together with charge recycling of the PMOS drive chain to improve conversion efficiency [37].

---

\(^1\) This design was implemented in older 0.18µm CMOS technology for reasons of cost and fabrication schedule. All other implementations in this thesis were designed in newer 90nm CMOS technology.

### 3.2.2 Circuit Design

The circuit diagram of a CMOS-based buck converter is shown in Figure 3.20(b). $C_x$ includes all the parasitic capacitances at node $V_{inv}$ including $M_p$ and $M_n$ drain to ground capacitances. When both $M_p$ and $M_n$ are off, a positive inductor current will remove charge from $C_x$, reducing $V_{inv}$, while a negative inductor current will charge $C_x$, increasing $V_{inv}$. When $V_{inv} = 0$, the $M_n$ transistor is turned on, while when $V_{inv} = V_{DD}$, the $M_p$ transistor is turned on. In this way, ZVS operation is achieved for both $M_n$ and $M_p$ transistors by independently driving their gates. In Figure 3.20(c), the two time periods when both transistors are off are characterized as $T_{delay1}$ and $T_{delay2}$, corresponding to the delay-time needed to implement ZVS operation for the $M_n$ and $M_p$ transistors, respectively. There are four intervals of operation:

- **Interval 1** (time 0 to $D \times T_{sw}$). $M_p$ is on. During this time, the inductor current increases linearly since the voltage across it is constant. At the end of this interval, $M_p$ is turned off in accordance with the required converter output voltage (the duty cycle).

- **Interval 2** (time $D \times T_{sw}$ to $D \times T_{sw} + T_{delay1}$). Both $M_p$ and $M_n$ are off. The charge that is stored in the parasitic capacitance $C_x$ is moved to the output circuit through the inductor, as the inductor current cannot be disrupted abruptly. This results in a rapid drop of $V_{inv}$. In this short period of time, the inductor current can be assumed to be constant, as shown.

- **Interval 3** (time $D \times T_{sw} + T_{delay1}$ to $T_{sw} - T_{delay2}$) starts when the voltage across $M_n$ is close to zero. At this time, $M_n$ is turned on under ZVS to provide a low-resistance path for the inductor current. As there is no energy supplied to the system and the voltage across the inductor is constant, the inductor current decreases linearly and by design reaches some negative value. At this point, $M_n$ is turned off.

- **Interval 4** (time $T_{sw} - T_{delay2}$ to $T_{sw}$). Both $M_p$ and $M_n$ are off. Parasitic capacitance $C_x$ is charged, as the inductor current cannot be disrupted abruptly. This results in a rapid increase of $V_{inv}$. At the end of this interval, $V_{inv}$ is close to $V_{DD}$ and $M_p$ is ready to be turned on under ZVS.

(a) A typical buck converter

(b) Simplified circuit diagram of the low-swing buck converter

(c) Idealized timing diagram

Figure 3.20. Low-swing buck converter

### 3.2.3 Complete Circuit

In this design, two separate inverter chains are used to drive each of the power transistors of the buck converter circuit as shown in Figure 3.21. The tapered inverter chains are voltage-stacked to use the same $V_{DD}$ supply, similar to [27]. As a result, the inverter chains each have a lower supply voltage, resulting in low-swing operation to save gate and driver power.

The size of transistor $M_p$ is set to be three times the size of transistor $M_n$ for symmetrical behavior. The chain to drive $M_p$ is similarly three times larger than the bottom chain, which is optimized to drive $M_n$. Since the PMOS chain is larger, charge accumulates in the middle capacitor $C_m$, which should operate near $V_{DD}/2$. In [27], the excess charge is dissipated to $V_{ss}$ through an additional regulator forcing node $V_m$ to $V_{DD}/2$. Here, the extra charge is delivered to the converter output circuit to increase efficiency. This task is performed by two series diode-connected NMOS transistors, $D_1$ and $D_2$. These diodes automatically deliver charge to the load when $V_{inv} < (V_m - 2V_t)$ without a need for additional gating signals. Two diodes in series are needed to act as a voltage regulator for $V_m$ when $M_n$ is ON and $V_{inv}$ is low. The goal is to keep $V_m$ near $V_{DD}/2$. Hence, accumulated charge at $C_m$ is removed through the diodes by inductor $L_f$ instead of an external regulator. The voltage divider $R_1$ and $R_2$ puts $V_m$ near $V_{DD}/2$ at startup and does not significantly contribute to operational power.

Charge recycling occurs during intervals 2 and 4, when both \( M_p \) and \( M_n \) are off and \( V_{\text{inv}} \) is in transition. In particular, when \( V_{\text{inv}} \) is rising, there is significant charge stored on the gate of \( M_p \) that is discharged through the upper driver to the \( C_m \) node at the same time that current is drawn from this node into \( C_x \). When \( V_{\text{inv}} \) is falling, any additional surplus charge from the PMOS drivers can also be delivered to \( C_x \).

In this design, the reduce, reuse and recycle design technique has been employed as follows [38]:

- **Reduce:** The wide NMOS and PMOS output transistors have large input gate capacitance, requiring them to be driven by a chain of tapered inverters referred to here as the front-end drive chain. Separate drive chains are required to allow precise control of the NMOS and PMOS turn-on and turn-off times to achieve ZVS. Despite ZVS, which reduces energy waste in the final NMOS/PMOS pair, significant losses are associated with operating the two drive chains and the gates of the output transistors at high switching frequencies. To reduce the energy lost at every transition, each drive chain employs low-swing signaling by swinging only half-rail, between 0 and \( V_{\text{DD}}/2 \) or between \( V_{\text{DD}}/2 \) and \( V_{\text{DD}} \) for NMOS and PMOS, respectively. This saves a significant amount of energy compared to full-rail switching. However, the outputs of the low-swing drive chains must turn on their respective NMOS and PMOS output transistors, so it is essential that \( V_{DD}/2 > V_{t,NMOS} \) and \( V_{DD}/2 > |V_{t,PMOS}| \). To increase overdrive, it is recommended that low-\( V_t \) devices be used for the NMOS and PMOS output transistors as well as the rest of the drive chain.

- **Reuse:** A half-rail swing for both drive chains offers a further advantage: the NMOS and PMOS chains can share the common reference voltage of \( V_{DD}/2 \). This allows energy reuse in the form of voltage supply stacking as shown in Figure 3.21. Charge used by the upper PMOS drive chain still has unused potential, so it can be reused by the lower NMOS drive chain. A more general case of supply stacking is called charge recycling in [28].

- **Recycle:** The PMOS output transistor $M_p$ in Figure 3.21 is three times wider than NMOS output transistor $M_n$. As a result, the drive chain of the PMOS (top inverter chain) is much larger and requires much more charge to operate than the drive chain of the NMOS (bottom inverter chain). Charge accumulates at node $V_m$, where it is stored in the middle capacitor $C_m$. The excess charge is *recycled* by delivering it to the converter output load through the two series diode-connected NMOS transistors, $D_1$ and $D_2$.

In this design, weak negative feedback helps keep $V_m$ near a stable operating point of $V_{DD}/2$. If $V_m$ increases, the bottom chain receives a higher supply voltage, which increases its power intake and causes $V_m$ to drop. At the same time, $M_n$ turns on with a higher $V_{gs}$ and $V_{inv}$ is pulled closer to $V_{SS}$, giving $D_1$ and $D_2$ higher $V_{gs}$, facilitating charge removal from $C_m$. Similarly, if $V_m$ decreases, the top chain receives a higher supply voltage, which results in increasing its power intake and causing $V_m$ to increase. Also, a lower $V_m$ causes $D_1$ and $D_2$ to receive lower $V_{gs}$, facilitating accumulation of charge in $C_m$.

Capacitance $C_m$ was chosen to be 20 times larger than the NMOS $C_{gate}$ to limit ripple at $V_m$. $L_F$ and $C_F$ values were chosen to be 4.38nH and 1.1nF, respectively, to operate at a switching frequency of 660MHz with a voltage ripple of less than 5% at 50mA load.
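
As a rough plausibility check on these filter values, the sketch below computes the LC corner frequency and the textbook peak-to-peak ripple of an ideal buck converter in continuous conduction, $\Delta V = V_{in} D (1 - D) / (8 L_F C_F F_{sw}^2)$. This ideal formula is a standard first-order estimate and is not claimed to be the procedure used to size the components here. It ignores parasitics, the low-swing drivers, and the diode path, and the 2.2V input and 50% duty cycle are assumptions.

```python
import math


def lc_corner_frequency(l_f, c_f):
    """Corner frequency of the LC output filter in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_f * c_f))


def ideal_ccm_ripple(v_in, duty, l_f, c_f, f_sw):
    """Textbook peak-to-peak output ripple of an ideal CCM buck converter, in volts."""
    return v_in * duty * (1.0 - duty) / (8.0 * l_f * c_f * f_sw ** 2)


L_F = 4.38e-9   # H, from the text
C_F = 1.1e-9    # F, from the text
F_SW = 660e6    # Hz, from the text
V_IN = 2.2      # V, assumed input voltage
DUTY = 0.5      # assumed duty cycle

f_c = lc_corner_frequency(L_F, C_F)
ripple = ideal_ccm_ripple(V_IN, DUTY, L_F, C_F, F_SW)
print(f"corner frequency = {f_c / 1e6:.1f} MHz")
print(f"ideal ripple     = {ripple * 1e3:.1f} mV "
      f"({ripple / (DUTY * V_IN):.1%} of the ideal output voltage)")
```

With these assumptions the corner frequency is about 72.5MHz, roughly a decade below the switching frequency, and the ideal ripple is about 33mV.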

### 3.2.4 Simulation

Three variants of the circuit were simulated: (i) baseline converter using full-swing drivers; (ii) low-swing/stacked drive chain is added to reduce and reuse energy; and (iii) diodes and $C_m$ are added to recycle energy, similar to the prototype. Here, changes to the original baseline converter are done in two stages to be able to study the effect of each modification. Using low-$V_t$ transistors would have facilitated the operation of the supply-stacked low-swing transistors. Due to the lack of low-$V_t$ transistors in the available 0.18µm CMOS kit, simulations of these designs are done at 2.2V instead of the typical 1.8V for this technology.

Simulation results for a fixed load current of 50mA are shown in Figure 3.22. As expected, the circuit with all the options has the highest efficiency. Indeed, the efficiencies show improvement with each additional change. For example, at a 40% duty cycle, the efficiencies of the circuits are (i) baseline 22%, (ii) low-swing 30%, and (iii) energy recycling diodes 35%. Thus the efficiency improves from 22% to 35% with the reduce, reuse and recycle methodology. Figure 3.22(a) also shows that while circuits (ii) and (iii) are more efficient than (i), they have lower $V_{out}$ at the same duty cycle.
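
To put the three efficiencies in perspective, the sketch below computes the relative improvement over the baseline and the input power each variant would need for a common output power. The common 50mW output is a hypothetical simplification, since the variants produce different $V_{out}$ at the same duty cycle, and the function names are illustrative.

```python
# Simulated efficiencies at a 40% duty cycle, as quoted in the text
EFFICIENCY = {"(i) baseline": 0.22, "(ii) low-swing": 0.30, "(iii) recycling": 0.35}


def relative_gain(new, old):
    """Fractional improvement of `new` over `old`."""
    return (new - old) / old


def input_power(p_out, efficiency):
    """Input power needed to deliver p_out at the given efficiency."""
    return p_out / efficiency


P_OUT_MW = 50.0  # mW, hypothetical common output power for all variants
baseline = EFFICIENCY["(i) baseline"]
for name, eta in EFFICIENCY.items():
    print(f"{name}: +{relative_gain(eta, baseline):.0%} vs. baseline, "
          f"P_in = {input_power(P_OUT_MW, eta):.0f} mW for {P_OUT_MW:.0f} mW out")
```

Moving from 22% to 35% is a relative improvement of about 59%, which under the equal-output assumption corresponds to the input power dropping from roughly 227mW to 143mW.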
Figure 3.22. Simulation results for each variant of the circuit

(a) Output voltage vs. duty cycle

(b) Efficiency vs. duty cycle

### 3.2.5 Chip Implementation

The chip was fabricated in 0.18µm CMOS. Node $V_{m}$, the middle voltage that should remain at $V_{DD}/2$ for supply stacking, is made available off-chip to be externally probed or adjusted if necessary. Input resistors $R_3$ and $R_4$ in Figure 3.21 are 50Ω terminators so $V_{pmos-in}$ and $V_{nmos-in}$ can be driven by external signal generators at the high frequency of 660MHz.

To keep things simple due to fabrication deadlines, this design does not automatically delay signals to achieve ZVS. Instead, the implementation relies upon the test equipment to generate input signals $V_{pmos-in}$ and $V_{nmos-in}$ with the appropriate timing. Although it is difficult to employ ZVS at a high frequency it has been successfully implemented in the other designs of this thesis.

The NMOS transistors in the top inverter chain for $M_p$ need to have zero body voltage with respect to their sources, so they are isolated from the p-substrate using n-well and deep n-well implantation as described in [39] and shown in Figure 3.23. The same procedure is used for $D_1$ and $D_2$, where the body should be connected to the drain to reverse bias the intrinsic body diode.

![Figure 3.23. Deep n-well implementation cross sectional view](image)

The chip micrograph is shown in Figure 3.24. The chip is laid out for on-chip probing. Here, the inductor $L_F$ design is two turns of simple concentric coils implemented in the top four metal layers of the chip. The tracks include shorts along their length to reduce series resistance. The patterned ground shield (PGS) is implemented using the lowest of the six available metal layers. The current density is 0.122mA/µm². The value of inductance was extracted using ASITIC [40]. The extracted inductance was found to be 4.38nH at 660MHz, with lumped pi model capacitances of 6.5pF and a quality factor of 10 at a resonant frequency around 1GHz. A DC series resistance of 0.7Ω was also extracted. The integrated capacitor $C_F$ is implemented using the gate capacitance of an array of NMOS transistors. Of the 3.4mm² total die area, the converter uses 2.5mm². Even at 660MHz, the inductor dominates the area, occupying 1.8mm². Designed for an output current of 50mA at 1V, the power converter achieves a power-to-area ratio of 50/2.5=20mW/mm².
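
The area figures above can be cross-checked with a few lines of Python. The sketch reproduces the quoted power-to-area ratio and also derives the share of the converter area taken by the inductor, which is computed here from the quoted areas and is not stated in the text. The function name is hypothetical.

```python
def power_density_mw_per_mm2(i_out_ma, v_out_v, area_mm2):
    """Output power (mA * V = mW) divided by layout area in mm^2."""
    return i_out_ma * v_out_v / area_mm2


DIE_AREA = 3.4        # mm^2, total die area from the text
CONVERTER_AREA = 2.5  # mm^2, converter area from the text
INDUCTOR_AREA = 1.8   # mm^2, inductor area from the text

density = power_density_mw_per_mm2(50.0, 1.0, CONVERTER_AREA)
print(f"power-to-area ratio         = {density:.0f} mW/mm^2")
print(f"inductor share of converter = {INDUCTOR_AREA / CONVERTER_AREA:.0%}")
print(f"converter share of die      = {CONVERTER_AREA / DIE_AREA:.0%}")
```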

![Chip micrograph](image)

Figure 3.24. Chip micrograph.

There are a few limitations with the implemented prototype. First, $M_p$ and $M_n$ and the drive chains should all be implemented with low-$V_t$ transistors. Using them would help the drivers fully turn on with the low-voltage supply, thereby reducing power consumption in the drive chains and improving power delivery to the output load. However, these were not available in the CMOS process that was used. Instead, regular transistors were used, resulting in degraded efficiency in both simulation results and the manufactured prototype. Using an ad hoc method of simulating low-$V_t$ transistors, conversion efficiency at a 40% duty cycle is improved to 46% (up from 35%).

Second, power is lost due to the voltage drop across diodes $D_1$ and $D_2$. The diodes were used to keep it simple for proof-of-concept, but a more complex circuit could be devised. Nonetheless, it is clear from the simulations that the concept is working and a significant improvement in efficiency is gained by the use of the driver energy recycling. Although there is a drop in $V_{out}$ after switching to low-swing drivers, Figure 3.22(a) clearly shows that the addition of the energy recycling diodes is able to improve energy conversion to the point where the $V_{out}$ is nearly restored to the same level obtained with the original full-swing drivers. The restoration in the voltage conversion ratio (Figure 3.22(a)) also implies that the rising edge of $V_{inv}$ is sped up. Speeding it up by means of an increased reverse inductor current would be detrimental to the conversion efficiency because of discharging $C_F$ and it would increase the losses with a higher ripple current.

Third, the ZVS timing delays were controlled by the signal generator, but a proper circuit needs to be added to control these delays itself. This was not implemented to keep the design simple.

### 3.2.6 Chip Measurements

The chip was tested at 2.2V, the same supply voltage used in the simulations. Conversion efficiency and output voltage measurements are presented in Figure 3.25. The physical measurements required an external supply of 1.1V connected to $V_m$, because $V_m$ otherwise settled higher than the expected voltage of $V_{DD}/2$. However, measurements show that this supply was not delivering any power to the circuit, as it was always sinking current to reduce $V_m$. The output is adjustable from 0.75V to 1V by varying the duty cycle $D$ from 45% to 64% with a fixed $R_{\text{load}} = 18.3\,\Omega$. Conversion efficiency, $P_{\text{out}}/P_{\text{in}}$, ranges from 25% to 31%.
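Because the load resistance is fixed, the measured output voltage range directly determines the output current and output power. The short sketch below works this out for the two endpoints quoted above; the helper name is hypothetical and the calculation assumes a purely resistive load.

```python
R_LOAD = 18.3  # ohm, fixed load resistance from the measurement


def load_operating_point(v_out: float, r_load: float = R_LOAD):
    """Return (I_out in A, P_out in W) for a resistive load."""
    i_out = v_out / r_load
    p_out = v_out * i_out
    return i_out, p_out


if __name__ == "__main__":
    for v_out in (0.75, 1.0):  # measured output voltage range
        i_out, p_out = load_operating_point(v_out)
        print(f"V_out = {v_out:.2f} V: I_out = {i_out * 1e3:.1f} mA, "
              f"P_out = {p_out * 1e3:.1f} mW")
```

The upper endpoint corresponds to roughly 55mW, close to the 50mA at 1V design target.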

The need for the external supply to sink current indicates that the simulation of the gate driver inverters is not as accurate as required when standard transistors are used in the supply-stacked configuration.

The efficiency of the prototype could be improved in a few ways. Using low-$V_t$ transistors would help the drivers fully turn on with the low-swing voltage supply, thereby reducing power consumption in the drive chains. Power is also lost due to the voltage drop across diodes $D_1$ and $D_2$. The diodes keep the design simple, but a more complex circuit could be devised. For example, [41] mimics the behavior of a diode using a transistor whose gate is driven by a voltage comparator sensing $V_{DS}$. However, the gating circuitry used here must operate much more quickly, on the order of tens of picoseconds [41].

(a) Output voltage vs. duty cycle

(b) Raw efficiency vs. duty cycle

Figure 3.25. Measured prototype performance

### 3.2.7 Summary

The low-swing buck converter design presented here demonstrates the operation of a 660MHz converter implemented in a 0.18μm process, including on-chip passives. The measured efficiency obtained is promising for such a prototype and for such a high switching frequency [37]. However, the important result is that energy recycling is shown to be a feasible way to reduce energy loss in the front-end drive chain and to boost overall conversion efficiency.

The chip area consumed by the converter is dominated by the inductor even at 660MHz. However, the inductor was designed for a current of 50mA, which represents a power-to-area ratio of 50mW/2.5mm$^2$ = 20mW/mm$^2$. By combining the techniques in this chip with clock energy recycling introduced in the integrated clock driver/power converter circuits, it should be possible to boost the raw efficiency above 50%.

4 INTEGRATED BOOST AND BUCK-BOOST CONVERTERS

In this chapter, two more integrated clock driver/power converter designs are discussed that operate at 3GHz. First, a boost converter configuration is used to provide higher output voltage levels than buck converters. Second, a buck-boost converter is used to generate a negative supply voltage, which may be useful for analog circuits.

Similar to the previous designs introduced here, high-speed switching losses are reduced by employing zero voltage switching and by directly integrating the clock-tree drivers with the converter power-transistor drivers. Also, the designs are implemented in open-loop, with the goal of having less than 5% ripple on $V_{out}$. The techniques proposed in these designs are valid for finer feature size CMOS technologies as well.

4.1 Integrated Clock Driver/Boost Converter

### 4.1.1 Introduction

Compared to discrete designs, on-chip converters have relatively higher static power losses. Also, clocks always require a minimum low time. As a result, a buck converter cannot practically provide an output voltage that is close to $V_{DD}$. To remedy this, a boost converter configuration is investigated here that provides higher output voltage levels [42].

### 4.1.2 Circuit Design

In the typical boost converter of Figure 4.1(a), when the switch is on, voltage \( V_{in} \) will be across the inductor \( L_F \) and current will build up in the inductor. In the next phase, when the switch is off, inductor current finds its way through the diode and charges the output capacitor to \( V_{out} = V_{Lf} + V_{in} \). The diode plays an important role as it will automatically turn off to prevent shorting \( V_{out} \) to ground when the switch turns on. One challenge comes from the fact that a low-loss power diode is not available in CMOS technology.\(^2\)

The integrated clock driver/boost converter circuit shown in Figure 4.1(b) uses a switched-capacitor voltage-shifter circuit to generate a shifted gating signal for the PMOS transistor used in place of a power diode. Similar to Figure 3.2(b), a chain of inverters is used to drive \( C_{clk} \) and ZVS needs to be employed to recover the energy stored in the capacitor. In addition to providing output voltage levels higher than \( V_{DD} \), the circuit also produces a buffered version of the clock, \( V_{clk\_scaled} \), at the same magnitude as \( V_{out} \). This clock signal can be used in the circuitry powered by the converter, but allowances for clock skew and level-conversion will need to be made in the data path logic.

Ignoring turn on/off times of the transistors, there are two intervals of operation as shown in Figure 4.1(c):

- Interval 1: At the beginning of this interval, \( V_{clk} \) goes high and \( M_n \) turns on. Consequently, voltage \( V_{DD} \) will be across the inductor \( L_F \) and the inductor current increases linearly (assuming a constant voltage across the inductor). At the same time, the voltage of capacitor \( C_{shift} \) will be added to \( V_{clk} \) so that \( V_{shift} \) reaches voltage \( V_{max} \), a higher voltage than \( V_{out} \). The diodes \( D_{shift} \) are reverse biased. As \( C_{shift} \) is pre-charged to \( V_{out} - 2V_{diode\_drop} \) in the previous interval 2, \( V_{gs} \) of \( M_p \) would be equal to \( V_{max} - V_{out} = (V_{DD} + (V_{out} - 2V_{diode\_drop})) - V_{out} = V_{DD} - 2V_{diode\_drop} \), which has a positive value, so \( M_p \) turns off completely.

- Interval 2: As a new \(V_{\text{clk}}\) half-cycle starts, \(M_p\) turns on and \(M_n\) turns off. Capacitor \(C_{\text{shift}}\) will be charged through diodes \(D_{\text{shift}}\) to a value of \(V_{\text{out}} - 2V_{\text{diode\_drop}}\). As the diodes are forward biased, \(V_{gs}\) of \(M_p\) becomes equal to \(-2V_{\text{diode\_drop}}\), a negative value larger in magnitude than the threshold voltage of \(M_p\), turning it on completely. At this time, inductor current finds its way through \(M_p\) and will charge up the output capacitor \(C_F\).

---

\(^2\) Diodes consisting of a simple p-n junction can be built in CMOS, but the associated voltage drop in modern CMOS is large relative to \( V_{in}, V_{out} \) and \( V_{DD} \).

(a) A typical boost converter

(b) Simplified circuit diagram of the integrated clock driver/boost converter

(c) Idealized timing diagram

Figure 4.1. Integrated clock driver/boost converter

In the above discussion, the average voltage of $V_{\text{clk}}$ is $mean(V_{\text{clk}}) = D \times V_{DD}$, where $D$ is the duty cycle. This is the operating voltage available to the boost converter (and not $V_{DD}$). Ideally, the output voltage would be $V_{\text{out}} = \frac{1}{1-D} \times mean(V_{\text{clk}}) = \frac{D}{1-D} \times V_{DD}$. With $D > 50\%$, the output voltage will be higher than $V_{DD}$. Voltages higher than $V_{DD}$ could be used for 1) high-voltage I/O circuits, 2) gating signals of NMOS pass transistors such as those used in sampling circuits, 3) providing PMOS transistors with body bias voltages higher than $V_{DD}$, which can be used to dynamically change the threshold voltage to achieve speed and power scaling, and 4) speeding up the operation of some parts of the circuit by increasing their supply voltage.
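The ideal conversion ratio above is easy to tabulate. The following sketch evaluates $V_{out}/V_{DD} = D/(1-D)$ for a few duty cycles; it assumes a lossless converter, and the function name is hypothetical.

```python
def ideal_boost_ratio(duty: float) -> float:
    """Ideal V_out / V_DD for the integrated clock driver/boost converter.

    The converter input is mean(V_clk) = D * V_DD and the ideal boost
    gain is 1 / (1 - D), so the overall ratio is D / (1 - D).
    """
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must satisfy 0 <= D < 1")
    return duty / (1.0 - duty)


if __name__ == "__main__":
    for duty in (0.4, 0.5, 0.6, 0.8):
        print(f"D = {duty:.0%}: V_out / V_DD = {ideal_boost_ratio(duty):.2f}")
```

The ratio crosses unity exactly at $D = 50\%$, which is why duty cycles above 50% are needed to obtain an output above $V_{DD}$ in the ideal case.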

### 4.1.3 Complete Circuit

The complete circuit diagram of the integrated clock driver/boost converter circuit is shown in Figure 4.2. $M_{n2}$ and $M_{p2}$ introduce the turn-on delay for $M_{n1}$ as in the integrated clock driver/buck converter from Chapter 3. The drain node of $M_{n3}$ and $M_{p3}$, denoted as $V_{\text{clk\_scaled}}$, swings from zero to $V_{\text{out}}$. Here, the value of the scaled clock capacitor is selected to be 2.2pF. Some of the recovered energy is subsequently lost when this capacitor is discharged, so it should be kept small. To keep the output ripple on $V_{\text{out}} < 5\%$, a large capacitance $C_F$ is needed for bulk energy storage.

In Figure 4.2, the gating signal for $M_{n3}$ changes from $V_{DD}$ to zero. However, as the source of $M_{p3}$ is connected to $V_{\text{out}}$, the appropriate gating signal for $M_{p3}$ should instead change from $V_{\text{out}}$ to $V_{\text{out}} - V_{DD}$; therefore, a voltage shift greater than or equal to $V_{\text{out}} - V_{DD}$ is needed. The combination of diodes $D_{\text{shift}}$, capacitor $C_{\text{shift}}$, and transistors $M_{n1}$ and $M_{p1}$ acts as a switched-capacitor voltage shifter. In interval 2, the top plate of $C_{\text{shift}}$ is connected to $V_{\text{out}}$ through the $D_{\text{shift}}$ diodes and the bottom plate is connected to ground through $M_{n1}$. The top plate of $C_{\text{shift}}$ is connected to the gate of $M_{p3}$ and turns it on due to a gating voltage of $V_{\text{out}} - 2V_{\text{diode\_drop}}$. In interval 1, the bottom plate is switched to $V_{DD}$ through $M_{p1}$. The capacitor $C_{\text{shift}}$ retains its charge since the diodes $D_{\text{shift}}$ are reverse biased, so the top plate of $C_{\text{shift}}$ jumps up by $V_{DD}$ to $V_{\text{out}} - 2V_{\text{diode\_drop}} + V_{DD}$. Since $2V_{\text{diode\_drop}}$ is smaller than $V_{DD}$, transistor $M_{p3}$ receives an acceptable gating signal and turns off.
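The gate levels described above can be checked numerically. The sketch below follows the paragraph step by step; the supply, output, and diode-drop values are hypothetical and were chosen only so that $2V_{diode\_drop} < V_{DD}$, and the function name is an assumption.

```python
def boost_shifter_levels(v_dd: float, v_out: float, v_diode: float) -> dict:
    """Gate voltage and V_gs of M_p3 in both intervals (ideal capacitor).

    Interval 2: bottom plate at ground, top plate charged through two
                diodes from V_out.
    Interval 1: bottom plate switched to V_DD, diodes reverse biased,
                so the top plate rises by V_DD.
    """
    v_cap = v_out - 2.0 * v_diode   # voltage held on C_shift
    gate_int2 = v_cap               # bottom plate at 0
    gate_int1 = v_cap + v_dd        # bottom plate at V_DD
    return {
        "interval1": {"gate": gate_int1, "vgs": gate_int1 - v_out},
        "interval2": {"gate": gate_int2, "vgs": gate_int2 - v_out},
    }


if __name__ == "__main__":
    # Hypothetical operating point, not taken from the thesis
    levels = boost_shifter_levels(v_dd=1.0, v_out=1.5, v_diode=0.3)
    for name, values in levels.items():
        print(f"{name}: gate = {values['gate']:.2f} V, "
              f"V_gs = {values['vgs']:+.2f} V")
```

With these example numbers, $V_{gs}$ of $M_{p3}$ is $V_{DD} - 2V_{diode\_drop} = +0.4$V in interval 1 (off) and $-2V_{diode\_drop} = -0.6$V in interval 2 (on), independent of $V_{out}$.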

![Circuit diagram of the integrated clock driver/boost converter](image)

**Figure 4.2.** Circuit diagram of the integrated clock driver/boost converter

Except for \( D_{shift} \) and \( C_{shift} \), all transistor body terminals are connected to their source pins. The body terminals of \( D_{shift} \) and \( C_{shift} \) are connected to ground instead. This prevents forward biasing of the body-drain intrinsic diode in case the drain voltage goes lower than the source voltage. It also makes the layout implementation easier, since no deep n-well structure is required. Finally, a 1k\( \Omega \) resistor is added in parallel to \( C_{shift} \) to bias the \( D_{shift} \) diodes and provide a DC current path to avoid floating nodes when the \( D_{shift} \) diodes are off.

### 4.1.4 Simulation

Figure 4.3 shows the output voltage and the effective efficiency of the boost converter at different duty cycles and output currents. As $D$ is increased, the output voltage increases and the effective efficiency decreases. As the duty cycle is varied, the peak effective efficiency shifts to a different output current level. A maximum effective efficiency of 111% is achieved at $D = 40\%$ with $I_{out} = 30\text{mA}$. At $I_{out} = 50\text{mA}$, varying $D$ from 40\% to 80\% changes $V_{out}$ from 0.75V to 1.73V. The corresponding effective efficiency ranges from 98\% down to 24\%. For the reference circuit consisting of a clock driver only, the simulated power consumption, $P_{in2}$, was 100mW.
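To give a feel for these numbers, the sketch below computes the output power at the two quoted 50mA operating points and back-calculates the input power drawn beyond the reference clock driver. It rests on an assumption that is not restated in this section: that effective efficiency is defined as $P_{out}/(P_{in1} - P_{in2})$, where $P_{in1}$ is the input power of the integrated circuit. The function names are hypothetical.

```python
P_IN2 = 100e-3  # W, simulated reference clock driver power (from the text)


def output_power(v_out: float, i_out: float) -> float:
    """P_out = V_out * I_out in watts."""
    return v_out * i_out


def extra_input_power(p_out: float, effective_efficiency: float) -> float:
    """P_in1 - P_in2, ASSUMING eff = P_out / (P_in1 - P_in2)."""
    return p_out / effective_efficiency


if __name__ == "__main__":
    # (D, V_out in V, effective efficiency) at I_out = 50 mA, from the text
    for duty, v_out, eff in ((0.40, 0.75, 0.98), (0.80, 1.73, 0.24)):
        p_out = output_power(v_out, 50e-3)
        extra = extra_input_power(p_out, eff)
        print(f"D = {duty:.0%}: P_out = {p_out * 1e3:.1f} mW, "
              f"extra input = {extra * 1e3:.1f} mW, "
              f"implied P_in1 = {(extra + P_IN2) * 1e3:.1f} mW")
```

Under this assumed definition, an effective efficiency above 100% means the converter delivers more output power than the additional power it draws beyond the clock driver alone.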

Compared to the integrated clock driver/buck converter circuit, $P_{in2}$ is higher here because a larger $C_{clk}$ has been selected (25pF vs. 12pF). Also, all the transistors are low-$V_t$ type to facilitate operation at lower $V_{DD}$ levels.

Figure 4.3. Simulation results of the integrated clock driver/boost converter

### 4.1.5 Chip Implementation

The micrograph of the clock driver/boost converter chip is shown in Figure 4.4. The area of the integrated clock driver/boost converter including $L_F$ is 0.26mm$^2$ and the area of the reference clock driver is 0.03mm$^2$. The inductor alone is 0.1mm$^2$. The total die area of 2mm$^2$ is shared with two other designs in this work (the integrated clock driver/buck-boost converter later in this chapter and the low-power clock driver circuit in Chapter 5).

![Figure 4.4. Chip micrograph of the integrated clock driver/boost converter](image)

### 4.1.6 Chip Measurements

Unfortunately, this circuit was not functional due to a number of suspected problems. Higher peak-to-peak current levels compared to the buck design might have been a reason. Since the inductor current in this design is much higher, the resistive voltage drop across the inductor and current paths may be significant. Although it used wider/thicker paths than the buck design, using even more metal is suggested for future layouts.

Also, this circuit shares the die with two other designs, a buck-boost converter and a low-power clock driver. Some elements of the circuit, such as the voltage shifter circuit, were also used in the buck-boost design, which also did not work. This leads to the conclusion that the present voltage shifter design might be very sensitive to fabrication variation, and/or that necessary layout masks were not used, specifically for the 1kΩ resistor. If the voltage shifter circuit is faulty, there will not be enough gate voltage $V_{shift}$ to turn off the $M_{p3}$ transistor, so it stays on and drains the output capacitor $C_F$. This agrees with the chip measurements, which show an output voltage of a few hundred millivolts, indicating that the output may be shorted within the chip. Inspection of the voltage shifter circuit is suggested for future layouts.

### 4.1.7 Summary

The idea of energy recovery from a high-speed clock load in high-speed digital circuits was investigated by exploring the integration of the boost converter topology with a high-speed clock driver [42]. While simulation shows promising results of effective efficiency above 100%, chip measurement results are unable to confirm this due to non-functional fabricated chips.

4.2 Integrated Clock Driver/Buck-Boost Converter

### 4.2.1 Introduction

Another basic switching converter investigated here is a buck-boost converter which has a negative output voltage with respect to the common terminal of the input voltage [42].

### 4.2.2 Circuit Design

In the typical buck-boost converter of Figure 4.5(a), when the switch is on, voltage $V_{in}$ will be across the inductor $L_F$ and current will build up in the inductor. In the next phase, when the switch is off, inductor current finds its way through the diode and charges the output capacitor to $V_{out} = V_{L_f}$ which has a negative value. Here, the diode prevents shorting $V_{out}$ to $V_{DD}$ when the switch is on.

The integrated clock driver/buck-boost converter circuit shown in Figure 4.5(b) uses a switched capacitor voltage shifter circuit to generate a shifted gating signal for the NMOS transistor used in place of the power diode. Similar to Figure 3.2(b), a chain of inverters is used to drive the converter and ZVS needs to be employed. An extra switch $S_{clk}$ is also added between nodes $V_{clk}$ and $V_{inv}$. This switch prevents $V_{clk}$ from becoming negative as $V_{inv}$ goes below zero when $M_n$ is on.

Ignoring turn on/off times of the transistors, there are two intervals of operation as shown in Figure 4.5(c):

- Interval 1: At the beginning of this interval, $\overline{V_{clk}}$ goes to zero and $M_p$ turns on. Switch $S_{clk}$ is closed and $C_{clk}$ is charged up. Consequently, voltage $V_{DD}$ will be across the inductor $L_F$ and the current in the inductor increases linearly, assuming a constant voltage across the inductor. At the same time, the voltage of the capacitor $C_{shift}$ from the previous interval 2 will be added to $\overline{V_{clk}}$ and $V_{shift}$ reaches a lower value than $V_{out}$, as diodes $D_{shift}$ are reverse biased. Since $C_{shift}$ is pre-charged to $V_{DD} - (V_{out} + 3V_{diode\_drop})$ in the previous interval 2, the $V_{gs}$ of $M_n$ would be equal to $V_{shift} - V_{out} = (-V_{DD} + (V_{out} + 3V_{diode\_drop})) - V_{out} = -V_{DD} + 3V_{diode\_drop}$, which has a negative value, so $M_n$ turns off completely.

- Interval 2: As $\overline{V_{clk}}$ is high, $M_p$ is off and $M_n$ is on. At the same time, capacitor $C_{shift}$ will be charged through diodes $D_{shift}$ to a value of $V_{DD} - (V_{out} + 3V_{diode\_drop})$. Since the diodes are forward biased, the $V_{gs}$ of $M_n$ is equal to $3V_{\text{diode\_drop}}$, which is a positive value larger than the threshold voltage of $M_n$, thus ensuring it turns on completely. At this time, inductor current finds its way through $M_n$ and will charge up the output capacitor $C_F$ to a negative voltage value. The switch $S_{clk}$ is closed at the beginning of interval 2 to allow the inductor to discharge $C_{clk}$. However, when $V_{inv}$ starts to go negative, the switch is opened to keep $V_{clk}$ at zero.

![Diagram of a typical buck-boost converter](image1)

![Diagram of simplified circuit diagram](image2)

![Diagram of idealized timing diagram](image3)

Figure 4.5. Integrated clock driver/buck-boost converter

In the above discussion, the available input voltage to the converter is $V_{in} = mean(V_{clk}) = D \times V_{DD}$. Hence, the ideal output voltage is calculated by

\[ V_{out} = \frac{-D}{1-D} \times V_{in} = \frac{-D^2}{1-D} \times V_{DD} \]

which is negative. A negative output voltage could be used for 1) gating signals of PMOS pass transistors such as those used in sampling circuits, 2) providing NMOS transistors with negative body bias voltages, which can be used to dynamically change the threshold voltage to achieve speed and power scaling, and 3) negative supply voltage for analog circuits.
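As a quick numerical illustration of the ideal expression above, the sketch below evaluates $V_{out}/V_{DD} = -D^2/(1-D)$ at a few duty cycles. It assumes a lossless converter and uses a hypothetical function name.

```python
def ideal_buck_boost_ratio(duty: float) -> float:
    """Ideal V_out / V_DD for the integrated clock driver/buck-boost converter.

    The converter input is mean(V_clk) = D * V_DD and the ideal
    buck-boost gain is -D / (1 - D), giving -D^2 / (1 - D) overall.
    """
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must satisfy 0 <= D < 1")
    return -(duty ** 2) / (1.0 - duty)


if __name__ == "__main__":
    for duty in (0.2, 0.4, 0.5, 0.6):
        print(f"D = {duty:.0%}: V_out / V_DD = {ideal_buck_boost_ratio(duty):+.3f}")
```

The quadratic dependence on $D$ comes from the duty cycle appearing twice: once in the mean clock voltage that feeds the converter and once in the converter gain itself.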

### 4.2.3 Complete Circuit

A complete implementation of the integrated clock driver/buck-boost converter is shown in Figure 4.6. Many of the changes are similar in nature to those used to implement the boost circuit, e.g., the addition of \( M_{p3} \) and \( M_{n3} \) to delay the energy-wasting discharge of \( C_{clk} \).

In Figure 4.6, the gating signal for \( M_{p1} \) changes from zero to \( V_{DD} \). However, as the source of \( M_{n1} \) is connected to \( V_{out} \), the appropriate gating signal for \( M_{n1} \) should instead change from \( V_{out} \) to \( V_{out} + V_{DD} \); therefore, a voltage shift equal to \( V_{out} \) is needed. The combination of diodes \( D_{shift} \), capacitor \( C_{shift} \), and transistors \( M_{n3} \) and \( M_{p3} \) acts as a switched-capacitor voltage shifter. In interval 2, the bottom plate of \( C_{shift} \) is connected to \( V_{out} \) through the \( D_{shift} \) diodes and the top plate is at \( \bar{V}_{clk} \), which is pulled up to \( V_{DD} \) through \( M_{p3} \). In interval 1, the \( D_{shift} \) diodes are reverse biased and the top plate is switched to ground through \( M_{n3} \). As the capacitor \( C_{shift} \) retains its charge, the bottom plate of \( C_{shift} \) jumps down by \( V_{DD} \). The resulting shifted voltage is \(-V_{DD} + (V_{out} + 3V_{diode\_drop})\) instead of \( V_{out} \). However, since \( 3V_{diode\_drop} \) is smaller than \( V_{DD} \), transistor \( M_{n1} \) still receives an acceptable gating signal to turn off.
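The shifted gate levels for \( M_{n1} \) can be traced numerically in the same way as the derivation above. In the sketch below, the supply, output, and diode-drop values are hypothetical, chosen only so that \( 3V_{diode\_drop} < V_{DD} \), and the function name is an assumption.

```python
def buck_boost_shifter_levels(v_dd: float, v_out: float, v_diode: float) -> dict:
    """Gate voltage and V_gs of M_n1 in both intervals (ideal capacitor).

    Interval 2: top plate at V_DD, bottom plate clamped three diode
                drops above V_out.
    Interval 1: top plate switched to ground, diodes reverse biased,
                so the bottom plate falls by V_DD.
    """
    v_cap = v_dd - (v_out + 3.0 * v_diode)  # top minus bottom plate
    gate_int2 = v_dd - v_cap                # = V_out + 3 * V_diode
    gate_int1 = 0.0 - v_cap                 # = -V_DD + V_out + 3 * V_diode
    return {
        "interval1": {"gate": gate_int1, "vgs": gate_int1 - v_out},
        "interval2": {"gate": gate_int2, "vgs": gate_int2 - v_out},
    }


if __name__ == "__main__":
    # Hypothetical operating point, not taken from the thesis
    levels = buck_boost_shifter_levels(v_dd=1.0, v_out=-0.5, v_diode=0.25)
    for name, values in levels.items():
        print(f"{name}: gate = {values['gate']:+.2f} V, "
              f"V_gs = {values['vgs']:+.2f} V")
```

With these example numbers, \( V_{gs} \) of \( M_{n1} \) is \( -V_{DD} + 3V_{diode\_drop} = -0.25 \)V in interval 1 (off) and \( 3V_{diode\_drop} = +0.75 \)V in interval 2 (on).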

There are three implementation decisions in Figure 4.6 that warrant further discussion. First, transistors \( M_{p2} \) and \( M_{n2} \) are added to protect \( M_{p1} \) and \( M_{n1} \) from potentially large voltage drops across them since \( V_{inv} \) switches between \( V_{DD} \) and \( V_{out} \). Connecting the gates of transistors \( M_{p2} \) and \( M_{n2} \) to ground will provide for automatic on-off timing and proper operation of the circuit.

Second, transistor $M_{p4}$ acts as the switch to prevent $V_{clk}$ from going negative. The gate of $M_{p4}$ is connected to $V_{bias}$, which is set at the threshold voltage of PMOS transistor $M_{p5}$. When $V_{inv}$ is positive, $M_{p4}$ is on and provides the path for the inductor current to discharge $C_{clk}$. When $V_{inv}$ falls below zero, $M_{p4}$ turns off and nodes $V_{inv}$ and $V_{clk}$ are disengaged. Meanwhile, $M_{n2}$ turns on and provides a path for the inductor current. In this design, $V_{bias}$ is generated by a small DC current passing through the diode-connected PMOS transistor $M_{p5}$. To stabilize the voltage, capacitor $C_{bias}$ is added to the node $V_{bias}$.

Figure 4.6. Circuit diagram of the integrated clock driver/buck-boost converter

Third, the body terminals of all NMOS transistors need to be connected to their source node or the most negative voltage in the system to prevent forward biasing of body-source intrinsic diodes. For transistors $M_{n1}$ and $M_{n2}$, the body is connected to the (non-ground) source node, so these transistors need to be isolated inside a deep n-well structure for layout. The body terminals of all other transistors are also connected to their sources. For layout implementation of $C_F$ and $C_{bias}$, PMOS transistors are used. If NMOS transistors were used, since $V_{out}$ and $V_{bias}$ are both negative, the gate and source nodes would have to be connected to ground and a negative voltage, respectively, to obtain a positive $V_{gs}$. A deep n-well would then be needed in order to connect the bodies of these transistors to their sources.

### 4.2.4 Simulation

Figure 4.7 shows the output voltage and effective efficiency of the integrated buck-boost converter circuit. Here, maximum effective efficiency of 66% is achieved at $D = 20\%$ with $I_{out} = 50\, mA$. In this case, at 50mA output current, the output voltage changes from $-0.5\, V$ to $-1.43\, V$ when varying the duty cycle from 20% to 60%. The corresponding effective efficiency ranges from 66% down to 35%. Simulated $P_{in2}$ was 100mW. The lower efficiency compared to the previous circuits is a result of more transistors in the main current path. Also, all the transistors are low-$V_t$ type to facilitate operation at lower $V_{DD}$ levels.

In these circuits, the effective efficiency can only exceed 100% when clock energy is being recycled, since the recycled clock energy is not counted as input power by the effective efficiency metric. In this buck-boost design, the effective efficiency does not exceed 100%, so it does not offer any proof that clock energy is being recycled. However, looking at Figure 4.6 reveals that during recycling time, there is no path from $C_{clk}$ to ground except through $L_F$. This means charge in $C_{clk}$ is being recycled to current in $L_F$ while $M_{n2}$ is off. During this recycling time, it should be noted that $M_{n2}$ is off because $V_{inv} \geq -V_{t_{nmos}}$, so it is not conducting.

Figure 4.7. Simulation results of the integrated clock driver/buck-boost converter

### 4.2.5 Chip Implementation

The micrograph of the clock driver/buck-boost converter chip is shown in Figure 4.8. The area of the integrated clock/converter including $L_F$ is 0.2mm$^2$. The inductor alone is 0.1mm$^2$. Although it is more complex, it is smaller than the boost design because more effort was put into its layout design. This design shares the same die as the integrated clock driver/boost converter presented earlier in this chapter and the low-power clock driver design presented in Chapter 5.

![Chip micrograph of the integrated clock driver/buck-boost converter](image)

Figure 4.8. Chip micrograph of the integrated clock driver/buck-boost converter

### 4.2.6 Chip Measurements

Unfortunately, this circuit was not functional due to a number of suspected problems similar in nature to those of the boost design presented earlier. The chip can provide a negative output voltage of a few hundred millivolts, leading to similar conclusions as for the boost design.

### 4.2.7 Summary

The idea of energy recovery from a high-speed clock load in high-speed digital circuits was investigated by exploring the integration of the buck-boost converter topology with a high-speed clock driver [42]. While the simulation results are promising, test results are not available due to non-functional fabricated chips.

4.3 Conclusions

The two designs presented in this chapter work in simulation, but there appear to be related layout issues that prevent the fabricated prototype from operating correctly. This highlights some of the difficulty of designing these new types of circuits. It is essential to fabricate prototypes and test them due to difficulties with modeling and simulating these high power circuits with magnetic fields, heat, and other practical issues.

Although the two designs presented in this chapter did not result in a functional prototype, they inspired the design of a third circuit which is presented in the next chapter. It borrows from the integrated clock driver/boost converter to produce a low-power clock driver. This third prototype circuit did operate correctly and results in 35% lower power in the clock drivers.

5 LOW-POWER CLOCK DRIVER

### 5.1 Introduction

In this chapter, a low-power clock driver is designed to return the energy stored in the clock capacitance back to the power grid [43]. This way, instead of producing a secondary regulated output voltage like all of the other circuits in this thesis, the energy needed to operate the clock driver itself is effectively reduced. The circuit configuration of this low-power clock driver resembles a boost converter or full-bridge DC-DC converter [12].

### 5.2 Circuit Design

A simplified schematic of the proposed low-power clock driver circuit is shown in Figure 5.1(b). This circuit incorporates an inductor at the clock node, but unlike resonant clocking schemes, the inductor appears on the driver side, not the load side. \( C_{clk} \) and \( C_{int} \) are the sums of the wiring and transistor capacitances connected to nodes \( V_{clk} \) and \( V_{int} \), respectively. Assuming a fan-out of four as the inverter taper factor, \( C_{int} \) is one-fourth of \( C_{clk} \).

In the discharging phase of \( C_{clk} \), the energy stored in the capacitor is transferred to the inductor instead of being discharged to ground. Some of this inductor energy is returned to the power grid through \( M_{p2} \), effectively reducing power consumption of the clock driver.

Figure 5.1. Low-power clock driver

The circuit in Figure 5.1(b) resembles a full-bridge DC-DC converter in which \( M_{p1}, M_{n1}, M_{p2} \) and \( M_{n2} \) are the bridge switches, and \( C_{int}, C_{clk} \) and \( L_F \) are the bridge load. \( C_F \) represents the intrinsic power-grid capacitance and the on-chip decoupling capacitances commonly added to digital designs.

The input to the generic full-bridge converter shown in Figure 5.1(a) is a fixed DC voltage but the DC magnitude and polarity of the bridge load voltage (\( V_{clk} - V_{int} \)) can be adjusted by pulse-width modulating the gating signals. Switches \( (M_{p2}, M_{n1}) \) and \( (M_{n2}, M_{p1}) \) are treated as two pairs. Because of the inductive load, depending on the direction of the load voltage and current, the load may consume or return power. The load current does not become discontinuous but the input current to the bridge can change its direction, so it is important that the source has low internal impedance. A bigger $C_F$ would better facilitate this requirement.
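The dependence of the average bridge load voltage on the pulse width can be illustrated with a small sketch. It assumes ideal switches, no dead time, and that the two switch pairs conduct alternately: the load sees $+V_{DD}$ while $(M_{p1}, M_{n2})$ conduct and $-V_{DD}$ while $(M_{p2}, M_{n1})$ conduct. The function and parameter names are hypothetical.

```python
def bridge_load_average(v_dd: float, duty_positive: float) -> float:
    """Average of (V_clk - V_int) over one period for an ideal full bridge.

    duty_positive is the fraction of the period during which
    (M_p1, M_n2) conduct, so the load voltage is +V_DD; for the rest
    of the period (M_p2, M_n1) conduct and the load voltage is -V_DD.
    """
    if not 0.0 <= duty_positive <= 1.0:
        raise ValueError("duty must satisfy 0 <= D <= 1")
    return (2.0 * duty_positive - 1.0) * v_dd


if __name__ == "__main__":
    for duty in (0.25, 0.5, 0.75):
        avg = bridge_load_average(1.0, duty)
        print(f"D = {duty:.0%}: mean(V_clk - V_int) = {avg:+.2f} x V_DD")
```

At a 50% duty cycle the average load voltage is zero, so in the ideal case the inductor current has no net DC drift and simply ramps up and down around a fixed mean.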

If the bridge stays in a particular state long enough, the energy stored in the inductor would be large enough to be used for charging/discharging the load capacitors. In practice, non-ideality of $M_{n2}$ and $M_{n1}$ results in their slow turn-on, providing the time needed for the inductor current to discharge $C_{int}$ and $C_{clk}$. Similarly, non-ideality of $M_{p2}$ and $M_{p1}$ gives the inductor time to charge those capacitors.

In the simplified design of Figure 5.1(b), the CMOS inverter propagation delay (from $V_{int}$ to $V_{clk}$) helps provide more time for the inductor to charge/discharge capacitor $C_{clk}$. This is observed, for example, after $M_{p2}$ turns on and raises $V_{int}$ with the assistance of the inductor before $V_{clk}$ falls due to the turn-on of $M_{n1}$. The complete circuit, which will be discussed in detail later, utilizes zero-voltage switching (ZVS) to provide an even longer delay that is dynamically adjusted.

Operation of the circuit in Figure 5.1(b) can be explained using the idealized timing diagram shown in Figure 5.1(c). There are eight intervals:

- **Interval 1:** $M_{p1}$ and $M_{n2}$ are on. $C_{clk}$ is already charged up and $V_{clk}$ is high. Inductor current is positive and is increasing linearly.

- **Interval 2:** $M_{n2}$ is turned off and $M_{p2}$ is turned on. $V_{int}$ increases.

- **Interval 3:** $M_{n1}$ is turned on and $M_{p1}$ is turned off. $V_{clk}$ decreases. For a short time the inductor current continues to rise. When $M_{p1}$ is off, the inductor takes energy from $C_{clk}$ rather than $V_{DD}$ and helps $V_{clk}$ to fall rapidly. The inductor will first transfer energy to $C_{int}$, helping $M_{p2}$ to increase $V_{int}$ quickly, and then transfer energy to the on-chip power grid through $M_{p2}$. Inductor current peaks when $V_{int} = V_{clk}$, i.e., when the voltage across $L_F$ is zero. The inductor current starts to decrease. $V_{\text{clk}}$ and $V_{\text{int}}$ reach low and high values, respectively.

- **Interval 4:** $M_{p2}$ and $M_{n1}$ are on. $C_{\text{clk}}$ is already discharged and $V_{\text{clk}}$ is low. Inductor current is positive and is decreasing linearly.

- **Intervals 1′–4′:** With the direction of the inductor current reversed, intervals 1–4 repeat in the opposite sense to help charge capacitor $C_{\text{clk}}$ from the stored energy in $C_{\text{int}}$ and $L_F$. When $C_{\text{int}}$ is discharged, $M_{n2}$ keeps $V_{\text{int}}$ at zero, providing the current path for $L_F$ to charge up $C_{\text{clk}}$.

In the above discussion, whenever the absolute value of the inductor current is decreasing, the energy stored in the inductor is being delivered to another element of the circuit. Here, the destination of the charge can be $C_F$, $C_{\text{clk}}$, or $C_{\text{int}}$. Energy recycling occurs when $C_{\text{clk}}$ charge is returned to the power grid via the inductor during interval 3. The inductor also reduces the amount of energy consumed by helping to precharge $C_{\text{clk}}$ from the energy stored in itself and $C_{\text{int}}$ during interval 3′. However, as $C_{\text{int}}$ is smaller than $C_{\text{clk}}$, there is no opportunity to return energy to the power grid in this interval. Additional energy recycling occurs when $L_F$ magnetic energy is returned to the power grid during intervals 4 and 4′.
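The energy bookkeeping in this discussion can be made concrete with a small sketch. The capacitance and supply values below are hypothetical; only the ratio $C_{int} = C_{clk}/4$ comes from the fan-out-of-four assumption stated earlier.

```python
def capacitor_energy(capacitance_f: float, voltage_v: float) -> float:
    """Energy stored in a capacitor, E = 0.5 * C * V^2, in joules."""
    return 0.5 * capacitance_f * voltage_v ** 2


# Hypothetical example values, not taken from the thesis
C_CLK = 10e-12        # clock load capacitance, F
V_DD = 1.0            # supply voltage, V
C_INT = C_CLK / 4.0   # fan-out-of-four taper, as assumed in the text

if __name__ == "__main__":
    e_clk = capacitor_energy(C_CLK, V_DD)
    e_int = capacitor_energy(C_INT, V_DD)
    print(f"Energy stored in C_clk at V_DD: {e_clk * 1e12:.2f} pJ")
    print(f"Energy stored in C_int at V_DD: {e_int * 1e12:.2f} pJ")
    print(f"E_int / E_clk = {e_int / e_clk:.2f}")
```

In a conventional driver the energy stored in $C_{clk}$ is dissipated on every discharge, so it bounds what interval 3 can return to the grid. Since $C_{int}$ holds only a quarter of that energy, it cannot fully precharge $C_{clk}$ in interval 3′, which is consistent with no energy being returned to the grid in that interval.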

### 5.3 Complete Circuit

Ideally, all of the energy stored in $C_{\text{clk}}$ should be recovered (by moving it to $C_{\text{int}}$ and/or $C_F$) rather than being wasted by discharging $C_{\text{clk}}$ into the ground. Thus, to maximize the energy savings, the turn-on of $M_{n1}$ should be delayed. This is shown in Figure 5.2 with the addition of transistors $M_{n3}$ and $M_{p3}$. Furthermore, $M_{n3}$ and $M_{p3}$ also delay the turn-on of $M_{p1}$, allowing $C_{\text{clk}}$ to be precharged by the inductor. This achieves zero-voltage switching (ZVS) in the final drive stage and reduces switching power loss.

The main benefit of implementing ZVS for $M_{n1}$ is that $C_{clk}$ is no longer shorted to ground. During the ZVS dead-time, the charge is removed (recovered) by the inductor current and consequently $V_{clk}$ is reduced to zero. After this, $M_{n1}$ is turned on to provide a low-loss path for current and also to keep $V_{clk}$ around zero. If $M_{n1}$ were not turned on, the inductor current would turn on the intrinsic body-drain diode of $M_{n1}$. The resultant voltage drop across this diode, $-V_{diode\_drop}$, would contribute to the overall power consumption of the system. In the charging phase of $C_{clk}$, ZVS for $M_{p1}$ causes $C_{clk}$ to be charged mainly through the inductor $L_F$.

![Circuit Diagram](image)

Figure 5.2. Circuit diagram of the low-power clock driver and the reference clock

### 5.4 Simulation

The circuit of Figure 5.2, consisting of an inductor and two ZVS transistors, returns part of the \( C_{clk} \) energy back to the power grid; thus, the power consumption of the clock driver is reduced in a non-resonant fashion. In comparison, clock-resonance schemes such as [10] and [11] reduce energy by resonating \( C_{clk} \) with an inductor, resulting in nearly sinusoidal clock waveforms.

Simulation results of the implemented low-power clock driver operating at 4 GHz are shown in Figure 5.3. As shown in the figure, the proposed technique preserves the sharp edges of the clock in the presence of the inductor. Compared to the reference clock driver implemented in the same process, the slope of the rising clock edge in the new circuit is similar, although the falling slope is slightly slower because ZVS transistors \( M_{n3} \) and \( M_{p3} \) are in the path of charging the \( V_{intn} \) node. Thus, \( M_{n1} \) turns on slightly slower and hence, \( V_{clk} \) has a slower falling edge.

![Figure 5.3. Simulated clock waveforms of Figure 5.1(b) and Figure 5.2](image)

To investigate the effect of ZVS transistors on circuit operation, \( M_{p2} \) and \( M_{n1} \) drain currents are plotted in Figure 5.4 and Figure 5.5, respectively. A positive \( M_{p2} \) drain current means that \( C_{clk} \) charge is being returned to \( V_{DD} \) and a positive \( M_{n1} \) drain current means that \( C_{clk} \) is being discharged to the ground.

Figure 5.4 shows that there are periods of time in which the $M_{p2}$ drain current, in both the simplified and complete circuit versions, has a positive “area under the curve”, with the complete circuit having the larger area. Similarly, Figure 5.5 shows that the $M_{n1}$ drain current in both the simplified and complete versions has a smaller “area under the curve” than in the reference circuit, with the complete circuit having the smallest area. Simulations show that the areas under the curves in Figure 5.4 are 1.3, −0.7 and −7.9 pA·s for the complete, simplified and reference circuits, respectively. Similarly, the areas under the curves in Figure 5.5 are 16.2, 19.3 and 24.0 pA·s for those circuits. These results, which are for the PMOS and NMOS transistors in the $C_{clk}$ discharge path, help in comparing the three variants of the circuit. The complete circuit has the biggest $M_{p2}$ area, confirming the most recycling to $V_{DD}$, and has the smallest $M_{n1}$ area, confirming the least dissipation of $C_{clk}$ charge to ground.
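The comparison of the $M_{n1}$ areas can be made concrete with a short calculation. The sketch below is only illustrative: it takes the three areas quoted for Figure 5.5 (an area in pA·s is numerically a charge in pC) and expresses the simplified and complete circuits relative to the reference. The function and dictionary names are assumptions introduced here, not part of the original work.

```python
# Areas under the M_n1 drain-current curves quoted in the text, in pA*s
# (1 pA*s equals 1 pC of charge sent to ground).
MN1_AREA_PC = {"complete": 16.2, "simplified": 19.3, "reference": 24.0}


def ground_charge_reduction(variant, areas=MN1_AREA_PC):
    """Fractional reduction in M_n1 charge relative to the reference driver."""
    reference = areas["reference"]
    return (reference - areas[variant]) / reference


for name in ("simplified", "complete"):
    percent = 100.0 * ground_charge_reduction(name)
    print(f"{name}: {percent:.1f}% less charge through M_n1 than the reference")
```

On these figures the complete circuit sends roughly a third less charge to ground through $M_{n1}$ than the reference circuit does.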

The low-power clock driver of Figure 5.2 was also simulated at different switching frequencies along with its simplified version from Figure 5.1(b) and the reference clock driver. The simulation results in Figure 5.8 show that power savings improve as the clock frequency is increased. The simplified circuit does not perform as well as the complete circuit since the ZVS transistors $M_{n3}$ and $M_{p3}$ in Figure 5.2 assist in energy return to the power grid. Also, simulation results at 4 GHz show a percentage power saving equal to $(P_{in2}-P_{in1})/P_{in2} = 37\%$. Here, $P_{in1} = 86$mW and $P_{in2} = 136$mW are the power consumption of the complete and the reference circuits, respectively.

Figure 5.4. Simulated $M_{p2}$ drain current waveforms of Figure 5.1(b) and Figure 5.2

Figure 5.5. Simulated $M_{n1}$ drain current waveforms of Figure 5.1(b) and Figure 5.2

To evaluate the effect of inductor value on power consumption, the complete circuit is simulated with different inductor values by varying a factor $K$ such that $L_F = K \times 310$pH. Figure 5.6 shows the results and suggests that the optimum inductor value depends on the frequency range. For example, at $K = 1$, minimum power consumption is achieved over the clock frequency range of 3 to 4GHz. This value of inductance corresponds to the fabricated prototype.

![Figure 5.6. Effect of changing inductor value on power savings in Figure 5.2](image)

### 5.5 Chip Implementation

As a proof of concept, the two circuits in Figure 5.2 have been fabricated in a 1P7M2T 90nm CMOS process using low-$V_t$ transistors to facilitate operation at lower $V_{DD}$ levels. The 310pH inductor is made with a single loop using the four top metal and one extra aluminum (ALUCAP) layers in parallel. The inductor was modeled using ASITIC. A Patterned Ground Shield (PGS) was also placed in between the inductor coil and the substrate.

In the chip, the total capacitance connected to node $V_{clk}$ (shown as $C_{clk}$ in Figure 5.2) is 25pF. Presenting a fanout-of-4 load to the clock driver, the load gate capacitance connected to node $V_{clk}$ is 21pF, which is implemented using the gate capacitance of a 2016/0.75µm NMOS transistor array. All transistor bodies are connected to their sources, except for $M_{n3}$ whose body is connected to ground. This prevents forward biasing of the body-drain intrinsic diode and avoids the need for using a deep n-well structure.
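For a rough sense of scale, the natural frequency of $L_F$ with $C_{clk}$ can be estimated from the values given above. The sketch below is a simplification and not the analysis used in this work: it treats $L_F = K \times 310$pH and $C_{clk} = 25$pF as an isolated LC pair and ignores $C_{int}$, $C_F$, transistor resistance and all parasitics. The function name is hypothetical.

```python
import math

L_F_BASE = 310e-12  # H, inductor value for K = 1 (from the text)
C_CLK = 25e-12      # F, total capacitance on node V_clk (from the text)


def lc_resonant_frequency(k, l_base=L_F_BASE, c=C_CLK):
    """Natural frequency of an ideal LC pair with L = k * l_base.

    Assumption: C_int, C_F and all parasitics are ignored.
    """
    return 1.0 / (2.0 * math.pi * math.sqrt(k * l_base * c))


for k in (0.5, 1.0, 2.0):
    f0_ghz = lc_resonant_frequency(k) / 1e9
    print(f"K = {k}: f0 is about {f0_ghz:.2f} GHz")
```

Under these assumptions the $K = 1$ estimate is about 1.8GHz, below the clock frequencies at which the prototype is operated, which is at least consistent with the non-resonant operation described in Section 5.4.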

The chip micrograph is shown in Figure 5.7. The inductor area is 0.1mm$^2$. The low-power clock driver (including the inductor) and the reference circuit occupy 0.15mm$^2$ and 0.03mm$^2$, respectively.

![Chip Micrograph](image)

**Figure 5.7. Chip micrograph**

### 5.6 Chip Measurements

Chip measurement results in Figure 5.8 show energy savings for a clock frequency range of 2.75 to 4GHz. The measurements show increasing power savings as clock frequency increases to 4GHz. At lower frequencies, the inductor current has more time to build up, which results in an increased resistive voltage drop across the inductor. Thus the energy savings are reduced. To improve this, a larger inductance is needed as shown in Figure 5.6.

The simulation results show very good agreement with the measured results below 3.5GHz, but begin to deviate at higher frequencies. Measurements above 4GHz were not possible due to limits of our test equipment. At 4GHz, measurements confirm the power consumption is reduced from 117mW (in the reference circuit) to 76mW (in the complete circuit), a net power savings of 35%.
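The percentage savings quoted for simulation and measurement follow from the same ratio, $(P_{ref} - P_{new})/P_{ref}$. The short sketch below reproduces both from the power figures given in the text; the function name is an assumption made for illustration.

```python
def power_saving(p_reference_mw, p_new_mw):
    """Fractional power saving of the new circuit relative to the reference."""
    return (p_reference_mw - p_new_mw) / p_reference_mw


# Power figures at 4 GHz as quoted in the text (mW).
simulated = power_saving(136.0, 86.0)
measured = power_saving(117.0, 76.0)

print(f"simulated saving: {100 * simulated:.1f}%")
print(f"measured saving:  {100 * measured:.1f}%")
```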

![Figure 5.8. Test and simulation results](image)

The clock waveforms are made available off-chip using open-drain PMOS buffers. At 4GHz, the RMS clock jitter is measured to be 1.25ps and 1.17ps for the complete low-power clock driver and the reference clock driver, respectively. Thus, the added jitter by the inductor and ZVS transistors is negligible.

### 5.7 Summary

The design introduced here benefits from the charge stored in the clock load capacitance. Thus, the exact location for including the proposed clock driver circuits depends on the configuration of clock distribution network as was discussed in Section 2.5.

In many situations, it is desirable to “stop the clock” to save power by gating the incoming clock signal. With a stopped clock, the inductor would be continuously conducting and dissipate significant static power. To solve this problem, power gating with a header transistor can disconnect the power supply from the driver, which also reduces standby leakage [44]. This introduces a new concern: the $LC$ components can oscillate and introduce additional unwanted clock transitions until the stored energy in the system is dissipated. To address this issue, an extra NMOS transistor can be added in parallel to $C_{clk}$ to provide a discharge path for the clock, keeping $V_{clk}$ at zero and immediately shorting any unwanted oscillations. This shorting transistor can share the same gating signal as the header power-gating transistor.

One of the strengths of the circuit presented in this chapter compared to earlier chapters is its simplicity. It requires relatively few components, and it does not require changing the operation of the clock by duty cycle modulation or low-swing distribution. Hence, the application of energy recycling concepts and on-chip DC-DC converter technology resulted in significant power savings in a very important circuit.

## 6 CONCLUSIONS

As an energy saving strategy, recycling energy stored in the clock that would otherwise be discharged to ground has been the subject of this dissertation. Two methods of reusing this energy have been investigated: 1) using the energy to provide an extra supply voltage for other circuits and 2) transferring the energy back to power grid to improve power consumption of the circuit.

Power losses in a system can be divided into two categories: resistive and dynamic. Reducing resistive power loss by optimizing along current paths has always been a goal for circuit designers, while reducing dynamic losses has been achieved by minimizing the gate capacitance and by reducing the supply voltage and/or switching frequency.

The voltage across an open switch is stored as electric energy in the stray capacitance of the switch. It has been known that an inductor is needed to successfully remove this energy in full, a common practice in designing an LC oscillator circuit. The inductor is a good candidate for receiving this energy because, ideally, there is no loss in the transfer process. However, in power circuits, the oscillation in the circuit is avoided by choosing the switching frequency much higher than the resonant frequency of the LC circuit. As a result, the state of the circuit is changed many times before one oscillation cycle is completed. In one state of the circuit, current from the supply voltage builds up in the inductor. In the other state, the circuit configuration is changed so that current continues to flow from the capacitor, since current cannot change instantly in an inductor. When the capacitor is fully discharged, i.e., its voltage is zero, the switch should close to provide a low resistance path for the inductor current; otherwise, the intrinsic diode of the switch would turn on. This is not desirable as the voltage drop across a diode is bigger than the voltage drop across a closed switch.

The energy improvement methods proposed here are based on the Zero Voltage Switching technique which delays turning on a switch until the voltage across it is zero. During the delay time, the voltage across the device is reduced, i.e., the energy that was about to discharge to ground through the device is transferred to another passive reservoir. Consequently, power loss related to dynamic losses in the system is reduced. To further reduce the total power loss, resistive losses in the system have been reduced by using wider and/or thicker current paths. Also wider transistors have been employed, although they have large gate capacitance requiring them to be driven by bigger drivers.

As the inductor and capacitor used in a power converter are big, the only way that a converter can fit on-chip is to reduce their size. Converter transistors can switch at higher frequencies that allow converter passive components to be integrated on-chip. However, these higher frequencies were traditionally avoided due to dynamic losses. By reducing these dynamic losses, on-chip integration of power converters can be made more practical.

This thesis has investigated several methods of reducing these dynamic losses by recycling energy stored in the clock network. The first circuit is integrated with the clock driver in a way that delivers the final clock load energy to the switching converter. The second circuit recognizes that the high switching losses of the front-end driver chain can be reduced through supply stacking and that some excess energy that results can also be delivered to the switching converter. The third and fourth circuits explore boost and buck-boost topologies. The fifth circuit uses power converter circuitry to directly reduce the power consumption of a clock driver in a manner that resembles a boost converter.

The charge recycling methods used here are able to exploit a large clock capacitance. Although traditional design practices try to keep the clock capacitance as small as possible, these techniques provide a power-saving alternative when that is no longer an option. The proposed methods are generic, technology-independent solutions that could be valid for future generations of finer feature size CMOS technologies.

The implementations in this thesis attempt to diminish the impact of the integration on the original system by minimizing the effect of charge recycling on internal signals. Also, stability and robustness of the proposed solutions are assured by avoiding complex circuit configurations as they are prone to malfunction and also have higher failure rates.

The results demonstrated here are very promising. A significant amount of work remains to be done to optimize these circuits before they are practical. Although simulations show good-quality, quasi-square clock waveforms, there is concern that clock jitter may be increased as a result of the power converter integration. Placement of these circuits in the clock network could also increase skew. Future work is needed to address concerns regarding integrating the new circuits in a real, complex chip design; for example, interaction between the converters and the system power grid and/or decoupling capacitors. As a limiting factor, the new clock waveform from the integrated converter/driver circuits is only suitable for positive-edge-triggered digital blocks as the converter output voltage is adjusted by pulse width modulation of the clock waveform. The low-power clock driver does not have this drawback, since it can work at a fixed duty cycle.

If these methods of charge recycling prove to be successful in practice, they potentially could be used in many high performance, high frequency designs to lower power and save energy. This new design approach may transform a regular CMOS designer's way of thinking to take into account energy recycling. For example, designers typically minimize capacitance, but bigger capacitors in some areas may lead to more energy recovery and provide benefits in other areas.

Table 6.1 summarizes the overall results of the fabricated prototypes that are presented in this thesis:

Table 6.1. Chip prototype results

<table>
<thead>
<tr>
<th>Type</th>
<th>Buck</th>
<th>Boost</th>
<th>Buck-Boost</th>
</tr>
</thead>
<tbody>
<tr>
<td>Integrated clock driver/power converter</td>
<td>90nm CMOS Simulation works</td>
<td>90nm CMOS Simulation works</td>
<td>90nm CMOS Simulation works</td>
</tr>
<tr>
<td></td>
<td>Prototype works</td>
<td>Prototype not working</td>
<td>Prototype not working</td>
</tr>
<tr>
<td>Low-swing power converter</td>
<td>180nm CMOS Simulation works</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td></td>
<td>Prototype works</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Low-power clock driver</td>
<td>N/A</td>
<td>90nm CMOS Simulation works</td>
<td>N/A</td>
</tr>
<tr>
<td></td>
<td>N/A</td>
<td>Prototype works</td>
<td>N/A</td>
</tr>
</tbody>
</table>

### 6.1 Future Work

The plan for future extension of this work can be divided into two key categories: continuation of the previous work and finding new ideas for charge recycling.

#### 6.1.1 Continuation of the Work

As a continuation of the previous work, the two previous buck converter concepts can be merged into a single design. That is, two chains of gate drivers can be considered in the clock-tree converter design. The advantage of this configuration is that the electric charge in the clock-tree circuit as well as in the transistor gating circuits will be reused to improve efficiency.

There are potential problems with this approach. For example, there might be a mismatch between the parallel inverter chains and therefore gating signals arriving at the power transistors may go out of synchronization. This is particularly a problem at very high speeds. This problem could be alleviated by using new adaptive delay circuits that are part of the ZVS function to either re-sync or tolerate mismatch better. The following improvements to this new design can also be pursued:

- NMOS ZVS operation has been implemented in the integrated clock driver/power converter designs. The implementation of PMOS ZVS operation and its effect on converter efficiency could be considered. A dual circuit similar to the NMOS delay circuit can be used for this purpose. Early simulation results (using a different circuit) had shown that negative inductor current needed for ZVS operation of $M_p$ resulted in increased power loss due to inductor series resistance $R_s$. The new delay circuit and reduced $R_s$ through inductor thickening may alleviate this power loss.

- The existing diode-connected NMOS transistors in low-swing buck converter suffer from power loss during the on state. A simplified gating circuit is needed to fully turn on an NMOS transistor while mimicking diode behavior. This could be achieved by connecting the gate of a wide transistor to a comparator circuit that senses the voltage difference across the transistor [41]. The challenge comes from the fact that the comparator circuit needs to react very quickly while driving a big transistor.

- The existing transistors in the low-swing buck converter are of standard-$V_t$ type because low-$V_t$ devices were not available in the design kit. A low-$V_t$ device is needed to facilitate operation at $V_{DD}/2$ levels.

- To improve the chip layouts, some fine tuning could be performed to reduce resistance across the circuit. This would include resizing of the power transistors and changing the width of the circuit paths and/or the inductor path. This would increase area but would improve efficiency by decreasing the resistive power loss.

- In the clock-tree charge-recycling scheme, the clock signal has been disturbed in order to achieve converter voltage regulation. However, the quality of the clock signal is important to a logic designer. Clock jitter and duty cycle in the clock-tree scheme should be measured and improved, perhaps by using on-chip structures and experimentation.

- Stacking of the passive filter components, specifically putting the filter inductor above the filter capacitor to save area, could be considered. The area under the inductor has not been used here due to concerns of negative impact on inductance and/or eddy current losses. Recently these concerns have been studied in [45] with reassuring results.

- Power grid capacitance could be integrated with the converter output filter. They behave like a distributed capacitor across the chip. Power grids can potentially oscillate due to $L$ and $C$ effects in the grid itself. This effect needs to be taken into account while studying the stability of the system. This idea would reduce the size of the output capacitor and, as a result, reduce the converter area.

- The effect of injecting charge back to on-chip DC power distribution grid could be investigated. A large system can potentially have several of the integrated low-power clock drivers working in parallel, raising concerns with their possibly synchronized operation.

- To simplify the designs, the current chips have limited controllability and observability. On-chip voltage buffer circuits can be added to view the internal signals such as the clock waveform. Also, on-chip jitter measurement circuits would help in accurate jitter measurement.

- The effect of delivering a voltage surge back to the power grid that is perfectly synchronized to the clock is unknown. It could deliver energy just-in-time to reduce resistive voltage drop, or it could be at the wrong time and increase it.

Clock networks are one of the charge dissipating sub-circuits in a system. There are other sub-circuits of an integrated system that have capacitors with a charging/discharging operation cycle. Those circuits could be investigated in order to apply charge recycling methods to feed DC-DC converters or for returning the charge back to the power grid. Examples of other charge dissipating circuits include:

- On-chip memories: In a synchronous random access memory (RAM), one entire word line is always fully charged/discharged every access cycle. As well, all bit lines are pre-charged (possibly not to full $V_{DD}$, but halfway) and during the read cycle they are partially or fully discharged. DRAM empties its capacitor storage onto the bit line, but the charge change is very small and probably can't be captured. However, SRAM uses a pull-down NMOS to drain the bit line to ground. It might be possible to capture and collect the SRAM pull-down charge in a “pseudo-ground” grid, and feed it to a DC-DC converter.

- I/O pads: I/O pads usually have large capacitances that are charged and discharged on every change of output state. Instead of discharging the pad capacitance to ground, the charge can be delivered to a power converter. There are two common types of I/O pads: full swing digital pads that are used in low-speed signaling and low-voltage differential signaling (LVDS) pads that are used in high-speed signaling. Different charge recycling methods could be applied to those pads.

- Tail current source in differential pairs and biased circuits: Instead of sinking current to the ground, it can be redirected to a DC-DC boost converter. This circuit is different from the others in the sense that the charge is not recycled from a capacitor but from a continuous current source.

From the list above, the most advantageous ones could be identified and selected to demonstrate advantages of charge recycling. This would define a new category of designs that reuse, recover, and recycle energy called “green” chips or environmentally friendly electronic circuits. With reduced energy consumption, green chips can be powered from the renewable energy of the environment, such as sunlight or human body heat. Living off free ambient energy, they will be closer to zero-footprint and can become true wireless devices.

REFERENCES


APPENDICES

**A Discrete Switching Power Converters**

Switch-mode converters consist of an inductor that periodically is connected in different configurations. By adjusting the ratio of time spent in each configuration, the output voltage can be regulated. This method is more efficient, in the range of 80% to 95% for a discrete design, as switches are either fully on or fully off and voltage drop ideally happens only across the inductor, which is a no-loss component, i.e., voltage drop causes energy to be stored, not to be dissipated, in the inductor. For the sake of simplicity, in the following discussions power losses in the circuits are neglected, i.e., $P_{in} = P_{out}$.

**A.1 Buck (Step-Down) Switching Converters**

One of the basic switch-mode DC-DC conversion topologies is the step-down or buck converter. Basically its operation can be described as averaging a square wave signal by passing it through a low pass filter. The average or DC value is $D \times V_{DD}$ which implies that the output voltage is a function of the magnitude and also the duty cycle of the square waveform.
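The averaging view of the buck converter can be checked numerically. The sketch below is a minimal illustration, assuming an ideal low-pass filter that outputs exactly the mean of the switched waveform; the function name and the sample count are choices made here, not taken from the original work.

```python
def pwm_average(v_dd, duty, samples_per_period=1000):
    """Mean value of one period of an ideal square wave switching 0 <-> v_dd.

    Assumption: the output filter is ideal, so the converter output equals
    the mean of the square wave.
    """
    high_samples = round(duty * samples_per_period)
    wave = [v_dd] * high_samples + [0.0] * (samples_per_period - high_samples)
    return sum(wave) / len(wave)


for duty in (0.25, 0.5, 0.75):
    print(f"D = {duty}: average = {pwm_average(1.0, duty):.3f} x V_DD")
```

Changing either the amplitude `v_dd` or the duty cycle `duty` scales the average proportionally, which is the dependence stated above.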

As shown in Figure A.1, the square waveform is generated using two switches: one transistor and one diode. Using the diode simplifies the circuit as it operates automatically and does not need a gating signal.

The operation of the buck converter is fairly simple. If the inductor current never stays at zero, it is said that the converter is operating in Continuous Conduction Mode (CCM). As shown in Figure A.2, there are two operational states.

In the first state, the transistor is on, the diode is reverse biased and current builds up in the inductor. In the second state, the transistor is turned off. Current in the inductor cannot change instantly, so the current finds its way through the diode. Since the supply is disconnected from the circuit, inductor current decreases as the energy is transferred from the inductor to the load.

In steady-state operation, the integral of the inductor voltage over one time period, in other words the average of the inductor voltage, must be zero. Therefore

\[ (V_{DD} - V_{out}) t_{on} = V_{out} (T_{sw} - t_{on}) \]

or

\[ \frac{V_{out}}{V_{DD}} = \frac{t_{on}}{T_{sw}} = D \quad \text{(A.1)} \]

which implies that the converter has a linear ideal DC gain, i.e., it behaves like a DC transformer. Also in steady state, as there is no DC current going through the capacitor, the inductor average current is equal to the output DC current. Suppose the DC load current is decreased slowly. The average value of the inductor current falls to the point that the minimum inductor current reaches zero. At this time the average inductor current is

\[
I_L = \frac{1}{2} \Delta i_L = \frac{1}{2} i_{L,\text{max}} = \frac{t_{\text{on}}}{2L} (V_{DD} - V_{\text{out}}) = \frac{D T_{\text{sw}}}{2L} (V_{DD} - V_{\text{out}})
\]

as shown in Figure A.3. Noting that $I_L = I_{\text{out}}$, Equation A.2 gives the minimum inductance needed to keep the converter in CCM with a minimum design load current $I_{\text{out-min}}$. In practice, it is considered that $I_{\text{out-min}} \approx 0.1 \times I_{\text{out}}$.

\[
L = \frac{D T_{\text{sw}}}{2 I_{\text{out}}} (V_{DD} - V_{\text{out}}) \quad \text{(A.2)}
\]

![Waveforms of a buck converter in CCM mode](image)

Figure A.3. Waveforms of a buck converter in CCM mode

If the DC load current is further decreased, since the diode cannot conduct a negative current, the minimum inductor current stays at zero as shown in Figure A.4. This is called Discontinuous Conduction Mode (DCM). In DCM, the converter behavior is not linear, which requires a complex controller algorithm for voltage regulation.

In reality there is a significant voltage drop across the diode in the basic buck converter. Because a transistor on-state voltage drop is less than a diode on-state voltage drop, the diode can be replaced by a transistor as shown in Figure A.5. This configuration is referred to as a synchronous buck converter. It also avoids the complexity of DCM: as the second transistor can conduct negative currents, the converter always stays in CCM.

Figure A.4. Waveforms of a buck converter in DCM mode

Figure A.5. A synchronous buck converter

The filter capacitor, which is directly connected at the output of the converter, makes the converter appear as a voltage source to the load. A bigger capacitor makes the output voltage ripple smaller. It can be proven that the peak-to-peak output voltage ripple can be written as

\[
\frac{\Delta V_{out,pp}}{V_{out}} = \frac{\pi^2}{2} (1 - D) \left( \frac{F_c}{F_{sw}} \right)^2
\]

where $F_c = \frac{1}{2\pi\sqrt{LC}}$ is the corner frequency of the filter. Choosing $F_c \approx 0.1 \times F_{sw} \ll F_{sw}$ keeps the ripple small. This also shows that the output voltage ripple is independent of the output current in the CCM mode. Thus the filter capacitor can be derived using Equation A.3:

\[ C = \frac{(1-D)}{8 L (\Delta V_{out,pp} / V_{out}) F_{sw}^2} \quad \text{(A.3)} \]

It is also worth noting that in the second CCM state, when the diode in the basic configuration is conducting, the converter model can be simplified to an LCR circuit. Choosing \( F_c \ll F_{sw} \) prevents the potential for oscillation as well.

In integrated power converter designs, to save on-chip area, smaller inductor and capacitor values are much preferred. Using Equations A.2 and A.3 for the basic buck converter, the effect of switching frequency and converter output current on the converter inductor and capacitor are illustrated in Figure A.6 and Figure A.7, respectively. Choosing a mid-level converter output current will give a good compromise between inductor and capacitor values while higher switching frequencies will reduce both.
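The trend described above can be reproduced with a few lines of arithmetic. The sketch below evaluates the CCM boundary inductance of Equation A.2 at a minimum load current, and obtains the capacitance by substituting $F_c = 1/(2\pi\sqrt{LC})$ into the ripple expression and solving for $C$. All numeric values (1 V supply, 50% duty cycle, 50 mA minimum load, 1% ripple) and the function names are hypothetical and chosen only for illustration; they are not the design values of this work.

```python
def buck_min_inductance(v_dd, duty, f_sw, i_out_min):
    """Inductance that keeps an ideal buck converter in CCM down to i_out_min."""
    v_out = duty * v_dd      # ideal buck output, D x V_DD
    t_sw = 1.0 / f_sw
    return duty * t_sw * (v_dd - v_out) / (2.0 * i_out_min)


def buck_filter_capacitance(duty, inductance, f_sw, ripple_fraction):
    """Capacitance for a target relative ripple, from the ripple expression
    with F_c = 1 / (2*pi*sqrt(L*C)) substituted and solved for C."""
    return (1.0 - duty) / (8.0 * inductance * ripple_fraction * f_sw ** 2)


# Hypothetical operating point, for illustration only.
V_DD, DUTY, I_OUT_MIN, RIPPLE = 1.0, 0.5, 0.05, 0.01

for f_sw in (100e6, 1e9):
    l_min = buck_min_inductance(V_DD, DUTY, f_sw, I_OUT_MIN)
    c_out = buck_filter_capacitance(DUTY, l_min, f_sw, RIPPLE)
    print(f"F_sw = {f_sw / 1e6:.0f} MHz: L = {l_min * 1e9:.2f} nH, "
          f"C = {c_out * 1e9:.2f} nF")
```

With the inductance sized at the CCM boundary, both component values in this sketch scale inversely with the switching frequency, which is why high switching frequencies are attractive for on-chip integration.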

Figure A.6. Effect of $F_{sw}$ and $I_{out}$ on buck converter inductor

Figure A.7. Effect of $F_{sw}$ and $I_{out}$ on buck converter capacitor

**A.2 Boost (Step-Up) Switching Converters**

Another basic DC-DC conversion topology is the step-up or boost converter. Components used are similar to the buck converter but connected in a different configuration as shown in Figure A.8.

The operation of the boost converter is fairly simple. If the inductor current never stays at zero, it is said that the converter is operating in Continuous Conduction Mode (CCM). As shown in Figure A.9, there are two operational states.

In the first state, the transistor is on, current builds up in the inductor and the diode is reverse biased, isolating the output stage. In the second state, the transistor is turned off. Current in the inductor cannot change instantly, so the current finds its way through the diode. Inductor voltage will be in series with the source voltage, so the output capacitor receives a voltage that is higher than the supply voltage. The load receives energy from the input source as well as the inductor and therefore the inductor current decreases.

In steady-state operation, the integral of the inductor voltage over one time period, in other words the average of the inductor voltage, must be zero. Therefore

\[ V_{DD} t_{on} + (V_{DD} - V_{out}) t_{off} = 0 \]

or

\[ \frac{V_{out}}{V_{DD}} = \frac{T_{sw}}{t_{off}} = \frac{1}{1 - D} \quad \text{(A.4)} \]

which implies that the converter has a non-linear ideal DC gain even in CCM. Also in steady state, the inductor average current is equal to the input average current. Suppose the DC load current is decreased slowly. The average value of the inductor current falls to the point that the minimum inductor current reaches zero. At this time the average inductor current is

\[ I_L = \frac{1}{2} \Delta i_L = \frac{1}{2} i_{L,\text{max}} = \frac{1}{2} \frac{V_{DD}}{L} t_{on} = \frac{T_{sw} V_{out}}{2L} D (1 - D) \]

Noting that \( I_L = I_{DD} = \frac{I_{out}}{1 - D} \), this equation gives the minimum inductance needed to keep the converter in CCM with a minimum design load current. If the DC load current is further decreased, since at the end of commutation cycle the inductor discharges completely, minimum inductor current stays at zero. This is called Discontinuous Conduction Mode (DCM).
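As a sketch of how this boundary condition yields an explicit inductance, the boundary value $\frac{1}{2}\frac{V_{DD}}{L} t_{on}$ can be equated with $I_L = I_{out}/(1-D)$, using $t_{on} = D\,T_{sw}$ and $V_{DD} = (1-D)\,V_{out}$ from Equation A.4. The symbols $I_{\text{out-min}}$ (borrowed from Section A.1) and $L_{\min}$ are introduced here for illustration.

\[
\frac{I_{\text{out-min}}}{1-D} = \frac{D\,T_{sw}}{2L}\,(1-D)\,V_{out}
\quad\Longrightarrow\quad
L_{\min} = \frac{T_{sw} V_{out}}{2\, I_{\text{out-min}}}\, D\,(1-D)^2
\]

Unlike the buck case, the required inductance here depends on the duty cycle through $D(1-D)^2$, so the worst-case duty cycle over the operating range would need to be considered when sizing the inductor.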

**A.3 Buck-Boost Switching Converters**

Buck-boost topology is a combination of the two basic configurations. Components used are similar to the buck converter but connected in a different configuration as shown in Figure A.10.

The operation of the buck-boost converter is fairly simple. If the inductor current never stays at zero, it is said that the converter is operating in Continuous Conduction Mode (CCM). As shown in Figure A.11, there are two operational states.

In the first state, the transistor is on, current builds up in the inductor, and the diode is reverse-biased, isolating the output stage. In the second state, the transistor is turned off. The current in the inductor cannot change instantly, so it finds its way through the diode. The inductor voltage is now in parallel with the output voltage. Since the supply is disconnected from the circuit, the inductor current decreases as energy is transferred from the inductor to the load.

In steady-state operation, the integral of the inductor voltage over one time period, in other words the average of the inductor voltage, must be zero. Therefore

$$V_{DD} t_{on} + (-V_{out}) t_{off} = 0$$

or

$$\frac{-V_{out}}{V_{DD}} = \frac{t_{on}}{t_{off}} = \frac{D}{1-D} \tag{A.5}$$

which implies that the converter has a non-linear ideal DC gain even in CCM. Also in steady state, the inductor average current is equal to the sum of the input average current and the output average current. Suppose the DC load current is decreased slowly. The average value of the inductor current falls to the point that the minimum inductor current reaches zero. At this time the average inductor current is

$$I_L = \frac{1}{2} \Delta I_L = \frac{1}{2} I_{L,max} = \frac{1}{2} \frac{T_{sw} V_{DD}}{L} D = \frac{T_{sw} V_{out}}{2L} (1-D) .$$

Noting that $I_L = I_{DD} + I_{out} = \frac{D}{1-D} I_{out} + I_{out}$, this equation gives the minimum inductance needed to keep the converter in CCM at the minimum design load current. If the DC load current is decreased further, the inductor discharges completely by the end of the commutation cycle, so the minimum inductor current stays at zero. This is called Discontinuous Conduction Mode (DCM).
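As with the boost converter, the boundary condition can be sketched numerically. The following is a minimal sketch, assuming ideal components and treating $V_{out}$ as a magnitude. It uses the boundary current $I_L = \frac{1}{2} \frac{T_{sw} V_{DD}}{L} D$ together with $I_L = \frac{D}{1-D} I_{out} + I_{out} = \frac{I_{out}}{1-D}$. The function names and the numerical values are illustrative assumptions.

```python
def buck_boost_gain_magnitude(duty):
    """Ideal CCM buck-boost gain magnitude |V_out| / V_DD = D / (1 - D)."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must satisfy 0 <= D < 1")
    return duty / (1.0 - duty)


def buck_boost_min_inductance(v_dd, i_out_min, duty, t_sw):
    """Smallest inductance that keeps an ideal buck-boost converter in CCM.

    At the CCM/DCM boundary, I_L = T_sw * V_DD * D / (2 * L), and the
    average inductor current is I_L = I_DD + I_out = I_out / (1 - D).
    Solving for L gives L = T_sw * V_DD * D * (1 - D) / (2 * I_out).
    """
    return t_sw * v_dd * duty * (1.0 - duty) / (2.0 * i_out_min)


# Hypothetical operating point, chosen only for illustration.
v_dd = 1.0          # supply voltage in volts
duty = 0.25         # duty cycle D
t_sw = 1e-9         # switching period in seconds
i_out_min = 0.05    # minimum design load current in amperes

gain = buck_boost_gain_magnitude(duty)
l_min = buck_boost_min_inductance(v_dd, i_out_min, duty, t_sw)
print(f"|V_out| = {v_dd * gain:.3f} V, L_min = {l_min * 1e9:.3f} nH")
```

With a duty cycle below 0.5 the gain magnitude is less than one (buck behaviour), and above 0.5 it exceeds one (boost behaviour).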

**B On-Chip Passive Components**

**B.1 Inductors**

An inductor is an integral part of any switch-mode power converter. Traditionally, magnetic materials are used in the construction of inductors to confine the magnetic field close to the coil, thereby increasing the inductance. Magnetics on silicon have been introduced before in the literature [46]. However, to keep the inductor design compatible with a conventional CMOS process, coreless inductors are used in this work.

A simplified π model of an inductor is shown in Figure B.1. It consists of an ideal inductance $L_{series}$, a series resistance $R_{series}$ representing the ohmic losses in the coil, inductor capacitances $C_{s1}$ and $C_{s2}$, and substrate resistances $R_{s1}$ and $R_{s2}$. The values of these components can be derived using the ASITIC software [40].

![Figure B.1. A Simplified π model of an inductor](image)

In CMOS processes, the silicon substrate has a relatively low resistivity, and eddy currents in the silicon can be considerable. Because an eddy current creates a magnetic field that opposes the applied magnetic field, its effect is seen as a reduced net flux and thus a reduced inductance. Since different substrate structures have different resistivities, they have different effects on the inductance.

Any coupled currents in the substrate will increase the substrate noise because they change the substrate voltage. Consequently, a metal Patterned Ground Shield (PGS) is placed between the inductor coil and the substrate [47]. By using strings of ground-substrate contacts, any induced current in the substrate is also shorted to the system ground at regular intervals. The inductor characteristics become independent of the substrate structure, and eddy currents in the substrate may also be reduced. Among the different patterns that have been introduced and studied in the literature, the wide bar pattern shown in Figure B.2 avoids eddy current paths and is used in this work [48].

Using only the higher metal layers for the inductor and the lowest metal layer for the PGS keeps the inductor high above the PGS. Excluding the lower metal layers from the inductor also reduces $C_{s1}$ and $C_{s2}$. In addition, block-out masks can be applied during fabrication to keep the doping level under the spiral coil at a minimum, which maximizes $R_{s1}$ and $R_{s2}$ [49].

The effect of high frequencies on inductor characteristics has previously been studied in [50]. Using ASITIC [40], those effects are illustrated in Figure B.3 for the following inductor in a 1P7M2T 90nm process: a one-turn octagon inductor with an external radius of 300µm and a width of 50µm. The two thick metal layers of the process ($M6$ and $M7$) are put in parallel to reduce the series resistance $R_{series}$. ASITIC simulation for the inductor used in the buck converter chip shows that at 3GHz, an inductance value of 320pH with $R_{series} = 260m\Omega$, $C_{s1} = C_{s2} = 140fF$, $R_{s1} = R_{s2} = 280\Omega$ and quality factor of 20 is achieved.
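As a rough cross-check of the values quoted above, the quality factor of the series branch alone can be estimated as $Q \approx \omega L_{series} / R_{series}$. The following is a minimal sketch under that assumption; it ignores the shunt branches ($C_{s1}$, $C_{s2}$, $R_{s1}$, $R_{s2}$) of the π model and is therefore expected to come out somewhat higher than the simulated value of 20. The function name is hypothetical and is not part of ASITIC.

```python
import math


def series_branch_q(freq_hz, l_series, r_series):
    """Estimate Q from the series branch only: Q = 2*pi*f*L / R.

    Shunt capacitance and substrate resistance are ignored, so this is
    an optimistic estimate rather than the full pi-model quality factor.
    """
    return 2.0 * math.pi * freq_hz * l_series / r_series


# Values quoted in the text for the buck converter inductor at 3 GHz.
q_series = series_branch_q(freq_hz=3e9, l_series=320e-12, r_series=0.260)
print(f"Series-branch Q estimate: {q_series:.1f}")
```

The estimate is of the same order as the reported quality factor, which suggests that series resistance is the dominant loss mechanism at this frequency, with the remaining difference plausibly attributable to the substrate branches.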

In Figure B.3, as the frequency increases, $R_{series}$ increases mainly due to skin and proximity effects. Near the frequency at which $Q$ reaches its maximum, $R_{series}$ starts to decrease rapidly. It is believed that the decrease is caused by coupling through the silicon substrate. It has been reported that adding a parallel combination of resistance and capacitance to the $\pi$ model can increase the accuracy of the inductor model, because coupling mechanisms through the silicon substrate are resistive-coupling dominant at low frequencies and capacitive-coupling dominant at high frequencies [50].

![Figure B.3. Effect of high frequency on an inductor characteristic](image)

**B.2 Capacitors**

In CMOS, a few different types of capacitors are available, including MIM, fractal, and MOSFET gate capacitors. MIM capacitors are manufactured using special metal layers, so they can be accurately characterized, but they have a low capacitance density. Fractal capacitors are made of geometrically shaped regular metal layers. They have a higher capacitance density, but the capacitance can vary with the fabrication process [51]. MOSFET gate capacitors have the highest capacitance density, but they are non-linear [52] and require a DC bias voltage to operate.

In switch-mode power converters, capacitors are used as bulk energy storage devices. In this work, an array of hundreds of NMOS devices in parallel is used to accommodate the high capacitance needed. The nonlinear behavior of gate capacitance is not significant in power converter applications because capacitance can be predicted according to the working voltage.

Using the procedure given in [17], the effect of gate voltage on gate capacitance is presented in Figure B.4 for a 1µm² NMOS device in 90nm CMOS technology.

![Figure B.4. Gate capacitance vs. gate voltage for an NMOS device](image)

As shown in Figure B.4, for voltages higher than 0.75V, the capacitance density is around $C_{ox} = 12\text{fF}/\mu\text{m}^2$. The gate capacitance of one transistor can then be calculated using $C = W \cdot L \cdot C_{ox}$, in which the product $W \cdot L$ represents the gate area of the transistor. To increase the gate area, transistors are usually designed with a length much greater than the minimum allowed by the technology. MOSFET-based capacitors have been studied in [52] using the distributed resistor and capacitor model shown in Figure B.5.
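The sizing of a gate-capacitor array follows directly from $C = W \cdot L \cdot C_{ox}$. The following is a minimal sketch, assuming the device is biased above 0.75V so that the capacitance density of about $12\text{fF}/\mu\text{m}^2$ applies. The device dimensions and the target capacitance are hypothetical values chosen for illustration and do not describe the fabricated chips.

```python
import math

C_OX_F_PER_UM2 = 12e-15  # capacitance density quoted for bias above 0.75 V


def gate_capacitance(width_um, length_um, c_ox=C_OX_F_PER_UM2):
    """Gate capacitance of one device in farads: C = W * L * C_ox."""
    return width_um * length_um * c_ox


def devices_needed(target_farads, width_um, length_um):
    """Number of identical parallel devices needed to reach target_farads."""
    per_device = gate_capacitance(width_um, length_um)
    return math.ceil(target_farads / per_device)


# Hypothetical device of W = 10 um, L = 1 um and a 100 pF target.
c_one = gate_capacitance(10.0, 1.0)
n_devices = devices_needed(100e-12, 10.0, 1.0)
print(f"{c_one * 1e15:.0f} fF per device, {n_devices} devices in parallel")
```

Because the devices are connected in parallel, their capacitances add, so the device count scales linearly with the required bulk capacitance.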

![Figure B.5. Model of a MOSFET gate capacitor](image)

The internal resistance of such a capacitor, commonly known as the Equivalent Series Resistance ($ESR$), consists of two parts: gate resistance $R_G$ and channel resistance $R_{ch}$. Equation B.1 summarizes the relationship between $ESR$ and transistor aspect ratio.

While Equation B.1 gives the \( ESR \) of one device, in practice many gate capacitors are put in parallel. The \( ESR \) given by Equation B.1 is then divided by the number of parallel devices in the capacitor structure to obtain the total resistance of the capacitor.

Equation B.1 suggests that \( ESR \) exhibits a minimum at a certain device aspect ratio. The minimum \( ESR \) is independent of the capacitance value, since it depends not on the size of the MOSFET capacitor but on its shape, \( i.e., \) its aspect ratio. Equation B.1 is plotted in Figure B.6 for a 90nm CMOS design kit using MATLAB rather than Cadence, because the Cadence schematic simulation engine only considers \( R_{ch} \). In Cadence, post-layout simulation based on extracted component values is needed for the effect of \( R_G \) to be included. As shown in Figure B.6, a \( W/L \) of about 10 minimizes the \( ESR \). A reduced \( ESR \) not only decreases power loss in the capacitor, but also lowers the voltage ripple across it.

The effect of high frequency on the capacitance and series resistance of gate capacitors in CMOS technology has been studied in [53], which shows no significant change at the frequencies of interest in this work.

\[
ESR = R_{ch} + R_G = K_0 \frac{L}{W} + K_{10} \frac{W}{L} \tag{B.1}
\]
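The location of the minimum can be sketched analytically. Writing Equation B.1 in terms of the aspect ratio \( r = W/L \) gives \( ESR(r) = k_{ch}/r + k_G \, r \), whose derivative vanishes at \( r = \sqrt{k_{ch}/k_G} \), where \( ESR = 2\sqrt{k_{ch} k_G} \). The following is a minimal sketch of that calculation. The symbols \( k_{ch} \) and \( k_G \) stand for the channel and gate coefficients of Equation B.1, and the numerical values are hypothetical, chosen only so that the optimum falls at \( W/L = 10 \); they are not values from the 90nm design kit.

```python
import math


def esr_single(aspect_ratio, k_ch, k_g):
    """ESR of one device: channel term k_ch * (L/W) plus gate term k_g * (W/L)."""
    return k_ch / aspect_ratio + k_g * aspect_ratio


def optimal_aspect_ratio(k_ch, k_g):
    """W/L that minimizes the ESR, from d(ESR)/d(W/L) = 0."""
    return math.sqrt(k_ch / k_g)


def esr_array(aspect_ratio, k_ch, k_g, n_parallel):
    """ESR of n identical devices in parallel: single-device ESR divided by n."""
    return esr_single(aspect_ratio, k_ch, k_g) / n_parallel


# Hypothetical coefficients in ohms (assumed, not from the design kit).
K_CH, K_G = 100.0, 1.0

r_opt = optimal_aspect_ratio(K_CH, K_G)
esr_min = esr_single(r_opt, K_CH, K_G)
print(f"optimal W/L = {r_opt:.1f}, single-device ESR = {esr_min:.1f} ohm")
print(f"ESR of 500 parallel devices = {esr_array(r_opt, K_CH, K_G, 500):.3f} ohm")
```

The optimum depends only on the ratio of the two coefficients, which is consistent with the observation that the minimum \( ESR \) is set by the shape of the device rather than by its size.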