# BRNO UNIVERSITY OF TECHNOLOGY

# Faculty of Electrical Engineering and Communication

# MASTER'S THESIS

Brno, 2022

Bc. Vojtěch Král



## **DEPARTMENT OF RADIO ELECTRONICS**

# SAVING POWER AND AREA WITH MULTI-BIT PULSED LATCHES

#### MASTER'S THESIS

AUTHOR: Bc. Vojtěch Král

SUPERVISOR: Ing. Jiří Dřínovský, Ph.D.



## **Master's Thesis**

Master's study program Electronics and Communication Technologies

Department of Radio Electronics

Student: Bc. Vojtěch Král

ID: 195671

Year of study:

Academic year: 2021/22

TITLE OF THESIS:

#### Saving power and area with multi-bit pulsed latches

#### INSTRUCTION:

Study the possibilities of reducing the power consumption of the digital parts of an ASIC design. In particular, study in more detail the pulse generator topologies that drive latches; such a design can replace widely used sequential circuits. While studying the open literature, focus on topologies with low power consumption and small area. Also take into account the balance between the simplicity of the design and its reliability under the conditions stated in the assignment. Choose the most appropriate topology and design a multi-bit pulsed latch. The required operating parameters of the multi-bit pulsed latch in onsemi 65 nm technology are: the presence of a scan input, a version without a reset pin and a version with a reset pin, a supply voltage of 1.2 V $\pm$ 10 %, and a temperature range from -40 °C to 150 °C. Also determine the frequency range in which the multi-bit pulsed latches can operate.

Follow up on the results of your semester project and create a complete design of a multi-bit pulsed latch circuit in onsemi 65 nm technology, including simulations using the Spectre simulator. Optimize the design for power consumption and for minimum layout area. In the next step, create layouts of the basic types of multi-bit pulsed latches and perform simulations with netlists extracted from the layout, including parasitic elements. Try to demonstrate the positive impact on power consumption and area in a simple design by replacing classic multi-bit flip-flops with multi-bit pulsed latches.

#### RECOMMENDED LITERATURE:

[1] E. Consoli, G. Palumbo, J. M. Rabaey and M. Alioto, "Novel Class of Energy-Efficient Very High-Speed Conditional Push–Pull Pulsed Latches," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 7, pp. 1593-1605, July 2014, doi: 10.1109/TVLSI.2013.2276100.

[2] V. Ahuja, P. T. Karule and U. S. Ghodeswar, "Design of high speed conditional push pull pulsed latch," 2016 International Conference on Communication and Signal Processing (ICCSP), 2016, pp. 1374-1378, doi: 10.1109/ICCSP.2016.7754378.

Date of project specification: 11.2.2022

Deadline for submission: 25.5.2022

Supervisor: Ing. Jiří Dřínovský, Ph.D.

Consultant: Ing. Jiří Šolc

**prof. Dr. Ing. Zbyněk Raida** Chair of study program board

Faculty of Electrical Engineering and Communication, Brno University of Technology / Technická 3058/10 / 616 00 / Brno

#### WARNING:

The author of the Master's Thesis claims that by creating this thesis he/she did not infringe the rights of third persons and the personal and/or property rights of third persons were not subjected to derogatory treatment. The author is fully aware of the legal consequences of an infringement of provisions as per Section 11 and following of Act No 121/2000 Coll. on copyright and rights related to copyright and on amendments to some other laws (the Copyright Act) in the wording of subsequent directives including the possible criminal consequences as resulting from provisions of Part 2, Chapter VI, Article 4 of Criminal Code 40/2009 Coll.


## Abstract

This thesis describes the types of power consumption in CMOS technology and the low power techniques that can be used in application-specific integrated circuits. It describes the multi-bit pulsed latch as one of these low power techniques, which can serve as a better replacement for a standard multi-bit master-slave flip flop. The multi-bit pulsed latch is composed of two parts: a pulse generator and a pulsed latch. Several usable topologies are discussed, and topologies are chosen for their optimized area and power consumption. A schematic of the multi-bit pulsed latch is designed from the chosen topologies and compared with a schematic of the standard multi-bit flip flop. The required layouts of multi-bit pulsed latches are then created and compared with standard layouts of multi-bit flip flops. The designed multi-bit pulsed latches are also simulated in a simple design.

## Keywords

Pulsed latch, flip flop, multi-bit pulsed latch, low power techniques, pulse generator, CMOS, low power consumption


## Extended abstract

This thesis deals with the types of power losses that arise in CMOS technology. CMOS is the most widely used technology for chip manufacturing, and low power techniques are applied in it to reduce power consumption. The multi-bit pulsed latch is regarded as one of the techniques for achieving low power consumption and small area. A standard multi-bit master-slave flip flop can be replaced with a multi-bit pulsed latch, which has better properties in terms of area and power consumption. The multi-bit pulsed latch consists of two parts: a pulse generator and a pulsed latch. A pulsed latch is a latch that can be controlled by a short pulse instead of a conventional square-wave signal, which usually has a 1:1 duty ratio. The pulse is created in the pulse generator, which is designed so that the pulse is long enough to control the pulsed latch. The thesis compares different types of pulse generators and pulsed latches, and the topology is chosen on the basis of suitable parameters. From the chosen pulse generator and pulsed latch, it is possible to build a multi-bit pulsed latch that has better power consumption and area than the single-bit variant. The multi-bit pulsed latch has better power consumption and area because it has fewer transistors: one generator drives several pulsed latches. A schematic of the multi-bit pulsed latch is created from the chosen topology and compared with the schematic of a multi-bit flip flop. Layouts of the required multi-bit pulsed latches are then created and compared with standard layouts of multi-bit flip flops. The minimum and maximum usable frequencies of all circuits are also determined. At the end of the thesis, the designed multi-bit pulsed latches are simulated in a simple design, which confirms that multi-bit pulsed latches can be used as a replacement for standard multi-bit flip flops.

## **Bibliographic citation**

KRÁL, Vojtěch. *Úspora spotřeby a plochy při použití multi-bitových pulsních klopných obvodů* [online]. Brno, 2022 [cit. 2022-05-14]. Available from: https://www.vutbr.cz/studenti/zav-prace/detail/141542. Master's thesis. Brno University of Technology, Faculty of Electrical Engineering and Communication, Department of Radio Electronics. Supervisor Jiří Dřínovský.

## **Author's Declaration**

| Author:        | Bc. Vojtěch Král                                    |
|----------------|-----------------------------------------------------|
| Author's ID:   | 195671                                              |
| Paper type:    | Master's Thesis                                     |
| Academic year: | 2021/22                                             |
| Topic:         | Saving power and area with multi-bit pulsed latches |

I declare that I have written this paper independently, under the guidance of the advisor and using exclusively the technical references and other sources of information cited in the project and listed in the comprehensive bibliography at the end of the project.

As the author, I furthermore declare that, with respect to the creation of this paper, I have not infringed any copyright or violated anyone's personal and/or ownership rights. In this context, I am fully aware of the consequences of breaking Regulation § 11 of the Copyright Act No. 121/2000 Coll. of the Czech Republic, as amended, and of any breach of rights related to intellectual property or introduced within amendments to relevant Acts such as the Intellectual Property Act or the Criminal Code, Act No. 40/2009 Coll., Section 2, Head VI, Part 4.

Brno, May 10, 2022

\_\_\_\_\_

author's signature

## Acknowledgement

I would like to thank my supervisor and boss Jiří Šolc, who let me make this thesis on multi-bit pulsed latches. I would also like to thank my colleagues both at work and at school who helped me when I needed it the most. A big thank you also goes to my close family for their unlimited help in difficult times and my girlfriend Eliška Křížová for her unlimited positive energy and belief in me. Thank you, everybody.

Brno, May 10, 2022

-----

Author's signature

# Contents

- FIGURES
- TABLES
- INTRODUCTION
- 1. CMOS TECHNOLOGY
  - 1.1 Fundamentals of CMOS transistors
  - 1.2 Digital standard cell library
  - 1.3 Parameters of CMOS transistors
- 2. POWER CONSUMPTION IN CMOS TECHNOLOGY
  - 2.1 Switching power
  - 2.2 Short-circuit power
  - 2.3 Leakage power
    - 2.3.1 Subthreshold leakage
    - 2.3.2 Thin gate-oxide tunneling leakage
    - 2.3.3 Reverse-bias pn-junction leakage
- 3. LOW POWER TECHNIQUES
  - 3.1 System
  - 3.2 Place & route
  - 3.3 Algorithm
  - 3.4 Architecture
  - 3.5 Logic
  - 3.6 Technology
  - 3.7 Topology and circuit
- 4. MULTI-BIT PULSED LATCH
  - 4.1 Pulse generator
    - 4.1.1 Suitable pulse generators
    - 4.1.2 Pulse generators for special purpose
  - 4.2 Pulsed latch
  - 4.3 Multi-bit pulsed latch
  - 4.4 Comparation of multi-bit flip flops and multi-bit pulsed latches in simple simulation from schematic netlist
  - 4.5 Layouts of designed multi-bit pulsed latches
  - 4.6 Testing in simple design
- 5. CONCLUSION
- LITERATURE
- SYMBOLS AND ABBREVIATIONS

# **FIGURES**

- Figure 1.1 - Number of transistors in integrated circuits in dependence on years [7]
- Figure 1.2 - Symbol and functions of NMOS and PMOS transistors [9]
- Figure 1.3 - The side view of NMOS (left) and PMOS (right) transistors [3]
- Figure 1.4 - Layout of inverter in 65 nm technology
- Figure 1.5 - Zoomed NMOS transistor from inverter cell with parameters of the transistor
- Figure 1.6 - The schematic of an inverter
- Figure 2.1 - Dependence of normalized power on years [10]
- Figure 2.2 - Leakage components are shown in layout and schematic (I1 - Diode reverse bias current, I2 - Subthreshold current, I3 - Gate-induced drain leakage, I4 - Thin-oxide tunneling leakage) [11]
- Figure 3.1 - Example of AND glitch generator
- Figure 3.2 - Differences between (a) normal, (b) parallelism, and (c) pipelined circuit [3]
- Figure 4.1 - Block diagram of multi-bit pulsed latch in normal mode
- Figure 4.2 - Design of global and local pulse generator
- Figure 4.3 - Block diagram of multi-bit pulsed latch in scan mode
- Figure 4.4 - Block diagram of the simple pulse generator
- Figure 4.5 - Simulated block diagram from Figure 4.4 in MATLAB
- Figure 4.6 - Simple pulse generator made of inverter and 2-input AND
- Figure 4.7 - Graph of dependence of inverter delay on transistor channel length
- Figure 4.8 - Stacked inverter
- Figure 4.9 - Double-stacked inverter
- Figure 4.10 - Five inverters delay (top) and three inverters delay (bottom)
- Figure 4.11 - Double stacked inverter with 2-input logical AND
- Figure 4.12 - Generated pulses across the PVTs from chosen generator
- Figure 4.13 - Block diagram of the pulse generator with 2-input NOR gate
- Figure 4.14 - Simulated block diagram from Figure 4.13 in MATLAB
- Figure 4.15 - Circuit of the pulse generator with NOR logical gate
- Figure 4.16 - Circuit of the pulse generator with XOR logical gate
- Figure 4.17 - PTLA topology of a pulsed latch
- Figure 4.18 - SSALA topology of a pulsed latch
- Figure 4.19 - SSA2LA topology of a pulsed latch
- Figure 4.20 - CPNLA topology of a pulsed latch
- Figure 4.21 - PPCLA topology of a pulsed latch
- Figure 4.22 - PPCLA modified topology of a pulsed latch
- Figure 4.23 - Standard 2-bit multiplexer
- Figure 4.24 - Pass gate topology of 2-bit multiplexer
- Figure 4.25 - Pulsed latch with scan mode and negated output
- Figure 4.26 - 2-bit pulsed latch with scan mode and negated output
- Figure 4.27 - Used topology of the pulsed latch with a multiplexer and reset pin
- Figure 4.28 - Graphs of 2-bit pulsed latch in scan mode
- Figure 4.29 - Graphs of 2-bit pulsed latch in normal mode
- Figure 4.30 - The 4-bit pulsed latch with scan mode and negated output
- Figure 4.32 - Layout of the plain 2-bit pulsed latch
- Figure 4.33 - Layout of the plain 4-bit pulsed latch
- Figure 4.35 - Layout of the reset 2-bit pulsed latch
- Figure 4.36 - Layout of the reset 4-bit pulsed latch
- Figure 4.37 - Layout of the reset 8-bit pulsed latch
- Figure 4.38 - Graph of the comparison of the area size of plain multi-bit pulsed latches and standard plain multi-bit flip flops
- Figure 4.39 - Graph of the comparison of the area size of reset multi-bit pulsed latches and standard reset multi-bit flip flops
- Figure 4.40 - Graph of the comparison of the power consumption in normal mode of plain multi-bit pulsed latches and standard plain multi-bit flip flops
- Figure 4.41 - Graph of the comparison of the power consumption in normal mode of reset multi-bit pulsed latches and standard reset multi-bit flip flops
- Figure 4.42 - Graph of the comparison of the power consumption in scan mode of plain multi-bit pulsed latches and standard plain multi-bit flip flops
- Figure 4.43 - Graph of the comparison of the power consumption in scan mode of reset multi-bit pulsed latches and standard reset multi-bit flip flops

## **TABLES**

- Table 1.1 - List of common cells in the digital standard cell library
- Table 1.2 - Truth table of an inverter
- Table 3.1 - Different design levels with examples of low power techniques
- Table 3.2 - Binary and Gray code sequence
- Table 4.1 - Example of values from the graph in Figure 4.7
- Table 4.2 - Parameters of simulated pulse generators
- Table 4.3 - Table of specific cross corners
- Table 4.4 - Parameters of special simulated pulse generators
- Table 4.5 - The comparison of latches controlled by pulse
- Table 4.6 - Maximal frequency dependence on the number of bits
- Table 4.7 - A comparison of the 2-bit pulsed latch and the 2-bit flip flop
- Table 4.8 - A comparison of the 4-bit pulsed latch and 4-bit flip flop
- Table 4.9 - A comparison of the 8-bit pulsed latch and 8-bit flip flop
- Table 4.10 - Measured power consumption of the multi-bit pulsed latches in normal mode
- Table 4.11 - Measured power consumption of the multi-bit pulsed latches in scan mode
- Table 4.12 - Measured area of the designed multi-bit pulsed latches
- Table 4.13 - Maximal usable frequency of the multi-bit pulsed latches
- Table 4.14 - Overview of the measured parameters in designed multi-bit pulsed latches
- Table 4.15 - Overview of parameters in standard multi-bit flip flops
- Table 4.16 - Table of area comparison of multi-bit pulsed latches and multi-bit flip flops
- Table 4.17 - Table of power consumption of multi-bit pulsed latches and multi-bit flip flops in normal mode
- Table 4.18 - Table of power consumption of multi-bit pulsed latches and multi-bit flip flops in scan mode
- Table 4.19 - Used parameters in a simple design simulation
- Table 4.20 - Measured power consumption of 64-bit shift register
- Table 4.21 - Measured area size of the 64-bit shift register

## INTRODUCTION

We are living at a time when we should think about the future of the Earth more than ever before, because our carbon footprint is larger than ever. Carbon emissions contribute to global warming, which is also driven by other greenhouse gases. These gases are currently produced mainly by the industrial and automotive end markets, but also when energy is generated from non-renewable resources such as coal. Many companies are now promising to act, and onsemi is one of them: the company has said it will achieve net-zero emissions by 2040 [12].

Energy demand grows every year because more and more devices must be supplied with power. The automotive market includes chips developed for automotive-specific applications. Each of these chips has a significant power consumption, and the total is even bigger when many of them are put together: a modern car contains several hundred chips. These chips can be developed in different technologies with different power consumption, but they can also be developed with low power or ultra-low power solutions that minimize power consumption. Handheld devices are another example: they are powered by batteries and therefore have a limited power supply. If their power consumption is not optimized, either the battery life is short or the battery has to be very large. Both outcomes are undesirable.

This thesis begins with a short introduction to CMOS technology, the most widely used technology for chips, in which its basics are explained. It also explains where CMOS chips lose power. Methods that can be used to reduce the power and area of application-specific integrated circuits (ASIC) are then described. The digital part of a chip is built from a library of digital standard cells, also known as building blocks; digital standard cell libraries are explained too. A multi-bit pulsed latch can be included in a digital standard cell library, where it can serve as a better replacement for standard multi-bit flip flops, which have bigger area and power consumption.

The multi-bit pulsed latch is composed of a pulse generator and a latch that can be driven by a pulse. The aim of this work is to choose a pulse generator and a latch topology for 65 nm CMOS technology that meet the specification given in the assignment: a supply voltage tolerance of $\pm 10$ %, a temperature range from -40 °C to 150 °C, and an option to have a scan input. There should also be versions of the multi-bit pulsed latch with and without a reset pin. The pulse generator and pulsed latch topologies give different results, which will be explained. From the chosen pulse generator and pulsed latch, a multi-bit version with scan can be built. Multi-bit cells are often designed in 2-bit, 4-bit, and 8-bit versions. The more bits a multi-bit pulsed latch has, the bigger the savings.

These multi-bit pulsed latches are first simulated from the schematic netlist to verify their functionality. The simulation is run across the process corners, voltages, and temperatures (PVTs) mentioned above. The most problematic circuit is the 8-bit pulsed latch with a scan input, because the pulse drives the biggest load and must stay within a specific range that depends on the technology, the latch topology used, and the point where the scan is connected to the next latch. The power consumption and estimated area of all designed multi-bit pulsed latches, simulated from the schematic, are compared with those of standard multi-bit flip flops. The most time-consuming part of the development is designing layouts from the schematics. The advantage of a layout is that parasitic extraction can be run on it, producing a netlist with parasitic elements that can then be simulated. At the end of this work, the designed layouts of multi-bit pulsed latches are compared with standard layouts of multi-bit flip flops and then tested in a simple design.
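To make the idea of a PVT sweep concrete, the following minimal Python sketch enumerates the combinations implied by the assignment (1.2 V $\pm$ 10 % and -40 °C to 150 °C). The process corner names and the function `pvt_corners` are assumptions introduced only for illustration; the corners actually simulated in this thesis are defined by the technology and may differ.

```python
from itertools import product

NOMINAL_VDD = 1.2             # V, nominal supply voltage from the assignment
VDD_TOLERANCE = 0.10          # +/-10 %, from the assignment
TEMPERATURES_C = (-40, 150)   # temperature limits from the assignment
# Assumed, commonly used corner names; not taken from the thesis.
PROCESS_CORNERS = ("tt", "ff", "ss", "fs", "sf")


def pvt_corners(process=PROCESS_CORNERS, vdd=NOMINAL_VDD,
                tol=VDD_TOLERANCE, temps=TEMPERATURES_C):
    """Return every process/voltage/temperature combination as a dict."""
    voltages = (round(vdd * (1 - tol), 3), vdd, round(vdd * (1 + tol), 3))
    return [
        {"process": p, "vdd": v, "temp_c": t}
        for p, v, t in product(process, voltages, temps)
    ]


if __name__ == "__main__":
    corners = pvt_corners()
    print(len(corners), "PVT combinations")
    for corner in corners[:3]:
        print(corner)
```

Each dictionary could then be turned into one simulator run; how the corners are passed to the simulator is outside the scope of this sketch.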

## **1. CMOS TECHNOLOGY**

The first working transistor was invented in 1947 by the American physicists John Bardeen and Walter Brattain. A great deal of research has been done since then, and it has had a strong influence on today's devices. Today's transistors are part of integrated circuits, small electronic parts developed for a special purpose. These transistors can be of different types, but CMOS transistors are used inside integrated circuits more often than bipolar transistors because they are faster, consume less power, and occupy a smaller area on the chip. The first integrated circuit was invented in 1959 by Robert Noyce. A few years later, in 1965, Gordon Moore, an American businessman and co-founder of Fairchild Semiconductor, predicted that the number of transistors in integrated circuits would double every year. This prediction is called Moore's law [3]. The actual trend is shown in Figure 1.1.



Figure 1.1 - Number of transistors in integrated circuits in dependence on years [7]
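The doubling rule described above can be written as $N(t) = N_0 \cdot 2^{(t - t_0)/T}$, where $T$ is the doubling period. The short Python sketch below evaluates this expression; the function name, the starting count, and the starting year are hypothetical values chosen only to show the growth rate and are not data read from Figure 1.1.

```python
def projected_transistors(n0: float, start_year: int, year: int,
                          doubling_period_years: float = 1.0) -> float:
    """Project a transistor count with N(t) = N0 * 2 ** ((t - t0) / T)."""
    return n0 * 2 ** ((year - start_year) / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point used only for illustration.
    for year in (1965, 1970, 1975):
        print(year, round(projected_transistors(64, 1965, year)))
```

Changing `doubling_period_years` shows how sensitive the projection is to the assumed doubling period.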

#### **1.1 Fundamentals of CMOS transistors**

As mentioned above, CMOS transistors are often used in integrated circuits because of their advantages over bipolar transistors. CMOS stands for complementary metal-oxide-semiconductor. The word 'complementary' means that two types of transistors, N-channel and P-channel, complement each other. Their schematic symbols and functions are shown in Figure 1.2. The schematic symbol has three pins: D (drain), G (gate), and S (source). In chip development, there is also a fourth pin called B (bulk), which is the substrate of the transistor. The gate is the pin that controls the state of the transistor, as shown in Figure 1.2. The 'metal' in the name stands for the gate contact, which is made of metal in modern chips. The space under the contact is made of polysilicon, called poly for short. This contact is insulated by an oxide, as shown in Figure 1.3; silicon dioxide ($SiO_2$) is often used. In today's technology, this insulator is only a few atoms thick, which makes these devices vulnerable to electrostatic discharge (ESD), because even a small electrostatic charge can break the gate. The last part of the name stands for semiconductor, because the transistor is made of semiconductor material [3].



Figure 1.2 - Symbol and functions of NMOS and PMOS transistors [9]



Figure 1.3 - The side view of NMOS (left) and PMOS (right) transistors [3]

The structure of CMOS technology is shown in Figure 1.3, with an NMOS transistor on the left and a PMOS transistor on the right. NMOS and PMOS transistors work differently, and their parasitic features differ. For transistors of the same size, the NMOS transistor is faster and has lower leakage than the PMOS transistor. Leakage is explained in the next chapter. On the other hand, the PMOS transistor is more immune to noise than the NMOS transistor.

The application of each transistor follows from these strengths and weaknesses. For example, the NMOS transistor, with its smaller leakage, can be used in a standard cell called a fill capacitor in digital libraries.

## 1.2 Digital standard cell library

Digital standard cells are an irreplaceable part of automotive-specific applications. They are well-defined cells that can be used in a design as building blocks [18]. These cells have different views, such as schematic, layout, symbol, Verilog, Liberty, and others [3]. The building blocks are also pre-characterized to save time when they are used in a bigger design. The characterization data for the whole library, such as maximal load, input capacitance, and others, are stored in the Liberty view to make more complex simulations faster. Many types of digital standard libraries can be found in a technology; it depends on the application or on the requirements of customers. The performance of a library is given by the height of its cells. All cells in a digital standard library have the same height, so that they can be easily connected to the power rails and easily used with an automatic place and route tool [18]. Digital libraries are often developed as high-density (HD) or high-speed (HS). High-speed libraries have a greater cell height, higher speed, and bigger power consumption than high-density libraries. The width of a cell is given by its complexity, but it should be minimized to achieve the maximal density of the design. These libraries can also be designed with different transistor models that have different threshold voltages [3].



Figure 1.4 - Layout of inverter in 65 nm technology

The simplest cell is shown in Figure 1.4. It is the layout of a simple inverter composed of two transistors. The heights of the two transistors in Figure 1.4 are not the same because, as mentioned, a PMOS transistor is slower than an NMOS transistor of the same height. The height ratio of both transistors is given by the technology. These libraries can be composed of many types of common cells; the cells that are designed most often are listed in Table 1.1. The libraries can also be enriched with special digital standard cells that offer better power consumption, smaller area, a special purpose, or even better performance. The following can be considered special digital standard cells: a level shifter, power-down cell, isolation cell, state retention flip flop, dual or multi-bit flip flops, dual edge-triggered flip flop, low swing dual edge-triggered flip flop, and a multi-bit pulsed latch. This study is primarily focused on multi-bit pulsed latches [3].

| Name of cell | Description of the cells |
|--------------|--------------------------|
| BUF, INV, AND, OR, NAND, NOR, XOR, XNOR | Simple logical functions with multi-inputs and different output strengths |
| HALF / FULL ADDER | 2-bit half or full adder with different output strengths |
| MUX / DEMUX | Multiplexer or demultiplexer with different output strengths |
| ECO CELLS | Universal cells that can be used in case of need |
| AOI / OAI | Multi-input and/or or or/and logical combination |
| FLIP FLOPS / SCAN FLIP FLOPS | Flip flops (plain, reset, set) with different output strengths, scan flip flop can be used as a shifter |
| LATCHES | Flip flop controlled with level |
| FILLER / FILLCAP | The cell can connect power rails or can be used as decoupling capacitances |
| DELAYS | Used for compensation of STA violations |
| CLOCK GATING CELLS | Used to synchronize the clock signal |

Table 1.1 - List of common cells in the digital standard cell library

Digital standard cell libraries can be designed in each technology, and, as mentioned, one technology may offer several libraries for different uses. The technology specifies the minimum length of transistors that can be used in the library. Over the years, transistors have been made smaller and smaller. Currently, the smallest technology is 3 nm [13]. The technology used in this thesis is the 65 nm technology from onsemi.

The reason why the size of the technology keeps shrinking is simple. If the transistor is smaller, the power supply voltage can be lower as well, which means that the power consumption is lower too. The maximal possible frequency can also be higher. The density is higher, which means less material is used and more chips can be made on a wafer of the same size. These are the main reasons why CMOS technology is moving forward. Smaller technology also has some problems. One disadvantage is the cost of development and manufacturing, because smaller technology requires more advanced processes. Another is leakage power, which is more significant in smaller technology than in bigger technology. Leakage is explained in more detail in the next chapter because it is linked with power consumption. Finally, higher density and performance come at the price of more complicated ESD protection and electromigration [3].

#### **1.3 Parameters of CMOS transistors**

Transistors have four main parameters: W (width), L (length), n (number of fingers), and m (multiplier). The last two are not common in digital standard cells; they are often used in analog CMOS circuits. The first two parameters are shown in Figure 1.5. They define the size of the area under the gate (polysilicon) contact. Some other parameters are given by the technology, and parasitic parameters such as resistance and capacitance depend on the size of the gate.



Figure 1.5 - Zoomed NMOS transistor from inverter cell with parameters of the transistor

As mentioned, the simplest standard cell in CMOS technology is an inverter, which implements the logical function called negation. The inverter has one input and one output. Its truth table is shown in Table 1.2. The inverter is made of two transistors: NMOS and PMOS. The schematic and layout of the circuit are shown in Figure 1.6 and Figure 1.4. Understanding this circuit is necessary because the next chapter uses it to describe power consumption in CMOS technology. The layout is shown because it contains layers that need to be explained and that will appear later in more complicated layouts.

The blue layer represents metal one (M1), which is used for internal routing. Yellow boxes represent contacts (CO) between the active layer and metal one (M1). The beige layer represents the active layer, which is the actual transistor. The top beige rectangle represents the PMOS transistor, and the bottom beige rectangle the NMOS transistor. The PMOS transistor is bigger because, as mentioned, it would be slower if both transistors had the same size. The last explained layer is red and crosses both active areas. It is the poly (polysilicon) that forms the gate of the transistors. The size of the transistor is given by the size of the poly. Other layers are not important here because they are just technology layers. The colors in the layout are not important either, because every company or technology has its own layer colors.

| Input (A) | Output (Q) |
|-----------|------------|
| 0 (Low)   | 1 (High)   |
| 1 (High)  | 0 (Low)    |

Table 1.2 - Truth table of an inverter


Figure 1.6 - The schematic of an inverter

## 2. POWER CONSUMPTION IN CMOS TECHNOLOGY

Power consumption in CMOS technology is a big issue nowadays. Devices with CMOS chips can be divided into two groups: one is powered by batteries, and the other is powered from an outlet. Battery-powered devices are limited by the capacity of the battery. If the power consumption of a handheld device is optimized, the battery can be smaller and the device can last longer than in the unoptimized state. Longer-lasting batteries also benefit the environment, because batteries cannot be recycled efficiently. Devices powered from the outlet need to be optimized for power consumption too, because otherwise power stations may not be able to generate enough energy in the future. With smaller technology, the power consumption decreases and the performance of the chip increases.

Before low power solutions can be shown, the causes of power consumption and its components need to be explained. Power consumption can be divided into two main groups: dynamic power and static power. Dynamic power is caused by switching circuits and consists of switching power and short-circuit power. Static power refers to leakage power. Each component has its own parasitic features that can affect the resulting power consumption. The total power consumption is given by equation (2.1) [3].

$$P_{total} = P_{switching} + P_{short-circuit} + P_{leakage}. \qquad (2.1)$$



Figure 2.1 - Dependence of normalized power on years [10]

The components of the total power consumption do not have the same magnitude in all technology processes, as shown in Figure 2.1. As technology processes have changed over the years, so have the magnitudes of dynamic and static power: dynamic power increases slightly, but static power increases significantly. Static power becomes dominant when the technology reaches the 65 nm process, which is the case of this thesis, so it needs to be dealt with [10]. The causes of each power component are explained on the inverter shown in Figure 1.6 in the following subchapters. Afterwards, it can be shown how to decrease power consumption with low power techniques.

#### 2.1 Switching power

Switching power is caused by charging and discharging the capacitance of the output node. This node capacitance mainly includes gate, overlap, and interconnection capacitance. That is why the layout needs to be designed precisely to minimize parasitic elements. The switching power can be expressed as equation (2.2):

$$P_{switching} = \alpha \cdot C_L \cdot f \cdot V_{DD}^2, \qquad (2.2)$$

where $\alpha$ is the switching activity factor of the clock, $C_L$ is the load capacitance connected to the output stage, $f$ is the frequency of the clock and $V_{DD}$ is the power supply voltage of the cell. The equation shows that the switching power depends on several quantities that are easily observable and measurable in CMOS circuits. These parameters need to be tailored to the application: frequency and voltage should not be set higher than the circuit performance requires. The switching activity factor of the clock is typically taken as 0.5, which corresponds to a 50 % duty cycle. In special applications, the switching activity factor can differ. The capacitive load, which depends on the layout and the output load, needs to be minimized to reduce the switching power. The switching power can be optimized with methods that change the parameters in equation (2.2) [5].
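The quadratic dependence on the supply voltage in equation (2.2) can be illustrated with a short numerical sketch. The load capacitance and clock frequency below are assumed values chosen only for illustration; the two voltages are the nominal 1.2 V and its lower 10 % bound.

```python
def switching_power(alpha, c_load, freq, vdd):
    """Switching power in watts, following equation (2.2)."""
    return alpha * c_load * freq * vdd ** 2


# Assumed operating point, not taken from the thesis measurements
ALPHA = 0.5        # switching activity factor
C_LOAD = 10e-15    # load capacitance in farads (hypothetical)
FREQ = 100e6       # clock frequency in hertz (hypothetical)

p_nominal = switching_power(ALPHA, C_LOAD, FREQ, 1.20)
p_low = switching_power(ALPHA, C_LOAD, FREQ, 1.08)   # 1.2 V minus 10 %

print(f"P at 1.20 V: {p_nominal * 1e9:.1f} nW")
print(f"P at 1.08 V: {p_low * 1e9:.1f} nW")
print(f"ratio: {p_low / p_nominal:.2f}")
```

With all other parameters fixed, lowering the supply by 10 % reduces the switching power to 0.9² = 0.81 of its nominal value, which is why voltage is usually the first parameter considered for optimization.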

#### 2.2 Short-circuit power

In digital CMOS circuits, there are always two complementary networks: the p-network (pull-up) and the n-network (pull-down). The inverter shown in Figure 1.6 can be used as an example. Normally, when the input and output states are stable, only one transistor is turned on and connects the output either to the power supply node or to the ground node. The other network is turned off and blocks the current. There is also a transition state, called short-circuit, which occurs while the circuit switches from a low logical state to a high logical state or vice versa. For a short time interval, both transistors are partially open and current flows through both of them from the power supply to the ground. The short-circuit power can be expressed as equation (2.3):

$$P_{short-circuit} = I_{SC} \cdot V_{DD} \cdot f, \qquad (2.3)$$

where $I_{SC}$ is the short-circuit current, $V_{DD}$ is the power supply voltage and $f$ is the switching frequency. The supply voltage and switching frequency in equation (2.3) need to be optimized: if the circuit does not need to be fast, the applied voltage and frequency can be lower, which helps to reduce power consumption. Dynamic power is the sum of switching power and short-circuit power. Dynamic power is more significant for technologies larger than 65 nm, as shown in Figure 2.1. For technologies smaller than 65 nm, dynamic power is less significant than the static power of the whole chip due to leakage, which is explained in the next subchapter [3].

#### 2.3 Leakage power

The static power of the whole chip becomes dominant when the process technology reaches 65 nm, which is exactly the technology this thesis deals with. Even turned-off transistors consume some power. This power loss is called leakage, and it has a variety of causes. Several of them are explained in this subchapter to give a basic idea of leakage power. As stated earlier, leakage power depends on the technology: in smaller technologies, leakage is more dominant than dynamic power, as shown in Figure 2.1. For nanometer devices, leakage current is dominated by subthreshold leakage, thin-oxide tunneling leakage, and reverse-bias pn-junction leakage. There are other leakage components, such as drain-induced barrier lowering and gate-induced drain leakage, but they are not significant in comparison with these three dominant ones. The leakage components are shown in Figure 2.2. Leakage power cannot be expressed as simply as switching power or short-circuit power, because it is caused by many variables, but it can be written as a function of these variables (2.4):

$$P_{leakage} = f\left(V_{DD}, V_{th}, \frac{W}{L}, T\right), \tag{2.4}$$


where $V_{DD}$ is the power supply voltage, $V_{th}$ is the threshold voltage of the transistor, $W/L$ is the size of the transistor (W is the width and L is the length) and $T$ is the temperature in kelvins. Switching power and short-circuit power have frequency as a variable in equations (2.2) and (2.3), so dynamic power is dissipated in cycles. Leakage power, with no frequency in function (2.4), is continuous. That is why leakage is also called static power [3].



Figure 2.2 - Leakage components are shown in layout and schematic (I1 – Diode reverse bias current, I2 – Subthreshold current, I3 – Gate-induced drain leakage, I4 – Thin-oxide tunneling leakage) [11]

#### 2.3.1 Subthreshold leakage

The subthreshold leakage current $I_{SUB}$ is a current that flows between drain and source when the transistor is in weak inversion. Weak inversion occurs when the gate-to-source voltage $V_{GS}$ is smaller than the threshold voltage $V_{th}$ of the transistor. The current flows because a small concentration of minority carriers in the area between drain and source lets current pass from drain to source. This leakage current depends on the power supply voltage, width, length, process, temperature, and type of the transistor. It can be reduced with special topologies. For example, a standard inverter (based on 2 transistors) can be replaced with a stacked inverter (based on 4 transistors). Such a topology can be a good solution for subthreshold leakage reduction, but at the cost of cell area. Subthreshold conduction can also be used as an advantage, for example in ultra-low power analog circuits, especially in dynamic random-access memories (DRAM) [3].
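The dependence on threshold voltage and temperature can be made concrete with a commonly used textbook weak-inversion model, $I_{SUB} = I_0 \, e^{(V_{GS} - V_{th})/(n V_T)} \, (1 - e^{-V_{DS}/V_T})$. This model is not taken from the thesis; the prefactor `i0`, the slope factor `n`, and both threshold voltages below are assumed values used only to show the trend.

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def subthreshold_current(v_gs, v_ds, v_th, temp_k=300.0, i0=1e-7, n=1.5):
    """Textbook weak-inversion current; i0 and n are assumed fitting parameters."""
    v_t = K_BOLTZMANN * temp_k / Q_ELECTRON   # thermal voltage kT/q
    return i0 * math.exp((v_gs - v_th) / (n * v_t)) * (1.0 - math.exp(-v_ds / v_t))


# Turned-off transistor (V_GS = 0 V) with the full 1.2 V supply across it
i_low_vth = subthreshold_current(0.0, 1.2, v_th=0.35)    # hypothetical threshold
i_high_vth = subthreshold_current(0.0, 1.2, v_th=0.45)   # hypothetical threshold
i_hot = subthreshold_current(0.0, 1.2, v_th=0.35, temp_k=423.15)  # 150 degC

print(f"leakage ratio, low vs. high threshold: {i_low_vth / i_high_vth:.1f}")
print(f"leakage ratio, 150 degC vs. 27 degC:   {i_hot / i_low_vth:.1f}")
```

In this simplified model, a threshold voltage that is 100 mV higher lowers the leakage by roughly one order of magnitude, and the leakage grows strongly with temperature. A real device model also makes `i0` and the threshold voltage temperature-dependent, which this sketch ignores.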

#### 2.3.2 Thin gate-oxide tunneling leakage

The silicon dioxide (insulator) between the active area and the gate contact is thin, about a few atoms in modern technologies. Because the insulator is so thin, two types of leakage can be found. The first is the thin gate tunneling current $I_{TUNNEL}$, which is generated by carriers tunneling through the thin insulator. The second is the hot carrier injection current $I_{HC}$. It is caused by carriers with high kinetic energy that can overcome the gate potential barrier and pass through the thin insulator. This effect is more likely for electrons because their voltage barrier and effective mass are smaller than those of holes [4].

#### 2.3.3 Reverse-bias pn-junction leakage

The last explained type of leakage is reverse-bias pn-junction leakage, which is caused by the structure of the CMOS transistor. In this structure, shown in Figure 2.2, pn-junctions are created between the active area and the substrate. Each pn-junction acts like a diode. Even when this diode is reverse-biased, it still conducts a small current $I_D$, which contributes to the leakage. This reverse-bias diode current can be expressed as equation (2.5):

$$I_D = I_S \left( e^{\frac{V_{DB}}{V_T}} - 1 \right), \tag{2.5}$$

where $I_S$ is the reverse saturation current (a parameter of the device), $V_{DB}$ is the voltage between the drain and the body of the transistor and $V_T$ is the thermal voltage, which depends on temperature. At room temperature (T = 300 K), $V_T$ is approximately 26 mV. The thermal voltage can be expressed as equation (2.6):

$$V_T = \frac{kT}{q},\tag{2.6}$$

where $k$ is the Boltzmann constant ($1.38 \cdot 10^{-23}$ J/K), $T$ is the absolute temperature in kelvins and $q$ is the electron charge [4].
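Equations (2.5) and (2.6) can be evaluated directly. The sketch below computes the thermal voltage at the corners of the required temperature range (-40 °C to 150 °C) and at room temperature, and shows that a strongly reverse-biased junction carries a current close to the saturation current. The saturation current value is a hypothetical placeholder, and the voltage passed to the function is negative so that the exponent in equation (2.5) describes reverse bias.

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def thermal_voltage(temp_k):
    """Thermal voltage V_T = kT/q in volts, equation (2.6)."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def junction_current(i_s, v_junction, temp_k):
    """Diode current from equation (2.5); negative v_junction means reverse bias."""
    return i_s * (math.exp(v_junction / thermal_voltage(temp_k)) - 1.0)


for temp_c in (-40.0, 27.0, 150.0):
    temp_k = temp_c + 273.15
    print(f"{temp_c:6.1f} degC: V_T = {thermal_voltage(temp_k) * 1e3:.2f} mV")

I_S = 1e-15  # reverse saturation current in amperes (hypothetical)
print(junction_current(I_S, -1.2, 300.0))  # magnitude is practically I_S
```

Over the required temperature range the thermal voltage varies from about 20 mV to about 36 mV. In equation (2.5) the reverse current saturates at $-I_S$, so its temperature dependence comes mainly from $I_S$ itself, which this sketch treats as a constant.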

## **3. LOW POWER TECHNIQUES**

The price of a chip is based on many things, such as the complexity of the design, technology, size, performance, reliability, and power consumption. These parameters can be adjusted to the application, but this chapter explains how to reduce power consumption. Power consumption is important in nanometer-scale devices because their power density is high. Power density decreases when power consumption is lower. High power dissipation is expensive because it requires a more expensive chip package [3].

Low power techniques are supposed to reduce power consumption. Because each technique reduces only a specific component, such as dynamic or static power, a combination of techniques can be used to reduce power consumption to a minimal value. Low power techniques can increase the area of the cell or worsen another parameter of the circuit. Many techniques can be used in application-specific integrated circuits. They can be divided by the level of chip development, as shown in Table 3.1. All design levels are explained in the following subchapters [3].

| DESIGN LEVEL | STRATEGIES |
|--------------|------------|
| SYSTEM OR CHIP | partitioning, power-down, clock tree optimization (clock gating), level shifters, always-on cells, substrate biasing or adaptive back biasing, and sleep or standby mode |
| PLACE AND ROUTE | placement/floorplan (reduce physical capacitance and resistance), optimizing, buffering, routing, on-chip power distribution, and glitch power |
| ALGORITHM | parallelism and pipelining |
| ARCHITECTURE | voltage scaling, frequency scaling, clustered voltage scaling, adaptive voltage scaling, low swing clock, low swing clock double-edge triggered flip flop |
| LOGIC | data coding/encoding and operand isolation |
| TECHNOLOGY | threshold reduction and multi-threshold devices |
| CIRCUIT OR TOPOLOGY | transistor sizing, longer channel in non-critical paths in small technologies, best topology, pin swapping, special circuits as state retention flip flop, dual or multi-bit flip flops, dual edge-triggered flip flops, pulsed latch, and multi-bit pulsed latch |

Table 3.1 - Different design levels with examples of low power techniques

### 3.1 System

The system level should mainly provide information about the specification of the chip, such as voltage distribution. Chip developers can retrieve useful information from the specification and decide which methods give the best performance of the chip. There are many such techniques; for example, the chip can be divided into different blocks. These blocks can be powered with different supply voltages depending on their demands, because some parts of the chip do not need to be fast. A lower block voltage reduces power consumption, as shown in equations (2.2), (2.3), and (2.4), but at the cost of lower performance. The choice of supply voltage for each block depends on the application. Level shifters, which change the levels of the signal, are used between blocks with different power supply voltages [3].

Some parts of the chip do not need to be on all the time, while others do. Always-on cells are an example of the latter, because they are powered all the time. As explained in the previous chapter, standby power is a significant part of the power consumption of the whole chip today. A block or cell that does not need to be powered at a given time can be turned off using two transistors, one connected between the power supply and the block and one between the block and ground. This saves power, but the area increases because of the two added transistors, and the design becomes more complex.

The designer needs to place isolation cells between active and inactive blocks to reduce unnecessary toggling, especially in the clock path. This method is called clock tree optimization, and it can save a lot of dynamic power across the chip. In ultra-low power techniques, substrate biasing or adaptive back biasing can be used to reduce leakage if necessary. It means that the bulk of the transistors is connected to a voltage different from the typical $V_{DD}$ or GND [15].

#### 3.2 Place & Route

Place and route is an important step in chip development. It can make the difference between a good chip and a bad one, because correct placement of the cells reduces parasitic elements such as resistance, inductance, and capacitance. Cells that belong together should be grouped close to each other, because routing adds parasitic elements too.

If the routing is not optimized, the size of the chip can increase, because more buffers are needed than in the optimized state. On-chip power distribution defines how the grid of supply rails will look. It also needs to be optimized for the application because it affects the voltage drop and the performance of the chip. Unwanted glitches need to be fixed too, because they cause additional dynamic power consumption: the output changes for a short time without purpose. Glitches can be fixed with an additional simple logic gate that does not change the final logical function. A glitch is also known as a hazard in combinational logic circuits. The simplest glitch generator can be made with two logic gates, an inverter and an AND gate. The circuit is shown in Figure 3.1 [3].



Figure 3.1 - Example of AND glitch generator
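The behavior of the glitch generator can be reproduced with a small discrete-time sketch. It assumes the usual connection of such a circuit, in which the input drives the AND gate both directly and through the inverter, so the ideal logical function is always 0. The inverter delay is modeled as a whole number of time steps; the waveform and the delay value are arbitrary illustration values.

```python
def simulate_glitch(a_wave, inverter_delay):
    """Output of y = a AND (NOT a) when the inverter output lags by inverter_delay steps."""
    out = []
    for t, a_now in enumerate(a_wave):
        # The inverter output still reflects the input from inverter_delay steps ago
        a_past = a_wave[t - inverter_delay] if t >= inverter_delay else a_wave[0]
        not_a = 1 - a_past
        out.append(a_now & not_a)
    return out


a_wave = [0] * 5 + [1] * 5 + [0] * 5
y_wave = simulate_glitch(a_wave, inverter_delay=2)

print("a:", "".join(map(str, a_wave)))
print("y:", "".join(map(str, y_wave)))
```

After the rising edge of the input, the output is high for exactly as many steps as the inverter delay, although the logical function is constantly 0. With a zero delay the pulse disappears, which shows that the glitch is purely a timing effect.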

## 3.3 Algorithm

The algorithm has a large impact on power consumption. Pipelining and parallelism are two algorithm-level methods, and they can be combined to achieve a maximal reduction of power consumption. Pipelining shortens the critical path and thus allows a higher clock speed or sample rate. Parallelism is used to increase the sample rate and reduce power consumption. The circuit is duplicated and connected in parallel to the original circuit, with a multiplexer at the end of both parallel circuits. The circuits can then work at half of the original frequency, but the multiplexer needs to be controlled at the original frequency to drive the output correctly. Both methods can be used separately, but for the best efficiency they should be used together. Both methods are shown in Figure 3.2. The biggest disadvantage of these techniques is the increased area of the circuit; in return, they lead to lower power consumption [3].



Figure 3.2 - Differences between (a) normal, (b) parallelism, and (c) pipelined circuit [3]

#### 3.4 Architecture

The architecture level is similar to the system level, but it uses more advanced techniques. As described in the system subchapter, the parts (blocks) of the chip can be powered with different voltages. At the system level, this voltage is fixed. At the architecture level, the voltage can be decreased or increased over time according to the desired performance of the block. This technique is called voltage scaling. The application can switch between three fixed voltages for fast, typical, and slow performance. Fast performance corresponds to a higher voltage, typical performance to the nominal voltage, and slow performance to a voltage lower than the nominal one. If high performance is needed, the voltage is increased; if it is not needed, the voltage is lowered and the circuit saves power. The voltage is changed automatically by the chip [16].

A more advanced technique is adaptive voltage scaling, which is similar to voltage scaling, but the voltage can be changed linearly between the minimal and maximal value given by the circuit. The frequency can also be changed depending on the desired performance. If the chip does not need to be fast, the clock frequency can be reduced to save power, because frequency is a factor in dynamic power consumption. As with voltage, the frequency can be increased or decreased. Frequency scaling can be done with a divider to obtain some fixed values, or the frequency can be adaptive and scaled linearly [3].
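Voltage scaling with three fixed operating points can be sketched as a simple selection rule: choose the lowest supply voltage whose maximum clock frequency still covers the current demand. The three operating points below are hypothetical; the voltages follow the 1.2 V ± 10 % range of the assignment, while the frequency limits are assumed for illustration only.

```python
# (name, supply voltage in V, maximum clock frequency in Hz) - assumed values
OPERATING_POINTS = [
    ("slow", 1.08, 100e6),
    ("typical", 1.20, 200e6),
    ("fast", 1.32, 300e6),
]


def select_operating_point(required_freq):
    """Return the lowest-voltage operating point that meets required_freq."""
    for name, vdd, f_max in sorted(OPERATING_POINTS, key=lambda p: p[1]):
        if f_max >= required_freq:
            return name, vdd
    raise ValueError("required frequency exceeds the fastest operating point")


def relative_switching_power(vdd, freq, vdd_ref=1.20, freq_ref=200e6):
    """Switching power relative to a reference point, using P ~ f * VDD^2 from (2.2)."""
    return (freq / freq_ref) * (vdd / vdd_ref) ** 2


for demand in (80e6, 150e6, 250e6):
    name, vdd = select_operating_point(demand)
    rel = relative_switching_power(vdd, demand)
    print(f"{demand / 1e6:5.0f} MHz -> {name:8s} {vdd:.2f} V, relative power {rel:.2f}")
```

The sketch shows why scaling voltage together with frequency is attractive: at a low demand, both the lower frequency and the squared lower voltage reduce the switching power. Adaptive voltage scaling would replace the fixed table with a continuous range of voltages.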

Digital sequential circuits are usually triggered by the rising edge of the clock. In special applications, the circuit can be triggered by the falling edge or even by both edges. Circuits working on both edges are called dual edge-triggered circuits. They can change the output twice as often as standard circuits, which are triggered by only one edge. For example, flip flops can be dual edge-triggered, or even dual edge-triggered with a low swing clock source [3].

## 3.5 Logic

Data coding and encoding need to be implemented correctly, because different implementations lead to different power consumption and delays. For example, a 3-bit counter can be implemented using the binary counting sequence or the Gray code sequence. The two codes differ in the counting sequence and in the number of toggled bits, as shown in Table 3.2. Because power dissipation is related to toggle activity, a Gray code is generally more power-efficient than a binary code. This is why the logical code must be chosen carefully for low-power chips.

Operand isolation, which reduces power dissipation in the data path, can also be used in designs. An enable signal blocks the inputs of a data path element when the element is not needed, which saves power [16].

| Binary code sequence | Binary code number of toggles | Gray code sequence | Gray code number of toggles |
|----------------------|-------------------------------|--------------------|-----------------------------|
| 000                  | 3                             | 000                | 1                           |
| 001                  | 1                             | 001                | 1                           |
| 010                  | 2                             | 011                | 1                           |
| 011                  | 1                             | 010                | 1                           |
| 100                  | 3                             | 110                | 1                           |
| 101                  | 1                             | 111                | 1                           |
| 110                  | 2                             | 101                | 1                           |
| 111                  | 1                             | 100                | 1                           |

Table 3.2 - Binary and Gray code sequence
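The toggle counts in Table 3.2 can be reproduced with a few lines of code. The sketch assumes that the number of toggles in each row counts the bits that change when the counter enters that state from the previous one, with the sequence wrapping around from the last state to the first; this reading is inferred from the values in the table.

```python
def binary_sequence(bits):
    """Plain binary counting sequence as integers."""
    return list(range(2 ** bits))


def gray_sequence(bits):
    """Reflected Gray code sequence: g = n XOR (n >> 1)."""
    return [n ^ (n >> 1) for n in range(2 ** bits)]


def toggles_per_step(sequence):
    """Bits that flip when entering each state from the previous one (cyclic)."""
    return [bin(sequence[i] ^ sequence[i - 1]).count("1") for i in range(len(sequence))]


BITS = 3
binary_toggles = toggles_per_step(binary_sequence(BITS))
gray_toggles = toggles_per_step(gray_sequence(BITS))

print("binary:", binary_toggles, "total", sum(binary_toggles))
print("gray:  ", gray_toggles, "total", sum(gray_toggles))
```

Over one full cycle of the 3-bit counter, the binary sequence toggles 14 bits while the Gray sequence toggles 8, which is the reason for the lower switching activity of the Gray code.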

#### 3.6 Technology

In every technology, there are transistors with different threshold voltages, such as low voltage threshold (LVT), standard voltage threshold (SVT), and high voltage threshold (HVT). Different threshold voltages lead to different speed and power consumption. Transistors with a low voltage threshold are faster but have higher leakage than standard transistors. They are used when the circuit needs to be fast and the power consumption does not matter. If the power consumption needs to be as low as possible, transistors with a high voltage threshold should be chosen, because they have minimal leakage, however at the cost of lower speed. The selection of transistors depends on the application [14].

## 3.7 Topology and Circuit

The topology of the circuit must be chosen carefully, because different topologies lead to different results in area, power consumption, leakage, clock-to-output delay, and other parameters. The sizing of transistors is important too, because it has a big impact on many parameters of the circuit. The size of the transistor means the width and length of the gate, as shown in Figure 1.5.

Transistors with a longer gate are often used in non-critical paths because transistors with a short gate leak more than transistors with a longer gate. Output stages are often built from transistors with a wide gate so that they can drive a big load.

Pin swapping is also important because pins have different capacitances. The differences are caused by the connectivity of the circuit and by the layout, which introduces parasitic elements. For example, if two pins have different capacitances, the circuit should preferably use the pin with the lower capacitance: capacitance is a factor in dynamic power consumption, so a lower capacitance means lower power consumption.
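The effect of pin capacitance can be illustrated with the usual first-order expression for dynamic switching power, $P = \alpha C V_{DD}^2 f$. The sketch below is only an illustration: the two pin capacitances, the activity factor, and the clock frequency are hypothetical values chosen for the example, not data from the library.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, freq_hz):
    """First-order dynamic power P = activity * C * Vdd^2 * f, in watts."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


VDD_V = 1.2          # nominal supply voltage from the assignment
FREQ_HZ = 100e6      # hypothetical clock frequency
ACTIVITY = 0.5       # hypothetical switching activity

# Hypothetical input capacitances of two logically equivalent pins.
pin_a_f = 1.0e-15
pin_b_f = 1.5e-15

power_a = dynamic_power_w(ACTIVITY, pin_a_f, VDD_V, FREQ_HZ)
power_b = dynamic_power_w(ACTIVITY, pin_b_f, VDD_V, FREQ_HZ)
print(f"pin A: {power_a * 1e9:.1f} nW, pin B: {power_b * 1e9:.1f} nW")
```

Under this model the power scales linearly with capacitance, so the pin with 1.5 times the capacitance costs 1.5 times the switching power.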

Special topologies can also be used, such as a state retention flip flop composed of two parts. The first part is a typical flip flop in a power domain that can be turned off when the flip flop is not used. The second part is an internal memory with its own power domain that cannot be turned off; it keeps the last data so that it can be used later. Flip flops can also be built in multi-bit versions, which save more area than single flip flops. Special flip flops can also be triggered on both clock edges. Multi-bit flip flops can be replaced with a more economical circuit, the multi-bit pulsed latch. The multi-bit pulsed latch is the main topic of this thesis and is explained in the next chapter [3].

## **4. MULTI-BIT PULSED LATCH**

A multi-bit pulsed latch is a new solution for low-power libraries. In comparison with flip flops, multi-bit pulsed latches have a smaller area and lower power consumption. The multi-bit pulsed latch is composed of a pulse generator and pulsed latches, as shown in Figure 4.1 [3].



Figure 4.1 - Block diagram of multi-bit pulsed latch in normal mode

Multi-bit means that more than one latch is driven by one pulse generator. The pulsed latch is a simple latch driven by a short pulse instead of a standard clock signal. A latch driven by a pulse has the same truth table as a flip flop, which means that the flip flop can be replaced by a pulsed latch. A sufficiently wide pulse is produced by the pulse generator. There are two ways to build the pulse generator, as shown in Figure 4.2. The global pulse generator is developed separately from the latch and is used less frequently because of parasitic elements (its routing is longer than with a local pulse generator) and because it complicates the place and route phase.



Figure 4.2 - Design of global and local pulse generator

The local pulse generator is part of the multi-bit latch cell and is designed to address the problems of that particular multi-bit pulsed latch. The layout of the multi-bit pulsed latch is developed as one cell, which is better because the layout can be created with minimal size and minimal additional routing [2].

The specification of this thesis states that the multi-bit pulsed latch must work in scan mode. In scan mode, the output of the first latch is connected to the input of the next latch, whose output is in turn connected to the input of the following latch. This continues until the output of the penultimate latch is connected to the input of the last latch. In this mode the circuit works as a shift register (serial memory), as shown in Figure 4.3. The circuit can also be switched to normal mode, where all latch inputs are used separately, as shown in Figure 4.1. The switch is a multiplexer with two inputs (data and serial data) and one output. The multiplexer is explained later because multiplexers can be built in different ways, which leads to different results. Multi-bit versions are most often developed with 2, 4, and 8 bits, with different output strengths and additional functions such as set, reset, or both. The practical part of this work is concerned with creating multi-bit pulsed latches with 2, 4, and 8 bits, a reset pin, and a scan input. The area and power savings should be most significant in the 8-bit version.



Figure 4.3 - Block diagram of multi-bit pulsed latch in scan mode
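The difference between scan mode and normal mode can be sketched with a purely functional model that ignores all timing. The function name `pulse` and its arguments are hypothetical and serve only this illustration.

```python
def pulse(state, d_inputs, scan_in, scan_enable):
    """Return the latch outputs after one clock pulse.

    state       -- current outputs Q0..Qn-1
    d_inputs    -- parallel data inputs D0..Dn-1 (used in normal mode)
    scan_in     -- serial input of the first latch (used in scan mode)
    scan_enable -- True selects scan mode, False selects normal mode
    """
    if scan_enable:
        # Each latch captures the previous latch's old output; the first takes SI.
        return [scan_in] + state[:-1]
    # Normal mode: every latch captures its own data input.
    return list(d_inputs)


state = [0, 0, 0, 0]
for bit in [1, 0, 1, 1]:
    state = pulse(state, d_inputs=[0, 0, 0, 0], scan_in=bit, scan_enable=True)
print(state)  # [1, 1, 0, 1]: the first bit shifted in has reached the last latch
```

The model assumes that every latch sees the old output of its predecessor during a pulse. Guaranteeing this in the real circuit is a timing problem that the functional model does not capture.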

#### 4.1 Pulse generator

As mentioned at the beginning of this chapter, the pulse generator can be developed as local or global. In this work, a local pulse generator is designed. Many pulse generator topologies exist, but only one can be chosen. The main differences between pulse generators are power, cell area, rise and fall times of the edges, maximal load, the minimal and maximal clock activity, behavior across the PVT corners, and the event on which the pulse is generated. A pulse can be generated on the rising edge, the falling edge, or both edges.

This chapter is divided into two subchapters. The first covers pulse generators that are suitable for this work. The second covers special pulse generators that can be used in specific cases if the application needs them. The composition of the pulse generator is simple: it consists of combinational logic that produces a glitch (pulse) and a delay block with negation, as shown in Figure 4.4. The width of the glitch depends on the delay between the clock signals, as shown in Figure 4.5, where the signals were simulated in MATLAB based on the block diagram in Figure 4.4. The simulation is very simple because it does not include parasitic elements. Because the width of the generated pulse is set by the delay block, it differs across technologies.



Figure 4.4 - Block diagram of the simple pulse generator



Figure 4.5 - Simulated block diagram from Figure 4.4 in MATLAB
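The behavior of the block diagram can also be reproduced with a few lines of Python. This is a minimal sketch with ideal signals, assuming an AND gate as the combinational logic; the 1 GHz clock and the 40 ps delay are hypothetical values, and the sketch is not the MATLAB model used for Figure 4.5.

```python
def clock(t_ps, period_ps):
    """Ideal 50 % duty-cycle clock: high during the first half of each period."""
    return (t_ps % period_ps) < period_ps // 2


def pulse_generator(t_ps, period_ps, delay_ps):
    """AND of the clock and its delayed, inverted copy."""
    delayed_inverted = not clock(t_ps - delay_ps, period_ps)
    return clock(t_ps, period_ps) and delayed_inverted


PERIOD_PS = 1000   # hypothetical 1 GHz clock
DELAY_PS = 40      # hypothetical delay of the inverting delay block

samples = [pulse_generator(t, PERIOD_PS, DELAY_PS) for t in range(PERIOD_PS)]
pulse_width_ps = sum(samples)   # 1 ps time step, so the count is the width in ps
print(pulse_width_ps)  # 40
```

In this ideal model the pulse starts at the rising clock edge and its width equals the delay of the delay block. No pulse appears at the falling edge.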

#### 4.1.1 Suitable pulse generators

The aim of this thesis is to reduce the power consumption and cell area in comparison to a multi-bit flip flop. This means that the chosen pulse generator topology should be simple yet power efficient. This subchapter is about pulse generators that generate a pulse on the rising edge of the clock signal. At first sight, the most suitable circuit for the pulse generator is a 2-input AND gate with an inverter on one of its inputs to create the delay, as shown in Figure 4.6. The combinational logic that produces the glitch is not hard to find, because only one option meets the requirements of smallest possible area and power consumption: the AND gate.



Figure 4.6 - Simple pulse generator made of inverter and 2-input AND

The main problem in pulse generator development is the delay part, which also negates the clock. A delay can be made in many ways, but every topology gives different results. The simplest delay is a standard inverter composed of two transistors, as shown in Figure 4.6. The delay of the inverter depends on the size of the transistors. As mentioned, the size of the gate is set by its width and length, and the minimal width and length are given by the technology. The delay of the inverter can be increased by decreasing the width, but only down to the minimal width. If the delay is still not long enough to produce a correct pulse, the length of the gate can be increased. A longer gate makes the inverter slower, so the delay increases. The dependence of the delay on the channel length of the simple inverter is shown in Figure 4.7. A few values from the graph are listed in Table 4.1.

| Length [nm] | Delay [ps] |
|-------------|------------|
| 60          | 7.705      |
| 120         | 17.56      |
| 180         | 26.99      |
| 240         | 36.65      |
| 300         | 47.04      |

Table 4.1 - Example of values from the graph in Figure 4.7



Figure 4.7 - Graph of dependence of inverter delay on transistor channel length
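The values in Table 4.1 grow almost linearly with channel length, so a straight-line fit gives a quick first estimate of the length needed for a given delay. The sketch below is an illustration on the five tabulated points only. The linear model and the helper `length_for_delay` are assumptions, and the fit should not be extrapolated beyond the simulated range.

```python
lengths_nm = [60, 120, 180, 240, 300]
delays_ps = [7.705, 17.56, 26.99, 36.65, 47.04]   # values from Table 4.1

n = len(lengths_nm)
mean_l = sum(lengths_nm) / n
mean_d = sum(delays_ps) / n

# Ordinary least-squares fit of delay = slope * length + intercept.
s_xy = sum((l - mean_l) * (d - mean_d) for l, d in zip(lengths_nm, delays_ps))
s_xx = sum((l - mean_l) ** 2 for l in lengths_nm)
slope = s_xy / s_xx
intercept = mean_d - slope * mean_l


def length_for_delay(target_ps):
    """Channel length in nm that the fitted line predicts for a target delay."""
    return (target_ps - intercept) / slope


print(f"slope = {slope:.4f} ps/nm, intercept = {intercept:.3f} ps")
print(f"estimated length for 30 ps: {length_for_delay(30):.0f} nm")
```

The fitted slope is roughly 0.16 ps of delay per nanometre of channel length for this inverter in the typical corner.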

The disadvantage of the simple inverter as a delay block is higher leakage in comparison to other topologies. Leakage can be limited with a different inverter topology. A better topology is the stacked inverter, which has lower leakage than a typical inverter but a bigger cell. The stacked inverter is shown in Figure 4.8. A still better solution is the double-stacked inverter shown in Figure 4.9 [17]. It has been suggested that several inverters with shorter channels can be used to increase the delay, but their power consumption is higher than that of the other topologies. The simulated circuits with three and five inverters in series are shown in Figure 4.10.



Figure 4.8 - Stacked inverter



Figure 4.9 - Double-stacked inverter



Figure 4.10 - Five inverters delay (top) and three inverters delay (bottom)

The measured data for all mentioned circuits are shown in Table 4.2. Three basic parameters that are most important for this work were measured: total power, leakage, and area of the cell, which is estimated here by the width of the delay block. The width is less important because it is only a prediction and the actual layout area can differ. Still, it needs to be checked to make sure that the width does not differ too much between pulse generators. The layout can differ because tricks that reduce the area can be used. The double-stacked inverter is expected to have the smallest layout. Leakage is small in comparison to total power, but across a whole chip it can be significant, and there is a slight difference between topologies, so leakage needs to be investigated. The five-inverter delay leaks the most, which is caused by the added inverters; leakage is also increased in the three-inverter delay. The best topology for leakage is the double-stacked inverter. Total power consumption also increases when inverters are added, as the three-inverter and five-inverter delays show in comparison to a single inverter. For this reason, the three-inverter and five-inverter delays cannot be used: they offer no advantage in power or leakage. Their layouts are also expected to be bigger than those of the other topologies because of the design rules that need to be met. Total power was the lowest with the double-stacked inverter.

| Type of delay           | Total power [µW] | Leakage [pW] | Width of delay [nm] |
|-------------------------|------------------|--------------|---------------------|
| Inverter                | 1.258            | 71.96        | 755                 |
| Three inverters         | 1.325            | 83.59        | 582                 |
| Five inverters          | 1.441            | 93.06        | 560                 |
| Stacked inverter        | 1.329            | 70.37        | 748                 |
| Double-stacked inverter | 1.229            | 68.28        | 666                 |

Table 4.2 - Parameters of simulated pulse generators
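The comparison can be reproduced directly from the numbers in Table 4.2. The sketch below only restates the tabulated values and picks the minimum in each column; it is an illustration of the selection reasoning, not part of the simulation flow.

```python
# name: (total power in µW, leakage in pW, predicted width in nm), from Table 4.2
delay_blocks = {
    "Inverter": (1.258, 71.96, 755),
    "Three inverters": (1.325, 83.59, 582),
    "Five inverters": (1.441, 93.06, 560),
    "Stacked inverter": (1.329, 70.37, 748),
    "Double-stacked inverter": (1.229, 68.28, 666),
}

reference_uw = delay_blocks["Inverter"][0]
for name, (power_uw, leak_pw, width_nm) in delay_blocks.items():
    extra = (power_uw - reference_uw) / reference_uw * 100
    print(f"{name:24s} total power {extra:+5.1f} % vs. single inverter")

best_power = min(delay_blocks, key=lambda name: delay_blocks[name][0])
best_leakage = min(delay_blocks, key=lambda name: delay_blocks[name][1])
narrowest = min(delay_blocks, key=lambda name: delay_blocks[name][2])
print(best_power, "|", best_leakage, "|", narrowest)
```

The double-stacked inverter wins on both total power and leakage, while the five-inverter chain has the smallest predicted width.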

The chosen topology is the double-stacked inverter because it has the best total power and leakage. Its width is not the smallest but, as stated before, the widths in Table 4.2 are only predictions. At this point, the circuit appears to offer more room for layout tricks that reduce the area than the other topologies. The rising and falling edges are similar in every circuit because the output stage has the same size. The simulation is the same in all cases in terms of load, frequency, activity, and other parameters. The chosen circuit is shown in Figure 4.11. The output pulse of the pulse generator for the typical, slow, and fast process corners is shown in Figure 4.12. The corners are specified in Table 4.3. Transistor models are simulated with $3\sigma$ variation. The maximal usable frequency is limited by load, supply voltage, and temperature. There should be no minimum usable frequency because the width of the pulse is given by the delay block in the pulse generator. The next subchapter explains pulse generators that can be used in special applications.



Figure 4.11 - Double stacked inverter with 2-input logical AND



Figure 4.12 - Generated pulses across the PVTs from chosen generator

| Name        | Slow corner         | Typical corner | Fast corner |
|-------------|---------------------|----------------|-------------|
| Process     | Slow $3\sigma$      | Typical        | Fast $3\sigma$ |
| Voltage     | 1.08 V              | 1.20 V         | 1.32 V      |
| Temperature | -40 °C              | 27 °C          | 150 °C      |

Table 4.3 - Table of specific cross corners
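The corner voltages follow from the ±10 % tolerance on the 1.2 V supply stated in the assignment. A minimal sketch of the three corners as a data structure is given below; the dictionary layout and key names are hypothetical and do not correspond to any simulator input format.

```python
NOMINAL_V = 1.2    # nominal supply voltage
TOLERANCE = 0.10   # supply tolerance of ±10 %

# Process model, supply voltage and temperature of each corner (Table 4.3).
corners = {
    "slow": {"process": "slow 3-sigma",
             "voltage_v": round(NOMINAL_V * (1 - TOLERANCE), 2),
             "temp_c": -40},
    "typical": {"process": "typical",
                "voltage_v": NOMINAL_V,
                "temp_c": 27},
    "fast": {"process": "fast 3-sigma",
             "voltage_v": round(NOMINAL_V * (1 + TOLERANCE), 2),
             "temp_c": 150},
}

for name, corner in corners.items():
    print(f"{name:8s} {corner['voltage_v']:.2f} V  {corner['temp_c']:4d} °C")
```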

#### 4.1.2 Pulse generators for special purpose

The previous subchapter dealt with pulse generators that produce a pulse on the rising edge of the clock signal. The generators in this subchapter produce a pulse on the falling edge or on both edges. The falling-edge pulse generator is explained first. It must be reserved for special purposes because the system needs to be prepared for this kind of generator. A falling-edge pulse generator can be built like the pulse generator in Figure 4.6, but with a logical NOR instead of the logical AND, as shown in Figure 4.13.



Figure 4.13 - Block diagram of the pulse generator with 2-input NOR gate

The delay block can again be made with the different inverters described before. The simulated pulse of this generator is shown in Figure 4.14, and the circuit simulated in Cadence Virtuoso is shown in Figure 4.15. It is also possible to make a pulse generator for both edges, that is, one that generates a pulse on the rising and the falling edge. The circuit differs from the previous versions only in that an XOR gate replaces the AND or NOR gate. In comparison to the other pulse generators, the area and power consumption of the XOR cell are higher because it has more transistors and its output switches twice as often. The circuit is shown in Figure 4.16. The basic parameters of both pulse generators are shown in Table 4.4 [2].

| Type of logical gate | Total power [µW] | Leakage [pW] | Width of cell [nm] |
|----------------------|------------------|--------------|--------------------|
| NOR                  | 1.226            | 126.9        | 756                |
| XOR                  | 1.836            | 161.1        | 822                |

Table 4.4 - Parameters of special simulated pulse generators



Figure 4.14 - Simulated block diagram from Figure 4.13 in MATLAB



Figure 4.15 - Circuit of the pulse generator with NOR logical gate



Figure 4.16 - Circuit of the pulse generator with XOR logical gate
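The two special generators can be sketched with the same kind of ideal model as the rising-edge generator. The clock period and delay are hypothetical values. One further assumption needs to be stated: the sketch feeds the XOR gate with a non-inverted delayed clock so that its pulses are active-high. With an inverting delay block, an ideal XOR would instead idle high and pulse low, and how the circuit in Figure 4.16 handles this polarity is not modeled here.

```python
def clock(t_ps, period_ps):
    """Ideal 50 % duty-cycle clock: high during the first half of each period."""
    return (t_ps % period_ps) < period_ps // 2


def falling_edge_pulse(t_ps, period_ps, delay_ps):
    """NOR of the clock and its delayed, inverted copy."""
    delayed_inverted = not clock(t_ps - delay_ps, period_ps)
    return not (clock(t_ps, period_ps) or delayed_inverted)


def both_edges_pulse(t_ps, period_ps, delay_ps):
    """XOR of the clock and its delayed copy (assumed non-inverted here)."""
    return clock(t_ps, period_ps) != clock(t_ps - delay_ps, period_ps)


PERIOD_PS, DELAY_PS = 1000, 40   # hypothetical values

nor_high = [t for t in range(PERIOD_PS) if falling_edge_pulse(t, PERIOD_PS, DELAY_PS)]
xor_high = [t for t in range(PERIOD_PS) if both_edges_pulse(t, PERIOD_PS, DELAY_PS)]
print(nor_high[0], nor_high[-1], len(xor_high))  # 500 539 80
```

The NOR version produces one 40 ps pulse starting at the falling edge (t = 500 ps). The XOR version is high for 80 ps per period, one pulse per edge, which is consistent with its output switching twice as often.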

## 4.2 Pulsed latch

This chapter focuses on single latches with only one output pin Q that can be controlled by a pulse. The latches are compared in the parameters specified by the assignment, such as power consumption and area. Picking a latch is a complicated process because several parameters need to be considered at once. The suitable topologies that were found are:

- PTLA [1]
- SSALA [1]
- SSA2LA [1]
- CPNLA [1]
- PPCLA [1]
- PPCLA modified

The comparison of the pulsed latches is shown in Table 4.5. All circuits were simulated under the same conditions: the same simulation time, inputs (data and clock), and frequency. The voltage was set to the nominal value of 1.2 V and the process to the typical corner (typical models). The circuits were first simulated without the chosen pulse generator explained in the previous chapter. Instead, an ideal pulse generator producing a 100 ps wide pulse at a frequency of 100 MHz was used. Minimal transistor sizes were chosen; where a transistor needed to be bigger to make the circuit functional, a bigger size was used. Six latches were simulated, as shown in Table 4.5.

| Name of circuit | Total<br>power<br>[nW] | Leakage<br>power<br>[pW] | No. of<br>transistors<br>[-] | Minimal width<br>of pulse [ps] | CLK to Q<br>propagation<br>delay [ps] |
|-----------------|------------------------|--------------------------|------------------------------|--------------------------------|---------------------------------------|
| PTLA            | 493.8                  | 163.7                    | 10                           | 94                             | 157.34                                |
| SSALA           | 371.6                  | 194.4                    | 11                           | 53.5                           | 120.36                                |
| SSA2LA          | 371.7                  | 172.6                    | 12                           | 55                             | 113.91                                |
| CPNLA           | 550.5                  | 201.5                    | 13                           | 56.5                           | 139.54                                |
| PPCLA           | 313.2                  | 165.1                    | 12                           | 41                             | 91.39                                 |
| PPCLA mod.      | 320.7                  | 166.3                    | 12                           | 35                             | 112.20                                |

Table 4.5 - The comparison of latches controlled by pulse
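Because several parameters have to be weighed at once, it can help to look at the table programmatically. The sketch below restates Table 4.5 and derives two simple figures: the best latch per column and the margin between the 100 ps ideal pulse and each latch's minimal pulse width. The margin is an illustrative quantity introduced here, not a parameter from the thesis.

```python
IDEAL_PULSE_PS = 100   # width of the ideal pulse used in the simulation

# name: (total power nW, leakage pW, transistors, min pulse width ps, CLK-to-Q ps)
latches = {
    "PTLA":       (493.8, 163.7, 10, 94.0, 157.34),
    "SSALA":      (371.6, 194.4, 11, 53.5, 120.36),
    "SSA2LA":     (371.7, 172.6, 12, 55.0, 113.91),
    "CPNLA":      (550.5, 201.5, 13, 56.5, 139.54),
    "PPCLA":      (313.2, 165.1, 12, 41.0, 91.39),
    "PPCLA mod.": (320.7, 166.3, 12, 35.0, 112.20),
}

lowest_power = min(latches, key=lambda name: latches[name][0])
lowest_leakage = min(latches, key=lambda name: latches[name][1])
shortest_pulse = min(latches, key=lambda name: latches[name][3])

# Margin between the 100 ps ideal pulse and each latch's minimal pulse width.
margin_ps = {name: IDEAL_PULSE_PS - row[3] for name, row in latches.items()}

print(lowest_power, "|", lowest_leakage, "|", shortest_pulse)
print(margin_ps["PTLA"], margin_ps["PPCLA mod."])
```

PTLA needs almost the entire 100 ps pulse (6 ps of margin), whereas the modified PPCLA leaves 65 ps.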

The first simulated latch was the PTLA latch shown in Figure 4.17. This topology is not suitable for this application because of its large total power and large minimal pulse width. The predicted area of the PTLA latch is the smallest of all simulated circuits, and its leakage power is also the lowest. However, the disadvantages outweigh the advantages, which is why this circuit is not suitable.



Figure 4.17 - PTLA topology of a pulsed latch

SSALA is better than PTLA, especially in power consumption and clock-to-output propagation delay. The minimal pulse width is also lower. The number of transistors is higher than in the PTLA latch. The leakage power is higher too, although it is somewhat lower in the modified version of this circuit. The SSALA is shown in Figure 4.18. The modified version of SSALA is called SSA2LA. It has one more transistor, driven by the pulse. This transistor slightly affects leakage, but in comparison to total power the change is not important. The predicted area is also slightly higher because of the added transistor. The minimal pulse width is higher than in the previous version, but the clock-to-output propagation delay is lower. The SSA2LA is shown in Figure 4.19; it is still not a suitable circuit for this work. CPNLA is the worst circuit in Table 4.5 because its total power consumption is the highest of all the pulsed latches mentioned. The CPNLA circuit is shown in Figure 4.20.



Figure 4.18 - SSALA topology of a pulsed latch



Figure 4.19 - SSA2LA topology of a pulsed latch



Figure 4.20 - CPNLA topology of a pulsed latch

The first circuit that is the best in almost all parameters is PPCLA. Its total power consumption is the lowest, its leakage is similar to that of the PTLA latch, and its minimal pulse width is even lower than in the latches mentioned before. The greatest disadvantage of this circuit is the predicted area of the cell, which is bigger than that of PTLA, SSALA, and SSA2LA, but the performance is better. PPCLA is shown in Figure 4.21. Table 4.5 also contains a modification of PPCLA that suits this work better than the standard PPCLA.

Different circuits give different results, as shown in Table 4.5. The main difference between PPCLA and the modified PPCLA is the feedback. In the modified PPCLA, the feedback is divided into two parts: an inverter and a pass gate. At first, the results did not look good: total power consumption and leakage were slightly worse than in the standard PPCLA. However, the minimal pulse width is shorter, which means the pulse generator can be smaller and consume less power, so the total consumption of the pulse generator together with the modified PPCLA latch can be lower. Most importantly, the modified PPCLA latch has a longer path from clock to output. This matters because of the scan mode, which needs to be a part of the circuit.

If the circuit did not need scan mode, the standard PPCLA could be chosen. However, scan mode is required, so the chosen pulsed latch topology is the modified PPCLA. The reason for this choice is explained in more detail in the next chapter because scan mode has a problem in the corners. The chosen modified PPCLA is shown in Figure 4.22.



Figure 4.21 - PPCLA topology of a pulsed latch



Figure 4.22 - PPCLA modified topology of a pulsed latch

### 4.3 Multi-bit pulsed latch

The main problem in creating a multi-bit pulsed latch is not choosing the latch but building a multi-bit pulsed latch with scan mode from the chosen latch. The reason is that the pulse width in scan mode is limited not only from below but also from above. Two errors may occur in scan mode. In the first, the pulse is too short and the latch does not work properly. In the second, the pulse is so wide that a single pulse passes the data signal on to the second latch, so the output state of the second latch also changes within that single pulse. The pulse width varies with process, voltage, and temperature. This means that two corners need to be examined. In the fast corner, the voltage and temperature are the highest allowed: the voltage is 1.32 V, the temperature is 150 °C, and fast transistor models are used. The slow corner is the opposite: the voltage is 1.08 V, the temperature is -40 °C, and slow transistor models are used. The nominal corner is 1.2 V and 27 °C with typical models.

Before the multi-bit pulsed latch is simulated, the multiplexer topology must be chosen. The multiplexer switches between scan mode (shift register) and normal mode, where the inputs of all latches are connected separately. The standard multiplexer design is shown in Figure 4.23. It has four inputs, D (data), SI (scan input), SE (scan enable), and SEN (scan enable negated), and one output called Q. The design is made of 10 transistors. In the 8-bit pulsed latch, the multiplexers alone take 80 transistors because a multiplexer is needed in front of each latch. This multiplexer topology is not appropriate for this work because of its large area [3].

A better multiplexer topology is shown in Figure 4.24. It is composed of 4 transistors and has the same inputs, output, and control pins as the standard topology. This topology is made of pass gates. About 60 % of the area can be saved in comparison to the standard multiplexer. The pulsed latch with scan mode and negated output is shown in Figure 4.25. With the selected multiplexer, a multi-bit pulsed latch with scan mode can be created [3].



Figure 4.23 - Standard 2-bit multiplexer



Figure 4.24 - Pass gate topology of 2-bit multiplexer



Figure 4.25 - Pulsed latch with scan mode and negated output
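The function of the scan multiplexer and the transistor budget of the two topologies can be summarized in a few lines. The sketch uses transistor count as a rough proxy for area, which is an assumption made here for illustration; real area also depends on transistor sizes and design rules.

```python
def scan_mux(d, si, se):
    """2-input multiplexer in front of a latch: SE selects SI, otherwise D."""
    return si if se else d


STANDARD_MUX_TRANSISTORS = 10    # standard topology
PASS_GATE_MUX_TRANSISTORS = 4    # pass gate topology


def mux_transistors(bits, per_mux):
    """Transistors spent on multiplexers in a multi-bit cell (one mux per latch)."""
    return bits * per_mux


saving = 1 - PASS_GATE_MUX_TRANSISTORS / STANDARD_MUX_TRANSISTORS
print(mux_transistors(8, STANDARD_MUX_TRANSISTORS),
      mux_transistors(8, PASS_GATE_MUX_TRANSISTORS),
      f"{saving:.0%}")  # 80 32 60%
```

For the 8-bit cell the multiplexers shrink from 80 to 32 transistors, in line with the saving of about 60 % quoted above.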

Figure 4.26 shows a 2-bit pulsed latch, again with a negated output stage, to create demanding conditions in simulation. As mentioned before, two issues might occur. They define the minimal and maximal pulse width for which the circuit remains fully functional. This range is determined by the technology, which cannot be changed, by the topology of the latch, and by the point in the chain from which the signal is taken to the next latch. The modified PPCLA is chosen because its minimal pulse width is the smallest in Table 4.5 and its clock-to-output propagation delay is higher than that of the standard PPCLA: there are more inverters in the path, which is used as an advantage.
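The pulse width window can be expressed as a first-order check. The sketch below rests on a simplifying assumption: the lower limit is the minimal pulse width of the latch and the upper limit is the delay from the pulse edge until new data reaches the next latch. Setup and hold times, multiplexer delay, and corner variation are ignored, and the function name and the example numbers (typical-corner values of the modified PPCLA from Table 4.5) are for illustration only.

```python
def scan_pulse_status(pulse_width_ps, min_width_ps, path_delay_ps):
    """Classify a pulse width for a scan chain of pulsed latches.

    min_width_ps  -- shortest pulse the latch can still capture
    path_delay_ps -- delay from the pulse edge to the data arriving at the
                     next latch (assumed here to set the upper limit)
    """
    if pulse_width_ps < min_width_ps:
        return "too short: latch does not capture"
    if pulse_width_ps >= path_delay_ps:
        return "too wide: data races through to the next latch"
    return "ok"


MIN_WIDTH_PS = 35.0     # minimal pulse width, modified PPCLA (Table 4.5)
PATH_DELAY_PS = 112.2   # CLK-to-Q delay, modified PPCLA (Table 4.5)

for width in (20, 60, 150):
    print(width, "ps:", scan_pulse_status(width, MIN_WIDTH_PS, PATH_DELAY_PS))
```

In this simplified picture a longer clock-to-output path widens the usable window, which is why the longer path of the modified PPCLA is an advantage in scan mode. The real limits have to be found by simulating the fast and slow corners.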

The signal for the next latch is often taken from the internal node that represents the output data of the first latch. In none of the circuits is there an internal node that gives full functionality, because the propagation delay between the pulse and the path to the next latch stage is too short. As a result, the circuit can work only in the fast corner or only in the slow corner, not in both. There are two solutions to this problem. The first is an additional delay inserted into the path, specifically before the multiplexer input of the next latch. This solution would increase the power consumption and the area, which is unacceptable.



Figure 4.26 - 2-bit pulsed latch with scan mode and negated output

The second solution is to take the signal for the next latch from the output stage, where Q is. The path is then long enough to keep the circuit stable in all corners, fast and slow. The second solution has better power consumption and area because no inverter is added. Its disadvantage is that the maximal output load is slightly decreased. However, the maximal output load is decreased less than in the first solution. The second solution is better because its advantages outweigh its disadvantages. If this solution is applied to the standard PPCLA, the circuit does not work because the path is still not long enough. That is why the modified version was chosen. The completed 2-bit pulsed latch is shown in Figure 4.26. Typical results for the circuit in scan mode are shown in Figure 4.28. The circuit acts as a shift register: the $Q_1$ output is the same as $Q_0$ but shifted by one pulse. Results for normal mode are shown in Figure 4.29, where both outputs should show the same waveform because the same data inputs were applied to both latches.

The 2-bit pulsed latch can be extended with additional features such as reset, set, or both. The additional feature required by the assignment is reset, which is implemented with three transistors as shown in Figure 4.27. The first idea was that the reset could be done with only one transistor, but this turned out to be incorrect. The reason is that the circuit has feedback: for certain combinations of reset and input values, a single reset transistor would create a short circuit. The short circuit is undesirable because it can damage the circuit almost immediately. A reset transistor can therefore be used, but two more transistors must be added to isolate the paths where the short circuit could occur.

As mentioned in previous chapters, multi-bit versions are usually made with 2, 4, and 8 bits. The 4-bit pulsed latch is shown in Figure 4.30. Its pulse generator had to be adjusted because the load on the pulse generator increased: the channel length of the double-stacked delay inverter was increased and the output stage of the pulse generator was enlarged. The same adjusted generator is used for the 8-bit pulsed latch shown in Figure 4.31. The maximal frequency for each version is shown in Table 4.6. It is about 1.6 GHz for the 2-bit and 4-bit versions and slightly lower, 1.5 GHz, for the 8-bit version.

Table 4.6 – Maximal frequency dependence on the number of bits

| Bits | Maximal frequency |
|------|-------------------|
| 2    | 1.6 GHz           |
| 4    | 1.6 GHz           |
| 8    | 1.5 GHz           |



Figure 4.27 - Used topology of the pulsed latch with a multiplexer and reset pin



Figure 4.28 – Graphs of 2-bit pulsed latch in scan mode



Figure 4.29 – Graphs of 2-bit pulsed latch in normal mode



Figure 4.30 - The 4-bit pulsed latch with scan mode and negated output



Figure 4.31 - The 8-bit pulsed latch with scan mode and negated output

## 4.4 Comparison of multi-bit flip flops and multi-bit pulsed latches in simple simulation from schematic netlist

This chapter compares the results of multi-bit flip flops and multi-bit pulsed latches simulated from the schematic netlist. This is the first step before stating that a multi-bit pulsed latch can replace multi-bit flip flops. The comparison is based on power consumption and the predicted area of the cell. The area is only a prediction because there is not yet a layout of the multi-bit pulsed latches to compare with the multi-bit flip flop layouts. The power consumption is not final either, because a simulation from the layout would also include parasitic elements such as resistances and capacitances. The results may therefore differ slightly, but this is still a like-for-like comparison of schematic circuits.

The comparison is divided into three tables, one for each bit width. The results for the 2-bit pulsed latch are shown in Table 4.7. They are not good: the power consumption in both modes is worse than in the 2-bit flip flop. The cause is the pulse generator, which accounts for a significant share of the total power consumption when it drives only two latches. The predicted area is not smaller either, but that may be due to estimation inaccuracies, and the final layout may still be smaller than the 2-bit flip flop. At this stage, the 2-bit pulsed latch does not appear worth using instead of the 2-bit flip flop because both parameters are worse. The results may differ once the layouts are compared.

|                           | Mode   | Pulsed latch | Flip flop | Difference [%] |
|---------------------------|--------|--------------|-----------|----------------|
| Power consumption<br>[µW] | Scan   | 2.535        | 2.318     | -9.36          |
|                           | Normal | 2.512        | 2.33      | -7.81          |
| Predicted area<br>[µm²]   | -      | 24.6         | 23.2      | -6.06          |

Table 4.7 - A comparison of the 2-bit pulsed latch and the 2-bit flip flop

The results of the 4-bit pulsed latch are more promising because it is better than the 4-bit flip flop. The results are shown in Table 4.8. The power consumption differs between scan mode and normal mode. In scan mode, the 4-bit pulsed latch consumes about 4.32 % less power than the 4-bit flip flop. In normal mode, the saving is about 6.34 %, which is even better. The predicted area is about 19.05 % smaller, which is a great success in comparison to the 4-bit flip flop. As mentioned before, the predicted area is only an estimate, so in the final layout the difference may be even bigger. The area saving in comparison to a multi-bit flip flop arises because the pulse generator becomes less significant relative to the whole layout.

|                           | Mode   | Pulsed latch | Flip flop | Difference [%] |
|---------------------------|--------|--------------|-----------|----------------|
| Power consumption<br>[µW] | Scan   | 4.342        | 4.538     | 4.32           |
|                           | Normal | 4.272        | 4.561     | 6.34           |
| Predicted area<br>[µm²]   | -      | 36.9         | 45.6      | 19.05          |

Table 4.8 - A comparison of the 4-bit pulsed latch and 4-bit flip flop

As the number of bits increases, the savings in power consumption and area relative to the multi-bit flip flop should grow, because the pulse generator becomes less significant relative to the total area and power consumption of the pulsed latches. The most significant difference is shown in Table 4.9, which compares the 8-bit pulsed latch with the 8-bit flip flop. The power saving is about 13.36 % in scan mode and about 14.6 % in normal mode, which is a significant amount. A chip can contain many multi-bit flip flops that can be replaced with a better solution such as the multi-bit pulsed latch. The predicted area of the cell also shows a large difference: it is about 32.52 % smaller than that of the multi-bit flip flop.

|                           | Mode   | Pulsed latch | Flip flop | Difference [%] |
|---------------------------|--------|--------------|-----------|----------------|
| Power consumption<br>[µW] | Scan   | 7.721        | 8.912     | 13.36          |
|                           | Normal | 7.689        | 9.003     | 14.60          |
| Predicted area<br>[µm²]   | -      | 60.5         | 89.6      | 32.52          |

Table 4.9 - A comparison of the 8-bit pulsed latch and 8-bit flip flop

## 4.5 Layouts of designed multi-bit pulsed latches

The most difficult and time-consuming part of creating an entire library of digital standard cells is designing the layouts. A layout is a physical representation of a schematic netlist that is used in the manufacturing process. Each layer represents a mask that is used in fabrication. Layout design is labor-intensive because digital standard cells need to be drawn with high accuracy. There is no tool that can automatically create a layout from a schematic netlist. In the past, layout designers drew layouts on paper, which is unimaginable nowadays, although the transistors they worked with were of different sizes. Today, layout software is used. In this case, it is Virtuoso from Cadence.

Many people know PCB (Printed Circuit Board) layout software such as Eagle or Altium Designer. These tools include design rules that must be complied with, for example spacing between wires, spacing between devices, drill size, and others. Design rules used in PCB layouts are similar to the ones used in chip layouts. Each technology has a PDK (Process Design Kit) that provides such design rules for each layer. These design rules must be met for the chips to be manufactured correctly. The design rules can also differ between libraries, depending on the requirements. A typical rule in standard cells is the spacing between metals, which can vary with the width and length of the metal. The design rules can be checked with Calibre nmDRC (Design Rule Check). This tool saves a lot of time because it lists the violated design rules and shows the location of each violation in the layout. Since layouts are drawn manually from a schematic, another check is needed to compare the designed layout with the schematic.

The Calibre tool that compares these two views is Calibre nmLVS (Layout Versus Schematic). This tool works on the netlist level: netlists of both views are extracted and then compared. The sizes of the transistors are compared as well, to make sure that both views are the same. The design can be simulated on two levels. The first is the schematic level, which was used in the previous simulations because no layout had been created yet. The second is the layout level, which can include parasitic elements like resistances, capacitances, and inductances. The layout netlist is enriched with parasitic elements, so the simulation gives more realistic results. The tool used for this extraction is Calibre PEX (Parasitic Extraction). These tools are used in the design of digital standard cell libraries to ensure their correctness.

As mentioned before, layouts are time-consuming, but the effort also depends on how complicated the circuit is. In this case, the circuit is very complicated because it is large. Before starting the layout, the designer needs to think about the strategy. The placement of transistors is the most crucial step because it has a huge impact on the area of the cell. An Euler path can be used as a reduction method. It gives an ordering of the transistor gates that is common to the NMOS and PMOS networks, so that neighboring transistors can share diffusion and the cell becomes narrower. Once the initial placement of the transistors is known, the connections can be made. Connections in a digital standard cell are made in the lowest possible metal, M1, whenever possible. Higher metals like M2 and M3 can be used in more complex cells that cannot be connected with M1 only.
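The idea of the Euler path can be illustrated with a short sketch. It is not the procedure used for the designed cells: it assumes a hypothetical AOI21 gate (Y = NOT(A·B + C)) with hypothetical node names, and it searches by brute force for gate orderings that form a continuous path through both the NMOS and the PMOS network.

```python
from itertools import permutations

# Hypothetical example gate (not from the thesis): AOI21, Y = NOT(A*B + C).
# Each transistor is an edge between two diffusion nodes, labeled by its gate input.
NMOS = {"A": ("Y", "n1"), "B": ("n1", "GND"), "C": ("Y", "GND")}
PMOS = {"A": ("VDD", "p1"), "B": ("VDD", "p1"), "C": ("p1", "Y")}


def is_path(order, network):
    """Return True if the transistors can be walked in this order without a diffusion break."""
    for node in network[order[0]]:  # try both terminals of the first transistor as start
        for gate in order:
            a, b = network[gate]
            if node == a:
                node = b
            elif node == b:
                node = a
            else:
                break  # the current node is not a terminal of this transistor
        else:
            return True  # every transistor was traversed exactly once
    return False


def common_euler_orders(nmos, pmos):
    """List all gate orderings that are Euler paths in both networks."""
    return [o for o in permutations(sorted(nmos)) if is_path(o, nmos) and is_path(o, pmos)]


orders = common_euler_orders(NMOS, PMOS)
print(orders)
```

For this example gate, the ordering A, B, C is one of the common paths, so the three NMOS/PMOS transistor pairs can be placed side by side on continuous diffusion strips. The ordering A, C, B is not, because it would require a diffusion break in the PMOS network.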

In scan multi-bit flip flops, higher metals can be found because the layout is complicated. M2 is always used for horizontal lines and M3 for vertical lines, although in some libraries or technologies the orientation of the metals can be swapped. When higher metals are used this way, they do not block each other when a top-level connection is needed. If higher metals were used in random directions, they could block a path and make a connection impossible. Higher metals need to be placed on the routing grid for maximal usability. In digital standard cell layouts, the grid is the basic unit of size. Pins such as inputs and outputs should be placed in the center of the grid so that they can be connected with higher metals. While the layout is in progress, special techniques that reduce the area of the cell can often be found. Sometimes one metal can be moved so that another can be placed better; the choice is up to the designer.



Figure 4.32 - Layout of the plain 2-bit pulsed latch

In multi-bit digital standard cells, the smallest version of the cell can be designed first. The others can then be derived from it by duplication with minor changes. In the case of the multi-bit pulsed latch, the smallest cell is a double-height 2-bit version optimized for area. The pulse generator can be laid out as four parts of almost equal size, which is convenient because the generator can be divided across the rows, as the following designs show. The layout of the 2-bit pulsed latch is shown in Figure 4.32. The pulse generator is on the left side, divided into two rows. An advantage of the double-stacked inverter is that a layout trick can make it narrower, which saves many grids in the horizontal direction.

The 4-bit version can be designed based on the 2-bit version: the right part of the 2-bit pulsed latch can be copied to the left side, with the generator placed between the latches. However, some nets need to be connected, which can cause some trouble. The pulse generator also needed to be adjusted because it was too weak for four latches.

Figure 4.33 - Layout of the plain 4-bit pulsed latch

Because the load of the pulse generator increased, its strength needed to be increased too. The pulse needed to be wider, so the transistors in the inverter that serves as the delay part and in the output stage of the combinational logic were changed. The length of the transistors in the delay part was increased, and the width of the transistors in the output stage was increased as well, to obtain a stable pulse across the cell and across PVT (process, voltage, temperature) corners. The 4-bit pulsed latch is shown in Figure 4.33.



Figure 4.34 - Layout of the plain 8-bit pulsed latch

The last plain version, the 8-bit one, is derived in a similar way as the 4-bit pulsed latch was derived from the 2-bit pulsed latch. The 8-bit pulsed latch consists of the 4-bit pulsed latch and its copy placed in additional rows, so that the 8-bit pulsed latch has 4 rows. An advantage of this pulse generator is that the number of rows equals the number of generator parts, so the generator parts can be divided evenly into the rows. As before, some nets needed to be connected, for example the pulse distribution. The generator also needed to be adjusted because its load increased significantly. The width of the pulse and the strength of the output stage of the pulse generator were increased, as in the 4-bit version. The 8-bit pulsed latch is shown in Figure 4.34.

Once the plain versions of the multi-bit pulsed latches are ready, the reset versions can be created. The designed circuits are extended with 3 transistors per latch that implement an asynchronous reset function. This is the hardest part of the design: drawing a layout from scratch is relatively easy, but adding transistors to an existing layout is not. The 2-bit pulsed latch was extended with reset transistors first because it is the simplest cell, and a trick found there could be reused in the other cells. The 2-bit pulsed latch with reset is shown in Figure 4.35, the 4-bit version in Figure 4.36, and the 8-bit version in Figure 4.37.



Figure 4.35 - Layout of the reset 2-bit pulsed latch



Figure 4.36 - Layout of the reset 4-bit pulsed latch

The pulse generator did not need to be changed for the reset versions. All six designed layouts were simulated in cross corners with the temperature and voltage ranges given by the assignment. Transistors were simulated with $3\sigma$ models, and the netlists were extracted from the layouts including parasitic elements, as mentioned before. The results for the designed multi-bit pulsed latches in normal mode are shown in Table 4.10. The multi-bit pulsed latches were simulated in both cross corners, the slow corner and the fast corner, and in the typical corner, which corresponds to the standard voltage, temperature, and transistor model. The main row in this table is the typical corner row, which will be compared to the multi-bit flip flops later. The results in scan mode are slightly different. They are shown in Table 4.11, which has the same rows as the table for normal mode. The power consumption in normal mode is slightly higher than in scan mode.

The second parameter of the multi-bit pulsed latches that needs to be compared is the area. The measured area is shown in Table 4.12. The area of a cell is given by its height and width. The plain version of the multi-bit pulsed latch is always smaller than the reset version because the 3 added transistors per latch make the cell slightly bigger. It is worth noting that each larger bit version is less than double the size of the previous one. This means that the area per bit decreases with an increasing number of bits, as shown in Table 4.12.
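The decreasing area per bit can be checked with a few lines of code. The following is a minimal sketch that only uses the cell dimensions from Table 4.12; the names `CELLS` and `area_per_bit` are illustrative and not part of the design flow.

```python
# Cell dimensions in µm taken from Table 4.12: (bits, version) -> (height, width)
CELLS = {
    (2, "plain"): (5.2, 4.8), (2, "reset"): (5.2, 5.2),
    (4, "plain"): (5.2, 8.2), (4, "reset"): (5.2, 9.0),
    (8, "plain"): (10.4, 7.4), (8, "reset"): (10.4, 8.2),
}


def area_per_bit(bits, version):
    """Cell area (height x width) divided by the number of stored bits, in µm² per bit."""
    height, width = CELLS[(bits, version)]
    return round(height * width / bits, 2)


for version in ("plain", "reset"):
    print(version, [area_per_bit(bits, version) for bits in (2, 4, 8)])
```

For the plain versions, the area per bit falls from 12.48 µm² in the 2-bit cell to 9.62 µm² in the 8-bit cell.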



Figure 4.37 - Layout of the reset 8-bit pulsed latch

The last measured parameter of the multi-bit pulsed latches is the maximal frequency at which they can be used. The simulation of the netlist extracted from the layout with parasitic elements differs considerably from the simulation of the schematic netlist, especially in the frequency parameter. The results are shown in Table 4.13. The plain version can always run at a slightly higher frequency than the reset version. The 2-bit plain version can run up to 0.58 GHz and the reset version up to 0.50 GHz. The 4-bit plain version is slightly slower and can run up to 0.54 GHz, while the 4-bit reset version can run up to 0.52 GHz. The 8-bit plain version is again slightly slower than the 4-bit version and can run up to 0.45 GHz, and the reset version up to 0.43 GHz. The maximal frequency of the multi-bit pulsed latches can be increased because it depends on the pulse generator. After minor changes in the pulse generator, the multi-bit pulsed latches can run at higher frequencies; however, the power consumption would then be higher, which is not desired. If such an adjustment is needed, the output stage of the pulse generator needs to be stronger and the pulse needs to be wider. However, the measured frequencies are sufficient for our application, so the adjustment is not needed. The complete overview of the designed multi-bit pulsed latches is shown in Table 4.14.
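The frequency limits can be turned into a quick selection check. The sketch below uses only the values from Table 4.13; the helper names are hypothetical, and the clock frequencies passed to `usable_cells` are example inputs.

```python
# Maximal usable frequencies in GHz from Table 4.13: (bits, version) -> f_max
F_MAX_GHZ = {
    (2, "plain"): 0.58, (2, "reset"): 0.50,
    (4, "plain"): 0.54, (4, "reset"): 0.52,
    (8, "plain"): 0.45, (8, "reset"): 0.43,
}


def min_period_ns(bits, version):
    """Shortest usable clock period in ns (1 / f_max, with f_max in GHz)."""
    return 1.0 / F_MAX_GHZ[(bits, version)]


def usable_cells(clock_ghz):
    """Cells whose maximal frequency is at least the requested clock frequency."""
    return [cell for cell, f_max in F_MAX_GHZ.items() if f_max >= clock_ghz]


print(round(min_period_ns(8, "reset"), 2))  # period limit of the slowest cell
print(usable_cells(0.5))                    # cells usable at an example 0.5 GHz clock
```

At a clock of 0.5 GHz, only the 2-bit and 4-bit cells remain usable, while at clock frequencies in the MHz range all six cells qualify.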

|             |         | Normal mode |       |        |       |        |       |
|-------------|---------|-------------|-------|--------|-------|--------|-------|
|             |         | 2 bits      |       | 4 bits |       | 8 bits |       |
|             |         | Plain       | Reset | Plain  | Reset | Plain  | Reset |
| Power       | Slow    | 3.61        | 3.67  | 6.02   | 6.07  | 10.52  | 10.56 |
| Consumption | Typical | 4.59        | 4.69  | 7.68   | 7.72  | 13.42  | 13.43 |
| [µW]        | Fast    | 5.78        | 5.90  | 9.70   | 9.77  | 17.05  | 17.06 |

Table 4.10 - Measured power consumption of the multi-bit pulsed latches in normal mode

Table 4.11 - Measured power consumption of the multi-bit pulsed latches in scan mode

|             |         | Scan mode |       |        |       |        |       |
|-------------|---------|-----------|-------|--------|-------|--------|-------|
|             |         | 2 bits    |       | 4 bits |       | 8 bits |       |
|             |         | Plain     | Reset | Plain  | Reset | Plain  | Reset |
| Power       | Slow    | 3.58      | 3.64  | 5.89   | 5.95  | 10.40  | 10.44 |
| Consumption | Typical | 4.55      | 4.65  | 7.48   | 7.56  | 13.22  | 13.27 |
| [µW]        | Fast    | 5.72      | 5.85  | 9.39   | 9.49  | 16.59  | 16.69 |

Table 4.12 - Measured area of the designed multi-bit pulsed latches

|             | 2 bits |       | 4 bits |       | 8 bits |       |
|-------------|--------|-------|--------|-------|--------|-------|
|             | Plain  | Reset | Plain  | Reset | Plain  | Reset |
| Height [µm] | 5.2    | 5.2   | 5.2    | 5.2   | 10.4   | 10.4  |
| Width [µm]  | 4.8    | 5.2   | 8.2    | 9.0   | 7.4    | 8.2   |
| Area [µm²]  | 24.96  | 27.04 | 42.64  | 46.80 | 76.96  | 85.28 |

Table 4.13 - Maximal usable frequency of the multi-bit pulsed latches

| Number of bits  | 2-bit |       | 4-bit |       | 8-bit |       |
|-----------------|-------|-------|-------|-------|-------|-------|
| Version         | Plain | Reset | Plain | Reset | Plain | Reset |
| Frequency [GHz] | 0.58  | 0.50  | 0.54  | 0.52  | 0.45  | 0.43  |

|                                           |         | Multi-bit pulsed latches |       |        |       |        |       |
|-------------------------------------------|---------|--------------------------|-------|--------|-------|--------|-------|
|                                           |         | 2 bits                   |       | 4 bits |       | 8 bits |       |
|                                           |         | Plain                    | Reset | Plain  | Reset | Plain  | Reset |
| Power consumption in<br>normal mode [µW]  | Slow    | 3.61                     | 3.67  | 6.02   | 6.07  | 10.52  | 10.56 |
|                                           | Typical | 4.59                     | 4.69  | 7.68   | 7.72  | 13.42  | 13.43 |
|                                           | Fast    | 5.78                     | 5.90  | 9.70   | 9.77  | 17.05  | 17.06 |
| Power consumption in<br>scan mode [µW]    | Slow    | 3.58                     | 3.64  | 5.89   | 5.95  | 10.40  | 10.44 |
|                                           | Typical | 4.55                     | 4.65  | 7.48   | 7.56  | 13.22  | 13.27 |
|                                           | Fast    | 5.72                     | 5.85  | 9.39   | 9.49  | 16.59  | 16.69 |
| Height [µm]                               |         | 5.2                      | 5.2   | 5.2    | 5.2   | 10.4   | 10.4  |
| Width [µm]                                |         | 4.8                      | 5.2   | 8.2    | 9.0   | 7.4    | 8.2   |
| Area [µm²]                                |         | 24.96                    | 27.04 | 42.64  | 46.80 | 76.96  | 85.28 |
| Frequency [GHz]                           |         | 0.58                     | 0.50  | 0.54   | 0.52  | 0.45   | 0.43  |

Table 4.14 - Overview of the measured parameters in designed multi-bit pulsed latches

Table 4.15 - Overview of parameters in standard multi-bit flip flops

|                                           |         | Multi-bit flip flops |       |        |       |        |        |
|-------------------------------------------|---------|----------------------|-------|--------|-------|--------|--------|
|                                           |         | 2 bits               |       | 4 bits |       | 8 bits |        |
|                                           |         | Plain                | Reset | Plain  | Reset | Plain  | Reset  |
| Power consumption in<br>normal mode [µW]  | Slow    | 3.20                 | 3.25  | 6.28   | 6.38  | 12.11  | 12.57  |
|                                           | Typical | 4.09                 | 4.15  | 8.02   | 8.16  | 15.45  | 16.20  |
|                                           | Fast    | 5.15                 | 5.24  | 10.19  | 10.36 | 19.56  | 20.62  |
| Power consumption in<br>scan mode [µW]    | Slow    | 3.21                 | 3.26  | 6.32   | 6.42  | 12.20  | 12.69  |
|                                           | Typical | 4.10                 | 4.16  | 8.08   | 8.22  | 15.61  | 16.28  |
|                                           | Fast    | 5.17                 | 5.25  | 10.28  | 10.45 | 19.91  | 20.80  |
| Height [µm]                               |         | 5.2                  | 5.2   | 5.2    | 5.2   | 10.4   | 10.4   |
| Width [µm]                                |         | 5.8                  | 6.8   | 11.4   | 12.8  | 11.2   | 12.6   |
| Area [µm²]                                |         | 30.16                | 35.36 | 59.28  | 66.56 | 116.48 | 131.04 |

The results of the standard multi-bit flip flop circuits are shown for comparison. They are given only in the overview in Table 4.15 and without frequency limits. The results are compared with these reference values in the following graphs (Figure 4.38, Figure 4.39, Figure 4.40, Figure 4.41, Figure 4.42, Figure 4.43). The differences in power consumption and cell area are also listed in Table 4.16, Table 4.17, and Table 4.18.



Figure 4.38 - Graph of the comparison of the area size of plain multi-bit pulsed latches and standard plain multi-bit flip flops



Figure 4.39 - Graph of the comparison of the area size of reset multi-bit pulsed latches and standard reset multi-bit flip flops



Figure 4.40 - Graph of the comparison of the power consumption in normal mode of plain multi-bit pulsed latches and standard plain multi-bit flip flops



Figure 4.41 - Graph of the comparison of the power consumption in normal mode of reset multi-bit pulsed latches and standard reset multi-bit flip flops



Figure 4.42 - Graph of the comparison of the power consumption in scan mode of plain multi-bit pulsed latches and standard plain multi-bit flip flops



Figure 4.43 - Graph of the comparison of the power consumption in scan mode of reset multi-bit pulsed latches and standard reset multi-bit flip flops

|                                                          | 2 bits |       | 4 bits |       | 8 bits |        |
|----------------------------------------------------------|--------|-------|--------|-------|--------|--------|
|                                                          | Plain  | Reset | Plain  | Reset | Plain  | Reset  |
| Area of the multi-bit<br>pulsed latch [μm <sup>2</sup> ] | 24.96  | 27.04 | 42.64  | 46.8  | 76.96  | 85.28  |
| Area of the multi-bit<br>flip flop [µm <sup>2</sup> ]    | 30.16  | 35.36 | 59.28  | 66.56 | 116.48 | 131.04 |
| Differences [%]                                          | 17.24  | 23.53 | 28.07  | 29.69 | 33.93  | 34.92  |

Table 4.16 - Table of area comparison of multi-bit pulsed latches and multi-bit flip flops

Table 4.17 - Table of power consumption of multi-bit pulsed latches and multi-bit flip flops in normal mode

|                                                         | 2 bits |        | 4 bits |       | 8 bits |       |
|---------------------------------------------------------|--------|--------|--------|-------|--------|-------|
|                                                         | Plain  | Reset  | Plain  | Reset | Plain  | Reset |
| Power consumption of the<br>multi-bit pulsed latch [µW] | 4.59   | 4.69   | 7.68   | 7.72  | 13.42  | 13.43 |
| Power consumption of the<br>multi-bit flip flop [µW]    | 4.09   | 4.15   | 8.02   | 8.16  | 15.45  | 16.20 |
| Differences [%]                                         | -12.38 | -12.83 | 4.30   | 5.39  | 13.18  | 17.12 |

Table 4.18 - Table of power consumption of multi-bit pulsed latches and multi-bit flip flops in scan mode

|                                                         | 2 bits |        | 4 bits |       | 8 bits |       |
|---------------------------------------------------------|--------|--------|--------|-------|--------|-------|
|                                                         | Plain  | Reset  | Plain  | Reset | Plain  | Reset |
| Power consumption of the<br>multi-bit pulsed latch [µW] | 4.55   | 4.65   | 7.48   | 7.56  | 13.22  | 13.27 |
| Power consumption of the<br>multi-bit flip flop [µW]    | 4.10   | 4.16   | 8.08   | 8.22  | 15.61  | 16.28 |
| Differences [%]                                         | -11.06 | -11.64 | 7.43   | 8.09  | 15.34  | 18.48 |

Figure 4.38 compares the area of the plain multi-bit pulsed latches and the plain multi-bit flip flops. The bar chart shows that all designed multi-bit pulsed latches are smaller than the corresponding multi-bit flip flops, even though the prediction was that the 2-bit version would be slightly bigger. The 2-bit version is smaller thanks to a layout trick that reduced the area, especially in the delay part. The difference between the 2-bit plain versions is 17.24 %. The plain 4-bit pulsed latch is about 28.07 % smaller, which is an even more significant saving than for the 2-bit plain version. The savings should grow with an increasing number of bits because the pulse generator takes up a smaller share of the whole area. This is confirmed by the 8-bit plain version, where the saving is about 33.93 %.

Figure 4.39 again compares the area of multi-bit pulsed latches and multi-bit flip flops, this time for the reset versions. The designed 2-bit reset version is again smaller than the standard multi-bit flip flop, by about 23.53 %, which is slightly more than for the plain version. The reason is that the reset transistors could be added with minimal expansion of the cell width, which is not possible in the case of the multi-bit flip flop. The 2-bit reset version is nevertheless a little bigger than the 2-bit plain version because of the additional reset transistors. The 4-bit and 8-bit reset versions are smaller too: the 4-bit reset version by about 29.69 % and the 8-bit reset version by about 34.92 %. The complete overview of the area comparison is shown in Table 4.16.

The power consumption is compared in the following figures, in both normal and scan mode. The power consumption in normal mode is shown in Figure 4.40 for the plain versions and in Figure 4.41 for the reset versions; scan mode is shown in Figure 4.42 and Figure 4.43. The power consumption is similar in both modes. As predicted, the designed 2-bit versions consume more power than the 2-bit flip flops. The pulse generator accounts for a significant share of the power consumption of the whole cell, which causes the higher consumption in comparison to the 2-bit flip flop. The difference for the 2-bit plain version is -12.38 % in normal mode and -11.06 % in scan mode. For the 2-bit reset version, it is -12.83 % in normal mode and -11.64 % in scan mode. Even though their power consumption is worse, both circuits can still be used as smaller cells. The other cells, 4-bit and 8-bit in both plain and reset versions, are better in both parameters: power consumption and area. The 4-bit plain version saves 4.30 % in normal mode and 7.43 % in scan mode, so the difference between the modes is not negligible. The 4-bit reset version is even better: it saves 5.39 % in normal mode and 8.09 % in scan mode. The last compared cells are the 8-bit plain and reset versions. Their savings should be the most significant of all versions because the pulse generator makes up the smallest share of the entire design. The 8-bit plain version saves about 13.18 % in normal mode and 15.34 % in scan mode. The 8-bit reset version is even better and saves 17.12 % in normal mode and 18.48 % in scan mode. As the results show, multi-bit pulsed latches can be used as a replacement for multi-bit flip flops, either as smaller cells or as power-saving cells. The results of the power consumption comparison are shown in Table 4.17 for normal mode and in Table 4.18 for scan mode.
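The comparisons in Tables 4.16 to 4.18 can be recomputed from the typical-corner rows of Tables 4.14 and 4.15. The following is a minimal sketch with illustrative names; it assumes that the difference is the saving relative to the flip flop. The area results reproduce Table 4.16, whereas the power results differ by a few tenths of a percentage point from Tables 4.17 and 4.18, presumably because those were computed from unrounded simulation results.

```python
# Typical-corner values from Tables 4.14 and 4.15, ordered as
# 2-bit plain, 2-bit reset, 4-bit plain, 4-bit reset, 8-bit plain, 8-bit reset.
CELL_NAMES = ["2p", "2r", "4p", "4r", "8p", "8r"]
PULSED_LATCH = {
    "area_um2": [24.96, 27.04, 42.64, 46.80, 76.96, 85.28],
    "normal_uW": [4.59, 4.69, 7.68, 7.72, 13.42, 13.43],
    "scan_uW": [4.55, 4.65, 7.48, 7.56, 13.22, 13.27],
}
FLIP_FLOP = {
    "area_um2": [30.16, 35.36, 59.28, 66.56, 116.48, 131.04],
    "normal_uW": [4.09, 4.15, 8.02, 8.16, 15.45, 16.20],
    "scan_uW": [4.10, 4.16, 8.08, 8.22, 15.61, 16.28],
}


def saving_percent(latch, flip_flop):
    """Saving relative to the flip flop in percent; negative means the pulsed latch is worse."""
    return round((flip_flop - latch) / flip_flop * 100, 2)


savings = {
    metric: [saving_percent(pl, ff) for pl, ff in zip(PULSED_LATCH[metric], FLIP_FLOP[metric])]
    for metric in PULSED_LATCH
}
for metric, values in savings.items():
    print(metric, dict(zip(CELL_NAMES, values)))
```

The sign of the result separates the two cases discussed above: the 2-bit cells come out negative for power in both modes, while all 4-bit and 8-bit cells come out positive.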

### 4.6 Testing in a simple design

The designed multi-bit pulsed latches were also simulated in a simple design to assess their impact on area and power consumption. This additional test also verifies that multi-bit pulsed latches (MBPL) can work together. Afterwards, they are compared to multi-bit flip flops (MBFF). The first idea was to use the multi-bit pulsed latches in a test chip that would be manufactured on silicon and measured in the laboratory. This would require a lot of effort and knowledge and was not feasible in this case. The second idea was to create a simpler design that can be used in simulation. The chosen design is a 64-bit shift register, which can be easily built from 8-bit pulsed latches.

First, the measurement was done with multi-bit flip flops. Then it was repeated with their replacement, the multi-bit pulsed latches. The results are promising. Both measurements were done with the same parameters, which are shown in Table 4.19. The power consumption is compared in Table 4.20, and the area of the designed cells and the standard cells is compared in Table 4.21.

| Frequency | Duty<br>cycle | Supply<br>voltage | Load  | Temperature | Time of analysis |
|-----------|---------------|-------------------|-------|-------------|------------------|
| 5 MHz     | 50 %          | 1.2 V             | 30 fF | 27 °C       | 1 ms             |

Table 4.19 - Used parameters in a simple design simulation

Table 4.20 shows the difference in power consumption, which is lower for the multi-bit pulsed latches, as predicted. The difference between the designs is 3.44 $\mu$W, which is about 8.94 %. Table 4.21 shows the difference in the area of the 64-bit shift register, where the impact of the multi-bit pulsed latch is significant. The area of the design using multi-bit pulsed latches is 615.68 $\mu$m<sup>2</sup>, which is about 33.93 % less than with standard multi-bit flip flops.
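The reported register figures can be cross-checked against the single-cell results. The sketch below assumes that the area of the 64-bit shift register is simply the sum of eight plain 8-bit cells, which is an assumption that happens to match the reported areas; the constant and function names are illustrative.

```python
# Assumption: the 64-bit shift register area is the sum of eight plain 8-bit cells.
BITS_TOTAL = 64
BITS_PER_CELL = 8
AREA_MBPL_CELL = 76.96   # µm², plain 8-bit pulsed latch (Table 4.14)
AREA_MBFF_CELL = 116.48  # µm², plain 8-bit flip flop (Table 4.15)
POWER_MBPL = 35.06       # µW, simulated shift register (Table 4.20)
POWER_MBFF = 38.50       # µW, simulated shift register (Table 4.20)


def design_area(cell_area):
    """Total area of the register built from identical multi-bit cells, in µm²."""
    cells = BITS_TOTAL // BITS_PER_CELL
    return round(cells * cell_area, 2)


def saving_percent(new, reference):
    """Saving of the new design relative to the reference design, in percent."""
    return round((reference - new) / reference * 100, 2)


area_mbpl = design_area(AREA_MBPL_CELL)
area_mbff = design_area(AREA_MBFF_CELL)
print(area_mbpl, area_mbff, saving_percent(area_mbpl, area_mbff))
print(round(POWER_MBFF - POWER_MBPL, 2), saving_percent(POWER_MBPL, POWER_MBFF))
```

Under this assumption, the area saving of the register equals the saving of a single plain 8-bit cell, because the same cell is repeated eight times in both designs.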

| Parameter                      | Value |
|--------------------------------|-------|
| Power consumption of MBPL [µW] | 35.06 |
| Power consumption of MBFF [µW] | 38.50 |
| Difference [µW]                | 3.44  |
| Difference [%]                 | 8.94  |

Table 4.20 - Measured power consumption of 64-bit shift register

Table 4.21 - Measured area size of the 64-bit shift register

| Parameter           | Value  |
|---------------------|--------|
| Area of MBPL [µm²]  | 615.68 |
| Area of MBFF [µm²]  | 931.84 |
| Difference [µm²]    | 316.16 |
| Difference [%]      | 33.93  |

## **5. CONCLUSION**

The previous chapter dealt with investigating and designing multi-bit pulsed latches as a better replacement for multi-bit flip flops. The overview of the measured parameters of the designed multi-bit pulsed latches is shown in Table 4.14. The reference values for the comparison of multi-bit pulsed latches and multi-bit flip flops are shown in Table 4.15. The results of the comparison are promising because almost all cells achieve better results.

The only disadvantage of the multi-bit pulsed latches is that the 2-bit versions have higher power consumption than the corresponding multi-bit flip flops, because the pulse generator represents a significant part of the whole design. The power consumption of the 2-bit plain pulsed latch is worse by 11.06 % to 12.38 % and that of the reset version by 11.64 % to 12.83 %, depending on the mode, but the cell area is smaller than that of the multi-bit flip flops. The area saving is about 17.24 % for the 2-bit plain pulsed latch and 23.53 % for the 2-bit reset pulsed latch. The 2-bit pulsed latch can therefore still be used as a replacement where area matters. The other designed cells are better in both compared parameters: power consumption and cell area. The plain 4-bit pulsed latch saves 4.30 % to 7.43 % of power and the reset version 5.39 % to 8.09 %. The area saving is about 28.07 % for the plain version and 29.69 % for the reset version. The saved area increases with the number of bits because the pulse generator becomes a smaller part of the whole design. The plain 8-bit pulsed latch is even better, with power savings from 13.18 % to 15.34 %, and the reset version saves from 17.12 % to 18.48 %. The area saving is about 33.93 % for the plain version and 34.92 % for the reset version. The comparison of cell areas is shown in Table 4.16. The power consumption comparison is divided into two tables because the measurement was done in normal mode and in scan mode: normal mode is shown in Table 4.17 and scan mode in Table 4.18. The comparison is also shown in graphs in the same subchapter.

The designed multi-bit pulsed latches were also simulated in a simple design to prove that they can work together. A 64-bit shift register was used as the simple design. The power consumption is shown in Table 4.20 and the area of the design in Table 4.21. As Table 4.20 shows, the design based on multi-bit pulsed latches is more power efficient than the design with multi-bit flip flops. It reduces the power consumption by about 8.94 %, and the area saving is even larger, about 33.93 %.

The maximal usable frequency is also included in the overview in Table 4.14. The maximal frequency depends on the number of bits. The maximal frequency obtained from the schematic differs from the one obtained from the layout, but the measured frequency is still high enough for our application. The minimal frequency is not defined: because the pulse generator produces a pulse of fixed width, no lower frequency would cause the circuit to fail.

It is confirmed that multi-bit pulsed latches can be used as a better replacement for multi-bit flip flops and can be regarded as a low-power technique. It would also be beneficial to run further analyses, such as electromigration and voltage drop analysis. The pulse generator draws a peak current, so an electromigration analysis should be performed, and the voltage drop should be analyzed for the same reason. The most realistic results would be obtained by manufacturing the design on silicon, which was not feasible in this case.

## LITERATURE

- [1] S. Heo, R. Krashinsky and K. Asanovic. "Activity-Sensitive Flip-Flop and Latch Selection for Reduced Energy" in IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Vol. 15, no. 9, pp. 1060-1064, Sept. 2007, doi: 10.1109/TVLSI.2007.902211.
- [2] ROSAS, Rodriguez. Multi-bit pulse-based latches for low power design [online]. 5612 AZ Eindhoven, Netherlands, 2017 [cit. 2021-12-20]. Available from: https://research.tue.nl/en/studentTheses/multi-bit-pulse-based-latches-for-lowpower-design. Thesis. The Eindhoven University of Technology.
- [3] WESTE, Neil H. E. and David Money HARRIS. *CMOS VLSI design: a circuits and systems perspective.* 4th ed. Boston: Addison Wesley, 2010. ISBN 0-321-54774-8.
- [4] MAHESH, Dananjaya. Low Power VLSI Design [online]. 2015 [cit. 2021-12-20]. Available from: https://www.researchgate.net/publication/275037527\_Low\_Power\_VLSI\_Design

- [5] RABAEY, Jan M., Anantha CHANDRAKASAN and Borivoje NIKOLIĆ. *Digital integrated circuits: a design perspective.* 2nd ed. Upper Saddle River: Pearson Education, 2003. Prentice Hall electronics and VLSI series. ISBN 0130909963.
- [6] Krishnamurthy, R., A. Alvandpour, S. Mathew, M. Anders, V. De, and S. Borkar. *High-Performance, Low-Power, and Leakage-Tolerance Challenges for Sub-70nm Microprocessor Circuits*. (Session Invited Paper). IEEE European Solid State Circuits Conference, Sept. 25, 2002. Paper no. C17.01.
- [7] Aydin, Ömer & Uçar, Orhan. (2017). Design for Smaller, Lighter and Faster ICT Products: Technical Expertise, Infrastructures and Processes. Advances in Science, Technology and Engineering Systems Journal. 2. pp. 1114-1128. 10.25046/aj0203141.
- [8] ROY, Kaushik and Sharat PRASAD. *Low-power CMOS VLSI circuit design*. New York: John Wiley, 2000. ISBN 0-471-11488-x.
- [9] MOS Metal oxide Semiconductor. *Wikichip* [online]. 2014 [cit. 2021-12-08]. Available from: https://en.wikichip.org/wiki/mosfet
- [10] P. Nilsson. Arithmetic Reduction of the Static Power Consumption in Nanoscale CMOS. 2006 13th IEEE International Conference on Electronics, Circuits and Systems, 2006, pp. 656-659, doi: 10.1109/ICECS.2006.379874.
- [11] Power Consumption. Semiengineering [online]. [cit. 2021-12-12]. Available from: https://semiengineering.com/knowledge\_centers/low-power/low-powerdesign/power-consumption/
- [12] Onsemi: New Brand and Promise of Sustainable Future. *Onsemi* [online].
  PHOENIX, 2021 [cit. 2021-12-20]. Available from: https://www.onsemi.com/company/news-media/press-announcements/en/onsemi-new-brand-and-promise-of-sustainable-future.
- [13] Wikipedia contributors. (2021, November 22). 3 nm process. In Wikipedia, The Free Encyclopedia. Retrieved 12:50, December 20, 2021. Available from: https://en.wikipedia.org/w/index.php?title=3\_nm\_process&oldid=1056618993

- [14] B. Padmavathi, B. T. Geetha and K. Bhuvaneshwari. *Low power design techniques and implementation strategies adopted in VLSI circuits*. 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), 2017, pp. 1764-1767, doi: 10.1109/ICPCSI.2017.8392017.
- [15] A. Keshavarzi, K. Roy and C. F. Hawkins. *Intrinsic leakage in low power deep submicron CMOS ICs*. Proceedings International Test Conference 1997, 1997, pp. 146-155, doi: 10.1109/TEST.1997.639607.
- [16] A Practical Guide to Low-Power Design [online]. powerforward, 2008 [cit. 2021-12-20]. Available from: https://picture.iczhiku.com/resource/eetop/WHiYWDoHqyLfIbXx.pdf
- [17] Ijjada, Dr & Ramparamesh, B & Rao,. (2011). *Reduction of Power Dissipation in Logic Circuits*. International Journal of Computer Applications. 24. 10.5120/2962-3946.
- [18] BAKER, R. Jacob. *CMOS circuit design, layout, and simulation*. 3rd ed.
  Hoboken: Wiley-IEEE Press, 2010. IEEE series on microelectronics systems. ISBN 978-0-470-88132-3.

# **SYMBOLS AND ABBREVIATIONS**

#### Abbreviations:

| Abbreviation     | Meaning                                 |
|------------------|-----------------------------------------|
| MBPL             | Multi-bit pulsed latch                  |
| MBFF             | Multi-bit flip flop                     |
| CMOS             | Complementary Metal-Oxide-Semiconductor |
| ASIC             | Application Specific Integrated Circuit |
| PVT              | Process, Voltage, Temperature           |
| NMOS             | N-type metal oxide semiconductor        |
| PMOS             | P-type metal oxide semiconductor        |
| ESD              | Electrostatic discharge                 |
| GND              | Ground                                  |
| DRAM             | Dynamic Random Access Memory            |
| DRC              | Design rule check                       |
| LVS              | Layout versus schematic                 |
| PEX              | Parasitic extraction                    |
| SiO<sub>2</sub>  | Silicon dioxide                         |
| HD               | High density                            |
| HS               | High speed                              |
| M1               | Metal one                               |
| M2               | Metal two                               |
| M3               | Metal three                             |
| CO               | Contact                                 |
| POLY             | Polysilicon                             |
| LVT              | Low voltage threshold                   |
| SVT              | Standard voltage threshold              |
| HVT              | High voltage threshold                  |
| PCB              | Printed circuit board                   |
| PDK              | Process design kit                      |
| Q                | Output                                  |
| $Q_1$            | The first output                        |
| $Q_2$            | The second output                       |
| D                | Data                                    |
| SI               | Scan input                              |
| SE               | Scan enable                             |
| SEN              | Negated scan enable                     |

#### Symbols:

| Symbol              | Meaning                         | Unit  |
|---------------------|---------------------------------|-------|
| W                   | Width                           | (m)   |
| L                   | Length                          | (m)   |
| n                   | Number of fingers               | (-)   |
| m                   | Multiplier                      | (-)   |
| α                   | Switching activity factor       | (-)   |
| $C_L$               | Capacitance load                | (F)   |
| f                   | Frequency                       | (Hz)  |
| $V_{DD}$            | Power supply voltage            | (V)   |
| $P_{total}$         | Total power consumption         | (W)   |
| $P_{switching}$     | Switching power consumption     | (W)   |
| $P_{short-circuit}$ | Short-circuit power consumption | (W)   |
| $P_{leakage}$       | Leakage power consumption       | (W)   |
| $I_{SC}$            | Short-circuit current           | (A)   |
| $V_{th}$            | Threshold voltage               | (V)   |
| T                   | Thermodynamic temperature       | (K)   |
| $I_{SUB}$           | Subthreshold leakage current    | (A)   |
| $I_{tunnel}$        | Gate tunneling current          | (A)   |
| $I_{hc}$            | Hot-carrier current             | (A)   |
| $I_D$               | Diode current                   | (A)   |
| $I_S$               | Reverse saturation current      | (A)   |
| $V_{DB}$            | Voltage between drain and body  | (V)   |
| $V_T$               | Thermal voltage                 | (V)   |
| e                   | Euler's number                  | (-)   |
| k                   | Boltzmann constant              | (J/K) |
| q                   | Electric charge                 | (C)   |