

## **UNIVERSITI PUTRA MALAYSIA**

## STANDARD CELL LIBRARY EVALUATION AND OPTIMIZATION FOR NEAR-THRESHOLD VOLTAGE OPERATION

LIM YANG WEI

FK 2022 77



# STANDARD CELL LIBRARY EVALUATION AND OPTIMIZATION FOR NEAR-THRESHOLD VOLTAGE OPERATION



Thesis Submitted to the School of Graduate Studies, Universiti Putra Malaysia, in Fulfilment of the Requirements for the Degree of Doctor of Philosophy

May 2022

All material contained within the thesis, including without limitation text, logos, icons, photographs and all other artwork, is copyright material of Universiti Putra Malaysia unless otherwise stated. Use may be made of any material contained within the thesis for non-commercial purposes from the copyright holder. Commercial use of material may only be made with the express, prior, written permission of Universiti Putra Malaysia.

Copyright © Universiti Putra Malaysia



Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfilment of the requirement for the degree of Doctor of Philosophy

# STANDARD CELL LIBRARY EVALUATION AND OPTIMIZATION FOR NEAR-THRESHOLD VOLTAGE OPERATION

By

## LIM YANG WEI

May 2022

Chair : Fakhrul Zaman Rokhani, PhD

Faculty : Engineering

Near-threshold voltage (NTV) digital integrated circuits have attracted attention in recent decades because battery-powered devices need energy-efficient designs. Although NTV operation brings energy benefits, the challenges of performance degradation and variability prevent NTV design from being widely implemented in most computing applications. Improving energy efficiency while maintaining performance is therefore the primary goal of NTV design. Standard cell library optimization should be carefully considered to achieve better energy, performance, and area of the design. This dissertation presents joint optimization techniques that combine standard cell height tuning with two different transistor layout structures, namely the full diffusion (FD) layout structure and the inverse narrow width effect (INWE)-aware layout structure. An increased number of optimization parameters and techniques affects the evaluation efficiency of the standard cell library at the circuit level. The evaluation efficiency (i.e., synthesis runtime) needs to be improved by using a modeling technique that speeds up this time-consuming process while maintaining accuracy. An area-efficiency curve modeling framework is proposed in this dissertation to reduce the runtime needed to generate the area-delay tradeoff curve for standard cell library evaluation.

Tuning the standard cell height with the FD layout structure results in 5.5% higher performance when using a taller cell height (i.e., 14-track) library, and 55.4% lower energy when using a shorter cell height (i.e., 7-track) library. Compared to the FD layout structure, the INWE-aware layout structure shows a higher energy-delay improvement because the INWE reduces the threshold voltage of narrow-width transistors. Two INWE-aware layout structures, namely multiplier and multi-finger, have also been explored in this study. The proposed reduced-height (i.e., 6-track) library with the multi-finger layout structure results in a 16% performance improvement and a 14% area improvement compared to the 8-track multiplier library. Lastly, the proposed area-efficiency curve modeling framework reduces synthesis runtime by about 16.5X to 18.5X, with an error of around 2.74% to 5.27% relative to the uniform interval curve generation method.

In conclusion, the optimal NTV-operated standard cell library in terms of energy, performance, and area can be achieved by using the lower track height multi-finger layout structure rather than the FD and multiplier layout structures. In addition, the evaluation of the standard cell library on the area-performance tradeoff can be sped up through the proposed area-efficiency curve modeling framework.



Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfilment of the requirement for the degree of Doctor of Philosophy

## STANDARD CELL LIBRARY EVALUATION AND OPTIMIZATION FOR NEAR-THRESHOLD VOLTAGE OPERATION

By

## LIM YANG WEI

May 2022

Chair : Fakhrul Zaman Rokhani, PhD

Faculty : Engineering

Digital integrated circuits operating at near-threshold voltage (NTV) have emerged over the past few decades because energy-efficient designs are needed in battery-powered devices. Although energy benefits can be gained from NTV operation, the challenges of performance degradation and variability prevent NTV design from being widely implemented in most computing applications. Improving energy efficiency while maintaining performance has become the primary goal of NTV design. Standard cell library optimization should be carefully considered to achieve better energy, performance, and area of the design. This dissertation presents optimization techniques that combine standard cell height tuning with two different transistor layout structures, namely the full diffusion (FD) layout structure and the inverse narrow width effect (INWE) layout structure. An increased number of optimization parameters and techniques affects the evaluation efficiency of the standard cell library at the circuit level. The evaluation efficiency (i.e., synthesis time) needs to be improved by using a modeling technique that speeds up the time-consuming process while maintaining evaluation accuracy. An area-efficiency curve modeling framework is proposed in this dissertation to reduce the runtime of generating the area-delay tradeoff curve for standard cell library evaluation.

Tuning the standard cell height with the FD layout structure yields 5.5% higher performance when using a taller cell height (i.e., 14-track), and 55.4% lower energy when using a shorter cell height (i.e., 7-track). Compared to the FD layout structure, the INWE layout structure shows a higher energy-delay improvement because the INWE reduces the threshold voltage when narrow-width transistors are used. Two INWE layout structures, namely multiplier and multi-finger, have also been explored in this study. The proposed reduced-height (i.e., 6-track) library with the multi-finger layout structure yields a 16% performance improvement and a 14% area reduction compared to the 8-track multiplier library. Lastly, the area-efficiency curve modeling framework reduces synthesis time by 16.5X to 18.5X, with an error of 2.74% to 5.27% compared to the curve generation method that uses uniform intervals.

In conclusion, optimization of the NTV-operated standard cell library in terms of energy, performance, and area can be achieved by using a lower track height and the multi-finger layout structure rather than the FD and multiplier layout structures. In addition, the evaluation of the standard cell library on the area-performance tradeoff can be sped up through the proposed area-efficiency curve modeling framework.

## ACKNOWLEDGEMENTS

First and foremost, I would like to express my sincerest gratitude to my supervisor, Assoc. Prof. Ir. Ts. Dr. Fakhrul Zaman Rokhani, for his great support and motivation in helping me to complete my doctoral degree. Throughout my PhD study, he not only supported me with his valuable knowledge, but also taught me how to become an independent researcher, helped me to reach essential resources, and motivated me to build networks with other researchers.

I would also like to thank Prof. Dr. Shaiful Jahari bin Hashim, Assoc. Prof. Dr. Roslina Mohd Sidek, and Assoc. Prof. Dr. Noor Ain Kamsani for serving as my supervisory committee members. Their technical advice and insightful feedback helped me to carry out my research successfully. I would also like to thank CREST (Collaborative Research in Engineering, Science and Technology) for financially supporting my PhD study.

Last but not least, my deepest thanks go to my family for their patience and support throughout the hard times of my study, especially my parents, who never gave up believing in me. Their support and trust gave me the courage to pursue my PhD without worries.

This thesis was submitted to the Senate of Universiti Putra Malaysia and has been accepted as fulfilment of the requirement for the degree of Doctor of Philosophy. The members of the Supervisory Committee were as follows:

#### Fakhrul Zaman bin Rokhani, PhD

Associate Professor, Ir., Ts. Faculty of Engineering Universiti Putra Malaysia (Chairman)

#### Shaiful Jahari bin Hashim, PhD

Professor Faculty of Engineering Universiti Putra Malaysia (Member)

#### Roslina binti Mohd Sidek, PhD

Associate Professor Faculty of Engineering Universiti Putra Malaysia (Member)

#### Noor Ain binti Kamsani, PhD

Associate Professor Faculty of Engineering Universiti Putra Malaysia (Member)

### ZALILAH MOHD SHARIFF, PhD

Professor and Dean School of Graduate Studies Universiti Putra Malaysia

Date: 8 September 2022

## Declaration by Members of Supervisory Committee

This is to confirm that:

- the research and the writing of this thesis were done under our supervision;
- supervisory responsibilities as stated in the Universiti Putra Malaysia (Graduate Studies) Rules 2003 (Revision 2015-2016) are adhered to.

| Signature:<br>Name of Chairman<br>of Supervisory<br>Committee: | Associate Professor Ir. Ts. Dr.<br>Fakhrul Zaman Rokhani |
|----------------------------------------------------------------|----------------------------------------------------------|
| Signature:<br>Name of Member of<br>Supervisory<br>Committee:   | Professor Dr.<br>Shaiful Jahari Hashim                   |
| Signature:<br>Name of Member of<br>Supervisory<br>Committee:   | Associate Professor Dr.<br>Roslina Mohd Sidek            |
| Signature:<br>Name of Member of<br>Supervisory<br>Committee:   | Associate Professor Dr.<br>Noor Ain Kamsani              |

## TABLE OF CONTENTS

|                       | Page |
|-----------------------|------|
| ABSTRACT              | i    |
| ABSTRAK               | iii  |
| ACKNOWLEDGEMENTS      | v    |
| APPROVAL              | vi   |
| DECLARATION           | viii |
| LIST OF TABLES        | xiii |
| LIST OF FIGURES       | xv   |
| LIST OF ABBREVIATIONS | xx   |

## CHAPTER

|   |   | Page |
|---|---|------|
| 1 | INTRODUCTION | 1 |
|   | 1.1 Background | 1 |
|   | 1.2 Problem Statement | 3 |
|   | 1.3 Aim and Objectives | 5 |
|   | 1.4 Thesis Scope | 5 |
|   | 1.5 Thesis Organization | 7 |
| 2 | LITERATURE REVIEW | 8 |
|   | 2.1 Introduction | 8 |
|   | 2.2 Standard Cell-based Design Approach | 9 |
|   | 2.2.1 Standard Cell Design | 11 |
|   | 2.2.2 Standard Cell Library | 12 |
|   | 2.2.3 Standard Cell Height | 14 |
|   | 2.3 Digital Circuit Design Optimization Cost Metrics | 14 |
|   | 2.3.1 Delay | 15 |
|   | 2.3.2 Area | 17 |
|   | 2.3.3 Power | 18 |
|   | 2.3.4 Energy | 20 |
|   | 2.4 Ultra-Low Voltage Operation | 22 |
|   | 2.4.1 Sub-threshold Voltage | 23 |
|   | 2.4.2 Near-threshold Voltage | 23 |
|   | 2.5 Ultra-Low Voltage Standard Cell Library Design Optimization | 24 |
|   | 2.5.1 Standard Cell Device Sizing | 24 |
|   | 2.5.2 Inverse Narrow Width Effect (INWE) Aware Sizing | 25 |
|   | 2.5.3 Reverse Short Channel Effect (RSCE) Aware Sizing | 26 |
|   | 2.5.4 Body Biasing | 27 |
|   | 2.5.5 Joint Optimization Techniques | 28 |
|   | 2.5.6 Overall Comparison | 29 |
|   | 2.6 Standard Cell Library Evaluation | 29 |
|   | 2.6.1 Composite Metric | 29 |
|   | 2.6.2 Sensitivity Analysis | 33 |
|   | 2.6.3 Pareto Efficiency Curve | 34 |
|   | 2.7 Chapter Summary | 37 |
| 3 | METHODOLOGY | 39 |
|   | 3.1 Introduction | 39 |
|   | 3.2 Research Overview | 39 |
|   | 3.3 Digital Standard Cell Library Design | 42 |
|   | 3.4 Digital ASIC Implementation Methodology | 42 |
|   | 3.5 Digital ASIC Benchmark Circuits | 44 |
|   | 3.5.1 ISCAS'89 Circuits | 45 |
|   | 3.5.2 Brent-Kung Adder | 45 |
|   | 3.5.3 AMBA AHB Controller | 45 |
|   | 3.5.4 Synopsys DW8051 Processor Core | 45 |
|   | 3.5.5 AES-256 Encryption Core | 47 |
|   | 3.5.6 ARM Cortex-M0 Processor Core | 47 |
|   | 3.6 Test Chip Design and Measurement Environment | 47 |
|   | 3.6.1 Test Chip Design | 48 |
|   | 3.6.2 Test Chip Measurement Environment | 49 |
|   | 3.7 Chapter Summary | 51 |
| 4 | EARLY-STAGE AREA EFFICIENCY CURVE MODELING FOR STANDARD CELL LIBRARY | 53 |
|   | 4.1 Introduction | 53 |
|   | 4.2 Area-Delay Curve Modeling | 53 |
|   | 4.3 Experimental Results | 55 |
|   | 4.4 Chapter Summary | 62 |
| 5 | FULL DIFFUSION LAYOUT STRUCTURE WITH CELL HEIGHT OPTIMIZATION FOR NEAR- | 64 |
|   | 5.1 Introduction | 64 |
|   | 5.2 P/N Ratio Optimization with Full Diffusion Layout Structure | 65 |
|   | 5.3 Joint Optimization with Standard Cell Height | 68 |
|   | 5.4 Experimental Results | 70 |
|   | 5.4.1 Cell-Level Evaluation | 70 |
|   | 5.4.2 Block-Level Evaluation | 72 |
|   | 5.4.3 Test Chip Results | 74 |
|   | 5.5 Chapter Summary | 77 |
| 6 | INWE-AWARE STANDARD CELL LAYOUT ... VOLTAGE OPERATION | 78 |
|   | 6.1 Introduction | 78 |
|   | 6.2 INWE-aware Design in Near-Threshold Voltage Operation | 78 |
|   | 6.2.1 Impact of INWE on Device Sizing | 79 |
|   | 6.2.2 Parallel-Transistor-Stacking in Near-threshold Voltage Design | 80 |
|   | 6.2.3 PMOS-to-NMOS Ratio Optimization | 81 |
|   | 6.2.4 Series Stack and Drive Strength Sizing | 83 |
|   | 6.3 Standard Cell Library Layout Consideration | 84 |
|   | 6.3.1 PTS Layout Structure Design and Optimization | 85 |
|   | 6.3.2 Impact of PTS Layout Structure on Energy, Delay, and Area | 89 |
|   | 6.3.3 Impact of Process Variation on Performance | 91 |
|   | 6.3.4 PTS Layout Structure for Multiple Fan-in Cell Design | 92 |
|   | 6.3.5 Multi-Row Height Design for Multiplier | 92 |
|   | 6.4 Proposed Standard Cell Library Development | 94 |
|   | 6.4.1 Individual Standard Cell Characterization and Evaluation | 94 |
|   | 6.4.2 Application Specific Integrated Circuit (ASIC) Benchmark Evaluation | 101 |
|   | 6.4.3 Process Technology Scaling and Voltage Scaling Impact | 104 |
|   | 6.5 Chapter Summary | 104 |
| 7 | CONCLUSION AND FUTURE WORKS | 106 |
|   | 7.1 Conclusion | 106 |
|   | 7.2 Future Works | 106 |
|   | REFERENCES | 108 |
|   | BIODATA OF STUDENT | 123 |
|   | LIST OF PUBLICATIONS | 124 |

## LIST OF TABLES

| Table |                                                                                                                                                                                              | Page |
|-------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 2.1   | Comparison of energy-constrained and power-<br>constrained design                                                                                                                            | 21   |
| 2.2   | Ultra-low voltage standard cell library design optimization techniques comparison                                                                                                            | 30   |
| 3.1   | ISCAS'89 benchmark circuits information                                                                                                                                                      | 44   |
| 3.2   | Test chip design specification                                                                                                                                                               | 49   |
| 3.3   | Lab equipment information for test chip validation                                                                                                                                           | 50   |
| 4.1   | Time reduction and error of the model with different number of synthesis points for circuit s9234                                                                                            | 58   |
| 4.2   | Time reduction and error of the model for ISCAS'89 benchmark circuits                                                                                                                        | 60   |
| 4.3   | Time reduction and error of the model with different frequency intervals for circuit s9234                                                                                                   | 61   |
| 5.1   | Optimal P/N ratio comparison of T12 inverter designed in 110nm process technology using proposed FD layout structure method at $V_{DD}$ =1.2V, 0.8V, 0.6V, and 0.4V                          | 68   |
| 5.2   | Standard cell architecture related parameters for T7, T9, T12, and T14 designed in 110nm process technology                                                                                  | 69   |
| 5.3   | Standard cell library cell list for T7, T9, T12, and T14 designed in 110nm process technology.                                                                                               | 69   |
| 5.4   | Standard cells delay, energy, area, and leakage power comparison for different cell height libraries designed in 110nm process technology (Corner = TT, $V_{DD}$ = 0.6V, Temperature = 25°C) | 71   |
| 5.5   | Place & route results comparison for different cell height libraries designed in 110nm process technology at 0.6V operation                                                                  | 73   |
| 5.6   | Comparison of proposed work to state-of-the-art                                                                                                                                              | 76   |
| 6.1   | Relative change of carrier mobility and threshold voltage for 130nm process devices based on different layout structures | 87   |
| 6.2   | Threshold voltage, on- and off-current for 130nm process devices based on different layout structures (Corner = TT, $V_{DD}$ = 0.4V, Temperature = 25°C) | 87   |
| 6.3   | INVX2 Cell area of different layout structures | 91   |
| 6.4   | Frequency variability of different layout structures in 130nm FO4 ring oscillator circuit (Corner = TT, $V_{DD}$ = 0.4V, Temperature = 25°C) | 92   |
| 6.5   | Standard cell library cell list designed in 130nm process technology for T8MPSH, T8MPMH, and T6MFSH. | 95   |
| 6.6   | Standard cell energy, delay, and area comparison for different layout structure libraries designed in 130nm process technology (Corner = TT, $V_{DD}$ = 0.4V, Temperature = 25°C) | 96   |
| 6.7   | Standard cells gate capacitance comparison for different layout structure libraries designed in 130nm process technology (Corner = TT, $V_{DD}$ = 0.4V, Temperature = 25°C) | 99   |
| 6.8   | Standard cells leakage power comparison for different layout structure libraries designed in 130nm process technology (Corner = TT, $V_{DD}$ = 0.4V, Temperature = 25°C) | 100  |

## LIST OF FIGURES

| Figure |                                                                                                                                                          | Page |
|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 1.1    | Energy consumption reduction and supply voltage scaling over the process technology scaling. (Reproduced from [13]).                                     | 2    |
| 1.2    | Circuit energy and delay over a wide voltage scaling range. (Reproduced from [17]).                                                                      | 3    |
| 2.1    | 8-bit Kogge-Stone adder design (a) schematic and (b) layout using the standard cell-based design approach. (Reproduced from [39]).                       | 10   |
| 2.2    | Digital circuit (or ASIC) design flow using the standard cell-based approach. (Reproduced from [55]).                                                    | 11   |
| 2.3    | Inverter cell design (a) schematic and (b) layout.<br>(Reproduced from [15]).                                                                            | 12   |
| 2.4    | Standard cell design flow. (Reproduced from [59]).                                                                                                       | 13   |
| 2.5    | Example of design regularity of standard cell.<br>(Reproduced from [61]).                                                                                | 13   |
| 2.6    | Intel 10nm standard cell library with different heights for diverse PPA tradeoff. (Reproduced from [33]).                                                | 14   |
| 2.7    | Propagation delays of an inverter cell (timing diagram).<br>(Reproduced from [15]).                                                                      | 15   |
| 2.8    | Leakage power and technology scaling. (Reproduced from [75]).                                                                                            | 19   |
| 2.9    | Alternative approaches for differentiating power and energy                                                                                              | 21   |
| 2.10   | Performance and energy operation versus supply voltage $V_{DD}$ . (Reproduced from [3]).                                                                 | 22   |
| 2.11   | PDP of a FO4 inverter with various NMOS and PMOS widths at 0.3V. (Reproduced from [115]).                                                                | 32   |
| 2.12   | Energy-delay design space with all the possible design<br>solutions result from different gate sizing for a flip-flop<br>circuit. (Reproduced from [9]). | 34   |
| 2.13   | Energy efficient curve and designs optimizing the metrics <i>ED</i>. (Reproduced from [136]).                                                           | 35   |
| 2.14   | The optimal energy-delay curve bounded on the multiple energy-delay curves that are obtained from gate sizing at different supply voltages. (Reproduced from [9]). | 36   |
| 2.15 | Energy-delay curves for the two different circuit topologies A and B. (Reproduced from [138]).                                                                     | 36 |
| 3.1  | The overview of the research.                                                                                                                                      | 40 |
| 3.2  | The design methodology for the standard cell-based ASIC design approach.                                                                                           | 41 |
| 3.3  | Standard cell library design flow and EDA tools used.                                                                                                              | 43 |
| 3.4  | ASIC implementation design flow and EDA tools used.                                                                                                                | 43 |
| 3.5  | 16-bit Brent-Kung Adder prefix tree structure [39]. | 45 |
| 3.6  | AMBA AHB controller block diagram [40].                                                                                                                            | 46 |
| 3.7  | Synopsys DW8051 processor core block diagram [41].                                                                                                                 | 46 |
| 3.8  | Cortex-M0 processor block diagram [42].                                                                                                                            | 47 |
| 3.9  | 8-bit microcontroller test chip (a) architecture and (b) micro-architecture overview.                                                                              | 48 |
| 3.10 | Circuit board implementation (a) test chip-on-board and (b) validation board.                                                                                      | 50 |
| 3.11 | Test chip validation environment setup: (a) functional validation and power measurement, (b) leakage power measurement with temperature control.                   | 51 |
| 4.1  | Area-delay tradeoff curves of ISCAS'89 benchmark circuit s9234 implemented by using 3 commercial 65nm LP libraries: track-8, track-9, and track-12. | 56 |
| 4.2  | Area-delay curve of circuit s9234 generated by the proposed model in Equation (4.1) using 6 synthesis points. | 57 |
| 4.3  | Tradeoff between model error (MAPE) and synthesis runtime.                                                                                                         | 59 |
| 4.4  | Area-delay curve comparison of ARM Cortex-M0 for 65nm LP 9-track library implementation with different transistor flavors: RVT, HVT, and LVT.                      | 62 |
| 5.1 | Physical design of the (a) full diffusion ( $W = W_{max}$ ) and (b) non-full diffusion layout structure ( $W < W_{max}$ ). | 65 |
| 5.2 | T12 Inverter X1 PMOS and NMOS sizing design space. | 66 |
| 5.3 | (a) Propagation delay and (b) energy of T12 Inverter X1 over PMOS widths in 110nm process technology (Corner = TT, $V_{DD}$ = 0.6V, Temperature = 25°C).                        | 67 |
| 5.4 | Recharacterization process of reference library to obtain 0.6V timing and power model                                                                                           | 70 |
| 5.5 | Energy-delay design space for AMBA AHB controller implementation using reference library, T7, T9, T12, and T14 library at 0.6V operation.                                       | 72 |
| 5.6 | 8051-based microcontroller micrograph and characteristics                                                                                                                       | 74 |
| 5.7 | Measured frequency and energy consumption across a wide voltage range.                                                                                                          | 75 |
| 5.8 | Measured (a) active energy consumption distribution<br>running at 35MHz and (b) leakage energy consumption<br>distribution for 10 chips at 0.6V operation.                      | 75 |
| 5.9 | Measured leakage energy consumption versus supply voltage at 0°C, 25°C and 75°C temperature.                                                                                    | 76 |
| 6.1 | Threshold voltage ( <i>V</i> <sub>th</sub> ) versus transistor width for 130nm (a) NMOS and (b) PMOS devices.                                                                   | 79 |
| 6.2 | Transistor on-current ( <i>I</i> <sub>on</sub> ) versus transistor width for 130nm (a) NMOS and (b) PMOS devices.                                                               | 79 |
| 6.3 | PTS breaks down PMOS and NMOS devices in an inverter from <i>N</i> -sizes into <i>N</i> -times of multiple small transistors with single device sizes (i.e., $W_p$ and $W_n$ ). | 80 |
| 6.4 | Threshold voltage ( $V_{th}$ ) versus device sizing through PTS method for 130nm (a) NMOS and (b) PMOS devices.                                                                 | 81 |
| 6.5 | Transistor on-current ( $I_{on}$ ) versus device sizing through PTS for 130nm (a) NMOS and (b) PMOS devices.                                                                    | 81 |
| 6.6 | Current efficiency versus transistor width for 130nm (a) NMOS and (b) PMOS devices (Corner = TT, $V_{DD} = 0.4V$ , Temperature = 25°C).                                         | 82 |
| 6.7 | 130nm Inverter (a) delay and energy and (b) energy-delay products metrics vary with P/N ratio (Corner = TT, $V_{DD}$ = 0.4V, Temperature = 25°C). | 82 |
| 6.8 | Example of (a) two NMOS series-stack in NAND2 gate, and (b) two PMOS series-stack in NOR2 gate. | 83 |
| 6.9 | Device-sizing for various series-stack of (a) NMOS and (b) PMOS by using SDS and PTS methods. | 84 |
| 6.10 | The circuit design of (a) INV, (b) NAND2, (c) NOR2 using PTS method. | 85 |
| 6.11 | PTS layout structures: (a) multi-finger and (b) multiplier. | 85 |
| 6.12 | The device layout parameters LOD: SA and SB. | 86 |
| 6.13 | Various layout compositions for INVX2 based on [34]: (a) conventional monolithic width (T8MW) structure, (b) multiplier (T8MP) structure, and (c) multi-finger (T8MF) structure. | 88 |
| 6.14 | Proposed layout composition for INVX2 optimized from the (a) multi-finger (T8MF) structure: (b) layout optimized multi-finger (T8MF optimized) structure, and (c) reduced height multi-finger (T6MF) structure. | 89 |
| 6.15 | 130nm INVX2 FO4 ring oscillator with various layout compositions. (a) Frequency, (b) energy, and (c) EDP versus supply voltage ( $V_{DD}$ ) in the NTV region (Corner = TT, Temperature = 25°C). | 90 |
| 6.16 | Two-input NAND gate with (a) multiplier structure in 8-track and (b) multi-finger structure in 6-track design. | 93 |
| 6.17 | Double height cells design for 8-track multiplier structure: (a) INVX4 (b) NAND2X2 (c) NOR2X2. | 93 |
| 6.18 | Power-delay curves of each developed library for Cortex-M0 design with different gate sizing. | 101 |
| 6.19 | (a) Energy-delay curves and (b) area-delay curves of each developed library for Cortex-M0 design with different gate sizing. | 103 |
| 6.20 | Energy consumption and energy saving of T6MFSH library over the wide voltage scaling for Cortex-M0 processor. | 105 |

## LIST OF ABBREVIATIONS

| Abbreviation     | Definition                                |
|------------------|-------------------------------------------|
| AES              | Advanced Encryption Standard              |
| AHB              | Advanced High-performance Bus             |
| AMBA             | Advanced Microcontroller Bus Architecture |
| ASIC             | Application Specific Integrated Circuit   |
| CMOS             | Complementary Metal-Oxide-Semiconductor   |
| DFF              | D Flip-Flop                               |
| DIBL             | Drain-Induced Barrier Lowering            |
| DRC              | Design Rules Check                        |
| EDA              | Electronic Design Automation              |
| EDP              | Energy-Delay Product                      |
| ED <sup>2</sup>  | Energy-Delay <sup>2</sup>                 |
| FBB              | Forward Body Biasing                      |
| FD               | Full Diffusion                            |
| FF               | Fast-Fast Corner                          |
| FO4              | Fan-out of 4                              |
| FPGA             | Field Programmable Gate Arrays            |
| GPIO             | General Purpose Input/Output              |
| HVT              | High Threshold Voltage                    |
| IC               | Integrated Circuit                        |
| l <sup>2</sup> C | Inter-Integrated Circuit                  |
| INWE             | Inverse Narrow Width Effect               |
| I/O              | Input/Output                              |
| IoT              | Internet-of-Things                        |
| IR Drop          | Voltage Drop                              |
| LOCOS            | Local Oxidation of Silicon                |
| LOD             | Length of Oxide Diffusion                         |
| LVS             | Layout Versus Schematic                           |
| LVT             | Low Threshold Voltage                             |
| MAPE            | Mean Absolute Percentage Error                    |
| MEP             | Minimum Energy Point                              |
| MF              | Multi-finger                                      |
| MH              | Multi-row Height                                  |
| MOSFET          | Metal-Oxide-Semiconductor Field Effect Transistor |
| MP              | Multiplier                                        |
| MW              | Monolithic Width                                  |
| nFD             | Non-Full Diffusion                                |
| NMOS            | N-channel Metal-Oxide-Semiconductor               |
| NTV             | Near-Threshold Voltage                            |
| Parasitic RC    | Parasitic Resistance and Capacitance              |
| PD <sup>2</sup> | Power-Delay <sup>2</sup>                          |
| PDN             | Pull-Down Network                                 |
| PDP             | Power-Delay Product                               |
| PEX             | Parasitic Extraction                              |
| PMOS            | P-channel Metal-Oxide-Semiconductor               |
| P/N Ratio       | PMOS-to-NMOS Ratio                                |
| PPA             | Power, Performance, and Area                      |
| PTS             | Parallel-Transistor-Stacking                      |
| PUN             | Pull-Up Network                                   |
| RAM             | Random Access Memory                              |
| RBB             | Reverse Body Biasing                              |
| RC delay model  | Resistor-capacitor delay model                    |
| RISC           | Reduced Instruction Set Computer            |
| ROM            | Read Only Memory                            |
| RSCE           | Reverse Short Channel Effect                |
| RTL            | Register Transfer Level                     |
| RVT            | Regular Threshold Voltage                   |
| SCE            | Short Channel Effect                        |
| SDS            | Single Device Sizing                        |
| SH             | Single-row Height                           |
| SNM            | Static Noise Margin                         |
| SoC            | System on Chip                              |
| SRAM           | Static Random Access Memory                 |
| SS             | Slow-Slow Corner                            |
| STI            | Shallow Trench Isolation                    |
| STV            | Sub-Threshold Voltage                       |
| T<i>n</i>      | Track-n                                     |
| TT             | Typical-typical Corner                      |
| UART           | Universal Asynchronous Receiver-Transmitter |
| VHDL           | VHSIC Hardware Description Language         |
| VLSI           | Very Large-Scale Integration                |
| °C             | degree Celsius                              |

## CHAPTER 1

### INTRODUCTION

## 1.1 Background

Over the decades, the exponential increase in transistor density in integrated circuits (ICs), driven by complementary metal-oxide-semiconductor (CMOS) technology scaling, has allowed more functionality to be integrated on a single chip, forming a system-on-chip (SoC) [1]. The increasing complexity of SoCs has motivated the standard cell-based design approach, which shortens the time-to-market of a product; standard cells are pre-designed logic gates that can be reused across different circuit blocks [2]. A collection of standard cells in the form of a library can be optimized for different SoC requirements such as high performance, low power, or small area.

The emergence of multi-functional SoCs has also diversified semiconductor applications into market segments such as healthcare, agriculture, automotive, communication, and consumer electronics. This diversity has shifted the primary design concern for integrated circuits from speed and area to power and energy consumption, because different applications impose different requirements. For instance, battery-operated devices used for Internet-of-Things (IoT), wearable, and biomedical sensing applications must work within a limited energy budget to sustain battery lifetime [3]. Even high-performance computation servers in data centers must limit their power usage because of high operational costs [4].

Focusing on power or energy minimization does not imply that design performance should be ignored. An appropriate optimization should either minimize the energy consumption for a given timing requirement or maximize the performance within an energy budget [5]. This energy-performance relationship has given rise to extensive research on optimization techniques across various layers of design abstraction, from the device and circuit levels to the micro-architecture level. Common optimization techniques include transistor sizing [6], [7], gate sizing [5], [8], supply voltage scaling [5], [9], threshold voltage tuning [5], [10], body biasing [11], pipelining and parallelism [9], power gating [12], and clock gating [12]. Jointly applying techniques across the layers could achieve a globally optimal solution. However, careful consideration is needed to avoid redundant area overhead and/or performance degradation.

Among the existing optimization techniques, supply voltage scaling is a well-known technique for improving the energy efficiency of a circuit because dynamic energy depends quadratically, and leakage energy linearly, on the supply voltage [5], [14], [15]. One reason that energy consumption has decreased with technology node scaling is the accompanying scaling of the supply voltage. However, the supply voltage has remained almost constant since around the 65nm node and no longer delivers significant energy gains, as shown in Figure 1.1 [13]. This is because substantial voltage downscaling exacerbates performance degradation [14], [16].
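The dependency described above can be written as a first-order energy model. This is a generic textbook-style sketch, not the exact model of [5], [14], [15]; the symbols $\alpha$ (switching activity), $C_{L}$ (switched capacitance), $I_{leak}$ (leakage current), and $t_{d}$ (delay per operation) are introduced here only for illustration.

$$
E_{total} = E_{dyn} + E_{leak}, \qquad E_{dyn} = \alpha C_{L} V_{DD}^{2}, \qquad E_{leak} = I_{leak} V_{DD} t_{d}
$$

Under these assumptions, halving $V_{DD}$ reduces $E_{dyn}$ to one quarter, whereas $E_{leak}$ is halved only if $I_{leak}$ and $t_{d}$ stay fixed. Since $t_{d}$ grows as $V_{DD}$ is lowered, the leakage term shrinks more slowly in practice and can even grow.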



Figure 1.1 : Energy consumption reduction and supply voltage scaling over the process technology scaling. (Reproduced from [13]).

Nevertheless, the ultra-low voltage design approach has gained attention in recent years because operating frequencies from a hundred kHz to a few MHz are acceptable in IoT and biomedical applications [17]. In this approach, the supply voltage is aggressively scaled from the nominal voltage down to the near-threshold voltage (NTV) and sub-threshold voltage (STV) regions. Figure 1.2 illustrates the magnitude of energy reduction and delay degradation over a wide voltage scaling range. As the voltage scales down into the STV region, the leakage energy, which increases with circuit delay, eventually dominates the dynamic energy and results in a minimum energy point (MEP), as seen in the figure. Although the MEP is located in the STV region, many applications cannot operate in this voltage range because of the exponential decrease in circuit performance. Compared with STV operation, NTV operation sacrifices some of the energy savings in exchange for relatively higher performance. This performance gain significantly expands the application space beyond that of STV operation [17], [18].
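The origin of the MEP can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the model behind Figure 1.2 or [17]: it uses a normalized, EKV-style current expression that is smooth from sub- to super-threshold operation, and every constant (`N_VT`, `V_TH`, `ALPHA`, `K_LEAK`) and function name is hypothetical and chosen only to make the trend visible.

```python
import math

N_VT = 1.5 * 0.026   # assumed slope factor times thermal voltage (V)
V_TH = 0.40          # assumed threshold voltage (V)
ALPHA = 0.1          # assumed switching activity factor
K_LEAK = 100.0       # assumed weight lumping idle gates and logic depth


def drive_current(v_gs):
    """Normalized drain current, smooth from sub- to super-threshold."""
    return math.log1p(math.exp((v_gs - V_TH) / (2.0 * N_VT))) ** 2


def energy_per_op(v_dd):
    """Return (dynamic, leakage) energy per operation, normalized to C = 1."""
    delay = v_dd / drive_current(v_dd)        # t_d ~ C * V_DD / I_on
    e_dyn = ALPHA * v_dd ** 2                 # quadratic in V_DD
    i_leak = K_LEAK * drive_current(0.0)      # off-current at V_GS = 0
    e_leak = i_leak * v_dd * delay            # I_leak * V_DD * t_d
    return e_dyn, e_leak


def minimum_energy_point(v_min=0.15, v_max=1.2, step=0.005):
    """Sweep V_DD and return (v_dd, total_energy) at the minimum."""
    n = int(round((v_max - v_min) / step))
    sweep = [v_min + i * step for i in range(n + 1)]
    return min(((v, sum(energy_per_op(v))) for v in sweep), key=lambda p: p[1])


if __name__ == "__main__":
    v_mep, e_mep = minimum_energy_point()
    print(f"MEP at V_DD = {v_mep:.3f} V, normalized energy = {e_mep:.4f}")
```

With these assumed constants the minimum falls below `V_TH`, and leakage energy exceeds dynamic energy at the low end of the sweep, which matches the qualitative behavior described above. The exact MEP location depends entirely on the chosen constants.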



Although STV/NTV designs have been well explored in academic research, they are still uncommon in industry [19]. Two major challenges that affect robust operation in the ultra-low voltage regions are performance degradation and process variability. These force changes to the design techniques at the architecture, circuit, and standard cell library levels [3], [13], [14], [17], [19]–[25]. Because energy and speed are conflicting objectives, it is difficult to optimize both at the same time. Energy-performance optimization in STV/NTV designs should minimize the energy via voltage scaling while recovering speed through other design techniques. For the ultra-low voltage approach to proliferate in energy-efficient design, more effort is required to improve the performance and robustness of the circuit to meet the needs of a given application.



Figure 1.2 : Circuit energy and delay over a wide voltage scaling range. (Reproduced from [17]).

## 1.2 Problem Statement

In the past, digital integrated circuits were designed in a fully custom manner, which can maximize performance while achieving high density and low power. However, as chip complexity has increased, the deliberate design needed to meet stringent performance targets takes a huge amount of human effort and time [26]. In modern digital IC design, the standard cell-based design approach has been introduced as the key to meeting time-to-market requirements. With the aid of electronic design automation (EDA) tools, an IC can be constructed from a group of pre-designed and pre-characterized logic gates, known as standard cells. These standard cells, with different logic functions and drive strengths, are usually provided by the silicon foundry or created in-house. A collection of standard cells is called a standard cell library, and it can be designed and optimized to meet different power, performance, and area (PPA) design targets.

Most digital standard cell libraries currently available on the market are well optimized for super-threshold voltage operation [22], [25]. Because transistor current characteristics in the STV/NTV region differ from those in the super-threshold voltage region [14], [16], [17], the existing standard cell libraries are not optimal in terms of PPA when operating in the ultra-low voltage region. Therefore, optimizing standard cell libraries for operation in the STV/NTV regions is highly desirable for energy-efficient digital circuits.

To address the performance degradation in STV/NTV design, the transistors in a standard cell need to be carefully resized [27]–[29]. Minimum transistor sizes, which result in minimum energy, could worsen delay variability and deteriorate the robustness of the standard cell [30], while the transistor sizes that ensure the reliability of the standard cell are impractically large [14]. Joint design techniques that include transistor sizing should therefore be considered for ultra-low voltage standard cell design to achieve robust operation and better PPA optimization. In super-threshold voltage standard cell design, cell height is one of the important parameters used to address the different PPA targets of circuits. Taller cells provide larger drive currents at the cost of larger area and power consumption; in contrast, shorter cells have relatively lower power and area but weaker drive strength [31]–[33]. However, the cell height parameter has received little attention from researchers working on STV/NTV design.

Since the transistor drive current depends exponentially on the threshold voltage in the STV/NTV region, device-level effects such as the inverse narrow width effect (INWE) and the reverse short channel effect (RSCE) have a significant impact on transistor delay [22]. In contrast to the traditional method, transistor sizing that takes INWE and RSCE into account could lead to a higher drive current and thus faster performance. However, the effectiveness of INWE and RSCE depends on the process, and exploiting them might increase leakage current and area. The INWE-aware transistor implementation for ultra-low voltage operation proposed in [22] can be realized in either a multi-finger or a multiplier layout structure. Although the transistor sizes are the same for both layout structures, they exhibit different energy-delay results [34]. However, previous comparisons of INWE-aware layout structures evaluated only the inverter cell using a ring oscillator circuit and did not present results for more complex circuit blocks that contain different cell functions. In addition, the impact of standard cell height on energy and performance has not been studied in previous research.
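The exponential dependence of drive current on threshold voltage mentioned above can be made concrete with the standard first-order sub-threshold current expression. This is an illustrative sketch, not a result from [22]: the slope factor $n = 1.5$, the thermal voltage $v_{T} \approx 26$ mV, and the threshold shift $\Delta V_{th} = 20$ mV are assumed values.

$$
I_{on} \propto \exp\left(\frac{V_{GS} - V_{th}}{n v_{T}}\right)
\quad\Rightarrow\quad
\frac{I_{on}(V_{th} - \Delta V_{th})}{I_{on}(V_{th})} = \exp\left(\frac{\Delta V_{th}}{n v_{T}}\right) = \exp\left(\frac{20\ \text{mV}}{39\ \text{mV}}\right) \approx 1.67
$$

Under these assumptions, a threshold voltage reduction of only 20 mV, of the kind a width-dependent effect such as INWE may produce, raises the sub-threshold drive current by roughly two thirds.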


For standard cell library evaluation within the context of circuit blocks, the energy (or area) versus performance tradeoff of a given tuning parameter can be observed through the energy efficiency curve. The energy/area efficiency curve, sometimes known as the energy/area-delay tradeoff curve [35] or Pareto optimal curve [36], is the optimal energy/area-delay boundary obtained by tuning specific parameter(s) in the energy/area-delay design space. To obtain the energy efficiency curve in the standard cell-based design approach, multiple synthesis runs are required to produce various energy/area-delay solutions. The number of synthesis runs depends on the targeted energy/area-delay range; the prior study in [37] performed about 25-30 synthesis runs to obtain an energy (and area) efficiency curve. For very large circuits, performing the multiple synthesis runs might take from a few hours to days [38]. Evaluating multiple standard cell tuning parameters (i.e., supply voltage, INWE-aware layout, and standard cell height) in ultra-low voltage design further increases the number of synthesis runs. This makes standard cell library evaluation using the energy/area efficiency curve more tedious and time-consuming. Therefore, a fast estimation or modeling of the energy/area efficiency curve is required for library evaluation.
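As an illustration of how such a boundary is extracted once the synthesis runs are available, the sketch below filters a set of (delay, area) results down to the Pareto-optimal points. It is not the modeling framework proposed in this thesis; the function name `pareto_front` and the sample values are hypothetical and stand in for real synthesis reports.

```python
def pareto_front(points):
    """Return the (delay, cost) points not dominated by any other point.

    A point is dominated if another point is no worse in both delay and
    cost and strictly better in at least one of them.
    """
    front = []
    best_cost = float("inf")
    # Visit points from fastest to slowest; within equal delay, cheapest first.
    # A point joins the front only if it strictly lowers the best cost so far.
    for delay, cost in sorted(points):
        if cost < best_cost:
            front.append((delay, cost))
            best_cost = cost
    return front


if __name__ == "__main__":
    # Hypothetical synthesis results: (delay in ns, area in um^2).
    runs = [
        (1.0, 5200), (1.2, 4300), (1.2, 4500), (1.5, 3900),
        (1.8, 4000), (2.0, 3600), (2.5, 3600),
    ]
    for delay, area in pareto_front(runs):
        print(f"{delay:.1f} ns  {area} um^2")
```

The same routine applies to energy-delay data by passing energy as the cost. Each point still requires one synthesis run, which is why reducing the number of runs needed to describe the curve matters.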

## 1.3 Aim and Objectives

The main aim of this research is to propose energy-performance-area optimized standard cell libraries for near-threshold voltage operation. The following objectives are set to support the aim:

1. To develop an area-efficiency curve modeling framework for analyzing and evaluating the area-performance tradeoff of standard cell libraries at the circuit block level.
2. To develop a standard cell library using the joint techniques of transistor sizing with a full diffusion layout structure and cell height tuning to optimize energy and performance.
3. To develop an INWE-aware layout structure with reduced cell height for an energy-efficient standard cell library.

## 1.4 Thesis Scope

The optimization of a digital integrated circuit can be performed across different layers of design abstraction, from device and circuit to micro-architecture, as mentioned above. This thesis focuses on standard cell library optimization and evaluation, since standard cells are the fundamental building blocks of digital integrated circuits. The standard cell library optimizations mainly target NTV operation to achieve the better energy efficiency required by battery-powered applications such as IoT sensors, wearables, and biomedical devices. NTV operation not only saves energy but also delivers relatively higher performance than STV operation. Generally, the optimization of digital integrated circuits targets PPA. In this study, however, energy consumption is considered instead of power, since energy is derived from power and performance.
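The relationship between energy, power, and performance referred to above can be sketched as follows. This is a minimal illustration that assumes the clock period is taken as the delay; the function names and the operating point are hypothetical and are not measurements from this work.

```python
def energy_per_cycle(power_w, clock_hz):
    """Energy per clock cycle in joules: E = P * T = P / f."""
    return power_w / clock_hz


def energy_delay_product(power_w, clock_hz):
    """EDP in joule-seconds, taking the clock period as the delay."""
    period_s = 1.0 / clock_hz
    return energy_per_cycle(power_w, clock_hz) * period_s


if __name__ == "__main__":
    # Hypothetical operating point, chosen only for round numbers.
    power_w, clock_hz = 35e-6, 35e6
    print(f"Energy per cycle: {energy_per_cycle(power_w, clock_hz):.3e} J")
    print(f"EDP: {energy_delay_product(power_w, clock_hz):.3e} J*s")
```

Two designs that draw the same power but run at different clock frequencies consume different energy per operation, which is why energy rather than power is the more informative metric for a battery-limited device.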

Multiple EDA tools were employed to develop the standard cell libraries and to implement the Application Specific Integrated Circuits (ASICs) used to evaluate the libraries. During standard cell library development, *Cadence Virtuoso* was used for custom schematic and layout design, *Mentor Calibre* for physical verification, *Synopsys Hspice* for functional verification, and *Synopsys Liberty NCX* for standard cell characterization. During ASIC implementation, *Synopsys VCS* was used for register transfer level (RTL) and gate-level simulation, *Synopsys Design Compiler* for RTL synthesis and optimization, *Synopsys IC Compiler* for place and route, *Synopsys PrimeTime* for timing closure signoff, and *Mentor Calibre* for physical verification signoff.

In this study, several ASIC benchmark circuits with different functions and gate counts, ranging from 400 to 200,000 gates, were used to evaluate the developed standard cell libraries. These circuits include the 32-bit Brent-Kung adder [39], AMBA AHB controller [40], Synopsys DW8051 processor core [41], ARM Cortex-M0 processor core [42], and AES-256 encryption core [43]. The data path block, a 32-bit Brent-Kung adder, was self-developed based on the Brent-Kung adder architecture [39], while the AMBA AHB controller is an open-source bus controller block obtained from the ARM Design Start website [40]. The 8-bit DW8051 and 32-bit Cortex-M0 processor cores are proprietary circuits owned by Synopsys and ARM, respectively. Because they are proprietary, the Verilog RTL of both processors is encrypted and cannot be viewed by the designer. Although the RTL cannot be viewed, synthesis and place and route can still be performed using the EDA tools. The AES-256 encryption core benchmark is taken from the OpenCores website [43]. For the modeling of the area-efficiency curve, benchmark circuits from ISCAS'89 [44], which contain both combinational and sequential cells, were employed.

Different CMOS process technologies were employed for standard cell library design and evaluation in this thesis. Three existing commercial standard cell libraries developed in the TSMC 65nm process were used to evaluate the proposed area-efficiency curve modeling framework because these libraries are commonly used in industry design and academic research. The NTV standard cell libraries with the FD layout structure and the INWE-aware layout structure were implemented in the Silterra 110nm and 130nm processes, respectively, because of limited access to leading process design kits (i.e., 65nm and beyond) and the chip tape-out requirements tied to the research grant funding.

Since the NTV range for the Silterra 110nm and 130nm processes is 0.4V to 0.6V, any supply voltage within this range can be used for NTV design operation. A supply voltage of 0.6V was applied for the standard cell library with the FD layout structure to fulfill the timing requirement of the DW8051 design, whereas 0.4V was applied for the standard cell library with the INWE-aware layout structure because the effect of INWE on the device current is much larger at 0.4V than at 0.6V.

## 1.5 Thesis Organization

This section provides an overview of the thesis structure.

Chapter 1 briefly introduces the background of research on the ultra-low voltage design approach, problem statement, and the aim of the research.

Chapter 2 presents the literature review on the state-of-the-art research. The details of STV and NTV design techniques and challenges are also discussed. In addition, works related to standard cell library optimization in ultra-low voltage regions are presented in the same chapter.

Chapter 3 discusses the design flows of ASIC implementation and standard cell library development in this research. The discussion includes the EDA tools used, design environment setup and constraints, design-related parameters, and the benchmark circuits for evaluation.

Chapter 4 presents a modeling framework for the area-efficiency curve that is used to evaluate the standard cell library at the circuit level. This chapter describes the existing area-efficiency curve generated using the commercial synthesis tool and then demonstrates the proposed framework to model the area-efficiency curve. The modeling framework is evaluated using multiple standard cell libraries and benchmark circuits.

Chapter 5 proposes a joint optimization technique that considers the transistor sizing and standard cell height tuning in optimizing the energy and performance for NTV operation. A transistor sizing method with layout consideration is discussed. The latter part of the chapter discusses the implementation of different standard cell height libraries incorporated with the proposed transistor sizing method.

Chapter 6 explores the impact of different device layout structures that utilize INWE on energy, performance, and area for NTV operation. This chapter also proposes a reduced cell height architecture for further energy-performance optimization. The evaluation of the proposed structure is demonstrated in cell- and block-level design.

Chapter 7 summarizes the contributions of this research and closes with recommendations for possible future work.

## REFERENCES

- [1]. R. Saleh et al., "System-on-Chip: Reuse and Integration," Proceedings of the IEEE, vol. 94, no. 6, pp. 1050–1069, Jun. 2006, doi: 10.1109/JPROC.2006.873611.
- [2]. W.-K. Chen, The VLSI handbook. CRC press, 1999.
- [3]. M. Alioto, Enabling the Internet of Things: From Integrated Circuits to Integrated Systems, 1st ed. Springer Publishing Company, Incorporated, 2018. doi: https://doi.org/10.1007/978-3-319-51482-6.
- [4]. Y. Shao, Q. Yang, Y. Gu, Y. Pan, Y. Zhou, and Z. Zhou, "A Dynamic Virtual Machine Resource Consolidation Strategy Based on a Gray Model and Improved Discrete Particle Swarm Optimization," IEEE Access, vol. 8, pp. 228639–228654, Dec. 2020, doi: 10.1109/ACCESS.2020.3046318.
- [5]. D. Markovic, V. Stojanovic, B. Nikolic, M. A. Horowitz, and R. W. Brodersen, "Methods for true energy-performance optimization," IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282–1293, Aug. 2004, doi: 10.1109/JSSC.2004.831796.
- [6]. J. P. Fishburn and A. E. Dunlop, "TILOS: A Posynomial Programming Approach to Transistor Sizing," in The Best of ICCAD: 20 Years of Excellence in Computer-Aided Design, A. Kuehlmann, Ed. Boston, MA: Springer US, 2003, pp. 295–302. doi: 10.1007/978-1-4615-0292-0\_23.
- [7]. S. S. Sapatnekar, V. B. Rao, P. M. Vaidya, and S.-M. Kang, "An exact solution to the transistor sizing problem for CMOS circuits using convex optimization," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 11, pp. 1621–1634, Nov. 1993, doi: 10.1109/43.248073.
- [8]. C.-P. Chen, C. C. N. Chu, and D. F. Wong, "Fast and exact simultaneous gate and wire sizing by Lagrangian relaxation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 7, pp. 1014–1025, Jul. 1999, doi: 10.1109/43.771182.
- [9]. H. Q. Dao, B. R. Zeydel, and V. G. Oklobdzija, "Energy optimization of pipelined digital systems using circuit sizing and supply scaling," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 2, pp. 122–134, Feb. 2006, doi: 10.1109/TVLSI.2005.863760.
- [10]. S. Zhou, H. Yao, Q. Zhou, and Y. Cai, "Minimization of Circuit Delay and Power through Gate Sizing and Threshold Voltage Assignment," in 2011 IEEE Computer Society Annual Symposium on VLSI, Jul. 2011, pp. 212–217. doi: 10.1109/ISVLSI.2011.29.
- [11]. K. von Arnim et al., "Efficiency of body biasing in 90 nm CMOS for low power digital circuits," in Proceedings of the 30th European Solid-State Circuits Conference, Sep. 2004, pp. 175–178. doi: 10.1109/ESSCIR.2004.1356646.
- [12]. D. Flynn, R. Aitken, A. Gibbons, and K. Shi, Low power methodology manual: for system-on-chip design. Springer Science & Business Media, 2007. doi: https://doi.org/10.1007/978-0-387-71819-4.
- [13]. R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits," Proceedings of the IEEE, vol. 98, no. 2, pp. 253–266, Feb. 2010, doi: 10.1109/JPROC.2009.2034764.
- [14]. B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp. 1778–1786, Sep. 2005, doi: 10.1109/JSSC.2005.852162.
- [15]. J. M. Rabaey, A. P. Chandrakasan, and B. Nikolic, Digital integrated circuits, 2nd ed., vol. 2. Prentice Hall Englewood Cliffs, 2002.
- [16]. M. Alioto, "Understanding DC Behavior of Subthreshold CMOS Logic Through Closed-Form Analysis," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 7, pp. 1597–1607, Jul. 2010, doi: 10.1109/TCSI.2009.2034233.
- [17]. D. Markovic, C. C. Wang, L. P. Alarcon, T.-T. Liu, and J. M. Rabaey, "Ultralow-Power Design in Near-Threshold Region," Proceedings of the IEEE, vol. 98, no. 2, pp. 237–252, Feb. 2010, doi: 10.1109/JPROC.2009.2035453.
- [18]. Y. Chen and H. Jiao, "Standard Cell Optimization for Ultra-Low-Voltage Digital Circuits," in 2019 International Conference on IC Design and Technology (ICICDT), 2019, pp. 1–4. doi: 10.1109/ICICDT.2019.8790931.
- [19]. K. Singh and J. de Gyvez, "Twenty Years of Near/Sub-Threshold Design Trends and Enablement," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 68, no. 1, pp. 5–11, Jan. 2021, doi: 10.1109/TCSII.2020.3040970.
- [20]. M. Alioto, "Ultra-Low Power VLSI Circuit Design Demystified and Explained: A Tutorial," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 1, pp. 3–29, Jan. 2012, doi: 10.1109/TCSI.2011.2177004.
- [21]. V. De, S. Vangal, and R. Krishnamurthy, "Near Threshold Voltage (NTV) Computing: Computing in the Dark Silicon Era," IEEE Design & Test, vol. 34, no. 2, pp. 24–30, Apr. 2017, doi: 10.1109/MDAT.2016.2573593.
- [22]. J. Zhou, S. Jayapal, B. Busze, L. Huang, and J. Stuyt, "A 40 nm Dual-Width Standard Cell Library for Near/Sub-Threshold Operation," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 11, pp. 2569–2577, Nov. 2012, doi: 10.1109/TCSI.2012.2190674.
- [23]. M. Kondo, S. Nishizawa, T. Ishihara, and H. Onodera, "A Standard Cell Optimization Method for Near-Threshold Voltage Operations," in Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation, 2013, pp. 32–41. doi: https://doi.org/10.1007/978-3-642-36157-9\_4.
- [24]. H. Zhang, W. He, Y. Sun, and M. M. Seok, "An Energy-Efficient Logic Cell Library Design Methodology with Fine Granularity of Driving Strength for Near- and Sub-Threshold Digital Circuits," in 2021 IEEE International Symposium on Circuits and Systems (ISCAS), May 2021, pp. 1–5. doi: 10.1109/ISCAS51556.2021.9401508.
- [25]. Y. Chen, Y. Nie, and H. Jiao, "An Ultralow-Power 65-nm Standard Cell Library for Near/Subthreshold Digital Circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 30, no. 5, pp. 676–680, May 2022, doi: 10.1109/TVLSI.2022.3151500.
- [26]. E. Shaphir, R. Pinter, and S. Wimer, "Efficient cell-based migration of VLSI layout," Optimization and Engineering, vol. 16, no. 1, pp. 203–223, Jun. 2015, doi: 10.1007/s11081-014-9257-7.
- [27]. J. Keane, H. Eom, T.-H. Kim, S. Sapatnekar, and C. Kim, "Stack Sizing for Optimal Current Drivability in Subthreshold Circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 5, pp. 598–602, May 2008, doi: 10.1109/TVLSI.2008.917571.
- [28]. M. Nabavi, F. Ramezankhani, and M. Shams, "Optimum pMOS-to-nMOS Width Ratio for Efficient Subthreshold CMOS Circuits," IEEE Transactions on Electron Devices, vol. 63, no. 3, pp. 916–924, Mar. 2016, doi: 10.1109/TED.2016.2517446.
- [29]. R. Liao and C. Hutchens, "Digital circuit design for robust ultra-low-power cell library using optimum fingers," in 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS), Aug. 2012, pp. 446–449. doi: 10.1109/MWSCAS.2012.6292053.
- [30]. J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, "A 65 nm Sub-V\_t Microcontroller With Integrated SRAM and Switched Capacitor DC-DC Converter," IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 115–126, Jan. 2009, doi: 10.1109/JSSC.2008.2007160.
- [31]. S. A. Dobre, A. B. Kahng, and J. Li, "Design Implementation With Noninteger Multiple-Height Cells for Improved Design Quality in Advanced Nodes," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 4, pp. 855–868, Apr. 2018, doi: 10.1109/TCAD.2017.2731679.

- [32]. X. Xu, N. Shah, A. Evans, S. Sinha, B. Cline, and G. Yeric, "Standard cell library design and optimization methodology for ASAP7 PDK: (Invited paper)," in 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2017, pp. 999–1004. doi: 10.1109/ICCAD.2017.8203890.
- [33]. X. Wang et al., "Design-Technology Co-Optimization of Standard Cell Libraries on Intel 10nm Process," in 2018 IEEE International Electron Devices Meeting (IEDM), Dec. 2018, pp. 28.2.1-28.2.4. doi: 10.1109/IEDM.2018.8614662.
- [34]. J. Jun, J. Song, and C. Kim, "A Near-Threshold Voltage Oriented Digital Cell Library for High-Energy Efficiency and Optimized Performance in 65nm CMOS Process," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 5, pp. 1567–1580, May 2018, doi: 10.1109/TCSI.2017.2758793.
- [35]. F. Sheikh, M. Ler, R. Zlatanovici, D. Markovic, and B. Nikolic, "Power-Performance Optimal DSP Architectures and ASIC Implementation," in 2006 Fortieth Asilomar Conference on Signals, Systems and Computers, Oct. 2006, pp. 1480–1485. doi: 10.1109/ACSSC.2006.355004.
- [36]. S. Wang, A. Pan, C. O. Chui, and P. Gupta, "PROCEED: A pareto optimization-based circuit-level evaluator for emerging devices," in 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2014, pp. 818–824. doi: 10.1109/ASPDAC.2014.6742991.
- [37]. M. Vujkovic and C. Sechen, "Optimized power-delay curve generation for standard cell ICs," in IEEE/ACM International Conference on Computer Aided Design, 2002. ICCAD 2002., Nov. 2002, pp. 387–394. doi: 10.1109/ICCAD.2002.1167563.
- [38]. S. K. Karandikar and S. S. Sapatnekar, "Fast estimation of area-delay trade-offs in circuit sizing," in 2005 IEEE International Symposium on Circuits and Systems (ISCAS), May 2005, pp. 3575-3578 Vol. 4. doi: 10.1109/ISCAS.2005.1465402.
- [39]. N. H. E. Weste and D. Harris, CMOS VLSI design: a circuits and systems perspective. Pearson Education India, 2015.
- [40]. ARM Limited, "AMBA Specification Rev. 2.0," May 1999. Accessed: Jun. 28, 2022. [Online]. Available: https://www.arm.com/architecture/system-architectures/amba/amba-specifications
- [41]. Synopsys Inc, DesignWare DW8051 MacroCell Databook. Synopsys Inc, 2008.
- [42]. J. Yiu, The Definitive Guide to ARM® Cortex®-M0 and Cortex-M0+ Processors. Academic Press, 2015.

- [43]. H. Homer, "AES Core Specification," OpenCores, Oct. 2012. [Online]. Available: https://opencores.org/projects/tiny\_aes
- [44]. F. Brglez, D. Bryan, and K. Kozminski, "Combinational profiles of sequential benchmark circuits," in 1989 IEEE International Symposium on Circuits and Systems (ISCAS), May 1989, pp. 1929–1934 vol.3. doi: 10.1109/ISCAS.1989.100747.
- [45]. B. Kick, U. Baur, J. Koehl, T. Ludwig, and T. Pflueger, "Standard-Cell-Based Design Methodology for High-Performance Support Chips," IBM J. Res. Dev., vol. 41, no. 4–5, pp. 505–514, Jul. 1997, doi: 10.1147/rd.414.0505.
- [46]. H. Eriksson, P. Larsson-Edefors, T. Henriksson, and C. Svensson, "Full-custom vs. standard-cell design flow - an adder case study," in Proceedings of the ASP-DAC Asia and South Pacific Design Automation Conference, 2003., Jan. 2003, pp. 507–510. doi: 10.1109/ASPDAC.2003.1195069.
- [47]. D. G. Chinnery and K. Keutzer, "Closing the power gap between ASIC and custom: an ASIC perspective," in Proceedings. 42nd Design Automation Conference, 2005., Jun. 2005, pp. 275–280. doi: 10.1145/1065579.1065651.
- [48]. N. Shafiee, S. Tewari, B. Calhoun, and A. Shrivastava, "Infrastructure Circuits for Lifetime Improvement of Ultra-Low Power IoT Devices," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 9, pp. 2598–2610, Sep. 2017, doi: 10.1109/TCSI.2017.2693181.
- [49]. S. Kawar, S. Krishnan, and K. Abugharbieh, "Power Management for Energy Harvesting in IoT – A Brief Review of Requirements and Innovations," in 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), Aug. 2021, pp. 360–364. doi: 10.1109/MWSCAS47672.2021.9531846.
- [50]. Q. Lin et al., "Wearable Multiple Modality Bio-Signal Recording and Processing on Chip: A Review," IEEE Sensors Journal, vol. 21, no. 2, pp. 1108–1123, Jan. 2021, doi: 10.1109/JSEN.2020.3016115.
- [51]. J. Lee et al., "A Self-Tuning IoT Processor Using Leakage-Ratio Measurement for Energy-Optimal Operation," IEEE Journal of Solid-State Circuits, vol. 55, no. 1, pp. 87–97, Jan. 2020, doi: 10.1109/JSSC.2019.2939890.
- [52]. X. Liu, S. Kamineni, J. Breiholz, B. H. Calhoun, and S. Li, "A 194nW Energy-Performance-Aware IoT SoC Employing a 5.2nW 92.6% Peak Efficiency Power Management Unit for System Performance Scaling, Fast DVFS and Energy Minimization," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2022, vol. 65, pp. 1–3. doi: 10.1109/ISSCC42614.2022.9731758.

- [53]. O. Azizi, A. Mahesri, B. C. Lee, S. J. Patel, and M. Horowitz, "Energy-Performance Tradeoffs in Processor Architecture and Circuit Design: A Marginal Cost Analysis," SIGARCH Comput. Archit. News, vol. 38, no. 3, pp. 26–36, Jun. 2010, doi: 10.1145/1816038.1815967.
- [54]. V. Singhal and G. Girishankar, "Optimal Gate Size Selection for Standard Cells in a Library," in 2006 IEEE Dallas/CAS Workshop on Design, Applications, Integration and Software, Oct. 2006, pp. 47–50. doi: 10.1109/DCAS.2006.321030.
- [55]. D. Chinnery and K. Keutzer, Closing the power gap between ASIC & custom: tools and techniques for low power design. Springer Science & Business Media, 2008.
- [56]. C. Fisher, R. Blankenship, J. Jensen, T. Rossman, and K. Svilich, "Optimization of standard cell libraries for low power, high speed, or minimal area designs," in Proceedings of Custom Integrated Circuits Conference, May 1996, pp. 493–496. doi: 10.1109/CICC.1996.510604.
- [57]. K. Golshan, Physical Design Essentials: An ASIC Design Implementation Perspective. Springer New York, NY, 2007. doi: https://doi.org/10.1007/978-0-387-46115-1.
- [58]. J. Wang, A. K. Wong, and E. Y. Lam, "Standard cell layout with regular contact placement," IEEE Transactions on Semiconductor Manufacturing, vol. 17, no. 3, pp. 375–383, Aug. 2004, doi: 10.1109/TSM.2004.831522.
- [59]. S. Newton, "EE6350 VLSI Design Lab: A 4th-Order Continuous-Time Analog Filter Designed using Standard Cells and Automatic Digital Logic Design Tools." Columbia University, Nov. 2015. [Online]. Available: https://www.ee.columbia.edu/~kinget/EE6350\_S15/07\_DigitalFilter\_Scott/diedesign.html
- [60]. Y. Lin et al., "MrDP: Multiple-row detailed placement of heterogeneous-sized cells for advanced nodes," in 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2016, pp. 1–8. doi: 10.1145/2966986.2967055.
- [61]. W. Ye, B. Yu, D. Z. Pan, Y. Ban, and L. Liebmann, "Standard Cell Layout Regularity and Pin Access Optimization Considering Middle-of-Line," Proceedings of the 25th edition on Great Lakes Symposium on VLSI, 2015.
- [62]. H. Onodera, M. Hashimoto, and T. Hashimoto, "ASIC design methodology with on-demand library generation," in 2001 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.01CH37185), Jun. 2001, pp. 57–60. doi: 10.1109/VLSIC.2001.934194.
- [63]. M. Hashimoto and H. Onodera, "Post-layout transistor sizing for power reduction in cell-based design," in Proceedings of the ASP-DAC 2001. Asia and South Pacific Design Automation Conference 2001 (Cat. No.01EX455), Feb. 2001, pp. 359–365. doi: 10.1109/ASPDAC.2001.913333.
- [64]. I. Sutherland, R. F. Sproull, B. Sproull, and D. Harris, Logical effort: designing fast CMOS circuits. Morgan Kaufmann, 1999.
- [65]. J.-M. Shyu, A. Sangiovanni-Vincentelli, J. P. Fishburn, and A. E. Dunlop, "Optimization-based transistor sizing," IEEE Journal of Solid-State Circuits, vol. 23, no. 2, pp. 400–409, Apr. 1988, doi: 10.1109/4.1000.
- [66]. S. Hu, M. Ketkar, and J. Hu, "Gate Sizing For Cell Library-Based Designs," in 2007 44th ACM/IEEE Design Automation Conference, Jun. 2007, pp. 847–852.
- [67]. S. Shah, A. Srivastava, D. Sharma, D. Sylvester, D. Blaauw, and V. Zolotov, "Discrete Vt assignment and gate sizing using a self-snapping continuous formulation," in ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005., Nov. 2005, pp. 705–712. doi: 10.1109/ICCAD.2005.1560157.
- [68]. O. Coudert, "Gate sizing for constrained delay/power/area optimization," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 4, pp. 465–472, Dec. 1997, doi: 10.1109/92.645073.
- [69]. D. S. Kung and R. Puri, "Optimal P/N width ratio selection for standard cell libraries," in 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), Nov. 1999, pp. 178–184. doi: 10.1109/ICCAD.1999.810645.
- [70]. B. S. Carlson and S.-J. Lee, "Delay optimization of digital CMOS VLSI circuits by transistor reordering," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 14, no. 10, pp. 1183–1192, Oct. 1995, doi: 10.1109/43.466335.
- [71]. A. Wu, V. Chaiyakul, and D. Gajski, "Layout-area models for high-level synthesis," in 1991 IEEE International Conference on Computer-Aided Design Digest of Technical Papers, Nov. 1991, pp. 34, 35, 36, 37. doi: 10.1109/ICCAD.1991.185184.
- [72]. R. S. Ghaida and P. Gupta, "DRE: A Framework for Early Co-Evaluation of Design Rules, Technology Choices, and Layout Methodologies," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 9, pp. 1379–1392, Sep. 2012, doi: 10.1109/TCAD.2012.2192477.
- [73]. W. Chuang, S. S. Sapatnekar, and I. N. Hajj, "Timing and area optimization for standard-cell VLSI circuit design," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 14, no. 3, pp. 308–320, Mar. 1995, doi: 10.1109/43.365122.

- [74]. R. Chadha and J. Bhasker, An ASIC Low Power Primer: Analysis, Techniques and Specification. Springer Science & Business Media, 2012. doi: https://doi.org/10.1007/978-1-4614-4271-4.
- [75]. M.-Y. Chen, D.-R. Chen, and S.-M. Hsieh, "A Blocking-Aware Scheduling for Real-Time Task Synchronization Using a Leakage-Controlled Method," International Journal of Distributed Sensor Networks, vol. 10, no. 2, p. 428230, 2014, doi: 10.1155/2014/428230.
- [76]. P. R. Panda, B. V. N. Silpa, A. Shrivastava, and K. Gummidipudi, Power-efficient system design. Springer Science & Business Media, 2010.
- [77]. Y. Aizik and A. Kolodny, "Exploration of energy-delay tradeoffs in digital circuit design," in 2008 IEEE 25th Convention of Electrical and Electronics Engineers in Israel, Dec. 2008, pp. 1–5. doi: 10.1109/EEEI.2008.4736618.
- [78]. E. Yoneno and P. Hurat, "Power and performance optimization of cell-based designs with intelligent transistor sizing and cell creation," 2001.
- [79]. M. Blesken, U. Rückert, D. Steenken, K. Witting, and M. Dellnitz, "Multiobjective optimization for transistor sizing of CMOS logic standard cells using set-oriented numerical techniques," in 2009 NORCHIP, Nov. 2009, pp. 1–4. doi: 10.1109/NORCHP.2009.5397800.
- [80]. H. Q. Dao, B. R. Zeydel, and V. G. Oklobdzija, "Energy minimization method for optimal energy-delay extraction," in ESSCIRC 2004 - 29th European Solid-State Circuits Conference (IEEE Cat. No.03EX705), Sep. 2003, pp. 177–180. doi: 10.1109/ESSCIRC.2003.1257101.
- [81]. V. Stojanovic, D. Markovic, B. Nikolic, M. A. Horowitz, and R. W. Brodersen, "Energy-delay tradeoffs in combinational logic using gate sizing and supply voltage optimization," in Proceedings of the 28th European Solid-State Circuits Conference, Sep. 2002, pp. 211–214.
- [82]. V. G. Oklobdzija, M. Aktan, and D. Baran, "Optimal transistor sizing and voltage scaling for minimal energy use at fixed performance," in 2012 Argentine School of Micro-Nanoelectronics, Technology and Applications (EAMTA), Aug. 2012, pp. 1–10.
- [83]. R. Zlatanovici and B. Nikolić, "Power–performance optimization for custom digital circuits," in International Workshop on Power and Timing Modeling, Optimization and Simulation, 2005, pp. 404–414.
- [84]. K. K. Parhi, VLSI digital signal processing systems: design and implementation. John Wiley & Sons, 2007.
- [85]. B. R. Zeydel, D. Baran, and V. G. Oklobdzija, "Energy-Efficient Design Methodologies: High-Performance VLSI Adders," IEEE Journal of Solid-State Circuits, vol. 45, no. 6, pp. 1220–1233, Jun. 2010, doi: 10.1109/JSSC.2010.2048730.

- [86]. M. Igarashi et al., "A low-power design method using multiple supply voltages," in Proceedings of 1997 International Symposium on Low Power Electronics and Design, Aug. 1997, pp. 36–41. doi: 10.1145/263272.263279.
- [87]. Q. Ma and E. F. Y. Young, "Multivoltage Floorplan Design," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 4, pp. 607–617, Apr. 2010, doi: 10.1109/TCAD.2010.2042895.
- [88]. C. Yeh and Y.-S. Kang, "Cell-based layout techniques supporting gate-level voltage scaling for low power," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 5, pp. 629–633, Oct. 2000, doi: 10.1109/92.894169.
- [89]. S. Herbert and D. Marculescu, "Analysis of dynamic voltage/frequency scaling in chip-multiprocessors," in Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07), Aug. 2007, pp. 38–43. doi: 10.1145/1283780.1283790.
- [90]. B. Rountree, D. K. Lowenthal, M. Schulz, and B. R. de Supinski, "Practical performance prediction under Dynamic Voltage Frequency Scaling," in 2011 International Green Computing Conference and Workshops, Jul. 2011, pp. 1–8. doi: 10.1109/IGCC.2011.6008553.
- [91]. K. Agarwal, H. Deogun, D. Sylvester, and K. Nowka, "Power gating with multiple sleep modes," in 7th International Symposium on Quality Electronic Design (ISQED'06), Mar. 2006, pp. 5 pp. – 637. doi: 10.1109/ISQED.2006.102.
- [92]. H. Jiang, M. Marek-Sadowska, and S. R. Nassif, "Benefits and costs of power-gating technique," in 2005 International Conference on Computer Design, Oct. 2005, pp. 559–566. doi: 10.1109/ICCD.2005.34.
- [93]. M. Anis, S. Areibi, and M. Elmasry, "Design and optimization of multithreshold CMOS (MTCMOS) circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 10, pp. 1324–1342, Oct. 2003, doi: 10.1109/TCAD.2003.818127.
- [94]. S. Mukhopadhyay, C. Neau, R. T. Cakici, A. Agarwal, C. H. Kim, and K. Roy, "Gate leakage reduction for scaled devices using transistor stacking," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11, no. 4, pp. 716–730, Aug. 2003, doi: 10.1109/TVLSI.2003.816145.
- [95]. P. Gupta, A. B. Kahng, P. Sharma, and D. Sylvester, "Selective gate-length biasing for cost-effective runtime leakage control," in Proceedings. 41st Design Automation Conference, 2004., Jul. 2004, pp. 327–330. doi: 10.1145/996566.996661.
- [96]. P. Gupta, A. B. Kahng, P. Sharma, and D. Sylvester, "Gate-length biasing for runtime-leakage control," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 8, pp. 1475–1485, Aug. 2006, doi: 10.1109/TCAD.2005.857313.
- [97]. S. A. Tawfik and V. Kursun, "Low Power and High Speed Multi Threshold Voltage Interface Circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 5, pp. 638–645, May 2009, doi: 10.1109/TVLSI.2008.2006793.
- [98]. A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-threshold design for ultra low-power systems, vol. 95. Springer US, 2006. doi: https://doi.org/10.1007/978-0-387-34501-7.
- [99]. J. Myers, P. Prabhat, A. Savanth, S. Yang, and R. Gaddh, "Design challenges for near and sub-threshold operation: A case study with an ARM Cortex-M0+ based WSN subsystem," in 2016 26th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Sep. 2016, pp. 56–63. doi: 10.1109/PATMOS.2016.7833426.
- [100]. T. T. Habte, H. Saleh, B. Mohammad, and M. Ismail, Ultra low power ECG processing system for IoT devices. Springer, 2019. doi: https://doi.org/10.1007/978-3-319-97016-5.
- [101]. T. Tekeste, H. Saleh, B. Mohammad, and M. Ismail, "Ultra-Low Power QRS Detection and ECG Compression Architecture for IoT Healthcare Devices," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 2, pp. 669–679, Feb. 2019, doi: 10.1109/TCSI.2018.2867746.
- [102]. B. H. Calhoun, S. Khanna, Y. Zhang, J. Ryan, and B. Otis, "System design principles combining sub-threshold circuit and architectures with energy scavenging mechanisms," in Proceedings of 2010 IEEE International Symposium on Circuits and Systems, May 2010, pp. 269–272. doi: 10.1109/ISCAS.2010.5537887.
- [103]. A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 473–484, Apr. 1992, doi: 10.1109/4.126534.
- [104]. J. Myers et al., "A 12.4pJ/cycle sub-threshold, 16pJ/cycle near-threshold ARM Cortex-M0+ MCU with autonomous SRPG/DVFS and temperature tracking clocks," in 2017 Symposium on VLSI Circuits, Jun. 2017, pp. C332–C333. doi: 10.23919/VLSIC.2017.8008529.
- [105]. H. Reyserhove and W. Dehaene, "Margin Elimination Through Timing Error Detection in a Near-Threshold Enabled 32-bit Microcontroller in 40nm CMOS," IEEE Journal of Solid-State Circuits, vol. 53, no. 7, pp. 2101–2113, Jul. 2018, doi: 10.1109/JSSC.2018.2821121.
- [106]. W. Zhao, Y. Ha, and M. Alioto, "Novel Self-Body-Biasing and Statistical Design for Near-Threshold Circuits With Ultra Energy-Efficient AES as Case Study," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 8, pp. 1390–1401, Aug. 2015, doi: 10.1109/TVLSI.2014.2342932.
- [107]. C. Wang et al., "Near-Threshold Energy- and Area-Efficient Reconfigurable DWPT/DWT Processor for Healthcare-Monitoring Applications," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 1, pp. 70–74, Jan. 2015, doi: 10.1109/TCSII.2014.2362791.
- [108]. X. Liu et al., "An Ultralow-Voltage Sensor Node Processor With Diverse Hardware Acceleration and Cognitive Sampling for Intelligent Sensing," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 12, pp. 1149–1153, Dec. 2015, doi: 10.1109/TCSII.2015.2468927.
- [109]. J. Zhou, T. T.-H. Kim, and Y. Lian, "Near-threshold processor design techniques for power-constrained computing devices," in 2017 IEEE 12th International Conference on ASIC (ASICON), Oct. 2017, pp. 920–923. doi: 10.1109/ASICON.2017.8252627.
- [110]. M. Seok, D. Sylvester, and D. Blaauw, "Optimal technology selection for minimizing energy and variability in low voltage applications," in Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08), Aug. 2008, pp. 9–14. doi: 10.1145/1393921.1393930.
- [111]. J. Zhou, S. Jayapal, J. Stuyt, J. Huisken, and H. de Groot, "The impact of inverse narrow width effect on sub-threshold device sizing," in 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011), Jan. 2011, pp. 267–272. doi: 10.1109/ASPDAC.2011.5722196.
- [112]. J. Kwong and A. P. Chandrakasan, "Variation-Driven Device Sizing for Minimum Energy Sub-threshold Circuits," in ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design, Oct. 2006, pp. 8–13. doi: 10.1145/1165573.1165578.
- [113]. M. Blesken, S. Lütkemeier, and U. Rückert, "Multiobjective optimization for transistor sizing sub-threshold CMOS logic standard cells," in Proceedings of 2010 IEEE International Symposium on Circuits and Systems, May 2010, pp. 1480–1483. doi: 10.1109/ISCAS.2010.5537349.
- [114]. S. Pandit and C. K. Sarkar, "Analytical modelling of inverse narrow width effect for narrow channel STI MOSFETs," International Journal of Electronics, vol. 99, no. 3, pp. 361–377, 2012, doi: 10.1080/00207217.2011.629215.
- [115]. M.-Z. Li et al., "Energy Optimized Subthreshold VLSI Logic Family With Unbalanced Pull-Up/Down Network and Inverse Narrow-Width Techniques," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 12, pp. 3119–3123, Dec. 2015, doi: 10.1109/TVLSI.2015.2388783.

- [116]. J. Morris, P. Prabhat, J. Myers, and A. Yakovlev, "Unconventional Layout Techniques for a High Performance, Low Variability Subthreshold Standard Cell Library," in 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Jul. 2017, pp. 19–24. doi: 10.1109/ISVLSI.2017.14.
- [117]. B. C. Paul, A. Raychowdhury, and K. Roy, "Device optimization for digital subthreshold logic operation," IEEE Transactions on Electron Devices, vol. 52, no. 2, pp. 237–247, Feb. 2005, doi: 10.1109/TED.2004.842538.
- [118]. C. Subramanian, J. Hayden, W. Taylor, M. Orlowski, and T. McNelly, "Reverse short channel effect and channel length dependence of boron penetration in PMOSFETs," in Proceedings of International Electron Devices Meeting, Dec. 1995, pp. 423–426. doi: 10.1109/IEDM.1995.499229.
- [119]. T.-H. Kim, H. Eom, J. Keane, and C. Kim, "Utilizing Reverse Short Channel Effect for Optimal Subthreshold Circuit Design," in ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design, Oct. 2006, pp. 127–130. doi: 10.1145/1165573.1165603.
- [120]. J.-S. Wang, K.-J. Chang, S.-Y. Yang, T.-H. Hsieh, and C. Yeh, "RSCE-aware ultra-low-voltage 40-nm CMOS circuits," in 2011 International SoC Design Conference, Nov. 2011, pp. 131–134. doi: 10.1109/ISOCC.2011.6138664.
- [121]. A. Keshavarzi, S. Narendra, B. Bloechel, S. Borkar, and V. De, "Forward body bias for microprocessors in 130nm technology generation and beyond," in 2002 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.02CH37302), Jun. 2002, pp. 312–315. doi: 10.1109/VLSIC.2002.1015113.
- [122]. C. Neau and K. Roy, "Optimal body bias selection for leakage improvement and process compensation over different technology generations," in Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03., Aug. 2003, pp. 116–121. doi: 10.1109/LPE.2003.1231846.
- [123]. G. Lallement, F. Abouzeid, J.-M. Daveau, P. Roche, and J.-L. Autran, "A 1.1-pJ/cycle, 20-MHz, 0.42-V Temperature Compensated ARM Cortex-M0+ SoC With Adaptive Self Body-Biasing in FD-SOI," IEEE Solid-State Circuits Letters, vol. 1, no. 7, pp. 174–177, Jul. 2018, doi: 10.1109/LSSC.2019.2897016.
- [124]. D. Bol et al., "19.6 A 40-to-80MHz Sub-4μW/MHz ULV Cortex-M0 MCU SoC in 28nm FDSOI With Dual-Loop Adaptive Back-Bias Generator for 20μs Wake-Up From Deep Fully Retentive Sleep Mode," in 2019 IEEE International Solid-State Circuits Conference - (ISSCC), Feb. 2019, pp. 322–324. doi: 10.1109/ISSCC.2019.8662293.

- [125]. M. Blagojević, M. Cochet, B. Keller, P. Flatresse, A. Vladimirescu, and B. Nikolić, "A fast, flexible, positive and negative adaptive body-bias generator in 28nm FDSOI," in 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), Jun. 2016, pp. 1–2. doi: 10.1109/VLSIC.2016.7573479.
- [126]. A. Hesham, A. Nassar, and H. Mostafa, "Design and implementation of energy-efficient near-threshold standard cell library for IoT applications," AEU - International Journal of Electronics and Communications, vol. 139, p. 153907, 2021, doi: https://doi.org/10.1016/j.aeue.2021.153907.
- [127]. F. Abouzeid, S. Clerc, F. Firmin, M. Renaudin, T. Sas, and G. Sicard, "40nm CMOS 0.35V-Optimized Standard Cell Libraries for Ultra-Low Power Applications," ACM Trans. Des. Autom. Electron. Syst., vol. 16, no. 3, Jun. 2011, doi: 10.1145/1970353.1970369.
- [128]. C. Tretz and C. Zukowski, "CMOS transistor sizing for minimization of energy-delay product," in Proceedings of the Sixth Great Lakes Symposium on VLSI, Mar. 1996, pp. 168–173. doi: 10.1109/GLSV.1996.497614.
- [129]. B. Fu and P. Ampadu, "Comparative Analysis of Ultra-Low Voltage Flip-Flops for Energy Efficiency," in 2007 IEEE International Symposium on Circuits and Systems, May 2007, pp. 1173–1176. doi: 10.1109/ISCAS.2007.378259.
- [130]. A. Shafaei, H. Afzali-Kusha, and M. Pedram, "Minimizing the energy-delay product of SRAM arrays using a device-circuit-architecture co-optimization framework," in 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC), Jun. 2016, pp. 1–6. doi: 10.1145/2897937.2898044.
- [131]. P. Weckx, N. Reynders, I. de Munck, and W. Dehaene, "Design of a 150 mV Supply, 2 MIPS, 90nm CMOS, Ultra-Low-Power Microprocessor," in Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation, 2013, pp. 175–184.
- [132]. H. Reyserhove and W. Dehaene, Efficient Design of Variation-Resilient Ultra-Low Energy Digital Processors. Springer, 2019. doi: 10.1007/978-3-030-12485-4.
- [133]. A. J. Martin, M. Nyström, and P. I. Pénzes, "Et2: A Metric for Time and Energy Efficiency of Computation," in Power Aware Computing, R. Graybill and R. Melhem, Eds. Boston, MA: Springer US, 2002, pp. 293–315. doi: 10.1007/978-1-4757-6217-4\_15.
- [134]. P. I. Pénzes and A. J. Martin, "Energy-Delay Efficiency of VLSI Computations," in Proceedings of the 12th ACM Great Lakes Symposium on VLSI, 2002, pp. 104–111. doi: 10.1145/505306.505330.
- [135]. M. Emadi, A. Jafargholi, H. S. Moghadam, and M. M. Nayebi, "Optimum Supply and Threshold Voltages and Transistor Sizing Effects on Low Power SOI Circuit Design," in APCCAS 2006 - 2006 IEEE Asia Pacific Conference on Circuits and Systems, Dec. 2006, pp. 1394–1398. doi: 10.1109/APCCAS.2006.342461.

- [136]. M. Alioto, E. Consoli, and G. Palumbo, "Metrics and design considerations on the energy-delay tradeoff of digital circuits," in 2009 IEEE International Symposium on Circuits and Systems, May 2009, pp. 3150–3153. doi: 10.1109/ISCAS.2009.5118471.
- [137]. V. Zyuban and P. Strenski, "Unified methodology for resolving power-performance tradeoffs at the microarchitectural and circuit levels," in Proceedings of the International Symposium on Low Power Electronics and Design, Aug. 2002, pp. 166–171. doi: 10.1109/LPE.2002.146731.
- [138]. M. Aktan, V. Oklobdzija, S. Paramesvaran, and J. Moon, "Energy-Delay Space Exploration of Clocked Storage Elements Using Circuit Sizing," 2009.
- [139]. L. Wei and D. Antoniadis, "CMOS device design and optimization from a perspective of circuit-level energy-delay optimization," in 2011 International Electron Devices Meeting, Dec. 2011, pp. 15.3.1-15.3.4. doi: 10.1109/IEDM.2011.6131558.
- [140]. R. Afonso, M. Rahman, H. Tennakoon, and C. Sechen, "Power efficient standard cell library design," in 2009 IEEE Dallas Circuits and Systems Workshop (DCAS), Oct. 2009, pp. 1–4. doi: 10.1109/DCAS.2009.5505245.
- [141]. J. Bhasker and R. Chadha, Static timing analysis for nanometer designs: A practical approach. Springer Science & Business Media, 2009.
- [142]. E. W. Weisstein, "Least Squares Fitting–Exponential." MathWorld–A Wolfram Web Resource, 2019. [Online]. Available: https://mathworld.wolfram.com/LeastSquaresFittingExponential.html
- [143]. W. Lim, I. Lee, D. Sylvester, and D. Blaauw, "8.2 Batteryless Sub-nW Cortex-M0+ processor with dynamic leakage-suppression logic," in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, Feb. 2015, pp. 1–3. doi: 10.1109/ISSCC.2015.7062968.
- [144]. K. H. Lee and N. Verma, "A Low-Power Processor With Configurable Embedded Machine-Learning Accelerators for High-Order and Adaptive Analysis of Medical-Sensor Signals," IEEE Journal of Solid-State Circuits, vol. 48, no. 7, pp. 1625–1637, Jul. 2013, doi: 10.1109/JSSC.2013.2253226.
- [145]. C. Ndiaye, V. Huard, R. Bertholon, M. Rafik, X. Federspiel, and A. Bravaix, "Layout Dependent Effect: Impact on device performance and reliability in recent CMOS nodes," in 2016 IEEE International Integrated Reliability Workshop (IIRW), Oct. 2016, pp. 24–28. doi: 10.1109/IIRW.2016.7904894.

- [146]. K.-L. Yeh and J.-C. Guo, "The Impact of Layout-Dependent STI Stress and Effective Width on Low-Frequency Noise and High-Frequency Performance in Nanoscale nMOSFETs," IEEE Transactions on Electron Devices, vol. 57, no. 11, pp. 3092–3100, Nov. 2010, doi: 10.1109/TED.2010.2072959.
- [147]. C. Ndiaye et al., "Reliability compact modeling approach for layout dependent effects in advanced CMOS nodes," in 2017 IEEE International Reliability Physics Symposium (IRPS), Apr. 2017, pp. 4C-4.1-4C-4.7. doi: 10.1109/IRPS.2017.7936315.
- [148]. C. K. Dabhi et al., "BSIM4 4.8.1 MOSFET Model User's Manual," University of California, Berkeley, Berkeley, Feb. 2017.
- [149]. S. Baek, H. Kim, Y.-K. Lee, D.-Y. Jin, S.-C. Park, and J.-D. Cho, "Ultra-high density standard cell library using multi-height cell structure," 2008.
- [150]. H. H. Nguyen, L. L. Chau, T. Pham, and T. D. T. Nguyen, "7-tracks standard cell library," no. 6938226B2. Aug. 2005. [Online]. Available: https://patents.google.com/patent/US6938226B2/en
- [151]. G. Wu and C. Chu, "Detailed Placement Algorithm for VLSI Design With Double-Row Height Standard Cells," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 35, no. 9, pp. 1569–1573, Sep. 2016, doi: 10.1109/TCAD.2015.2511141.
- [152]. S. Jain, L. Lin, and M. Alioto, "Design-Oriented Energy Models for Wide Voltage Scaling Down to the Minimum Energy Point," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 12, pp. 3115–3125, Dec. 2017, doi: 10.1109/TCSI.2017.2736540.