

# UNIVERSITI PUTRA MALAYSIA

# RELIABILITY MODELING OF DYNAMIC THERMAL MANAGEMENT IN MULTICORE PROCESSOR

SOMAYEH RAHIMI POUR

FK 2020 114






Thesis Submitted to the School of Graduate Studies, Universiti Putra Malaysia, in Fulfilment of the Requirements for the Degree of Doctor of Philosophy

January 2018

All material contained within the thesis, including without limitation text, logos, icons, photographs and all other artwork, is copyright material of Universiti Putra Malaysia unless otherwise stated. Use may be made of any material contained within the thesis for non-commercial purposes from the copyright holder. Commercial use of material may only be made with the express, prior, written permission of Universiti Putra Malaysia.

Copyright © Universiti Putra Malaysia




# **DEDICATIONS**

This thesis is dedicated to my parents for their unconditional love, support, and encouragement, and to my husband, the love of my life, a great soul mate and a source of encouragement in everything I do. This journey would have been much harder without them.



Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfilment of the requirement for the degree of Doctor of Philosophy

# RELIABILITY MODELING OF DYNAMIC THERMAL MANAGEMENT IN MULTICORE PROCESSOR

By

#### SOMAYEH RAHIMI POUR

January 2018

Chair: Fakhrul Zaman Rokhani, PhD

Faculty: Engineering

With the continuous downscaling of semiconductor technology, growing power density and thermal issues in multi-core processors have become critical challenges. System reliability concerns associated with increased power dissipation affect the reliability of thermal management.

High temperatures and large thermal variations on the die create severe challenges in system reliability, performance, leakage power, and cooling costs. Dynamic thermal management (DTM) methods regulate the operating temperature based on the temperature profile provided by thermal sensors, which is transmitted over a network-on-chip (NoC) in multi-core systems. DTM efficiency is highly dependent on the accuracy of the thermal data.

Temperature profile inaccuracies are caused by various factors, including sensor placement, sensor device imprecision, and interconnect deep sub-micron (DSM) noise. While inaccuracies due to sensor placement and sensor device imprecision have been widely addressed, only limited study has been performed on the impact of interconnect DSM noise on DTM efficiency. Hence, this thesis develops a comprehensive simulator model to investigate the impact of interconnect DSM noise on thermal data accuracy and DTM efficiency. The simulation results demonstrate that DSM noise severely affects the most significant bits (MSBs) of the thermal data, which leads to significant degradation of DTM performance.

To mitigate the impact of DSM noise on DTM efficiency, an NoC fault tolerance scheme is proposed that exploits the inherent characteristics of the DSM noise affecting the thermal data. Compared with the standard coding scheme, it achieves lower cost in terms of area and power consumption while increasing DTM efficiency by 38%.

The second source of chip reliability concerns involves the power delivery network (PDN). The PDN suffers from long-term reliability threats such as electro-migration (EM). The loss of the limited Controlled Collapse Chip Connection (C4) pads to electro-migration makes delivering a stable supply voltage more critical. The C4 bump failure mechanism depends on current density, on-chip voltage noise, and temperature. In this thesis, the dependency of the C4 bump failure mechanism on each individual bump's temperature is explored, which leads to a more accurate mean-time-to-failure (MTTF) estimate for the whole system. The simulation results demonstrate that assuming a uniform temperature underestimates the system MTTF by up to 16 times, because C4 bump failure depends exponentially on temperature.



Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfilment of the requirement for the degree of Doctor of Philosophy

# RELIABILITY MODELING OF DYNAMIC THERMAL MANAGEMENT IN MULTICORE PROCESSOR

By

#### SOMAYEH RAHIMI POUR

January 2018

Chair: Fakhrul Zaman Rokhani, PhD

Faculty: Engineering

With the continuous downscaling of semiconductor technology, the issues of increasing power density and heat in multicore processors are important and challenging. System reliability concerns associated with increased power dissipation can affect the reliability of thermal management.

High temperatures and large thermal variations on the die create severe challenges in system reliability, performance, leakage power, and cooling costs. The DTM manager regulates the operating temperature based on the temperature profile provided by thermal sensors, which is transmitted over a network-on-chip (NoC) in multicore systems. DTM efficiency depends heavily on the accuracy of the thermal data.

Temperature profile inaccuracies are caused by various factors, including sensor placement, sensor device imprecision, and interconnect DSM noise. Although inaccuracies caused by sensor placement and sensor device imprecision have been widely addressed, studies of the effect of interconnect DSM noise on DTM efficiency remain limited. Therefore, this thesis develops a comprehensive simulator platform to investigate the effect of interconnect DSM noise on temperature data accuracy and DTM efficiency. The simulation results show that DSM noise severely affects the thermal data, which leads to significant degradation of DTM performance.

To reduce the effect of DSM noise on DTM efficiency, an NoC fault tolerance scheme that uses approximate computing techniques is proposed. Compared with the standard coding scheme, it achieves lower cost in terms of area and power consumption and increases DTM efficiency by 38%.

The second source of chip reliability concerns involves the power delivery network (PDN). The PDN suffers from long-term reliability threats such as electro-migration (EM). The loss of the limited C4 pads to electro-migration makes delivering a stable supply voltage more critical. The C4 bump failure mechanism depends on current density, on-chip voltage noise, and temperature. In this thesis, we explore the dependence of the C4 bump failure mechanism on each bump's individual temperature, which can lead to a more accurate mean time to failure (MTTF). The simulation results show that using a uniform temperature underestimates the system MTTF by 16 times because of the exponential dependence of C4 bump failure on temperature.



# ACKNOWLEDGEMENTS

First and foremost, I would like to thank Dr. Fakhrul Zaman, my thesis advisor. His support and encouragement motivated me to complete my doctoral degree. I also thank him for teaching me how to be an independent researcher as well as a constructive team player. I would also like to thank Associate Professor Roslina bt Mohd Sidek, Dr. Noor Ain Kamsani, Associate Professor Shaiful Jahari bin Hashim, and Professor Mircea Stan for serving as my thesis committee members.

Last but not least, I would like to thank my family for their encouragement and patience as I finished my degree. Their support over the last five years has been very important in helping me complete this dissertation.



This thesis was submitted to the Senate of Universiti Putra Malaysia and has been accepted as fulfilment of the requirement for the degree of Doctor of Philosophy. The members of the Supervisory Committee were as follows:

#### Fakhrul Zaman bin Rokhani, PhD

Associate Professor, Ir. Ts., Faculty of Engineering, Universiti Putra Malaysia (Chairman)

#### Shaiful Jahari bin Hashim, PhD

Professor, Faculty of Engineering, Universiti Putra Malaysia (Member)

#### Noor Ain binti Kamsani, PhD

Associate Professor, Faculty of Engineering, Universiti Putra Malaysia (Member)

#### Roslina binti Mohd Sidek, PhD

Associate Professor, Faculty of Engineering, Universiti Putra Malaysia (Member)

#### Mircea R. Stan, PhD

Professor, School of Engineering and Applied Science, University of Virginia, United States of America (Member)

#### ZALILAH MOHD SHARIFF, PhD

Professor and Dean School of Graduate Studies Universiti Putra Malaysia

Date: 16 July 2020

# **Declaration by Members of Supervisory Committee**

This is to confirm that:


- the research and the writing of this thesis were done under our supervision;
- supervisory responsibilities as stated in the Universiti Putra Malaysia (Graduate Studies) Rules 2003 (Revision 2015-2016) are adhered to.

| Signature:<br>Name of Chairman of<br>Supervisory<br>Committee: |  |
|----------------------------------------------------------------|--|
| Signature:<br>Name of Member of<br>Supervisory<br>Committee:   |  |

# TABLE OF CONTENTS

- ABSTRACT
- ABSTRAK
- ACKNOWLEDGEMENTS
- APPROVAL
- DECLARATION
- LIST OF TABLES
- LIST OF FIGURES
- LIST OF ABBREVIATIONS

# CHAPTER

- 1 INTRODUCTION
  - 1.1 Introduction
  - 1.2 Problem Statement
  - 1.3 Aim and Objective
  - 1.4 Contributions
  - 1.5 Scope
  - 1.6 Thesis Organization
- 2 LITERATURE REVIEW
  - 2.1 Temperature and Reliability in Nanoscale System Design
  - 2.2 Thermal Modeling
  - 2.3 Dynamic Thermal Management
    - 2.3.1 Thermal Management Techniques for Multicore Processors
    - 2.3.2 Thermal Sensor
    - 2.3.3 Inaccuracy in Thermal Data
  - 2.4 Network on Chip
    - 2.4.1 NoC Architecture
    - 2.4.2 NoC Communication Reliability Issues
    - 2.4.3 Monitoring Network on Chip
  - 2.5 Fault Tolerance Coding Schemes
    - 2.5.1 Error Model
    - 2.5.2 Error Control Coding Schemes
    - 2.5.3 Reliability Metric
  - 2.6 Approximate Computing
  - 2.7 Power Delivery Network
    - 2.7.1 Supply Voltage Noise
    - 2.7.2 PDN Resource Scarcity and Optimization
    - 2.7.3 PDN Simulation and Modeling
    - 2.7.4 Electro-migration and Long-term PDN Reliability Issues
    - 2.7.5 EM-induced C4 Bumps Wearout
  - 2.8 Summary
- 3 METHODOLOGY
  - 3.1 Introduction
  - 3.2 Simulation Setup
    - 3.2.1 Performance Modeling
    - 3.2.2 Power Modeling
    - 3.2.3 Thermal Modeling
    - 3.2.4 Dynamic Thermal Management
    - 3.2.5 Noise Modeling and Scaling Trend in Deep Sub-micron Technology
  - 3.3 Proposed Fault Tolerance Scheme
    - 3.3.1 Performance and Cost
    - 3.3.2 Develop Interconnection Simulator Plugin/API to Measure Reliability and Max Delay
    - 3.3.3 Develop Hardware Realization to Measure Power and Area
  - 3.4 Statistical Simulation of Bump Failures
    - 3.4.1 Single Bump EM Failure
    - 3.4.2 Impact of Power-Supply Bump Failures
    - 3.4.3 Monte Carlo Simulation
    - 3.4.4 MCS Model for Bump Failure Study
  - 3.5 PDN Simulation Methodology
  - 3.6 Summary
- 4 RESULTS AND DISCUSSION
  - 4.1 Categorization of Thermal Profile
  - 4.2 Effect of Different Sources of Noise on DTM in Multi-core Processors
    - 4.2.1 Metrics to Investigate the Effect of Different Sources of Noise on DTM Efficiency
    - 4.2.2 Effect of Sensor Noise on DTM
    - 4.2.3 Effect of Placement Noise on DTM
    - 4.2.4 Effect of DSM Noise on DTM
    - 4.2.5 Effect of All Sources of Noise on DTM
  - 4.3 Proposed Fault Tolerance Coding Scheme
    - 4.3.1 Area and Power Consumption
    - 4.3.2 Maximum and Average Packet Latency
    - 4.3.3 Reliability at Different Noise Level
    - 4.3.4 DTM Efficiency
  - 4.4 PDN Reliability
    - 4.4.1 Analyze the Effect of Accurate Temperature on System's Robustness
    - 4.4.2 Over Provisioning C4 Pads
    - 4.4.3 Role of Cold Regions
  - 4.5 Summary
- 5 CONCLUSION AND FUTURE WORK
  - 5.1 Conclusions
  - 5.2 Future Works

- REFERENCES
- APPENDICES
- BIODATA OF STUDENT
- LIST OF PUBLICATIONS




# LIST OF TABLES

| Table | Title |
|-------|-------|
| 2.1 | Comparison of DTM Policies |
| 2.2 | Effective Coupling Capacitance |
| 2.3 | Literature Review Summary |
| 3.1 | Design Parameters for Modeled Processor |
| 3.2 | Experimental Setup |
| 3.3 | The Trends of $V_{DD}$ and $V_{th}$ Scaling |
| 3.4 | The Corresponding BER for Each Technology Node in Different Noise Conditions |
| 3.5 | Interconnect Simulation Parameters |
| 3.6 | PDN Parameters Used in This Study |
| 4.1 | Temperature Categorization |
| 4.2 | Thermal Profile Categorization of SPEC CPU2006 Benchmarks Used in This Study, in Different Technology Nodes |
| 4.3 | Mean $(\mu)$ and Standard Deviation $(\sigma)$ from Monte Carlo Simulation |
| 4.4 | Temperature Modification Parameters |
| 4.5 | Power Consumption of the Proposed Scheme and Baseline Scheme |
| 4.6 | Area for the Proposed Scheme and Baseline Scheme |
| 4.7 | Temperature and Current Density of Failing Pads |

# LIST OF FIGURES

| Figure | Title |
|--------|-------|
| 1.1 | Power Dissipation Across Multiple Generations of Intel Chips [1]. |
| 1.2 | IBM POWER4 Chip Temperature Map [2] |
| 1.3 | Impact of Thermal Runaway on a Test Socket [3] |
| 2.1 | Overview of Dynamic Thermal Management [4]. |
| 2.2 | Ring Oscillator Based Thermal Sensor [5]. |
| 2.3 | Deep Sub-micron Noise Sources |
| 2.4 | Alpha Particle Strike on a Transistor [6] |
| 2.5 | Noise Waveforms of Crosstalk Coupling Between Two Lines [7] |
| 2.6 | Cross section of interconnect structure showing the coupling and self capacitance [8] |
| 2.7 | NoC mesh architecture [8] |
| 2.8 | NoC pipelined operation [8] |
| 2.9 | Detailed view of MNoC for multiple cores [9]. |
| 2.10 | MNoC Classification |
| 2.11 | Bit error rate as a function of voltage swing for different noise deviations [8]. |
| 2.12 | Effect of retransmission on the residual flit error probability, $P_{res}$ [8]. |
| 2.13 | An illustration of voltage noise and the evaluation metrics [10]. |
| 2.14 | The log-normal CDF of a single conductor's EM-induced failure time. The y-axis shows the conductor's failure probability, or $\mathrm{Prob}[\mathrm{Lifetime} \le t]$ [10]. |
| 3.1 | Overview of Methodology. |
| 3.2 | Floor-plan of the Penryn-like 8-core processor [11] |
| 3.3 | Proposed DSM-DTM simulator consisting of power, temperature, interconnection and DTM simulators. |
| 3.4 | Proposed Simulator to Investigate the Effect of Sensor Noise. |
| 3.5 | Ring Oscillator Based Thermal Sensor [5]. |
| 3.6 | Ring Oscillator Schematic in Virtuoso. |
| 3.7 | (a) Sensor Location on Floor-plan and (b) Corresponding Placement Noise for each Sensor, at 90nm Technology Node, for GemsFDTD Benchmark |
| 3.8 | Proposed Simulator to Investigate the Effect of Placement Noise. |
| 3.9 | Proposed Simulator to Investigate the Effect of DSM Noise. |
| 3.10 | Proposed Coding Scheme |
| 3.11 | Packet Size and Flit Structure of Thermal Data |
| 3.12 | The consequence of a power bump failure [11]. |
| 3.13 | MCS model for failure study. |
| 3.14 | Proposed Simulator to Investigate the Effect of Accurate Temperature on System's Robustness. |
| 4.1 | Categorizing the Thermal Profile. |
| 4.2 | Histogram of GemsFDTD Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| 4.3 | Proposed DSM-DTM simulator consisting of power, temperature, interconnection and DTM simulators. |
| 4.4 | Noise Evaluation |
| 4.5 | (a) Thermal Profile Without Noise (Reference) (b) Thermal Profile under the Effect of Noise. |
| 4.6 | Thermal Profile Under the Effect of DSM-Noise. |
| 4.7 | The Effect of Sensor Noise on DTM in terms of $SP_D$ on (a) Hot, (b) Cold, and (c) Warm Benchmarks. |
| 4.8 | The Effect of Sensor Noise on DTM in terms of $UC_{ET}$ on (a) Hot, (b) Cold, and (c) Warm Benchmarks. |
| 4.9 | The Trend of Placement Noise Over Technology Scaling from 90nm to 22nm. |
| 4.10 | The Effect of Placement Noise on DTM in terms of $SP_D$ on (a) Hot, (b) Cold, and (c) Warm Benchmarks. |
| 4.11 | The Effect of Placement Noise on DTM in terms of $UC_{ET}$ on (a) Hot, (b) Cold, and (c) Warm Benchmarks. |
| 4.12 | The Effect of DSM Noise on DTM in terms of $SP_D$ as a function of noise deviation on (a) Hot, (b) Cold, and (c) Warm Benchmarks. |
| 4.13 | The Effect of DSM Noise on DTM in terms of $UC_{ET}$ as a function of noise deviation on (a) Hot, (b) Cold, and (c) Warm Benchmarks. |
| 4.14 | Contribution of Each Source of Noise on Thermal Data |
| 4.15 | Overall Contribution of Noise Sources on Thermal Data |
| 4.16 | Contribution of Each Source of Noise on Thermal Data as a Function of Noise Deviation in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| 4.17 | Correct DTM Decision in Presence of Noise $(T_O \& T_D > T_{th})$ (a) $T_O > T_D$, (b) $T_O < T_D$. |
| 4.18 | Correct DTM Decision in Presence of Noise $(T_O \& T_D < T_{th})$ (a) $T_O > T_D$, (b) $T_O < T_D$. |
| 4.19 | Incorrect DTM Decision in Presence of Noise (a) Case 1, (b) Case 2. |
| 4.20 | Incorrect DTM Decision when $\Delta T_s > \Delta T_{required}$ (a) Case 1, (b) Case 2. |
| 4.21 | Incorrect DTM Decision when $\Delta T_s + \Delta T_p > \Delta T_{required}$ (a) Case 1, (b) Case 2. |
| 4.22 | Incorrect DTM Decision when $\Delta T_s + \Delta T_p + \Delta T_d > \Delta T_{required}$ (a) Case 1, (b) Case 2. |
| 4.23 | Contribution of Each Source of Noise on GemsFDTD Thermal Data as a Function of Noise Deviation in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| 4.24 | Contribution of Each Source of Noise on MCF Thermal Data as a Function of Noise Deviation in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| 4.25 | The Percentage of Incorrect DTM Decisions Over Number of Removed LSBs. |
| 4.26 | Average packet latency as a function of varying injection rate at: (a) $\sigma_N = 0.06V$, (b) $\sigma_N = 0.1V$, (c) $\sigma_N = 0.14V$. |
| 4.27 | Maximum packet latency as a function of varying injection rate at: (a) $\sigma_N = 0.06V$, (b) $\sigma_N = 0.1V$, (c) $\sigma_N = 0.14V$. |
| 4.28 | Residual flit error probability, $P_{res}$, as a function of noise deviation, $\sigma_N$, for different technology nodes. |
| 4.29 | DTM Efficiency for Bzip2 in Terms of $SP_D$ with (a) No-Protection (b) Proposed Scheme (c) Baseline Scheme. |
| 4.30 | DTM Efficiency for Bzip2 in Terms of $UC_{ET}$ with (a) No-Protection (b) Proposed Scheme (c) Baseline Scheme. |
| 4.31 | DTM Efficiency for Astar in Terms of $SP_D$ with (a) No-Protection (b) Proposed Scheme (c) Baseline Scheme. |
| 4.32 | DTM Efficiency for Astar in Terms of $UC_{ET}$ with (a) No-Protection (b) Proposed Scheme (c) Baseline Scheme. |
| 4.33 | DTM Efficiency for Omnetpp in Terms of $SP_D$ with (a) No-Protection (b) Proposed Scheme (c) Baseline Scheme. |
| 4.34 | DTM Efficiency for Omnetpp in Terms of $UC_{ET}$ with (a) No-Protection (b) Proposed Scheme (c) Baseline Scheme. |
| 4.35 | Whole-System MTTF Under Different Noise Margin Settings. |
| 4.36 | (a) Thermal Map vs (b) Current Density Map. |
| 4.37 | Chip with (a) Non-Uniform Temperature (b) Uniform Temperature. |
| A.1 | Histogram of Astar Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.2 | Histogram of Bzip2 Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.3 | Histogram of GemsFDTD Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.4 | Histogram of Gromacs Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.5 | Histogram of Hmmer Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.6 | Histogram of LBM Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.7 | Histogram of Leslie3d Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.8 | Histogram of Mcf Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.9 | Histogram of Milc Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.10 | Histogram of Namd Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.11 | Histogram of Omnetpp Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.12 | Histogram of Sjeng Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.13 | Histogram of Soplex Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.14 | Histogram of Sphinx3 Based on Different Regions in (a) 90nm, (b) 65nm, (c) 45nm, (d) 32nm, (e) 22nm technology nodes. |
| A.15 | The Effect of Sensor Noise on DTM in terms of System Performance Degradation. |
| A.16 | The Effect of Sensor Noise on DTM in terms of Unattended Cycles in Emergency Temperature. |
| A.17 | The Effect of Placement Noise on DTM in terms of System Performance Degradation. |
| A.18 | The Effect of Placement Noise on DTM in terms of System Unattended Cycles in Emergency Temperature. |
| A.19 | The Effect of DSM Noise on DTM in terms of System Performance Degradation. |
| A.20 | The Effect of DSM Noise on DTM in terms of System Unattended Cycles in Emergency Temperature. |


# LIST OF ABBREVIATIONS

| Abbreviation | Definition                              |
|-----------|--------------------------------------------|
| C4        | Controlled Collapse Chip Connection        |
| CDF       | Cumulative Distribution Function           |
| CIBD      | Crosstalk Induced Bus Delay                |
| CV        | Coefficient of Variation                   |
| DN        | Deep Sub-Micron Noise                      |
| DSM       | Deep Sub-Micron                            |
| DTM       | Dynamic Thermal Management                 |
| DVFS      | Dynamic Voltage Frequency Scaling          |
| EM        | Electro-Migration                          |
| EMI       | Electro-Magnetic Interference              |
| IP        | Intellectual Property                      |
| MTTF      | Mean Time To Failure                       |
| NoC       | Network on Chip                            |
| PCB       | Printed Circuit Board                      |
| PDN       | Power Delivery Network                     |
| PN        | Placement Noise                            |
| PTM       | Predictive Technology Models               |
| RO        | Ring Oscillator                            |
| SN        | Sensor Noise                               |
| $SP_D$    | System Performance Degradation             |
| $UC_{ET}$ | Unattended Cycles in Emergency Temperature |
| VRM       | Voltage Regulator Modules                  |


#### **CHAPTER 1**

#### **INTRODUCTION**

#### 1.1 Introduction

Over the past decades, CMOS technology scaling has enabled the semiconductor industry to sustain exponential growth in device integration. Although scaling has delivered exponentially greater transistor densities, threshold and supply voltages have not decreased quickly enough to avoid exponential growth in on-chip power density [12]. Figure 1.1 shows the power dissipation over multiple generations of Intel chips, where each labeled point is a new chip generation and the branches show the changes in power dissipation as chips are scaled to smaller technologies.



Figure 1.1: Power Dissipation Across Multiple Generations of Intel Chips [1].

Since a significant fraction of chip power consumption is converted to heat, heat density also rises exponentially. The diverse activity levels and sleep modes of the functional blocks in high-performance chips cause severe hot spots, creating large temperature variations across the die (Figure 1.2) that can degrade functionality or cause timing failures. An emergency temperature occurs when the temperature rises above the maximum tolerated temperature. At an emergency temperature, the chip cannot function at its required speed, resulting in erroneous computations. In addition to the risk of functional failure caused by increased delay, extended exposure to high temperatures can result in aging and electro-migration [13].



Figure 1.2: IBM POWER4 Chip Temperature Map [2]

Although temperature is one of many sources of variation facing nanoscale systems [13], its tight coupling with power dissipation and power density makes it one of the most important factors constraining nanoscale system design. In deep sub-micron technologies, leakage current is primarily responsible for the exponential rise in heat density. Sub-threshold leakage causes the overall leakage current to increase exponentially with temperature. This positive thermal feedback may lead to thermal runaway and permanent damage to the circuits. Thermal runaway is the condition in which an increase in temperature causes an increase in leakage current, and the increased leakage current dissipates enough additional power to raise the temperature further, resulting in a self-reinforcing cycle of increasing leakage and temperature that can become unstable (Figure 1.3).



Figure 1.3: Impact of Thermal Runaway on a Test Socket [3]
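
The thermal runaway feedback loop can be illustrated with a minimal sketch. It assumes a lumped steady-state model, $T = T_{amb} + R_{th}(P_{dyn} + P_{leak}(T))$, with leakage power growing exponentially with temperature. The function name, the parameter names, and all numeric values are hypothetical and chosen only for illustration; they are not taken from this thesis or from [3].

```python
import math


def simulate_thermal_feedback(p_dynamic_w, r_th_k_per_w, t_ambient_c=45.0,
                              p_leak_ref_w=5.0, t_ref_c=45.0, beta_per_k=0.02,
                              t_limit_c=150.0, max_steps=1000, tol_c=1e-6):
    """Iterate T -> T_amb + R_th * (P_dyn + P_leak(T)).

    Returns (converged, temperature_c). converged is False when the
    temperature exceeds t_limit_c, i.e. the loop runs away.
    All default values are illustrative assumptions.
    """
    t = t_ambient_c
    for _ in range(max_steps):
        # Assumed leakage model: exponential growth with temperature.
        p_leak = p_leak_ref_w * math.exp(beta_per_k * (t - t_ref_c))
        t_next = t_ambient_c + r_th_k_per_w * (p_dynamic_w + p_leak)
        if t_next > t_limit_c:
            return False, t_next      # leakage and temperature keep feeding each other
        if abs(t_next - t) < tol_c:
            return True, t_next       # stable operating point reached
        t = t_next
    return False, t


# Same chip power, two hypothetical cooling solutions.
stable, t_stable = simulate_thermal_feedback(p_dynamic_w=40.0, r_th_k_per_w=0.5)
bounded, t_last = simulate_thermal_feedback(p_dynamic_w=40.0, r_th_k_per_w=1.5)
print("good cooling converged:", stable)
print("poor cooling converged:", bounded)
```

With these illustrative values, the lower thermal resistance settles at a stable temperature, whereas the higher one has no stable operating point and the iteration exceeds the limit, which is the runaway behavior described in the text.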

Thermal issues have been recognized as a critical barrier to utilizing transistors effectively [14], and it is becoming increasingly challenging to remove the massive heat generated by silicon chips. Temperature also has an exponential effect on electro-migration and affects the stability of the power delivery network (PDN).

To maintain performance and reliability in multi-core processors, dynamic thermal management (DTM) techniques adapt the behavior of the chip based on the temperature profile provided by thermal sensors, which is transmitted over the network-on-chip (NoC) in multi-core chips [15, 16, 17]. Many techniques have been proposed to manage on-chip heat dissipation [14, 4, 18]. The problems regarding DTM and PDN reliability are discussed in the following section.

#### 1.2 Problem Statement

DTM efficiency is highly dependent on accurate input thermal data [14]: overestimated temperatures trigger unnecessary invocations of DTM techniques, which degrade system performance. Conversely, a reported temperature profile lower than the actual temperature can result in late activation of DTM techniques, which could potentially result in physical damage [19]. Temperature sensing inaccuracies are caused by various factors, including sensor placement, sensor device imprecision, and interconnect DSM noise, which are explained in Section 2.3.3. The effects of sensor placement noise and sensor device imprecision on DTM efficiency have been widely investigated [20, 21, 22, 23, 14].
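
The two failure modes described above, unnecessary invocation and late activation, can be sketched with a simple threshold-triggered DTM model. This is only an illustration: the 85 °C trigger threshold, the Gaussian noise model, and all names are assumptions, and the noise models actually used in this thesis (Chapter 3) are not reproduced here. False triggers are events of the kind that cost performance, while missed emergencies are events of the kind that leave an emergency temperature unattended.

```python
import random


def evaluate_dtm(true_temps_c, noise_sigma_c, threshold_c=85.0, seed=0):
    """Compare noisy DTM decisions against the noise-free decision.

    Returns (false_triggers, missed_emergencies):
      false_triggers     - DTM invoked although the true temperature is safe
      missed_emergencies - DTM not invoked although the true temperature
                           has reached the threshold
    Gaussian noise and the threshold value are illustrative assumptions.
    """
    rng = random.Random(seed)
    false_triggers = 0
    missed_emergencies = 0
    for t_true in true_temps_c:
        t_read = t_true + rng.gauss(0.0, noise_sigma_c)  # corrupted sensor reading
        should_throttle = t_true >= threshold_c
        does_throttle = t_read >= threshold_c
        if does_throttle and not should_throttle:
            false_triggers += 1
        elif should_throttle and not does_throttle:
            missed_emergencies += 1
    return false_triggers, missed_emergencies


# Hypothetical temperature trace ramping from 80 C to just under 90 C.
trace = [80.0 + 0.1 * i for i in range(100)]
for sigma in (0.0, 1.0, 3.0):
    print(sigma, evaluate_dtm(trace, sigma))
```

With zero noise both counters are zero by construction; as the noise level grows, readings near the threshold are increasingly misclassified in both directions.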

One of the most worrying effects in nanometer technologies is the escalation of deep sub-micron (DSM) noise. Reliable transmission of thermal data over the bus in single-core systems and over the NoC in multicore systems is challenging because noise increases in DSM technology, yet to the best of our knowledge the effect of DSM noise on thermal data, and consequently on DTM efficiency, has not been addressed in the literature. This gap motivates a comprehensive investigation methodology to explore how DTM efficiency is affected by DSM noise. To highlight the dominant effect of DSM noise, this work compares it with the other noise sources affecting thermal data.

To mitigate the impact of DSM noise on DTM efficiency, a dedicated fault tolerance scheme based on approximate computing is proposed. It exploits the inherent characteristics of DSM noise affecting thermal data and improves DTM efficiency while consuming less power and area. This scheme enables continued operation of dynamic thermal management even in the presence of a high level of DSM noise in deep sub-micron technology.

Reliability of thermal data is very important for DTM efficiency and consequently for overall system performance and reliability. The other serious reliability challenge is maintaining a stable power delivery network that delivers sufficient current to switching transistors. The supply voltage can become noisy (i.e., drop or fluctuate) due to the PDN's intrinsic resistance, capacitance, and inductance, causing timing errors and threatening program correctness. The PDN also suffers from long-term reliability threats such as electro-migration (EM). Temperature has an exponential effect on electro-migration and affects chip lifetime and voltage stability. The effect of temperature on the reliability of C4 bumps has been widely investigated, but these studies assumed a uniform temperature for the whole chip. As another contribution of this study, the dependency of C4 bump failure mechanisms on accurate chip temperature is explored. This helps to reduce packaging cost and support more off-chip I/O channels in current and near-future technology nodes by enabling designers to provision bump allocation under different thermal maps.

#### 1.3 Aim and Objective

The aim of this thesis is to provide reliable thermal data transmission over the NoC to maintain high DTM efficiency, and to investigate the reliability of the power delivery network. To achieve this aim, the thesis pursues three main objectives:

- Develop a simulation model to investigate DTM performance under different sources of noise, including sensor noise, placement noise, and DSM noise, individually and in combination, as technology scales from 90nm down to 22nm.
- Design and develop a dedicated fault tolerance coding scheme, based on approximate computing, that exploits the inherent characteristics of DSM noise affecting thermal data.
- Develop a Monte Carlo simulation (MCS) to analyze the impact of temperature on the mechanism of multiple, random, EM-induced power-C4 bump failures.
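
As an illustration of the third objective, the following minimal sketch shows how a Monte Carlo analysis of temperature-dependent bump failures might be structured. It assumes Black's equation for the median electro-migration lifetime, $t_{50} = A\,J^{-n}\exp(E_a/(kT))$, and lognormally distributed failure times. Both are common assumptions in the EM literature rather than details taken from this thesis. All names, constants, and the failure criterion (the system fails once more than a tolerated number of bumps have failed) are hypothetical, and time is in arbitrary units.

```python
import math
import random

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def black_median_ttf(current_density, temp_k, a_const=1.0, n_exp=2.0, ea_ev=0.8):
    """Median time to failure from Black's equation (arbitrary time units).

    a_const, n_exp and ea_ev are illustrative values, not fitted parameters.
    """
    return (a_const * current_density ** (-n_exp)
            * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k)))


def mc_system_mttf(bump_temps_k, bump_currents, tolerated_failures=0,
                   sigma=0.5, trials=2000, seed=1):
    """Estimate system MTTF by sampling one failure time per bump per trial.

    The system is assumed to fail when bump number (tolerated_failures + 1)
    fails. Failure times are drawn from a lognormal distribution whose
    median comes from Black's equation at that bump's own temperature.
    """
    rng = random.Random(seed)
    medians = [black_median_ttf(j, t) for j, t in zip(bump_currents, bump_temps_k)]
    total = 0.0
    for _ in range(trials):
        ttfs = sorted(rng.lognormvariate(math.log(m), sigma) for m in medians)
        total += ttfs[tolerated_failures]
    return total / trials


# Hypothetical 16-bump example: a uniform map versus a non-uniform map
# with the same average temperature.
currents = [1.0] * 16
uniform_map = [360.0] * 16
non_uniform_map = [340.0] * 8 + [380.0] * 8
print("uniform map MTTF:    ", mc_system_mttf(uniform_map, currents))
print("non-uniform map MTTF:", mc_system_mttf(non_uniform_map, currents))
```

Because each bump is evaluated at its own temperature, a sketch of this kind can contrast a uniform-temperature assumption with a per-bump thermal map, which is the distinction the problem statement draws.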

#### 1.4 Contributions

The main contributions of this study are an investigation of the impact of interconnect DSM noise on DTM efficiency and a novel NoC fault tolerance scheme that mitigates this impact. In addition, this study investigates the effect of temperature on power delivery network reliability.

#### 1.5 Scope

The scope of this study is limited to one DTM technique, which is used to investigate the effect of all sources of noise on multicore processors. The raw thermal data captured directly from the sensors are used, with no processing applied to them. For the PDN, the focus of this thesis is on the on-chip PDN and not the I/O bumps.

#### 1.6 Thesis Organization

Chapter 2 is divided into seven sections, which introduce the required background and lay the foundation for the contributions of this thesis.

Chapter 3 explains the simulation scenarios. The first simulation model is used to investigate the effect of each source of noise on thermal data and DTM efficiency. A novel approach is proposed for modeling the different sources of noise and their scaling trends from the 90nm to the 22nm technology node. A novel fault tolerance design is also proposed, which exploits the inherent characteristics of thermal monitoring data using approximate computing. Since DSM noise increases significantly with technology scaling, a dedicated fault tolerance scheme is expected to enable further power and area savings while maintaining DTM efficiency in current and near-future technology nodes. This chapter also introduces the performance and cost metrics used to evaluate DTM efficiency. For the other contribution of this study, a statistical simulation model is built to analyze the mechanism and consequences of multiple EM-induced C4 pad failures under the effect of C4 pad temperature.

Chapter 4 first evaluates the effect of each source of noise on thermal data and DTM efficiency, individually and in combination, using the noise models and metrics presented in Chapter 3. The second part of Chapter 4 compares the proposed fault tolerance scheme with the benchmark scheme in terms of hardware implementation area and power consumption, network performance (reliability and maximum latency), and DTM efficiency. The third part of Chapter 4 evaluates the effect of C4 bump temperature on PDN reliability.

Chapter 5 concludes the thesis and outlines possible future work.

#### REFERENCES

- [1] D. Wolpert and P. Ampadu, *The Role of Temperature in Electronic Design*, pp. 1–13. Springer, 2012.
- [2] J. D. Warnock, J. M. Keaty, J. Petrovick, J. G. Clabes, C. J. Kircher, B. L. Krauter, P. J. Restle, B. A. Zoric, and C. J. Anderson, "The circuit and physical design of the power4 microprocessor," *IBM Journal of Research and Development*, vol. 46, no. 1, pp. 27–51, 2002.
- [3] A. Vassighi, O. Semenov, M. Sachdev, A. Keshavarzi, and C. Hawkins, "CMOS IC technology scaling and its impact on burn-in," *IEEE Transactions on Device and Materials Reliability*, vol. 4, no. 2, pp. 208–221, 2004.
- [4] D. Brooks and M. Martonosi, "Dynamic thermal management for high performance microprocessors," in *The Seventh International Symposium on High-Performance Computer Architecture (HPCA)*, pp. 171–182, IEEE, 2001.
- [5] E. Boemo and S. López-Buedo, *Thermal monitoring on FPGAs using ring-oscillators*. Springer, 1997.
- [6] N. Bidokhti, "Seu concept to reality (allocation, prediction, mitigation)," in *Reliability and Maintainability Symposium (RAMS)*, 2010, pp. 1–5, IEEE, 2010.
- [7] R. Khazaka and M. Nakhla, "Analysis of high-speed interconnects in the presence of electromagnetic interference," *IEEE transactions on microwave theory and techniques*, vol. 46, no. 7, pp. 940–947, 1998.
- [8] W. N. Flayyih, *Crosstalk aware error control coding techniques for reliable and energy efficient network on chip.* PhD thesis, University Putra Malaysia, 2014.
- [9] E. A. S. Madduri, "A monitor interconnect and support subsystem for multicore processors," in *Proc. of the IEEE/ACM Design Automation and Test in Europe*, (France), 2009.
- [10] R. Zhang, *Pre-RTL On-Chip Power Delivery Modeling and Analysis*. PhD thesis, University of Virginia, 2015.
- [11] R. Zhang, B. H. Meyer, K. Wang, M. R. Stan, and K. Skadron, "Tolerating the consequences of multiple em-induced c4 bump failures," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 6, pp. 2335–2344, 2016.
- [12] C. A. Mack, "Fifty years of moore's law," *IEEE Transactions on semiconductor manufacturing*, vol. 24, no. 2, pp. 202–207, 2011.
- [13] K. Bernstein, D. J. Frank, A. E. Gattiker, W. Haensch, B. L. Ji, S. R. Nassif, E. J. Nowak, D. J. Pearson, and N. J. Rohrer, "High-performance cmos variability in the 65-nm regime and beyond," *IBM journal of research and development*, vol. 50, no. 4.5, pp. 433–449, 2006.

- [14] K. Skadron, M. R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan, *Temperature-aware microarchitecture*, vol. 31. ACM, 2003.
- [15] M. S. Floyd, S. Ghiasi, T. W. Keller, K. Rajamani, F. Rawson, J. C. Rubio, and M. S. Ware, "System power management support in the IBM power6 microprocessor," *IBM Journal of Research and Development*, vol. 51, no. 6, pp. 733–746, 2007.
- [16] B. Vermeulen and K. Goossens, "A network-on-chip monitoring infrastructure for communication-centric debug of embedded multi-processor socs," in *International Symposium on VLSI Design, Automation and Test*, pp. 183–186, IEEE, 2009.
- [17] J. Zhao, S. Madduri, R. Vadlamani, W. Burleson, and R. Tessier, "A dedicated monitoring infrastructure for multicore processors," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 6, pp. 1011–1022, 2011.
- [18] S. Bhunia, N. Banerjee, H. Mahmoodi, Q. Chen, and K. Roy, "Synthesis approach for active leakage power reduction using dynamic supply gating," Google Patents, 2008.
- [19] S. Sharifi and T. S. Rosing, "Accurate direct and indirect on-chip temperature sensing for efficient dynamic thermal management," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 29, no. 10, pp. 1586–1599, 2010.
- [20] K.-J. Lee, K. Skadron, and W. Huang, "Analytical model for sensor placement on microprocessors," in *International Conference on Computer Design*, pp. 24–27, IEEE, 2005.
- [21] J. Ranieri, A. Chebira, and M. Vetterli, "Near-optimal sensor placement for linear inverse problems," *IEEE Transactions on signal processing*, vol. 62, no. 5, pp. 1135–1146, 2014.
- [22] R. Mukherjee and S. O. Memik, "Systematic temperature sensor allocation and placement for microprocessors," in *Proceedings of the 43rd annual Design Automation Conference*, pp. 542–547, ACM, 2006.
- [23] Y. Zhang and A. Srivastava, "Accurate temperature estimation using noisy thermal sensors," in *Proceedings of the 46th Annual Design Automation Conference*, pp. 472–477, ACM, 2009.
- [24] W. Huang, K. Rajamani, M. R. Stan, and K. Skadron, "Scaling with design constraints: Predicting the future of big chips," *IEEE Micro*, no. 4, pp. 16–29, 2011.
- [25] E. Rotem, J. Hermerding, A. Cohen, and H. Cain, "Temperature measurement in the intel (r) coretm duo processor," *Proceedings of 12th International Workshop on Thermal investigations of ICs*, 2007.

- [26] S. Rusu, "Trends and challenges in high-performance microprocessor design," in *Electronic Design Processes (EDP) Workshop*, (California Intel Corporation), 2004.
- [27] T. Ghani, "Challenges and innovations in nano-cmos transistor scaling," *Capturado intel*, 2009.
- [28] M. Z. Hasan and M. Bird, "Energy reductions for embedded processors in reconfigurable hardware," in *IEEE International Conference on Electro/Information Technology (EIT)*, 2011, pp. 1–8, IEEE, 2011.
- [29] A. Chakraborty and S. N. Pradhan, "A technique for power reduction of cmos circuit at 65nm technology," in 1st International Conference on Recent Advances in Information Technology (RAIT), pp. 576–580, IEEE, 2012.
- [30] "International technology roadmap for semiconductors," 2016.
- [31] C. Jiang, X. Xu, J. Wan, and X. You, "Energy management for microprocessor systems: challenges and existing solutions," in *International Symposium on Intelligent Information Technology Application Workshops (IITAW'08)*, pp. 1071–1076, IEEE, 2008.
- [32] E. Kursun, C.-Y. Cher, A. Buyuktosunoglu, and P. Bose, "Investigating the effects of task scheduling on thermal behavior," in *Third Workshop on Temperature-Aware Computer Systems (TACS'06)*, 2006.
- [33] Y. Liu, R. P. Dick, L. Shang, and H. Yang, "Accurate temperature-dependent integrated circuit leakage power estimation is easy," in *Proceedings of the conference on Design, automation and test in Europe*, pp. 1526–1531, EDA Consortium, 2007.
- [34] V. K. Pamula and K. Chakrabarty, "Cooling of integrated circuits using droplet-based microfluidics," in *Proceedings of the 13th ACM Great Lakes Symposium on VLSI*, pp. 84–87, ACM, 2003.
- [35] R. Cochran and S. Reda, "Spectral techniques for high-resolution thermal characterization with limited sensor data," in *Proceedings of the 46th Annual Design Automation Conference*, pp. 478–483, ACM, 2009.
- [36] S. Sharifi, C. Liu, and T. S. Rosing, "Accurate temperature estimation for efficient thermal management," in *9th International Symposium on Quality Electronic Design (ISQED 2008)*, pp. 137–142, IEEE, 2008.
- [37] H. F. Sheikh, I. Ahmad, Z. Wang, and S. Ranka, "An overview and classification of thermal-aware scheduling techniques for multi-core processing systems," *Sustainable Computing: Informatics and Systems*, vol. 2, no. 3, pp. 151–169, 2012.
- [38] R. Rao, S. Vrudhula, and C. Chakrabarti, "Throughput of multi-core processors under thermal constraints," in *Proceedings of the 2007 international symposium on Low power electronics and design*, pp. 201–206, ACM, 2007.

- [39] Z. Wang and S. Ranka, "A simple thermal model for multi-core processors and its application to slack allocation," in *IEEE International Symposium on Parallel and Distributed Processing (IPDPS)*, pp. 1–11, IEEE, 2010.
- [40] J. Yang, X. Zhou, M. Chrobak, Y. Zhang, and L. Jin, "Dynamic thermal management through task scheduling," in *IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2008)*, pp. 191–201, IEEE, 2008.
- [41] P. Bailis, V. J. Reddi, S. Gandhi, D. Brooks, and M. Seltzer, "Dimetrodon: processor-level preventive thermal management via idle cycle injection," in *Proceedings of the 48th Design Automation Conference*, pp. 89–94, ACM, 2011.
- [42] K.-Y. Ting, A. Mehta, S. K. Goel, and S. John, "Dynamic frequency scaling," 2012.
- [43] S. Dighe, S. R. Vangal, P. Aseron, S. Kumar, T. Jacob, K. A. Bowman, J. Howard, J. Tschanz, V. Erraguntla, and N. Borkar, "Within-die variation aware dynamic-voltage-frequency-scaling with optimal core allocation and thread hopping for the 80-core teraflops processor," *IEEE Journal of Solid State Circuits*, vol. 46, no. 1, pp. 184–193, 2011.
- [44] B. Salami, M. Baharani, and H. Noori, "Proactive task migration with a self adjusting migration threshold for dynamic thermal management of multi-core processors," *The Journal of Supercomputing*, vol. 68, no. 3, pp. 1068–1087, 2014.
- [45] P. Ituero, J. L. Ayala, and M. Lopez-Vallejo, "Leakage-based on-chip thermal sensor for cmos technology," in *IEEE International Symposium on Circuits and Systems*, pp. 3327–3330, IEEE, 2007.
- [46] V. Székely, C. Márta, Z. Kohari, and M. Rencz, "Cmos sensors for on-line thermal monitoring of vlsi circuits," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 5, no. 3, pp. 270–276, 1997.
- [47] B. Datta and W. Burleson, "Low-power and robust on-chip thermal sensing using differential ring oscillators," in *50th Midwest Symposium on Circuits and Systems*, pp. 29–32, IEEE, 2007.
- [48] H. Hanson, S. W. Keckler, S. Ghiasi, K. Rajamani, F. Rawson, and J. Rubio, "Thermal response to dvfs: analysis with an intel pentium m," in *Proceedings of the 2007 international symposium on Low power electronics and design*, pp. 219–224, ACM, 2007.
- [49] B. Datta and W. Burleson, "Low-power, process-variation tolerant on-chip thermal monitoring using track and hold based thermal sensors," in *Proceedings of the 19th ACM Great Lakes Symposium on VLSI*, pp. 145–148, ACM, 2009.

- [50] D. Pham, S. Asano, M. Bolliger, M. Day, H. Hofstee, C. Johns, J. Kahle, A. Kameyama, J. Keaty, and Y. Masubuchi, "The design and implementation of a first-generation cell processor-a multi-core soc," in *2005 International Conference on Integrated Circuit Design and Technology (ICICDT 2005)*, pp. 49–52, IEEE, 2005.
- [51] R. Kuppuswamy, S. R. Sawant, S. Balasubramanian, P. Kaushik, N. Natarajan, and J. D. Gilbert, "Over one million tpcc with a 45nm 6-core xeon cpu," in *IEEE International Solid-State Circuits Conference-Digest of Technical Papers*, pp. 70–71, 71a, IEEE, 2009.
- [52] J. Wang, *Thermal Modeling and Management of Multi-Core Processors*. PhD thesis, Technical University of Kaiserslautern, 2014.
- [53] S. Suman and B. Singh, "Ring oscillator based cmos temperature sensor design," *International Journal of Scientific and Technology Research*, vol. 1, no. 4, pp. 76–81, 2012.
- [54] C. Yao, K. K. Saluja, and P. Ramanathan, "Calibrating on-chip thermal sensors in integrated circuits: A design-for-calibration approach," *Journal of Electronic Testing*, vol. 27, no. 6, pp. 711–721, 2011.
- [55] Moortec, "Moortec embedded temperature sensor (mets) ip," 2017.
- [56] J. L. Henning, "Spec cpu2006 benchmark descriptions," ACM SIGARCH Computer Architecture News, vol. 34, no. 4, pp. 1–17, 2006.
- [57] D. A. Narciso, N. P. Faisca, and E. N. Pistikopoulos, "A framework for multiparametric programming and control-an overview," in *IEEE International Engineering Management Conference*, 2008.
- [58] *Blind identification of power sources in processors*, European Design and Automation Association, 2017.
- [59] X. Li, X. Ou, Z. Li, H. Wei, W. Zhou, and Z. Duan, "On-line temperature estimation for noisy thermal sensors using a smoothing filter-based Kalman predictor.," *Sensors*, vol. 18, no. 2, 2018.
- [60] M. Pedram and S. Nazarian, "Thermal modeling, analysis, and management in vlsi circuits: Principles and methods," *Proceedings of the IEEE*, vol. 94, no. 8, pp. 1487–1501, 2006.
- [61] C. J. Lasance, "Thermally driven reliability issues in microelectronic systems: status-quo and challenges," *Microelectronics Reliability*, vol. 43, no. 12, pp. 1969–1974, 2003.
- [62] A. K. Coskun, T. S. Rosing, K. Mihic, G. De Micheli, and Y. Leblebici, "Analysis and optimization of MPSoC reliability," *Journal of Low Power Electronics*, vol. 2, no. 1, pp. 56–69, 2006.

- [63] M. Cho, S. Ahmed, and D. Z. Pan, "Taco: temperature aware clock-tree optimization," in *Proceedings of the 2005 IEEE/ACM International Conference on Computer-Aided Design*, pp. 582–587, IEEE Computer Society, 2005.
- [64] Y. Zhang, B. Shi, and A. Srivastava, "A statistical framework for designing on chip thermal sensing infrastructure in nano-scale systems," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 2014.
- [65] R. F. Van der Wijngaart, T. G. Mattson, and W. Haas, "Light-weight communications on intel's single-chip cloud computer processor," ACM SIGOPS Operating Systems Review, vol. 45, no. 1, pp. 73–83, 2011.
- [66] N. R. Shanbhag, "Reliable and efficient system-on-chip design," *Computer*, vol. 37, no. 3, pp. 42–50, 2004.
- [67] A. Flores, J. L. Aragón, and M. E. Acacio, "An energy consumption characterization of on-chip interconnection networks for tiled cmp architectures," *The Journal of Supercomputing*, vol. 45, no. 3, pp. 341–364, 2008.
- [68] W. J. Dally and J. W. Poulton, *Digital systems engineering*. Cambridge university press, 2008.
- [69] S. Krishnamohan and N. R. Mahapatra, "A highly-efficient technique for reducing soft errors in static cmos circuits," in *IEEE International Conference on Computer Design: VLSI in Computers and Processors*, pp. 126–131, IEEE, 2004.
- [70] M. Gordon, P. Goldhagen, K. Rodbell, T. Zabel, H. Tang, J. Clem, and P. Bailey, "Measurement of the flux and energy spectrum of cosmic-ray induced neutrons on the ground," *IEEE Transactions on Nuclear Science*, vol. 51, no. 6, pp. 3427–3434, 2004.
- [71] J. D. Wilkinson, C. Bounds, T. Brown, B. J. Gerbi, and J. Peltier, "Cancer radiotherapy equipment as a cause of soft errors in electronic equipment," *IEEE Transactions on Device and Materials Reliability*, vol. 5, no. 3, pp. 449–451, 2005.
- [72] X. Li, K. Shen, M. C. Huang, and L. Chu, "A memory soft error measurement on production systems.," in USENIX Annual Technical Conference, pp. 275–280, 2007.
- [73] C. Duan, V. H. C. Calle, and S. P. Khatri, "Efficient on-chip crosstalk avoidance codec design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 17, no. 4, pp. 551–560, 2009.
- [74] C. Constantinescu, "Trends and challenges in vlsi circuit reliability," *IEEE micro*, vol. 23, no. 4, pp. 14–19, 2003.
- [75] L. Franco, F. Gomez, A. Iglesias, J. Pardo, A. Pazos, J. Pena, and M. Zapata, "Seus on commercial sram induced by low energy neutrons produced at a clinical linac facility," in *European Congress RADECS*, pp. 13–19, 2005.

- [76] S. Mukherjee, Architecture design for soft errors. Morgan Kaufmann, 2011.
- [77] P. P. Pande, A. Ganguly, H. Zhu, and C. Grecu, "Energy reduction through crosstalk avoidance coding in networks on chip," *Journal of Systems Architecture*, vol. 54, no. 3, pp. 441–451, 2008.
- [78] B. Fu, *Crosstalk-aware multiple error control for reliable on-chip interconnects*. Proquest, Umi Dissertatio, 2011.
- [79] S. R. Sridhara and N. R. Shanbhag, "Coding for reliable on-chip buses: A class of fundamental bounds and practical codes," *IEEE Trans. on CAD of Integrated Circuits and Systems*, vol. 26, no. 5, pp. 977–982, 2007.
- [80] P. P. Sotiriadis and A. Chandrakasan, "Reducing bus delay in submicron technology using coding," in *Proceedings of the 2001 Asia and South Pacific Design Automation Conference*, pp. 109–114, ACM, 2001.
- [81] R. McGowen, C. A. Poirier, C. Bostak, J. Ignowski, M. Millican, W. H. Parks, and S. Naffziger, "Power and temperature control on a 90-nm itanium family processor," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 1, pp. 229–237, 2006.
- [82] S. Aboobacker, "Razor: circuit-level correction of timing errors for low-power operation," *IEEE Micro*, 2011.
- [83] M. Saen, K. Osada, S. Misaka, T. Yamada, Y. Tsujimoto, Y. Kondoh, T. Kamei, Y. Yoshida, E. Nagahama, and Y. Nitta, "Embedded soc resource manager to control temperature and data bandwidth," in *Digest of Technical Papers 2007 IEEE International Solid-State Circuits Conference.*, pp. 296–604, IEEE, 2007.
- [84] C. Grecu, P. P. Pande, A. Ivanov, and R. Saleh, "Structured interconnect architecture: a solution for the non-scalability of bus-based socs," in *Proceedings of the 14th ACM Great Lakes Symposium on VLSI*, pp. 192–195, ACM, 2004.
- [85] W. J. Dally and B. Towles, "Route packets, not wires: on-chip interconnection networks," in *Design Automation Conference*, 2001, pp. 684–689, IEEE, 2001.
- [86] L. Benini and G. De Micheli, "Networks on chips: a new soc paradigm," *computer*, vol. 35, no. 1, pp. 70–78, 2002.
- [87] C. Nicopoulos, V. Narayanan, and C. R. Das, *Network-on-Chip Architectures: A Holistic Design Exploration*, vol. 45. Springer Science and Business Media, 2009.
- [88] H. G. Lee, N. Chang, U. Y. Ogras, and R. Marculescu, "On-chip communication architecture exploration: A quantitative evaluation of point-to-point, bus, and network-on-chip approaches," ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 12, no. 3, p. 23, 2007.
- [89] H.-J. Yoo, K. Lee, and J. K. Kim, *Low-power noc for high-performance soc design*. CRC press, 2008.

- [90] D. Sigüenza-Tortosa and J. Nurmi, *From buses to networks*, pp. 231–251. Springer, 2005.
- [91] T. Bjerregaard and S. Mahadevan, "A survey of research and practices of network-on-chip," *ACM Computing Surveys (CSUR)*, vol. 38, no. 1, p. 1, 2006.
- [92] E. Salminen, A. Kulmala, and T. D. Hamalainen, "Survey of network-on-chip proposals," *white paper, OCP-IP*, vol. 1, p. 13, 2008.
- [93] G. De Micheli and L. Benini, *Networks on chips: technology and tools*. Academic Press, 2006.
- [94] A. Agarwal, C. Iskander, and R. Shankar, "Survey of network on chip (noc) architectures and contributions," *Journal of engineering, Computing and Architecture*, vol. 3, no. 1, pp. 21–27, 2009.
- [95] J. Liu, L.-R. Zheng, and H. Tenhunen, "A circuit-switched network architecture for network-on-chip," in *IEEE International SOC Conference*, pp. 55–58, IEEE, 2004.
- [96] P. T. Wolkotte, G. J. Smit, G. K. Rauwerda, and L. T. Smit, "An energy efficient reconfigurable circuit-switched network-on-chip," in 19th IEEE International Parallel and Distributed Processing Symposium, pp. 155a–155a, IEEE, 2005.
- [97] E. Nilsson, "Design and implementation of a hot-potato switch in a network on chip," 2002.
- [98] C. Gómez, M. E. Gómez, P. López, and J. Duato, *Reducing packet dropping in a bufferless noc*. Springer, 2008.
- [99] S. Bell, B. Edwards, J. Amann, R. Conlin, K. Joyce, V. Leung, J. MacKay, M. Reif, B. Liewei, and J. Brown, *On-chip interconnection architecture of the tile processor*. ISSCC, 2008.
- [100] N. Wang, A. Sanusi, P. Zhao, M. Elgamel, and M. A. Bayoumi, "Pmcnoc: A pipelining multi-channel central caching network-on-chip communication architecture design," *Journal of Signal Processing Systems*, vol. 60, no. 3, pp. 315–331, 2010.
- [101] W. J. Dally and B. P. Towles, *Principles and practices of interconnection networks*. Elsevier, 2004.
- [102] L. Guang, E. Nigussie, J. Isoaho, P. Rantala, and H. Tenhunen, "Interconnection alternatives for hierarchical monitoring communication in parallel socs," *Microprocessors and Microsystems*, vol. 34, no. 5, pp. 118–128, 2010.
- [103] J. Nurmi, J. Isoaho, A. Jantsch, and H. Tenhunen, *Interconnect-centric design for advanced SoC and NoC*. Springer, 2004.

- [104] R. Hegde and N. R. Shanbhag, "Toward achieving energy efficiency in presence of deep submicron noise," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 8, no. 4, pp. 379–391, 2000.
- [105] A. Ganguly, P. P. Pande, B. Belzer, and C. Grecu, "Design of low power and reliable networks on chip through joint crosstalk avoidance and multiple error correction coding," *Journal of Electronic Testing*, vol. 24, no. 1-3, pp. 67–81, 2008.
- [106] D. Bertozzi, L. Benini, and G. De Micheli, "Error control schemes for on-chip communication links: the energy-reliability tradeoff," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 24, no. 6, pp. 818–831, 2005.
- [107] D. Sylvester and K. Keutzer, "A global wiring paradigm for deep submicron design," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 19, no. 2, pp. 242–252, 2000.
- [108] S. Lin and D. J. Costello, *Error control coding*. Pearson Education India, 2004.
- [109] S. Pasricha and N. Dutt, *On-chip communication architectures: system on chip interconnect*. Morgan Kaufmann, 2010.
- [110] S. R. Sridhara and N. R. Shanbhag, "Coding for system-on-chip networks: a unified framework," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 13, no. 6, pp. 655–667, 2005.
- [111] D. Rossi, V. Van Dijk, R. P. Kleihorst, A. Nieuwland, and C. Metra, *Coding scheme for low energy consumption fault-tolerant bus*. IEEE, 2002.
- [112] K. N. Patel and I. L. Markov, "Error-correction and crosstalk avoidance in dsm busses," in *Proceedings of the 2003 international workshop on System-level interconnect prediction*, pp. 9–14, ACM, 2003.
- [113] S. Murali, T. Theocharides, N. Vijaykrishnan, M. J. Irwin, L. Benini, and G. De Micheli, "Analysis of error recovery schemes for networks on chips," *IEEE Design & Test of Computers*, vol. 22, no. 5, pp. 434–442, 2005.
- [114] A. Thomas and K. Pattabiraman, "Llfi: An intermediate code level fault injector for soft computing applications," in *Workshop on Silicon Errors in Logic System Effects (SELSE)*, 2013.
- [115] S. Mittal, "A survey of techniques for approximate computing," *ACM Computing Surveys (CSUR)*, vol. 48, no. 4, p. 62, 2016.
- [116] A. Rahimi, A. Ghofrani, K.-T. Cheng, L. Benini, and R. K. Gupta, *Approximate associative memristive memory for energy-efficient GPUs*. EDA Consortium, 2015.
- [117] H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, *Architecture support for disciplined approximate programming*, vol. 47. ACM, 2012.

- [118] A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman, *EnerJ: Approximate data types for safe and general low-power computation*, vol. 46. ACM, 2011.
- [119] M. De Kruijf, S. Nomura, and K. Sankaralingam, "Relax: An architectural framework for software recovery of hardware faults," *ACM SIGARCH Computer Architecture News*, vol. 38, no. 3, pp. 497–508, 2010.
- [120] L. Leem, H. Cho, J. Bau, Q. A. Jacobson, and S. Mitra, *ERSA: Error resilient system architecture for probabilistic applications*. IEEE, 2010.
- [121] X. Li and D. Yeung, "Exploiting soft computing for increased fault tolerance," 2006.
- [122] X. Li and D. Yeung, "Application-level correctness and its impact on fault tolerance," in *IEEE 13th International Symposium on High Performance Computer Architecture*, pp. 181–192, IEEE, 2007.
- [123] M. Rinard, H. Hoffmann, S. Misailovic, and S. Sidiroglou, *Patterns and statistical analysis for understanding reduced resource computing*, vol. 45. ACM, 2010.
- [124] V. K. Chippa, S. T. Chakradhar, K. Roy, and A. Raghunathan, "Analysis and characterization of inherent application resilience for approximate computing," in *Proceedings of the 50th Annual Design Automation Conference*, p. 113, ACM, 2013.
- [125] Y. Fang, H. Li, and X. Li, "A fault criticality evaluation framework of digital systems for error tolerant video applications," in *Asian Test Symposium*, pp. 329–334, IEEE, 2011.
- [126] A. Heinig, V. J. Mooney, F. Schmoll, P. Marwedel, K. Palem, and M. Engel, "Classification-based improvement of application robustness and quality of service in probabilistic computer systems," in *International Conference on Architecture of Computing Systems*, pp. 1–12, Springer, 2012.
- [127] M. De Kruijf and K. Sankaralingam, "Exploring the synergy of emerging workloads and silicon reliability trends," 2009.
- [128] S. Roy, T. Clemons, S. Faisal, K. Liu, N. Hardavellas, and S. Parthasarathy, "Elastic fidelity: Trading-off computational accuracy for energy reduction," in *Northwestern University Technical Report NWU-EECS-11-02*, 2011.
- [129] T. Y. Yeh, G. Reinman, S. J. Patel, and P. Faloutsos, "Fool me twice: Exploring and exploiting error tolerance in physics-based animation," *ACM Transactions on Graphics (TOG)*, vol. 29, no. 1, p. 5, 2009.

- [130] V. K. Chippa, D. Mohapatra, A. Raghunathan, K. Roy, and S. T. Chakradhar, "Scalable effort hardware design: exploiting algorithmic resilience for energy efficiency," in *Proceedings of the 47th Design Automation Conference*, pp. 555–560, ACM, 2010.
- [131] S. Liu, K. Pattabiraman, T. Moscibroda, and B. G. Zorn, "Flicker: Saving refresh-power in mobile devices through critical data partitioning," in *International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)*, Technical Report. Microsoft Research, 2011.
- [132] M. Shoushtari, A. BanaiyanMofrad, and N. Dutt, "Exploiting partially-forgetful memories for approximate computing," *IEEE Embedded Systems Letters*, vol. 7, no. 1, pp. 19–22, 2015.
- [133] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 32, no. 1, pp. 124–137, 2013.
- [134] J. Miao, *Modeling and synthesis of approximate digital circuits*. PhD thesis, University of Texas at Austin, 2014.
- [135] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, *IMPACT: imprecise adders for low-power approximate computing*. IEEE Press, 2011.
- [136] A. B. Kahng and S. Kang, *Accuracy-configurable adder for approximate arithmetic designs*. ACM, 2012.
- [137] M. Popovich, A. V. Mezhiba, and E. G. Friedman, *Power distribution networks with on-chip decoupling capacitors*. Springer Science and Business Media, 2007.
- [138] M. Saint-Laurent and M. Swaminathan, "Impact of power-supply noise on timing in high-frequency microprocessors," *IEEE Transactions on Advanced Packaging*, vol. 27, no. 1, pp. 135–144, 2004.
- [139] R. Jakushokas and E. G. Friedman, *Line width optimization for interdigitated power/ground networks*. ACM, 2010.
- [140] M. Popovich, E. G. Friedman, R. M. Secareanu, and O. L. Hartin, "Efficient distributed on-chip decoupling capacitors for nanoscale ics," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 16, no. 12, pp. 1717–1721, 2008.
- [141] T. Yu and M. D. Wong, *A novel and efficient method for power pad placement optimization*. IEEE, 2013.
- [142] R. Zhang, B. H. Meyer, W. Huang, K. Skadron, and M. R. Stan, "Some limits of power delivery in the multicore era," *Proceedings of WEED*, 2012.

- [143] Y. Zhong and M. D. Wong, "Fast placement optimization of power supply pads," in *Proceedings of the 2007 Asia and South Pacific Design Automation Conference*, pp. 763–767, 2007.
- [144] M. B. Healy and S. K. Lim, "Distributed tsv topology for 3-d power supply networks," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 20, no. 11, pp. 2066–2079, 2012.
- [145] A. Todri, S. Kundu, P. Girard, A. Bosio, L. Dilillo, and A. Virazel, "A study of tapered 3-d tsvs for power and thermal integrity," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 21, no. 2, pp. 306–319, 2013.
- [146] C. R. Lefurgy, A. J. Drake, M. S. Floyd, M. S. Allen-Ware, B. Brock, J. A. Tierno, and J. B. Carter, *Active management of timing guardband to save energy in POWER7*. ACM, 2011.
- [147] T. Sato, M. Hashimoto, and H. Onodera, "Successive pad assignment algorithm to optimize number and location of power supply pad using incremental matrix inversion," in *Proceedings of the 2005 Asia and South Pacific Design Automation Conference*, pp. 723–728, ACM, 2005.
- [148] K. Wang, B. H. Meyer, R. Zhang, K. Skadron, and M. Stan, "Walking pads: Fast power-supply pad-placement optimization," in *19th Asia and South Pacific Design Automation Conference (ASP-DAC)*, pp. 537–543, 2014.
- [149] K. Wang, B. H. Meyer, R. Zhang, M. Stan, and K. Skadron, "Walking pads: Managing c4 placement for transient voltage noise minimization," in *51st ACM/EDAC/IEEE Design Automation Conference (DAC)*, pp. 1–6, 2014.
- [150] A. V. Mezhiba and E. G. Friedman, *Electrical characteristics of multi-layer power distribution grids*, vol. 5. IEEE, 2003.
- [151] R. Jakushokas, M. Popovich, A. V. Mezhiba, S. Köse, and E. G. Friedman, *Inductance model of interdigitated power and ground distribution networks*, pp. 323–335. Springer, 2011.
- [152] R. Jakushokas and E. G. Friedman, "Multi-layer interdigitated power distribution networks," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 5, pp. 774–786, 2011.
- [153] S. Köse and E. G. Friedman, "Simultaneous co-design of distributed on-chip power supplies and decoupling capacitors," in *23rd IEEE International SOC Conference*, pp. 15–18, IEEE, 2010.
- [154] M. Popovich, E. G. Friedman, M. Sotman, A. Kolodny, and R. M. Secareanu, "Maximum effective distance of on-chip decoupling capacitors in power distribution grids," in *Proceedings of the 16th ACM Great Lakes symposium on VLSI*, pp. 173–179, ACM, 2006.
- [155] P. L'Ecuyer, *Non-uniform random variate generation*, pp. 991–995. Springer, 2011.

- [156] C. Pei, R. Booth, H. Ho, N. Kusaba, X. Li, M. Brodsky, P. Parries, H. Shang, R. Divakaruni, and S. Iyer, "A novel, low-cost deep trench decoupling capacitor for high-performance, low-power bulk cmos applications," in *9th International Conference on Solid-State and Integrated-Circuit Technology*, pp. 1146–1149, IEEE, 2008.
- [157] G. Katti, M. Stucchi, K. De Meyer, and W. Dehaene, "Electrical modeling and characterization of through silicon via for three-dimensional ics," *IEEE Transactions on Electron Devices*, vol. 57, no. 1, pp. 256–262, 2010.
- [158] Z. Xu, X. Gu, M. Scheuermann, K. Rose, B. C. Webb, J. U. Knickerbocker, and J.-Q. Lu, "Modeling of power delivery into 3d chips on silicon interposer," in *IEEE 62nd Electronic Components and Technology Conference*, pp. 683–689, IEEE, 2012.
- [159] X. Zhao, Y. Wan, M. Scheuermann, and S. K. Lim, "Transient modeling of tsv-wire electromigration and lifetime analysis of power distribution network for 3d ics," in *2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, pp. 363–370, IEEE, 2013.
- [160] H. Yu, J. Ho, and L. He, "Allocating power ground vias in 3d ics for simultaneous power and thermal integrity," *ACM Transactions on Design Automation of Electronic Systems (TODAES)*, vol. 14, no. 3, p. 41, 2009.
- [161] W. H. Kao, C.-Y. Lo, M. Basel, and R. Singh, "Parasitic extraction: current state of the art and future trends," *Proceedings of the IEEE*, vol. 89, no. 5, pp. 729–739, 2001.
- [162] S. R. Nassif and J. N. Kozhaya, "Fast power grid simulation," in *Proceedings of the 37th Annual Design Automation Conference*, pp. 156–161, ACM, 2000.
- [163] Z. Zeng, Z. Feng, P. Li, and V. Sarin, "Locality-driven parallel static analysis for power delivery networks," *ACM Transactions on Design Automation of Electronic Systems (TODAES)*, vol. 16, no. 3, p. 28, 2011.
- [164] J. Lloyd, "On the log-normal distribution of electromigration lifetimes," *Journal of Applied Physics*, vol. 50, no. 7, pp. 5062–5064, 1979.
- [165] J. R. Black, "Electromigration—a brief survey and some recent results," *IEEE Transactions on Electron Devices*, vol. 16, no. 4, pp. 338–347, 1969.
- [166] B.-K. Liew, N. W. Cheung, and C. Hu, "Projecting interconnect electromigration lifetime for arbitrary current waveforms," *IEEE Transactions on Electron Devices*, vol. 37, no. 5, pp. 1343–1351, 1990.
- [167] J. Tao, B.-K. Liew, J. F. Chen, N. W. Cheung, and C. Hu, "Electromigration under time-varying current stress," *Microelectronics Reliability*, vol. 38, no. 3, pp. 295–308, 1998.

- [168] J. Lloyd and J. Clement, "Electromigration in copper conductors," *Thin Solid Films*, vol. 262, no. 1, pp. 135–141, 1995.
- [169] K. Tu, "Recent advances on electromigration in very-large-scale-integration of interconnects," *Journal of Applied Physics*, vol. 94, no. 9, pp. 5451–5473, 2003.
- [170] W. Choi, E. Yeh, and K. Tu, "Mean-time-to-failure study of flip chip solder joints on cu/ni (v)/al thin-film under-bump-metallization," *Journal of Applied Physics*, vol. 94, no. 9, pp. 5665–5671, 2003.
- [171] Y. Yeh, C. Chou, Y. Hsu, C. Chen, and K. Tu, "Threshold current density of electromigration in eutectic snpb solder," *Applied Physics Letters*, vol. 86, no. 20, p. 203504, 2005.
- [172] K.-D. Lee and P. S. Ho, "Statistical study for electromigration reliability in dual-damascene cu interconnects," *IEEE Transactions on Device and Materials Reliability*, vol. 4, no. 2, pp. 237–245, 2004.
- [173] A. Todri and M. Marek-Sadowska, *Electromigration study of power-gated grids*. ACM, 2009.
- [174] S. Wright, R. Polastre, H. Gan, L. Buchwalter, R. Horton, P. Andry, E. Sprogis, C. Patel, C. Tsang, and J. Knickerbocker, *Characterization of micro-bump C4 interconnects for Si-carrier SOP applications*. IEEE, 2006.
- [175] W. N. Flayyih, K. Samsudin, S. J. Hashim, F. Z. Rokhani, and Y. I. Ismail, "Crosstalk-aware multiple error detection scheme based on two-dimensional parities for energy efficient network on chip," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 7, pp. 2034–2047, 2014.
- [176] D. H. Hoe, "The use of error correcting codes for nanoelectronic systems: Overview and future prospects," in *45th Southeastern Symposium on System Theory (SSST)*, pp. 51–54, IEEE, 2013.
- [177] R. Zhang, B. H. Meyer, K. Wang, M. R. Stan, and K. Skadron, "Tolerating the consequences of multiple em-induced c4 bump failures," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 6, pp. 2335–2344, 2016.
- [178] V. George, S. Jahagirdar, C. Tong, K. Smits, S. Damaraju, S. Siers, V. Naydenov, T. Khondker, S. Sarkar, and P. Singh, "Penryn: 45-nm next generation intel® core™ 2 processor," in *IEEE Asian Solid-State Circuits Conference (ASSCC'07)*, pp. 14–17, IEEE, 2007.
- [179] S. O. Memik, R. Mukherjee, M. Ni, and J. Long, "Optimizing thermal sensor allocation for microprocessors," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 27, no. 3, pp. 516–527, 2008.
- [180] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, and S. Sardashti, "The gem5 simulator," *ACM SIGARCH Computer Architecture News*, vol. 39, no. 2, pp. 1–7, 2011.

- [181] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi, "McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures," in *2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)*, pp. 469–480, 2009.
- [182] J. W. Haskins Jr and K. Skadron, "Accelerated warmup for sampled microarchitecture simulation," *ACM Transactions on Architecture and Code Optimization (TACO)*, vol. 2, no. 1, pp. 78–108, 2005.
- [183] R. Zhang, K. Wang, B. H. Meyer, M. R. Stan, and K. Skadron, "Architecture implications of pads as a scarce resource," in *ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)*, pp. 373–384, IEEE, 2014.
- [184] G. G. Faust, R. Zhang, K. Skadron, M. R. Stan, and B. H. Meyer, "Archfp: Rapid prototyping of pre-rtl floorplans," in *IEEE/IFIP 20th International Conference on VLSI and System-on-Chip (VLSI-SoC)*, pp. 183–188, IEEE, 2012.
- [185] A. K. Coskun, T. S. Rosing, and K. Whisnant, "Temperature aware task scheduling in mpsocs," in *Proceedings of the conference on Design, automation and test in Europe*, pp. 1659–1664, EDA Consortium, 2007.
- [186] J. Donald and M. Martonosi, "Techniques for multicore thermal management: Classification and new exploration," in *ACM SIGARCH Computer Architecture News*, vol. 34, pp. 78–88, IEEE Computer Society, 2006.
- [187] D. Sengupta and S. S. Sapatnekar, "Predicting circuit aging using ring oscillators," in *19th Asia and South Pacific Design Automation Conference (ASP-DAC)*, pp. 430–435, IEEE, 2014.
- [188] X. Wang, J. Keane, T. T.-H. Kim, P. Jain, Q. Tang, and C. H. Kim, "Silicon odometers: Compact in situ aging sensors for robust system design," *IEEE Micro*, vol. 34, no. 6, pp. 74–85, 2014.
- [189] Virtuoso, "cadence virtuoso tool," 2018.
- [190] H. Yu, *Process variation aware design and applications for fpgas*. The Chinese University of Hong Kong (People's Republic of China), 2012.
- [191] J. Long, S. O. Memik, G. Memik, and R. Mukherjee, "Thermal monitoring mechanisms for chip multiprocessors," *ACM Transactions on Architecture and Code Optimization (TACO)*, vol. 5, no. 2, p. 9, 2008.
- [192] A. N. Nowroz, R. Cochran, and S. Reda, *Thermal monitoring of real processors: Techniques for sensor allocation and full characterization*. ACM, 2010.
- [193] Y. Cao, *Predictive technology model for robust nanoelectronic design*. Springer Science and Business Media, 2011.

- [194] V. Raghunathan, M. B. Srivastava, and R. K. Gupta, "A survey of techniques for energy efficient on-chip communication," in *Proceedings of the 40th Annual Design Automation Conference*, pp. 900–905, ACM, 2003.
- [195] A. Ejlali, B. M. Al-Hashimi, P. Rosinger, S. G. Miremadi, and L. Benini, "Performability/energy tradeoff in error-control schemes for on-chip networks," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 18, no. 1, pp. 1–14, 2010.
- [196] Q. Yu and P. Ampadu, "Adaptive error control for nanometer scale network on-chip links," *IET Computers and Digital Techniques*, vol. 3, no. 6, pp. 643–659, 2009.
- [197] B. Fu and P. Ampadu, "Error control combining hamming and product codes for energy efficient nanoscale on-chip interconnects," *IET Computers and Digital Techniques*, vol. 4, no. 3, pp. 251–261, 2010.
- [198] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance evaluation and design trade-offs for network-on-chip interconnect architectures," *IEEE Transactions on Computers*, vol. 54, no. 8, pp. 1025–1040, 2005.
- [199] NanGate, "Nangate 45nm open cell library," 2016.
- [200] D. U. Becker, "Efficient microarchitecture for network-on-chip routers," 2012.
- [201] S. Bhat, "Energy models for network-on-chip components," *Master of Science, Department of Mathematics and Computer Science, Technische Universiteit Eindhoven, Eindhoven*, 2005.
- [202] JEDEC, "Failure mechanisms and models for semiconductor devices," *JEDEC Publication*, vol. 122, 2010.
- [203] X. Huang, T. Yu, V. Sukharev, and S. X.-D. Tan, "Physics-based electromigration assessment for power grid networks," in *Proceedings of the 51st Annual Design Automation Conference*, pp. 1–6, ACM, 2014.
- [204] D.-a. Li, Z. Guan, M. Marek-Sadowska, and S. R. Nassif, "Multi-via electromigration lifetime model," *lateral*, vol. 3, p. 1, 2012.
- [205] M. Fawaz, "Electromigration reliability analysis of power delivery networks in integrated circuits," 2013.
- [206] R. A. Johnson, I. Miller, and J. Freund, "Probability and statistics for engineers," *Miller and Freund's*, pp. 546–554, 2000.
- [207] C. Bienia and K. Li, *Benchmarking modern multiprocessors*. Princeton University New York, 2011.
- [208] K. J. Kuhn, "Reducing variation in advanced logic technologies: Approaches to process and design for manufacturability of nanoscale cmos," in *IEEE International Electron Devices Meeting (IEDM 2007)*, pp. 471–474, IEEE, 2007.