Citation
Pour, Somayeh Rahimi (2018) Reliability modeling of dynamic thermal management in multicore processor. Doctoral thesis, Universiti Putra Malaysia.
Abstract
With the continuous downscaling of semiconductor technology, growing power
density and thermal issues in multi-core processors have become critical challenges.
The reliability problems associated with increased power dissipation also affect the
reliability of thermal management.
High temperatures and large thermal variations on the die create severe challenges for
system reliability, performance, leakage power, and cooling costs. Dynamic thermal
management (DTM) methods regulate the operating temperature based on the
temperature profile provided by thermal sensors, which in multi-core systems is
transmitted over the network-on-chip (NoC). DTM efficiency is therefore highly
dependent on the accuracy of the thermal data.
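As a rough illustration of why DTM depends on accurate thermal data, the following minimal sketch shows one step of a threshold-based controller. It is not the thesis's DTM policy: the function name `dtm_step`, the discrete frequency levels, the 85 °C threshold, and the 5 °C hysteresis are all assumptions chosen for the example.

```python
def dtm_step(sensor_temps_c, freq_level, threshold_c=85.0,
             hysteresis_c=5.0, max_level=3):
    """Return the next frequency level for one core (hypothetical policy).

    sensor_temps_c: temperature readings (deg C) received for this core.
    freq_level:     current frequency level, 0 (slowest) .. max_level.
    """
    hottest = max(sensor_temps_c)
    # Throttle one level when the hottest reading reaches the threshold.
    if hottest >= threshold_c and freq_level > 0:
        return freq_level - 1
    # Speed back up only once the core has cooled below the hysteresis band.
    if hottest <= threshold_c - hysteresis_c and freq_level < max_level:
        return freq_level + 1
    # Otherwise hold the current level.
    return freq_level


# A correct reading of 70 deg C keeps the core at full speed, while the
# same reading corrupted to 134 deg C triggers an unnecessary throttle.
clean_next = dtm_step([68.0, 70.0], freq_level=3)
noisy_next = dtm_step([68.0, 134.0], freq_level=3)
```

In this sketch a single corrupted reading is enough to change the controller's decision, which is the sense in which the controller's efficiency is tied to the accuracy of the data it receives.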
Temperature profile inaccuracies are caused by various factors, including sensor
placement, sensor device imprecision, and interconnect deep sub-micron (DSM)
noise. While inaccuracies due to sensor placement and sensor device imprecision
have been widely addressed, few studies have examined the impact of
interconnect DSM noise on DTM efficiency. Hence, this thesis develops a
comprehensive simulation model to investigate the impact of interconnect DSM noise on
thermal data accuracy and DTM efficiency. The simulation results demonstrate that
DSM noise severely affects the most significant bits (MSBs) of the thermal data,
which leads to significant degradation of DTM performance.
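To see why errors in the upper bits of a thermal reading matter most, the sketch below flips each bit of a sample reading in turn and records the resulting error. The 8-bit unsigned encoding with 1 LSB = 1 °C, the sample value, and the helper names are assumptions for illustration; the abstract does not specify the thesis's data format or noise model.

```python
def flip_bit(value, bit):
    """Return value with the given bit position inverted."""
    return value ^ (1 << bit)


def error_per_bit(reading, width=8):
    """Absolute error caused by a single-bit flip at each position."""
    return [abs(flip_bit(reading, b) - reading) for b in range(width)]


# Assumed encoding: 8-bit unsigned code, 1 LSB = 1 deg C.
reading = 72
errors = error_per_bit(reading)
# errors[0] is the LSB flip (1 deg C); errors[7] is the MSB flip (128 deg C).
```

Under this encoding the error doubles with each bit position, so a flip in the top bit shifts the reported temperature by 128 °C while a flip in the bottom bit shifts it by only 1 °C.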
To mitigate the impact of DSM noise on DTM efficiency, an NoC fault-tolerance scheme
is proposed that exploits the inherent characteristics of how DSM noise affects the
thermal data. Compared with the standard coding scheme, it achieves lower area and
power cost while increasing DTM efficiency by 38%.
The second chip reliability concern involves the power delivery network (PDN). The PDN
suffers from long-term reliability threats such as electromigration (EM). Because the
number of Controlled Collapse Chip Connection (C4) pads is limited, losing pads to
electromigration makes delivering a stable supply voltage more critical. The C4 bump
failure mechanism depends on current density, on-chip voltage noise, and temperature.
In this thesis, the dependency of the C4 bump failure mechanism on each individual
bump's temperature is explored, which leads to a more accurate mean-time-to-failure
(MTTF) estimate for the whole system. The simulation results demonstrate that assuming
a uniform temperature leads to underestimating the system MTTF by up to 16 times,
owing to the exponential dependence of C4 bump failure on temperature.
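The effect of per-bump temperature on system MTTF can be sketched with Black's equation for electromigration, MTTF = A · J^(-n) · exp(Ea / (k·T)). This is an illustrative model and not the thesis's simulator: the constants (A = 1, n = 2, Ea = 0.8 eV), the bump temperatures, the equal current density, and the assumption that the system fails at the first bump failure (a series system with exponential lifetimes) are all hypothetical.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def bump_mttf(temp_c, current_density=1.0, a=1.0, n=2.0, ea_ev=0.8):
    """Black's equation for one bump; constants are illustrative only."""
    temp_k = temp_c + 273.15
    return a * current_density ** (-n) * math.exp(ea_ev / (K_B * temp_k))


def system_mttf(temps_c):
    """Series-system MTTF: failure rates of the individual bumps add up."""
    return 1.0 / sum(1.0 / bump_mttf(t) for t in temps_c)


# Hypothetical per-bump temperatures in deg C.
bump_temps = [60.0, 70.0, 80.0, 100.0]

# Estimate using each bump's own temperature.
per_bump = system_mttf(bump_temps)
# Estimate assuming every bump sits at the hottest temperature.
uniform = system_mttf([max(bump_temps)] * len(bump_temps))

# ratio > 1 means the uniform-temperature estimate is the pessimistic one.
ratio = per_bump / uniform
```

Because the exponential term makes cooler bumps last far longer than hot ones, treating every bump as if it were at the hottest temperature yields a lower system MTTF than the per-bump estimate. The size of the gap depends entirely on the assumed constants and temperature spread.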
Additional Metadata
Item Type: Thesis (Doctoral)
Subject: Electronic apparatus and appliances - Temperature control
Subject: Heat - Transmission
Subject: Microprocessors
Call Number: FK 2020 114
Chairman Supervisor: Fakhrul Zaman Rokhani, PhD
Divisions: Faculty of Engineering
Depositing User: Ms. Rohana Alias
Date Deposited: 25 Jul 2023 02:02
Last Modified: 25 Jul 2023 02:02
URI: http://psasir.upm.edu.my/id/eprint/104250