Fast Software Thermal Sensing and Control for
Efficient Dynamic Thermal Management


Jun Yang

PhD Students

Wei Wu, Lingling Jin

External Collaborators

Sheldon X.-D. Tan (UCR), Jie Chen (UCR)


The evolution of microprocessors has been hindered by their increasing power consumption and the rate at which heat is generated on-die. High temperature impairs processor reliability and shortens processor lifetime. While heat can be removed by the cooling package, designing such a package for the worst-case temperature is not cost-effective, because worst cases are rare while the cost of the package increases super-linearly. A more reasonable solution is therefore to use a less expensive package together with dynamic thermal management (DTM), which throttles processor performance when the cooling is inadequate.

To control the chip temperature effectively, it is imperative to monitor temperature variations across the die promptly and accurately. Most current techniques rely on on-chip thermal sensors, typically one or two, to report the temperature of the processor. Unfortunately, chip temperature varies significantly in both space and time, and because hot spots migrate with workloads, a small number of fixed sensors cannot track them reliably. We present an alternative approach to tracking chip temperature: an OS-resident software module that generates live power and thermal spectral distributions of the processor. We developed such a software thermal sensor (STS), with low overhead, on a Linux system running on a Pentium 4 Northwood core. The software thermal sensor offers detailed power and temperature breakdowns for each functional unit at runtime.
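As a rough illustration of how a software sensor could turn runtime activity into per-unit temperatures, the following Python sketch estimates power for each functional unit from an activity count and advances a lumped first-order RC thermal model by one sampling interval. This is not the STS implementation: the use of per-unit event counts, the linear energy-per-event power model, the RC model, and every name and parameter value here (UnitModel, estimate_power, step_temperature, sample) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class UnitModel:
    """Hypothetical per-functional-unit parameters (illustrative values only)."""
    energy_per_event_nj: float  # assumed energy per counted event, in nanojoules
    idle_power_w: float         # assumed static/idle power, in watts
    r_k_per_w: float            # assumed thermal resistance to ambient, K/W
    c_j_per_k: float            # assumed thermal capacitance, J/K


def estimate_power(unit, event_count, interval_s):
    """Power over one interval: idle power plus energy of counted events / time."""
    dynamic_w = unit.energy_per_event_nj * 1e-9 * event_count / interval_s
    return unit.idle_power_w + dynamic_w


def step_temperature(unit, temp_c, power_w, ambient_c, interval_s):
    """One forward-Euler step of C * dT/dt = P - (T - T_ambient) / R.

    The step is only stable when interval_s is well below R * C.
    """
    d_temp = (power_w - (temp_c - ambient_c) / unit.r_k_per_w) / unit.c_j_per_k
    return temp_c + d_temp * interval_s


def sample(units, temps, counts, ambient_c, interval_s):
    """Return updated per-unit temperatures after one sampling interval."""
    updated = {}
    for name, unit in units.items():
        power_w = estimate_power(unit, counts.get(name, 0), interval_s)
        updated[name] = step_temperature(
            unit, temps[name], power_w, ambient_c, interval_s
        )
    return updated
```

Under these assumptions, a unit held at constant power P settles at the ambient temperature plus P times its thermal resistance. A real sensor would also need to model heat flow between neighboring units, which this sketch omits.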

We also developed a thermal-aware job scheduling mechanism that reduces the performance loss caused by thermal pressure. Our methods leverage the natural differences in thermal behavior among workloads and schedule them so that the chip temperature stays within the cooling limit, minimizing the amount of throttling. Our Linux kernel implementation of the entire framework shows noticeable performance improvements over a traditional thermal-oblivious job scheduling method while still meeting that method's requirements for real-time and interactive jobs.
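The scheduling idea can be sketched with a simple selection heuristic, shown below in Python. It is an illustrative policy and not the kernel implementation described above: it assumes each job carries a predicted temperature change per time slice (the hypothetical field heat_c_per_slice), and it ignores the priority handling that real-time and interactive jobs would require. While thermal headroom remains, it runs the hottest job that still fits under the limit, which saves cooler jobs for moments when the chip is close to the limit.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    # Hypothetical prediction: temperature change (deg C) caused by running
    # this job for one time slice. Negative values mean the chip cools down.
    heat_c_per_slice: float


def pick_next(ready, temp_c, limit_c):
    """Choose the next job from a non-empty ready list.

    Prefer the hottest job whose predicted temperature stays within limit_c.
    If no job fits, fall back to the coolest job to limit overshoot.
    """
    fits = [job for job in ready if temp_c + job.heat_c_per_slice <= limit_c]
    if fits:
        return max(fits, key=lambda job: job.heat_c_per_slice)
    return min(ready, key=lambda job: job.heat_c_per_slice)
```

For example, with one hot and one cool job ready, this policy would pick the hot job while the chip is well below the limit and switch to the cool job as the limit approaches, instead of letting the hardware throttle the hot job.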