2010 43rd Annual IEEE/ACM
International Symposium on
Microarchitecture
(MICRO 2010)
Atlanta, Georgia, USA
4 – 8 December 2010
IEEE Catalog Number: CFP10071-PRT
ISBN: 978-1-4244-9071-4
Table of Contents
Message from the General Co-Chairs.....................................................................................................x
Message from the Program Chair.............................................................................................................xi
Organizing Committee....................................................................................................................................xii
Program Committee........................................................................................................................................xiii
MICRO 2010 Reviewers.................................................................................................................................xiv
Session 1: Transactional Systems
Scalable Speculative Parallelization on Commodity Clusters ........................................................................................3
Hanjun Kim, Arun Raman, Feng Liu, Jae W. Lee, and David I. August
Hardware Support for Relaxed Concurrency Control in Transactional Memory .........................................................15
Utku Aydonat and Tarek S. Abdelrahman
A Dynamically Adaptable Hardware Transactional Memory ......................................................................................27
Marc Lupon, Grigorios Magklis, and Antonio González
ASF: AMD64 Extension for Lock-Free Data Structures and Transactional Memory .................................................39
Jaewoong Chung, Luke Yen, Stephan Diestelhorst, Martin Pohlack, Michael Hohmuth,
David Christie, and Dan Grossman
Session 2A: Scheduling
Memory Latency Reduction via Thread Throttling ......................................................................................................53
Hsiang-Yun Cheng, Chung-Hsiang Lin, Jian Li, and Chia-Lin Yang
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access
Behavior ........................................................................................................................................................................65
Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter
Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production
Processors via Software-Guided Thread Scheduling ...................................................................................................77
Vijay Janapa Reddi, Svilen Kanev, Wonyoung Kim, Simone Campanoni,
Michael D. Smith, Gu-Yeon Wei, and David Brooks
Task Superscalar: An Out-of-Order Task Pipeline .......................................................................................................89
Yoav Etsion, Felipe Cabarcas, Alejandro Rico, Alex Ramirez, Rosa M. Badia,
Eduard Ayguade, Jesus Labarta, and Mateo Valero
Session 2B: Reliability/Scheduling
Combating Aging with the Colt Duty Cycle Equalizer ..............................................................................................103
Erika Gunadi, Abhisek A. Sinkar, Nam Sung Kim, and Mikko H. Lipasti
SAFER: Stuck-At-Fault Error Recovery for Memories .............................................................................................115
Nak Hee Seong, Dong Hyuk Woo, Vijayalakshmi Srinivasan, Jude A. Rivers,
and Hsien-Hsin S. Lee
AVF Stressmark: Towards an Automated Methodology for Bounding the Worst-Case
Vulnerability to Soft Errors ........................................................................................................................................125
Arun Arvind Nair, Lizy Kurian John, and Lieven Eeckhout
Flexible and Efficient Instruction-Grained Run-Time Monitoring Using On-Chip
Reconfigurable Fabric ................................................................................................................................................137
Daniel Y. Deng, Daniel Lo, Greg Malysa, Skyler Schneider, and G. Edward Suh
Session 3A: Caching
Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal
Locality Aware (TLA) Cache Management Policies .................................................................................................151
Aamer Jaleel, Eric Borch, Malini Bhandaru, Simon C. Steely Jr., and Joel Emer
STEM: Spatiotemporal Management of Capacity for Intra-core Last Level Caches .................................................163
Dongyuan Zhan, Hong Jiang, and Sharad C. Seth
Sampling Dead Block Prediction for Last-Level Caches ...........................................................................................175
Samira Manabat Khan, Yingying Tian, and Daniel A. Jiménez
The ZCache: Decoupling Ways and Associativity .....................................................................................................187
Daniel Sanchez and Christos Kozyrakis
Session 3B: Data Parallelism
Efficient Selection of Vector Instructions Using Dynamic Programming.................................................................201
Rajkishore Barik, Jisheng Zhao, and Vivek Sarkar
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications .................................................................213
Jaekyu Lee, Nagesh B. Lakshminarayana, Hyesoon Kim, and Richard Vuduc
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic,
FPGAs, and GPGPUs? ...............................................................................................................................................225
Eric S. Chung, Peter A. Milder, James C. Hoe, and Ken Mai
Improving SIMT Efficiency of Global Rendering Algorithms with Architectural
Support for Dynamic Micro-Kernels ..........................................................................................................................237
Michael Steffen and Joseph Zambreno
Session 4A: Concurrency
InstantCheck: Checking the Determinism of Parallel Programs Using On-the-Fly
Incremental Hashing ...................................................................................................................................................251
Adrian Nistor, Darko Marinov, and Josep Torrellas
Tolerating Concurrency Bugs Using Transactions as Lifeguards ..............................................................................263
Jie Yu and Satish Narayanasamy
Architectural Support for Fair Reader-Writer Locking ..............................................................................................275
Enrique Vallejo, Ramón Beivide, Adrián Cristal, Tim Harris, Fernando Vallejo,
Osman Unsal, and Mateo Valero
AtomTracker: A Comprehensive Approach to Atomic Region Inference and Violation
Detection .....................................................................................................................................................................287
Abdullah Muzahid, Norimasa Otsuki, and Josep Torrellas
Session 4B: Microarchitecture I
Register Cache System Not for Latency Reduction Purpose .....................................................................................301
Ryota Shioya, Kazuo Horio, Masahiro Goshima, and Shuichi Sakai
Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors ...........................................313
Shekhar Srikantaiah and Mahmut Kandemir
Erasing Core Boundaries for Robust and Configurable Performance ........................................................................325
Shantanu Gupta, Shuguang Feng, Amin Ansari, and Scott Mahlke
Minimal Multi-threading: Finding and Removing Redundant Instructions
in Multi-threaded Processors ......................................................................................................................................337
Guoping Long, Diana Franklin, Susmit Biswas, Pablo Ortiz, Jason Oberg,
Dongrui Fan, and Frederic T. Chong
Session 5A: Memories
Parichute: Generalized Turbocode-Based Error Correction for Near-Threshold Caches ..........................................351
Timothy N. Miller, Renji Thomas, James Dinan, Bruce Adcock, and Radu Teodorescu
Understanding the Energy Consumption of Dynamic Random Access Memories ....................................................363
Thomas Vogelsang
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory ...............................................375
Jeffrey Stuecheli, Dimitris Kaseridis, Hillery C. Hunter, and Lizy K. John
Moneta: A High-Performance Storage Array Architecture for Next-Generation,
Non-volatile Memories ...............................................................................................................................................385
Adrian M. Caulfield, Arup De, Joel Coburn, Todor I. Mollov, Rajesh K. Gupta,
and Steven Swanson
Session 5B: NoCs
Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks ............................................399
Minseon Ahn and Eun Jung Kim
LOFT: A High Performance Network-on-Chip Providing Quality-of-Service Support ............................................409
Jin Ouyang and Yuan Xie
Throughput-Effective On-Chip Networks for Manycore Accelerators ......................................................................421
Ali Bakhoda, John Kim, and Tor M. Aamodt
Adaptive Flow Control for Robust Performance and Energy ....................................................................................433
Syed Ali Raza Jafri, Yu-Ju Hong, Mithuna Thottethodi, and T.N. Vijaykumar
Session 6A: Coherence
ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment ..............................................447
Xuehai Qian, Wonsun Ahn, and Josep Torrellas
Virtual Snooping: Filtering Snoops in Virtualized Multi-cores .................................................................................459
Daehoon Kim, Hwanju Kim, and Jaehyuk Huh
Fractal Coherence: Scalably Verifiable Cache Coherence .........................................................................................471
Meng Zhang, Alvin R. Lebeck, and Daniel J. Sorin
Session 6B: Microarchitecture II
A Predictive Model for Dynamic Microarchitectural Adaptivity Control .................................................................485
Christophe Dubach, Timothy M. Jones, Edwin V. Bonilla, and Michael F.P. O’Boyle
ReMAP: A Reconfigurable Heterogeneous Multicore Architecture ..........................................................................497
Matthew A. Watkins and David H. Albonesi
Probabilistic Distance-Based Arbitration: Providing Equality of Service
for Many-Core CMPs .................................................................................................................................................509
Michael M. Lee, John Kim, Dennis Abts, Michael Marty, and Jae W. Lee
Session 7: Tools
Adaptive and Speculative Slack Simulations of CMPs on CMPs ..............................................................................523
Jianwei Chen, Lakshmi Kumar Dabbiru, Daniel Wong, Murali Annavaram,
and Michel Dubois
SD3: A Scalable Approach to Dynamic Data-Dependence Profiling ........................................................................535
Minjang Kim, Hyesoon Kim, and Chi-Keung Luk
Automatic Parallelization in a Binary Rewriter ..........................................................................................................547
Aparna Kotha, Kapil Anand, Matthew Smithson, Greeshma Yellareddy, and Rajeev Barua
Author Index .......................................................................................................................................................559