Hyper Threading
Concepts & Architecture
Hyperthreading – Concepts & Architecture
Agenda
    Technical Journey to Hyperthreading
    Hardware & Software Requirements
    Performance Issues
Hyperthreading
Timeline
    Cluster -> Symmetric Multi-Processing (SMP) -> Single Processor
    Now, parallel computing is available on a single processor
Parallel Computing - Goals
Parallel computing is when a program
 uses concurrency to either:
    Decrease the runtime needed to solve a
     problem.
    Increase the size of the problem that can
     be solved.
         Single-threaded Processor
      Parts of Processor:
       Front-end:
        fetching/decoding/reordering
       Execution core:
        actual execution
      Concurrency illusion:
       Multiple programs in memory
       Only one executes at a time
       Time-slicing via context switching
      [Figures: 4-issue CPU with bubbles; 7-unit CPU with pipeline bubbles]
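
A minimal sketch of time-slicing on a single-threaded processor, assuming a simple round-robin policy. The job names, work units and quantum are made up for illustration; a real OS scheduler is far more involved.

```python
from collections import deque

def round_robin(jobs, quantum):
    """Simulate time-slicing on one CPU.

    jobs maps a (hypothetical) job name to the time units it needs.
    Returns the list of (job, units_run) slices in execution order.
    """
    ready = deque(jobs.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))  # only this job executes during the slice
        if remaining > run:
            # Context switch: the unfinished job goes to the back of the queue.
            ready.append((name, remaining - run))
    return timeline

timeline = round_robin({"A": 3, "B": 5}, quantum=2)
print(timeline)  # [('A', 2), ('B', 2), ('A', 1), ('B', 2), ('B', 1)]
```

Both programs appear to make progress together, but every slice belongs to exactly one of them.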
Single-threaded SMP
    Two threads execute at once, so threads
     spend less time waiting
    Twice as much speed and twice as much
     waste
    What is SMP?
       “Symmetric Multi-Processors”
       Sometimes, tolerably, mislabeled as
        “Shared-Memory Processors”
       Processors all connected
        to a (large) memory
       UMA: Uniform Memory Access,
        which makes it easy to program
       Symmetric: all memory is
        equally close to all processors
       Cache coherence via
        “snoopy caches”
                         Super-threading
                     [Time-Slice Multithreading]
    Principle: the processor can execute
     more than one thread at a time
    Requires more hardware cleverness
       logic switches at each cycle
    Leads to less Waste
       Just a finer grain of interleaving
    BUT, each stage of the front end or the
     execution core only runs instructions
     from ONE thread!
    Does not help with poor instruction
     parallelism within one thread
    Simultaneous Multi Threading (SMT)
    Principle: the processor can execute more
     than one thread at a time, even within a
     single clock cycle!!
    Requires even more hardware cleverness
           logic switches within each cycle
    Finest level of interleaving
    From the OS perspective, there are two
     “logical” processors
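
The difference between time-slice multithreading and SMT can be sketched with a toy issue-slot model. Everything here is an assumption for illustration: `width` is the number of issue slots per cycle, `ilp` is how many instructions one thread can supply per cycle, and thread 0 always gets first pick in the SMT case. No real processor is modeled.

```python
def cycles_time_slice(work, ilp, width):
    """Time-slice multithreading: each cycle issues from ONE thread only."""
    remaining = list(work)
    cycles = 0
    turn = 0
    while any(remaining):
        if remaining[turn] == 0:
            # This thread is finished; give the cycle to the next one.
            turn = (turn + 1) % len(remaining)
            continue
        remaining[turn] -= min(ilp, width, remaining[turn])
        turn = (turn + 1) % len(remaining)
        cycles += 1
    return cycles

def cycles_smt(work, ilp, width):
    """SMT: slots left empty by one thread are filled by another in the SAME cycle."""
    remaining = list(work)
    cycles = 0
    while any(remaining):
        free = width
        for t in range(len(remaining)):  # fixed priority: thread 0 picks first
            issued = min(ilp, free, remaining[t])
            remaining[t] -= issued
            free -= issued
        cycles += 1
    return cycles

for ilp in (2, 4):
    ts = cycles_time_slice([8, 8], ilp, width=4)
    smt = cycles_smt([8, 8], ilp, width=4)
    print(f"ilp={ilp}: time-slice {ts} cycles, SMT {smt} cycles")
```

With poor instruction parallelism (`ilp=2`) the time-slice model needs 8 cycles and the SMT model 4. When one thread can already fill every slot (`ilp=4`), both need 4 cycles, so SMT gains nothing.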
Evolution of Hyper-Threading
    Two ways of faster computing
           Increase Clock Speed
           Better utilization of resources
    Clock speed cannot be increased beyond a certain limit
           Too much heat is generated
    Better utilization of resources is now the choice
           Memory access takes relatively more time
           During this interval, CPU resources can be used by other threads
           This requires – Out-Of-Order Execution, Register Re-naming,...
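
A back-of-the-envelope sketch of why filling memory-wait intervals helps. It assumes an idealized model in which every thread alternates a fixed burst of compute cycles with a fixed memory stall, and other threads can use the core only during that stall. The cycle counts are invented for illustration.

```python
def utilization(compute_cycles, stall_cycles, threads=1):
    """Fraction of cycles the execution core is busy in the idealized model.

    One thread keeps the core busy for compute_cycles out of every
    (compute_cycles + stall_cycles); extra threads fill the stall time
    until the core is saturated.
    """
    period = compute_cycles + stall_cycles
    return min(1.0, threads * compute_cycles / period)

for threads in (1, 2):
    busy = utilization(compute_cycles=20, stall_cycles=80, threads=threads)
    print(f"{threads} thread(s): core busy {busy:.0%} of the time")
```

In this model a second thread doubles utilization only while the core still has idle cycles; once it is saturated, more threads add nothing.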
Hyper Threading
With these points in mind, Intel came up
with its version of Simultaneous Multi
Threading (SMT) called Hyper Threading
(HT)
    Hardware Requirements
    Because the additional threads all run on the same CPU elements
     (FPU, ALU), the only additions needed are to the logic that
     tracks and schedules the threads.
    Although hyper-threading might seem like a pretty large departure
     from the kind of conventional, process-switching multithreading done
     on a single-threaded CPU, it actually doesn't add too much
     complexity to the hardware.
    Intel reports that adding hyper-threading to their Xeon processor
     added only 5% to its die area.
Intel Xeon – Case Study
    Capable of executing at most two threads in parallel on two logical
     processors.
    Must be able to maintain information for two distinct and independent
     thread contexts.
    Done by dividing up the processor's micro-architectural resources into
     three types:
                    replicated
                    partitioned
                    shared
Intel Xeon – Resources Division
Replicated
                 •   Register renaming logic
                 •   Instruction Pointer
                 •   ITLB
                 •   Return stack predictor
                 •   Various other architectural registers
Partitioned
                 •   Re-order buffers (ROBs)
                 •   Load/Store buffers
                 •   Various queues, like the scheduling queues, uop queue, etc.
Shared
                 •   Caches: trace cache, L1, L2, L3
                 •   Micro-architectural registers
                 •   Execution Units
    Replicated Resources
      Some resources have to be replicated like
            Instruction Pointer
                    1 Instruction Pointer for each Logical Processor.
                     Xeon: 2 Instruction Pointer
            Register Allocation Table
                    Maps the architectural registers (8 integer and 8 floating-point) onto
                     128 General Purpose Registers and 128 Floating Point Registers
                    Replicated Resource managing a Shared Resource
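
The idea of a replicated resource managing a shared one can be sketched as two allocation tables drawing on one register pool. This is a toy model: the class names are hypothetical, registers are never freed (real hardware frees them as instructions retire), and only the 128-entry pool size comes from the slide.

```python
class PhysicalRegisterFile:
    """Shared pool of physical registers."""

    def __init__(self, size=128):
        self.free = list(range(size))

    def allocate(self):
        if not self.free:
            raise RuntimeError("no free physical register: the thread must stall")
        return self.free.pop(0)


class RegisterAllocationTable:
    """Replicated per logical processor: architectural name -> physical register."""

    def __init__(self, pool):
        self.pool = pool      # the shared resource
        self.mapping = {}     # the private, replicated state

    def rename(self, arch_reg):
        """Map arch_reg to a fresh physical register for a new write."""
        phys = self.pool.allocate()
        self.mapping[arch_reg] = phys
        return phys


pool = PhysicalRegisterFile()
rat0 = RegisterAllocationTable(pool)   # logical processor 0
rat1 = RegisterAllocationTable(pool)   # logical processor 1

# Both threads write "eax", yet they land in different physical registers.
print(rat0.rename("eax"), rat1.rename("eax"))  # 0 1
```

Each logical processor sees its own `eax`, while the physical registers behind it come from one pool.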
Partitioned Resources
    Queues are partitioned resources
        Statically partitioned queue
        Dynamically partitioned queue
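
A rough sketch of the two partitioning styles, under assumptions the slide does not spell out. A statically partitioned queue is modeled as fixed halves. A dynamically partitioned queue is modeled as entries handed out on demand, with a per-thread cap so one thread cannot take everything. The queue size and the cap of 6 are invented for illustration.

```python
class PartitionedQueue:
    """Toy queue shared by two logical processors (threads 0 and 1)."""

    def __init__(self, size, per_thread_cap):
        self.size = size
        self.cap = per_thread_cap
        self.count = {0: 0, 1: 0}

    def try_insert(self, thread):
        """Return True if the thread obtained an entry, False if it must wait."""
        total = self.count[0] + self.count[1]
        if total >= self.size or self.count[thread] >= self.cap:
            return False
        self.count[thread] += 1
        return True


def fill(queue, thread, attempts):
    """Try to insert `attempts` entries; return how many succeeded."""
    return sum(queue.try_insert(thread) for _ in range(attempts))


static_q = PartitionedQueue(size=8, per_thread_cap=4)    # fixed halves
dynamic_q = PartitionedQueue(size=8, per_thread_cap=6)   # on demand, capped

print(fill(static_q, 0, 8), fill(static_q, 1, 8))    # 4 4
print(fill(dynamic_q, 0, 8), fill(dynamic_q, 1, 8))  # 6 2
```

In the dynamic case a busy thread can use more than half of the queue, but the cap still leaves room for the other thread.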
Shared Resources
    Heart of Hyperthreading:
     the more resources are shared, the more efficiently hyper-threading
     squeezes the maximum amount of computing power out of the
     minimum amount of die space
    Such resources are: registers, load/store units
    Shared resources are SMT-unaware: they do not track which thread
     an instruction belongs to
Hyper-Threading Architecture Overview
    [Figure: Hyper-Threading architecture overview]
Confusing Notions
Is a Hyper-Threaded processor the same as a
Dual-Core processor?
Answer: NO
    Hyper-Threaded = 2 logical processors on 1 physical core
    Dual-Core = 2 physical cores on a single chip
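
One way to see the logical/physical distinction on an x86 Linux machine is to compare the number of `processor` entries in `/proc/cpuinfo` with the number of distinct (`physical id`, `core id`) pairs. The sketch below assumes that file format. `SAMPLE` is a hand-written excerpt for a hypothetical single-core hyper-threaded CPU, not real output.

```python
import os

def count_processors(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            # Two logical processors on one core share this pair.
            cores.add((physical_id, value))
    return logical, len(cores)

SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 0
"""

print(count_processors(SAMPLE))  # (2, 1)
print(os.cpu_count())            # logical processors seen by the OS on this machine
```

On a Linux system the same function can be fed the text of `/proc/cpuinfo`; other operating systems expose this information differently.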
HT – System Requirements
    HT-enabled processor
           Pentium 4 3.06 GHz, Xeon
    HT-enabled chipset
           Intel 945G Express
    HT-enabled system BIOS
    HT-enabled operating system
           Windows 2000, XP, Linux 2.4.12
HT – Requirements from User
    Enable HT in BIOS
    To utilize HT
           Use multi-threaded applications
                         OR
           Run multiple applications at same time
Performance Issues - 1
    2 Logical Processors != Double Power
    Less CPU-intensive programs may not
     show much, if any, gain
    Reported gains are 20-40%
Performance Issues - 2
Death-Traps
    Main cause: Shared Resource
    Xeon philosophy resembles a cooperative multitasking OS: threads must play nicely with shared resources
    Cases:
           Floating Point Unit (FPU):
                    One floating-point intensive thread takes up the FPU; Another similar thread
                     contending for same FPU gets stalled
           Cache
                    No cache-coherency problem as in SMP
                    But, cache conflict between two logical processors
                    Worst-Case: Two threads accessing different parts of memory and sharing no data =>
                     Lot of thrashing
    Benchmark results: a non-SMT processor may perform better
    With the wrong mix of code, hyper-threading decreases performance
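
The cache worst case can be illustrated with a toy direct-mapped cache shared by two threads. This is a deliberate simplification: real caches are larger and usually set-associative, and the block addresses below are invented so that the two threads touch different memory that maps to the same cache lines.

```python
def miss_count(accesses, num_lines):
    """Count misses in a tiny direct-mapped cache.

    accesses is a sequence of memory block numbers; block b maps to
    cache line b % num_lines.
    """
    lines = [None] * num_lines
    misses = 0
    for block in accesses:
        index = block % num_lines
        if lines[index] != block:
            misses += 1
            lines[index] = block  # evict whatever was there
    return misses

def interleave(a, b):
    """Alternate accesses from two threads, as if they ran simultaneously."""
    return [block for pair in zip(a, b) for block in pair]

thread_a = [0, 1, 2, 3] * 4     # loops over blocks 0..3
thread_b = [8, 9, 10, 11] * 4   # different memory, same cache lines

alone = miss_count(thread_a, 4) + miss_count(thread_b, 4)
shared = miss_count(interleave(thread_a, thread_b), 4)
print(alone, shared)  # 8 32
```

Run one after the other, each thread misses only on its first pass. Interleaved on the shared cache, every access evicts the other thread's block, so all 32 accesses miss.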
HT Hardware Hands-On
    Need to enable Hyperthreading through the BIOS
    Simple Test:
       Run both tasks together, once with HT and once without:
          Compress a 1 GB file
          Play Windows Media Player with a visualization plug-in
       Compare the time taken in the 2 cases
    Good Benchmark: Embarrassingly Parallel (EP) from the NASA NAS Parallel Benchmarks
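
For a quick software-only experiment in the same spirit, the sketch below times an embarrassingly parallel workload with one and with two worker processes. It is a stand-in, not the NASA EP kernel: it estimates pi by Monte Carlo sampling, and the chunk count and sample sizes are arbitrary. Processes are used instead of threads because standard CPython does not run CPU-bound threads in parallel.

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor

def count_hits(seed, samples):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # private generator: chunks share no data
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(workers, chunks=8, samples_per_chunk=50_000):
    """Run independent chunks on `workers` processes; return (estimate, seconds)."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_hits, range(chunks), [samples_per_chunk] * chunks))
    elapsed = time.perf_counter() - start
    return 4.0 * hits / (chunks * samples_per_chunk), elapsed

if __name__ == "__main__":
    for workers in (1, 2):
        pi, seconds = estimate_pi(workers)
        print(f"{workers} worker(s): pi ~ {pi:.4f} in {seconds:.2f} s")
```

Running it once with HT enabled and once with HT disabled, with two workers on a single-core HT processor, gives a rough feel for the gain. Timings vary from run to run, so several repetitions are advisable.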
4 Processors View in Task Manager
Key Points
   Hyper-Threading Technology gives better
    utilization of processor resources
   Hyper-Threading Technology gives more
    computing power for multithreaded
    applications
   Thread-level parallelism on a single
    processor
References
    "Hyper-Threading Technology." Intel.
    Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A.
     Koufaty, J. Alan Miller, Michael Upton. "Hyper-Threading
     Technology Architecture and Microarchitecture." Intel
    Susan Eggers, Hank Levy, Steve Gribble. Simultaneous
     Multithreading Project. University of Washington
    Susan Eggers, Joel Emer, Henry Levy, Jack Lo, Rebecca Stamm,
     and Dean Tullsen. "Simultaneous Multithreading: A Platform for
     Next-generation Processors." IEEE Micro, September/October
     1997, pages 12-18.
    Jack Lo, Susan Eggers, Joel Emer, Henry Levy, Rebecca Stamm,
     and Dean Tullsen. "Converting Thread-Level Parallelism Into
     Instruction-Level Parallelism via Simultaneous Multithreading."
     ACM Transactions on Computer Systems, August 1997, pages
     322-354.
                         Thank You
                         E-mail: zainvi.sf@gmail.com