Hyper Threading Seminar Report
                                                      INTRODUCTION
Hyper-Threading (HT) Technology is a ground-breaking technology from Intel that allows processors to work more efficiently. It enables the processor to execute two series, or threads, of instructions at the same time, thereby improving performance and system responsiveness while delivering performance headroom for the future.

Intel Hyper-Threading Technology improves the utilization of onboard resources so that a second thread can be processed in the same processor. Hyper-Threading Technology provides two logical processors in a single processor package.
                   MULTI-THREADED APPLICATIONS
   Virtually all contemporary operating systems (including Microsoft
  Windows* and Linux*) divide their workload up into processes and
  threads that can be independently scheduled and dispatched. The
  same division of workload can be found in many high-performance
    applications such as database engines, scientific computation
 programs, engineering-workstation tools, and multi-media programs.
Consider, as an example of such a workload, a multithreaded chat-room benchmark. By default, each chat room has 20 users, so a total of 10 chat rooms will
have 20x10 = 200 users. For each user in the chat room, the client will
  make a connection to the server. So since we have 200 users, we will
 have 200 connections to the server. Now, for each user (or connection)
   in the chat room, a "send" thread and a "receive" thread are created.
 Thus, a 10-chat-room scenario will create 10x20x2 = 400 client threads
   and 400 server threads, for a total of 800 threads. But there's more.
Each client "send" thread will send the specified number of messages to
  the server. For 10 chat rooms and 100 messages, the client will send
10x20x100 = 20,000 messages. The server "receive" thread will receive
the corresponding number of messages. The chat room server will echo
  each of the messages back to the other users in the chat room. Thus,
for 10 chat rooms and 100 messages, the server "send" thread will send
   10x20x100x19 or 380,000 messages. The client "receive" thread will
              receive the corresponding number of messages.
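The thread and message arithmetic above can be sketched in a few lines. This is purely illustrative bookkeeping with the text's default parameters; the function name is ours, not taken from any real benchmark code.

```python
# Reproduce the chat-room benchmark's counts: connections, threads, and
# messages for a given number of rooms, users per room, and messages per user.

def chat_room_totals(rooms=10, users_per_room=20, messages_per_user=100):
    users = rooms * users_per_room             # 10 x 20 = 200 connections
    client_threads = users * 2                 # one "send" + one "receive" per user
    server_threads = users * 2                 # mirrored on the server side
    client_sent = users * messages_per_user    # 10 x 20 x 100 = 20,000
    # the server echoes each message to the other 19 users in the room
    server_sent = client_sent * (users_per_room - 1)
    return {
        "connections": users,
        "total_threads": client_threads + server_threads,
        "client_messages": client_sent,
        "server_messages": server_sent,
    }

totals = chat_room_totals()
print(totals)   # 200 connections, 800 threads, 20,000 / 380,000 messages
```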
 Effects of Hyper-Threading on Linux multithreaded file server workload
The effect of Hyper-Threading on the file server was measured with dbench and its companion test, tbench. dbench is similar to the well-known NetBench benchmark from Ziff-Davis Media, which measures the performance of file servers as they handle network file requests from clients. However, while NetBench requires an elaborate setup of actual physical clients, dbench simulates the roughly 90,000 operations typically run by a NetBench client by replaying a 4 MB load file called client.txt, produced by sniffing a real NetBench run, to generate the same workload. The contents of this file are file-operation directives such as SMBopenx, SMBclose, SMBwritebraw, SMBgetatr, etc. These I/O calls correspond to the Server Message Block (SMB) protocol requests that the SMBD server in SAMBA would produce in a NetBench run. The SMB protocol is used by Microsoft Windows 3.11, 95/98, and NT to share disks and printers.
In our tests, a total of 18 different types of I/O calls were used including
    open file, read, write, lock, unlock, get file attribute, set file attribute,
close, get disk free space, get file time, set file time, find open, find next,
 find close, rename file, delete file, create new file, and flush file buffer.
   dbench can simulate any number of clients without going through the
expense of a physical setup. dbench produces only the filesystem load,
   and it does no networking calls. During a run, each client records the
 number of bytes of data moved and divides this number by the amount
 of time required to move the data. All client throughput scores are then
added up to determine the overall throughput for the server. The overall
 I/O throughput score represents the number of megabytes per second
     transferred during the test. This is a measurement of how well the
                 server can handle file requests from clients.
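The scoring just described, where each client divides bytes moved by elapsed time and the per-client scores are summed, can be sketched as follows. The client figures here are made up for illustration; they are not measurements from an actual dbench run.

```python
# Compute an overall dbench-style throughput score: sum of per-client
# (bytes moved / elapsed seconds), expressed in megabytes per second.

def overall_throughput(clients):
    """clients: list of (bytes_moved, seconds) tuples, one per simulated client."""
    mb = 1024 * 1024
    return sum(bytes_moved / seconds / mb for bytes_moved, seconds in clients)

# Three hypothetical clients, each moving the ~21 MB of test data mentioned
# below, at slightly different speeds.
runs = [(21 * 1024 * 1024, 3.0),
        (21 * 1024 * 1024, 2.5),
        (21 * 1024 * 1024, 3.5)]
score = overall_throughput(runs)   # MB/s summed across all clients
```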
   dbench is a good test for Hyper-Threading because it creates a high
  load and activity on the CPU and I/O schedulers. The ability of Hyper-
    Threading to support multithreaded file serving is severely tested by
  dbench because many files are created and accessed simultaneously
   by the clients. Each client has to create about 21 megabytes worth of
   test data files. For a test run with 20 clients, about 420 megabytes of
data are expected. dbench is also considered a good test of the elevator algorithm used in the Linux filesystem: it exercises both the correctness of the algorithm and whether the elevator is aggressive enough, and it is an interesting test for page replacement as well.
                                      tbench
     tbench is another file server workload similar to dbench. However,
tbench produces only the TCP and process load. It performs the same socket calls that SMBD would make under a NetBench load, but it makes no filesystem calls. The idea behind tbench is to eliminate SMBD from the NetBench test, as though the SMBD code could be made infinitely fast. The throughput results of tbench tell us how fast a NetBench run could go if we eliminated all filesystem I/O and SMB packet processing.
                tbench is built as part of the dbench package.
                Hyper-Threading support in Linux kernel 2.5.x
Linux kernel 2.4.x has been aware of HT since release 2.4.17. Kernel 2.4.17 knows about the logical processors, and it treats a Hyper-Threaded processor as two physical processors. However, the scheduler used in the stock 2.4.x kernel is still considered naive because it cannot distinguish resource contention between two logical processors on the same chip from contention between two separate physical processors.
Consider a system with two physical CPUs, each of which provides two
 virtual processors. If there are two tasks running, the current scheduler
would let them both run on a single physical processor, even though far
better performance would result from migrating one process to the other
 physical CPU. The scheduler also doesn't understand that migrating a
  process from one virtual processor to its sibling (a logical CPU on the
same physical CPU) is cheaper (due to cache loading) than migrating it
                         across physical processors.
                      HT-aware passive load-balancing:
 The IRQ-driven balancing has to be per-physical-CPU, not per-logical-
 CPU. Otherwise, it might happen that one physical CPU runs two tasks
 while another physical CPU runs no task; the stock scheduler does not
recognize this condition as "imbalance." To the scheduler, it appears as
 if the first two CPUs have 1-1 task running while the second two CPUs
  have 0-0 tasks running. The stock scheduler does not realize that the
             two logical CPUs belong to the same physical CPU.
                           "Active" load-balancing:
This mechanism is needed when a logical CPU goes idle and causes a physical-CPU imbalance, a situation the stock 1:1 scheduler cannot handle at all. Ordinarily, the imbalance caused by an idle CPU can be solved via the normal load-balancer. In the case of HT, the situation is special because the source physical CPU might have just two tasks running, both runnable. This is a situation that the stock load-balancer is unable to handle, because running tasks are hard to migrate away. Yet this migration is essential: otherwise a physical CPU can get stuck running two tasks while another physical CPU stays idle.
                             HT-aware task pickup:
    When the scheduler picks a new task, it should prefer all tasks that
  share the same physical CPU before trying to pull in tasks from other
CPUs. The stock scheduler only picks tasks that were scheduled to that
                             particular logical CPU.
                                 HT-aware affinity:
    Tasks should attempt to "stick" to physical CPUs, not logical CPUs.
                                HT-aware wakeup:
The stock scheduler only knows about the "current" CPU; it does not know about any sibling. On HT, if a thread is woken up on a logical CPU that is already executing a task, and a sibling logical CPU is idle, then the sibling CPU has to be woken up to execute the newly woken task immediately.
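The passive-balancing problem described above can be modeled in a few lines: per-logical-CPU run-queue counts of [1, 1, 0, 0] look balanced to the stock scheduler, but aggregating siblings per physical package reveals the imbalance. This is a toy model under our own naming, not kernel code.

```python
# Toy model: two physical CPUs, each exposing two logical (sibling) CPUs.
# An HT-aware balancer must compare loads per PHYSICAL package, not per
# logical CPU.

def physical_loads(tasks_per_logical_cpu, siblings_per_package=2):
    """Sum logical-CPU run-queue lengths into per-physical-CPU loads."""
    loads = []
    for i in range(0, len(tasks_per_logical_cpu), siblings_per_package):
        loads.append(sum(tasks_per_logical_cpu[i:i + siblings_per_package]))
    return loads

def is_imbalanced(tasks_per_logical_cpu):
    """True when one physical package carries at least two more tasks than another."""
    loads = physical_loads(tasks_per_logical_cpu)
    return max(loads) - min(loads) > 1

# Two tasks on the siblings of physical CPU 0, none on physical CPU 1:
# the stock scheduler sees [1, 1, 0, 0] as balanced, but per-package
# loads are [2, 0] -- an imbalance an HT-aware balancer must fix.
print(is_imbalanced([1, 1, 0, 0]))   # True
print(is_imbalanced([1, 0, 1, 0]))   # False: one task per physical CPU
```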
                EACH PROGRAM HAS A MIND OF ITS OWN
 The OS and system hardware not only cooperate to fool the user about
     the true mechanics of multi-tasking, but they cooperate to fool each
   running program as well. While the user thinks that all of the currently
    running programs are being executed simultaneously, each of those
programs thinks that it has a monopoly on the CPU and memory. As far
as a running program is concerned, it's the only program loaded in RAM
and the only program executing on the CPU. The program believes that
 it has complete use of the machine's entire memory address space and
    that the CPU is executing it continuously and without interruption. Of
course, none of this is true. The program actually shares RAM with all of
     the other currently running programs, and it has to wait its turn for a
  slice of CPU time in order to execute, just like all of the other programs
                                 on the system.
                   A few terms: process, context, and thread
       Before continuing our discussion of multiprocessing, let's take a
      moment to unpack the term "program" a bit more. In most modern
   operating systems, what users normally call a program would be more
       technically termed a process. Associated with each process is a
   context, "context" being just a catch-all term that encompasses all the
     information that completely describes the process's current state of
 execution (e.g. the contents of the CPU registers, the program counter,
                                the flags, etc.).
     Processes are made up of threads, and each process consists of at
least one thread: the main thread of execution. Processes can be made
up of multiple threads, and each of these threads can have its own local
   context in addition to the process's context, which is shared by all the
       threads in a process. In reality, a thread is just a specific type of
     stripped-down process, a "lightweight process," and because of this
       throughout the rest of this article I'll use the terms "process" and
                     "thread" pretty much interchangeably.
     Even though threads are bundled together into processes, they still
      have a certain amount of independence. This independence, when
      combined with their lightweight nature, gives them both speed and
flexibility. In an SMP system like the ones we'll discuss in a moment, not
    only can different processes run on different processors, but different
  threads from the same process can run on different processors. This is
why applications that make use of multiple threads see performance gains on SMP systems that single-threaded applications don't.
                  IMPLEMENTING HYPER-THREADING
Although hyper-threading might seem like a pretty large departure from
  the kind of conventional, process-switching multithreading done on a
single-threaded CPU, it actually doesn't add too much complexity to the
hardware. Intel reports that adding hyper-threading to its Xeon processor added only 5% to its die area.
 Intel's Xeon is capable of executing at most two threads in parallel on
   two logical processors. In order to present two logical processors to
       both the OS and the user, the Xeon must be able to maintain
   information for two distinct and independent thread contexts. This is
  done by dividing up the processor's microarchitectural resources into
              three types: replicated, partitioned, and shared.
                                   In the Chips
    The first Intel chips to take advantage of hyperthreading were Xeon
 server processors. But in November 2002, Intel brought hyperthreading
  to the desktop with its 3.06 GHz Pentium 4. "We will be providing this
   technology in additional SKUs over time," Alfs told NewsFactor. "We
  intend to have hyperthreading in a majority of our desktop Pentium 4
processors." chief research officer Peter Kastner said he expected such
         a move from the company. "Intel has hinted that it will push
      hyperthreading technology throughout its Pentium line, making it
         available to most PC buyers, not just at the top end," he told
                                  NewsFactor.
                                Software Support
       Of course, microprocessor improvements mean nothing without
software that can take advantage of them. For hyperthreading, software
          support is in the early stages. "Buying the Pentium 4 with
hyperthreading will be an increasingly smart decision over the life of the
 desktop," Kastner said. "While many applications are not optimized for
      hyperthreading today, we expect that as new releases come out,
 hyperthreading will become a standard feature." For software to benefit
from hyperthreading, the program must support multithreaded execution
    -- that is, it must allow two distinct tasks to be executed at the same
           time, vice president Steve Kleynhans told NewsFactor.
                                 Two Paths
      There are two ways to achieve this goal. The first is to write an
application that is specifically designed to be multithreaded. The second
  is to run two independent applications at the same time. "People are
     running multiple, mixed loads of applications on their desktops,"
Kleynhans said. "Many of those are background tasks." Both the Home and Professional editions of Windows XP support hyperthreading out of the box. Numerous other multithreaded applications can also get a boost from Intel's hyperthreading feature, particularly content-creation applications such as Photoshop, and video- and audio-encoding programs.
         INTEL INNOVATION COULD DOUBLE CHIP POWER
Intel showed off a new chip technology that will allow one chip to act like two.
        Called "hyperthreading," the new technology essentially takes
    advantage of formerly unused circuitry on the Pentium 4 that lets the
chip operate far more efficiently--and almost as well as a dual-processor
        computer. With it, a desktop can run two different applications
simultaneously or run a single application much faster than it would on a
                           standard one-processor box.
   "It makes a single processor look like two processors to the operating
       system," said Shannon Poulin, enterprise launch and disclosure
    manager at Intel. "It effectively looks like two processors on a chip."
        Paul Otellini, general manager of the Intel Architecture Group,
    demonstrated the hyperthreading technology at the Intel Developer's
Forum. He showed off a 3.5GHz Pentium 4 running the computer
          game "Quake 3" and managing four different video streams
 simultaneously. The Pentium 4 demonstration didn't depend on Hyper-
     Threading; instead, it came out as part of Intel's effort to show how
   consumers and software developers will continue to need faster PCs.
     "There are a lot of tremendous applications on the horizon that will
   consume the MIPS (millions of instructions per second)," Otellini said.
       "Gigahertz are necessary for the evolution and improvement of
                                     computing."
    Technically, hyperthreading takes advantage of additional registers--
     circuits that help manage data inside a chip--that come on existing
Pentium 4s but aren't used. Through these registers, the processor can
       handle more tasks at once by taking better advantage of its own
   resources. The chip can direct instructions from one application on its
floating-point unit, which is where the heavy math is done, and run parts
            of another application through its integer unit. A chip with
hyperthreading won't equal the computing power of two Pentium 4s, but
    the performance boost is substantial, Poulin said. A workstation with
       hyperthreaded Xeon chips running Alias-Wavefront, a graphics
   application, has achieved a 30 percent improvement in tests, he said.
 Servers with hyperthreaded chips can manage 30 percent more users.
                           Will developers climb aboard?
    The open question is whether software developers will latch onto the
  idea. Software applications will need to be rewritten to take advantage
   of hyperthreading, and getting developers to tweak their products can
 take an enormous amount of time. Intel, for instance, has been working
  for well over a year to get developers to rewrite their programs to take
  full advantage of the features of the Pentium 4, which has been out for
  approximately nine months. The company even changed the migration
 program to speed the process of optimizing Pentium III applications for
                                   the Pentium 4.
      Still, to date, only 30 applications have been enhanced to take full
   advantage of the Pentium 4, according to Louis Burns, vice president
and general manager of the Desktop Platforms Group at Intel. But more
        are on the way, he said. Otellini acknowledged that recruiting
                             developers will take time.
   "The real key is going to be to get the applications threaded, and that
   takes a lot of work," he said. Nonetheless, adopting the technology to
       server and workstations applications should be fairly easy if the
 application already runs on dual-processor systems, other Intel officials
said. "Thread your applications and drivers and OSes to take advantage
 of this relatively free performance," Otellini asked developers during his
                                     speech.
  Hyperthreading, which will appear in servers and workstations in 2002
    and desktops in 2003, is part of an overall Intel strategy to find new
      ways to squeeze more performance out of silicon. For years, the
  company has largely relied on boosting the clock speed and tweaking
 parts of the chip's architecture to eke out gains. The performance gains
  to be achieved from boosting the clock speed, however, are limited. In
        all practicality, most users won't experience that much realistic
   difference between a 1GHz computer and one that contains a 2GHz
      chip, according to, among others, Dean McCarron, an analyst at
                                Mercury Research.
Ideally, hyperthreading, which has been under development for four and
 a half years, will show meatier benefits. An individual could play games
while simultaneously downloading multimedia files from the Internet with
 a computer containing the technology, Poulin predicted. Hyperthreaded
 chips would also be cheaper than dual-processor computers. "You only
 need one heat sink, one fan, one cooling solution," he said, along with,
of course, one chip. Chips running hyperthreading have been produced,
 and both Microsoft's Windows XP and Linux can take advantage of the
       technology, according to Poulin. Computers containing a single
   hyperthreaded chip differ from dual-processor computers in that two
applications can't take advantage of the same processor substructure at
the same time. "Only one gets to use the floating point at a single time,"
                                   Poulin said.
       On other fronts, Intel on Tuesday also unveiled Machine Check
 Architecture, which allows servers to catch data errors more efficiently.
The company will also demonstrate McKinley for the first time. McKinley
 is the code name for the next version of Itanium, Intel's 64-bit chip that
  competes against Sun's UltraSparc. McKinley is due in demonstration
                          systems by the end of this year.
                                  CONCLUSION
Intel Xeon Hyper-Threading is definitely having a positive impact on the Linux kernel and on multithreaded applications. The speed-up from Hyper-Threading ranges from as high as 30% in stock kernel 2.4.19 to 51% in kernel 2.5.32, thanks to drastic changes in the scheduler's run-queue support and its Hyper-Threading awareness. Today, with Hyper-Threading Technology, processor-level threading can be utilized, offering more efficient use of processor resources, greater parallelism, and improved performance on today's multi-threaded software.
CONTENTS
1. INTRODUCTION
2. UTILIZATION OF PROCESSOR RESOURCES
3. HYPER-THREADING TECHNOLOGY IMPROVES PERFORMANCE
4. MULTI-THREADED APPLICATIONS
5. MULTIPROCESSOR PERFORMANCE ON A SINGLE PROCESSOR
6. HYPER-THREADING SPEEDS LINUX
7. EACH PROGRAM HAS A MIND OF ITS OWN
8. IMPLEMENTING HYPER-THREADING
9. WORKING OF HYPERTHREADING
10. WHAT HYPERTHREADING CAN (AND CAN'T) DO FOR YOU
11. INTEL INNOVATION COULD DOUBLE CHIP POWER
12. CONCLUSION
                                                ACKNOWLEDGEMENTS