Proceedings of the 6th WSEAS International Conference on Applied Computer Science, Hangzhou, China, April 15-17, 2007 436
A Performance Analysis for Microprocessor Architectures
Nakhoon Baek∗
School of EECS, Kyungpook National University
Daegu 702-701, Korea
oceancru@gmail.com

Hwanyong Lee
Solution Division, HUONE Inc.
Daegu 702-205, Korea
hylee@hu1.com
Abstract: In this paper, we select three different CPU architectures for performance analysis: single-core, dual-core and hyper-threading CPUs. Four kinds of operations are executed on these architectures. After analyzing all the data, we find that the single-core and dual-core CPUs behave as expected: the execution times of combined operations are very close to the sum of the execution times of their component operations. In contrast, the hyper-threading CPU shows better performance when each thread performs one specific kind of operation, rather than mixed operations.
Key–Words: CPU architectures, performance analysis, multi-threading
1 Introduction

Nowadays, the performance of microprocessors is approaching its physical limits. Large-scale computers, including supercomputers and mainframes, have already met this kind of technical limit in their CPU power. Thus, various parallel processing techniques were developed for them, including multi-threading, super-threading, hyper-threading, and so on[1].

These days, microprocessors used in conventional PCs and even in high-end embedded systems have improved their ability to support parallel processing techniques effectively. Commercial multi-core CPUs are already available, including Intel Core2 Duo, Intel Core2 Quad, Intel Xeon, the customized triple-core CPU for Xbox 360, etc[2, 3].

It is clear that conventional programming models based on the sequential processing paradigm are not well suited to these multi-core CPUs. We need computer programs based on a parallel processing paradigm such as multi-processing and/or multi-threading.

In this paper, we present experimental results on the execution time of some CPU-intensive operations, consisting of integer operations and/or floating-point operations, in order to analyze which programming architecture is more suitable for the newly introduced CPU architectures.

∗ Corresponding Author.

2 Background Works

In computer programming, a thread is a lightweight process, which executes a given region of program code with a dedicated stack area[4]. In contrast to ordinary processes, threads can share memory with each other, which is one of their strong points. Compared with the conventional sequential programming paradigm, one of the strongest points of multi-threaded programming is that multiple threads can be executed simultaneously, in parallel[5].

Nowadays, multiple threads can be run concurrently on many computer systems. On single-processor systems, time sharing is used to execute several threads in turn. By switching the executing thread very frequently, such a system creates the illusion of simultaneous execution. In fact, however, a single-processor system only alternates between threads, rather than physically executing them in parallel.

Multi-processor systems and multi-core processors are capable of physically executing multiple threads at the same time. In other words, true multi-processing is now possible. Thus, we can use multi-threading in wider areas of programming, although in the past it was recommended only for I/O-intensive work.

Multi-threading does not always yield an overall speed-up[6]. First of all, parallel programs based on multi-threading need many more steps before they start doing useful work, in comparison with sequential programs. Thus, in the worst case, the preparation and setup times for multi-threading take up significant
portions of the overall execution time. Of course, programmers should avoid this situation.

Hyper-threading is a recently developed technology for more efficient multi-threading, delivered by Intel on the Pentium 4 microprocessor architecture. It is officially known as HTT (hyper-threading technology). With this technology, while one thread is using a processing core, the pipeline resources that it leaves idle may be used by another thread, so that a single physical processor presents itself as two logical processors. So, we can expect two logical processors on a hyper-threading-capable CPU[7]. Figure 1 shows the conceptual diagram of the hyper-threading architecture.

Figure 1: Hyper-threading architecture. (Diagram: on-die cache; two architectural states, each with an APIC; one processor core; system bus.)

In spite of its strong points, hyper-threading also has drawbacks. Since the logical processors share the level-1 and level-2 caches, there are some security holes and some slow-downs in real-world applications[8].

Multi-core microprocessors have two or more processing cores in a single physical processor package, as shown in Figure 2. In this case, each processing core has its own resources such as caches, registers, execution units, etc. Thus, there is no resource sharing in multi-core architectures, while hyper-threading involves some kind of resource sharing. Some multi-core processors are designed to work together with hyper-threading technology.

Figure 2: Dual-core architecture. (Diagram: on-die cache; two architectural states, each with an APIC; two processor cores; system bus.)

Although multi-core CPUs are a cost-effective way of implementing the parallel programming paradigm, they also have some drawbacks. At this time, multi-core CPUs have slower clocks than conventional single-core CPUs. Thus, current multi-core CPUs score poorly on sequentially designed programs, in comparison with single-core CPUs[9].

At this time, dual-core and quad-core CPUs are commercially available. A customized CPU for Xbox 360 has three cores, and some CPUs for workstation computers also have multi-core architectures.

3 Performance Analysis

In this paper, we compare several CPU architectures: single-core, dual-core and hyper-threading CPUs. For this purpose, we select four kinds of operations. We focus on arithmetic operations, in order to fully utilize the internal computing power of the CPUs. To test the integer units and the floating-point units separately, we prepare the following operations:

• integer operations: consist of 1,000 integer additions, which are repeated 5,000,000 times.

• floating-point operations: consist of 1,000 double precision floating-point additions, repeated 5,000,000 times.

• mixed operations: consist of 1,000 integer additions and 1,000 floating-point additions. When multiple threads are used, each thread is allotted the same amount of integer and floating-point operations.

• separated operations: consist of 1,000 integer additions and 1,000 floating-point additions. When multiple threads are used, each thread is devoted wholly to integer operations or to
floating-point operations. That is, the integer and floating-point operations are separated into independent threads.

3.1 Single-core case

We use an Intel Pentium 4 1.6 GHz processor with 1.5GB memory as the testing system for the single-core case. The measured execution times for the four kinds of operations are shown in Table 1. All the operations are measured for a varying number of threads, from 1 to 8. The graphical representation of these data is shown in Figure 3. Since there is only one CPU core, the execution time is independent of the number of threads.

Table 1: Execution times on the single-core CPU. (unit: sec)
num. threads   integer   double   mixed   separated
1              1.294     2.145    3.491   3.453
2              1.306     2.157    3.459   3.435
3              1.316     2.173    3.499   3.457
4              1.326     2.155    3.463   3.457
5              1.336     2.137    3.461   3.469
6              1.346     2.165    3.595   3.477
7              1.388     2.169    3.545   3.455
8              1.390     2.147    3.511   3.467

Figure 3: Single-core CPU performance. (Plot: execution time in seconds, 0.0 to 4.0, against number of threads, 1 to 8, with curves for integer, double, mixed and separated operations.)

As we can easily guess, the execution times for separated-operations and mixed-operations threads are very close to the sum of those of integer-operations and double-operations threads. Additionally, there is no noticeable difference between the execution times of mixed-operations and separated-operations threads.

3.2 Dual-core case

For the dual-core case, we use an Intel Core2 E6400 2.13 GHz CPU system with 1.0GB memory. The experiments are the same as in the single-core case. The experimental results are summarized in Table 2 and Figure 4.

Table 2: Execution times on the dual-core CPU. (unit: sec)
num. threads   integer   double   mixed   separated
1              0.691     1.772    2.716   2.459
2              0.366     0.894    1.256   1.775
3              0.403     0.941    1.294   1.334
4              0.400     0.919    1.297   1.331
5              0.400     0.931    1.306   1.306
6              0.409     0.928    1.319   1.303
7              0.412     0.925    1.319   1.281
8              0.416     0.903    1.316   1.294

Since we use a dual-core CPU, the execution time with 2 or more threads drops to about half of the execution time of the single-threaded case. Similar to the single-core case, the execution times for separated-operations and mixed-operations threads are very close to the sum of those of integer-operations and double-operations threads.

3.3 Hyper-threading case

To test the hyper-threading case, an Intel Pentium 4 2.8 GHz processor with the hyper-threading facility is used, with 1.0GB memory. Measured execution times are listed in Table 3, and the corresponding graphical representation is shown in Figure 5.

One remarkable thing on the graph is that the separated-operations threads show better performance than the mixed-operations threads. We guess that the processor core is fully utilized when one
thread uses the integer unit and another thread uses the floating-point unit.

Table 3: Execution times on the hyper-threading CPU. (unit: sec)
num. threads   integer   double   mixed   separated
1              0.756     2.200    3.266   2.941
2              0.747     1.125    1.844   2.181
3              0.744     1.144    1.875   1.816
4              0.750     1.119    1.903   1.819
5              0.766     1.128    1.909   1.856
6              0.791     1.141    1.925   1.859
7              0.781     1.156    1.950   1.853
8              0.784     1.131    1.941   1.856

Figure 4: Dual-core CPU performance. (Plot: execution time in seconds, 0.0 to 4.0, against number of threads, 1 to 8, with curves for integer, double, mixed and separated operations.)

Figure 5: Hyper-threading CPU performance. (Plot: execution time in seconds, 0.0 to 4.0, against number of threads, 1 to 8, with curves for integer, double, mixed and separated operations.)

4 Conclusion

In this paper, we selected three different CPU architectures for performance analysis: single-core, dual-core and hyper-threading CPUs. Four kinds of operations were executed on these architectures. We measured the execution times of all these operations for different numbers of threads, from 1 to 8.

After analyzing all the data, we found that the single-core and dual-core CPUs behave as expected, i.e. the execution times of combined operations are very close to the sum of those of their component operations. In contrast, the hyper-threading CPU shows better performance when each thread performs one specific kind of operation, rather than mixed operations.

In conclusion, for hyper-threading CPUs, multi-threaded software should be designed to avoid threads with mixed operations. More experiments and analysis are needed for more precise inferences.

Acknowledgements: This work is financially supported by the Ministry of Education and Human Resources Development(MOE), the Ministry of Commerce, Industry and Energy(MOCIE) and the Ministry of Labor(MOLAB) through the fostering project of the Industrial-Academic Cooperation Centered University.
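
Note on the measurements: the observations of Section 3 can be cross-checked directly against the values in Tables 1 to 3. The following Python sketch is only a post-processing aid under stated assumptions: it copies the published execution times and recomputes (a) how far the mixed and separated times lie from the sum of the integer and double times, (b) the speed-up relative to the single-threaded run, and (c) the thread counts at which separated operations ran faster than mixed ones. The names TABLES, additivity_gap, speedup and separated_wins are hypothetical helpers introduced here for illustration; they are not part of the benchmark programs used for the experiments, which are not listed in this paper.

```python
# Execution times in seconds, copied from Tables 1-3.
# Row i holds (integer, double, mixed, separated) for i + 1 threads.
TABLES = {
    "single-core": [
        (1.294, 2.145, 3.491, 3.453),
        (1.306, 2.157, 3.459, 3.435),
        (1.316, 2.173, 3.499, 3.457),
        (1.326, 2.155, 3.463, 3.457),
        (1.336, 2.137, 3.461, 3.469),
        (1.346, 2.165, 3.595, 3.477),
        (1.388, 2.169, 3.545, 3.455),
        (1.390, 2.147, 3.511, 3.467),
    ],
    "dual-core": [
        (0.691, 1.772, 2.716, 2.459),
        (0.366, 0.894, 1.256, 1.775),
        (0.403, 0.941, 1.294, 1.334),
        (0.400, 0.919, 1.297, 1.331),
        (0.400, 0.931, 1.306, 1.306),
        (0.409, 0.928, 1.319, 1.303),
        (0.412, 0.925, 1.319, 1.281),
        (0.416, 0.903, 1.316, 1.294),
    ],
    "hyper-threading": [
        (0.756, 2.200, 3.266, 2.941),
        (0.747, 1.125, 1.844, 2.181),
        (0.744, 1.144, 1.875, 1.816),
        (0.750, 1.119, 1.903, 1.819),
        (0.766, 1.128, 1.909, 1.856),
        (0.791, 1.141, 1.925, 1.859),
        (0.781, 1.156, 1.950, 1.853),
        (0.784, 1.131, 1.941, 1.856),
    ],
}

# Column indices within each row (hypothetical names for readability).
INTEGER, DOUBLE, MIXED, SEPARATED = range(4)


def additivity_gap(row):
    """Relative gaps of the mixed and separated times from integer + double."""
    expected = row[INTEGER] + row[DOUBLE]
    return ((row[MIXED] - expected) / expected,
            (row[SEPARATED] - expected) / expected)


def speedup(rows, column, threads):
    """Single-thread time divided by the time measured with `threads` threads."""
    return rows[0][column] / rows[threads - 1][column]


def separated_wins(rows):
    """Thread counts at which separated operations ran faster than mixed ones."""
    return [n for n, row in enumerate(rows, start=1)
            if row[SEPARATED] < row[MIXED]]


for name, rows in TABLES.items():
    print(name)
    print("  2-thread speed-up (integer, double):",
          round(speedup(rows, INTEGER, 2), 2),
          round(speedup(rows, DOUBLE, 2), 2))
    print("  separated faster than mixed at threads:", separated_wins(rows))
    for n, row in enumerate(rows, start=1):
        mixed_gap, separated_gap = additivity_gap(row)
        print(f"  {n} threads: mixed {mixed_gap:+.1%}, separated {separated_gap:+.1%}")
```

Printing the gaps for every row, rather than a single average, makes it visible where a relation holds closely and where a particular thread count deviates. On the published numbers, for example, separated_wins reports every thread count except 2 for the hyper-threading CPU.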
References:
[1] J. Stokes. Introduction to multithreading, superthreading and hyperthreading, 2005. http://arstechnica.com/articles/paedia/cpu/hyperthreading.ars.
[2] J. Stokes. Inside the Xbox 360, part I: procedural synthesis and dynamic worlds, 2005. http://arstechnica.com/articles/paedia/cpu/xbox360-1.ars.
[3] J. Stokes. Inside the Xbox 360, part II: the Xenon CPU, 2005. http://arstechnica.com/articles/paedia/cpu/xbox360-2.ars.
[4] K. Wackowski and P. Gepner. Hyper-threading technology speeds clusters. In Proc. 5th Int’l Conf. on Parallel Proc. and Appl. Math., pages 17–26, 2003.
[5] G. Keren. Multi-threaded programming with POSIX threads, 2002. http://users.actcom.co.il/~choo/lupg/index.html.
[6] D. Sarkar. Cost and time-cost effectiveness of multiprocessing. IEEE Trans. Parallel Distrib. Syst., 4(6):704–712, 1993.
[7] T. Martinez and S. Parikh. Understanding dual processors, hyper-threading technology, and multi-core systems. Intel Optimizing Center, 2005. http://www.devx.com/Intel/Article/27399.
[8] C. Percival. Cache missing for fun and profit. In BSDCan ’05, 2005.
[9] J. Handy. The Cache Memory Book. Academic Press, 1998.