Processors Basic

The document discusses multicore processors, highlighting their design, functionality, and advantages such as enhanced performance and reduced power consumption. It also addresses the challenges associated with multicore technology, including software dependency and performance limitations. Major use cases for multicore processors include virtualization, databases, analytics, cloud computing, and visualization.

Uploaded by

Harshit Rathour

Contents

 Multi-Core Processors

 Super Scalar Processors

 Very Long Instruction Word (VLIW) Processors

 Vector Processors
Multi-core Processors

 A multicore processor is an integrated circuit that has two or


more processor cores attached for

• Enhanced performance and

• Reduced power consumption.

 These processors also enable more efficient simultaneous


processing of multiple tasks,

• such as parallel processing and multithreading.


Multi-core Processors Contd..

 A dual core setup is similar to having multiple, separate


processors installed on a computer.

 However, because the two cores share one package and plug into the same socket,

• the connection between them is faster.


Multi-core Processors Contd..

 The use of multicore processors is one approach to boost


processor performance

• without exceeding the practical limitations of semiconductor design and fabrication.

 Using multiple cores also helps keep operation within safe limits in areas such as heat generation.
How do multicore processors work?

 The heart of every processor is an execution engine, also known


as a core.

 The core is designed to process instructions and data according


to
• the software programs in the computer's memory.

 Over the years, designers found that every new processor design
has limits.
How do multicore processors work?
Contd..
 Several techniques have been used to accelerate performance, including the following:

• Clock Speed

• Hyper-threading

• More Chips
How do multicore processors work?
(Clock Speed)
Clock Speed:

 One approach is to make the processor's clock faster.

 The clock is the "drumbeat" used to

• synchronize the processing of instructions and data through the processing engine.

 Clock speeds have accelerated from several megahertz (MHz)


to several gigahertz (GHz) nowadays.
How do multicore processors work?
(Clock Speed) Contd..
 However, transistors use up power with each clock tick.

 As a result, clock speeds have nearly reached their limits

• using current semiconductor fabrication and heat


management techniques.
How do multicore processors work?
(Hyper Threading)
Hyper Threading:

 Another approach involved the handling of multiple instruction


threads.

 Intel calls this hyper-threading.

 With hyper-threading, processor cores are designed to

• handle two separate instruction threads at the same time.


How do multicore processors work?
(Hyper Threading) Contd..

 When properly enabled and supported by both the computer's


firmware and operating system,

• hyper-threading techniques enable one physical core to


function as two logical cores.

 Still, the processor only possesses a single physical core.


How do multicore processors work?
(Hyper Threading) Contd..

 The logical abstraction of the physical processor added little real


performance to the processor

• other than to help streamline the behavior of multiple


simultaneous applications running on the computer.
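The difference between logical and physical cores is visible from software. As a minimal sketch: Python's `os.cpu_count()` reports logical CPUs, so on a machine with hyper-threading enabled it typically returns twice the number of physical cores. The helper names and the default of two threads per core are assumptions made for illustration, not part of the slides.

```python
import os


def logical_core_count() -> int:
    """Logical CPUs reported by the OS (hardware threads, not physical cores)."""
    return os.cpu_count() or 1   # cpu_count() may return None if undetermined


def estimated_physical_cores(logical: int, threads_per_core: int = 2) -> int:
    """Rough estimate only: assumes every core exposes threads_per_core threads."""
    return max(1, logical // threads_per_core)


if __name__ == "__main__":
    logical = logical_core_count()
    print("logical cores:", logical)
    print("estimated physical cores:", estimated_physical_cores(logical))
```

The estimate is only a guess when hyper-threading is disabled or when the processor mixes core types.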
How do multicore processors work?
(More Chips)
More Chips:

 The next step is to add processor chips to the processor package,

• which is the physical device that plugs into the


motherboard.

 A dual-core processor includes two separate processor cores.

 A quad-core processor includes four separate cores.


How do multicore processors work?
Contd..
 Today's multicore processors can easily include 12, 24 or even
more processor cores.

 The multicore approach is almost identical to the use of


multiprocessor motherboards,

• which have two or four separate processor sockets.


How do multicore processors work?
Contd..
 Today's high processor performance comes from processor products

• that combine fast clock speeds and multiple hyper-threaded cores.
How do multicore processors work?
Contd..
 However, multicore chips have several issues to consider.

 First, the addition of more processor cores doesn't automatically


improve computer performance.

 The OS and applications must direct software program


instructions to

• recognize and use the multiple cores.


How do multicore processors work?
Contd..
 This must be done in parallel,

• dispatching different threads to different cores within the processor package.

 Some software applications may need to be refactored to

• support and use multicore processor platforms.

 Otherwise, only the default first processor core is used, and any
additional cores are unused or idle.
How do multicore processors work?
Contd..
 Second, the performance benefit of additional cores is not a direct multiple.

 That is, adding a second core does not double the processor's performance,

• and a quad-core processor does not multiply the processor's performance by a factor of four.
How do multicore processors work?
Contd..
 This happens because of the shared elements of the processor,
such as access to

• Internal memory or caches,

• External buses,

• and Computer system memory.
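The less-than-linear scaling described above is often estimated with Amdahl's law, which the slides do not mention by name. The sketch below is a rough model only: it assumes a fixed fraction of the work can run in parallel and ignores the cache, bus and memory contention listed above, so real speedups are usually lower. The function name `amdahl_speedup` and the 90% figure are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a program runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90% of the run time can be spread across cores.
    for cores in (1, 2, 4, 8):
        print(cores, "cores ->", round(amdahl_speedup(0.9, cores), 2), "x")
```

With these assumed numbers, two cores give roughly 1.8x and four cores roughly 3.1x, rather than 2x and 4x.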


How do multicore processors work?
Contd..
 The benefit of multiple cores can be substantial, but there are
practical limits.

 Still, the acceleration is typically better than a traditional


multiprocessor system because

• the coupling between cores in the same package is tighter and

• there are shorter distances and fewer components between


cores.
Why are multicore processors used?

 Multicore processors work on any modern computer hardware platform.

 Virtually all PCs and laptops today include some multicore processor model.

 However, the true power and benefit of these processors depend on

• software applications designed to emphasize parallelism.


Why are multicore processors used?
Contd..
 A parallel approach divides application work into numerous
processing threads,

• and then distributes and manages those threads across


two or more processor cores.
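The divide-and-distribute pattern described above can be sketched with a worker pool from Python's standard library. The function names and the sum-of-squares workload are hypothetical. Note one assumption in particular: in standard CPython, threads running pure Python code do not execute on several cores at once, so a real CPU-bound program would use a process pool or native code instead. The structure of the split is the same either way.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def split_into_chunks(items, n_chunks):
    """Divide a list into n_chunks contiguous slices of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(items) or 1))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(items, workers=None):
    """Sum x*x over items by handing one chunk to each worker thread."""
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker computes a partial result; the results are combined here.
        partial_sums = pool.map(lambda chunk: sum(x * x for x in chunk), chunks)
        return sum(partial_sums)
```

For example, `parallel_sum_of_squares(list(range(10)), workers=4)` splits ten numbers into four chunks and combines the four partial sums.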
Major Use cases for Multicore Processors

There are several major use cases for multicore processors,


including the following five:

• Virtualization

• Databases

• Analytics & HPC

• Cloud

• Visualization
Major Use cases for Multicore Processors
(Virtualization)
Virtualization:

 A virtualization platform, such as VMware, is designed to

• abstract the software environment from the underlying


hardware.

 Virtualization is capable of abstracting physical processor cores into

• virtual processors or central processing units (vCPUs)

❑ which are then assigned to Virtual Machines (VMs).


Major Use cases for Multicore Processors
(Virtualization) Contd..

 Each VM becomes a virtual server capable of running its own


OS and application.

 It is possible to assign more than one vCPU to each VM,

• allowing each VM and its application to run parallel


processing software if required.
Major Use cases for Multicore Processors
(Databases)
Databases:

 A database is a complex software platform that frequently needs


to run many simultaneous tasks such as queries.

 As a result, databases are highly dependent on multicore


processors to

• distribute and handle these many task threads.


Major Use cases for Multicore Processors
(Databases) Contd..
 The use of multiple processors in databases is often coupled with
extremely high memory capacity

• that can reach 1 terabyte or more on the physical server.


Major Use cases for Multicore Processors
(Analytics & HPC)
Analytics and HPC:

 Big data analytics, such as machine learning, and High Performance Computing (HPC) both require

• breaking large & complex tasks into smaller and more manageable pieces.
Major Use cases for Multicore Processors
(Analytics & HPC) Contd..
 Each piece of the computational effort can then be solved by

• distributing each piece of the problem to a different


processor.

 This approach enables each processor to work in parallel to

• solve the overarching problem far faster and more


efficiently than with a single processor.
Major Use cases for Multicore Processors
(Cloud)
Cloud:

 Organizations building a cloud adopt multicore processors to

• support all the virtualization needed to

❑ accommodate the highly scalable,

❑ and highly transactional demands of cloud


software platforms such as OpenStack.
Major Use cases for Multicore Processors
(Cloud) Contd..
 A set of servers with multicore processors can allow the cloud to

• create and scale up more VM instances on demand.


Major Use cases for Multicore Processors
(Visualization)
Visualization:

 Graphics applications, such as games and data-rendering engines,

• have the same parallelism requirements as other HPC


applications.

 Visual rendering is task-intensive,

• so visualization applications can make extensive use of


multiple processors to distribute the calculations required.
Major Use cases for Multicore Processors
(Visualization) Contd..

 Many graphics applications rely on Graphics Processing Units


(GPUs) rather than CPUs.

 GPUs are tailored to optimize graphics-related tasks.

 GPU packages often contain multiple GPU cores, similar in


principle to multicore processors.
Pros and cons of multicore processors

 Multicore processor technology is mature and well-defined.

 However, the technology has its share of pros and cons,

• which should be considered when buying and deploying new servers.
Advantages of Multicore Processor

Some of the advantages of multicore processors are the following:

 Better application performance

 Better hardware performance


Advantages of Multicore Processor
(Better Application Performance)
Better Application Performance:

 The principal benefit of multicore processors is more potential processing capability.

 Each processor core is effectively a separate processor that


OSes and applications can use.

 In a virtualized server, each VM can employ one or more


virtualized processor cores,

• enabling many VMs to coexist and operate


simultaneously on a physical server.
Advantages of Multicore Processor Contd..
(Better Application Performance)
 Similarly, an application designed for high levels of parallelism
may use any number of cores to

• provide high application performance that would be


impossible with single-chip systems.
Advantages of Multicore Processor
(Better Hardware Performance)
Better Hardware Performance:

 Placing two or more processor cores on the same device lets them use shared components more efficiently, such as

• Common internal buses,

• and Processor caches.


Advantages of Multicore Processor Contd..
(Better Hardware Performance)

 It also benefits from superior performance compared with


multiprocessor systems

• that have separate processor packages on the same


motherboard.
Disadvantages of Multicore Processor

Some of the disadvantages of multicore processors are the following:

 Software dependent,

 Performance boosts are limited,

 Power, heat and clock restrictions.


Disadvantages of Multicore Processor
(Software Dependent)

Software Dependent:

 The application uses the processor, not the other way around.

 OSes and applications typically default to using the first processor core, dubbed core 0.

 Any additional cores in the processor package will remain unused


or idle

• until software applications are enabled to use them.


Disadvantages of Multicore Processor
(Software Dependent)
Contd..
 Such applications include database applications and big data
processing tools like Hadoop.

 A business should consider what a server will be used for and the applications it plans to use

• before making a multicore system investment

❑ to ensure that the system delivers its optimum


computing potential.
Disadvantages of Multicore Processor
(Performance boosts are limited)

Performance boosts are limited:

 Multiple cores in a processor package must share common system buses and processor caches.

 The more processor cores share a package,

• the more sharing takes place across common processor


interfaces and resources.
Disadvantages of Multicore Processor
(Performance boosts are limited)
Contd..

 This results in diminishing returns to performance as cores are


added.

 For most situations, the performance benefit of having multiple


cores
• far outweighs the performance lost to such sharing,

❑ but it's a factor to consider when testing application


performance.
Disadvantages of Multicore Processor
(Power, heat and clock restrictions)

Power, heat and clock restrictions:

 A computer may not be able to drive a processor with many


cores

• as hard as a processor with fewer cores or a single-core


processor.

 A modern processor core may contain over 500 million


transistors.
Disadvantages of Multicore Processor
(Power, heat and clock restrictions)
Contd..

 Each transistor generates heat when it switches,

• and this heat increases as the clock speed increases.

 All of that heat generation must be safely dissipated

• from the core through the processor package.


Disadvantages of Multicore Processor
(Power, heat and clock restrictions)
Contd..

 When more cores are running,

• this heat can multiply and quickly exceed the cooling


capability of the processor package.

 Thus, some multicore processors may actually reduce clock speeds to help manage heat,

• for instance, from 3.5 GHz to 3.0 GHz.


Disadvantages of Multicore Processor
(Power, heat and clock restrictions)
Contd..
 This reduces the performance of all processor cores in the
package.

 High-end multicore processors require

• complex cooling systems,

• and careful deployment & monitoring

to ensure long-term system reliability.
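The effect of the 3.5 GHz to 3.0 GHz example above can be estimated with the common first-order approximation that dynamic power scales with capacitance times voltage squared times frequency. This relation is not stated in the slides and ignores leakage power. The function name and the unchanged-voltage default are assumptions for illustration.

```python
def relative_dynamic_power(freq_ghz: float, base_freq_ghz: float,
                           voltage: float = 1.0,
                           base_voltage: float = 1.0) -> float:
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f.

    Capacitance is assumed unchanged, so it cancels out of the ratio.
    """
    return (voltage / base_voltage) ** 2 * (freq_ghz / base_freq_ghz)


if __name__ == "__main__":
    ratio = relative_dynamic_power(3.0, 3.5)
    print("dynamic power after throttling:", round(ratio * 100, 1), "% of baseline")
```

Under these assumptions, dropping from 3.5 GHz to 3.0 GHz at the same voltage cuts both the clock rate and the dynamic power by about 14%. Lowering the voltage as well would reduce power further.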


Architecture of Multicore Processors
Architecture of Multicore Processors
Contd..
The components of multicore processors are as follows:

 Cores

 Processor Support

 Caches
Architecture of Multicore Processors
(Cores)
Cores:

 Every multicore processor consists of two or more cores along


with a series of caches.

 Cores are the central component of multicore processors.


Architecture of Multicore Processors
(Cores) Contd..
 Cores contain

• all of the registers and circuitry,

• sometimes hundreds of millions of individual transistors,

needed to perform the closely synchronized tasks of

❑ ingesting data and instructions,

❑ processing that content,

❑ and outputting logical decisions or results.
Architecture of Multicore Processors
(Processor Support)
Processor Support:

 Processor support circuitry includes an assortment of


input/output control and management circuitry, such as

• Clocks,

• Cache consistency,

• Power & thermal control,

• and External bus access.


Architecture of Multicore Processors
(Caches)
Caches:

 Caches are relatively small areas of very fast memory.

 A cache retains often-used instructions or data,

• making that content readily available to the core

❑ without the need to access system memory.


Architecture of Multicore Processors
(Caches) Contd..

 A processor checks the cache first.

 If the required content is present,

• the core takes that content from the cache, which improves performance.

 If the content is absent, the core will access system memory for
the required content.
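The check-the-cache-first behaviour described above can be sketched as a tiny software model. This is an illustration under stated assumptions, not how hardware caches are built: real caches work on fixed-size lines with tags and sets. The class name `TinyCache`, the dictionary standing in for system memory, and the least-recently-used eviction policy are all assumptions.

```python
from collections import OrderedDict


class TinyCache:
    """A small least-recently-used cache in front of a slower 'system memory'."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory          # dict standing in for system memory
        self.lines = OrderedDict()    # address -> cached value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)   # mark as most recently used
            return self.lines[address]
        self.misses += 1
        value = self.memory[address]          # slow path: go to system memory
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used entry
        return value
```

Reading the same address twice in a row produces one miss followed by one hit, which is the performance benefit the slide refers to.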
Architecture of Multicore Processors
(Caches) Contd..
 Level 1, or L1, cache is the smallest and fastest cache unique to
every core.

 Level 2, or L2, cache is a larger storage space shared among


the cores.

 Some multicore processor architectures may dedicate both L1 and


L2 caches to each core.
Homogeneous vs. Heterogeneous
Multicore Processors

Homogeneous vs. heterogeneous multicore processors:

 The cores within a multicore processor may be homogeneous or


heterogeneous.

 Mainstream Intel and AMD multicore processors


for x86 computer architectures

• have traditionally been homogeneous and provide identical cores.


Homogeneous vs. Heterogeneous
Multicore Processors
Contd..

 However, dedicating a complex device to a simple job is often wasteful when the goal is the greatest efficiency.

 There is a heterogeneous multicore processor market

• that uses processors with different cores for different


purposes.
Homogeneous vs. Heterogeneous
Multicore Processors
Contd..
 Heterogeneous cores are generally found in embedded or Arm
processors that

• might mix microprocessor and microcontroller cores in


the same package.
Goals for Heterogeneous Multicore Processors

There are three general goals for heterogeneous multicore


processors:

 Optimized performance

 Optimized power

 Optimized security
Goals for Heterogeneous Multicore Processors
(Optimized Performance)

Optimized Performance:

 While homogeneous multicore processors are typically intended


to
• provide universal processing capabilities,

❑ many processors are not intended for such


generic system use cases.
Goals for Heterogeneous Multicore Processors
(Optimized Performance) Contd..

 Instead, they are designed and sold for use in embedded,


dedicated or task-specific systems

• that can benefit from the unique strengths of different


processors.
Goals for Heterogeneous Multicore Processors
(Optimized Performance) Contd..

 For example, a processor intended for a signal processing device

• might use an Arm processor

❑ that contains a Cortex-A general-purpose processor,

❑ with a Cortex-M core for dedicated signal processing


tasks.
Goals for Heterogeneous Multicore Processors
(Optimized Power)

Optimized Power:

 Providing simpler processor cores reduces the transistor count


and eases power demands.

 This makes the processor package and the overall system cooler
and more power-efficient.
Goals for Heterogeneous Multicore Processors
(Optimized Security)

Optimized Security:

 Jobs or processes can be divided among different types of cores,

• enabling designers to deliberately build high levels of


isolation

❑ that tightly control access among the various


processor cores.
Goals for Heterogeneous Multicore Processors
(Optimized Security) Contd..

 This greater control and isolation offer better stability and


security for the overall system,

• though at the cost of general flexibility.


Examples of Multicore Processors

Examples of multicore processors:

 Most modern processors designed and sold for general-purpose


x86 computing include multiple processor cores.

 Examples of Intel 12th-generation multicore processors include the


following:

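• Intel Core i9 12900 family provides 16 cores (8 performance and 8 efficiency cores) and 24 threads.

• Intel Core i7 12700 family provides 12 cores (8 performance and 4 efficiency cores) and 20 threads.

• Intel Core i5 12600K processors offer 10 cores (6 performance and 4 efficiency cores) and 16 threads.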


Examples of Multicore Processors
Contd..
 Examples of AMD Zen multicore processors include:

• AMD Zen 3 family (provides 4 to 16 cores).

• AMD Zen 2 family (provides up to 64 cores).

• AMD Zen+ family (provides 4 to 32 cores).


What is a Superscalar Processor?

 A superscalar processor is a type of microprocessor that implements a form of parallelism

• known as instruction-level parallelism within a single processor.

❑ It executes more than one instruction during a clock cycle by

▪ simultaneously dispatching several instructions to different execution units on the processor.
What is a Superscalar Processor?
Contd..
 A scalar processor executes a single instruction for each clock cycle;

 A superscalar processor can execute more than one instruction


during a clock cycle.
Features of Superscalar Processors

Features of superscalar processors include the following:

 Superscalar architecture is a parallel computing technique


utilized in various processors.

 In a superscalar computer, the CPU manages several instruction


pipelines to

• execute several instructions simultaneously during a


clock cycle.
Features of Superscalar Processors
Contd..
 Superscalar architectures include all the features of pipelining,

• but in addition, several instructions can execute simultaneously in the same pipeline stage.

 Superscalar design methods normally comprise

• Parallel register renaming,

• Parallel instruction decoding,

• Speculative execution & out-of-order execution.


Features of Superscalar Processors
Contd..
 These methods are normally used alongside complementary design methods in recent microprocessor designs, such as

• Caching,

• Pipelining,

• Branch prediction & multi-core.
Superscalar Processor Architecture

 A superscalar processor is a CPU that

• executes more than one instruction for each clock cycle.

❑ Because clock speed is measured in clock cycles per second, it completes more instructions per second at the same clock speed.

 Compared to a scalar processor, this processor is faster.


Superscalar Processor Architecture
Contd..
 Superscalar processor architecture mainly consists of parallel execution units

• that can execute instructions simultaneously.

 This parallel architecture was first implemented within a Reduced Instruction Set Computer (RISC) processor that

• uses simple & short instructions to execute calculations.
Superscalar Processor Architecture
Contd..
 Due to their superscalar abilities,

• Reduced Instruction Set Computer (RISC) processors have normally performed better than

❑ Complex Instruction Set Computer (CISC) processors running at the same clock speed in megahertz.

 However, most CISC processors now, like the Intel Pentium, also include some RISC-style architecture,

• which allows them to execute instructions in parallel.


Block Diagram for Superscalar Processor
Superscalar Processor Architecture
Contd..

 The superscalar processor is equipped with several processing


units for handling

• various instructions in parallel in every processing stage.

 By using the above architecture,

• several instructions start execution within the same clock cycle.
Superscalar Processor Architecture
Contd..
 These processors are capable of

• achieving an execution rate of more than one instruction for each cycle.

 In the previous architecture diagram, the processor has two execution units,

• where one is used for integer operations & the other is used for floating-point operations.
Superscalar Processor Architecture
Contd..
 The Instruction Fetch Unit (IFU) is capable of

• reading several instructions at a time & storing them within the instruction queue.

 In every cycle, the dispatch unit retrieves & decodes

• up to 2 instructions from the front of the queue.


Superscalar Processor Architecture
Contd..
 If these are one integer instruction & one floating-point instruction with no hazards,

• then both instructions are dispatched within the same clock cycle.
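The dual-issue rule above can be sketched as a small simulation of the dispatch unit. This is a simplified model under stated assumptions: instructions are written as hypothetical `(unit, dest, sources)` tuples, there is exactly one integer unit and one floating-point unit, and the only hazard checked is the second instruction reading the register the first one writes.

```python
from collections import deque


def dispatch_cycles(instructions):
    """Group an in-order instruction queue into per-cycle dispatch slots.

    Each instruction is (unit, dest, sources) with unit "int" or "fp".
    Two instructions share a cycle only if they need different units and
    the second does not read the register the first one writes.
    """
    queue = deque(instructions)
    cycles = []
    while queue:
        first = queue.popleft()
        issued = [first]
        if queue:
            second = queue[0]
            different_units = second[0] != first[0]
            reads_first_result = first[1] in second[2]
            if different_units and not reads_first_result:
                issued.append(queue.popleft())
        cycles.append(issued)
    return cycles
```

An integer instruction followed by an independent floating-point instruction is dispatched as a pair. Two integer instructions in a row take two cycles because they compete for the same unit.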
Scalar Pipelining

Pipelining:

 Pipelining is the procedure of breaking down tasks into sub-steps &

• executing them within different processor parts.

 Pipelining architecture in the scalar processor and the superscalar


processor is shown in the next slides.
Scalar Pipelining Contd..
Super scalar Pipelining

 The instructions in a superscalar processor are issued from a


sequential instruction stream.

 The processor must be able to issue multiple instructions in each clock cycle, and

• the CPU must check dynamically for data dependencies between instructions.
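The dependency check mentioned above can be illustrated with a small function that compares the registers two instructions read and write. The labels RAW, WAR and WAW (read-after-write, write-after-read, write-after-write) are standard terminology that the slides do not use, and the `(dest, sources)` instruction format is a hypothetical simplification.

```python
def find_hazards(earlier, later):
    """Return the data hazards that 'later' has on 'earlier'.

    Each instruction is a (dest, sources) pair of register names.
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    hazards = []
    if e_dest is not None and e_dest in l_srcs:
        hazards.append("RAW")   # later reads what earlier writes
    if l_dest is not None and l_dest in e_srcs:
        hazards.append("WAR")   # later overwrites what earlier still reads
    if e_dest is not None and e_dest == l_dest:
        hazards.append("WAW")   # both write the same register
    return hazards
```

An empty result means the two instructions are independent and can be issued together. A superscalar processor performs an equivalent comparison in hardware at run time.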
Super scalar Pipelining Contd..

 In the following superscalar pipeline,

• two instructions can be fetched and dispatched at a time to

❑ complete a maximum of 2 instructions per cycle.


Super scalar Pipelining
Super scalar Pipelining Contd..

 A scalar processor issues a single instruction per clock cycle and

• performs a single instance of each pipeline stage per clock cycle,

 whereas a superscalar processor issues two instructions per clock cycle in the previous example and

• executes two instances of each stage in parallel.


Super scalar Pipelining Contd..

 So, executing the same sequence of instructions takes more time on a scalar processor

• than on a superscalar processor.
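The time difference can be estimated with the usual idealized pipeline formula: the first group of instructions needs one cycle per stage, and each later group completes one cycle after the previous one. The sketch assumes no stalls or hazards. The function name and the four-stage pipeline in the example are assumptions, since the slide diagrams did not survive in this text.

```python
import math


def pipeline_cycles(n_instructions: int, stages: int, issue_width: int = 1) -> int:
    """Cycles to finish n instructions on an ideal pipeline with no stalls."""
    if n_instructions == 0:
        return 0
    # Instructions enter the pipeline in groups of issue_width per cycle.
    groups = math.ceil(n_instructions / issue_width)
    return stages + groups - 1


if __name__ == "__main__":
    print("scalar:     ", pipeline_cycles(6, stages=4, issue_width=1), "cycles")
    print("superscalar:", pipeline_cycles(6, stages=4, issue_width=2), "cycles")
```

For six instructions on an assumed four-stage pipeline, the scalar case takes 9 cycles and the two-wide superscalar case takes 6.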
Types of Superscalar Processors

 Some of the different types of superscalar processors are as


follows:

• Intel Pentium Processor

• IBM PowerPC 601


Types of Superscalar Processors
(Intel Pentium Processor)
Intel Pentium Processor:

 In the superscalar pipelined architecture of the Intel Pentium processor,

• the CPU can execute up to two instructions for each cycle.

 This processor is widely used in personal computers.


Types of Superscalar Processors
(Intel Pentium Processor)
Contd..
 Intel Pentium processor devices are normally built for

• Online use,

• Cloud computing,

• & Collaboration.

 So this processor works well in tablets and Chromebooks to

• provide strong local performance & efficient online


interactions.
Types of Superscalar Processors
(IBM PowerPC 601)
IBM PowerPC 601:

 The IBM PowerPC 601 is a superscalar processor from the PowerPC family of RISC microprocessors.

 This processor is capable of issuing as well as retiring three instructions for each clock.
Types of Superscalar Processors
(IBM PowerPC 601)
Contd..
 Instructions may be issued out of order for improved performance,

• but the PowerPC 601 makes the execution appear in order.


Types of Superscalar Processors
(IBM PowerPC 601)
Contd..
 The PowerPC 601 processor provides

• 32-bit logical addresses,

• 16- & 32-bit integer data types,

• 32- & 64-bit floating-point data types.


Characteristics of Super scalar Processor

Superscalar processor characteristics include the following:

 A superscalar processor builds on the pipelined model,

• where independent instructions are executed without waiting for one another.

 A superscalar processor fetches & decodes

• several instructions of the incoming instruction stream at a time.


Characteristics of Super scalar Processor
Contd..
 The architecture of superscalar processor exploits

• the potential of instruction-level parallelism.

 Scalar processors issue a single instruction for every cycle.

 In a superscalar processor, the number of instructions issued per cycle mainly depends on

• the instructions within the instruction stream.


Characteristics of Super scalar Processor
Contd..
 Instructions are frequently reordered to fit the architecture of
the processor better.

 The superscalar method is usually associated with some


identifying characteristics.

 Instructions are normally issued from a sequential instruction


stream.
Characteristics of Super scalar Processor
Contd..
 The CPU checks dynamically for data dependencies in between
instructions at run time.

 The CPU executes multiple instructions for each clock cycle.


Disadvantages of Superscalar Processor

Disadvantages of the superscalar processor include the following:

 Superscalar processors are not used much in small embedded


systems due to power usage.

 Instruction scheduling problems can occur in this architecture.

 A superscalar design increases the complexity of the hardware design.
Disadvantages of Superscalar Processor
Contd..
 The instructions in this processor are simply fetched based on
their sequential program order

• but this is not always the best execution order.


Applications of Superscalar Processor

Applications of a superscalar processor include the following:

 The superscalar execution is frequently used in a laptop or


desktop.

 This processor simply scans the program in execution to

• discover sets of instructions that can be executed together.


Applications of Superscalar Processor
Contd..
 A superscalar processor includes multiple copies of the data path hardware,

• which execute several instructions at once.

 This processor is mainly designed to achieve an execution rate of more than one instruction for

• each clock cycle of a single sequential program.


Introduction to VLIW Architecture

 The limitations of the superscalar processor become prominent

• as the task of scheduling instructions becomes complex.


Introduction to VLIW Architecture
Contd..
 The issues of

• intrinsic parallelism in the instruction stream,

• complexity,

• cost,

• and branch instructions

 are addressed by an instruction set architecture called Very Long Instruction Word (VLIW), used in VLIW machines.
Introduction to VLIW Architecture
Contd..
 VLIW uses Instruction Level Parallelism,

• i.e., the program itself controls the parallel execution of the instructions.
Introduction to VLIW Architecture
Contd..
 In other architectures, the performance of the processor is improved by using one or more of the following methods:

• pipelining (break the instruction into subparts),

• parallel processing (independently execute the instructions in different parts of the processor),

• out-of-order execution (execute instructions in a different order from the program).
Introduction to VLIW Architecture
Contd..
 But each of the previous methods adds a great deal of complexity to the hardware.

 VLIW architecture deals with this by depending on the compiler.

 The compiled program decides the parallel flow of the instructions and resolves conflicts.

 This increases compiler complexity but decreases hardware complexity by a lot.
Features of VLIW Architecture

Features:

 The processors in this architecture have multiple functional units and

• fetch Very Long Instruction Words from the instruction cache.

 Multiple independent operations are grouped together in a single VLIW Instruction.

 They are initiated in the same clock cycle.

 Each operation is assigned an independent functional unit.


Features of VLIW Architecture
Contd..
 All the functional units share a common register file.

 Instruction words are typically of the length 64 to 1024 bits


depending on

• the number of execution units,

• and the code length required to control each unit.


Features of VLIW Architecture
Contd..
 Instruction scheduling and parallel dispatch of the word are done statically by the compiler.

 The compiler checks for dependencies before scheduling


parallel execution of the instructions.
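The static scheduling described above can be sketched as a greedy packer that plays the role of the compiler. This is a minimal illustration, not a real VLIW compiler pass. It assumes one slot per functional unit in each instruction word, single-cycle operations, registers that are each written only once, and tracking of read-after-write dependencies only. The unit names and the `(name, unit, dest, sources)` format are hypothetical.

```python
def schedule_vliw(ops, units=("alu", "mul", "mem")):
    """Greedily pack operations into VLIW bundles, one slot per functional unit.

    Each op is (name, unit, dest, sources). Only read-after-write
    dependencies are tracked; every dest is assumed to be written once.
    """
    bundles = []     # each bundle maps unit -> op name (missing slot = NOP)
    ready_at = {}    # register -> first bundle index where it can be read
    for name, unit, dest, sources in ops:
        if unit not in units:
            raise ValueError(f"unknown functional unit: {unit}")
        earliest = max((ready_at.get(s, 0) for s in sources), default=0)
        index = earliest
        while index < len(bundles) and unit in bundles[index]:
            index += 1               # slot already taken, try the next bundle
        while len(bundles) <= index:
            bundles.append({})
        bundles[index][unit] = name
        ready_at[dest] = index + 1   # result is usable from the next bundle on
    return [[b.get(u, "nop") for u in units] for b in bundles]
```

Each returned row is one instruction word. Slots the scheduler could not fill appear as `"nop"`, which is how unfilled slots end up consuming memory space and instruction bandwidth.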
Applications of VLIW Architecture

Some common applications of VLIW architecture include:

 Digital Signal Processing

 Multimedia Processing

 Scientific Computing

 Embedded Systems
Applications of VLIW Architecture
(Digital Signal Processing)
Digital Signal Processing (DSP):

 VLIW processors are well-suited for DSP applications because of

• their ability to perform multiple operations in parallel.

 DSP applications require high computational power

• and often involve multiple parallel data streams,

❑ which VLIW processors can handle efficiently.


Applications of VLIW Architecture
(Multimedia Processing)
Multimedia Processing:

 VLIW processors are also used for multimedia applications such


as video and audio processing,

• where high throughput and parallelism are required.


Applications of VLIW Architecture
(Scientific Computing)
Scientific Computing:

 VLIW processors can be used for scientific computing


applications,

• where high-performance computing is required to solve


complex numerical problems.
Applications of VLIW Architecture
(Embedded Systems)
Embedded Systems:

 VLIW processors are used in many embedded systems, such as

• Automotive control systems,

• Medical devices,

• and Industrial automation equipment.


Applications of VLIW Architecture
(Embedded Systems) Contd..

 These systems require high-performance processors

• that can execute multiple instructions in parallel while


consuming minimal power.
Advantages of VLIW Architecture

Advantages:

 Reduces hardware complexity.

 Reduces power consumption because of reduction of hardware


complexity.
Advantages of VLIW Architecture
Contd..
 Since the compiler takes care of

• Data dependency checks,

• Decoding,

• Instruction issue,

the control hardware becomes a lot simpler.


Advantages of VLIW Architecture
Contd..
 Increases potential clock rate.

 The compiler places each operation in the slot of the instruction

packet that corresponds to its functional unit.
Disadvantages of VLIW Architecture

Disadvantages:

 Complex compilers are required which are hard to design.

 Increased program code size.


Disadvantages of VLIW Architecture
Contd..
 Unscheduled events,

• for example a cache miss, can stall the entire processor.

 When some operation slots in a VLIW cannot be filled,

• memory space and instruction bandwidth are wasted.
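
To make the cost of unfilled slots concrete, the sketch below counts how many slots of an encoded program hold no useful operation. The function name and the sample bundle sizes are hypothetical, and every word is assumed to have the same number of slots.

```python
def wasted_slots(bundle_sizes, width):
    """Return (number of empty slots, fraction of all slots that are empty)."""
    total = len(bundle_sizes) * width
    empty = total - sum(bundle_sizes)
    return empty, empty / total


# Hypothetical program: four words on a 4-wide machine holding 4, 1, 2 and 1 operations.
empty, fraction = wasted_slots([4, 1, 2, 1], width=4)
print(empty, fraction)
```

In this made-up case half of the fetched instruction bits carry no work, which is the waste the slide refers to.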
Vector Processor

 Vector processor is basically a central processing unit

• that has the ability to execute the complete vector input


in a single instruction.

 More specifically, it is a complete unit of hardware resources

• that processes a sequential set of similar data items in

memory using a single instruction.
Vector Processor Contd..

 Elements of the vector are stored in order at successive

memory addresses.

• This is why the data is accessed sequentially.

 It holds a single control unit but has multiple execution units

• that perform the same operation on different data


elements of the vector.
Vector Processor Contd..

 Unlike scalar processors, which operate on only a single pair of

data, a vector processor operates on multiple pairs of data.

 However, one can convert a scalar code into vector code.

 This conversion process is known as vectorization.

 Vector processing allows operation on multiple data elements

with the help of a single instruction.
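
The difference between scalar code and its vectorized form can be modelled in a few lines. This is only a software model: the function names are hypothetical, and the "instruction count" is a counter kept by the sketch, not a measurement of real hardware.

```python
def scalar_add(a, b):
    """Scalar code: one add instruction per pair of elements."""
    result, instructions = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def vector_add(a, b):
    """Vectorized code: one modelled vector instruction covers all pairs."""
    return [x + y for x, y in zip(a, b)], 1


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
scalar_result, scalar_count = scalar_add(a, b)
vector_result, vector_count = vector_add(a, b)
print(scalar_result == vector_result, scalar_count, vector_count)
```

Both versions produce the same result; the vector form simply expresses the whole operation as one instruction instead of one per element.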
Vector Processor Contd..

 These instructions are said to be single instruction multiple


data or vector instructions.

 Recent CPUs make use of vector processing because

it offers advantages over scalar processing.
Architecture and Working

 The figure below represents the typical diagram showing vector


processing by a vector computer:
[Figure: block diagram of vector processing in a vector computer]
Architecture and Working
Contd..

 The functional units of a vector computer are as follows:

• IPU or Instruction Processing Unit

• Vector Register

• Scalar Register

• Scalar Processor
Architecture and Working
Contd..

• Vector Instruction Controller

• Vector Access Controller

• Vector Processor
Architecture and Working
Contd..

 Because a vector computer has several functional pipelines, it can

execute an instruction over many operands at the same time.

 Both data and instructions are present in the memory at the


desired memory location.

 So, the instruction processing unit i.e., IPU fetches the


instruction from the memory.
Architecture and Working
Contd..

 Once the instruction is fetched,

• the IPU determines whether it is scalar or vector in nature.

 If it is scalar in nature, then

• the instruction is transferred to the scalar register

• and then further scalar processing is performed.


Architecture and Working
Contd..

 When the instruction is vector in nature,

• it is fed to the vector instruction controller.

 This vector instruction controller first decodes the vector


instruction

• then accordingly determines the address of the vector


operand present in the memory.
Architecture and Working
Contd..

 Then it gives a signal to the vector access controller about

• the demand of the respective operand.

 This vector access controller then fetches the desired operand


from the memory.

 Once the operand is fetched, it is placed in the

vector register

• so that it can be processed by the vector processor.
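
The routing steps above can be sketched as a single function. This is a toy model, not a description of real hardware: the dictionary keys (`kind`, `address`, `length`), the function name and the list used as memory are all assumptions made for illustration.

```python
def dispatch(instruction, memory):
    """Route one fetched instruction to the scalar or the vector path."""
    if instruction["kind"] == "scalar":
        # Scalar path: handed to the scalar register and scalar processor.
        return "scalar processor", None
    # Vector path: decoding yields the operand address (vector instruction
    # controller), then the operand is fetched (vector access controller).
    start, length = instruction["address"], instruction["length"]
    operand = memory[start:start + length]
    return "vector processor", operand


memory = [5, 6, 7, 8, 9, 10]
print(dispatch({"kind": "scalar"}, memory))
print(dispatch({"kind": "vector", "address": 1, "length": 3}, memory))
```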


Architecture and Working
Contd..

 At times, when multiple vector instructions are present,

• then the vector instruction controller provides the


multiple vector instructions to the task system.

 If the task system shows that a vector task is very long,

• the processor divides the task into sub-vectors.


Architecture and Working
Contd..

 These sub-vectors are fed to the vector processor

• that makes use of several pipelines

❑ in order to execute the instruction over the operand


fetched from the memory at the same time.

 The various vector instructions are scheduled by the vector


instruction controller.
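
Dividing a long vector task into sub-vectors can be sketched as follows. The function names and the maximum sub-vector length of 4 are hypothetical; a real machine would use its own hardware vector length.

```python
def split_into_subvectors(vector, max_length):
    """Cut a long vector into consecutive pieces of at most max_length elements."""
    return [vector[i:i + max_length] for i in range(0, len(vector), max_length)]


def long_vector_add(a, b, max_length):
    """Add two long vectors one sub-vector at a time."""
    result = []
    pieces = zip(split_into_subvectors(a, max_length),
                 split_into_subvectors(b, max_length))
    for sub_a, sub_b in pieces:
        # Each pass models one vector instruction on one sub-vector.
        result.extend(x + y for x, y in zip(sub_a, sub_b))
    return result


data = list(range(10))
print(split_into_subvectors(data, 4))
print(long_vector_add(data, data, 4))
```

The last sub-vector may be shorter than the others, which is why the split works on slices rather than assuming the length is a multiple of the maximum.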
Classification of Vector Processor

 The classification of vector processor relies on

• the ability of vector formation,

• as well as the presence of vector instruction for processing.

 So, depending on these criteria, vector processor architecture is


classified as follows:

• Register to Register Architecture

• Memory to Memory Architecture


Classification of Vector Processor
Contd..
Register to Register Architecture

 This architecture is highly used in vector computers.

 In this architecture, operands and previous results

• are fetched from main memory indirectly, through

registers.
Register to Register Architecture
Contd..
 Several vector pipelines present in the vector computer help in

• retrieving data from the registers,

• and also storing the results in the register.

 These vector registers are programmable through user instructions.


Register to Register Architecture
Contd..
 This means that according to the register address present in the
instruction,

• the data is fetched and stored in the desired register.

 These vector registers have a fixed length,

• like the registers in a normal processing unit.


Register to Register Architecture
Contd..
 Some examples of supercomputers using the register to register
architecture are the following:

• Cray-1 (Cray Research) and the Fujitsu VP series


Memory to Memory Architecture

 In memory to memory architecture,

• the operands and the results are fetched directly from

memory instead of going through registers.

 However, it is to be noted here that the address of the desired


data to be accessed

• must be present in the vector instruction.


Memory to Memory Architecture
Contd..
 This architecture enables the fetching of data of size 512 bits
from memory to pipeline.

 However, due to the high memory access time,

• the pipelines of the vector computer require a higher startup

time,

❑ as more time is needed to initiate the vector

instruction.
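
The effect of startup time can be illustrated with a simple timing model: total time is the startup cost plus one step per element. The model, the function name and all cycle counts below are assumptions made for illustration; they are not measurements of any real machine.

```python
def vector_time(startup_cycles, n_elements, cycles_per_element=1):
    """Cycles to finish one vector instruction under a linear timing model."""
    return startup_cycles + n_elements * cycles_per_element


# Hypothetical startup costs for a 64-element vector.
memory_to_memory = vector_time(startup_cycles=50, n_elements=64)
register_to_register = vector_time(startup_cycles=10, n_elements=64)
print(memory_to_memory, register_to_register)
```

Under this model the startup cost is paid once per vector instruction, so it matters most for short vectors and is amortized over long ones.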
Memory to Memory Architecture
Contd..
 Some examples of supercomputers that use the memory to
memory architecture are the following:

• Cyber 205, developed by CDC


Characteristics of Vector Processor

Characteristics of Vector Processor:

 Vector Processors are designed to process multiple data elements


in parallel,

• while Scalar Processors process one data element at a time.

 Vector Processors can be more efficient,

• as they can complete a given task with fewer instructions


than a Scalar Processor.
Characteristics of Vector Processor
Contd..
 Vector Processors are more complex than Scalar Processors,

• and require more memory as well as power to operate.

 Vector Processors are used for more demanding tasks, such as

• scientific calculations,

• 3D game rendering.
Characteristics of Vector Processor
Contd..
 Scalar Processors, by contrast, are used for simpler tasks, such as

• Basic calculations,

• and Web browsing.

 Vector Processors are more suitable for data-intensive


applications,

• while Scalar Processors are better suited for applications


that require fewer calculations.
Characteristics of Vector Processor
Contd..
 Vector Processors can be more expensive than Scalar Processors,

• as they require more complex hardware and software.

 Register to register architecture is better than memory to


memory architecture

• because it offers a reduction in vector access time.


Advantages of Vector Processor

 Better performance

 Highly parallel

 High memory bandwidth

 Reduced software overhead

 Improved accuracy
Applications of Vector Processor

 Computer Aided Design

 Image Processing

 Virtual Reality

 Scientific Computing

 Artificial Intelligence

 Data Analysis
