Parallel Computing Platforms and Memory System Performance
John Mellor-Crummey
Department of Computer Science Rice University johnmc@cs.rice.edu
COMP 422
Lecture 9
8-10 February 2011
Topics for Today
SIMD and MIMD control structure
Communication models for parallel platforms
Memory hierarchy and performance
Parallel Computing Platforms
A parallel computing platform must specify
concurrency = control structure
interaction between concurrent tasks = communication model
Control Structure of Parallel Platforms
Parallelism ranges from instructions to processes
Processor control structure alternatives
operate under the centralized control of a single control unit, or work independently
SIMD
Single Instruction stream
single control unit dispatches the same instruction to processors
Multiple Data streams
processors work on their own data
MIMD
Multiple Instruction streams
each processor has its own control unit
each processor can execute different instructions
Multiple Data streams
processors work on their own data
SIMD and MIMD Processors
SIMD architecture
MIMD architecture
PE = Processing Element
SIMD Control
SIMD relies on the regular structure of computations
media processing, scientific kernels (e.g. linear algebra, FFT)
Activity mask
per PE predicated execution: turn off operations on certain PEs
each PE tests own conditional and sets own activity mask
PE can conditionally perform operation predicated on mask value
Executing a Conditional on SIMD Processors
if (B == 0) then C = A else C = A/B
[Figure: per-PE execution of the conditional — each PE holds its initial values of A and B; PEs whose activity mask is set execute the then branch, the remaining PEs execute the else branch]
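A minimal sketch of the same predicated execution using SSE intrinsics (the 4-wide float layout and function name are illustrative; both outcomes are computed and the mask selects the result per lane):

#include <xmmintrin.h>   /* SSE intrinsics */

/* Sketch: branchless evaluation of  if (B == 0) C = A; else C = A / B;
   assumes n is a multiple of 4 and default (non-trapping) FP behavior. */
void cond_divide(const float *A, const float *B, float *C, int n)
{
    __m128 zero = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4) {
        __m128 a    = _mm_loadu_ps(&A[i]);
        __m128 b    = _mm_loadu_ps(&B[i]);
        __m128 mask = _mm_cmpeq_ps(b, zero);   /* all-ones lanes where B == 0 */
        __m128 quot = _mm_div_ps(a, b);        /* computed in every lane; lanes with
                                                  B == 0 are discarded by the mask */
        __m128 c    = _mm_or_ps(_mm_and_ps(mask, a),        /* take A where B == 0 */
                                _mm_andnot_ps(mask, quot)); /* take A/B elsewhere   */
        _mm_storeu_ps(&C[i], c);
    }
}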
SIMD Examples
Many early parallel computers
Illiac IV, MPP, DAP, Connection Machine CM-1/2, and MasPar MP-1/2
Today
vector units: SSE, SSE2, Altivec (Velocity Engine, VMX)
128-bit vector registers
16 8-bit chars, 8 16-bit short ints, 4 32-bit ints, or 4 32-bit floats
SSE2 also operates on 2 64-bit double-precision values
co-processors: ClearSpeed array processor (control PE + array of 96 PEs)
http://www.clearspeed.com
nVidia G80 GPGPU
SSE/SSE2 as examples of SIMD vector units
Scalar processing
traditional mode: one operation produces one result
SIMD vector units
with SSE/SSE2, one operation produces multiple results
[Figure: a scalar add computes one result X + Y per operation; an SSE vector add on x0..x3 and y0..y3 computes x0+y0, x1+y1, x2+y2, x3+y3 in a single operation]
Slide Credit: Alex Klimovitski & Dean Macri, Intel Corporation
SSE / SSE2 SIMD Processing
SSE2 data types: anything that fits into 16 bytes, e.g.,
4x floats, 2x doubles, 16x bytes
Instructions operate in parallel on data in this 16 byte register
add, multiply etc.
Data bytes must be contiguous in memory and aligned
Additional instructions needed for
masking data
moving data from one part of a register to another
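A minimal SSE sketch of such a 16-byte register operation, a 4-wide float add (alignment of the arrays and n being a multiple of 4 are assumed and left to the caller):

#include <xmmintrin.h>   /* SSE intrinsics */

/* Sketch: z[i] = x[i] + y[i], four floats at a time.
   x, y, z are assumed 16-byte aligned (e.g. from posix_memalign). */
void vec_add(const float *x, const float *y, float *z, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 xv = _mm_load_ps(&x[i]);              /* aligned 16-byte load */
        __m128 yv = _mm_load_ps(&y[i]);
        _mm_store_ps(&z[i], _mm_add_ps(xv, yv));     /* one op, four results */
    }
}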
SIMD: ClearSpeed MTAP Co-processor
MTAP processor
Features
hardware multi-threading
asynchronous, overlapped I/O
extensible instruction set
SIMD core
poly controller
poly execution unit
array of 96 PEs
64- and 32-bit floating point
250 MHz (key to low power)
128 million transistors
low power: ~10 Watts
Figure credit:http://www.clearspeed.com/images/arch_mtap.jpg
The Subtlety of Short Vectors
Consider the following:
Stream alignment conflict!
A solution: data layout transformation
Figure credit: P. Sadayappan. See Henretty et al. [CC11]
Dimension-lifted Transformation
(a) 1D array in memory
(b) 2D view of same array
(c) Transposed 2D array brings non-interacting elements into contiguous vectors
(d) New 1D layout after transformation
Figure credit: P. Sadayappan. See Henretty et al. [CC11]
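A rough sketch of the idea in (b)-(d), under the assumption that the 1D array is viewed as V rows of length n/V, with n a multiple of the vector width V (boundary handling omitted):

/* Sketch: dimension-lifted transposition. After the copy, out[k*V .. k*V+V-1]
   holds the original elements k, k + n/V, ..., k + (V-1)*n/V, which do not
   interact in a nearest-neighbor stencil and can be loaded as one vector. */
#define V 4   /* vector width in elements, e.g. 4 floats for SSE */

void dimension_lift(const float *in, float *out, int n)
{
    int cols = n / V;                            /* length of each of the V rows */
    for (int k = 0; k < cols; k++)
        for (int j = 0; j < V; j++)
            out[k * V + j] = in[j * cols + k];   /* transpose the V x (n/V) view */
}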
MIMD Processors
Execute different programs on different processors
Platforms include current generation systems
shared memory
multicore laptop
workstation with multiple quad-core processors
SGI Altix UV (up to 32K sockets, each with an 8-core processor)
Legacy: Cray X1 (up to 8K processors)
distributed memory
clusters (e.g. sugar.rice.edu, stic.rice.edu, ada.rice.edu)
Cray XT, IBM Blue Gene
SPMD programming paradigm
Single Program, Multiple Data streams
same program on different PEs, behavior conditional on thread id (see the MPI sketch below)
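A minimal SPMD sketch using MPI (illustrative only): every process runs this same program, and behavior is conditional on the process rank.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    if (rank == 0)
        printf("rank 0 of %d: coordinating\n", size);
    else
        printf("rank %d: working on my part of the data\n", rank);
    MPI_Finalize();
    return 0;
}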
SIMD vs. MIMD
SIMD platforms
special purpose: not well-suited for all applications
custom designed with long design cycles
less hardware: single control unit
need less memory: only 1 copy of program
today: SIMD common only for accelerators and vector units
MIMD platforms
suitable for broad range of applications
inexpensive: off-the-shelf components + short design cycle
need more memory: program and OS on each processor
Communication Models for Parallel Platforms
Two primary forms of data exchange between parallel tasks
accessing a shared data space
exchanging messages
Platforms that provide a shared data space =
shared-memory platforms
AKA multiprocessors
Platforms that support messaging =
message-passing platforms
AKA multicomputers
Shared Memory Platforms
Components
set of processors
part (or all) of memory is accessible to all processors
Processor interactions
modify data objects stored in shared memory
Flavors of shared address space platforms
UMA: uniform memory access
time taken by a processor to access any memory word is identical
NUMA: non-uniform memory access
time taken by a processor to access a memory word may vary with the word's location
NUMA and UMA Platforms
UMA shared address space platform with cache (Sequent Symmetry, 1988)
NUMA shared address space platform (BBN Butterfly, 1988)
UMA shared address space platform (BBN Monarch, 1990)
SGI Origin 2000: NUMA Platform (1997)
J. Laudon and D. Lenoski. The SGI Origin: a ccNUMA highly scalable server. Proc. of the 24th Annual Intl. Symp. on Computer Architecture, Denver, 241-251, 1997.
Contemporary NUMA Multiprocessor
Figure credit: AMD (http://bit.ly/dT7Zxd)
SGI Origin Network Topology
Example: bristled hypercube for 64 processors
Figure credit: http://web.cecs.pdx.edu/~alaa/ece588/papers/laudon_isca_1997.pdf
NUMA and UMA Platforms
NUMA and UMA platforms differ with respect to locality
algorithms must exploit locality for performance on NUMA platforms
Programming these platforms
easy communication: reads and writes are implicitly visible to other processors
tricky coordination: read-write operations on shared data must be coordinated
e.g. Pthreads mutexes, OpenMP critical sections (see the mutex sketch at the end of this slide)
Cache coherent vs. non cache-coherent architectures
non-cache coherent shared-address space architectures
provides an address map, but not coordinated sharing
processors must explicitly flush data to memory before sharing
examples: BBN Butterfly, Cray T3E
cache coherent architectures: caches coordinate access to multiple copies
hardware support to keep copies consistent
up-to-date values can be retrieved from cache by remote processors
examples: SGI Origin, SGI Altix
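A minimal sketch of the coordination issue above, assuming a Pthreads shared-memory program: loads and stores to shared_counter are implicitly visible to all threads, but the read-modify-write must be coordinated.

#include <pthread.h>

static long shared_counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);    /* coordinate the read-modify-write  */
        shared_counter++;             /* the update itself is just a load,
                                         an add, and a store to shared data */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}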
Shared Memory vs. Shared Address Space
Shared memory
access shared data with load/store
Shared address space
name shared data with a global address
can't necessarily access all shared data with load/store
e.g. Cray T3E: GET/PUT
Can provide shared address space abstraction on distributed memory multicomputers using software
e.g. Unified Parallel C, Co-array Fortran
Message-Passing Multicomputers
Components
set of processors
each processor has its own exclusive memory
Examples
clustered workstations
non-shared-address-space multicomputers
Cray XT, IBM Blue Gene, many others
Communication model
exchange data using send and receive primitives
Common message passing library: MPI
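A minimal sketch of the send/receive model with MPI (the message contents and tag are illustrative): rank 0 sends a value that rank 1 receives into its own, separate memory.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double x = 3.14;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&x, 1, MPI_DOUBLE, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_DOUBLE, /*source=*/0, /*tag=*/0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %g\n", x);
    }
    MPI_Finalize();
    return 0;
}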
Important Aspects of Communication
Latency: How long does a single operation take?
measured in microseconds
Bandwidth: What data rate can be sustained?
measured in Mbytes/second
These terms can be applied to
memory access
messaging
Bandwidth vs. Latency in a Pipeline
Dave Patterson's laundry example: 4 people doing laundry
wash (30 min) + dry (40 min) + fold (20 min) = 90 min per load
[Figure: pipelined laundry timeline from 6 PM to ~9:30 PM — loads A, B, C, D overlap their wash (30 min), dry (40 min), and fold (20 min) stages]
In this example:
sequential execution: 4 * 90 min = 6 hours
pipelined execution: 30 + 4 * 40 + 20 min = 3.5 hours
Bandwidth = loads/hour
BW = 4/6 loads/hour without pipelining
BW = 4/3.5 loads/hour with pipelining
BW <= 1.5 loads/hour with pipelining and more total loads
Pipelining helps bandwidth but not latency (90 min per load)
Bandwidth is limited by the slowest pipeline stage
Potential speedup = number of pipeline stages
Memory Systems and Performance
Consider architecture model X
a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns, no cache
Assume that the processor can execute 1 FLOP per cycle
FLOP = floating point operation
peak performance = 1 GFLOPS
Consider adding two vectors on this architecture
each FLOP requires two data accesses
peak speed of this computation:
one addition (i.e. 1 FLOP) every 200 ns = speed of 5 MFLOPS
achieves small fraction of the peak processor performance
Memory system is the rate limiter for performance
A Modern Memory Hierarchy (Itanium 2)
[Figure: Itanium 2 memory hierarchy — processor (with FP unit and TLBs), L1 I-cache and D-cache, write buffers, L2 cache, L3 cache, memory controller, and memory banks]
L1: 16 KB I + 16 KB D, 1 cycle
L2: 256 KB, 5 cycles (6 FP cycles)
L3: 3 MB, 13.3 cycles (13.1 FP cycles)
memory: 209.6 ns
http://www.devx.com/Intel/Article/20521
Consider the Impact of Adding a Cache
Add a cache of size 32 KB with a latency of 1 ns (one cycle)
Use this setup to multiply n x n matrices: C = A x B
let n = 32: carefully chosen so all matrices fit in cache
Observations
fetching the two input matrices into the cache: 2n² words
fetch 2K words: 100 ns x 2K = ~200 µs
multiplying two n x n matrices takes 2n³ operations
64K operations total: 64K cycles = ~64 µs
total time for the computation is therefore approximately the sum of
time = time for load/store operations + time for the computation
time = 200 µs + 64 µs = 264 µs
peak computation rate = (64K FLOPs) / (264 µs) ≈ 248 MFLOPS
Memory Bandwidth
Limited by both
the bandwidth of the memory bus
the bandwidth of the memory modules
Can be improved by increasing the size of memory blocks
Memory system takes l time units to deliver b units of data
l is the latency of the system b is the block size
Consider the Impact of Non-unit Block Size
Again, consider architecture model X, except
block size is 4 words instead of 1 word
Repeat the vector-add computation analysis
assume vectors are laid out contiguously in memory
a single memory access fetches four consecutive words
four additions can be performed every 200 cycles (= 200 ns)
performance = 1 FLOP / 50 ns; peak speed = 20 MFLOPS
Reusing Data in the Memory Hierarchy
Spatial reuse: using more than one word in a multi-word line
using multiple words in a cache line
Temporal reuse: using a word repeatedly
accessing the same word in a cache line more than once
Applies at every level of the memory hierarchy
e.g. TLB
spatial reuse: access multiple cache lines in a page
temporal reuse: access data on the same page repeatedly
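A minimal sketch of one common way to create such reuse, loop tiling for matrix multiply (the block size BS is an illustrative choice, picked so three BS x BS tiles fit in cache; C is assumed zero-initialized):

#define BS 32   /* illustrative tile size */

void matmul_tiled(int n, const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < n; ii += BS)
      for (int jj = 0; jj < n; jj += BS)
        for (int kk = 0; kk < n; kk += BS)
          /* one BS x BS block of work: these tiles stay cache-resident */
          for (int i = ii; i < ii + BS && i < n; i++)
            for (int k = kk; k < kk + BS && k < n; k++) {
              double aik = A[i * n + k];               /* temporal reuse of A      */
              for (int j = jj; j < jj + BS && j < n; j++)
                C[i * n + j] += aik * B[k * n + j];    /* spatial reuse of B and C */
            }
}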
Experimental Study of Memory (membench)
Microbenchmark for memory system performance
for array A of length L from 4 KB to 8 MB by 2x
for stride s from 4 bytes (1 word) to L/2 by 2x
time the following loop (repeat many times and average)
for i from 0 to L by s
load A[i] from memory (4 bytes)
each timed loop for one (L, s) pair is 1 experiment
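A rough C sketch of the same microbenchmark (array size, strides, and the repetition count are illustrative; a real harness would also subtract loop overhead):

#include <stdio.h>
#include <time.h>

#define MAXLEN (8 * 1024 * 1024 / sizeof(int))   /* 8 MB of 4-byte words */
static int A[MAXLEN];

/* Average nanoseconds per load for one (length, stride) experiment.
   len and stride are in array elements (the slide's 4-byte words). */
double time_strided(size_t len, size_t stride, int reps)
{
    volatile int sink = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < len; i += stride)
            sink += A[i];                            /* one load per iteration */
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    size_t loads = (size_t)reps * ((len + stride - 1) / stride);
    (void)sink;
    return 1e9 * secs / loads;                       /* average ns per load */
}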
Membench: What to Expect
[Figure: expected profile of average cost per access vs. stride s — a plateau at the cache hit time while the total size is smaller than L1, rising toward the memory time when the size exceeds L1]
Consider the average cost per load
plot one line for each array length, time vs. stride
unit stride is best: if a cache line holds 4 words, at most 1/4 of accesses miss
if the array is smaller than a cache, all accesses will hit after the first run
time for the first run is negligible with enough repetitions
the figure above assumes only one level of cache
the performance profile is more complicated on modern systems
Memory Hierarchy on a Sun Ultra-2i
Sun Ultra-2i, 333 MHz
[Figure: measured average access time vs. stride, one curve per array length]
Mem: 396 ns (132 cycles)
L2: 2 MB, 12 cycles (36 ns), 64-byte line
L1: 16 KB, 2 cycles (6 ns), 16-byte line
8 KB pages, 32 TLB entries
See www.cs.berkeley.edu/~yelick/arvindk/t3d-isca95.ps for details
Memory Hierarchy on a Pentium III
Katmai processor on Millennium, 550 MHz
[Figure: measured average access time vs. stride, one curve per array size]
L2: 512 KB, 60 ns
L1: 64 KB, 5 ns, 4-way?
L1 line: 32 bytes?
Memory System Performance: Summary
Examples here illustrate the following concepts:
exploiting spatial and temporal locality is critical for
amortizing memory latency
increasing effective memory bandwidth
ratio of the number of operations to the number of memory accesses
good indicator of anticipated tolerance of limited memory bandwidth
memory layout and computation organization significantly affect spatial and temporal locality
Multithreading for Latency Hiding
A thread is a single stream of control in the flow of a program.
We illustrate threads with a dense matrix-vector multiply:
for (i = 0; i < n; i++) c[i] = dot_product(get_row(a, i), b);
Each dot-product is independent of others
thus, can execute concurrently
Can rewrite the above code segment using threads
for (i = 0; i < n; i++) c[i] = create_thread(dot_product, get_row(a, i), b);
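A self-contained Pthreads sketch of the create_thread idea (one thread per row and the tiny problem size are illustrative; a real code would bound the number of threads):

#include <pthread.h>

enum { N = 4 };                       /* illustrative problem size */
static double a[N][N], b[N], c[N];

static void *dot_worker(void *p)
{
    long i = (long)p;                 /* row index passed as the argument      */
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        sum += a[i][j] * b[j];        /* each row's dot product is independent */
    c[i] = sum;
    return NULL;
}

int main(void)
{
    pthread_t t[N];
    for (long i = 0; i < N; i++)
        pthread_create(&t[i], NULL, dot_worker, (void *)i);
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    return 0;
}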
Multithreading for Latency Hiding (contd)
Consider how the code executes
first thread accesses a pair of vector elements and waits for them
second thread can access two other vector elements in the next cycle
...
After l units of time
(l is the latency of the memory system)
first thread gets its data from memory and performs its multiply-add
Next cycle
data items for the next function instance arrive
... Every clock cycle, we can perform a computation
Multithreading for Latency Hiding (contd)
Previous example makes two hardware assumptions
memory system can service multiple outstanding requests
processor is capable of switching threads at every cycle
Also requires the program to have explicit threaded concurrency
Machines such as the HEP, Tera, and Sun T2000 (Niagara-2) rely on multithreaded processors
can switch the context of execution in every cycle
are able to hide latency effectively
Sun T2000, 64-bit SPARC v9 processor @1200MHz
organization: 8 cores, 4 strands per core
8 KB data cache and 16 KB instruction cache per core
L2 cache: unified, 12-way, 3 MB
RAM: 32 GB
Prefetching for Latency Hiding
Misses on loads cause programs to stall; why not load data before it is needed?
by the time it is actually needed, it will be there!
Drawback: need space to store early loads
may overwrite other necessary data in the cache
if early loads are overwritten, we are little worse off than before!
Prefetching support
software only, e.g. Itanium 2
hardware and software, e.g. Opteron
Hardware prefetching requires
predictable access pattern
limited number of independent streams
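A minimal sketch of software prefetching with the GCC/Clang builtin (the prefetch distance DIST is an illustrative tuning parameter):

#define DIST 16   /* illustrative prefetch distance, in elements */

double sum_with_prefetch(const double *a, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + DIST < n)
            __builtin_prefetch(&a[i + DIST], /*rw=*/0, /*locality=*/1);  /* request early */
        s += a[i];                    /* by now a[i] is (hopefully) already in cache */
    }
    return s;
}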
Tradeoffs in Multithreading and Prefetching
Multithreaded systems
bandwidth requirements
may increase very significantly because of reduced cache per thread
can become bandwidth bound instead of latency bound
Multithreading and prefetching
only address latency
may often exacerbate bandwidth needs
have significantly larger data footprint; need hardware for that
References
Adapted from slides "Parallel Programming Platforms" by Ananth Grama accompanying the course textbook
Vivek Sarkar (Rice), COMP 422 slides from Spring 2008
Jack Dongarra (U. Tenn.), CS 594 slides from Spring 2008, http://www.cs.utk.edu/%7Edongarra/WEB-PAGES/cs594-2008.htm
Kathy Yelick (UC Berkeley), CS 267 slides from Spring 2007, http://www.eecs.berkeley.edu/~yelick/cs267_sp07/lectures
Tom Henretty, Kevin Stock, Louis-Noël Pouchet, Franz Franchetti, J. Ramanujam and P. Sadayappan. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures. In ETAPS Intl. Conf. on Compiler Construction (CC'2011), Springer Verlag, Saarbrücken, Germany, March 2011. To appear.