0% found this document useful (0 votes)

25 views9 pages

Bit HAcks

Uploaded by

SurajRGupta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

25 views9 pages

Bit HAcks

Uploaded by

SurajRGupta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

5/10/2020 The Aggregate Magic Algorithms

The Aggregate Magic Algorithms

There are lots of people and places that create and collect algorithms of all types (here are a few WWW
sites). Unfortunately, in building systems hardware and software, we in The Aggregate often have found it
necessary to do relatively obscure low-level things very efficiently. Many of the tricks we've devised or
collected either require assembly language coding or are not entirely portable when coded in HLLs like C,
but these techniques are still valuable because they can yield significant performance improvements over the
more obvious ways of doing things.

None of the following coding tricks came from proprietary sources; further, we believe that each of the tricks
we did not invent is essentially "standard engineering practice" in the specialized niche where it applies.
Thus, although we have not conducted patent searches, etc., to confirm it, we believe that these are tricks that
freely can be used for any purpose. Of course, The Aggregate accepts no responsibility for your use of these
tricks; you must confirm that the trick does what you want and that you can use it as you intend. That said,
we do intend to maintain this page by adding new algorithms and/or correcting existing entries. If you have
any comments, please contact Professor Hank Dietz.

This document should be cited using something like the following bibtex entry, but with the date fetched
from this site inserted:
@techreport{magicalgorithms,
author={Henry Gordon Dietz},
title={{The Aggregate Magic Algorithms}},
institution={University of Kentucky},
howpublished={Aggregate.Org online technical report},
URL={http://aggregate.org/MAGIC/}
}

Index of Algorithms
Absolute Value of a Float
Alignment of Pointers
Average of Integers
Bit Reversal
Comparison of Float Values
Comparison to Mask Conversion
Divide Rounding
Dual-Linked List with One Pointer Field
GPU Any
GPU SyncBlocks
Gray Code Conversion
Integer Constant Multiply
Integer Minimum or Maximum
Integer Power
Integer Selection
Is Power of 2
Leading Zero Count
Least Significant 1 Bit
Log2 of an Integer
Next Largest Power of 2
Most Significant 1 Bit
Natural Data Type Precision Conversions
Polynomials
aggregate.org/MAGIC/ 1/9
5/10/2020 The Aggregate Magic Algorithms

Population Count (Ones Count)

Same Within Tolerance
Shift-and-Add Optimization
Sign Extension
Swap Values Without a Temporary
SIMD Within A Register (SWAR) Operations
Trailing Zero Count

Absolute Value of a Float

IEEE floating point uses an explicit sign bit, so the absolute value can be taken by a bitwise AND with the
complement of the sign bit. For IA32 32-bit, the sign bit is an int value of 0x80000000, for IA32 64-bit, the
sign bit is the long long int value 0x8000000000000000. Of course, if you prefer to use just int values, the
IA32 64-bit sign bit is 0x80000000 at an int address offset of +1. For example:
double x;

/* make x = abs(x) */
*(((int *) &x) + 1) &= 0x7fffffff;

Alignment of Pointers

Alignment of pointers is a pretty common problem, and there are several relevant tricks, so, at the suggestion
of Jean-Charles Meyrignac (who also provided an example of the upward alignment described below), I've
added a little description here.

It is fairly obvious that the downward alignment of an address a to a multiple-of-b boundary, where b is a
power of 2, is simply (a & ~(b-1)). Of course, ~(b-1) is also -b, so (a & -b) also works (the difference is
usually nothing; if b is a constant, most compilers will fold the first into the second form).

For upward alignment, we simply add b-1: ((a + (b-1)) & -b).

Of course, there are a few complications. First, languages like C, which allow pointer arithmetic, generally
scale pointer offsets by the size of the target object -- which would keep our math from working. It used to be
that casting a pointer as a (char *) would turn-off this scaling, but with long char and such out there, a cast
as (void *) is probably a safer bet. Unfortunately, C doesn't define bitwise operations on pointers of any
flavor, so you'll need to cast to an appropriately-large integer type before doing a bitwise AND.

Secondly, aligning an address doesn't help unless you allocated a large enough chunk of memory so that you
can treat your data structure as starting at the aligned address. In general, if you wish to create a b-aligned
data structure with c bytes, you would do something like: a=((typeof(a))(((int)(((void *)malloc(c+(b-
1)))+(b-1)))&-b)). Please excuse my use of the GCC typeof(). Anyway, this is particularly useful for
cache-line alignment of data structures. One little annoyance: you can't call free(a); you'll need to keep a
copy of the original block address for that.

Average of Integers
This is actually an extension of the "well known" fact that for binary integer values x and y, (x+y) equals
((x&y)+(x|y)) equals ((x^y)+2*(x&y)).

Given two integer values x and y, the (floor of the) average normally would be computed by (x+y)/2;
unfortunately, this can yield incorrect results due to overflow. A very sneaky alternative is to use (x&y)+
((x^y)/2). If we are aware of the potential non-portability due to the fact that C does not specify if shifts are
signed, this can be simplified to (x&y)+((x^y)>>1). In either case, the benefit is that this code sequence
cannot overflow.
aggregate.org/MAGIC/ 2/9
5/10/2020 The Aggregate Magic Algorithms

Bit Reversal
Reversing the bits in an integer x is somewhat painful, but here's a SWAR algorithm for a 32-bit value:
unsigned int
reverse(register unsigned int x)
{
x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
return((x >> 16) | (x << 16));

It also is possible to re-write this algorithm to use 4 instead of 8 constants, thus saving some instruction
bandwidth. On my 1.2GHz Athlon (a Thunderbird), the difference is too small to measure reliably. Here's the
other version:
unsigned int
reverse(register unsigned int x)
{
register unsigned int y = 0x55555555;
x = (((x >> 1) & y) | ((x & y) << 1));
y = 0x33333333;
x = (((x >> 2) & y) | ((x & y) << 2));
y = 0x0f0f0f0f;
x = (((x >> 4) & y) | ((x & y) << 4));
y = 0x00ff00ff;
x = (((x >> 8) & y) | ((x & y) << 8));
return((x >> 16) | (x << 16));
}

Comparison of Float Values

IEEE floating point has a number of nice properties, including the ability to use 2's complement integer
comparisons to compare floating point values, provided the native byte order is consistent between float and
integer values. The only complication is the use of sign+magnitude representation in floats. The AMD Athlon
Processor x86 Code Optimization Guide gives a nice summary on Page 43. Here's a set of C routines that
embody the same logic:

#define FasI(f) (((int ) &(f)))

#define FasUI(f) (*((unsigned int *) &(f)))

#define lt0(f) (FasUI(f) > 0x80000000U)

#define le0(f) (FasI(f) <= 0)
#define gt0(f) (FasI(f) > 0)
#define ge0(f) (FasUI(f) <= 0x80000000U)

Comparison to Mask Conversion

In many cases, it is useful to convert the result of a comparison, which is either 0 or some non-zero bit
pattern, into either a "clean" 0 or -1 bit mask.

For many systems, this can be efficienty done by C code that simply uses the logic operators and negation: -
(x!=0) or -!!x. This is a very well known and old method, really a direct consequence of (and partial
motivation for) the C concept of conditional results being integers. However, for some compilers and
instruction sets (especially SWAR targets), the code generated for logicals is terrible, sometimes even
including conditional branches! Where this obvious coding doesn't work well, here are some alternatives.

aggregate.org/MAGIC/ 3/9
5/10/2020 The Aggregate Magic Algorithms

If the messy non-negative integer value is x, the sanitized version is:

(((int) (-x)) >> (WORDBITS - 1))

To remove the constraint that the messy value be non-negative, use:

(((int) (x | -x)) >> (WORDBITS - 1))

Logically, this works because the shift by (WORDBITS-1) replicates the sign bit to create a mask -- be aware,
however, that the C language does not require that shifts are signed even if their operands are signed, so there
is a potential portability problem. Additionally, one might think that a shift by any number greater than or
equal to WORDBITS would have the same effect, but many instruction sets have shifts that behave strangely
when such shift distances are specified.

Of course, the opposite condition can be tested using:

(((int) ~(x | -x)) >> (WORDBITS - 1))

If you prefer the C-standard 0 or 1 comparison result, simply use the unsigned shift:

(((unsigned int) (x | -x)) >> (WORDBITS - 1))

The opposite condition can be obtained using:

(((unsigned int) ~(x | -x)) >> (WORDBITS - 1))

Dual-Linked List with One Pointer Field

Normally, a dual-linked circular list would contain both previous and next pointer fields and the current
position in the list would be identified by a single pointer. By using two current pointers, one to the node in
question and the other to the one just before/after it, it becomes possible to store only a single pointer value
in each node. The value stored in each node is the XOR of the next and previous pointers that normally
would have been stored in each node. Decoding is obvious.

Unfortunately, using this trick in C is awkward because the XOR operation is not defined for pointers.

Divide Rounding

Joe Ibershoff, one of my students, pointed-out that integer divide normally yields the floor, but both ceiling
and round-to-nearest are easy and useful. I thought these were fairly well-known tricks closely related to the
Alignment of Pointers magic, but perhaps they aren't so obvious...? He points out that Ceiling(a/b) is (a+b-
1)/b and RoundToNearest(a/b) is (a+(b/2))/b. Of course, these tricks also work if divide is implemented in
less obvious ways, such as shifts or shift-and-subtract sequences.

GPU Any
A basic SIMD operation, "any" is a logical-OR reduction that returns true if any of its inputs are true. On
SIMD hardware, this is usually very easy... but not so on GPUs (Graphics Processing Units). NVIDIA's
CUDA has recently added hardware support, but there is a more portable way that is just as fast. The
p_any(flag) operation within a block can be accomplished by:

if (flag) sharedtemp = serial; /* maskable store */

__syncthreads(); /* optional with right block size */
p_any = (sharedtemp == (serial++));

We first publically announced this at SC08, and we're pretty sure we invented it. The trick is that NVIDIA's
hardware seems to take constant time to resolve N threads storing into the same object, i.e., it picks a winner.
aggregate.org/MAGIC/ 4/9
5/10/2020 The Aggregate Magic Algorithms

This behaviour is not documented, but has been experimentally observed; this p_any(flag) will run on any of
the CUDA hardware, and takes essentially the same time as the atomic any that was added to later CUDA
hardware. There are actually quite a few useful operations that can be built using variations on this trick....

GPU SyncBlocks
The most fundamental aggregate function (or you might call it a collective communication) is a barrier
synchronization. On SIMD hardware, this is usually implicit... but not so on GPUs (Graphics Processing
Units). Within a Block, NVIDIA's CUDA provides a barrier called __syncthreads(). Across Blocks -- if you
are running a number of Blocks that the GPU can timeshare rather than batch process -- you can synchronize
using this:
/* First, sync within each Block */
__syncthreads();
/* Pick a representative from each (here, 1D) block */
if (threadIdx.x == 0) {
/* Get my barrier number */
int barno = barnos[blockIdx.x] + 1;
int hisbarno;
int who = (blockIdx.x + 1) % gridDim.x;
/* Check in at barrier */
barnos[blockIdx.x] = barno;
/* Scan for all here or somebody passed */
do {
/* Wait for who */
do {
hisbarno = barnos[who];
} while (hisbarno < barno);
/* Bump to next who */
if (++who >= gridDim.x) who = 0;
} while ((hisbarno == barno) && (who != blockIdx.x));
/* Tell others we are all here */
barnos[blockIdx.x] = barno + 1;
}
/* Rejoin with rest of my Block */
__syncthreads();

The above code assumes that barnos[] is a volatile (forced memory access) array in GPU global memory
that is initialized to 0. The type can be either int or float; it is not critical because either way wrap-around
will take longer than GPUs will let one kernel run by default. Cost is O(number of Blocks) if all arrive at the
same time, but O(1) for the last to arrive if there is any temporal skew. The O(1) behavior is due to counting
by 2 per barrier; if all Blocks typically arrive roughly simultaneously, the algorithm can be simplified to
count by 1. The OpenCL version of this algorithm has been tested on both NVIDIA and ATI GPUs with
good performance. This algorithm also is the obvious basis for efficient within-a-kernel reductions and
scans....

We first publically showed various GPU variants of this algorithm at SC08, and it was published within the
MS thesis of two of Dietz's students in July 2009 and September 2009. Actually, it is a trivial variation on the
lockless shared memory barrier that we developed for SHMAPERS and published over a decade ago. (In
fact, it took less time to synchronize four processors in a Sun server than for one of those processors to
execute a single atomic memory instruction!) I note the dates because late in 2009 somebody else published
and claimed to have invented what is an inferior variant of this algorithm and did not cite us....

Gray Code Conversion

A Gray code is any binary coding sequence in which only a single bit position changes as we move from one
value to the next. There are many such codes, but the traditional one is computed such that the Kth Gray code
is K^(K>>1).

aggregate.org/MAGIC/ 5/9
5/10/2020 The Aggregate Magic Algorithms

The well-known algorithm for conversion from Gray to binary is a linear sequence of XORs that makes it
seem each bit must be dealt with separately. Fortunately, that is equivalent to a parallel prefix XOR that can
be computed using SWAR techniques in log time. For 32-bit Gray code values produced as described above,
the conversion from Gray code back to unsigned binary is:

unsigned int
g2b(unsigned int gray)
{
gray ^= (gray >> 16);
gray ^= (gray >> 8);
gray ^= (gray >> 4);
gray ^= (gray >> 2);
gray ^= (gray >> 1);
return(gray);
}

Integer Constant Multiply

Given an integer value x and an integer or floating point value y, the value of x*y can be computed efficiently
using a sequence derived from the binary value of x. For example, if x is 5 (4 + 1):
y2 = y + y;
y4 = y2 + y2;
result = y + y4;

In the special case that y is an integer, this can be done with shifts:

y4 = (y << 2);
result = y + y4;

Integer Minimum or Maximum

Given 2's complement integer values x and y, the minimum can be computed without any branches as x+
(((y-x)>>(WORDBITS-1))&(y-x)). Logically, this works because the shift by (WORDBITS-1) replicates the sign
bit to create a mask -- be aware, however, that the C language does not require that shifts are signed even if
their operands are signed, so there is a potential portability problem. Additionally, one might think that a
shift by any number greater than or equal to WORDBITS would have the same effect, but many instruction sets
have shifts that behave strangely when such shift distances are specified.

Of course, maximum can be computed using the same trick: x-(((x-y)>>(WORDBITS-1))&(x-y)).

Actually, the Integer Selection coding trick is just as efficient in encoding minimum and maximum....

Integer Power
Given an integer value x and an integer or floating point value y, the value of y to the x power can be
computed efficiently using a sequence derived from the binary value of x. For example, if x is 5 (4 + 1):
y2 = y * y;
y4 = y2 * y2;
result = y * y4;

Integer Selection

A branchless, lookup-free, alternative to code like if (a<b) x=c; else x=d; is ((((a-b) >> (WORDBITS-1)) &
(c^d)) ^ d). This code assumes that the shift is signed, which, of course, C does not promise.

aggregate.org/MAGIC/ 6/9
5/10/2020 The Aggregate Magic Algorithms

Is Power of 2

A non-negative binary integer value x is a power of 2 iff (x&(x-1)) is 0 using 2's complement arithmetic.

Leading Zero Count

Some machines have had single instructions that count the number of leading zero bits in an integer; such an
operation can be an artifact of having floating point normalization hardware around. Clearly, floor of base 2
log of x is (WORDBITS-lzc(x)). In any case, this operation has found its way into quite a few algorithms, so it
is useful to have an efficient implementation:

unsigned int
lzc(register unsigned int x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return(WORDBITS - ones(x));
}

Least Significant 1 Bit

This can be useful for extracting the lowest numbered element of a bit set. Given a 2's complement binary
integer value x, (x&-x) is the least significant 1 bit. (This was pointed-out by Tom May.) The reason this
works is that it is equivalent to (x & ((~x) + 1)); any trailing zero bits in x become ones in ~x, adding 1 to
that carries into the following bit, and AND with x yields only the flipped bit... the original position of the
least significant 1 bit.

Alternatively, since (x&(x-1)) is actually x stripped of its least significant 1 bit, the least significant 1 bit is
also (x^(x&(x-1))).

Log2 of an Integer

Given a binary integer value x, the floor of the base 2 log of that number efficiently can be computed by the
application of two variable-precision SWAR algorithms. The first "folds" the upper bits into the lower bits to
construct a bit vector with the same most significant 1 as x, but all 1's below it. The second SWAR algorithm
is population count, defined elsewhere in this document. However, we must consider the issue of what the
log2(0) should be; the log of 0 is undefined, so how that value should be handled is unclear. The following
code for handling a 32-bit value gives two options: if LOG0UNDEFINED, this code returns -1 for log2(0);
otherwise, it returns 0 for log2(0). For a 32-bit value:

unsigned int
floor_log2(register unsigned int x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
#ifdef LOG0UNDEFINED
return(ones32(x) - 1);
#else
return(ones32(x >> 1));
#endif
}

aggregate.org/MAGIC/ 7/9
5/10/2020 The Aggregate Magic Algorithms

Suppose instead that you want the ceiling of the base 2 log. The floor and ceiling are identical if x is a power
of two; otherwise, the result is 1 too small. This can be corrected using the power of 2 test followed with the
comparison-to-mask shift used in integer minimum/maximum. The result is:
unsigned int
log2(register unsigned int x)
{
register int y = (x & (x - 1));

y |= -y;
y >>= (WORDBITS - 1);
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
#ifdef LOG0UNDEFINED
return(ones(x) - 1 - y);
#else
return(ones(x >> 1) - y);
#endif
}

Next Largest Power of 2

Given a binary integer value x, the next largest power of 2 can be computed by a SWAR algorithm that
recursively "folds" the upper bits into the lower bits. This process yields a bit vector with the same most
significant 1 as x, but all 1's below it. Adding 1 to that value yields the next largest power of 2. For a 32-bit
value:

unsigned int
nlpo2(register unsigned int x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return(x+1);
}

Most Significant 1 Bit

Given a binary integer value x, the most significant 1 bit (highest numbered element of a bit set) can be
computed using a SWAR algorithm that recursively "folds" the upper bits into the lower bits. This process
yields a bit vector with the same most significant 1 as x, but all 1's below it. Bitwise AND of the original
value with the complement of the "folded" value shifted down by one yields the most significant bit. For a
32-bit value:

unsigned int
msb32(register unsigned int x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return(x & ~(x >> 1));
}

Natural Data Type Precision Conversions

aggregate.org/MAGIC/ 8/9
5/10/2020 The Aggregate Magic Algorithms

For integers used to represent natural data types, simply shifting right works well for conversion to a lower
precision, but shifting left is not very effective for converting to a higher precision. The problem is simply
that if the "new" bits are taken to be 0s, the maximum value will never be attained. Likewise, if taken to be
any fixed non-0 value, the value zero will never be obtained. A good answer to this problem is to replicate
the existing bit pattern in the "new" bits, truncating or repeating the pattern if more bits are needed.

For example, a 10-bit raw pixel value (e.g., from my Canon G1) called x

aggregate.org/MAGIC/ 9/9

Manual Solution For RISC-V Edition
100% (5)
Manual Solution For RISC-V Edition
100 pages
C Essentials Mod 2 Notes
No ratings yet
C Essentials Mod 2 Notes
72 pages
Lecture 8 - Timing Constraints
No ratings yet
Lecture 8 - Timing Constraints
18 pages
Wireshark Lab: TCP: Approach, 7 Ed., J.F. Kurose and K.W. Ross
No ratings yet
Wireshark Lab: TCP: Approach, 7 Ed., J.F. Kurose and K.W. Ross
11 pages
Ascent SW User Manual For Fluoroskan Ascent, CF & FL
100% (1)
Ascent SW User Manual For Fluoroskan Ascent, CF & FL
400 pages
Answers 2 Reviews and Exercises
No ratings yet
Answers 2 Reviews and Exercises
26 pages
Computer Architecture and Organization: The Central Processing Unit
100% (1)
Computer Architecture and Organization: The Central Processing Unit
126 pages
number bases worksheet
No ratings yet
number bases worksheet
2 pages
Bugreport OnePlusN200TMO SKQ1.210216.001 2024 05 29 05 51 12 Dumpstate - Log 25424
No ratings yet
Bugreport OnePlusN200TMO SKQ1.210216.001 2024 05 29 05 51 12 Dumpstate - Log 25424
33 pages
03 Integersfloats
No ratings yet
03 Integersfloats
99 pages
Stanford lecture
No ratings yet
Stanford lecture
92 pages
Lecture 3 - Bits, Bytes, and Integers
No ratings yet
Lecture 3 - Bits, Bytes, and Integers
78 pages
Lecture 2 - Bits, Bytes and Integers
No ratings yet
Lecture 2 - Bits, Bytes and Integers
51 pages
Reciprocity and Interreciprocity
No ratings yet
Reciprocity and Interreciprocity
50 pages
Determine the power status of a virtual machine on an ESX or ESXi Host
No ratings yet
Determine the power status of a virtual machine on an ESX or ESXi Host
4 pages
Computer Architecture Eposee
No ratings yet
Computer Architecture Eposee
21 pages
Chapter2 - Machine Instructions and Programs
No ratings yet
Chapter2 - Machine Instructions and Programs
85 pages
29 802.1q
No ratings yet
29 802.1q
10 pages
TM105 Meeting1 Slides
No ratings yet
TM105 Meeting1 Slides
49 pages
AIT_CS_03
No ratings yet
AIT_CS_03
26 pages
Uk Openmp Users 2018 Advanced Openmp Tutorial PDF
No ratings yet
Uk Openmp Users 2018 Advanced Openmp Tutorial PDF
106 pages
Bits, Bytes, Bit Operations and Image Processing Image Processing
No ratings yet
Bits, Bytes, Bit Operations and Image Processing Image Processing
41 pages
EL3011 Computer System Architecture Chapter 2 Bits and Bytes
No ratings yet
EL3011 Computer System Architecture Chapter 2 Bits and Bytes
34 pages
Bit Twiddling Hacks
No ratings yet
Bit Twiddling Hacks
30 pages
Bit Twiddling Hacks: by Sean Eron Anderson Seander@cs - Stanford.edu
No ratings yet
Bit Twiddling Hacks: by Sean Eron Anderson Seander@cs - Stanford.edu
34 pages
Lec 11
No ratings yet
Lec 11
27 pages
module_2
No ratings yet
module_2
33 pages
ITC4344 3 Cloud Computing Technologies
No ratings yet
ITC4344 3 Cloud Computing Technologies
35 pages
Week 1 Tutorial - Data Representation
No ratings yet
Week 1 Tutorial - Data Representation
7 pages
C Basic Data Types
No ratings yet
C Basic Data Types
16 pages
02-03-Bits-Ints (32423332)
No ratings yet
02-03-Bits-Ints (32423332)
86 pages
Assembler Language I
No ratings yet
Assembler Language I
42 pages
CPU_Chapter3_DetailedNotes
No ratings yet
CPU_Chapter3_DetailedNotes
11 pages
03 AVR Programming
No ratings yet
03 AVR Programming
38 pages
Javelin3/Javelin3Pro PDF Readers: Secure PDF Reader Program For The Drumlin Digital Rights Management (DRM) Service
No ratings yet
Javelin3/Javelin3Pro PDF Readers: Secure PDF Reader Program For The Drumlin Digital Rights Management (DRM) Service
27 pages
2 Data - Information - Storage v2
No ratings yet
2 Data - Information - Storage v2
23 pages
Intro To Computers 18
No ratings yet
Intro To Computers 18
196 pages
CSC 307 - Computer System Architecture-Elizade 20182019new
No ratings yet
CSC 307 - Computer System Architecture-Elizade 20182019new
162 pages
Ch00 - Introduction To Subject - Review
No ratings yet
Ch00 - Introduction To Subject - Review
59 pages
7 Machine - Condition - Codes v2
No ratings yet
7 Machine - Condition - Codes v2
25 pages
Faiz Assignment CSC134
No ratings yet
Faiz Assignment CSC134
6 pages
Bit Twiddling Hacks
No ratings yet
Bit Twiddling Hacks
30 pages
Bits
No ratings yet
Bits
22 pages
Bitwise Operations
No ratings yet
Bitwise Operations
18 pages
Bit Twiddling Hacks
No ratings yet
Bit Twiddling Hacks
28 pages
Cisco Switch Guide: Scalable, Intelligent LAN Switching For Campus, Branch, and Data Center Networks of All Sizes
No ratings yet
Cisco Switch Guide: Scalable, Intelligent LAN Switching For Campus, Branch, and Data Center Networks of All Sizes
134 pages
Machine Instructions and Programs
No ratings yet
Machine Instructions and Programs
85 pages
Chapter 2_Part 2
No ratings yet
Chapter 2_Part 2
39 pages
Bit Twiddling Hacks
No ratings yet
Bit Twiddling Hacks
30 pages
Computer Organization and Architecture: William Stallings
No ratings yet
Computer Organization and Architecture: William Stallings
7 pages
Bits and Bytes: Giving Credit Where Credit Is Due
No ratings yet
Bits and Bytes: Giving Credit Where Credit Is Due
6 pages
Computer Arithmetic in Computer Organization and Architecture
100% (1)
Computer Arithmetic in Computer Organization and Architecture
41 pages
Protocol Reader STI
No ratings yet
Protocol Reader STI
16 pages
Running Opengl On Visual Studio 2005
No ratings yet
Running Opengl On Visual Studio 2005
23 pages
Microprocessors and Interfacing Paper
No ratings yet
Microprocessors and Interfacing Paper
3 pages
Bit Twiddling Hacks
No ratings yet
Bit Twiddling Hacks
32 pages
COD - Unit-3 - N - 4 - PPT AJAY Kumar
No ratings yet
COD - Unit-3 - N - 4 - PPT AJAY Kumar
93 pages
Alpha Assembly Language Guide: Randal E. Bryant Carnegie Mellon University
No ratings yet
Alpha Assembly Language Guide: Randal E. Bryant Carnegie Mellon University
18 pages
Bit Fiddling and Arduino Digital IO
No ratings yet
Bit Fiddling and Arduino Digital IO
61 pages
Bit Twiddling Hacks
No ratings yet
Bit Twiddling Hacks
32 pages
Chapter2 - Machine Instructions and Programs
No ratings yet
Chapter2 - Machine Instructions and Programs
85 pages
Bits and Bytes PDF
No ratings yet
Bits and Bytes PDF
76 pages
Introduction To LVDS, PECL, and CML
No ratings yet
Introduction To LVDS, PECL, and CML
17 pages
Computer Organization & Assembly Language: CS/COE0447
No ratings yet
Computer Organization & Assembly Language: CS/COE0447
30 pages
Bits, Bytes, and Integers: Computer Architecture and Organization
No ratings yet
Bits, Bytes, and Integers: Computer Architecture and Organization
87 pages
Algoritmi Pentru Operatile Pe Biti
No ratings yet
Algoritmi Pentru Operatile Pe Biti
41 pages
Ac500 Cpus: Ac500-Cpus Pm571, Pm581, Pm582, Pm590 and Pm591
No ratings yet
Ac500 Cpus: Ac500-Cpus Pm571, Pm581, Pm582, Pm590 and Pm591
24 pages
Chapter 4 Arithmetic and Logic ARM
No ratings yet
Chapter 4 Arithmetic and Logic ARM
31 pages
fa24 Week 2
No ratings yet
fa24 Week 2
36 pages
Bit Maipulation Algorithms
No ratings yet
Bit Maipulation Algorithms
53 pages
Computer Basics - What Is A Computer (Handout)
No ratings yet
Computer Basics - What Is A Computer (Handout)
2 pages
10 - Operating Instruction - DSC... - CAN Bus, Serial Bus - BEA - 209283-En-03
No ratings yet
10 - Operating Instruction - DSC... - CAN Bus, Serial Bus - BEA - 209283-En-03
8 pages
Lec2 Instructions
No ratings yet
Lec2 Instructions
33 pages
Integers Floating Point: N N S E
No ratings yet
Integers Floating Point: N N S E
4 pages
MIPS Architecture - BITS Pilani
No ratings yet
MIPS Architecture - BITS Pilani
58 pages
PDSC 1
No ratings yet
PDSC 1
101 pages
Design Note DN401: Interfacing CC1020/1 To The MSP430
No ratings yet
Design Note DN401: Interfacing CC1020/1 To The MSP430
12 pages
Aptitude
No ratings yet
Aptitude
18 pages
Bitwise Complement: The Bitwise Complement Operator, The Tilde,, Flips Every Bit
No ratings yet
Bitwise Complement: The Bitwise Complement Operator, The Tilde,, Flips Every Bit
30 pages
Nse7 Efw-7.2
No ratings yet
Nse7 Efw-7.2
5 pages
Lecture 2
No ratings yet
Lecture 2
23 pages
Chapter2 - Machine Instructions and Programs
No ratings yet
Chapter2 - Machine Instructions and Programs
54 pages
Common Framework For Memory Hierarchy
No ratings yet
Common Framework For Memory Hierarchy
10 pages
What Is A Wireless Router
No ratings yet
What Is A Wireless Router
3 pages
Bar-One Software: User-Friendly, Bar Code, Label Design and Connectivity Solutions
No ratings yet
Bar-One Software: User-Friendly, Bar Code, Label Design and Connectivity Solutions
4 pages
Bits and Bytes PDF
No ratings yet
Bits and Bytes PDF
76 pages
Massachusetts Institute of Technology: C A B C B
No ratings yet
Massachusetts Institute of Technology: C A B C B
5 pages
Serial Port Component
100% (1)
Serial Port Component
9 pages
6 TH Sanskrit 108
No ratings yet
6 TH Sanskrit 108
5 pages
The File Systems Supported by Windows NT
No ratings yet
The File Systems Supported by Windows NT
7 pages
Agents
No ratings yet
Agents
2 pages
Vehicle To Grid
No ratings yet
Vehicle To Grid
7 pages
Christ Apostolic Church Hymn Book Yoruba Version Sample
83% (6)
Christ Apostolic Church Hymn Book Yoruba Version Sample
23 pages
C Programming
From Everand
C Programming
Netra
No ratings yet
Projects With Microcontrollers And PICC
From Everand
Projects With Microcontrollers And PICC
Guillermo Perez Guillen
5/5 (1)
Advanced C++ Interview Questions You'll Most Likely Be Asked
From Everand
Advanced C++ Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
From Everand
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
Fouad Sabry
No ratings yet

Bit HAcks

Uploaded by

Bit HAcks

Uploaded by

5/10/2020 The Aggregate Magic Algorithms

The Aggregate Magic Algorithms

Population Count (Ones Count)

Absolute Value of a Float

Comparison of Float Values

#define FasI(f) (*((int *) &(f)))

#define lt0(f) (FasUI(f) > 0x80000000U)

Comparison to Mask Conversion

If the messy non-negative integer value is x, the sanitized version is:

(((int) (-x)) >> (WORDBITS - 1))

To remove the constraint that the messy value be non-negative, use:

(((int) (x | -x)) >> (WORDBITS - 1))

Of course, the opposite condition can be tested using:

(((int) ~(x | -x)) >> (WORDBITS - 1))

(((unsigned int) (x | -x)) >> (WORDBITS - 1))

The opposite condition can be obtained using:

Dual-Linked List with One Pointer Field

if (flag) sharedtemp = serial; /* maskable store */

Gray Code Conversion

Integer Constant Multiply

Integer Minimum or Maximum

Of course, maximum can be computed using the same trick: x-(((x-y)>>(WORDBITS-1))&(x-y)).

Leading Zero Count

Least Significant 1 Bit

Next Largest Power of 2

Most Significant 1 Bit

Natural Data Type Precision Conversions

You might also like

#define FasI(f) (((int ) &(f)))