Data Representation using Computers
Q: List the data that you have dealt with in real life.
Q: Can you categorize data from real life?
One way of categorizing real life data:
Integer
Real Numbers
Sequence of Characters.
The above categorization excludes many types of data (e.g. music, images, videos) but still covers a lot.
Computers/Processors (e.g. i3, i5, Snapdragon, Ryzen), that is "hardware", can deal with limited-size integers and real numbers.
Number system
● Let's begin with the decimal number system.
–By "387" we understand Three Hundred Eighty Seven, that is
–387 = 300 + 80 + 7
– = 3 * 100 + 8 * 10 + 7
– = 3 * 10^2 + 8 * 10^1 + 7 * 10^0
Ternary Number system
● A ternary number system.
–Uses 0, 1, 2 as numeric symbols
–Has 3 as base
–Uses powers of 3 as positional weights.
● So
–(212) in ternary is 2 * 3^2 + 1 * 3^1 + 2 * 3^0 = 18 + 3 + 2 = 23 in decimal.
Binary Number System
● Binary number system
–Uses 0, 1 bits
–Has 2 as base
–uses powers of 2 as positional weights.
● So
–(101) in binary is 1 * 2^2 + 0 * 2^1 + 1 * 2^0 = 4 + 0 + 1 = 5 in decimal.
Number Sequences
●The sequence of numbers in decimal is derived by adding 1 to the last number, and using 0 specially to start a new position (with a carry).
–E.g. 1, 2, 3, ... 9 are followed by 10
–that is 1 and then 0 which means ten
–10 is followed by 10 + 1 = 11
–11 is followed by 11 + 1 = 12
Number Sequences
●Similarly in binary, the number sequence is generated by adding 1 to the last number, using 0 in a special way, and using carries.
–So the sequence of binary numbers is
1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011,
1100, 1101, 1110, 1111, etc.
● Homework: write all binary numbers from 0 to 100.
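A minimal sketch of this counting sequence in Python (not from the slides; it uses the built-in bin() for illustration):

```python
# Print the binary counting sequence using Python's built-in bin().
for n in range(16):           # 0 .. 15
    print(n, bin(n)[2:])      # bin(5) gives '0b101'; strip the '0b' prefix
```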
Conversions
● Converting binary to decimal is easy.
●Simply use positional weights of powers of 2 and do the summation.
● (1011) in binary = 1 * 2^3 + 0 * 2^2 + 1 * 2^1 + 1 * 2^0 = 8 + 0 + 2 + 1 = 11 in decimal.
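A sketch of the positional-weight summation in Python (the function name binary_to_decimal is illustrative, not from the slides):

```python
# Convert a binary string to decimal by summing positional weights of 2.
def binary_to_decimal(bits: str) -> int:
    total = 0
    for i, bit in enumerate(reversed(bits)):  # rightmost bit has weight 2^0
        total += int(bit) * 2**i
    return total

print(binary_to_decimal("1011"))  # 8 + 0 + 2 + 1 = 11
```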
Conversions
Theorem: Any non-negative integer can be expressed as a sum of distinct powers of two.
Can you prove this theorem?
This will get covered in other courses, later.
Conversions
Converting decimal to binary
●The principle we use here is that any number is a sum of powers of two.
●As we know, we "take or leave" each power of two when representing a number in binary.
So, let's take 23. Can we express 23 as a sum of powers of 2? Yes.
23 = 16 + 4 + 2 + 1
Here we excluded 8.
So, we took 2^4, 2^2, 2^1, 2^0, and left 2^3.
We write it as: 23 in decimal = (10111) in binary.
Conversions
Converting decimal to binary
Better method
●Keep dividing the number by 2, and remember the remainder.
Divisor | Number | Remainder
2       | 23     |
2       | 11     | 1
2       | 5      | 1
2       | 2      | 1
2       | 1      | 0
        | 0      | 1
Reading the remainders bottom-up gives 23 = (10111) in binary.
Conversions
Converting decimal to binary
● Better method. Let's try 87.
Divisor | Number | Remainder
2       | 87     |
2       | 43     | 1
2       | 21     | 1
2       | 10     | 1
2       | 5      | 0
2       | 2      | 1
2       | 1      | 0
        | 0      | 1
Reading the remainders bottom-up gives 87 = (1010111) in binary.
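A minimal sketch of this repeated-division method in Python (the name decimal_to_binary is illustrative, not from the slides):

```python
# The "better method": keep dividing by 2, collecting remainders.
def decimal_to_binary(n: int) -> str:
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))       # the remainder is the next bit
        n //= 2                       # integer division by 2
    return "".join(reversed(bits))    # remainders are read bottom-up

print(decimal_to_binary(23))   # 10111
print(decimal_to_binary(87))   # 1010111
```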
Conversions
Converting decimal to binary: for FRACTIONAL Numbers
Example: Convert 0.625 to binary.
●0.625 × 2 = 1.25 (whole number 1)
●0.25 × 2 = 0.5 (whole number 0)
●0.5 × 2 = 1.0 (whole number 1)
–So, the binary representation of 0.625 is 0.101
Convert the Fractional Part
–Multiply by 2: Multiply the fractional part by 2.
–Record the Whole Number: Note the whole number part (0 or 1).
–Repeat: Continue with the remaining fractional part until it becomes 0 (or enough bits have been produced).
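A sketch of these steps in Python (names are illustrative, not from the slides; note that many fractions never terminate, so the bit count is capped):

```python
# Convert a decimal fraction to binary by repeated multiplication by 2,
# recording the whole-number part each time.
def fraction_to_binary(frac: float, max_bits: int = 16) -> str:
    bits = []
    while frac > 0 and len(bits) < max_bits:  # may not terminate exactly
        frac *= 2
        whole = int(frac)      # the whole-number part is the next bit
        bits.append(str(whole))
        frac -= whole          # continue with the remaining fraction
    return "0." + "".join(bits)

print(fraction_to_binary(0.625))   # 0.101
```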
Some important notes about binary
numbers.
1 is 1
11 is 3
111 is 7
1111 is 15
–In general, n ones in binary is 2^n - 1.
Some important notes about binary
numbers.
● 2^10 = 1024 ~= 1000 (one thousand)
● 2^20 = 1024*1024 ~= 1 million or 10 lakhs
● 2^30 = 1024*1024*1024 ~= 1 billion or 100 crore
Storing “Integer” data using computers
Data representation using computers.
Now, so far we discussed mathematics.
●Let's turn back to data representation using computers.
–The computer/processor can store and process limited-size integers and real numbers.
● E.g. xyz processor can store 1 byte integers.
Data representation using computers.
Question: if a computer has 32-bit storage
for integers, then how much can you store
in it?
Answer: the possibilities are from
00000000 00000000 00000000 00000000
to
11111111 11111111 11111111 11111111
–that is, 2^32 = 4,294,967,296 distinct patterns.
Storing integers using computers
● Note: here computers mean “processor”.
● Question: how to store negative numbers?
–How to store the sign?
–Given that you can only use 0s and 1s to store ANYTHING!
● Solution: use one bit for sign!
–So out of 32 bits, one is used for the sign, leaving 31 bits for the magnitude.
sign-magnitude notation
● Let's say 8 bits are used for storing an integer.
● 1 bit is used for the sign, so 7 are left.
● So the magnitudes that can be stored are
–000 0000 to 111 1111, that is 0 to 127
● 0000 0000 to 0111 1111 is 0 to 127
● 1000 0000 to 1111 1111 is -0 to -127
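A minimal sketch of 8-bit sign-magnitude encoding in Python (the function name is illustrative, not from the slides):

```python
# 8-bit sign-magnitude: the top bit is the sign, the low 7 bits the magnitude.
def sign_magnitude(n: int) -> str:
    assert -127 <= n <= 127            # only magnitudes 0..127 fit in 7 bits
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), "07b")

print(sign_magnitude(5))     # 00000101
print(sign_magnitude(-7))    # 10000111
print(sign_magnitude(-127))  # 11111111
```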
Problems with sign-magnitude notation
● There are two 0s!
–+0 and -0
●How to do addition/subtraction (in the hardware/processor)?
Say, 5 + -7 ? (answer is -2, that is 1000 0010)
  0000 0101
+ 1000 0111
-------------------
  1000 1100 (which reads as -12: naively adding the bit patterns gives the wrong answer!)
2’s complement notation
●One bit is given for sign. However, -ve numbers are obtained like this:
–Consider the number as positive, get binary
–Take the "complement" (i.e. ~) of the binary number (1->0, 0->1) (this is called 1's complement, but we will skip this concept)
–Add 1 to the complement, and you get the 2's complement!
2’s complement notation
● 5 is 0000 0101
● -5 is 1111 1011 (by the method on the earlier slide: ~(0000 0101) + 1 = 1111 1010 + 1 = 1111 1011)
● Now -(-5) is ?
–~(1111 1011) + 1 = (0000 0100) + 1 = 0000 0101 = 5
–Got 5 back!
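A sketch of the "complement and add 1" negation in Python (the helper name twos_complement is illustrative, not from the slides):

```python
# Negate an 8-bit number by flipping all bits and adding 1.
def twos_complement(bits: str) -> str:
    flipped = "".join("1" if b == "0" else "0" for b in bits)  # 1's complement
    value = (int(flipped, 2) + 1) % 256    # add 1, wrap to 8 bits
    return format(value, "08b")

print(twos_complement("00000101"))   # -5    -> 11111011
print(twos_complement("11111011"))   # -(-5) -> 00000101, i.e. 5 again
```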
2’s complement notation
●Addition/subtraction? It's easy! (It's easy to do in hardware!)
● 5 + 7 = 12
0000 0101
+ 0000 0111
-------------------
0000 1100
2’s complement notation
● Addition/subtraction? It's easy!
● 5 + -7 = -2
0000 0101
+ 1111 1001 (this is 2’s complement for -7)
-------------------
1111 1110 (= -2)
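A minimal sketch of why this works, in Python (the name add8 is illustrative, not from the slides): 8-bit two's complement addition is just binary addition modulo 256.

```python
# 8-bit two's complement addition: keep the low 8 bits, then reinterpret
# patterns >= 1000 0000 as negative numbers.
def add8(a: int, b: int) -> int:
    result = (a + b) % 256
    return result - 256 if result >= 128 else result

print(add8(5, 7))    # 12
print(add8(5, -7))   # -2 (bit pattern 1111 1110)
```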
2’s complement notation
● What is 0 in 2’s complement notation?
–Just all 0s !
–No “-0” and “+0”
● What is 1000 0000 ?
–First bit is 1, so it's negative. But which number?
–Just take the 2's complement of this: ~(1000 0000) + 1 = 0111 1111 + 1 = 1000 0000, that is 128.
–So 1000 0000 represents -128.
2’s complement notation
● Maximum and Minimum values ?
● 8 bit integers? 0000 0000 to 1111 1111
–1 used for sign, 7 left
–So 0000 0000 to 0111 1111 is 0 to 127
– 1000 0000 to 1111 1111 is -128 to -1
● So with "increasing bit pattern" numbers go like this:
–0, 1, 2, ..., 127, -128, -127, ..., -2, -1
Issues of overflow (exceeding the upper limit) and underflow (exceeding the lower limit)
●Using 4-bit integers, the representable values are in the range 0 to 7 and -8 to -1.
OVERFLOW RULE
●If two numbers are added, and they are both positive
or both negative, then overflow occurs if and only if
the result has the opposite sign.
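A sketch of this rule as a check in Python (the name overflows is illustrative; it assumes 8-bit signed inputs in -128..127):

```python
# Overflow rule for 8-bit signed addition: same-sign operands overflow
# if and only if the 8-bit result has the opposite sign.
def overflows(a: int, b: int) -> bool:
    r = (a + b) % 256                       # the 8-bit result
    signed = r - 256 if r >= 128 else r     # reinterpret as signed
    return (a >= 0) == (b >= 0) and (signed >= 0) != (a >= 0)

print(overflows(100, 100))   # True: 200 does not fit in -128..127
print(overflows(100, -50))   # False: opposite signs never overflow
```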
SUBTRACTION RULE
●To subtract one number (subtrahend) from another
(minuend), take the twos complement (negation) of
the subtrahend and add it to the minuend.
● Simply put
–Subtraction is addition of the 2's complement
How to do subtraction?
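A minimal sketch of the subtraction rule in Python (the name sub8 is illustrative, not from the slides): negate the subtrahend via 2's complement, then add, all modulo 256.

```python
# 8-bit subtraction = addition of the 2's complement of the subtrahend.
def sub8(minuend: int, subtrahend: int) -> int:
    neg = (256 - subtrahend % 256) % 256   # 2's complement negation
    r = (minuend + neg) % 256
    return r - 256 if r >= 128 else r      # reinterpret as signed

print(sub8(5, 7))    # -2
print(sub8(7, 5))    # 2
```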
Storing “Real Number” data using computers
Real Numbers
● Examples
–11.123
–-1.1231232
–0.123123
–1287231231.000000001
–Etc.
Real Numbers on Processor?
● Need to store "2 parts" and the "decimal point"
●The "decimal point" can be left implicit, if a convention is followed
–For example, decide that out of 32 bits, 8 are used for the integer part and 24 for the fraction part
–Or 16 bits for the integer part and 16 bits for the fraction part
–Etc.
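A sketch of one such fixed-point convention in Python (the 8+24 split and all names here are assumptions for illustration, not a standard):

```python
# Hypothetical fixed-point convention: 8 integer bits + 24 fraction bits.
FRAC_BITS = 24

def to_fixed(x: float) -> int:
    return round(x * 2**FRAC_BITS)    # scale; the binary point becomes implicit

def from_fixed(f: int) -> float:
    return f / 2**FRAC_BITS

print(to_fixed(3.25))               # 54525952 (= 3.25 * 2^24)
print(from_fixed(to_fixed(3.25)))   # 3.25
```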
Real Numbers on Processor?
● IEEE Format
● Slightly complicated
● OK if you do not understand it completely !
IEEE Standard 754 Floating Point Numbers
Example: 85.125
85 = 1010101
0.125 = 001
85.125 = 1010101.001 = 1.010101001 x 2^6
sign = 0
Normalised mantissa = 010101001 (ignore the leading 1)
1. Single precision: biased exponent 127 + 6 = 133; 133 = 10000101
2. Double precision: biased exponent 1023 + 6 = 1029; 1029 = 10000000101
Real Numbers on Processor?
● IEEE Floating point notation
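A sketch of inspecting these bits in Python using the standard struct module (the slicing positions assume the single-precision layout: 1 sign bit, 8 exponent bits, 23 mantissa bits):

```python
import struct

# View the IEEE 754 single-precision bit pattern of 85.125.
bits = struct.unpack(">I", struct.pack(">f", 85.125))[0]  # float -> raw 32 bits
b = format(bits, "032b")
print(b[0], b[1:9], b[9:])   # sign, exponent, mantissa
# prints: 0 10000101 01010100100000000000000
```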
Storing “Character” data using computers
Character data?
● “Abhijit” is a sequence of 7 characters
–A, b, h, i, j, i, t
●How many such characters are there in the "English" world?
–On your keyboard:
● Alphabets 26 + 26
● Digits 10
Character data?
● How to represent characters using computers?
●Given that everything can only be stored as a sequence of bits (0 and 1)?
● Idea!
–Let’s assign some numbers to characters
–E.g. a is 97, b is 98, c is 99, ... z is 122
–Store a as 0110 0001
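A minimal sketch of this idea in Python, using the built-ins ord() and chr():

```python
# Characters are stored as numbers; ord/chr convert between the two views.
print(ord("a"))                  # 97
print(chr(98))                   # 'b'
print(format(ord("a"), "08b"))   # 01100001 - 'a' as a bit pattern
```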
ASCII
Character data?
● Problem!
–If 'a' is 0110 0001 (that is 97), then how to store the number 97 itself?
–OR: Given 0110 0001, how can you tell whether it's the number 97 or 'a'?
● The same problem is there for any data!
–Given a 32-bit pattern, can you tell if
● It's a 32-bit integer?
● Or a sequence of 4 characters? Or something else entirely?
Unicode
●ASCII covered only ROMAN characters, and some special characters
–Old standard
● Unicode
–Covers almost every language and every script
–ASCII is a subset of Unicode
●ASCII is a 7-bit code; Unicode is not a 7-bit code but just a numbering of characters
–https://www.utf8-chartable.de/unicode-utf8-table.pl?start=2304&number=128
Processor and ASCII/unicode
● No relationship!
● Hardware does not understand it
–Processor plays with numbers and does numerical
calculations
●"Numbers" to "characters" conversion is all an "interpretation", hence it lives in the software world
–Your programs do this!
Hexadecimal notation
● Hexadecimal number system
–Base 16
● Symbols: 0, 1, 2, ...9, A, B, C, D, E, F
● Hex to Binary and Vice-Versa
–0x1A is 0001 1010
–0xAF19 is 1010 1111 0001 1001
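A sketch of the hex-to-binary correspondence in Python (each hex digit maps to exactly 4 bits):

```python
# Hex <-> binary: one hex digit is exactly four bits.
print(format(0x1A, "08b"))      # 00011010
print(format(0xAF19, "016b"))   # 1010111100011001
print(hex(0b00011010))          # 0x1a
```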
Little vs Big Endian storage
●Suppose processor has to store 32 bit number, that is
4 bytes
● Question : in which order to store the 4 bytes ?
–b3 b2 b1 b0
–b0 b1 b2 b3
●In the above, with b0 as the least significant byte: storing b0 b1 b2 b3 (least significant byte first) is little endian (used by Intel processors), and storing b3 b2 b1 b0 (most significant byte first) is big endian (e.g. PowerPC, SPARC).
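A minimal sketch of the two byte orders in Python, using the built-in int.to_bytes (the value 0x11223344 is an arbitrary example):

```python
# The same 32-bit number stored in the two byte orders.
n = 0x11223344
print(n.to_bytes(4, "little").hex())   # 44332211 - least significant byte first
print(n.to_bytes(4, "big").hex())      # 11223344 - most significant byte first
```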