COMPUTER APPLICATIONS
University of Abuja
Faculty of Science
Department of Computer Science
CSC 300: Microcomputer Applications
Lecture 2: Review of data representation, characters, words, strings
In this lecture, the following topics will be covered:
1. Types of data and data representation
2. Text representation
3. Image representation
4. Sound representation
5. Data compression: images, sounds and videos
1. Types of data and data representation
In Lecture 1, it was explained that data consists of unprocessed facts and figures and that it is
represented in the computer using 0s and 1s (corresponding to electrical pulses) called bits (binary
digits). The smallest unit of data on a binary computer is a single bit. A group of 4 bits is
called a nibble, a group of 8 bits is called a byte, and a group of 16 bits is called a word in this
course (the word size of a real machine depends on its processor). Bytes are typically used as the
unit of measure for storage capacity. A 1-megabyte storage unit can store 1,048,576 bytes of data,
which is equal to 1024 kilobytes of 1024 bytes each.
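The unit sizes above can be checked with a little arithmetic. The sketch below is only an illustration; the constant and function names are chosen here and are not part of the lecture, and the 16-bit word simply follows the definition used in this section.

```python
# Unit sizes as defined in this section (a 16-bit word is the lecture's
# convention; real machines may use other word sizes).
BITS_PER_NIBBLE = 4
BITS_PER_BYTE = 8
BITS_PER_WORD = 16

BYTES_PER_KILOBYTE = 1024
BYTES_PER_MEGABYTE = 1024 * BYTES_PER_KILOBYTE


def distinct_values(bit_count: int) -> int:
    """Number of different bit patterns that bit_count bits can hold."""
    return 2 ** bit_count


print(BYTES_PER_MEGABYTE)                # 1048576
print(distinct_values(BITS_PER_NIBBLE))  # 16
print(distinct_values(BITS_PER_BYTE))    # 256
print(distinct_values(BITS_PER_WORD))    # 65536
```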
2. Text representation
As described in Lecture 1, data is used to represent text, pictures, video and audio. Text is
made up of characters, which are converted to numbers (by means of a code) for representation in
the computer. Each character symbol is assigned a unique bit pattern. The text is then
represented as a long string of bits in which the successive patterns represent the successive
symbols in the original text.
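As a small illustration of text stored as successive bit patterns, the sketch below converts a short string to bits and back. It assumes the ASCII code described in the next paragraph, stored one character per 8-bit byte; the function names are chosen here for illustration.

```python
def text_to_bits(text: str) -> str:
    """Return the 8-bit pattern of each character, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Rebuild the text from space-separated 8-bit patterns."""
    return bytes(int(pattern, 2) for pattern in bits.split()).decode("ascii")


encoded = text_to_bits("Hi")
print(encoded)                # 01001000 01101001
print(bits_to_text(encoded))  # Hi
```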
In the mid-1900s, the American National Standards Institute (ANSI) adopted the American
Standard Code for Information Interchange (ASCII) that uses bit patterns of length seven to
represent upper- and lowercase letters of the English alphabet, punctuation symbols, digits 0
through 9, and certain control information such as line feeds, carriage returns, and tabs. The
symbols are shown in the table below, in which the column number gives the first (most
significant) 3 bits and the row number gives the last 4 bits of each bit pattern.
Column      0     1     2     3     4     5     6     7
(bits)      000   001   010   011   100   101   110   111
Row (bits)
0 (0000)    NUL   DLE   SP    0     @     P     `     p
1 (0001)    SOH   DC1   !     1     A     Q     a     q
2 (0010)    STX   DC2   "     2     B     R     b     r
3 (0011)    ETX   DC3   #     3     C     S     c     s
4 (0100)    EOT   DC4   $     4     D     T     d     t
5 (0101)    ENQ   NAK   %     5     E     U     e     u
6 (0110)    ACK   SYN   &     6     F     V     f     v
7 (0111)    BEL   ETB   '     7     G     W     g     w
8 (1000)    BS    CAN   (     8     H     X     h     x
9 (1001)    HT    EM    )     9     I     Y     i     y
10 (1010)   LF    SUB   *     :     J     Z     j     z
11 (1011)   VT    ESC   +     ;     K     [     k     {
12 (1100)   FF    FS    ,     <     L     \     l     |
13 (1101)   CR    GS    -     =     M     ]     m     }
14 (1110)   SO    RS    .     >     N     ^     n     ~
15 (1111)   SI    US    /     ?     O     _     o     DEL
The binary code for any entry can be found by writing the 3 bits of the column number followed
by the 4 bits of the row number. For example, CR is in column 0, row 13, and thus has the binary
code 000 1101 = 0001101 = 13 decimal = 15 octal = 0D hexadecimal. Because it uses seven bits,
the ASCII encoding scheme can accommodate only 128 character symbols.
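The column-and-row lookup can be sketched in a few lines. The function name ascii_code is an assumption made for this illustration; it shifts the 3 column bits left by four places and attaches the 4 row bits.

```python
def ascii_code(column: int, row: int) -> int:
    """Combine a table column (3 bits) and row (4 bits) into a 7-bit code."""
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("column must be 0-7 and row must be 0-15")
    return (column << 4) | row


code = ascii_code(0, 13)  # CR: column 0, row 13
print(format(code, "07b"), code, format(code, "o"), format(code, "02X"))
# 0001101 13 15 0D

print(chr(ascii_code(4, 1)))  # A (column 4, row 1)
```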
The International Organization for Standardization (ISO) developed a number of extensions of
ASCII, each of which was designed to accommodate a major language group.
In contrast to the 7-bit ASCII encoding scheme, IBM developed the Extended Binary Coded
Decimal Interchange Code (EBCDIC) in 1963. This is an eight-bit character encoding used
mainly on IBM mainframe and IBM midrange computer operating systems. Being an eight-bit
code, EBCDIC can accommodate up to 256 character symbols.
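The difference between the two codes can be seen by encoding the same letter both ways. The sketch below assumes Python's built-in cp037 codec, which implements one EBCDIC variant (IBM code page 037); other EBCDIC code pages assign some symbols differently.

```python
letter = "A"

ascii_byte = letter.encode("ascii")[0]
ebcdic_byte = letter.encode("cp037")[0]  # cp037 is one EBCDIC code page

print(format(ascii_byte, "08b"))   # 01000001
print(format(ebcdic_byte, "08b"))  # 11000001

# Seven bits give 128 patterns; eight bits give 256.
print(2 ** 7, 2 ** 8)  # 128 256
```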
ASCII and its extended versions were not sufficient to accommodate the alphabets of many Asian
and some Eastern European languages. For this and other reasons, Unicode was developed
through the cooperation of several leading manufacturers of hardware and software. Unicode
assigns each symbol a code of up to 21 bits. It is usually combined with the Unicode
Transformation Format 8-bit (UTF-8) encoding standard, which stores each symbol in one to
four bytes. ASCII characters keep an 8-bit pattern (8-bit ASCII simply attaches 0 as the most
significant bit of the 7-bit code), characters of alphabets such as Hebrew take 16 bits, and most
Chinese and Japanese characters take 24 bits. UTF-8 uses 32-bit patterns to represent more
obscure Unicode symbols.
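The variable length of UTF-8 can be observed directly. The sketch below uses one sample symbol from each length class; the particular symbols and the function name are chosen here for illustration only.

```python
def utf8_bit_length(symbol: str) -> int:
    """Number of bits UTF-8 uses to encode a single symbol."""
    return len(symbol.encode("utf-8")) * 8


samples = {
    "Latin capital A": "\u0041",
    "Hebrew alef": "\u05d0",
    "Chinese character zhong": "\u4e2d",
    "grinning face emoji": "\U0001F600",
}

for name, symbol in samples.items():
    print(f"{name}: U+{ord(symbol):04X} -> {utf8_bit_length(symbol)} bits")
# Latin capital A: U+0041 -> 8 bits
# Hebrew alef: U+05D0 -> 16 bits
# Chinese character zhong: U+4E2D -> 24 bits
# grinning face emoji: U+1F600 -> 32 bits
```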
A file consisting of a long sequence of symbols encoded using ASCII or Unicode is often called
a text file. A text file contains only a character-by-character encoding of the text. Note that, in
contrast, a file produced by a word processor contains numerous proprietary codes representing
changes in fonts, alignment information and other parameters.
3. Image representation
Pictures and video frames are stored as bit patterns, which are often written in octal or
hexadecimal for readability. An image is treated as a collection of dots called pixels (picture
elements). A pixel is the basic unit of programmable color on a computer display or in a computer
image. The physical size of a pixel depends on the resolution of the display screen. The
appearance of each pixel is encoded, and the entire image is represented as a collection of these
encoded pixels called a bit map.
The method of encoding the pixels in a bit map varies among applications. In a simple
black-and-white image, each pixel is a single bit whose value depends on whether the pixel is
black or white. A more elaborate black-and-white photograph uses a collection of bits (typically
8 bits) per pixel, which allows a variety of shades of gray to be represented. For color images,
each pixel is usually encoded using one of two approaches.
First approach: RGB encoding represents a pixel by red, green and blue color components,
corresponding to the intensities of the three primary colors of light. One byte is normally used
for each component, so three bytes of storage are required to represent a single pixel in the
original image.
Second approach: the pixel is represented by a brightness component and two color components.
The brightness component is called the pixel luminance and is essentially the sum of the red,
green, and blue components. The two color components, called blue chrominance and red
chrominance, are obtained from the difference between the pixel luminance and the amount of
blue or red light, respectively, in the pixel.
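The pixel encodings above can be sketched as follows. The first function estimates the size of an uncompressed bit map; the other two follow the simplified description in this section, in which luminance is a plain sum. This is an assumption of the sketch: color spaces used in practice weight the three components differently and rescale the results. All function names and the 1024 x 768 image size are chosen here for illustration.

```python
def bitmap_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Storage needed for an uncompressed bit map, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8


def to_luminance_chrominance(red: int, green: int, blue: int) -> tuple:
    """Simplified luminance/chrominance encoding (unweighted sum)."""
    luminance = red + green + blue
    blue_chrominance = luminance - blue
    red_chrominance = luminance - red
    return luminance, blue_chrominance, red_chrominance


def to_rgb(luminance: int, blue_chrominance: int, red_chrominance: int) -> tuple:
    """Recover the red, green and blue components from the encoding above."""
    blue = luminance - blue_chrominance
    red = luminance - red_chrominance
    green = luminance - red - blue
    return red, green, blue


print(bitmap_size_bytes(1024, 768, 1))   # 98304   (1 bit per pixel)
print(bitmap_size_bytes(1024, 768, 8))   # 786432  (8-bit grayscale)
print(bitmap_size_bytes(1024, 768, 24))  # 2359296 (3 bytes per RGB pixel)

encoded = to_luminance_chrominance(200, 120, 40)
print(encoded)           # (360, 320, 160)
print(to_rgb(*encoded))  # (200, 120, 40)
```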
4. Sound representation
The most generic method of encoding audio information for computer storage and manipulation
is to sample the amplitude of the sound wave at regular intervals and record the series of values
obtained. A sample rate of 8000 samples per second has been used for years in long-distance
voice telephone communication.
To obtain the higher-quality sound reproduction of today's music CDs, a sample rate of 44,100
samples per second is used. Each sample is represented in 16 bits (or 32 bits for a stereo
recording, which stores one 16-bit sample for each of two channels).
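Sampling and the resulting storage cost can be sketched as follows. The example samples a pure sine tone, which is an assumption made here to stand in for a real sound wave; the function names and the 1000 Hz tone are likewise chosen for illustration.

```python
import math


def sample_sine(frequency_hz: float, sample_rate: int, duration_s: float,
                bits: int = 16) -> list:
    """Sample the amplitude of a sine tone at regular intervals."""
    peak = 2 ** (bits - 1) - 1  # largest value of a signed sample
    count = round(sample_rate * duration_s)
    return [
        round(peak * math.sin(2 * math.pi * frequency_hz * n / sample_rate))
        for n in range(count)
    ]


def audio_size_bytes(sample_rate: int, bits_per_sample: int,
                     channels: int, seconds: int) -> int:
    """Storage needed for uncompressed sampled audio."""
    return sample_rate * bits_per_sample * channels * seconds // 8


samples = sample_sine(1000, 8000, 0.001)  # telephone rate, one millisecond
print(len(samples))                       # 8

print(audio_size_bytes(44100, 16, 2, 1))   # 176400 bytes per second of CD stereo
print(audio_size_bytes(44100, 16, 2, 60))  # 10584000 bytes per minute
```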
Another encoding scheme for audio, known as Musical Instrument Digital Interface (MIDI), is
widely used in the music synthesizers found in electronic keyboards, in video game sound, and
in sound effects accompanying websites. Rather than recording the sound itself, MIDI encodes
which instrument is to play which note and for what duration of time.
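The idea of storing instructions instead of samples can be sketched as follows. The NoteEvent record is hypothetical and is not the actual MIDI message format; it only mirrors the three pieces of information named above. The note numbers follow the MIDI convention in which 60 is middle C.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """Hypothetical record of one note (not the real MIDI byte layout)."""
    instrument: int    # which instrument plays (illustrative numbering)
    note: int          # which note, using MIDI note numbers (60 = middle C)
    duration_s: float  # how long the note sounds, in seconds


def sampled_size_bytes(duration_s: float, sample_rate: int = 44100,
                       bytes_per_sample: int = 2) -> int:
    """Storage for the same duration as 16-bit mono sampled audio."""
    return round(duration_s * sample_rate) * bytes_per_sample


melody = [NoteEvent(0, 60, 0.5), NoteEvent(0, 64, 0.5), NoteEvent(0, 67, 1.0)]
total_duration = sum(event.duration_s for event in melody)

print(len(melody), "events describe", total_duration, "seconds of music")
print(sampled_size_bytes(total_duration), "bytes if stored as samples")
# 3 events describe 2.0 seconds of music
# 176400 bytes if stored as samples
```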
5. Data compression
Data compression is the technique of reducing the size of data while retaining the underlying
information. A data compression scheme is either lossless or lossy. Lossless schemes do not lose
information in the compression process. Lossy schemes may lose information. Lossy
techniques often provide more compression than lossless ones and are therefore popular in
settings in which minor errors can be tolerated.
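The difference between the two kinds of scheme can be sketched as follows. The lossless part uses Python's standard zlib module; the lossy part uses simple quantization with a step of 16, which is chosen here only as an easy example of discarding detail and is not a scheme named in this lecture.

```python
import zlib

# Lossless: the original bytes are recovered exactly.
original = b"AAAAABBBBBCCCCC" * 100
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(compressed) < len(original))  # True
print(restored == original)             # True


# Lossy: values are rounded to a coarser scale, so small details are lost.
def quantize(values: list, step: int) -> list:
    return [round(v / step) for v in values]


def dequantize(codes: list, step: int) -> list:
    return [c * step for c in codes]


samples = [3, 18, 33, 47, 52]
codes = quantize(samples, 16)
approximation = dequantize(codes, 16)
print(codes)          # [0, 1, 2, 3, 3]
print(approximation)  # [0, 16, 32, 48, 48]
```

The quantized codes need fewer bits than the original values, but the approximation differs slightly from the original samples, which is the kind of minor error a lossy scheme accepts.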
The bit maps produced using the encoding schemes described in the preceding sections are often
large. Numerous compression schemes have been devised for image, audio and video data.
Examples of such schemes are Graphics Interchange Format (GIF), Joint Photographic Experts
Group (JPEG), Tagged Image File Format (TIFF), and Moving Picture Experts Group (MPEG).
GIF was developed by CompuServe. It is a dictionary encoding system that reduces the number
of colors that can be assigned to a pixel to 256. The 256 color encodings are stored in a table
(a dictionary) called the palette. Each pixel in an image is then represented by a single byte
whose value indicates which of the 256 palette entries represents the pixel's color.
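The palette idea can be sketched as follows. This is not a GIF encoder: it assumes the image already contains at most 256 distinct colors and only shows how three-byte RGB pixels become one-byte palette indices. The actual GIF format additionally compresses the resulting index stream. The function names are chosen here for illustration.

```python
def build_palette(pixels: list, max_colors: int = 256) -> tuple:
    """Replace each RGB pixel by a one-byte index into a color table."""
    palette = []
    index_of = {}
    for color in pixels:
        if color not in index_of:
            if len(palette) == max_colors:
                raise ValueError("image has more colors than the palette holds")
            index_of[color] = len(palette)
            palette.append(color)
    indices = bytes(index_of[color] for color in pixels)
    return palette, indices


def decode(palette: list, indices: bytes) -> list:
    """Look each index up in the palette to recover the pixel colors."""
    return [palette[i] for i in indices]


pixels = [(255, 0, 0), (0, 0, 255), (255, 0, 0), (255, 255, 255)]
palette, indices = build_palette(pixels)

print(len(palette))    # 3
print(list(indices))   # [0, 1, 0, 2]
print(decode(palette, indices) == pixels)  # True
```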
JPEG has proved to be an effective standard for compressing color photographs. The JPEG
standard encompasses several methods, each with its own goals. When precision is of the utmost
importance, JPEG's lossless mode is used, although it does not achieve high levels of
compression compared with the other JPEG modes. JPEG's lossy sequential mode has become
the standard of choice in many applications.
TIFF is popular as a standardized format for storing photographs along with their related
information, such as date, time, and camera settings; it also includes compression techniques.
MPEG is a family of compression standards for audio and video.