Character Encoding
1. What is Character Encoding?
Computers store everything as binary.
Text (letters, digits, punctuation, symbols) also needs to be stored in binary.
Character encoding is the system that maps each character to a unique binary code.
Example: The letter A → 01000001 in ASCII.
2. ASCII (American Standard Code for Information Interchange)
One of the earliest and most widely adopted character encodings.
Uses 7 bits to represent 128 characters.
o Characters include: uppercase letters (A–Z), lowercase letters (a–z), digits (0–9),
punctuation, control codes.
Extended ASCII uses 8 bits (256 characters); several different 8-bit extensions exist (e.g., ISO 8859-1, Windows-1252).
o Useful for additional symbols (e.g., £, é, etc.).
Example:
A = 65 → 1000001 (7-bit)
a = 97 → 1100001
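The mapping above can be checked with a short Python sketch. The helper name to_bits is only illustrative; ord() and format() are standard Python built-ins.

```python
def to_bits(ch: str, width: int = 7) -> str:
    """Return the code of a single character as a zero-padded binary string."""
    return format(ord(ch), f"0{width}b")


for ch in "Aa":
    # Show the character, its decimal code, the 7-bit form and the 8-bit (padded) form.
    print(ch, ord(ch), to_bits(ch), to_bits(ch, 8))
```

The 8-bit form simply adds a leading 0, which is why A is often written as 01000001 when stored in a byte.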
Limitations of ASCII
Can’t represent characters from all languages (e.g., Arabic, Chinese).
Mostly designed for English.
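One way to see this limitation is to try encoding non-English text as ASCII. The sketch below assumes Python's built-in "ascii" codec; the function name is_ascii_encodable is hypothetical.

```python
def is_ascii_encodable(text: str) -> bool:
    """Return True only if every character fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised when a character has no ASCII code (code point above 127).
        return False


print(is_ascii_encodable("HELLO"))   # English letters only
print(is_ascii_encodable("\u0623"))  # an Arabic letter
print(is_ascii_encodable("\u4e2d"))  # a Chinese character
```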
3. Unicode
Developed to overcome ASCII’s limitations.
Provides a universal character set covering most written languages, symbols, and
emojis.
Unicode is a standard that gives each character a code point (e.g., U+0041 for A); several encodings store those code points as bytes (UTF-8, UTF-16, UTF-32).
UTF-8 (most widely used):
𝒮𝒾𝓂𝓅𝓁𝒾𝒻𝓎𝒾𝓃𝑔 𝒞𝑜𝓂𝓅𝓊𝓉𝑒𝓇 𝒮𝒸𝒾𝑒𝓃𝒸𝑒 𝒻𝑜𝓇 𝒴𝑜𝓊 – SIR SHARJIL | WHATSAPP 0315-0511431 1
Variable length encoding: 1 to 4 bytes (8, 16, 24, or 32 bits) depending on character.
Compatible with ASCII (first 128 characters are the same).
Efficient for English text, flexible for global languages.
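A rough sketch of UTF-8's variable length, using Python's built-in "utf-8" codec. The helper utf8_length and the sample characters are chosen for illustration only.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs to store one character."""
    return len(ch.encode("utf-8"))


samples = {
    "A": "Latin letter",
    "\u00e9": "e with acute accent",
    "\u0623": "Arabic letter",
    "\u20ac": "euro sign",
    "\U0001F600": "emoji",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} ({label}): {utf8_length(ch)} byte(s)")

# ASCII compatibility: ASCII text produces identical bytes in both encodings.
print("HELLO".encode("ascii") == "HELLO".encode("utf-8"))
```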
UTF-16 / UTF-32:
UTF-16 is variable length (one or two 16-bit units per character); UTF-32 is fixed length (32 bits per character) → suitable for systems that need direct indexing.
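The sizes used by UTF-16 and UTF-32 can be compared with a small sketch. It assumes Python's "utf-16-be" and "utf-32-be" codecs, which are used here so that no byte order mark is added; the helper encoded_length is hypothetical.

```python
def encoded_length(ch: str, encoding: str) -> int:
    """Number of bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))


for ch in ["A", "\u0623", "\U0001F600"]:
    print(
        f"U+{ord(ch):04X}:",
        encoded_length(ch, "utf-16-be"), "bytes in UTF-16,",
        encoded_length(ch, "utf-32-be"), "bytes in UTF-32",
    )
```

Characters above U+FFFF (such as most emojis) take two 16-bit units in UTF-16, while UTF-32 always uses 4 bytes.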
4. Importance of Character Encoding
Allows computers to exchange text reliably across platforms.
Standardization avoids misinterpretation (e.g., mojibake – garbled characters).
Essential for multilingual support, web pages, and software internationalization.
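Mojibake can be reproduced by decoding bytes with the wrong encoding. The sketch below is only an illustration: the function name garble is hypothetical, and Latin-1 is picked as the "wrong" encoding because it accepts any byte value.

```python
def garble(text: str, wrong_encoding: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode it with a different encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


original = "caf\u00e9"
broken = garble(original)
print(original, "->", broken)

# Reversing the mistake recovers the text, since no bytes were lost.
repaired = broken.encode("latin-1").decode("utf-8")
print(repaired == original)
```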
5. Examples in Practice
ASCII file → "HELLO" = 01001000 01000101 01001100 01001100 01001111
Unicode example:
o أ (Arabic letter Alef with Hamza above) in Unicode → U+0623
o Stored in UTF-8 as 11011000 10100011
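Both examples above can be reproduced in Python. This is a minimal sketch: bytes_to_bits and encode_two_byte are hypothetical helper names, and encode_two_byte handles only the two-byte UTF-8 range (U+0080 to U+07FF).

```python
def bytes_to_bits(data: bytes) -> str:
    """Show each byte as 8 binary digits, separated by spaces."""
    return " ".join(format(b, "08b") for b in data)


def encode_two_byte(code_point: int) -> bytes:
    """Build a two-byte UTF-8 sequence by hand (pattern 110xxxxx 10xxxxxx)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only U+0080 to U+07FF use two bytes in UTF-8")
    first = 0b11000000 | (code_point >> 6)           # top 5 bits of the code point
    second = 0b10000000 | (code_point & 0b00111111)  # low 6 bits of the code point
    return bytes([first, second])


print(bytes_to_bits("HELLO".encode("ascii")))
print(bytes_to_bits("\u0623".encode("utf-8")))
print(bytes_to_bits(encode_two_byte(0x0623)))
```

The hand-built bytes match the output of the built-in codec, which shows where the 110 and 10 prefixes in the stored pattern come from.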
6. Key Exam Pointers for AS Students
Know the difference between ASCII and Unicode.
Be able to explain why Unicode is needed in modern computing.
Recognize bit sizes (7-bit ASCII, 8-bit extended ASCII, UTF-8 at 8 to 32 bits, UTF-16 at 16 or 32 bits, UTF-32 at 32 bits).
Explain advantages/disadvantages:
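o ASCII → compact (7 bits per character), but limited to 128 mostly English characters.
o Unicode → can need more storage per character, but covers almost all written languages.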
✅ Summary:
Character encoding is the bridge between human-readable text and binary. ASCII is small and
simple but limited, while Unicode is global and flexible, making it the modern standard.