Uni Code Image

Uploaded by

senbeth11

UTF-8 Encoding & Decoding — Zero-Knowledge Guide

What you store are bytes. UTF-8 tells you how to turn characters into bytes (encoding) and back
(decoding).

1) Bits, bytes, hex (what is C3?)

A byte is 8 bits. We write a byte as two hexadecimal (hex) digits. Each hex digit = 4 bits.
Example: hex C3 ⇒ C = 12 = 1100, 3 = 0011 ⇒ 1100 0011.
So the reason C3 “becomes” 1100 0011 is: it’s just hex → binary.

0=0000 1=0001 2=0010 3=0011 4=0100 5=0101 6=0110 7=0111

8=1000 9=1001 A=1010 B=1011 C=1100 D=1101 E=1110 F=1111
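The hex → binary lookup above can be checked with a few lines of Python. This is a minimal sketch; the helper name hex_byte_to_bits is made up for illustration and is not part of the handout.

```python
def hex_byte_to_bits(hex_byte: str) -> str:
    """Return the 8-bit pattern of a two-digit hex byte, split into 4-bit halves."""
    value = int(hex_byte, 16)      # "C3" -> 195
    bits = format(value, "08b")    # 195 -> "11000011" (padded to 8 bits)
    return bits[:4] + " " + bits[4:]


print(hex_byte_to_bits("C3"))  # 1100 0011
print(hex_byte_to_bits("A9"))  # 1010 1001
```

Each half of the output corresponds to one hex digit in the table.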

2) UTF-8 lead/continuation prefixes

Look at the first bits of the first byte:

• 0xxxxxxx → 1 byte total (ASCII range).

• 110xxxxx → 2 bytes total (next must start with 10).

• 1110xxxx → 3 bytes total (then two 10 bytes).

• 11110xxx → 4 bytes total (then three 10 bytes).

• Continuation bytes always start 10xxxxxx.
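The prefix rules above can be sketched as a small Python check. The function names sequence_length and is_continuation are hypothetical, and the sketch looks only at the prefix bits; a full UTF-8 validator would also reject some first bytes that pass this check (for example C0, C1 and F5 upward).

```python
def sequence_length(first_byte: int) -> int:
    """Return how many bytes the character occupies, judged by the first byte's prefix."""
    if first_byte >> 7 == 0b0:        # 0xxxxxxx
        return 1
    if first_byte >> 5 == 0b110:      # 110xxxxx
        return 2
    if first_byte >> 4 == 0b1110:     # 1110xxxx
        return 3
    if first_byte >> 3 == 0b11110:    # 11110xxx
        return 4
    raise ValueError(f"{first_byte:#04x} cannot start a character")


def is_continuation(byte: int) -> bool:
    """True if the byte has the 10xxxxxx continuation prefix."""
    return byte >> 6 == 0b10


print(sequence_length(0x41))   # 1  ('A' is ASCII)
print(sequence_length(0xC3))   # 2
print(is_continuation(0xA9))   # True
```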

3) ENCODING by hand (char ⇒ bytes)

Example: Encode ‘£’ (U+00A3).
Step 1: U+00A3 = hex A3 = binary 1010 0011.
Step 2: U+00A3 falls in the range U+0080–U+07FF ⇒ use the 2-byte template 110xxxxx 10xxxxxx.
Step 3: Fill x’s from right to left. Last 6 bits → 2nd byte: 10 100011 = 1010 0011 (A3).
Remaining bits (pad to 5) → 1st byte: 00010 ⇒ 110 00010 = 1100 0010 (C2).
Answer: C2 A3.
Another quick one: ‘é’ (U+00E9) ⇒ C3 A9.
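The same hand steps can be written with bit operations. This is a sketch for the 2-byte case only (U+0080 to U+07FF); encode_two_byte is an illustrative name, and in practice Python's built-in str.encode("utf-8") does the whole job.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF using the template 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this sketch only handles the 2-byte range")
    # The last 6 bits go into the second byte, behind the 10 prefix.
    second = 0b10000000 | (code_point & 0b00111111)
    # The remaining (up to 5) bits go into the first byte, behind the 110 prefix.
    first = 0b11000000 | (code_point >> 6)
    return bytes([first, second])


print(encode_two_byte(0x00A3).hex(" ").upper())  # C2 A3
print(encode_two_byte(0x00E9).hex(" ").upper())  # C3 A9
# Cross-check against Python's own encoder.
print(encode_two_byte(0x00A3) == "£".encode("utf-8"))  # True
```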

4) DECODING by hand (bytes ⇒ char)

Example: Decode C3 A9.
Step 1: C3⇒1100 0011 (starts with 110 ⇒ 2-byte char). A9⇒1010 1001.
Step 2: Strip prefixes: from first drop 110 ⇒ 00011; from second drop 10 ⇒ 101001.
Step 3: Join bits: 00011 101001 = 1110 1001 = hex E9 = U+00E9 = ‘é’.
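The decoding steps translate into bit operations in the same way. This sketch handles 2-byte sequences only; decode_two_byte is a hypothetical name, and the sketch checks just the prefixes (it does not reject overlong forms such as those starting with C0 or C1, which a strict decoder would).

```python
def decode_two_byte(data: bytes) -> str:
    """Decode a 2-byte UTF-8 sequence by stripping the 110 and 10 prefixes."""
    if len(data) != 2:
        raise ValueError("expected exactly two bytes")
    first, second = data
    if first >> 5 != 0b110 or second >> 6 != 0b10:
        raise ValueError("not a 110xxxxx 10xxxxxx sequence")
    high = first & 0b00011111    # drop 110, keep 5 bits
    low = second & 0b00111111    # drop 10, keep 6 bits
    code_point = (high << 6) | low   # join the bits
    return chr(code_point)


print(decode_two_byte(bytes([0xC3, 0xA9])))  # é
print(decode_two_byte(bytes([0xC2, 0xA3])))  # £
```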

5) What to remember for exams

• ASCII stays 1 byte in UTF-8. Others use 2–4 bytes.

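• Count the leading 1s in the first byte: none means 1 byte; two, three or four mean 2, 3 or 4 bytes. A byte with exactly one leading 1 (10xxxxxx) is a continuation byte, never a first byte.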

• Show working and units (bytes) when asked for file sizes.
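For file-size questions, a quick way to check a hand count is to encode the text and count the bytes. The sample string below is made up for illustration.

```python
def utf8_size_in_bytes(text: str) -> int:
    """Return the number of bytes the text occupies when stored as UTF-8."""
    return len(text.encode("utf-8"))


sample = "café £5"
# c, a, f, space and 5 are ASCII (1 byte each); é and £ take 2 bytes each.
print(len(sample))                 # 7 characters
print(utf8_size_in_bytes(sample))  # 9 bytes
```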
