2 Huff

The document provides a Python implementation of Huffman Encoding using a greedy strategy, detailing the construction of a Huffman Tree and the generation of binary codes for character encoding. It includes functions for encoding and decoding data, along with an analysis of time and space complexity. The document also explains the steps involved in Huffman encoding, emphasizing the greedy approach to achieve efficient data compression.


# Aim: Write a program to implement Huffman Encoding using a greedy strategy.

import heapq
from collections import Counter

# Node class to represent each character and frequency


class Node:
    def __init__(self, char, freq):
        self.char = char    # Character
        self.freq = freq    # Frequency of the character
        self.left = None    # Left child in the tree
        self.right = None   # Right child in the tree

    # Define comparison for priority queue
    def __lt__(self, other):
        return self.freq < other.freq

# Function to build the Huffman Tree


def build_huffman_tree(char_freq):
    # Step 1: Create a min-heap (priority queue) with all characters and their frequencies
    heap = [Node(char, freq) for char, freq in char_freq.items()]
    heapq.heapify(heap)

    # Step 2: Combine nodes until we have the final Huffman Tree
    while len(heap) > 1:
        left = heapq.heappop(heap)   # Smallest frequency node
        right = heapq.heappop(heap)  # Second smallest frequency node

        # Create a new node with combined frequency of the two nodes
        merged = Node(None, left.freq + right.freq)
        merged.left = left
        merged.right = right

        # Add this new node back to the heap
        heapq.heappush(heap, merged)

    # The last remaining node is the root of the Huffman Tree
    return heap[0]

# Function to generate Huffman codes for each character


def build_huffman_codes(root):
    huffman_codes = {}

    # Helper function to generate codes by traversing the tree
    def encode(node, code):
        if node:
            if node.char is not None:  # Leaf node
                # Fall back to "0" when the tree is a single leaf
                # (input with only one distinct character)
                huffman_codes[node.char] = code or "0"
            encode(node.left, code + "0")   # Traverse left
            encode(node.right, code + "1")  # Traverse right

    encode(root, "")
    return huffman_codes

# Function to encode the input data using the Huffman codes


def huffman_encoding(data):
    # Calculate frequency of each character in the input string
    char_freq = Counter(data)

    # Build the Huffman Tree based on character frequencies
    root = build_huffman_tree(char_freq)

    # Generate the Huffman codes from the tree
    huffman_codes = build_huffman_codes(root)

    # Encode the data by replacing each character with its code
    encoded_data = "".join(huffman_codes[char] for char in data)

    return encoded_data, huffman_codes

# Function to decode the encoded string back to original text


def huffman_decoding(encoded_data, huffman_codes):
    # Reverse the huffman_codes dictionary to get codes as keys
    code_to_char = {code: char for char, code in huffman_codes.items()}

    # Decode the encoded data bit by bit
    current_code = ""
    decoded_data = []
    for bit in encoded_data:
        current_code += bit
        if current_code in code_to_char:
            decoded_data.append(code_to_char[current_code])
            current_code = ""

    return "".join(decoded_data)

# Main program
data = input("Enter a string to encode: ")
encoded_data, huffman_codes = huffman_encoding(data)
print("Encoded data:", encoded_data)
print("Huffman Codes:", huffman_codes)

# Decode the encoded data to verify correctness
decoded_data = huffman_decoding(encoded_data, huffman_codes)
print("Decoded data:", decoded_data)

# Time and Space Complexity Analysis

# Time Complexity:
# Building the Tree: O(n log n)
#   Each heap insertion and extraction takes O(log n), and the merge step
#   runs n - 1 times, where n is the number of distinct characters.
# Generating Codes: O(n)
#   A DFS traversal of the tree takes linear time.
# Overall Time Complexity: O(n log n)

# Space Complexity:
# Heap Storage: O(n) for storing all nodes.
# Tree Storage: O(n) for the internal nodes and leaves.
# Huffman Codes: O(n) for storing the codes.
# Overall Space Complexity: O(n)
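# As a rough illustration of the saving, assuming 8 bits per character for the
# uncompressed text and the codes from the "AAABBC" example discussed later
# (a sketch, not part of the program above):

```python
# Hypothetical codes for "AAABBC": A -> 0, B -> 10, C -> 11
codes = {'A': '0', 'B': '10', 'C': '11'}
text = "AAABBC"

compressed_bits = sum(len(codes[ch]) for ch in text)  # 3*1 + 2*2 + 1*2 = 9 bits
original_bits = len(text) * 8                         # 6 chars * 8 bits = 48 bits
print(compressed_bits, "bits vs", original_bits, "bits")
```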
# Huffman encoding is a popular algorithm for data compression, especially for
# text. It uses a **greedy strategy** to assign shorter codes to more frequent
# characters, thereby reducing the overall storage size for the data. Here is a
# step-by-step explanation of how Huffman encoding works, guided by the greedy
# approach:

# ### 1. Analyze Character Frequency

# The first step in Huffman encoding is to determine the frequency of each
# character in the text or dataset you want to compress. The frequency count
# helps prioritize which characters should have shorter codes.
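# In Python this step is one call to collections.Counter, as in the program
# above (shown here standalone with the "AAABBC" example string):

```python
from collections import Counter

freq = Counter("AAABBC")
print(freq)   # Counter({'A': 3, 'B': 2, 'C': 1})
```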

# ### 2. Build a Priority Queue (Min-Heap)

# Using a min-heap, create a priority queue where each node contains:
# - A character from the dataset (or, for internal nodes, no specific character).
# - The frequency of the character.

# In a greedy approach, nodes with lower frequencies are prioritized, ensuring
# that characters with higher frequencies end up closer to the root of the tree
# and get shorter codes.

# ### 3. Construct the Huffman Tree

# - **While there are at least two nodes in the queue**:
#   - Remove the two nodes with the lowest frequency from the priority queue.
#   - Create a new internal node with a frequency equal to the sum of these
#     two nodes' frequencies.
#   - Make the two nodes the left and right children of this new node.
#   - Insert the new node back into the priority queue.

# This merging process follows the greedy principle of always combining the
# least frequent nodes first, ensuring minimal "cost" for more frequent
# characters.
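# The merge loop can be sketched with frequencies alone (tree structure
# omitted) to show the order in which merges happen for "AAABBC":

```python
import heapq

heap = [3, 2, 1]          # frequencies of A, B, C
heapq.heapify(heap)
merges = []
while len(heap) > 1:
    a = heapq.heappop(heap)      # lowest frequency
    b = heapq.heappop(heap)      # second lowest
    heapq.heappush(heap, a + b)  # the merged internal node
    merges.append(a + b)
print(merges)   # [3, 6]: B+C merge first, then (B+C)+A
```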

# ### 4. Assign Binary Codes

# Once the Huffman tree is complete, assign binary codes to each character:
# - Traverse the tree from the root to each leaf node (representing a character).
# - Assign a `0` for each left edge and a `1` for each right edge.
# - The path taken to reach each character forms its Huffman code.

# Since more frequent characters are closer to the root, their binary codes
# are shorter, achieving efficient compression.

# ### 5. Encode the Data

# Replace each character in the original data with its Huffman code to get the
# compressed binary string.
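# With the codes in hand, this step is a single join (using the example codes
# A -> 0, B -> 10, C -> 11):

```python
codes = {'A': '0', 'B': '10', 'C': '11'}
encoded = "".join(codes[ch] for ch in "AAABBC")
print(encoded)   # 000101011 -- 9 bits instead of 48
```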

# ### Example

# Consider the string "AAABBC":

# 1. Character frequencies: A = 3, B = 2, C = 1.
# 2. Build initial nodes and use a priority queue to construct the Huffman tree:
#    - Merge B and C (smallest frequencies) into a node with frequency 3.
#    - Merge the new node (frequency 3) with A (frequency 3).
# 3. Assign codes: A might get `0`, B `10`, and C `11`.

# Thus, Huffman encoding is efficient due to its greedy nature, always making
# local choices (combining lowest frequency nodes) that lead to a globally
# optimized tree for encoding.
