# Write a program to implement Huffman Encoding using a greedy strategy.
import heapq
from collections import Counter
# Node class to represent each character and frequency
class Node:
    def __init__(self, char, freq):
        self.char = char # Character
        self.freq = freq # Frequency of the character
        self.left = None # Left child in the tree
        self.right = None # Right child in the tree
    # Define comparison for priority queue
    def __lt__(self, other):
        return self.freq < other.freq
# Function to build the Huffman Tree
def build_huffman_tree(char_freq):
    # Step 1: Create a min-heap (priority queue) with all characters and their frequencies
    heap = [Node(char, freq) for char, freq in char_freq.items()]
    heapq.heapify(heap)
    # Step 2: Combine nodes until we have the final Huffman Tree
    while len(heap) > 1:
        left = heapq.heappop(heap) # Smallest frequency node
        right = heapq.heappop(heap) # Second smallest frequency node
        # Create a new node with combined frequency of the two nodes
        merged = Node(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        # Add this new node back to the heap
        heapq.heappush(heap, merged)
    # The last remaining node is the root of the Huffman Tree
    return heap[0]
# Function to generate Huffman codes for each character
def build_huffman_codes(root):
    huffman_codes = {}
    # Helper function to generate codes by traversing the tree
    def encode(node, code):
        if node:
            if node.char is not None: # Leaf node
                # For a single-character input the root is itself a leaf;
                # fall back to "0" so its code is not the empty string
                huffman_codes[node.char] = code or "0"
            encode(node.left, code + "0") # Traverse left
            encode(node.right, code + "1") # Traverse right
    encode(root, "")
    return huffman_codes
# Function to encode the input data using the Huffman codes
def huffman_encoding(data):
    # Calculate frequency of each character in the input string
    char_freq = Counter(data)
    # Build the Huffman Tree based on character frequencies
    root = build_huffman_tree(char_freq)
    # Generate the Huffman codes from the tree
    huffman_codes = build_huffman_codes(root)
    # Encode the data by replacing each character with its code
    encoded_data = "".join(huffman_codes[char] for char in data)
    return encoded_data, huffman_codes
# Function to decode the encoded string back to original text
def huffman_decoding(encoded_data, huffman_codes):
    # Reverse the huffman_codes dictionary to get codes as keys
    code_to_char = {code: char for char, code in huffman_codes.items()}
    # Decode the encoded data bit by bit
    current_code = ""
    decoded_data = []
    for bit in encoded_data:
        current_code += bit
        if current_code in code_to_char:
            decoded_data.append(code_to_char[current_code])
            current_code = ""
    return "".join(decoded_data)
# Main program
data = input("Enter a string to encode: ")
encoded_data, huffman_codes = huffman_encoding(data)
print("Encoded data:", encoded_data)
print("Huffman Codes:", huffman_codes)
# Decode the encoded data to verify correctness
decoded_data = huffman_decoding(encoded_data, huffman_codes)
print("Decoded data:", decoded_data)
# Time and Space Complexity Analysis
#   Time Complexity:
#   Building the tree: O(n log n)
#     Inserting a node into the heap and extracting the two smallest each take
#     O(log n), and this happens n - 1 times.
#   Generating codes: O(n)
#     A DFS traversal of the tree takes linear time.
#   Overall Time Complexity: O(n log n)
#   Space Complexity:
#   Heap storage: O(n) for storing all nodes.
#   Tree storage: O(n) for the internal nodes and leaves.
#   Huffman codes: O(n) for storing the codes.
#   Overall Space Complexity: O(n)
# Huffman encoding is a popular algorithm for data compression, especially for
# text. It uses a **greedy strategy** to assign shorter codes to more frequent
# characters, thereby reducing the overall storage size of the data. Here is a
# step-by-step explanation of how Huffman encoding works, guided by the greedy
# approach:
# ### 1. Analyze Character Frequency
# The first step in Huffman encoding is to determine the frequency of each
# character in the text or dataset you want to compress. The frequency count
# helps prioritize which characters should have shorter codes.
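# The frequency analysis in step 1 can be sketched with collections.Counter,
# mirroring what huffman_encoding does above; the sample string "AAABBC" is
# purely illustrative.
from collections import Counter

sample_freq = Counter("AAABBC")
# Counter({'A': 3, 'B': 2, 'C': 1}) -- 'A' is the most frequent character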
# ### 2. Build a Priority Queue (Min-Heap)
# Using a min-heap, create a priority queue where each node contains:
#    - A character from the dataset (or, for internal nodes, no specific
#      character).
#    - The frequency of the character.
# In a greedy approach, nodes with lower frequencies are prioritized, ensuring
# that characters with higher frequencies end up closer to the root of the tree
# and get shorter codes.
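# A minimal sketch of the min-heap from step 2, using (frequency, character)
# tuples instead of Node objects; heapq always pops the lowest frequency first.
import heapq

pq = [(3, "A"), (2, "B"), (1, "C")]
heapq.heapify(pq)
lowest = heapq.heappop(pq)  # (1, "C") -- the least frequent character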
# ### 3. Construct the Huffman Tree
#    - **While there are at least two nodes in the queue**:
#      - Remove the two nodes with the lowest frequency from the priority queue.
#      - Create a new internal node with a frequency equal to the sum of these
#        two nodes' frequencies.
#      - Make the two nodes the left and right children of this new node.
#      - Insert the new node back into the priority queue.
#    This merging process follows the greedy principle of always combining the
#    least frequent nodes first, ensuring minimal "cost" for more frequent
#    characters.
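# The merging loop of step 3 can be traced with plain (frequency, label) tuples,
# a simplified stand-in for the Node-based build_huffman_tree above; "*" marks
# an internal node.
import heapq

demo_heap = [(3, "A"), (2, "B"), (1, "C")]
heapq.heapify(demo_heap)
while len(demo_heap) > 1:
    f1, _ = heapq.heappop(demo_heap)  # smallest frequency
    f2, _ = heapq.heappop(demo_heap)  # second smallest frequency
    heapq.heappush(demo_heap, (f1 + f2, "*"))
root_freq, _ = demo_heap[0]  # the root frequency equals the total count, 6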
# ### 4. Assign Binary Codes
# Once the Huffman tree is complete, assign binary codes to each character:
#    - Traverse the tree from the root to each leaf node (representing a
#      character).
#    - Assign a `0` for each left edge and a `1` for each right edge.
#    - The path taken to reach each character forms its Huffman code.
# Since more frequent characters are closer to the root, their binary codes are
# shorter, achieving efficient compression.
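# Huffman codes built this way are prefix-free: no code is a prefix of another,
# which is what makes bit-by-bit decoding unambiguous. A small sanity check,
# using the illustrative codes A=0, B=10, C=11:
def is_prefix_free(codes):
    values = list(codes.values())
    return not any(a != b and b.startswith(a) for a in values for b in values)

assert is_prefix_free({"A": "0", "B": "10", "C": "11"})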
# ### 5. Encode the Data
# Replace each character in the original data with its Huffman code to get the
# compressed binary string.
# ### Example
#   Consider the string "AAABBC":
#   1. Character frequencies: A = 3, B = 2, C = 1.
#   2. Build initial nodes and use a priority queue to construct the Huffman tree:
#      - Merge B and C (smallest frequencies) into a node with frequency 3.
#      - Merge the new node (frequency 3) with A (frequency 3).
#   3. Assign codes: A might get `0`, B `10`, and C `11`.
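# With those example codes, encoding "AAABBC" takes 3*1 + 2*2 + 1*2 = 9 bits,
# versus 6 * 8 = 48 bits of plain 8-bit ASCII.
example_codes = {"A": "0", "B": "10", "C": "11"}
example_encoded = "".join(example_codes[ch] for ch in "AAABBC")
# example_encoded == "000101011", 9 bits long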
# Thus, Huffman encoding is efficient due to its greedy nature, always making
# local choices (combining the lowest-frequency nodes) that lead to a globally
# optimal tree for encoding.