Huffman Compression Algorithm
Huffman coding is a data compression algorithm that expresses the basic idea behind file compression. This article covers fixed-length and variable-length encoding, uniquely decodable codes, the prefix rule, and the construction of a Huffman tree.
We know that each character is stored as a sequence of 0's and 1's and takes 8 bits. This is called fixed-length encoding, because every character uses the same fixed number of bits for storage.
Suppose we are given a text. How can we reduce the amount of space needed to store a single character?
The main idea is variable-length encoding. We can use the fact that some characters occur in a text more often than others: if frequent characters get shorter codes and rare characters get longer ones, the same text can be represented with fewer bits.
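Variable-length encoding starts from character frequencies, so a first step is simply counting them. The following is a minimal sketch of that step; the function name `countFrequencies` is a hypothetical helper and not part of the implementation shown later in this article.

```cpp
#include <map>
#include <string>

// Count how many times each character occurs in the text.
// (Hypothetical helper, used only for illustration.)
std::map<char, int> countFrequencies(const std::string& text)
{
    std::map<char, int> freq;
    for (char ch : text) {
        ++freq[ch];   // operator[] value-initializes missing keys to 0
    }
    return freq;
}
```

Calling `countFrequencies("hello world")` would report, among others, that `l` occurs three times and `o` twice, so those two characters are the best candidates for short codes.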
Given a sequence of bits, how can we decode it unambiguously?
Consider the string "aabacdab". It has 8 characters, and with fixed-length encoding it takes 64 bits to store. Note that the characters "a", "b", "c" and "d" occur 4, 2, 1 and 1 times, respectively. Let's try to represent "aabacdab" with fewer bits, using the fact that "a" occurs more often than "b", and "b" occurs more often than "c" and "d". We start by encoding "a" with a single bit equal to 0, assign the two-bit code 11 to "b", and use the three-bit codes 100 and 011 for "c" and "d".
As a result, we get:
character  code
a          0
b          11
c          100
d          011
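To make the bit count concrete, here is a small sketch that encodes a string with a fixed code table by concatenating the code of each character. The name `encodeWithTable` and the use of `std::map` are assumptions made for this illustration only.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Concatenate the code of every character in `text`.
// Throws if a character has no entry in the table.
// (Hypothetical helper, used only for illustration.)
std::string encodeWithTable(const std::string& text,
                            const std::map<char, std::string>& table)
{
    std::string bits;
    for (char ch : text) {
        auto it = table.find(ch);
        if (it == table.end()) {
            throw std::invalid_argument("character without a code");
        }
        bits += it->second;
    }
    return bits;
}
```

With the table a=0, b=11, c=100, d=011, the string "aabacdab" becomes a 14-bit sequence instead of the 64 bits needed by the fixed-length encoding.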
So, using the codes above, the string "aabacdab" is encoded as 00110100011011 (0|0|11|0|100|011|0|11). The main problem, however, is decoding. When we try to decode the string 00110100011011, the result is ambiguous, because it can be read as:
0|011|0|100|011|0|11 adacdab
0|0|11|0|100|0|11|011 aabacabd
0|011|0|100|0|11|0|11 adacabab
...
and so on.
To avoid this ambiguity, we must make sure that our encoding satisfies the prefix rule, which in turn guarantees that the codes can be decoded in exactly one way. The prefix rule states that no code is a prefix of another code. By a code we mean the bits used to represent a particular character. In the example above, 0 is a prefix of 011, which violates the prefix rule. So, if our codes satisfy the prefix rule, we can decode unambiguously.
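The prefix rule is easy to verify for a small set of codes. Below is a minimal sketch of such a check; `isPrefixFree` is a hypothetical name, and the quadratic comparison is chosen for clarity rather than speed.

```cpp
#include <string>
#include <vector>

// Returns true when no code in the list is a prefix of another code.
// Duplicate codes also count as a violation.
// (Hypothetical helper, used only for illustration.)
bool isPrefixFree(const std::vector<std::string>& codes)
{
    for (std::size_t i = 0; i < codes.size(); ++i) {
        for (std::size_t j = 0; j < codes.size(); ++j) {
            // does codes[j] start with codes[i]?
            if (i != j && codes[j].compare(0, codes[i].size(), codes[i]) == 0) {
                return false;
            }
        }
    }
    return true;
}
```

The set 0, 11, 100, 011 fails this check because 0 is a prefix of 011, while a set such as 0, 10, 110, 111 passes it.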
Let's revisit the example above. This time we assign the characters "a", "b", "c" and "d" codes that satisfy the prefix rule.
character  code
a          0
b          10
c          110
d          111
With these codes, the string "aabacdab" is encoded as 00100110111010 (0|0|10|0|110|111|0|10). Now 00100110111010 can be decoded unambiguously back to the original string "aabacdab".
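Because no code is a prefix of another, a decoder can read the bits left to right and emit a character as soon as the bits collected so far match a code. The sketch below illustrates this; `decodePrefix` and the code-to-character map are assumptions made for the example.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Decode `bits` using a prefix-free table that maps code -> character.
// (Hypothetical helper, used only for illustration.)
std::string decodePrefix(const std::string& bits,
                         const std::map<std::string, char>& table)
{
    std::string text;
    std::string current;           // bits read since the last emitted character
    for (char bit : bits) {
        current.push_back(bit);
        auto it = table.find(current);
        if (it != table.end()) {   // a complete code has been read
            text.push_back(it->second);
            current.clear();
        }
    }
    if (!current.empty()) {
        throw std::invalid_argument("trailing bits do not form a code");
    }
    return text;
}
```

With the table 0=a, 10=b, 110=c, 111=d, the bits 00100110111010 decode back to "aabacdab" without any backtracking.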
Huffman coding
Now that we have dealt with variable-length encoding and the prefix rule, let's talk about Huffman coding.
The method is based on building a binary tree. Each node in it is either a leaf or an internal node. Initially, all nodes are leaves (terminals), each of which holds a character and its weight (its frequency of occurrence). An internal node holds a weight, equal to the sum of the weights of its children, and links to two child nodes. By common convention, the bit "0" represents following the left branch and "1" the right branch. A complete tree has N leaves and N-1 internal nodes. When building a Huffman tree, it is recommended to discard unused characters in order to obtain codes of optimal length.
We will use a priority queue to build the Huffman tree, where the node with the lowest frequency gets the highest priority. The construction steps are described below:
- Create a leaf node for each character and add it to the priority queue.
- While there is more than one node in the queue, do the following:
  - Remove the two nodes with the highest priority (the lowest frequency) from the queue.
  - Create a new internal node with these two nodes as children and a frequency equal to the sum of their frequencies.
  - Add the new node to the priority queue.
- The only remaining node is the root, and this completes the construction of the tree.
Imagine that we have some text consisting only of the characters "A", "B", "C", "D" and "E", with occurrence frequencies of 15, 7, 6, 6 and 5, respectively. The illustrations below reflect the steps of the algorithm.
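The merge order for this example can be traced with a few lines of code. The sketch below works on weights only and records the weight of each internal node as it is created; `huffmanMergeWeights` is a hypothetical helper, not part of the implementation shown later.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Repeatedly merge the two smallest weights and record each merged weight.
// (Hypothetical helper, used only for illustration.)
std::vector<int> huffmanMergeWeights(const std::vector<int>& weights)
{
    // min-heap: the smallest weight has the highest priority
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq(
        weights.begin(), weights.end());

    std::vector<int> merged;
    while (pq.size() > 1) {
        int a = pq.top(); pq.pop();
        int b = pq.top(); pq.pop();
        merged.push_back(a + b);   // weight of the new internal node
        pq.push(a + b);
    }
    return merged;
}
```

For the weights 15, 7, 6, 6, 5 the internal nodes are created with weights 11, 13, 24 and 39: first 5 and 6 are merged, then 6 and 7, then 11 and 13, and finally 15 and 24 form the root.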
The path from the root to any leaf node stores the optimal prefix code (also known as the Huffman code) for the character associated with that leaf.
[Figure: Huffman tree]
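As a compact illustration of reading codes off the root-to-leaf paths, the sketch below builds the tree in a vector of index-linked nodes and walks it from the root. The names `HNode` and `huffmanCodes` are assumptions for this example. The exact bit patterns depend on how ties between equal weights are broken, so only the code lengths should be relied upon.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Node stored by index; left/right are -1 for a leaf. (Hypothetical type.)
struct HNode { int weight; char ch; int left; int right; };

// Build a Huffman tree from character frequencies and return the codes.
std::map<char, std::string> huffmanCodes(const std::map<char, int>& freq)
{
    std::vector<HNode> nodes;
    using Item = std::pair<int, int>;   // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    for (const auto& f : freq) {
        nodes.push_back({f.second, f.first, -1, -1});
        pq.push({f.second, static_cast<int>(nodes.size()) - 1});
    }

    std::map<char, std::string> codes;
    if (nodes.empty()) return codes;

    while (pq.size() > 1) {
        Item a = pq.top(); pq.pop();
        Item b = pq.top(); pq.pop();
        nodes.push_back({a.first + b.first, '\0', a.second, b.second});
        pq.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    // Walk from the root: append '0' for a left edge, '1' for a right edge.
    std::vector<std::pair<int, std::string>> stack;
    stack.push_back({pq.top().second, ""});
    while (!stack.empty()) {
        std::pair<int, std::string> cur = stack.back();
        stack.pop_back();
        const HNode& n = nodes[cur.first];
        if (n.left < 0) {
            // a lone leaf would get an empty code, so give it one bit
            codes[n.ch] = cur.second.empty() ? "0" : cur.second;
            continue;
        }
        stack.push_back({n.left, cur.second + "0"});
        stack.push_back({n.right, cur.second + "1"});
    }
    return codes;
}
```

For the frequencies A=15, B=7, C=6, D=6, E=5 this yields a 1-bit code for "A" and 3-bit codes for the other four characters, for a total of 15·1 + (7+6+6+5)·3 = 87 bits.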
Below you will find an implementation of the Huffman compression algorithm in C++ and Java.
#include <iostream>
#include <string>
#include <queue>
#include <vector>
#include <unordered_map>
using namespace std;

// A tree node
struct Node
{
    char ch;
    int freq;
    Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
    Node* node = new Node();

    node->ch = ch;
    node->freq = freq;
    node->left = left;
    node->right = right;

    return node;
}

// Comparison object used to order the heap
struct comp
{
    bool operator()(Node* l, Node* r)
    {
        // the highest-priority item has the lowest frequency
        return l->freq > r->freq;
    }
};

// Traverse the Huffman tree and store the Huffman codes in a map
void encode(Node* root, string str,
            unordered_map<char, string> &huffmanCode)
{
    if (root == nullptr)
        return;

    // found a leaf node
    if (!root->left && !root->right) {
        // a tree with a single leaf would yield an empty code, so use "1"
        huffmanCode[root->ch] = str.empty() ? "1" : str;
    }

    encode(root->left, str + "0", huffmanCode);
    encode(root->right, str + "1", huffmanCode);
}

// Traverse the Huffman tree and decode one character of the encoded string
void decode(Node* root, int &index, const string &str)
{
    if (root == nullptr) {
        return;
    }

    // found a leaf node
    if (!root->left && !root->right)
    {
        cout << root->ch;
        return;
    }

    index++;

    if (str[index] == '0')
        decode(root->left, index, str);
    else
        decode(root->right, index, str);
}

// Builds the Huffman tree, then encodes and decodes the given input text
void buildHuffmanTree(string text)
{
    // nothing to encode
    if (text.empty()) {
        return;
    }

    // count the frequency of appearance of each character
    // and store it in a map
    unordered_map<char, int> freq;
    for (char ch: text) {
        freq[ch]++;
    }

    // Create a priority queue to store the live nodes of
    // the Huffman tree
    priority_queue<Node*, vector<Node*>, comp> pq;

    // Create a leaf node for each character and add it
    // to the priority queue
    for (auto pair: freq) {
        pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
    }

    // repeat while there is more than one node in the queue
    while (pq.size() > 1)
    {
        // Remove the two nodes of highest priority
        // (lowest frequency) from the queue
        Node *left = pq.top(); pq.pop();
        Node *right = pq.top(); pq.pop();

        // Create a new internal node with these two nodes
        // as children and with frequency equal to the sum
        // of the two nodes' frequencies. Add the new node
        // to the priority queue.
        int sum = left->freq + right->freq;
        pq.push(getNode('\0', sum, left, right));
    }

    // root stores a pointer to the root of the Huffman tree
    Node* root = pq.top();

    // Traverse the Huffman tree, store the Huffman codes
    // in a map, and print them
    unordered_map<char, string> huffmanCode;
    encode(root, "", huffmanCode);

    cout << "Huffman Codes are :\n" << '\n';
    for (auto pair: huffmanCode) {
        cout << pair.first << " " << pair.second << '\n';
    }

    cout << "\nOriginal string was :\n" << text << '\n';

    // print the encoded string
    string str = "";
    for (char ch: text) {
        str += huffmanCode[ch];
    }

    cout << "\nEncoded string is :\n" << str << '\n';

    // Traverse the Huffman tree again and this time
    // decode the encoded string
    cout << "\nDecoded string is: \n";

    if (!root->left && !root->right)
    {
        // special case: input with one distinct character ("a", "aa", ...)
        for (int i = 0; i < root->freq; i++) {
            cout << root->ch;
        }
    }
    else
    {
        // keep decoding until the last bit has been consumed
        int index = -1;
        while (index < (int)str.size() - 1) {
            decode(root, index, str);
        }
    }
    cout << '\n';
}

// Huffman coding algorithm
int main()
{
    string text = "Huffman coding is a data compression algorithm.";

    buildHuffmanTree(text);

    return 0;
}

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A tree node
class Node
{
    char ch;
    int freq;
    Node left = null, right = null;

    Node(char ch, int freq)
    {
        this.ch = ch;
        this.freq = freq;
    }

    public Node(char ch, int freq, Node left, Node right)
    {
        this.ch = ch;
        this.freq = freq;
        this.left = left;
        this.right = right;
    }
}

class Huffman
{
    // Traverse the Huffman tree and store the Huffman codes in a map
    public static void encode(Node root, String str,
                              Map<Character, String> huffmanCode)
    {
        if (root == null)
            return;

        // found a leaf node
        if (root.left == null && root.right == null) {
            // a tree with a single leaf would yield an empty code, so use "1"
            huffmanCode.put(root.ch, str.isEmpty() ? "1" : str);
        }

        encode(root.left, str + "0", huffmanCode);
        encode(root.right, str + "1", huffmanCode);
    }

    // Traverse the Huffman tree and decode one character of the encoded string
    public static int decode(Node root, int index, StringBuilder sb)
    {
        if (root == null)
            return index;

        // found a leaf node
        if (root.left == null && root.right == null)
        {
            System.out.print(root.ch);
            return index;
        }

        index++;

        if (sb.charAt(index) == '0')
            index = decode(root.left, index, sb);
        else
            index = decode(root.right, index, sb);

        return index;
    }

    // Builds the Huffman tree, then encodes and decodes the given input text
    public static void buildHuffmanTree(String text)
    {
        // nothing to encode
        if (text == null || text.isEmpty()) {
            return;
        }

        // count the frequency of appearance of each character
        // and store it in a map
        Map<Character, Integer> freq = new HashMap<>();
        for (int i = 0; i < text.length(); i++) {
            freq.merge(text.charAt(i), 1, Integer::sum);
        }

        // Create a priority queue to store the live nodes of the Huffman tree.
        // Notice that the highest-priority item has the lowest frequency.
        PriorityQueue<Node> pq = new PriorityQueue<>(
                (l, r) -> Integer.compare(l.freq, r.freq));

        // Create a leaf node for each character and add it
        // to the priority queue
        for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
            pq.add(new Node(entry.getKey(), entry.getValue()));
        }

        // repeat while there is more than one node in the queue
        while (pq.size() > 1)
        {
            // Remove the two nodes of highest priority
            // (lowest frequency) from the queue
            Node left = pq.poll();
            Node right = pq.poll();

            // Create a new internal node with these two nodes as children
            // and with frequency equal to the sum of the two nodes'
            // frequencies. Add the new node to the priority queue.
            int sum = left.freq + right.freq;
            pq.add(new Node('\0', sum, left, right));
        }

        // root stores a reference to the root of the Huffman tree
        Node root = pq.peek();

        // traverse the Huffman tree and store the Huffman codes in a map
        Map<Character, String> huffmanCode = new HashMap<>();
        encode(root, "", huffmanCode);

        // print the Huffman codes
        System.out.println("Huffman Codes are :\n");
        for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }

        System.out.println("\nOriginal string was :\n" + text);

        // print the encoded string
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append(huffmanCode.get(text.charAt(i)));
        }

        System.out.println("\nEncoded string is :\n" + sb);

        // traverse the Huffman tree again and this time
        // decode the encoded string
        System.out.println("\nDecoded string is: \n");

        if (root.left == null && root.right == null)
        {
            // special case: input with one distinct character ("a", "aa", ...)
            for (int i = 0; i < root.freq; i++) {
                System.out.print(root.ch);
            }
        }
        else
        {
            // keep decoding until the last bit has been consumed
            int index = -1;
            while (index < sb.length() - 1) {
                index = decode(root, index, sb);
            }
        }
        System.out.println();
    }

    public static void main(String[] args)
    {
        String text = "Huffman coding is a data compression algorithm.";

        buildHuffmanTree(text);
    }
}

Note: the input string takes 47 * 8 = 376 bits of memory, while the encoded string takes only 194 bits, so the data is compressed by about 48%. In the C++ program above, we use the string class to store the encoded string in order to keep the program readable.
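The percentage in the note follows from a one-line calculation. The helper below is a hypothetical illustration of that arithmetic and assumes the original size is greater than zero.

```cpp
#include <cstddef>

// Percentage of space saved by the encoding.
// Assumes originalBits > 0. (Hypothetical helper, used only for illustration.)
double spaceSavedPercent(std::size_t originalBits, std::size_t encodedBits)
{
    return 100.0 * (1.0 - static_cast<double>(encodedBits) / originalBits);
}
```

For 376 original bits and 194 encoded bits this gives a saving of a little over 48%.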
Since efficient priority queue data structures require O(log(N)) time per insertion, and a full binary tree with N leaves has 2N-1 nodes, and a Huffman tree is a full binary tree (every internal node has exactly two children), the algorithm runs in O(Nlog(N)) time, where N is the number of distinct characters.
Source: www.habr.com