Huffman Compression Algorithm
Huffman coding is a data compression algorithm that captures the basic idea behind file compression. In this article we will talk about fixed-length and variable-length encoding, uniquely decodable codes, prefix rules, and the construction of a Huffman tree.
We know that every character is stored as a sequence of 0s and 1s and takes 8 bits. This is called fixed-length encoding, because every character uses the same fixed number of bits for storage.
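To make the 8-bits-per-character cost concrete, here is a minimal sketch that writes out the fixed-length encoding of a string. The function name `fixedLengthBits` is made up for this illustration, and the sketch assumes single-byte (ASCII) characters.

```cpp
#include <bitset>
#include <string>

// Fixed-length encoding: every character becomes exactly 8 bits.
// Assumption: the text consists of single-byte (ASCII) characters.
std::string fixedLengthBits(const std::string& text) {
    std::string bits;
    for (unsigned char ch : text) {
        // std::bitset<8> renders the byte value as 8 characters, most significant bit first
        bits += std::bitset<8>(ch).to_string();
    }
    return bits;
}
```

For an 8-character string such as "aabacdab" the result is 8 * 8 = 64 bits long, regardless of which characters the string contains.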
Suppose we are given a text. How can we reduce the amount of space needed to store a single character?
The main idea is variable-length encoding. We can use the fact that some characters in a text occur more often than others to represent the same text with fewer bits.
How, then, can a sequence of bits be decoded unambiguously?
Consider the string "aabacdab". It has 8 characters, so with fixed-length encoding it takes 64 bits to store. Note that the frequencies of the characters "a", "b", "c" and "d" are 4, 2, 1 and 1 respectively. Let's try to represent "aabacdab" with fewer bits, using the fact that "a" occurs more often than "b", and "b" occurs more often than "c" and "d". We start by encoding "a" with a single bit equal to 0, give "b" the two-bit code 11, and use the three-bit codes 100 and 011 for "c" and "d".
As a result we get:
a    0
b    11
c    100
d    011
Thus, with the codes above, the string "aabacdab" is encoded as 00110100011011 (0|0|11|0|100|011|0|11). The main problem, however, lies in decoding. When we try to decode the string 00110100011011, the result is ambiguous, because it can be read as:
0|011|0|100|011|0|11 adacdab
0|0|11|0|100|0|11|011 aabacabd
0|011|0|100|0|11|0|11 adacabab
...
and so on.
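The ambiguity can be checked mechanically. The sketch below encodes a string with a given code table and then enumerates every way the resulting bits can be split into code words. The names `encodeWith`, `collectDecodings` and `allDecodings` are made up for this illustration; this is a brute-force check, not part of the Huffman algorithm itself.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Encode text by concatenating the code word of each character.
std::string encodeWith(const std::map<char, std::string>& codes, const std::string& text) {
    std::string bits;
    for (char ch : text) bits += codes.at(ch);
    return bits;
}

// Recursively try every code word that matches at position pos.
void collectDecodings(const std::map<char, std::string>& codes, const std::string& bits,
                      std::size_t pos, std::string& current, std::vector<std::string>& out) {
    if (pos == bits.size()) {          // all bits consumed: one complete decoding
        out.push_back(current);
        return;
    }
    for (const auto& entry : codes) {
        const std::string& code = entry.second;
        if (bits.compare(pos, code.size(), code) == 0) {
            current.push_back(entry.first);
            collectDecodings(codes, bits, pos + code.size(), current, out);
            current.pop_back();        // backtrack and try the next code word
        }
    }
}

// Return every string that encodes to exactly these bits.
std::vector<std::string> allDecodings(const std::map<char, std::string>& codes,
                                      const std::string& bits) {
    std::vector<std::string> out;
    std::string current;
    collectDecodings(codes, bits, 0, current, out);
    return out;
}
```

With the codes a = 0, b = 11, c = 100, d = 011, the bits 00110100011011 decode in eight different ways, among them "aabacdab", "adacdab", "aabacabd" and "adacabab".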
To avoid this ambiguity, we must make sure that our encoding satisfies the prefix rule, which in turn implies that the codes can be decoded in only one way. The prefix rule guarantees that no code is a prefix of another code. By a code we mean the bits used to represent a particular character. In the example above, 0 is a prefix of 011, which violates the prefix rule. So if our codes satisfy the prefix rule, the bit sequence can be decoded unambiguously.
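The prefix rule is easy to test for a small table. A minimal sketch, with the hypothetical helper name `isPrefixFree`, compares every pair of code words:

```cpp
#include <map>
#include <string>

// Returns true if no code word is a prefix of a different character's code word.
bool isPrefixFree(const std::map<char, std::string>& codes) {
    for (const auto& x : codes) {
        for (const auto& y : codes) {
            if (x.first == y.first) continue;  // do not compare a code with itself
            // y's code starts with x's code, so x's code is a prefix of it
            if (y.second.compare(0, x.second.size(), x.second) == 0) return false;
        }
    }
    return true;
}
```

The table a = 0, b = 11, c = 100, d = 011 fails this check because 0 is a prefix of 011, while a table such as a = 0, b = 10, c = 110, d = 111 passes it.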
Let's revisit the example above. This time we will assign the characters "a", "b", "c" and "d" codes that satisfy the prefix rule.
a    0
b    10
c    110
d    111
With this encoding, the string "aabacdab" is encoded as 00100110111010 (0|0|10|0|110|111|0|10). Now 00100110111010 can be decoded unambiguously, giving back our original string "aabacdab".
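Because no code word is a prefix of another, decoding can proceed greedily from left to right. The following sketch shows the round trip; `encodeText` and `decodeText` are illustrative names, and the decoder is only correct for tables that satisfy the prefix rule.

```cpp
#include <map>
#include <string>

// Concatenate the code word of every character.
std::string encodeText(const std::map<char, std::string>& codes, const std::string& text) {
    std::string bits;
    for (char ch : text) bits += codes.at(ch);
    return bits;
}

// Greedy decoding: as soon as the collected bits form a code word, emit its character.
// Assumption: the code table satisfies the prefix rule.
std::string decodeText(const std::map<char, std::string>& codes, const std::string& bits) {
    // Build the reverse lookup: code word -> character
    std::map<std::string, char> reverse;
    for (const auto& entry : codes) reverse[entry.second] = entry.first;

    std::string text, current;
    for (char bit : bits) {
        current.push_back(bit);
        auto it = reverse.find(current);
        if (it != reverse.end()) {
            text.push_back(it->second);
            current.clear();
        }
    }
    return text;
}
```

With a = 0, b = 10, c = 110, d = 111, encoding "aabacdab" gives 00100110111010 (14 bits instead of 64), and decoding those bits returns "aabacdab".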
Huffman Coding
Now that we have dealt with variable-length encoding and the prefix rule, let's talk about Huffman coding.
The method is based on building a binary tree. A node in this tree can be either a leaf or an internal node. Initially all nodes are leaves (terminal nodes), each representing a character and its weight (that is, its frequency of occurrence). Internal nodes hold a weight and refer to two child nodes. By general convention, bit "0" means following the left branch and bit "1" means following the right branch. A complete tree has N leaves and N-1 internal nodes. When building a Huffman tree, it is recommended to discard unused characters in order to obtain codes of optimal length.
We will use a priority queue to build the Huffman tree, where the node with the lowest frequency gets the highest priority. The construction steps are as follows:
- Create a leaf node for each character and add it to the priority queue.
- While there is more than one node in the queue, do the following:
  - Remove the two nodes with the highest priority (lowest frequency) from the queue.
  - Create a new internal node with these two nodes as children and a frequency equal to the sum of their frequencies.
  - Add the new node to the priority queue.
- The single remaining node is the root, and this completes the construction of the tree.
Suppose we have a text that consists only of the characters "a", "b", "c", "d" and "e", with frequencies 15, 7, 6, 6 and 5 respectively. The illustrations below show the steps of the algorithm.
The path from the root to any leaf node stores the optimal prefix code (also known as the Huffman code) corresponding to the character associated with that leaf node.
[Figure: Huffman tree]
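The merge order for these frequencies can also be traced without building the tree. The sketch below keeps only the weights in a min-priority queue and records the weight of each internal node as it is created. The name `mergeTrace` is made up for this illustration.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Repeatedly merge the two smallest weights, as in the tree construction steps,
// and return the weight of every internal node in the order it is created.
std::vector<long long> mergeTrace(const std::vector<int>& freqs) {
    // std::greater turns the default max-heap into a min-heap
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>>
        pq(freqs.begin(), freqs.end());

    std::vector<long long> merged;
    while (pq.size() > 1) {
        long long first = pq.top(); pq.pop();
        long long second = pq.top(); pq.pop();
        merged.push_back(first + second);  // weight of the new internal node
        pq.push(first + second);
    }
    return merged;
}
```

For the frequencies 15, 7, 6, 6 and 5 the internal nodes get the weights 11, 13, 24 and 39. Their sum, 87, is the length of the encoded text in bits, because each character contributes its frequency once for every level of the tree above it. Fixed-length encoding of the same 39 characters would take 39 * 8 = 312 bits.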
Below you will find an implementation of the Huffman compression algorithm in C++ and Java.
// C++ implementation
#include <iostream>
#include <string>
#include <vector>
#include <queue>
#include <unordered_map>
using namespace std;

// A Tree node
struct Node
{
    char ch;
    int freq;
    Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
    Node* node = new Node();
    node->ch = ch;
    node->freq = freq;
    node->left = left;
    node->right = right;
    return node;
}

// Release every node of the tree
void freeTree(Node* root)
{
    if (root == nullptr)
        return;
    freeTree(root->left);
    freeTree(root->right);
    delete root;
}

// Comparison object to be used to order the heap
struct comp
{
    bool operator()(const Node* l, const Node* r) const
    {
        // highest priority item has lowest frequency
        return l->freq > r->freq;
    }
};

// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, const string &str,
            unordered_map<char, string> &huffmanCode)
{
    if (root == nullptr)
        return;

    // found a leaf node
    if (!root->left && !root->right) {
        // a tree with a single leaf would give an empty code, so use "0"
        huffmanCode[root->ch] = str.empty() ? "0" : str;
        return;
    }

    encode(root->left, str + "0", huffmanCode);
    encode(root->right, str + "1", huffmanCode);
}

// traverse the Huffman Tree and decode one character of the encoded string;
// index is the position of the last bit consumed so far
void decode(Node* root, int &index, const string &str)
{
    if (root == nullptr) {
        return;
    }

    // found a leaf node
    if (!root->left && !root->right)
    {
        cout << root->ch;
        return;
    }

    index++;

    if (str[index] == '0')
        decode(root->left, index, str);
    else
        decode(root->right, index, str);
}

// Builds Huffman Tree and decode given input text
void buildHuffmanTree(const string &text)
{
    // nothing to encode
    if (text.empty()) {
        return;
    }

    // count frequency of appearance of each character
    // and store it in a map
    unordered_map<char, int> freq;
    for (char ch: text) {
        freq[ch]++;
    }

    // Create a priority queue to store live nodes of
    // Huffman tree;
    priority_queue<Node*, vector<Node*>, comp> pq;

    // Create a leaf node for each character and add it
    // to the priority queue.
    for (auto pair: freq) {
        pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
    }

    // do till there is more than one node in the queue
    while (pq.size() != 1)
    {
        // Remove the two nodes of highest priority
        // (lowest frequency) from the queue
        Node *left = pq.top(); pq.pop();
        Node *right = pq.top(); pq.pop();

        // Create a new internal node with these two nodes
        // as children and with frequency equal to the sum
        // of the two nodes' frequencies. Add the new node
        // to the priority queue.
        int sum = left->freq + right->freq;
        pq.push(getNode('\0', sum, left, right));
    }

    // root stores pointer to root of Huffman Tree
    Node* root = pq.top();

    // traverse the Huffman Tree and store Huffman Codes
    // in a map. Also prints them
    unordered_map<char, string> huffmanCode;
    encode(root, "", huffmanCode);

    cout << "Huffman Codes are :\n" << '\n';
    for (auto pair: huffmanCode) {
        cout << pair.first << " " << pair.second << '\n';
    }

    cout << "\nOriginal string was :\n" << text << '\n';

    // print encoded string
    string str = "";
    for (char ch: text) {
        str += huffmanCode[ch];
    }

    cout << "\nEncoded string is :\n" << str << '\n';

    // traverse the Huffman Tree again and this time
    // decode the encoded string
    cout << "\nDecoded string is: \n";
    if (!root->left && !root->right) {
        // only one distinct character: every bit stands for one occurrence
        for (size_t i = 0; i < str.size(); i++) {
            cout << root->ch;
        }
    } else {
        // stop once the last bit (index str.size() - 1) has been consumed;
        // stopping at size() - 2 would drop a final one-bit code
        int index = -1;
        while (index < (int)str.size() - 1) {
            decode(root, index, str);
        }
    }
    cout << '\n';

    freeTree(root);
}

// Huffman coding algorithm
int main()
{
    string text = "Huffman coding is a data compression algorithm.";
    buildHuffmanTree(text);
    return 0;
}

// Java implementation
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Tree node
class Node
{
    char ch;
    int freq;
    Node left = null, right = null;

    Node(char ch, int freq)
    {
        this.ch = ch;
        this.freq = freq;
    }

    public Node(char ch, int freq, Node left, Node right) {
        this.ch = ch;
        this.freq = freq;
        this.left = left;
        this.right = right;
    }
}

class Huffman
{
    // traverse the Huffman Tree and store Huffman Codes
    // in a map.
    public static void encode(Node root, String str,
                              Map<Character, String> huffmanCode)
    {
        if (root == null)
            return;

        // found a leaf node
        if (root.left == null && root.right == null) {
            // a tree with a single leaf would give an empty code, so use "0"
            huffmanCode.put(root.ch, str.isEmpty() ? "0" : str);
            return;
        }

        encode(root.left, str + "0", huffmanCode);
        encode(root.right, str + "1", huffmanCode);
    }

    // traverse the Huffman Tree and decode one character of the encoded
    // string; index is the position of the last bit consumed so far
    public static int decode(Node root, int index, StringBuilder sb)
    {
        if (root == null)
            return index;

        // found a leaf node
        if (root.left == null && root.right == null)
        {
            System.out.print(root.ch);
            return index;
        }

        index++;

        if (sb.charAt(index) == '0')
            index = decode(root.left, index, sb);
        else
            index = decode(root.right, index, sb);

        return index;
    }

    // Builds Huffman Tree and huffmanCode and decode given input text
    public static void buildHuffmanTree(String text)
    {
        // nothing to encode
        if (text.isEmpty()) {
            return;
        }

        // count frequency of appearance of each character
        // and store it in a map
        Map<Character, Integer> freq = new HashMap<>();
        for (int i = 0; i < text.length(); i++) {
            freq.merge(text.charAt(i), 1, Integer::sum);
        }

        // Create a priority queue to store live nodes of Huffman tree
        // Notice that highest priority item has lowest frequency
        PriorityQueue<Node> pq = new PriorityQueue<>(
                (l, r) -> Integer.compare(l.freq, r.freq));

        // Create a leaf node for each character and add it
        // to the priority queue.
        for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
            pq.add(new Node(entry.getKey(), entry.getValue()));
        }

        // do till there is more than one node in the queue
        while (pq.size() != 1)
        {
            // Remove the two nodes of highest priority
            // (lowest frequency) from the queue
            Node left = pq.poll();
            Node right = pq.poll();

            // Create a new internal node with these two nodes as children
            // and with frequency equal to the sum of the two nodes
            // frequencies. Add the new node to the priority queue.
            int sum = left.freq + right.freq;
            pq.add(new Node('\0', sum, left, right));
        }

        // root stores pointer to root of Huffman Tree
        Node root = pq.peek();

        // traverse the Huffman tree and store the Huffman codes in a map
        Map<Character, String> huffmanCode = new HashMap<>();
        encode(root, "", huffmanCode);

        // print the Huffman codes
        System.out.println("Huffman Codes are :\n");
        for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }

        System.out.println("\nOriginal string was :\n" + text);

        // print encoded string
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append(huffmanCode.get(text.charAt(i)));
        }

        System.out.println("\nEncoded string is :\n" + sb);

        // traverse the Huffman Tree again and this time
        // decode the encoded string
        System.out.println("\nDecoded string is: ");
        if (root.left == null && root.right == null) {
            // only one distinct character: every bit stands for one occurrence
            for (int i = 0; i < sb.length(); i++) {
                System.out.print(root.ch);
            }
        } else {
            // stop once the last bit (index sb.length() - 1) has been consumed;
            // stopping at length() - 2 would drop a final one-bit code
            int index = -1;
            while (index < sb.length() - 1) {
                index = decode(root, index, sb);
            }
        }
        System.out.println();
    }

    public static void main(String[] args)
    {
        String text = "Huffman coding is a data compression algorithm.";
        buildHuffmanTree(text);
    }
}
Note: the input string takes 47 * 8 = 376 bits, while the encoded string takes only 194 bits, i.e. the data is compressed by about 48%. In the C++ program above we use the string class to store the encoded string in order to keep the program readable; a real compressor would pack the '0' and '1' characters into actual bits.
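The percentage follows directly from the two bit counts. A small sketch of the arithmetic, with the hypothetical helper name `savedFraction`:

```cpp
#include <cstddef>

// Fraction of the original size that was saved by encoding.
// Assumption: originalBits is greater than zero.
double savedFraction(std::size_t originalBits, std::size_t encodedBits) {
    return 1.0 - static_cast<double>(encodedBits) / static_cast<double>(originalBits);
}
```

For 376 original bits and 194 encoded bits this gives 1 - 194/376, roughly 0.484, which is the "about 48%" quoted above. The figure does not include the space needed to store the code table or tree alongside the encoded data.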
Since efficient priority queue data structures require O(log(N)) time per insertion, and a full binary tree with N leaves has 2N-1 nodes, and the Huffman tree is a full binary tree, the algorithm runs in O(N log(N)) time, where N is the number of distinct characters.
Source: www.habr.com