Pele thupelo e qala
Huffman coding ke data compression algorithm e etsang mohopolo oa mantlha oa compression ea faele. Sehloohong sena, re tla bua ka khouto e tsitsitseng le e fapaneng ea bolelele ba likhoutu, likhoutu tse ikhethileng ka mokhoa o ikhethileng, melao ea sehlomathiso, le ho aha sefate sa Huffman.
Rea tseba hore tlhaku e 'ngoe le e 'ngoe e bolokiloe e le tatellano ea 0's le 1' me e nka li-bits tse 8. Sena se bitsoa "fixed length encoding" hobane sebapali se seng le se seng se sebelisa palo e tsitsitseng ea li-bits ho boloka.
Ha re re re fuoa mongolo. Re ka fokotsa sebaka se hlokahalang joang ho boloka sebapali se le seng?
Taba ea mantlha ke khouto ea bolelele bo fapaneng. Re ka sebelisa taba ea hore litlhaku tse ling li hlaha hangata ho feta tse ling (
Joang, ka ho tseba tatellano ea likotoana, u e tseba joang ka mokhoa o sa tsitsang?
Nahana ka moeli "batlabe". E na le litlhaku tse 8, 'me ha e khouta bolelele bo tsitsitseng, e tla hloka li-bits tse 64 ho e boloka. Hlokomela hore maqhubu a letšoao "a", "b", "c" и "D" e lekana le 4, 2, 1, 1 ka ho latellana. Ha re leke ho nahana "batlabe" likotoana tse fokolang, ho sebelisa taba ea hore "ho" e etsahala hangata hofeta "B"le "B" e etsahala hangata hofeta "c" и "D". Ha re qale ka ho khouta "ho" ka karoloana e le 'ngoe e lekanang le 0, "B" re tla abela khoutu ea 11-bit, 'me re sebelisa likotoana tse tharo 100 le 011 re tla kenyelletsa "c" и "D".
Ka lebaka leo, re tla fumana:
a
0
b
11
c
100
d
011
Kahoo mola "batlabe" re tla encode joalo ka 00110100011011 (0|0|11|0|100|011|0|11)ka ho sebedisa dikhoutu tse ka hodimo. Leha ho le joalo, bothata bo ka sehloohong e tla ba ho decoding. Ha re leka ho hlakisa khoele 00110100011011, re fumana sephetho se sa utloahaleng, kaha se ka emeloa joalo ka:
0|011|0|100|011|0|11 adacdab
0|0|11|0|100|0|11|011 aabacabd
0|011|0|100|0|11|0|11 adacabab
...
joalo-joalo.
Ho qoba ho se hlaka hona, re tlameha ho etsa bonnete ba hore khouto ea rona e khotsofatsa mohopolo o kang molao oa sehlongoapele, e leng se bolelang hore likhoutu li ka hlalosoa feela ka tsela e le 'ngoe e ikhethang. Molao wa sehlongwapele o netefatsa hore ha ho khoutu eo e leng sehlongwapele sa e nngwe. Ka khoutu, re bolela likotoana tse sebelisoang ho emela sebopeho se itseng. Mohlala o ka holimo 0 ke sehlongoapele 011, e tlolang molao wa sehlongwapele. Kahoo, haeba likhoutu tsa rona li khotsofatsa molao oa sehlongoapele, re ka khetha ka mokhoa o ikhethileng (le ka tsela e fapaneng).
Ha re hlahlobeng mohlala o ka holimo hape. Lekhetlong lena re tla abela matšoao "a", "b", "c" и "D" dikhoutu tse kgotsofatsang molao wa sehlongwapele.
a
0
b
10
c
110
d
111
Ka encoding ena, khoele "batlabe" e tla encoded joalo ka 00100100011010 (0|0|10|0|100|011|0|10). 'Me mona 00100100011010 re tla be re se re khona ho khetholla ka mokhoa o hlakileng le ho khutlela khoeleng ea rona ea pele "batlabe".
Huffman coding
Kaha joale re se re sebetsane le khouto ea bolelele bo fapaneng le molao oa sehlomapele, ha re bueng ka khouto ea Huffman.
Mokhoa o thehiloe ho thehoeng ha lifate tsa binary. Ho eona, node e ka ba ea ho qetela kapa ea ka hare. Qalong, li-node tsohle li nkoa e le makhasi (li-terminals), tse emelang letšoao ka boeona le boima ba lona (ke hore, khafetsa ea ketsahalo). Li-node tse ka hare li na le boima ba sebopeho 'me li bua ka li-node tse peli tsa litloholo. Ka tumellano e akaretsang, bit "0" e emela ho latela lekala le letšehali, le "1" - ka ho le letona. sefateng se tletseng N mahlaku le N-1 nodes ka hare. Ho khothaletsoa hore ha ho etsoa sefate sa Huffman, matšoao a sa sebelisoeng a lahloe ho fumana likhoutu tsa bolelele bo nepahetseng.
Re tla sebelisa mokoloko oa pele ho aha sefate sa Huffman, moo node e nang le maqhubu a tlaase ka ho fetisisa e tla fuoa pele. Mehato ea kaho e hlalositsoe ka tlase:
- Theha node ea lekhasi bakeng sa sebapali se seng le se seng 'me u se kenye moleng oa bohlokoa.
- Le hoja ho na le maqephe a fetang bonngoe moleng, etsa se latelang:
- Tlosa li-node tse peli ka pele-pele (maqhubu a tlaase) ho tloha moleng;
- Theha node e ncha ea ka hare, moo li-node tsena tse peli e tla ba bana, 'me khafetsa ea ketsahalo e tla lekana le kakaretso ea maqhubu a li-node tsena tse peli.
- Kenya node e ncha moleng oa bohlokoa.
- Node e setseng e tla ba motso, 'me sena se tla phetha kaho ea sefate.
Nahana hore re na le mongolo o nang le litlhaku feela "a", "b", "c", "d" и "le", mme maqhubu a tsona a hlahang ke 15, 7, 6, 6, le 5, ka ho latellana. Ka tlase ke litšoantšo tse bontšang mehato ea algorithm.
Tsela e tsoang motsong ho ea sebakeng leha e le sefe sa ho qetela e tla boloka khoutu e nepahetseng ea pele (eo hape e tsejoang e le khoutu ea Huffman) e tsamaellanang le sebopeho se amanang le node eo ea ho qetela.
Sefate sa Huffman
Ka tlase u tla fumana ts'ebetsong ea Huffman compression algorithm ho C++ le Java:
#include <iostream>
#include <string>
#include <queue>
#include <unordered_map>
using namespace std;
// A Tree node
struct Node
{
char ch;
int freq;
Node *left, *right;
};
// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
Node* node = new Node();
node->ch = ch;
node->freq = freq;
node->left = left;
node->right = right;
return node;
}
// Comparison object to be used to order the heap
struct comp
{
bool operator()(Node* l, Node* r)
{
// highest priority item has lowest frequency
return l->freq > r->freq;
}
};
// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, string str,
unordered_map<char, string> &huffmanCode)
{
if (root == nullptr)
return;
// found a leaf node
if (!root->left && !root->right) {
huffmanCode[root->ch] = str;
}
encode(root->left, str + "0", huffmanCode);
encode(root->right, str + "1", huffmanCode);
}
// traverse the Huffman Tree and decode the encoded string
void decode(Node* root, int &index, string str)
{
if (root == nullptr) {
return;
}
// found a leaf node
if (!root->left && !root->right)
{
cout << root->ch;
return;
}
index++;
if (str[index] =='0')
decode(root->left, index, str);
else
decode(root->right, index, str);
}
// Builds Huffman Tree and decode given input text
void buildHuffmanTree(string text)
{
// count frequency of appearance of each character
// and store it in a map
unordered_map<char, int> freq;
for (char ch: text) {
freq[ch]++;
}
// Create a priority queue to store live nodes of
// Huffman tree;
priority_queue<Node*, vector<Node*>, comp> pq;
// Create a leaf node for each character and add it
// to the priority queue.
for (auto pair: freq) {
pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
}
// do till there is more than one node in the queue
while (pq.size() != 1)
{
// Remove the two nodes of highest priority
// (lowest frequency) from the queue
Node *left = pq.top(); pq.pop();
Node *right = pq.top(); pq.pop();
// Create a new internal node with these two nodes
// as children and with frequency equal to the sum
// of the two nodes' frequencies. Add the new node
// to the priority queue.
int sum = left->freq + right->freq;
pq.push(getNode('', sum, left, right));
}
// root stores pointer to root of Huffman Tree
Node* root = pq.top();
// traverse the Huffman Tree and store Huffman Codes
// in a map. Also prints them
unordered_map<char, string> huffmanCode;
encode(root, "", huffmanCode);
cout << "Huffman Codes are :n" << 'n';
for (auto pair: huffmanCode) {
cout << pair.first << " " << pair.second << 'n';
}
cout << "nOriginal string was :n" << text << 'n';
// print encoded string
string str = "";
for (char ch: text) {
str += huffmanCode[ch];
}
cout << "nEncoded string is :n" << str << 'n';
// traverse the Huffman Tree again and this time
// decode the encoded string
int index = -1;
cout << "nDecoded string is: n";
while (index < (int)str.size() - 2) {
decode(root, index, str);
}
}
// Huffman coding algorithm
int main()
{
string text = "Huffman coding is a data compression algorithm.";
buildHuffmanTree(text);
return 0;
}
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
// A Tree node
class Node
{
char ch;
int freq;
Node left = null, right = null;
Node(char ch, int freq)
{
this.ch = ch;
this.freq = freq;
}
public Node(char ch, int freq, Node left, Node right) {
this.ch = ch;
this.freq = freq;
this.left = left;
this.right = right;
}
};
class Huffman
{
// traverse the Huffman Tree and store Huffman Codes
// in a map.
public static void encode(Node root, String str,
Map<Character, String> huffmanCode)
{
if (root == null)
return;
// found a leaf node
if (root.left == null && root.right == null) {
huffmanCode.put(root.ch, str);
}
encode(root.left, str + "0", huffmanCode);
encode(root.right, str + "1", huffmanCode);
}
// traverse the Huffman Tree and decode the encoded string
public static int decode(Node root, int index, StringBuilder sb)
{
if (root == null)
return index;
// found a leaf node
if (root.left == null && root.right == null)
{
System.out.print(root.ch);
return index;
}
index++;
if (sb.charAt(index) == '0')
index = decode(root.left, index, sb);
else
index = decode(root.right, index, sb);
return index;
}
// Builds Huffman Tree and huffmanCode and decode given input text
public static void buildHuffmanTree(String text)
{
// count frequency of appearance of each character
// and store it in a map
Map<Character, Integer> freq = new HashMap<>();
for (int i = 0 ; i < text.length(); i++) {
if (!freq.containsKey(text.charAt(i))) {
freq.put(text.charAt(i), 0);
}
freq.put(text.charAt(i), freq.get(text.charAt(i)) + 1);
}
// Create a priority queue to store live nodes of Huffman tree
// Notice that highest priority item has lowest frequency
PriorityQueue<Node> pq = new PriorityQueue<>(
(l, r) -> l.freq - r.freq);
// Create a leaf node for each character and add it
// to the priority queue.
for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
pq.add(new Node(entry.getKey(), entry.getValue()));
}
// do till there is more than one node in the queue
while (pq.size() != 1)
{
// Remove the two nodes of highest priority
// (lowest frequency) from the queue
Node left = pq.poll();
Node right = pq.poll();
// Create a new internal node with these two nodes as children
// and with frequency equal to the sum of the two nodes
// frequencies. Add the new node to the priority queue.
int sum = left.freq + right.freq;
pq.add(new Node('', sum, left, right));
}
// root stores pointer to root of Huffman Tree
Node root = pq.peek();
// traverse the Huffman tree and store the Huffman codes in a map
Map<Character, String> huffmanCode = new HashMap<>();
encode(root, "", huffmanCode);
// print the Huffman codes
System.out.println("Huffman Codes are :n");
for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
System.out.println(entry.getKey() + " " + entry.getValue());
}
System.out.println("nOriginal string was :n" + text);
// print encoded string
StringBuilder sb = new StringBuilder();
for (int i = 0 ; i < text.length(); i++) {
sb.append(huffmanCode.get(text.charAt(i)));
}
System.out.println("nEncoded string is :n" + sb);
// traverse the Huffman Tree again and this time
// decode the encoded string
int index = -1;
System.out.println("nDecoded string is: n");
while (index < sb.length() - 2) {
index = decode(root, index, sb);
}
}
public static void main(String[] args)
{
String text = "Huffman coding is a data compression algorithm.";
buildHuffmanTree(text);
}
}
Ela hloko: mohopolo o sebelisoang ke khoele ea ho kenya ke 47 * 8 = 376 bits le khoele e kentsoeng ke likotoana tsa 194 feela ke hore. data e petelitsoe ke hoo e ka bang 48%. Lenaneong la C ++ le ka holimo, re sebelisa sehlopha sa likhoele ho boloka khoele e kentsoeng ho etsa hore lenaneo le balehe.
Hobane libopeho tse sebetsang hantle tsa lethathamo la data li hloka ho kengoa ka 'ngoe O(log(N)) nako, empa ka ho feletseng binary sefate le N makhasi a teng 2N-1 nodes, 'me sefate sa Huffman ke sefate sa binary se feletseng, joale algorithm e kena O(Nlog(N)) nako, kae N - Litlhaku.
Lisebelisoa:
Source: www.habr.com