Huffman compression algorithm

I mua i te tiimata o te akoranga "Taurangi mo nga Kaihanga" kua whakaritea mo koe he whakamaoritanga o tetahi atu rauemi whai hua.

Ko te Huffman coding he algorithm taapiri raraunga e whakatakoto ana i te whakaaro taketake mo te whakakoi i nga konae. I roto i tenei tuhinga, ka korero tatou mo te whakawaehere roa me te rereke rereke, nga waehere ka taea te wetewete, nga ture prefix, me te hanga rakau Huffman.

E mohio ana tatou kei te rongoa ia ahuatanga hei raupapa o te 0 me te 1 ka tangohia e 8 nga moka. Ka kiia tenei ko te whakawaehere roa pumau na te mea he rite tonu te maha o nga moka ka whakamahia e ia ahuatanga hei penapena.

Me kii ka tukuna he tuhinga. Me pehea e taea ai e tatou te whakaiti i te nui o te waahi e hiahiatia ana hei penapena i te ahua kotahi?

Ko te whakaaro matua ko te whakawaehere roa rereke. Ka taea e tatou te whakamahi i te meka ko etahi o nga tohu i roto i te tuhinga ka nui ake i era atu (kite konei) ki te whakawhanake i te hātepee hei tohu i te raupapa o nga tohu i roto i nga moka iti. I roto i te whakawaehere roa rereke, ka tautapahia e matou nga tohu he tau rerekee o nga moka, i runga i te maha o nga wa ka puta ki tetahi tuhinga. I te mutunga, ka 1 moka te tango a etahi, ko etahi ka 2 moka, 3 neke atu ranei. Ko te raruraru ki te whakawaehere roa taurangi ko te wetewaehere o muri o te raupapa.

Me pehea, ma te mohio ki te raupapa o nga paraka, te wetewete i te kore korero?

Whakaarohia te raina "abacdab". E 8 nga tohu, a, ka whakawaeheretia te roa, me 64 nga moka hei penapena. Kia mahara ko te auau tohu "a", "b", "c" и "D" he rite ki te 4, 2, 1, 1. Kia tamata tatou ki te whakaaro "abacdab" iti moka, te whakamahi i te meka e "ki" he maha ake nga wa "B"a "B" he maha ake nga wa "c" и "D". Me timata ma te whakawaehere "ki" kotahi moka rite ki te 0, "B" ka tautapa tatou i te waehere moka-rua 11, ka whakamahi i nga moka e toru 100 me te 011 ka whakawaeheretia "c" и "D".

Ko te mutunga, ka whiwhi tatou:

a
0

b
11

c
100

d
011

Na te raina "abacdab" ka whakawaeheretia tatou hei 00110100011011 (0|0|11|0|100|011|0|11)te whakamahi i nga waehere o runga. Heoi, ko te raru nui ko te wetewete. Ina ngana ana tatou ki te wetewete i te aho 00110100011011, ka whiwhi tatou i te hua rangirua, na te mea ka taea te tohu hei:

0|011|0|100|011|0|11    adacdab
0|0|11|0|100|0|11|011   aabacabd
0|011|0|100|0|11|0|11   adacabab 

...
me te pera.

Hei karo i tenei rangirua, me whakarite kia tutuki to taatau whakawaehere ki te kaupapa penei ture prefix, e tohu ana ka taea anake te wetewete i nga waehere ki tetahi huarahi ahurei. Ko te ture prefix ka whakarite kia kore he waehere he prefix o tetahi atu. Ma te waehere, ko te tikanga ko nga paraka i whakamahia hei tohu i tetahi ahuatanga. I te tauira i runga ake nei 0 he prefix 011, e takahi ana i te ture prefix. Na, ki te pai o tatou waehere ki te ture prefix, katahi ka taea e tatou te wetewete motuhake (me te rereke).

Kia titiro ano tatou ki te tauira i runga ake nei. I tenei wa ka tohua e matou mo nga tohu "a", "b", "c" и "D" waehere e tutuki ana i te ture prefix.

a
0

b
10

c
110

d
111

Ki tenei whakawaehere, te aho "abacdab" ka whakawaeheretia hei 00100100011010 (0|0|10|0|100|011|0|10). Engari 00100100011010 kua taea e tatou te wetewete me te hoki ki to tatou aho taketake "abacdab".

Huffman waehere

Inaianei kua korero tatou mo te whakawaehere roa rereke me te ture prefix, me korero tatou mo te whakawaehere Huffman.

Ko te tikanga ko te hanga i nga rakau rua. I roto, ka taea e te node te whakamutunga, o roto ranei. I te timatanga, ka kiia nga kohanga katoa he rau (nga pito), e tohu ana i te tohu ake me tona taumaha (ara, te auau o te puta). Kei roto i nga pona o roto te taumaha o te ahua me te tohu ki nga pona uri e rua. Na te whakaaetanga whanui, bit "0" e tohu ana i te whai i te peka maui, a "1" - kei te taha matau. i te rakau katoa N rau me N-1 pona o roto. E taunaki ana, i te wa e hanga ana i te rakau Huffman, ka makahia nga tohu kaore i whakamahia kia whiwhi waehere roa tino pai.

Ka whakamahia e matou he rarangi matua ki te hanga i tetahi rakau Huffman, ko te pona he iti te auau ka whakawhiwhia ki te kaupapa matua. Ko nga waahanga hanga e whakaahuahia ana i raro nei:

  1. Waihangatia he node rau mo ia ahua ka taapiri atu ki te rarangi matua.
  2. Ahakoa he nui ake i te kotahi rau kei roto i te rarangi, mahia enei e whai ake nei:
    • Tango i nga pona e rua me te kaupapa matua teitei (te auau iti) mai i te rarangi;
    • Waihangatia he node a-roto hou, hei tamariki enei pona e rua, a ka rite te auau o te puta ki te tapeke o nga iarere o enei pona e rua.
    • Tāpirihia he kōpuku hou ki te tūtira matua.
  3. Ko te kohanga e toe ana ko te putake, ka oti te hanga o te rakau.

Whakaarohia kei a maatau etahi tuhinga kei roto anake nga tohu "a", "b", "c", "d" и "me", a ko te 15, te 7, te 6, te 6, me te 5 o ratou auau puta. Kei raro nei nga whakaahua e whakaatu ana i nga hikoinga o te algorithm.

Huffman compression algorithm

Huffman compression algorithm

Huffman compression algorithm

Huffman compression algorithm

Huffman compression algorithm

Ko te ara mai i te putake ki tetahi pona mutunga ka penapena i te waehere prefix tino pai (e mohiotia ana ko te Waehere Huffman) e rite ana ki te ahua e hono ana ki taua pona mutunga.

Huffman compression algorithm
rakau Huffman

Kei raro ka kitea e koe te whakatinanatanga o te Huffman compression algorithm i C++ me Java:

#include <iostream>
#include <string>
#include <queue>
#include <unordered_map>
using namespace std;

// A Tree node
struct Node
{
	char ch;
	int freq;
	Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
	Node* node = new Node();

	node->ch = ch;
	node->freq = freq;
	node->left = left;
	node->right = right;

	return node;
}

// Comparison object to be used to order the heap
struct comp
{
	bool operator()(Node* l, Node* r)
	{
		// highest priority item has lowest frequency
		return l->freq > r->freq;
	}
};

// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, string str,
			unordered_map<char, string> &huffmanCode)
{
	if (root == nullptr)
		return;

	// found a leaf node
	if (!root->left && !root->right) {
		huffmanCode[root->ch] = str;
	}

	encode(root->left, str + "0", huffmanCode);
	encode(root->right, str + "1", huffmanCode);
}

// traverse the Huffman Tree and decode the encoded string
void decode(Node* root, int &index, string str)
{
	if (root == nullptr) {
		return;
	}

	// found a leaf node
	if (!root->left && !root->right)
	{
		cout << root->ch;
		return;
	}

	index++;

	if (str[index] =='0')
		decode(root->left, index, str);
	else
		decode(root->right, index, str);
}

// Builds Huffman Tree and decode given input text
void buildHuffmanTree(string text)
{
	// count frequency of appearance of each character
	// and store it in a map
	unordered_map<char, int> freq;
	for (char ch: text) {
		freq[ch]++;
	}

	// Create a priority queue to store live nodes of
	// Huffman tree;
	priority_queue<Node*, vector<Node*>, comp> pq;

	// Create a leaf node for each character and add it
	// to the priority queue.
	for (auto pair: freq) {
		pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
	}

	// do till there is more than one node in the queue
	while (pq.size() != 1)
	{
		// Remove the two nodes of highest priority
		// (lowest frequency) from the queue
		Node *left = pq.top(); pq.pop();
		Node *right = pq.top();	pq.pop();

		// Create a new internal node with these two nodes
		// as children and with frequency equal to the sum
		// of the two nodes' frequencies. Add the new node
		// to the priority queue.
		int sum = left->freq + right->freq;
		pq.push(getNode('', sum, left, right));
	}

	// root stores pointer to root of Huffman Tree
	Node* root = pq.top();

	// traverse the Huffman Tree and store Huffman Codes
	// in a map. Also prints them
	unordered_map<char, string> huffmanCode;
	encode(root, "", huffmanCode);

	cout << "Huffman Codes are :n" << 'n';
	for (auto pair: huffmanCode) {
		cout << pair.first << " " << pair.second << 'n';
	}

	cout << "nOriginal string was :n" << text << 'n';

	// print encoded string
	string str = "";
	for (char ch: text) {
		str += huffmanCode[ch];
	}

	cout << "nEncoded string is :n" << str << 'n';

	// traverse the Huffman Tree again and this time
	// decode the encoded string
	int index = -1;
	cout << "nDecoded string is: n";
	while (index < (int)str.size() - 2) {
		decode(root, index, str);
	}
}

// Huffman coding algorithm
int main()
{
	string text = "Huffman coding is a data compression algorithm.";

	buildHuffmanTree(text);

	return 0;
}

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Tree node
class Node
{
	char ch;
	int freq;
	Node left = null, right = null;

	Node(char ch, int freq)
	{
		this.ch = ch;
		this.freq = freq;
	}

	public Node(char ch, int freq, Node left, Node right) {
		this.ch = ch;
		this.freq = freq;
		this.left = left;
		this.right = right;
	}
};

class Huffman
{
	// traverse the Huffman Tree and store Huffman Codes
	// in a map.
	public static void encode(Node root, String str,
							  Map<Character, String> huffmanCode)
	{
		if (root == null)
			return;

		// found a leaf node
		if (root.left == null && root.right == null) {
			huffmanCode.put(root.ch, str);
		}


		encode(root.left, str + "0", huffmanCode);
		encode(root.right, str + "1", huffmanCode);
	}

	// traverse the Huffman Tree and decode the encoded string
	public static int decode(Node root, int index, StringBuilder sb)
	{
		if (root == null)
			return index;

		// found a leaf node
		if (root.left == null && root.right == null)
		{
			System.out.print(root.ch);
			return index;
		}

		index++;

		if (sb.charAt(index) == '0')
			index = decode(root.left, index, sb);
		else
			index = decode(root.right, index, sb);

		return index;
	}

	// Builds Huffman Tree and huffmanCode and decode given input text
	public static void buildHuffmanTree(String text)
	{
		// count frequency of appearance of each character
		// and store it in a map
		Map<Character, Integer> freq = new HashMap<>();
		for (int i = 0 ; i < text.length(); i++) {
			if (!freq.containsKey(text.charAt(i))) {
				freq.put(text.charAt(i), 0);
			}
			freq.put(text.charAt(i), freq.get(text.charAt(i)) + 1);
		}

		// Create a priority queue to store live nodes of Huffman tree
		// Notice that highest priority item has lowest frequency
		PriorityQueue<Node> pq = new PriorityQueue<>(
										(l, r) -> l.freq - r.freq);

		// Create a leaf node for each character and add it
		// to the priority queue.
		for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
			pq.add(new Node(entry.getKey(), entry.getValue()));
		}

		// do till there is more than one node in the queue
		while (pq.size() != 1)
		{
			// Remove the two nodes of highest priority
			// (lowest frequency) from the queue
			Node left = pq.poll();
			Node right = pq.poll();

			// Create a new internal node with these two nodes as children 
			// and with frequency equal to the sum of the two nodes
			// frequencies. Add the new node to the priority queue.
			int sum = left.freq + right.freq;
			pq.add(new Node('', sum, left, right));
		}

		// root stores pointer to root of Huffman Tree
		Node root = pq.peek();

		// traverse the Huffman tree and store the Huffman codes in a map
		Map<Character, String> huffmanCode = new HashMap<>();
		encode(root, "", huffmanCode);

		// print the Huffman codes
		System.out.println("Huffman Codes are :n");
		for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
			System.out.println(entry.getKey() + " " + entry.getValue());
		}

		System.out.println("nOriginal string was :n" + text);

		// print encoded string
		StringBuilder sb = new StringBuilder();
		for (int i = 0 ; i < text.length(); i++) {
			sb.append(huffmanCode.get(text.charAt(i)));
		}

		System.out.println("nEncoded string is :n" + sb);

		// traverse the Huffman Tree again and this time
		// decode the encoded string
		int index = -1;
		System.out.println("nDecoded string is: n");
		while (index < sb.length() - 2) {
			index = decode(root, index, sb);
		}
	}

	public static void main(String[] args)
	{
		String text = "Huffman coding is a data compression algorithm.";

		buildHuffmanTree(text);
	}
}

Tuhipoka: ko te mahara e whakamahia ana e te aho whakauru he 47 * 8 = 376 moka me te aho whakawaehere he 194 moka noa i.e. ka kōpekehia te raraunga e pā ana ki te 48%. I roto i te kaupapa C++ i runga ake nei, ka whakamahia e matou te karaehe aho ki te penapena i te aho kua whakawaeheretia kia pai ai te panui o te papatono.

Na te mea e tika ana nga hanganga raraunga rarangi matua mo ia whakaurunga O(takiuru(N)) wa, engari i roto i te rakau rua oti ki N rau kei reira 2N-1 nodes, a ko te rakau Huffman he rakau rua katoa, katahi ka uru te algorithm O(Nlog(N)) te wa, kei hea N - Pūāhua.

Kaupapa:

en.wikipedia.org/wiki/Huffman_coding
en.wikipedia.org/wiki/Variable-length_code
www.youtube.com/watch?v=5wRPin4oxCo

Ako atu mo te akoranga.

Source: will.com

Tāpiri i te kōrero