Huffman kaomi algorithm

Ma mua o ka hoʻomaka ʻana o ka papa "Algorithms no nā mea hoʻomohala" Ua hoʻomākaukau ʻia no ʻoe i ka unuhi ʻana o kekahi mea pono ʻē aʻe.

ʻO Huffman coding kahi algorithm compression data e hoʻokumu i ka manaʻo kumu o ka hoʻopili faila. Ma kēia ʻatikala, e kamaʻilio mākou e pili ana i ka hoʻopili ʻana i ka lōʻihi a me ka loli, nā code decodable kū hoʻokahi, nā lula prefix, a me ke kūkulu ʻana i kahi lāʻau Huffman.

Ua ʻike mākou ua mālama ʻia kēlā me kēia ʻano ma ke ʻano he kaʻina o 0 a me 1 a lawe i 8 mau bits. Kapa ʻia kēia i ka hoʻopili ʻana i ka lōʻihi paʻa no ka mea hoʻohana kēlā me kēia ʻano i ka helu paʻa o nā bits e mālama ai.

E ʻōlelo kākou ua hāʻawi ʻia i ka kikokikona. Pehea e hiki ai iā mākou ke hoʻemi i ka nui o ka lumi e pono ai e mālama i kahi ʻano hoʻokahi?

ʻO ka manaʻo nui ʻo ka hoʻopili ʻana i ka lōʻihi. Hiki iā mākou ke hoʻohana i ka ʻoiaʻiʻo o kekahi mau huaʻōlelo i loko o ka kikokikona i ʻoi aku ka pinepine ma mua o nā mea ʻē aʻe (e nana maanei) e hoʻomohala i kahi algorithm e hōʻike i ke kaʻina like o nā huaʻōlelo i nā ʻāpana liʻiliʻi. Ma ka hoʻopili ʻana i ka lōʻihi o ka lōʻihi, hāʻawi mākou i nā huaʻōlelo i kahi helu ʻokoʻa o nā bits, ma muli o ka pinepine o ka ʻike ʻia ʻana i loko o kahi kikokikona. ʻO ka hopena, hiki i kekahi mau huaʻōlelo ke lawe i ka liʻiliʻi e like me ka 1 bit, a ʻo nā mea ʻē aʻe e lawe i 2 bits, 3 a i ʻole. ʻO ka pilikia me ka hoʻololi ʻana i ka lōʻihi o ka hoʻololi ʻana ʻo ia wale nō ka decoding hope o ke kaʻina.

Pehea, me ka ʻike ʻana i ke kaʻina o nā bits, decode me ka maopopo ʻole?

E noʻonoʻo i ka laina "abacdab". Loaʻa iā ia he 8 mau huaʻōlelo, a i ka wā e hoʻopili ai i kahi lōʻihi paʻa, pono ia i 64 mau bits e mālama ai. E hoʻomaopopo i ka pinepine hōʻailona "a", "b", "c" и "D" ua like ia me 4, 2, 1, 1. E hoao kakou e noonoo "abacdab" liʻiliʻi liʻiliʻi, hoʻohana i ka ʻoiaʻiʻo "iā" puka pinepine ma mua o "B"a me ka "B" puka pinepine ma mua o "c" и "D". E hoʻomaka kākou ma ka coding "iā" me hoʻokahi bit like me 0, "B" e hāʻawi mākou i kahi code ʻelua-bit 11, a me ka hoʻohana ʻana i ʻekolu mau bits 100 a me 011 e hoʻopili mākou. "c" и "D".

ʻO ka hopena, e loaʻa iā mākou:

a
0

b
11

c
100

d
011

No laila ka laina "abacdab" e hoʻopaʻa inoa mākou e like me 00110100011011 (0|0|11|0|100|011|0|11)me ka hoʻohana ʻana i nā code ma luna. Eia naʻe, ʻo ka pilikia nui ma ka decoding. Ke ho'āʻo mākou e hoʻokaʻawale i ke kaula 00110100011011, loaʻa iā mākou kahi hopena ambiguous, no ka mea hiki ke hōʻike ʻia e like me:

0|011|0|100|011|0|11    adacdab
0|0|11|0|100|0|11|011   aabacabd
0|011|0|100|0|11|0|11   adacabab 

...
a pēlā aku nō.

I mea e pale aku ai i kēia ambiguity, pono mākou e hōʻoia i kā mākou hoʻopili ʻana i ka manaʻo like lula prefix, ʻo ia hoʻi ka manaʻo e hiki ke hoʻokaʻawale ʻia nā code i hoʻokahi ala kūʻokoʻa. Hoʻomaopopo ka lula prefix ʻaʻohe code he prefix o kekahi. Ma ke code, ke ʻōlelo nei mākou i nā ʻāpana i hoʻohana ʻia e hōʻike i kahi ʻano kikoʻī. Ma ka laana maluna 0 he prefix 011, e kue ana i ka rula prefix. No laila, inā ʻoluʻolu kā mākou mau code i ka lula prefix, a laila hiki iā mākou ke hoʻokaʻawale kūʻokoʻa (a ʻo ia hoʻi).

E nānā hou kākou i ka laʻana ma luna. I kēia manawa e hoʻonohonoho mākou no nā hōʻailona "a", "b", "c" и "D" nā code i hoʻokō i ka lula prefix.

a
0

b
10

c
110

d
111

Me kēia hoʻopili, ke kaula "abacdab" e hoʻopaʻa ʻia e like me 00100100011010 (0|0|10|0|100|011|0|10). ^ E Ha yM. A eia nō 00100100011010 hiki iā mākou ke hoʻololi a hoʻihoʻi i kā mākou kaula kumu "abacdab".

Huffman coding

I kēia manawa ua pili mākou i ka hoʻopili ʻana i ka lōʻihi o ka lōʻihi a me ke kānāwai prefix, e kamaʻilio e pili ana i ka hoʻopili ʻana iā Huffman.

Hoʻokumu ʻia ke ʻano ma ka hana ʻana i nā lāʻau binary. I loko o ia mea, hiki i ka node ke hope a i loko paha. I ka hoʻomaka ʻana, ua manaʻo ʻia nā nodes a pau i nā lau (terminals), e hōʻike ana i ka hōʻailona ponoʻī a me kona kaumaha (ʻo ia hoʻi, ke alapine o ka hanana). Loaʻa i nā node i loko ke kaumaha o ke ʻano a kuhikuhi i nā node mamo ʻelua. Ma ka ʻaelike nui, bit "0" hōʻike ʻia ma hope o ka lālā hema, a "1" - ma ka ʻākau. i ka laau piha N lau a me N-1 nā ʻōpū o loko. Manaʻo ʻia i ke kūkulu ʻana i kahi lāʻau Huffman, e hoʻolei ʻia nā hōʻailona i hoʻohana ʻole ʻia no ka loaʻa ʻana o nā code lōʻihi maikaʻi loa.

E hoʻohana mākou i kahi laina koʻikoʻi no ke kūkulu ʻana i kahi lāʻau Huffman, kahi e hāʻawi ʻia ai ka node me ka alapine haʻahaʻa i ka mea kiʻekiʻe loa. Ua wehewehe ʻia nā ʻanuʻu hana ma lalo nei:

  1. E hana i kahi node lau no kēlā me kēia ʻano a hoʻohui iā lākou i ka pila koʻikoʻi.
  2. ʻOiai ʻoi aku ma mua o hoʻokahi pepa i ka pila, e hana i kēia:
    • Wehe i nā node ʻelua me ka mea nui loa (frequency haʻahaʻa) mai ka pila;
    • E hana i kahi node kūloko hou, kahi e lilo ai kēia mau node ʻelua i mau keiki, a ua like ka pinepine o ka hanana ʻana me ka huina o nā alapine o kēia mau node ʻelua.
    • E hoʻohui i kahi node hou i ka pila koʻikoʻi.
  3. ʻO ke kumu wale nō ke koena o ke kumu, a ʻo kēia ka mea e hoʻopau ai i ke kūkulu ʻana i ka lāʻau.

E noʻonoʻo e loaʻa iā mākou kekahi mau kikokikona me nā huaʻōlelo wale nō "a", "b", "c", "d" и "a me", a he 15, 7, 6, 6, a me 5 ko lakou mau alapine. Aia ma lalo nā kiʻi e hōʻike ana i nā ʻanuʻu o ka algorithm.

Huffman kaomi algorithm

Huffman kaomi algorithm

Huffman kaomi algorithm

Huffman kaomi algorithm

Huffman kaomi algorithm

ʻO ke ala mai ke kumu a hiki i kekahi node hope e mālama i ke code prefix maikaʻi loa (ʻike pū ʻia ʻo Huffman code) e pili ana i ke ʻano pili me kēlā node hope.

Huffman kaomi algorithm
lāʻau Huffman

Ma lalo ʻoe e ʻike ai i ka hoʻokō ʻana o ka Huffman compression algorithm ma C ++ a me Java:

#include <iostream>
#include <string>
#include <queue>
#include <unordered_map>
using namespace std;

// A Tree node
struct Node
{
	char ch;
	int freq;
	Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
	Node* node = new Node();

	node->ch = ch;
	node->freq = freq;
	node->left = left;
	node->right = right;

	return node;
}

// Comparison object to be used to order the heap
struct comp
{
	bool operator()(Node* l, Node* r)
	{
		// highest priority item has lowest frequency
		return l->freq > r->freq;
	}
};

// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, string str,
			unordered_map<char, string> &huffmanCode)
{
	if (root == nullptr)
		return;

	// found a leaf node
	if (!root->left && !root->right) {
		huffmanCode[root->ch] = str;
	}

	encode(root->left, str + "0", huffmanCode);
	encode(root->right, str + "1", huffmanCode);
}

// traverse the Huffman Tree and decode the encoded string
void decode(Node* root, int &index, string str)
{
	if (root == nullptr) {
		return;
	}

	// found a leaf node
	if (!root->left && !root->right)
	{
		cout << root->ch;
		return;
	}

	index++;

	if (str[index] =='0')
		decode(root->left, index, str);
	else
		decode(root->right, index, str);
}

// Builds Huffman Tree and decode given input text
void buildHuffmanTree(string text)
{
	// count frequency of appearance of each character
	// and store it in a map
	unordered_map<char, int> freq;
	for (char ch: text) {
		freq[ch]++;
	}

	// Create a priority queue to store live nodes of
	// Huffman tree;
	priority_queue<Node*, vector<Node*>, comp> pq;

	// Create a leaf node for each character and add it
	// to the priority queue.
	for (auto pair: freq) {
		pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
	}

	// do till there is more than one node in the queue
	while (pq.size() != 1)
	{
		// Remove the two nodes of highest priority
		// (lowest frequency) from the queue
		Node *left = pq.top(); pq.pop();
		Node *right = pq.top();	pq.pop();

		// Create a new internal node with these two nodes
		// as children and with frequency equal to the sum
		// of the two nodes' frequencies. Add the new node
		// to the priority queue.
		int sum = left->freq + right->freq;
		pq.push(getNode('', sum, left, right));
	}

	// root stores pointer to root of Huffman Tree
	Node* root = pq.top();

	// traverse the Huffman Tree and store Huffman Codes
	// in a map. Also prints them
	unordered_map<char, string> huffmanCode;
	encode(root, "", huffmanCode);

	cout << "Huffman Codes are :n" << 'n';
	for (auto pair: huffmanCode) {
		cout << pair.first << " " << pair.second << 'n';
	}

	cout << "nOriginal string was :n" << text << 'n';

	// print encoded string
	string str = "";
	for (char ch: text) {
		str += huffmanCode[ch];
	}

	cout << "nEncoded string is :n" << str << 'n';

	// traverse the Huffman Tree again and this time
	// decode the encoded string
	int index = -1;
	cout << "nDecoded string is: n";
	while (index < (int)str.size() - 2) {
		decode(root, index, str);
	}
}

// Huffman coding algorithm
int main()
{
	string text = "Huffman coding is a data compression algorithm.";

	buildHuffmanTree(text);

	return 0;
}

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Tree node
class Node
{
	char ch;
	int freq;
	Node left = null, right = null;

	Node(char ch, int freq)
	{
		this.ch = ch;
		this.freq = freq;
	}

	public Node(char ch, int freq, Node left, Node right) {
		this.ch = ch;
		this.freq = freq;
		this.left = left;
		this.right = right;
	}
};

class Huffman
{
	// traverse the Huffman Tree and store Huffman Codes
	// in a map.
	public static void encode(Node root, String str,
							  Map<Character, String> huffmanCode)
	{
		if (root == null)
			return;

		// found a leaf node
		if (root.left == null && root.right == null) {
			huffmanCode.put(root.ch, str);
		}


		encode(root.left, str + "0", huffmanCode);
		encode(root.right, str + "1", huffmanCode);
	}

	// traverse the Huffman Tree and decode the encoded string
	public static int decode(Node root, int index, StringBuilder sb)
	{
		if (root == null)
			return index;

		// found a leaf node
		if (root.left == null && root.right == null)
		{
			System.out.print(root.ch);
			return index;
		}

		index++;

		if (sb.charAt(index) == '0')
			index = decode(root.left, index, sb);
		else
			index = decode(root.right, index, sb);

		return index;
	}

	// Builds Huffman Tree and huffmanCode and decode given input text
	public static void buildHuffmanTree(String text)
	{
		// count frequency of appearance of each character
		// and store it in a map
		Map<Character, Integer> freq = new HashMap<>();
		for (int i = 0 ; i < text.length(); i++) {
			if (!freq.containsKey(text.charAt(i))) {
				freq.put(text.charAt(i), 0);
			}
			freq.put(text.charAt(i), freq.get(text.charAt(i)) + 1);
		}

		// Create a priority queue to store live nodes of Huffman tree
		// Notice that highest priority item has lowest frequency
		PriorityQueue<Node> pq = new PriorityQueue<>(
										(l, r) -> l.freq - r.freq);

		// Create a leaf node for each character and add it
		// to the priority queue.
		for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
			pq.add(new Node(entry.getKey(), entry.getValue()));
		}

		// do till there is more than one node in the queue
		while (pq.size() != 1)
		{
			// Remove the two nodes of highest priority
			// (lowest frequency) from the queue
			Node left = pq.poll();
			Node right = pq.poll();

			// Create a new internal node with these two nodes as children 
			// and with frequency equal to the sum of the two nodes
			// frequencies. Add the new node to the priority queue.
			int sum = left.freq + right.freq;
			pq.add(new Node('', sum, left, right));
		}

		// root stores pointer to root of Huffman Tree
		Node root = pq.peek();

		// traverse the Huffman tree and store the Huffman codes in a map
		Map<Character, String> huffmanCode = new HashMap<>();
		encode(root, "", huffmanCode);

		// print the Huffman codes
		System.out.println("Huffman Codes are :n");
		for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
			System.out.println(entry.getKey() + " " + entry.getValue());
		}

		System.out.println("nOriginal string was :n" + text);

		// print encoded string
		StringBuilder sb = new StringBuilder();
		for (int i = 0 ; i < text.length(); i++) {
			sb.append(huffmanCode.get(text.charAt(i)));
		}

		System.out.println("nEncoded string is :n" + sb);

		// traverse the Huffman Tree again and this time
		// decode the encoded string
		int index = -1;
		System.out.println("nDecoded string is: n");
		while (index < sb.length() - 2) {
			index = decode(root, index, sb);
		}
	}

	public static void main(String[] args)
	{
		String text = "Huffman coding is a data compression algorithm.";

		buildHuffmanTree(text);
	}
}

'Ōlelo Aʻo: ʻo ka hoʻomanaʻo i hoʻohana ʻia e ke kaula hoʻokomo he 47 * 8 = 376 mau bits a ʻo ke kaula i hoʻopaʻa ʻia he 194 bits wale nō ʻo ia. Ua hoʻopili ʻia ka ʻikepili ma kahi o 48%. Ma ka papahana C++ ma luna, hoʻohana mākou i ka papa string e mālama i ke kaula i hoʻopaʻa ʻia e hiki ke heluhelu ʻia ka papahana.

No ka mea e pono ana i ka hoʻokomo ʻana i nā hoʻolālā ʻikepili pila koʻikoʻi O(log(N)) manawa, akā ma kahi lāʻau binary piha me N lau i keia manawa 2N-1 nodes, a ʻo ka lāʻau Huffman he lāʻau binary piha, a laila holo ka algorithm O(Nlog(N)) manawa, kahi N - Nā huapalapala.

Nā kumuhana:

en.wikipedia.org/wiki/Huffman_coding
en.wikipedia.org/wiki/Variable-length_code
www.youtube.com/watch?v=5wRPin4oxCo

E aʻo hou e pili ana i ka papa.

Source: www.habr.com

Pākuʻi i ka manaʻo hoʻopuka