Huffman mkpakọ algọridim

Tupu mmalite nke usoro "Algorithms maka ndị mmepe" kwadebere gị ntụgharị asụsụ nke ihe ọzọ bara uru.

Huffman coding bụ data mkpakọ algọridim nke na-emepụta echiche bụ isi nke mkpakọ faịlụ. N'edemede a, anyị ga-ekwu maka itinye koodu ogologo na agbanwe agbanwe, koodu ndị nwere ike ime nke ọma, iwu prefix, na iwulite osisi Huffman.

Anyị maara na a na-echekwa agwa ọ bụla dịka usoro nke 0 na 1 wee were 8 bits. A na-akpọ nke a itinye ngbanwe ogologo ogologo n'ihi na agwa ọ bụla na-eji otu ọnụọgụ ibe n'ibe echekwabara.

Ka anyị kwuo na e nyere anyị ederede. Kedu ka anyị ga-esi belata ohere a chọrọ iji chekwaa otu agwa?

Isi echiche bụ mgbanwe ogologo ngbanwe. Anyị nwere ike iji eziokwu ahụ bụ na ụfọdụ mkpụrụedemede na ederede na-emekarị karịa ndị ọzọ (lee ebe a) iji mepụta algọridim nke ga-anọchi anya otu usoro mkpụrụedemede na ntakịrị ntakịrị. Na ngbanwe ogologo mgbanwe, anyị na-ekenye mkpụrụedemede ọnụọgụ ibe n'ụdị agbanwe agbanwe, dabere na ugboro ole ha na-apụta na ederede enyere. N'ikpeazụ, ụfọdụ mkpụrụedemede nwere ike were ihe dịka 1 bit, ebe ndị ọzọ nwere ike were 2 bit, 3 ma ọ bụ karịa. Nsogbu dị na ngbanwe ogologo ngbanwe bụ naanị ngbanwe nke usoro a na-esote.

Kedu otu, n'ịmara usoro nke bits, decode ya n'enweghị mgbagha?

Tụlee ahịrị ahụ "Akụkụ". Ọ nwere mkpụrụedemede 8, ma mgbe ị na-etinye koodu ogologo, ọ ga-achọ 64 bit iji chekwaa ya. Rịba ama na ugboro akara "a", "b", "c" и "D" nhata 4, 2, 1, 1 n'otu n'otu. Ka anyị gbalịa iche n'echiche "Akụkụ" obere ibe n'ibe, na-eji eziokwu ahụ "ka" na-eme ugboro ugboro karịa "B"na "B" na-eme ugboro ugboro karịa "c" и "D". Ka anyị malite site na itinye koodu "ka" ya na otu bit hà nhata 0, "B" anyị ga-ekenye koodu 11-bit abụọ, na iji ibe 100 na 011 atọ, anyị ga-etinye koodu. "c" и "D".

N'ihi ya, anyị ga-enweta:

a
0

b
11

c
100

d
011

Ya mere ahịrị "Akụkụ" anyị ga-encode dị ka 00110100011011 (0|0|11|0|100|011|0|11)iji koodu ndị dị n'elu. Otú ọ dị, isi nsogbu ga-abụ na ntọpụta. Mgbe anyị na-agbalị decode nke eriri 00110100011011, anyị na-enweta nsonaazụ na-enweghị atụ, ebe ọ bụ na enwere ike ịnọchite anya ya dị ka:

0|011|0|100|011|0|11    adacdab
0|0|11|0|100|0|11|011   aabacabd
0|011|0|100|0|11|0|11   adacabab 

...
na ihe ndị ọzọ.

Iji zere mgbagha a, anyị ga-ahụrịrị na ntinye koodu anyị na-eju echiche dị ka iwu prefix, nke n'aka nke ya na-egosi na koodu ndị ahụ nwere ike ịmegharị naanị n'otu ụzọ pụrụ iche. Iwu prefix na-eme ka o doo anya na ọ nweghị koodu bụ nganiihu nke ọzọ. Site na koodu, anyị na-ekwu na ibe n'ibe eji anọchi anya otu agwa. N'ihe atụ dị n'elu 0 bụ prefix 011, nke na-emebi iwu prefix. Yabụ, ọ bụrụ na koodu anyị mejuo iwu prefix, mgbe ahụ anyị nwere ike depụta n'ụzọ pụrụ iche (na ọzọ).

Ka anyị legharịa ihe atụ n'elu. Oge a anyị ga-ekenye maka akara "a", "b", "c" и "D" Koodu na-emeju iwu prefix.

a
0

b
10

c
110

d
111

Site na ntinye koodu a, eriri "Akụkụ" ga-encoded ka 00100100011010 (0|0|10|0|100|011|0|10). Na ebe a 00100100011010 anyị ga-enwe ike imezi koodu n'enweghị mgbagha wee laghachi na eriri mbụ anyị "Akụkụ".

Huffman koodu

Ugbu a anyị elelela mgbanwe ngbanwe ogologo ogologo yana iwu prefix, ka anyị kwuo maka itinye koodu Huffman.

Usoro a dabeere na ịmepụta osisi ọnụọgụ abụọ. N'ime ya, ọnụ nwere ike ịbụ nke ikpeazụ ma ọ bụ n'ime. Na mbido, a na-ewere oghere niile dị ka akwụkwọ (ọnụ), nke na-anọchite anya akara n'onwe ya na ịdị arọ ya (ya bụ, ugboro ugboro). Ọnụ ọnụ ime ahụ nwere ịdị arọ nke agwa ma na-ezo aka na ọnụ ụzọ abụọ sitere na mkpụrụ. Site na nkwekọrịta izugbe, bit "0" na-anọchi anya iso ngalaba aka ekpe, na "1" - n'aka nri. na osisi zuru oke N epupụta na N-1 esịtidem ọnụ. A na-atụ aro na mgbe ị na-arụ osisi Huffman, a ga-atụfu akara ndị na-ejighị ya iji nweta koodu ogologo kacha mma.

Anyị ga-eji kwụ n'ahịrị dị mkpa iji wuo osisi Huffman, ebe a ga-enye ọnụ na ugboro kacha nta kacha mkpa. A kọwara usoro ihe owuwu ihe n'okpuru:

  1. Mepụta ọnụ akwụkwọ maka agwa ọ bụla ma tinye ha na kwụ n'ahịrị mbụ.
  2. Mgbe enwere ihe karịrị otu mpempe akwụkwọ na kwụ n'ahịrị, mee ihe ndị a:
    • Wepu ọnụ ụzọ abụọ ahụ na mkpa kachasị elu (ugboro kachasị ala) site na kwụ n'ahịrị;
    • Mepụta oghere dị n'ime ọhụrụ, ebe ọnụ abụọ a ga-abụ ụmụaka, na ugboro ole ihe omume ga-adaba na nchikota nke ugboro abụọ a.
    • Tinye ọnụ ọhụrụ na kwụ n'ahịrị mkpa.
  3. Naanị oghere fọdụrụ ga-abụ mgbọrọgwụ, nke a ga-emecha wuchaa osisi ahụ.

Were ya na anyị nwere ederede nke nwere naanị mkpụrụedemede "a", "b", "c", "d" и "na", na ihe na-eme ugboro ugboro bụ 15, 7, 6, 6, na 5, n'otu n'otu. N'okpuru bụ ihe atụ na-egosipụta usoro nke algọridim.

Huffman mkpakọ algọridim

Huffman mkpakọ algọridim

Huffman mkpakọ algọridim

Huffman mkpakọ algọridim

Huffman mkpakọ algọridim

Ụzọ sitere na mgbọrọgwụ gaa na ọnụ njedebe ọ bụla ga-echekwa koodu prefix kacha mma (nke a makwaara dị ka koodu Huffman) kwekọrọ na agwa jikọtara ya na ọnụ njedebe ahụ.

Huffman mkpakọ algọridim
Osisi Huffman

N'okpuru, ị ga-ahụ mmejuputa nke Huffman compression algorithm na C++ na Java:

#include <iostream>
#include <string>
#include <queue>
#include <unordered_map>
using namespace std;

// A Tree node
struct Node
{
	char ch;
	int freq;
	Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
	Node* node = new Node();

	node->ch = ch;
	node->freq = freq;
	node->left = left;
	node->right = right;

	return node;
}

// Comparison object to be used to order the heap
struct comp
{
	bool operator()(Node* l, Node* r)
	{
		// highest priority item has lowest frequency
		return l->freq > r->freq;
	}
};

// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, string str,
			unordered_map<char, string> &huffmanCode)
{
	if (root == nullptr)
		return;

	// found a leaf node
	if (!root->left && !root->right) {
		huffmanCode[root->ch] = str;
	}

	encode(root->left, str + "0", huffmanCode);
	encode(root->right, str + "1", huffmanCode);
}

// traverse the Huffman Tree and decode the encoded string
void decode(Node* root, int &index, string str)
{
	if (root == nullptr) {
		return;
	}

	// found a leaf node
	if (!root->left && !root->right)
	{
		cout << root->ch;
		return;
	}

	index++;

	if (str[index] =='0')
		decode(root->left, index, str);
	else
		decode(root->right, index, str);
}

// Builds Huffman Tree and decode given input text
void buildHuffmanTree(string text)
{
	// count frequency of appearance of each character
	// and store it in a map
	unordered_map<char, int> freq;
	for (char ch: text) {
		freq[ch]++;
	}

	// Create a priority queue to store live nodes of
	// Huffman tree;
	priority_queue<Node*, vector<Node*>, comp> pq;

	// Create a leaf node for each character and add it
	// to the priority queue.
	for (auto pair: freq) {
		pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
	}

	// do till there is more than one node in the queue
	while (pq.size() != 1)
	{
		// Remove the two nodes of highest priority
		// (lowest frequency) from the queue
		Node *left = pq.top(); pq.pop();
		Node *right = pq.top();	pq.pop();

		// Create a new internal node with these two nodes
		// as children and with frequency equal to the sum
		// of the two nodes' frequencies. Add the new node
		// to the priority queue.
		int sum = left->freq + right->freq;
		pq.push(getNode('', sum, left, right));
	}

	// root stores pointer to root of Huffman Tree
	Node* root = pq.top();

	// traverse the Huffman Tree and store Huffman Codes
	// in a map. Also prints them
	unordered_map<char, string> huffmanCode;
	encode(root, "", huffmanCode);

	cout << "Huffman Codes are :n" << 'n';
	for (auto pair: huffmanCode) {
		cout << pair.first << " " << pair.second << 'n';
	}

	cout << "nOriginal string was :n" << text << 'n';

	// print encoded string
	string str = "";
	for (char ch: text) {
		str += huffmanCode[ch];
	}

	cout << "nEncoded string is :n" << str << 'n';

	// traverse the Huffman Tree again and this time
	// decode the encoded string
	int index = -1;
	cout << "nDecoded string is: n";
	while (index < (int)str.size() - 2) {
		decode(root, index, str);
	}
}

// Huffman coding algorithm
int main()
{
	string text = "Huffman coding is a data compression algorithm.";

	buildHuffmanTree(text);

	return 0;
}

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Tree node
class Node
{
	char ch;
	int freq;
	Node left = null, right = null;

	Node(char ch, int freq)
	{
		this.ch = ch;
		this.freq = freq;
	}

	public Node(char ch, int freq, Node left, Node right) {
		this.ch = ch;
		this.freq = freq;
		this.left = left;
		this.right = right;
	}
};

class Huffman
{
	// traverse the Huffman Tree and store Huffman Codes
	// in a map.
	public static void encode(Node root, String str,
							  Map<Character, String> huffmanCode)
	{
		if (root == null)
			return;

		// found a leaf node
		if (root.left == null && root.right == null) {
			huffmanCode.put(root.ch, str);
		}


		encode(root.left, str + "0", huffmanCode);
		encode(root.right, str + "1", huffmanCode);
	}

	// traverse the Huffman Tree and decode the encoded string
	public static int decode(Node root, int index, StringBuilder sb)
	{
		if (root == null)
			return index;

		// found a leaf node
		if (root.left == null && root.right == null)
		{
			System.out.print(root.ch);
			return index;
		}

		index++;

		if (sb.charAt(index) == '0')
			index = decode(root.left, index, sb);
		else
			index = decode(root.right, index, sb);

		return index;
	}

	// Builds Huffman Tree and huffmanCode and decode given input text
	public static void buildHuffmanTree(String text)
	{
		// count frequency of appearance of each character
		// and store it in a map
		Map<Character, Integer> freq = new HashMap<>();
		for (int i = 0 ; i < text.length(); i++) {
			if (!freq.containsKey(text.charAt(i))) {
				freq.put(text.charAt(i), 0);
			}
			freq.put(text.charAt(i), freq.get(text.charAt(i)) + 1);
		}

		// Create a priority queue to store live nodes of Huffman tree
		// Notice that highest priority item has lowest frequency
		PriorityQueue<Node> pq = new PriorityQueue<>(
										(l, r) -> l.freq - r.freq);

		// Create a leaf node for each character and add it
		// to the priority queue.
		for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
			pq.add(new Node(entry.getKey(), entry.getValue()));
		}

		// do till there is more than one node in the queue
		while (pq.size() != 1)
		{
			// Remove the two nodes of highest priority
			// (lowest frequency) from the queue
			Node left = pq.poll();
			Node right = pq.poll();

			// Create a new internal node with these two nodes as children 
			// and with frequency equal to the sum of the two nodes
			// frequencies. Add the new node to the priority queue.
			int sum = left.freq + right.freq;
			pq.add(new Node('', sum, left, right));
		}

		// root stores pointer to root of Huffman Tree
		Node root = pq.peek();

		// traverse the Huffman tree and store the Huffman codes in a map
		Map<Character, String> huffmanCode = new HashMap<>();
		encode(root, "", huffmanCode);

		// print the Huffman codes
		System.out.println("Huffman Codes are :n");
		for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
			System.out.println(entry.getKey() + " " + entry.getValue());
		}

		System.out.println("nOriginal string was :n" + text);

		// print encoded string
		StringBuilder sb = new StringBuilder();
		for (int i = 0 ; i < text.length(); i++) {
			sb.append(huffmanCode.get(text.charAt(i)));
		}

		System.out.println("nEncoded string is :n" + sb);

		// traverse the Huffman Tree again and this time
		// decode the encoded string
		int index = -1;
		System.out.println("nDecoded string is: n");
		while (index < sb.length() - 2) {
			index = decode(root, index, sb);
		}
	}

	public static void main(String[] args)
	{
		String text = "Huffman coding is a data compression algorithm.";

		buildHuffmanTree(text);
	}
}

Cheta na: ebe nchekwa nke eriri ntinye na-eji bụ 47 * 8 = 376 bits na eriri etinyere bụ naanị 194 bits i.e. A na-ejikọta data site na ihe dịka 48%. N'ime mmemme C++ dị n'elu, anyị na-eji klas eriri iji chekwaa eriri agbakwunyere ka enwere ike ịgụ ihe mmemme.

N'ihi na arụrụ ọrụ data kwụ n'ahịrị dị mkpa chọrọ ntinye ọ bụla O(log(N)) oge, ma na a zuru ezu osisi ọnụọgụ abụọ na N akwụkwọ dị ugbu a 2N-1 ọnụ, na osisi Huffman bụ osisi ọnụọgụ abụọ zuru oke, mgbe ahụ algọridim na-abanye O(Nlog(N)) oge, ebe N - agwa.

Isi mmalite:

en.wikipedia.org/wiki/Huffman_coding
en.wikipedia.org/wiki/Variable-length_code
www.youtube.com/watch?v=5wRPin4oxCo

Mụtakwuo maka nkuzi ahụ.

isi: www.habr.com

Tinye a comment