Huffman compression algorithm

Ahead of the start of the course "Algorithms for Developers", we have prepared a translation of another useful article for you.

Huffman coding is a data compression algorithm that captures the basic idea behind file compression. In this article we discuss fixed-length and variable-length encoding, uniquely decodable codes, the prefix rule, and the construction of a Huffman tree.

We know that each character is stored as a sequence of 0s and 1s and takes 8 bits. This is called fixed-length encoding because every character uses the same fixed number of bits for storage.

Suppose we are given a text. How can we reduce the amount of space needed to store a single character?

The main idea is variable-length encoding. We can use the fact that some characters occur in a text more often than others to design an algorithm that represents the same sequence of characters in fewer bits. In variable-length encoding we assign each character a variable number of bits, depending on how often it occurs in the given text. In the end, some characters may take as little as 1 bit, while others take 2 bits, 3 bits or more. The only problem with variable-length encoding is the subsequent decoding of the sequence.

Given only a sequence of bits, how do we decode it unambiguously?

Consider the string "aabacdab". It has 8 characters, so with fixed-length encoding it needs 64 bits of storage. Note that the frequencies of the characters "a", "b", "c" and "d" are 4, 2, 1 and 1 respectively. Let's try to represent "aabacdab" in fewer bits, using the fact that "a" occurs more often than "b", and "b" occurs more often than "c" and "d". We start by encoding "a" with the single bit 0, assign "b" the two-bit code 11, and use the three-bit codes 100 and 011 for "c" and "d".

As a result, we get:

a    0
b    11
c    100
d    011

So, using the codes above, we encode the string "aabacdab" as 00110100011011 (0|0|11|0|100|011|0|11). The main problem, however, is decoding. When we try to decode the string 00110100011011 we get an ambiguous result, since it can be read as:

0|011|0|100|011|0|11    adacdab
0|0|11|0|100|0|11|011   aabacabd
0|011|0|100|0|11|0|11   adacabab 

...
and so on.

To avoid this ambiguity, we must make sure that our encoding satisfies the prefix rule, which in turn guarantees that the codes can be decoded in exactly one way. The prefix rule requires that no code is a prefix of another. By a code we mean the bits used to represent a particular character. In the example above, 0 is a prefix of 011, which violates the prefix rule. So if our codes satisfy the prefix rule, we can decode them uniquely. (The converse does not hold in general: a code can be uniquely decodable without being a prefix code.)
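
The prefix rule is easy to test for a given code table. Below is a small sketch under the assumption that the table is stored as a `std::map<char, std::string>`; the name `isPrefixFree` is made up for this example.

```cpp
#include <map>
#include <string>

// Returns true when no codeword is a prefix of a different codeword.
bool isPrefixFree(const std::map<char, std::string>& codes) {
    for (const auto& x : codes) {
        for (const auto& y : codes) {
            if (x.first == y.first) continue;   // do not compare a code with itself
            const std::string& p = x.second;
            const std::string& q = y.second;
            // p is a prefix of q when q starts with all of p's bits.
            if (p.size() <= q.size() && q.compare(0, p.size(), p) == 0) return false;
        }
    }
    return true;
}
```

For the table a = 0, b = 11, c = 100, d = 011 the check fails, because 0 is a prefix of 011.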

Let's return to the example above. This time we assign the characters "a", "b", "c" and "d" codes that satisfy the prefix rule.

a    0
b    10
c    110
d    111

With this encoding, the string "aabacdab" is encoded as 00100110111010 (0|0|10|0|110|111|0|10). This time 00100110111010 can be decoded unambiguously back into our original string "aabacdab".
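
As an illustration, encoding and decoding with a prefix code can be sketched as follows. The helper names `encodeWith` and `decodeWith` are hypothetical, and the decoder relies on the table satisfying the prefix rule: as soon as the bits read so far match a codeword, that codeword cannot be the start of a longer one, so it is safe to emit the character.

```cpp
#include <map>
#include <string>

// Concatenates the codeword of every character in `text`.
std::string encodeWith(const std::string& text,
                       const std::map<char, std::string>& codes) {
    std::string bits;
    for (char ch : text) bits += codes.at(ch);   // throws if a character has no code
    return bits;
}

// Decodes `bits` left to right; valid only when `codes` satisfies the prefix rule.
std::string decodeWith(const std::string& bits,
                       const std::map<char, std::string>& codes) {
    std::map<std::string, char> reverse;         // codeword -> character
    for (const auto& entry : codes) reverse[entry.second] = entry.first;

    std::string text, pending;
    for (char bit : bits) {
        pending += bit;
        auto it = reverse.find(pending);
        if (it != reverse.end()) {               // a complete codeword has been read
            text += it->second;
            pending.clear();
        }
    }
    return text;                                 // leftover bits in `pending` are ignored
}
```

With the table a = 0, b = 10, c = 110, d = 111, decoding the output of `encodeWith` returns the original text for any string made of these four characters.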

Huffman coding

Now that we have covered variable-length encoding and the prefix rule, let's talk about Huffman coding.

The method is based on building a binary tree in which a node is either a leaf or an internal node. Initially, all nodes are leaves (terminals), each representing a character and its weight (that is, its frequency of occurrence). Internal nodes hold a weight (the sum of their children's weights) and references to two child nodes. By convention, the bit "0" means following the left branch and "1" the right branch. A full tree with N leaves has N-1 internal nodes. When building a Huffman tree, it is recommended to discard unused characters in order to obtain codes of optimal length.

We will use a priority queue to build the Huffman tree, where the node with the lowest frequency gets the highest priority. The construction steps are described below:

  1. Create a leaf node for each character and add it to the priority queue.
  2. While there is more than one node in the queue, do the following:
    • Remove the two nodes of highest priority (lowest frequency) from the queue;
    • Create a new internal node with these two nodes as children and with a frequency equal to the sum of the two nodes' frequencies;
    • Add the new node to the priority queue.
  3. The single remaining node is the root, and this completes the construction of the tree.
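
The steps above can be sketched without building explicit tree nodes, if all we want is the length of each character's code. This is a simplified illustration, not the implementation given later in the article. The name `huffmanCodeLengths` and the representation of a queue entry as a weight plus the list of symbols below it are assumptions made here. Each time two entries are merged, every symbol under the new internal node ends up one level deeper, so its code grows by one bit.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Returns the Huffman code length (in bits) for each input frequency.
// Assumes at least two symbols, each with a positive frequency.
std::vector<int> huffmanCodeLengths(const std::vector<int>& freqs) {
    // A queue entry is (total weight, indices of the symbols under that node).
    using Entry = std::pair<long long, std::vector<int>>;
    auto heavier = [](const Entry& a, const Entry& b) { return a.first > b.first; };
    std::priority_queue<Entry, std::vector<Entry>, decltype(heavier)> pq(heavier);

    // Step 1: one leaf per symbol.
    for (int i = 0; i < static_cast<int>(freqs.size()); ++i)
        pq.push(Entry(freqs[i], std::vector<int>{i}));

    std::vector<int> lengths(freqs.size(), 0);

    // Step 2: merge the two lightest nodes until a single node remains.
    while (pq.size() > 1) {
        Entry left = pq.top(); pq.pop();
        Entry right = pq.top(); pq.pop();
        // Every symbol below the new internal node moves one level deeper.
        for (int s : left.second) ++lengths[s];
        for (int s : right.second) ++lengths[s];
        left.second.insert(left.second.end(), right.second.begin(), right.second.end());
        pq.push(Entry(left.first + right.first, std::move(left.second)));
    }

    // Step 3: the remaining entry corresponds to the root.
    return lengths;
}
```

For the frequencies 4, 2, 1, 1 used in the earlier example, this yields code lengths 1, 2, 3, 3, the same lengths as the hand-made prefix code 0, 10, 110, 111.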

Suppose we have a text that consists only of the characters "a", "b", "c", "d" and "e", with frequencies 15, 7, 6, 6 and 5 respectively. Below are illustrations of the steps of the algorithm.

[Figures: step-by-step construction of the Huffman tree]

The path from the root to any leaf node stores the optimal prefix code (also known as the Huffman code) of the character associated with that leaf.

[Figure: Huffman tree]

Below you will find an implementation of the Huffman compression algorithm in C++ and Java:

#include <iostream>
#include <string>
#include <queue>
#include <unordered_map>
#include <vector>
using namespace std;

// A Tree node
struct Node
{
	char ch;
	int freq;
	Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
	Node* node = new Node();

	node->ch = ch;
	node->freq = freq;
	node->left = left;
	node->right = right;

	return node;
}

// Comparison object to be used to order the heap
struct comp
{
	bool operator()(Node* l, Node* r)
	{
		// highest priority item has lowest frequency
		return l->freq > r->freq;
	}
};

// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, string str,
			unordered_map<char, string> &huffmanCode)
{
	if (root == nullptr)
		return;

	// found a leaf node
	if (!root->left && !root->right) {
		huffmanCode[root->ch] = str;
	}

	encode(root->left, str + "0", huffmanCode);
	encode(root->right, str + "1", huffmanCode);
}

// traverse the Huffman Tree and decode the encoded string
void decode(Node* root, int &index, const string &str)
{
	if (root == nullptr) {
		return;
	}

	// found a leaf node
	if (!root->left && !root->right)
	{
		cout << root->ch;
		return;
	}

	index++;

	if (str[index] =='0')
		decode(root->left, index, str);
	else
		decode(root->right, index, str);
}

// Builds Huffman Tree and decode given input text
void buildHuffmanTree(string text)
{
	// count frequency of appearance of each character
	// and store it in a map
	unordered_map<char, int> freq;
	for (char ch: text) {
		freq[ch]++;
	}

	// Create a priority queue to store live nodes of
	// Huffman tree;
	priority_queue<Node*, vector<Node*>, comp> pq;

	// Create a leaf node for each character and add it
	// to the priority queue.
	for (auto pair: freq) {
		pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
	}

	// do till there is more than one node in the queue
	while (pq.size() != 1)
	{
		// Remove the two nodes of highest priority
		// (lowest frequency) from the queue
		Node *left = pq.top(); pq.pop();
		Node *right = pq.top();	pq.pop();

		// Create a new internal node with these two nodes
		// as children and with frequency equal to the sum
		// of the two nodes' frequencies. Add the new node
		// to the priority queue.
		int sum = left->freq + right->freq;
		pq.push(getNode('\0', sum, left, right));
	}

	// root stores pointer to root of Huffman Tree
	Node* root = pq.top();

	// traverse the Huffman Tree and store Huffman Codes
	// in a map. Also prints them
	unordered_map<char, string> huffmanCode;
	encode(root, "", huffmanCode);

	cout << "Huffman Codes are :\n" << '\n';
	for (auto pair: huffmanCode) {
		cout << pair.first << " " << pair.second << '\n';
	}

	cout << "\nOriginal string was :\n" << text << '\n';

	// print encoded string
	string str = "";
	for (char ch: text) {
		str += huffmanCode[ch];
	}

	cout << "\nEncoded string is :\n" << str << '\n';

	// traverse the Huffman Tree again and this time
	// decode the encoded string
	int index = -1;
	cout << "\nDecoded string is: \n";
	while (index < (int)str.size() - 2) {
		decode(root, index, str);
	}
}

// Huffman coding algorithm
int main()
{
	string text = "Huffman coding is a data compression algorithm.";

	buildHuffmanTree(text);

	return 0;
}

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Tree node
class Node
{
	char ch;
	int freq;
	Node left = null, right = null;

	Node(char ch, int freq)
	{
		this.ch = ch;
		this.freq = freq;
	}

	public Node(char ch, int freq, Node left, Node right) {
		this.ch = ch;
		this.freq = freq;
		this.left = left;
		this.right = right;
	}
};

class Huffman
{
	// traverse the Huffman Tree and store Huffman Codes
	// in a map.
	public static void encode(Node root, String str,
							  Map<Character, String> huffmanCode)
	{
		if (root == null)
			return;

		// found a leaf node
		if (root.left == null && root.right == null) {
			huffmanCode.put(root.ch, str);
		}


		encode(root.left, str + "0", huffmanCode);
		encode(root.right, str + "1", huffmanCode);
	}

	// traverse the Huffman Tree and decode the encoded string
	public static int decode(Node root, int index, StringBuilder sb)
	{
		if (root == null)
			return index;

		// found a leaf node
		if (root.left == null && root.right == null)
		{
			System.out.print(root.ch);
			return index;
		}

		index++;

		if (sb.charAt(index) == '0')
			index = decode(root.left, index, sb);
		else
			index = decode(root.right, index, sb);

		return index;
	}

	// Builds Huffman Tree and huffmanCode and decode given input text
	public static void buildHuffmanTree(String text)
	{
		// count frequency of appearance of each character
		// and store it in a map
		Map<Character, Integer> freq = new HashMap<>();
		for (int i = 0 ; i < text.length(); i++) {
			if (!freq.containsKey(text.charAt(i))) {
				freq.put(text.charAt(i), 0);
			}
			freq.put(text.charAt(i), freq.get(text.charAt(i)) + 1);
		}

		// Create a priority queue to store live nodes of Huffman tree
		// Notice that highest priority item has lowest frequency
		PriorityQueue<Node> pq = new PriorityQueue<>(
										(l, r) -> l.freq - r.freq);

		// Create a leaf node for each character and add it
		// to the priority queue.
		for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
			pq.add(new Node(entry.getKey(), entry.getValue()));
		}

		// do till there is more than one node in the queue
		while (pq.size() != 1)
		{
			// Remove the two nodes of highest priority
			// (lowest frequency) from the queue
			Node left = pq.poll();
			Node right = pq.poll();

			// Create a new internal node with these two nodes as children 
			// and with frequency equal to the sum of the two nodes
			// frequencies. Add the new node to the priority queue.
			int sum = left.freq + right.freq;
			pq.add(new Node('\0', sum, left, right));
		}

		// root stores pointer to root of Huffman Tree
		Node root = pq.peek();

		// traverse the Huffman tree and store the Huffman codes in a map
		Map<Character, String> huffmanCode = new HashMap<>();
		encode(root, "", huffmanCode);

		// print the Huffman codes
		System.out.println("Huffman Codes are :\n");
		for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
			System.out.println(entry.getKey() + " " + entry.getValue());
		}

		System.out.println("\nOriginal string was :\n" + text);

		// print encoded string
		StringBuilder sb = new StringBuilder();
		for (int i = 0 ; i < text.length(); i++) {
			sb.append(huffmanCode.get(text.charAt(i)));
		}

		System.out.println("\nEncoded string is :\n" + sb);

		// traverse the Huffman Tree again and this time
		// decode the encoded string
		int index = -1;
		System.out.println("\nDecoded string is: \n");
		while (index < sb.length() - 2) {
			index = decode(root, index, sb);
		}
	}

	public static void main(String[] args)
	{
		String text = "Huffman coding is a data compression algorithm.";

		buildHuffmanTree(text);
	}
}

Note: the input string occupies 47 * 8 = 376 bits, while the encoded string takes only 194 bits, so the data is reduced by about 48%. In the C++ program above we use the string class to store the encoded string in order to keep the program readable.
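
The bit count in the note can be reproduced without generating the codes. The sketch below is an illustration rather than part of the programs above, and `huffmanEncodedBits` is a name chosen for this example. It relies on a known property of the construction: the total encoded length equals the sum of the weights of all internal nodes created while merging.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Total number of bits in the Huffman encoding of `text`.
// Uses the identity: encoded length = sum of the weights of all
// internal nodes created while building the tree.
long long huffmanEncodedBits(const std::string& text) {
    std::map<char, long long> freq;
    for (char ch : text) ++freq[ch];

    // Min-heap of node weights; the tree itself is not needed for the total.
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> pq;
    for (const auto& entry : freq) pq.push(entry.second);

    long long total = 0;
    while (pq.size() > 1) {
        long long a = pq.top(); pq.pop();
        long long b = pq.top(); pq.pop();
        total += a + b;      // weight of the new internal node
        pq.push(a + b);
    }
    return total;            // 0 when the text has fewer than two distinct characters
}
```

For the sample sentence "Huffman coding is a data compression algorithm." this returns 194, and 194 / 376 is roughly 0.52, which corresponds to the reduction of about 48% mentioned above.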

An efficient priority queue needs O(log(N)) time per insertion, and a full binary tree with N leaves has 2N-1 nodes. Since the Huffman tree is a full binary tree, the algorithm runs in O(Nlog(N)) time, where N is the number of distinct characters.

Sources:

en.wikipedia.org/wiki/Huffman_coding
en.wikipedia.org/wiki/Variable-length_code
www.youtube.com/watch?v=5wRPin4oxCo

Learn more about the course.

Source: www.habr.com
