Algorithm teannachadh Huffman

Mus tòisich an cùrsa "Algorithms airson luchd-leasachaidh" ullachadh dhut eadar-theangachadh de stuth feumail eile.

Is e algorithm teannachaidh dàta a th’ ann an còdadh Huffman a chruthaicheas am beachd bunaiteach air teannachadh faidhle. San artaigil seo, bruidhnidh sinn mu chòdachadh faid stèidhichte agus caochlaideach, còdan gun samhail a ghabhas còdachadh, riaghailtean ro-leasachan, agus togail craobh Huffman.

Tha fios againn gu bheil gach caractar air a stòradh ann an sreath de 0's agus 1's agus a' toirt suas 8 pìosan. Canar còdachadh faid stèidhichte ris an seo oir tha gach caractar a’ cleachdadh an aon àireamh shuidhichte de phìosan airson a stòradh.

Canaidh sinn gun deach teacsa a thoirt dhuinn. Ciamar a lùghdaicheas sinn na tha de rùm a dhìth airson aon charactar a stòradh?

Is e am prìomh bheachd còdachadh fad caochlaideach. Faodaidh sinn a chleachdadh gu bheil cuid de charactaran san teacsa a’ nochdadh nas trice na cuid eile (faic an seo) algairim a leasachadh a riochdaicheas an aon sreath de charactaran ann an nas lugha de phìosan. Ann an còdachadh faid caochlaideach, bidh sinn a’ sònrachadh àireamh caochlaideach de phìosan do charactaran, a rèir dè cho tric sa nochdas iad ann an teacsa sònraichte. Mu dheireadh, faodaidh cuid de charactaran cho beag ri 1 pìos a ghabhail, agus faodaidh cuid eile 2 phìos, 3 no barrachd a ghabhail. Is e an duilgheadas le còdachadh faid caochlaideach a-mhàin an dì-chòdachadh den t-sreath às deidh sin.

Ciamar, le eòlas air an t-sreath de phìosan, a dhì-chòdachadh gu soilleir?

Beachdaich air an loidhne "abadab". Tha 8 caractaran ann, agus nuair a thathar a’ còdachadh fad stèidhichte, bidh feum air 64 pìosan airson a stòradh. Thoir an aire gu bheil an tricead samhla "a", "b", "c" и "D" co-ionann ri 4, 2, 1, 1 fa leth. Feuchaidh sinn ri smaoineachadh "abadab" nas lugha de phìosan, a’ cleachdadh an fhìrinn sin "gu" a’ tachairt nas trice na "B"agus "B" a’ tachairt nas trice na "c" и "D". Feuch an tòisich sinn le bhith a 'còdadh "gu" le aon phìos co-ionann ri 0, "B" sònraichidh sinn còd dà-phìos 11, agus a’ cleachdadh trì pìosan 100 agus 011 còdaichidh sinn "c" и "D".

Mar thoradh air an sin, gheibh sinn:

a
0

b
11

c
100

d
011

Mar sin an loidhne "abadab" còdaichidh sinn mar 00110100011011 (0|0|11|0|100|011|0|11)a’ cleachdadh nan còdan gu h-àrd. Ach, bidh am prìomh dhuilgheadas ann an còdachadh. Nuair a dh'fheuchas sinn ris an t-sreath a dhì-chòdachadh 00110100011011, gheibh sinn toradh teagmhach, oir faodar a riochdachadh mar:

0|011|0|100|011|0|11    adacdab
0|0|11|0|100|0|11|011   aabacabd
0|011|0|100|0|11|0|11   adacabab 

...
agus mar sin air adhart.

Gus an mì-chinnt seo a sheachnadh, feumaidh sinn dèanamh cinnteach gu bheil ar còdachadh a’ sàsachadh a leithid de bhun-bheachd mar riaghailt ro-leasachan, a tha an uair sin a’ ciallachadh nach urrainnear na còdan a chòdachadh ach ann an aon dòigh air leth. Tha an riaghailt ro-leasachan a’ dèanamh cinnteach nach eil còd sam bith na ro-leasachan air fear eile. Le còd, tha sinn a 'ciallachadh na pìosan a thathar a' cleachdadh gus caractar sònraichte a riochdachadh. Anns an eisimpleir gu h-àrd 0 's e ro-leasachan 011, a tha a’ briseadh riaghailt an ro-leasachan. Mar sin, ma choileanas na còdan againn an riaghailt ro-leasachan, is urrainn dhuinn a dhì-chòdachadh gu h-annasach (agus a chaochladh).

Bheir sinn sùil a-rithist air an eisimpleir gu h-àrd. An turas seo sònraichidh sinn airson samhlaidhean "a", "b", "c" и "D" còdan a choinnicheas ri riaghailt an ro-leasachan.

a
0

b
10

c
110

d
111

Leis a 'chòdachadh seo, an sreang "abadab" thèid a chòdachadh mar 00100100011010 (0|0|10|0|100|011|0|10)S an Iar- Agus an seo 00100100011010 bidh sinn mar-thà comasach air dì-chòdachadh agus tilleadh chun t-sreang thùsail againn "abadab".

Còdachadh Huffman

A-nis gu bheil sinn air dèiligeadh ri còdachadh faid caochlaideach agus riaghailt an ro-leasachan, bruidhnidh sinn mu chòdachadh Huffman.

Tha an dòigh stèidhichte air cruthachadh chraobhan binary. Ann, faodaidh an nód a bhith deireannach no a-staigh. An toiseach, thathas a 'beachdachadh air a h-uile nodan mar dhuilleagan (crìochnachaidhean), a tha a' riochdachadh an samhla fhèin agus a chuideam (is e sin, cho tric 'sa tha e). Tha cuideam a’ charactar anns na nodan a-staigh agus a’ toirt iomradh air dà nod sliochd. Le aonta coitcheann, bit «0» riochdachadh a 'leantainn a' mheur chlì, agus «1» - air an làimh dheis. ann an làn chraoibh N duilleagan agus N-1 nodan a-staigh. Thathas a 'moladh nuair a thathar a' togail craobh Huffman, gun tèid samhlaidhean nach deach a chleachdadh a thilgeil air falbh gus na còdan fad as fheàrr fhaighinn.

Cleachdaidh sinn ciudha prìomhachais gus craobh Huffman a thogail, far am faighear a’ phrìomhachas as àirde don nód leis an tricead as ìsle. Tha na ceumannan togail air am mìneachadh gu h-ìosal:

  1. Cruthaich nód duille airson gach caractar agus cuir iad ris a’ chiudha prìomhachais.
  2. Ged a tha barrachd air aon duilleag anns a’ chiudha, dèan na leanas:
    • Thoir air falbh an dà nod leis a’ phrìomhachas as àirde (tricead as ìsle) bhon ciudha;
    • Cruthaich nód ùr a-staigh, far am bi an dà nodan sin nan clann, agus bidh tricead an tachartais co-ionann ri suim tricead an dà nodan sin.
    • Cuir nód ùr ris a’ chiudha prìomhachais.
  3. Is e an aon nód a tha air fhàgail am freumh, agus cuiridh seo crìoch air togail na craoibhe.

Smaoinich gu bheil teacsa againn anns nach eil ach caractaran "a", "b", "c", "d" и "agus", agus tha na triceadan tachartais aca 15, 7, 6, 6, agus 5, fa leth. Gu h-ìosal tha dealbhan a tha a’ nochdadh ceumannan an algairim.

Algorithm teannachadh Huffman

Algorithm teannachadh Huffman

Algorithm teannachadh Huffman

Algorithm teannachadh Huffman

Algorithm teannachadh Huffman

Bidh slighe bhon fhreumh gu nód crìochnachaidh sam bith a 'stòradh a' chòd ro-leasachan as fheàrr (ris an canar cuideachd còd Huffman) a fhreagras ris a 'charactar co-cheangailte ris an nód deiridh sin.

Algorithm teannachadh Huffman
craobh Huffman

Gu h-ìosal lorgaidh tu buileachadh an algairim teannachaidh Huffman ann an C ++ agus Java:

#include <iostream>
#include <string>
#include <queue>
#include <unordered_map>
using namespace std;

// A Tree node
struct Node
{
	char ch;
	int freq;
	Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
	Node* node = new Node();

	node->ch = ch;
	node->freq = freq;
	node->left = left;
	node->right = right;

	return node;
}

// Comparison object to be used to order the heap
struct comp
{
	bool operator()(Node* l, Node* r)
	{
		// highest priority item has lowest frequency
		return l->freq > r->freq;
	}
};

// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, string str,
			unordered_map<char, string> &huffmanCode)
{
	if (root == nullptr)
		return;

	// found a leaf node
	if (!root->left && !root->right) {
		huffmanCode[root->ch] = str;
	}

	encode(root->left, str + "0", huffmanCode);
	encode(root->right, str + "1", huffmanCode);
}

// traverse the Huffman Tree and decode the encoded string
void decode(Node* root, int &index, string str)
{
	if (root == nullptr) {
		return;
	}

	// found a leaf node
	if (!root->left && !root->right)
	{
		cout << root->ch;
		return;
	}

	index++;

	if (str[index] =='0')
		decode(root->left, index, str);
	else
		decode(root->right, index, str);
}

// Builds Huffman Tree and decode given input text
void buildHuffmanTree(string text)
{
	// count frequency of appearance of each character
	// and store it in a map
	unordered_map<char, int> freq;
	for (char ch: text) {
		freq[ch]++;
	}

	// Create a priority queue to store live nodes of
	// Huffman tree;
	priority_queue<Node*, vector<Node*>, comp> pq;

	// Create a leaf node for each character and add it
	// to the priority queue.
	for (auto pair: freq) {
		pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
	}

	// do till there is more than one node in the queue
	while (pq.size() != 1)
	{
		// Remove the two nodes of highest priority
		// (lowest frequency) from the queue
		Node *left = pq.top(); pq.pop();
		Node *right = pq.top();	pq.pop();

		// Create a new internal node with these two nodes
		// as children and with frequency equal to the sum
		// of the two nodes' frequencies. Add the new node
		// to the priority queue.
		int sum = left->freq + right->freq;
		pq.push(getNode('', sum, left, right));
	}

	// root stores pointer to root of Huffman Tree
	Node* root = pq.top();

	// traverse the Huffman Tree and store Huffman Codes
	// in a map. Also prints them
	unordered_map<char, string> huffmanCode;
	encode(root, "", huffmanCode);

	cout << "Huffman Codes are :n" << 'n';
	for (auto pair: huffmanCode) {
		cout << pair.first << " " << pair.second << 'n';
	}

	cout << "nOriginal string was :n" << text << 'n';

	// print encoded string
	string str = "";
	for (char ch: text) {
		str += huffmanCode[ch];
	}

	cout << "nEncoded string is :n" << str << 'n';

	// traverse the Huffman Tree again and this time
	// decode the encoded string
	int index = -1;
	cout << "nDecoded string is: n";
	while (index < (int)str.size() - 2) {
		decode(root, index, str);
	}
}

// Huffman coding algorithm
int main()
{
	string text = "Huffman coding is a data compression algorithm.";

	buildHuffmanTree(text);

	return 0;
}

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Tree node
class Node
{
	char ch;
	int freq;
	Node left = null, right = null;

	Node(char ch, int freq)
	{
		this.ch = ch;
		this.freq = freq;
	}

	public Node(char ch, int freq, Node left, Node right) {
		this.ch = ch;
		this.freq = freq;
		this.left = left;
		this.right = right;
	}
};

class Huffman
{
	// traverse the Huffman Tree and store Huffman Codes
	// in a map.
	public static void encode(Node root, String str,
							  Map<Character, String> huffmanCode)
	{
		if (root == null)
			return;

		// found a leaf node
		if (root.left == null && root.right == null) {
			huffmanCode.put(root.ch, str);
		}


		encode(root.left, str + "0", huffmanCode);
		encode(root.right, str + "1", huffmanCode);
	}

	// traverse the Huffman Tree and decode the encoded string
	public static int decode(Node root, int index, StringBuilder sb)
	{
		if (root == null)
			return index;

		// found a leaf node
		if (root.left == null && root.right == null)
		{
			System.out.print(root.ch);
			return index;
		}

		index++;

		if (sb.charAt(index) == '0')
			index = decode(root.left, index, sb);
		else
			index = decode(root.right, index, sb);

		return index;
	}

	// Builds Huffman Tree and huffmanCode and decode given input text
	public static void buildHuffmanTree(String text)
	{
		// count frequency of appearance of each character
		// and store it in a map
		Map<Character, Integer> freq = new HashMap<>();
		for (int i = 0 ; i < text.length(); i++) {
			if (!freq.containsKey(text.charAt(i))) {
				freq.put(text.charAt(i), 0);
			}
			freq.put(text.charAt(i), freq.get(text.charAt(i)) + 1);
		}

		// Create a priority queue to store live nodes of Huffman tree
		// Notice that highest priority item has lowest frequency
		PriorityQueue<Node> pq = new PriorityQueue<>(
										(l, r) -> l.freq - r.freq);

		// Create a leaf node for each character and add it
		// to the priority queue.
		for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
			pq.add(new Node(entry.getKey(), entry.getValue()));
		}

		// do till there is more than one node in the queue
		while (pq.size() != 1)
		{
			// Remove the two nodes of highest priority
			// (lowest frequency) from the queue
			Node left = pq.poll();
			Node right = pq.poll();

			// Create a new internal node with these two nodes as children 
			// and with frequency equal to the sum of the two nodes
			// frequencies. Add the new node to the priority queue.
			int sum = left.freq + right.freq;
			pq.add(new Node('', sum, left, right));
		}

		// root stores pointer to root of Huffman Tree
		Node root = pq.peek();

		// traverse the Huffman tree and store the Huffman codes in a map
		Map<Character, String> huffmanCode = new HashMap<>();
		encode(root, "", huffmanCode);

		// print the Huffman codes
		System.out.println("Huffman Codes are :n");
		for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
			System.out.println(entry.getKey() + " " + entry.getValue());
		}

		System.out.println("nOriginal string was :n" + text);

		// print encoded string
		StringBuilder sb = new StringBuilder();
		for (int i = 0 ; i < text.length(); i++) {
			sb.append(huffmanCode.get(text.charAt(i)));
		}

		System.out.println("nEncoded string is :n" + sb);

		// traverse the Huffman Tree again and this time
		// decode the encoded string
		int index = -1;
		System.out.println("nDecoded string is: n");
		while (index < sb.length() - 2) {
			index = decode(root, index, sb);
		}
	}

	public static void main(String[] args)
	{
		String text = "Huffman coding is a data compression algorithm.";

		buildHuffmanTree(text);
	}
}

Note: is e 47 * 8 = 376 buillean a’ chuimhne a chleachdas an t-sreang cuir a-steach agus chan eil anns an t-sreang còdaichte ach 194 buillean i.e. tha dàta air a dhlùthadh le timcheall air 48%. Anns a 'phrògram C ++ gu h-àrd, bidh sinn a' cleachdadh a 'chlas sreang gus an sreang còdaichte a stòradh gus am bi am prògram furasta a leughadh.

Leis gu bheil feum air structaran dàta ciudha prìomhachais èifeachdach airson gach cuir a-steach O(log(n)) ùine, ach ann an craobh dhùcha iomlan le N duilleagan an làthair 2N-1 nodan, agus tha craobh Huffman na chraobh binary iomlan, agus an uairsin bidh an algairim a’ ruith a-steach O(Nlog(N)) uair, càite N - Caractaran.

Stòran:

en.wikipedia.org/wiki/Huffman_coding
en.wikipedia.org/wiki/Variable-length_code
www.youtube.com/watch?v=5wRPin4oxCo

Ionnsaich tuilleadh mun chùrsa.

Source: www.habr.com

Cuir beachd ann