Huffman funmorawon alugoridimu

Ṣaaju ibẹrẹ ti ẹkọ naa "Alugoridimu fun Awọn Difelopa" A ti pese itumọ kan ti ohun elo miiran ti o wulo fun ọ.

Ifaminsi Huffman jẹ algorithm funmorawon data ti o ṣe agbekalẹ imọran ipilẹ ti funmorawon faili. Ninu àpilẹkọ yii a yoo sọrọ nipa fifi koodu gigun ti o wa titi ati oniyipada, awọn koodu iyasọtọ ti o yatọ, awọn ofin iṣaaju, ati ikole igi Huffman.

A mọ pe kọọkan ohun kikọ ti wa ni ipamọ bi a ọkọọkan ti 0s ati 1s ati ki o wa lagbedemeji 8 die-die. Eyi ni a npe ni fifi koodu-ipari to wa titi nitori pe ohun kikọ kọọkan nlo nọmba ti o wa titi kanna ti awọn die-die ipamọ.

Jẹ ká sọ awọn ọrọ ti wa ni fun. Bawo ni a ṣe le dinku iye aaye ti o nilo lati tọju ohun kikọ kan?

Ero ipilẹ jẹ iyipada gigun iyipada. A le lo otitọ pe diẹ ninu awọn ohun kikọ han ni igbagbogbo ni ọrọ ju awọn miiran lọ (wo nibi) lati ṣe agbekalẹ algoridimu kan ti yoo ṣe aṣoju ọna-kọọkan ti awọn ohun kikọ ni awọn iwọn diẹ. Ni fifi koodu gigun oniyipada, a fi nọmba oniyipada ti awọn die-die si awọn ohun kikọ ti o da lori igbohunsafẹfẹ ti iṣẹlẹ wọn ninu ọrọ ti a fifun. Ni ipari, diẹ ninu awọn ohun kikọ le gba to 1 bit nikan, nigba ti awọn miiran le gba awọn die-die 2, 3 tabi diẹ sii. Iṣoro pẹlu fifi koodu gigun oniyipada jẹ iyipada ti o tẹle ti ọkọọkan.

Bawo, ni mimọ ọna ti awọn die-die, ṣe o le ṣe iyipada rẹ lainidi bi?

Gbé ìlà yẹ̀ wò "Abakab". O ni awọn ohun kikọ 8, ati pẹlu fifi koodu gigun ti o wa titi yoo nilo awọn bit 64 lati tọju rẹ. Akiyesi pe igbohunsafẹfẹ aami "a", "b", "c" и "D" dọgba 4, 2, 1, 1 lẹsẹsẹ. Jẹ ká gbiyanju lati fojuinu "Abakab" ni diẹ die-die, mu anfani ti o daju wipe "si" waye siwaju sii ju igba "B", ati "B" waye siwaju sii ju igba "c" и "D". A yoo bẹrẹ nipasẹ ifaminsi "si" lilo ọkan bit dogba si 0, "B" a yoo fi koodu-bit meji si 11, ati lilo awọn die-die mẹta 100 ati 011 a yoo fi koodu pamọ. "c" и "D".

Bi abajade, a yoo gba:

a
0

b
11

c
100

d
011

Bayi ni ila "Abakab" a yoo koodu bi 00110100011011 (0|0|11|0|100|011|0|11)lilo awọn koodu ti pese loke. Sibẹsibẹ, iṣoro akọkọ yoo wa ni iyipada. Nigba ti a ba gbiyanju lati pinnu okun 00110100011011, a yoo gba abajade ti ko ni idaniloju, niwon o le jẹ aṣoju bi:

0|011|0|100|011|0|11    adacdab
0|0|11|0|100|0|11|011   aabacabd
0|011|0|100|0|11|0|11   adacabab 

...
ati bẹbẹ lọ.

Lati yago fun aibikita yii, a gbọdọ rii daju pe fifi koodu wa ni itẹlọrun imọran gẹgẹbi ofin ìpele, eyiti o tumọ si pe awọn koodu le ṣe iyipada ni ọna alailẹgbẹ kan. Ofin ìpele ṣe idaniloju pe ko si koodu jẹ ami-ipele ti omiiran. Nipa koodu a tumọ si awọn ege ti a lo lati ṣe aṣoju ohun kikọ kan pato. Ni awọn loke apẹẹrẹ 0 jẹ ìpele 011, eyi ti o lodi si ofin iṣaaju. Nitorinaa, ti awọn koodu wa ba ni itẹlọrun ofin iṣaaju, lẹhinna a le ṣe iyipada laiseaniani (ati ni idakeji).

Jẹ ki a tun wo apẹẹrẹ loke. Ni akoko yii a yoo yan fun awọn aami "a", "b", "c" и "D" awọn koodu ti o ni itẹlọrun ofin ìpele.

a
0

b
10

c
110

d
111

Lilo fifi koodu yi, okun "Abakab" yoo wa ni kooduopo bi 00100100011010 (0|0|10|0|100|011|0|10)... Ati nibi 00100100011010 a le ni bayi laiseaniani yiyipada koodu ati pada si okun atilẹba wa "Abakab".

Ifaminsi Huffman

Ni bayi ti a loye iyipada gigun iyipada ati ofin iṣaaju, jẹ ki a sọrọ nipa fifi koodu Huffman.

Ọna naa da lori ṣiṣẹda awọn igi alakomeji. Ninu rẹ, ipade le jẹ boya ipari tabi ti inu. Ni ibẹrẹ, gbogbo awọn apa ni a gba awọn ewe (awọn opin), eyiti o jẹ aṣoju aami funrararẹ ati iwuwo rẹ (iyẹn ni, igbohunsafẹfẹ ti iṣẹlẹ). Awọn apa inu ni iwuwo ti ohun kikọ silẹ ati tọka si awọn apa arọpo meji. Nipa adehun gbogbogbo, bit «0» duro awọn wọnyi osi ti eka, ati «1» - lori ọtun. Ni kikun igi N ewe ati N-1 ti abẹnu apa. A ṣe iṣeduro pe nigba ti o ba n ṣe igi Huffman, awọn aami ti a ko lo jẹ asonu lati gba awọn koodu ti ipari to dara julọ.

A yoo lo isinyi pataki lati kọ igi Huffman kan, nibiti ipade pẹlu igbohunsafẹfẹ ti o kere julọ yoo fun ni pataki julọ. Awọn igbesẹ ikole ti wa ni apejuwe ni isalẹ:

  1. Ṣẹda oju oju ewe fun aami kọọkan ki o ṣafikun wọn si isinyi ayo.
  2. Lakoko ti o wa ju ọkan lọ ni isinyi, ṣe atẹle naa:
    • Yọ awọn ipele meji ti o ga julọ (igbohunsafẹfẹ ti o kere julọ) kuro lati isinyi;
    • Ṣẹda ipade inu inu tuntun, nibiti awọn apa meji wọnyi yoo jẹ awọn arọpo, ati igbohunsafẹfẹ ti iṣẹlẹ yoo dogba si apao awọn igbohunsafẹfẹ ti awọn apa meji wọnyi.
    • Ṣafikun ipade tuntun si isinyi ayo.
  3. Ipin ti o ku nikan ni yoo jẹ aaye gbongbo, ati pe eyi yoo pari kikọ igi naa.

Jẹ ki a fojuinu pe a ni diẹ ninu awọn ọrọ ti o ni awọn ohun kikọ nikan "a", "b", "c", "d" и "ati", ati awọn igbohunsafẹfẹ iṣẹlẹ wọn jẹ 15, 7, 6, 6 ati 5, lẹsẹsẹ. Ni isalẹ wa awọn apejuwe ti o ṣe afihan awọn igbesẹ ti algorithm.

Huffman funmorawon alugoridimu

Huffman funmorawon alugoridimu

Huffman funmorawon alugoridimu

Huffman funmorawon alugoridimu

Huffman funmorawon alugoridimu

Ọ̀nà láti gbòǹgbò sí ojú ewé èyíkéyìí yóò tọ́jú koodu ìpele tí ó dára jùlọ (tí a tún mọ̀ sí koodu Huffman) tí ó bá ìwà tí ó ní í ṣe pẹ̀lú ojú ewé yẹn.

Huffman funmorawon alugoridimu
Huffman igi

Ni isalẹ iwọ yoo rii imuse ti Huffman funmorawon algorithm ni C ++ ati Java:

#include <iostream>
#include <string>
#include <queue>
#include <unordered_map>
using namespace std;

// A Tree node
struct Node
{
	char ch;
	int freq;
	Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
	Node* node = new Node();

	node->ch = ch;
	node->freq = freq;
	node->left = left;
	node->right = right;

	return node;
}

// Comparison object to be used to order the heap
struct comp
{
	bool operator()(Node* l, Node* r)
	{
		// highest priority item has lowest frequency
		return l->freq > r->freq;
	}
};

// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, string str,
			unordered_map<char, string> &huffmanCode)
{
	if (root == nullptr)
		return;

	// found a leaf node
	if (!root->left && !root->right) {
		huffmanCode[root->ch] = str;
	}

	encode(root->left, str + "0", huffmanCode);
	encode(root->right, str + "1", huffmanCode);
}

// traverse the Huffman Tree and decode the encoded string
void decode(Node* root, int &index, string str)
{
	if (root == nullptr) {
		return;
	}

	// found a leaf node
	if (!root->left && !root->right)
	{
		cout << root->ch;
		return;
	}

	index++;

	if (str[index] =='0')
		decode(root->left, index, str);
	else
		decode(root->right, index, str);
}

// Builds Huffman Tree and decode given input text
void buildHuffmanTree(string text)
{
	// count frequency of appearance of each character
	// and store it in a map
	unordered_map<char, int> freq;
	for (char ch: text) {
		freq[ch]++;
	}

	// Create a priority queue to store live nodes of
	// Huffman tree;
	priority_queue<Node*, vector<Node*>, comp> pq;

	// Create a leaf node for each character and add it
	// to the priority queue.
	for (auto pair: freq) {
		pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
	}

	// do till there is more than one node in the queue
	while (pq.size() != 1)
	{
		// Remove the two nodes of highest priority
		// (lowest frequency) from the queue
		Node *left = pq.top(); pq.pop();
		Node *right = pq.top();	pq.pop();

		// Create a new internal node with these two nodes
		// as children and with frequency equal to the sum
		// of the two nodes' frequencies. Add the new node
		// to the priority queue.
		int sum = left->freq + right->freq;
		pq.push(getNode('', sum, left, right));
	}

	// root stores pointer to root of Huffman Tree
	Node* root = pq.top();

	// traverse the Huffman Tree and store Huffman Codes
	// in a map. Also prints them
	unordered_map<char, string> huffmanCode;
	encode(root, "", huffmanCode);

	cout << "Huffman Codes are :n" << 'n';
	for (auto pair: huffmanCode) {
		cout << pair.first << " " << pair.second << 'n';
	}

	cout << "nOriginal string was :n" << text << 'n';

	// print encoded string
	string str = "";
	for (char ch: text) {
		str += huffmanCode[ch];
	}

	cout << "nEncoded string is :n" << str << 'n';

	// traverse the Huffman Tree again and this time
	// decode the encoded string
	int index = -1;
	cout << "nDecoded string is: n";
	while (index < (int)str.size() - 2) {
		decode(root, index, str);
	}
}

// Huffman coding algorithm
int main()
{
	string text = "Huffman coding is a data compression algorithm.";

	buildHuffmanTree(text);

	return 0;
}

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Tree node
class Node
{
	char ch;
	int freq;
	Node left = null, right = null;

	Node(char ch, int freq)
	{
		this.ch = ch;
		this.freq = freq;
	}

	public Node(char ch, int freq, Node left, Node right) {
		this.ch = ch;
		this.freq = freq;
		this.left = left;
		this.right = right;
	}
};

class Huffman
{
	// traverse the Huffman Tree and store Huffman Codes
	// in a map.
	public static void encode(Node root, String str,
							  Map<Character, String> huffmanCode)
	{
		if (root == null)
			return;

		// found a leaf node
		if (root.left == null && root.right == null) {
			huffmanCode.put(root.ch, str);
		}


		encode(root.left, str + "0", huffmanCode);
		encode(root.right, str + "1", huffmanCode);
	}

	// traverse the Huffman Tree and decode the encoded string
	public static int decode(Node root, int index, StringBuilder sb)
	{
		if (root == null)
			return index;

		// found a leaf node
		if (root.left == null && root.right == null)
		{
			System.out.print(root.ch);
			return index;
		}

		index++;

		if (sb.charAt(index) == '0')
			index = decode(root.left, index, sb);
		else
			index = decode(root.right, index, sb);

		return index;
	}

	// Builds Huffman Tree and huffmanCode and decode given input text
	public static void buildHuffmanTree(String text)
	{
		// count frequency of appearance of each character
		// and store it in a map
		Map<Character, Integer> freq = new HashMap<>();
		for (int i = 0 ; i < text.length(); i++) {
			if (!freq.containsKey(text.charAt(i))) {
				freq.put(text.charAt(i), 0);
			}
			freq.put(text.charAt(i), freq.get(text.charAt(i)) + 1);
		}

		// Create a priority queue to store live nodes of Huffman tree
		// Notice that highest priority item has lowest frequency
		PriorityQueue<Node> pq = new PriorityQueue<>(
										(l, r) -> l.freq - r.freq);

		// Create a leaf node for each character and add it
		// to the priority queue.
		for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
			pq.add(new Node(entry.getKey(), entry.getValue()));
		}

		// do till there is more than one node in the queue
		while (pq.size() != 1)
		{
			// Remove the two nodes of highest priority
			// (lowest frequency) from the queue
			Node left = pq.poll();
			Node right = pq.poll();

			// Create a new internal node with these two nodes as children 
			// and with frequency equal to the sum of the two nodes
			// frequencies. Add the new node to the priority queue.
			int sum = left.freq + right.freq;
			pq.add(new Node('', sum, left, right));
		}

		// root stores pointer to root of Huffman Tree
		Node root = pq.peek();

		// traverse the Huffman tree and store the Huffman codes in a map
		Map<Character, String> huffmanCode = new HashMap<>();
		encode(root, "", huffmanCode);

		// print the Huffman codes
		System.out.println("Huffman Codes are :n");
		for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
			System.out.println(entry.getKey() + " " + entry.getValue());
		}

		System.out.println("nOriginal string was :n" + text);

		// print encoded string
		StringBuilder sb = new StringBuilder();
		for (int i = 0 ; i < text.length(); i++) {
			sb.append(huffmanCode.get(text.charAt(i)));
		}

		System.out.println("nEncoded string is :n" + sb);

		// traverse the Huffman Tree again and this time
		// decode the encoded string
		int index = -1;
		System.out.println("nDecoded string is: n");
		while (index < sb.length() - 2) {
			index = decode(root, index, sb);
		}
	}

	public static void main(String[] args)
	{
		String text = "Huffman coding is a data compression algorithm.";

		buildHuffmanTree(text);
	}
}

akiyesi: iranti ti a lo nipasẹ okun igbewọle jẹ 47 * 8 = 376 die-die ati okun ti a fi koodu gba soke nikan 194 die-die i.e. data ti wa ni fisinuirindigbindigbin nipa isunmọ 48%. Ninu eto C ++ ti o wa loke, a nlo kilasi okun lati tọju okun ti a fi koodu pamọ lati jẹ ki eto naa le ṣee ka.

Nitori awọn ẹya data isinyi ayo daradara nilo fun fifi sii O(log(N)) akoko, ati ni kan pipe alakomeji igi pẹlu N leaves bayi 2N-1 awọn apa, ati igi Huffman jẹ igi alakomeji pipe, lẹhinna algorithm ṣiṣẹ ni O(Nlog(N)) akoko, nibo N - Awọn kikọ.

Awọn orisun:

en.wikipedia.org/wiki/Huffman_coding
en.wikipedia.org/wiki/Variable-length_code
www.youtube.com/watch?v=5wRPin4oxCo

Kọ ẹkọ diẹ sii nipa ẹkọ naa.

orisun: www.habr.com

Fi ọrọìwòye kun