Huffman compression algorithm

In anticipation of the start of the "Algorithms for Developers" course, we have prepared a translation of another useful article.

Huffman coding is a data compression algorithm that captures the basic idea behind file compression. In this article we will talk about fixed-length and variable-length encoding, uniquely decodable codes, the prefix rule, and building a Huffman tree.

We know that each character is stored as a sequence of 0s and 1s and takes up 8 bits. This is called fixed-length encoding, because every character uses the same fixed number of bits for storage.

Suppose we are given some text. How can we reduce the amount of space needed to store a single character?

The main idea is variable-length encoding. We can use the fact that some characters occur in text more often than others to design an algorithm that represents the same sequence of characters in fewer bits. With variable-length encoding we assign each character a variable number of bits, depending on how often it appears in the given text. As a result, some characters may take as little as 1 bit, while others may take 2 bits, 3 bits, or more. The only problem with variable-length encoding is decoding the sequence afterwards.
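As a minimal C++ sketch of the counting step that variable-length encoding relies on, the helper below tallies how often each character occurs. The names `countFrequencies` and `fixedLengthBits` are illustrative and do not come from the article, and 8 bits per character is assumed for the fixed-length baseline.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: count how many times each character occurs in text.
std::map<char, int> countFrequencies(const std::string& text)
{
    std::map<char, int> freq;
    for (char ch : text) {
        ++freq[ch];   // operator[] value-initializes a missing entry to 0
    }
    return freq;
}

// Hypothetical helper: storage cost in bits when every character takes 8 bits.
std::size_t fixedLengthBits(const std::string& text)
{
    return text.size() * 8;
}
```

The characters with the largest counts are the ones that should receive the shortest codes.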

How can we decode a sequence of bits unambiguously?

Consider the string "aabacdab". It has 8 characters, so with fixed-length encoding it takes 64 bits to store. Note that the frequencies of the characters "a", "b", "c" and "d" are 4, 2, 1 and 1 respectively. Let's try to represent "aabacdab" in fewer bits, using the fact that "a" occurs more often than "b", and "b" occurs more often than "c" and "d". We start by encoding "a" with a single bit equal to 0, assign "b" the two-bit code 11, and use the three-bit codes 100 and 011 for "c" and "d".

As a result, we get:

Character   Code
a           0
b           11
c           100
d           011

So, using the codes above, the string "aabacdab" is encoded as 00110100011011 (0|0|11|0|100|011|0|11). The main problem, however, is decoding. When we try to decode the string 00110100011011 we get an ambiguous result, because it can be read as:

0|011|0|100|011|0|11    adacdab
0|0|11|0|100|0|11|011   aabacabd
0|011|0|100|0|11|0|11   adacabab 

... and so on.
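The ambiguity can be made concrete with a small backtracking sketch that collects every string whose encoding equals a given bit sequence. The function names `collectDecodings` and `allDecodings` are hypothetical, and the code table is assumed to be passed as a `std::map` from character to code word.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Recursively try every code word that matches at position pos.
// Assumes no code word is empty, otherwise the recursion would not advance.
void collectDecodings(const std::string& bits, std::size_t pos,
                      const std::map<char, std::string>& codes,
                      std::string& current, std::set<std::string>& out)
{
    if (pos == bits.size()) {          // consumed every bit: one valid parse
        out.insert(current);
        return;
    }
    for (const auto& entry : codes) {
        const std::string& code = entry.second;
        if (bits.compare(pos, code.size(), code) == 0) {
            current.push_back(entry.first);
            collectDecodings(bits, pos + code.size(), codes, current, out);
            current.pop_back();        // backtrack and try the next code word
        }
    }
}

// Hypothetical helper: every string that encodes to exactly `bits`.
std::set<std::string> allDecodings(const std::string& bits,
                                   const std::map<char, std::string>& codes)
{
    std::set<std::string> out;
    std::string current;
    collectDecodings(bits, 0, codes, current, out);
    return out;
}
```

Called with the bit string 00110100011011 and the codes 0, 11, 100, 011, the returned set contains more than one entry, including the three readings listed above. The search is exponential in the worst case, so it is only meant for short illustrative inputs.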

To avoid this ambiguity, we must make sure that our encoding satisfies the prefix rule, which in turn guarantees that the codes can be decoded in only one way. The prefix rule requires that no code is a prefix of another code. By "code" we mean the bits used to represent a particular character. In the example above, 0 is a prefix of 011, which violates the prefix rule. So, if our codes satisfy the prefix rule, we can decode them unambiguously.
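The prefix rule is easy to check mechanically. The following is a minimal sketch, with `isPrefixFree` as a hypothetical name and the code table again assumed to be a `std::map` from character to code word.

```cpp
#include <map>
#include <string>

// Hypothetical helper: true when no code word is a prefix of another one.
bool isPrefixFree(const std::map<char, std::string>& codes)
{
    for (const auto& a : codes) {
        for (const auto& b : codes) {
            if (a.first == b.first) {
                continue;              // do not compare a code word with itself
            }
            // compare() looks only at the first a.second.size() characters of
            // b.second, so a result of 0 means a.second is a prefix of b.second.
            if (b.second.compare(0, a.second.size(), a.second) == 0) {
                return false;
            }
        }
    }
    return true;
}
```

For the codes 0, 11, 100, 011 used so far, the check returns false, because 0 is a prefix of 011.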

Let's go back to the example above. This time we will assign the characters "a", "b", "c" and "d" codes that satisfy the prefix rule.

Character   Code
a           0
b           10
c           110
d           111

With this encoding, the string "aabacdab" is encoded as 00100110111010 (0|0|10|0|110|111|0|10). This time 00100110111010 can be decoded unambiguously, giving back our original string "aabacdab".
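Decoding a prefix-free code needs no backtracking: bits are accumulated until they match a code word, and that match is the only possible one. The sketch below illustrates this; `encodeWith` and `decodePrefix` are hypothetical names, and the input is assumed to have been produced with the same prefix-free table.

```cpp
#include <map>
#include <string>

// Hypothetical helper: concatenate the code word of every character in text.
// Assumes every character of text has an entry in codes.
std::string encodeWith(const std::string& text,
                       const std::map<char, std::string>& codes)
{
    std::string bits;
    for (char ch : text) {
        bits += codes.at(ch);          // at() throws if ch has no code
    }
    return bits;
}

// Hypothetical helper: decode a bit string produced with a prefix-free code.
std::string decodePrefix(const std::string& bits,
                         const std::map<char, std::string>& codes)
{
    // Invert the table so a code word can be looked up directly.
    std::map<std::string, char> byCode;
    for (const auto& entry : codes) {
        byCode[entry.second] = entry.first;
    }

    std::string text, pending;
    for (char bit : bits) {
        pending += bit;
        auto it = byCode.find(pending);
        if (it != byCode.end()) {      // prefix rule: the first match is the only match
            text += it->second;
            pending.clear();
        }
    }
    return text;                       // leftover bits in pending are ignored
}
```

Passing a string over the alphabet a, b, c, d through `encodeWith` and then `decodePrefix` with the table 0, 10, 110, 111 should give back the original string.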

Huffman coding

Now that we have dealt with variable-length encoding and the prefix rule, let's talk about Huffman coding.

The method is based on building a binary tree in which a node is either a leaf or an internal node. Initially all nodes are leaves (terminals), each representing a character and its weight (that is, its frequency of occurrence). Internal nodes hold a weight equal to the sum of the weights of their children and refer to two child nodes. By common convention, bit "0" means following the left branch and bit "1" means following the right branch. A full tree has N leaves and N-1 internal nodes. When building a Huffman tree it is recommended to discard unused characters, so that the resulting codes have optimal length.

We will use a priority queue to build the Huffman tree, where the node with the lowest frequency gets the highest priority. The construction steps are described below:

  1. Create a leaf node for each character and add it to the priority queue.
  2. While there is more than one node in the queue, do the following:
    • Remove the two nodes with the highest priority (lowest frequency) from the queue;
    • Create a new internal node with these two nodes as children and with a frequency equal to the sum of the frequencies of the two nodes;
    • Add the new node to the priority queue.
  3. The single remaining node is the root, and this completes the construction of the tree.
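The merge loop in step 2 can be sketched on bare weights, without building the tree itself. The function below (`huffmanCost` is a hypothetical name) returns the total number of bits the resulting code needs, relying on the fact that each merge adds one bit to every character under the new node, so the total equals the sum of the weights of all internal nodes.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: run the merge loop on bare weights and return the total
// number of bits the resulting Huffman code needs for the whole text.
long long huffmanCost(const std::vector<long long>& weights)
{
    // Min-heap: the smallest weight has the highest priority.
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> pq(weights.begin(), weights.end());

    long long total = 0;
    while (pq.size() > 1) {
        long long first = pq.top();  pq.pop();   // the two lowest weights
        long long second = pq.top(); pq.pop();
        long long merged = first + second;       // weight of the new internal node
        total += merged;
        pq.push(merged);
    }
    return total;
}
```

For the frequencies 4, 2, 1, 1 of the earlier four-character example, the merges produce the weights 2, 4 and 8, so the function returns 14, the same length as the 14-bit encodings shown earlier.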

Imagine we have a text that consists only of the characters "a", "b", "c", "d" and "e", with frequencies of 15, 7, 6, 6 and 5 respectively. The illustrations below show the steps of the algorithm.

[Figures: step-by-step construction of the Huffman tree for the example above]

The path from the root to any leaf node gives the optimal prefix code (also known as the Huffman code) for the character associated with that leaf.

[Figure: the resulting Huffman tree]
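The code lengths for the five-character example can be checked without drawing the tree. The sketch below tracks only parent links and reports the depth of each leaf; `huffmanCodeLengths` is a hypothetical name, and at least two symbols are assumed.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: code length (tree depth) of each symbol, given weights.
// Assumes at least two symbols; a single symbol would get length 0.
std::vector<int> huffmanCodeLengths(const std::vector<long long>& weights)
{
    using Item = std::pair<long long, std::size_t>;   // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    // parent[i] is the index of node i's parent; -1 marks "no parent yet".
    std::vector<int> parent(weights.size(), -1);
    for (std::size_t i = 0; i < weights.size(); ++i) {
        pq.push({weights[i], i});
    }

    while (pq.size() > 1) {
        Item first = pq.top();  pq.pop();
        Item second = pq.top(); pq.pop();
        std::size_t merged = parent.size();           // index of the new internal node
        parent.push_back(-1);
        parent[first.second] = static_cast<int>(merged);
        parent[second.second] = static_cast<int>(merged);
        pq.push({first.first + second.first, merged});
    }

    // The depth of a leaf is the number of parent links up to the root.
    std::vector<int> lengths(weights.size(), 0);
    for (std::size_t i = 0; i < weights.size(); ++i) {
        for (int node = parent[i]; node != -1; node = parent[node]) {
            ++lengths[i];
        }
    }
    return lengths;
}
```

With the weights 15, 7, 6, 6, 5 this yields the lengths 1, 3, 3, 3, 3, for a total of 15 * 1 + (7 + 6 + 6 + 5) * 3 = 87 bits.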

Below is an implementation of the Huffman compression algorithm in C++ and Java:

#include <iostream>
#include <string>
#include <queue>
#include <unordered_map>
#include <vector>
using namespace std;

// A Tree node
struct Node
{
	char ch;
	int freq;
	Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
	Node* node = new Node();

	node->ch = ch;
	node->freq = freq;
	node->left = left;
	node->right = right;

	return node;
}

// Comparison object to be used to order the heap
struct comp
{
	bool operator()(Node* l, Node* r)
	{
		// highest priority item has lowest frequency
		return l->freq > r->freq;
	}
};

// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, string str,
			unordered_map<char, string> &huffmanCode)
{
	if (root == nullptr)
		return;

	// found a leaf node
	if (!root->left && !root->right) {
		huffmanCode[root->ch] = str;
	}

	encode(root->left, str + "0", huffmanCode);
	encode(root->right, str + "1", huffmanCode);
}

// traverse the Huffman Tree and decode the encoded string
void decode(Node* root, int &index, string str)
{
	if (root == nullptr) {
		return;
	}

	// found a leaf node
	if (!root->left && !root->right)
	{
		cout << root->ch;
		return;
	}

	index++;

	if (str[index] =='0')
		decode(root->left, index, str);
	else
		decode(root->right, index, str);
}

// Builds Huffman Tree and decode given input text
void buildHuffmanTree(string text)
{
	// count frequency of appearance of each character
	// and store it in a map
	unordered_map<char, int> freq;
	for (char ch: text) {
		freq[ch]++;
	}

	// Create a priority queue to store live nodes of
	// Huffman tree;
	priority_queue<Node*, vector<Node*>, comp> pq;

	// Create a leaf node for each character and add it
	// to the priority queue.
	for (auto pair: freq) {
		pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
	}

	// do till there is more than one node in the queue
	while (pq.size() != 1)
	{
		// Remove the two nodes of highest priority
		// (lowest frequency) from the queue
		Node *left = pq.top(); pq.pop();
		Node *right = pq.top();	pq.pop();

		// Create a new internal node with these two nodes
		// as children and with frequency equal to the sum
		// of the two nodes' frequencies. Add the new node
		// to the priority queue.
		int sum = left->freq + right->freq;
		pq.push(getNode('\0', sum, left, right));
	}

	// root stores pointer to root of Huffman Tree
	Node* root = pq.top();

	// traverse the Huffman Tree and store Huffman Codes
	// in a map. Also prints them
	unordered_map<char, string> huffmanCode;
	encode(root, "", huffmanCode);

	cout << "Huffman Codes are :\n" << '\n';
	for (auto pair: huffmanCode) {
		cout << pair.first << " " << pair.second << '\n';
	}

	cout << "\nOriginal string was :\n" << text << '\n';

	// print encoded string
	string str = "";
	for (char ch: text) {
		str += huffmanCode[ch];
	}

	cout << "\nEncoded string is :\n" << str << '\n';

	// traverse the Huffman Tree again and this time
	// decode the encoded string
	int index = -1;
	cout << "\nDecoded string is: \n";
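	// index is the position of the last bit consumed so far; keep decoding
	// until every bit has been read (a final one-bit code must not be skipped)
	while (index < (int)str.size() - 1) {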
		decode(root, index, str);
	}
}

// Huffman coding algorithm
int main()
{
	string text = "Huffman coding is a data compression algorithm.";

	buildHuffmanTree(text);

	return 0;
}

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Tree node
class Node
{
	char ch;
	int freq;
	Node left = null, right = null;

	Node(char ch, int freq)
	{
		this.ch = ch;
		this.freq = freq;
	}

	public Node(char ch, int freq, Node left, Node right) {
		this.ch = ch;
		this.freq = freq;
		this.left = left;
		this.right = right;
	}
};

class Huffman
{
	// traverse the Huffman Tree and store Huffman Codes
	// in a map.
	public static void encode(Node root, String str,
							  Map<Character, String> huffmanCode)
	{
		if (root == null)
			return;

		// found a leaf node
		if (root.left == null && root.right == null) {
			huffmanCode.put(root.ch, str);
		}


		encode(root.left, str + "0", huffmanCode);
		encode(root.right, str + "1", huffmanCode);
	}

	// traverse the Huffman Tree and decode the encoded string
	public static int decode(Node root, int index, StringBuilder sb)
	{
		if (root == null)
			return index;

		// found a leaf node
		if (root.left == null && root.right == null)
		{
			System.out.print(root.ch);
			return index;
		}

		index++;

		if (sb.charAt(index) == '0')
			index = decode(root.left, index, sb);
		else
			index = decode(root.right, index, sb);

		return index;
	}

	// Builds Huffman Tree and huffmanCode and decode given input text
	public static void buildHuffmanTree(String text)
	{
		// count frequency of appearance of each character
		// and store it in a map
		Map<Character, Integer> freq = new HashMap<>();
		for (int i = 0 ; i < text.length(); i++) {
			if (!freq.containsKey(text.charAt(i))) {
				freq.put(text.charAt(i), 0);
			}
			freq.put(text.charAt(i), freq.get(text.charAt(i)) + 1);
		}

		// Create a priority queue to store live nodes of Huffman tree
		// Notice that highest priority item has lowest frequency
		PriorityQueue<Node> pq = new PriorityQueue<>(
										(l, r) -> l.freq - r.freq);

		// Create a leaf node for each character and add it
		// to the priority queue.
		for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
			pq.add(new Node(entry.getKey(), entry.getValue()));
		}

		// do till there is more than one node in the queue
		while (pq.size() != 1)
		{
			// Remove the two nodes of highest priority
			// (lowest frequency) from the queue
			Node left = pq.poll();
			Node right = pq.poll();

			// Create a new internal node with these two nodes as children 
			// and with frequency equal to the sum of the two nodes
			// frequencies. Add the new node to the priority queue.
			int sum = left.freq + right.freq;
			pq.add(new Node('\0', sum, left, right));
		}

		// root stores pointer to root of Huffman Tree
		Node root = pq.peek();

		// traverse the Huffman tree and store the Huffman codes in a map
		Map<Character, String> huffmanCode = new HashMap<>();
		encode(root, "", huffmanCode);

		// print the Huffman codes
		System.out.println("Huffman Codes are :\n");
		for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
			System.out.println(entry.getKey() + " " + entry.getValue());
		}

		System.out.println("\nOriginal string was :\n" + text);

		// print encoded string
		StringBuilder sb = new StringBuilder();
		for (int i = 0 ; i < text.length(); i++) {
			sb.append(huffmanCode.get(text.charAt(i)));
		}

		System.out.println("\nEncoded string is :\n" + sb);

		// traverse the Huffman Tree again and this time
		// decode the encoded string
		int index = -1;
		System.out.println("\nDecoded string is: \n");
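		// index is the position of the last bit consumed so far; keep decoding
		// until every bit has been read (a final one-bit code must not be skipped)
		while (index < sb.length() - 1) {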
			index = decode(root, index, sb);
		}
	}

	public static void main(String[] args)
	{
		String text = "Huffman coding is a data compression algorithm.";

		buildHuffmanTree(text);
	}
}

Note: the input string takes 47 * 8 = 376 bits, while the encoded string takes only 194 bits, i.e. the data is compressed by about 48%. In the C++ program above we use the string class to store the encoded string in order to keep the program readable.
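Storing each bit as a character is convenient for reading but uses a whole byte per bit. As a hedged sketch of what an actual compressor might do instead, the helper below packs a string of '0' and '1' characters into bytes. The name `packBits` and the choice of most-significant-bit-first order with zero padding are assumptions made for illustration, not part of the programs above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: pack a string of '0'/'1' characters into bytes,
// most significant bit first; the last byte is padded with zero bits.
std::vector<std::uint8_t> packBits(const std::string& bits)
{
    std::vector<std::uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i] == '1') {
            bytes[i / 8] |= static_cast<std::uint8_t>(1u << (7 - i % 8));
        }
    }
    return bytes;
}
```

Packed this way, a 194-bit encoded string occupies 25 bytes. A decoder would also need the original bit count (or the text length) stored alongside the bytes, since the padding bits cannot otherwise be told apart from data.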

Since efficient priority queue data structures take O(log(N)) time per insertion, and a full binary tree with N leaves has 2N-1 nodes, and a Huffman tree is a full binary tree, the algorithm runs in O(Nlog(N)) time, where N is the number of distinct characters.

Sources:

en.wikipedia.org/wiki/Huffman_coding
en.wikipedia.org/wiki/Variable-length_code
www.youtube.com/watch?v=5wRPin4oxCo


Source: www.habr.com
