Data compression using the Huffman algorithm

Introduction

In this article I will talk about the well-known Huffman algorithm and its application to data compression.

As a result, we will write a simple archiver. This has already been covered in an article on Habr, but without a practical implementation. The theoretical material in this post comes from school computer science lessons and Robert Lafore's book "Data Structures and Algorithms in Java". So, everything is under the cut!

A few thoughts

In an ordinary text file, one character is encoded with 8 bits (ASCII encoding) or 16 bits (a Unicode encoding such as UTF-16). From here on we will consider ASCII. For example, take the string s1 = "SUSIE SAYS IT IS EASY\n". The string contains 22 characters in total, naturally counting the spaces and the newline character '\n'. A file containing this string will weigh 22 * 8 = 176 bits. The question immediately arises: is it reasonable to use all 8 bits to encode one character? We don't use all of the ASCII characters. And even if we did, it would make more sense to give the most frequent letter, S, the shortest possible code, and the rarest letter, T (or U, or '\n'), a longer one. This is what the Huffman algorithm is about: finding the optimal encoding under which the file has the minimum weight. It is perfectly normal for code lengths to differ between characters; this is what the algorithm is based on.
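The counts quoted above are easy to check. The following is a minimal sketch, assuming s1 is the 22-character string "SUSIE SAYS IT IS EASY" followed by a newline; the class name FrequencyCount and the method countFrequencies are illustrative and are not part of the archiver written later in the article.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative helper (not part of the archiver): counts character occurrences.
class FrequencyCount {
    static Map<Character, Integer> countFrequencies(String text) {
        Map<Character, Integer> freq = new TreeMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum); // add 1 to the running count for c
        }
        return freq;
    }

    public static void main(String[] args) {
        String s1 = "SUSIE SAYS IT IS EASY\n";
        Map<Character, Integer> freq = countFrequencies(s1);
        System.out.println("characters: " + s1.length());
        System.out.println("raw size in bits: " + s1.length() * 8);
        System.out.println("occurrences of 'S': " + freq.get('S'));
    }
}
```

For s1 this reports 22 characters, 176 bits, and 6 occurrences of S, matching the figures in the text.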

Encoding

Why not give the character 'S' a code that is, say, 1 bit long: 0 or 1? Let it be 1. Then the second most frequent character, ' ' (space), gets 0. Now imagine you start decoding your message, the encoded string s1, and you see that the code starts with 1. What do you do: is this the character S, or is it some other character, for example A? Hence an important rule:

No code may be a prefix of another

This rule is the key to the algorithm. Code construction therefore begins with a frequency table, which gives the frequency (number of occurrences) of each character:

[Image: frequency table for the string s1]

The most frequent characters should be encoded with as few bits as possible. Here is one possible code table:

[Image: code table for the string s1]

So the encoded message will look like this:

10 01111 10 110 1111 00 10 010 1110 10 00 110 0110 00 110 10 00 1111 010 10 1110 01110

I separated the codes of the individual characters with spaces. In a real compressed file there will be no such separators!
The question arises: how did this rookie come up with the codes, and how do you build a code table? This is discussed below.
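To make the encoding step concrete, here is a small sketch that applies a code table to s1 character by character. The table is hard-coded from the codes used for this example in the article (S = 10, space = 00, and so on); the class name TableEncoder and its methods are illustrative, and s1 is assumed to be "SUSIE SAYS IT IS EASY" followed by a newline.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative sketch: encodes a string with a fixed code table.
class TableEncoder {
    // Code table for s1, copied from the example in this article.
    static Map<Character, String> exampleTable() {
        Map<Character, String> table = new HashMap<>();
        table.put('\n', "01110");
        table.put(' ', "00");
        table.put('A', "010");
        table.put('E', "1111");
        table.put('I', "110");
        table.put('S', "10");
        table.put('T', "0110");
        table.put('U', "01111");
        table.put('Y', "1110");
        return table;
    }

    // Joins the per-character codes with the given delimiter
    // (use "" for a real bit stream, " " only for display).
    static String encode(String text, Map<Character, String> table, String delimiter) {
        StringJoiner out = new StringJoiner(delimiter);
        for (char c : text.toCharArray()) {
            String code = table.get(c);
            if (code == null) {
                throw new IllegalArgumentException("No code for character " + (int) c);
            }
            out.add(code);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s1 = "SUSIE SAYS IT IS EASY\n";
        System.out.println(encode(s1, exampleTable(), " "));
    }
}
```

With a space as the delimiter this reproduces the spaced-out message shown above; with an empty delimiter the result is 65 bits long.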

Building a Huffman tree

This is where binary search trees come to the rescue. Don't worry, you won't need the search, insert, or delete methods here. Here is the tree structure in Java:

public class Node {
    private int frequence;
    private char letter;
    private Node leftChild;
    private Node rightChild;
    ...
}

class BinaryTree {
    private Node root;

    public BinaryTree() {
        root = new Node();
    }
    public BinaryTree(Node root) {
        this.root = root;
    }
    ...
}

This is not the complete code; the full code is given below.

Here is the algorithm for building the tree:

  1. Create a Node object for each character of the message (the string s1). In our case there will be 9 nodes (Node objects). Each node has two data fields: the character and its frequency.
  2. Create a tree object (BinaryTree) for each Node. The node becomes the root of the tree.
  3. Insert these trees into a priority queue. The lower the frequency, the higher the priority. Thus, on removal, the tree with the lowest frequency is always chosen.

Next, repeat the following in a loop:

  1. Remove two trees from the priority queue and make them children of a new node (the newly created node has no letter). The frequency of the new node equals the sum of the frequencies of the two child trees.
  2. Create a tree rooted at this node and insert it back into the priority queue. (Since the tree has a new frequency, it will most likely end up at a new position in the queue.)
  3. Repeat steps 1 and 2 until only one tree remains in the queue: the Huffman tree.
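The steps above can be sketched compactly with java.util.PriorityQueue from the standard library in place of a hand-written queue. This is only an illustration under that assumption: the names HuffmanSketch, HNode, build, and encodedBits are hypothetical, and the full implementation later in the article uses its own Node, BinaryTree, and PriorityQueue classes.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Illustrative sketch of the tree-building loop using the standard library queue.
class HuffmanSketch {
    // Leaves carry a symbol; internal nodes carry only a summed frequency.
    static class HNode {
        final char symbol; // meaningful only for leaves
        final int freq;
        final HNode left, right;

        HNode(char symbol, int freq) { this(symbol, freq, null, null); }

        HNode(char symbol, int freq, HNode left, HNode right) {
            this.symbol = symbol;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    static HNode build(String text) {
        if (text.isEmpty()) throw new IllegalArgumentException("empty input");
        Map<Character, Integer> freq = new TreeMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);

        // The lowest frequency has the highest priority.
        PriorityQueue<HNode> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.freq, b.freq));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new HNode(e.getKey(), e.getValue()));
        }
        // Merge the two rarest trees until a single tree is left.
        while (queue.size() > 1) {
            HNode a = queue.poll();
            HNode b = queue.poll();
            queue.add(new HNode('\0', a.freq + b.freq, a, b));
        }
        return queue.poll();
    }

    // Sum over all leaves of frequency * depth: the encoded message size in bits.
    static int encodedBits(HNode node, int depth) {
        if (node.isLeaf()) return node.freq * depth;
        return encodedBits(node.left, depth + 1) + encodedBits(node.right, depth + 1);
    }

    public static void main(String[] args) {
        HNode root = build("SUSIE SAYS IT IS EASY\n");
        System.out.println("root frequency: " + root.freq);
        System.out.println("encoded size in bits: " + encodedBits(root, 0));
    }
}
```

Ties between equal frequencies may be broken differently than in the article's pictures, so individual codes can differ, but the total encoded size of a Huffman code for a given frequency table is the same either way.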

Let's walk through this algorithm for the string s1:

[Image: step-by-step construction of the Huffman tree for s1]

Here the symbol "lf" (linefeed) denotes a newline, and "sp" (space) denotes a space.

And what next?

We've got a Huffman tree. OK. And what do we do with it? You couldn't even give it away for free. Next, we need to trace every path from the root to the leaves. Let's agree to label an edge 0 if it leads to the left child and 1 if it leads to the right child. Strictly speaking, in this notation a character's code is the path from the root of the tree to the leaf that contains that character.
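As a small illustration of this labeling, the sketch below walks a hand-built tree and records the path to each leaf. The three-leaf tree and the names CodeAssignment, TNode, and collect are hypothetical and chosen only to keep the example short; this is not the article's tree for s1.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: derive codes from a tree by recording root-to-leaf paths.
class CodeAssignment {
    static class TNode {
        final Character symbol; // null for internal nodes
        final TNode left, right;

        TNode(char symbol) { // leaf
            this.symbol = symbol;
            this.left = null;
            this.right = null;
        }

        TNode(TNode left, TNode right) { // internal node
            this.symbol = null;
            this.left = left;
            this.right = right;
        }
    }

    // Appends '0' when going to the left child and '1' when going to the right.
    static void collect(TNode node, String path, Map<Character, String> codes) {
        if (node.symbol != null) {
            codes.put(node.symbol, path);
            return;
        }
        collect(node.left, path + "0", codes);
        collect(node.right, path + "1", codes);
    }

    public static void main(String[] args) {
        // Hypothetical tree: the root has leaf 'a' on the left and an
        // internal node with leaves 'b' and 'c' on the right.
        TNode root = new TNode(new TNode('a'), new TNode(new TNode('b'), new TNode('c')));
        Map<Character, String> codes = new TreeMap<>();
        collect(root, "", codes);
        System.out.println(codes); // prints {a=0, b=10, c=11}
    }
}
```

Because every symbol sits in a leaf, no path to one symbol can continue on to another, which is why the resulting codes satisfy the prefix rule.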

[Image: Huffman tree for the string s1 with labeled edges]

This is how the code table came about. Note that from this table we can read off the "weight" of each character: it is the length of its code. In compressed form, then, the original file will weigh 2 * 3 + 2 * 4 + 3 * 3 + 6 * 2 + 1 * 4 + 1 * 5 + 2 * 4 + 4 * 2 + 1 * 5 = 65 bits. Originally it weighed 176 bits. So we have shrunk it by a factor of 176/65 = 2.7! But this is a utopia. Such a ratio is unlikely to be achieved in practice. Why? This will be discussed a little later.
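The same arithmetic can be written out as a short check. This is a sketch only: the class name IdealRatio is illustrative, and the two arrays simply restate the frequencies and code lengths from the sum above, in the same order.

```java
// Illustrative sketch: idealized compressed size and ratio for s1.
class IdealRatio {
    // Encoded size in bits: sum of frequency * code length over all symbols.
    static int encodedBits(int[] frequencies, int[] codeLengths) {
        int bits = 0;
        for (int i = 0; i < frequencies.length; i++) {
            bits += frequencies[i] * codeLengths[i];
        }
        return bits;
    }

    public static void main(String[] args) {
        // Same order as the terms of the sum in the text.
        int[] freq = {2, 2, 3, 6, 1, 1, 2, 4, 1};
        int[] len  = {3, 4, 3, 2, 4, 5, 4, 2, 5};
        int original = 22 * 8;                 // 8 bits per character
        int compressed = encodedBits(freq, len);
        System.out.println("original bits: " + original);
        System.out.println("compressed bits: " + compressed);
        System.out.println("ratio: " + (double) original / compressed);
    }
}
```

This figure ignores everything except the code bits themselves, which is why it is an idealized ratio.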

Decoding

Well, perhaps the simplest thing left is decoding. I think many of you have guessed that we can't just create a compressed file without any hint of how it was encoded: we wouldn't be able to decode it! Yes, yes, it was hard for me to accept, but I have to create a text file table.txt containing the compression table:

01110
 00
A010
E1111
I110
S10
T0110
U01111
Y1110

Each entry has the form 'character' immediately followed by 'character code'. Why does 01110 have no character? In fact it does have one; it's just that the Java tools I use write the newline character '\n' to the file as an actual line break (however silly that sounds). So the empty line at the top is the character for code 01110. For code 00, the character is the space at the start of the line. I'll say right away that this is bad news for our compression ratio: this way of storing the table may well be the most wasteful one possible. But it is simple to understand and implement. I'll be glad to hear your suggestions for optimizing it in the comments.

Having this table makes decoding very easy. Recall the rule we followed when building the encoding:

No code may be a prefix of another

This is where it plays its facilitating role. We read the file sequentially, bit by bit, and as soon as the string d accumulated from the bits read so far matches the code of some character, we know at once that exactly that character (and no other!) was encoded. We then append the character to the decoded string (the string holding the decoded message), reset d, and continue reading the encoded file.
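The reading loop just described can be sketched as follows. This is a minimal illustration, not the article's extract method: the class name PrefixDecoder is hypothetical, the table is hard-coded from the s1 example, and a hash map keyed by code is assumed in place of a scan over an array.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: bit-by-bit decoding with a prefix-free code table.
class PrefixDecoder {
    // Code table for s1 from this article, keyed by code.
    static Map<String, Character> exampleTable() {
        Map<String, Character> table = new HashMap<>();
        table.put("01110", '\n');
        table.put("00", ' ');
        table.put("010", 'A');
        table.put("1111", 'E');
        table.put("110", 'I');
        table.put("10", 'S');
        table.put("0110", 'T');
        table.put("01111", 'U');
        table.put("1110", 'Y');
        return table;
    }

    static String decode(String bits, Map<String, Character> codeToChar) {
        StringBuilder decoded = new StringBuilder();
        StringBuilder d = new StringBuilder(); // bits read since the last match
        for (int i = 0; i < bits.length(); i++) {
            d.append(bits.charAt(i));
            Character match = codeToChar.get(d.toString());
            if (match != null) {
                // Prefix-free codes guarantee the first match is the only possible one.
                decoded.append(match.charValue());
                d.setLength(0);
            }
        }
        if (d.length() != 0) {
            throw new IllegalArgumentException("Trailing bits do not form a code: " + d);
        }
        return decoded.toString();
    }

    public static void main(String[] args) {
        String bits = ("10 01111 10 110 1111 00 10 010 1110 10 00 110 0110 00 "
                + "110 10 00 1111 010 10 1110 01110").replace(" ", "");
        System.out.print(decode(bits, exampleTable()));
    }
}
```

Looking up the accumulated bits in a map avoids comparing against every table entry on each bit, but the idea is the same as in the description above.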

Implementation

It's time to humble my code by writing an archiver. Let's call it Compressor.

Let's start from the beginning. First of all, we write the Node class:

public class Node {
    private int frequence; // frequency
    private char letter; // letter
    private Node leftChild; // left child
    private Node rightChild; // right child

    public Node(char letter, int frequence) { // the constructor proper
        this.letter = letter;
        this.frequence = frequence;
    }

    public Node() {} // overloaded constructor for unnamed nodes (see the section on building the Huffman tree above)

    public void addChild(Node newNode) { // add a child
        if (leftChild == null) // if the left child is empty, so is the right one, so fill the left
            leftChild = newNode;
        else {
            if (leftChild.getFrequence() <= newNode.getFrequence()) // the child with the lower
                rightChild = newNode; // frequency ends up on the left
            else {
                rightChild = leftChild;
                leftChild = newNode;
            }
        }

        frequence += newNode.getFrequence(); // resulting frequency
    }

    public Node getLeftChild() {
        return leftChild;
    }

    public Node getRightChild() {
        return rightChild;
    }

    public int getFrequence() {
        return frequence;
    }

    public char getLetter() {
        return letter;
    }

    public boolean isLeaf() { // check whether this node is a leaf
        return leftChild == null && rightChild == null;
    }
}

Now the tree:

class BinaryTree {
    private Node root;

    public BinaryTree() {
        root = new Node();
    }

    public BinaryTree(Node root) {
        this.root = root;
    }

    public int getFrequence() {
        return root.getFrequence();
    }

    public Node getRoot() {
        return root;
    }
}

The priority queue:

import java.util.ArrayList; // yes, the queue is backed by a list

class PriorityQueue {
    private ArrayList<BinaryTree> data; // queue contents
    private int nElems; // number of elements in the queue

    public PriorityQueue() {
        data = new ArrayList<BinaryTree>();
        nElems = 0;
    }

    public void insert(BinaryTree newTree) { // insertion
        if (nElems == 0)
            data.add(newTree);
        else {
            for (int i = 0; i < nElems; i++) {
                // if the new tree's frequency is lower than the current tree's,
                // put it at the current position; the trees to the right shift by one cell
                if (data.get(i).getFrequence() > newTree.getFrequence()) {
                    data.add(i, newTree);
                    break;
                }
                if (i == nElems - 1) // no larger frequency found: append at the end
                    data.add(newTree);
            }
        }
        nElems++; // one more element in the queue
    }

    public BinaryTree remove() { // removal from the queue
        BinaryTree tmp = data.get(0); // keep a reference to the element being removed
        data.remove(0); // the removal itself
        nElems--; // one element fewer in the queue
        return tmp; // return the removed element (the one with the lowest frequency)
    }
}

The class that builds the Huffman tree:

public class HuffmanTree {
    private final byte ENCODING_TABLE_SIZE = 127; // length of the encoding table
    private String myString; // the message
    private BinaryTree huffmanTree; // the Huffman tree
    private int[] freqArray; // frequency table
    private String[] encodingArray; // encoding table


    //----------------constructor----------------------
    public HuffmanTree(String newString) {
        myString = newString;

        freqArray = new int[ENCODING_TABLE_SIZE];
        fillFrequenceArray();

        huffmanTree = getHuffmanTree();

        encodingArray = new String[ENCODING_TABLE_SIZE];
        fillEncodingArray(huffmanTree.getRoot(), "", "");
    }

    //--------------------frequence array------------------------
    private void fillFrequenceArray() {
        for (int i = 0; i < myString.length(); i++) {
            freqArray[(int)myString.charAt(i)]++;
        }
    }

    public int[] getFrequenceArray() {
        return freqArray;
    }

    //------------------------huffman tree creation------------------
    private BinaryTree getHuffmanTree() {
        PriorityQueue pq = new PriorityQueue();
        // the algorithm is described above
        for (int i = 0; i < ENCODING_TABLE_SIZE; i++) {
            if (freqArray[i] != 0) { // if the character occurs in the string
                Node newNode = new Node((char) i, freqArray[i]); // create a Node for it
                BinaryTree newTree = new BinaryTree(newNode); // create a BinaryTree for the Node
                pq.insert(newTree); // insert it into the queue
            }
        }

        while (true) {
            BinaryTree tree1 = pq.remove(); // take the first tree from the queue

            try {
                BinaryTree tree2 = pq.remove(); // take the second tree from the queue

                Node newNode = new Node(); // create a new Node
                newNode.addChild(tree1.getRoot()); // make the two removed trees its children
                newNode.addChild(tree2.getRoot());

                pq.insert(new BinaryTree(newNode)); // put the merged tree back into the queue
            } catch (IndexOutOfBoundsException e) { // only one tree was left in the queue
                return tree1;
            }
        }
    }

    public BinaryTree getTree() {
        return huffmanTree;
    }

    //-------------------encoding array------------------
    void fillEncodingArray(Node node, String codeBefore, String direction) { // fill the encoding table
        if (node.isLeaf()) {
            encodingArray[(int)node.getLetter()] = codeBefore + direction;
        } else {
            fillEncodingArray(node.getLeftChild(), codeBefore + direction, "0");
            fillEncodingArray(node.getRightChild(), codeBefore + direction, "1");
        }
    }

    String[] getEncodingArray() {
        return encodingArray;
    }

    public void displayEncodingArray() { // for debugging
        fillEncodingArray(huffmanTree.getRoot(), "", "");

        System.out.println("======================Encoding table====================");
        for (int i = 0; i < ENCODING_TABLE_SIZE; i++) {
            if (freqArray[i] != 0) {
                System.out.print((char)i + " ");
                System.out.println(encodingArray[i]);
            }
        }
        System.out.println("========================================================");
    }
    //-----------------------------------------------------
    String getOriginalString() {
        return myString;
    }
}

The class that encodes and decodes:

public class HuffmanOperator {
    private final byte ENCODING_TABLE_SIZE = 127; // length of the table
    private HuffmanTree mainHuffmanTree; // the Huffman tree (used only for compression)
    private String myString; // the original message
    private int[] freqArray; // frequency table
    private String[] encodingArray; // encoding table
    private double ratio; // compression ratio


    public HuffmanOperator(HuffmanTree MainHuffmanTree) {//for compress
        this.mainHuffmanTree = MainHuffmanTree;

        myString = mainHuffmanTree.getOriginalString();

        encodingArray = mainHuffmanTree.getEncodingArray();

        freqArray = mainHuffmanTree.getFrequenceArray();
    }

    public HuffmanOperator() {}//for extract;

    //---------------------------------------compression-----------------------------------------------------------
    private String getCompressedString() {
        String compressed = "";
        String intermidiate = ""; // intermediate string (without the padding zeros)
        //System.out.println("=============================Compression=======================");
        //displayEncodingArray();
        for (int i = 0; i < myString.length(); i++) {
            intermidiate += encodingArray[myString.charAt(i)];
        }
        // We cannot write single bits to a file, so the message length must be a multiple of 8:
        // append zeros to the end (ones would do just as well)
        byte counter = 0; // number of zeros appended (a byte is plenty: 0 <= counter <= 8 < 127)
        for (int length = intermidiate.length(), delta = 8 - length % 8;
                counter < delta; counter++) { // delta is the number of zeros to append (8 if already aligned)
            intermidiate += "0";
        }

        // prepend the number of padding zeros, in binary, to the intermediate string
        compressed = String.format("%8s", Integer.toBinaryString(counter & 0xff)).replace(" ", "0") + intermidiate;

        // idealized ratio
        setCompressionRatio();
        //System.out.println("===============================================================");
        return compressed;
    }
    
    private void setCompressionRatio() { // compute the idealized ratio
        double sumA = 0, sumB = 0;//A-the original sum
        for (int i = 0; i < ENCODING_TABLE_SIZE; i++) {
            if (freqArray[i] != 0) {
                sumA += 8 * freqArray[i];
                sumB += encodingArray[i].length() * freqArray[i];
            }
        }
        ratio = sumA / sumB;
    }

    public byte[] getBytedMsg() {//final compression
        StringBuilder compressedString = new StringBuilder(getCompressedString());
        byte[] compressedBytes = new byte[compressedString.length() / 8];
        for (int i = 0; i < compressedBytes.length; i++) {
                compressedBytes[i] = (byte) Integer.parseInt(compressedString.substring(i * 8, (i + 1) * 8), 2);
        }
        return compressedBytes;
    }
    //---------------------------------------end of compression----------------------------------------------------------------
    //------------------------------------------------------------extract-----------------------------------------------------
    public String extract(String compressed, String[] newEncodingArray) {
        String decompressed = "";
        String current = "";
        String delta = "";
        encodingArray = newEncodingArray;
        
        //displayEncodingArray();
        // read the number of padding zeros
        for (int i = 0; i < 8; i++)
            delta += compressed.charAt(i);
        int ADDED_ZEROES = Integer.parseInt(delta, 2);

        for (int i = 8, l = compressed.length() - ADDED_ZEROES; i < l; i++) {
            // i starts at 8 because the first byte holds the number of padding zeros
            current += compressed.charAt(i);
            for (int j = 0; j < ENCODING_TABLE_SIZE; j++) {
                if (current.equals(encodingArray[j])) { // if a code matches
                    decompressed += (char)j; // append the character
                    current = ""; // and reset the current string
                }
            }
        }

        return decompressed;
    }

    public String getEncodingTable() {
        String enc = "";
    	for (int i = 0; i < encodingArray.length; i++) {
        	if (freqArray[i] != 0) 
                enc += (char)i + encodingArray[i] + '\n';
        }
    	return enc;
    }

    public double getCompressionRatio() {
        return ratio;
    }


    public void displayEncodingArray() { // for debugging
        System.out.println("======================Encoding table====================");
        for (int i = 0; i < ENCODING_TABLE_SIZE; i++) {
            //if (freqArray[i] != 0) {
                System.out.print((char)i + " ");
                System.out.println(encodingArray[i]);
            //}
        }
        System.out.println("========================================================");
    }
}
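The listing above packs bits by building strings of '0' and '1' characters. As an alternative illustration of the same file layout (one header byte holding the padding count, followed by the padded payload), here is a sketch that sets bits directly. The class name BitPacking and its methods are hypothetical. One deliberate difference from the listing is labeled in the code: the padding is computed as (8 - length % 8) % 8, so an already aligned payload gets no padding instead of 8 extra zeros.

```java
// Illustrative sketch: pack a string of '0'/'1' characters into bytes and back.
class BitPacking {
    // Layout: byte 0 = number of padding bits, then the payload padded with zeros.
    static byte[] pack(String bits) {
        // Assumption (differs from the article's listing): no padding when already aligned.
        int padding = (8 - bits.length() % 8) % 8;
        byte[] out = new byte[1 + (bits.length() + padding) / 8];
        out[0] = (byte) padding;
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[1 + i / 8] |= (byte) (0x80 >> (i % 8)); // most significant bit first
            }
        }
        return out;
    }

    static String unpack(byte[] packed) {
        int padding = packed[0];
        StringBuilder bits = new StringBuilder();
        for (int i = 1; i < packed.length; i++) {
            for (int bit = 7; bit >= 0; bit--) {
                bits.append(((packed[i] >> bit) & 1) == 1 ? '1' : '0');
            }
        }
        bits.setLength(bits.length() - padding); // drop the padding zeros
        return bits.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("10011");
        System.out.println("bytes written: " + packed.length);
        System.out.println("round trip: " + unpack(packed));
    }
}
```

For the 65-bit payload of s1 both approaches give 7 padding bits and 10 bytes in total, counting the header byte.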

A class that makes writing to a file easier:

import java.io.File;
import java.io.PrintWriter;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.Closeable;

public class FileOutputHelper implements Closeable {
    private File outputFile;
    private FileOutputStream fileOutputStream;

    public FileOutputHelper(File file) throws FileNotFoundException {
        outputFile = file;
        fileOutputStream = new FileOutputStream(outputFile);
    }

    public void writeByte(byte msg) throws IOException {
        fileOutputStream.write(msg);
    }

    public void writeBytes(byte[] msg) throws IOException {
        fileOutputStream.write(msg);
    }

    public void writeString(String msg) {
    	try (PrintWriter pw = new PrintWriter(outputFile)) {
    		pw.write(msg);
    	} catch (FileNotFoundException e) {
            System.out.println("Invalid path, or the file does not exist!");
    	}
    }

    @Override
    public void close() throws IOException {
        fileOutputStream.close();
    }

    public void finalize() throws IOException {
        close();
    }
}

A class that makes reading from a file easier:

import java.io.FileInputStream;
import java.io.EOFException;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;

public class FileInputHelper implements Closeable {
	private FileInputStream fileInputStream;
	private BufferedReader fileBufferedReader;
	
	public FileInputHelper(File file) throws IOException {
		fileInputStream = new FileInputStream(file);
		fileBufferedReader = new BufferedReader(new InputStreamReader(fileInputStream));
	}
	
	
    public byte readByte() throws IOException {
    	int cur = fileInputStream.read();
        if (cur == -1) // end of file reached
    		throw new EOFException();
    	return (byte)cur;
    }
    
    public String readLine() throws IOException {
    	return fileBufferedReader.readLine();
    }
    
    @Override
    public void close() throws IOException{
    	fileInputStream.close();
    }
}

And finally, the main class:

import java.io.File;
import java.nio.charset.MalformedInputException;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Paths;
import java.util.List;
import java.io.EOFException;
public class Main {
	private static final byte ENCODING_TABLE_SIZE = 127;
	
    public static void main(String[] args) throws IOException {
        try { // the operation is selected via command-line arguments
            if (args[0].equals("--compress") || args[0].equals("-c"))
                compress(args[1]);
            else if ((args[0].equals("--extract") || args[0].equals("-x"))
            		&& (args[2].equals("--table") || args[2].equals("-t"))) {
            	extract(args[1], args[3]);
            }
            else
                throw new IllegalArgumentException();
        } catch (ArrayIndexOutOfBoundsException | IllegalArgumentException e) {
            System.out.println("Invalid argument format");
            System.out.println("See Readme.txt");
            e.printStackTrace();
        }
    }

	public static void compress(String stringPath) throws IOException {
        List<String> stringList;
        File inputFile = new File(stringPath);
        String s = "";
        File compressedFile, table;
        
        try {
            stringList = Files.readAllLines(Paths.get(inputFile.getAbsolutePath()));
        } catch (NoSuchFileException e) {
            System.out.println("Invalid path, or the file does not exist!");
            return;
        } catch (MalformedInputException e) {
            System.out.println("The file's current encoding is not supported");
            return;
        }

        for (String item : stringList) {
            s += item;
            s += '\n'; // readAllLines strips line terminators, so put a newline back
        }

        HuffmanOperator operator = new HuffmanOperator(new HuffmanTree(s));

        compressedFile = new File(inputFile.getAbsolutePath() + ".cpr");
        compressedFile.createNewFile();
        try (FileOutputHelper fo = new FileOutputHelper(compressedFile)) {
        	fo.writeBytes(operator.getBytedMsg());
        }
        //create file with encoding table:
        
        table = new File(inputFile.getAbsolutePath() + ".table.txt");
        table.createNewFile();
        try (FileOutputHelper fo = new FileOutputHelper(table)) {
        	fo.writeString(operator.getEncodingTable());
        }
        
        System.out.println("Path to the compressed file: " + compressedFile.getAbsolutePath());
        System.out.println("Path to the encoding table: " + table.getAbsolutePath());
        System.out.println("Without the table the file cannot be extracted!");

        double idealRatio = Math.round(operator.getCompressionRatio() * 100) / (double) 100; // idealized ratio
        double realRatio = Math.round((double) inputFile.length()
                / ((double) compressedFile.length() + (double) table.length()) * 100) / (double) 100; // actual ratio

        System.out.println("Idealized compression ratio: " + idealRatio);
        System.out.println("Compression ratio including the encoding table: " + realRatio);
    }

    public static void extract(String filePath, String tablePath) throws FileNotFoundException, IOException {
        HuffmanOperator operator = new HuffmanOperator();
        File compressedFile = new File(filePath),
        	 tableFile = new File(tablePath),
        	 extractedFile = new File(filePath + ".xtr");
        String compressed = "";
        String[] encodingArray = new String[ENCODING_TABLE_SIZE];
        //read compressed file
        //!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!check here:
        try (FileInputHelper fi = new FileInputHelper(compressedFile)) {
        	byte b;
        	while (true) {
                b = fi.readByte(); // throws EOFException at end of file, which ends the loop
        		compressed += String.format("%8s", Integer.toBinaryString(b & 0xff)).replace(" ", "0");
        	}
        } catch (EOFException e) {
        	
        }
        
        //--------------------
        
        //read encoding table:
        try (FileInputHelper fi = new FileInputHelper(tableFile)) {
        	fi.readLine();//skip first empty string
            encodingArray[(byte)'\n'] = fi.readLine(); // read the code for '\n'
        	while (true) {
        		String s = fi.readLine();
        		if (s == null)
        			throw new EOFException();
        		encodingArray[(byte)s.charAt(0)] = s.substring(1, s.length());        		
        	}
        } catch (EOFException ignore) {}
        
        extractedFile.createNewFile();
        //extract:
		try (FileOutputHelper fo = new FileOutputHelper(extractedFile)) {
			fo.writeString(operator.extract(compressed, encodingArray));
		}
		
        System.out.println("Path to the extracted file: " + extractedFile.getAbsolutePath());
    }
}

You'll have to write the readme.txt file yourself :)

Conclusion

I think that's everything I wanted to say. If you have anything to tell me about my incompetence in improving the code, the algorithm, or optimization in general, feel free to write. If I left something unexplained, write about that too. I'll be glad to hear from you in the comments!

PS

Yes, yes, I'm still here, because I haven't forgotten about the ratio. For the string s1, the encoding table weighs 48 bytes, far more than the original file, and let's not forget the padding zeros either (there are 7 of them), so the compression ratio will be less than one: 176 / (65 + 48 * 8 + 7) = 176 / 456 ≈ 0.386. If you noticed this too, well done. Yes, this implementation is extremely inefficient for small files. But what happens with large files? There the file is much larger than the encoding table, and this is where the algorithm works as it should! For example, for Faust's monologue the archiver gives a real (not idealized) ratio of 1.46, almost one and a half times! And yes, the file has to be in English.

Source: www.hab.com
