Release of the Tesseract 4.1 text recognition system

The optical text recognition system Tesseract 4.1 has been released. It supports recognition of UTF-8 characters and of text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. Results can be saved as plain text or in the HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally developed in 1985-1995 at a Hewlett Packard laboratory; in 2005 the code was opened under the Apache license, and development continued with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.
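The output formats listed above correspond to configuration files that ship with Tesseract (`txt`, `hocr`, `alto`, `pdf`, `tsv`). They are named on the command line after the output base name, and several can be requested in a single run. The following is a minimal Python sketch, assuming the `tesseract` binary is installed and on `PATH`; the helper names `build_output_command` and `run_ocr` are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Config file names (from tessdata/configs) for the formats mentioned above.
SUPPORTED_FORMATS = ("txt", "hocr", "alto", "pdf", "tsv")


def build_output_command(image: str, output_base: str,
                         formats: Sequence[str]) -> List[str]:
    """Build a tesseract argv that writes several output formats in one run.

    Each format is passed as a config name after the output base, which is a
    file-name prefix (e.g. "scan" -> scan.txt, scan.hocr, ...), not a directory.
    """
    unknown = [f for f in formats if f not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported format(s): {unknown}")
    return ["tesseract", image, output_base, *formats]


def run_ocr(image: str, output_base: str, formats: Sequence[str]) -> None:
    # Requires the tesseract binary on PATH; raises CalledProcessError on failure.
    subprocess.run(build_output_command(image, output_base, formats), check=True)
```

For example, `run_ocr("page.png", "page", ["txt", "hocr", "pdf"])` would invoke `tesseract page.png page txt hocr pdf`.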

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party GUI front ends that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are offered: the classic one, which recognizes text at the level of individual character shapes, and a new one based on a machine learning system built on an LSTM recurrent neural network, which is optimized for recognizing whole lines and delivers a substantial increase in accuracy. Pre-trained models have been published for 123 languages. To improve performance, modules using OpenMP and the AVX2, AVX or SSE4.1 SIMD instructions are provided.
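On the command line, the recognition engine is selected with `--oem` and the languages with `-l`, where several language codes are joined with `+`. The sketch below assumes Tesseract 4.x on `PATH` with the matching `.traineddata` models installed; the enum and helper names are hypothetical, while the numeric `--oem` values are the ones documented for the 4.x series.

```python
import subprocess
from enum import IntEnum
from typing import List, Sequence


class OcrEngineMode(IntEnum):
    """Values accepted by tesseract's --oem option in the 4.x series."""
    LEGACY_ONLY = 0       # classic engine; needs models that include legacy data
    LSTM_ONLY = 1         # LSTM neural network, recognizes whole lines
    LEGACY_PLUS_LSTM = 2  # both engines combined
    DEFAULT = 3           # whatever the installed models support


def build_engine_command(image: str, languages: Sequence[str],
                         oem: OcrEngineMode = OcrEngineMode.LSTM_ONLY) -> List[str]:
    """Build an argv that prints the recognized text to stdout.

    Languages are joined with "+" (e.g. "rus+ukr"); each one needs its
    trained model file (e.g. rus.traineddata) in the tessdata directory.
    """
    if not languages:
        raise ValueError("at least one language is required")
    return ["tesseract", image, "stdout",
            "-l", "+".join(languages), "--oem", str(int(oem))]


def recognize(image: str, languages: Sequence[str],
              oem: OcrEngineMode = OcrEngineMode.LSTM_ONLY) -> str:
    # Requires the tesseract binary on PATH; returns the text written to stdout.
    result = subprocess.run(build_engine_command(image, languages, oem),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Keeping the argument list separate from the `subprocess` call makes the command easy to log or test without actually running OCR.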

Main improvements in Tesseract 4.1:

  • Added the ability to produce output in the XML ALTO (Analyzed Layout and Text Object) format. To use this format, run the program as "tesseract image_name output_base alto";
  • Added the new LSTMBox and WordStrBox modules, which simplify training the engine;
  • Added support for pseudographics in hOCR (HTML) output;
  • Added further Python scripts for training the machine-learning-based engine;
  • Added optimizations using the AVX, AVX2 and SSE instructions;
  • OpenMP support is now disabled by default because of performance problems;
  • Added support for character whitelists and blacklists in the LSTM engine;
  • Improved the CMake-based build scripts.
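The whitelist and blacklist support refers to Tesseract's `tessedit_char_whitelist` and `tessedit_char_blacklist` variables, which can be set from the command line with `-c VAR=VALUE`. The sketch below restricts the LSTM engine (`--oem 1`) to digits. It assumes Tesseract 4.1 or later on `PATH`, and the helper names are hypothetical.

```python
import subprocess
from typing import List


def build_whitelist_command(image: str, whitelist: str,
                            blacklist: str = "") -> List[str]:
    """Build an argv that limits the LSTM engine to a set of characters."""
    cmd = ["tesseract", image, "stdout", "--oem", "1",
           "-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        # Characters listed here are excluded from recognition.
        cmd += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return cmd


def read_digits(image: str) -> str:
    # Requires tesseract on PATH; returns the recognized text, whitespace-stripped.
    result = subprocess.run(build_whitelist_command(image, "0123456789"),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

For example, `read_digits("meter.png")` would run `tesseract meter.png stdout --oem 1 -c tessedit_char_whitelist=0123456789`.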

Source: opennet.ru
