Release of the Tesseract 5.1 text recognition system

The Tesseract 5.1 optical text recognition system has been released. It supports recognition of UTF-8 characters and text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. Results can be saved as plain text or in the HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally developed in 1985-1995 at a Hewlett Packard laboratory; in 2005 the code was opened under the Apache license and has since been developed further with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.
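As an illustration of the output formats listed above, here is a minimal Python sketch that assembles a command line for the tesseract console utility. The helper name `build_ocr_command` and the file names are hypothetical. The sketch assumes the standard output configs (`txt`, `hocr`, `alto`, `pdf`, `tsv`) and the `rus`/`ukr` language data are installed; it only builds the argument list and does not run the program.

```python
import shlex

# Map of friendly format names to the config names the tesseract CLI
# accepts after the output base (assumed to be installed with tessdata).
OUTPUT_CONFIGS = {
    "text": "txt",
    "hocr": "hocr",
    "alto": "alto",
    "pdf": "pdf",
    "tsv": "tsv",
}


def build_ocr_command(image, output_base, languages, formats):
    """Return an argv list for: tesseract IMAGE OUTPUTBASE -l LANGS CONFIG..."""
    unknown = [f for f in formats if f not in OUTPUT_CONFIGS]
    if unknown:
        raise ValueError(f"unsupported output format(s): {unknown}")
    if not languages:
        raise ValueError("at least one language code is required")
    # Several languages are joined with '+', e.g. "rus+ukr".
    cmd = ["tesseract", image, output_base, "-l", "+".join(languages)]
    cmd += [OUTPUT_CONFIGS[f] for f in formats]
    return cmd


if __name__ == "__main__":
    # Hypothetical input file; prints the command instead of executing it.
    argv = build_ocr_command("scan.png", "scan", ["rus", "ukr"], ["text", "pdf"])
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run` on a machine where Tesseract is installed; each requested format produces its own file next to the output base.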

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party GUI front ends that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are provided: the classic one, which recognizes text at the level of individual character patterns, and a newer one based on a machine learning system built on an LSTM recurrent neural network, which is optimized for recognizing whole lines and delivers a significant increase in accuracy. Ready-made trained models have been published for 123 languages. To improve performance, modules that use OpenMP and the AVX2, AVX, NEON or SSE4.1 SIMD instructions are provided.
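The choice between the two engines can be sketched with the console utility's `--oem` option. The Python names below (`EngineMode`, `engine_args`) are hypothetical helpers. The numeric values follow the modes listed by `tesseract --help-oem`, and the legacy engine is assumed to need language data that still contains the classic models.

```python
from enum import IntEnum


class EngineMode(IntEnum):
    """OCR engine modes as accepted by the tesseract --oem option."""
    LEGACY = 0            # classic per-character pattern engine
    LSTM = 1              # neural-network engine working on whole lines
    LEGACY_AND_LSTM = 2   # both engines combined
    DEFAULT = 3           # whatever the installed language data supports


def engine_args(mode):
    """Return the CLI arguments that select the given engine mode."""
    mode = EngineMode(mode)  # raises ValueError for values outside 0..3
    return ["--oem", str(int(mode))]


if __name__ == "__main__":
    # Hypothetical file names; the command is only printed, not executed.
    cmd = ["tesseract", "page.png", "page", "-l", "rus"] + engine_args(EngineMode.LSTM)
    print(" ".join(cmd))
```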

Major improvements in Tesseract 5.1:

  • Added the ability to handle image and line regions when producing output in the ALTO, hOCR and text formats.
  • Added a new parameter, curl_timeout, for curl_easy_setopt.
  • The build system has been updated.
  • Unused code has been removed.
  • Fixed crashes caused by incorrect handling of null pointers in the PageIterator class.
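The new curl_timeout parameter from the list above could be set from the command line roughly as follows. This is a hedged sketch: it assumes a Tesseract build with libcurl support that accepts a URL as the input image, and it assumes the value is given in seconds. The helper `remote_ocr_command` and the URL are hypothetical.

```python
def remote_ocr_command(image_url, timeout_seconds):
    """Build a tesseract argv that fetches an image by URL with a curl timeout.

    Assumes a libcurl-enabled build; the timeout unit (seconds) is an assumption.
    """
    if timeout_seconds < 0:
        raise ValueError("timeout must be non-negative")
    return [
        "tesseract",
        image_url,
        "stdout",                               # write recognized text to standard output
        "-c", f"curl_timeout={int(timeout_seconds)}",  # set the parameter via -c NAME=VALUE
    ]


if __name__ == "__main__":
    # Hypothetical URL; the command is printed rather than executed.
    print(" ".join(remote_ocr_command("https://example.org/page.png", 15)))
```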

Source: opennet.ru
