Release of the Tesseract 4.1 text recognition system

Version 4.1 of the optical text recognition system Tesseract has been released. It supports recognition of UTF-8 characters and text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. The result can be saved as plain text or in the HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally created in 1985-1995 at a Hewlett Packard laboratory; in 2005 the code was opened under the Apache license and then developed further with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party GUI front ends that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are offered: the classic one, which recognizes text at the level of individual character patterns, and a new one based on a machine learning system built on an LSTM recurrent neural network, optimized for recognizing whole lines and allowing a significant increase in accuracy. Ready-made trained models are published for 123 languages. To optimize performance, modules are provided that use OpenMP and the AVX2, AVX or SSE4.1 SIMD instructions.
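As an illustration of how the console utility ties these pieces together, here is a minimal Python sketch that assembles a command line selecting the engine, the languages and the output format. The helper names `build_command` and `run_ocr` are hypothetical and exist only for this example; the `-l` and `--oem` options and the output config names are those of the Tesseract command-line tool. The classic engine (`--oem 0`) is assumed to need trained data files that still contain the legacy model.

```python
import shutil
import subprocess

# Engine modes accepted by the CLI's --oem option.
OEM_LEGACY = 0  # classic engine working on individual character patterns
OEM_LSTM = 1    # engine based on the LSTM recurrent neural network

# Output config names, one per output format mentioned in the article.
OUTPUT_CONFIGS = ("txt", "hocr", "alto", "pdf", "tsv")


def build_command(image, output_base, langs=("eng",), oem=OEM_LSTM, fmt="txt"):
    """Return the argv list for one tesseract run (nothing is executed)."""
    if fmt not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported output format: {fmt}")
    return [
        "tesseract", str(image), str(output_base),
        "-l", "+".join(langs),  # several languages are joined with '+'
        "--oem", str(oem),
        fmt,                    # the output config name goes last
    ]


def run_ocr(image, output_base, **kwargs):
    """Run tesseract if it is installed; raise if the binary is missing."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    subprocess.run(build_command(image, output_base, **kwargs), check=True)
```

For example, `build_command("scan.png", "out", langs=("rus", "ukr"), fmt="hocr")` yields the arguments for recognizing Russian and Ukrainian text with the LSTM engine and writing hOCR output. Keeping command construction separate from execution makes the sketch testable on machines without Tesseract installed.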

Main improvements in Tesseract 4.1:

  • Added the ability to produce output in the ALTO XML format (Analyzed Layout and Text Object). To use this format, run the application as "tesseract image_name output_base alto";
  • Added new output rendering modules, LSTMBox and WordStrBox, which simplify training of the engine;
  • Added support for pseudographics in hOCR (HTML) output;
  • Added alternative scripts written in Python for training the machine-learning-based engine;
  • Expanded the optimizations that use AVX, AVX2 and SSE instructions;
  • OpenMP support is disabled by default because of performance problems;
  • Added support for whitelists and blacklists in the LSTM engine;
  • Improved the CMake-based build scripts.
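
The ALTO output and the character lists from the items above can be combined in a single run. The following is a minimal sketch, not an official recipe: the helper name `alto_command` is hypothetical, while `tessedit_char_whitelist` and `tessedit_char_blacklist` are Tesseract configuration variables passed through the command-line `-c` option.

```python
import shlex


def alto_command(image, output_base, whitelist=None, blacklist=None):
    """Build argv for ALTO XML output, optionally restricting characters."""
    cmd = ["tesseract", str(image), str(output_base)]
    if whitelist:
        # Only these characters may appear in the recognized text.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        # These characters are never emitted.
        cmd += ["-c", f"tessedit_char_blacklist={blacklist}"]
    cmd.append("alto")  # output config selecting the ALTO renderer
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command line; execution is left to the reader.
    args = alto_command("invoice.png", "out", whitelist="0123456789")
    print(" ".join(shlex.quote(a) for a in args))
```

A digits-only whitelist like the one shown is a common choice for forms and invoices, where restricting the alphabet tends to reduce misreadings.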

Source: opennet.ru
