Release of the Tesseract 4.1 text recognition system

The Tesseract 4.1 optical character recognition (OCR) system has been released. It recognizes UTF-8 characters and text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. Results can be saved as plain text or in HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally created between 1985 and 1995 in a Hewlett Packard laboratory; in 2005 the code was opened under the Apache license and developed further with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.
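
As a rough illustration of the output formats above, the sketch below builds `tesseract` command lines in Python. It assumes the Tesseract 4.1 CLI is on `PATH` and that the needed language data (for example `rus` and `ukr`) is installed; `build_command` and `run` are hypothetical helper names. The format is selected by appending a config name (`hocr`, `alto`, `pdf`, `tsv`) after the options, following the `tesseract imagename outputbase [options] [configfile]` pattern; plain text is the default.

```python
"""Build tesseract command lines for the supported output formats.

Sketch only: assumes the tesseract 4.1 CLI is on PATH. The helper names are
hypothetical; only the CLI arguments themselves come from Tesseract.
"""
import shlex
import subprocess

# Config names that select an output renderer; plain text needs none.
OUTPUT_CONFIGS = {
    "text": [],         # plain UTF-8 text (default)
    "hocr": ["hocr"],   # HTML with layout markup
    "alto": ["alto"],   # ALTO XML
    "pdf": ["pdf"],     # searchable PDF
    "tsv": ["tsv"],     # tab-separated word boxes
}


def build_command(image, output_base, fmt="text", langs=("eng",)):
    """Return argv: tesseract IMAGE OUTPUT_BASE -l LANGS [CONFIG]."""
    if fmt not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported format: {fmt}")
    # Several languages are joined with '+', e.g. rus+ukr.
    return ["tesseract", image, output_base, "-l", "+".join(langs),
            *OUTPUT_CONFIGS[fmt]]


def run(argv):
    """Execute the command; raises CalledProcessError if tesseract fails."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmd = build_command("scan.png", "scan", fmt="alto", langs=("rus", "ukr"))
    print(shlex.join(cmd))
```

Running the script prints `tesseract scan.png scan -l rus+ukr alto`; passing the list to `run` performs the actual recognition.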

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party GUI front-ends that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are provided: a classic one that recognizes text at the level of individual character patterns, and a new one based on a machine learning system built on an LSTM recurrent neural network, which is optimized for recognizing whole text lines and gives a significant increase in accuracy. Ready-made trained models are published for 123 languages. To improve performance, modules using OpenMP and the AVX2, AVX or SSE4.1 SIMD instructions are provided.
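
On the command line, the choice between the two engines is made with the `--oem` option. The Python sketch below assumes the Tesseract 4.1 CLI; the `Oem` enum and the `engine_args` helper are hypothetical names, while the numeric values follow the list printed by `tesseract --help-oem`. The legacy modes only work with traineddata files that still contain the legacy model.

```python
"""Select one of Tesseract's recognition engines via --oem.

Sketch only: Oem and engine_args are hypothetical names; the numeric
values are the OCR engine modes of the tesseract 4.x CLI.
"""
import shlex
from enum import IntEnum


class Oem(IntEnum):
    LEGACY = 0           # classic character-pattern engine only
    LSTM = 1             # LSTM neural-network engine only
    LEGACY_AND_LSTM = 2  # both engines combined
    DEFAULT = 3          # whatever the installed models support


def engine_args(oem, langs=("eng",)):
    """Return the CLI options selecting the engine and the languages."""
    return ["--oem", str(int(oem)), "-l", "+".join(langs)]


if __name__ == "__main__":
    # Recognize a Kazakh/Belarusian page with the LSTM engine only.
    argv = ["tesseract", "page.png", "page", *engine_args(Oem.LSTM, ("kaz", "bel"))]
    print(shlex.join(argv))
```

Keeping the mode as an `IntEnum` makes the intent readable in calling code while still producing the plain number the CLI expects.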

Main improvements in Tesseract 4.1:

  • Added output in the ALTO (Analyzed Layout and Text Object) XML format. To use this format, run the utility as "tesseract image_name output_base alto";
  • Added new LSTMBox and WordStrBox output modules that make training the engine easier;
  • Added support for pseudographics in hOCR (HTML) output;
  • Added more Python scripts for training the machine-learning-based engine;
  • Extended optimizations using AVX, AVX2 and SSE instructions;
  • OpenMP support is disabled by default because of performance problems;
  • Added whitelist and blacklist support to the LSTM engine;
  • Improved CMake-based build scripts.
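
The whitelist and blacklist support in the LSTM engine can be exercised through the `tessedit_char_whitelist` and `tessedit_char_blacklist` parameters, passed with `-c`. The Python sketch below assumes the Tesseract 4.1 CLI is on `PATH`; `char_filter_args` and `read_digits` are hypothetical helpers. Using `stdout` as the output base makes `tesseract` print the recognized text instead of writing a file.

```python
"""Restrict the LSTM engine to (or away from) a set of characters.

Sketch only: assumes the tesseract 4.1 CLI is on PATH. char_filter_args and
read_digits are hypothetical helpers; the -c parameter names are Tesseract's.
"""
import subprocess


def char_filter_args(whitelist=None, blacklist=None):
    """Build -c options for tessedit_char_whitelist / tessedit_char_blacklist."""
    args = []
    if whitelist:
        args += ["-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        args += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return args


def read_digits(image):
    """OCR an image with the LSTM engine (--oem 1), allowing only digits."""
    argv = ["tesseract", image, "stdout", "--oem", "1",
            *char_filter_args(whitelist="0123456789")]
    # Arguments are passed as a list, so no shell quoting is involved.
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

A digits-only whitelist like this is a typical choice for meter readings or form fields, where letters recognized by mistake would only add noise.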

Source: opennet.ru
