Release of the Tesseract 5.3.4 text recognition system

The Tesseract 5.3.4 optical text recognition system has been released. It supports recognition of UTF-8 characters and text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. The result can be saved as plain text or in HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally created in 1985-1995 at the Hewlett Packard laboratory; in 2005 the code was opened under the Apache license and was developed further with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.
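
As a rough illustration of how these output formats are selected from the console utility, the Python sketch below builds a tesseract command line. The helper name build_ocr_command and the file names are hypothetical; the -l option and the txt, hocr, alto, pdf and tsv config names are assumed to match the stock command-line tool.

```python
# Config names the stock tesseract CLI is assumed to accept for each output format.
OUTPUT_CONFIGS = {
    "text": "txt",
    "hocr": "hocr",
    "alto": "alto",
    "pdf": "pdf",
    "tsv": "tsv",
}


def build_ocr_command(image, output_base, languages=("eng",), formats=("text",)):
    """Return an argv list of the form: tesseract IMAGE OUTPUTBASE -l LANGS CONFIG..."""
    unknown = [name for name in formats if name not in OUTPUT_CONFIGS]
    if unknown:
        raise ValueError(f"unsupported output format(s): {unknown}")
    # Several languages are joined with '+', e.g. "rus+ukr".
    command = ["tesseract", str(image), str(output_base), "-l", "+".join(languages)]
    command += [OUTPUT_CONFIGS[name] for name in formats]
    return command


if __name__ == "__main__":
    # scan.png is a hypothetical input file; the command is only printed, not run.
    cmd = build_ocr_command("scan.png", "scan", languages=("rus", "ukr"), formats=("hocr", "pdf"))
    print(" ".join(cmd))  # tesseract scan.png scan -l rus+ukr hocr pdf
```

The resulting list can be passed to subprocess.run on a machine where tesseract and the matching language data are installed.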

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party GUI front ends that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are provided: a classic engine that recognizes text at the level of individual character patterns, and a newer engine based on a machine learning system built on an LSTM recurrent neural network, which is optimized for recognizing whole strings and gives a significant increase in accuracy. Ready-made trained models have been published for 123 languages. To improve performance, modules are provided that use OpenMP and the SIMD instruction sets AVX2, AVX, AVX512F, NEON or SSE4.1.
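
In the console utility, the choice between the two engines is made with the --oem option. The sketch below shows one way to express that choice in Python. The EngineMode enum and build_engine_command helper are hypothetical names; the numeric values are assumed to follow the listing printed by tesseract --help-oem. Note that the classic engine also needs trained data files that still contain the legacy model, which this sketch does not check.

```python
from enum import IntEnum


class EngineMode(IntEnum):
    """OCR engine modes, numbered as in the assumed `tesseract --help-oem` listing."""
    LEGACY_ONLY = 0      # classic character-pattern engine
    LSTM_ONLY = 1        # neural network (LSTM) engine
    LEGACY_AND_LSTM = 2  # both engines combined
    DEFAULT = 3          # whatever the trained data supports


def build_engine_command(image, output_base, mode=EngineMode.DEFAULT, language="eng"):
    """Return an argv list that runs tesseract with an explicit engine mode."""
    return [
        "tesseract", str(image), str(output_base),
        "-l", language,
        "--oem", str(int(mode)),
    ]


if __name__ == "__main__":
    # page.png is a hypothetical input file; the command is only printed, not run.
    print(" ".join(build_engine_command("page.png", "page", EngineMode.LSTM_ONLY)))
    # tesseract page.png page -l eng --oem 1
```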

Main improvements:

  • Improved recognition of images specified by URL, where the file is downloaded using the libcurl library. The User-Agent header is now set when downloading. A new parameter, curl_cookiefile, has been added for using a cookie file.
  • The ScrollView server uses TCP as its preferred protocol.
  • When the "combine_tessdata -d" command is used, the output is sent to stdout instead of stderr.
  • Build problems when using autoconf and clang have been fixed.
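
Two of the changes above can be sketched from a script's point of view. The Python below assumes that curl_cookiefile is set like any other Tesseract parameter through the generic -c NAME=VALUE option, and that the binary was built with libcurl so that a URL is accepted in place of an image path. The helper names, URL and file names are hypothetical.

```python
import shutil
import subprocess


def build_url_ocr_command(url, output_base, cookie_file=None, language="eng"):
    """Return an argv list that asks tesseract to fetch and recognize an image by URL."""
    command = ["tesseract", url, str(output_base), "-l", language]
    if cookie_file is not None:
        # Assumption: the new parameter is passed via the generic -c option.
        command += ["-c", f"curl_cookiefile={cookie_file}"]
    return command


def list_traineddata_components(traineddata_path):
    """Run `combine_tessdata -d` and return its listing, or None if the tool is missing."""
    if shutil.which("combine_tessdata") is None:
        return None
    result = subprocess.run(
        ["combine_tessdata", "-d", str(traineddata_path)],
        capture_output=True,
        text=True,
        check=False,
    )
    # In 5.3.4 the listing goes to stdout; fall back to stderr for older builds.
    return result.stdout or result.stderr


if __name__ == "__main__":
    cmd = build_url_ocr_command("https://example.com/scan.png", "scan", cookie_file="cookies.txt")
    print(" ".join(cmd))
    # tesseract https://example.com/scan.png scan -l eng -c curl_cookiefile=cookies.txt
```

Reading stdout first and falling back to stderr keeps a wrapper script working across both the old and the new behaviour of combine_tessdata -d.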

Source: opennet.ru
