Release of the Tesseract 5.1 text recognition system

The Tesseract 5.1 optical text recognition system has been released. It supports recognition of UTF-8 characters and of text in more than 100 languages, including Russian, Kazakh, Belarusian, and Ukrainian. Results can be saved as plain text or in HTML (hOCR), ALTO (XML), PDF, and TSV formats. The system was originally created between 1985 and 1995 at a Hewlett Packard laboratory; in 2005 the code was opened under the Apache license and development continued with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party GUI front ends that support Tesseract include gImageReader, VietOCR, and YAGF. Two recognition engines are provided: the classic engine, which recognizes text at the level of individual character patterns, and a newer engine based on a machine learning system built on an LSTM recurrent neural network, which is optimized for recognizing whole lines of text and is significantly more accurate. Ready-made trained models are published for 123 languages. To improve performance, modules are provided that use OpenMP and the SIMD instruction sets AVX2, AVX, NEON, or SSE4.1.
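As a rough illustration of how the console utility exposes the two engines and the output formats, the sketch below assembles a `tesseract` command line in Python. The helper names (`build_command`, `run_ocr`) and the file names are hypothetical; the sketch assumes the standard `--oem` option (0 for the classic engine, 1 for LSTM) and the output config names normally shipped with Tesseract (`txt`, `hocr`, `alto`, `pdf`, `tsv`).

```python
import subprocess

# Engine modes accepted by the tesseract CLI's --oem option.
OEM_LEGACY = 0  # classic engine working on individual character patterns
OEM_LSTM = 1    # LSTM neural-network engine working on whole lines

# Output config names assumed to be installed with Tesseract.
OUTPUT_CONFIGS = {"txt", "hocr", "alto", "pdf", "tsv"}


def build_command(image, out_base, lang="eng", oem=OEM_LSTM, fmt="txt"):
    """Return the argv list for one OCR run (hypothetical helper)."""
    if fmt not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported output format: {fmt}")
    if oem not in (OEM_LEGACY, OEM_LSTM):
        raise ValueError(f"unsupported engine mode: {oem}")
    # Layout: tesseract <image> <output base> -l <lang> --oem <n> <config>
    return ["tesseract", image, out_base, "-l", lang, "--oem", str(oem), fmt]


def run_ocr(image, out_base, **kwargs):
    """Run tesseract; raises CalledProcessError if recognition fails."""
    subprocess.run(build_command(image, out_base, **kwargs), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_ocr() to actually execute it.
    print(" ".join(build_command("scan.png", "scan", lang="ukr", fmt="hocr")))
```

Note that the classic engine generally needs language data that still contains the legacy models, so `--oem 0` may fail with LSTM-only model files.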

Major improvements in Tesseract 5.1:

  • Regions containing images and lines can now be processed when producing output in the ALTO, hOCR, and text formats.
  • A new curl_timeout parameter has been added for curl_easy_setopt.
  • The build system has been improved.
  • Unused code has been removed.
  • Crashes caused by incorrect handling of null pointers in PageIterator::Orientation have been fixed.
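The new curl_timeout parameter can be illustrated with a short Python sketch that builds a command line for recognizing an image fetched by URL. This rests on several assumptions not stated in the announcement: that the `tesseract` binary was built with libcurl support, that the parameter can be set through the generic `-c VAR=VALUE` option, and that its value is in seconds. The helper name `build_url_command` and the URL are hypothetical.

```python
def build_url_command(url, out_base, timeout_s=10):
    """Return the argv list for OCR on a remote image (hypothetical helper).

    Assumes a libcurl-enabled tesseract build and that curl_timeout
    is given in seconds; neither is confirmed by the announcement.
    """
    if timeout_s < 0:
        raise ValueError("timeout must not be negative")
    # -c VAR=VALUE sets a single Tesseract parameter for this run.
    return ["tesseract", url, out_base, "-c", f"curl_timeout={timeout_s}"]


if __name__ == "__main__":
    # Only prints the command; nothing is downloaded or executed.
    print(" ".join(build_url_command("https://example.com/page.png", "page", 5)))
```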

Source: opennet.ru
