Release of the Tesseract 5.2 OCR system

Version 5.2 of the Tesseract optical character recognition system has been released. It recognizes UTF-8 characters and text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. Results can be saved as plain text or in HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally developed between 1985 and 1995 at a Hewlett Packard laboratory; in 2005 its code was opened under the Apache license, and development continued with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party graphical interfaces that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are provided: a classic engine that recognizes text at the level of individual character patterns, and a newer engine based on a machine-learning system built on an LSTM recurrent neural network, which is optimized for recognizing whole lines and delivers a significant gain in accuracy. Ready-made trained models are published for 123 languages. To speed up processing, modules using OpenMP and the AVX2, AVX, AVX512F, NEON or SSE4.1 SIMD instruction sets are provided.
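Optional instruction sets such as AVX2 and AVX512F are typically exploited by checking the CPU at run time and choosing a matching code path, so that a single binary still works on processors without them. The sketch below shows that general dispatch pattern for a dot product, the operation that dominates neural-network inference. It is not Tesseract's own code: all function and type names are hypothetical, and it relies on the GCC/Clang-specific `__builtin_cpu_supports` builtin and `target` attribute, falling back to the scalar loop on other compilers or non-x86 CPUs.

```cpp
#include <cassert>
#include <string>

// Common signature for every dot-product variant.
using DotProductFn = double (*)(const double* u, const double* v, int n);

// Portable scalar loop: valid on any CPU and used as the fallback.
static double DotProductGeneric(const double* u, const double* v, int n) {
  assert(n >= 0);
  double total = 0.0;
  for (int i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
// Same loop compiled with AVX2 enabled. It must only be called after the
// runtime check in SelectDotProduct() confirms AVX2 support.
__attribute__((target("avx2")))
static double DotProductAVX2(const double* u, const double* v, int n) {
  double total = 0.0;
  for (int i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}

// Same loop compiled with AVX-512 Foundation (AVX512F) enabled.
__attribute__((target("avx512f")))
static double DotProductAVX512F(const double* u, const double* v, int n) {
  double total = 0.0;
  for (int i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}
#endif

// Name of the selected code path plus the function to call.
struct Dispatch {
  std::string name;
  DotProductFn fn;
};

// Query the running CPU and return the widest supported variant.
static Dispatch SelectDotProduct() {
#ifdef HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx512f")) return {"avx512f", DotProductAVX512F};
  if (__builtin_cpu_supports("avx2")) return {"avx2", DotProductAVX2};
#endif
  return {"generic", DotProductGeneric};
}
```

Calling `SelectDotProduct()` once at start-up and caching the returned pointer keeps the feature check off the hot path. Real SIMD kernels would use intrinsics instead of a plain loop; the loop here only keeps the sketch short and portable.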

Main changes in Tesseract 5.2:

  • Added optimizations implemented with Intel AVX512F instructions.
  • The C API gained a function that initializes tesseract with a machine-learning model loaded from memory.
  • Added the invert_threshold parameter, which sets the threshold at which text lines are inverted. The default value is 0.7; setting it to 0 disables inversion.
  • Fixed processing of very large documents on 32-bit hosts.
  • Replaced the use of std::regex functions with std::string.
  • Improved the build scripts for Autotools, CMake and continuous integration systems.
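The invert_threshold change can be illustrated with a small sketch. It assumes, as the parameter's name suggests, that a line whose mean recognition confidence falls below the threshold is recognized a second time from an inverted image (for example, light text on a dark background) and that the more confident result is kept, while a value of 0 skips inversion entirely. The `Recognizer` callback, `LineResult` type and helper functions are hypothetical stand-ins, not part of the libtesseract API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Result of recognizing one text line: text plus mean confidence in [0, 1].
struct LineResult {
  std::string text;
  double mean_confidence;
};

// Hypothetical recognizer: takes an 8-bit grayscale line image.
using Recognizer = std::function<LineResult(const std::vector<unsigned char>&)>;

// Inverts an 8-bit grayscale image (light-on-dark becomes dark-on-light).
std::vector<unsigned char> InvertImage(const std::vector<unsigned char>& pixels) {
  std::vector<unsigned char> out(pixels.size());
  for (std::size_t i = 0; i < pixels.size(); ++i) {
    out[i] = static_cast<unsigned char>(255 - pixels[i]);
  }
  return out;
}

// Assumed invert_threshold behaviour:
//  * a threshold of 0 disables inversion;
//  * otherwise, a line whose confidence is below the threshold is recognized
//    again from the inverted image, and the more confident result wins.
LineResult RecognizeWithInversion(const std::vector<unsigned char>& line,
                                  const Recognizer& recognize,
                                  double invert_threshold = 0.7) {
  assert(invert_threshold >= 0.0);
  LineResult normal = recognize(line);
  if (invert_threshold <= 0.0 || normal.mean_confidence >= invert_threshold) {
    return normal;
  }
  LineResult inverted = recognize(InvertImage(line));
  return inverted.mean_confidence > normal.mean_confidence ? inverted : normal;
}
```

With the default of 0.7, confidently recognized lines cost a single pass, and only doubtful lines pay for the second, inverted pass.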

    Source: opennet.ru
