Version 5.2 of the Tesseract optical text recognition system has been released. It supports recognition of UTF-8 characters and text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. The result can be saved as plain text or in the HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally created between 1985 and 1995 at the Hewlett Packard laboratory; in 2005 the code was opened under the Apache license and was developed further with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.
Tesseract includes a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party GUI front ends that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are provided: a classic one that recognizes text at the level of individual character patterns, and a newer one built on a machine learning system that uses an LSTM recurrent neural network, which is optimized for recognizing whole lines and gives a significant increase in accuracy. Ready-made trained models have been published for 123 languages. To improve performance, modules are provided that use OpenMP and the AVX2, AVX, AVX512F, NEON or SSE4.1 SIMD instructions.
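As a rough illustration of the console utility and the two engines described above, the following Python sketch assembles a `tesseract` command line that selects an engine and an output format. The helper name `build_tesseract_command` and the file names are hypothetical. The `-l` and `--oem` options and the `hocr`, `alto`, `pdf` and `tsv` config names are those of the stock command-line tool; selecting the classic engine with `--oem 0` assumes that traineddata containing the legacy model is installed.

```python
import shlex

# Engine modes accepted by the CLI's --oem option.
OEM = {"legacy": 0, "lstm": 1}

# Output formats named in the article, mapped to the stock config names.
# Plain text is the default and needs no config argument.
OUTPUT_CONFIGS = {"text": None, "hocr": "hocr", "alto": "alto", "pdf": "pdf", "tsv": "tsv"}


def build_tesseract_command(image, output_base, lang="eng", engine="lstm", fmt="text"):
    """Return the argv list for one tesseract run (hypothetical helper)."""
    if engine not in OEM:
        raise ValueError(f"unknown engine: {engine}")
    if fmt not in OUTPUT_CONFIGS:
        raise ValueError(f"unknown output format: {fmt}")
    cmd = ["tesseract", image, output_base, "-l", lang, "--oem", str(OEM[engine])]
    config = OUTPUT_CONFIGS[fmt]
    if config is not None:
        cmd.append(config)  # config names are listed after the options
    return cmd


if __name__ == "__main__":
    cmd = build_tesseract_command("scan.png", "scan", lang="rus", fmt="pdf")
    print(shlex.join(cmd))
    # To actually run it (requires tesseract on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

The helper only builds the argument list, so it can be inspected or logged before anything is executed; the output file extension is added by Tesseract itself based on the chosen format.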
Main improvements in Tesseract 5.2:
- Added optimizations implemented with Intel AVX512F instructions.
- The C API now has a function for initializing tesseract with a machine learning model loaded from memory.
- Added the invert_threshold parameter, which sets the level at which text lines are inverted. The default value is 0.7. To disable inversion, set the value to 0.
- Fixed processing of very large documents on 32-bit hosts.
- Switched from std::regex functions to std::string.
- Improved the build scripts for Autotools, CMake and continuous integration systems.
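The `invert_threshold` setting from the list above can be passed to the console utility through its generic `-c NAME=VALUE` option. A minimal sketch in Python, assuming `tesseract` is on the PATH when the command is actually run; the helper name `with_invert_threshold` and the file names are hypothetical.

```python
def with_invert_threshold(cmd, threshold=0.7):
    """Return a copy of cmd with invert_threshold set via -c (hypothetical helper).

    0.7 is the default reported for Tesseract 5.2; 0 disables inversion.
    """
    return list(cmd) + ["-c", f"invert_threshold={threshold}"]


if __name__ == "__main__":
    base = ["tesseract", "page.png", "page"]
    print(with_invert_threshold(base))     # spells out the default value
    print(with_invert_threshold(base, 0))  # inversion disabled
```

Returning a new list leaves the base command untouched, so the same base can be reused to compare recognition results with and without inversion.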
Source: opennet.ru