Release of the Tesseract 4.1 text recognition system

Version 4.1 of the optical text recognition system Tesseract has been released. It supports recognition of UTF-8 characters and texts in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. The result can be saved as plain text or in the HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally created in 1985-1995 at the Hewlett Packard laboratory; in 2005 the code was opened under the Apache license and was further developed with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality into other applications. Third-party GUI front ends that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are offered: a classic one that recognizes text at the level of individual character patterns, and a new one based on a machine learning system built on an LSTM recurrent neural network, which is optimized for recognizing whole lines and gives a significant increase in accuracy. Ready-made trained models have been published for 123 languages. To improve performance, modules that use OpenMP and the AVX2, AVX or SSE4.1 SIMD instructions are provided.
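As a rough illustration of how the two engines are chosen from the console utility, the sketch below assembles a command line that uses the `--oem` option (0 selects the classic engine, 1 the LSTM engine, 2 both, 3 the default). The names `EngineMode` and `recognition_command` and the file names are hypothetical and exist only for this example; the code builds the argument list and does not run Tesseract.

```python
import shlex
from enum import IntEnum


class EngineMode(IntEnum):
    """Values accepted by the tesseract CLI option --oem (hypothetical wrapper)."""
    LEGACY = 0           # classic engine working on individual character patterns
    LSTM = 1             # engine based on the LSTM neural network
    LEGACY_AND_LSTM = 2  # both engines combined
    DEFAULT = 3          # whatever the installed models support


def recognition_command(image, output_base, lang="eng", engine=EngineMode.LSTM):
    """Return the argument list for one OCR run; nothing is executed here."""
    return ["tesseract", str(image), str(output_base),
            "-l", lang, "--oem", str(int(engine))]


if __name__ == "__main__":
    # Example file names are placeholders, not files from the article.
    cmd = recognition_command("scan.png", "scan", lang="rus")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where Tesseract and the relevant language model are installed.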

Main improvements in Tesseract 4.1:

  • Added the ability to produce output in the ALTO XML format (Analyzed Layout and Text Object). To use this format, run the application as "tesseract image_name output_base alto";
  • Added new rendering modules, LSTMBox and WordStrBox, which simplify training of the engine;
  • Added support for pseudographics in hOCR (HTML) output;
  • Added alternative scripts written in Python for training the machine-learning-based engine;
  • Expanded the optimizations that use AVX, AVX2 and SSE instructions;
  • OpenMP support is disabled by default because of performance problems;
  • Added support for white and black lists in the LSTM engine;
  • Improved the CMake-based build scripts.
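The ALTO output and the LSTM whitelist from the list above can be combined in a single run. The following is a minimal sketch, assuming the usual argument order of the console utility (image, output base, options, then output config names) and the `tessedit_char_whitelist` variable set through `-c`. The helper `output_command`, the constant `KNOWN_OUTPUTS` and the file names are hypothetical; the code only assembles the arguments and does not run Tesseract.

```python
import shlex

# Output config names matching the formats named in the article;
# "alto" is the one added in 4.1.
KNOWN_OUTPUTS = ("txt", "hocr", "alto", "pdf", "tsv")


def output_command(image, output_base, outputs=("alto",), whitelist=None):
    """Build the argument list for a run that writes the requested formats."""
    unknown = [name for name in outputs if name not in KNOWN_OUTPUTS]
    if unknown:
        raise ValueError(f"unsupported output format(s): {unknown}")
    cmd = ["tesseract", str(image), str(output_base)]
    if whitelist:
        # Restrict recognition to the listed characters.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    # Output config names go last, after the options.
    cmd += list(outputs)
    return cmd


if __name__ == "__main__":
    # Placeholder file names: digits-only recognition with ALTO and TSV output.
    cmd = output_command("invoice.png", "invoice",
                         outputs=("alto", "tsv"), whitelist="0123456789")
    print(shlex.join(cmd))
```

A blacklist would follow the same pattern with the `tessedit_char_blacklist` variable.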

Source: opennet.ru
