Release of the Tesseract 5.3.4 optical text recognition system

Tesseract 5.3.4, an optical character recognition system, has been released. It supports recognition of UTF-8 characters and of text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. Results can be saved as plain text or in HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally developed in 1985-1995 at a Hewlett Packard laboratory; in 2005 its code was opened under the Apache license, and development continued with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.
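On the `tesseract` command line, the output formats listed above are selected by naming config files after the output base name, and several languages can be combined with `+`. The Python sketch below only builds such a command; it assumes the `tesseract` CLI is installed together with its stock `txt`, `hocr`, `alto`, `pdf` and `tsv` configs and the relevant language models. `build_tesseract_cmd` and `FORMAT_CONFIGS` are hypothetical helper names introduced for illustration.

```python
# Minimal sketch (hypothetical helpers): build a tesseract command line that
# writes several output formats in one run. Assumes the `tesseract` CLI and its
# stock "txt", "hocr", "alto", "pdf" and "tsv" config files are installed.
import shlex
from typing import List, Sequence

# Output formats named in the release note, mapped to the config-file names
# that select them on the tesseract command line.
FORMAT_CONFIGS = {
    "text": "txt",
    "hocr": "hocr",
    "alto": "alto",
    "pdf": "pdf",
    "tsv": "tsv",
}


def build_tesseract_cmd(image: str, out_base: str,
                        langs: Sequence[str], formats: Sequence[str]) -> List[str]:
    """Return argv for: tesseract IMAGE OUTBASE -l LANG1+LANG2 CONFIG..."""
    unknown = [f for f in formats if f not in FORMAT_CONFIGS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    cmd = ["tesseract", image, out_base, "-l", "+".join(langs)]
    cmd.extend(FORMAT_CONFIGS[f] for f in formats)
    return cmd


if __name__ == "__main__":
    cmd = build_tesseract_cmd("scan.png", "scan", ["rus", "ukr"], ["hocr", "pdf"])
    print(shlex.join(cmd))
    # To actually run OCR: subprocess.run(cmd, check=True)
```

Run as written, it prints `tesseract scan.png scan -l rus+ukr hocr pdf`; executing that command should produce `scan.hocr` and `scan.pdf` next to each other from a single recognition pass.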

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality into other applications. Third-party GUIs that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are provided: a classic one that recognizes text at the level of individual characters, and a newer one based on a machine-learning system built on an LSTM recurrent neural network, which is optimized for recognizing whole lines and significantly improves accuracy. Pre-trained models have been published for 123 languages. To improve performance, modules using OpenMP and the SIMD instructions AVX2, AVX, AVX512F, NEON or SSE4.1 are provided.

Main changes:

  • Improved detection of image URLs and downloading of files using the libcurl library. A User-Agent header is now set when downloading. A new parameter, curl_cookiefile, has been added for using a cookie file.
  • The ScrollView server uses TCP as its preferred protocol.
  • When the "combine_tessdata -d" command is used, output is written to stdout instead of stderr.
  • Fixed build problems when using autoconf and clang.
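The libcurl and `combine_tessdata` changes mostly matter to scripts that drive the command-line tools. The following Python sketch assumes a Tesseract build with libcurl support and assumes that the new `curl_cookiefile` parameter can be set with `-c` like other Tesseract parameters; `build_url_ocr_cmd` and `dump_traineddata` are hypothetical helper names, not part of Tesseract.

```python
# Minimal sketch (hypothetical helpers): pass an image URL straight to tesseract
# with a cookie file for the libcurl download, and read the component listing
# that `combine_tessdata -d` prints (to stdout as of 5.3.4).
# Assumes tesseract was built with libcurl and that curl_cookiefile is settable via -c.
import subprocess
from typing import List, Optional


def build_url_ocr_cmd(url: str, out_base: str,
                      cookie_file: Optional[str] = None) -> List[str]:
    """Return argv for OCR of a remote image, optionally with a cookie file."""
    cmd = ["tesseract", url, out_base]
    if cookie_file:
        # -c VAR=VALUE sets a Tesseract parameter for this run.
        cmd += ["-c", f"curl_cookiefile={cookie_file}"]
    return cmd


def dump_traineddata(path: str) -> str:
    """Run `combine_tessdata -d PATH` and return what it printed on stdout."""
    result = subprocess.run(["combine_tessdata", "-d", path],
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(build_url_ocr_cmd("https://example.com/page.png", "page", "cookies.txt"))
```

With versions before 5.3.4, the `-d` listing went to stderr, so a script like `dump_traineddata` would have had to read `result.stderr` instead.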

Source: opennet.ru
