Release of the Tesseract 4.1 text recognition system
Version 4.1 of Tesseract, an optical character recognition (OCR) system, has been released. Tesseract recognizes UTF-8 characters and text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. Results can be saved as plain text or as HTML (hOCR), ALTO (XML), PDF and TSV. The system was originally developed between 1985 and 1995 in a Hewlett Packard laboratory. In 2005 the code was opened under the Apache license, and development continued with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.
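Of the output formats, TSV is the easiest to post-process: each row describes one recognized element together with its bounding box and confidence. The following is a minimal sketch of reading it, assuming the twelve-column layout Tesseract writes in its header row (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), where level 5 marks individual words. The names `parse_tsv_words` and `Word` are hypothetical, and the sample rows are invented for illustration only.

```python
from dataclasses import dataclass
from typing import List

# TSV "level" values: 1 page, 2 block, 3 paragraph, 4 line, 5 word
WORD_LEVEL = 5


@dataclass
class Word:
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float


def parse_tsv_words(tsv: str, min_conf: float = 0.0) -> List[Word]:
    """Extract word rows from Tesseract-style TSV, locating columns via the header row."""
    rows = [r for r in tsv.splitlines() if r.strip()]
    if not rows:
        return []
    header = rows[0].split("\t")
    col = {name: i for i, name in enumerate(header)}
    words: List[Word] = []
    for row in rows[1:]:
        fields = row.split("\t")
        # Pad rows whose empty trailing "text" field was dropped.
        fields += [""] * (len(header) - len(fields))
        if int(fields[col["level"]]) != WORD_LEVEL:
            continue  # skip page/block/paragraph/line rows
        text = fields[col["text"]].strip()
        conf = float(fields[col["conf"]])
        if not text or conf < min_conf:
            continue  # drop empty words and low-confidence guesses
        words.append(Word(text=text,
                          left=int(fields[col["left"]]),
                          top=int(fields[col["top"]]),
                          width=int(fields[col["width"]]),
                          height=int(fields[col["height"]]),
                          conf=conf))
    return words


# Hypothetical sample in the assumed layout (not real Tesseract output).
header = "\t".join(["level", "page_num", "block_num", "par_num", "line_num",
                    "word_num", "left", "top", "width", "height", "conf", "text"])
sample = "\n".join([
    header,
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t",
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tHello",
    "5\t1\t1\t1\t1\t2\t104\t92\t70\t24\t91\tworld",
])
print([w.text for w in parse_tsv_words(sample)])  # ['Hello', 'world']
```

The parser looks columns up by header name rather than by fixed position, so it still works if the column order differs slightly from the one assumed here.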
Tesseract includes a command-line utility and the libtesseract library for embedding OCR functionality in other applications. Third-party graphical front ends that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are provided: a classic engine that recognizes text at the level of individual character patterns, and a newer engine based on a machine-learning system built on an LSTM recurrent neural network. The newer engine is optimized for recognizing whole lines of text and delivers a substantial increase in accuracy. Pre-trained models are published for 123 languages. To improve performance, modules using OpenMP and the AVX2, AVX or SSE4.1 SIMD instructions are provided.
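On the command line, the engine is chosen with the `--oem` option (0 for the legacy engine only, 1 for the LSTM engine only, 2 for both combined, 3 for the default based on what is available), languages are chosen with `-l` (several joined by `+`), and output formats are selected by naming config files such as `hocr`, `alto`, `pdf` or `tsv` after the options. The sketch below only builds the argument list and does not run Tesseract. The helper `build_tesseract_cmd` and the file names are hypothetical, and `rus`/`ukr` are assumed to be installed language models.

```python
import shlex
from typing import List, Sequence

# Engine modes accepted by the tesseract CLI option --oem.
OEM = {
    "legacy": 0,       # classic character-pattern engine only
    "lstm": 1,         # LSTM neural-network engine only
    "legacy+lstm": 2,  # both engines combined
    "default": 3,      # whatever the installed models support
}

# Config names that select output renderers (assumed set for this sketch).
OUTPUT_CONFIGS = {"txt", "hocr", "alto", "pdf", "tsv"}


def build_tesseract_cmd(image: str, outbase: str, langs: Sequence[str],
                        engine: str = "lstm",
                        outputs: Sequence[str] = ("txt",)) -> List[str]:
    """Build (but do not execute) a tesseract command line."""
    if engine not in OEM:
        raise ValueError(f"unknown engine: {engine!r}")
    unknown = [o for o in outputs if o not in OUTPUT_CONFIGS]
    if unknown:
        raise ValueError(f"unknown output formats: {unknown}")
    cmd = ["tesseract", image, outbase,
           "-l", "+".join(langs),
           "--oem", str(OEM[engine])]
    cmd.extend(outputs)  # config names come after the options
    return cmd


cmd = build_tesseract_cmd("scan.png", "scan", ["rus", "ukr"],
                          engine="lstm", outputs=["hocr", "tsv"])
print(shlex.join(cmd))  # tesseract scan.png scan -l rus+ukr --oem 1 hocr tsv
```

If Tesseract is installed, the resulting list can be passed to `subprocess.run` to write both the hOCR and TSV files in a single run.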