Release of the Tesseract 5.1 text recognition system

Tesseract 5.1, an optical text recognition system, has been released. It recognizes UTF-8 characters and text in more than 100 languages, including Russian, Kazakh, Belarusian, and Ukrainian. Results can be saved as plain text or in HTML (hOCR), ALTO (XML), PDF, and TSV formats. The system was originally developed in 1985-1995 at the Hewlett Packard laboratory; in 2005 the code was opened under the Apache license and has since been developed with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.

Tesseract consists of a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party GUI front ends that support Tesseract include gImageReader, VietOCR, and YAGF. Two recognition engines are offered: a classic one that recognizes text at the level of individual character patterns, and a newer one based on machine learning with an LSTM recurrent neural network, which is optimized for recognizing whole lines and delivers a significant increase in accuracy. Ready-made trained models are published for 123 languages. To optimize performance, modules are offered that use OpenMP and the AVX2, AVX, NEON, or SSE4.1 SIMD instructions.
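As a rough illustration of how the console utility ties together the engine choice and the output formats mentioned above, the following Python sketch only assembles a command line and does not run it. The helper name `build_tesseract_command` and the file names are hypothetical; the `-l` and `--oem` options and the `txt`, `hocr`, `alto`, `pdf`, and `tsv` config names are those of the standard `tesseract` command-line tool, assuming a default installation.

```python
import shlex
from typing import Sequence

# Output config names shipped with Tesseract; each one selects a renderer.
KNOWN_FORMATS = ("txt", "hocr", "alto", "pdf", "tsv")


def build_tesseract_command(
    image: str,
    output_base: str,
    lang: str = "eng",
    formats: Sequence[str] = ("txt",),
    oem: int = 1,
) -> list[str]:
    """Return an argv list for the tesseract CLI without running it."""
    unknown = [f for f in formats if f not in KNOWN_FORMATS]
    if unknown:
        raise ValueError(f"unsupported output format(s): {unknown}")
    if oem not in (0, 1, 2, 3):
        raise ValueError("oem must be 0 (legacy), 1 (LSTM), 2 (both) or 3 (default)")
    # Layout: tesseract IMAGE OUTPUTBASE [options] [configfile ...]
    return ["tesseract", image, output_base, "-l", lang, "--oem", str(oem), *formats]


if __name__ == "__main__":
    # Hypothetical file names; languages are joined with "+".
    cmd = build_tesseract_command(
        "scan.png", "scan", lang="rus+ukr", formats=("hocr", "pdf")
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where Tesseract and the requested language models are installed. Note that `--oem 0` (the classic engine) only works with trained data files that still contain the legacy models.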

Major improvements in Tesseract 5.1:

  • Regions containing images and lines can now be emitted in ALTO, hOCR, and text output.
  • A new parameter, curl_timeout, has been added for curl_easy_setopt.
  • The build system has been improved.
  • Unused code has been removed.
  • Crashes caused by incorrect handling of null pointers in the PageIterator class have been fixed.
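The curl_timeout parameter from the list above is an ordinary Tesseract configuration variable, so it can presumably be set from the command line with the generic `-c name=value` option. The sketch below only builds such arguments. The helper `config_args`, the URL, and the value 30 are hypothetical, and it assumes a Tesseract build with libcurl support (needed for URL input) and that the value is interpreted in seconds.

```python
def config_args(variables: dict[str, object]) -> list[str]:
    """Turn {name: value} into repeated `-c name=value` CLI arguments."""
    args: list[str] = []
    for name, value in variables.items():
        args += ["-c", f"{name}={value}"]
    return args


# Hypothetical URL and timeout value, for illustration only.
cmd = [
    "tesseract",
    "https://example.com/page.png",
    "page",
    *config_args({"curl_timeout": 30}),
]
```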

Source: opennet.ru
