Release of the Tesseract 5.2 text recognition system

The Tesseract 5.2 optical character recognition system has been released. It supports recognition of UTF-8 characters and text in more than 100 languages, including Russian, Kazakh, Belarusian and Ukrainian. Results can be saved as plain text or in HTML (hOCR), ALTO (XML), PDF and TSV formats. The system was originally developed between 1985 and 1994 in a Hewlett Packard laboratory. In 2005 the code was opened under the Apache license, and it was further developed with the participation of Google employees. The project's source code is distributed under the Apache 2.0 license.
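
As an illustration of the TSV output format mentioned above, the following Python sketch reads a file produced by the tesseract command-line tool with the tsv output option and extracts the recognized words together with their confidence and bounding box. It assumes the twelve-column header that Tesseract writes (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text) and that rows with level 5 describe individual words. The function name parse_words and the sample data are hypothetical.

```python
import csv
import io

# Columns expected in Tesseract's TSV header (assumed layout, see lead-in).
TSV_COLUMNS = [
    "level", "page_num", "block_num", "par_num", "line_num", "word_num",
    "left", "top", "width", "height", "conf", "text",
]
WORD_LEVEL = 5  # assumption: level 5 rows are individual words


def parse_words(tsv_text):
    """Hypothetical helper: return (text, conf, (left, top, width, height)) per word."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    missing = set(TSV_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"not a Tesseract TSV file, missing columns: {sorted(missing)}")
    words = []
    for row in reader:
        if int(row["level"]) != WORD_LEVEL:
            continue  # skip page, block, paragraph and line rows
        text = (row.get("text") or "").strip()
        if not text:
            continue  # empty word boxes carry no text
        box = tuple(int(row[k]) for k in ("left", "top", "width", "height"))
        words.append((text, float(row["conf"]), box))
    return words


# Made-up sample in the assumed format: one page row and two word rows.
SAMPLE_TSV = (
    "\t".join(TSV_COLUMNS) + "\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t120\t30\t96.5\tTesseract\n"
    "5\t1\t1\t1\t1\t2\t170\t92\t40\t30\t91.2\t5.2\n"
)

if __name__ == "__main__":
    for word, conf, box in parse_words(SAMPLE_TSV):
        print(word, conf, box)
```

Using csv.QUOTE_NONE keeps quote characters in recognized text from being treated as CSV quoting.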

Tesseract includes a console utility and the libtesseract library for embedding OCR functionality in other applications. Third-party GUI front ends that support Tesseract include gImageReader, VietOCR and YAGF. Two recognition engines are provided: the classic engine, which recognizes text at the level of individual characters, and a newer engine based on a machine learning system built on an LSTM recurrent neural network, which is optimized for recognizing whole lines of text and allows a significant increase in accuracy. Ready-made trained models have been published for 123 languages. To optimize performance, modules using OpenMP and the SIMD instructions AVX2, AVX, AVX512F, NEON or SSE4.1 are provided.
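
The engine choice and the language models are exposed on the command line. The sketch below only assembles an argument list for the tesseract CLI, assuming its usual form `tesseract imagename outputbase [options] [configfiles]`, where --oem selects the engine (0 legacy only, 1 LSTM only, 2 both, 3 default), -l joins several languages with +, and -c sets a parameter. The helper name build_tesseract_cmd is hypothetical, and actually running the command requires the tesseract binary and the traineddata files for each requested language (for example rus and ukr) to be installed.

```python
# --oem values accepted by the tesseract CLI
OEM_LEGACY = 0    # classic character-level engine only
OEM_LSTM = 1      # LSTM neural-network engine only
OEM_COMBINED = 2  # legacy and LSTM engines combined
OEM_DEFAULT = 3   # whatever the loaded model supports


def build_tesseract_cmd(image, output_base, languages, oem=OEM_LSTM,
                        configs=(), variables=None):
    """Hypothetical helper: build (but do not run) a tesseract command line."""
    if oem not in (OEM_LEGACY, OEM_LSTM, OEM_COMBINED, OEM_DEFAULT):
        raise ValueError(f"unsupported --oem value: {oem}")
    cmd = ["tesseract", image, output_base,
           "-l", "+".join(languages), "--oem", str(oem)]
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]  # set a Tesseract parameter
    cmd += list(configs)  # output configs such as "tsv", "hocr", "alto", "pdf"
    return cmd


if __name__ == "__main__":
    # Russian + Ukrainian recognition with the LSTM engine, TSV output.
    print(build_tesseract_cmd("scan.png", "out", ["rus", "ukr"], configs=["tsv"]))
    # To execute it: subprocess.run(cmd, check=True)
```

Returning a list rather than a shell string lets it be passed straight to subprocess.run without shell quoting issues.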

Main improvements in Tesseract 5.2:

  • Added optimizations that use Intel AVX512F instructions.
  • The C API gained a function for initializing Tesseract with a machine learning model loaded from memory.
  • Added the invert_threshold parameter, which sets the threshold for inverting text lines. The default value is 0.7; to disable inversion, set it to 0.
  • Improved processing of very large documents on 32-bit hosts.
  • Code that used std::regex functions was switched to std::string.
  • Improved build scripts for Autotools, CMake and continuous integration systems.
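
The release notes say only that invert_threshold sets a threshold for inverting text lines. A plausible reading, and it is an assumption here, is that a line whose mean recognition confidence falls below the threshold is recognized a second time from an inverted image, and the more confident result is kept. The sketch models that decision with confidence normalized to [0, 1]; recognize_line, fake_recognize and invert_pixels are hypothetical stand-ins, not Tesseract functions.

```python
def recognize_line(line_image, recognize, invert, invert_threshold=0.7):
    """Hypothetical model of the invert_threshold decision (assumed semantics).

    recognize(image) -> (text, mean_confidence in [0, 1]); invert(image) -> image.
    """
    text, conf = recognize(line_image)
    # A threshold of 0 disables inversion: confidence is never below 0.
    if conf < invert_threshold:
        inv_text, inv_conf = recognize(invert(line_image))
        if inv_conf > conf:
            return inv_text, inv_conf  # the inverted image read better
    return text, conf


def invert_pixels(image):
    """Invert 8-bit grayscale pixel values."""
    return [255 - p for p in image]


def fake_recognize(image):
    """Toy recognizer: pretends dark-on-light (bright mean) text reads well."""
    mean = sum(image) / len(image)
    return ("TEXT", 0.95) if mean > 127 else ("T?XT", 0.4)


if __name__ == "__main__":
    light_on_dark = [20, 30, 240, 25]  # mostly dark pixels
    print(recognize_line(light_on_dark, fake_recognize, invert_pixels))
    print(recognize_line(light_on_dark, fake_recognize, invert_pixels,
                         invert_threshold=0))
```

On the command line the parameter can be changed like any other Tesseract variable, for example with `-c invert_threshold=0` to turn inversion off.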

    Source: opennet.ru
