EasyOCR, a new optical character recognition system

The EasyOCR project is developing a new OCR system that supports more than 40 languages, including English, German, French, Japanese, Chinese, Korean, Uzbek, Azerbaijani and Lithuanian. Cyrillic-based languages are not yet supported, but adding them is planned. The code is written in Python using the PyTorch framework and is distributed under the Apache 2.0 license. Ready-made models are available for download for languages based on the Latin alphabet and on logographic characters.

Machine learning methods are used both to detect and to recognize text in an image. Text detection uses a PyTorch implementation of the CRAFT (Character-Region Awareness For Text) algorithm, which can locate text on arbitrary objects, including labels, information plates and road signs. Character sequences are recognized with a CRNN (Convolutional Recurrent Neural Network, a combination of a DCNN and an RNN), and the CTC BeamSearch algorithm (CTC stands for Connectionist Temporal Classification) decodes the network output into text.
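To illustrate the final decoding step, here is a minimal sketch of CTC prefix beam search in plain Python. It is not EasyOCR's own code: the function names `ctc_beam_search` and `log_add`, the `beam_width` parameter, and the convention that class index 0 is the CTC blank are all assumptions made for this example. The input is assumed to be a matrix of per-frame log-probabilities, such as a CRNN might emit.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_add(a, b):
    """Return log(exp(a) + exp(b)), treating -inf as probability zero."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def ctc_beam_search(log_probs, alphabet, beam_width=3, blank=0):
    """Decode a (frames x classes) list of log-probabilities into a string.

    Hypothetical helper for illustration: alphabet[i] is the character for
    class i, and alphabet[blank] is the CTC blank (never emitted).
    """
    # prefix -> (log P(prefix, last frame is blank), log P(prefix, last frame is not blank))
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        candidates = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            p_total = log_add(p_b, p_nb)
            for c, lp in enumerate(frame):
                if c == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = candidates[prefix]
                    candidates[prefix] = (log_add(b, p_total + lp), nb)
                    continue
                extended = prefix + (c,)
                b, nb = candidates[extended]
                if prefix and prefix[-1] == c:
                    # A repeated character only extends the prefix after a blank...
                    candidates[extended] = (b, log_add(nb, p_b + lp))
                    # ...otherwise it collapses into the existing prefix.
                    b0, nb0 = candidates[prefix]
                    candidates[prefix] = (b0, log_add(nb0, p_nb + lp))
                else:
                    candidates[extended] = (b, log_add(nb, p_total + lp))
        # Keep only the most probable prefixes.
        ranked = sorted(candidates.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best = max(beams.items(), key=lambda kv: log_add(*kv[1]))[0]
    return "".join(alphabet[i] for i in best)


# Two frames, each with P(blank) = 0.6 and P("a") = 0.4.
frames = [[math.log(0.6), math.log(0.4)]] * 2
print(ctc_beam_search(frames, ["-", "a"]))  # prints "a"
```

In this toy input, picking the most likely class per frame would yield two blanks, i.e. an empty string with probability 0.36. Beam search instead sums the three alignments that collapse to "a" (0.24 + 0.24 + 0.16 = 0.64), which is why it can outperform a frame-by-frame greedy choice.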

Source: opennet.ru
