Do you need to extract information from many documents quickly and automatically, even when they are stored as scans or photographs? If you're an Amazon Web Services (AWS) customer, you're in luck. Amazon announced the opening of access to
According to Amazon, Textract is significantly more efficient than conventional optical character recognition (OCR) systems. From files stored in an Amazon S3 bucket, it can extract the contents of fields and tables while taking into account the context in which the information appears. For example, the system automatically identifies names and social security numbers on tax forms, or the totals on photographed receipts. As Amazon notes in
Textract saves results in JSON format annotated with page numbers, sections, form labels, and data types. It optionally integrates with database and analytics services such as Amazon Elasticsearch Service, Amazon DynamoDB, and Amazon Athena, and with machine learning products such as Amazon Comprehend, Amazon Comprehend Medical, Amazon Translate, and Amazon SageMaker for post-processing. Alternatively, the extracted data can be sent directly to third-party cloud services for accounting and audit compliance, or used to support smart search across document archives. According to Amazon, Textract can "accurately" process millions of pages of different documents in "just a few hours."
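To illustrate the JSON structure described above, here is a minimal Python sketch that turns Textract-style output into label/value pairs with page numbers. It assumes the response shape of the boto3 `analyze_document` call with the `FORMS` feature, where `KEY_VALUE_SET` blocks link to their value and to child `WORD` blocks through `Relationships`. The helper names (`fetch_blocks`, `form_fields`) and the `SAMPLE_BLOCKS` data are hypothetical, made up for this example rather than taken from Amazon's materials.

```python
from typing import Any, Dict, List

Block = Dict[str, Any]


def fetch_blocks(bucket: str, key: str) -> List[Block]:
    """Run Textract on an image stored in S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing code below runs without AWS

    client = boto3.client("textract")
    response = client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS", "TABLES"],
    )
    return response["Blocks"]


def _child_text(block: Block, by_id: Dict[str, Block]) -> str:
    """Join the text of all WORD blocks that are children of `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def form_fields(blocks: List[Block]) -> List[Dict[str, Any]]:
    """Return one {"page", "label", "value"} record per detected form field."""
    by_id = {b["Id"]: b for b in blocks}
    fields = []
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value = " ".join(_child_text(by_id[i], by_id) for i in rel["Ids"])
        fields.append(
            {"page": block.get("Page"), "label": _child_text(block, by_id), "value": value}
        )
    return fields


# Hand-built illustration data (an assumption, not real Textract output):
# one form field "Total:" with the value "42.50" on page 1.
SAMPLE_BLOCKS: List[Block] = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"], "Page": 1,
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"], "Page": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Total:", "Page": 1},
    {"Id": "w2", "BlockType": "WORD", "Text": "42.50", "Page": 1},
]

if __name__ == "__main__":
    for field in form_fields(SAMPLE_BLOCKS):
        print(field)
```

With real credentials, `form_fields(fetch_blocks("my-bucket", "receipt.png"))` would apply the same parsing to a live response; the bucket and file names here are placeholders. The resulting records could then be written to a database or search index of the kind mentioned above.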
Many AWS customers already use Textract, including The Globe and Mail, the UK's national weather service, PricewaterhouseCoopers, the non-profit managed care organization Healthfirst, and the robotic process automation companies UiPath, Ripcord, and Blue Prism. Candor, a startup that aims to bring transparency to the mortgage industry, uses Textract to extract data from documents such as bank statements, pay stubs, and various tax documents to speed up the loan approval process for its clients.
"The power of Amazon Textract is that it accurately extracts textual and structured data from virtually any document without the need for prior machine learning," said Swami Sivasubramanian, vice president of Amazon Machine Learning. "In addition to integrating with other AWS services, the large community growing around Amazon Textract enables our customers to get real value out of their file collections, work more efficiently, improve security compliance, automate data entry, and accelerate business decisions."
Below you can watch Textract's presentation at re:Invent 2018 in English.
Source: 3dnews.ru