Amazon launches cloud-based document recognition service

Need to extract information from large numbers of documents quickly and automatically, even when they are stored as scans or photographs? If you are an Amazon Web Services (AWS) customer, you're in luck. Amazon has announced the opening of access to Textract, a fully managed cloud service that uses machine learning to analyze tables, forms, and entire pages of text in popular electronic formats. For now it is available only in select AWS regions, specifically US East (Ohio and Northern Virginia), US West (Oregon), and EU (Ireland), with wider public availability expected next year.

According to Amazon, Textract is significantly more effective than conventional optical character recognition (OCR) systems. Working from files stored in an Amazon S3 bucket, it extracts the contents of fields and tables while taking into account the context in which the information appears. For example, the system automatically picks out names and social security numbers on tax forms, or the totals on photographed receipts. As Amazon notes in a press release, Textract accepts scans, PDFs, and photos, and handles context well in documents specific to financial services, insurance, and healthcare.
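For readers curious what such a request looks like in practice, here is a minimal Python sketch. It assumes the boto3 SDK and its `analyze_document` call for Textract; the helper names `build_request` and `analyze`, the default region, and the bucket and file names are our own illustrative choices, not something taken from Amazon's announcement.

```python
def build_request(bucket, key, features=("TABLES", "FORMS")):
    """Build keyword arguments for Textract's AnalyzeDocument call.

    The document is referenced where it already sits in S3,
    rather than being uploaded as raw bytes.
    """
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": list(features),
    }


def analyze(bucket, key, region="us-east-1"):
    """Send the request to Textract (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_request works without the SDK

    client = boto3.client("textract", region_name=region)
    return client.analyze_document(**build_request(bucket, key))


if __name__ == "__main__":
    # Hypothetical bucket and object names, for illustration only.
    print(build_request("example-bucket", "scans/receipt.png"))
```

Asking for both `TABLES` and `FORMS` requests table cells and form fields in addition to plain text. This synchronous call suits single images; multi-page documents would normally go through the asynchronous `start_document_analysis` operation instead.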

Textract saves its results as JSON annotated with page numbers, sections, form labels, and data types. It optionally integrates with database and analytics services such as Amazon Elasticsearch Service, Amazon DynamoDB, and Amazon Athena, and with machine learning products such as Amazon Comprehend, Amazon Comprehend Medical, Amazon Translate, and Amazon SageMaker for post-processing. Alternatively, the extracted data can be sent directly to third-party cloud services for accounting and audit compliance, or used to support smart search across document archives. According to Amazon, Textract can "accurately" process millions of pages of different documents in "just a few hours."
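To give a feel for that JSON, the sketch below pulls form label and value pairs out of a Textract-style response in Python. The `SAMPLE` data is hand-written for illustration and far smaller than a real response, and the function names are hypothetical; the block fields used (`BlockType`, `EntityTypes`, `Relationships`, `Page`) follow the Textract response layout as we understand it.

```python
def block_text(block, blocks_by_id):
    """Join the text of the WORD blocks listed as a block's children."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Return a list of {page, key, value} dicts for each form field."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = []
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":  # link from a key to its value block(s)
                value_text = " ".join(
                    block_text(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]
                )
        pairs.append({
            "page": block.get("Page"),
            "key": block_text(block, blocks_by_id),
            "value": value_text,
        })
    return pairs


# Hand-made sample imitating one form field on a receipt (illustrative only).
SAMPLE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"], "Page": 1,
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"], "Page": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Total:", "Page": 1},
        {"Id": "w2", "BlockType": "WORD", "Text": "42.50", "Page": 1},
        {"Id": "w3", "BlockType": "WORD", "Text": "USD", "Page": 1},
    ]
}

if __name__ == "__main__":
    print(extract_key_values(SAMPLE))
```

A flat list of dictionaries like this is easy to hand on to a database or analytics service of the kind mentioned above.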

Many AWS customers already use Textract, including The Globe and Mail; the Met Office, the UK's national weather service; PricewaterhouseCoopers; Healthfirst, a non-profit managed care organization; and the robotic process automation companies UiPath, Ripcord, and Blue Prism. Candor, a startup that aims to bring transparency to the mortgage industry, uses Textract to extract data from documents such as bank statements, pay stubs, and various tax documents to speed up loan approval for its clients.

"The power of Amazon Textract is that it accurately extracts textual and structured data from virtually any document without the need for prior machine learning experience," said Swami Sivasubramanian, vice president of Amazon Machine Learning. "In addition to integrating with other AWS services, the large community growing around Amazon Textract enables our customers to get real value out of their file collections, work more efficiently, improve security compliance, automate data entry, and accelerate business decisions."

Source: 3dnews.ru
