Amazon Textract

Up · Free · Open Source · cloud by Amazon Web Services · ★ 62 · JavaScript · MIT

Amazon Textract is a document OCR and structured extraction service. Beyond the plain text that traditional OCR produces, it recognizes forms, tables, and signatures.


Use it when

You need tables extracted with their row/column structure preserved

Watch for

Handwriting accuracy lags printed text

First check

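Call Textract.analyzeDocument (synchronous, for small documents) or startDocumentAnalysis (asynchronous, for large PDFs). Results return as a list of blocks (PAGE, LINE, WORD, KEY_VALUE_SET, TABLE, CELL, etc.) that reference one another through relationships to form a tree.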

Auth

AWS Signature Version 4 (IAM access key and secret)

CORS

No

HTTPS

Yes

Signup

Required

Latency

42 ms

Protocol

REST

Pricing

paid

Stars

62

Uptime · 30-day window

Probes: 30 · Uptime: 100% · Avg latency: 48 ms

GitHub activity

★ 62 · JavaScript · MIT · 17 open issues · Last commit 155d ago

About this API

Textract's key advantage over traditional OCR is structured output. Plain OCR turns images into plain text; Textract additionally tells you which row/column of which table the text sits in, which key maps to which value, and which lines form a paragraph. That makes invoice, form, and contract analysis tractable without elaborate post-processing.
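The key-to-value mapping described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the FORMS response shape in which KEY_VALUE_SET blocks carry an EntityTypes field and point to their value and word blocks through Relationships. The sample data and the helper names (`words_under`, `extract_key_values`) are hypothetical and exist only for this example.

```python
# Hand-written sample imitating a FORMS response; not real Textract output.
SAMPLE_FORM_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Text": "No:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "12345"},
]


def words_under(block, by_id):
    """Join the Text of every WORD block listed in the block's CHILD relationships."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(blocks):
    """Return {key text: value text} for every KEY block in the list."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        # A KEY block points at its VALUE block(s) through a VALUE relationship.
        value_ids = [
            vid
            for rel in block.get("Relationships", [])
            if rel["Type"] == "VALUE"
            for vid in rel["Ids"]
        ]
        value_text = " ".join(words_under(by_id[vid], by_id) for vid in value_ids)
        pairs[words_under(block, by_id)] = value_text
    return pairs


if __name__ == "__main__":
    print(extract_key_values(SAMPLE_FORM_BLOCKS))
```

The sketch ignores checkboxes (SELECTION_ELEMENT blocks) and repeated keys. A production parser would likely need to handle both.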

It comes in several modes. DetectDocumentText is plain OCR. AnalyzeDocument adds FORMS (key/value), TABLES, and SIGNATURES. AnalyzeExpense handles invoices and receipts. AnalyzeID handles IDs, passports, and driver's licenses. DetectDocumentText and AnalyzeDocument return a list of blocks linked into a tree, and developers walk that tree to reassemble business structure (e.g. stitch table cells into a 2D array).
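Stitching table cells into a 2D array might look like the sketch below. It assumes the TABLES response shape in which a TABLE block lists its CELL blocks as children and each CELL carries 1-based RowIndex and ColumnIndex values. The sample blocks and function names are made up for illustration.

```python
# Hand-written sample imitating a TABLES response; not real Textract output.
SAMPLE_TABLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},  # empty cell
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Widget"},
]


def child_ids(block):
    """Yield the Ids listed in a block's CHILD relationships."""
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            yield from rel["Ids"]


def cell_text(cell, by_id):
    """Join the WORD children of a CELL into one string ('' if the cell is empty)."""
    return " ".join(
        by_id[i]["Text"] for i in child_ids(cell) if by_id[i]["BlockType"] == "WORD"
    )


def table_to_grid(table, by_id):
    """Place each CELL's text at [RowIndex - 1][ColumnIndex - 1] of a 2D list."""
    cells = [by_id[i] for i in child_ids(table) if by_id[i]["BlockType"] == "CELL"]
    if not cells:
        return []
    n_rows = max(c["RowIndex"] for c in cells)
    n_cols = max(c["ColumnIndex"] for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = cell_text(cell, by_id)
    return grid


def tables_to_grids(blocks):
    """Return one 2D list per TABLE block found in the response."""
    by_id = {b["Id"]: b for b in blocks}
    return [table_to_grid(b, by_id) for b in blocks if b["BlockType"] == "TABLE"]


if __name__ == "__main__":
    for row in tables_to_grids(SAMPLE_TABLE_BLOCKS)[0]:
        print(row)
```

Merged cells (row or column spans) are not handled here; each cell's text is written only to its top-left position.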

Asian-language support including Chinese is limited; Textract serves English-document workflows best. For high-volume Chinese document processing, evaluate Chinese-cloud OCR services in parallel.

What you can build

1. Automate invoice and receipt entry
2. Extract structured data from PDF tables
3. Bulk-parse resumes
4. Pull info from KYC documents

Strengths & limitations

Strengths

Tables preserved with row/column structure
Forms parsed as key/value pairs
Dedicated invoice and receipt analysis mode (AnalyzeExpense)

Limitations

Handwriting accuracy lags printed text
Limited support for Asian languages including Chinese
Per-page pricing — high-volume document processing is expensive

Getting started

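Call Textract.analyzeDocument (synchronous, for small documents) or startDocumentAnalysis (asynchronous, for large PDFs). Results return as a list of blocks (PAGE, LINE, WORD, KEY_VALUE_SET, TABLE, CELL, etc.) that reference one another through relationships to form a tree.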
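A synchronous call could be sketched as follows in Python, where the boto3 equivalent of analyzeDocument is `analyze_document`. The function takes the client as a parameter so the example runs without AWS credentials. `StubTextractClient` is a hypothetical stand-in that returns canned blocks; in real use you would pass `boto3.client("textract")` instead.

```python
from collections import Counter


def analyze_small_document(client, document_bytes, features=("TABLES", "FORMS")):
    """Send document bytes to the synchronous API and return the list of blocks."""
    response = client.analyze_document(
        Document={"Bytes": document_bytes},
        FeatureTypes=list(features),
    )
    return response["Blocks"]


def summarize_block_types(blocks):
    """Count how many blocks of each BlockType came back."""
    return dict(Counter(b["BlockType"] for b in blocks))


class StubTextractClient:
    """Hypothetical stand-in for boto3.client('textract'); returns canned data."""

    def analyze_document(self, Document, FeatureTypes):
        # Record the request so callers can inspect what would have been sent.
        self.last_request = {"Document": Document, "FeatureTypes": FeatureTypes}
        return {
            "Blocks": [
                {"Id": "p1", "BlockType": "PAGE"},
                {"Id": "l1", "BlockType": "LINE", "Text": "Total: 42"},
                {"Id": "w1", "BlockType": "WORD", "Text": "Total:"},
                {"Id": "w2", "BlockType": "WORD", "Text": "42"},
            ]
        }


if __name__ == "__main__":
    # Real use (assumes boto3 is installed and AWS credentials are configured):
    #   import boto3
    #   client = boto3.client("textract")
    client = StubTextractClient()
    blocks = analyze_small_document(client, b"fake image bytes")
    print(summarize_block_types(blocks))
```

Counting block types is a quick sanity check that the requested features actually produced TABLE or KEY_VALUE_SET blocks before you write tree-walking code.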

FAQ

How is it priced?

Per page: plain OCR ~$1.50/1000 pages; FORMS/TABLES mode ~$15–50/1000, depending on which features are enabled. Specialized modes such as AnalyzeExpense are billed at their own per-page rates, above plain OCR.
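As a rough worked example of the per-page arithmetic, the sketch below uses only the approximate figures quoted in this answer. The rate table and function name are illustrative assumptions, not official AWS pricing, which varies by region and volume.

```python
# Approximate USD rates per 1000 pages, taken from the figures quoted above.
# Each entry is a (low, high) range; these are NOT authoritative prices.
APPROX_RATES_PER_1000_PAGES = {
    "ocr": (1.50, 1.50),
    "forms_tables": (15.00, 50.00),
}


def estimate_cost(pages, mode):
    """Return a (low, high) cost estimate in USD for the given page count."""
    low, high = APPROX_RATES_PER_1000_PAGES[mode]
    return (pages * low / 1000, pages * high / 1000)


if __name__ == "__main__":
    for mode in APPROX_RATES_PER_1000_PAGES:
        low, high = estimate_cost(20_000, mode)
        print(f"{mode}: ${low:.2f} to ${high:.2f} for 20,000 pages")
```

At 20,000 pages, plain OCR comes to about $30, while FORMS/TABLES analysis lands somewhere between about $300 and $1,000 under these assumed rates.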

Does it work for Chinese?

Limited. Printed Chinese is sometimes recognized but accuracy drops for complex layouts and tables compared with English.

How do I handle large PDFs?

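Use startDocumentAnalysis to run an async job on a document stored in S3, with an SNS notification on completion. Then fetch the results with getDocumentAnalysis. Results paginate, so keep passing the returned NextToken until no token comes back.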
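The async flow and the NextToken loop could be sketched like this, using the boto3 method names `start_document_analysis` and `get_document_analysis`. The client is injected so the example runs offline. `StubAsyncClient`, the bucket, the key, and the ARNs are all hypothetical placeholders. The sketch assumes you call `fetch_all_blocks` only after the SNS notification reports that the job finished.

```python
def start_analysis(client, bucket, key, sns_topic_arn, role_arn,
                   features=("TABLES", "FORMS")):
    """Start an async analysis job for a document in S3 and return its JobId."""
    response = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=list(features),
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]


def fetch_all_blocks(client, job_id):
    """Collect blocks from every result page by following NextToken."""
    blocks = []
    token = None
    while True:
        kwargs = {"JobId": job_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_document_analysis(**kwargs)
        if page["JobStatus"] != "SUCCEEDED":
            raise RuntimeError(f"job {job_id} is {page['JobStatus']}")
        blocks.extend(page.get("Blocks", []))
        token = page.get("NextToken")
        if not token:
            return blocks


class StubAsyncClient:
    """Hypothetical stand-in for boto3.client('textract') serving two result pages."""

    def __init__(self):
        self._pages = {
            None: {"JobStatus": "SUCCEEDED",
                   "Blocks": [{"Id": "a"}, {"Id": "b"}],
                   "NextToken": "t1"},
            "t1": {"JobStatus": "SUCCEEDED", "Blocks": [{"Id": "c"}]},
        }

    def start_document_analysis(self, **kwargs):
        self.start_kwargs = kwargs
        return {"JobId": "job-123"}

    def get_document_analysis(self, JobId, NextToken=None):
        return self._pages[NextToken]


if __name__ == "__main__":
    client = StubAsyncClient()
    job_id = start_analysis(
        client, "example-bucket", "scans/report.pdf",
        "arn:aws:sns:us-east-1:000000000000:example-topic",   # placeholder ARN
        "arn:aws:iam::000000000000:role/example-role",        # placeholder ARN
    )
    print(job_id, [b["Id"] for b in fetch_all_blocks(client, job_id)])
```

Raising on any status other than SUCCEEDED keeps the loop simple. A polling variant would instead sleep and retry while the status is IN_PROGRESS.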

Technical details

CORS: No · HTTPS: Yes · Signup: Yes · Open source: Yes

Auth type: AWS Signature Version 4 (IAM access key and secret)
Pricing: paid
Protocols: REST
SDKs: python, javascript, java, go, ruby, csharp
Response time: 42 ms
Last health check: 6/26/2026, 6:22:16 AM

Endpoints

Parsed from the OpenAPI spec. Showing 12 of 13 non-deprecated endpoints.

Every endpoint is a POST to the service root and requires the X-Amz-Target header.

POST /#X-Amz-Target=Textract.AnalyzeDocument
POST /#X-Amz-Target=Textract.AnalyzeExpense
POST /#X-Amz-Target=Textract.AnalyzeID
POST /#X-Amz-Target=Textract.DetectDocumentText
POST /#X-Amz-Target=Textract.GetDocumentAnalysis
POST /#X-Amz-Target=Textract.GetDocumentTextDetection
POST /#X-Amz-Target=Textract.GetExpenseAnalysis
POST /#X-Amz-Target=Textract.GetLendingAnalysis
POST /#X-Amz-Target=Textract.GetLendingAnalysisSummary
POST /#X-Amz-Target=Textract.StartDocumentAnalysis
POST /#X-Amz-Target=Textract.StartDocumentTextDetection
POST /#X-Amz-Target=Textract.StartExpenseAnalysis

1 more endpoint not shown. See the OpenAPI spec for the full list.
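Since every operation is selected by the X-Amz-Target header rather than by URL path, a raw request can be pictured with the sketch below. It assumes the usual regional endpoint pattern and the AWS JSON 1.1 content type. The function name and the returned dict layout are hypothetical, and the request is deliberately left unsigned: a real call also needs a Signature Version 4 Authorization header, which the AWS SDKs add for you.

```python
import json


def build_textract_request(operation, payload, region="us-east-1"):
    """Describe an UNSIGNED Textract HTTP request for the given operation name."""
    return {
        "method": "POST",
        # All operations share the service root; the header picks the action.
        "url": f"https://textract.{region}.amazonaws.com/",
        "headers": {
            "Content-Type": "application/x-amz-json-1.1",
            "X-Amz-Target": f"Textract.{operation}",
        },
        "body": json.dumps(payload),
    }


if __name__ == "__main__":
    request = build_textract_request(
        "GetDocumentAnalysis",
        {"JobId": "job-123"},  # placeholder job id
    )
    print(request["headers"]["X-Amz-Target"])
    print(request["body"])
```

In practice an SDK is the simpler route, because request signing is easy to get wrong by hand.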

More from Amazon Web Services

View all from Amazon Web Services →

Access Analyzer

★ 62

AWS IAM Access Analyzer API analyzes IAM resource policies for over-privileged access or external access — proactively surfaces security risks.

Alexa For Business

★ 62

Alexa for Business helps you use Alexa in your organization.

Amazon API Gateway

★ 62

Amazon API Gateway helps developers deliver robust, secure, and scalable mobile and web application back ends.

Amazon AppConfig

★ 62

Use AppConfig, a capability of Amazon Web Services Systems Manager, to create, manage, and quickly deploy application configurations.

Amazon Appflow

★ 62

Welcome to the Amazon AppFlow API reference.

Amazon AppIntegrations Service

★ 62

The Amazon AppIntegrations service enables you to configure and reuse connections to external applications.

Amazon AppStream

★ 62

Amazon AppStream 2.0 API Reference.

Amazon Athena

★ 62

Amazon Athena is an interactive query service that lets you use standard SQL to analyze data directly in Amazon S3.
