Amazon Textract

Up · Free · Open Source · cloud · by Amazon Web Services · 62 stars · JavaScript · MIT

Amazon Textract is a document OCR and structured-extraction service that recognizes forms, tables, and signatures, going well beyond traditional OCR.

Use it when

Tables preserved with row/column structure

Watch for

Handwriting accuracy lags printed text

First check

Call Textract.analyzeDocument (sync, small docs) or startDocumentAnalysis (async, large PDFs). Results return as a list of blocks (PAGE, LINE, WORD, KEY_VALUE_SET, TABLE, CELL, etc.) linked into a tree.

Auth
AWS Signature Version 4 (IAM access key and secret key)
CORS
No
HTTPS
Yes
Signup
Required
Latency
10 ms
Protocol
REST
Pricing
paid
Stars
62

Uptime · 30-day window

Probes: 1 · Uptime: 100% · Avg latency: 10 ms

GitHub activity

62 stars · JavaScript · MIT · 17 open issues · Last commit 110d ago
01

About this API

Textract's key advantage over traditional OCR is structured output. Plain OCR turns images into plain text; Textract additionally tells you which row/column of which table the text sits in, which key maps to which value, and which lines form a paragraph. That makes invoice, form, and contract analysis tractable without elaborate post-processing.
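The key-to-value mapping described above can be sketched in a few lines. This is a minimal illustration that assumes the response shape used by AnalyzeDocument with FORMS: KEY_VALUE_SET blocks carry an EntityTypes entry of KEY or VALUE, a key points at its value through a VALUE relationship, and both point at their words through CHILD relationships. The helper names and the sample data are made up for the example and are not real Textract output.

```python
def block_text(block, by_id):
    """Join the WORD children of a block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map each KEY block's text to the text of its linked VALUE block(s)."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_ids = [i for rel in block.get("Relationships", [])
                     if rel["Type"] == "VALUE" for i in rel["Ids"]]
        value = " ".join(block_text(by_id[i], by_id) for i in value_ids)
        pairs[block_text(block, by_id)] = value
    return pairs


# Hand-written sample in the assumed response shape, not real Textract output.
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Text": "No:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "A-1043"},
]

print(key_value_pairs(sample_blocks))  # {'Invoice No:': 'A-1043'}
```

Checkboxes (SELECTION_ELEMENT blocks) and duplicate key names are ignored here to keep the sketch short; a production parser would need to handle both.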

It comes in several modes. DetectDocumentText is plain OCR. AnalyzeDocument adds FORMS (key/value), TABLES, and SIGNATURES. AnalyzeExpense handles invoices and receipts. AnalyzeID handles IDs, passports, and driver's licenses. DetectDocumentText and AnalyzeDocument return a block tree, which developers walk to reassemble business structure (e.g. stitch table cells into a 2D array). AnalyzeExpense and AnalyzeID instead return fields already grouped per document (ExpenseDocuments and IdentityDocuments).
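Stitching table cells into a 2D array could look roughly like the sketch below. It assumes that a TABLE block lists its CELL blocks through a CHILD relationship and that each CELL carries 1-based RowIndex and ColumnIndex values. The function names and the sample blocks are illustrative only, and merged cells (row or column spans) are not handled.

```python
def cell_text(cell, by_id):
    """Join the WORD children of a CELL block; empty cells yield ''."""
    ids = [i for rel in cell.get("Relationships", [])
           if rel["Type"] == "CHILD" for i in rel["Ids"]]
    return " ".join(by_id[i]["Text"] for i in ids
                    if by_id[i]["BlockType"] == "WORD")


def table_to_grid(table, blocks):
    """Turn one TABLE block into a list of rows, each a list of cell strings."""
    by_id = {b["Id"]: b for b in blocks}
    cells = [by_id[i] for rel in table.get("Relationships", [])
             if rel["Type"] == "CHILD" for i in rel["Ids"]
             if by_id[i]["BlockType"] == "CELL"]
    if not cells:
        return []
    n_rows = max(c["RowIndex"] for c in cells)
    n_cols = max(c["ColumnIndex"] for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        # RowIndex and ColumnIndex are 1-based, so shift to 0-based positions.
        grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = cell_text(cell, by_id)
    return grid


# Hand-written sample in the assumed response shape, not real Textract output.
sample_table_blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Pen"},
]

print(table_to_grid(sample_table_blocks[0], sample_table_blocks))
# [['Item', 'Qty'], ['Pen', '']]
```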

Asian-language support including Chinese is limited; Textract serves English-document workflows best. For high-volume Chinese document processing, evaluate Chinese-cloud OCR services in parallel.

02

What you can build

  • Automate invoice and receipt entry
  • Extract structured data from PDF tables
  • Bulk-parse resumes
  • Pull info from KYC documents
03

Strengths & limitations

Strengths

  • Tables preserved with row/column structure
  • Forms parsed as key/value pairs
  • Dedicated Invoices / Receipts analysis modes

Limitations

  • Handwriting accuracy lags printed text
  • Limited support for Asian languages including Chinese
  • Per-page pricing — high-volume document processing is expensive
04

Example request

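Sketch only: assumes the us-east-1 regional endpoint and curl 7.75+ (for --aws-sigv4). Textract requests are signed with AWS Signature Version 4 rather than a bearer token; replace <bucket> and <key> with your own S3 object and verify the details in the AWS docs.
curl https://textract.us-east-1.amazonaws.com/ \
  --aws-sigv4 "aws:amz:us-east-1:textract" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  -H "Content-Type: application/x-amz-json-1.1" \
  -H "X-Amz-Target: Textract.DetectDocumentText" \
  -d '{"Document":{"S3Object":{"Bucket":"<bucket>","Name":"<key>"}}}'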
05

Getting started

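Call Textract.analyzeDocument for synchronous analysis of single images or small documents, or startDocumentAnalysis for asynchronous analysis of large or multi-page PDFs stored in S3. Results come back as a flat list of Block objects (PAGE, LINE, WORD, KEY_VALUE_SET, TABLE, CELL, etc.) whose Relationships entries link them into a tree.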
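A synchronous call might be wired up as in the sketch below, which uses the Python SDK (boto3) rather than the JavaScript method names above. The helper names, the default region, and the choice to send raw bytes instead of an S3 reference are assumptions made for illustration. The request builder is kept separate so it can be checked without AWS credentials.

```python
# Feature names taken from the modes listed on this page.
ALLOWED_FEATURES = {"FORMS", "TABLES", "SIGNATURES"}


def analyze_request(document_bytes, features=("FORMS", "TABLES")):
    """Build keyword arguments for a synchronous analyze_document call."""
    unknown = set(features) - ALLOWED_FEATURES
    if unknown:
        raise ValueError(f"unsupported feature types: {sorted(unknown)}")
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": list(features),
    }


def analyze_file(path, region="us-east-1"):
    """Send one local image to Textract (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    client = boto3.client("textract", region_name=region)
    with open(path, "rb") as handle:
        return client.analyze_document(**analyze_request(handle.read()))
```

Calling analyze_file("invoice.png") would return a response dict whose "Blocks" list is the flat block list described above; the file name is a placeholder.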

06

FAQ

How is it priced?

Per page: plain OCR is about $1.50 per 1,000 pages; FORMS/TABLES modes run about $15–50 per 1,000 pages. Specialized modes (AnalyzeExpense) cost more.
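For a rough budget, the per-page arithmetic can be sketched as follows. The rates are the approximate figures quoted in this answer, not live AWS pricing, and the function names are made up for the example.

```python
# Approximate USD rates per 1,000 pages, copied from the answer above.
OCR_RATE = 1.50
FORMS_TABLES_RATE_RANGE = (15.0, 50.0)


def estimate_cost(pages, rate_per_1000):
    """Cost in USD for a page count at a given per-1,000-page rate."""
    return pages / 1000 * rate_per_1000


def estimate_range(pages, rate_range):
    """Low and high cost estimates for a (low, high) rate range."""
    low, high = rate_range
    return estimate_cost(pages, low), estimate_cost(pages, high)


print(estimate_cost(20_000, OCR_RATE))                  # 30.0
print(estimate_range(20_000, FORMS_TABLES_RATE_RANGE))  # (300.0, 1000.0)
```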

Does it work for Chinese?

Limited. Printed Chinese is sometimes recognized but accuracy drops for complex layouts and tables compared with English.

How do I handle large PDFs?

Use startDocumentAnalysis as an async job with an SNS notification, then fetch the results with GetDocumentAnalysis. Results paginate, so iterate NextToken to fetch everything.
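The NextToken loop could be sketched like this. It assumes the job has already finished (for example, after the SNS notification arrived) and that the client exposes boto3's get_document_analysis method; the collect_blocks name is hypothetical, and any object with a compatible method works, which keeps the loop easy to test.

```python
def collect_blocks(client, job_id):
    """Fetch every page of a finished analysis job and return all blocks."""
    blocks = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.get_document_analysis(**kwargs)
        if page.get("JobStatus") != "SUCCEEDED":
            # Covers IN_PROGRESS, FAILED, and PARTIAL_SUCCESS; this sketch
            # leaves retry and partial-result handling to the caller.
            raise RuntimeError(f"job {job_id} not ready: {page.get('JobStatus')}")
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
        if not next_token:
            return blocks
```

With boto3 installed, a client created by boto3.client("textract") can be passed in directly, along with the JobId returned by the start call.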

07

Technical details

CORS: No · HTTPS: Yes · Signup: Yes · Open source: Yes
Auth type
AWS Signature Version 4 (IAM access key and secret key)
Pricing
paid
Protocols
REST
SDKs
python, javascript, java, go, ruby, csharp
Response time
10 ms
Last health check
5/12/2026, 7:36:34 AM
08

Endpoints

Parsed from the OpenAPI spec. Showing 12 of 13 non-deprecated endpoints.

POST /#X-Amz-Target=Textract.AnalyzeDocument · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.AnalyzeExpense · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.AnalyzeID · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.DetectDocumentText · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.GetDocumentAnalysis · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.GetDocumentTextDetection · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.GetExpenseAnalysis · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.GetLendingAnalysis · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.GetLendingAnalysisSummary · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.StartDocumentAnalysis · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.StartDocumentTextDetection · X-Amz-Target: header*
POST /#X-Amz-Target=Textract.StartExpenseAnalysis · X-Amz-Target: header*

1 more endpoint not shown. See the OpenAPI spec for the full list.
