Cloud Document AI API
Google Document AI API processes documents with machine learning (OCR, form parsing, contract data extraction, invoice extraction) using pre-trained and custom models.
About this API
Document AI is Google Cloud's Intelligent Document Processing (IDP) product, positioned as "smarter than generic OCR": rather than only converting images to text, it models document structure and semantics, extracting key-value pairs, tables, signature locations, and field meanings. The product is organized around processors, each an ML model for a specific document type, such as Invoice Parser, ID Document Parser, Form Parser, or Contract Parser. Pre-trained processors work out of the box and give high-quality results on common documents. For complex scenarios such as industry-specific forms, you can train custom processors in Document AI Workbench by labeling a few dozen samples in the UI. Compared with AWS Textract and Azure Form Recognizer, Document AI is reported to lead on accuracy in some scenarios, such as invoice field extraction.
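Because each processor is a separate model with its own resource name, an application that handles several document types usually routes each file to the right processor. A minimal sketch, assuming a hypothetical `PROCESSORS` table whose IDs you would copy from the console after creating each processor, with a fallback to Form Parser for unregistered types:

```python
from dataclasses import dataclass

# Hypothetical processor IDs; copy the real ones from each processor's
# details page after creating it.
PROCESSORS = {
    "invoice": "INVOICE_PROCESSOR_ID",
    "id_document": "ID_PROCESSOR_ID",
    "form": "FORM_PROCESSOR_ID",
}


@dataclass(frozen=True)
class ProcessorRoute:
    project: str
    location: str  # Document AI location, e.g. "us" or "eu"

    def resource_name(self, doc_type: str) -> str:
        """Processor resource name for doc_type.

        Falls back to the general-purpose Form Parser when no specialized
        processor is registered for the type.
        """
        processor_id = PROCESSORS.get(doc_type, PROCESSORS["form"])
        return (
            f"projects/{self.project}/locations/{self.location}"
            f"/processors/{processor_id}"
        )

    def endpoint(self, doc_type: str) -> str:
        """Full REST URL of the :process method on the regional host."""
        return (
            f"https://{self.location}-documentai.googleapis.com/v1/"
            f"{self.resource_name(doc_type)}:process"
        )
```

For example, `ProcessorRoute("my-project", "us").resource_name("invoice")` yields `projects/my-project/locations/us/processors/INVOICE_PROCESSOR_ID`.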
What you can build
- Auto-extract invoice amount, account, and due date
- ID/passport OCR with field recognition
- Extract key contract terms (amount, effective date)
- Automatic bank statement ingestion
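For the invoice use case, specialized processors return results as typed entities in `document.entities`. The sketch below pulls a few fields out of the REST JSON response of a `:process` call. The entity type names in `WANTED` and the 0.5 confidence cutoff are assumptions to check against your processor's schema, and `extract_invoice_fields` is a hypothetical helper, not part of any SDK.

```python
from typing import Any

# Assumed Invoice Parser entity types; confirm against your processor's schema.
WANTED = ("invoice_id", "total_amount", "due_date", "supplier_name")


def extract_invoice_fields(
    response: dict[str, Any], min_confidence: float = 0.5
) -> dict[str, str]:
    """Pick selected top-level entities out of a :process JSON response.

    Prefers the normalized text (e.g. an ISO date) when the processor
    supplies one, and keeps only the highest-confidence mention per type.
    """
    best: dict[str, tuple[float, str]] = {}
    for entity in response.get("document", {}).get("entities", []):
        etype = entity.get("type", "")
        conf = float(entity.get("confidence", 0.0))
        if etype not in WANTED or conf < min_confidence:
            continue
        value = (
            entity.get("normalizedValue", {}).get("text")
            or entity.get("mentionText", "")
        )
        if etype not in best or conf > best[etype][0]:
            best[etype] = (conf, value)
    return {etype: value for etype, (_, value) in best.items()}
```

Keeping the best mention per type matters because a parser can return several candidates for the same field (for instance, two dates that both look like a due date).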
Strengths & limitations
Strengths
- Pre-trained processors cover common types: invoice, ID, receipt
- Workbench tool supports custom model training
- Form Parser handles general forms well
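Form Parser reports key-value pairs as `formFields` on each page. Instead of repeating text, each field name and value carries a `textAnchor` whose segments are offsets into the top-level `document.text`. A minimal stdlib sketch of resolving them, assuming the REST JSON shape of the response (where int64 offsets are encoded as strings and a zero `startIndex` may be omitted); `anchor_text` and `form_fields` are hypothetical helper names:

```python
from typing import Any


def anchor_text(document: dict[str, Any], layout: dict[str, Any]) -> str:
    """Resolve a layout's textAnchor into the text it points at.

    Segment offsets index into document["text"]; they may arrive as
    strings, and a missing startIndex means 0.
    """
    text = document.get("text", "")
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    return "".join(
        text[int(seg.get("startIndex", 0)) : int(seg.get("endIndex", 0))]
        for seg in segments
    ).strip()


def form_fields(document: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (field name, field value) pairs from every page, in order."""
    pairs: list[tuple[str, str]] = []
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            name = anchor_text(document, field.get("fieldName", {}))
            value = anchor_text(document, field.get("fieldValue", {}))
            # Labels usually end with a colon ("Name:"); drop it.
            pairs.append((name.rstrip(":"), value))
    return pairs
```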
Limitations
- Per-page pricing — high-volume cost adds up
- Specialty forms (e.g. health-insurance forms) require custom training
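Because billing is per page, a rough monthly estimate is simple arithmetic. The rates below are placeholders, not Google's prices, and the sketch assumes marginal volume tiers (pages above a threshold billed at a lower rate); substitute the current figures for your processor.

```python
from math import inf

# Placeholder tiers: (cumulative monthly page limit, USD per 1,000 pages).
# These are NOT real Document AI prices.
TIERS: list[tuple[float, float]] = [(1_000_000, 30.0), (inf, 20.0)]


def monthly_cost(pages: int, tiers: list[tuple[float, float]] = TIERS) -> float:
    """Marginal tiered cost: each page is billed at the rate of its tier."""
    cost, lower = 0.0, 0.0
    for upper, per_thousand in tiers:
        if pages <= lower:
            break
        billable = min(pages, upper) - lower  # pages falling inside this tier
        cost += billable / 1000 * per_thousand
        lower = upper
    return cost
```

With these placeholder tiers, 1.5 million pages cost 1,000 × $30 + 500 × $20 = $40,000 per month.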
Example request
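curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" -d '{"rawDocument": {"content": "BASE64_CONTENT", "mimeType": "application/pdf"}}' "https://{location}-documentai.googleapis.com/v1/projects/{project}/locations/{location}/processors/{processor}:process"
Getting started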
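Create a Document AI processor (pre-trained or custom) in the Google Cloud console. Then POST the document to /v1/projects/{project}/locations/{location}/processors/{processor}:process on the regional host {location}-documentai.googleapis.com, with the file base64-encoded in rawDocument.content and its mimeType set.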
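The POST described above can be prepared in Python with only the standard library. A sketch assuming inline (base64) submission via `rawDocument`; `build_process_request` is a hypothetical helper, and actually sending the request (with an `Authorization: Bearer` token, e.g. from `gcloud auth print-access-token`) is left out so the example runs offline.

```python
import base64
import json


def build_process_request(
    project: str,
    location: str,
    processor: str,
    content: bytes,
    mime_type: str = "application/pdf",
) -> tuple[str, str]:
    """Return (url, JSON body) for a synchronous :process call.

    The document bytes are sent inline, base64-encoded, in rawDocument.content.
    """
    url = (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/processors/{processor}:process"
    )
    body = {
        "rawDocument": {
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return url, json.dumps(body)
```

Inline submission suits single files; for large batches, Document AI also offers asynchronous batch processing from Cloud Storage.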
FAQ
How well are Chinese documents supported?
Generic OCR and Form Parser work well for Chinese. Some pre-trained processors (such as Invoice Parser) primarily target English, so train custom processors for Chinese scenarios.
Document AI vs. Cloud Vision OCR?
Vision OCR is for extracting plain text only; Document AI is for understanding structure and field meaning. Vision OCR is typically 5-10x cheaper than Document AI's structure-aware processors.
Technical details
- Auth type: unknown
- Pricing: per page (varies by processor)
- Protocols: REST
- SDKs: Python, JavaScript, Go, Java
- Response time: 43 ms
- Last health check: 5/12/2026, 7:37:31 AM