Google Cloud Vision API

Google Cloud's image-analysis API: 11 features (labels, OCR, document OCR, face attributes, landmarks, logos, object localization, image properties, safe search, crop hints, web detection) billed per feature per image


Use it when

Eleven standard features stack in a single request: LABEL_DETECTION, TEXT_DETECTION, DOCUMENT_TEXT_DETECTION, FACE_DETECTION, LANDMARK_DETECTION, LOGO_DETECTION, OBJECT_LOCALIZATION, IMAGE_PROPERTIES, SAFE_SEARCH_DETECTION, CROP_HINTS, WEB_DETECTION. Billing adds per feature, but only one network round-trip is consumed
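As a rough illustration of stacking, the sketch below builds the JSON body for one `images:annotate` call that requests two features on a single image. The bucket path and the helper name `build_annotate_body` are made up for the example; the body shape (`requests`, `image`, `features`, `type`) follows the public REST reference as best understood here, so check it against the current docs before relying on it.

```python
import json

# The eleven standard feature types named in the text above.
STANDARD_FEATURES = (
    "LABEL_DETECTION", "TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION",
    "FACE_DETECTION", "LANDMARK_DETECTION", "LOGO_DETECTION",
    "OBJECT_LOCALIZATION", "IMAGE_PROPERTIES", "SAFE_SEARCH_DETECTION",
    "CROP_HINTS", "WEB_DETECTION",
)


def build_annotate_body(image_uri, features):
    """Build an images:annotate body for one image with stacked features.

    Hypothetical helper: it only assembles JSON and sends nothing.
    """
    unknown = [f for f in features if f not in STANDARD_FEATURES]
    if unknown:
        raise ValueError(f"unknown feature types: {unknown}")
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": f} for f in features],
            }
        ]
    }


# Hypothetical image location, used only for illustration.
body = build_annotate_body(
    "gs://example-bucket/photo.jpg",
    ["LABEL_DETECTION", "TEXT_DETECTION"],
)

# One round-trip, but one billable unit per feature per image.
billed_units = len(body["requests"][0]["features"])
print(json.dumps(body, indent=2))
print("billed units:", billed_units)
```

The unit count here is simply the number of features in the request, which is the point of the billing note above.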

Watch for

No face identification. The API returns face landmarks and emotion likelihoods (joy, sorrow, anger, surprise) but never an identity. For 1:1 or 1:N face matching you need AWS Rekognition or a self-hosted model such as ArcFace

First check

Open cloud.google.com/vision, enable the Cloud Vision API on your GCP project, and link a billing account. Authenticate locally with `gcloud auth application-default login` (ADC) or drop a service account JSON for CI/CD. Install @google-cloud/vision (Node.js) or google-cloud-vision (Python) and call LABEL_DETECTION first to confirm the per-feature-per-image billing model. For PDFs and scanned documents, go straight to DOCUMENT_TEXT_DETECTION via files:asyncBatchAnnotate — do not use TEXT_DETECTION for multi-page docs.
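A first call might look like the sketch below, assuming the `google-cloud-vision` Python package is installed and ADC is configured as described above. `detect_labels` and `summarize` are hypothetical helper names, and the image URI you pass in is your own. The import sits inside the function so the rest of the snippet runs offline; the demo at the bottom uses stand-in objects rather than a real API response.

```python
from types import SimpleNamespace


def detect_labels(image_uri):
    """Run LABEL_DETECTION once through the Python client.

    Needs network access, ADC credentials, and a billing-enabled project.
    """
    from google.cloud import vision  # pip install google-cloud-vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(source=vision.ImageSource(image_uri=image_uri))
    response = client.label_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.label_annotations


def summarize(labels, min_score=0.5):
    """Format labels that have .description and .score attributes."""
    return [
        f"{label.description} ({label.score:.2f})"
        for label in labels
        if label.score >= min_score
    ]


# Offline demo: stand-ins for label annotations, not real API output.
fake = [
    SimpleNamespace(description="Cat", score=0.98),
    SimpleNamespace(description="Sofa", score=0.41),
]
print(summarize(fake))
```

Against a real project, `summarize(detect_labels("gs://your-bucket/your-image.jpg"))` would consume one LABEL_DETECTION unit.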

Auth

oauth

CORS

No

HTTPS

Yes

Signup

Required

Latency

166 ms

Protocol

REST, gRPC

Pricing

freemium

Uptime · 30-day window

Probes: 30
Uptime: 93%
Avg latency: 493 ms

About this API

Cloud Vision API is Google Cloud's core image-understanding service, exposing eleven standard features. LABEL_DETECTION returns open-vocabulary object and scene tags. TEXT_DETECTION runs scene OCR for short strings in natural images. DOCUMENT_TEXT_DETECTION targets documents and handwriting, returning a hierarchical block / paragraph / word / symbol structure. FACE_DETECTION returns face landmarks and emotion likelihoods (joy, sorrow, anger, surprise) but never an identity. LANDMARK_DETECTION, LOGO_DETECTION, and OBJECT_LOCALIZATION cover landmarks, brand logos, and bounded objects. IMAGE_PROPERTIES returns dominant colors and crop hints. SAFE_SEARCH_DETECTION scores adult / spoof / medical / violence / racy on a five-step VERY_UNLIKELY-to-VERY_LIKELY scale. CROP_HINTS suggests smart crops. WEB_DETECTION runs a reverse-image lookup and returns a best-guess label.

Billing is feature-times-image: one image with LABEL_DETECTION + TEXT_DETECTION bills 2 units. The free tier covers the first 1,000 units per month per feature. From 1,001 to 5,000,000 units per month, most features run $1.50 per 1,000 units; OBJECT_LOCALIZATION is $2.25, WEB_DETECTION is $3.50, and CROP_HINTS is $0.60. Above 5M units per month, TEXT_DETECTION, DOCUMENT_TEXT_DETECTION, FACE_DETECTION, LANDMARK_DETECTION, and LOGO_DETECTION drop to $0.60, and LABEL_DETECTION drops to $1.00.

There are four endpoint shapes: v1/images:annotate (synchronous, up to 16 images per batch), v1/images:asyncBatchAnnotate (async with output to GCS), v1/files:annotate (synchronous PDF/TIFF, 5 pages max), and v1/files:asyncBatchAnnotate (async PDF/TIFF up to 2,000 pages per document). Image input takes three forms: inline base64 (HTTP payload capped at 10MB), GCS URI (single file capped at 20MB), or HTTP/HTTPS URL (demo-only). The default project quota is 1,800 requests per minute; higher concurrency needs a Cloud Quotas adjustment. Authentication follows the Google Cloud stack: ADC, service account JSON, OAuth bearer, or a limited API key. Seven official client libraries are maintained: C#, Go, Java, Node.js, PHP, Python, Ruby.

Two gotchas dominate field experience. First, teams used to per-request pricing under-budget because Vision bills per feature per image: stacking three features triples the unit count. Decide which features you actually need before estimating cost. Second, multi-page PDFs should not be looped through synchronous TEXT_DETECTION one page at a time. Use DOCUMENT_TEXT_DETECTION via files:asyncBatchAnnotate and let Google process the whole document in one async job that writes JSON output to GCS. For 1:N face identification, Vision is the wrong tool; switch to AWS Rekognition or a self-hosted face model.
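The billing arithmetic can be sketched as a small estimator. The rates are copied from the figures quoted on this page, not from Google's live price sheet, so treat them as assumptions that may be out of date. Only features whose rates are stated explicitly here are included, and `feature_cost` and `monthly_cost` are hypothetical helper names.

```python
FREE_UNITS = 1_000           # free units per feature per month
TIER1_CEILING = 5_000_000    # units per month before the volume tier starts

# USD per 1,000 units: (tier-1 rate, tier-2 rate or None if not quoted above).
RATES = {
    "LABEL_DETECTION": (1.50, 1.00),
    "TEXT_DETECTION": (1.50, 0.60),
    "DOCUMENT_TEXT_DETECTION": (1.50, 0.60),
    "FACE_DETECTION": (1.50, 0.60),
    "LANDMARK_DETECTION": (1.50, 0.60),
    "LOGO_DETECTION": (1.50, 0.60),
    "OBJECT_LOCALIZATION": (2.25, None),
    "WEB_DETECTION": (3.50, None),
}


def feature_cost(feature, units):
    """Estimated monthly USD cost for one feature at a given unit count."""
    tier1_rate, tier2_rate = RATES[feature]
    tier1_units = max(0, min(units, TIER1_CEILING) - FREE_UNITS)
    tier2_units = max(0, units - TIER1_CEILING)
    if tier2_units and tier2_rate is None:
        raise ValueError(f"no tier-2 rate quoted for {feature}")
    cost = tier1_units * tier1_rate / 1000
    if tier2_units:
        cost += tier2_units * tier2_rate / 1000
    return round(cost, 2)


def monthly_cost(images, features):
    """Each feature applied to each image is one unit, tiered per feature."""
    return round(sum(feature_cost(f, images) for f in features), 2)


two = monthly_cost(100_000, ["LABEL_DETECTION", "TEXT_DETECTION"])
three = monthly_cost(
    100_000, ["LABEL_DETECTION", "TEXT_DETECTION", "OBJECT_LOCALIZATION"]
)
print(two, three)
```

With these assumed rates, 100,000 images with two features come to 2 × 99,000 × $1.50 / 1,000 = $297.00, and adding OBJECT_LOCALIZATION adds another 99,000 × $2.25 / 1,000 = $222.75.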

What you can build

1. Document OCR pipelines: DOCUMENT_TEXT_DETECTION extracts structured text from scanned contracts, invoices, and handwritten notes. PDFs and TIFFs go through files:asyncBatchAnnotate with output written to GCS for downstream parsing
2. UGC safety moderation: SAFE_SEARCH_DETECTION returns five-axis likelihood scores (adult, spoof, medical, violence, racy) on a VERY_UNLIKELY-to-VERY_LIKELY scale, well suited to fail-closed upload gates
3. E-commerce product tagging: LABEL_DETECTION plus OBJECT_LOCALIZATION returns category labels and bounding boxes, with PRODUCT_SEARCH layered on top for reverse-image lookup against an indexed catalog
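For the moderation use case, a fail-closed gate could look roughly like the sketch below. It assumes the REST response's `safeSearchAnnotation` is a dict mapping each axis to a Likelihood string (VERY_UNLIKELY through VERY_LIKELY, plus UNKNOWN). The choice of blocked axes and the LIKELY threshold are illustrative policy decisions, not recommendations from the API, and `is_allowed` is a hypothetical helper.

```python
# Ordered Likelihood values; UNKNOWN is deliberately left out so it fails closed.
LIKELIHOOD_ORDER = [
    "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]


def is_allowed(safe_search, blocked_axes=("adult", "violence", "racy"),
               threshold="LIKELY"):
    """Return True only if every checked axis is present and below threshold.

    Missing axes and UNKNOWN values reject the upload (fail closed).
    """
    limit = LIKELIHOOD_ORDER.index(threshold)
    for axis in blocked_axes:
        value = safe_search.get(axis)
        if value not in LIKELIHOOD_ORDER:
            return False
        if LIKELIHOOD_ORDER.index(value) >= limit:
            return False
    return True


# Hand-written sample annotations, not real API output.
clean = {"adult": "VERY_UNLIKELY", "spoof": "UNLIKELY", "medical": "UNLIKELY",
         "violence": "VERY_UNLIKELY", "racy": "POSSIBLE"}
flagged = dict(clean, adult="LIKELY")
incomplete = {"adult": "VERY_UNLIKELY", "violence": "UNKNOWN"}

print(is_allowed(clean), is_allowed(flagged), is_allowed(incomplete))
```

Treating anything unrecognized as a rejection is what makes the gate fail closed; a fail-open gate would return True in those branches instead.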

Strengths & limitations

Strengths

Eleven standard features stack in a single request: LABEL_DETECTION, TEXT_DETECTION, DOCUMENT_TEXT_DETECTION, FACE_DETECTION, LANDMARK_DETECTION, LOGO_DETECTION, OBJECT_LOCALIZATION, IMAGE_PROPERTIES, SAFE_SEARCH_DETECTION, CROP_HINTS, WEB_DETECTION. Billing adds per feature, but only one network round-trip is consumed
Volume pricing drops at the 5M-units/month tier: LABEL_DETECTION goes from $1.50 to $1.00 per 1,000 units, TEXT_DETECTION and DOCUMENT_TEXT_DETECTION go from $1.50 to $0.60
Image input has three shapes: inline base64 (payload capped at 10MB), GCS URI (gs://bucket/path, single file capped at 20MB), or an HTTP/HTTPS URL (demo-only — not for production). PDFs and TIFFs go through the files: endpoints with a 2,000-page ceiling per document
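The input shapes above map onto the `image` object of an annotate request roughly as follows. This is a sketch: `image_from_bytes` and `image_from_uri` are hypothetical helpers, the field names (`content`, `source.imageUri`) follow the public REST reference as understood here, and the size check is only an approximation because the 10MB cap applies to the whole request payload rather than to the image alone.

```python
import base64

MAX_INLINE_PAYLOAD = 10 * 1024 * 1024  # payload cap quoted above, in bytes


def image_from_bytes(data):
    """Inline form: base64-encode raw image bytes into the request."""
    encoded = base64.b64encode(data).decode("ascii")
    # Approximate guard: base64 inflates size by about a third.
    if len(encoded) > MAX_INLINE_PAYLOAD:
        raise ValueError("encoded image exceeds the inline payload cap")
    return {"content": encoded}


def image_from_uri(uri):
    """URI form: a gs:// object or an http(s) URL (the latter is demo-only)."""
    if not uri.startswith(("gs://", "http://", "https://")):
        raise ValueError(f"unsupported image URI: {uri}")
    return {"source": {"imageUri": uri}}


inline = image_from_bytes(b"abc")
# Hypothetical bucket and object name.
remote = image_from_uri("gs://example-bucket/scan-001.png")
print(inline, remote)
```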

Limitations

No face identification. The API returns face landmarks and emotion likelihoods (joy, sorrow, anger, surprise) but never an identity. For 1:1 or 1:N face matching you need AWS Rekognition or a self-hosted model such as ArcFace
Synchronous annotate caps at 16 images per request; files:annotate (PDF/TIFF) caps at 5 pages per request. Anything larger has to switch to files:asyncBatchAnnotate with results written to GCS
WEB_DETECTION and OBJECT_LOCALIZATION are the most expensive features at tier 1 ($3.50 and $2.25 per 1,000 units respectively). When stacking features, these two are the first to blow the budget, so turn them off when not needed
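Staying under the 16-image cap on synchronous annotate is mostly a matter of chunking the work list. A minimal sketch, with hypothetical image URIs and a hypothetical helper name `chunk_images`:

```python
SYNC_BATCH_LIMIT = 16  # synchronous annotate cap quoted above


def chunk_images(uris, batch_size=SYNC_BATCH_LIMIT):
    """Split a list of image URIs into batches no larger than the sync cap."""
    if not 1 <= batch_size <= SYNC_BATCH_LIMIT:
        raise ValueError("batch_size must be between 1 and 16")
    return [uris[i:i + batch_size] for i in range(0, len(uris), batch_size)]


# 40 made-up object paths, used only for illustration.
uris = [f"gs://example-bucket/img-{n:03d}.jpg" for n in range(40)]
batches = chunk_images(uris)
print([len(b) for b in batches])
```

Each batch becomes one `images:annotate` request; chunking changes the number of round-trips, not the number of billed units.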

Official quickstart

Read the official quickstart at cloud.google.com.

FAQ

Does Google Cloud Vision API have a free tier?

Yes. Each feature has its own monthly free quota of 1,000 units, billed independently across the 11 standard features. A unit equals one feature applied to one image, so an annotate request using LABEL_DETECTION + TEXT_DETECTION on one image draws from both free quotas. PRODUCT_SEARCH and CELEBRITY_RECOGNITION (allow-listed) follow separate pricing models.

Can Vision API do face recognition?

No. FACE_DETECTION returns face landmarks (eye, nose, mouth coordinates) and emotion likelihoods (joy, sorrow, anger, surprise) but never an identity, and the API does not support 1:1 or 1:N face matching. Google explicitly excludes face identification from its public Cloud product. For identity matching, use AWS Rekognition (CompareFaces, SearchFacesByImage) or a self-hosted model such as ArcFace or FaceNet.

Which endpoint should I use for multi-page PDFs?

Use DOCUMENT_TEXT_DETECTION with v1/files:asyncBatchAnnotate. That endpoint accepts up to 2,000 pages per document, takes both input and output as GCS URIs, returns a long-running operation name, and writes results asynchronously to the GCS output prefix (one JSON file per batch of up to 100 pages). The synchronous files:annotate caps at five pages. TEXT_DETECTION is designed for scene text in natural images — it loses paragraph and line-break structure on document pages.
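A request body for that async job might be assembled as in the sketch below. The field names (`inputConfig`, `gcsSource`, `outputConfig`, `gcsDestination`, `batchSize`) follow the public REST reference as understood here. The bucket paths and the helpers `build_pdf_ocr_body` and `expected_output_files` are made up for the example, and nothing is sent over the network.

```python
import math


def build_pdf_ocr_body(input_uri, output_prefix, batch_size=100):
    """Build a files:asyncBatchAnnotate body for DOCUMENT_TEXT_DETECTION."""
    for uri in (input_uri, output_prefix):
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri}")
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100 pages")
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": input_uri},
                    "mimeType": "application/pdf",
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {
                    "gcsDestination": {"uri": output_prefix},
                    "batchSize": batch_size,
                },
            }
        ]
    }


def expected_output_files(pages, batch_size=100):
    """One JSON output file per batch of pages, per the description above."""
    return math.ceil(pages / batch_size)


# Hypothetical input document and output prefix.
pdf_body = build_pdf_ocr_body(
    "gs://example-bucket/contracts/lease.pdf",
    "gs://example-bucket/ocr-output/lease/",
)
print(pdf_body["requests"][0]["outputConfig"], expected_output_files(250))
```

A 250-page document at the default batch size would be expected to produce three output files under the prefix, which is what downstream parsers need to iterate over.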

Does stacking features in one request reduce cost?

Only the network round-trip is saved, not the bill. Requesting N features on one image bills N units, each at its own feature-specific rate. The "I am already calling it, might as well enable a few more features" instinct burns money — only enable the features your application actually consumes.

Technical details

CORS: No
HTTPS: Yes
Signup: Yes
Open source: No

Auth type: oauth
Pricing: freemium
Rate limit: Default project quota: 1,800 requests/minute; synchronous annotate accepts up to 16 images per request, files:annotate (PDF/TIFF) up to 5 pages per request; async batch supports up to 2,000 pages per document with output written to GCS
Free tier quota: Per-feature monthly free units: first 1,000 units/month free for each feature. A unit = one feature applied to one image, so an annotate request with LABEL_DETECTION + TEXT_DETECTION on one image bills 2 units. PRODUCT_SEARCH and CELEBRITY_RECOGNITION (allow-listed) have separate pricing models.
Protocols: REST, gRPC
SDKs: C#, Go, Java, Node.js, PHP, Python, Ruby
Response time: 166 ms
Last health check: 6/26/2026, 6:23:30 AM

Endpoints

Parsed from the OpenAPI spec. Showing 8 of 8 non-deprecated endpoints.

POST /v1p1beta1/{parent}/files:annotate (projects)
parent: path, required

POST /v1p1beta1/{parent}/files:asyncBatchAnnotate (projects)
parent: path, required

POST /v1p1beta1/{parent}/images:annotate (projects)
parent: path, required

POST /v1p1beta1/{parent}/images:asyncBatchAnnotate (projects)
parent: path, required

POST /v1p1beta1/files:annotate (files)

POST /v1p1beta1/files:asyncBatchAnnotate (files)

POST /v1p1beta1/images:annotate (images)

POST /v1p1beta1/images:asyncBatchAnnotate (images)
