
Google Cloud Vision API
Google Cloud's image-analysis API: 11 features (labels, OCR, document OCR, face attributes, landmarks, logos, object localization, image properties, safe search, crop hints, web detection), billed per feature per image
About this API
Cloud Vision API is Google Cloud's core image-understanding service, exposing eleven standard features. LABEL_DETECTION returns open-vocabulary object and scene tags. TEXT_DETECTION runs scene OCR for short strings in natural images. DOCUMENT_TEXT_DETECTION targets documents and handwriting, returning a hierarchical block / paragraph / word / symbol structure. FACE_DETECTION returns face landmarks and emotion likelihoods (joy, sorrow, anger, surprise) but never an identity. LANDMARK_DETECTION, LOGO_DETECTION, and OBJECT_LOCALIZATION cover landmarks, brand logos, and bounded objects. IMAGE_PROPERTIES returns dominant colors. SAFE_SEARCH_DETECTION scores adult / spoof / medical / violence / racy on a five-step VERY_UNLIKELY-to-VERY_LIKELY scale. CROP_HINTS suggests smart crops. WEB_DETECTION runs a reverse-image lookup and returns a best-guess label.

Billing is feature-times-image: one image with LABEL_DETECTION + TEXT_DETECTION bills 2 units. The free tier covers the first 1,000 units per month per feature. From 1,001 to 5,000,000 units per month, most features run $1.50 per 1,000 units; OBJECT_LOCALIZATION is $2.25, WEB_DETECTION is $3.50, and CROP_HINTS is $0.60. Above 5M units per month, TEXT_DETECTION, DOCUMENT_TEXT_DETECTION, FACE, LANDMARK, and LOGO drop to $0.60, and LABEL_DETECTION drops to $1.00.

There are four endpoint shapes: v1/images:annotate (synchronous, up to 16 images per batch), v1/images:asyncBatchAnnotate (async with output to GCS), v1/files:annotate (synchronous PDF/TIFF, 5 pages max), and v1/files:asyncBatchAnnotate (async PDF/TIFF up to 2,000 pages per document). Image input takes three forms: inline base64 (HTTP payload capped at 10MB), GCS URI (single file capped at 20MB), or HTTP/HTTPS URL (demo-only). The default project quota is 1,800 requests per minute; higher concurrency needs a Cloud Quotas adjustment. Authentication follows the Google Cloud stack: ADC, service account JSON, OAuth bearer, or a limited API key. Seven official client libraries are maintained: C#, Go, Java, Node.js, PHP, Python, Ruby.

Two gotchas dominate field experience. First, teams used to per-request pricing under-budget because Vision bills per feature per image: stacking three features triples the unit count. Decide which features you actually need before estimating cost. Second, multi-page PDFs should not be looped through synchronous TEXT_DETECTION one page at a time. Use DOCUMENT_TEXT_DETECTION via files:asyncBatchAnnotate and let Google process the whole document in one async job that writes JSON output to GCS. For 1:N face identification, Vision is the wrong tool; switch to AWS Rekognition or a self-hosted face model.
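The block / paragraph / word / symbol hierarchy that DOCUMENT_TEXT_DETECTION returns can be walked with a few nested loops. The sketch below assumes the response has already been parsed from JSON into a dict shaped like a `fullTextAnnotation`; the `SAMPLE` data is hand-written for illustration, not real API output, and joining words with single spaces is a simplification that ignores the break information a real response carries.

```python
from typing import Any, Dict, List


def paragraphs_from_full_text(annotation: Dict[str, Any]) -> List[str]:
    """Rebuild paragraph strings by walking page > block > paragraph > word > symbol."""
    paragraphs: List[str] = []
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                # Each word is a list of symbols; concatenate their text.
                words = [
                    "".join(s.get("text", "") for s in word.get("symbols", []))
                    for word in paragraph.get("words", [])
                ]
                # Simplification: separate words with one space.
                paragraphs.append(" ".join(words))
    return paragraphs


# Hand-written sample with two blocks, one paragraph each (illustrative only).
SAMPLE = {
    "pages": [{
        "blocks": [
            {"paragraphs": [{"words": [
                {"symbols": [{"text": "H"}, {"text": "i"}]},
                {"symbols": [{"text": "a"}, {"text": "l"}, {"text": "l"}]},
            ]}]},
            {"paragraphs": [{"words": [
                {"symbols": [{"text": "B"}, {"text": "y"}, {"text": "e"}]},
            ]}]},
        ]
    }]
}

if __name__ == "__main__":
    for text in paragraphs_from_full_text(SAMPLE):
        print(text)
```

Keeping the walk at paragraph granularity is one choice among several; a downstream parser that needs line layout would also have to read the bounding boxes, which this sketch skips.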
What you can build
- Document OCR pipelines: DOCUMENT_TEXT_DETECTION extracts structured text from scanned contracts, invoices, and handwritten notes. PDFs and TIFFs go through files:asyncBatchAnnotate with output written to GCS for downstream parsing
- UGC safety moderation: SAFE_SEARCH_DETECTION returns five-axis likelihood scores (adult, spoof, medical, violence, racy) on a VERY_UNLIKELY-to-VERY_LIKELY scale, ideal for fail-closed upload gates
- E-commerce product tagging: LABEL_DETECTION plus OBJECT_LOCALIZATION returns category labels and bounding boxes, with PRODUCT_SEARCH layered on top for reverse-image lookup against an indexed catalog
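A fail-closed upload gate for the moderation use case might look like the sketch below. It assumes the `safeSearchAnnotation` object has been parsed into a dict of likelihood strings; the choice of blocking axes and the `LIKELY` threshold are illustrative assumptions, not recommendations from Google. Anything missing or reported as `UNKNOWN` is rejected, which is what makes the gate fail closed.

```python
from typing import Dict, Optional

# The five ordered likelihood steps; UNKNOWN is deliberately absent.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Assumption: which axes block an upload is an application policy decision.
BLOCKING_AXES = ("adult", "violence", "racy")


def allow_upload(safe_search: Optional[Dict[str, str]], threshold: str = "LIKELY") -> bool:
    """Return True only when every blocking axis is strictly below the threshold."""
    if safe_search is None:
        return False  # no annotation at all: fail closed
    limit = LIKELIHOOD_ORDER.index(threshold)
    for axis in BLOCKING_AXES:
        value = safe_search.get(axis)
        if value not in LIKELIHOOD_ORDER:
            return False  # missing or UNKNOWN: fail closed
        if LIKELIHOOD_ORDER.index(value) >= limit:
            return False  # at or above the threshold: reject
    return True


if __name__ == "__main__":
    clean = {"adult": "VERY_UNLIKELY", "violence": "UNLIKELY", "racy": "POSSIBLE"}
    print(allow_upload(clean))
    print(allow_upload({**clean, "racy": "LIKELY"}))
```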
Strengths & limitations
Strengths
- Eleven standard features stack in a single request: LABEL_DETECTION, TEXT_DETECTION, DOCUMENT_TEXT_DETECTION, FACE_DETECTION, LANDMARK_DETECTION, LOGO_DETECTION, OBJECT_LOCALIZATION, IMAGE_PROPERTIES, SAFE_SEARCH_DETECTION, CROP_HINTS, WEB_DETECTION. Billing adds per feature, but only one network round-trip is consumed
- Volume pricing drops at the 5M-units/month tier: LABEL_DETECTION goes from $1.50 to $1.00 per 1,000 units, TEXT_DETECTION and DOCUMENT_TEXT_DETECTION go from $1.50 to $0.60
- Image input has three shapes: inline base64 (payload capped at 10MB), GCS URI (gs://bucket/path, single file capped at 20MB), or an HTTP/HTTPS URL (demo-only — not for production). PDFs and TIFFs go through the files: endpoints with a 2,000-page ceiling per document
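The three image input shapes map onto different fields of the REST `image` object. The helper below is a minimal sketch of that mapping; the function name `image_field` is hypothetical, and the size caps mentioned above are not enforced here.

```python
import base64
from typing import Any, Dict, Optional


def image_field(content: Optional[bytes] = None, uri: Optional[str] = None) -> Dict[str, Any]:
    """Build the `image` object for an annotate request from raw bytes or a URI."""
    if (content is None) == (uri is None):
        raise ValueError("pass exactly one of content or uri")
    if content is not None:
        # Inline form: bytes are base64-encoded into the JSON payload.
        return {"content": base64.b64encode(content).decode("ascii")}
    if uri.startswith("gs://"):
        # Cloud Storage form.
        return {"source": {"gcsImageUri": uri}}
    if uri.startswith(("http://", "https://")):
        # Public URL form; described above as demo-only.
        return {"source": {"imageUri": uri}}
    raise ValueError(f"unsupported URI scheme: {uri!r}")


if __name__ == "__main__":
    print(image_field(content=b"abc"))
    print(image_field(uri="gs://example-bucket/cat.jpg"))  # hypothetical bucket
```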
Limitations
- No face identification. The API returns face landmarks and emotion likelihoods (joy, sorrow, anger, surprise) but never an identity. For 1:1 or 1:N face matching you need AWS Rekognition or a self-hosted model such as ArcFace
- Synchronous annotate caps at 16 images per request; files:annotate (PDF/TIFF) caps at 5 pages per request. Anything larger has to switch to files:asyncBatchAnnotate with results written to GCS
- WEB_DETECTION and OBJECT_LOCALIZATION are the most expensive features at tier-1 pricing ($3.50 and $2.25 per 1,000 units respectively). When stacking features, these two are the first to blow the budget, so turn them off when not needed
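Working within the 16-images-per-request cap usually means splitting a larger workload into batches on the client side. The sketch below shows one way to do that; `chunked` is a hypothetical helper, and each yielded batch would become the `requests` array of one synchronous annotate call.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Cap taken from the limitation described above.
MAX_IMAGES_PER_SYNC_REQUEST = 16


def chunked(items: Sequence[T], size: int = MAX_IMAGES_PER_SYNC_REQUEST) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    uris = [f"gs://example-bucket/img-{i}.jpg" for i in range(40)]  # hypothetical URIs
    for batch in chunked(uris):
        print(len(batch))
```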
Official quickstart
Read the official quickstart at cloud.google.com.
Getting started
Open cloud.google.com/vision, enable the Cloud Vision API on your GCP project, and link a billing account. Authenticate locally with `gcloud auth application-default login` (ADC) or drop a service account JSON for CI/CD. Install @google-cloud/vision (Node.js) or google-cloud-vision (Python) and call LABEL_DETECTION first to confirm the per-feature-per-image billing model. For PDFs and scanned documents, go straight to DOCUMENT_TEXT_DETECTION via files:asyncBatchAnnotate — do not use TEXT_DETECTION for multi-page docs.
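For a first LABEL_DETECTION call over REST instead of a client library, the request body can be built as in the sketch below. It only constructs the JSON; actually sending it requires an OAuth bearer token (for example from `gcloud auth application-default print-access-token`), and that step is left out here. The helper name `label_request` and the `max_results` default are illustrative assumptions.

```python
import base64
import json
from typing import Any, Dict

# Synchronous endpoint from the "four endpoint shapes" described above.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def label_request(image_bytes: bytes, max_results: int = 10) -> Dict[str, Any]:
    """Build an images:annotate body asking for LABEL_DETECTION on one inline image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # One feature on one image, so this request bills one unit.
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


if __name__ == "__main__":
    body = label_request(b"abc")  # placeholder bytes, not a real image
    print(ANNOTATE_URL)
    print(json.dumps(body, indent=2))
```

Adding a second entry to `features` is all it takes to stack another feature, which is also how the unit count grows.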
FAQ
Does Google Cloud Vision API have a free tier?
Yes. Each feature has its own monthly free quota of 1,000 units, billed independently across the 11 standard features. A unit equals one feature applied to one image, so an annotate request using LABEL_DETECTION + TEXT_DETECTION on one image draws from both free quotas. PRODUCT_SEARCH and CELEBRITY_RECOGNITION (allow-listed) follow separate pricing models.
Can Vision API do face recognition?
No. FACE_DETECTION returns face landmarks (eye, nose, mouth coordinates) and emotion likelihoods (joy, sorrow, anger, surprise) but never an identity, and the API does not support 1:1 or 1:N face matching. Google explicitly excludes face identification from its public Cloud product. For identity matching, use AWS Rekognition (CompareFaces, SearchFacesByImage) or a self-hosted model such as ArcFace or FaceNet.
Which endpoint should I use for multi-page PDFs?
Use DOCUMENT_TEXT_DETECTION with v1/files:asyncBatchAnnotate. That endpoint accepts up to 2,000 pages per document, takes both input and output as GCS URIs, returns a long-running operation name, and writes results asynchronously to the GCS output prefix (one JSON file per batch of up to 100 pages). The synchronous files:annotate caps at five pages. TEXT_DETECTION is designed for scene text in natural images — it loses paragraph and line-break structure on document pages.
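A request body for that async PDF flow could be assembled as in the sketch below. The bucket paths are hypothetical, the helper name `async_pdf_request` is invented for illustration, and the code only builds the JSON; submitting it and polling the returned long-running operation are not shown.

```python
from typing import Any, Dict

ASYNC_FILES_URL = "https://vision.googleapis.com/v1/files:asyncBatchAnnotate"


def async_pdf_request(input_uri: str, output_prefix: str, batch_size: int = 100) -> Dict[str, Any]:
    """Build a files:asyncBatchAnnotate body for DOCUMENT_TEXT_DETECTION on one PDF."""
    if not (input_uri.startswith("gs://") and output_prefix.startswith("gs://")):
        raise ValueError("input and output must both be gs:// URIs")
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100 pages per output file")
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": input_uri},
                    "mimeType": "application/pdf",
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {
                    # Results are written as JSON files under this prefix.
                    "gcsDestination": {"uri": output_prefix},
                    "batchSize": batch_size,
                },
            }
        ]
    }


if __name__ == "__main__":
    body = async_pdf_request(
        "gs://example-bucket/contracts/scan.pdf",  # hypothetical input
        "gs://example-bucket/ocr-output/",         # hypothetical output prefix
    )
    print(body["requests"][0]["outputConfig"])
```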
Does stacking features in one request reduce cost?
Only the network round-trip is saved, not the bill. Requesting N features on one image bills N units, each at its own feature-specific rate. The "I am already calling it, might as well enable a few more features" instinct burns money — only enable the features your application actually consumes.
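The per-feature arithmetic can be made concrete with a small estimator. The rates below are copied from the figures quoted on this page and have not been checked against Google's current price list, so treat them as assumptions; only features for which the page gives both tiers are included, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

FREE_UNITS = 1_000          # free units per feature per month
TIER1_CEILING = 5_000_000   # units above this use the tier-2 rate

# (tier-1 rate, tier-2 rate) in USD per 1,000 units, as quoted on this page.
RATES: Dict[str, Tuple[float, float]] = {
    "LABEL_DETECTION": (1.50, 1.00),
    "TEXT_DETECTION": (1.50, 0.60),
    "DOCUMENT_TEXT_DETECTION": (1.50, 0.60),
}


def monthly_cost(feature: str, images: int) -> float:
    """Estimated monthly USD cost of running one feature on `images` images."""
    tier1_rate, tier2_rate = RATES[feature]
    tier1_units = max(0, min(images, TIER1_CEILING) - FREE_UNITS)
    tier2_units = max(0, images - TIER1_CEILING)
    return (tier1_units * tier1_rate + tier2_units * tier2_rate) / 1000


def stacked_cost(features: Iterable[str], images: int) -> float:
    """Each stacked feature is billed separately with its own free quota and rate."""
    return sum(monthly_cost(f, images) for f in features)


if __name__ == "__main__":
    images = 101_000
    print(monthly_cost("LABEL_DETECTION", images))
    print(stacked_cost(["LABEL_DETECTION", "TEXT_DETECTION"], images))
```

Under these assumed rates, adding TEXT_DETECTION to a LABEL_DETECTION workload of 101,000 images doubles the estimate even though the number of HTTP requests is unchanged.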
Technical details
- Auth type: oauth
- Pricing: freemium
- Rate limit: Default project quota: 1,800 requests/minute; synchronous annotate accepts up to 16 images per request, files:annotate (PDF/TIFF) up to 5 pages per request; async batch supports up to 2,000 pages per document with output written to GCS
- Free tier quota: Per-feature monthly free units: first 1,000 units/month free for each feature. A unit = one feature applied to one image, so an annotate request with LABEL_DETECTION + TEXT_DETECTION on one image bills 2 units. PRODUCT_SEARCH and CELEBRITY_RECOGNITION (allow-listed) have separate pricing models.
- Protocols: REST, gRPC
- SDKs: C#, Go, Java, Node.js, PHP, Python, Ruby
- Response time: 166 ms
- Last health check: 6/26/2026, 6:23:30 AM
Endpoints
Parsed from the OpenAPI spec. Showing 8 of 8 non-deprecated endpoints.
- /v1p1beta1/{parent}/files:annotate (projects)
- /v1p1beta1/{parent}/files:asyncBatchAnnotate (projects)
- /v1p1beta1/{parent}/images:annotate (projects)
- /v1p1beta1/{parent}/images:asyncBatchAnnotate (projects)
- /v1p1beta1/files:annotate (files)
- /v1p1beta1/files:asyncBatchAnnotate (files)
- /v1p1beta1/images:annotate (images)
- /v1p1beta1/images:asyncBatchAnnotate (images)
More from Google
Google Workspace Admin SDK API programmatically manages Workspace organizations — users, groups, devices, domains, audit logs, organizational units.
Retrieve AdMob accounts, apps, ad units, ad sources, and generate mediation or network reports.
Work with AdSense Host accounts, ad clients, ad units, reports, and ad code generation from one API surface.
Programmatically manage Apigee organizations, API proxy deployments, attributes, certificates, and hybrid operations.
Google BigQuery API is the REST interface to GCP's flagship data warehouse — execute SQL queries, manage datasets/tables, stream inserts, and use built-in ML.
Control Binary Authorization attestors and policy checks for container images deployed to GKE and Anthos.
Fetch Business Profile location metrics, daily time series, and monthly search keyword impressions.
Google Calendar API lets apps create, read, and update calendar events programmatically — the go-to integration for scheduling apps.