Replicate API

Replicate is a hosting platform for open-source ML models — call models like Stable Diffusion, Llama, and FLUX with a single API call, on pay-per-use pricing.

Visit site ↗ · Documentation ↗ · Health checked 9h ago
Use it when

Broad catalog of open-source models (HuggingFace staples plus Replicate exclusives)

Watch for

Higher latency than OpenAI/Anthropic (GPU spin-up takes seconds)

First check

Sign up at replicate.com for an API token. Python: import replicate; replicate.run("stability-ai/sdxl", input={"prompt": "..."})

Auth: api_key
CORS: ?
HTTPS: Yes
Signup: ?
Latency: 533 ms
Protocol: REST
Pricing: paid

Uptime · 30-day window

Probes: 1 · Uptime: 100% · Avg latency: 533 ms
01

About this API

Replicate, founded in 2019, is a hosting platform for open-source ML models, positioned to "let developers use open-source AI models without running GPUs themselves." Background: HuggingFace hosts hundreds of thousands of open-source models, but using them typically means renting GPUs and configuring inference servers — a high barrier and a real cost. Replicate hosts models pre-configured, so a single REST API call triggers a run. Coverage is very broad: Stable Diffusion (image), FLUX (recent image SOTA), the Llama family (open LLMs), Whisper (speech), ControlNet, LoRA, video generation models, and many niche models. Differentiator: import replicate in Python invokes a model directly — simpler than a HuggingFace Inference Endpoint — and billing is per second of compute rather than a monthly subscription. Common users: AI startups in the MVP phase, indie developers building AI-art SaaS, and content platforms integrating AI features.
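The "REST API call triggers a run" flow can be sketched against the predictions endpoint using only the standard library. This is a minimal sketch: the token and version hash below are placeholders, and the input schema varies per model — check the model page in the docs.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"

def build_prediction_request(token: str, version: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/predictions request."""
    payload = {
        "version": version,           # model version hash from the model page
        "input": {"prompt": prompt},  # input schema is model-specific
    }
    return urllib.request.Request(
        f"{API_BASE}/predictions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder token and version hash — substitute real values to send.
req = build_prediction_request("r8_...", "<version-hash>", "an astronaut riding a horse")
# Sending requires a real token: urllib.request.urlopen(req)
```

The create call returns immediately with a prediction ID rather than blocking until the model finishes — output is retrieved in a follow-up request.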

02

What you can build

  1. Build AI image apps with Stable Diffusion / FLUX
  2. Call open Llama models to avoid OpenAI lock-in
  3. Test new open-source models without GPU deployment
  4. Fine-tune and host your own models on Replicate
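All of these builds share one pattern: predictions are asynchronous, so the client creates one and then polls until the status leaves the starting/processing states. A minimal sketch of that loop, run against a stubbed status fetcher so it needs no token — `fetch_status` is a hypothetical stand-in for a GET on the prediction's URL; the status names match Replicate's documented lifecycle.

```python
import time
from typing import Callable

TERMINAL = {"succeeded", "failed", "canceled"}

def wait_for_prediction(fetch_status: Callable[[], str],
                        poll_interval: float = 0.0,
                        max_polls: int = 100) -> str:
    """Poll until the prediction reaches a terminal state."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_interval)  # back off between polls in real use
    raise TimeoutError("prediction did not finish in time")

# Stub mimicking the lifecycle: starting -> processing -> succeeded
states = iter(["starting", "processing", "processing", "succeeded"])
result = wait_for_prediction(lambda: next(states))
print(result)  # succeeded
```

The official SDKs hide this loop (e.g. `replicate.run` blocks until completion), but it is what happens under the hood — and why per-call latency includes GPU spin-up.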
03

Strengths & limitations

Strengths

  • Broad catalog of open-source models (HuggingFace staples plus Replicate exclusives)
  • No infrastructure to manage — models come pre-deployed
  • Per-second billing, no idle charges
  • One-click fine-tuning

Limitations

  • Higher latency than OpenAI/Anthropic (GPU spin-up takes seconds)
  • Not cheap — GPU time is expensive
  • Some large models (e.g., Llama 3.1 405B) are very expensive to run
04

Example request

Generic template — replace <endpoint> with the real path from the docs.
curl https://api.replicate.com/v1/<endpoint> \
  -H "Authorization: Bearer $API_KEY"
# Some providers use X-Api-Key instead — verify in the docs.
05

Getting started

Sign up at replicate.com for an API token. Python: import replicate; replicate.run("stability-ai/sdxl", input={"prompt": "..."})
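The quickstart above, expanded into a runnable sketch — it reads the token from the REPLICATE_API_TOKEN environment variable (the name the official Python client looks for) and only calls the model when both the client library and a token are present. The prompt is an arbitrary example.

```python
import os

MODEL = "stability-ai/sdxl"  # owner/name reference from the model page

def sdxl_input(prompt: str) -> dict:
    """Input dict for an SDXL run; prompt is the only field set here."""
    return {"prompt": prompt}

token = os.environ.get("REPLICATE_API_TOKEN")
inp = sdxl_input("a watercolor fox")

if token:
    try:
        import replicate  # pip install replicate
        print(replicate.run(MODEL, input=inp))  # model output (here, image URLs)
    except ImportError:
        print("Install the client first: pip install replicate")
else:
    print("Set REPLICATE_API_TOKEN to run the model; built input:", inp)
```

Keeping the token in an environment variable rather than in code matches how the client is designed to be configured.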

06

FAQ

Replicate vs. HuggingFace Inference?

Replicate has a more curated model catalog (including many exclusive, community-optimized variants) and simpler invocation; HuggingFace has a larger, fully open-source library.

Can I fine-tune my own models?

Yes — Replicate provides LoRA fine-tuning endpoints for SDXL, Llama, and others; you can train your own version in minutes.

07

Technical details

CORS: ? · HTTPS: Yes · Signup: ? · Open source: No
Auth type: api_key
Pricing: paid
Rate limit: billed per GPU-second; no RPM limit
Protocols: REST
SDKs: python, javascript, typescript
Response time: 533 ms
Last health check: 5/12/2026, 7:38:12 AM
08

Tags