Replicate API
Replicate is a hosting platform for open-source ML models: call models like Stable Diffusion, Llama, and FLUX with a single API call, and pay only for what you use.
About this API
Replicate, founded in 2019, is a hosting platform for open-source ML models, positioned to "let developers use open-source AI models without running GPUs themselves". Background: HuggingFace hosts hundreds of thousands of open-source models, but using them yourself means renting GPUs and configuring inference servers, which is a high barrier and a real cost. Replicate hosts models pre-configured, so a REST API call triggers a run. Coverage is very broad: Stable Diffusion (image), FLUX (current image SOTA), the Llama family (open LLMs), Whisper (speech), ControlNet, LoRA, video-generation models, and many niche models. The differentiator: `import replicate` in Python invokes a model directly, simpler than a HuggingFace Inference Endpoint, and billing is per second of compute rather than a monthly subscription. Typical users: AI startups in the MVP phase, indie developers building AI-art SaaS, and content platforms adding AI features.
What you can build
- Build AI image apps with Stable Diffusion / FLUX
- Call open Llama models to avoid OpenAI lock-in
- Test new open-source models without deploying GPUs
- Fine-tune and host your own models on Replicate
Strengths & limitations
Strengths
- All open-source models available (HuggingFace + exclusives)
- Popular models stay warm, so common workloads avoid cold starts
- Per-second billing, no idle charges
- One-click fine-tune
Limitations
- Higher latency than OpenAI/Anthropic (GPU spin-up takes seconds)
- Not cheap at scale (GPU time adds up)
- Some large models (Llama 3.1 405B) very expensive
Example request
curl -s -X POST https://api.replicate.com/v1/predictions \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"version": "<model-version-id>", "input": {"prompt": "a photo of a cat"}}'
# Replicate expects a Bearer token in the Authorization header; verify details in the docs.
Getting started
Sign up at replicate.com to get an API token. In Python: import replicate; replicate.run("stability-ai/sdxl", input={"prompt": "..."})
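Expanding the one-liner above into a minimal sketch: it builds the model input first and only performs the real call when a token is present. `replicate.run` is the documented client entry point; the model slug and input fields are illustrative and should be checked against the model's schema on replicate.com:

```python
import os

# Input payload for an SDXL-style image model; the field names here are
# illustrative and should be verified against the model's input schema.
model = "stability-ai/sdxl"
payload = {"prompt": "a lighthouse at dusk, oil painting", "num_outputs": 1}

if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate; reads REPLICATE_API_TOKEN from the env
    output = replicate.run(model, input=payload)  # model output, e.g. image URLs
    print(output)
else:
    print("Set REPLICATE_API_TOKEN to run this example.")
```

Keeping the token in an environment variable (rather than hard-coding it) is the pattern the official client expects.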
FAQ
Replicate vs. HuggingFace Inference?
Replicate has a more curated model catalog (including many exclusive community-optimized variants) and simpler invocation; HuggingFace has the larger, fully open-source library.
Can I fine-tune my own models?
Yes: Replicate provides LoRA fine-tuning endpoints for SDXL, Llama, and others, so you can train your own variant in minutes.
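As a rough sketch of what a fine-tuning request looks like: the Python client exposes `replicate.trainings.create`, but the exact input fields vary per trainer, so treat the names below (`input_images`, `token_string`, the placeholder version id) as assumptions to verify against the trainer's schema:

```python
import os

# Hypothetical LoRA fine-tune parameters for an SDXL-style trainer; field
# names vary per trainer and must be checked against its published schema.
training_request = {
    "version": "<trainer-version-id>",            # the trainer model's version id
    "destination": "your-username/my-sdxl-lora",  # model repo that receives the result
    "input": {
        "input_images": "https://example.com/training-images.zip",
        "token_string": "TOK",                    # concept token the model learns
    },
}

if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate
    training = replicate.trainings.create(**training_request)
    print(training.status)
else:
    print("Set REPLICATE_API_TOKEN to start a real training job.")
```

Once the training succeeds, the resulting model lives at the `destination` repo and can be invoked like any other hosted model.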
Technical details
- Auth type
- api_key
- Pricing
- paid
- Rate limit
- Billed per GPU-second; no RPM limit
- Protocols
- REST
- SDKs
- python, javascript, typescript
- Response time
- 533 ms
- Last health check
- 5/12/2026, 7:38:12 AM