Together AI API
Together AI is an open-source LLM inference cloud: run Llama, Mistral, Qwen, and other open models at much lower prices than OpenAI, through an OpenAI-compatible API.
About this API
Together AI is an open-source LLM inference service founded in 2022, positioned as "OpenAI for open-source models": if you want to run open models such as Llama 3.1, Mistral, or Qwen without buying eight H100s yourself, Together hosts them for you. Its biggest selling point is input-token pricing 5-10x below closed-source LLMs, plus fine-tuning support (OpenAI doesn't allow GPT-4 fine-tuning; Together lets you fine-tune Llama 3.1 70B). The tech stack combines in-house inference optimization (FlashAttention, various quantization schemes) with batch inference for high per-GPU utilization. The API is deliberately OpenAI-compatible (swap the base_url and switch with a one-line code change), so developers can migrate with zero friction. Competitors: Groq (cheaper and faster, but fewer models), Fireworks AI, DeepInfra, Lepton AI.
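The OpenAI-compatible endpoint described above can be called with nothing but the standard library. A minimal sketch, assuming the environment variable name `TOGETHER_API_KEY` (the variable name is an assumption; any way of supplying the key works):

```python
import json
import os
import urllib.request

# TOGETHER_API_KEY is an assumed variable name; substitute your own key handling.
API_KEY = os.environ.get("TOGETHER_API_KEY", "sk-placeholder")

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completions request against Together's endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("meta-llama/Llama-3.1-70B-Instruct-Turbo", "Hello!")
# Uncomment to actually send (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request body and headers match OpenAI's schema, the same code pointed at OpenAI's base URL would also be valid, which is the migration story in practice.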
What you can build
- Using Llama 3.1 405B without running GPUs yourself
- Price-sensitive, high-volume generation workloads
- Cheaper alternatives to OpenAI
- Enterprises needing to fine-tune open models
Strengths & limitations
Strengths
- 5-10x cheaper than OpenAI (Llama 3.1 70B ~$0.9/M)
- Supports fine-tuning (OpenAI doesn't support 70B+ fine-tune)
- OpenAI-compatible API (seamless switch)
Limitations
- Quality ceiling depends on open models (Llama 3.1 405B near GPT-4 but slightly weaker)
- Advanced features like function calling less mature than OpenAI
Example request
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-70B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Getting started
Sign up at together.ai to get an API key, then POST to https://api.together.xyz/v1/chat/completions with "model": "meta-llama/Llama-3.1-70B-Instruct-Turbo" and a messages array.
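The response follows the OpenAI chat completions shape. A sketch of pulling out the assistant's reply, using an illustrative (not real) response body:

```python
import json

# Trimmed example of an OpenAI-style chat completions response;
# the field values here are illustrative, not a real Together response.
raw = """
{
  "id": "chatcmpl-example",
  "model": "meta-llama/Llama-3.1-70B-Instruct-Turbo",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 8, "total_tokens": 17}
}
"""

resp = json.loads(raw)
# The reply text lives under choices[0].message.content, same as OpenAI.
answer = resp["choices"][0]["message"]["content"]
print(answer)  # Hello! How can I help?
```

The usage block is worth logging in high-volume scenarios, since billing is per token.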
FAQ
Together vs. Groq?
Groq runs in-house LPU chips with extremely fast inference (Llama 70B at 500+ tokens/sec); Together offers a much wider model selection.
How do I fine-tune?
POST /fine-tunes with your training data (JSONL format); after training completes, you get your own model ID to invoke like any other model.
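A sketch of preparing that JSONL training data and the job payload. The chat-message schema and the field names (`training_file`, the file ID format) are assumptions for illustration; verify the exact schema in Together's fine-tuning docs before uploading.

```python
import json

# Each training example is one JSON object per line (JSONL), here using the
# chat-message convention; the exact schema Together expects may differ.
examples = [
    {"messages": [
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "4"},
    ]},
    {"messages": [
        {"role": "user", "content": "Capital of France?"},
        {"role": "assistant", "content": "Paris"},
    ]},
]

jsonl = "\n".join(json.dumps(e) for e in examples)

# Payload for the fine-tune job itself; field names and the file ID
# are illustrative placeholders, not confirmed API values.
job_payload = {
    "model": "meta-llama/Llama-3.1-70B-Instruct-Turbo",  # base model to tune
    "training_file": "file-abc123",  # id returned by a prior file upload
}
```

The general flow is: upload the JSONL file, create the fine-tune job referencing the returned file ID, poll until training finishes, then call chat completions with the new model ID.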
Technical details
- Auth type
- api_key
- Pricing
- paid
- Rate limit
- 600 RPM by default; higher limits available on request
- Protocols
- REST
- SDKs
- python, javascript, typescript
- Response time
- 475 ms
- Last health check
- 5/12/2026, 7:38:30 AM