Together AI API
Together AI is an open-source LLM inference cloud: run Llama, Mistral, Qwen, and other open models at much lower prices than OpenAI, through an OpenAI-compatible API.
About this API
Together AI is an open-source LLM inference service founded in 2022, positioned as "OpenAI for open-source models": if you want to run open models such as Llama 3.1, Mistral, or Qwen without buying eight H100s yourself, Together hosts them for you. Its biggest selling point is input-token pricing 5-10x below closed-source LLMs, plus fine-tuning support (OpenAI doesn't allow GPT-4 fine-tuning; Together lets you fine-tune Llama 3.1 70B). The tech stack combines in-house inference optimization (FlashAttention, various quantization schemes) with batch inference for high per-GPU utilization. The API is deliberately OpenAI-compatible (swap the base_url and switch with a one-line code change), so developers can migrate with zero friction. Competitors: Groq (cheaper and faster, but fewer models), Fireworks AI, DeepInfra, Lepton AI.
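The OpenAI-compatible endpoint described above can be called with nothing but the standard library. A minimal sketch, assuming the environment variable name `TOGETHER_API_KEY` (the variable name is an assumption; any way of supplying the key works):

```python
import json
import os
import urllib.request

# TOGETHER_API_KEY is an assumed variable name; substitute your own key handling.
API_KEY = os.environ.get("TOGETHER_API_KEY", "sk-placeholder")

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completions request against Together's endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("meta-llama/Llama-3.1-70B-Instruct-Turbo", "Hello!")
# Uncomment to actually send (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request body and headers match OpenAI's schema, the same code pointed at OpenAI's base URL would also be valid, which is the migration story in practice.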
What you can build
- Using Llama 3.1 405B without running GPUs yourself
- Price-sensitive, high-volume generation workloads
- Cheaper alternatives to OpenAI
- Enterprises needing to fine-tune open models
Strengths & limitations
Strengths
- 5-10x cheaper than OpenAI (Llama 3.1 70B ~$0.9/M)
- Supports fine-tuning (OpenAI doesn't support 70B+ fine-tune)
- OpenAI-compatible API (seamless switch)
Limitations
- Quality ceiling depends on open models (Llama 3.1 405B near GPT-4 but slightly weaker)
- Advanced features like function calling less mature than OpenAI
Example request
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-70B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Getting started
Sign up at together.ai to get an API key, then POST to https://api.together.xyz/v1/chat/completions with "model": "meta-llama/Llama-3.1-70B-Instruct-Turbo" and a messages array.
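The response follows the OpenAI chat completions shape. A sketch of pulling out the assistant's reply, using an illustrative (not real) response body:

```python
import json

# Trimmed example of an OpenAI-style chat completions response;
# the field values here are illustrative, not a real Together response.
raw = """
{
  "id": "chatcmpl-example",
  "model": "meta-llama/Llama-3.1-70B-Instruct-Turbo",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 8, "total_tokens": 17}
}
"""

resp = json.loads(raw)
# The reply text lives under choices[0].message.content, same as OpenAI.
answer = resp["choices"][0]["message"]["content"]
print(answer)  # Hello! How can I help?
```

The usage block is worth logging in high-volume scenarios, since billing is per token.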
FAQ
Together vs. Groq?
Groq runs in-house LPU chips with extremely fast inference (Llama 70B at 500+ tokens/sec); Together offers a much wider model selection.
How do I fine-tune?
POST /fine-tunes with your training data (JSONL format); after training completes, you get your own model ID to invoke like any other model.
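A sketch of preparing that JSONL training data and the job payload. The chat-message schema and the field names (`training_file`, the file ID format) are assumptions for illustration; verify the exact schema in Together's fine-tuning docs before uploading.

```python
import json

# Each training example is one JSON object per line (JSONL), here using the
# chat-message convention; the exact schema Together expects may differ.
examples = [
    {"messages": [
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "4"},
    ]},
    {"messages": [
        {"role": "user", "content": "Capital of France?"},
        {"role": "assistant", "content": "Paris"},
    ]},
]

jsonl = "\n".join(json.dumps(e) for e in examples)

# Payload for the fine-tune job itself; field names and the file ID
# are illustrative placeholders, not confirmed API values.
job_payload = {
    "model": "meta-llama/Llama-3.1-70B-Instruct-Turbo",  # base model to tune
    "training_file": "file-abc123",  # id returned by a prior file upload
}
```

The general flow is: upload the JSONL file, create the fine-tune job referencing the returned file ID, poll until training finishes, then call chat completions with the new model ID.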
Technical details
- Auth type
- api_key
- Pricing
- paid
- Rate limit
- 600 RPM by default; higher limits available on request
- Protocols
- REST
- SDKs
- python, javascript, typescript
- Response time
- 475 ms
- Last health check
- 5/12/2026, 7:38:30 AM