
Together AI API

Together AI is an LLM inference cloud for open-source models: run Llama, Mistral, Qwen, and other open models at much lower prices than OpenAI, through an OpenAI-compatible API.

Use it when

5-10x cheaper than OpenAI (Llama 3.1 70B ~$0.9/M)

Watch for

Quality ceiling depends on open models (Llama 3.1 405B near GPT-4 but slightly weaker)

First check

Sign up at together.ai for an API key, then POST to https://api.together.xyz/v1/chat/completions with model: "meta-llama/Llama-3.1-70B-Instruct-Turbo" and a messages array.

Auth
api_key
CORS
?
HTTPS
Yes
Signup
?
Latency
475 ms
Protocol
REST
Pricing
paid

Uptime · 30-day window

Probes: 1 · Uptime: 100% · Avg latency: 475 ms
01

About this API

Together AI is an inference service for open-source LLMs, founded in 2022 and positioned as "OpenAI for open-source models": if you want to run Llama 3.1, Mistral, or Qwen but don't want to buy eight H100s yourself, Together hosts them for you. Its biggest selling point is input-token pricing 5-10x cheaper than closed-source LLMs, plus fine-tuning support (OpenAI doesn't allow GPT-4 fine-tuning; Together lets you fine-tune Llama 3.1 70B). The stack combines in-house inference optimization (Flash-Attention, various quantizations) with batch inference for high per-GPU utilization. The API is deliberately OpenAI-compatible (swap the base_url, change one line of code), so developers can migrate with zero friction. Competitors: Groq (cheaper and faster, but fewer models), Fireworks AI, DeepInfra, Lepton AI.
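The "swap the base_url" claim can be made concrete with a small sketch. It builds the same OpenAI-style chat completion request for either provider; only the base URL and key change. The helper name and the demo key are illustrative, not from Together's SDK:

```python
import json

# OpenAI-compatible wire format: only the base URL (and model name)
# differ between providers.
OPENAI_BASE = "https://api.openai.com/v1"
TOGETHER_BASE = "https://api.together.xyz/v1"  # swap this in; nothing else changes

def chat_request(base_url, api_key, model, messages):
    """Build the (url, headers, body) for a chat completion call."""
    return (
        f"{base_url}/chat/completions",
        {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        json.dumps({"model": model, "messages": messages}),
    )

url, headers, body = chat_request(
    TOGETHER_BASE, "sk-demo",
    "meta-llama/Llama-3.1-70B-Instruct-Turbo",
    [{"role": "user", "content": "Hello"}],
)
print(url)  # https://api.together.xyz/v1/chat/completions
```

The same `chat_request` called with `OPENAI_BASE` and a GPT model name would produce a request OpenAI accepts, which is the whole migration story.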

02

What you can build

  • Using Llama 3.1 405B without running your own GPUs
  • Price-sensitive scenarios (high-volume generation)
  • Cheaper alternatives to OpenAI
  • Enterprises that need to fine-tune open models
03

Strengths & limitations

Strengths

  • 5-10x cheaper than OpenAI (Llama 3.1 70B ~$0.9/M)
  • Supports fine-tuning (OpenAI doesn't offer fine-tuning at 70B+ scale)
  • OpenAI-compatible API (seamless switch)

Limitations

  • Quality ceiling depends on open models (Llama 3.1 405B near GPT-4 but slightly weaker)
  • Advanced features like function calling less mature than OpenAI
04

Example request

Minimal chat completion request against the endpoint from the docs:
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Hello"}]}'
05

Getting started

Sign up at together.ai for an API key, then POST to https://api.together.xyz/v1/chat/completions with model: "meta-llama/Llama-3.1-70B-Instruct-Turbo" and a messages array.
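In Python, the same call can be made with only the standard library. A sketch assuming your key is in a `TOGETHER_API_KEY` environment variable (the env var name is a convention, not mandated by the API); the request is only sent if the key is present:

```python
import json
import os
import urllib.request

# Build the request described above: endpoint + model + messages.
req = urllib.request.Request(
    "https://api.together.xyz/v1/chat/completions",
    data=json.dumps({
        "model": "meta-llama/Llama-3.1-70B-Instruct-Turbo",
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

if os.environ.get("TOGETHER_API_KEY"):  # only call out if a key is set
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The response follows the OpenAI chat completion shape, so the reply text lives at `choices[0].message.content`.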

06

FAQ

Together vs. Groq?

Groq runs its own LPU chips with extremely fast inference (Llama 70B at 500+ tokens/sec); Together offers a much wider model selection.

How do I fine-tune?

POST your training data (JSONL format) to /fine-tunes; once training completes you get your own model ID, which you invoke like any other model.
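Preparing the JSONL file is the part most people trip over: one JSON object per line, no enclosing array. A sketch assuming a chat-style `messages` schema for each example (verify the exact field layout against Together's fine-tuning docs):

```python
import json

# Toy training examples -- chat-style records, one conversation each.
examples = [
    {"messages": [
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "4"},
    ]},
    {"messages": [
        {"role": "user", "content": "Capital of France?"},
        {"role": "assistant", "content": "Paris"},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # exactly one JSON object per line
```

The resulting `train.jsonl` is what you upload before creating the fine-tune job.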

07

Technical details

CORS: ? · HTTPS: Yes · Signup: ? · Open source: No
Auth type
api_key
Pricing
paid
Rate limit
Default 600 RPM; higher limits available on request
Protocols
REST
SDKs
python, javascript, typescript
Response time
475 ms
Last health check
5/12/2026, 7:38:30 AM
08

Tags