Groq API
Groq API uses in-house LPU (Language Processing Unit) chips for open-source LLM inference, up to 10x faster than GPUs (Llama 70B at 500+ tokens/sec).
About this API
Groq is an AI chip company founded in 2016 (founding team drawn from Google's early TPU effort) that builds in-house LPU chips designed specifically for LLM inference. Unlike general-purpose GPUs (e.g., NVIDIA H100), the LPU sacrifices training capability for inference speed: deterministic, low-latency, high-throughput execution. Llama 3.1 70B on Groq currently achieves 500+ tokens/sec, while the same model on an H100 runs at 50-100 tokens/sec. That speed difference makes previously impractical LLM tasks viable, such as multi-step AI agent reasoning and real-time voice conversations. The API is OpenAI-compatible (just swap the base_url), has a generous free tier (30 RPM is enough for demos), and charges per token beyond that. Downsides: a small model catalog (Llama, Mixtral, and a few other open-source models) and tighter rate limits than OpenAI (LPU capacity is constrained).
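Because the API is OpenAI-compatible, any OpenAI-style chat-completions request works by pointing at Groq's base URL. A minimal stdlib-only sketch using the endpoint and model name listed on this page (it assumes a `GROQ_API_KEY` environment variable, and the helper names are illustrative, not from Groq's SDK):

```python
import json
import os
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_payload(prompt: str, model: str = "llama-3.1-70b-versatile") -> dict:
    """OpenAI-style chat-completions body, which Groq accepts unchanged."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str) -> str:
    """POST one prompt to Groq and return the assistant's reply text."""
    req = urllib.request.Request(
        GROQ_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same request shape works against OpenAI by changing only the URL and key, which is the "zero-cost switch" the page refers to.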
What you can build
1. Real-time chatbots (latency-sensitive scenarios)
2. AI agent multi-step reasoning (each step is fast, so the whole run is fast)
3. Voice conversational AI (replying while the user is still speaking)
4. High-throughput content generation
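The agent use case above is mostly arithmetic: generation time scales linearly with total output tokens, so per-step speed compounds across sequential steps. A back-of-envelope sketch with illustrative step counts and the throughput figures quoted on this page:

```python
def agent_latency_s(steps: int, tokens_per_step: int, tokens_per_sec: float) -> float:
    """Total generation time for a sequential multi-step agent, ignoring network overhead."""
    return steps * tokens_per_step / tokens_per_sec

# 5-step agent, 200 output tokens per step:
print(agent_latency_s(5, 200, 500))  # Groq-class throughput: 2.0 s
print(agent_latency_s(5, 200, 50))   # GPU-class throughput: 20.0 s
```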
Strengths & limitations
Strengths
- Industry-fastest inference (Llama 70B 500+ tokens/sec vs GPU 30-100)
- OpenAI-compatible API for zero-cost switch
- Cheaper than GPU-based providers
Limitations
- Limited model selection (mainly open-source: Llama, Mixtral)
- Tight rate limits (LPU resource constraints)
- No fine-tuning
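Given the tight rate limits noted above, clients typically want retry logic for HTTP 429 responses. A hedged sketch of exponential backoff (this page doesn't specify Groq's retry headers, so this retries blindly on 429; the function name is illustrative):

```python
import time
import urllib.error

def with_backoff(make_request, max_tries: int = 5, base_delay: float = 1.0):
    """Call make_request(), retrying on HTTP 429 with exponentially growing sleeps."""
    for attempt in range(max_tries):
        try:
            return make_request()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_tries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```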
Example request
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-70b-versatile", "messages": [{"role": "user", "content": "Hello"}]}'
# Groq uses standard Bearer auth (OpenAI-compatible); verify details in the docs.
Getting started
Sign up at console.groq.com for free API key. POST https://api.groq.com/openai/v1/chat/completions with model: "llama-3.1-70b-versatile" + messages.
FAQ
Groq vs. Together AI?
Groq is extremely fast but has a limited model catalog; Together AI offers more models but slower inference. If latency is critical, pick Groq; if model selection matters, pick Together.
Note: Groq ≠ Grok (Elon Musk's xAI)
Groq is an AI chip company (founded 2016). Grok is the LLM from Elon Musk's xAI. Similar names, completely unrelated.
Technical details
- Auth type
- api_key
- Pricing
- paid
- Rate limit
- free tier: 30 RPM; higher limits on paid tiers
- Protocols
- REST
- SDKs
- python, typescript, javascript
- Response time
- 73 ms
- Last health check
- 5/12/2026, 7:37:38 AM