
Groq API


Groq API uses in-house LPU (Language Processing Unit) chips for open-source LLM inference — 10x faster than GPUs (Llama 70B at 500+ tokens/sec).

Use it when

Industry-fastest inference (Llama 70B 500+ tokens/sec vs GPU 30-100)

Watch for

Limited model selection (mainly open-source: Llama, Mixtral)

First check

Sign up at console.groq.com for a free API key, then POST to https://api.groq.com/openai/v1/chat/completions with model "llama-3.1-70b-versatile" and a messages array.

Auth: api_key
CORS: ?
HTTPS: Yes
Signup: ?
Latency: 73 ms
Protocol: REST
Pricing: paid

Uptime · 30-day window

Probes: 1 · Uptime: 100% · Avg latency: 73 ms
01

About this API

Groq is an AI chip company founded in 2016 (its founding team came from Google's early TPU project) that builds in-house LPU chips designed specifically for LLM inference. Unlike general-purpose GPUs such as the NVIDIA H100, the LPU trades away training capability for inference speed: deterministic, low-latency, and high-throughput. Llama 3.1 70B on Groq currently reaches 500+ tokens/sec, versus 50-100 tokens/sec for the same model on an H100. That speed gap makes previously impractical LLM tasks viable, such as multi-step AI agent reasoning and real-time voice conversations. The API is OpenAI-compatible (just swap the base_url), the free tier is generous (30 RPM is enough for demos), and paid usage is billed per token. Downsides: a small model catalog (only Llama, Mixtral, and a few other open-source models) and tighter rate limits than OpenAI, since LPU capacity is constrained.
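Because the API is OpenAI-compatible, existing OpenAI-style code only needs the base URL swapped. A minimal sketch using just the Python standard library; the endpoint and model name are taken from this page, so verify them against the current Groq docs:

```python
import json
import os
import urllib.request

# OpenAI-compatible base URL (swap this in place of api.openai.com/v1).
GROQ_BASE = "https://api.groq.com/openai/v1"

def build_chat_request(messages, model="llama-3.1-70b-versatile"):
    """Build an OpenAI-style chat completion request aimed at Groq."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{GROQ_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (requires a valid GROQ_API_KEY):
#   with urllib.request.urlopen(build_chat_request([...])) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
```

With the official openai SDK the same swap is a one-liner: pass base_url="https://api.groq.com/openai/v1" and your Groq key to the client constructor.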

02

What you can build

  • Real-time chatbots (latency-sensitive scenarios)
  • Multi-step AI agent reasoning (fast individual steps keep the whole chain fast)
  • Voice conversational AI (replies arrive fast enough to keep up with speech)
  • High-throughput content generation
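The real-time use cases above rely on streaming: with stream=True the API returns server-sent events whose chunks follow the OpenAI streaming convention that Groq mirrors. A sketch of parsing those chunks; the field names are from that convention, so verify against the current docs:

```python
import json

def parse_sse_chunk(line):
    """Extract the token text from one OpenAI-style SSE line, or None.

    Streaming responses arrive as lines like:
      data: {"choices":[{"delta":{"content":"Hi"}}]}
      data: [DONE]
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None  # blank keep-alive lines, comments, etc.
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    delta = json.loads(payload)["choices"][0]["delta"]
    return delta.get("content")  # may be None on role-only chunks

# Reassembling a reply from sample chunks:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(t for t in (parse_sse_chunk(l) for l in sample) if t)
# text == "Hello"
```

Printing each token as it parses, rather than waiting for the full response, is what makes 500+ tokens/sec feel instantaneous in a chat UI.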
03

Strengths & limitations

Strengths

  • Industry-fastest inference (Llama 70B 500+ tokens/sec vs GPU 30-100)
  • OpenAI-compatible API for zero-cost switch
  • Cheaper than GPU-based providers

Limitations

  • Limited model selection (mainly open-source: Llama, Mixtral)
  • Tight rate limits (LPU resource constraints)
  • No fine-tuning
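Given the tight rate limits above, clients should expect HTTP 429 responses and retry with backoff. A sketch of a retry schedule; the helper names are hypothetical, not part of any Groq SDK:

```python
import random

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Exponential backoff schedule (seconds) for HTTP 429 responses.

    Doubles the wait on each retry, capped at `cap`.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]

def sleep_with_jitter(delay):
    # Full jitter: pick a random wait in [0, delay] so many clients
    # retrying at once don't hammer the API in lockstep.
    return random.uniform(0, delay)  # pass the result to time.sleep()

# backoff_delays() == [1.0, 2.0, 4.0, 8.0, 16.0]
```

If the 429 response carries a Retry-After header, honoring it directly is usually better than a computed delay.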
04

Example request

The chat completions endpoint (from Getting started below); verify the current path and auth header in the docs.
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json"
# Groq's OpenAI-compatible API uses Bearer auth; verify in the docs.
05

Getting started

Sign up at console.groq.com for a free API key, then POST to https://api.groq.com/openai/v1/chat/completions with model "llama-3.1-70b-versatile" and a messages array.
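The request body for that POST looks like the following sketch; the model ID is the one named on this page, so check the docs for currently available models:

```json
{
  "model": "llama-3.1-70b-versatile",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"}
  ]
}
```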

06

FAQ

Groq vs. Together AI?

Groq is extremely fast but offers few models; Together offers more models but is slower. If latency matters most, pick Groq; if model selection matters, pick Together.

Note: Groq ≠ Grok (Elon Musk's xAI model)

Groq is an AI chip company (founded in 2016). Grok is the LLM from Elon Musk's xAI. The names are similar but the two are completely unrelated.

07

Technical details

CORS: ? · HTTPS: Yes · Signup: ? · Open source: No
Auth type
api_key
Pricing
paid
Rate limit
free tier 30 RPM; paid tiers raise the limit
Protocols
REST
SDKs
python, typescript, javascript
Response time
73 ms
Last health check
5/12/2026, 7:37:38 AM
08

Tags