Groq API
Groq API uses in-house LPU (Language Processing Unit) chips for open-source LLM inference, up to 10x faster than GPUs (Llama 70B at 500+ tokens/sec).
About this API
Groq is an AI chip company founded in 2016 (founding team drawn from Google's early TPU effort) that builds in-house LPU chips designed specifically for LLM inference. Unlike general-purpose GPUs (e.g., NVIDIA H100), the LPU sacrifices training capability for inference speed: deterministic, low-latency, high-throughput execution. Llama 3.1 70B on Groq currently achieves 500+ tokens/sec, while the same model on an H100 runs at 50-100 tokens/sec. That speed difference makes previously impractical LLM tasks viable, such as multi-step AI agent reasoning and real-time voice conversations. The API is OpenAI-compatible (just swap the base_url), has a generous free tier (30 RPM is enough for demos), and charges per token beyond that. Downsides: a small model catalog (Llama, Mixtral, and a few other open-source models) and tighter rate limits than OpenAI (LPU capacity is constrained).
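Because the API is OpenAI-compatible, any OpenAI-style chat-completions request works by pointing at Groq's base URL. A minimal stdlib-only sketch using the endpoint and model name listed on this page (it assumes a `GROQ_API_KEY` environment variable, and the helper names are illustrative, not from Groq's SDK):

```python
import json
import os
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_payload(prompt: str, model: str = "llama-3.1-70b-versatile") -> dict:
    """OpenAI-style chat-completions body, which Groq accepts unchanged."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str) -> str:
    """POST one prompt to Groq and return the assistant's reply text."""
    req = urllib.request.Request(
        GROQ_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same request shape works against OpenAI by changing only the URL and key, which is the "zero-cost switch" the page refers to.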
What you can build
1. Real-time chatbots (latency-sensitive scenarios)
2. AI agent multi-step reasoning (each step is fast, so the whole run is fast)
3. Voice conversational AI (replying while the user is still speaking)
4. High-throughput content generation
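The agent use case above is mostly arithmetic: generation time scales linearly with total output tokens, so per-step speed compounds across sequential steps. A back-of-envelope sketch with illustrative step counts and the throughput figures quoted on this page:

```python
def agent_latency_s(steps: int, tokens_per_step: int, tokens_per_sec: float) -> float:
    """Total generation time for a sequential multi-step agent, ignoring network overhead."""
    return steps * tokens_per_step / tokens_per_sec

# 5-step agent, 200 output tokens per step:
print(agent_latency_s(5, 200, 500))  # Groq-class throughput: 2.0 s
print(agent_latency_s(5, 200, 50))   # GPU-class throughput: 20.0 s
```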
Strengths & limitations
Strengths
- Industry-fastest inference (Llama 70B 500+ tokens/sec vs GPU 30-100)
- OpenAI-compatible API for zero-cost switch
- Cheaper than GPU-based providers
Limitations
- Limited model selection (mainly open-source: Llama, Mixtral)
- Tight rate limits (LPU resource constraints)
- No fine-tuning
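Given the tight rate limits noted above, clients typically want retry logic for HTTP 429 responses. A hedged sketch of exponential backoff (this page doesn't specify Groq's retry headers, so this retries blindly on 429; the function name is illustrative):

```python
import time
import urllib.error

def with_backoff(make_request, max_tries: int = 5, base_delay: float = 1.0):
    """Call make_request(), retrying on HTTP 429 with exponentially growing sleeps."""
    for attempt in range(max_tries):
        try:
            return make_request()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_tries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```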
Example request
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-70b-versatile", "messages": [{"role": "user", "content": "Hello"}]}'
# Groq uses standard Bearer auth (OpenAI-compatible); verify details in the docs.
Getting started
Sign up at console.groq.com for free API key. POST https://api.groq.com/openai/v1/chat/completions with model: "llama-3.1-70b-versatile" + messages.
FAQ
Groq vs. Together AI?
Groq is extremely fast but has a limited model catalog; Together AI offers more models but slower inference. If latency is critical, pick Groq; if model selection matters, pick Together.
Note: Groq ≠ Grok (Elon Musk's xAI)
Groq is an AI chip company (founded 2016). Grok is the LLM from Elon Musk's xAI. Similar names, completely unrelated.
Technical details
- Auth type
- api_key
- Pricing
- paid
- Rate limit
- free tier: 30 RPM; higher limits on paid tiers
- Protocols
- REST
- SDKs
- python, typescript, javascript
- Response time
- 73 ms
- Last health check
- 5/12/2026, 7:37:38 AM