OpenAI
GPT-4o
Flagship multimodal model; strong tool use and low latency.
OpenAI
GPT-4o mini
Cost-optimized workhorse for high-volume chat.
OpenAI
GPT-4.1
Coding-focused refresh; long-context variants where offered.
OpenAI
o3 / o4-mini
Reasoning-first models for math, code, and multi-step plans.
OpenAI
o1
Earlier reasoning line; still deployed in many stacks.
OpenAI
GPT-4 Turbo
Legacy 128k-class workhorse; being superseded by the GPT-4o family.
Anthropic
Claude Opus 4
Highest capability tier; long context and careful refusals.
Anthropic
Claude Sonnet 4
Balanced speed/quality for agents and coding copilots.
Anthropic
Claude 3.5 Haiku
Fast, inexpensive Claude for routing and summaries.
Google
Gemini 2.5 Pro
Top Gemini tier for complex reasoning and tools.
Google
Gemini 2.0 Flash
Low-latency multimodal; good default for product chat.
Google
Gemma 3
Open-weight family under custom license terms; suited to fine-tuning and edge deployment.
Meta
Llama 3.3 70B
Widely hosted open-weight baseline for RAG and agents.
Meta
Llama 4 Scout
Efficient Llama 4 variant for long docs and retrieval.
Meta
Llama 4 Maverick
Higher-throughput Llama 4 line for interactive apps.
Mistral
Mistral Large
Frontier-class model with a strong EU deployment story.
Mistral
Mistral Medium / Small
Tiered pricing ladder for classification and chat.
Mistral
Codestral
Code-completion and fill-in-the-middle specialist.
xAI
Grok-3
xAI flagship; check regional availability and policies.
xAI
Grok-2
Prior generation still common in third-party routers.
DeepSeek
DeepSeek-V3
High-value general-purpose model; popular in routed endpoints.
DeepSeek
DeepSeek-R1
Reasoning-specialized; strong math/code benchmarks.
Cohere
Command R+
Enterprise RAG workflows with tool calling.
Cohere
Command R
Mid-tier workhorse for retrieval-heavy assistants.
AI21
Jamba 1.5
Hybrid SSM-attention architecture; very long effective context.
Amazon
Nova Pro
AWS-native frontier-class model for Bedrock pipelines.
Amazon
Nova Lite / Micro
Cost-sensitive Bedrock defaults for classification.
Microsoft
Phi-4
Small LM with strong reasoning-per-dollar on CPU/GPU.
Microsoft
Phi-3 family
On-device and edge deployments; varied quantizations.
NVIDIA
Nemotron
Enterprise/agentic stacks on NVIDIA AI Foundations.
Alibaba
Qwen2.5 / Qwen3
Multilingual open weights; widely fine-tuned in APAC.
Baidu
ERNIE 4.x
China-market enterprise assistant and search integration.
Tencent
Hunyuan
Tencent Cloud and super-app ecosystem integration.
01.AI
Yi-Large
Bilingual CN/EN frontier line with open variants.
Snowflake
Arctic
Enterprise data-cloud LLM positioned for SQL copilots.
Perplexity
Sonar (online)
Search-grounded answers via hosted Sonar endpoints.
IBM
Granite
watsonx enterprise models for regulated industries.
Databricks
DBRX / Mosaic line
Lakehouse-native assistants; names evolve with releases.
Together AI
Hosted Llama / Mistral / Qwen
Aggregator hosting many open-weight models behind one API.
Fireworks AI
Speed-optimized endpoints
Low-latency serving for open models in production.
Groq
LPU-hosted Llama / Mixtral
Very high tokens/sec for latency-sensitive UX.
CrabAI
Multi-model routing
Unified API across many of the vendors listed here.