OpenAI

GPT-4o

Flagship multimodal model; strong tool use and low latency.

OpenAI

GPT-4o mini

Cost-optimized workhorse for high-volume chat.

OpenAI

GPT-4.1

Coding-focused refresh; long-context variants where offered.

OpenAI

o3 / o4-mini

Reasoning-first models for math, code, and multi-step plans.

OpenAI

o1

Earlier reasoning line; still deployed in many stacks.

OpenAI

GPT-4 Turbo

Legacy 128k-class workhorse; being superseded by the GPT-4o family.

Anthropic

Claude Opus 4

Highest capability tier; long context and careful refusals.

Anthropic

Claude Sonnet 4

Balanced speed/quality for agents and coding copilots.

Anthropic

Claude 3.5 Haiku

Fast, inexpensive Claude for routing and summaries.

Google

Gemini 2.5 Pro

Top Gemini tier for complex reasoning and tools.

Google

Gemini 2.0 Flash

Low-latency multimodal; good default for product chat.

Google

Gemma 3

Open-weights family (released under Google's own license terms) for fine-tuning and edge deployment.

Meta

Llama 4 Maverick

Higher-throughput Llama 4 line for interactive apps.

Mistral

Mistral Large

Frontier-class model with a strong EU deployment story.

Mistral

Mistral Medium / Small

Tiered pricing ladder for classification and chat.

Mistral

Codestral

Code-completion and fill-in-the-middle specialist.

xAI

Grok-3

xAI flagship; check regional availability and policies.

xAI

Grok-2

Prior generation still common in third-party routers.

DeepSeek

DeepSeek-V3

High value general model; popular in routed endpoints.

DeepSeek

DeepSeek-R1

Reasoning-specialized; strong math/code benchmarks.

Cohere

Command R+

Enterprise RAG workflows with tool calling.

Cohere

Command R

Mid-tier workhorse for retrieval-heavy assistants.

AI21

Jamba 1.5

SSM-attention hybrid architecture; very long effective context.

Amazon

Nova Pro

AWS-native frontier-class model for Bedrock pipelines.

Amazon

Nova Lite / Micro

Cost-sensitive Bedrock defaults for classification.

Microsoft

Phi-4

Small LM with strong reasoning-per-dollar on CPU/GPU.

Microsoft

Phi-3 family

On-device and edge deployments; varied quantizations.

NVIDIA

Nemotron

Enterprise/agentic stacks on NVIDIA AI Foundations.

Alibaba

Qwen2.5 / Qwen3

Multilingual open weights; widely fine-tuned in APAC.

Baidu

ERNIE 4.x

China-market enterprise assistant and search integration.

Tencent

Hunyuan

Tencent cloud and super-app ecosystem integration.

01.AI

Yi-Large

Bilingual CN/EN frontier line with open variants.

Snowflake

Arctic

Enterprise data-cloud LLM positioning for SQL copilots.

Perplexity

Sonar (online)

Search-grounded answers via hosted Sonar endpoints.

IBM

Granite

watsonx enterprise models for regulated industries.

Databricks

DBRX / Mosaic line

Lakehouse-native assistants; names evolve with releases.

Together AI

Hosted Llama / Mistral / Qwen

Aggregator hosting many open weights behind one API.

Fireworks AI

Speed-optimized endpoints

Low-latency serving for open models in production.

Groq

LPU-hosted Llama / Mixtral

Very high tokens/sec throughput for latency-sensitive UX.

CrabAI

Multi-model routing

Unified API across many of the vendors listed here.
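
As a rough illustration of what multi-model routing can mean in practice, the sketch below maps a task label to one of the vendor/model pairs listed here and falls back to a default. It is not any vendor's actual API: the `Route` class, the `ROUTES` table, the task labels, and `pick_route` are all hypothetical, and the model strings are the display names from this list, not real API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A (vendor, model) pair taken from the list above (display names only)."""
    vendor: str
    model: str


# Hypothetical routing table: task label -> vendor/model pair.
# The task labels and the pairings are illustrative assumptions.
ROUTES = {
    "reasoning": Route("OpenAI", "o3"),
    "code_completion": Route("Mistral", "Codestral"),
    "summary": Route("Anthropic", "Claude 3.5 Haiku"),
    "chat": Route("OpenAI", "GPT-4o mini"),
}
DEFAULT_TASK = "chat"


def pick_route(task: str) -> Route:
    """Return the route for a task label, falling back to the default route."""
    return ROUTES.get(task, ROUTES[DEFAULT_TASK])


if __name__ == "__main__":
    for task in ("summary", "reasoning", "translation"):
        route = pick_route(task)
        print(f"{task}: {route.vendor} / {route.model}")
```

A real router would also weigh cost, latency, and regional availability per request; a static lookup table is only the simplest starting point.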

Major LLMs on the market