
LLM Compare

Honest, task-based comparisons of the top language models. No marketing, just results.

Model Overview

Model              Org         Context  Strengths
GPT-4o             OpenAI      128k     Coding, Reasoning, Vision
Claude 3.7 Sonnet  Anthropic   200k     Long context, Writing, Instruction following
Gemini 1.5 Pro     Google      1M       Multimodal, Long context, Code
Mistral Large      Mistral AI  32k      Open weights, Fast, European data
Llama 3.1 405B     Meta        128k     Open source, Self-hostable, Reasoning

Coming soon

  • Prompt-by-prompt live comparison
  • Benchmark leaderboard with sources
  • Cost-per-task calculator
  • Latency benchmarks
  • RAG quality comparison