mecheval.

AI builds the mech. We measure how badly it fits.
mechanical, physical, and CAD evaluation suite for AI models · pass^5
OPERATOR says: only Opus 4.7 (direct) has passes so far (4 tasks).

Models

model                          tasks  runs  pass^5  score  tokens  wall
claude-direct-claude-opus-4-7  5      25    4/5     0.97   1.5k    6.5s
claude-mcp-claude-opus-4-7     5      5             0.59   497.7k  105.2s
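The model-level score, tokens, and wall figures are consistent with unweighted means of the per-task values listed in the "Per task · per model" table on this page. The sketch below reproduces them under that assumption; the function name `summarize` and the row layout are illustrative and not taken from the mecheval code base.

```python
from statistics import mean

# Rows copied from the "Per task · per model" table on this page:
# (model, task, score, tokens, wall_seconds). Token counts use the rounded
# values shown on the page (for example 1.5k -> 1500).
ROWS = [
    ("claude-direct-claude-opus-4-7", "a1-cube-01", 1.00, 961, 2.7),
    ("claude-direct-claude-opus-4-7", "a1-plate-01", 0.83, 1_500, 7.1),
    ("claude-direct-claude-opus-4-7", "a1-stepped-shaft-01", 1.00, 1_200, 3.9),
    ("claude-direct-claude-opus-4-7", "a2-flanged-cap-01", 1.00, 2_200, 12.2),
    ("claude-direct-claude-opus-4-7", "a2-l-bracket-01", 1.00, 1_600, 6.6),
    ("claude-mcp-claude-opus-4-7", "a1-cube-01", 1.00, 137_500, 42.3),
    ("claude-mcp-claude-opus-4-7", "a1-plate-01", 0.33, 364_900, 118.4),
    ("claude-mcp-claude-opus-4-7", "a1-stepped-shaft-01", 1.00, 402_200, 69.4),
    ("claude-mcp-claude-opus-4-7", "a2-flanged-cap-01", 0.43, 610_000, 157.7),
    ("claude-mcp-claude-opus-4-7", "a2-l-bracket-01", 0.17, 973_800, 138.2),
]


def summarize(rows, model):
    """Unweighted per-task means for one model (assumed aggregation rule)."""
    picked = [r for r in rows if r[0] == model]
    return {
        "tasks": len(picked),
        "score": round(mean(r[2] for r in picked), 2),
        "tokens": round(mean(r[3] for r in picked)),
        "wall": round(mean(r[4] for r in picked), 1),
    }


if __name__ == "__main__":
    for name in ("claude-direct-claude-opus-4-7", "claude-mcp-claude-opus-4-7"):
        print(name, summarize(ROWS, name))
```

Because the inputs are the rounded values displayed on the page, the token means only agree with the table to the precision shown (about 1.5k and 497.7k).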

Task × model matrix

task ↓ · model →     claude-direct-claude-opus-4-7  claude-mcp-claude-opus-4-7
a1-cube-01           PASS 1.00                      1/1* 1.00
a1-plate-01          0/5 0.83                       0/1* 0.33
a1-stepped-shaft-01  PASS 1.00                      1/1* 1.00
a2-flanged-cap-01    PASS 1.00                      0/1* 0.43
a2-l-bracket-01      PASS 1.00                      0/1* 0.17
c-reacher-01

Cost · score Pareto

[Scatter plot. x-axis: tokens (log scale, 1k to 1M). y-axis: score (0.00 to 1.00). Series: claude-direct-claude-opus-4-7, claude-mcp-claude-opus-4-7. Per-point values are listed in the "Per task · per model" table.]

tokens (log) vs mean score across the most recent 5 attempts. Each dot is one (model, task) pair; click to drill into the task page.
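To make the "Pareto" reading of this plot concrete, here is a minimal sketch that finds the non-dominated points when lower token cost and higher score are both preferred. The point values are copied from the per-task table on this page; the function `pareto_front` and its dominance rule are an illustration, not the leaderboard's own code.

```python
# (label, tokens, score) for every (model, task) pair shown on this page.
POINTS = [
    ("claude-direct-claude-opus-4-7 · a1-cube-01", 961, 1.00),
    ("claude-direct-claude-opus-4-7 · a1-plate-01", 1_500, 0.83),
    ("claude-direct-claude-opus-4-7 · a1-stepped-shaft-01", 1_200, 1.00),
    ("claude-direct-claude-opus-4-7 · a2-flanged-cap-01", 2_200, 1.00),
    ("claude-direct-claude-opus-4-7 · a2-l-bracket-01", 1_600, 1.00),
    ("claude-mcp-claude-opus-4-7 · a1-cube-01", 137_500, 1.00),
    ("claude-mcp-claude-opus-4-7 · a1-plate-01", 364_900, 0.33),
    ("claude-mcp-claude-opus-4-7 · a1-stepped-shaft-01", 402_200, 1.00),
    ("claude-mcp-claude-opus-4-7 · a2-flanged-cap-01", 610_000, 0.43),
    ("claude-mcp-claude-opus-4-7 · a2-l-bracket-01", 973_800, 0.17),
]


def pareto_front(points):
    """Return the points not dominated on (lower cost, higher score).

    A point is dominated when some other point costs no more and scores no
    less, and is strictly better on at least one of the two axes.
    """
    front = []
    for label, cost, score in points:
        dominated = any(
            c <= cost and s >= score and (c < cost or s > score)
            for _, c, s in points
        )
        if not dominated:
            front.append((label, cost, score))
    # Sort by cost so the front reads left to right on the plot.
    return sorted(front, key=lambda p: p[1])


if __name__ == "__main__":
    for label, cost, score in pareto_front(POINTS):
        print(f"{label}: tokens={cost} score={score:.2f}")
```

With the current data the front collapses to a single point, the direct run of a1-cube-01, since it has both the lowest token count and the maximum score. The dots belong to different tasks, so this comparison is only as meaningful as the plot itself.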

[Scatter plot. x-axis: wall-clock seconds (log scale, 10s to 100s). y-axis: score (0.00 to 1.00). Series: claude-direct-claude-opus-4-7, claude-mcp-claude-opus-4-7. Per-point values are listed in the "Per task · per model" table.]

wall-clock seconds (log) vs mean score. The left edge is fast; the right edge is patient.

Per task · per model

model                          task                 attempts  pass^5  score  tokens  wall
claude-direct-claude-opus-4-7  a1-cube-01           5         PASS    1.00   961     2.7s
claude-direct-claude-opus-4-7  a1-plate-01          5         0/5     0.83   1.5k    7.1s
claude-direct-claude-opus-4-7  a1-stepped-shaft-01  5         PASS    1.00   1.2k    3.9s
claude-direct-claude-opus-4-7  a2-flanged-cap-01    5         PASS    1.00   2.2k    12.2s
claude-direct-claude-opus-4-7  a2-l-bracket-01      5         PASS    1.00   1.6k    6.6s
claude-mcp-claude-opus-4-7     a1-cube-01           1         1/1*    1.00   137.5k  42.3s
claude-mcp-claude-opus-4-7     a1-plate-01          1         0/1*    0.33   364.9k  118.4s
claude-mcp-claude-opus-4-7     a1-stepped-shaft-01  1         1/1*    1.00   402.2k  69.4s
claude-mcp-claude-opus-4-7     a2-flanged-cap-01    1         0/1*    0.43   610.0k  157.7s
claude-mcp-claude-opus-4-7     a2-l-bracket-01      1         0/1*    0.17   973.8k  138.2s

k/k* = fewer than 5 attempts at this (model, task); pass^5 is pending. Score is the mean check-pass rate across the most recent 5 attempts. Corpus: 30 run blobs across 2 models, 6 tasks. Click any task, model, or run for full forensic detail.
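The legend above can be sketched in code. This is a guess at the rule and not the suite's implementation: it assumes an attempt passes only when every check passes, that the cell reads PASS when all of the most recent 5 attempts pass, and that it otherwise shows passed/attempted with a trailing * when fewer than 5 attempts exist. The `Attempt` class, `summarize_pair`, and the check counts in the usage lines are hypothetical.

```python
from dataclasses import dataclass

K = 5  # window size implied by "pass^5" and "most recent 5 attempts"


@dataclass
class Attempt:
    checks_passed: int
    checks_total: int

    @property
    def pass_rate(self) -> float:
        return self.checks_passed / self.checks_total

    @property
    def passed(self) -> bool:
        # Assumption: an attempt passes only if every check passes.
        return self.checks_passed == self.checks_total


def summarize_pair(attempts, k=K):
    """Return (pass^k cell label, mean check-pass rate) for one (model, task)."""
    if not attempts:
        raise ValueError("no attempts recorded for this (model, task) pair")
    recent = attempts[-k:]  # most recent k attempts, oldest first
    score = sum(a.pass_rate for a in recent) / len(recent)
    n_pass = sum(a.passed for a in recent)
    if len(recent) < k:
        label = f"{n_pass}/{len(recent)}*"  # pass^k still pending
    elif n_pass == k:
        label = "PASS"
    else:
        label = f"{n_pass}/{k}"
    return label, round(score, 2)


if __name__ == "__main__":
    # Made-up check counts, chosen only to illustrate the three label forms.
    print(summarize_pair([Attempt(3, 3)] * 5))  # all five attempts pass
    print(summarize_pair([Attempt(5, 6)] * 5))  # high score, no full pass
    print(summarize_pair([Attempt(1, 3)]))      # single attempt, pending
```

Under these assumptions a pair can score well yet show 0/5, because the score averages partial credit across checks while a pass requires every check in an attempt to succeed.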

drawing: mecheval.
sheet: INDEX
scale: pass^5 · 4/5
date: 2026-04-28
drawn by: muni
project: mecheval

generated 2026-04-28T21:00:04.637Z · static site, regenerate with npm run build -w @mecheval/leaderboard