mecheval

The mechanical, physical, and CAD evaluation suite for AI models.

50%

industry pass⁵ · fraction of ready (model, task) pairs that have earned a clean pass⁵

2026-04-28 → 2026-06-12

113 / 228

pass⁵ · achieved / ready

0.98

top score · openai-direct-gpt-5

1233 runs

7 models · 64 tasks

Models

model	tasks	runs	pass⁵	score	tokens	wall
openai-direct-gpt-5	25	125	20/23	0.98	2.5k	25.2s
claude-direct-claude-opus-4-7	37	216	32/37	0.97	1.7k	7.1s
claude-direct-claude-sonnet-4-6	34	193	29/34	0.95	1.4k	7.5s
openai-direct-gpt-5-mini	26	133	19/25	0.92	2.5k	26.3s
claude-mcp-claude-opus-4-7	52	267	13/52	0.80	452.9k	106.3s
openai-direct-gpt-4o-mini	26	126	0/23	0.13	1.2k	8.9s
claude-direct-claude-haiku-4-5-20251001	34	173	0/34	0.13	1.7k	4.4s
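The headline pass⁵ figure can be cross-checked against the pass⁵ column of the Models table. The sketch below copies the achieved / ready counts from that table and sums them; the dictionary and function names are illustrative and are not part of the mecheval codebase.

```python
# (achieved, ready) pass⁵ counts copied from the Models table.
PASS5_BY_MODEL = {
    "openai-direct-gpt-5": (20, 23),
    "claude-direct-claude-opus-4-7": (32, 37),
    "claude-direct-claude-sonnet-4-6": (29, 34),
    "openai-direct-gpt-5-mini": (19, 25),
    "claude-mcp-claude-opus-4-7": (13, 52),
    "openai-direct-gpt-4o-mini": (0, 23),
    "claude-direct-claude-haiku-4-5-20251001": (0, 34),
}


def industry_pass5(counts):
    """Sum per-model counts into (achieved, ready, fraction)."""
    achieved = sum(a for a, _ in counts.values())
    ready = sum(r for _, r in counts.values())
    return achieved, ready, achieved / ready


achieved, ready, fraction = industry_pass5(PASS5_BY_MODEL)
print(f"{achieved} / {ready} = {fraction:.0%}")  # 113 / 228 = 50%
```

The per-model counts add up to the 113 / 228 shown at the top of the page, which rounds to the 50% headline.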

Task × model matrix

expected	task ↓ · model →	openai-direct-gpt-5	claude-direct-claude-opus-4-7	claude-direct-claude-sonnet-4-6	openai-direct-gpt-5-mini	claude-mcp-claude-opus-4-7	openai-direct-gpt-4o-mini	claude-direct-claude-haiku-4-5-20251001
	a1-block-01	4/4* 1.00	PASS 1.00	PASS 1.00	PASS 1.00	4/5 0.90	1/5 0.20	2/5 0.40
	a1-cone-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	2/5 0.40	0/5 0.00
	a1-cube-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	2/5 0.70	0/5 0.00	3/5 0.60
	a1-pipe-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	1/5 0.20	2/5 0.40
	a1-plate-01	PASS 1.00	PASS 1.00	PASS 1.00	4/5 0.80	3/5 0.70	0/4* 0.00	1/5 0.20
	a1-sphere-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	3/5 0.60	0/5 0.00
	a1-stepped-shaft-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	1/5 0.20	1/5 0.30
	a2-bolt-circle-block-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	0/5 0.00	1/5 0.20
	a2-channel-bracket-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	3/5 0.90	0/5 0.00	0/5 0.00
	a2-cube-with-pocket-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	0/5 0.40	0/5 0.00
	a2-cubemark-01	PASS 1.00	PASS 1.00	PASS 1.00	4/5 0.80	4/5 0.90	0/5 0.00	0/5 0.00
	a2-finned-block-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	4/5 0.90	0/5 0.00	0/5 0.00
	a2-flanged-cap-01	0/5 0.86	PASS 1.00	PASS 1.00	3/5 0.94	3/5 0.77	0/4* 0.00	0/5 0.00
	a2-l-bracket-01	4/4* 1.00	PASS 1.00	PASS 1.00	PASS 1.00	3/5 0.73	0/5 0.00	0/5 0.00
	a2-mounting-rail-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	2/5 0.70	0/5 0.00	1/5 0.20
	a2-square-flange-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	2/5 0.70	0/5 0.10	1/5 0.20
	a2-stepped-block-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	0/5 0.15	1/5 0.20
	a2-stepped-pyramid-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	0/5 0.15	2/5 0.40
	a2-tee-bracket-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	3/5 0.80	0/5 0.00	1/5 0.20
	a2-washer-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	4/5 0.97	3/5 0.60	2/5 0.40
	a3-cross-shaft-01	0/5 0.75	0/5 0.75	0/5 0.75	0/5 0.75	3/5 0.85	0/5 0.00	0/5 0.30
	a3-hex-bolt-pattern-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	0/5 0.37	0/5 0.07	0/5 0.00
	a3-hex-nut-01	—	1/5 0.67	—	—	2/5 0.83	—	—
	a3-octagonal-flange-01	0/5 0.83	0/5 0.70	0/5 0.80	0/5 0.43	0/5 0.63	0/2* 0.00	0/5 0.00
	a3-pentagonal-prism-01	—	PASS 1.00	—	—	PASS 1.00	—	—
	a3-rotated-block-01	—	PASS 1.00	—	—	2/5 0.75	—	—
	a3-spherical-dome-block-01	PASS 1.00	PASS 1.00	PASS 1.00	4/5 0.90	PASS 1.00	0/5 0.35	1/5 0.20
	a3-tangent-cylinders-01	—	PASS 1.00	—	—	4/5 0.95	—	—
	a3-three-tangent-cylinders-01	PASS 1.00	PASS 1.00	PASS 1.00	PASS 1.00	4/5 0.95	0/5 0.00	0/5 0.00
	a4-bolt-circle-flange-with-bore-01	—	PASS 1.00	PASS 1.00	—	4/5 0.80	—	0/5 0.00
	a4-counterbore-plate-01	—	PASS 1.00	PASS 1.00	—	0/5 0.40	—	0/5 0.00
	a4-flanged-shaft-01	—	PASS 1.00	PASS 1.00	—	1/5 0.72	—	0/5 0.00
	a4-rectangular-tube-01	—	PASS 1.00	PASS 1.00	—	4/5 0.90	—	0/5 0.00
	a4-rounded-bar-01	—	PASS 1.00	PASS 1.00	—	3/5 0.90	—	0/5 0.00
	a4-slotted-bracket-01	—	0/5 0.83	0/5 0.83	—	0/5 0.83	—	0/5 0.00
	a4-stepped-pyramid-with-holes-01	—	3/5 0.93	3/5 0.93	—	0/5 0.50	—	0/5 0.00
	a4-x-frame-01	—	PASS 1.00	PASS 1.00	—	PASS 1.00	—	1/5 0.20
	a5-disc-hub-01	—	—	—	—	3/5 0.85	—	—
	a5-double-d-shaft-01	—	—	—	—	4/5 0.90	—	—
	a5-hex-bolt-blank-01	—	—	—	—	4/5 0.95	—	—
	a5-hollow-cap-01	—	—	—	—	PASS 1.00	—	—
	a5-lightened-disc-01	—	—	—	—	2/5 0.82	—	—
	a5-ribbed-plate-01	—	—	—	—	0/5 0.63	—	—
	a5-stepped-boss-plate-01	—	—	—	—	1/5 0.45	—	—
	a5-u-bracket-01	—	—	—	—	3/5 0.77	—	—
	a6-compound-bore-ring-01	—	—	—	—	PASS 1.00	—	—
	a6-compound-boss-01	—	—	—	—	2/5 0.70	—	—
	a6-motor-flange-01	—	—	—	—	3/5 0.85	—	—
	a6-pulley-01	—	—	—	—	2/5 0.63	—	—
	a6-sprocket-blank-01	—	—	—	—	0/5 0.42	—	—
	a6-yoke-block-01	—	—	—	—	2/5 0.75	—	—
	c-reacher-01	—	—	0/5 0.08	0/2* 0.20	0/5 0.04	0/5 0.04	0/5 0.00
—	d1-sphere-01	—	—	—	—	—	—	—
—	f1-cap-tube-01	—	—	—	—	—	—	—
—	f1-plug-port-01	—	—	—	—	—	—	—
—	f1-shim-gap-01	—	—	—	—	—	—	—
—	f1-spacer-shaft-01	—	—	—	—	—	—	—
—	f2-collar-shaft-axial-01	—	—	—	—	—	—	—
—	f2-plug-port-tilted-01	—	—	—	—	—	—	—
—	f2-shim-gap-tilted-01	—	—	—	—	—	—	—
—	f2-spacer-shaft-sideways-01	—	—	—	—	—	—	—
—	f3-cap-snap-bottle-01	—	—	—	—	—	—	—
—	f3-clip-pipe-01	—	—	—	—	—	—	—
—	f4-collar-loaded-01	—	—	—	—	—	—	—

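Each cell shows the pass⁵ result for the 5 most recent attempts at a (model, task) pair, followed by the mean score. PASS marks a clean pass⁵ (all 5 attempts passed); k/5 gives the number of passing attempts; a dash marks a pair with no recorded attempts. The leftmost column (expected) is the latest passing reference render.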
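As a rough illustration of how a matrix cell could be derived from raw attempts, the sketch below formats a label from a list of attempts. It assumes each attempt reduces to a (passed, score) pair, oldest first; that record shape and the function name `cell_label` are hypothetical, not the leaderboard's actual data model.

```python
WINDOW = 5  # pass⁵ looks at the 5 most recent attempts


def cell_label(attempts: list[tuple[bool, float]]) -> str:
    """Format one matrix cell from (passed, score) attempts, oldest first."""
    recent = attempts[-WINDOW:]
    if not recent:
        return "\u2014"  # the dash the matrix uses for "no attempts"
    passes = sum(1 for passed, _ in recent if passed)
    mean_score = sum(score for _, score in recent) / len(recent)
    if len(recent) < WINDOW:
        # Fewer than 5 attempts: starred, pass⁵ still pending.
        return f"{passes}/{len(recent)}* {mean_score:.2f}"
    if passes == WINDOW:
        return f"PASS {mean_score:.2f}"
    return f"{passes}/{WINDOW} {mean_score:.2f}"


print(cell_label([(True, 1.0)] * 5))                      # PASS 1.00
print(cell_label([(True, 1.0)] * 4))                      # 4/4* 1.00
print(cell_label([(True, 1.0)] * 3 + [(False, 0.75)] * 2))  # 3/5 0.90
```

Note that a pair with four passes out of four attempts still renders as 4/4* rather than PASS, since the fifth attempt is missing.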

Cost · score Pareto

Tokens (log) vs mean score across the most recent 5 attempts. Each point is one (model, task) pair. The dashed line is the Pareto frontier: a point lies on it when no other point is both cheaper in tokens and at least as good in score.
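A minimal sketch of how such a frontier can be computed, assuming each point is a (label, tokens, score) tuple. For brevity the sample data is the model-level tokens and score from the Models table rather than the per-(model, task) points the plot uses, and `pareto_frontier` is an illustrative name, not a function from the site's code.

```python
def pareto_frontier(points):
    """Return points not dominated on (fewer tokens, higher score).

    Sort by tokens ascending (ties broken by higher score first), then
    sweep and keep each point that strictly beats the best score so far.
    """
    ordered = sorted(points, key=lambda p: (p[1], -p[2]))
    frontier = []
    best_score = float("-inf")
    for label, tokens, score in ordered:
        if score > best_score:
            frontier.append((label, tokens, score))
            best_score = score
    return frontier


# Model-level (tokens, score) values from the Models table, for illustration.
points = [
    ("openai-direct-gpt-5", 2500, 0.98),
    ("claude-direct-claude-opus-4-7", 1700, 0.97),
    ("claude-direct-claude-sonnet-4-6", 1400, 0.95),
    ("openai-direct-gpt-5-mini", 2500, 0.92),
    ("claude-mcp-claude-opus-4-7", 452900, 0.80),
    ("openai-direct-gpt-4o-mini", 1200, 0.13),
    ("claude-direct-claude-haiku-4-5-20251001", 1700, 0.13),
]

frontier_labels = [label for label, _, _ in pareto_frontier(points)]
print(frontier_labels)
```

On this sample the frontier runs from the cheapest model (openai-direct-gpt-4o-mini) through claude-direct-claude-sonnet-4-6 and claude-direct-claude-opus-4-7 to the highest-scoring one (openai-direct-gpt-5); every other row is matched or beaten on both axes by one of these.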

Wall-clock seconds vs score

Wall-clock seconds (log) vs mean score. The left edge is fast; the right edge is patient.

k/n* (starred) denotes fewer than 5 attempts at this (model, task) pair, where n is the number of attempts so far; pass⁵ is pending. Score is the mean check-pass rate across the most recent 5 attempts. Corpus: 1233 run blobs across 7 models and 64 tasks.
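Because the score is a mean check-pass rate, a pair can score well without a single passing attempt, as with cells such as 0/5 0.86 in the matrix. The sketch below shows one way that could arise. It assumes each attempt is a non-empty list of boolean check results and that an attempt passes only when every check passes; both assumptions, the 7-check example, and the function names are hypothetical rather than taken from the mecheval implementation.

```python
from statistics import mean

WINDOW = 5  # most recent attempts considered


def check_pass_rate(checks: list[bool]) -> float:
    """Fraction of checks that passed in one attempt (checks must be non-empty)."""
    return sum(checks) / len(checks)


def score(attempts: list[list[bool]]) -> float:
    """Mean check-pass rate over the most recent WINDOW attempts."""
    return mean(check_pass_rate(c) for c in attempts[-WINDOW:])


def clean_passes(attempts: list[list[bool]]) -> int:
    """Attempts in the window where every check passed (assumed pass rule)."""
    return sum(1 for c in attempts[-WINDOW:] if all(c))


# Hypothetical pair: every attempt passes 6 of 7 checks, so none is clean.
attempts = [[True] * 6 + [False] for _ in range(5)]
print(f"{clean_passes(attempts)}/5 {score(attempts):.2f}")  # 0/5 0.86
```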

generated 2026-06-17T03:16:07.179Z · static site, regenerate with npm run build -w @mecheval/leaderboard