Centered cube A · A1 · a1-cube-01

primitives · sanity

Expected

Prompt

Make a 25mm cube centered on the origin.

Checks

0
valid_solid
{
  "type": "valid_solid"
}
1
bbox
{
  "type": "bbox",
  "min": [
    -12.5,
    -12.5,
    -12.5
  ],
  "max": [
    12.5,
    12.5,
    12.5
  ],
  "tolerance_mm": 0.05
}
2
mass_props
{
  "type": "mass_props",
  "volume_mm3": 15625,
  "center_of_mass": [
    0,
    0,
    0
  ],
  "tolerance_pct": 0.1
}
3
step_roundtrip
{
  "type": "step_roundtrip",
  "tolerance_pct": 0.1
}
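
The bbox and mass_props checks above can be read as simple tolerance comparisons. The following is a minimal sketch under stated assumptions, not the harness's actual implementation: the function names `check_bbox` and `check_volume` are hypothetical, `tolerance_mm` is assumed to be an absolute per-coordinate bound, and `tolerance_pct` is assumed to be a relative bound on volume. How `tolerance_pct` applies to a center of mass at the origin is not stated in the task, so the sketch leaves that comparison out. The failure message format only imitates the "first fail" column of the run table.

```python
# Check specs copied from the task definition above.
BBOX_SPEC = {
    "type": "bbox",
    "min": [-12.5, -12.5, -12.5],
    "max": [12.5, 12.5, 12.5],
    "tolerance_mm": 0.05,
}
MASS_SPEC = {
    "type": "mass_props",
    "volume_mm3": 15625,
    "center_of_mass": [0, 0, 0],
    "tolerance_pct": 0.1,
}

AXES = "XYZ"


def check_bbox(actual_min, actual_max, spec):
    """Hypothetical bbox check: every min/max coordinate must lie within
    tolerance_mm (assumed absolute) of the expected value.
    Returns (passed, message) and reports the first offending axis."""
    tol = spec["tolerance_mm"]
    for actual, expected in ((actual_min, spec["min"]), (actual_max, spec["max"])):
        for axis, a, e in zip(AXES, actual, expected):
            delta = a - e
            if abs(delta) > tol:
                return False, f"bbox · {axis} off by {delta:+.2f}mm"
    return True, "ok"


def check_volume(volume_mm3, spec):
    """Hypothetical volume check: relative error must stay within
    tolerance_pct (assumed to be a percentage of the expected volume)."""
    expected = spec["volume_mm3"]
    err_pct = abs(volume_mm3 - expected) / expected * 100
    if err_pct > spec["tolerance_pct"]:
        return False, f"mass_props · volume off by {err_pct:.2f}%"
    return True, "ok"


# A 25 mm cube centered on the origin spans -12.5..12.5 on every axis.
centered = check_bbox([-12.5] * 3, [12.5] * 3, BBOX_SPEC)

# A cube anchored at x = 0 instead of centered spans 0..25 in X.
corner_in_x = check_bbox([0.0, -12.5, -12.5], [25.0, 12.5, 12.5], BBOX_SPEC)

# The expected volume is the cube of the edge length: 25 ** 3 == 15625.
volume_ok = check_volume(25 ** 3, MASS_SPEC)
```

Under these assumptions, `centered` is `(True, "ok")` and `corner_in_x` is `(False, "bbox · X off by +12.50mm")`. A cube that is not centered along X is one plausible way to produce an offset of that size, though the page does not confirm what the failing runs actually generated. The volume tolerance of 0.1% corresponds to about ±15.6 mm³ around 15625 mm³.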

Anti-cheese

{
  "max_solid_count": 1
}

Limits

{
  "max_tokens": 20000,
  "max_wallclock_sec": 120,
  "max_tool_calls": 20
}

Recent attempts

Runs (47)

model · run · status · score · first fail · tokens · wall
claude-mcp-claude-opus-4-7 20260611T171713Z-ab0c PASS 1.00 159.2k 31.1s
claude-mcp-claude-opus-4-7 20260611T171226Z-7a96 fail 0.50 bbox · X off by +12.50mm 45.5k 11.0s
claude-mcp-claude-opus-4-7 20260611T171155Z-9ea3 PASS 1.00 193.6k 40.7s
claude-mcp-claude-opus-4-7 20260611T171146Z-842b fail 0.50 bbox · X off by +12.50mm 45.5k 8.7s
claude-mcp-claude-opus-4-7 20260611T171143Z-0bb8 fail 0.50 bbox · X off by +12.50mm 143.2k 59.3s
openai-direct-gpt-5-mini 20260428T212434Z-929d PASS 1.00 1.1k 9.5s
openai-direct-gpt-5-mini 20260428T212424Z-97ee PASS 1.00 1.1k 9.5s
openai-direct-gpt-4o-mini 20260428T212419Z-7957 fail 0.00 valid_solid · solid invalid 702 2.8s
openai-direct-gpt-4o-mini 20260428T212416Z-4c5e fail 0.00 valid_solid · solid invalid 703 3.2s
openai-direct-gpt-5-mini 20260428T212414Z-8b89 PASS 1.00 1.1k 10.3s
openai-direct-gpt-5 20260428T212413Z-54a2 PASS 1.00 1.2k 8.5s
openai-direct-gpt-4o-mini 20260428T212413Z-5107 fail 0.00 valid_solid · solid invalid 698 3.3s
openai-direct-gpt-4o-mini 20260428T212409Z-f0f5 fail 0.00 valid_solid · solid invalid 731 3.5s
openai-direct-gpt-4o-mini 20260428T212406Z-e29d fail 0.00 valid_solid · solid invalid 698 3.5s
openai-direct-gpt-5 20260428T212405Z-f6c0 PASS 1.00 908 7.4s
openai-direct-gpt-5-mini 20260428T212405Z-5e4d PASS 1.00 1.1k 8.7s
openai-direct-gpt-5 20260428T212355Z-2b83 PASS 1.00 1.1k 10.4s
openai-direct-gpt-5-mini 20260428T212350Z-c018 PASS 1.00 1.1k 14.9s
openai-direct-gpt-5 20260428T212348Z-2201 PASS 1.00 1.2k 6.9s
openai-direct-gpt-5 20260428T212331Z-5b81 PASS 1.00 1.4k 17.0s
openai-direct-gpt-4o-mini 20260428T212250Z-57e5 fail 0.00 valid_solid · solid invalid 731 5.6s
openai-direct-gpt-5-mini 20260428T212237Z-5584 PASS 1.00 1.1k 12.3s
openai-direct-gpt-5 20260428T212203Z-cd3a PASS 1.00 1.2k 10.0s
claude-direct-claude-haiku-4-5-20251001 20260428T211303Z-76e0 fail 0.00 valid_solid · solid invalid 809 1.9s
claude-direct-claude-haiku-4-5-20251001 20260428T211301Z-71e1 PASS 1.00 831 1.5s
claude-direct-claude-haiku-4-5-20251001 20260428T211259Z-5ec8 fail 0.00 valid_solid · solid invalid 813 2.1s
claude-direct-claude-sonnet-4-6 20260428T211257Z-2be0 PASS 1.00 770 2.5s
claude-direct-claude-haiku-4-5-20251001 20260428T211257Z-1d70 PASS 1.00 781 1.5s
claude-direct-claude-haiku-4-5-20251001 20260428T211256Z-f60d PASS 1.00 801 1.4s
claude-direct-claude-sonnet-4-6 20260428T211255Z-bc54 PASS 1.00 760 2.8s
claude-direct-claude-sonnet-4-6 20260428T211251Z-7829 PASS 1.00 784 3.2s
claude-direct-claude-sonnet-4-6 20260428T211248Z-fe4e PASS 1.00 752 3.2s
claude-direct-claude-sonnet-4-6 20260428T211245Z-3630 PASS 1.00 784 3.1s
claude-direct-claude-haiku-4-5-20251001 20260428T211113Z-5054 fail 0.00 valid_solid · solid invalid 809 2.0s
claude-direct-claude-sonnet-4-6 20260428T211109Z-cb72 PASS 1.00 755 2.8s
claude-direct-claude-sonnet-4-6 20260428T211106Z-6ace PASS 1.00 784 3.0s
claude-direct-claude-sonnet-4-6 20260428T211102Z-ff92 PASS 1.00 784 3.1s
claude-direct-claude-sonnet-4-6 20260428T211059Z-80e8 PASS 1.00 770 2.8s
claude-direct-claude-sonnet-4-6 20260428T211056Z-8f7f PASS 1.00 770 3.0s
claude-direct-claude-haiku-4-5-20251001 20260428T211020Z-0ebf PASS 1.00 829 1.6s
claude-direct-claude-sonnet-4-6 20260428T211009Z-297e PASS 1.00 770 2.6s
claude-direct-claude-opus-4-7 20260428T153537Z-bb2f PASS 1.00 965 2.6s
claude-direct-claude-opus-4-7 20260428T153525Z-1416 PASS 1.00 959 3.1s
claude-direct-claude-opus-4-7 20260428T153513Z-a078 PASS 1.00 961 2.7s
claude-mcp-claude-opus-4-7 20260428T145411Z-f872 PASS 1.00 137.5k 42.3s
claude-direct-claude-opus-4-7 20260428T142449Z-9da6 PASS 1.00 959 2.4s
claude-direct-claude-opus-4-7 20260428T141611Z-6a7b PASS 1.00 962 2.9s

generated 2026-06-17T03:16:07.193Z · static site, regenerate with npm run build -w @mecheval/leaderboard