
Cube with 3×3 hole grid A · A2 · a2-cubemark-01

block · multi-hole · grid · boolean


Prompt

Make a 30mm cube. Center it in X and Y, with the bottom face on the XY plane (z = 0 to z = 30). Drill nine through-holes of diameter 3mm, axes parallel to Z, arranged in a 3×3 grid on the top face spaced 8mm apart in both X and Y, centered on the Z axis. Hole centers are at (x, y) = (-8, -8), (0, -8), (8, -8), (-8, 0), (0, 0), (8, 0), (-8, 8), (0, 8), (8, 8). Output a single solid.
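The nine hole centers listed in the prompt follow directly from the grid size and pitch. A minimal sketch, assuming a hypothetical helper named `grid_centers` (not part of the task or of any mecheval API), reproduces them in the same order and confirms that the outermost holes stay inside the cube:

```python
from itertools import product

def grid_centers(n=3, pitch_mm=8.0):
    """Return (x, y) centers of an n x n grid centered on the origin."""
    offset = (n - 1) / 2
    coords = [(i - offset) * pitch_mm for i in range(n)]
    # y varies slowest so the order matches the prompt's listing
    return [(x, y) for y, x in product(coords, coords)]

centers = grid_centers()

# Farthest hole edge from the Z axis along X or Y: 8 + 1.5 = 9.5 mm,
# which is inside the cube's half-width of 15 mm.
max_extent = max(abs(c) for xy in centers for c in xy) + 3.0 / 2
assert max_extent < 30.0 / 2
```

Generating the centers rather than typing them out keeps the grid symmetric about the Z axis by construction.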

Checks

0
valid_solid
{
  "type": "valid_solid"
}
1
bbox
{
  "type": "bbox",
  "min": [
    -15,
    -15,
    0
  ],
  "max": [
    15,
    15,
    30
  ],
  "tolerance_mm": 0.1
}
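One plausible reading of this check is a per-coordinate comparison of the measured bounding-box corners against `min` and `max`, each within `tolerance_mm`. The sketch below assumes that interpretation; `bbox_ok` is a hypothetical name, and the grader's actual comparison is not shown on this page.

```python
def bbox_ok(measured_min, measured_max,
            expected_min=(-15, -15, 0), expected_max=(15, 15, 30),
            tolerance_mm=0.1):
    """True if every corner coordinate is within tolerance_mm of its target."""
    pairs = list(zip(measured_min, expected_min)) + list(zip(measured_max, expected_max))
    return all(abs(m - e) <= tolerance_mm for m, e in pairs)

# A cube resting on the XY plane passes; one centered in Z does not.
on_plane = bbox_ok((-15, -15, 0), (15, 15, 30))
centered_in_z = bbox_ok((-15, -15, -15), (15, 15, 15))
```

Under this reading, the check catches the common mistake of centering the cube on the origin in all three axes.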
2
mass_props
{
  "type": "mass_props",
  "volume_mm3": 25091.48,
  "tolerance_pct": 0.5
}
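The target volume is the cube volume minus nine cylindrical through-holes. The sketch below reproduces the figure and assumes `tolerance_pct` is applied relative to the target; the name `volume_ok` is hypothetical and the grader's exact formula is an assumption.

```python
import math

side_mm = 30.0
hole_diameter_mm = 3.0
hole_count = 9

cube_volume = side_mm ** 3                                        # 27000 mm^3
hole_volume = math.pi * (hole_diameter_mm / 2) ** 2 * side_mm     # about 212.06 mm^3
expected_volume = cube_volume - hole_count * hole_volume          # about 25091.48 mm^3

def volume_ok(measured, target=25091.48, tolerance_pct=0.5):
    """True if measured is within tolerance_pct percent of target (assumed semantics)."""
    return abs(measured - target) <= target * tolerance_pct / 100

# Allowed deviation at 0.5% is about 125.46 mm^3, less than one hole's volume,
# so a part with only eight holes would fall outside the band.
eight_hole_volume = cube_volume - 8 * hole_volume
```

Because the tolerance band is narrower than a single hole's volume, this check also acts as a coarse guard against a missing or extra hole.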
3
hole_count
{
  "type": "hole_count",
  "diameter_mm": 3,
  "expected": 9,
  "diameter_tolerance_mm": 0.05
}
4
hole_positions
{
  "type": "hole_positions",
  "diameter_mm": 3,
  "positions": [
    [
      -8,
      -8,
      0
    ],
    [
      0,
      -8,
      0
    ],
    [
      8,
      -8,
      0
    ],
    [
      -8,
      0,
      0
    ],
    [
      0,
      0,
      0
    ],
    [
      8,
      0,
      0
    ],
    [
      -8,
      8,
      0
    ],
    [
      0,
      8,
      0
    ],
    [
      8,
      8,
      0
    ]
  ],
  "tolerance_mm": 0.15
}
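A position check of this kind can be sketched as a one-to-one match between detected hole axes and the expected centers. The version below is an assumption about the semantics, not the grader's code: it compares only (x, y), since the axes are parallel to Z, and it treats unmatched extra holes as a failure. `positions_match` is a hypothetical name.

```python
import math

def positions_match(found, expected, tolerance_mm=0.15):
    """True if each expected (x, y) pairs with a distinct found center within tolerance."""
    remaining = list(found)
    for ex, ey in expected:
        hit = next(
            (p for p in remaining if math.hypot(p[0] - ex, p[1] - ey) <= tolerance_mm),
            None,
        )
        if hit is None:
            return False          # an expected hole has no nearby match
        remaining.remove(hit)     # each found hole may be used only once
    return not remaining          # leftover holes count as a mismatch (assumption)

expected_xy = [(x, y) for y in (-8, 0, 8) for x in (-8, 0, 8)]
```

Greedy matching is sufficient here because the 8 mm pitch is far larger than the 0.15 mm tolerance, so no found center can be within tolerance of two expected ones.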
5
step_roundtrip
{
  "type": "step_roundtrip",
  "tolerance_pct": 1
}

Anti-cheese

{
  "max_solid_count": 1
}

Limits

{
  "max_tokens": 40000,
  "max_wallclock_sec": 240,
  "max_tool_calls": 40
}

Recent attempts

Runs (35)

model  run  status  score  first fail  tokens  wall
claude-mcp-claude-opus-4-7 20260611T173430Z-f9b7 PASS 1.00 615.7k 164.3s
claude-mcp-claude-opus-4-7 20260611T173210Z-6e68 PASS 1.00 505.8k 185.6s
claude-mcp-claude-opus-4-7 20260611T173205Z-c4cf PASS 1.00 415.4k 145.5s
claude-mcp-claude-opus-4-7 20260611T173149Z-b1c3 PASS 1.00 622.2k 262.0s
claude-mcp-claude-opus-4-7 20260611T173000Z-e232 fail 0.50 mass_props · volume off by 18.5% 619.2k 124.5s
openai-direct-gpt-5 20260428T214430Z-afc3 PASS 1.00 3.8k 41.4s
openai-direct-gpt-5 20260428T214358Z-fc7c PASS 1.00 3.1k 32.0s
openai-direct-gpt-5 20260428T214319Z-a276 PASS 1.00 3.9k 38.6s
openai-direct-gpt-5-mini 20260428T214258Z-dac3 PASS 1.00 4.0k 33.4s
openai-direct-gpt-5 20260428T214232Z-1570 PASS 1.00 3.6k 45.8s
openai-direct-gpt-5-mini 20260428T214214Z-c29c fail 0.00 valid_solid · solid invalid 4.2k 43.8s
openai-direct-gpt-5 20260428T214152Z-7556 PASS 1.00 3.4k 39.8s
openai-direct-gpt-5-mini 20260428T214149Z-082b PASS 1.00 3.8k 24.6s
openai-direct-gpt-5-mini 20260428T214125Z-6361 PASS 1.00 3.4k 23.6s
openai-direct-gpt-5-mini 20260428T214058Z-e322 PASS 1.00 3.8k 26.7s
openai-direct-gpt-4o-mini 20260428T213222Z-07fe fail 0.00 valid_solid · solid invalid 2.1k 18.2s
openai-direct-gpt-4o-mini 20260428T213207Z-8739 fail 0.00 valid_solid · solid invalid 1.6k 15.0s
openai-direct-gpt-4o-mini 20260428T213143Z-26bf fail 0.00 valid_solid · solid invalid 2.1k 23.6s
openai-direct-gpt-4o-mini 20260428T213116Z-0e69 fail 0.00 valid_solid · solid invalid 2.5k 27.2s
openai-direct-gpt-4o-mini 20260428T213056Z-bf96 fail 0.00 valid_solid · solid invalid 2.2k 20.3s
claude-direct-claude-sonnet-4-6 20260428T211907Z-e608 PASS 1.00 2.3k 14.7s
claude-direct-claude-sonnet-4-6 20260428T211855Z-d0d0 PASS 1.00 2.3k 11.8s
claude-direct-claude-sonnet-4-6 20260428T211840Z-7b3f PASS 1.00 2.3k 15.0s
claude-direct-claude-sonnet-4-6 20260428T211824Z-7dbc PASS 1.00 2.3k 15.1s
claude-direct-claude-sonnet-4-6 20260428T211812Z-99ee PASS 1.00 1.8k 11.7s
claude-direct-claude-opus-4-7 20260428T211727Z-f52c PASS 1.00 2.2k 10.7s
claude-direct-claude-opus-4-7 20260428T211716Z-c35c PASS 1.00 2.2k 10.3s
claude-direct-claude-opus-4-7 20260428T211703Z-ef2e PASS 1.00 2.2k 13.2s
claude-direct-claude-opus-4-7 20260428T211651Z-1859 PASS 1.00 2.2k 10.6s
claude-direct-claude-haiku-4-5-20251001 20260428T211648Z-67bc fail 0.00 valid_solid · solid invalid 2.9k 8.3s
claude-direct-claude-haiku-4-5-20251001 20260428T211640Z-fc7f fail 0.00 valid_solid · solid invalid 2.9k 8.2s
claude-direct-claude-opus-4-7 20260428T211640Z-eead PASS 1.00 2.2k 10.6s
claude-direct-claude-haiku-4-5-20251001 20260428T211631Z-6228 fail 0.00 valid_solid · solid invalid 2.9k 8.3s
claude-direct-claude-haiku-4-5-20251001 20260428T211622Z-e7b6 fail 0.00 valid_solid · solid invalid 3.2k 9.2s
claude-direct-claude-haiku-4-5-20251001 20260428T211612Z-39b2 fail 0.00 valid_solid · solid invalid 2.9k 9.4s

generated 2026-06-17T03:16:07.205Z · static site, regenerate with npm run build -w @mecheval/leaderboard