Some experts have questioned AIME’s validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model’s math ability. xAI’s graph showed ...