Some experts have questioned AIME’s validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model’s math ability. xAI’s graph showed ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results