Measurement of Scale in Statistics Model Exams

CAIS and Scale AI Unveil Results of "Humanity's Last Exam," a Groundbreaking New Benchmark

The new benchmark, called "Humanity's Last Exam," evaluated whether AI systems have achieved world-class expert-level reasoning and knowledge capabilities across a wide range of fields, including math ...
