The new benchmark, called "Humanity's Last Exam," evaluated whether AI systems have achieved world-class expert-level reasoning and knowledge capabilities across a wide range of fields, including math ...