AI Models Collapse in Face of Complex Problems
June 09, 2025 | Hindustan Times Gurugram
NEW DELHI: Just days ahead of the much-anticipated Worldwide Developers Conference (WWDC), Apple has released a study titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," in which researchers tested 'reasoning' AI models such as Anthropic's Claude, OpenAI's models, DeepSeek R1, and Google's Thinking models to see how far they can scale to replicate human reasoning. Spoiler alert: not as far as the AI marketing pitch would have you believe. Could this signal what may be in store for Apple's AI conversation ahead of the keynote?
The study questions the current standard practice of evaluating Large Reasoning Models (LRMs) on established mathematical and coding benchmarks, arguing that these benchmarks suffer from data contamination and reveal little about the structure and quality of reasoning traces. Instead, it proposes a controlled experimental test-bed built on algorithmic puzzle environments. The limitations of AI benchmarking, and the need for it to evolve, are something we have written about earlier.
This story is from the June 09, 2025 edition of Hindustan Times Gurugram.

