AI Models Collapse in Face of Complex Problems
Hindustan Times Gurugram
June 09, 2025
NEW DELHI: Just days ahead of the much-anticipated Worldwide Developers Conference (WWDC), Apple has released a study titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," in which researchers tested 'reasoning' AI models such as Anthropic's Claude, OpenAI's models, DeepSeek R1, and Google's Thinking models to see how far they can scale to replicate human reasoning. Spoiler alert: not as far as the AI marketing pitch would have you believe. Could this signal what may be in store for Apple's AI conversation ahead of the keynote?
The study questions the current standard practice of evaluating Large Reasoning Models (LRMs) on established mathematical and coding benchmarks, arguing that these benchmarks suffer from data contamination and reveal little about the structure and quality of reasoning traces. Instead, it proposes a controlled experimental test-bed built on algorithmic puzzle environments. The limitations of AI benchmarking, and the need for it to evolve, are something we have written about earlier.
This story is from the June 09, 2025 edition of Hindustan Times Gurugram.
