AI Models Collapse in Face of Complex Problems
Hindustan Times Bengaluru
June 09, 2025
NEW DELHI: Just days ahead of the much-anticipated Worldwide Developers Conference (WWDC), Apple has released a study titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," in which researchers tested 'reasoning' AI models such as Anthropic's Claude, OpenAI's models, DeepSeek-R1, and Google's Thinking models to see how far they can scale to replicate human reasoning. Spoiler alert: not as far as the AI marketing pitch would have you believe. Could this signal what may be in store for Apple's AI conversation ahead of the keynote?
The study questions the current standard practice of evaluating Large Reasoning Models (LRMs) on established mathematical and coding benchmarks, arguing that these benchmarks suffer from data contamination and offer little insight into the structure and quality of reasoning traces. Instead, it proposes a controlled experimental test-bed built on algorithmic puzzle environments. The limitations of AI benchmarking, and the need for it to evolve, are something we have written about earlier.
This story is from the June 09, 2025 edition of Hindustan Times Bengaluru.

