Data that powers AI is disappearing fast
Business Standard | July 20, 2024
For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.
Now, that data is drying up.
Over the past year, many of the most important web sources used for training AI models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an MIT-led research group.
The study, which looked at 14,000 web domains that are included in three commonly used AI training data sets, discovered an "emerging crisis in consent," as publishers and online platforms have taken steps to prevent their data from being harvested.
The researchers estimate that in the three data sets (called C4, RefinedWeb and Dolma), 5 per cent of all data, and 25 per cent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt.
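To make the mechanism concrete, here is a minimal sketch of how a crawler might check a robots.txt file before fetching a page, using Python's standard urllib.robotparser module. The file contents, the bot names ExampleAIBot and OtherBot, and the example.com URLs are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts out one (made-up) AI crawler entirely,
# and keeps every other bot out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for path in ("/article", "/private/report"):
            url = "https://example.com" + path
            print(agent, path, can_crawl(ROBOTS_TXT, agent, url))
```

In this sketch the site owner's file only states a preference, and the check happens on the crawler's side, so the restriction works only when the crawler chooses to consult the file.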
This story is from the July 20, 2024 edition of Business Standard.
