Intentar ORO - Gratis

To fix AI, first break it: Red teaming for AI safety

The Sunday Guardian

|

July 06, 2025

Artificial intelligence is transforming society at an unprecedented pace, from generative chatbots in customer service to algorithms aiding medical diagnoses.

- POOJA ARORA

To fix AI, first break it: Red teaming for AI safety

Along with this promise, however, come serious risks AI systems have produced biased or harmful outputs, revealed private data, or been 'tricked' into unsafe behaviour. In one healthcare study, for example, red-team testing found that roughly one in five answers from advanced AI models like GPT-4 was inappropriate or unsafe for medical use. To ensure Al's benefits can be realized safely and ethically, the tech community is increasingly turning to red teaming - a practice of stress-testing AI systems to identify flaws before real adversaries or real-world conditions do.

In simple terms, red teaming is about playing 'devil's advocate' with AI systems - actively trying to break, mislead, or misuse them to expose weaknesses.

Originally a military and cybersecurity concept, red teaming refers to an adversarial testing effort where a 'red team' simulates attacks or exploits against a target, while a 'blue team' defends.

In the AI context, AI red teaming means probing AI models and their surrounding systems for vulnerabilities, harmful behaviours, or biases by emulating the strategies a malicious or curious attacker might use.

In essence, a red teamer tries to ask, 'How could this AI go wrong or be made to do something bad?" and then systematically tests those scenarios. Red teaming in AI goes beyond just the model's answers - it can involve examining the whole pipeline (data, infrastructure, user interface) for weaknesses. As modern AI models are open-ended and creative by design, they can also be creatively misused.

MÁS HISTORIAS DE The Sunday Guardian

The Sunday Guardian

The Sunday Guardian

The world order changeth gradually, though surely

No single nation or its leader, including the USA or China, can assume stewardship of the emerging, diffused global order.

time to read

6 mins

January 04, 2026

The Sunday Guardian

The Sunday Guardian

WHY THE SHANTI BILL CAN REDEFINE INDIA’S ENERGY FUTURE

India’s clean energy transition is primarily discussed in terms of solar additions, wind corridors, and storage technologies.

time to read

4 mins

January 04, 2026

The Sunday Guardian

Fantasies about Russia may spark World War III

Peace would result in it being too obvious to hide even within Zelenskyy's European backers, that the war being conducted at great human cost was futile from the start.

time to read

5 mins

January 04, 2026

The Sunday Guardian

The Sunday Guardian

New jihadi module IMK busted in Assam

An offshoot of Bangladesh-based JMB, IMK propagates the ideology of ‘Ghazwatul Hind’

time to read

4 mins

January 04, 2026

The Sunday Guardian

Delhi court convicts man in 2017 murder case

A Delhi court has convicted a man for murdering a youth by hitting him with a bamboo stick during a late-night quarrel at the Anand Vihar ISBT in 2017.

time to read

1 mins

January 04, 2026

The Sunday Guardian

The Sunday Guardian

INDIAN NAVY PLANS TO INDUCT A WARSHIP EVERY SIX WEEKS

The Indian Navy is on track to induct ships at the rate of one every one-and-a-half months in the coming year, fuelling the economy as its maritime muscle is strengthened.

time to read

3 mins

January 04, 2026

The Sunday Guardian

PM to flag off first Vande Bharat sleeper train from Guwahati

Ahead of the upcoming assembly elections, Assam and West Bengal will get the country's first Vande Bharat sleeper train.

time to read

1 mins

January 04, 2026

The Sunday Guardian

The Sunday Guardian

Transport Ministry proposes Aadhaar-like numbers for EV batteries

The transport ministry has proposed assigning Aadhaar-like unique identification number to EV batteries to ensure their end-to-end traceability and efficient recycling.

time to read

2 mins

January 04, 2026

The Sunday Guardian

Congress’ seat claim strains Assam opposition unity

Congress's aggressive seat target unsettles allies as opposition struggles to finalise Assam election strategy.

time to read

3 mins

January 04, 2026

The Sunday Guardian

The Sunday Guardian

How CCP is ‘assimilating’ Inner Mongolia

The most decisive tool of assimilation has been language policy. Mongolian-medium education has been systematically dismantled, replaced with Mandarin instruction.

time to read

2 mins

January 04, 2026

Listen

Translate

Share

-
+

Change font size