To fix AI, first break it: Red teaming for AI safety

July 06, 2025

The Sunday Guardian

Artificial intelligence is transforming society at an unprecedented pace, from generative chatbots in customer service to algorithms aiding medical diagnoses.

- POOJA ARORA

Along with this promise, however, come serious risks: AI systems have produced biased or harmful outputs, revealed private data, or been 'tricked' into unsafe behaviour. In one healthcare study, for example, red-team testing found that roughly one in five answers from advanced AI models like GPT-4 was inappropriate or unsafe for medical use. To ensure AI's benefits can be realized safely and ethically, the tech community is increasingly turning to red teaming - a practice of stress-testing AI systems to identify flaws before real adversaries or real-world conditions do.

In simple terms, red teaming is about playing 'devil's advocate' with AI systems - actively trying to break, mislead, or misuse them to expose weaknesses.

Originally a military and cybersecurity concept, red teaming refers to an adversarial testing effort where a 'red team' simulates attacks or exploits against a target, while a 'blue team' defends.

In the AI context, red teaming means probing AI models and their surrounding systems for vulnerabilities, harmful behaviours, or biases by emulating the strategies a malicious or curious attacker might use.

In essence, a red teamer asks, 'How could this AI go wrong or be made to do something bad?' and then systematically tests those scenarios. Red teaming in AI goes beyond just the model's answers - it can involve examining the whole pipeline (data, infrastructure, user interface) for weaknesses. Because modern AI models are open-ended and creative by design, they can also be creatively misused.
