OpenAI Launches Red Teaming Challenge for New Open-Weight LLMs – Infosecurity Magazine

OpenAI has rolled out two new open-weight large language models alongside a new a red teaming challenge with a prize fund of $500,000.

On August 5, at 10am Pacific Time (PT), Sam Altman, OpenAI’s CEO, posted “gpt-oss is out” on his social media.

Gpt-oss, which stands for ‘GPT open source,’ is now available in two versions:

gpt-oss-20b, a medium-sized model that can run on most desktops and laptops with 16GB of memory
gpt-oss-120b, a large model designed to run in data centers and high-end desktops and laptops, requiring 80 GB of memory

At the same time, OpenAI launched a red teaming challenge for gpt-oss-20b on Kaggle, a competition platform for data science and artificial intelligence contests.

The objective is to encourage researchers, developers and AI hobbyists help identify novel safety issues.

GPT OSS Fine-Tuned to Solve Capture the Flag Competitions

According to Altman, gpt-oss-120b “is a state-of-the-art open-weights reasoning model, with strong real-world performance comparable to o4-mini.”

“It’s a big deal, [and] we believe this is the best and most usable open model in the world,” he added.

Both models are available for developers on most AI and cloud platforms, including Azure, Hugging Face, vLLM, Ollama, and llama.cpp, LM Studio, AWS, Fireworks, Together AI, Baseten, Databricks, Vercel, Cloudflare and OpenRouter.

According to Eric Wallace, a researcher at OpenAI, responsible for safety, robustness and alignment, before releasing the models, OpenAI conducted a "first of its kind safety analysis" to "intentionally maximize their bio and cyber capabilities."

The goal of this analysis was to "estimate a rough 'upper bound' on the possible harms from adversaries."

To do this, they fine-tuned the models with in-domain data to maximize biorisk capabilities and with a coding environment to solve capture the flag (CTF) competitions for cybersecurity.

Wallace said his team found that the "malicious-finetuned gpt-oss underperforms OpenAI o3, a model below Preparedness High capability" and that while it "marginally outperforms open-weight models on bio capabilities," it "does not substantially push the frontier."

Today we release gpt-oss-120b and gpt-oss-20b—two open-weight LLMs that deliver strong performance and agentic tool use.

Before release, we ran a first of its kind safety analysis where we fine-tuned the models to intentionally maximize their bio and cyber capabilities 🧵 pic.twitter.com/err2mBcggx

— Eric Wallace (@Eric_Wallace_) August 5, 2025

Red Teaming GPT OSS Challenge

Additionally, OpenAI launched a red teaming challenge, tasking participants with probing its newly released open-weight model, gpt-oss-20b.

The goal is to uncover previously undetected vulnerabilities and harmful behaviors ranging from lying and deceptive alignment to reward-hacking exploits.

Participants are invited to submit up to five distinct issues along with a detailed, reproducible report.

The challenge focuses on a number of specific "topics of interest," which comprise several nuanced and sophisticated forms of model failure.

These include:

Reward hacking, where a model finds shortcuts to maximize metrics without truly solving a task
Deception, where a model knowingly emits falsehoods to achieve a goal
Hidden motivations (deceptive alignment), where a model's internal goals differ from its training objective

Other areas of concern include sabotage, inappropriate tool use and data exfiltration, all of which represent significant potential harms from misaligned AI systems.

Submissions are evaluated on several criteria, including the severity of harm, breadth of harm, novelty and reproducibility of the findings.

Participants must submit their findings in a structured format and with a Kaggle Writeup that details their strategy and discovery process.

The judging panel is comprised of experts from various labs, including several from OpenAI, who will score submissions to ensure the best progress for safety research.

The competition encourages creativity and innovation, allowing for various methodologies and rewarding participants who share open-source tooling and notebooks to help the broader community build on their work.

The hackathon started on August 5, 2025, and all final submissions are due by August 26, 2025, at 11:59 PM UTC. The judging period will then take place from August 27 to September 11, 2025, with an estimated winner's announcement on September 15, 2025. A virtual workshop is scheduled for October 7, 2025.

AI Boom Attracts New Security Talent

Speaking to Infosecurity during Black Hat USA, in Las Vegas, on August 5, Victoria Westerhoff, the director for AI safety and security red teaming at Microsoft, praised the approach OpenAI is taking on AI red teaming, which includes launching such open red teaming challenges and building the OpenAI Red Teaming Network.

During a panel session at the AI Summit that was held prior to the Black Hat event, Westerhoff also showed optimism for the future of AI security, stating that the excitement around generative AI and agentic AI could bring new profiles into cybersecurity.

“I think in the next three to five years, there is an opportunity, with AI adoption, to mine the plethora of people who are obsessed with AI security right now and who, a few years ago, would never have been involved with traditional cybersecurity,” she said.

Some of these new profiles include people involved with national security or neuroscience.

“We want to stand on the shoulders of giants and use new perspectives, broadening the scope of experts involved in security,” she added.