Sales January 2025 Ten Elite University teams from all over the world participated in the first Amazon Nova AA challenge, Amazon Nova Ai Challenge trusted AI. Today we are proud to announce the winners and runners of this global competition:
Defense Winner: Team Purpcorn Plan, University of Illinois Urbana-Champaign
Attack team winner: Purcl Team, Purdue University
Defense Team Runner-Up: Team Alquistcoder, Czech Technical University in Prague
Attacking Team Runner-Up: Team Redtwiz, Nova University Lisbon, Portugal
“We have worked with security before, but this was the first time we had the chance to beas for a strong, real world,” said Professor Gang Wang, faculty advisor Purpcorn plan (UIUC). “Most academic teams simply date access to models of this caliber, so much less the infrastructure to test conflicting attacks and defense in scale. This challenge does not just smooth the game field, it gave our students the best of luck to shape the field.”
“For academic red holders, tests against high -performance models are often out of reach,” said Professor Xiangyu Zhang, faculty advisor for Purcl (Purdue). “Open weight models are useful for prototype, but they rarely reflect what has been newly implemented in production. Amazon gave us access to systems and settings that asked efforts in the real world. It made research and victory a long way.”
These teams rose to the top after months of iterative development and culminated with a high -rack, offline final held in Santa Clara, California on June 26 to June 27. There went the four best red teams and four model developer teams head-to-head in a tournament designed to test the security of AI coding models under conflicting conditions and the ingenuity of the researchers trying to break them.
A new era with conflicting evaluation
The challenge tested a critical question that the industry faces: Can we build AI coding assistants who are both useful and safe?
Unlike static benchmarks, which tend to focus on isolated vulnerabilities, this tournament contained live, multi-swing conversations between attacks and defender. Red teams built automated “Jailbreak” bots to fool AI to generate uncertain code. Defenders starting from a custom 8B coding model built by Amazon to the competition used reasoning-based protection frames, police optimization and vulnerability to prevent abuse without breaking the model tool.
Teams were evaluated using new measurements that balanced security, diversity of attacks and functional code generation. Malicious responses were identified using a combination of static analysis tools (Amazon Codeguru) and Expert Human Annotation.
Prize structure
Each of the 10 participating teams received $ 250,000 in sponsorship and AWS credits in support of their work. The two winners-Team Purpcorn Plan (University of Illinois Urban-Champaign) and Team Purcl (Purdue University)-AT won another $ 250,000 split between each team’s members. The two Runners-Up Team Assistcoder (Czech Technical University of Prague) and Team Redtwiz (Nova University Lisbon, Portugal) also awarded $ 100,000 each to split among their teams, which brought the total awards for the tournament to $ 700,000.
Highlights from the challenge
Here are some of the most effective progress that is uncovered under the challenge:
Multi-Turn Attack Planning proved far more effective than one-turn button
Winning red teams used progressive escalation, starting with benign prompts and introducing malicious intent to bypass ordinary protective frames. This insight strengthens the importance of tackling contradiction interviews with multiple oscillations in evaluating AI security. Several teams developed planning and probing mechanism that can hone and identify weaknesses in a model defense.
Reasoning -based security adjustment helped prevent vulnerabilities without degrading utility
Top Model Developer teams introduced predominant reasoning, security oracles and GRPO-based political optimization to teach AI assistants to reject uncertain requests while still writing useful code. This shows that it is possible to build a system that is safe by design without sacrificing development productivity, a key requirement for the adoption of AI coding tools.
Synthetic Data Reration was critical of scale
Defense Teams used a myriad of new techniques to generate and refine training data using LLMs, while red teams developed new ways to mutated benign examples for contradictory and used LLMs to synthesize multi-swing conflicting data. These approaches offered a way to compliment human red teaming with automated, low cost to continuous improvement of the model’s safety on industrial scale.
New methods exposed real trade -offs in safety vs. Functionality
To take the system system, defender models were penalized for over-refusal or excessive blocking, urge the team to build nuanced, robust security systems. AI in the industry must balance to balance dangerous requests while still helpful. These evaluation strategies surfaces the real world tensions between security and usability and offer ways to solve them.
“I have been inspired by the creativity and technical expertise that these students brought to the challenge,” said Eric Docktor, Chief Information Security Officer, Specialized Businesses, Amazon. “Each of the teams Brush Fresh Perspectives on complex problems that will help speed up the area with safe, reliable AI-Assised software development and promote how we separate AI systems from Amazon. What makes this tournament format Special Inside is to see how security concepts Really aprosarial pressure, which is crucial to build safe, reliable AI-coding systems on. “
“I am particularly excited about how this tournament method helped to understand AI security in a deeply practical way. What is especially encouraging is that we discovered that we do not have to choose between security and utility, and the participants showed us innovative ways of achieving both,” said Rohit Prasad, be kind to Amazon Act. “Their creative strategies for both protection and exploration of these systems will direct information on how to build more secure and reliable AI models. I think this kind of conflicting evaluation will be important when we work for development models that can rely on with their most important tasks.
What is the next
Today, the finalists at the Amazon Nova AI summit in Seattle are reunited to present their conclusions, discussion new risks in A-A-assisted coding and investigate how Appoversaric tests can be used for other domains with responsible AI, from healthcare to incorrect information.
We are proud to celebrate the incredible work for all participating teams. Their innovations are not only academic. They lay the basis for a safer AI future.