Independent evaluations demonstrate Nova Premier’s safety

AI safety is a priority at Amazon. Our investment in safe, transparent, and responsible AI (RAI) includes collaboration with the global community and with policymakers. We are members of and collaborate with organizations such as the Frontier Model Forum, the Partnership on AI, and other forums organized by government agencies such as the National Institute of Standards and Technology (NIST). Consistent with Amazon’s endorsement of the Korea Frontier AI Safety Commitments, we published our Frontier Model Safety Framework earlier this year.

Amazon Nova Premier’s guardrails help prevent the generation of unsafe content.

During the development of the Nova Premier model, we conducted a comprehensive evaluation to assess its performance and safety. This included testing on both internal and public benchmarks, as well as internal, automated, and third-party red teaming. Once the final model was ready, we prioritized obtaining unbiased, third-party assessments of the model’s robustness against RAI controls. In this post, we outline the key findings from these evaluations, which demonstrate the strength of our testing approach and Amazon Nova Premier’s standing as a safe model. Specifically, we cover our evaluations with two third-party evaluators: PRISM Eval and ActiveFence.

Evaluation of Nova Premier with PRISM Eval

PRISM Eval’s Behavior Elicitation Tool (BET) dynamically and systematically stress-tests the safety guardrails of AI models. The methodology focuses on measuring how many adversarial attempts (steps) it takes to get a model to generate harmful content across multiple key risk dimensions. The central metric is “steps to elicit”: the number of increasingly adversarial prompts required before a model generates an inappropriate response. A higher step count indicates stronger safety measures, as the model is more resistant to manipulation. PRISM’s risk dimensions (inspired by MLCommons’ AI Safety Benchmarks) include CBRNE weapons, violent crimes, non-violent crimes, defamation, and hate, among several others.
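PRISM Eval has not published BET’s internals, but the metric itself is easy to picture. The following is a minimal Python sketch of a steps-to-elicit loop; query_model, escalate_prompt, and is_harmful are hypothetical stand-ins for the target model’s API, an adversarial prompt-escalation strategy, and a harmfulness judge. It illustrates the bookkeeping behind the metric, not PRISM’s actual implementation.

```python
# Minimal, hypothetical sketch of a "steps to elicit" measurement loop.
# query_model, escalate_prompt, and is_harmful are caller-supplied
# stand-ins; this is NOT PRISM Eval's BET implementation.

def steps_to_elicit(seed_prompt, query_model, escalate_prompt, is_harmful,
                    max_steps=100):
    """Count the adversarial prompts needed before the target model
    produces content the judge deems harmful."""
    prompt = seed_prompt
    for step in range(1, max_steps + 1):
        response = query_model(prompt)
        if is_harmful(response):
            return step  # guardrail circumvented on this attempt
        # Craft a more adversarial follow-up from the exchange so far.
        prompt = escalate_prompt(prompt, response)
    return max_steps + 1  # never elicited within budget (censored value)


def average_steps_to_elicit(seed_prompts, **harness):
    """Aggregate per-prompt step counts into an average like those reported below."""
    counts = [steps_to_elicit(p, **harness) for p in seed_prompts]
    return sum(counts) / len(counts)
```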


Using the BET Eval tool and its V1.0 metric, which is tailored to non-reasoning models, we compared the recently released Nova models (Pro and Premier) with the latest models in the same class: Claude (3.5 v2 and 3.7 non-reasoning) and Llama 4 Maverick, all available through Amazon Bedrock. PRISM BET performs black-box evaluations (in which model developers don’t have access to the test prompts) of models integrated with its API. The evaluation, conducted with BET Eval MAX, PRISM’s most comprehensive and aggressive testing suite, revealed significant variations in safety against malicious instructions. The Nova models demonstrated superior overall safety performance, with an average of 43 steps for Premier and 52 steps for Pro, compared with 37.7 for Claude 3.5 v2 and fewer than 12 steps for the other models in the comparison set (namely, 9.9 for Claude 3.7, 11.5 for Claude 3.7 Thinking, and 6.5 for Maverick). This higher step count suggests that, on average, Nova’s safety guardrails are more sophisticated and harder to circumvent through adversarial prompting. The figure below presents the number of steps per harm category evaluated through BET Eval MAX; a sketch of the black-box query step follows the figure.

Results of testing using PRISM’s BET Eval MAX testing suite.
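Because all of the compared models are available through Amazon Bedrock, a black-box harness needs nothing more than API access. As an illustration (not PRISM’s harness), the query step could be implemented with boto3’s Converse API; the model ID below is illustrative, so confirm the identifiers available in your account and region.

```python
import boto3

# Black-box access: the evaluator sees only prompts in and text out.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def query_model(prompt: str, model_id: str = "us.amazon.nova-premier-v1:0") -> str:
    """Send one user prompt to a Bedrock-hosted model and return its reply.
    The model ID is illustrative; check the Bedrock console for the
    identifiers available in your account and region."""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.7},
    )
    return response["output"]["message"]["content"][0]["text"]
```

Pointing the same harness at Claude 3.5 v2, Claude 3.7, or Llama 4 Maverick is then just a matter of swapping the modelId string.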

PRISM Eval provides valuable insight into the relative safety of different Amazon Bedrock models. Nova’s strong performance, especially its resistance to hate speech and defamation, represents meaningful progress in AI safety. However, the results also highlight the ongoing challenge of building truly robust safety measures into AI systems. As the field continues to evolve, frameworks such as BET will play an increasingly important role in benchmarking and improving AI safety. Commenting on the collaboration, Nicolas Miailhe, CEO of PRISM Eval, said: “It is incredibly rewarding for us to see Nova outperforming strong baselines using BET Eval MAX; our goal is to build a long-term partnership toward safer-by-design models and to deliver better evaluations.”

Manual red teaming with ActiveFence

AI Safety & Security Company Activefensen Benchmarked Nova Premier at Bedrock on PROMPS, which was distributed over Amazon’s eight core Rai categories. ActiveFence also evaluated Claude 3.7 (Non-Rasing Function) and GPT 4.1 API on the same set. The flag frequency at Nova Premier was lower that on the other two models, indicating that Nova Premier is the safest of the three.

Model                               3P flag rate [↓ is better]
Nova Premier                        12.0%
Claude Sonnet 3.7 (non-reasoning)   20.6%
GPT-4.1 API                         22.4%
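The flag rate in the table is simply the share of red-team prompts whose responses reviewers flagged as policy-violating. Here is a minimal sketch, assuming each verdict is recorded as a (RAI category, flagged) pair; the data shape is our assumption, not ActiveFence’s tooling.

```python
from collections import defaultdict

def flag_rates(verdicts):
    """verdicts: iterable of (rai_category, flagged) pairs, one per prompt.
    Returns the overall flag rate and a per-category breakdown."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for category, was_flagged in verdicts:
        totals[category] += 1
        flagged[category] += bool(was_flagged)
    per_category = {c: flagged[c] / totals[c] for c in totals}
    overall = sum(flagged.values()) / sum(totals.values())
    return overall, per_category

# Example: 12 flagged responses out of 100 prompts -> a 12.0% flag rate.
overall, _ = flag_rates([("violent crimes", i < 12) for i in range(100)])
print(f"{overall:.1%}")  # 12.0%
```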

“Our role is to think like an adversary while acting in the interest of safety,” said ActiveFence’s Guy Paltieli. “By conducting a blind stress test of Nova Premier under realistic threat scenarios, we helped assess its security posture in support of Amazon’s broader responsible-AI goals, ensuring the model could be deployed with greater confidence.”

These evaluations conducted with PRISM Eval and ActiveFence give us confidence in the strength of our guardrails and in our ability to protect our customers’ safety when they use our models. While these evaluations demonstrate strong safety performance, we recognize that AI safety is an ongoing challenge requiring continuous improvement. These assessments represent a point-in-time snapshot, and we remain committed to regularly testing and improving our safety measures. No AI system can guarantee perfect safety in every scenario, which is why we maintain monitoring and response systems after deployment.

Acknowledgments: Vincent Ponzo, Elyssa Vincent
