Microsoft has a new safety system that can catch hallucinations

None

The Verge spoke with Sarah Bird, Microsoft’s chief product officer for responsible AI, who says her team has created a number of new safety features that will be simple to use for Azure users who aren’t recruiting teams of red teamers to test the AI services they developed.
According to Microsoft, Azure AI users working with any model hosted on the platform can use these LLM-powered tools to identify potential vulnerabilities, keep an eye out for hallucinations “that are plausible yet unsupported,” and instantly block malicious prompts.
Since not every customer possesses extensive knowledge of hateful content or prompt injection attacks, the evaluation system creates the prompts required to mimic these kinds of attacks.
Following that, customers can view the results and receive a score, according to her.
As an example, the recent controversy surrounding explicit celebrity fakes (created by Microsoft’s Designer image generator), historically incorrect images (created by Google Gemini), and Mario piloting a plane toward the Twin Towers (created by Bing) are examples of unintended or undesirable responses resulting from generative AI.
The following three features are currently available in preview on Azure AI: Groundedness Detection, which detects and blocks hallucinations; Prompt Shields, which blocks malicious prompts from external documents that instruct models to go against their training; and safety evaluations, which evaluate model vulnerabilities.
Soon, two more tools will be available to guide models toward safe outputs and track prompts to identify users who might be a problem.
Before sending it to the model to respond, the monitoring system will check to see if it triggers any forbidden words or contains hidden prompts, regardless of whether the user is typing in a prompt or if the model is processing third-party data.
The system then evaluates the model’s response to see if the model saw anything that wasn’t in the prompt or the document.
In the case of the Google Gemini images, Microsoft claims that its Azure AI tools will enable more individualized control in this area because filters intended to lessen bias had unexpected consequences.
Bird and her team added a feature that allows Azure customers to toggle the filtering of hate speech and violence that the model sees and blocks. This was done in response to concerns expressed by Bird that Microsoft and other companies might be deciding what is or isn’t appropriate for AI models.
Users of Azure will eventually be able to obtain a report on users who try to cause unsafe outputs.
According to Bird, this makes it possible for system administrators to distinguish between users who may have more malevolent intent and those who are part of their own red team.
Bird claims that the GPT-4 and other well-known models, such as Llama 2, have the safety features “attached” right away.
However, users of smaller, less popular open-source systems might need to manually point the safety features to the models because Azure’s model garden contains a large number of AI models.
Ever since more users express interest in utilizing Azure to access AI models, Microsoft has been utilizing AI to strengthen the security and safety of its software.
The business has also endeavored to increase the quantity of potent AI models that it offers; most recently, it signed an exclusive agreement with the French AI startup Mistral in order to provide the Mistral Large model on Azure.
POSITIVE

Microsoft’s chief product officer for responsible AI, Sarah Bird, tells The Verge in an interview that her team has created a number of new safety features that are simple to use for Azure users who aren’t recruiting teams of red team members to test the AI services they’ve developed. For Azure AI customers working with any model hosted on the platform, Microsoft claims that these LLM-powered tools can identify potential vulnerabilities, keep an eye out for hallucinations “that are plausible yet unsupported,” and block malicious prompts in real time.

The evaluation system creates the prompts required to mimic these kinds of attacks because we are aware that not all of our clients have extensive experience with hateful content or prompt injection attacks. Then, she says, customers can view the results and receive a score.

This can assist in preventing generative AI-related controversies brought about by unwanted or inadvertent reactions, such as the ones that have recently involved explicit celebrity fakes (created by Microsoft Designer image generator), historically incorrect images (found on Google Gemini), and Mario flying a plane in the direction of the Twin Towers (published by Bing).

Three features are now available in preview on Azure AI: Groundedness Detection, which detects and blocks hallucinations; Prompt Shields, which blocks malicious prompts from external documents that instruct models to go against their training; and safety evaluations, which evaluate model vulnerabilities. Soon, two more tools will be available to guide models toward safe outputs and track prompts to identify users who might be a problem.

Before choosing to forward it to the model for response, the monitoring system will see if it contains any hidden prompts or triggers any prohibited words, regardless of whether the user is answering a prompt or the model is processing data from third parties. Subsequently, the system examines the model’s response to see if the model saw something that wasn’t in the prompt or document.

Microsoft claims that its Azure AI tools will enable more customized control in this area. In the case of the Google Gemini images, filters intended to reduce bias had unintended effects. In response to concerns expressed by Bird and her team about Microsoft and other businesses potentially controlling the content that AI models should or shouldn’t use, the team included a toggle feature for Azure users to control the amount of hate speech and violent content that the model allows through and filters out.

A report of users who try to cause unsafe outputs will be available to Azure users in the future. According to Bird, this enables system administrators to identify users who may have more malevolent intentions and those who are part of its own red team.

According to Bird, the safety features are “attached” right away to the GPT-4 and other well-known models, such as the Llama 2. That being said, users of less popular, smaller open-source systems might need to explicitly point the safety features to the models because Azure’s model garden has a large number of AI models.

Ever since more users express interest in utilizing Azure to access AI models, Microsoft has been utilizing AI to strengthen the security and safety of its software. The business has also made an effort to increase the quantity of potent AI models it offers; most recently, it signed an exclusive agreement with the French AI startup Mistral to make the Mistral Large model available on Azure.

scroll to top