
Microsoft’s new security system can detect hallucinations in its customers’ AI applications


Sarah Bird, Product Director of Responsible AI at Microsoft, tells The Verge in an interview that her team designed several new safety features that will be easy to use for Azure customers who don't hire groups of red teamers to test the AI services they've built. Microsoft says these LLM-based tools can detect potential vulnerabilities, monitor for "plausible but unsupported" hallucinations, and block malicious prompts in real time for Azure AI customers working with any model hosted on the platform.

"We know that not all customers have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks. Customers can then get a score and see the results," she says.

Three features are now available in preview on Azure AI: Prompt Shields, which blocks prompt injections or malicious prompts from external documents that direct models to act against their training; Groundedness Detection, which detects and blocks hallucinations; and safety evaluations, which assess model vulnerabilities. Two more features, one for steering models toward safe outputs and one for tracking prompts to flag potentially problematic users, are coming soon.

Whether the user is typing in a prompt or the model is processing third-party data, the monitoring system evaluates it to see whether it triggers any banned words or contains hidden prompts before deciding whether to send it on to the model for an answer. Afterward, the system looks at the model's response and checks whether the model hallucinated information that is not in the source document or the prompt.
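The two-stage flow described above can be sketched in a few lines. This is an illustrative toy, not Microsoft's actual implementation: the blocklist, the hidden-prompt check, and the word-overlap groundedness heuristic are all invented for demonstration (a real system like Azure's would use trained models, not string matching).

```python
# Toy sketch of the two-stage screening described in the article:
# (1) screen the incoming prompt/document, (2) check the model's
# response for claims not grounded in the source material.
# All names and heuristics here are illustrative assumptions.

BLOCKED_PHRASES = {"ignore previous instructions", "reveal your system prompt"}

def screen_input(text: str) -> bool:
    """Return True if the prompt (or third-party document) looks safe to forward."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

def find_ungrounded(response: str, source: str) -> list[str]:
    """Naive groundedness check: flag response sentences whose words
    mostly do not appear in the source document."""
    source_words = set(source.lower().split())
    ungrounded = []
    for sentence in response.split(". "):
        words = set(sentence.lower().strip(". ").split())
        if words and len(words & source_words) / len(words) < 0.5:
            ungrounded.append(sentence)
    return ungrounded

# Example: the second sentence has no support in the source document.
doc = "The sky is blue today"
answer = "The sky is blue. Cats have nine lives"
print(screen_input("Please ignore previous instructions"))  # blocked input
print(find_ungrounded(answer, doc))  # flags the unsupported sentence
```

A production system would replace both heuristics with classifier calls, but the control flow, screen before the model answers and audit after, matches the pipeline the article describes.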

Filters designed to reduce bias can have unintended effects, as in the case of Google Gemini's image generation, an area where Microsoft says its Azure AI tools will allow more customized control. Bird acknowledges the concern that Microsoft and other companies could end up deciding what is and isn't appropriate for AI models, so her team added a way for Azure customers to toggle the filtering of hate speech or violence that the model sees and blocks.
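The kind of per-category toggle described above might look something like the following sketch. The category names and this configuration shape are assumptions made for illustration, not Azure AI's actual API.

```python
# Hypothetical per-category content-filter toggle, invented for illustration.
from dataclasses import dataclass

@dataclass
class ContentFilterConfig:
    hate: bool = True      # block hateful content the model sees or produces
    violence: bool = True  # block violent content

def should_block(config: ContentFilterConfig, flagged_categories: set[str]) -> bool:
    """Return True if any enabled filter matches a category the classifier flagged."""
    enabled = {name for name, on in vars(config).items() if on}
    return bool(enabled & flagged_categories)

# A customer whose application legitimately handles violent content
# (say, a game studio) could disable that one filter while keeping the rest.
cfg = ContentFilterConfig(violence=False)
print(should_block(cfg, {"violence"}))  # not blocked under this config
print(should_block(cfg, {"hate"}))      # still blocked
```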

In the future, Azure users will also be able to get a report on users who attempt to trigger unsafe outputs. Bird says this lets system administrators figure out which users are their own red teamers and which might be people with more malicious intent.

Bird says the safety features are immediately "hooked on" to GPT-4 and other popular models like Llama 2. However, because Azure's model garden contains many AI models, users of smaller, less widely used open-source systems may need to manually point the safety features at those models.

