Insights Where do off-the-shelf AI content safety systems fit into the picture?

Where do off-the-shelf AI content safety systems fit into the picture?

A major concern with AI content safety has been the fundamental vulnerability of most AI foundations models. Almost immediately after seeing AI models enter into production for the first time a year and a half ago, we saw issues with their capability to mitigate simple tricks. As time has progressed the foundation models have incrementally gotten better at mitigating those tricks and meta-prompt engineering has helped further but regardless we’ve seen those only partially effective. In order to mitigate those threats we’ve built custom shims in application logic to protect against harmful attack vectors and built walls around the non-deterministic capabilities of an AI agent. The need for these walls may never go away. However, a major step forward has been the development of off-the-shelf AI safety capabilities which complement the custom walls we’d build in an app. Rather than having to protect against everything we only need to build AI system walls against scenario specific situations, while general harms can be mitigated with the off-the-shelf platform. Let’s talk about what you’ll build yourself, vs. what you’d adopt from the a platform.

I’m going to spend most of this article using Microsoft’s Azure AI content safety system as an example. It isn’t the ONLY example, but it is a good example of intentional steps being taken by the leading vendor to understand and control the harms attached to a fast moving technology. To learn more, I’d highly recommend listening to Sarah Bird‘s latest talk on the TWIML AI Podcast. I’ve been following Sarah since before AI exploded with the GPT4 craze and have consistently been impressed with the measured approach she and Microsoft have taken to do this right.

First, let’s take a look at the content safety system. Notice in the image below, the protections for a diversity of content types, as well as security systems. Again, these aren’t perfect and in some cases you’ll turn the dials up (or down), which I’ll describe later. Remind yourself… do I really want to be trying to build content moderation for these categories, or should I (if possible), be using something I can adopt off the shelf?

Using the Content Safety System

1. Text Moderation

Text moderation is a critical aspect of the content safety system, designed to analyze and filter textual content for inappropriate language, hate speech, harassment, and other types of harmful communication.

Example of Use: Deploying text moderation in a public online forum where users can post comments and messages ensures that abusive or offensive language is promptly detected and filtered out, maintaining a respectful and safe environment for all participants.

Example of Non-Use: In certain medical settings text that might be caught as “inappropriate” might actually be necessary. In that case it could be difficult to use an off-the-shelf filter and instead additional controls might need to be built.

2. Image Moderation

Image moderation involves the analysis of visual content to detect and filter images that contain nudity, violence, or other inappropriate material.

Example of Use: Implementing image moderation on a social media platform where users can upload and share photos helps prevent the dissemination of explicit or violent images, protecting users, especially minors, from exposure to harmful content.

Example of Non-Use: For a medical consultation platform where healthcare professionals share X-ray images and other medical visuals, strict image moderation might incorrectly flag legitimate medical content, disrupting the workflow. Scenarios also where content might be flagged but not blocked would be cameras used to detect transportation injuries, such as on semi trucks or manufacturing.

3. Video Moderation

Video moderation capabilities extend the scope of content safety to moving images, analyzing entire video files for inappropriate scenes or audio.

Example of Use: Utilizing video moderation on a video-sharing website ensures that uploaded content is free of graphic violence, explicit sexual content, or hate speech, maintaining the site’s community guidelines and protecting viewers.

Example of Non-Use: On a platform dedicated to educational content for training purposes, such as non-violent de-escalation training. In these cases AI might be used to optimize the content for training purposes or even label it for “what would you do” situations.

4. Audio Moderation

Audio moderation focuses on filtering audio content for profanities, hate speech, and other harmful auditory communications.

Example of Use: Deploying audio moderation in a live-streaming service where users broadcast live sessions can help ensure that spoken content adheres to community standards, avoiding hate speech or explicit language.

Example of Non-Use: In a voice recording application used by journalists and researchers, rigid audio moderation might inadvertently censor valuable information, impeding the documentation process.

Instances Requiring Custom AI Content Safety Systems

1. Industry, or Solution-Specific Requirements

Almost every AI system has industry or solution-specific requirements that need to be considered that are not just for safety but also for pure functionality. Developers have become notorious for shipping half-baked systems that sort-of work, but then clearly show that the necessary work hasn’t been completed before pushing to production. Just because something works in POC and Production requirements show a need for more rigor is not the fault of the content safety system. Product teams need to think like true software developers when building AI systems and the necessary layers of production testing should be considered and the guardrails implemented around the system. This leads to leveraging testing frameworks that not only check against regular harms, but also evaluate a series of system-specific tests that need to work as expected, as well as building an increasing list of guard-rail tests to evaluate for functional boundaries.

2. Groundedness or Content Validation

Microsoft is releasing some pretty impressive groundedness protections, so this isn’t entirely something the off-the-shelf capabilities avoid, but we’re still finding that many cases require specific groundedness review or comparison, especially in highly unique evaluations. Where we might use groundedness protection as an initial stage, we may still run the response through a series of filters to ensure the information matches certain formats, limitations or frameworks that help limit harms and maximize the truthfulness of the content.

3. Business Systems

In many cases we are returning information from a business system that is retrieved by the AI system translating a human request into a business system query. This obviously needs to be guardrailed for security and correctness of output. Numerous protections need to be built to allow only certain interactions with the business system, mitigate attacks, and provide appropriate responses. An excellent experience here can have high rewards, but the right controls need to be built to mitigate the harms.

4. Legal and Compliance Considerations

Different jurisdictions have distinct legal requirements for content moderation. A custom AI content safety system can be built to ensure compliance with local laws and regulations, which might not be adequately covered by a broad-based system.

5. Copilot and Copilot Studio

You might be wondering, well… how does all this fit into Copilot and Copilot Studio, considering that they are built outside of the Azure platform detailed controls. For each of those there is an inheritance of the base M365 Copilot anti-harms protection, but what you might find is that there are presently limitations on what you can “dial down” in the protections. This isn’t necessarily bad, since this is where a lot of low-code apps will be built with minimal supervision. Note that there might be scenarios that come up where a desired functionality doesn’t work as expected, which might require movement into Azure Open AI and application of additional control dials, but in a sense that is a good thing.

Where to go from here?

The movement forward with AI needs to be a balance between enablement and governance, with both having equal parts in the picture. To enable without governance is irresponsible. To govern without enablement is pointless. Responsible CoEs will strike the right balance and seek to learn how harms can be mitigated while incrementally facilitating additional capabilities in appropriate stages.

Nathan