Insights View Recording: Responsible AI for Your Projects

View Recording: Responsible AI for Your Projects

How does a security team evaluate the safety of the AI systems exposed internally and externally?  What if the AI solution can be tricked into answering questions it shouldn’t?  How can I build protections against jailbreaking or answering questions it shouldn’t?  What about hallucinations?  … curious?  This session is for you.

Importance: This is potentially the most significant new frontier that security professionals will encounter in the next decade.  Arming security professionals with the ability to analyze an AI solution and mitigate misuse is a new skill for security teams.

Transcription Collapsed

welcome. We’re here to talk about responsible AI and your projects. This is our our scheduled session right against the keynote to build. We are glad you’re here and we’re glad that you’re gonna probably catch up on build as the day goes on, we are really excited to talk about responsible AI and you’ll notice a lot of content that runs in a very similar lane to what you’ll see. People like Sarah Bird talk about at build this week, so this is very effective compliment to once you’re gonna be hearing from some of those sessions. So today what we are going to talk about is a couple things in responsible AI, but I want you to introduce myself first. So my name is Nathan’s Lasnoski. I’m concurrency chief technology officer. You want to connect with me on LinkedIn? There is my QR code hit me up. I would love to talk with you. I’d love for you to see the content that we’re I’m producing every week on AI. We have a weekly newsletter that centric to AI content and many more pieces of content that I think you’ll find very useful. So connect with me. Follow me on LinkedIn and we love to have some conversations after this for you to be able to continue to learn in this space. So what would you learn in this session? First, we’re going to talk a little bit about what are the common AI concerns that cause people to be interested in having governance and having safety and having responsibility around their AI adoption. We’re gonna talk about what AI program lanes look like. So how do companies structure themselves to be able to gain ground? We’re going to talk about the stages of AI operations and how you move from POC to pilot to production in where different sets of capabilities need to be in place in order for you to be responsible. AI adopter and we’re going to talk about AI safety and how has you build these systems. You need to put in place controls that enable you to ensure that appropriate responses are going back to your internal or external customers, and we’ll talk about a couple of technologies that exist in order to be able to make that happen on as we’re having this conversation, by the way, put your questions in the chat. So there’s a healthy chat function in this webinar, so put them in the Q&A. Put them in the chat. I’ll do my best to answer them as we go along and as we are engaged in having these conversations, I’d be happy to follow up at the end as well. So shooting over as we have them, OK, so I’m going to start this conversation by talking about what are the general concerns that cause us to require responsible AI in the 1st place. So one of the first concerns that people have around adopting AI in their organization is the concerns are on data privacy in it of itself. So I speak with executive teams about two or three times a week regarding their adoption of AI in the alignment with the mission of their business, and one of the first things they have concerns around is wait. If I start layering AI into my applications or my platforms, the way that I work is that being used to train some new version of a model, is it being used for purposes that I don’t have control over? How do I think about who has access to that information? And those are all excellent concerns. The first course question you need to ask yourself as you’re going down the road of adopting AIS capability within your business is how are people making money off of this relationship? So do you think about some of the services that you might be using like empty 65 copilot or building a system that’s running on Azure? In those instances, you’re paying a license fee or you’re paying a consumption fee and there’s a commitment that commercial data protection surrounds those uses of AI. So it’s sort of safe adoption from a data privacy standpoint, at least as far as like retraining other models is concerned of leveraging that AI platform because that’s how your exchange of value works. However, if you’re using something like the free instance of chat GPT, are you Googling something? How do those companies make money? Well, Google particularly makes money by selling your information to market things to you. So there is a delineation between whether something should really be considered private or not, and then the sort of internal sense, it’s absolutely considered private. And that’s part of the relationship that you have. But these other senses it’s not. So you want to make sure that you’re going down your path of implementing a responsible AI system or practice within your organization has been adopted in a way that puts in place the guardrails of giving those safe alternatives to some of those external scenarios, because as Jurassic Park says, life finds a way right, people will find a way to use a if you don’t give them safe alternatives. Second item that’s directly connected an internal data privacy and essentially providing data effectively is data readiness in and of itself. So what I find many organizations are stuck on is this idea that my data is not ready in order for me to be able to leverage AI in an effective way. In some cases, that’s true, and that means that investment needs to be made to be able to implement the necessary controls and platforms and capabilities in order to surface the data that’s necessary for their use cases and other cases. It’s not. Another cases their data is very ready. If so, there’s opportunities for that data to be serviced more quickly, but governance needs to be placed around that because oftentimes that data is centric to certain types of people who need to be answering those questions or receiving answers for questions or taking action, whereas that data might not be for someone else. So we need to be very cautious about how data is exposed. Same kind of situation as things like M365. Copilot is the 10 years worth of documents I have in SharePoint and the historical security controls I placed around that sufficient for me as I adopt him 365 copilot. Probably not. Probably not. So as you’re going down that path, you’re finding that you need to implement different controls, even in the M-65 copilot, adoption to be very successful with that. So lots of great things to think about in the outline. And then the third responsible AI in governance tier is this idea of human displacement. Or you might turn it around and call it human enablement, but you can’t really talk about responsible AI without talking about how AI impacts every person. And if I were to stress one thing for you, in the context of this conversation, it would be every person in your organization has the opportunity to be more as a result of AI. And that message needs to come out as a result of not just it and message you to come out as a part of your executive message back to the business and the best businesses are the ones that are getting ahead of that message. They’re thinking about it in the context of the skills attached, the roles of every person in the business and how they drive effectiveness as part of their adoption pattern. So human displacement is definitely a thing. It’s also an enablement opportunity for us to be able to drive this as a force multiplier for every person. It’s also a need to train and engage people, because there’s one thing I’ve seen from this is that adopting AI skills, especially delegation to AI agents, is not an intuitive thing. It’s not an intuitive thing for people to take a task they’re doing right now and hand it off to an AI agent and doing an effective way. Let me give you an example of that. We’ve all learned at Google. We’ve all learned at Google by putting in instant Masons is like a lot less words than we would typically put when I’m scribing this to another person. Right. Because we just know that like, if I stress certain words, I get the kind of results of the websites that might have the answers to things that I’m interested in. However, with generative AI with copilots, more information is better. More context is better, just like if you’re talking to a person, giving them more context is better, or give them three word search results or three word search question I described for the person that thing I’m trying to accomplish and that in additional information is valuable. So why is this Responsible AI? Well, because it’s enabling people to be able to accomplish what it is that they actually are trying to get done and it’s enabling them to have the skills necessary to perform those kinds of tasks. So if I treat teach people not just a prompt, but I teach them to be able to interact with an AI agent and effective way in a sense in helping them to interact with internal over here that’s growing in that person’s capability over time. So human enablement is probably one of the most important things about AI enablement, and we’ll talk a little bit about that later. Now we’re also gonna talk about how do I mitigate quality hallucination controls that might be problems. So like is my AI platform answering questions appropriately. Hey, Responsible AI key problem to think about right? How am I building a system that answers my questions that is protected against many of the mitigate the security kind of issues like jailbreaking or someone moving around my system prompt to be able to do things they shouldn’t be doing, or to even just answering it wrong. I’m in it, indicating it’s getting it from source material, but that source material doesn’t actually exist, so we’re going to talk about some tools that can mitigate those problems and build some stability and confidence around the platform itself and the last and very highlighted area of responsibility. AI is just this impact on people via bias. That’s ingested into the data and then made present within the model itself or the way that the system interacts with people and that can creep into a lot of scenarios that you wouldn’t necessarily expect. So like I was working on a project with a company that did large, so that did a lot of of apartment management. OK. So like they own departments, they were managing departments and they were trying to predict occupancy rates for different locations using different sets of data. And very quickly, they found out that like there were certain data points in there that they shouldn’t be using as components of how they’re doing that prediction because of the way that it started to align different types of human groups and socioeconomic factors that shouldn’t be built into that kind of experience. So we have to think about bias in a way that’s very relevant to the experience that we’re building and building one that really provides the best results for all people, not just certain subgroups. So all this is going to be part of our conversation today and looking forward to sort of digging into each element. OK. So as we talk about program lanes, you may have seen this before if you’ve been in some of my other sessions, I think it’s important to understand how we’re now talking about this section, right? So we have two different lines that would exist in our adoption. We have our commodity lane which think about that as AI capabilities release from products that you’re buying SAS products, M365, copilot, GitHub copilot things in Salesforce, there’s all these capabilities are being released as parts of the products that you have. It’s very much about adoption. Let’s move building something and then this top lane is very much about understanding the very nature of the business. Understanding how on pivoting my business to be more digital in nature or augmenting it with digital capabilities, either revenue production or operations and how AI enables that mission of the business and on that lot of times you’re building a semi custom solution or a fully custom solution in that space. And for both of these lanes, we need to be thinking about how Responsible AI it’s into this picture. And that’s human enablement. It’s outcomes that are governed and managed in the balance between enablement and control, and never can you kind of have them unbalanced, right, like too much control and you don’t get anything done and too little control and you’re generating answers that aren’t even successful answers and you’re building 10 versions of the same copilot. So you end up with this sort of problem of not having control your arms around the usage of the AI technology. So successful organizations are thinking about how they create AI Center of Excellence or enablement organization. That’s job is to shepherd the organization from this previous era to this new era. Like if I if you knew that era of the Internet was coming to a business or the industrial revolution was coming, you put a lot of intentional energy behind that. This is very similar to those two changes we have to put a lot of intentional energy behind how our business navigates that shift and to do it proactively. So we’ll talk about how that all fits in. So I’m gonna put. I found this to this is sometimes a little detail for people in the sort of business zone, but bear with me for a second. If you’re in the tech space, I found this to be super helpful for people, and as we’re starting to talk through some of these types of Responsible AI use cases, I want you to have an understanding that if fits across a lot of subcategories of how this is being talked about. And as you’re thinking about different tooling that you use to serve different problems, you’ll notice that the Responsible AI has different sort of points of view across these different structures. So one example of Responsible AI is using the right tool for the right job, so you can see here an example of how as I move from commodity to mission driven and as I move from lower degrees of precision and greater explorability to more specific and accurate ML models that have less explorability and more specific answers, you can see how responsibility in these answers responsibility on this. It fits in different boxes, so if I’m sitting up in this public AI space, it’s really about when my employees are using public AI. Are they using that in a way that is wrapped with commercial data protection? So what they put into public AI? Is it making its way out into the retraining ecosystem of the rest of the Internet? So that’s where things like Microsoft copilot you might recommend is recognized as Bing chat enterprise or copilot.microsoft.com if you have an M365 tenant, you can essentially use a a private version of a chat, GPT or a private version of the copilot experience that is not used to retrain models and is not exposed out to the Internet. That can gain those fast answers, but that’s example of governing. What tools people use and what tools don’t people use and then the second box governance and Responsible AI here is more about how do I ensure that when people are using copilot they have access to only the information they should have access to. Because we’ve done the oversharing report, we’ve locked down certain areas of the SharePoint site that just we don’t even want to include in the semantic index. And we know how to enable people to get the best results from their questions of data in the corporate corpus and other other senses. So we’re really driving toward making sure that people have the right kind of output outcomes from the right data that’s available to them. And then as we move down to semi custom use cases, this is when you’re building things on things like copilot Studio and you’re enabling very specific tasks like returning information from my HR system or putting a chat bot in from ServiceNow. And in those senses, sometimes they’re very internally focused and good is good enough, like. But you also are setting some guardrails that like the types of use cases you’re allowing your team to build inside the semi custom space might have to fit into certain degrees of criteria because when you get into fully custom use cases like next best action or a chat bot that’s exposed to your end customers, we’ve seen the failures of when people have done that poorly. We know that the bar goes much higher in terms of the kinds of outputs that we can expect or should expect from that platform. So if I were to start from scratch, it’s solve the right problems. Understand your why. Then move into. So start with the why then move into the What then move into the house. And this is starting to get into the How conversation, right? The How conversation guiding your team members into the right zones of where they should be doing their work. So as we double click into that, realize these are all part of the same ecosystem. So as you’re building a set of different AI capabilities, another reason why Responsible AI is really important is all of these fit together. So if I’m using M365 copilot and I built a custom chat bot and copilot studio that it’s moving questions over to to answer as a skill and then maybe I’ve even built a fully custom solution that copilot studio hands off to to answer back all the way back to the copilot that I originally answered. The question you start to see how this forms you nique, copilots or unique. Even agents that are related to each other and answering related questions, so tons of opportunity to be able to create really interesting outcomes from your governed environment, but also realizing that like you don’t want everyone to create the same chat bot, right? I’ve I was talking to the company. They’re gonna have hundreds of chat bots, right for different purposes, but they really want to make sure that the only build each one once, right? If I build it for customer service for this particular lane, I don’t want someone else to like go do that because they exist in a different region like we can build something that’s inclusive across the whole organization. So let’s move this into talking about what operationalization as responsibility looks like. So as and particularly, I’m talking about fully custom solutions, but it also kind of fits into the semi custom solution space as well. So as we’re talking about fully custom solutions, what you should know is that as you move from a proof of concept to MVP to a full operationalized system, the bar keeps going up. Of what you should expect and what I often see is that a system that was built for POC suddenly is in production and none of the rest of this work has been done. So it’s in production and it’s doing stuff and then all of a sudden it doesn’t work or someone’s left for the day or the the answers aren’t appropriate. And it’s because the rest of this activity hasn’t been done to make it successful at it fulfilling a true operationalized platform. So POC, you’re getting your data, you’re doing some basic feature engineering, you’re doing ML development, you’re creating a model. When I go out to do envisioning sessions, I always ask the people I’m working with for some internal documents or customer service support documents and different things I could feed into a chat bot that I can use to create a pilot like, not even a pilot POC for this Executive conversation, and it takes me a I don’t know an hour to build that and and The funny thing about it is like there’s a the the gap between a completed demo Wible system as a POC and it actually usable ML OPS oriented system is a is a. Big jump. It doesn’t always feel like it because you got that demo out fast, but the rest of this infrastructure is really important. So think about I got a POC in place. I then have to start iterating on that and I have to build rigorous controls around serving infrastructure. How I test it? How I do validation and then how I automate the release and how I monitor the results and how I pull those results back into the feature engineering activity so I continually have this loop of how I’m capturing what’s happening with the system and then I have the actual configuration of the system and the metadata management and all of this fits together in terms of creating this cycle of how I manage your time. So what you’ll notice in some of these forthcoming slides will be everything I do now. It has to be built on the idea of operationalizing the environment and true responsibility. I is baked around that maturity that has to exist to do that well. So how do we build a safe AI system? Like, how do we build a system that is successful at being fully operationalized? Well, this is a breakdown of kind of what the overall cycle looks like. You have this initial ideation activity that’s happening. This is a cyclical process, by the way. This isn’t like one, it just leads to the other. We have this ideation, exploring, figuring out what to do that was sourced from a business need from the customer, from the internal and external customer. In most of this, then exists in this building in augmenting system that then facilitates the movement into operationalizing the environment. We’re going to focus most of our energy for this conversation on how we build a system and put kind of guardrails around the building of that system that starts to take care of many of these controls that we need, and on the other side of that is the why are we even solving this? The first place and have that built up the so ML OPS cycle that supports that and both are necessary, but we’re going to focus most of this conversation on this building and augmenting zone. So when you think about building an environment and we talked about many of these in our in our previous conversation, the foundation models we’re using create all sorts of cop capability. I think about this like. I think about this like when I say that there’s a very specific answer to a question and AI ask like I used to. Just look at the FAQ document and it it contains this answer right. Why do I not like? If I had an FAQ site that had 2000 frequently asked questions on it, me being able to browse and interact with that site would be really complicated, right? So like be almost unusable, which is why FAQ sites are usually like frequently asked questions. Me like 5 questions right? Like the five things I’m most likely most often answer. However, I could ask a person who has all that information and they can quickly answer it for me. Why do I know that they knew the answer why I wear in their brain. Does that exist? When they answer that question, we’re certainly learning more about the human brain. But like just the idea of like where that exists in the brain like is it my right side on the left side is like what neuron fired to do it very complicated right? So the idea of saying wow, like now that I have this ability for generative AI to do these very incredible things in terms of how it leverages my information to answer questions in human like way, it also creates these question marks of how did it answer that question which makes all of these potential harms things that get highlighted, right? So on ground and outputs like if a person doesn’t know the answer to something, sometimes they’re up front about that. Sometimes they aren’t up front about it. Sometimes they say question things like might or might may, or probably we always inject these little elements, right? Well, AI tools need to be built to do a lot of the same things and we need to control those kinds of answers as well. But I AI platform doesn’t know. I don’t want it to provide an answer. Which is it? Inaccurate answer back to you jailbreaks, one of the more common things like getting assisting it to reveal it’s system message. There’s so many ways that you could sort of get around and harm these platforms if you don’t have the appropriate controls put in place. We’ll talk about some of those harmful content, like answering questions. It just straight up shouldn’t answer because it’s harmful copyright infringement. There’s a Microsoft has done a really good job on this from my God, large language model standpoint and putting some legal statements in place that they’re actually liable for the usage of the GPT models versus you. But you could also say am I using someone elses copyrighted material in my ground in content that I’m not allowed to use now in then more like manipulative behavior that we need to think about. So all these are controls that we aren’t problems. These are harms that we need to mitigate. So how do we mitigate them as we think about our responsible AI practices, how do we mitigate harms and maximize benefit? Well, there’s different control levels that exist at each spot. The model, the safety system, the metaprompt and grounding, and the user experience itself, and this is the responsibility of the platform. And then this is the responsibility. The application then have to play A to part together to be able to accomplish our goals. When we first got into AI, you know, we started with data scientists that were building things like demand, inventory forecasting, miles, but very quickly, this became a conversation of building AI systems, building AI applications, not just AI models. Right model needs to be served and needs to be used. So we need to think about our Responsible AI in four different major contexts, so we can put the governance around it. So the first mitigation layer that we need to think about is the model itself, and that’s essentially inside the model. There’s a set of controls to what it will answer or not answer, and you’ll have to judge some of your model adoption based upon what are the security layers that are built into the model. So you can see that, for example, GPT Forbes GPT 4 O, it has more mitigation layers built into the model itself, then say what early 3.5 or two had. And on certainly that’s the case with many of the other, like small models or large models, you can get that that are popping up on hugging face and so on. There’s many different possible models you could be using, and you need to consider like what controls exist in the model itself before I even go outside of the model zone controls. The second is the safety system that surrounds the model. So safety system? Uh. Like have I started to build controls around the actual the actual model that starts to mitigate certain issues. So for example, some of the things that are starting to exist within the open AI platform on Azure but also in front of other types of models, this will take a step back safety system, meaning it doesn’t actually limited just to open AI. You could use many other models in this association with this as well, and this is saying things like severity, scores associated with different types of content you’ll saw show you that in a second the ability to put different types of, umm, sort of mitigating different types of harms and putting different options high, medium, low on different types of harms. The ability to put in place blocklists, the ability to have content filters associated with different types of safety. So we’ll show you this. Show you this in a second. So here we go. So this is an example of putting filters in place into a safety system before they are surfaced for your platform. So various things that people have definitely shown, systems are providing the problems with right. So if you ask your model to uh talk about self harm for example like and you certainly don’t want it to do that, right? And you can even put in the metaprompt like you will not talk about this and you will never talk. These are some controls and going to put in place, but even with that you might find that it doesn’t. All people can kind of work their way around it via a lot of just serve complicated prompting techniques. So what’s happened is, and we built this, it’s interesting how fast this accelerated because when we first started doing this, we had to sort of build the safety system, right. So like you ask a question and what happens is that goes to the model. The model returns an answer, right? Umm, what we’ve had to do is essentially Shim between those two things. A safety system that would block and manage certain types of content. So we would just build that ourselves right now, let’s have. Amy Cousland 29:19 They made. I hate to maybe I hate to disrupt you here, but I people are not seeing the deck. I can see it OK, but is there anything I’m getting messages and I was trying to share the deck but I see it so I’m hoping that we can. Nathan Lasnoski 29:25 Ohh. Amy Cousland 29:31 Maybe there’s an issue there? Nathan Lasnoski 29:33 Oh, thank you. I’ll reshare it let let me just reshare it and make sure that that shows back up. That’s that’s not good. Amy Cousland 29:40 They’ve got some people who can see it, so just just make sure that we. Nathan Lasnoski 29:46 That’s showing up now. Amy Cousland 29:51 They’ve got people saying it’s visible and we’ll be just so everybody knows we’ll be sending this PDF out to everybody who’s been on who’s on who today afterwards, just so you have it. Thank you so much, Nate. Like you get back. Nathan Lasnoski 30:01 OK. Yep. Thank you. OK. Well, that’s that’s great stuff. Thank you for telling me that. OK. Umm, So what you can see here is varying different levels of uh user prompts. Glad you can see it now. Very different levels of ohh protection and prompts and then you see even layered into that are additional types of filters. So, like and what Microsoft is doing is building things like prompt SHIELD for jailbreaking, specifically looking for protected materials where there might be third party material that is not something you should be using, like a song or a news article or code that’s protected materials that’s coming from code. So all of these are things that are starting to be built into this new layer that’s going to exist between the person and the the model that’s answering the question. And then this also is gonna be something that gets injected around anti groundedness so that it is this answer actually coming from the source material. So being able to check for that kind of answer and being able to validate that that is something that is really coming from the source material, which is something that’s essentially a ground in the score. So indicating did that come from the right place? Did it not come from the right place? Which you’ll notice in this previous item is you have the ability to input certain sets of tests, right? So you can even sort of feed it a preset of tests or preset of sort of test material to look for these kinds of mitigate able problems without you having to kind of come up with the tests. So there’s a lot of options to be able to have it look for it and then validate it and give you a score against how well is your system already mitigating for these and how? Where should we be sending the threshold level so my, you know, Stark recommendation would be don’t build a custom AI system without having these kinds of filters in place, especially now that it’s productized within the sort of open AI platform. So this is an example of shell on. They did the same thing. This actually existed before the Azure AI content safety, but they essentially said like anything that’s going in production, we are building our AI content safety moderation feature that exists between you and it. And the job is to make sure that that pipeline exists and to the degree that may be something necessary for your custom system based upon the knowledge that you want to make sure it’s protected and you’ll see a couple of examples of that that’s in the successive conversations here. Umm, where? Like those very strict governance rules were built in addition to the safety system, to be very controlled around. What this will answer what it will not answer, and that’s another very important control would be especially as you’re building external systems, something that’s being put in front of your customer and so obvious to say, but it’s important to note is you want to be a little bit more not even a little bit, a lot more focused on it answering exactly the kinds of things you wanted to answer and everything else being blocked off. You’re not building the next chat, GPT. Your will at least not for most things. You’re building a ability to answer questions against manuals or to interact with your tax return or whatever it is, right? It’s specific to a job to be done. Focus on the job to be done. Don’t make it too broad, because that’s where some of the risks come into play. So this allows us to start about in addition to the safety system we just talked about building in metaprompt and grounding controls. So what does Metaprompt framework it’s to say my system message for my platform needs to define what my system does and what you can. I’ll show you this in a second. That what, like a model without a effective metaprompt framework is much more effective. It’s much more risk risky because it’s easier to compromise. You haven’t told the system what it’s intent is. You just left it sort of blank, right? So it’s gonna allow a person to do more unnecessary things or inappropriate things with your platform. Then if you had built the right metaprompt framework so you can see here we’re defining the model’s profile and its capabilities, we’re defining its output format, so we’re not allowing our customer to be as creative with how output content. We’re being very specific about what we let it do. We’re defining examples of what people can do with the do with the model, and then we’re saying here are the behavioral and safety guardrails that you are held accountable to. In a sense, you’re describing this like you would describe it for a person. This is like if you said, hey, you’re hired in my company and this is your role. So you have these capabilities. You’re gonna answer questions about this truck and when you answer these questions, you want to answer them professionally with this format and this style with this template. And when you get into difficult situations, I want you to escalate to this person or this process, and then you aren’t gonna answer questions about these kinds of things. I don’t care if you have an opinion on it. That’s not about our company. So you’re not gonna answer these questions. So you’re defining this this box under which you’re platform interacts. So how this all fits into prompt engineering? Is this is these are examples of what that might look like, so you are saying you must not, you must not generate content that is harmful to someone physically or emotionally. Or here’s our grounding control. Your answer must not include any speculation or inference about the background of the document. You must not assume or change dates or times. You know you’re indicating things that it can’t do now. So somebody’s like, why would it do that? Well, it’s just these are emerging best practices in terms of how to do this well and how this all fits into this is you’re essentially creating metaprompt templates to enable Responsible AI across your system that you build at scale. And what what’s the like result of that? No one instruction with a concrete and well defined set of tests to try to take advantage of a platform has a 97% defect rate. If with a person that knows what they’re doing right, you’re putting. If you put no instructions to leave it baseline, you’re gonna put them in a position where your platform is very easily compromised. If you are telling AI not to do something, you start to drop that defract rate. If you tell it. Not to do something, but to do something else instead. So you’re not just saying don’t do something, but you’re saying to do something. You are continuing to reduce some defract rate, but then if you give it even more information, like if you say don’t talk to strangers, well, we have a lot of strangers, right? They’re strangers that we interact with, but if you say don’t talk to strangers except in this situation, and if this person, if it is this sort of thing, I want you to go here, go find a police officer. Go find Mom or dad like you’re giving the instruction as to what it’s intended to do. So if you then take this ability to build an appropriate metaprompt framework, you combine that with the safety system that’s looking for all those other harms. That is the Shim and you have the model protection you really are and then you build in groundedness protection. You’re really building a system that is very safe. If you start to, but if you don’t include any of these things, you can see why Responsible AI is so hard, because the base system many people build will be unsafe and it needs to be built in a way that is safe. And then that brings us to like the user experience itself. So like as you expose this to a a customer, the way you expose it is going to have different sort of different controls or thoughts or ways to make it effective for them. So it’s about being transparent about what it does about its role. So one example would be always indicate that this is a, that this is a AI agent. It is not a person prevent in this. I don’t know why people keep doing this, but don’t give it like a human name. Like, don’t I mean you could have a name, but I mean, don’t you know, like you can see some of the been built there like actually have like a person like it looks like a person ohm and it it started encourages this idea of like anthropomorphizing the platform. You wanna be very cautious about how you expose it in the way that you think about it and the way you interact with it as a person, right? So as you build these systems is better as like a sistant, a copilot assistant, where the human knows what this is. And there’s a very clear way to to escalate to a actual human. So you have a path forward to move from this sort of bot centric engagement AI centric engagement into a human centric engagement that that is clear that this is an AI agent and also put in place certain sort of validating statements that indicates what this is going to do successfully and what it’s not going to do successfully. And then as you get to a point of providing results, reference sighting is a really important component of this, always showing them where they can go to get more information. So when it does provide an answer, here’s where you go to get more about what I just told you. So you can be very successful and not only tracing back the answers to the source material, but using that to be able to learn from how your body is responding to or bot or agent that’s performing tasks to be able to, umm, be manageable, monitorable and improvable versus a black box. So all that goes to then evaluation and action that you would probably call Red teaming or blue teaming plus red teaming. And this is essentially building the controls around your responsibility. I practices to say I’m gonna test the system and when I test the system, I’m going to look for opportunities for us to improve it by probing the system. So I’m looking for these before someone else is right. I’m looking for the system to be probed for different types of potential harms via set of tests that run on ongoing basis, and this is another component of the platform I showed with the sort of Shim you can actually build tests that keep running against your platform and look for these harms and are then able to mitigate. You’re able to know that they’re happening and mitigate them versus it being something that is structured continuing on in time. So you’re you’re instructing and building and automation of those probes. You’re enabling it to measure them, and then you’re building that into prioritized feature improvement for your external chat application. And again, as you move down into that space where the the the agent or the chat bot or the AI platform you’re building has higher levels of criticality and engagement that’s necessary. You’re governance of those practices like this are going to be important to making this very successful. So this is a screenshot of that where you’re taking, uh question and answer with context and you’re building that into our evaluation that looks for these different types of metrics. So groundedness you can see this is the groundedness metric it’s looking for isn’t actually answering from the source material. Is it fluent? Is it concluding risk and safety problems from our tests, so we’re building out this metrics that we’re going to evaluate our system against and then you can see how we’re starting to look for those kinds of rates, the problems from the batches that were running and then using that to be able to improve the way that our system interacts with people. So we can mitigate embarrassing situations that are very preventable. Just an example of what that output looks like. Uh, you know, providing examples of violent content. This is the question you can see where it’s asking a particular question. You’re understanding what happened. We’re being able to understand the different sort of rates of things that are occurring and this is something that didn’t exist even like 3-4 months ago, right? This is a very new capability. It’s a very powerful capability that we had to build a lot of the ourselves. Now no longer have to build as much of so we can leverage something that’s being produced by Microsoft to be successful with. OK. And then here’s how this, uh, kind of looks in a report format that you can evaluate and review and see, even save as like assets over time. OK, so this is a ongoing process, so it’s not like this is done once and you just kind of forget about it. It is something that you continue to evolve and engage. So again, when you are thinking about moving something into production, especially something that’s exposed to your end customers, This is why this gets so important. And I stress the in Customer thing, even if it’s an internal system and it’s returning information from, say, like many of the sort of agent systems are building now, they’ll return information from a business system, not just a document. And they may even do things in that business of some and being able to know what it will do, what it won’t do, and even how people can interact with it. So it’s not a risk to an internal compromise is an important consideration as well as you start to move up the value chain in terms of what we have capability to do. It realize we’re like Internet 1.0 right now and this is moving much faster than the Internet itself moved. So we have tremendous acceleration of the kinds of things we’re capable of doing. So good example is in practice, who did their taxes this year? If you did them on H&R Block or you did them on Intuit, what you’ll notice is that they have chatbots. But I was very impressed because they in a sense, they’re only sudo capable. Like they put some really strict guardrails around what these things were gonna do or not do. So when it answered questions, it was very specific to the things that it actually was empowered to do well for the job to be done, that it was enabled 4 and why is it important? Well, nobody posted this year. At least I don’t know of on on intuits chat by answering questions about how to repair your Ford Transit or something, right? Like it’s not. It wasn’t his job. It’s not what it’s trained on. It’s not the information it has, so they’re these exist to be successful at supporting a wide, diverse group of people that are going through their tax return. And they did a good job of putting something out. That’s not sort of too broad. That’s answering questions successfully and has appropriate guardrails on and HR block and actually a great article with Microsoft on on how they built theirs. So something to do a little investigation on if you’re looking for some like people are saying, hey, nobody’s doing this in production or I’m not sure how they built these things. These are kinds of systems that exist in the wild for some really interesting use cases that people are getting value from. So the way I’m going to close this today, I’m not closing yet, right? The way I’m gonna the last or topic on the day is talking about how this relates to human skills, and I don’t think responsibility is complete without really driving home. The point of this human transition, because that’s a very important component of how we build Responsible AI strategy. So the way I would characterize this is the move from my current job to next job to future job. And if you think about your current job, even you know really most anybody, they go into the office or they go to the doctor, they do whatever they’re doing right, whatever their job is, there’s certain percentage of your work that’s extremely repetitive. You can sort of do it in your sleep like you know how to do this. It’s almost. It’s almost like relaxing to do it because you don’t really think you just do the task and there’s many individuals in your organization that have a lot of self worth sort of baked up in their ability to perform a task like I do this task and that’s my job. And if that task were taken away, I wouldn’t be really sure what to do with my job anymore. Like you ever have those moments where you get like, two or three hours of uninterrupted work and you’re not really sure to do yourself? But then people who are highly functioning will see that time and be like this is my opportunity to work on the business. I can proactively engage my creativity and my best self with that. I know about the business and the delegate rather than just doing my repetitive job well, that moved to less repetitive work, more delegation oriented repetitive work, and more creative work is going to be the transition that you will see in individuals as they leverage artificial intelligence and it’s gonna be a it’s going to be a sort of a, a true transition where like people starting to adopt things like copilot, they get varying results. Then there’ll be some people be like, uh, that AI thing didn’t work out. It’s like, no, this is more about like it’s in its infancy. There’s something’s gonna work. Awesome. There’s some things that aren’t gonna work awesome, and it’s gonna evolve over time very quickly. And you have to be learning along with it and the people with growth mindsets are the ones that are learning actively. And then the digital natives are kids. They’re already using AI like go ask your go ask your son or your daughter like they’re already using AI and it’s now. It’s about governing their use when maybe the parents don’t even know much about it, so the role transition is real and this is something we need to make into our governance teams. So how are we getting started? A lot of people have come to me and I’m doing this coaching with them. They’re like, oh, should I be a data scientist now? Should I go get my master degree in data science? This is a very small portion of your staff that are going to be moving into that space. The main people upscaling is about enabling their creativity and work with AI, making them users of AI, and that’s requiring a growth mindset. I was a company last week talking to their executives. They’re like we don’t have a growth mindset here. Let’s say ohh like we need to get one like this is your company needs a growth mindset. Now I think about like what happened to Microsoft when Sachin Adela took over like they. Became a company of growth mindset. Kudos to him as a leader. Like I the movement I I’ve been. I’ve been working with Microsoft for three CEO’s, you know, through Bill, through Ballmer to to Satya and the transition from Balmer to Satya could not be more stark. Enablement of creativity. This is why it’s a leadership problem. This is why it is an executive problem. Is not just an IT problem. This requires your people to be more not just leveraging an IT tool like moving to Office 365 or something. This is a much more complicated capability and everyone deserves the ability to use it. Everyone deserves the ability to have an AI assistant to be able to accomplish more. If you think about it that way versus just replaced by an AI system or this is, this is about enabling people to be able to be more so. Those roles that are going exist in your organization that you have to enable Responsible AI round our the data engineering that’s gonna exist to prepare data for your teams or prepare your Office 365 environment for copilot. The data scientists that’s building a trusted model that has flaws that has things that does well, things it doesn’t do well, and having understanding of that, the engineer and developers that are using those foundational models to create applications that are used by teams and then the people using the AI platforms that do everyday work as a result of the platform. And as the expert on the business to be able to gain and achieve more, so where we go from here, my recommendation is every organization needs these three things. They need governance, so they need to establish a center of excellence that balances enablement and guardrails. They have both of these two elements. They’re helping the organization to leverage AI as a capability, and they are guard railing it to make sure it doesn’t fly off the Cliff. They’re building an ability to test their AI systems, and then they’re building controls into production. They’re building the guardrails into the production systems that are ultimately built by the organization and the two that probably start first are these, right? Because you have to get off the ground. But then the fast follow is let’s test it. Let’s test it to make sure that what we push out into production is really something that we feel successful and good about and think like it’s a threat. Think like it’s something that I need to mitigate, but do it in a way that’s balanced by the idea of enablement that we want this to be a powerful part of our business. That’s balanced by an idea of a red team. So where do we suggest you go from here? Well, we would love to help. So if you think that this has been interesting information, hopefully you found it valuable. We would love to be an asset to you and the ways that we’re an asset to you are at the executive level and helping drive through adoption in creating, invalidating use cases and building up your center of excellence. And then enabling a certain visioning capability to actually go about building something. So what do we do? We enable strategy, we enable your center of excellence and we enable usage of AI to accomplish more, whether it’s an M365 copilot or it’s building AI systems via professional services. So when you leave, indicate how you like us to help you, because we would love to. And I’m sure that we could have a great partnership working together to that, Steve some amazing things. So, uh, before I sign off, I want to make sure I answer any outstanding questions. If there are some so happy to answer those questions, and if you have them, please go ahead and drop them in the chat. Or, well, yeah, drop them in the chat and let’s. I’m seeing any moments screen. Amy Cousland 52:54 There was a question about is this all inside copilot studio? In the Q&A section. Nathan Lasnoski 53:01 I know is not an all in copilot studio, so copilot studio is like one option. Azure Open AI Studio would be another example of that platform. So there’s multiple sort of angles on how you might be building an AI system. OK. Any other questions? Awesome. All right. Well, I have been so thrilled to have the time to spend some opportunity to have time to spend with you today. I love to have some more conversations in the future with your organizations. Sign up. Let’s talk. See you later. Have a great rest of your day and spend some time with Microsoft build. It’s going to be a great day to learn more about what Microsoft is doing in the future too.