View Recording: Foundry Local: Launching and Scaling AI Apps in Your Environment

Foundry Local is here—and it’s changing how organizations build, deploy, and scale AI-powered applications. Now in public preview, Foundry Local brings Azure AI capabilities directly to your Windows or macOS environment, enabling low-latency, high-performance AI without relying on the cloud.

In this session, we’ll explore how to launch and scale AI apps locally with full control over data privacy, security, and governance. Learn how on-device deployment works, what makes Foundry Local different, and how to harness your own infrastructure to drive intelligent, secure solutions at the edge.

Whether you’re experimenting with local LLMs or preparing for enterprise-scale deployment, this session will help you get started with Foundry Local—and prepare for what’s next.

Transcription

Nick Miller 0:10 Hello and good morning. My name is Nick Miller, and you're here at the Foundry Local webinar. We're going to talk about launching and scaling AI apps in your environment.

A little bit about me. As I said, my name's Nick Miller and I'm based out of Austin, TX. I started out a while ago getting my MIS degree back at UT, and then I joined the Navy. I was a logistics officer in the Navy, moved around, did some fun stuff, saw some cool things, and I met a special person named Heather and brought her back to Texas with me. When I left the Navy, I went and got my MBA, because I had no clue what I was doing and I wanted to do something non-military. During that time I got the MBA, learned a little bit about business, and got married to the Heather I brought from San Diego. At that time I realized I wanted to do data science, and in 2013 there was no such thing as being a data scientist unless you had, at minimum, either a master's or a PhD in statistics, economics, or I think math; those were basically the three options. So I went back, got my master's degree in statistics, and started doing data science for a manufacturing company. After that, life moved on, I worked some more, grew the family. Now I have two kids, two dogs, and a mistress named golf, as my wife would say. Really, enough about me. I'm really happy you're all here today, and I want to discuss this thing called Foundry Local.

So what is Foundry Local? We're going to discuss all that. What does the architecture look like? We're going to do a very quick start just to show you how to download and install it. Then we're going to talk about integrating it with a few different things so you can run inference and get information and insights from your LLMs and other models. Then we'll talk about what's next, meaning what's next for Foundry Local and what's next for you if you're going to deploy this and really add some business value. And then we'll close with some calls to action, where we can maybe help you take the next step in your AI journey.

All right, first: what is Foundry Local? Foundry Local is an on-device AI inferencing platform. It's made by Microsoft, and it allows you to use LLMs locally, which is huge. If you've ever deployed models to IoT, and I have, you build a computer vision model in Azure ML or Databricks or whatever you're using, you package the model up and make it ONNX flavor or whatever, you have some device, and you have to set up the local system, and it's kind of a pain. But the good news is that when you have a model deployed to an IoT Edge device, you can use it without an Internet connection. So we have clients with manufacturing sites that are a little bit remote, or maybe they just want to make sure that even if they lose the Internet connection, the manufacturing process and their vision models can still function. So when it's on the device, you get those benefits of being able to use it locally without Internet. Also, because it's on that device and not connected, it's your data; your data never leaves your domain, so data privacy is another benefit. And there's also cost optimization, cost efficiency. You're not paying to add to your subscription; you're not paying Microsoft any money for this. I hope Microsoft's not listening, because they like to provide great solutions that cost money and are useful, but there's a
cost efficiency here. What that allows you to do is address use cases where the business value isn't huge for each individual use, but if you have a ton of those use cases, or a ton of uses for that one thing, the value grows, and you're not going to have to pay a ton of subscription costs for using AI models in the cloud.

The other thing about Foundry Local is model customization. The AI Foundry model catalog, which is in the cloud, has several different models that are already seeded and available to you, but you can bring your own. You can take an open-source custom vision model that has already been trained, use one of the packages they give you to package it up, and make it available locally through Foundry Local. You can also use any of the Hugging Face models. There are LLMs, other LLMs, there are vectorizers, there are speech-to-text models, all sorts of cool models you can take. There are some caveats, of course, and we'll discuss those later, but you can take those and make them available locally, which is awesome.

And the other thing is the seamless integration. Like I said before, you could put computer vision models on an IoT Edge server; sure, you've been able to do that for a while, but it's somewhat difficult depending on the hardware. What we're talking about here is connecting your applications through a sort of hardware-agnostic platform that lets you interface with it via SDKs and APIs in pretty much the same way whether you're using or creating one of these models in the cloud or locally. So it's kind of the same experience, local versus cloud, for the developer; you may be able to reuse base models, et cetera, and it's hardware agnostic.

So now we've discussed what it is. Let's look at the architecture and go deeper into some of the things I just mentioned. For the local architecture, you get that seamless experience via SDKs. You get to use the Azure OpenAI SDK; we'll show you how to do that and integrate a local model with the Azure OpenAI SDK. Also, you can use VS Code, this little icon in the middle, the blue and light blue and dark blue with the stars; I don't know what that's supposed to be, but it's the VS Code AI Toolkit extension, so you can interact with AI Foundry online and locally with the same AI Toolkit extension, and you're still getting that seamless development environment. And the other option, there on the right, is the Foundry CLI, and the Foundry CLI can be used for scripts, for just starting things up and shutting them down; the CLI is a really good option there.

So the way it works, and you'll see the little box there called Foundry Local: whatever models are available to you, whether they're already seeded by Microsoft in the AI Foundry model catalog or they're models you've compiled, built, and put into local catalogs, those are available to you via the Foundry Local model management module, and it manages the model life cycle for you. So if you type in the command foundry model run and then a model name, and that model isn't available yet, it'll go to the catalog and download it. It will store it in a cache, which is just local disk space, then load that model into memory so it can be used, and then you can run it.
So the other thing it does, as we just mentioned, is integrate with the cache. The cache is just the storage of all the models you have locally. These models are LLMs, large language models, but really they're SLMs, small language models; it's not GPT-5, OpenAI is not giving that one away for free right now. They're small, but they're still huge: they can be 4 gigabytes, 8 gigabytes, 16 gigabytes. So if you start loading every model onto the hard disk and into memory, you may have some issues. So you download and you cache: the model management module will download those models and keep them in the cache for you, so they're ready to use when you need them. And then when you are ready to use them, when you click load, it loads them into the runtime environment, and it uses ONNX for that.

ONNX is not really part of Foundry Local; it's embedded in it. ONNX Runtime has been around for a while, and I've used it outside of this. It's a standardized way of compiling and packaging neural network models so they can be run in a standardized way. And because Foundry Local uses ONNX, they've done the background work so that when you want to spin up and use a model, it automatically detects which hardware you have. So when you say, for example, I want to use the Phi-4 mini model, it figures out what hardware you have: do you have an NPU, a neural processing unit, or a GPU, or just a CPU? It figures out what you have and downloads the right model for your hardware. You don't have to do any integration, and if you've ever done that before, you'll know how painful it is. It's really powerful in that way. So ONNX Runtime really is what makes this easy to use and hardware agnostic, and it allows the same software, the same code, everything, to be used on Windows versus macOS. That's the compatibility. And the last thing is the Foundry Local service, which is all the code and the way you interact with it.

All right, so that's the architecture, and the real takeaway is seamless experience, hardware agnostic. If you're going to run GPUs, there are certain GPU requirements, but if not, you can run on your CPU, whether it's AMD, Intel, Apple Silicon, or Qualcomm. And if you don't have AMD, Intel, Qualcomm, or Apple Silicon in your CPU, I don't know what you're using, because I don't know of any other CPU providers. And for the OS, it's Windows 10/11, Windows Server 2025, and macOS.

All right, so for a quick start, we're going to start up Foundry Local in the CLI and give it this prompt, just the text you see here, and we want it to write a SQL query for us. That could be useful for many reasons, right? A text-based query engine. I'll walk you through this, and then we'll go next to inferencing SDKs, LangChain, and Open Web UI. So I'm going to stop sharing my screen here and instead share VS Code, where we can look at the terminal. Right now this is just a real quick start. Oh yeah, there we go. So I'm going to say foundry
local... or not foundry local: foundry model run, and I want to run this model that I already downloaded, because the download takes a little bit and I don't want to waste your time. All right, so it's running, if you can see that. And actually, let me look at the chat. Can y'all see? Can everyone see the screen? Is the text large enough? All right, just drop a note in the chat if the text is not large enough.

So now we see that the Phi-4 mini model is running, and I'm going to put that prompt in there; I had it on a little notepad over here, but we're going to place that prompt right there in the chat. This is just the CLI interaction, remember, because we talked about the SDK, which we'll look at in just a minute, and we also talked about the CLI and the VS Code AI Toolkit; right now we're using the CLI. So I just pasted that prompt in there, pressed enter, and it's going to think. Based off the request I gave it, I said: write a query that gives me the top five warehouses with the most shipments in April 2025. And then I gave it the schema. There's also a way to tell the AI agent to query for the schema itself, so we're kind of playing pretend a little bit that we got the schema from a previous step. But to give you the idea, we asked it to do that and it wrote the query: select warehouse name, count(order number) as shipment count from fact orders joined to dim warehouse on the warehouse ID, where order date is on or after April 1st and before May 1st, group by the warehouse, order by shipment count descending, and limit 5. So if you write SQL queries, you know that's actually a valid query. We have a local AI agent that can write SQL queries for us. That's pretty cool, right? Awesome.

So that's the quick start; I wanted to give you an idea of what it can do. Then I wanted to really quickly cover, and this is a little bit technical here, the help, to see what the options are for you. Actually, let's exit the chat. All right, we're no longer chatting with that agent. We're going to look at things like foundry --help: what are our options here? Well, it gives you, hey, here are some commands: model, cache, and service, and then you can get information like version and license. So that's great. What about model? What can we do with the model command? Well, we can run a model, which creates that chat interface we just discussed. We can list which models are available, and that's not necessarily the ones that are downloaded, that's all the models available. We can look at information about a model. We can download a model to that cache. We can also load it, which loads it into memory, and unload it to remove it from memory. And then the other one we discussed was the service. So what are the options with the service? Oh, sorry: foundry service --help. All right, it lists models loaded into the service; you can start the service or stop it, you can restart it, check its status, and change some of the settings. And there are other settings we can set; when you're looking at models, you can set things like temperature and other stuff like that. The last part of this is the Foundry cache, and if you remember, the Foundry cache is just the place where we store models that have been downloaded from the Internet, so when you're running offline, these are the models you have available to you. So we'd say foundry cache list: what models do we actually have? We have a few models: DeepSeek R1 with 14 billion parameters, Phi-4 mini, and Phi-3.5 mini.
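
For reference, the CLI commands walked through above look roughly like this (the model alias is just an example; run foundry --help for the full list on your machine):

```bash
# Start an interactive chat with a model (downloads and loads it on first use)
foundry model run phi-4-mini

# Explore and manage models
foundry model list              # all models available in the catalog
foundry model info phi-4-mini
foundry model download phi-4-mini
foundry model load phi-4-mini
foundry model unload phi-4-mini

# Manage the local inference service
foundry service status
foundry service restart

# See which models are already downloaded to the local cache
foundry cache list
```
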
So when you’re running and you’re offline, these are the models that you have available to you. So we’d say Foundry. Let’s see, cache. Let’s look at list. What are the models we actually have? We have a few models. We have deepseek R 114 billion parameters, 54 mini and the five 3.5 mini. All right. So that’s really just a quick start. Want to give you an idea of what the CLI looks like, kind of what commands are available to you and it’s it’s good. I think maybe for doing some of the operational stuff like you could set up scripts to download models or upgrade the models. You know, clear models in and out. But when you’re actually developing an app or a solution, you’re most likely you’re not going to use CLI, right? You’re going to use an SDK. And so the next thing we were going to look at is integrating Foundry local with an inferencing SDK. And one of the first things you want to do right is you want to have your requirements or you want to install these packages and typically you want to install it in the local environments, so like a virtual environment rather. So I use Pyenv for my Python environment for my Python versions and then I use. Them to create a virtual environment, right? And these are the requirements, so it’s not huge. You can see it’s got the iPack kernel so we can use the Jupyter notebook. We’re going to talk about open web UI in just a minute. Foundry local in general, Open AI and Ling Chain. Now let me share, let me go back to the presentation very quickly. And show you we’ll we’ll talk about the three sides of integrating. We’ll we’ll look at them together. That way we don’t have to go back and forth so many times, right. So the first thing is. Integrating with inference SDKS and what’s the value of that? Why would you want to do that? Well, it’s reusing boilerplate code. So if you already have code that allows you to connect to open AI and has functions built. So that you can you can work with the Open AI chat client. Why not reuse those boilerplate code modules right? The other thing is, isn’t it nice to have standardized development experience like so as a developer I don’t have to learn how to. Invoke and chat with AI multiple different ways. I can use an LLM and I can interface with it the same way when I’m doing cloud solutions, when I’m developing an app in Azure functions and I’m connecting back to Azure Open AI service same way. You know, client chat completions, user messages, system messages, etcetera, the whole thing. It’s the same thing. So we can reuse that boilerplate flow, but also I don’t have to learn something new, I already know how to do it. The other cool thing is if you decide to move code, you know it’s portable because it’s the same interface. So that’s really powerful. And which ones do we, which inference SDK do we have? And that’s the Open AI, which is that we’ll take a look at and then request. And this is in Python. So it’s the Python request module, which is basically just a REST API call. And so if you’re using REST API in your app, great. We have an option for you with Python. If you’re using the Open AI SDK, great, we have that too. And then after that we’re gonna look at integrating it with Langchain. And why would we do this? Again, we’re using boilerplate code. It’s about efficiency in development and then standardizing development experience as well. So same thing there. So if you have Langchain, if you have. We’re using the Chat Open AI class. 
Now let me go back to the presentation very quickly and show you the three ways of integrating; we'll look at them together so we don't have to go back and forth so many times. The first is integrating with inference SDKs, and what's the value of that, why would you want to do that? Well, it's reusing boilerplate code. If you already have code that connects to OpenAI and has functions built so you can work with the OpenAI chat client, why not reuse those boilerplate code modules? The other thing is, isn't it nice to have a standardized development experience? As a developer, I don't have to learn how to invoke and chat with AI multiple different ways. I can use an LLM and interface with it the same way as when I'm doing cloud solutions, when I'm developing an app in Azure Functions and connecting back to the Azure OpenAI service: client, chat completions, user messages, system messages, et cetera, the whole thing. It's the same thing. So we can reuse that boilerplate flow, and I don't have to learn something new; I already know how to do it. The other cool thing is that if you decide to move code, you know it's portable, because it's the same interface. So that's really powerful. And which inference SDKs do we have? There's OpenAI, which we'll take a look at, and then requests, and this is in Python, so it's the Python requests module, which is basically just a REST API call. So if you're using REST APIs in your app, great, we have an option for you with Python; if you're using the OpenAI SDK, great, we have that too.

After that we're going to look at integrating with LangChain. Why would we do this? Again, we're reusing boilerplate code, it's about efficiency in development, and standardizing the development experience as well. So if you have LangChain and you're using the ChatOpenAI class, you can reuse that code and those functions or classes or methods or whatever you've already built; you can reuse those to get local functionality with Foundry Local.

And then the last thing we're going to look at is integrating with Open Web UI. What is that? Open Web UI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline, and really it's a chat interface. If you've ever searched online and used Copilot with Edge, you talk to it and it gives you the list; it's the same sort of experience, where you have your chat text box, it gives you answers, and you can go back and forth. They give you that sort of UI out of the box, and without any work it's available to you. Of course you can extend it and customize it and all that good stuff, but it's pretty easy, it's out of the box, and it's open. So that accelerates your development and gives you a local chat.

And why would we want a local chat? There are many reasons, but there are two broad ways you could use these. You could use these LLM agents as part of a back-end process: you could have LLMs look at all your IT logs as they're streaming. Why not? You're not paying per call. You have IT logs: hey, did anything interesting happen here? If so, surface that log. Or you could use it not in a headless way but in an interactive way, where you have a chat interface. Some examples: maybe you have a field technician. Let's say you have a pest control company and you have new technicians, and you want to provide them a support assistant out in the field, where they don't have Internet access, or maybe it's just slow, and they have their little laptop or some sort of device. They can ask questions and you can give them answers: here are the procedures, or how do I know what a termite infestation looks like? It'll bring back the answer; it can link to pictures of that or to some internal operations. So that's one use case. Another is the manufacturing floor. Let's say you're a technician on a manufacturing floor, you're changing over the line, and you need to review some procedure about how to set up the line. You don't have everything memorized, or even if you do, maybe you just want quick access to the procedure, or you want to ask a question of it and get a specific answer. So that's another use case where you would use Open Web UI.

All right, so now that I've explained those, we'll go back. We're going to switch now to... Paige, it looks like maybe you're typing a question, so I'm going to hold for one second before I transition.

Paige Wamser 22:38 I was just putting the survey link in. You can keep chatting.

Nick Miller 22:42 Oh, gotcha. OK, all right, cool. So I'm going to switch back over to VS Code and we'll start looking at the SDK. Now that we've built the virtual environment, it's this one here. And if you remember when we looked at the cache, we had the Phi models; actually, you can see it right here still at the bottom, here's the actual model name, phi-4-mini-instruct-generic-gpu, but
you always want to use the alias, and the reason for that is that when you use the alias, Foundry Local will look at your hardware, see what's available, and use the best model for what you have. CPU is going to be the slowest, so if you have a compatible GPU, it'll automatically load the GPU version of the model. That's pretty cool, right? And it also means you don't need to change your code, because what if you're running this on one computer that has a GPU and another that has an NPU, a neural processing unit? You don't care; it doesn't matter. Foundry Local is taking care of that work for you.

So we'll go ahead and restart this and clear all the outputs, just so you know it's not magic, I didn't fake it, this is real. What we're doing first is setting our alias and loading the FoundryLocalManager, and that's going to do basically the same thing as foundry model run Phi-4 mini; it should stop writing here in just a second. All right, great. So now we're going to show how the OpenAI chat client can use that Phi-4 mini model, and it's really not a lot; don't be disappointed when there's not tons of code. Basically, with that FoundryLocalManager object, all we had to do was use an OpenAI client. When we instantiate that OpenAI client, we just set the base URL to the manager's endpoint. Right now it's local usage, so we don't have a real API key; we use the manager's API key, and at that point we have an OpenAI client we can chat with, that's all. The only thing different from the code I have in, say, online functions and Azure Functions is right here: the manager endpoint and the manager API key. And when we create the response, instead of pointing to a model that's deployed, for example, in AI Foundry, we just call the manager's get model info with our alias and take its ID. The rest is all the standard OpenAI chat completion: client.chat.completions.create, and then we have a message, "What is the golden ratio?", which I'm not sure I even know what that is. I think it's some sort of... oh, hey, look, phi. There you go, the Greek letter phi, and we're using the Phi model. How cool. It's an irrational number approximately equal to this. That's not very interesting. Great, it works; I don't think we all need to know that. The point was to show that that's really how you use it: if you're using OpenAI chat, great, now you can use it with Foundry Local.

We can also do streaming. It's pretty much the same thing, same model, we're just going to ask a different question: give me a three-sentence summary of the benefits of using multi-agent systems, and we're going to stream it. So it gets its question and starts streaming, and you can see it typing its answer out. If you need streaming chat, that's available to you, and oftentimes streaming is nice when the answer is a little bit longer and you don't want people to think that nothing is happening; you can stream the results. It works with the really impatient nature we have these days, wanting everything on demand. All right, so we have covered integrating with the OpenAI inferencing SDK and streaming with it.
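
Here is a minimal sketch of the OpenAI SDK integration shown in the demo, assuming the openai and foundry-local-sdk packages are installed and using phi-4-mini as an example alias:

```python
# Minimal sketch: point the standard OpenAI client at the local Foundry Local endpoint.
import openai
from foundry_local import FoundryLocalManager

alias = "phi-4-mini"  # example alias; use any model available to you

# Starts/attaches to the Foundry Local service and loads the best model variant
# (CPU/GPU/NPU) for the hardware on this machine.
manager = FoundryLocalManager(alias)

# Same OpenAI client you would use against the cloud, just with a local base URL.
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

# Standard chat-completions call.
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
)
print(response.choices[0].message.content)

# Streaming works the same way: set stream=True and iterate over the chunks.
stream = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "Give me a three-sentence summary of the benefits of using multi-agent systems."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
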
Next, we're going to do the same thing, but instead we're going to use requests. This is the Python package that lets us make what is basically a REST API call. So it's requests.post; we have our URL, which is our manager endpoint plus chat/completions, and then our payload, which is the model name and the messages. And if you look at any of the OpenAI models we use in AI Foundry, and you're in the playground and you say "show me the code", you'll see this sort of code in the background: the payload has a model and then the messages, then headers, and then right here you're just posting that. And it's the same here. So we're saying: give me a concise summary of a specific use case, enabled by local AI agents on my smartphone, that will save me time. Now that agents are becoming local, you can imagine using them for all sorts of things, and you don't have to pay for every interaction, so it's going to start becoming ubiquitous. That's pretty cool, I'm excited about that. But what is a specific use case? We'll see what happens; it brings up different ones each time, so I'm kind of interested to see what it says here.

All right: a home assistant for your home's IoT devices. You have a smart home set up with lights, thermostat, et cetera, and the AI agent on your smartphone connects to the central hub. OK, so if you're familiar with AI agents, you'll know that you just tell the agent what to do and it has tools. So maybe your smart home assistant has APIs that allow it to turn on a light, change the temperature of your home, or lock your children's door so they can't leave and bug you. They're a blessing, though. Anyway, those APIs are made available to your agent, and all you have to do is talk to your agent and it knows which tools to use and when, and it can be local, so you're not paying cloud fees for that. Also, it's really low latency. And anything I say, like "lock my children in the room", stays on my phone; it's not going to the cloud, so no one can call CPS, Child Protective Services, on me for locking my kids in their room because they're annoying. That's great, right? That's really cool. There are going to be a lot of use cases here, a lot of things we'll see, I think, in the near future with all these local models being available for use. All right, tongue in cheek, just kidding, I love my children, they're amazing.

So that's the inferencing SDK. Hopefully that gave you an idea of what's available. Let me check and see if I have any questions here. OK: "Do you have a simple agentic AI use case for Foundry Local?" So, Divya, Foundry Local has been out for about a month and a half, I think. I did share some use cases there, but not one that's fully integrated yet, I think. The sort of thing would be: you have IT logs being analyzed, and if something comes up, the agent can let you know what's going on and alert you. All right, I just checked the chat there.
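
For reference, the requests-based call walked through above looks roughly like this; it posts to the OpenAI-compatible chat/completions route exposed by the local service (again, the alias is just an example):

```python
# Sketch of the plain-REST version using the requests package.
import requests
from foundry_local import FoundryLocalManager

alias = "phi-4-mini"
manager = FoundryLocalManager(alias)

payload = {
    "model": manager.get_model_info(alias).id,
    "messages": [
        {
            "role": "user",
            "content": (
                "Give me a concise summary of a specific use case, enabled by "
                "local AI agents on my smartphone, that will save me time."
            ),
        }
    ],
}

# The local service exposes an OpenAI-compatible chat/completions endpoint.
response = requests.post(
    f"{manager.endpoint}/chat/completions",
    json=payload,
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```
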
The next thing we're going to look at is LangChain. How do you integrate Foundry Local with LangChain? Right here we created a little class, and this is it. If you're using ChatOpenAI in some of your code, you can integrate Foundry Local with it, and we need some other packages, like the prompt template. When we instantiate this class, it instantiates with a model; we have our standard model, and you can tell it whatever you want. Then it gets that manager, and then it creates an LLM from ChatOpenAI. And again, similar to the OpenAI example, it gets the model info for the alias we gave it from the manager, the base URL from the manager, and the key. Very similar. After that part, it's pretty much standard LangChain. So if you have chains you're using, and those chains use ChatOpenAI in the cloud, you can now use Foundry Local for them.

Here we have a little helper that sets the prompt: you're a helpful assistant that translates the input language to the output language, only return the translation of the input; then you have your human input, and that's your prompt. The last part is a translate method: we give it the input, the thing we want translated, and the two languages, and then it builds the prompt, invokes the chain, and returns the content. So right here we're going to instantiate that class we just created, called Translator, and we're going to translate the text "Wow, Parisians are so nice and accommodating" from English to French. So when you're traveling in France and you meet a local person, a Parisian, and they interact with you, you can respond with this very useful phrase. All right, so it runs, it's translating, and it says "wow...". One thing to be careful about with AI, though, is that sometimes it hallucinates and it lies, and that's probably very applicable in this case if you've ever been to Paris. But anyway, that's just a simple example of using LangChain with Foundry Local.
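
Here is a sketch of the translator class described above; the class and method names are illustrative rather than part of any SDK, and it assumes the langchain-openai and foundry-local-sdk packages are installed:

```python
# Sketch: reuse LangChain's ChatOpenAI against a Foundry Local endpoint.
from foundry_local import FoundryLocalManager
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


class Translator:
    def __init__(self, alias: str = "phi-4-mini"):
        # Start/attach to the local service and load the right model for this hardware.
        manager = FoundryLocalManager(alias)
        # Standard ChatOpenAI, pointed at the local endpoint instead of the cloud.
        self.llm = ChatOpenAI(
            model=manager.get_model_info(alias).id,
            base_url=manager.endpoint,
            api_key=manager.api_key,
            temperature=0.0,
        )
        self.prompt = ChatPromptTemplate.from_messages([
            ("system",
             "You are a helpful assistant that translates {input_language} to "
             "{output_language}. Only return the translation of the input."),
            ("human", "{input}"),
        ])

    def translate(self, text: str, input_language: str, output_language: str) -> str:
        chain = self.prompt | self.llm
        result = chain.invoke({
            "input": text,
            "input_language": input_language,
            "output_language": output_language,
        })
        return result.content


translator = Translator()
print(translator.translate(
    "Wow, Parisians are so nice and accommodating!", "English", "French"))
```
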
And then the last one we're going to look at is Open Web UI. I'm going to stop sharing; I have a different screen. All right. One of the things with this setup, the way I had it configured, is that you need to integrate your specific model into Open Web UI, and the way you do that is with that CLI command. If you remember, we ran foundry service, and one of the options is foundry service status, which gives you the port where your Foundry service is running so we can access the models. From there we go into Settings, choose Manage Direct Connections, and change the port here. It's our localhost, and we're just changing the port. Of course, if you're doing this in a production environment, you wouldn't be using localhost; there are a lot of things you'd do, and you'd set it up so that whenever you start the Foundry service, it updates the port in this direct connection. But for now, this is not production grade, we're just showing the idea. So we can open a new chat, and remember we looked at the cache: I had DeepSeek, Phi-4, and Phi-3.5, and Phi-3.5 was one of them. So let's ask Phi-3.5 something: what is the meaning of life, the universe, and everything?

And if you've ever read The Hitchhiker's Guide to the Galaxy, you'll know that the answer is 42. It doesn't give you just the answer 42; it explains, and it does it differently each time, but it explains it to you. It says the meaning of life is most famously associated with the science fiction series The Hitchhiker's Guide to the Galaxy, where the answer 42 is suggested, and 42 is funny because nothing that complex can really be answered with the number 42. This gives you an idea of how quickly you can create a chat UI. All we did was install Foundry Local, download those models so they're in our cache, then download Open Web UI and add a direct connection, as you just saw, to Foundry Local, and now you can chat with a user interface. That's pretty powerful. Of course, if you're going to do this in production, you'd want to set up the background services correctly, and maybe add login information and branding and things like that. But the point is, with almost zero effort, you get a UI, and it's available to you locally. All right, we'll go back to the chat just to check. Any questions? No. OK.

All right, the last thing is we're going to go back to the slides and talk about some of the next steps. You've seen how to install Foundry Local, you've seen how to interact with it in the CLI and run some of those commands, and you've seen how to use it with an SDK and integrate it with the OpenAI inference SDK. Yeah, documentation will do; sorry, there's a question in the chat about sharing documentation, and yes, we can share that. You can also use LangChain, and you can integrate with Open Web UI. That's great. So what's next? What do you do? How do you take this a little bit further? What's coming down the pipeline from Microsoft?

Well, one thing you can do is add your own custom models. If the models are PyTorch, TensorFlow, or JAX, in general they can be converted to ONNX models, and Microsoft has an open-source tool called Olive. Olive basically quantizes and trims the model, makes it smaller, and packages it in an ONNX format so that it's compatible with Foundry Local. You can do that with any of the models that meet those requirements, whether they're Hugging Face models, vectorizers, LLMs, multimodal models, or open-source computer vision models; those can all be converted with Olive. You can download those models from Hugging Face, there's some tooling for that, or they can come from your model catalog, where some models are already there from Microsoft. The list is expanding: there's now an OpenAI OSS model, so they've open-sourced one of their models, and it's a chat-response model. So you have some options there; you can control your destiny and what you want to use, and you can have those available to Foundry Local.
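
If you want to try compiling a Hugging Face model yourself, the workflow looks roughly like the sketch below. Treat this purely as a sketch under the assumption that Olive's auto-opt command is available in your installed version; check the Olive and Foundry Local documentation for the exact flags, and swap in the model you actually want to convert.

```bash
# Sketch only: verify the command and flags against the current Olive documentation.
pip install olive-ai

# Quantize a Hugging Face model and package it as ONNX so Foundry Local can serve it.
olive auto-opt \
  --model_name_or_path <hugging-face-model-id> \
  --output_path ./my-model-onnx \
  --device cpu \
  --precision int4
```
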
And then another thing that's coming down the pipe is Foundry Local agents. We talked about some agentic behavior; right now it's more of a chat interface thing, or you could have inline processes or back-end headless processes where you've integrated this software into some application you're running locally, maybe something that watches IT logs, or something made available through a chat UI. Those are available now and working. But with agents, you get true agentic behavior: agents have sort of an identity, and they have tools they can access. We talked about calling an API to check the weather, or calling an API to perform some sort of action. It's out in private preview, so you have to request access; it's there and it exists, it's just not in public preview yet. I've requested access and haven't heard back, fingers crossed, but that's coming down the line.

Also, in order to use those, you're going to need an MCP server or functions. You can write functions specific to the tooling you're using, but probably a better way to do it is an MCP server. MCP stands for Model Context Protocol, and it's how you host tools. In the cloud with AI Foundry, you have the AI Foundry Agent Service and OpenAPI specs, so when you take an API from a third party, or you've built an Azure Function with a certain capability, you wrap it in an OpenAPI spec, and that tells the model everything it needs to know about the API, which things to use and when. It's that kind of universal language. This is the standard, like VHS versus Betamax, or, I'm trying to think of a less dated example, but MCP is the standard way; it's what Anthropic is using, Microsoft, everybody's collaborating on it. So you don't have to write custom functions for models to interact with tools: if you use an MCP server, no matter what model you're using, it's the same on every platform. And you can use local MCP servers. Windows has an AI platform, the Windows AI platform, and that platform uses Foundry Local under the hood; they also have things like local semantic search, et cetera, and they have their own MCP platform, which includes a registry and a server. So MCP is the way to go, and you can create those: there's code out there for creating your own MCP servers and adding tools to them. You can find it on Anthropic's website, because they were kind of the initial developer of MCP, so you can get some tooling from there and develop your own MCP servers.
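
To give a sense of how small a custom MCP server can be, here is a minimal sketch using the open-source MCP Python SDK (the mcp package); the weather tool is a made-up stand-in for whatever API or Azure Function you would actually expose:

```python
# Minimal MCP server sketch (assumes: pip install "mcp[cli]").
# The tool below is a stub; replace its body with a call to your real API.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("field-assistant-tools")


@mcp.tool()
def get_weather(city: str) -> str:
    """Return a short weather summary for a city (stubbed example data)."""
    return f"Weather for {city}: sunny, 75F (example data)."


if __name__ == "__main__":
    # Serves the tool over stdio so a local agent/host can discover and call it.
    mcp.run()
```
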
So there's a ton you can do, and one of the things I think we're going to see is more and more models being made available to Foundry Local. That will include things that are a little bit larger, like multimodal models or speech-to-text, or you can roll your own; there's some effort involved, but it's made available to you. Those are things to look for, and that's how you can really take this further. What I've shown you is already valuable for many use cases, but when this comes out, you can really take it to the next level. One consideration, obviously, is that because it's local, and even the small models are 8 to 18 gigabytes, you have to have the right amount of RAM for your system. So with IoT Edge, if it's a small model, yes, you can put it on an IoT Edge device, but if it's a larger model, you're going to need an IoT Edge server, something larger and more robust, to provide that inferencing.

All right, and then the next thing is: if this makes sense, if it's interesting to you, if you think you have some use cases you'd like to explore, or you're not sure, or you thought I talked funny and had a weird Texas accent and you want to hear it again, any of those, if you want some help with this or just want to ask questions, we can set up a Foundry Local discovery session. You could share, hey, these are the things we think we could do with this: does it make sense, how would we advise going forward, what would that look like? So that's a Foundry Local discovery session. You can reach out to us for that, or for an AI envisioning session. That one covers Foundry Local but is also cloud-based, so the AI Foundry Agent Service, AI Foundry with Semantic Kernel, or anything else that includes RAG and all sorts of search. An AI envisioning session is basically a way for us to collaborate and discuss what AI use cases you have, how feasible they are, what the next steps would look like, and whether we can help you make progress in those areas. Awesome. Both of those sessions are free, and you can reach out to me or to any of the contacts in the webinar, and we can schedule one of those sessions with you and your team.

And that is all I have for today's session. I wanted to say thank you. I know we all have jobs to do; we have things going on, we're busy. So thank you for taking the time to join us and learn more about Foundry Local, and hopefully this can be the seed to get some ideas going, so you can start adding some real business value for those low-latency, high-security, or just "you don't want to spend the money" use cases. Thank you so much for your time.