Nicholas

Inside Stainless: The Developer Tools Startup Anthropic Just Bought for $300 Million

If your MCP server has dozens of tools, it's probably built wrong. You need tools that are specific and clear for each use case, but you also can't have too many. This creates an almost impossible tradeoff that most companies don't know how to solve. That's why we interviewed Alex Rattray, the founder and CEO of Stainless. Stainless builds APIs, SDKs, and MCP servers for companies like OpenAI and Anthropic. Alex has spent years mastering how to make software talk to software, and he came on the show to share what he knows. We get into MCP and the future of the AI-native internet. [Disclosure: Dan is a small investor in Stainless.]

If you found this episode interesting, please like, subscribe, comment, and share.

To hear more from Dan Shipper:
Subscribe to Every: https://every.to/subscribe
Follow him on X: https://twitter.com/danshipper

Get started with Braintrust at https://www.braintrust.dev/

Timestamps:
00:01:15 - Introduction
00:05:09 - APIs and MCP, the connectors of the new internet
00:11:00 - Why MCP exists
00:17:15 - Why MCP servers are hard to get right
00:20:24 - Design principles for reliable MCP servers
00:25:06 - Using MCP for business ops at Stainless
00:40:57 - Alex's take on the security model for MCP
00:44:42 - How one-off AI actions become permanent production software

Links to resources mentioned in the episode:
Alex Rattray: Alex Rattray (@RattrayAlex)
Stainless: https://www.stainless.com/

Published May 20, 2026
Uploaded Jun 12, 2026
File type: Podcast

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:36

[00:00] The internet runs on computers talking to each other, but its entire architecture was built for a pre-AI world. Now we're trying to hook AI up to the internet with MCP, Model Context Protocol, which turns any website or web service into a set of tools that an AI can use natively to get work done. And the software companies that learn how to do MCP well are going to win over the next decade. That's why I brought Alex Rattray, the founder and CEO of Stainless, onto the show. [00:30] computers talk to each other. They make the APIs and SDKs for all the big companies that you know about, like OpenAI and Anthropic, and they're starting to build MCP servers too. So Alex and I get into the nitty-gritty of what the future of MCP looks like, how to design good MCPs, why MCPs are actually really hard to scale and possibly insecure. And we try to figure out together what a better model for allowing AIs to use the internet might look like. This is a great episode. Alex is a [00:59] Let's dive in. [01:14] Alex, welcome to the show. [01:16] Thanks, Dan. It's really exciting to be here. It's good to have you. So for people who don't know, you are the founder and CEO of Stainless, which is the API company. You make APIs for companies like OpenAI and Anthropic, and just name a big company whose API you might use, and Stainless is probably behind it. Before that, you worked at Stripe doing their API. Surprise.

1:37-3:14

[01:37] And before that, most importantly, we were very good friends in college and we remained good friends. [01:46] I'm a tiny investor in Stainless, uh, but it's been really, really fun to watch your journey and get to hang out together so much over the years and [01:56] uh, I'm just very excited to bring you on to talk about AI and, and what you're doing at Stainless. Thanks, Dan. Yeah. It's, um, it's, uh, [02:05] been really fun over the years. I mean, you know, when we were in college, I was working on a startup, [02:10] you were working on a startup, you had a conference room, um, at a venture capitalist office, um, as your office. And, uh, you let me crash there, um, with, uh, with my co-founder and team. Um, and we were just like on the other side of the conference table, hacking away into the evening. Um, uh, and, you know, very fond memories of those days. And these days it's, it's not every evening, but you know, on the weekends, whatever, same thing is still happening. Um, and it's, [02:37] You don't see that every day, and it's really a nice feeling. And it's been great to see everything happening with Every along the way. Thank you. As I say, it started from the bottom, now we're here. [02:51] And, yeah, I mean, the thing that I always say when people... [02:56] when I run into people and they ask me about you, [02:59] um, in order to embarrass you, I just talk about how you're the only person that I know of who has consistently run barefoot through the streets of Philadelphia. Uh, 'cause when we first met you were, you were not a fan of shoes and you were a fan of running. You want to talk about that?

3:15-4:45

[03:15] Yeah, it wasn't that I didn't like the concept of shoes. It's that I couldn't find a good pair. [03:21] And at a certain point, you know, it's like I was running through Nikes and they would, they would bust open every few months. What was actually going on is I had really wide feet and I was probably buying narrow shoes, so the shoes were constantly ruined. [03:36] And, um, you know, on a college budget, it's just like, this is, this is no good. Um, and, uh, [03:45] eventually I decided, okay, the longer you wear your shoes, the more worn out they get, but the longer you just wear your feet, the tougher they get. So the longer you wear your feet. [03:58] Try it out. Try this at home. What could go wrong? Uh, I actually currently have a really annoying splinter in one of my feet, uh, that I was, uh, and so don't actually, uh, try this at home. But, uh, are you still running barefoot? No, no, this is just from around the house. Um, I see. Dangerous. Yeah, yeah. But see, that's the thing, if I had been going around, uh, on the asphalt, um, without socks on [04:28] Um, so when you're not running barefoot, uh, you're running, you're running Stainless. [04:38] So you're running Stainless. And so how many people are you? You know, you're around 50, right? Just about. Yeah. Yeah.

4:46-6:27

[04:46] That's pretty wild. And you started Stainless in a pre-AI world, and now we're in an AI world. And I think you have some... [04:55] ideas for what the future of [04:59] AI is going to be and maybe how how APIs fit into that maybe how MCPs fit into that do you want to like paint a little bit of a picture for us about where we're going [05:07] Yeah, I would love to. So to start, like what's an API? Not everybody's familiar with that. So it stands for application programming interface. There will not be a quiz, right? Right, Dan? No quizzes? [05:22] No, no quizzes. Great. But basically, it's how one computer program talks to another computer program. It's how it's how computers talk to computers, how apps talk to apps. And so APIs are the dendrites of the Internet. Dendrites are where your neurons connect and actually exchange information with each other. [05:52] And if you think about the internet, if all these servers in the cloud... [05:57] weren't talking to each other, you wouldn't, [05:58] you wouldn't have internet right like there's there's nothing going on um if uh you know programs internet software is doing nothing uh without apis without connections to to other programs um and so it's really fundamental to the mesh to the mesh of um pretty much all modern software um everything that we think of when we think about technology at this point um apis are kind of at the the heart and center of that just like um dendrites are you know the center of the

6:28-8:01

[06:28] brain and how we think. And Stainless' mission from day one was sort of to make it easier for computers to talk to computers. So, [06:39] and, um, uh, [06:43] you know, it's the long-running trend of technology to have more automation, right? Automation is [06:52] what we mean when we say, okay, we're going to, you know, we're going to, we're going to apply technology to that, you know, we're generally going to be making things more efficient. And APIs are how most business-to-business interactions, in some format or another, become, become real, become automated. And, you know, [07:11] what we see with the [07:13] the rise of AI is that there is a new, a new computer has entered the chat, right? There's a new, there's a new kind of system that can talk to other systems, or at least we would like it to be able to. You used to have either, you know, [07:28] humans interacting with the computer through a user interface, a UI, or a computer interacting with a computer through an API. And now we have LLMs interacting with computers, right? And what's that through? [07:39] And I'm sure anyone familiar with, you know, with Every and its regular listeners is going to be familiar with MCP, Model Context Protocol, which is a system for connecting [07:51] LLMs to computers, broadly speaking. And it's an area that we're investing in at Stainless. It's really, I think, part of our core mission of...

8:02-9:41

[08:02] Like I said, [08:03] make it easy for computers to talk to computers and, um, [08:08] we've invested a lot of time, you know, at Stainless, the core product that we first brought to market is software development kits, SDKs. And so these are ways of saying, okay, Stripe has this great REST API. [08:22] You can send JSON over HTTP and get back JSON over HTTP. [08:28] And if you want that to be really convenient, you're going to use the Stripe Python library, the Stripe Python SDK. So you can go, if you're a Python developer, you'll go pip install stripe. And then in your application code, you'll write stripe.customers.create. And all of a sudden, you have a nice new customer object in sort of your Stripe database. And you're off to the races. [08:52] create in the old days to charge a credit card. [08:58] And SDKs are what gives developers that easy way to interface with an API. [09:05] What's the thing that gives LLMs an easy way to interface with an API? And you might say MCP, and in a sense, you'd be right. But what we're seeing so far, as MCP is rolling out into the world and people are experimenting with it and trying it out, [09:23] is that it's not working so great. Like there's, it's, it's difficult to deliver on what I see as the core vision of, of what's so exciting about MCP, which is just like a dashboard and a user interface lets you click around.

9:42-11:13

[09:42] see a bunch of stuff, [09:44] fill out forms, click buttons, do things. Anything that you would do while you're interacting with the software, you'd do through the user interface generally. But LLMs interacting through MCP, it tends to be much more restricted. You can only do a few little things. [09:58] There's usually not a ton of tools that you're going to be exposing to the models. [10:04] And just to stop you there, so I think what I'm hearing you say is what MCP does is just like a website is built for humans to be used, MCP is sort of the equivalent, and you can think of it in certain ways, of exposing a set of tools for the model that it can use to perform certain functions. [10:28] Yeah. [10:29] a bunch of things they can click on or use to get work done. So an example might be, you know, and a Gmail MCP has like a send mail tool or like a compose mail tool or a read inbox tool, that kind of thing. And instead of a human going on the Gmail website and doing it, it's the, it's the LLM is like, you know, essentially logging in and using it itself. And it's a, it's a native interface for, for language models. But you're saying that that's not working that well. Can you tell me more about that? [10:58] Yeah. So let's let's start actually with with kind of what I see is the big vision of MCP. And in some sense, the big vision of agentic AI in the first place. And I'll start with the most pedestrian example you can imagine. It's gonna be funny, given some of our context.
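To make the Gmail example above concrete, here is a minimal sketch of what a couple of tool definitions might look like. The `name`, `description`, and `inputSchema` fields follow the general shape MCP uses for describing tools; the tool names (`send_mail`, `read_inbox`), their parameters, and the `missing_arguments` helper are hypothetical and not taken from any real Gmail server.

```python
import json

# Hypothetical tool definitions for an imagined mail service.
# Each tool has a name, a human-readable description, and a JSON Schema
# describing the arguments the model is allowed to send.
TOOLS = [
    {
        "name": "send_mail",
        "description": "Send an email from the signed-in user's account.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "to": {"type": "string", "description": "Recipient address"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
    {
        "name": "read_inbox",
        "description": "Return the most recent messages in the inbox.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "limit": {"type": "integer", "description": "Max messages"},
            },
            "required": [],
        },
    },
]


def list_tools() -> str:
    """Serialize the tool catalog the way a server might advertise it."""
    return json.dumps({"tools": TOOLS}, indent=2)


def missing_arguments(tool_name: str, arguments: dict) -> list:
    """Return required argument names the caller did not supply."""
    tool = next(t for t in TOOLS if t["name"] == tool_name)
    required = tool["inputSchema"]["required"]
    return [name for name in required if name not in arguments]


if __name__ == "__main__":
    print(list_tools())
    print(missing_arguments("send_mail", {"to": "dan@example.com"}))
```

Everything the model knows about a tool comes from this description and schema, which is part of why the wording matters so much later in the conversation.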

11:14-12:56

[11:14] Um, which is, let's say, you know, Dan walks into my store and, um, buys a pair of stripy socks, um, and maybe a few other things. And then the next day I hear back from Dan, um, that there was something wrong. Unfortunately, it happens, you know, and I turned to someone on my team and I say, Hey, um, can we refund Dan for those stripy socks he bought yesterday and send him a discount code for, for the next time he comes in with like a little thank you note. [11:44] Um, yeah, [11:46] This is like the most normal thing to do in software is some little task like this. And what you're going to do, what the member of my team would be doing would be opening up their internal admin and looking around for some things. [12:16] required depending find the right one then go to the screen where you can create a refund create a refund make sure it's the right amount then go and create that discount and then take that discount code and send it [12:29] over to some other SaaS app where you log in to send some mail automatically, right? And of course, if you step away from the consumer version of this to a business-to-business context, of course, you might be going into Salesforce and sending a Slack message to an account administrator, you know, an account manager, so on and so forth. And in the normal course of work, it's just the most normal thing in the world to be doing,

12:58-14:33

[12:58] involved [12:59] going through five different apps each time, 15 different clicks and scrolls and loading spinners, just to do sort of like one simple thing. And the promise of agentic AI is to be able to take that same prompt I just said and type it into ChatGPT or Claude or whatever and say, hey, chatty, buddy, can you help refund my friend Dan? [13:29] the 15 different screens and the various different, you know, button presses to complete the task and then come back and say, great, it's done. [13:40] That, um, [13:41] in order to do that... [13:43] Now, there's only so many tool calls you have to make as an AI model to perform that exact linear chain of events. It's somewhat tractable. But if you think about this in the general case, you want the LLM to be able to do, you want your [13:59] agentic AI to be able to do anything that that human operator would have done. And you would want them to be able to do it [14:08] without having to wait for a bunch of JavaScript to load on a website or anything like that. And that means you need not only the Stripe create refund tool and the Stripe list transactions tool and the Stripe, you know, [14:23] list products and look up customer and, you know, create discount tool, you need not only those tools, but you need everything that you can do in the Stripe dashboard.

14:33-16:12

[14:33] which is basically everything that you can do [14:36] in the Stripe API. [14:38] And that's actually a lot. Like there are hundreds of different endpoints that you have access to in the Stripe API. The Stripe dashboard is actually massive. It's a huge application. [14:52] And if you were to take that list of tools today and go to an LLM [14:58] and say, hey, here's our MCP definition for all of this. Here's a create refund tool. Here's a create transactions tool, so on and so forth. And you tell it all about those tools. Here's the description. Here's all the different request properties that you can send. Here's the response properties you can get back. Here's all the documentation for each of those things. [15:16] Everyone listening to this should already know that [15:19] you've just burned through your entire context budget. [15:23] That's, you know, maybe hundreds of thousands of tokens just there. And that's pretty much translating the Stripe OpenAPI spec directly over to MCP tools. And not only can today's models not handle that amount of context, it's a poor use of context, because you have a lot else going on. But it's also confusing to the model. It's just too much to hold in your brain at one time. [15:49] the Stripe part of it, right? Because what you're really trying to do is enable your operators to do anything they would normally do. [15:57] And again, that spans many, many different SaaS tools, right? In the course of one interaction, it might be five. In the next interaction, it might be a different five. And so if you think about every single SaaS tool that your business uses on a daily basis, right?

16:12-17:47

[16:12] to get your work done. [16:14] Ideally, you would want every single one of those tools to be exposed to your operators in their AI chat with every single tool available in there, with every single nook and cranny and corner case available so that you can do anything through AI. That's the vision. Now, there's a lot of problems with that. The biggest one that I mentioned is sort of this context [16:37] window limit. [16:39] But you also have all sorts of security and permissions problems because, you know, [16:44] you don't want the AI to color outside the lines and say, okay, in addition to refunding Dan's socks, I also refunded every customer for all transactions ever, you know, and then I sent, you know, a bunch of money to my own AI bank account, ha ha ha. And so there's more to the challenge, but [16:58] that's the vision I see. [17:01] But I think, you know, the place we started there was... [17:04] You said it's not working. [17:06] Um... [17:07] But I don't think that that's the reason why it's not working today, right? Or is that the reason why it's not working today? [17:13] So what people do with MCP today is sometimes they'll try to expose all parts of their API. The way people build MCP tools is, generally speaking, they have an underlying API, usually a REST API, and they wrap different parts of that, different endpoints, different operations, [17:33] in MCP tools. And you can kind of do that in a one-to-one mapping, or you can kind of handcraft things for the MCP. And today, in order to succeed, people are finding that you really have to kind of handcraft it to the MCP,

17:47-19:18

[17:47] to the LLMs. You have to say, okay, I'm making one specialized tool to look up a customer and refund their transaction based on a description. So there's all these like decisions that you have to make where you need to have like the ergonomics of the model and how the model thinks in mind in order to make sure the model does the right thing more often than not. [18:08] Yeah, it's hard. It's hard. [18:12] Yeah, yeah. So I use this SDK analogy sometimes. So it took a long time for humanity to get to the point where we could make a really good Python SDK for a Python developer, wrapping an API. And I think we've, we've, we've cracked that nut. Stainless offers really great Python libraries, but, you know, we're building on the shoulders of giants here. A lot of people have... [18:33] have done this over time. [18:35] We haven't figured out how to expose an API ergonomically [18:40] to [18:41] an LLM in the same way that we've figured out how to expose it ergonomically to a Python developer. And that's kind of like a new research problem in a sense. [18:49] And it's harder because I can go learn how to be a Python developer if I want. I can't really learn how to go... [18:56] think or see like an LLM. Um, [19:00] but, uh... [19:02] You know, sure would be powerful if I could, and I [19:07] And that makes it tricky. We do have at Stainless, I think, some things that we're cooking up to address some of these problems, including the ones that you also mentioned. Like, LLMs have a really hard time with...

19:19-20:49

[19:19] a repeated, sustained chain of, of actions, um. [19:24] And, you know, even like if you get an API response back around, hey, like list all the transactions, there's so much data and you might have to go through the next page and the next page and the next page to go through all the transactions to find the one that has Dan with the stripy socks. And that's, again, a ton of context with [19:41] one or two small needles in the haystack. And LLMs are pretty good at that, but they're not perfect. And with too much hay, we all kind of end up throwing up our hands, and that's true for LLMs too. So yeah, so there's a lot of challenges today. And so when you look at, I mean, you're building MCP servers for people, but when you build them and just generally when you see people doing it well today, [20:11] what are the principles or how do you think about making an MCP server that... [20:17] One, people use, which is actually a big one. And then two, when it is used, actually does the right job. [20:23] There have been relatively few times that I've seen it done well. I have seen it done well. We're cooking something up that I'm really excited about. But with today's technology, you really have to do a good job of product management. I mean, you have to go out into the market and talk to your customers and see what their actual needs are and look over their shoulders as they use and operate your software and think about technology. [20:47] What could we unlock?

20:50-22:38

[20:50] through AI, where people would be doing things that they can't really do with our software today, um, because it just got so much easier. And then you have to do kind of a lot of engineering work usually [21:01] to wrap it up in a bow that works for [21:04] for the models. And you have to, you know, you have to set up a really good system for evals. And if you're doing MCP, you have to think about the different clients that people might be using. Are they using Cursor? Are they using Claude Code? Are they using something else? And the different models underlying all that. So you end up with this pretty crazy matrix of things that you might want to optimize for, and ways that you might want to evaluate and make sure that what you're offering [21:34] feedback [21:35] back to your servers so that you can find out, hey, we gave a tool call response here. We gave an answer of some kind. Was it actually any good? Did the user like it? Was the LLM able to use it? And that's a problem that I think I haven't seen a lot of people [21:54] solve yet as well. And so thinking about that as a first-class thing, maybe you have like a send feedback tool. That's something that we've been thinking about doing. Just so if a user like says out loud, you know, in the chat, oh, man, that was useless garbage. Like, okay. Now, is the MCP server going to find out about that? Yeah. [22:15] But is there anything specific you've learned about how to do it well, other than like, obviously you got to talk to your customers, think about your use cases, but like more concrete, more applicable stuff about how to design a good MCP server? You want to keep the number of tools relatively small, relatively low. You want to have the tool name and the description be really precise and specific.
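A back-of-the-envelope calculation shows why the tool count matters for the context budget discussed earlier. This is only a rough sketch: the 4-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, and the tool counts, definition sizes, and 200,000-token window are illustrative assumptions, not figures from the episode.

```python
# Rough rule of thumb for English text; a real tokenizer will differ.
CHARS_PER_TOKEN = 4


def tool_definition_tokens(num_tools: int, avg_chars_per_tool: int) -> int:
    """Estimate tokens spent just describing the tools to the model."""
    return (num_tools * avg_chars_per_tool) // CHARS_PER_TOKEN


def share_of_context(num_tools: int, avg_chars_per_tool: int,
                     context_window_tokens: int) -> float:
    """Fraction of the context window consumed by tool definitions."""
    tokens = tool_definition_tokens(num_tools, avg_chars_per_tool)
    return tokens / context_window_tokens


if __name__ == "__main__":
    window = 200_000  # assumed context window size, in tokens
    for label, count in [("handcrafted subset", 10), ("every endpoint", 500)]:
        tokens = tool_definition_tokens(count, avg_chars_per_tool=2_000)
        share = share_of_context(count, 2_000, window)
        print(f"{label}: {count} tools, ~{tokens} tokens, {share:.1%} of window")
```

Under these assumptions, ten tools cost a few percent of the window, while five hundred would not fit at all, before the conversation itself has used a single token.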

22:40-24:17

[22:40] Aren't those two things at odds? Yes. Good writing is hard. [22:44] Um, yeah, I mean, that's, that's like, you know, you can make a great tool of look up person by name and [22:52] product description and then refund them. You can make a great tool that does that. [22:57] And you also want a small number of, of, you know, properties in the input schema. You want a small number of parameters, and you want them concisely described but sufficiently described. [23:09] This is, this is also hard. And you want the response data to come back with a very small amount of data, only exactly what the model will need. That's also very hard, because you may not know [23:21] a priori which things the model is really looking for. Um, and you know, [23:26] we have a technique that we use in our MCP servers today where we give the model a jq filter, which is a way of filtering JSON. And that can work pretty well. But that's kind of a special trick. Doesn't this mean that like MCP just needs another level of like a search tool function, search tools? [23:46] Like find a list of relevant tools given my task. [23:49] The tool browsing problem is definitely one very serious one. And that is one approach. And so we actually do this at Stainless today, where you can get an MCP server for your API that just has, like I was saying earlier, the very simple thing of every endpoint is exposed as a tool. And if you have a small API, that works great. And you can also filter it out. So you expose an MCP server with only a small subset of your endpoints. That works great.
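The idea of letting the model ask for only the fields it needs can be sketched in a few lines. The code below is not jq and is not how Stainless implements it; it is a plain-Python stand-in that does roughly what a jq expression like `[.data[] | {id, amount}]` would do. The response shape and the field names are hypothetical.

```python
from typing import Any

# Hypothetical API response: a page of transactions with many fields,
# most of which the model does not need for the task at hand.
SAMPLE_RESPONSE = {
    "data": [
        {"id": "txn_1", "amount": 1200, "currency": "usd",
         "description": "Stripy socks", "metadata": {"store": "downtown"}},
        {"id": "txn_2", "amount": 450, "currency": "usd",
         "description": "Shoelaces", "metadata": {"store": "downtown"}},
    ],
    "has_more": False,
}


def pick_fields(record: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Keep only the requested top-level fields of one record."""
    return {name: record[name] for name in fields if name in record}


def filter_response(response: dict[str, Any],
                    fields: list[str]) -> list[dict[str, Any]]:
    """Project every item in the response's data list onto the fields."""
    return [pick_fields(item, fields) for item in response.get("data", [])]


if __name__ == "__main__":
    # The model would pass the field list alongside its tool call.
    print(filter_response(SAMPLE_RESPONSE, ["id", "amount"]))
```

The design choice is that the caller, not the server, decides what is relevant, which sidesteps the problem of not knowing a priori which fields the model is looking for.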

24:19-26:08

[24:19] kind of what we call dynamic mode, where there's three tools, no matter how big your API is. One is, you know, list endpoints. The other is get endpoint and learn about it. And then the last one is execute endpoint. And so that enables this context thing to scale really well. But it means there's three turns of the model just to do one thing. And so that that gets slower, it's more [24:49] It performs pretty well usually, but not quite as well because the tools aren't loaded up in quite the same way. Are you using MCP servers yourself? Yeah, I use MCP to... [25:11] Actually, funnily enough, not so much on the coding side, but I use it on the business side. So I'll use like the Notion, HubSpot, Gong, MCP servers to kind of say, hey, like an action MCP server for our database, a read only copy of our database and say, hey, what are the interesting customers that signed up for Stainless last week? [25:41] our notes in Notion, maybe even look at transcripts in Gong and tell me all about it. It's incredible. Lots of us are shipping AI to production, which is great for productivity, but it also comes with anxiety. You tweak a prompt, swap models, adjust parameters, and everything looks fine in testing, so you merge. And then three days later, or even sooner, the support tickets start rolling in. The AI is giving your customers unexpected answers, and you have no idea when it happened or why.
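The "dynamic mode" described at the start of this section, with three tools regardless of API size, can be sketched roughly as follows. This is an illustrative assumption of how such a scheme might be wired up, not Stainless's implementation: the endpoint catalog, handler functions, and parameter names are all hypothetical, and the handlers are stubs rather than real API calls.

```python
from typing import Any, Callable


def _create_refund(args: dict[str, Any]) -> dict[str, Any]:
    # Stub handler; a real server would call the underlying API here.
    return {"status": "refunded", "transaction_id": args["transaction_id"]}


def _list_transactions(args: dict[str, Any]) -> dict[str, Any]:
    # Stub handler returning canned data for the given customer.
    return {"customer": args["customer_id"], "data": ["txn_1", "txn_2"]}


# Hypothetical endpoint catalog. It could hold hundreds of entries without
# changing how many tools the model sees.
ENDPOINTS: dict[str, dict[str, Any]] = {
    "create_refund": {
        "summary": "Refund a transaction.",
        "params": {"transaction_id": "string"},
        "handler": _create_refund,
    },
    "list_transactions": {
        "summary": "List a customer's transactions.",
        "params": {"customer_id": "string"},
        "handler": _list_transactions,
    },
}


def list_endpoints() -> list[dict[str, str]]:
    """Tool 1: names and one-line summaries only, to keep context small."""
    return [{"name": n, "summary": e["summary"]} for n, e in ENDPOINTS.items()]


def get_endpoint(name: str) -> dict[str, Any]:
    """Tool 2: full details for one endpoint the model wants to use."""
    entry = ENDPOINTS[name]
    return {"name": name, "summary": entry["summary"], "params": entry["params"]}


def execute_endpoint(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Tool 3: validate the arguments, then run the chosen endpoint."""
    entry = ENDPOINTS[name]
    missing = [p for p in entry["params"] if p not in args]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    handler: Callable[[dict[str, Any]], dict[str, Any]] = entry["handler"]
    return handler(args)


if __name__ == "__main__":
    print(list_endpoints())
    print(get_endpoint("create_refund"))
    print(execute_endpoint("create_refund", {"transaction_id": "txn_1"}))
```

The three calls in the usage section mirror the three model turns mentioned in the conversation, which is where the extra latency comes from.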

26:11-27:42

[26:11] fixes this. It connects evals and observability in one workflow. That way you see what actually happened in production and can measure whether changes made things better or worse. Traces show the full execution path, evals define what good looks like, and experiments let you compare prompts and models side by side before shipping. Production traces feed directly into your eval datasets. [26:31] Every failure becomes a test case, you catch regressions in CI before they reach users, and teams at Notion, Stripe, Zapier, Vercel, and Ramp use it to ship quality AI at scale. [26:42] Braintrust is designed for teams building production AI systems where silent regressions are expensive. It's built for any stack. They have SDKs for Python, TypeScript, Go, Ruby, C#. There's no framework lock-in or vendor dependencies. It's SOC 2, Type 2 certified, and GDPR and HIPAA compliant. Get started at braintrust.dev. That's braintrust.dev. [27:03] And now, back to the episode. [27:05] And so, so that's one of your, that's one of your big use cases. Like, are you doing that like every week or how, like, how are you, I'm [27:12] interested, not even from an MCP perspective, but for anyone running a, um, [27:17] business that has some complexity and you're like, I want to know what's going on in the business. Like, what is, what are you actually doing and what is the report that comes out and how often are you doing that and all that kind of stuff? So tell me so I can steal it. Yeah. Um, uh, for me, it's still usually in kind of like playing around mode. One of the things is the MCP servers disconnect and then I get annoyed. Um, and so, you know, you have to just kind of reconnect and whatever. It's not a huge deal. Um, uh,

27:42-29:12

[27:42] but there are, there are a lot of little paper cuts still in a technology this new, that you're going to expect, um, that, that can hold back, um, some amount of your usage. Uh, one of the things that I've, I found really helpful kind of at the meta level, um, I'm sure you've had other guests talk about this, um, is the practice of just collecting notes [28:01] for the, for the AI, by the AI, um, that, and, and kind of edited and curated by yourself. So, um, you know, [28:09] I have a, like a, [28:11] I can't remember if I call it a note. I think I have a notes folder, a research folder, something like that in a special Git repo that I use just for this sort of like internal stuff. And I'm like, hey, when you find interesting customer quotes, put them in this folder and give the full citation. So that the next time I start asking interesting questions, it doesn't have to go searching through the MCP servers again. [28:41] files. Wait, that's crazy. Wait, so how are you getting, like, what are you, what are you using to write into that, into that Git repo? Like, is it Claude Code? Is it, are you using ChatGPT? Like, how does it get in there? Yeah, I use, I use Claude Code these days for that kind of thing. [28:55] And so you just have a Claude Code open and running and then a new customer testimonial comes in and you're just like, hey, can you throw this in my like Git [29:04] master company Git knowledge repository, basically? And, um, and then whenever you need anything later you're like

29:13-31:06

[29:13] Claude, go search through my master repository to figure out where the best customer quote is for this. [29:19] Totally. That's fucking so cool. Um, can we see it? Um, no, it's too messy and probably has a lot of confidential information, uh, the latter being more, more important. Um, is it, um, when you say it's messy, like are you having Claude organize it at all, or like how is it structured? There's a lot that, that I want us to do here, um, that we haven't had the chance to do yet. There's some, there's some other lower-hanging fruit that [29:46] that I'm working through, that our business team is working through right now. Um, just on the, on the, [29:51] basics of your kind of CRM systems and so on. Um, [29:56] but, um, and so it's not as, it's not well structured now, but I think that's fine. Um, I, yeah, I, I, I'm not, I don't plan to prioritize structuring it super, super well until we're using it more, until we're using it more broadly, because, you know, I use this stuff some of the time. Um, one of the, one of the, [30:15] business people on the team uses it a fair amount. I think like one or two kind of of our [30:20] customer support engineers, um, use, uses this stuff a lot. Uh, but it's not yet kind of, [30:27] broader than that. And I would like it to get there. And once we see how everything's evolving, I think that's when we'll start bringing in more structure. But as it is, Claude Code can handle unstructured stuff really well. So you don't have to think about it, [30:41] too, too hard in advance, in my view. Um, you can move things around later. What else do you have in there other than customer quotes? Um, SQL queries. Um, so, you know, I'm a software developer. Um, uh, I don't write a lot of code these days, but, you know, I spent a lot of time doing that. And so, um, when I say, hey, you know, can you look up, uh, you know, I might be, hey, how is our month-on-month growth of

31:11-32:46

[31:11] last board prep. And it came out with a pretty good answer right away. And I was like, wow, this is awesome. And then I kind of looked a little bit deeper. And I was like, oh, I actually want to exclude, you know, these users from this analysis, and I want to filter it this way and filter it that way. And I kind of imbued more of this business context into that SQL query. And I iterated with, with Claude Code, to get it to be better and better for the specific kind of [31:41] story that I was trying to tell, and then I got it to a good place. I was like, great, let's dump this to, you know, an analysis folder, um, for, um, or an analytics folder, um, for future use. [31:54] And then next time you're doing your board prep, you can be like, hey, what was that query that we did last time? And it'll presumably go get it. [32:00] Yeah, that's really cool. What else? [32:03] You know, as any software team is these days, we're using this also for, hey, a customer comes in with a question. [32:14] Can, can Claude Code just fix it? Um, uh, [32:18] And so you'll have, in some cases, a Linear ticket is filed. And then our support engineers are really very technical. [32:28] And so they may not have the wall clock time to go down and chase down the fix themselves to an incoming bug. They have the technical skill. But guess what? Another customer writes in two minutes later and they want to jump on that. They don't want to be knee-deep in a debugger.

32:48-34:22

[32:48] So something that we do sometimes is they'll file the ticket, and by default maybe they intend to do it later, or some other engineer is going to be doing it later. But hey, can we... [33:02] Can we see if Claude Code can just take a crack at it? Is that going to work out [33:07] 100% of the time? Definitely not. Is that going to work out 50% of the time? Still no, to be honest with you. But [33:16] can that improve the overall efficiency? [33:19] Um... [33:21] Yeah, maybe. We're still, I would say, experimental there. But we're seeing a lot of promise. [33:28] That's really interesting. [33:30] Okay. Well, I know in our pre-production call you were talking about how you have a big vision for the future of AI. Do you want to talk me through that? [33:41] Yeah, I would love to. We talked earlier about how agentic AI can make [33:51] operators' lives a lot easier by taking their data, certain pedestrian tasks, and running with it independently. That's something that I think, as an industry, we're almost on the cusp of. And [34:05] if you ask how you get there, and you also start asking about the steps beyond that and beyond that, a big part of the way I see things unfolding from here is, I like to say, the future of AI is cyborgs.

34:24-36:01

[34:24] Which is sort of extra ridiculous, because what is a cyborg other than already a robot? [34:35] Part person and then part machine. [34:40] And in this case, I mean, [34:43] when you go and talk to an agent, [34:47] what you're going to be getting is part [34:51] GPT, neural net, LLM, so part AI, and part code, [34:57] where the quote-unquote machine that I'm talking about is traditional CPU, not GPU, software. [35:09] I expect this to play out in two main ways. One is your one-off operational use cases, like we were talking about a minute ago. And then the other is production software. In the use case we were talking about a minute ago, where [35:28] someone needs to perform some tricky one-off action with a bunch of points and clicks, and now we want an AI to just do a bunch of tool calls, [35:37] the way I actually see that happening, and what we're building towards, is code execution. So rather than the model having a bajillion tools, the model has two tools. One to [35:51] execute code, where it just has a text box of, like, hey, put in some TypeScript, and you're going to use this API's TypeScript SDK, and you're just going to write stripe dot

36:02-37:37

[36:02] transactions.list or stripe.charges.list. And you're going to call stripe.customers.retrieve and stripe.refunds.create. This is really easy for models. They're really good at writing code. And if you give that tool a little bit of a readme, [36:22] where you say, here's an example request, and here are some other resources, some other API calls that you can make, it's really good at extrapolating from patterns, if the SDK, or the API, is well-formed and predictable. And then you give it an additional tool to search the docs and ask questions of the docs. [36:40] For anything it's not sure about or gets wrong on the first try, [36:45] you give it the documentation. [36:48] What this does for that scenario we were talking about earlier is you have very, very limited impact on the context window up front. We're talking about a thousand tokens or something like that, maybe less. And the context impact of doing a whole bunch of paginated list requests is [37:11] zero. The model will go look for somebody named Dan, and it'll double-check the purchase of stripey socks, and it might write three nested for loops. [37:23] But then only at the end, when it has found the right thing, it'll console.log: found Dan, customer ID blah, blah, blah, transaction ID blah, blah, blah. And then: created refund, refund ID one, two, three. And the context...

37:39-39:32

[37:39] hit coming back from all of this is going to be [37:42] like 10 lines of text. It's really minimal. And all of this will run really, really quickly too, so you don't have a round trip to the model every time you're doing something like this. It's just CPU code, and it runs on a server in the cloud right next to the Stripe API, in AWS somewhere probably, and it goes super fast. Okay, so what I'm understanding you saying is, the language model [38:07] has a tool where it can write code and send that code to this tool, and whoever the company is, whether it's Stripe or whoever's MCP server you're using, they'll go and execute that code, and that code is going to interact with their API and then return the results. Rather than, you know, you have 50 different possible tool calls and [38:30] all that stuff, it's just this: [38:32] the model writes API code, and the API provider executes that code, runs it against their API, and returns the results. Why wouldn't my model just write the code that I then run myself instead of relying on an API provider to do it? [38:49] I expect that that will happen a lot more. I expect that the code execution tool is going to become the most widely used tool. One of the problems that we have today is that the code execution tool doesn't work so well with libraries. [39:08] LLMs have a hard time working with a library: knowing exactly what version of the library they're using, using the right version, probably usually the latest version, not hallucinating aspects of the API, and knowing how to iterate when they hallucinate wrong. And if it can't use any library off npm or

39:32-41:13

[39:32] you know, [39:33] the Python Package Index or anything like that really, really well, basically perfectly out of the box, then okay, well, forget about [39:43] using a library. At that point, you just have to hit [39:47] the raw HTTP API. And at that point, in order to figure out what's in there, you need the whole OpenAPI spec, and you're back at square one, because that document is massive. Furthermore, something that's really scary about that is if you don't have a typed library with static typing, where the computer can say what you're trying to do is wrong, [40:08] then the LLM will try to make an API request that is wrong [40:12] some percentage of the time. The code execution tool can run a type checker and say, oh, you're asking about stripe.transactions.list, but that actually doesn't exist. Stripe doesn't have a transactions API. You might want payment intents, you might want orders, you might want balance transactions. Which one do you want? And if the API provider is doing a great job building this tool, [40:33] it'll return the documentation for all of these things inline. It might have its own AI look at what the model's trying to do and come up with a suggestion. And that sub-agent is well-trained, specialized, always updating, and isn't burdened with the context of the full conversation. [40:54] What do you think of the security model? The security model is really, really interesting. [40:59] This is another area where we're really starting to think about things at Stainless, and I'm getting really excited about it. So if any listeners are really interested in this and have some ideas or want to talk, please do reach out.
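
[Editor's sketch] The code-execution pattern Alex describes over the last few minutes can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not Stainless's or Stripe's implementation: the `api` object is a hypothetical in-memory stand-in for a typed SDK, and every name in it (`customers.list`, `charges.list`, `refunds.create`, `refundStripeySocks`, the sample data) is invented for illustration. The point is the shape: the script pages through list calls on its own, and only a couple of log lines would travel back into the model's context.

```typescript
// Hypothetical in-memory stand-in for a typed API SDK. These names are NOT
// the real Stripe SDK; they only mirror the shape discussed in the episode.
interface Customer { id: string; name: string }
interface Charge { id: string; customerId: string; description: string; amount: number }
interface Refund { id: string; chargeId: string; amount: number }
interface Page<T> { data: T[]; nextCursor: string | null }

const CUSTOMERS: Customer[] = [
  { id: "cus_1", name: "Alice" },
  { id: "cus_2", name: "Bob" },
  { id: "cus_3", name: "Dan" },
];
const CHARGES: Charge[] = [
  { id: "ch_0", customerId: "cus_1", description: "tote bag", amount: 2500 },
  { id: "ch_1", customerId: "cus_3", description: "coffee mug", amount: 1200 },
  { id: "ch_2", customerId: "cus_3", description: "notebook", amount: 800 },
  { id: "ch_3", customerId: "cus_3", description: "stripey socks", amount: 1500 },
];
const refunds: Refund[] = [];

// Cursor pagination: the cursor is simply the index of the next item.
function paginate<T>(items: T[], cursor: string | null, limit: number): Page<T> {
  const start = cursor === null ? 0 : Number(cursor);
  const end = start + limit;
  return { data: items.slice(start, end), nextCursor: end < items.length ? String(end) : null };
}

const api = {
  customers: {
    list: async (cursor: string | null): Promise<Page<Customer>> => paginate(CUSTOMERS, cursor, 2),
  },
  charges: {
    list: async (customerId: string, cursor: string | null): Promise<Page<Charge>> =>
      paginate(CHARGES.filter((c) => c.customerId === customerId), cursor, 2),
  },
  refunds: {
    create: async (chargeId: string, amount: number): Promise<Refund> => {
      const refund: Refund = { id: `re_${refunds.length + 1}`, chargeId, amount };
      refunds.push(refund);
      return refund;
    },
  },
};

// A type checker rejects calls that do not exist before anything runs, e.g.:
//   api.transactions.list(null); // compile error: property 'transactions' does not exist

// The kind of script a model might submit to the code-execution tool.
// Every page of results stays inside the sandbox; only `log` is returned.
async function refundStripeySocks(customerName: string): Promise<string[]> {
  const log: string[] = [];
  let customerCursor: string | null = null;
  do {
    const customers: Page<Customer> = await api.customers.list(customerCursor);
    for (const customer of customers.data) {
      if (customer.name !== customerName) continue;
      let chargeCursor: string | null = null;
      do {
        const charges: Page<Charge> = await api.charges.list(customer.id, chargeCursor);
        for (const charge of charges.data) {
          if (!charge.description.includes("socks")) continue;
          const refund = await api.refunds.create(charge.id, charge.amount);
          log.push(`found ${customer.name}, customer ${customer.id}, charge ${charge.id}`);
          log.push(`created refund ${refund.id}`);
          return log;
        }
        chargeCursor = charges.nextCursor;
      } while (chargeCursor !== null);
    }
    customerCursor = customers.nextCursor;
  } while (customerCursor !== null);
  log.push(`no matching charge for ${customerName}`);
  return log;
}

refundStripeySocks("Dan").then((lines) => console.log(lines.join("\n")));
```

In this toy run the script makes four paginated list calls and one refund call, yet the text handed back is two short lines. In a real deployment the provider would presumably run the submitted script in a sandbox next to its API and type-check it first, as described in the conversation; none of that infrastructure is shown here.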

41:15-43:00

[41:15] At the end of the day, I think the security has to take place at the API layer itself. [41:23] sort of limiting what's exposed through MCP. And that kind of makes sense, but [41:27] at the end of the day, you could do anything that's in the API under the hood, right? [41:37] And [41:38] what people should be doing is using [41:41] OAuth with granular [41:44] permissions, with proper scopes. At that point, [41:49] the security happens in the right place, which is at the API layer. There are limitations to OAuth scopes, and it's pretty hard to build, so it'd be nice if someone made that easy. But in my view, that is the right layer. So going back to my earlier question, I'm thinking about the idea of having a model write code that the API provider then [42:13] executes to interact with their API and then returns the results, [42:21] creating a [42:23] tool-use tool that developers use. For example, I'm thinking about this for Cora. [42:30] We've got all these tools. Maybe Gmail is going to build a code-use thing or whatever, but really, I would probably use what you're talking about inside of Cora, but we would need a tool-use tool. It's not a tool-use tool. It's a computer-use tool. And I know OpenAI has this, but it's not really well built for lots of libraries and stuff. It's not a custom environment. I need a computer-use tool

43:00-44:46

[43:00] where I control the environment and I can install different libraries in it, and be able to call it any time to then call any API. It has to have network access, basically. Yeah. You guys should build that. We're working on it. Fuck yeah. Are you building it for developers who want to access MCP servers, or people who are providing MCP servers? We're starting with people who are [43:30] you can give the model a code execution environment where it can hit not only the Stripe integration, but also the Salesforce integration and also anything else. But not too much anything else, right? One of the advantages of starting where we're starting, with just one API provider, is that you ensure that there are no network connections allowed out of that sandbox where we're running the code to anything other than, in this case, api.stripe.com. And that's really, really critical for security for something like this. [44:00] There are ways to expand that bit by bit and keep things secure. [44:08] It'll take some time. The other thing to point out, as you see some of these generalizations, is it's not just that you want this code execution sandbox to work really well for any API, for any library, which I think we really do. I think we really need that. You also start to see that [44:30] this is just a powerful model for AI doing stuff. And sometimes you realize that the thing that the AI did this one time, in this one-off case, is actually enduringly useful. Maybe anytime a customer writes into support and says, hey...

44:47-46:21

[44:47] my socks had holes in them, you should automatically get a refund. Maybe you want that, maybe you don't, but there's a lot of stuff that people do one time, and then two times, and then three times, and then they say, okay, we should automate this, [45:00] right? And that's what software teams do all day, every day. [45:06] I think we're also going to be seeing that with AI, with the same code search tool that we're talking about, and all the same prompting that will make an AI really, really good at interacting with an API in one of these code sandboxes, almost quote-unquote in its brain. It's going to write code in its head, run the code in its head, see the results, and then move forward with your query, [45:29] with your task. It should be able to say, okay, actually, this is enduringly useful code. Let me commit this to the repo. Yeah, yeah, yeah, yeah. [45:38] It's like, chat is a really good interface for exploring, but sometimes you just want a dashboard. I just want to log into my Stripe dashboard and see all the stuff without having to be like, what is my MRR? It should just show up, because I do that every day. But I want to push you, as a hashtag-value-add investor, because I think that there's this [46:02] thing that happens in AI where often, with the first attempt at something like this, people try to be really cautious. And I'm sure that your customers care about you being cautious, like big enterprise customers. But [46:16] the things that get adopted are often the ones that are willing to take the risk to be YOLO

46:22-47:52

[46:22] very early. So an example is, DALL-E was totally private for a long time, and people were posting some images, but you couldn't get in. And then Stable Diffusion was just like, fuck it, anyone can use this. And that really started the whole [46:37] image generation wave. Obviously Stable Diffusion sort of fumbled the bag, but they had a lead for a little while. Same thing for Claude Code, honestly. Codex is not like this as much anymore, but if you look at the difference between Codex CLI and Claude Code, Claude Code was just like, fuck it, YOLO mode. It's super industrious. It has a sandbox, but you can just do dangerously skip permissions. And Codex just fell way behind because its first [47:07] thing was locked down. And then it was in the CLI, but it was really built for pair programming, so it just wasn't particularly industrious. It wouldn't go off and do a bunch of stuff. It would get locked out of doing certain things, even if you did full auto mode. And now they've caught up because they're... [47:26] Yeah, you can just let it do whatever you want. And so I would really push you on... [47:30] There might be a version that you could do today or tomorrow or very soon for individual developers that would let them set up this environment, which, for example, I would use immediately. And I care about security, but I care a lot less than some gigantic enterprise company. But I think the people like me who are building at this scale

47:52-49:36

[47:52] are eventually, hopefully, going to be the big companies, but we're the ones that are really doing the AI-first adoption, not the big companies. Well, I would love to get this in your hands. What are some of the APIs your team uses the most? [48:06] We have a bunch of different products, but I'm thinking right now about Cora, the email assistant. [48:12] And [48:13] of the big APIs that it's using, it's mostly the Gmail API. You're interacting with the assistant over chat, and then it has a list of tools that are like archive email, or draft email, or send email, or whatever. There's a whole categorize tool, so it categorizes your mail in certain ways. And yeah, [48:35] I think we would definitely try out something like this because [48:40] if it ran the same way, it would make it much more flexible for us to make more tools and not break old ones, you know? [48:53] It's really interesting. In a sense, what I actually predict is that for people who are quote-unquote building tools, once we have a code execution super tool like I'm talking about, the only way you really quote-unquote build a tool is with [49:10] instructions, with prompts. And the full power of everything you could possibly do in the API, in the Gmail API, for example, is all there in one tool. [49:20] But sometimes you have specific tasks or specific categories of work that you want to describe in a particular way to help the LLM perform a sequence of actions as productively as possible. And at that point,

49:37-51:19

[49:37] the only engineering work that you have to do is prompt engineering. [49:41] We'll see if it's that quote-unquote easy. [49:45] As we all know, prompt engineering can be really tricky. [49:48] It's hard. Yeah. But I think that's part of the vision. [49:54] That being said, we do have some pretty nifty ways, with the MCP servers that we generate today, to help developers mix and match the different tools underlying all the different parts of the API as they compose and write their own tools. This is awesome. So for people who are listening and want to know more from you and from Stainless, where should they find you? [50:16] Stainless.com. That's our website. Awesome. Or at least visit stainless.com. Alex, great to have you on. I can't wait to do more of this when you have some of these new things launched. This is really, really fun, and yeah, great to chat. [50:33] Thanks, Dan. You too. [50:42] Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI & I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard. But instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. [51:04] on the edge of your seat, [51:06] craving for more. It's not just a show, it's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor, hit like, smash subscribe and strap in for the ride of your life.

51:19-51:24

[51:19] And now, without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.
