bouncer

Alex Ziskind · 192.0K views · 5.7K likes

Analysis Summary

30% Low Influence

“Be aware that the 'financial pain' of multiple AI subscriptions is emphasized specifically to make the $60/year sponsored tool feel like an essential cost-saving measure.”

Ask yourself: “Did I notice what this video wanted from me, and did I decide freely to say yes?”

Transparency: Mostly Transparent
Primary technique

In-group/Out-group framing

Leveraging your tendency to automatically trust information from "our people" and distrust outsiders. Once groups are established, people apply different standards of evidence depending on who is speaking.

Social Identity Theory (Tajfel & Turner, 1979); Cialdini's Unity principle (2016)

Human Detected
95%

Signals

The video features a known tech personality, Alex Ziskind, using natural, unscripted speech patterns including self-corrections and spontaneous reactions. The content is a hands-on technical tutorial with live demonstrations that align with the human narration.

  • Natural Speech Disfluencies: The speaker corrects himself in real time: 'I don't mean VRAM, I mean uh unified memory.'
  • Personal Anecdotes and Context: Mentions specific hardware specs ('my Mac has 128 GB of RAM') and personal workflow preferences.
  • Conversational Fillers: Use of 'uh', 'boom', and 'right?' in a way that matches natural human cadence rather than a pre-scripted synthetic voice.

Worth Noting

Positive elements

  • This video provides a highly practical, step-by-step technical guide for developers to bridge Claude Code with local inference servers, which is a non-trivial configuration task.

Be Aware

Cautionary elements

  • The use of artificial urgency ('grab it soon') in the sponsored segment to push a subscription service.

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed: March 23, 2026 at 20:38 UTC · Model: google/gemini-3-flash-preview-20251217 · Prompt Pack: bouncer_influence_analyzer 2026-03-08a · App Version: 0.1.0
Transcript

Claude Code, a very popular tool, can now use local models. In other words, I can be running an LLM right on my laptop and Claude Code can talk to it. But that's not all. LM Studio, the tool that makes it super easy to run local models. It's a graphical tool, also now supports Claude Code. Is it even worth it? Because with Claude Code, the best thing to use is the models from Anthropic, right? Opus 4.5. Who can beat that? Or can we? Well, that's what we're about to find out. >> Merlin AI. It's an all-in-one AI tool, and they gave my audience a big discount. I keep multiple AI tools around because each one is good at something, but it gets really expensive, and bouncing between tabs breaks my focus. Merlin AI puts ChatGPT, Claude, Gemini, and more in one place so I can pick the best one for the moment. Whether I'm coding, researching, or writing for a video. Watch this. I click the Merlin AI extension, chat with the web page to summarize what I'm reading and pull out the important parts, and I even have my choice of models right at my fingertips. If I need something deeper, I turn on deep research, and it builds a clean, structured report from multiple sources. And it also has quick modes like web, academic, and Reddit search. If you pay separately, ChatGPT is $20, Claude is $20, Gemini is $20, and that adds up fast. Merlin AI is cheaper because they buy AI API access in bulk. APIs cost less than the $20 plans, and most people don't even use $20 worth of API in a month. And here's the discount. I click pricing, continue. It takes me to Stripe. I enter the promo code, and the total drops to $60 for the year. That's basically five bucks a month. I don't know how long this deal will be available, so grab it soon. The link is in the description. So, first, if you don't already have LM Studio installed, download it from the LM Studio page. I'm on macOS, but it also works on Windows or Linux. It's a cross-platform thing. Click the download button and install it.
When you open it up, it'll give you some notes including the new release notes which tell you that it now supports Claude Code. Boom. Let's close that down. You can search for models right here. There's a ton of different models out there. I suggest starting with something small so that you get a sense of what you're doing and how it works before you commit to a very large download. However, large models will typically give you better results. But as I found out, not all models actually give you good results when it comes to using them with Claude Code. To download a model, you can search for it or pick from the list. And I'm on Apple Silicon, so the best ones for me are going to be MLX-based models. They're just a little bit better performing. And up here, you can search for MLX or GGUF. MLX is only going to work on Apple Silicon. GGUF will work everywhere, including Apple Silicon. So, if you want your model to be portable, if you're going to be storing it on a drive and then taking it to a PC or a Linux machine or a Windows machine, then get the GGUF model. But if you're staying on a Mac, get the MLX version. Here, Qwen Coder 30B 4-bit is 17 GB, so it's a commitment. Qwen Coder 30B Q4_K_M is also a 4-bit quantized model, but that one is a GGUF and it's 18.63 GB in size, so a little bit bigger. I'm going to stick with MLX for now and click download. Now, this can download in the background. And there it goes. By the way, my Mac has 128 GB of RAM. So, yeah, I can run pretty big models on there. I can run much bigger models than 17 GB. But if you have uh let's say a MacBook Air with only 16 gigs of VRAM, then pick a smaller model. I don't mean VRAM, I mean uh unified memory. VRAM is if you have a graphics card, like for example an RTX 5060, which only has 16 GB of VRAM. That's downloading in the background. However, I've already downloaded a couple of models. I have GPT-OSS 120B, which we're going to check out, and GPT-OSS 20B, and this one I just found. It's a small model.
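His 17 GB figure for a 4-bit 30B model checks out with simple arithmetic. A rough sketch in shell (the extra couple of GB in the real file is quantization scales and metadata, and none of this includes context memory):

```shell
# Rough weight size: parameters (billions) x bits per parameter / 8 bits per byte
params_b=30   # Qwen Coder 30B
bits=4        # 4-bit quantization
weights_gb=$(( params_b * bits / 8 ))
echo "~${weights_gb} GB of weights before runtime overhead"   # prints "~15 GB of weights before runtime overhead"
```

The same formula is essentially what his LLM inference calculator automates: swap in 8 bits for a Q8 model, or 16 for an unquantized fp16 one, and the download size roughly doubles or quadruples.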
It's a 1.2 billion parameter model. So really tiny. I've never even heard of this one before, but I thought I'd give it a try. And guess what? It didn't really work so well. So, I'm going to skip that one. Let's go to 20B right away. And I'm going to bump the context all the way up to maximum because I want to have a large context. I want this thing to remember a lot of my conversation, and I want Claude Code to be able to pass files back and forth with the model. Let's click load model here. And if you want to watch what happens with the memory, you'll see that it's loading up right here. I have 128 total. Now, memory use is 57.6. Plenty of room to go. Now, this is fine if you want to just use this thing and chat with it. But we want to use this with Claude Code. If you don't have Claude Code installed, it's really quite easy. You just go to the docs quick start guide. And really the easiest way to do it is by running this curl command right here. If you're on Windows, you can use PowerShell, but I'm not. I'm going to use this command. In fact, I already have it installed, but let's see what happens if I do it again. Probably nothing. Boom. Maybe it just updated. I don't know. But it's it's new. Okay. If I want to run it, all I got to do is just say claude now in my project directory, or really anywhere. But I did go inside my project directory, which is this LLM inference calculator. I created this a few videos ago, and it's basically a calculator that tells you how much RAM is needed based on the number of parameters you want in your LLM, the quantization, and so on. It's a React-based application, just front end, and it's uh built on Vite. We are a couple of versions behind, which is why I want to use Claude to update this for us. Now, if I launch Claude inside my project directory, it's either going to try to connect to my Anthropic account or use my Anthropic API key to use the credits. That's the first time you do it.
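The exact curl command he runs isn't shown in the transcript, so follow the quick-start guide he points to; `npm install -g @anthropic-ai/claude-code` is the other documented install route. Either way, a quick check that the CLI actually landed on your PATH:

```shell
# Verify the Claude Code CLI is installed and reachable.
# (Install first via the quick-start curl command from the docs,
#  or: npm install -g @anthropic-ai/claude-code)
if command -v claude >/dev/null 2>&1; then
  echo "claude is installed at $(command -v claude)"
else
  echo "claude not found; run the installer from the quick-start guide"
fi
```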
I've already done it, so that's not what's happening here. But the first time you launch it, it'll ask you. So let's try using Opus 4.5, and then we'll compare it to the other models that are local. This will be like the gold standard. Update the dependencies to the latest. I didn't tell it what kind of project it is. It has to go and figure all that out. Update the dependencies, then build it and run it. All those things it has to do. Let's see how long it takes. Now it's 9:51. It's going to ask me some questions, so timing might not be exact, but it'll give us a general sense. I'm going to say yes to pretty much everything it's going to ask me. So this is your typical Claude Code experience until now, where it goes and talks to the cloud service. But the whole reason we're doing this is so that we can run this locally. We know Opus is good, and we know it's good at its job. It's done already. What, it's 9:52? You've got to be kidding me. Come on. And it's already running the server. I mean, that's going to be hard to beat. First, let's take a look at what it did. And it is running the server. It's working. It's working fine. Let's take a look at the codebase. It updated two files: package.json and package-lock.json. And yeah, it updated to the latest versions. Nice. So 19.2.4, I guess, is the latest, and some other dependencies here. And everything works together. Beautiful. Okay, I'm going to git reset this to undo all those changes and clean my repository. Get it back to the initial state. By the way, this repository is also on my GitHub, so you can go check that out. I'm going to quit Claude Code here because now I want to use my local model. So let's go back to LM Studio. And by the way, LM Studio is not the only thing you can use to host a model that Claude Code will talk to. And I have a couple of member videos that describe the process of setting things up, like Open Code, for example. I've already loaded this model up, but this is not where we need to be.
This is the chat interface. So I can start a chat and say hello to my model. And I know how some of you really love that. This is not where I want to be. I want to be over here on this tab, which is the developer tab. This is a way to serve the models, to make them available to other applications like Claude Code. So here I'm going to load a model, and I can pick any of these models, and look, Qwen Coder 30B is done. But we're going to start off with this GPT-OSS 20B, which is 11 GB in size. I'm going to bump up the context length to 131,000 and load it. Now the server is reachable at this IP address and this port. So, how do I get Claude Code to talk to this? Well, you need to create a JSON file. When you install Claude Code, it creates a .claude directory, a hidden directory in your home folder. This is on a Mac, by the way, but it does a similar thing on Windows and Linux. And that's available in your home directory. So, on a Mac, that's /Users/<your name>/.claude. Let's go there. And let's pop this open in VS Code. This is what you get. And there's a settings file here which you can append to, and you can create your own settings there. But you can also create a separate file. I'm going to call mine lmstudio-settings.json, and I'm going to paste in some settings here, which is a JSON object with an env object inside of there. So we're going to point to an Anthropic base URL. This is going to be the base URL, and it's going to point to the IP address of this machine. Or if you're running LM Studio on another machine, you would just point it to that IP address. This port right here is a well-known LM Studio port, 1234. And if you forget what that is, it's right there on your developer tab in LM Studio. Anthropic auth token? Don't need it. You could set up LM Studio to require an auth token so you don't accidentally share it. And then Anthropic model, you could set this, or you can leave it as default model. And that's actually a good setting.
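Pieced together from what he describes, the settings file looks roughly like this. The key name `ANTHROPIC_BASE_URL` is Claude Code's environment override for the API endpoint; the loopback address is a stand-in for whatever IP and port your LM Studio developer tab shows:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:1234"
  }
}
```

He skips the auth token entirely and leaves the model at the server's default, so omitting `ANTHROPIC_AUTH_TOKEN` and `ANTHROPIC_MODEL` here matches his setup: Claude Code then talks to whichever model LM Studio currently has loaded, and you never have to edit this file when you swap models.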
I like leaving it at default model because you don't need to come into this file and change things all the time if you change your model. So, I'm going to leave it there. Save this file. And now back on the command line, I'm going to go back to my project. And when I launch Claude, I'm actually going to do a dash dash settings, if I can spell. Yeah. And instead of the regular settings, I'm going to pass in that file that I just created, the lmstudio-settings.json file. Boom. All right. This is the key. Now, we're not talking to Opus 4.5 anymore. Nope. I'm going to go to model. And yeah, I could still select Opus, but default model is the one that's selected right now. Let's uh minimize this window, shall we? So I can show you what's happening here in LM Studio. Developer logs down here. You can maximize that. Let's clear that. And I'm going to have it side by side here so we can watch it update this project to the latest dependencies. Build and run. Boom. Check it out. On the right side, we are talking to our model now. And notice something else. Prompt processing. It's taking a while because it's not just this prompt that it's sending along. It's sending along all the context. That's the files and everything else that Claude knows how to send to give the model the full context and the information it needs to be able to execute this task. Yeah, we're still prompt processing. Yeah, that's crazy cuz these prompts are huge. That's why we give it so much context. Oh, I can hear my MacBook Pro. The fans are spinning up. All right, we're going to proceed here again with the prompt processing. When you hear people asking, "Oh, is prompt processing important?" Uh, yeah, it is. As you can see here, it's going to be a very important thing. You often see benchmarks where there's prompt processing and token generation, and I cover this in other videos. Well, different machines process these in different ways.
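To see why prompt processing dominates the wait here, a back-of-envelope sketch; the token count and prefill rate below are illustrative assumptions, not measurements from the video:

```shell
# Claude Code front-loads files, tool definitions, and instructions into one
# huge prompt. Time to first output token ≈ prompt tokens / prefill speed.
prompt_tokens=50000   # hypothetical large Claude Code context
prefill_tps=250       # hypothetical prompt-processing rate for a local model
echo "prefill ≈ $(( prompt_tokens / prefill_tps )) s before any code gets written"
```

This is why prompt-processing benchmarks matter for agentic coding even when a machine's token-generation speed looks great: the model spends most of its wall-clock time reading, not writing.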
For example, the DGX Spark is really fast at prompt processing, while it's not that fast at decode, or token generation. The M4 Max chip and the M3 Ultra are really good at token generation, really fast, because they have fast memory bandwidth. But prompt processing is a little slow on them, until we get the M5s. We'll see what happens there. Okay, while I was talking, it finished. I think it also didn't take that long, but I don't see that it has really said it's running this app. Let's take a look at the code here. Hm, it only changed one file, and that's the package-lock file, not the package.json file. So, I don't think it actually did the thing. npm run dev. Okay, it runs, but it's still the older version. It's not the updated version. So, I'm going to call that task kind of a failure. And this brings us to the next thing, which is that not all models are equal. This GPT-OSS 20B, it might be good at certain things, but it's not good at this particular task. I'm going to exit this, go back to LM Studio, and I'm going to eject this model. Boom. Out of there. I'm going to load a new model, and we're going to go with this GPT-OSS 120B, which is a much bigger model. And I just noticed that I have the GGUF version and not the MLX version. Okay, that's fine. I'm going to bump up the context, load model. Let's take a look at that. And you can see the memory going up again. Oh yeah, look at that. Look at that little memory pressure bump going up. Wow. Because even though the model is 60 GB on disk, when we add all that context in, it's much, much larger. That's where the demand on memory goes up, and the VRAM, or unified memory on Apple Silicon, really goes up. That's why I made the LLM calculator in the first place. So, you can see we're almost at that limit here now that our model is loaded. Let's clear our developer logs. I've already exited Claude, but I'm going to go back into it, and I'm not going to change the settings file at all because we're pointed to the default model.
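The jump from 60 GB on disk toward his 128 GB ceiling is largely KV cache, which grows with context length. A rough sketch of the arithmetic; the layer and head counts below are illustrative placeholders, not gpt-oss-120b's actual architecture:

```shell
# KV cache ≈ 2 (K and V) x layers x kv_heads x head_dim x context x bytes/element
layers=36; kv_heads=8; head_dim=128; context=131072; bytes_per=2   # fp16 cache
kv_gb=$(( 2 * layers * kv_heads * head_dim * context * bytes_per / 1024 / 1024 / 1024 ))
echo "KV cache ≈ ${kv_gb} GB at full context"   # prints "KV cache ≈ 18 GB at full context"
```

The key intuition: this term scales linearly with context length, so maxing the slider to 131k multiplies the cache many times over compared with a default 4k or 8k context, which is exactly the memory-pressure bump he watches in LM Studio.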
So, that's just going to use the same model that we have loaded in LM Studio. Update the project dependencies to the latest version. Build and run. Boom. Let's see if we get any better luck with this larger model, which is supposed to be better. And it's 10:05 now. Let's say yes. It's going to ask a couple of questions as usual. Needs permission to do stuff. Oh yeah, it's happening. It's happening, folks. You get a little bit more insight into what's happening here, as opposed to shooting it somewhere into the cloud and not knowing what's going on. But check it out. It's executing our commands. npm install. Check updates. Install. npm run build. That's a little bit different than what we saw with the 20 billion parameter model already. Is it done? So, it didn't take that long. Maybe a little bit longer, but I think it's done. Is it running, though? These fans are going crazy. Oh, look at that. Package.json has been updated. React 19.2.4. It did it. 120B did it. Let's run that. And it runs. So, it didn't do one step, which is running the actual project, but it got the update done. You have to wonder: if I try an even bigger model, is it going to actually run the thing? I don't know. But at least this gives you some idea of how to run Claude Code with LM Studio with local models, and of the difference between how these models behave compared to the cloud model. I think we're getting there. I think we're getting there. If you like this video, you're definitely going to like this one next. Thanks for watching and I'll see you next time.

Video description

Claude Code can now talk to a local model in LM Studio — I’ll show you the one-file setup, plus the “gotcha” that decides whether it works or falls apart.

Try MerlinAI here: https://www.getmerlin.in/pricing?coupon=merlin and use code AZ5 for a big discount

🛒 Gear Links 🛒
🪛🪛 Highly rated precision driver kit: https://amzn.to/4fkMVfg
💻☕ Favorite 15" display with magnet: https://amzn.to/3zD1DhQ
🎧⚡ Great 40Gbps T4 enclosure: https://amzn.to/3JNwBGW
🛠️🚀 My nvme ssd: https://amzn.to/3YLEySo
📦🎮 My gear: https://www.amazon.com/shop/alexziskind

🎥 Related Videos 🎥
🧳🧰 Mini PC portable setup - https://youtu.be/4RYmsrarOSw
🍎💻 Dev setup on Mac - https://youtu.be/KiKUN4i1SeU
💸🧠 Cheap mini runs a 70B LLM 🤯 - https://youtu.be/xyKEQjUzfAk
🧪🔥 RAM torture test on Mac - https://youtu.be/l3zIwPgan7M
🍏⚡ FREE Local LLMs on Apple Silicon | FAST! - https://youtu.be/bp2eev21Qfo
🧠📉 REALITY vs Apple’s Memory Claims | vs RTX4090m - https://youtu.be/fdvzQAWXU7A
🧬🐍 Set up Conda - https://youtu.be/2Acht_5_HTo
⚡💥 Thunderbolt 5 BREAKS Apple’s Upcharge - https://youtu.be/nHqrvxcRc7o
🧠🚀 INSANE Machine Learning on Neural Engine - https://youtu.be/Y2FOUg_jo7k
🧱🖥️ Mac Mini Cluster - https://youtu.be/GBR6pHZ68Ho

🛠️ Developer productivity Playlist - https://www.youtube.com/playlist?list=PLPwbI_iIX3aQCRdFGM7j4TY_7STfv2aXX
🔗 AI for Coding Playlist 📚 - https://www.youtube.com/playlist?list=PLPwbI_iIX3aSlUmRtYPfbQHt4n0YaX0qw

My LLM inference calculator GitHub repo: https://github.com/alexziskind1/llm-inference-calculator

❤️ SUBSCRIBE TO MY YOUTUBE CHANNEL 📺
Click here to subscribe: https://www.youtube.com/@AZisk?sub_confirmation=1

Join this channel to get access to perks: https://www.youtube.com/channel/UCajiMK_CY9icRhLepS8_3ug/join

📱 ALEX on X: https://x.com/digitalix

#macstudio #gdxspark #claudecode

© 2026 GrayBeam Technology Privacy v0.1.0 · ac93850 · 2026-04-03 22:43 UTC