
I wanted to run my own AI. My laptop says not yet


The pitch sells itself. An assistant that's entirely mine, running on my own machine, needing no connection, with nothing I type ever leaving the room. No company counting my tokens. No subscription. No outage on someone else's servers wrecking my afternoon (I'm looking at you, Anthropic, and your recurring outages). I wanted that badly enough that I spent a few months chasing it, and I want to tell you honestly where it left me.

24GB sounds like plenty until you load a model

My Mac has 24GB of memory. This felt generous when I bought it, but then you load a real language model and that number shrinks fast. The system keeps its cut because the computer needs to keep running, and what's left for the model is closer to two-thirds of the sticker figure. The models actually worth trusting sit right at that ceiling or just past it. So you have to choose: a model that fits comfortably and isn't very bright, or a smarter one that leaves the machine gasping.
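To put rough numbers on that squeeze, here is a back-of-envelope sketch. It takes the two-thirds figure above as the usable share of memory and estimates a model's footprint as parameters times bits per weight, divided by eight. The 20% overhead for context cache and runtime buffers, the 4-bit quantization, and the model sizes are all illustrative assumptions, not measurements from my machine.

```python
"""Back-of-envelope check: does a quantized model fit in a laptop's memory?"""

USABLE_FRACTION = 2 / 3  # share of RAM left for the model (the rough figure from the text)
OVERHEAD = 0.20          # ASSUMPTION: extra room for context cache and runtime buffers


def usable_memory_gb(total_gb: float, usable_fraction: float = USABLE_FRACTION) -> float:
    """Memory the model can realistically claim once the system takes its cut."""
    return total_gb * usable_fraction


def model_footprint_gb(params_billion: float, bits_per_weight: float,
                       overhead: float = OVERHEAD) -> float:
    """Approximate resident size: weights plus a flat overhead allowance."""
    # 1e9 parameters at N bits each is N / 8 gigabytes of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead)


def fits(params_billion: float, bits_per_weight: float, total_gb: float = 24.0) -> bool:
    """True if the estimated footprint stays within the usable budget."""
    return model_footprint_gb(params_billion, bits_per_weight) <= usable_memory_gb(total_gb)


if __name__ == "__main__":
    budget = usable_memory_gb(24.0)
    print(f"Usable budget: {budget:.1f} GB")
    for params in (8, 27, 70):  # hypothetical model sizes, in billions of parameters
        need = model_footprint_gb(params, bits_per_weight=4)
        verdict = "fits" if fits(params, 4) else "does not fit"
        print(f"{params}B at 4-bit: ~{need:.1f} GB, {verdict}")
```

Under these assumed numbers, an 8B model at 4 bits needs about 4.8 GB against a 16 GB budget, while a 27B model needs about 16.2 GB and lands just past it. That is the comfortable-but-dim versus smart-but-gasping choice in arithmetic form.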

The obvious fix is more memory, but have you seen memory prices lately? The timing could not be worse. Memory got expensive in a way that still surprises people who haven't shopped for it in a while. DRAM has roughly doubled in price since the start of 2025, and the analysts who watch this space think it could climb another 70% or so across 2026. Storage is worse in spots. The raw NAND wafers that SSDs are cut from are trading at something like eight times where they sat in the middle of last year, and a 4TB drive I'd have paid about $250 for not long ago now wants north of $700. The market is volatile enough that these numbers may well be out of date by the time you read this.

The reason for this insanity is also the reason behind half the stories in tech right now — AI. The big datacenter buildouts are on track to swallow around 70% of the world's high-end memory this year, and the cloud giants have signed contracts that lock up production years ahead. They are, quite literally, buying chips that haven't been made yet. Micron went as far as shutting down its consumer Crucial brand to aim everything at the AI market.

Sit with that for a second. The same boom that makes a private little local model so appealing is the boom pricing most of us out of the hardware to run one properly. I'd find it funnier if it weren't my wallet.

I blamed the software first

Before I accepted any of that, I was sure I'd just picked the wrong tools. So I went shopping for harnesses. If the LLM is the brain, the harness is what gives it hands: the tools that make it actually useful. Pi. OpenCode. Hermes. Each promised to be the one that made local AI click.

I landed on Pi, and not for its features. The models simply ran best there, with quicker answers and less friction between me and whatever I'd asked. But the shopping taught me that the wrapper is not necessarily the bottleneck. I could swap them all day and hit the same wall, because the wall was the model and the memory holding it up. No interface fixes a model that hasn't got the room to think.

So I'm left not trusting the local models I can run for anything where being wrong would cost me much. A quick reword, a throwaway question, fine. But the second a task needs real judgment I'm back in the cloud almost before I've decided to be. I wanted to stop depending on paid assistants, and not because they're impossibly expensive today. What I wanted was independence from the cloud companies, their whims and their outages. Cost may become a reason soon enough, though: tokens are so heavily subsidized right now that once the subsidies end, the more powerful models will probably price me out.

What my local AI is actually good for

It's not all bad news, though. One job the local model does frequently, and does well, is analyzing job ads for me.

My job hunt runs as a background batch. A script pulls the listings, bins the obvious junk, and hands what's left to the model on my laptop, which reads each one against a brief I wrote and tells me how good a fit it looks. Free and offline. My CV never leaves the machine, and it works whether there are AI outages in the cloud or not. It's a bit of the independence I was chasing, just at a far smaller scope than I'd hoped for.
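A stripped-down sketch of that kind of pipeline follows. It is not the real script: the junk markers, the prompt wording, the 0 to 10 scale, and every function name are hypothetical, and the local model is represented by a plain callable (`ask_model`) so that no particular harness or API is assumed.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# ASSUMPTION: illustrative junk markers; a real filter would be tuned by hand.
JUNK_MARKERS = ("unpaid", "commission only")


@dataclass
class Listing:
    title: str
    body: str


def is_junk(listing: Listing) -> bool:
    """Cheap pre-filter so the model only sees plausible ads."""
    text = f"{listing.title} {listing.body}".lower()
    return any(marker in text for marker in JUNK_MARKERS)


def build_prompt(brief: str, listing: Listing) -> str:
    """Put the brief and the ad in one prompt and ask for a bare number."""
    return (
        f"Brief:\n{brief}\n\n"
        f"Job ad:\n{listing.title}\n{listing.body}\n\n"
        "Rate the fit from 0 to 10. Reply with the number only."
    )


def parse_score(reply: str) -> Optional[int]:
    """Take the first integer in the reply, capped at 10; None if there is none."""
    match = re.search(r"\d+", reply)
    return min(int(match.group()), 10) if match else None


def rank_listings(listings: Iterable[Listing], brief: str,
                  ask_model: Callable[[str], str]) -> List[Tuple[int, Listing]]:
    """Filter junk, score the rest with the model, return best fits first."""
    scored = []
    for listing in listings:
        if is_junk(listing):
            continue
        score = parse_score(ask_model(build_prompt(brief, listing)))
        if score is not None:  # skip replies that carry no usable number
            scored.append((score, listing))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def fake_model(prompt: str) -> str:
    """Stand-in for the local model so the sketch runs without one."""
    return "8" if "remote" in prompt.lower() else "3"


if __name__ == "__main__":
    ads = [
        Listing("Backend dev", "Remote, Python."),
        Listing("Sales", "Commission only."),
        Listing("Data entry", "On site."),
    ]
    for score, ad in rank_listings(ads, "Backend role, small team.", fake_model):
        print(score, ad.title)
```

Passing the model in as a callable keeps the batch logic independent of whichever harness actually serves it, and swapping `fake_model` for a function that calls a local model is the only change needed to run it for real.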

It only works because of how the newer models are built. The Gemma one I use is a mixture-of-experts design, which in plain terms means it only wakes the slice of itself a given question needs instead of firing the whole thing at once. It's light enough to genuinely reason on my hardware without grinding for an age, and takes about thirty seconds per ad. For something humming away while I get on with my day, thirty seconds is nothing.
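For the curious, the "only wakes the slice it needs" idea can be shown in miniature. This is a toy sketch of top-k expert routing, the general technique behind mixture-of-experts, and not a description of Gemma's actual architecture. The experts here are made-up one-line functions, and the router scores are passed in by hand, where a real model would compute them from the input with learned weights.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(values: Sequence[float]) -> List[float]:
    """Turn raw scores into weights that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest router scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def moe_forward(x: float, router_scores: Sequence[float],
                experts: Sequence[Callable[[float], float]],
                k: int = 2) -> Tuple[float, List[int]]:
    """Run only the k best-scoring experts and blend their outputs."""
    chosen = top_k_indices(router_scores, k)
    weights = softmax([router_scores[i] for i in chosen])
    # Only the chosen experts are ever called; the rest stay idle.
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


if __name__ == "__main__":
    # ASSUMPTION: four toy "experts" and hand-picked router scores.
    experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x ** 2]
    scores = [0.1, 2.0, -1.0, 1.0]
    output, chosen = moe_forward(3.0, scores, experts, k=2)
    print(f"experts used: {chosen}, output: {output:.3f}")
```

With four experts and k set to 2, half of the toy model does no work for this input. Scale the same idea up and you get a model that stores many parameters but only computes with a fraction of them per token, which is where the speed comes from.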

So, not yet

I haven't written off local AI. I'm frustrated by the distance between what I wanted and what 24GB plus a brutal memory market will give me today. But the mixture-of-experts trick already moved that line once, and hopefully it'll move again. Models keep getting smarter per gigabyte rather than only bigger, prices will ease eventually, and some version of me a couple of years out probably runs all of this locally without thinking twice about it. For now I've got exactly one workflow living on my own hardware while everything else still phones home. That one's mine, completely. It's a start.