Gosh, I just read a really hellish thread on what frontier LLMs will become as they're infected with advertising. I hope Apple manages to break local LLMs (and training?) into the public discourse.
I've been way out of the local game for a while now. What's the best way to run models for a fairly technical user? I was using llama.cpp on the command line before, with bash scripts for prompts.
I didn't experience that at all. I know there are lots of rumblings around here about that, but I'm posting this to show this wasn't a universal experience.
Well, I reinstalled LM Studio today after some ~10 months since I last used it, just to test Gemma 4. On my PC with 32GB RAM and a 4070 Ti (12GB VRAM), it (Gemma 4 26B A4B Q4_K_M) loads and runs reasonably fast with no manual parameter or configuration tuning - just out of the box, on a fresh install - and delivers usable results on the level I remember expecting from SOTA cloud models 12-16 months ago. It handles image input, too. I'm quite impressed with it, TBH. It's something I can finally see myself using, and yay, it even leaves some RAM and VRAM free for doing other stuff.
Look for the current crop of local Mixture of Experts models, where it seems like they've made inroads on the O(n^2) context attention cost problem. Several folks have mentioned Qwen, but there are many more of that ilk, and several of them score really high on benchmarks. But when I mess with one of them locally myself (I have a 3090), it feels a bit like last year's Sonnet. They don't quite make the leaps of understanding you get from Opus.
You can run SOTA local MoE models very slowly by streaming the weights in from a fast PCIe 5 SSD. Kimi 2.5 (generally considered in the ballpark of current Sonnet, not Opus of course) has been measured at 2 tok/s on Apple M5 hardware. That's the best-case performance unless you have niche HEDT hardware with lots of PCIe lanes to attach storage to, and you figure out how to use that amount of parallel transfer throughput.
A ~$5000 USD MacBook can run open source models that are competitive with GPT 3.5 or Sonnet 3. So on nice consumer hardware you can have the original groundbreaking ChatGPT experience, running locally.
Last I heard, Claude was the model powering Maven when it bombed that school. Most aren't up to date on that because Anthropic launders its culpability through Palantir. Anthropic is better at optics, not ethics.
No matter what you say, you know the truth yourself: the DoW wanted to cross Anthropic's red lines and they said no, while OpenAI said yes. This is as clear as day to everyone, and you're just lying to yourself to believe something else.
This kind of bureaucracy sounds like the stuff my teachers criticised communism and socialism for in school. Isn't our 'capitalist' system supposed to thwart this?
Is Qwen 3.5 any good for chatting? I use ChatGPT for 'light therapy' (basically sounding out confusing social situations my friends don't want to walk me through) and it's honestly been amazing. But I would rather not give all that to OpenAI.