Hacker News | boppo1's comments

Yep. QE was a monumental mistake that killed economic mobility. Asset owners vs wage earners.

>massive interest in local AI

Gosh, I just read a really hellish thread on what frontier LLMs will become as they're infected with advertising. I hope Apple manages to break local LLMs (and training?) into the public discourse.


I've been way out of the local game for a while now, what's the best way to run models for a fairly technical user? I was using llama.cpp in the command line before and using bash files for prompts.

Running llama-server (it's part of llama.cpp) starts an HTTP server on a specified port.

You can connect to that port with any browser, for chat.

Or you can connect to that port with any application that supports the OpenAI API, e.g. a coding assistant harness.
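For instance, here's a minimal sketch of talking to llama-server's OpenAI-compatible chat endpoint from Python, using only the standard library. The port (8080) and the model name are assumptions; llama-server serves whatever model it was started with, so the model field is mostly a placeholder.

```python
import json
import urllib.request

# llama-server's OpenAI-compatible endpoint (port 8080 is an assumption;
# use whatever --port you started the server with)
url = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI chat-completions request shape
payload = {
    "model": "local",  # placeholder; llama-server serves its loaded model
    "messages": [{"role": "user", "content": "Hello, what model are you?"}],
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment with a running llama-server:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

The same request body works from any OpenAI-API client, which is why coding-assistant harnesses can point at llama-server by just changing the base URL.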


I haven't been using my Claude sub lately, but I liked 4.6 three weeks ago. Did something change?

Two weeks ago the rolling session usage plummeted to borderline unusable. I'd say I now get a weekly output equivalent to two session windows from before the change.

I didn't experience that at all. I know there are lots of rumblings around here about that, but I'm posting this to show this wasn't a universal experience.

https://marginlab.ai/trackers/claude-code/

Seems like there is evidence for that.


Even just in chats with Opus 4.6 I noticed hitting limits so much faster.

>Especially as local LLM continues to develop so fast.

I'm sorry, is there anything even close to Sonnet, much less Opus, that can be run on a 4080? Or on 64GB of RAM, even slowly?


Well, I reinstalled LM Studio today, some ~10 months since I last used it, just to test Gemma 4. On my PC with 32GB RAM and a 4070 Ti (12GB VRAM), it (Gemma 4 26B A4B Q4_K_M) loads and runs reasonably fast with no manual parameter or configuration tuning, just out of the box on a fresh install, and delivers usable results on the level I remember expecting from SOTA cloud models 12-16 months ago. And it handles image input, too. I'm quite impressed with it, TBH. It's something I can finally see myself using, and yay, it even leaves some RAM and VRAM free for doing other stuff.


And the smaller Gemma 4 models can do audio too.

The Qwen models are also really good.


Look for the current crop of local Mixture of Experts models, where it seems like they've made inroads on the O(n^2) context attention cost problem. Several folks have mentioned Qwen, but there are many more of that ilk. Several of them actually score really high on benchmarks. But when I mess with one of them locally myself (I have a 3090), it feels a bit like last year's Sonnet. They don't quite make the leaps of understanding you get from Opus.

* Weird thing of the day: https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-...


You can run SOTA local MoE models very slowly by streaming the weights in from a fast PCIe 5 SSD. Kimi 2.5 (generally considered in the ballpark of current sonnet, not opus of course) has been measured as 2 tok/s on Apple M5 hardware, which is the best-case performance unless you have niche HEDT hardware with lots of PCIe lanes to attach storage to and figure out how to use that amount of parallel transfer throughput.


Qwen 3.5, Gemma 4


A ~$5000 USD MacBook can run open-source models that are competitive with GPT-3.5 or Sonnet 3. So on nice consumer hardware you can have the original groundbreaking ChatGPT experience running locally.


Last I heard, Claude was the model powering Maven when it bombed that school. Most aren't up to date on that because Anthropic launders its culpability through Palantir. Anthropic is better at optics, not ethics.


No matter what you say, you know the truth yourself: the DoW wanted to go over Anthropic's red lines and they said no, while OpenAI said yes. This is as clear as day to everyone, and you are just lying to yourself to believe something else.


>render video assets without needing FFmpeg on the server.

Help me understand: able to do video with less compute? Or offload compute to client browsers?


This kind of bureaucracy sounds like the stuff my teachers criticised communism and socialism for in school. Isn't our 'capitalist' system supposed to thwart this?


Is Qwen 3.5 any good for chatting? I use ChatGPT for 'light therapy' (basically sounding out confusing social situations my friends don't want to walk me through), and it's honestly been amazing. But I would rather not give all that to OpenAI.


Damn, I saw the headline and thought it was a bill about general computing.


Yes I was wondering if the right applied to people who aren't age-verified.


No, that kind of computing contributed a great deal less to election campaigns, so obviously not.

