This reads a lot like "the way I choose to live is the best and everyone else is sad." Anyone in a dense suburb is getting all the fresh food they want from a choice of 6 different grocery stores. And it's silly to complain about suburbs being crowded in comparison to cities.

Especially since America is happier than most European countries [1]. And the ones that are happier are the Nordics and Ireland, which are more suburban and less dense.

[1] https://data.worldhappiness.report/table


Nonsense. Apple has on-package memory, and the primary reason for that is overall packaging and layout, not performance.

"We can't let people install the applications they choose because my grandma" is a pretty prevailing opinion.

Just because an opinion is common doesn’t make it prevailing.

Yes, there are many commenters here who say that, but I bet if we could somehow take a poll, they would not be the majority.

I don’t know when people started expecting everyone on a given site to share the same opinion, but it is tiring.


Of course it didn't prevail. We live in an age where Russia and China demand VPNs get removed from the App Store. The US Government removed ICEBlock from all mobile storefronts. The worst-case scenario is staring us right in the face.

It's downright appalling that HN entertained these arguments against sideloading. No self-respecting software engineer can look at the centralized architecture of a billion-dollar software business and surmise that it wouldn't be used against them. The detractors of sideloading deliberately (or foolishly) ignored an outsized, glaringly obvious threat to their personal freedoms, one that was repeatedly emphasized by their opposition.

Oppression, censorship and surveillance are HN's just deserts.


Congratulations on completely ignoring what I said.

In perhaps clearer terms: HN is not a monolith. There are a variety of opinions here and intense disagreement. It’s very difficult to claim that any particular position is supported by a majority of users, given the arguments that erupt on nearly every topic.

(Or perhaps you are claiming that 100% of a site’s users are responsible for every opinion that is aired on a site, even if they disagree with it.)


I never claimed that the majority of HN shared that opinion, or that they should. You manifested both of those ideas from whole cloth.

The common opinion is still harmful, and it's enabled the harms to scale to the point we see them today. For an analog in modern politics, look at minority opinions like "think of the children" or "unnamed terrorist threat" and their role in manufacturing consent for tyranny.


> It's downright appalling that HN entertained these arguments against sideloading.

> Oppression, censorship and surveillance are HN's just deserts.

What is this if not an implication that a majority, or all, of HN users share this opinion and are thus responsible/deserving of the fallout?


A statement of fact? We share a common fate; switching to Linux or protesting Meta doesn't exempt you from the rule of law.

Edit: Oppression, censorship and surveillance are not hypothetical consequences. The "justness" might be debatable, but their existence is objective.


It’s actually not a statement of fact; “just deserts” (implying that one is deserving of punishment or suffering) is a moral argument. Moral arguments are not statements of fact, although that does not make them necessarily invalid.

Speaking of which, HN arguably practices censorship as much as any government, corporation, or organization. Often what is seen or presented is not a complete view, so people mistake it for the prevailing one. It's controlled and manipulated.

... and one that has quite a bit of merit. A few hours' worth of watching Scammer Payback will do that to anyone.

The thing is, wide swaths of the population are extremely IT-illiterate. Governments didn't act to protect them (say, by threatening the scammers' host countries: India in the case of the US, or Turkey/Bulgaria/Romania in the case of Europe), so private companies had no other choice.

And hell even the best of us like Brian Krebs can fall victim to attacks [1].

I'm really out of ideas for how we can reconcile the needs of the 99% with the needs of the 1% without making life hell for one group or the other.

[1] https://www.businessinsider.com/security-journalist-brian-kr...


... of course, the EU has the power to get the banks to block those money transfers. Hell, central banks have to be involved in those scams (hopefully/probably unaware). But they CAN shut it down, HARD. They're not doing that, at all.

> so private companies had no other choice.

Because Microsoft has demonstrated how it's done on their platforms? Obviously governments, EU or otherwise, have quite serious tolerance for scams.


You can just configure the device to not give the child the ability to download apps without approval.

There are already like 17 different parental-control solutions out there for every device platform. You can and should use one, and not let your kid go to any website or use any specific app without your approval first.

In a few months Google will automatically deploy new software on our devices. This will be for our benefit and to help protect us.

If you still want to sideload dangerous unapproved applications, first just ask Google for permission, and a day later they'll let you sideload applications onto your device. I'm so grateful that they are allowing us to do this and protecting us.


There are countless games on the store that let you kill endless hordes of humans in detail...

What benefit is there to dropping $50k on GPUs to run this personally besides being a cool enthusiast project?


It will run exactly the same tomorrow, and the next day, and the day after that, and 10 years from now. It will be just as smart as the day you downloaded the weights. It won't stop working, exhaust your token quota, or get any worse.

That's a valuable guarantee. So valuable, in fact, that you won't get it from Anthropic, OpenAI, or Google at any price.


That's why we all still use our eMachines "Never Obsolete" PCs. They work just the same as they did 20 years ago. Though probably not, because I've never heard of hardware that's guaranteed not to fail.


Intel has just released a high-VRAM card that allows you to have 128GB of VRAM for $4k. The prices are dropping rapidly. The local models aren't adapted to work on this setup yet, so performance is disappointing. But highly capable local models are becoming increasingly realistic. https://www.youtube.com/watch?v=RcIWhm16ouQ


That's four 32GB GPUs with 600GB/s of bandwidth each. This model is not going to run on GPUs of that scale. I think something like 96GB RTX PRO 6000 Blackwells would be the minimum to run a model of this size with performance in the range of subscription models.


> I think something like 96GB RTX PRO 6000 Blackwells would be the minimum to run a model of this size with performance in the range of subscription models.

GLM 5.1 has 754B parameters, though. And you still need memory for context too; you'll want much more than 96GB.
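For a rough sense of scale, a back-of-the-envelope sketch in shell (assumes a flat ~4 bits per parameter; real quant sizes vary by format, and the KV cache comes on top):

```
# Weights alone for a 754B-parameter model at ~4 bits (0.5 bytes) per param:
# 754e9 * 0.5 bytes ~= 377 GB, i.e. ~351 GiB -- before any context.
echo "754 * 10^9 / 2 / 2^30" | bc   # prints 351
```

That lines up with the ~350GB figure quoted elsewhere in this thread for the 4-bit quants.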


Why would anyone need more than 640Kb of memory?


Exactly the point, though. In the 640KB days there was no subscription to ever-increasing compute resources as an alternative.


Well, there kinda was: most computing then was done on mainframes. Personal/micro computers were seen as a hobby or a toy that didn't need any "serious" amount of memory. And then they ate the world, and mainframes were sidelined into a specific niche used only by large institutions, because legacy.

I can totally see the same happening here; on-device LLMs are a toy, and then they eat the world and everyone has their own personal LLM running on their own device and the cloud LLMs are a niche used by large institutions.


The difference is that computers since the text-terminal era have been latency- and throughput-sensitive for the user. LLMs are not, particularly.


Sorry, I don't understand that comment. Can you clarify, please?


My point is that LLMs aren't more usable if the hardware is in your room versus a few states away. Personal computers, to this day, aren't great when the hardware is fully remote.


Agreed. But you couldn't do much on a PC when they first launched, at least compared to a mainframe. The hardware was slow, the memory was limited, there was no networking at all, etc. If you wanted to do any actual serious computing, you couldn't do it on a PC. And yet they ate the world.

I can easily see the advantage, even now, of running the LLM locally. As others have said in this topic. I think it'll happen.

edit: thanks for clarifying :)


Is it so hard to project out a couple of product cycles? Computers get better. We’ve gone from $50k workstations to commodity hardware several times before.


Subscription services get all the same benefits from computer hardware getting better. And due to scale, batching, and resource utilization, they'll always be able to take more advantage of it.


Agree directionally, but you don't need $50k. $5k is plenty; $2-3k is arguably the sweet spot.


As a local LLM novice, do you have any recommended reading to bootstrap me on selecting hardware? It has been quite confusing being a latecomer to this game. Googling yields a lot of outdated info.


First answer: If you haven't, give it a shot on whatever you already have. MoE models like Qwen3 and GPT-OSS are good on low-end hardware. My RTX 4060 can run qwen3:30b at a comfortable reading pace even though 2/3 of it spills over into system RAM. Even on an 8-year-old tiny PC with 32GB it's still usable.
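If you want to try that yourself, here's a minimal sketch using ollama (assuming it's installed; the model tag matches the one mentioned above but may differ by version):

```
# Pull and chat with a MoE model; ollama automatically offloads
# whatever doesn't fit in VRAM to system RAM.
ollama run qwen3:30b

# In another terminal: shows how the loaded model is split
# between GPU and CPU.
ollama ps
```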

Second answer: ask an AI, but prices have risen dramatically since their training cutoff, so be sure to get them to check current prices.

Third answer: I'm not an expert by a long shot, but I like building my own PCs. If I were to upgrade, I would buy one of these:

Framework Desktop with 128GB for $3k, or mainboard-only for $2700 (could just swap it into my gaming PC). Or any other Strix Halo (Ryzen AI 385 and above) mini PC with 64/96/128GB; more is better, of course. Most integrated GPUs are constrained by memory bandwidth; Strix Halo has a wider memory bus, so it's a good way to get lots of high-bandwidth shared system/video RAM relatively cheap. 380 = 40%, 385 = 80%, 395 = 100% GPU power.
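As a rough rule of thumb, decode speed is bounded by memory bandwidth divided by the bytes of active weights read per token. A sketch, where the ~256 GB/s Strix Halo figure and the ~5B active parameters (a gpt-oss-120b-like MoE at ~4 bits) are my assumptions:

```
# Theoretical decode ceiling ~= bandwidth / active-weight bytes per token.
# 256 GB/s divided by (5B active params * ~0.53 bytes each):
echo "256 / (5 * 0.53)" | bc   # ~= 96 tok/s upper bound; real numbers land lower
```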

I was also considering a much hackier build with 2x Tesla P100s (16GB HBM2 each, about $90 apiece) in a Precision 5820 (cheap, with lots of space and power for GPUs). Total about $500 for 32GB HBM2 + 32GB system RAM, but it's all 10-year-old used parts, you need to DIY a fan setup for the GPUs, and software support is very spotty. Definitely a tinker project; here there be dragons.


Agree on the Framework. Last week you could get a Strix Halo for $2700 shipped; now it's over $3500. Find a deal on an NVMe, and the Framework with the Noctua fan is probably going to be the quietest; some of them are pretty loud and hot.

I run qwen 122b with Claude Code and nanoclaw; it's pretty decent, but this stuff is nowhere near prime-time ready, though super fun to tinker with. I have to keep updating drivers, and I see speed increases and stability being worked on. I can even run much larger models with llama.cpp (--fit on), like qwen 397b, and I suppose any larger model like GLM; it's slow but smart.


The 4-bit quants are 350GB; what hardware are you talking about?


qwen3:0.6b is 523MB; what model are you talking about? You seem to have a specific one in mind, but the parent comment doesn't mention any.

For a hobby/enthusiast product, and even for some useful local tasks, MoE models run fine on gaming PCs or even older midrange PCs. For dedicated AI hardware I was thinking of Strix Halo, which with 128GB is currently $2-3k. None of this will replace a Claude subscription.


> qwen3:0.6b is 523MB; what model are you talking about?

1) What are you going to use that for? A 0.6B model gives you, at most, what you could get from Siri when it first launched, unless you do some tuning.

2) It's pretty clear that they are talking about the GLM-5.1 4-bit quant.


Nemotron 3 Super was released recently. That's a direct competitor to gpt-oss-120b. https://developer.nvidia.com/blog/introducing-nemotron-3-sup...


In terms of ability, maybe; in terms of speed, it's not even close. Check out the prompt processing speeds between them: https://kyuz0.github.io/amd-strix-halo-toolboxes/

gpt-oss-120b is over 600 tokens/s PP for all but one backend.

nemotron-3-super is at best 260 tokens/s PP.

Comparing token generation, it's again like 50 tokens/sec vs 15 tokens/sec.

That really bogs down agentic tooling. Something needs to be categorically better to justify halving output speed, not just playing in the margins.


In my case, with vLLM on dual RTX Pro 6000:

gpt-oss-120b: (unknown prefill), ~175 tok/s generation. I don't remember the prefill speed, but it was certainly below 10k.

Nemotron-3-Super: 14070 tok/s prefill, ~194.5 tok/s generation. (Tested fresh after reload, no caching, I have a screenshot.)

Nemotron-3-Super using NVFP4 and speculative decoding via MTP, 5 tokens at a time, as mentioned in the Nvidia cookbook: https://docs.nvidia.com/nemotron/nightly/usage-cookbook/Nemo...
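For reference, the rough shape of that kind of launch, as a sketch only: the model id here is a placeholder, and the exact --speculative-config keys should be checked against the cookbook above and your vLLM version:

```
# Hypothetical invocation: two GPUs, NVFP4 weights, MTP speculative decoding.
# Model id and JSON keys are assumptions -- follow the Nvidia cookbook.
vllm serve nvidia/Nemotron-3-Super-NVFP4 \
  --tensor-parallel-size 2 \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 5}'
```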


Hmm, you might be able to tweak the settings further. Under llama.cpp on one RTX 6000 Pro I get ~215 tok/s generation speed. The key for me was setting min_p greater than 0. My settings:

```
#!/bin/bash

llama-server \
  -hf ggml-org/gpt-oss-120b-GGUF \
  -c 0 \
  -np 1 \
  --jinja \
  --no-mmap \
  --temp 1.0 \
  --top-p 1.0 \
  --min-p 0.001 \
  --chat-template-kwargs '{"reasoning_effort": "high"}' \
  --host 0.0.0.0
```
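Once it's up, any OpenAI-compatible client can talk to it; a quick smoke test (llama-server defaults to port 8080, adjust if you changed it):

```
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```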


I gave it a whirl but was unenthused. I'll try it again, but so far I have not really enjoyed any of the Nvidia models, though they are best in class for execution speed.


I'll pipe in here as someone working on an agentic harness project using mastra as the harness.

Nemotron3-super is, without question, my favorite model now for my agentic use cases. The closest model I would compare it to, in vibe and feel, is the Qwen family, but this thing has an ability to hold attention through complicated (often noisy) agentic environments, and I sometimes find myself checking that I'm not on a frontier model.

I now just rent a dual B6000 on a full-time basis for all my stuff; this is the backbone of my "base" agentic workload, and I only step up to stronger models in rare situations in my pipelines.

The biggest thing with this model, I've found, is just making sure my environment is set up correctly; the temps and templates need to be exactly right. I've had hit-or-miss results with OpenRouter. But running this model on a B6000 from Vast with the native NVFP4 weights from Nvidia, it's really good: 2500 peak tokens/sec on that setup with batching, and about 100/s for a single request at 250k context. :)

I can run up to about 120k context reliably on a single B6000, but this thing really SCREAMS on a dual B6000. (I'm close to just ordering a couple for myself, it's working so well.)

Good luck. (Sometimes I feel like I'm the crazy guy in the woods, loving this model so much; I'm not sure why more people aren't jumping on it...)


> I'm not sure why more people aren't jumping on it

Simple: most of the people you’re talking to aren’t setting these things up. They’re running off-the-shelf software and setups and calling it a day. Most of them aren’t working with custom harnesses, or even tweaking temperature or templates.


I’d be very interested in trying it if you could spare the time to write up how to tune it well. If not, thanks for the input anyway.


Because the initial announcement included none of that... it wasn't addressed at all until the sentiment turned harsh.


It still hasn't been addressed. They walked back half of their wholly unreasonable position in an attempt to legitimize the other half.


Then shouldn't we celebrate the victory, drop it, and move on?


Victory is my device and its OS working the same way it always worked and the way it worked when I bought it.


Just don't install the OS updates then.

