This reads a lot like "the way I choose to live is the best and everyone else is sad." Anyone in a dense suburb is getting all the fresh food they want from a choice of 6 different grocery stores. And it's silly to complain about suburbs being crowded in comparison to cities.

Especially since America is happier than most European countries [1]. And the ones that are happier are the Nordics and Ireland, which are more suburban and less dense.

[1] https://data.worldhappiness.report/table


Nonsense. Apple has on-package memory, and the primary reason for that is overall packaging and layout, not performance.

"We can't let people install the applications they choose because my grandma" is a pretty prevailing opinion.

Just because an opinion is common doesn’t make it prevailing.

Yes, there are many commenters here who say that, but I bet if we could somehow take a poll, they would not be the majority.

I don’t know when people started expecting everyone on a given site to share the same opinion, but it is tiring.


Of course it didn't prevail. We live in an age where Russia and China demand VPNs get removed from the App Store. The US Government removed ICEBlock from all mobile storefronts. The worst-case scenario is staring us right in the face.

It's downright appalling that HN entertained these arguments against sideloading. No self-respecting software engineer can look at the centralized architecture of a billion-dollar software business and surmise that it wouldn't be used against them. The detractors of sideloading deliberately (or foolishly) ignored an outsized, glaringly obvious threat to their personal freedoms, one that was repeatedly emphasized by their opposition.

Oppression, censorship and surveillance are HN's just deserts.


Congratulations on completely ignoring what I said.

In perhaps clearer terms: HN is not a monolith. There are a variety of opinions here and intense disagreement. It’s very difficult to claim that any particular position is supported by a majority of users, given the arguments that erupt on nearly every topic.

(Or perhaps you are claiming that 100% of a site’s users are responsible for every opinion that is aired on a site, even if they disagree with it.)


I never claimed that the majority of HN shared that opinion, or that they should. You manifested both of those ideas from whole cloth.

The common opinion is still harmful, and it's enabled the harms to scale to the point we see them today. For an analog in modern politics, look at minority opinions like "think of the children" or "unnamed terrorist threat" and their role in manufacturing consent for tyranny.


> It's downright appalling that HN entertained these arguments against sideloading.

> Oppression, censorship and surveillance are HN's just deserts.

What is this if not an implication that a majority, or all, of HN users share this opinion and are thus responsible/deserving of the fallout?


A statement of fact? We share a common fate; switching to Linux or protesting Meta doesn't exempt you from the rule of law.

Edit: Oppression, censorship and surveillance are not hypothetical consequences. The "justness" might be debatable, but their existence is objective.


It’s actually not a statement of fact; “just deserts” (implying that one is deserving of punishment or suffering) is a moral argument. Moral arguments are not statements of fact, although that does not make them necessarily invalid.

Speaking of which, HN arguably practices censorship as much as any government, corporation, or organization. Often what is seen or presented is not a complete view, so people mistake it for the prevailing one. It's controlled and manipulated.

... and one that has quite a bit of merit. A few hours' worth of watching Scammer Payback will do that to anyone.

The thing is, wide swaths of the population are extremely IT-illiterate. Governments didn't act to protect them (say, by threatening the scammers' host countries: India in the case of the US, or Turkey/Bulgaria/Romania in the case of Europe), so private companies had no other choice.

And hell even the best of us like Brian Krebs can fall victim to attacks [1].

I'm really out of ideas for how we can reconcile the needs of the 99% with the needs of the 1% without making life hell for one group or the other.

[1] https://www.businessinsider.com/security-journalist-brian-kr...


... of course, the EU has the power to get the banks to block those money transfers. Hell, central banks have to be involved in those scams (hopefully/probably unaware). But they CAN shut it down, HARD. They're not doing that, at all.

> so private companies had no other choice.

Because Microsoft has demonstrated how it's done on their platforms? Obviously governments, EU or otherwise, have quite serious tolerance for scams.


You can just configure the device to not give the child the ability to download apps without approval.

There are already like 17 different parental-control solutions out there for every device platform. You can and should use one, and not let your kid go to any website or use any specific app without your approval first.

In a few months Google will automatically deploy new software on our devices. This will be for our benefit and to help protect us.

If you still want to sideload dangerous unapproved applications, first just ask Google for permission, and a day later they'll let you sideload applications onto your device. I'm so grateful that they are allowing us to do this and protecting us.


There are countless games on the store that let you kill endless hordes of humans in detail...

What benefit is there to dropping $50k on GPUs to run this personally besides being a cool enthusiast project?


It will run exactly the same tomorrow, and the next day, and the day after that, and 10 years from now. It will be just as smart as the day you downloaded the weights. It won't stop working, exhaust your token quota, or get any worse.

That's a valuable guarantee. So valuable, in fact, that you won't get it from Anthropic, OpenAI, or Google at any price.


That's why we all still use our eMachines "Never Obsolete" PCs. They work just the same as they did 20 years ago. Though probably not, because I've never heard of hardware that's guaranteed not to fail.


Intel has just released a high-VRAM card that allows you to have 128GB of VRAM for $4k. The prices are dropping rapidly. The local models aren't adapted to work on this setup yet, so performance is disappointing. But highly capable local models are becoming increasingly realistic. https://www.youtube.com/watch?v=RcIWhm16ouQ


That's four 32GB GPUs with 600GB/s of bandwidth each. This model is not going to run on GPUs of that scale. I think something like 96GB RTX PRO 6000 Blackwells would be the minimum to run a model of this size with performance in the range of subscription models.


> I think something like 96GB RTX PRO 6000 Blackwells would be the minimum to run a model of this size with performance in the range of subscription models.

GLM 5.1 has 754B parameters, though. And you still need memory for context too; you'll want much more than 96GB.
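For a rough sense of scale, a back-of-the-envelope sketch in shell (assumes a flat ~4 bits per parameter; real quant sizes vary by format, and the KV cache comes on top):

```
# Weights alone for a 754B-parameter model at ~4 bits (0.5 bytes) per param:
# 754e9 * 0.5 bytes ~= 377 GB, i.e. ~351 GiB -- before any context.
echo "754 * 10^9 / 2 / 2^30" | bc   # prints 351
```

That lines up with the ~350GB figure quoted elsewhere in this thread for the 4-bit quants.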


Why would anyone need more than 640Kb of memory?


Exactly the point, though. In the 640KB days there was no subscription to ever-increasing compute resources as an alternative.


Well, there kinda was: most computing then was done on mainframes. Personal/micro computers were seen as a hobby or a toy that didn't need any "serious" amount of memory. And then they ate the world, and mainframes were sidelined into a specific niche used only by large institutions, because legacy.

I can totally see the same happening here; on-device LLMs are a toy, and then they eat the world and everyone has their own personal LLM running on their own device and the cloud LLMs are a niche used by large institutions.


The difference is that computers since the text-terminal era have been latency- and throughput-sensitive for the user. LLMs are not, particularly.


Sorry, I don't understand that comment. Can you clarify, please?


My point is that LLMs aren't more usable if the hardware is in your room versus a few states away. Personal computers, to this day, aren't great when the hardware is fully remote.


Agreed. But you couldn't do much on a PC when they first launched, at least compared to a mainframe. The hardware was slow, the memory was limited, there was no networking at all, etc. If you wanted to do any actual serious computing, you couldn't do it on a PC. And yet they ate the world.

I can easily see the advantage, even now, of running the LLM locally. As others have said in this topic. I think it'll happen.

edit: thanks for clarifying :)


Is it so hard to project out a couple of product cycles? Computers get better. We’ve gone from $50k workstations to commodity hardware several times before.


Subscription services get all the same benefits from computer hardware getting better. And due to scale, batching, and resource utilization, they'll always be able to take more advantage of it.


Agree directionally, but you don't need $50k. $5k is plenty; $2-3k is arguably the sweet spot.


As a local LLM novice, do you have any recommended reading to bootstrap me on selecting hardware? It has been quite confusing being a latecomer to this game. Googling yields a lot of outdated info.


First answer: If you haven't, give it a shot on whatever you already have. MoE models like Qwen3 and GPT-OSS are good on low-end hardware. My RTX 4060 can run qwen3:30b at a comfortable reading pace even though 2/3 of it spills over into system RAM. Even on an 8-year-old tiny PC with 32GB it's still usable.
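If you want to try that yourself, here's a minimal sketch using ollama (assuming it's installed; the model tag matches the one mentioned above but may differ by version):

```
# Pull and chat with a MoE model; ollama automatically offloads
# whatever doesn't fit in VRAM to system RAM.
ollama run qwen3:30b

# In another terminal: shows how the loaded model is split
# between GPU and CPU.
ollama ps
```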

Second answer: ask an AI, but prices have risen dramatically since their training cutoff, so be sure to get them to check current prices.

Third answer: I'm not an expert by a long shot, but I like building my own PCs. If I were to upgrade, I would buy one of these:

Framework Desktop with 128GB for $3k, or mainboard-only for $2700 (could just swap it into my gaming PC). Or any other Strix Halo (Ryzen AI 385 and above) mini PC with 64/96/128GB; more is better, of course. Most integrated GPUs are constrained by memory bandwidth; Strix Halo has a wider memory bus, so it's a good way to get lots of high-bandwidth shared system/video RAM relatively cheap. 380 = 40%, 385 = 80%, 395 = 100% GPU power.
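As a rough rule of thumb, decode speed is bounded by memory bandwidth divided by the bytes of active weights read per token. A sketch, where the ~256 GB/s Strix Halo figure and the ~5B active parameters (a gpt-oss-120b-like MoE at ~4 bits) are my assumptions:

```
# Theoretical decode ceiling ~= bandwidth / active-weight bytes per token.
# 256 GB/s divided by (5B active params * ~0.53 bytes each):
echo "256 / (5 * 0.53)" | bc   # ~= 96 tok/s upper bound; real numbers land lower
```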

I was also considering a much hackier build with 2x Tesla P100s (16GB HBM2 each, about $90 apiece) in a Precision 5820 (cheap, with lots of space and power for GPUs). Total about $500 for 32GB HBM2 + 32GB system RAM, but it's all 10-year-old used parts, you need to DIY a fan setup for the GPUs, and software support is very spotty. Definitely a tinker project; here there be dragons.


Agree on the Framework. Last week you could get a Strix Halo for $2700 shipped; now it's over $3500. Find a deal on an NVMe, and the Framework with the Noctua fan is probably going to be the quietest; some of them are pretty loud and hot.

I run qwen 122b with Claude Code and nanoclaw; it's pretty decent, but this stuff is nowhere near prime-time ready, though super fun to tinker with. I have to keep updating drivers, and I see speed increases and stability being worked on. I can even run much larger models with llama.cpp (--fit on), like qwen 397b, and I suppose any larger model like GLM; it's slow but smart.


The 4-bit quants are 350GB; what hardware are you talking about?


qwen3:0.6b is 523MB; what model are you talking about? You seem to have a specific one in mind, but the parent comment doesn't mention any.

For a hobby/enthusiast product, and even for some useful local tasks, MoE models run fine on gaming PCs or even older midrange PCs. For dedicated AI hardware I was thinking of Strix Halo, which with 128GB is currently $2-3k. None of this will replace a Claude subscription.


> qwen3:0.6b is 523MB; what model are you talking about?

1) What are you going to use that for? A 0.6B model gives you, at most, what you could get from Siri when it first launched, unless you do some tuning.

2) It's pretty clear that they are talking about the GLM-5.1 4-bit quant.


Nemotron 3 Super was released recently. That's a direct competitor to gpt-oss-120b. https://developer.nvidia.com/blog/introducing-nemotron-3-sup...


In terms of ability, maybe; in terms of speed, it's not even close. Check out the prompt processing speeds between them: https://kyuz0.github.io/amd-strix-halo-toolboxes/

gpt-oss-120b is over 600 tokens/s PP for all but one backend.

nemotron-3-super is at best 260 tokens/s PP.

Comparing token generation, it's again like 50 tokens/sec vs 15 tokens/sec.

That really bogs down agentic tooling. Something needs to be categorically better to justify halving output speed, not just playing in the margins.


In my case, with vLLM on dual RTX Pro 6000:

gpt-oss-120b: (unknown prefill), ~175 tok/s generation. I don't remember the prefill speed, but it was certainly below 10k.

Nemotron-3-Super: 14070 tok/s prefill, ~194.5 tok/s generation. (Tested fresh after reload, no caching, I have a screenshot.)

Nemotron-3-Super using NVFP4 and speculative decoding via MTP, 5 tokens at a time, as mentioned in the Nvidia cookbook: https://docs.nvidia.com/nemotron/nightly/usage-cookbook/Nemo...
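For reference, the rough shape of that kind of launch, as a sketch only: the model id here is a placeholder, and the exact --speculative-config keys should be checked against the cookbook above and your vLLM version:

```
# Hypothetical invocation: two GPUs, NVFP4 weights, MTP speculative decoding.
# Model id and JSON keys are assumptions -- follow the Nvidia cookbook.
vllm serve nvidia/Nemotron-3-Super-NVFP4 \
  --tensor-parallel-size 2 \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 5}'
```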


Hmm, you might be able to tweak the settings further. Under llama.cpp on one RTX 6000 Pro I get ~215 tok/s generation speed. The key for me was setting min_p greater than 0. My settings:

```
#!/bin/bash

llama-server \
  -hf ggml-org/gpt-oss-120b-GGUF \
  -c 0 \
  -np 1 \
  --jinja \
  --no-mmap \
  --temp 1.0 \
  --top-p 1.0 \
  --min-p 0.001 \
  --chat-template-kwargs '{"reasoning_effort": "high"}' \
  --host 0.0.0.0
```
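Once it's up, any OpenAI-compatible client can talk to it; a quick smoke test (llama-server defaults to port 8080, adjust if you changed it):

```
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```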


I gave it a whirl but was unenthused. I'll try it again, but so far I have not really enjoyed any of the Nvidia models, though they are best in class for execution speed.


I'll pipe in here as someone working on an agentic harness project using mastra as the harness.

Nemotron3-super is, without question, my favorite model now for my agentic use cases. The closest model I would compare it to, in vibe and feel, is the Qwen family, but this thing has an ability to hold attention through complicated (often noisy) agentic environments, and I sometimes find myself checking that I'm not on a frontier model.

I now just rent a dual B6000 on a full-time basis for all my stuff; this is the backbone of my "base" agentic workload, and I only step up to stronger models in rare situations in my pipelines.

The biggest thing with this model, I've found, is just making sure my environment is set up correctly; the temps and templates need to be exactly right. I've had hit-or-miss results with OpenRouter. But running this model on a B6000 from Vast with the native NVFP4 weights from Nvidia, it's really good: 2500 peak tokens/sec on that setup with batching, and about 100/s for a single request at 250k context. :)

I can run up to about 120k context reliably on a single B6000, but this thing really SCREAMS on a dual B6000. (I'm close to just ordering a couple for myself, it's working so well.)

Good luck. (Sometimes I feel like I'm the crazy guy in the woods, loving this model so much; I'm not sure why more people aren't jumping on it...)


> I'm not sure why more people aren't jumping on it

Simple: most of the people you’re talking to aren’t setting these things up. They’re running off-the-shelf software and setups and calling it a day. Most of them aren’t working with custom harnesses, or even tweaking temperature or templates.


I’d be very interested in trying it if you could spare the time to write up how to tune it well. If not, thanks for the input anyway.


Because the initial announcement included none of that... it wasn't addressed at all until the sentiment turned harsh.


It still hasn't been addressed. They walked back half of their wholly unreasonable position in an attempt to legitimize the other half.


Then shouldn't we celebrate the victory, drop it, and move on?


Victory is my device and its OS working the same way it always worked and the way it worked when I bought it.


Just don't install the OS updates then.

