More

qeternity · 2026-05-28T19:26:20 1779996380

> Venture capitalists & private investors are sucking all of the possible growth and future upside from these companies and then dumping them on retail investors when there's nothing left.

A lot of the money that is deployed by VCs comes from pension funds and asset managers that ultimately manage money for the average Joe.

andriy_koval · 2026-05-28T19:54:16 1779998056

Is there any evidence of what is the share/volume of such assets involved?

qeternity · 2026-04-14T11:44:24 1776167064

I haven't read TFA yet but a common technique is speculative decoding where a fast draft model will generate X tokens, which are then verified by the larger target model. The target model may accept some Y <= X tokens but the speedup comes from the fact that this can be done in parallel as a prefill operation due to the nature of transformers.

So let's say a draft model generates 5 tokens, all 5 of these can be verified in parallel with a single forward pass of the target model. The target model may only accept the first 4 tokens (or whatever) but as long as the 5 forward passes of the draft model + 1 prefill of the target model is faster than 4 forward passes of the target, you will have a speedup while maintaining the exact output distribution as the target.

qeternity · 2026-04-12T18:10:13 1776017413

> They paid a billion dollars for a vibe coded mess just for the opportunity to associate themselves with the hype.

Lol no they didn't. It wasn't even an acquihire. They just hired Peter.

Maybe they are paying him incredibly well, but not a billion dollars well.

qeternity · 2026-04-12T17:10:30 1776013830

> It's not any company, its Meta and the channels they administrate come with a set of responsibilities and principles

Sorry, which laws stipulate these special responsibilities and principles?

qeternity · 2026-04-09T12:54:51 1775739291

> or if the model might actually have emitted the formatting tokens that indicate a user message.

These tokens are almost universally used as stop tokens which causes generation to stop and return control to the user.

If you didn't do this, the model would happily continue generating user + assistant pairs w/o any human input.

qeternity · 2026-04-09T12:48:09 1775738889

This does not solve the problem at all, it's just another bandaid that hopefully reduces the likelihood.

qeternity · 2026-03-21T08:19:38 1774081178

Yes, it is written for a specific audience.

That is not a reason for snark.

As other commenters have noted, it’s well written.

qeternity · 2026-03-04T09:49:19 1772617759

> LLMs are inherently non-deterministic.

This isn't true, and certainly not inherently so.

Changes to input leading to changes in output does not violate determinism.

magicalhippo · 2026-03-04T11:35:34 1772624134

> This isn't true

From what I understand, in practice it often is true[1]:

Matrix multiplication should be “independent” along every element in the batch — neither the other elements in the batch nor how large the batch is should affect the computation results of a specific element in the batch. However, as we can observe empirically, this isn’t true.

In other words, the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies! This nondeterminism is not unique to GPUs — LLM inference endpoints served from CPUs or TPUs will also have this source of nondeterminism.

[1]: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

qeternity · 2026-03-04T18:15:46 1772648146

Yes, lots of things can create indeterminism. But nothing is inherent.

yomismoaqui · 2026-03-04T11:37:58 1772624278

Quoting:

"But why aren’t LLM inference engines deterministic? One common hypothesis is that some combination of floating-point non-associativity and concurrent execution leads to nondeterminism based on which concurrent core finishes first."

From https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

qeternity · 2026-03-04T18:15:52 1772648152

Yes, lots of things can create indeterminism. But nothing is inherent.

qeternity · 2026-03-01T12:44:55 1772369095

> With prompt caching, verbose context that gets reused is basically free.

But it's not. It might be discounted cost-wise, however it will still degrade attention and make generation slower/more computationally expensive even if you have a long prefix you can reuse during prefill.

qeternity · 2026-02-23T00:49:59 1771807799

> Tradition warrants a negotiation phase when one party wishes to change the terms of an agreement, or becomes cognizant that the counterparty may wish to do the same.

They didn't change the agreement. One party violated it, and the other party withdrew as a result.

This is so vanilla. But people will moan because they want subsidized tokens.

salawat · 2026-02-23T06:04:47 1771826687

I don't have a pony in this race my good poster, I just calls it how I see it, and I have a long history of calling out the fundamentally abusive character on non-negotiable one way contracting, and the ill effects it has on society.

Only people moaning here seem to be a bunch of wannabe Google PO's upset that people are handing machines a data construct they are designed to accept, and the machine is accepting, and using the token the way they were designed. Looks for some reason Google appears to resent that their lack of automating checks to deny those OAuth tokens is being utilized, and seems to think termination of customers who could probably be corrected with a simple message is the most reasonable response.

With instincts like that, it makes me happy everyday that for my needs, I can make do with doing things on my own hardware I've collected over the years. The Cloud has too much drama potential tied up in it.