Now I'm curious... Is your last suggestion correct? Wouldn't the time to cool down between pause intervals be proportionally longer due to the higher thermal mass and cancel out any savings gained by the long pause? Maybe the overall energy draw is even higher because the heat losses are higher when you spend a longer time with a high dT.
The water bottles don't warm up as quickly as the air they displace, which otherwise flows out of the fridge when you open it. So they have two effects: first, they take up space that warm indoor air can't move into, and second, they help chill that incoming air slightly through their own thermal mass.
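A quick back-of-envelope comparison makes the thermal-mass point concrete. The numbers below are standard textbook values (illustrative only), comparing one litre of water against one litre of air:

```python
# Illustrative numbers: thermal mass of 1 L of water vs 1 L of air.
water_heat_capacity = 1.0 * 4186      # 1 kg * 4186 J/(kg*K)
air_heat_capacity = 0.0012 * 1005     # ~1.2 g * 1005 J/(kg*K)
ratio = water_heat_capacity / air_heat_capacity
# ratio is roughly 3500: the bottle needs ~3500x the energy for the
# same temperature swing, which is why it barely warms during a door
# opening while the displaced air warms almost instantly.
```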
Whether you multiply by 10 or 2, the same "counter" argument from the article stands. Only now you don't have a trailing zero after infinite nines, you have a trailing 8.
I don't understand how you can even have a trailing zero after an infinite number of nines. Surely any place that someone would want to put the zero can be refuted by correctly stating that a nine goes there (it's an infinite number of them, after all) and there is literally no "last" place.
I’ve seen videos of actual mathematicians complaining to each other about how the general public thinks like GP. There is no last digit. Every time you reach the horizon there’s another horizon.
Technically you don't have an '8'; the carry propagates forever. Think about it: at each finite truncation, adding another nine turns the previous trailing 8 into a 9 and appends a new 8 after it, so the 8 keeps receding. In the limit you get the repeating decimal 1.9_ with no trailing 8 at all.
There is no eight. This is something I've heard actual mathematicians complain about to other actual mathematicians: the non-math public misunderstands infinite series as "imagine a number so big you can't fathom it, then add one more to it." That's not how things work.
Going as far as you can imagine and a little farther is an infinitesimal of the real infinite.
Very rough (!) napkin math: for a q8 model (almost lossless) you have parameters = VRAM requirement. For q4 with some performance loss it's roughly half. Then you add a little bit for the context window and overhead. So a 32B model q4 should run comfortably on 20-24 GB.
Again, very rough numbers, there's calculators online.
One of the things I'm still struggling with when using LLMs over NLP is classification against a large corpus of data. If I get a new text and I want to find the most similar text out of a million others, semantically speaking, how would I do this with an LLM? Apart from choosing certain pre-defined categories (such as "friendly", "political", ...) and then letting the LLM rate each text on each category, I can't see a simple solution yet except using embeddings (which I think could just be done using BERT and does not count as LLM usage?).
I've used embeddings to define clusters, then passed sampled documents from each cluster to an LLM to create labels for each grouping. I had pretty impressive results from this approach when creating a category/subcategory labels for a collection of texts I worked on recently.
That's interesting, it sounds a bit like those cluster graph visualisation techniques. Unfortunately, my texts seem to fall into clusters that really don't match the ones that I had hoped to get out of these methods. I guess it's just a matter of fine-tuning now.
Feed one through an LLM, one word at a time, and keep track of words that experience greatly inflated probabilities of occurrence, compared to baseline English. "For" is probably going to maintain a level of likelihood close to baseline. "Engine" is not.
Wouldn't a simple comparison of the word frequency in my text against a list of usual word frequencies do the trick here without an LLM? Sort of a BM25?
It might; it's not going to do the same thing. The LLM will tell you words that would likely appear in a similar text. Word frequency will tell you words that have actually appeared in your text. I'm postulating that the first kind of list is much more likely to show strong overlap between two similar documents than the second kind of list.
Vocabulary style matters a lot to what words are actually used, but much less to what words are likely to be used. If I'm following a style guide that says to use "automobile" instead of "car", appearance probabilities for "automobile" will be greatly inflated. And appearance probabilities for "car" will also be greatly inflated, just to a lesser extent than for "automobile". Whereas actual usage of "car" will be pegged at zero.
Determining how similar two texts are is something that an LLM should be good at. It should be better than a simple comparison of word frequency. Whether it's better enough to justify the extra compute is a different question.
The "issue" with saying an LLM can't do this is that CFD simulations are not actually that niche. Many university courses ask their students to write these types of algorithms for their course project. All this knowledge is present freely on the internet (as is evident by the Youtube videos that the author mentioned), and as such can be learned by an LLM. The article is of course still very impressive.
Great point. Niche to me, but not to thee. I was unaware. This is actually one of the frustrating things about the LLMs - they don’t tell you when what you asked for is outside their training data!
I'm a bit surprised by the amount of comments comparing the cost to (often cheap) cloud solutions. Nvidia's value proposition is completely different in my opinion. Say I have a startup in the EU that handles personal data or some company secrets and wants to use an LLM to analyse it (like using RAG). Having that data never leave your basement sure can be worth more than $3000 if performance is not a bottleneck.
Heck, I'm willing to pay $3000 for one of these to get a good model that runs my requests locally. It's probably just my stupid ape brain trying to do finance, but I'm infinitely more likely to run dumb experiments with LLMs on hardware I own than I am while paying per token (to the point where I currently spend way more time with small local llamas than with Claude), and even though I don't do anything sensitive I'm still leery of shipping all my data to one of these companies.
This isn't competing with cloud, it's competing with Mac Minis and beefy GPUs. And $3000 is a very attractive price point in that market.
Yep! I don't spend much time there because I got pretty comfortable with llama before that subreddit really got started, but it's definitely turned up some helpful answers about parameter tuning from time to time!
Even for established companies this is great. A tech company can have a few of these locally hosted and users can poll the company LLM with sensitive data.
The price seems relatively competitive even compared to other local alternatives like "build your own PC". I'd definitely buy one of this (or even two if it works really well) for developing/training/using models that currently run on cobbled together hardware I got left after upgrading my desktop.
> Having that data never leave your basement sure can be worth more than $3000 if performance is not a bottleneck
I get what you're saying, but there are also regulations (and your own business interest) that expects data redundancy/protection which keeping everything on-site doesnt seem to cover
Hey Jeremy, very exciting release! I'm currently building my first product with RoBERTa as one central component, and I'm very excited to see how ModernBERT compares. Quick question: When do you think the first multilingual versions will show up? Any plans of you training your own?