Hacker News | leod's comments

Happy to see people working on vector search in Rust. Keep it up!

As far as HNSW implementations go, this one appears to be almost entirely unfinished. Node insertion logic is missing (https://github.com/swapneel/hnsw-rust/blob/b8ef946bd76112250...) and so is the base layer beam search.
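For reference, the base-layer search is not a huge amount of code. Here is a minimal Python sketch of the greedy beam search over one layer (graph as an adjacency dict, pluggable distance function; the names are hypothetical, not the repo's API):

```python
import heapq

def beam_search(graph, vectors, query, entry, ef, dist):
    """Greedy beam search over one HNSW layer.
    graph: node -> list of neighbor nodes; ef: beam width."""
    visited = {entry}
    d0 = dist(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: closest candidate first
    results = [(-d0, entry)]     # max-heap (negated): worst result first
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:   # best candidate is worse than worst result
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-d, n) for d, n in results)
```

Insertion then reuses the same search to pick the neighbors of a new node on each layer, which is presumably the missing piece.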


This is a fair point, and it is also called out in the discussion section. To some degree, it could be mitigated by hot-loading shader code (and compiling shaders in debug mode). However, this remains a fundamental downside of the approach.

Personally, I think that this is a price worth paying!


"I will pound this square peg into this round hole no matter what...I'm really fond of square pegs."


Conversely, I despise round pegs. On the other end of the hole is a machine that shreds your peg into sawdust, mixes in a liter of epoxy, and reconstitutes it into a Klein bottle, so I'm not sure why I should care about the shape of the hole.


Is that the rustacean perspective? Force a Klein bottle on me when what I actually needed was a peg in a hole, a peg which has now been destroyed and is useless for my purposes? Or will we be torturing another aphorism to death to get to the point?


I don't use Rust. But the point I'm making is that, whether it's GLSL, HLSL, or whatnot, a shading language has more of a guiding influence over the code that gets compiled and actually runs on your GPU than direct control over it.


Awesome work.

Would you be willing to share details about the fine-tuning procedure, such as the initialization, learning rate schedule, batch size, etc.? I'd love to learn more.

Background: I've been playing around with generating image sequences from sliding windows of audio. The idea roughly works, but the model training gets stuck due to the difficulty of the task.


Interesting. They train an image classifier to detect images that were generated by a GAN-trained CNN. I wonder if it could be possible to include this classifier in the training loss, such that the generated images fly under its radar as much as possible. If this makes sense, then I guess the cat-and-mouse game just gained another level. On the other hand, what the classifier is detecting could be a fingerprint of the CNN architecture itself.
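Concretely, the idea would be to add the detector's output as an extra penalty in the generator's loss. A toy sketch (the weighting term `lam` and the non-saturating GAN term are my assumptions, not from the paper):

```python
import math

def generator_loss(disc_score, detector_prob_fake, lam=0.5):
    """Combined generator objective (sketch): the usual non-saturating
    GAN term plus a penalty for being flagged by the fixed detector.
    disc_score: discriminator's probability that the sample is real.
    detector_prob_fake: detector's probability that the sample is generated."""
    gan_term = -math.log(max(disc_score, 1e-12))                   # fool the discriminator
    detect_term = -math.log(max(1.0 - detector_prob_fake, 1e-12))  # stay under the detector's radar
    return gan_term + lam * detect_term
```

If the detector is really keying on an architectural fingerprint, though, gradient pressure alone might not be able to remove it.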

(Full disclosure: I have only read the abstract so far.)


> Due to the difficulties in achieving Nash equilibria, none of the current GAN-based architectures are optimized to convergence, i.e. the generator never wins against the discriminator.

If I understand the terms used, it sounds like you're suggesting adding this classifier to the discriminator, to avoid detection. Since they are already failing to pass their existing discriminators, it seems like they could try to not be detected, but they wouldn't actually succeed.


I've trained a Transformer encoder-decoder model (this was slightly before GPT-2 came out) to generate HN comments from titles. There is a demo running at https://hncynic.leod.org


It doesn't seem very accurate; there isn't close to enough Electron hate whenever it is in the title.

This is pure gold though:

> How does one make a web app using a standard framework? I've never used it, but it sounds like someone has been able to put together something like a Web app with only one app.

Edit: This is even better.

> Rewriting a Linux kernel in Rust, by hand, is definitely the right thing to do as a beginner/intermediate programmer.


> Rewriting a Linux kernel in Rust, by hand, is definitely the right thing to do as a beginner/intermediate programmer.

Absolute perfection.


My favorite, even sounds like it would work:

Title: My Emacs Productivity Tricks/Hacks

hncynic 1 minute ago

I used this for some time and never looked back.

In my .emacs.d file, the arrow keys, a key with a cursor keys (which are the key bindings for the .emacs.d file above) and then a shortcut to switch to the command that makes use of those.

But I now have a full screen keyboard and mouse.

Here's another way to do it:

M-x { C-c }

You go in the current directory, move up the left arrow key, press escape and hit the backspace key.


Ha! In the model's defense, its training data [1] ends in 2017 -- not sure if hatred for Electron was as prevalent back then.

[1] https://archive.org/details/14566367HackerNewsCommentsAndSto...


Tesla kills babies hncynic 1 minute ago

The original source (posted from the comments):

http://news.ycombinator.com/item?id=626065

hncynic 1 minute ago

> " After some research about the potential implications of being a woman in a Tesla, the first thing Tesla carmaker said was ‘We can do it, but we do the opposite.’”

What does it mean to have an attractive woman in a supercharger with an attractive female face that doesn't have a baby attached?

hncynic 1 minute ago

I think the article needs to be updated to explain what happened here.

As in, Tesla lost a few babies to the first one (the car was still in the hands of two babies) so it was a very minor factor. But what happened to the last one would take a very long time.

The title is a bit misleading. The Tesla was an individual that was given birth in a manner that prevented them from getting it.

They didn't take away the babies from the Model S as well. They took away the babies in the Model S's hands and made it a minor factor, including the fact that the car broke down on the front of the vehicle. The Tesla's only reply is if the Model S would not have had any special features. In my opinion it should have given more minor facts.


Wow. It generated an extremely plausible looking Google Maps URL for me. It doesn't actually go anywhere, but it's crazy to think that the model memorizes random stuff like the common URL parameters and specific formatting of Google Maps URLs. http://maps.google.com/maps?sll=3.00664238,2.2633658&data=!3...


Wasted some time - https://ibb.co/album/kmmEgF

Favorite: https://ibb.co/rxKzMwF

Nice job!


This is cool. If you were to cache the results and generate a unique URL for each, people could easily share the funniest ones.
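A sketch of what I mean, assuming the URL is derived from a content hash so the same result always maps to the same link (the URL scheme here is made up):

```python
import hashlib

def share_id(title, generated_comment):
    """Stable short ID for a (title, output) pair: hash the content so
    identical results always get the same shareable URL."""
    payload = f"{title}\n{generated_comment}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

# e.g. cache the output under this ID and serve it at
# https://example.org/c/<share_id(title, comment)>
```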


Thanks! I actually planned to make results shareable at the start, but, knowing the internet, I did not like the idea of being held responsible for whatever content (say offensive or even illegal things) people would put into the titles.


I haven't heard of gradient checkpointing yet, thank you for the link! Do you know how it compares to gradient accumulation? The latter basically reduces the batch size, but takes the sum of multiple gradients before actually performing an update, thereby having the same effect as the original batch size.
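To illustrate what I mean by accumulation, here is a plain-Python sketch (no framework; `grad_fn` is assumed to return the summed gradient over a micro-batch):

```python
def sgd_with_accumulation(grad_fn, params, batches, accum_steps, lr):
    """Sum gradients over `accum_steps` micro-batches, then apply a single
    update, reproducing the step an equally large single batch would take."""
    accum = [0.0] * len(params)
    for i, batch in enumerate(batches, 1):
        g = grad_fn(params, batch)                 # summed gradient for this micro-batch
        accum = [a + gi for a, gi in zip(accum, g)]
        if i % accum_steps == 0:
            n = sum(len(b) for b in batches[i - accum_steps:i])   # examples in the big batch
            params = [p - lr * a / n for p, a in zip(params, accum)]  # mean gradient update
            accum = [0.0] * len(params)
    return params
```

Two micro-batches accumulated this way give exactly the same update as one batch of twice the size.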

The generated titles are great! You can put them into hncynic (https://github.com/leod/hncynic) to get closer to a fully generated HN experience.


Gradient accumulation and gradient checkpointing are orthogonal. You might want to use them simultaneously.

If I had to compare them, I'd say that accumulation is about working on a minibatch datapoint by datapoint and faking being able to run an entire large minibatch in a single shot, while checkpointing is about working on a model layer by layer and faking being able to run an entire model in a single shot.

The problem with GPT-2-335M and why nshepperd had to mess with gradient checkpointing is that the GPT-2-335M model will literally not fit in your standard 11GB GPU (and from Twitter comments about people trying it on the new 16GB Google Colab instances, it's unclear if 16GB would be enough either!). You can't even run minibatch n=1. It doesn't fit. It OOMs.

The model itself is only a gigabyte or so, the problem is that the self-attention layers, when run, use up a huge amount of memory for their intermediate steps, which must be stored in order to trace everything backwards through each step for the backprop part of training.

(Right now I believe nshepperd's code punts on doing gradient accumulation simultaneous with gradient checkpointing, so we've just been reducing the learning rate, which is sort of similar to faking large minibatches with gradient accumulation.)

Fortunately, because the self-attention layers are so small and cheap to compute, they work well with gradient checkpointing. They're cheap to recompute on the fly, so it's more important to save memory and allow training at all. (This is also how OpenAI is training the Sparse Transformers which are enormous; they haven't said either way, but I assume this is how they trained the larger GPT-2s like the 1.5b parameter version, because I can't imagine what hardware would fit even a single GPT-2 1.5b without tricks.)
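A toy illustration of the checkpointing idea, with plain Python functions standing in for layers: store only every k-th activation during the forward pass, then recompute the intermediate ones segment by segment when the backward pass needs them.

```python
def forward_with_checkpoints(layers, x, every=2):
    """Run `layers` while saving only every `every`-th activation
    (plus the input); the rest are recomputed later on demand."""
    saved = {0: x}
    for i, layer in enumerate(layers):
        x = layer(x)
        if (i + 1) % every == 0:
            saved[i + 1] = x
    return x, saved

def recompute_segment(layers, saved, start, end):
    """Recompute the activations between two checkpoints on the fly,
    as the backward pass for that segment would need them."""
    x = saved[start]
    acts = [x]
    for layer in layers[start:end]:
        x = layer(x)
        acts.append(x)
    return acts
```

Memory drops from O(layers) activations to O(layers / every) plus one segment, at the cost of one extra forward pass per segment.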


Thank you so much for your comprehensive answer, this helps a lot.

If I understand nshepperd's code correctly, it uses a constant and small learning rate. Do you know if this works better than the learning rate schedule that is usually used for Transformer models (https://www.tensorflow.org/alpha/tutorials/text/transformer_...)?
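For reference, the schedule from that tutorial (and the original "Attention Is All You Need" paper) is linear warmup followed by inverse-square-root decay; a minimal sketch:

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Noam schedule: lr rises linearly for `warmup_steps`, then
    decays proportionally to 1/sqrt(step)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```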


It's a constant, yes. We haven't tried any other learning rate schedules (for my poetry GPT-2s, I simply drop the LR 10x each day or so). I have no idea if this is optimal for transfer learning or not.


wow! I just had a blast putting titles into that. the results are amazing. kudos!


If humans are not limited by the halting problem, it would be great if you could tell me if the following function f halts for all integers n: https://gist.github.com/leod/9b89af30cff21cb925d4522a68c990d...


Not being limited by the halting problem does not entail that humans can solve every halting problem. They could be somewhere in between: able to solve more than a Turing machine, but less than a complete halting oracle.


Thank you!

The model weighs in at 1.2GB with 100M parameters, which is similar to the smallest GPT-2 model.

I wouldn't be surprised if GPT-2 small (+ finetuning on HN data) performed better than what I have trained. Other than hyperparameters, I think there are two main differences: First, I pretrained the model solely on Wikipedia data, while GPT-2 used more general web data. Second, I used an encoder-decoder model, while GPT-2 is a language model. I suspect that the encoder is not very useful for this task.


This is true for some scenarios, like invalidating iterators through deletion (detected at compile-time by the borrow checker), but other scenarios still require runtime checks, right? Consider e.g. array out of bound accesses -- are you aware of approaches that move bounds checks to compile-time? It seems to me that this would be a painstaking process that would require programmers to annotate their code in many places to enable compile-time verification.


It just depends. Yes, Rust has no UB in safe Rust (modulo bugs). Sometimes, that means compile-time checks. Sometimes, that means runtime checks. It depends on the specific thing.

The compiler _will_ attempt to prove that bounds checks aren't needed and eliminate them; see https://godbolt.org/z/7QPfhR vs https://godbolt.org/z/Vx39fv for example. In the first, there's an array and the compiler knows that it has a length of 3, so an index of zero needs no checks. In the second, we don't know how long the slice is, so we have to do the check.


This analysis seems to be based on some notion of "Cognitive Complexity", but I can't find its definition in the article. Am I missing something?



There’s no singular definition within that document for cognitive complexity, just a bunch of metrics.

