
> Well no, you still want to minimize garbage production (and related GC overhead). Wear leveling doesn’t mean produce more garbage.

Surely having some % of garbage helps, since you're garbage collecting and shuffling data around anyway to wear-level more evenly?

Let's say you have 99% used data and 1% garbage. You have very little room for (static) wear leveling. For example: if you're writing 10MB of new data and your drive is 99% full of allocated data... you'd have to move 1000MB of data around to "find" the 10MB of garbage, on average.

In the other extreme case, 0% data and 100% garbage (say a TRIM operation just completed), you simply write the 10MB without having to move any data around.

The 50% data + 50% garbage case scales accordingly: writing 10MB of new data requires finding and coalescing 10MB of garbage, which means moving roughly 20MB of data overall.
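A back-of-the-envelope sketch of that relationship (hypothetical Python, assuming garbage is spread uniformly across erase blocks, not modeling any real FTL):

    # Data moved as a function of garbage fraction, assuming garbage is
    # spread uniformly across erase blocks. Illustrative only.
    def data_moved_mb(new_write_mb, garbage_fraction):
        if garbage_fraction <= 0:
            return float("inf")   # no garbage at all: nothing can be reclaimed
        # To free new_write_mb of space, roughly new_write_mb / garbage_fraction
        # worth of flash has to be scanned, relocating the still-valid portion.
        return new_write_mb / garbage_fraction

    for pct in (1, 50, 100):
        print(f"{pct}% garbage -> move ~{data_moved_mb(10, pct / 100):.0f} MB to place 10 MB")
    # 1% garbage -> move ~1000 MB to place 10 MB
    # 50% garbage -> move ~20 MB to place 10 MB
    # 100% garbage -> move ~10 MB to place 10 MB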

----------

I'm oversimplifying, of course. But even in a real-life system, I'd expect that the more "garbage" you have sitting around, the better the FTL / SSD's (static or dynamic) wear-leveling algorithms work.



I can’t state it any more clearly: minimize the production of garbage.

“Surely some % of garbage helps as you're garbage collecting ”

Garbage that doesn’t exist doesn’t need collecting.

The flaw here is confusing free space with garbage. You shouldn’t have written in the first place if you could have avoided it.

Every environmentalist knows this: RRR, and the first R is reduce, not recycle.


Any append-only data structure will have data that was true when it was written but has become false/obsolete/garbage at a later time. This is unavoidable.

I'm not saying that we write needless garbage to the logs or filesystem or whatever. I'm saying that the garbage you leave behind in your stream aids the later steps of (static) wear-leveling, so it's not a big deal. You're going to produce plenty of this (data that was true initially, but becomes garbage later) as files get updated, moved around filesystems, or whatnot.

"Garbage" in this sense is perhaps the wrong word. Its more like "Obsolete data", or "outdated data".


If you read the article, though, many of the updated nodes (which are now garbage) don't see any updates to their “data”, only to their internal tree pointers.

So lots of “data” is being copied, and garbage is being generated, only for the benefit of tidying up tree structure, not because the actual “data” in those pages changed.

Not generating such garbage in the first place is an obvious benefit.
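A minimal sketch of why that happens (hypothetical Python, not the article's code): in a copy-on-write tree, changing one leaf forces a fresh copy of every ancestor up to the root, so the old internal nodes become garbage even though only a child pointer in them changed.

    # Minimal copy-on-write binary tree (illustrative only).
    class Node:
        def __init__(self, value, left=None, right=None):
            self.value, self.left, self.right = value, left, right

    def update_leftmost(node, new_value):
        # Returns a new root; every node on the root-to-leaf path is a fresh copy.
        if node.left is None:
            return Node(new_value, None, node.right)
        return Node(node.value, update_leftmost(node.left, new_value), node.right)

    old_root = Node("root", Node("a", Node("leaf")), Node("b"))
    new_root = update_leftmost(old_root, "leaf2")
    # The old root, the old "a", and the old leaf are now unreachable garbage,
    # even though the root and "a" only had a child pointer change.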


I think I see what you mean now. Thanks.



